Agenda - Data & AI Warsaw Tech Summit

Our agenda is packed with presentations, arranged into 9 categories – find your most desired topics!

9.04.2025 - First Conference Day | Hybrid: Online + Onsite

Presentation
8.00 - 9.00
Registration of participants at the hotel
Plenary session (9.00 - 11.05)
Presentation
9.00 - 9.10
Conference opening
CEO & Meeting Designer
Evention
CEO and Co-founder
GetInData | Part of Xebia
Presentation
9.10 - 9.35
Building LinkedIn member trust in the new age of Generative AI
Distinguished Engineer
LinkedIn

LinkedIn aims to create the world’s most trusted professional community, empowering individuals and organizations to achieve success and realize economic opportunity. A key pillar of this vision is ensuring the authenticity of members on the platform, such as through robust identity verification. Equally important is fostering a safe, professional, and trustworthy environment for member interactions. This includes ensuring that content in members’ feeds is appropriate, aligned with their interests, and consistent with our community policies while following member controls.
Generative AI is revolutionizing the state of the art in content understanding, complementing and enhancing both traditional AI models and human labeling efforts. This innovation significantly improves the quality and scalability of decision-making processes. This talk will provide an overview of LinkedIn’s trust and safety features, with a special focus on our application of Generative AI to advance these initiatives.

Presentation
9.35 - 9.55
The topic will be published soon
Presentation
9.55 - 10.15
TECH
BUSINESS
AI Fabric: Advanced Context Engineering for Smarter AI Solutions
Data Evangelist
Ab Initio

Context engineering is the missing link in realizing the full potential of your AI strategy. This presentation explores how advanced techniques in context management can deliver enterprise value through effective use of Retrieval-Augmented Generation (RAG). Discover how to unlock the full potential of your data catalog investments and drive more intelligent, context-aware AI applications to transform data conversations.

Presentation
10.15 - 10.35
The topic will be published soon
Presentation
10.35 - 11.05
Discussion panel - Techpoint
The topic will be published soon
Presentation
11.05 - 11.30
Coffee break
Parallel sessions (11.30 - 11.50)
Presentation
Session no 1
Kingestor: Real-Time Ingestion of Hundreds of Billions of Game Events for ML and Analytics
Senior Software Engineer
King

At King, we ingest and analyze around a trillion game events weekly to enhance gameplay and deliver player experiences to over two hundred million monthly active users. These game events enable us to improve our games via game observability, feature delivery, and machine learning.

However, collecting and storing these game events brings a variety of challenges due to their sheer volume and high speed.

To address the challenges, we have developed an in-house product called Kingestor, which allows us to ingest our game events in an effective manner. It processes game events in near real-time with just a ten-minute latency, loading approximately five million events per second. Kingestor ensures data integrity through event reconciliation and deduplication, providing accurate, real-time insights for both business and technical applications. It is a scalable and adaptable product, which is designed for use across the gaming industry, making it easy to implement for businesses handling large-scale data.

Presentation
Session no 2
TECH
BUSINESS
Data Governance and Access Control in GCP BigQuery
Data Governance and Access Control
Zalando

Methods to manage data and its authorized usage in Google Cloud Platform, including access control through data governance and enforced security measures:

  • Fine-grained access control in BigQuery by enforcing IAM policies
  • Column-level access control by defining taxonomies and policy tags
  • Best practices for using taxonomies and policy tags
  • Row-level access control in BigQuery - access control restriction by using resource tags.
#accesscontrol #bigquery #datagovernance #datasecurity #gcp
Presentation
Session no 3
The only way to enforce best practices is to offer them out of the box: HelloFresh’s data engineering framework
Director of Data Engineering
HelloFresh

Best practices like naming conventions, proper data asset descriptions, and ownership tagging are a must-have to ensure proper governance across your data landscape. Yet they often require manual effort and a trained, knowledgeable user base to put in place, which unfortunately means they are not followed in the majority of cases. The only way to ensure those rules are followed to the letter is to bake them in as defaults in the user experience of your data platform. HelloFresh is the world's leading meal kit company and global integrated food solutions group, shipping over a billion meals to customers per year. This talk will present HelloFresh's in-house, low-code, config-driven data engineering framework that was constructed to offer data governance and best practices out of the box. You will learn about the architecture around its open source components, and get a demonstration of the user experience designed to enable even less technical data practitioners.

Presentation
Session no 4
TECH
BUSINESS
Building and Adopting a Versatile Private AI Chat Platform: A User-Centric Approach
Data Scientist
XTB

Our presentation will cover the development and adoption of a private AI chat platform within our company. We will discuss the technical architecture, including the integration of multiple large language models (LLMs) and chat engines from various providers. A significant part of the presentation will focus on our user adoption strategy, which includes conducting workshops and identifying internal advocates to promote the technology. We will also highlight how we use this platform as a unified interface for testing new AI experiments, which streamlines development and user feedback. Additionally, we will share our experience in creating separate Python packages for chat engines to ensure scalability and reusability. The aim is to provide a comprehensive view of both the technical and human aspects of implementing AI solutions in a corporate environment.

Presentation
Session no 5
The topic will be published soon
Parallel sessions (11.55 - 12.15)
Presentation
Session no 1
The topic will be published soon
Presentation
Session no 2
The topic will be published soon
Presentation
Session no 3
The topic will be published soon
Presentation
Session no 4
The topic will be published soon
Presentation
Session no 5
The topic will be published soon
Parallel sessions (12.20 - 12.40)
Presentation
Session no 1
The topic will be published soon
Presentation
Session no 2
The topic will be published soon
Presentation
Session no 3
The topic will be published soon
Presentation
Session no 4
The topic will be published soon
Presentation
Session no 5
The topic will be published soon
Parallel sessions (12.45 - 13.05)
Presentation
Session no 1
The topic will be published soon
Presentation
Session no 2
The topic will be published soon
Presentation
Session no 3
The topic will be published soon
Presentation
Session no 4
The topic will be published soon
Head of Business Intelligence
Tpay
Presentation
Session no 5
TECH
BUSINESS
The greatest MEMES of Reddit with computer vision and visual transformer embeddings
Data Scientist
Warsaw University of Technology

What makes a great meme? Is it the template? The reference to recent events? Or perhaps sheer luck? Using an image embedding pipeline with the refined Vision Transformer model by Google, we explore the memesphere (yes, it's a word) of Reddit and its most popular meme subreddit: r/memes. We brew a recipe for the best memes by analyzing upvote and comment statistics. We determine the most similar memes in terms of content and graphics to establish relations and form clusters segregated by meme templates. Finally, we answer the world-shaking question: What was the best meme of last year?

#computervision #imageembeddings #reddit #socialmedia #visualtransformers
Presentation
13.05 - 13.55
Lunch
Parallel sessions (13.55 - 14.25)
Presentation
Session no 1
TECH
BUSINESS
AI-Driven Software Testing: Redefining Quality and Innovation in the Telecommunications Industry
Manager, EES Test Framework & AI | AI Product Development Leader
Ericsson AB

This talk looks at the future of Artificial Intelligence in software testing and its transformative impact within the telecommunications industry. As AI continues to evolve, organizations are leveraging intelligent solutions to optimize testing processes, enhance product quality, and reduce time-to-market. Drawing from real-world use cases from Ericsson AB, this presentation will dive into how AI is revolutionizing testing methodologies, addressing challenges in AI deployment, and setting the stage for the next leap in testing innovation. Attendees will gain actionable insights into integrating AI into testing pipelines, handling the complexities of large-scale deployments, and overcoming the challenges that come with AI adoption in the software testing space.

#aiintelecom #aioperationalisation #aitesting #softwaretestinginnovation #telecomai
Presentation
Session no 2
The topic will be published soon
Presentation
Session no 3
TECH
BUSINESS
Open-weights semantic search for customer support
Senior Director Data & AI
WebPros
ML Engineer
WebPros

We want to showcase that successful implementation of AI solutions (open-weights semantic search for customer care) doesn’t have to be costly and can deliver solid ROI, contrary to the growing sentiment in the media as we seem to be entering the trough of disillusionment (in Gartner’s hype cycle terminology). In our internal tech talks, our peers were particularly interested in the thought process behind certain decisions that we made. For instance, they wanted to know how we evaluated the most suitable model from the MTEB leaderboard, how we organized the embeddings to fit our knowledge base, why we chose one vector store over another, and more. Interestingly, this implementation and heavy internal promotion among non-tech folks spurred an avalanche of ideas from other departments. We want to share the experience and believe there’s no better place to do this than at the tech-heavy BigData Technology Warsaw Summit.

#business-fit #embeddings #rag #semantic-search #vector-database
Presentation
Session no 4
TECH
BUSINESS
Bridging the SQL Skills Gap: How LLM-Based Text-to-SQL Boosts Team Productivity
Head of Research
Healthy.io

As organizations increasingly rely on data to drive decision-making, the ability to quickly access and analyze data has become essential. However, SQL querying remains a technical hurdle for many, slowing down workflows and limiting data access to specialized team members. This session will demonstrate how LLM-based Text-to-SQL tools can close this gap, enabling non-technical team members, such as product managers and business analysts, to generate SQL queries using natural language. This talk will explore how to implement Text-to-SQL solutions for databases, discuss strategies to improve query accuracy through prompt engineering techniques, and show how to optimize results by incorporating database structure representations. Attendees will walk away with actionable insights on how to empower their teams, streamline query development, and increase overall productivity.

#aiinbusinessintelligence #datademocratization #datadrivendecisions #productivityhacks #texttosql
Presentation
Session no 5
The topic will be published soon
Parallel sessions (14.30 - 15.00)
Presentation
Session no 1
The topic will be published soon
Presentation
Session no 2
TECH
BUSINESS
Leveraging Feature Store for High Sparsity Recommendations at LOT Polish Airlines
Data Scientist
LOT Polish Airlines
Data Scientist
LOT Polish Airlines

Recommendation systems are an essential part of most e-commerce industries, often responsible for a significant portion of revenue. However, every branch of this industry has its own set of exceptions and challenges that affect how recommender systems have to be designed. In airlines, these exceptions become extreme as returning visitors become sparse, many purchases are anonymous, and items, such as flight tickets, can be sold at different prices depending on the circumstances. To overcome these challenges, we propose a simple method that utilizes information collected about users and items, omitting the need for extracting user/item embeddings with matrix factorization. Additionally, we will talk about how we used a Feature Store as a foundation for this project and why it could be beneficial to implement it in your Data Science team as well.

#airlines #featurestore #recommendersystems
Presentation
Session no 3
TECH
BUSINESS
Graphs for real-time Fraud Detection and Prevention
Software Engineer
Booking.com
Software Engineering Manager
Booking.com

An exploration of how our system uses a graph-based approach to store transactions and enhance fraud controls with advanced features, boosting the effectiveness of both ML models and static rules. We will present the key components of the system, including a real-time feature computation service optimized for low latency, a visualization tool for network analysis, and a mechanism for historical feature reconstruction.

#frauddetection #graphs #payments-fraud #realtime-fraud-prevention
Presentation
Session no 4
From LLM to Agentic AI: Implement your first Agent with CrewAI
Senior Data Scientist
Kuehne+Nagel

The evolution of AI has seen a remarkable transition from standalone large language models (LLMs) to compound systems integrating diverse functionalities, culminating in the rise of agentic AI. This talk traces the journey of AI systems, exploring how agentic AI enables autonomous reasoning, planning, and action, making it a pivotal development in solving complex, dynamic problems.
We will dive into the principles of agentic AI, discussing how it works and why it is essential for creating adaptive, task-oriented solutions. The session will then introduce CrewAI, an open-source Python package that simplifies the development of intelligent agents. Through a practical use case, participants will learn how to implement their first agent with CrewAI, gaining hands-on insights into leveraging this powerful tool to unlock new possibilities in AI-driven applications.

Presentation
Session no 5
TECH
BUSINESS
Trace in the Trucks: Using Network Analysis and Geographic Patterns to Uncover Fake Logistics Platform Accounts
Founder
Million Monkeys Software

Fast Deliveries, a freight forwarding company, faced a challenge when several of their forwarders violated non-competition agreements by working for their competitor, WHILE (names anonymized), and transferring clients during this period. The investigation centered on analyzing data from Trans.eu - a major European logistics platform where freight forwarders post and accept transportation orders, essentially serving as a digital marketplace crucial for day-to-day logistics operations.

Our team was tasked with developing a methodology to identify these fake accounts and connect them to specific former employees. Combining network analysis, geographic patterns, and temporal data allowed us to identify suspicious accounts with high confidence.

Attendees will learn about the specific investigation and gain insights into analytical techniques they can apply to their own data challenges. We will demonstrate our methodology using anonymized data from the actual investigation.

#dataforensics #frauddetection #gis #investigativeanalytics #networkanalysis
Parallel sessions (15.05 - 15.35)
Presentation
Session no 1
The topic will be published soon
Presentation
Session no 2
The topic will be published soon
Presentation
Session no 3
The topic will be published soon
Presentation
Session no 4
TECH
BUSINESS
Prompt Engineering vs. Fine-Tuning: Striking the Balance in Building AI Agents
Lead Data Scientist
TUI
ML Engineer
TUI

Dive into the practical and strategic considerations when choosing between these two approaches for creating effective AI agents. Prompt engineering has risen as a fast, adaptable, and low-cost way to harness the capabilities of LLMs. However, its performance often correlates directly with the size of the model - larger, more costly models are required to achieve the desired results. This trade-off raises questions about scalability and cost-efficiency, especially for organisations with resource constraints.

On the other hand, fine-tuning offers a path to tailor models for domain-specific tasks or nuanced interactions, delivering consistent performance even with smaller models. While it demands more resources upfront, fine-tuned solutions can lead to significant long-term savings by reducing reliance on oversized models. The talk will cover:

  • The strengths and limitations of prompt engineering vs. fine-tuning in AI agent development
  • Cost implications: why prompt engineering often requires larger, more expensive models to perform well
  • Fine-tuning as a solution to achieve domain-specific precision with smaller models
  • Case study: the TUI AI travel assistant for the UK market and lessons learned
  • A hybrid approach: combining prompt engineering and fine-tuning for best results.
#aiagents #finetuning #generativeai #machinelearning #promptengineering
Presentation
Session no 5
TECH
BUSINESS
Are you ready for Generative BI? The new level of data platform evolution
Cloud Data Architect, Independent Consultant

The purpose of this presentation is to demonstrate the possibilities of Generative AI in the context of business intelligence. I'll focus on Copilot, AI Skills in MS Fabric, and AI/BI Genie in Databricks. The presentation will contain a description of each tool, a comparison, and a demo at the end. I'll present how to prepare the data model and data to make them available for Gen AI. I'll demonstrate how these tools can support data democratization in an organization.

I'm also considering including Conversation Analytics from GCP.

#copilot #databricks #generetivebi #genie #msfabricks
Presentation
15.35 - 15.55
Coffee break
Roundtables (16.00 - 16.45)
Roundtables
16.00 - 16.45
Roundtables

Parallel roundtable discussions are the part of the conference that engages all participants. They serve a few purposes. First, participants have the opportunity to exchange opinions and experiences about a specific issue that is important to their group. Second, participants can meet and talk with the leader/host of the roundtable discussion; hosts are selected professionals with vast knowledge and experience.

There will be one roundtable session, so every conference participant can take part in one discussion.

Roundtables
TECH
BUSINESS
1. Which LLMs are better: managed APIs or self-hosted open models?
AI R&D Director
Pearson

This roundtable will explore the pros and cons of using managed APIs (such as OpenAI GPT, Anthropic Claude, or AWS Bedrock) versus self-hosted open LLMs (e.g., Bielik or Llama). We’ll delve into the trade-offs between these two approaches, covering critical factors such as cost, scalability, performance, and control over data. Key discussion points include:

  • Performance: Are managed APIs inherently faster and more reliable, or can self-hosted solutions match their capabilities?
  • Cost Considerations: An analysis of infrastructure expenses, API pricing, and resource requirements for each option.
  • Data Security and Privacy: Which approach aligns better with stringent data security regulations and privacy concerns?
  • Flexibility and Customization: How do open models stack up when tailoring solutions to specific business needs?
  • Practical Use Cases: In what scenarios does one approach clearly outperform the other? Are there situations where only one solution is viable?

This session is designed for technical experts and decision-makers to exchange experiences, insights, and predictions about the evolving landscape of LLMs. Whether you’re actively using one of these approaches or still considering your options, this presentation/discussion will offer valuable perspectives to help you make informed decisions.

#ai #bielik #llm #openllm #publiccloud
Roundtables
TECH
BUSINESS
2. Beyond Terraform – efficiently managing Snowflake account setup
Senior Technical Architect
IQVIA Solutions Poland

This presentation is about the approach our IQVIA Data Transformation & Engineering (DTE) department takes to configuring Snowflake, specifically but not limited to the areas of security (network policies / storage policies / password policies / single-sign-on configuration) and database management (new databases / warehouses / roles & grants).

In many cases, companies hire admins who manage these settings ad hoc, with the best backup being a notepad audit trail of what they ran. In such cases, configuration differs per user and inconsistencies pile up. Some are smarter and implement Git-based solutions, like Terraform. Tools like Terraform typically have a Snowflake plugin to manage all of this, but they lack templating, always lag behind the latest Snowflake SQL extensions, and do not really address self-service (ad hoc or UI-based) management needs.

At IQVIA DTE, we arrived at a fairly good compromise among various security and auditing needs, leaving room for automation, enforcement, and self-service where appropriate, while keeping the solution neat and simple. I would like to present it.

#automation #configuration #rbac #snowflake
Roundtables
TECH
BUSINESS
3. Taming GenAI in Production: pitfalls and solutions in real-world deployments
Lead solutions architect & CEO
datarabbit

GenAI has been with us for a while - long enough for a number of actual systems to be deployed into production. Moving beyond the initial hype, this roundtable will tackle the real challenges and best practices of running GenAI systems in the real world, related to:

  • Robust deployment architectures and strategies.
  • Quality control and handling model hallucinations and similar limitations in production.
  • Building effective monitoring and observability systems around these.
  • Ensuring solutions are cost effective and scalable.
  • Security, compliance, and data privacy in real-world settings - and whether they can be achieved both when self-hosting LLMs and when using third-party APIs.
  • And more...

We invite engineering and business leaders, architects, and AI practitioners who have already deployed GenAI systems to production, as well as those who have yet to turn their proofs of concept into serious deployments and would like to learn how to do so. Knowledge of the basics of such systems and the aspects around them is required.

Essentially, we would like to gather a group of really competent people for discussion, exchange of knowledge, and establishing best practices for GenAI systems.

#beyondpoc #genai #llms #production
Roundtables
TECH
BUSINESS
4. Technical and Strategic Lessons from Implementing GDPR
Big Data Engineer
Agile Lab

Join us for a dynamic discussion that begins with insights from a GDPR implementation journey and then expands to broader data governance topics. We will see how privacy compliance can inform and enhance general data management practices: going beyond legal obligations to create an opportunity for upgrading and optimizing data infrastructure and operations, leading to better overall data governance.

Key discussion points will include:

  • Converting compliance requirements into opportunities for standardizing data processes
  • Improving data sharing workflows between production and development environments
  • Enhancing data documentation and observability through privacy implementation
  • Technical solutions for privacy preservation, including encryption strategies that maintain data usefulness and integrity
  • Technical solutions for data stewardship.

This session welcomes privacy officers, data engineers, architects, and technology leaders at any stage of their GDPR or data governance implementation journey. Come ready to share your experiences, challenges, and successes in balancing compliance needs with operational efficiency. Compliance is not only about avoiding fines, but can also be an opportunity for spreading good design and best practices.

#data engineering #encryption #gdpr
Roundtables
TECH
BUSINESS
5. Evolving from Data Engineering to Data Platform: Key Mindset and Technical Shifts for Success
Engineering Manager for Data Platform
SumUp

This is a discussion about the transformation from traditional data engineering to a data platform-oriented approach. As organizations scale their data operations to serve hundreds or thousands of users, the conventional centralized data engineering model faces significant challenges. In this roundtable, we'll explore and share experiences about critical shifts in both mindset and technology that enable this evolution.

Key discussion points will include:

  • How to effectively treat data problems the same as production incidents.
  • Democratizing data engineering skills across the organization, rather than confining them to a single role
  • Focusing on a UI-based rather than a YAML-based self-service platform

This session is ideal for data leaders, engineers, and practitioners who are facing or have overcome similar challenges in their organizations. By the end of this roundtable, you’ll hopefully have an idea of how to start a data platform team in your organization and be inspired by the work done by your peers.

#dataengineering #dataplatform #datastrategy
Roundtables
TECH
BUSINESS
6. Automating Data Governance with LLMs
Professor of software engineering
HTW Berlin

The exchange of sensitive data across systems, both within and between organizations, is often constrained by legal documents, such as privacy policies. Verifying the legitimacy of data access requests can be labor-intensive, leading to delays or, worse, undocumented and unauthorized access. As organizations adopt data mesh and face increasing complexity in data protection requirements, the intersection of GenAI and data governance presents exciting opportunities and challenges.

Key discussion points will include:

  • Challenges in translating legal documents and privacy policies into automated governance rules
  • Real-world experiences with implementing LLM/AI-powered governance in data mesh environments
  • Balancing automation with compliance and risk management
  • Approaches to building trust in AI-driven governance decisions
  • Evolution of data contracts and their automation in distributed data architectures

This session welcomes data governance professionals, architects, legal experts, and technology leaders to share their perspectives on automating data governance and data quality processes, or building data products. This roundtable provides an opportunity to network with peers and discuss emerging patterns at the intersection of AI, data mesh, and governance. Together, we'll explore how these technologies can create more efficient and reliable data governance processes while ensuring compliance with legal and organizational requirements.

#cdos #dataarchitects #dataengineers
Roundtables
TECH
BUSINESS
7. Building Real-Time Analytics Solutions: Architectures, Challenges, and Solutions
Sr Staff Data Engineer
airSlate

Join us for a discussion about designing and implementing real-time analytics platforms and solutions, with a focus on processing user clickstream and interaction data at scale.

Key discussion points will include:

  • Goals, cases, and requirements for real-time analytics
  • Common architectural patterns and anti-patterns in real-time analytics
  • Infrastructure decisions: choosing between and combining technologies like Kafka, Flink, Redis, ClickHouse, and other streaming solutions
  • Handling challenges such as data consistency, latency, and scalability
  • Lessons learned from production implementations across different organizations

This session is ideal for data engineers, architects, and technical leaders who are either currently working with real-time analytics or planning to implement such systems. Whether you're dealing with clickstream data, financial transactions, IoT sensors, or other streaming use cases, you should find this discussion valuable.

#awsdatasolutions #clickstreamanalytics #dataarchitecture #realtimedata
Roundtables
TECH
BUSINESS
8. The first rule of FLYTE club is: we talk about ML pipelines
Senior MLOps
Printify

I've been in the trenches dealing with messy ML pipelines that consume more time than they should. Through hands-on experience, I found practical ways to simplify pipeline orchestration using Flyte. In this session, I will give you a crash course in building ML pipelines - how and where to start and how to scale it up later, while dealing with all the nasty problems that you will encounter on the road.

#flyte #kubernetes #ml-pipelines #mlops #opensource
Plenary session (16.45 - 17.15)
Presentation
16.45 - 17.10
The topic will be published soon
Presentation
17.10 - 17.15
Summary & closing
Evening networking session (19.30 - 22.30)
Presentation
19.30 - 22.30
Evening networking session


10.04.2025 - Second Conference day | Online Only

Presentation
TECH
BUSINESS
Harnessing Real-Time Analytics: Building a Cost-Effective, Resilient Data Lake and Data Mesh with CDC Tools
Lead Architect
Direct Line Group UK

This presentation delves into creating a real-time analytics platform by leveraging cost-effective Change Data Capture (CDC) tools like Debezium for seamless data ingestion from sources such as Oracle into Kafka. We’ll explore how to build a resilient data lake and data mesh architecture using Apache Flink, ensuring data loss prevention, point-in-time recovery, and robust schema evolution to support agile data integration. Participants will learn best practices for establishing a scalable, real-time data pipeline that balances performance, reliability, and flexibility, enabling efficient analytics and decision-making.

#cdc #datalake #datamesh #realtimeanalytics #streamingdata
