Agenda - Data & AI Warsaw Tech Summit

Our agenda is packed with presentations, arranged into 9 categories – find your most desired topics!

9.04.2025 - First Conference Day | Hybrid: Online + Onsite

8.00 - 9.00
Registration of participants at the hotel
Plenary session (9.00 - 11.05)
Presentation
9.00 - 9.10
Conference opening
Presentation
9.10 - 11.05
The topics will be published soon
11.05 - 11.30
Coffee break
Parallel sessions (11.30 - 11.50)
Presentation
Session no 1
Kingestor: Real-Time Ingestion of Hundreds of Billions of Game Events for ML and Analytics
Senior Software Engineer
King

At King, we ingest and analyze around a trillion game events weekly to enhance gameplay and deliver player experiences to over two hundred million monthly active users. These game events enable us to improve our games via game observability, feature delivery, and machine learning.

However, collecting and storing these game events brings a variety of challenges due to their sheer volume and high speed.

To address the challenges, we have developed an in-house product called Kingestor, which allows us to ingest our game events in an effective manner. It processes game events in near real-time with just a ten-minute latency, loading approximately five million events per second. Kingestor ensures data integrity through event reconciliation and deduplication, providing accurate, real-time insights for both business and technical applications. It is a scalable and adaptable product, which is designed for use across the gaming industry, making it easy to implement for businesses handling large-scale data.
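
As an illustration only, and not a description of Kingestor's internals, the deduplication and reconciliation mentioned above could be sketched roughly as follows. The `event_id` and `game` fields, the function names, and the per-game "sent" counts are all hypothetical assumptions made for this example.

```python
from collections import Counter
from typing import Iterable, Iterator


def deduplicate(events: Iterable[dict]) -> Iterator[dict]:
    """Yield each event once, keyed by a hypothetical `event_id` field."""
    seen = set()
    for event in events:
        event_id = event["event_id"]
        if event_id in seen:
            continue  # duplicate delivery, drop it
        seen.add(event_id)
        yield event


def reconcile(sent_counts: dict, loaded: Iterable[dict]) -> dict:
    """Return, per game, the gap between events sent and events loaded.

    Games whose counts already match are left out of the result.
    """
    loaded_counts = Counter(event["game"] for event in loaded)
    return {
        game: sent - loaded_counts.get(game, 0)
        for game, sent in sent_counts.items()
        if sent != loaded_counts.get(game, 0)
    }


if __name__ == "__main__":
    raw = [
        {"event_id": "a", "game": "g1"},
        {"event_id": "a", "game": "g1"},  # delivered twice
        {"event_id": "b", "game": "g2"},
    ]
    unique = list(deduplicate(raw))
    print(len(unique), reconcile({"g1": 1, "g2": 2}, unique))
```

An in-memory set only works for small batches; at the volumes described in the abstract, a production system would presumably need a windowed or distributed approach.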

Presentation
Session no 2
TECH
BUSINESS
Data Governance and Access Control in GCP BigQuery
Data Governance and Access Control
Zalando

Methods to manage data and its authorized usage in Google Cloud Platform, including access control through data governance and enforced security measures:

  • Fine-grained access control in BigQuery by enforcing IAM policies
  • Column-level access control by defining taxonomies and policy tags
  • Best practices for using taxonomies and policy tags
  • Row-level access control in BigQuery - access control restriction by using resource tags.
#accesscontrol #bigquery #datagovernance #datasecurity #gcp
Presentation
Session no 3
The only way to enforce best practices is to offer them out of the box: HelloFresh’s data engineering framework
Director of Data Engineering
HelloFresh

Best practices like naming conventions, proper data asset descriptions, and ownership tagging are a must-have for proper governance across your data landscape. Yet they often require manual effort and a trained, knowledgeable user base to put in place, which unfortunately means they are not followed in the majority of cases. The only way to ensure those rules are followed to the letter is to bake them in as defaults in the user experience of your data platform. HelloFresh is the world's leading meal kit company and global integrated food solutions group, shipping over a billion meals to customers per year. This talk will present HelloFresh's in-house, low-code, config-driven data engineering framework that was constructed to offer data governance and best practices out of the box. You will learn about the architecture around its open source components, and get a demonstration of the user experience designed to enable even less technical data practitioners.

Presentation
Session no 4
TECH
BUSINESS
Building and Adopting a Versatile Private AI Chat Platform: A User-Centric Approach
Data Scientist
XTB

Our presentation will cover the development and adoption of a private AI chat platform within our company. We will discuss the technical architecture, including the integration of multiple large language models (LLMs) and chat engines from various providers. A significant part of the presentation will focus on our user adoption strategy, which includes conducting workshops and identifying internal advocates to promote the technology. We will also highlight how we use this platform as a unified interface for testing new AI experiments, which streamlines development and user feedback. Additionally, we will share our experience in creating separate Python packages for chat engines to ensure scalability and reusability. The aim is to provide a comprehensive view of both the technical and human aspects of implementing AI solutions in a corporate environment.

Presentation
Session no 5
The topic will be published soon
Parallel sessions (11.55 - 12.15)
Presentation
Session no 1
The topic will be published soon
Presentation
Session no 2
The topic will be published soon
Presentation
Session no 3
The topic will be published soon
Presentation
Session no 4
The topic will be published soon
Presentation
Session no 5
The topic will be published soon
Parallel sessions (12.20 - 12.40)
Presentation
Session no 1
The topic will be published soon
Presentation
Session no 2
The topic will be published soon
Presentation
Session no 3
The topic will be published soon
Presentation
Session no 4
The topic will be published soon
Presentation
Session no 5
The topic will be published soon
Parallel sessions (12.45 - 13.05)
Presentation
Session no 1
The topic will be published soon
Presentation
Session no 2
The topic will be published soon
Presentation
Session no 3
The topic will be published soon
Presentation
Session no 4
The topic will be published soon
Head of Business Intelligence
Tpay
Presentation
Session no 5
TECH
BUSINESS
The greatest MEMES of Reddit with computer vision and visual transformer embeddings
Data Scientist
Warsaw University of Technology

What makes a great meme? Is it the template? The reference to recent events? Or perhaps sheer luck? Using the image embedding pipeline with the refined Vision Transformer model by Google, we explore the memesphere (yes, it's a word) of Reddit and its most popular meme subreddit: r/memes. We brew a recipe for the best memes by analyzing upvote and comment statistics. We determine the most similar memes in terms of content and graphics to establish relations and form clusters segregated by meme templates. Finally, we answer the world-shaking question: What was the best meme of last year?

#computervision #imageembeddings #reddit #socialmedia #visualtransformers
13.05 - 13.55
Lunch
Parallel sessions (13.55 - 14.25)
Presentation
Session no 1
TECH
BUSINESS
AI-Driven Software Testing: Redefining Quality and Innovation in the Telecommunications Industry
Manager, EES Test Framework & AI | AI Product Development Leader
Ericsson AB

The future of Artificial Intelligence in software testing: its transformative impact within the telecommunications industry. As AI continues to evolve, organizations are leveraging intelligent solutions to optimize testing processes, enhance product quality, and reduce time-to-market. Drawing from real-world use cases from Ericsson AB, this presentation will dive into how AI is revolutionizing testing methodologies, addressing challenges in AI deployment, and setting the stage for the next leap in testing innovation. Attendees will gain actionable insights into integrating AI into testing pipelines, handling the complexities of large-scale deployments, and overcoming the challenges that come with AI adoption in the software testing space.

#aiintelecom #aioperationalisation #aitesting #softwaretestinginnovation #telecomai
Presentation
Session no 2
The topic will be published soon
Presentation
Session no 3
TECH
BUSINESS
Open-weights semantic search for customer support
Senior Director Data & AI
WebPros
ML Engineer
WebPros

We want to showcase that a successful implementation of an AI solution (open-weights semantic search for customer care) doesn’t have to be costly and can deliver solid ROI, contrary to the growing sentiment in the media as we seem to be entering the trough of disillusionment (in Gartner’s hype cycle terminology). In our internal tech talks, our peers were particularly interested in the thinking behind certain decisions we made. For instance, they wanted to know how we chose the most suitable model from the MTEB leaderboard, how we organized the embeddings to fit our knowledge base, why we chose this vector store over another, and more. Interestingly, this implementation and its heavy internal promotion among non-tech folks spurred an avalanche of ideas from other departments. We want to share the experience and believe there’s no better place to do this than at the tech-heavy BigData Technology Warsaw Summit.

#business-fit #embeddings #rag #semantic-search #vector-database
Presentation
Session no 4
TECH
BUSINESS
Bridging the SQL Skills Gap: How LLM-Based Text-to-SQL Boosts Team Productivity
Head of Research
Healthy.io

As organizations increasingly rely on data to drive decision-making, the ability to quickly access and analyze data has become essential. However, SQL querying remains a technical hurdle for many, slowing down workflows and limiting data access to specialized team members. This session will demonstrate how LLM-based Text-to-SQL tools can close this gap, enabling non-technical team members (such as product managers and business analysts) to generate SQL queries using natural language. The talk will explore how to implement Text-to-SQL solutions for databases, discuss strategies to improve query accuracy through prompt engineering techniques, and show how to optimize results by incorporating database structure representations. Attendees will walk away with actionable insights on how to empower their teams, streamline query development, and increase overall productivity.
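
To make "incorporating database structure representations" concrete, here is a minimal sketch of one common approach: rendering the schema as `CREATE TABLE` statements and placing it in the prompt ahead of the question. The function names, the prompt wording, and the example schema are assumptions for illustration, not the speaker's method, and the call to an actual LLM is deliberately left out.

```python
def render_schema(tables: dict) -> str:
    """Render {table: {column: sql_type}} as CREATE TABLE statements."""
    lines = []
    for table, columns in tables.items():
        cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
        lines.append(f"CREATE TABLE {table} ({cols});")
    return "\n".join(lines)


def build_prompt(question: str, tables: dict) -> str:
    """Combine instructions, the schema, and the user's question into one prompt."""
    return (
        "You translate questions into a single SQL SELECT statement.\n"
        "Use only the tables and columns defined below.\n\n"
        f"{render_schema(tables)}\n\n"
        f"Question: {question}\n"
        "SQL:"
    )


if __name__ == "__main__":
    # Hypothetical schema used only for this example.
    schema = {"orders": {"id": "INTEGER", "customer_id": "INTEGER", "total": "REAL"}}
    print(build_prompt("What is the total revenue per customer?", schema))
```

Keeping the schema rendering separate from the prompt template makes it easy to try other representations (for example, adding sample rows or column comments) without touching the rest of the prompt.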

#aiinbusinessintelligence #datademocratization #datadrivendecisions #productivityhacks #texttosql
Presentation
Session no 5
The topic will be published soon
Parallel sessions (14.30 - 15.00)
Presentation
Session no 1
The topic will be published soon
Presentation
Session no 2
TECH
BUSINESS
Leveraging Feature Store for High Sparsity Recommendations at LOT Polish Airlines
Data Scientist
LOT Polish Airlines
Data Scientist
LOT Polish Airlines

Recommendation systems are an essential part of most e-commerce industries, often responsible for a significant portion of revenue. However, every branch of this industry has its own set of exceptions and challenges that affect how recommender systems have to be designed. In airlines, these exceptions become extreme as returning visitors become sparse, many purchases are anonymous, and items, such as flight tickets, can be sold at different prices depending on the circumstances. To overcome these challenges, we propose a simple method that utilizes information collected about users and items, omitting the need for extracting user/item embeddings with matrix factorization. Additionally, we will talk about how we used a Feature Store as a foundation for this project and why it could be beneficial to implement it in your Data Science team as well.

#airlines #featurestore #recommendersystems
Presentation
Session no 3
TECH
BUSINESS
Graphs for real-time Fraud Detection and Prevention
Software Engineer
Booking.com
Software Engineering Manager
Booking.com

An exploration of how our system uses a graph-based approach to store transactions and enhance fraud controls with advanced features, boosting the effectiveness of both ML models and static rules. The presentation covers key components of the system, including a real-time feature computation service optimized for low latency, a visualization tool for network analysis, and a mechanism for historical feature reconstruction.
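
As a rough illustration of what a graph-derived fraud feature can look like, the sketch below stores transactions as a bipartite graph of accounts and payment cards and counts how many other accounts share a card with a given account. The node types, function names, and the feature itself are assumptions chosen for the example; the abstract does not describe the actual data model.

```python
from collections import defaultdict


def build_graph(transactions):
    """Build an undirected bipartite graph: account nodes <-> card nodes.

    `transactions` is an iterable of (account, card) pairs.
    """
    graph = defaultdict(set)
    for account, card in transactions:
        account_node, card_node = ("account", account), ("card", card)
        graph[account_node].add(card_node)
        graph[card_node].add(account_node)
    return graph


def shared_card_accounts(graph, account):
    """Feature: number of other accounts that used any of this account's cards."""
    node = ("account", account)
    linked = set()
    for card_node in graph.get(node, ()):
        linked |= graph[card_node]
    linked.discard(node)  # do not count the account itself
    return len(linked)


if __name__ == "__main__":
    txs = [("alice", "c1"), ("bob", "c1"), ("carol", "c2"), ("alice", "c2")]
    g = build_graph(txs)
    print(shared_card_accounts(g, "alice"), shared_card_accounts(g, "bob"))
```

A count like this can be fed to an ML model as a numeric feature or used directly in a static rule (for example, flagging accounts above a threshold).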

#frauddetection #graphs #payments-fraud #realtime-fraud-prevention
Presentation
Session no 4
From LLM to Agentic AI: Implement your first Agent with CrewAI
Senior Data Scientist
Kuehne+Nagel

The evolution of AI has seen a remarkable transition from standalone large language models (LLMs) to compound systems integrating diverse functionalities, culminating in the rise of agentic AI. This talk traces the journey of AI systems, exploring how agentic AI enables autonomous reasoning, planning, and action, making it a pivotal development in solving complex, dynamic problems.
We will dive into the principles of agentic AI, discussing how it works and why it is essential for creating adaptive, task-oriented solutions. The session will then introduce CrewAI, an open-source Python package that simplifies the development of intelligent agents. Through a practical use case, participants will learn how to implement their first agent with CrewAI, gaining hands-on insights into leveraging this powerful tool to unlock new possibilities in AI-driven applications.

Presentation
Session no 5
TECH
BUSINESS
Trace in the Trucks: Using Network Analysis and Geographic Patterns to Uncover Fake Logistics Platform Accounts
Founder
Million Monkeys Software

Fast Deliveries, a freight forwarding company, faced a challenge when several of their forwarders violated non-competition agreements by working for their competitor, WHILE (names anonymized), and transferring clients during this period. The investigation centered on analyzing data from Trans.eu - a major European logistics platform where freight forwarders post and accept transportation orders, essentially serving as a digital marketplace crucial for day-to-day logistics operations.

Our team was tasked with developing a methodology to identify these fake accounts and connect them to specific former employees. Combining network analysis, geographic patterns, and temporal data allowed us to identify suspicious accounts with high confidence.

Attendees will learn about the specific investigation and gain insights into analytical techniques they can apply to their own data challenges. We will demonstrate our methodology using anonymized data from the actual investigation.

#dataforensics #frauddetection #gis #investigativeanalytics #networkanalysis
Parallel sessions (15.05 - 15.35)
Presentation
Session no 1
The topic will be published soon
Presentation
Session no 2
The topic will be published soon
Presentation
Session no 3
The topic will be published soon
Presentation
Session no 4
TECH
BUSINESS
Prompt Engineering vs. Fine-Tuning: Striking the Balance in Building AI Agents
Lead Data Scientist
TUI
ML Engineer
TUI

Dive into the practical and strategic considerations when choosing between these two approaches for creating effective AI agents. Prompt engineering has risen as a fast, adaptable, and low-cost way to harness the capabilities of LLMs. However, its performance often correlates directly with the size of the model - larger, more costly models are required to achieve the desired results. This trade-off raises questions about scalability and cost-efficiency, especially for organisations with resource constraints.

On the other hand, fine-tuning offers a path to tailor models for domain-specific tasks or nuanced interactions, delivering consistent performance even with smaller models. While it demands more resources upfront, fine-tuned solutions can lead to significant long-term savings by reducing reliance on oversized models:

  • The strengths and limitations of prompt engineering vs. fine-tuning in AI agent development
  • Cost implications: why prompt engineering often requires larger, more expensive models to perform well
  • Fine-tuning as a solution to achieve domain-specific precision with smaller models
  • Case study: the TUI AI travel assistant for the UK market and lessons learned
  • A hybrid approach: combining prompt engineering and fine-tuning for best results.
#aiagents #finetuning #generativeai #machinelearning #promptengineering
Presentation
Session no 5
TECH
BUSINESS
Are you ready for Generative BI? The new level of data platform evolution
Cloud Data Architect, Independent Consultant

The purpose of this presentation is to demonstrate the possibilities of Generative AI in the context of business intelligence. I'll focus on Copilot, AI Skills in MS Fabric, and AI/BI Genie in Databricks. The presentation will contain a description of each tool, a comparison, and a demo at the end. I'll show how to prepare a data model and data to make them available for Gen AI. I'll demonstrate how these tools can support data democratization in an organization.

I'm also considering including Conversation Analytics from GCP.

#copilot #databricks #generetivebi #genie #msfabricks
15.35 - 15.55
Coffee break
Roundtables (16.00 - 16.45)

The parallel roundtable discussions are the part of the conference that engages all participants. They serve a few purposes. First, participants have the opportunity to exchange opinions and experiences about a specific issue that is important to their group. Second, participants can meet and talk with the leader/host of the roundtable discussion; hosts are selected professionals with vast knowledge and experience.

There will be one roundtable session, so each conference participant can take part in one discussion.

Roundtables
TECH
BUSINESS
1. Which LLMs are better: managed APIs or self-hosted open models?
AI R&D Director
Pearson

This roundtable will explore the pros and cons of using managed APIs (such as OpenAI GPT, Anthropic Claude, or AWS Bedrock) versus self-hosted open LLMs (e.g., Bielik or Llama). We’ll delve into the trade-offs between these two approaches, covering critical factors such as cost, scalability, performance, and control over data. Key discussion points include:

  • Performance: Are managed APIs inherently faster and more reliable, or can self-hosted solutions match their capabilities?
  • Cost Considerations: An analysis of infrastructure expenses, API pricing, and resource requirements for each option.
  • Data Security and Privacy: Which approach aligns better with stringent data security regulations and privacy concerns?
  • Flexibility and Customization: How do open models stack up when tailoring solutions to specific business needs?
  • Practical Use Cases: In what scenarios does one approach clearly outperform the other? Are there situations where only one solution is viable?

This session is designed for technical experts and decision-makers to exchange experiences, insights, and predictions about the evolving landscape of LLMs. Whether you’re actively using one of these approaches or still considering your options, this presentation/discussion will offer valuable perspectives to help you make informed decisions.

#ai #bielik #llm #openllm #publiccloud
Roundtables
TECH
BUSINESS
2. Beyond Terraform – efficiently managing Snowflake account setup
Senior Technical Architect
IQVIA Solutions Poland

This presentation covers the approach our IQVIA Data Transformation & Engineering department takes to configuring Snowflake, specifically but not limited to the areas of security (network policies / storage policies / password policies / single-sign-on configuration) and database management (new databases / warehouses / roles and grants).

In many cases, companies hire admins who manage these settings ad hoc, with the best backup being a notepad audit trail of what they ran. In such cases the configuration differs per user and inconsistencies pile up. Some are smarter and adopt Git-based solutions such as Terraform. Tools like Terraform typically have a Snowflake plugin to manage all of this, but they lack templating, always lag behind the latest Snowflake SQL extensions, and do not really address self-service (ad hoc or UI-based) management needs.

In IQVIA DTE we arrived at a fairly good compromise between various security and auditing needs, leaving room for automation, enforcement, and self-service where appropriate, while ending up with a very neat and simple solution that I would like to present.

#automation #configuration #rbac #snowflake
Roundtables
TECH
BUSINESS
3. Taming GenAI in Production: pitfalls and solutions in real-world deployments
Lead solutions architect & CEO
datarabbit

GenAI has been with us for a while, long enough for a number of actual systems to be deployed into production. Moving beyond the initial hype, this roundtable will tackle the real challenges and best practices of running GenAI systems in the real world, related to:

  • Robust deployment architectures and strategies.
  • Quality control and handling model hallucinations and similar limitations in production.
  • Building effective monitoring and observability systems around these.
  • Ensuring solutions are cost effective and scalable.
  • Security, compliance, and data privacy in real-world settings, and whether they can be achieved both when self-hosting LLMs and when using third-party APIs.
  • And more...

We invite engineering and business leaders, architects, and AI practitioners who have already deployed GenAI systems to production, as well as those who have yet to turn their proofs of concept into serious deployments and would like to learn how. Knowledge of the basics of such systems and the aspects around them is required.

Essentially, we would like to gather a group of really competent people for discussion, exchange of knowledge, and establishing best practices for GenAI systems.

#beyondpoc #genai #llms #production
Roundtables
TECH
BUSINESS
4. Technical and Strategic Lessons from Implementing GDPR
Big Data Engineer
Agile Lab

From 2022 to 2024 I worked on developing the technical pipelines needed to implement GDPR compliance for a large insurance group. I would like to share two aspects of this project. The first is technical: the mode of encryption we used to mask data while maintaining its usefulness and referential integrity (format-preserving encryption). The second is non-technical and concerns the value of implementing GDPR: besides compliance and risk management, it was essential to standardising the process of sharing data from prod to dev environments and to improving data observability and documentation. This eliminated many inconsistencies and much shadow work related to the data available in the dev environment. Overall, the message is that compliance is not only about avoiding fines but can also be an opportunity to spread good design and best practices.

#data engineering #encryption #gdpr
Roundtables
TECH
BUSINESS
5. From Data Engineering to Data Platform: The top 3 things we changed for our data users (mindset and technology standpoints)
Engineering Manager for Data Platform
SumUp

In 2023, the Data Engineering team @ SumUp adopted a Platform mindset. The plan was great, but the journey wasn't as smooth as it should have been.

Data users lost trust in our team and started building workarounds.

#dataengineering #dataplatform #datastrategy
Roundtables
TECH
BUSINESS
6. LLM-Based Detection of Data Protection Violations as Part of a Data Mesh Platform
Professor of software engineering
HTW Berlin

#cdos #dataarchitects #dataengineers
Roundtables
TECH
BUSINESS
7. The architecture of ClickStream solution
Sr Staff Data Engineer
airSlate

I would like to share my experience with the architecture I built to implement a clickstream solution. ClickStream is a data analytics platform designed to track, collect, analyze, and interpret the sequence of clicks or interactions that users make on a website or app. My team built this solution on AWS using technologies including gRPC, Kubernetes, Flink, Airflow, Redis, Redshift, and Apache Kafka. Together they helped us achieve good results and made this solution possible. Let's dive deep and see how this solution works and what pros and cons we found.

#awsdatasolutions #clickstreamanalytics #dataarchitecture #realtimedata
Roundtables
TECH
BUSINESS
8. The first rule of FLYTE club is: we talk about ML pipelines
Senior MLOps
Printify

I've been in the trenches dealing with messy ML pipelines that consume more time than they should. Through hands-on experience, I found practical ways to simplify pipeline orchestration using Flyte. In this session, I will give you a crash course in building ML pipelines - how and where to start and how to scale it up later, while dealing with all the nasty problems that you will encounter on the road.

#flyte #kubernetes #ml-pipelines #mlops #opensource
Plenary session (16.45 - 17.15)
Presentation
16.45 - 17.10
The topic will be published soon
Presentation
17.10 - 17.15
Summary & closing
Evening networking session (19.30 - 22.30)

10.04.2025 - Second Conference Day | Online Only

Presentation
TECH
BUSINESS
Harnessing Real-Time Analytics: Building a Cost-Effective, Resilient Data Lake and Data Mesh with CDC Tools
Lead Architect
Direct Line Group UK

This presentation delves into creating a real-time analytics platform by leveraging cost-effective Change Data Capture (CDC) tools like Debezium for seamless data ingestion from sources such as Oracle into Kafka. We’ll explore how to build a resilient data lake and data mesh architecture using Apache Flink, ensuring data loss prevention, point-in-time recovery, and robust schema evolution to support agile data integration. Participants will learn best practices for establishing a scalable, real-time data pipeline that balances performance, reliability, and flexibility, enabling efficient analytics and decision-making.

#cdc #datalake #datamesh #realtimeanalytics #streamingdata
