We only invite people actively participating in the conference agenda. Participation for speakers and panelists of the Data & AI Warsaw Tech Summit conference is free of charge.
Separate registration is required for the meeting.
"Dzień i Noc" Restaurant, Plac Mirowski 1, 00-138 Warsaw Poland
LinkedIn aims to create the world’s most trusted professional community, empowering individuals and organizations to achieve success and realize economic opportunity. A key pillar of this vision is ensuring the authenticity of members on the platform, such as through robust identity verification. Equally important is fostering a safe, professional, and trustworthy environment for member interactions. This includes ensuring that content in members’ feeds is appropriate, aligned with their interests, and consistent with our community policies while following member controls.
Generative AI is revolutionizing the state of the art in content understanding, complementing and enhancing both traditional AI models and human labeling efforts. This innovation significantly improves the quality and scalability of decision-making processes. This talk will provide an overview of LinkedIn’s trust and safety features, with a special focus on our application of Generative AI to advance these initiatives.
Context engineering is the missing link in realizing the full potential of your AI strategy. This presentation explores how advanced techniques in context management can deliver enterprise value through effective use of Retrieval-Augmented Generation (RAG). Discover how to unlock the full potential of your data catalog investments and drive more intelligent, context-aware AI applications to transform data conversations.
A concise exploration of how the new AI Supercomputing era is transforming the way people live and work. We will discuss how to accelerate research and innovation in fields from healthcare and life sciences to the green transition, supporting the development of innovative solutions to the world’s biggest problems. Discover how combining cutting-edge generative models with a Sovereign AI approach empowers organizations to innovate responsibly and securely in a rapidly evolving technological and geopolitical landscape.
The rise of AI presents both opportunity and disruption for technology professionals. As we create these intelligent systems, they simultaneously transform our own workflows and job roles. This panel brings together experts to explore visions of our professional future:
Will AI primarily serve as a productivity multiplier, lowering entry barriers while elevating experienced professionals to focus on higher-value work?
Or will AI drive radical workforce consolidation, with robust engineering and analytics departments reduced to lean teams overseeing automated processes?
Our panelists will examine the short-term impact on our careers over the next 1-2 years alongside longer-term projections spanning more than a decade. They will also share their practical strategies for IT professionals to position themselves in high-demand specializations, develop necessary skills and expertise, and create effective career plans.
At King, we ingest and analyze around a trillion game events weekly to enhance gameplay and deliver player experiences to over two hundred million monthly active users. These game events enable us to improve our games via game observability, feature delivery, and machine learning.
However, collecting and storing these game events brings a variety of challenges due to their sheer volume and high velocity.
To address the challenges, we have developed an in-house product called Kingestor, which allows us to ingest our game events in an effective manner. It processes game events in near real-time with just a ten-minute latency, loading approximately five million events per second. Kingestor ensures data integrity through event reconciliation and deduplication, providing accurate, real-time insights for both business and technical applications. It is a scalable and adaptable product, which is designed for use across the gaming industry, making it easy to implement for businesses handling large-scale data.
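Below is a minimal, hypothetical sketch of the keyed deduplication idea mentioned above; the field names and storage are illustrative assumptions, not Kingestor's actual implementation.

```python
# Hypothetical sketch of event deduplication in a micro-batch ingestion step.
# Event IDs, field names, and the storage layer are illustrative assumptions,
# not King's actual Kingestor implementation.
from typing import Iterable


def deduplicate(events: Iterable[dict], seen_ids: set) -> list[dict]:
    """Keep only events whose ID has not been ingested before."""
    unique = []
    for event in events:
        event_id = event["event_id"]  # assumed unique per game event
        if event_id not in seen_ids:
            seen_ids.add(event_id)
            unique.append(event)
    return unique


batch = [
    {"event_id": "a1", "type": "level_complete"},
    {"event_id": "a1", "type": "level_complete"},  # duplicate delivery
    {"event_id": "b2", "type": "purchase"},
]
print(deduplicate(batch, set()))  # -> two unique events remain
```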
Methods to manage data and its authorized usage in Google Cloud Platform, including access control through data governance and enforced security measures:
Best practices like naming conventions, proper data asset descriptions, and ownership tagging are a must-have to ensure proper governance across your data landscape. Yet they often require manual effort and a trained, knowledgeable user base to put in place, which unfortunately means these practices are not followed in the majority of cases. The only way to ensure such rules are followed to the letter is by baking them in as defaults in the user experience of your data platform. HelloFresh is the world's leading meal kit company and global integrated food solutions group, shipping over a billion meals to customers per year. This talk will present HelloFresh's in-house, low-code, config-driven data engineering framework, built to offer data governance and best practices out of the box. You will learn about the architecture around its open source components and get a demonstration of the user experience designed to enable even less technical data practitioners.
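As an illustration of the "governance baked in as defaults" idea, here is a small, hypothetical validation step a config-driven framework could run on every dataset definition; the schema and rules are assumptions, not HelloFresh's actual framework.

```python
# Hypothetical governance checks baked into a config-driven framework;
# the fields and rules are illustrative, not HelloFresh's actual implementation.
import re

NAMING_PATTERN = re.compile(r"^[a-z]+(_[a-z0-9]+)*$")  # assumed snake_case convention
REQUIRED_FIELDS = {"name", "description", "owner"}


def validate_dataset_config(config: dict) -> list[str]:
    """Return a list of governance violations for a dataset definition."""
    errors = []
    missing = REQUIRED_FIELDS - config.keys()
    if missing:
        errors.append(f"missing required fields: {sorted(missing)}")
    if "name" in config and not NAMING_PATTERN.match(config["name"]):
        errors.append(f"dataset name '{config['name']}' violates naming convention")
    return errors


print(validate_dataset_config({"name": "CustomerOrders"}))
# -> reports missing description/owner and a naming-convention violation
```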
Our presentation will cover the development and adoption of a private AI chat platform within our company. We will discuss the technical architecture, including the integration of multiple large language models (LLMs) and chat engines from various providers. A significant part of the presentation will focus on our user adoption strategy, which includes conducting workshops and identifying internal advocates to promote the technology. We will also highlight how we use this platform as a unified interface for testing new AI experiments, which streamlines development and user feedback. Additionally, we will share our experience in creating separate Python packages for chat engines to ensure scalability and reusability. The aim is to provide a comprehensive view of both the technical and human aspects of implementing AI solutions in a corporate environment.
Discover how an open source project under the Google umbrella leverages GCP key pillar services—advanced LLMs (Gemini), Looker for BI, Cloud Functions, and BigQuery—to drive NLP analytics. Integrating these tools within a React application enables intuitive data querying and visualization. Join this presentation and learn about real-world challenges in production environments.
Every week, hundreds of numerical metrics are delivered to decision makers at Allegro. Most of them are calculated within our data ecosystem, some calculated correctly, and very few documented or described. We'd like to take you on a quick journey describing what the KPI management process has looked like until now, and how we decided to change it. During our presentation, we'd like to showcase our approach to distributed KPI governance across multiple teams, where metric calculation is orchestrated through Airflow, computed in GCP, and documented using an everything-as-code approach, ending up, via CI/CD, in one common data catalog.
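To make the everything-as-code approach concrete, here is a hypothetical sketch of a KPI defined as code that CI/CD could validate and publish to a data catalog; the schema and metric are illustrative, not Allegro's actual setup.

```python
# Hypothetical "everything-as-code" KPI definition; the fields and metric are
# illustrative, not Allegro's actual metric schema.
from dataclasses import dataclass, asdict
import json


@dataclass
class Metric:
    name: str
    owner_team: str
    sql: str           # query executed by the Airflow-orchestrated job
    description: str   # documentation pushed to the data catalog by CI/CD


weekly_gmv = Metric(
    name="weekly_gmv",
    owner_team="commerce-analytics",
    sql="SELECT SUM(order_value) FROM orders WHERE week = @run_week",
    description="Gross merchandise value aggregated per ISO week.",
)

# CI/CD could serialize definitions like this and register them in the catalog.
print(json.dumps(asdict(weekly_gmv), indent=2))
```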
This presentation highlights the importance of transaction flagging and outlines the basic business logic. Given the complexity and need for deep business knowledge, we will also introduce an AI chatbot integrated with our Confluence as a valuable tool. We will show how we are leveraging vector database search to enhance data retrieval efficiency and accuracy. We will also show how Apache Flink was utilized for real-time data processing and analytics, ensuring timely insights. Together, these technologies streamline operations and support informed decision-making.
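For illustration, a minimal sketch of the vector-search retrieval step behind such a chatbot, using Chroma as an example vector store; the documents and the exact stack are assumptions.

```python
# Illustrative retrieval step for a documentation chatbot, using Chroma as an
# example vector store; the page content and stack choices are assumptions.
import chromadb

client = chromadb.Client()  # in-memory instance, enough for the sketch
collection = client.create_collection(name="confluence_pages")

collection.add(
    ids=["page-1", "page-2"],
    documents=[
        "Transactions are flagged when the amount exceeds the per-client limit.",
        "Flagged transactions are reviewed by the operations team within 24 hours.",
    ],
)

# Embed the question, find the most similar pages, then pass them to the LLM as context.
results = collection.query(query_texts=["Why was this transaction flagged?"], n_results=1)
print(results["documents"][0])
```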
The session will focus on the concept and importance of data extrapolation, emphasizing its vital role in forecasting and decision-making, especially when historical data is limited for anticipating trends. It will explain the reasons for using extrapolation, discuss potential challenges, and outline effective interpretation methods to enhance prediction accuracy.
Additionally, the session will review common extrapolation techniques, detailing their strengths and limitations. An alternative approach will also be introduced. A practical example will be provided to demonstrate how this alternative method can be applied, offering a clear understanding of its benefits, potential drawbacks, and success measures in real-world forecasting scenarios.
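As a simple point of reference, here is a minimal example of naive linear extrapolation with NumPy; the data points are made up purely to illustrate the technique and why more careful methods are needed.

```python
# Minimal example of naive linear extrapolation with NumPy; the series is
# fabricated for illustration and real data may flatten or reverse.
import numpy as np

months = np.array([1, 2, 3, 4, 5, 6])
sales = np.array([100, 110, 118, 130, 138, 150])

slope, intercept = np.polyfit(months, sales, deg=1)  # fit a straight line
future_months = np.array([7, 8, 9])
forecast = slope * future_months + intercept

print(np.round(forecast, 1))  # extrapolated values beyond the observed range
```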
In an evolving regulatory landscape, automation plays a crucial role in ensuring efficiency, consistency, and compliance across the Credit Risk Model Life Cycle.
This session explores how automation enhances key processes from data through model development to monitoring and validation reporting.
We will showcase how automation can streamline the processes around MLC and how AI-driven solutions, including LLMs, can help interpret test results and regulatory changes, providing deeper insights and accelerating decision-making.
By leveraging automation and AI, financial institutions can automate workflows, reduce operational risk, and improve transparency in model governance.
Data lakehouse has recently emerged as a go-to architecture for implementing modern data platforms. It bridges the gap between expansive data lakes and the structured world of data warehouses.
On one hand, its elasticity and modular nature enable enormous flexibility to meet very specific customer needs. On the other, since with great power comes great responsibility, some architectural decisions may significantly impact the overall performance, cost, and, in the end, customer satisfaction.
The tools currently available on the market allow us to implement this platform in various models, ranging from fully open source to managed SaaS. How can we choose the right tool for a specific case? Is there a universal deployment model for such a platform?
Based on our experience from a few data lakehouse projects, we will shed some light on the key aspects to consider while architecting a robust data platform.
Fast Deliveries, a freight forwarding company, faced a challenge when several of their forwarders violated non-competition agreements by working for their competitor, WHILE (names anonymized), and transferring clients during this period. The investigation centered on analyzing data from Trans.eu - a major European logistics platform where freight forwarders post and accept transportation orders, essentially serving as a digital marketplace crucial for day-to-day logistics operations.
Our team was tasked with developing a methodology to identify these fake accounts and connect them to specific former employees. Combining network analysis, geographic patterns, and temporal data allowed us to identify suspicious accounts with high confidence.
Attendees will learn about the specific investigation and gain insights into analytical techniques they can apply to their own data challenges. We will demonstrate our methodology using anonymized data from the actual investigation.
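For illustration only, here is a minimal sketch of the network-analysis idea: linking accounts that share an attribute and extracting connected clusters. The data below is fabricated and not taken from the actual investigation.

```python
# Illustrative network-analysis step: link accounts that share attributes
# (e.g., the same contact phone) and extract connected clusters.
# All data here is fabricated for the example.
import networkx as nx

accounts = {
    "acct_A": {"phone": "555-0101", "city": "Poznan"},
    "acct_B": {"phone": "555-0101", "city": "Warsaw"},   # shares a phone with acct_A
    "acct_C": {"phone": "555-0199", "city": "Gdansk"},
}

G = nx.Graph()
G.add_nodes_from(accounts)
for a, attrs_a in accounts.items():
    for b, attrs_b in accounts.items():
        if a < b and attrs_a["phone"] == attrs_b["phone"]:
            G.add_edge(a, b, reason="shared_phone")

print(list(nx.connected_components(G)))  # clusters of potentially linked accounts
```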
At SiriusXM, we make sure that satellite radio and online radio subscribers receive optimal recommendations across music, news, talk & podcast, and sports content.
This talk is about the real challenges and best practices of applying AI, ML, and data science in streaming media, focusing on:
What makes a great meme? Is it the template? The reference to recent events? Or perhaps sheer luck? Using an image embedding pipeline with a refined Vision Transformer model by Google, we explore the memesphere (yes, it's a word) of Reddit and its most popular meme subreddit: r/memes. We brew a recipe for the best memes by analyzing upvote and comment statistics. We determine the most similar memes in terms of content and graphics to establish relations and form clusters segregated by meme templates. Finally, we answer the world-shaking question: What was the best meme of last year?
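For readers curious about the mechanics, here is a minimal sketch of an image-embedding pipeline with a Vision Transformer from Hugging Face; the checkpoint and the similarity step are illustrative assumptions, not the exact setup used in the talk.

```python
# Minimal sketch of an image-embedding pipeline with a Vision Transformer;
# the checkpoint and placeholder images are assumptions, not the talk's exact setup.
import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTModel

processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
model = ViTModel.from_pretrained("google/vit-base-patch16-224")


def embed(image: Image.Image) -> torch.Tensor:
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[:, 0]  # CLS token as the image embedding


# Two placeholder images stand in for downloaded memes.
meme_a = embed(Image.new("RGB", (224, 224), "white"))
meme_b = embed(Image.new("RGB", (224, 224), "black"))
print(torch.nn.functional.cosine_similarity(meme_a, meme_b).item())
```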
The future of Artificial Intelligence in software testing: its transformative impact within the telecommunications industry. As AI continues to evolve, organizations are leveraging intelligent solutions to optimize testing processes, enhance product quality, and reduce time-to-market. Drawing from real-world use cases from Ericsson AB, this presentation will dive into how AI is revolutionizing testing methodologies, addressing challenges in AI deployment, and setting the stage for the next leap in testing innovation. Attendees will gain actionable insights into integrating AI into testing pipelines, handling the complexities of large-scale deployments, and overcoming the challenges that come with AI adoption in the software testing space.
We want to showcase that a successful implementation of AI solutions (open-weights semantic search for customer care) doesn’t have to be costly and can deliver solid ROI - contrary to the growing sentiment in the media as we seem to be entering the trough of disillusionment (in Gartner’s hype cycle terminology). In our internal tech talks, our peers were particularly interested in the thinking process behind certain decisions that we made. For instance, they wanted to know how we evaluated the most suitable model from the MTEB leaderboard, how we organized the embeddings to fit our knowledge base, why we chose this and not a different vector store, and more. Interestingly, this implementation and heavy internal promotion among non-tech folks spurred an avalanche of ideas from other departments. We want to share this experience and believe there’s no better place to do so than at the tech-heavy Big Data Technology Warsaw Summit.
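For context, here is a minimal sketch of what open-weights semantic search can look like with sentence-transformers; the model name and FAQ snippets are assumptions, not the model actually selected from the MTEB leaderboard for this project.

```python
# Minimal sketch of open-weights semantic search; the model and snippets are
# illustrative assumptions, not the production choices described in the talk.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("intfloat/multilingual-e5-base")

knowledge_base = [
    "How do I reset my account password?",
    "What are the delivery options and costs?",
    "How can I cancel my subscription?",
]
kb_embeddings = model.encode(knowledge_base, normalize_embeddings=True)

query = "I forgot my password"
query_embedding = model.encode(query, normalize_embeddings=True)

scores = util.cos_sim(query_embedding, kb_embeddings)[0]
print(knowledge_base[int(scores.argmax())])  # best-matching article for customer care
```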
As organizations increasingly rely on data to drive decision-making, the ability to quickly access and analyze data has become essential. However, SQL querying remains a technical hurdle for many, slowing down workflows and limiting data access to specialized team members. This session will demonstrate how LLM-based Text-to-SQL tools can close this gap, enabling non-technical team members—such as product managers and business analysts—to generate SQL queries using natural language. In this talk, we will explore how to implement Text-to-SQL solutions for databases, discuss strategies to improve query accuracy through prompt engineering techniques, and optimize results by incorporating database structure representations. Attendees will walk away with actionable insights on how to empower their teams, streamline query development, and increase overall productivity.
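As a taste of the approach, here is a minimal sketch of schema-aware prompting for Text-to-SQL; `call_llm` is a placeholder for whichever LLM client is used, and the schema is purely illustrative.

```python
# Minimal sketch of schema-aware prompting for Text-to-SQL; `call_llm` is a
# placeholder for whichever LLM client is used, and the schema is illustrative.
SCHEMA = """
Table orders(order_id INT, customer_id INT, order_date DATE, total NUMERIC)
Table customers(customer_id INT, country TEXT, signup_date DATE)
"""

PROMPT_TEMPLATE = """You are a SQL assistant. Using only the tables below, write a
single SQL query answering the question. Return SQL only, no explanation.

Schema:
{schema}

Question: {question}
SQL:"""


def text_to_sql(question: str, call_llm) -> str:
    prompt = PROMPT_TEMPLATE.format(schema=SCHEMA, question=question)
    return call_llm(prompt)  # hypothetical function wrapping the chosen LLM API


# Example usage with a stubbed model, for demonstration purposes only.
print(text_to_sql("Total revenue per country in 2024?", lambda p: "SELECT ..."))
```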
In the fast-paced world of cloud-native applications, ensuring seamless deployment and management of service dependencies is critical. Enter Saxo Blueprint, an innovative solution developed to orchestrate the deployment of essential dependencies such as internal applications, Kafka topics, databases, firewall openings, DNS records, and more. Additionally, a portal based on Spotify Backstage enables developers and other employees to discover and configure services in the company.
This solution enables easier disaster recovery and gives us a great way to provide evidence, including approvals, to auditors.
Restate (https://restate.dev/) is a system built by the creators of Apache Flink, inspired by our work on Apache Flink and Stateful Functions. It is an event-driven system built from the ground up for transactional workflows.
While Flink is built for analytical use cases (think OLAP on streams), Restate can be thought of as its transactional (OLTP) counterpart, for use cases like workflows, sagas, AI agents, state machines, and microservice orchestration.
Restate implements an architecture that shares similarities with stream processing, but also takes a very different approach at many layers. Restate sacrifices some of Flink's throughput to achieve very low latency transactions, high availability, serverless deployments, more flexible programming abstraction, more flexible state and queries, and polyglot APIs.
In this talk, we contrast durable execution with stream processing and their use cases, and show how the architecture evolved from Flink through Stateful Functions to Restate.
Recommendation systems are an essential part of most e-commerce industries, often responsible for a significant portion of revenue. However, every branch of this industry has its own set of exceptions and challenges that affect how recommender systems have to be designed. In airlines, these exceptions become extreme as returning visitors become sparse, many purchases are anonymous, and items, such as flight tickets, can be sold at different prices depending on the circumstances. To overcome these challenges, we propose a simple method that utilizes information collected about users and items, removing the need to extract user/item embeddings with matrix factorization. Additionally, we will talk about how we used a Feature Store as a foundation for this project and why it could be beneficial to implement one in your Data Science team as well.
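One possible way to realize such a feature-based approach, shown here only as an illustration and not as the production model, is to score user-item pairs with a classifier trained on concatenated user and item features.

```python
# Illustrative feature-based recommender (no matrix factorization): score
# user-item pairs with a classifier. Feature names and data are assumptions,
# not the airline's actual setup.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Each row: [user: trips_last_year, days_since_last_search, item: price, trip_length_days]
X = np.array([
    [2, 10, 120.0, 3],
    [0, 90, 450.0, 14],
    [5, 3, 200.0, 2],
    [1, 30, 800.0, 21],
])
y = np.array([1, 0, 1, 0])  # 1 = the user bought the offered ticket

model = GradientBoostingClassifier().fit(X, y)

# Score a new (user, flight) pair assembled from Feature Store lookups.
candidate = np.array([[3, 7, 150.0, 4]])
print(model.predict_proba(candidate)[0, 1])  # purchase probability used for ranking
```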
We will explore how our system uses a graph-based approach to store transactions and enhance fraud controls with advanced features, boosting the effectiveness of both ML models and static rules. We will present the key components of the system, including a real-time feature computation service optimized for low latency, a visualization tool for network analysis, and a mechanism for historical feature reconstruction.
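A minimal, hypothetical sketch of deriving a graph feature from transactions stored as an account-device graph; the schema and feature are assumptions, not the production system's design.

```python
# Illustrative graph feature for fraud controls: transactions stored as edges
# between accounts and devices, with a simple network feature derived from them.
# The schema is an assumption, not the production design.
import networkx as nx

G = nx.Graph()
transactions = [
    ("acct_1", "device_X"),
    ("acct_2", "device_X"),  # two accounts sharing one device
    ("acct_3", "device_Y"),
]
G.add_edges_from(transactions)


def device_sharing_degree(account: str) -> int:
    """Number of other accounts reachable through shared devices."""
    linked = {
        neighbor_acct
        for device in G.neighbors(account)
        for neighbor_acct in G.neighbors(device)
        if neighbor_acct != account
    }
    return len(linked)


print(device_sharing_degree("acct_1"))  # feeds an ML feature or a static rule
```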
The evolution of AI has seen a remarkable transition from standalone language models (LLMs) to compound systems integrating diverse functionalities, culminating in the rise of agentic AI. This talk traces the journey of AI systems, exploring how agentic AI enables autonomous reasoning, planning, and action, making it a pivotal development in solving complex, dynamic problems.
We will dive into the principles of agentic AI, discussing how it works and why it is essential for creating adaptive, task-oriented solutions. The session will then introduce **CrewAI**, an open-source Python package that simplifies the development of intelligent agents. Through a practical use case, participants will learn how to implement their first agent with CrewAI, gaining hands-on insights into leveraging this powerful tool to unlock new possibilities in AI-driven applications.
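For orientation, here is a minimal first agent based on CrewAI's documented interface; exact parameters can differ between versions, and an LLM API key is assumed to be configured in the environment.

```python
# Minimal first agent with CrewAI, based on its documented interface; exact
# parameters may vary by version, and an LLM API key is assumed to be set
# in the environment.
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Research analyst",
    goal="Summarize recent developments in agentic AI",
    backstory="You turn scattered sources into short, factual briefings.",
)

briefing = Task(
    description="Write a five-bullet summary of what agentic AI is and why it matters.",
    expected_output="Five concise bullet points.",
    agent=researcher,
)

crew = Crew(agents=[researcher], tasks=[briefing])
print(crew.kickoff())  # runs the agent and returns the task output
```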
Over the past 8 years at Uber, I have worked on different aspects of building and scaling Uber's Data Infrastructure (batch and realtime) and gotten to look at microservices and storage systems security in the past 2 years.
This talk covers enabling security across Uber's entire data infrastructure - much of this work was done over the course of my tenure at Uber. Internally, we rely heavily on open source systems for data infrastructure and security (SPIRE and Kerberos are widely adopted).
Motivations for security efforts at Uber include protecting data from cyberattacks, insider threats and staying up to date with compliance requirements.
I'll delve into our journey at Elisa Polystar in deploying large-scale big data platforms for leading telecom operators like Vodafone, T-Mobile, and Telia. I'll discuss the unique challenges we faced in handling massive volumes of telecom data and achieving real-time analytics. We'll explore the technical architectures we implemented, the tools and technologies we leveraged (such as Hadoop, Spark, and Kafka), and how we customized solutions to meet the specific needs of different markets. The goal is to share practical insights, lessons learned, and best practices that can help others in the industry successfully implement and scale big data solutions.
Dive into the practical and strategic considerations when choosing between these two approaches for creating effective AI agents. Prompt engineering has risen as a fast, adaptable, and low-cost way to harness the capabilities of LLMs. However, its performance often correlates directly with the size of the model - larger, more costly models are required to achieve the desired results. This trade-off raises questions about scalability and cost-efficiency, especially for organisations with resource constraints.
On the other hand, fine-tuning offers a path to tailor models for domain-specific tasks or nuanced interactions, delivering consistent performance even with smaller models. While it demands more resources upfront, fine-tuned solutions can lead to significant long-term savings by reducing reliance on oversized models:
The purpose of this presentation is to demonstrate the possibilities of Generative AI in the context of business intelligence. I'll focus on Copilot, AI Skills in MS Fabric, and AI/BI Genie in Databricks. The presentation will contain a description of each tool, a comparison, and a demo at the end. I'll present how to prepare the data model and data to make them available for Gen AI. I'll demonstrate how these tools can support data democratization in an organization.
I'm also considering including Conversation Analytics from GCP.
Parallel roundtable discussions are the part of the conference that engages all participants. They serve a few purposes. First of all, participants have the opportunity to exchange opinions and experiences about a specific issue that is important to that group. Secondly, participants can meet and talk with the leader/host of the roundtable discussion – selected professionals with vast knowledge and experience.
There will be one roundtable session, so every conference participant can take part in one discussion.
This roundtable will explore the pros and cons of using managed APIs (such as OpenAI GPT, Anthropic Claude, or AWS Bedrock) versus self-hosted open LLMs (e.g., Bielik or Llama). We’ll delve into the trade-offs between these two approaches, covering critical factors such as cost, scalability, performance, and control over data. Key discussion points include:
This session is designed for technical experts and decision-makers to exchange experiences, insights, and predictions about the evolving landscape of LLMs. Whether you’re actively using one of these approaches or still considering your options, this presentation/discussion will offer valuable perspectives to help you make informed decisions.
This presentation is about our IQVIA Data Transformation & Engineering department's approach to configuring Snowflake, specifically but not limited to the areas of security (network policies / storage policies / password policies / single-sign-on configuration) and database management (new databases / warehouses / roles & grants).
In many cases, companies tend to hire admins who manage these ad hoc, with the best backup being a notepad audit trail of what they ran. Configuration in such cases differs per user, and inconsistencies pile up. Some are smarter and implement Git-based solutions such as Terraform. Tools like Terraform typically have a Snowflake plugin to manage all of this, but they lack templating, are always behind the latest Snowflake SQL extensions, and do not really address self-service (ad-hoc or UI-based) management needs.
In IQVIA DTE we came to a fairly good compromise between various security and auditing needs, leaving space for automation, enforcement, and self-service where appropriate, yet ending up with a very neat and simple solution which I would like to present.
The goal is pretty much as described in the official/public description of the roundtable, i.e.:
GenAI has been with us for a while - long enough for a number of actual systems to be deployed into production. Moving beyond the initial hype, this roundtable will tackle the real challenges and best practices of running GenAI systems in the real world, related to:
We invite engineering and business leaders, architects, and AI practitioners who have already deployed GenAI systems to production, as well as those who have yet to turn their proofs of concept into serious deployments and would like to learn how to do so - but knowledge of the basics of such systems and the aspects around them is required.
Essentially, we would like to gather a group of really competent people for discussion, exchange of knowledge, and establishing best practices for GenAI systems.
Join us for a dynamic discussion that begins with insights from a GDPR implementation journey and then expands to broader data governance topics. We will see how privacy compliance can inform and enhance general data management practices: going beyond legal obligations to create an opportunity for upgrading and optimizing data infrastructure and operations, leading to better overall data governance.
Key discussion points will include:
This session welcomes privacy officers, data engineers, architects, and technology leaders at any stage of their GDPR or data governance implementation journey. Come ready to share your experiences, challenges, and successes in balancing compliance needs with operational efficiency. Compliance is not only about avoiding fines; it can also be an opportunity for spreading good design and best practices.
This is a discussion about the transformation from traditional data engineering to a data platform-oriented approach. As organizations scale their data operations to serve hundreds or thousands of users, the conventional centralized data engineering model faces significant challenges. In this roundtable, we'll explore and share experiences about critical shifts in both mindset and technology that enable this evolution.
Key discussion points will include:
This session is ideal for data leaders, engineers, and practitioners who are facing or have overcome similar challenges in their organizations. At the end of this roundtable, you'll hopefully have an idea of how to start a data platform team in your organization and get inspired by the work done by your peers.
The exchange of sensitive data across systems, both within and between organizations, is often constrained by legal documents, such as privacy policies. Verifying the legitimacy of data access requests can be labor-intensive, leading to delays or, worse, undocumented and unauthorized access. As organizations adopt data mesh and face increasing complexity in data protection requirements, the intersection of GenAI and data governance presents exciting opportunities and challenges.
Key discussion points will include:
This session welcomes data governance professionals, architects, legal experts, and technology leaders to share their perspectives on automating data governance and data quality processes, or building data products. This roundtable provides an opportunity to network with peers and discuss emerging patterns at the intersection of AI, data mesh, and governance. Together, we'll explore how these technologies can create more efficient and reliable data governance processes while ensuring compliance with legal and organizational requirements.
Join us for a discussion about designing and implementing real-time analytics platforms and solutions, with a focus on processing user clickstream and interaction data at scale.
Key discussion points will include:
This session is ideal for data engineers, architects, and technical leaders who are either currently working with real-time analytics or planning to implement such systems. Whether you're dealing with clickstream data, financial transactions, IoT sensor data, or other streaming use cases, you should find this discussion valuable.
Join our roundtable on MLOps pipelines, where we talk about the practical challenges of building and managing ML workflows. We'll discuss issues like handling complex dependencies and selecting effective tools, sharing examples from real projects along the way. This session is for engineers looking to exchange straightforward, no-nonsense insights on optimizing ML pipelines.
On 9 April at 19:00, we invite all participants of the DATA & AI WARSAW TECH SUMMIT 2025 conference to an evening meeting, which will be an opportunity to get to know each other, exchange experiences, and hold business talks.
This presentation delves into creating a real-time analytics platform by leveraging cost-effective Change Data Capture (CDC) tools like Debezium for seamless data ingestion from sources such as Oracle into Kafka. We’ll explore how to build a resilient data lake and data mesh architecture using Apache Flink, ensuring data loss prevention, point-in-time recovery, and robust schema evolution to support agile data integration. Participants will learn best practices for establishing a scalable, real-time data pipeline that balances performance, reliability, and flexibility, enabling efficient analytics and decision-making.
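As an illustration of the ingestion step, here is how a Debezium Oracle source connector might be registered through the Kafka Connect REST API; hostnames, credentials, and the exact property set are assumptions and vary by Debezium version and Oracle setup.

```python
# Illustrative registration of a Debezium Oracle source connector via the Kafka
# Connect REST API; hostnames, credentials, and the exact set of required
# properties are assumptions (they vary by Debezium version and Oracle setup).
import requests

connector_config = {
    "name": "oracle-orders-cdc",
    "config": {
        "connector.class": "io.debezium.connector.oracle.OracleConnector",
        "database.hostname": "oracle.internal",
        "database.port": "1521",
        "database.user": "cdc_user",
        "database.password": "********",
        "database.dbname": "ORCLCDB",
        "topic.prefix": "orders",            # change streams land in Kafka topics with this prefix
        "table.include.list": "SALES.ORDERS",
    },
}

resp = requests.post("http://kafka-connect:8083/connectors", json=connector_config)
print(resp.status_code)  # 201 when the connector is created
```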
I want to show you the principles behind the most popular DBMS, covering non-trivial use cases like self-contained dynamic reports, WebAssembly support, and pure serverless DB hosting. While SQLite works well with aggregated datasets, DuckDB, its younger cousin, focuses on full OLAP support, allowing processing of gigabytes of data in no time on low-end boxes (or even laptops). We will browse various useful features and interfaces in DuckDB, emphasizing scenarios that anyone can implement in their daily work.
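A small taste of DuckDB's appeal: querying a Parquet file directly with SQL, no server required (the file name and columns are placeholders for the demo datasets).

```python
# Small DuckDB example: querying a Parquet file in place with SQL, no server
# required. The file name and columns are placeholders.
import duckdb

result = duckdb.sql(
    """
    SELECT country, COUNT(*) AS orders, SUM(total) AS revenue
    FROM 'orders.parquet'          -- DuckDB reads Parquet/CSV files directly
    GROUP BY country
    ORDER BY revenue DESC
    LIMIT 5
    """
)
result.show()  # runs the aggregation locally, even on a laptop
```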
In this workshop, we’ll introduce the key components of a multitier architecture designed to scale and streamline LLM productization at Team Internet—a global leader in online presence and advertising, serving millions of customers worldwide. For us, scalability and speed are critical to delivering high-performance services, including LLM applications.
Through hands-on coding exercises and real-world use cases from the domain name industry, we'll demonstrate how standardization enhances flexibility and accelerates development.
By the end of the session, you’ll know the key building blocks that will help you efficiently build and scale LLM applications in production.