At King, we ingest and analyze around a trillion game events weekly to enhance gameplay and deliver better player experiences to over two hundred million monthly active users. These game events enable us to improve our games through game observability, feature delivery, and machine learning.
However, collecting and storing these game events brings a variety of challenges due to their sheer volume and velocity.
To address these challenges, we have developed an in-house product called Kingestor, which allows us to ingest our game events effectively. It processes game events in near real time with just a ten-minute latency, loading approximately five million events per second. Kingestor ensures data integrity through event reconciliation and deduplication, providing accurate, real-time insights for both business and technical applications. It is a scalable, adaptable product designed for use across the gaming industry, making it easy to adopt for businesses handling large-scale data.
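As a loose illustration of the deduplication idea (not Kingestor's actual internals, which the talk will cover), a minimal sketch of keyed, bounded-window deduplication might look like this:

```python
from collections import OrderedDict

def deduplicate(events, window_size=1_000_000):
    """Drop events whose event_id was already seen within a bounded window.

    `events` is an iterable of dicts with an `event_id` field; the window
    size bounds memory at high throughput. Illustrative only.
    """
    seen = OrderedDict()
    for event in events:
        key = event["event_id"]
        if key in seen:
            continue                      # duplicate delivery, skip it
        seen[key] = True
        if len(seen) > window_size:
            seen.popitem(last=False)      # evict the oldest key
        yield event
```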
Methods for managing data and its authorized usage on Google Cloud Platform, including access control through data governance and enforced security measures.
Best practices like naming conventions, proper data asset descriptions, or ownership tagging are a must-have to ensure proper governance across your data landscape. Yet they often require manual effort and a trained, knowledgeable user base to put in place, which unfortunately leads to those practices not being followed in the majority of cases. The only way to ensure those rules are followed to the letter is by baking them in as defaults into the user experience of your data platform. HelloFresh is the world's leading meal kit company and a global integrated food solutions group, shipping over a billion meals to customers per year. This talk will present HelloFresh's in-house, low-code, config-driven data engineering framework that was built to offer data governance and best practices out of the box. You will learn about the architecture around its open-source components, and get a demonstration of the user experience designed to enable even less technical data practitioners.
Our presentation will cover the development and adoption of a private AI chat platform within our company. We will discuss the technical architecture, including the integration of multiple large language models (LLMs) and chat engines from various providers. A significant part of the presentation will focus on our user adoption strategy, which includes conducting workshops and identifying internal advocates to promote the technology. We will also highlight how we use this platform as a unified interface for testing new AI experiments, which streamlines development and user feedback. Additionally, we will share our experience in creating separate Python packages for chat engines to ensure scalability and reusability. The aim is to provide a comprehensive view of both the technical and human aspects of implementing AI solutions in a corporate environment.
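As a hypothetical sketch of the kind of abstraction that packaging chat engines separately encourages (the exact interface in our platform may differ), each engine can implement a shared protocol so the chat front end stays provider-agnostic:

```python
from typing import Protocol


class ChatEngine(Protocol):
    """Minimal provider-agnostic chat interface (illustrative)."""

    def complete(self, messages: list[dict[str, str]]) -> str:
        """Return the assistant reply for a list of {role, content} messages."""
        ...


class EchoEngine:
    """Trivial engine used here only to show the contract."""

    def complete(self, messages: list[dict[str, str]]) -> str:
        return f"echo: {messages[-1]['content']}"


def chat(engine: ChatEngine, user_input: str) -> str:
    return engine.complete([{"role": "user", "content": user_input}])


print(chat(EchoEngine(), "hello"))
```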
What makes a great meme? Is it the template? The reference to recent events? Or perhaps sheer luck? Using an image embedding pipeline with a fine-tuned Vision Transformer model from Google, we explore the memesphere (yes, it's a word) of Reddit and its most popular meme subreddit: r/memes. We brew a recipe for the best memes by analyzing upvote and comment statistics. We determine the most similar memes in terms of content and graphics to establish relations and form clusters segregated by meme template. Finally, we answer the world-shaking question: what was the best meme of last year?
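A minimal sketch of this kind of embedding-and-clustering pipeline, assuming the publicly available `google/vit-base-patch16-224-in21k` checkpoint from Hugging Face (the exact model and parameters used in the talk may differ):

```python
import torch
from PIL import Image
from sklearn.cluster import KMeans
from transformers import ViTImageProcessor, ViTModel

processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
model = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")

def embed(paths):
    """Return one CLS-token embedding per meme image."""
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[:, 0].numpy()   # CLS token per image

# Cluster memes by visual similarity, e.g. to group them by template.
embeddings = embed(["meme1.png", "meme2.png", "meme3.png"])
labels = KMeans(n_clusters=2, n_init="auto").fit_predict(embeddings)
print(labels)
```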
This talk explores the future of Artificial Intelligence in software testing and its transformative impact within the telecommunications industry. As AI continues to evolve, organizations are leveraging intelligent solutions to optimize testing processes, enhance product quality, and reduce time-to-market. Drawing on real-world use cases from Ericsson AB, this presentation will dive into how AI is revolutionizing testing methodologies, addressing challenges in AI deployment, and setting the stage for the next leap in testing innovation. Attendees will gain actionable insights into integrating AI into testing pipelines, handling the complexities of large-scale deployments, and overcoming the challenges that come with AI adoption in the software testing space.
We want to showcase that successful implementation of AI solutions (open-weights semantic search for customer care) doesn’t have to be costly and can deliver solid ROI - contrary to the growing sentiment in the media as we seem to be entering the trough of disillusionment (in Gartner’s hype cycle terminology). In our internal tech talks, our peers were particularly interested in the thinking behind certain decisions we made. For instance, they wanted to know how we evaluated the most suitable model from the MTEB leaderboard, how we organized the embeddings to fit our knowledge base, why we chose one vector store over another, and more. Interestingly, this implementation and heavy internal promotion among non-tech folks spurred an avalanche of ideas from other departments. We want to share the experience and believe there’s no better place to do this than the tech-heavy BigData Technology Warsaw Summit.
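As a rough sketch of an open-weights semantic-search setup of this kind (the model and in-memory "vector store" below are illustrative placeholders, not necessarily the choices we made):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Any open-weights embedding model from the MTEB leaderboard could go here.
model = SentenceTransformer("BAAI/bge-small-en-v1.5")

knowledge_base = [
    "How do I reset my password?",
    "Refunds are processed within 14 days.",
    "Contact support via the in-app chat.",
]
doc_vectors = model.encode(knowledge_base, normalize_embeddings=True)

def search(query: str, top_k: int = 2):
    """Return the most similar knowledge-base entries by cosine similarity."""
    q = model.encode([query], normalize_embeddings=True)
    scores = np.dot(doc_vectors, q[0])            # cosine on normalized vectors
    best = np.argsort(scores)[::-1][:top_k]
    return [(knowledge_base[i], float(scores[i])) for i in best]

print(search("I want my money back"))
```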
As organizations increasingly rely on data to drive decision-making, the ability to quickly access and analyze data has become essential. However, SQL querying remains a technical hurdle for many, slowing down workflows and limiting data access to specialized team members. This session will demonstrate how LLM-based Text-to-SQL tools can close this gap, enabling non-technical team members—such as product managers and business analysts—to generate SQL queries using natural language. This talk will explore how to implement Text-to-SQL solutions for databases, discuss strategies to improve query accuracy through prompt engineering techniques, and optimize results by incorporating database structure representations. Attendees will walk away with actionable insights on how to empower their teams, streamline query development, and increase overall productivity.
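A minimal sketch of the prompt-engineering pattern the session discusses, assuming the OpenAI Python client and a hypothetical `orders` table (the session itself covers richer schema representations):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Giving the model an explicit schema representation is one of the simplest
# ways to improve Text-to-SQL accuracy.
schema = """
CREATE TABLE orders (
    order_id INT,
    customer_id INT,
    amount DECIMAL(10, 2),
    created_at DATE
);
"""

def text_to_sql(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Translate the question into a single SQL query "
                        f"for this schema. Return only SQL.\n{schema}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(text_to_sql("Total order amount per customer in 2024"))
```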
Recommendation systems are an essential part of most e-commerce industries, often responsible for a significant portion of revenue. However, every branch of this industry has its own set of exceptions and challenges that affect how recommender systems have to be designed. In airlines, these exceptions become extreme, as returning visitors become sparse, many purchases are anonymous, and items, such as flight tickets, can be sold at different prices depending on the circumstances. To overcome these challenges, we propose a simple method that utilizes information collected about users and items, eliminating the need to extract user/item embeddings with matrix factorization. Additionally, we will talk about how we used a Feature Store as a foundation for this project and why it could be beneficial to implement one in your Data Science team as well.
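As a hedged sketch of the general idea of feature-based scoring without matrix factorization (the feature names below are invented for illustration, not our production model):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Each row concatenates user-context features with candidate-item features,
# e.g. [days_to_departure, is_anonymous, route_popularity, price_bucket].
X_train = np.array([
    [30, 1, 0.8, 2],
    [ 2, 0, 0.3, 4],
    [14, 1, 0.9, 1],
    [ 7, 0, 0.5, 3],
])
y_train = np.array([1, 0, 1, 0])          # 1 = the ticket was purchased

model = GradientBoostingClassifier().fit(X_train, y_train)

# At serving time, score every candidate flight for the current session
# and rank by predicted purchase probability.
candidates = np.array([[21, 1, 0.7, 2], [1, 0, 0.2, 4]])
print(model.predict_proba(candidates)[:, 1])
```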
We will explore how our system uses a graph-based approach to store transactions and enhance fraud controls with advanced features, boosting the effectiveness of both ML models and static rules. We will present the key components of the system, including a real-time feature computation service optimized for low latency, a visualization tool for network analysis, and a mechanism for historical feature reconstruction.
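To make the graph-based idea concrete, here is an illustrative sketch (using networkx and invented entities, not our production service) of deriving a simple fraud feature from a transaction graph:

```python
import networkx as nx

# Accounts connected to the devices/cards they used in transactions.
G = nx.Graph()
G.add_edge("account_A", "device_1")
G.add_edge("account_B", "device_1")   # shares a device with account_A
G.add_edge("account_B", "card_9")
G.add_edge("account_C", "card_9")     # shares a card with account_B

def shared_entity_count(account: str) -> int:
    """Number of other accounts reachable within two hops via shared devices/cards."""
    neighbours = {
        other
        for entity in G.neighbors(account)
        for other in G.neighbors(entity)
        if other != account
    }
    return len(neighbours)

# Such counts can feed both ML models and static rules.
print(shared_entity_count("account_B"))   # -> 2
```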
The evolution of AI has seen a remarkable transition from standalone large language models (LLMs) to compound systems integrating diverse functionalities, culminating in the rise of agentic AI. This talk traces the journey of AI systems, exploring how agentic AI enables autonomous reasoning, planning, and action, making it a pivotal development in solving complex, dynamic problems.
We will dive into the principles of agentic AI, discussing how it works and why it is essential for creating adaptive, task-oriented solutions. The session will then introduce **CrewAI**, an open-source Python package that simplifies the development of intelligent agents. Through a practical use case, participants will learn how to implement their first agent with CrewAI, gaining hands-on insights into leveraging this powerful tool to unlock new possibilities in AI-driven applications.
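For orientation before the session, a first CrewAI agent might look like the following minimal sketch (the role and task are made up for illustration; the session walks through a full use case):

```python
from crewai import Agent, Task, Crew

# A single agent with a narrow role; CrewAI wires it to an LLM under the hood.
researcher = Agent(
    role="Research analyst",
    goal="Summarize recent developments in agentic AI",
    backstory="You condense technical topics into short briefings.",
)

briefing = Task(
    description="Write a five-bullet briefing on agentic AI for engineers.",
    expected_output="Five concise bullet points.",
    agent=researcher,
)

crew = Crew(agents=[researcher], tasks=[briefing])
result = crew.kickoff()   # runs the task and returns the agent's output
print(result)
```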
Fast Deliveries, a freight forwarding company, faced a challenge when several of their forwarders violated non-competition agreements by working for their competitor, WHILE (names anonymized), and transferring clients during this period. The investigation centered on analyzing data from Trans.eu - a major European logistics platform where freight forwarders post and accept transportation orders, essentially serving as a digital marketplace crucial for day-to-day logistics operations.
Our team was tasked with developing a methodology to identify these fake accounts and connect them to specific former employees. Combining network analysis, geographic patterns, and temporal data allowed us to identify suspicious accounts with high confidence.
Attendees will learn about the specific investigation and gain insights into analytical techniques they can apply to their own data challenges. We will demonstrate our methodology using anonymized data from the actual investigation.
Dive into the practical and strategic considerations when choosing between prompt engineering and fine-tuning for creating effective AI agents. Prompt engineering has risen as a fast, adaptable, and low-cost way to harness the capabilities of LLMs. However, its performance often correlates directly with the size of the model - larger, more costly models are required to achieve the desired results. This trade-off raises questions about scalability and cost-efficiency, especially for organisations with resource constraints.
On the other hand, fine-tuning offers a path to tailor models for domain-specific tasks or nuanced interactions, delivering consistent performance even with smaller models. While it demands more resources upfront, fine-tuned solutions can lead to significant long-term savings by reducing reliance on oversized models.
The purpose of this presentation is to demonstrate the possibilities of Generative AI in the context of business intelligence. I'll focus on Copilot, AI Skills in MS Fabric, and AI/BI Genie in Databricks. The presentation will contain a description of each tool, a comparison, and a demo at the end. I'll present how to prepare a data model and data to make them available for Gen AI. I'll demonstrate how these tools can support data democratization in an organization.
I'm also considering including Conversation Analytics from GCP.
Parallel roundtable discussions are the part of the conference that engages all participants. They serve a few purposes. First of all, participants have the opportunity to exchange their opinions and experiences about a specific issue that is important to that group. Secondly, participants can meet and talk with the leader/host of the roundtable discussion – these are selected professionals with vast knowledge and experience.
There will be one roundtable session, so every conference participant can take part in one discussion.
This roundtable will explore the pros and cons of using managed APIs (such as OpenAI GPT, Anthropic Claude, or AWS Bedrock) versus self-hosted open LLMs (e.g., Bielik or Llama). We’ll delve into the trade-offs between these two approaches, covering critical factors such as cost, scalability, performance, and control over data. Key discussion points include:
This session is designed for technical experts and decision-makers to exchange experiences, insights, and predictions about the evolving landscape of LLMs. Whether you’re actively using one of these approaches or still considering your options, this presentation/discussion will offer valuable perspectives to help you make informed decisions.
This presentation is about our IQVIA Data Transformation & Engineering department's approach to configuring Snowflake, specifically but not limited to the areas of security (network policies / storage policies / password policies / single sign-on configuration) and database management (new databases / warehouses / roles & grants).
In many cases, companies tend to hire admins who manage these ad hoc, with the best backup being a notepad audit trail of what they run. Configuration in such cases differs per user and inconsistencies pile up. Some are smarter and implement Git-based solutions, like Terraform. Tools like Terraform typically have a Snowflake plugin to manage all of this, but they lack templating, are always behind the latest Snowflake SQL extensions, and do not really address self-service (ad-hoc or UI-based) management needs.
At IQVIA DTE we arrived at a fairly good compromise between various security and auditing needs, leaving space for automation, enforcement, and self-service where appropriate, while ending up with a very neat and simple solution which I would like to present.
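As a purely illustrative sketch (not the actual IQVIA solution) of what config-driven, templated Snowflake management can look like, grants can be generated from a small declarative config instead of hand-run statements:

```python
# Hypothetical declarative config; in practice this would live in Git.
config = {
    "ANALYTICS_DB": {"read": ["ANALYST_ROLE"], "write": ["ETL_ROLE"]},
    "RAW_DB":       {"read": ["ETL_ROLE"],     "write": ["LOADER_ROLE"]},
}

def render_grants(cfg: dict) -> list[str]:
    """Expand the config into deterministic, reviewable GRANT statements."""
    statements = []
    for database, access in cfg.items():
        for role in access.get("read", []):
            statements.append(f"GRANT USAGE ON DATABASE {database} TO ROLE {role};")
        for role in access.get("write", []):
            statements.append(
                f"GRANT USAGE, CREATE SCHEMA ON DATABASE {database} TO ROLE {role};"
            )
    return statements

for stmt in render_grants(config):
    print(stmt)   # these could then be executed via the Snowflake Python connector
```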
The goal is pretty much as described in the official/public description of the roundtable, i.e.:
GenAI has been with us for a while - long enough for a number of actual systems to be deployed into production. Moving beyond the initial hype, this roundtable will tackle the real challenges and best practices of running GenAI systems in the real world, related to:
We invite engineering and business leaders, architects, and AI practitioners who have already deployed GenAI systems to production, as well as those who have yet to turn their proofs of concept into serious deployments and would like to learn how to do so - but knowledge of the basics of such systems and the aspects around them is required.
Essentially, we would like to gather a group of really competent people for discussion, exchange of knowledge, and establishing best practices for GenAI systems.
From 2022 to 2024 I worked on developing the technical pipelines needed to implement GDPR compliance for a large insurance group. This project has two aspects that I would like to share: the first is technical, namely the mode of encryption we used to mask data while maintaining its usefulness and referential integrity (format-preserving encryption). The second aspect is non-technical and relates to the value of implementing GDPR: besides compliance and risk management, implementing GDPR has been essential to standardising the process of data sharing from prod to dev environments and improving data observability and documentation. This has eliminated many inconsistencies and much shadow work related to data available in the dev environment. Overall, the message is that compliance is not only about avoiding fines, but can also be an opportunity for spreading good design and best practices.
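For readers unfamiliar with format-preserving encryption, a small illustration of the concept (using the open-source pyffx library as a stand-in; the project's actual cipher, keys, and tooling differ):

```python
import pyffx

# Format-preserving encryption: the ciphertext keeps the shape of the input,
# so masked data still fits existing schemas and referential joins.
key = b"demo-secret-key"

account_number = pyffx.Integer(key, length=8)
masked = account_number.encrypt(12345678)
print(masked)                                  # still an 8-digit integer
print(account_number.decrypt(masked))          # -> 12345678

name = pyffx.String(key, alphabet="abcdefghijklmnopqrstuvwxyz", length=6)
print(name.encrypt("bailey"))                  # still six lowercase letters
```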
In 2023, the Data Engineering team @ SumUp adopted a Platform mindset. The plan was great, but the journey wasn't as smooth sailing as it should have been.
Data users lost trust in our team and started building workarounds.
In modern data-sharing paradigms like data mesh, it is highly desirable to automate the verification of data access requests. This is typically achieved by formalizing constraints into data contracts using domain-specific languages. However, these constraints originate from legal documents or internal policies. Translating these documents into formal data contracts is a tedious and error-prone process that must be repeated whenever the source documents are updated.
To address this challenge, we have implemented a request checker leveraging large language models as part of an open-source data governance platform (https://github.com/datacontract-manager). Our system evaluates requests based on the relevant policies, the type of data requested, and the context of the request. This way, our platform is able to correctly detect potential data protection violations and to provide correct explanations for rejections.
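A very rough sketch of the pattern (hypothetical prompt and policy text; the actual platform in the linked repository is considerably richer):

```python
from openai import OpenAI

client = OpenAI()  # any LLM backend could be substituted here

policy = "Customer e-mail addresses may only be shared with the CRM team."

def check_request(requested_data: str, requester_team: str, purpose: str) -> str:
    """Ask the model whether a data access request complies with the policy."""
    prompt = (
        f"Policy: {policy}\n"
        f"Request: team '{requester_team}' asks for '{requested_data}' "
        f"in order to '{purpose}'.\n"
        "Answer APPROVE or REJECT and explain briefly."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(check_request("customer e-mail addresses", "marketing", "newsletter campaign"))
```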
I would like to share my experience with the architecture I built to implement a clickstream solution. Clickstream is a data analytics platform designed to track, collect, analyze, and interpret the sequence of clicks or interactions that users make on a website or app. My team built this solution on AWS using the following technologies: gRPC, Kubernetes, Flink, Airflow, Redis, Redshift, Apache Kafka, etc. All of this helped us achieve good results and make the solution possible. Let's dive deep and see how this solution works and what pros and cons we found.
I've been in the trenches dealing with messy ML pipelines that consume more time than they should. Through hands-on experience, I found practical ways to simplify pipeline orchestration using Flyte. In this session, I will give you a crash course in building ML pipelines - how and where to start and how to scale it up later, while dealing with all the nasty problems that you will encounter on the road.
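As a taste of where the session starts, a minimal Flyte pipeline is just typed Python tasks composed into a workflow (the names below are illustrative):

```python
from flytekit import task, workflow


@task
def load_data(rows: int) -> list[float]:
    # Stand-in for reading features from your storage layer.
    return [float(i) for i in range(rows)]


@task
def train_model(data: list[float]) -> float:
    # Stand-in for real training; returns a fake metric.
    return sum(data) / len(data)


@workflow
def training_pipeline(rows: int = 100) -> float:
    data = load_data(rows=rows)
    return train_model(data=data)


if __name__ == "__main__":
    # Workflows run locally during development and scale out on a Flyte cluster later.
    print(training_pipeline(rows=10))
```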
This presentation delves into creating a real-time analytics platform by leveraging cost-effective Change Data Capture (CDC) tools like Debezium for seamless data ingestion from sources such as Oracle into Kafka. We’ll explore how to build a resilient data lake and data mesh architecture using Apache Flink, ensuring data loss prevention, point-in-time recovery, and robust schema evolution to support agile data integration. Participants will learn best practices for establishing a scalable, real-time data pipeline that balances performance, reliability, and flexibility, enabling efficient analytics and decision-making.
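To ground the CDC discussion, here is a sketch of what consuming Debezium change events from Kafka can look like on the consumer side (topic name and connection details are hypothetical; the talk itself focuses on Flink):

```python
import json
from kafka import KafkaConsumer

# Debezium publishes one topic per captured table, e.g. server.schema.table.
consumer = KafkaConsumer(
    "oracle.inventory.orders",
    bootstrap_servers=["localhost:9092"],
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    envelope = message.value["payload"]        # Debezium change-event envelope
    operation = envelope["op"]                 # c = insert, u = update, d = delete
    before, after = envelope["before"], envelope["after"]
    print(operation, before, after)            # hand off to the lake / Flink job
```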