LinkedIn aims to create the world’s most trusted professional community, empowering individuals and organizations to achieve success and realize economic opportunity. A key pillar of this vision is ensuring the authenticity of members on the platform, such as through robust identity verification. Equally important is fostering a safe, professional, and trustworthy environment for member interactions. This includes ensuring that content in members’ feeds is appropriate, aligned with their interests, and consistent with our community policies, while respecting member controls.
Generative AI is revolutionizing the state of the art in content understanding, complementing and enhancing both traditional AI models and human labeling efforts. This innovation significantly improves the quality and scalability of decision-making processes. This talk will provide an overview of LinkedIn’s trust and safety features, with a special focus on our application of Generative AI to advance these initiatives.
Context engineering is the missing link in realizing the full potential of your AI strategy. This presentation explores how advanced techniques in context management can deliver enterprise value through effective use of Retrieval-Augmented Generation (RAG). Discover how to unlock the value of your data catalog investments and drive more intelligent, context-aware AI applications that transform data conversations.
At King, we ingest and analyze around a trillion game events weekly to enhance gameplay and deliver player experiences to over two hundred million monthly active users. These game events enable us to improve our games through game observability, feature delivery, and machine learning.
However, collecting and storing these game events presents a variety of challenges due to their sheer volume and velocity.
To address these challenges, we developed an in-house product called Kingestor, which ingests our game events efficiently. It processes game events in near real-time, with roughly ten minutes of latency, loading approximately five million events per second. Kingestor ensures data integrity through event reconciliation and deduplication, providing accurate, timely insights for both business and technical applications. It is a scalable, adaptable product designed for use across the gaming industry, making it easy to adopt for businesses handling large-scale data.
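Kingestor's internals are not public; as a hedged illustration of the deduplication idea mentioned above, the following Python sketch drops events whose ID has already been seen within a fixed time window:

```python
import time


class WindowedDeduplicator:
    """Drops events whose ID was already seen within a fixed time window.

    A minimal sketch of the deduplication idea only; Kingestor's actual
    implementation is distributed and not public.
    """

    def __init__(self, window_seconds: int = 600):  # ~10-minute window
        self.window_seconds = window_seconds
        self._seen: dict[str, float] = {}  # event_id -> first-seen timestamp

    def accept(self, event_id: str, now: float | None = None) -> bool:
        now = time.time() if now is None else now
        # Evict entries that fell out of the window (linear sweep is
        # fine for a sketch; a real system would use a smarter structure).
        cutoff = now - self.window_seconds
        self._seen = {eid: ts for eid, ts in self._seen.items() if ts >= cutoff}
        if event_id in self._seen:
            return False  # duplicate within the window
        self._seen[event_id] = now
        return True
```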
This session covers methods to manage data and its authorized usage in Google Cloud Platform, including access control through data governance and enforced security measures.
Best practices like naming conventions, proper data asset descriptions, and ownership tagging are a must-have for proper governance across your data landscape. Yet they often require manual effort and a trained, knowledgeable user base to put in place, which unfortunately means they are not followed in the majority of cases. The only way to ensure those rules are followed to the letter is to bake them in as defaults in the user experience of your data platform. HelloFresh is the world's leading meal kit company and a global integrated food solutions group, shipping over a billion meals to customers per year. This talk will present HelloFresh's in-house, low-code, config-driven data engineering framework, built to offer data governance and best practices out of the box. You will learn about the architecture around its open-source components and see a demonstration of the user experience designed to enable even less technical data practitioners.
Our presentation will cover the development and adoption of a private AI chat platform within our company. We will discuss the technical architecture, including the integration of multiple large language models (LLMs) and chat engines from various providers. A significant part of the presentation will focus on our user adoption strategy, which includes conducting workshops and identifying internal advocates to promote the technology. We will also highlight how we use this platform as a unified interface for testing new AI experiments, which streamlines development and user feedback. Additionally, we will share our experience in creating separate Python packages for chat engines to ensure scalability and reusability. The aim is to provide a comprehensive view of both the technical and human aspects of implementing AI solutions in a corporate environment.
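As a hedged sketch of the "separate Python packages for chat engines" idea, the interface below is hypothetical (the presenters' actual package API is not public); it shows how a provider-agnostic base class keeps engines swappable:

```python
from abc import ABC, abstractmethod


class ChatEngine(ABC):
    """Hypothetical provider-agnostic interface; names are illustrative,
    not the presenters' actual package API."""

    @abstractmethod
    def complete(self, messages: list[dict[str, str]]) -> str:
        """Return the assistant reply for a list of {'role', 'content'} dicts."""


class EchoEngine(ChatEngine):
    """Trivial engine standing in for a real provider adapter."""

    def complete(self, messages: list[dict[str, str]]) -> str:
        return f"echo: {messages[-1]['content']}"
```

Each real provider (OpenAI, Anthropic, a self-hosted model, etc.) would live in its own package implementing this interface, so the chat platform can switch or A/B-test engines without touching application code.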
What makes a great meme? Is it the template? The reference to recent events? Or perhaps sheer luck? Using an image embedding pipeline built on Google's refined Vision Transformer model, we explore the memesphere (yes, it's a word) of Reddit and its most popular meme subreddit, r/memes. We brew a recipe for the best memes by analyzing upvote and comment statistics. We determine the memes most similar in content and graphics to establish relations and form clusters segregated by meme template. Finally, we answer the world-shaking question: what was the best meme of last year?
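A minimal sketch of such an embedding pipeline, assuming a Hugging Face `transformers` ViT checkpoint (the exact model used in the talk is not stated): each image's [CLS] token becomes its embedding, and k-means groups look-alike templates.

```python
import torch
from PIL import Image
from sklearn.cluster import KMeans
from transformers import ViTImageProcessor, ViTModel

# Checkpoint choice is a guess, not necessarily the one from the talk.
CKPT = "google/vit-base-patch16-224-in21k"
processor = ViTImageProcessor.from_pretrained(CKPT)
model = ViTModel.from_pretrained(CKPT).eval()


def embed(paths: list[str]) -> torch.Tensor:
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return out.last_hidden_state[:, 0]  # [CLS] token as the image embedding


embeddings = embed(["meme1.jpg", "meme2.jpg", "meme3.jpg"])  # illustrative files
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(embeddings.numpy())
```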
This talk explores the future of Artificial Intelligence in software testing and its transformative impact within the telecommunications industry. As AI continues to evolve, organizations are leveraging intelligent solutions to optimize testing processes, enhance product quality, and reduce time-to-market. Drawing on real-world use cases from Ericsson AB, this presentation will dive into how AI is revolutionizing testing methodologies, addressing challenges in AI deployment, and setting the stage for the next leap in testing innovation. Attendees will gain actionable insights into integrating AI into testing pipelines, handling the complexities of large-scale deployments, and overcoming the challenges that come with AI adoption in the software testing space.
We want to show that a successful implementation of AI solutions (open-weights semantic search for customer care) doesn't have to be costly and can deliver solid ROI - contrary to the growing sentiment in the media as we seem to be entering the trough of disillusionment (in Gartner's hype cycle terminology). In our internal tech talks, our peers were particularly interested in the thinking behind certain decisions we made. For instance, they wanted to know how we evaluated the most suitable model from the MTEB leaderboard, how we organized the embeddings to fit our knowledge base, why we chose one vector store over another, and more. Interestingly, this implementation and its heavy internal promotion among non-tech folks spurred an avalanche of ideas from other departments. We want to share the experience and believe there's no better place to do so than the tech-heavy BigData Technology Warsaw Summit.
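As a hedged sketch of the approach (the model name below is illustrative; the talk's actual choice came from the MTEB leaderboard, and their vector store may differ), an open-weights embedding model plus a FAISS index is enough for a minimal semantic search:

```python
import faiss
from sentence_transformers import SentenceTransformer

# Model name is illustrative only; pick yours from the MTEB leaderboard.
model = SentenceTransformer("intfloat/multilingual-e5-base")

docs = ["How do I reset my password?", "Where is my invoice?"]
doc_vecs = model.encode(docs, normalize_embeddings=True)

# Inner product on unit vectors == cosine similarity.
index = faiss.IndexFlatIP(doc_vecs.shape[1])
index.add(doc_vecs)

query_vec = model.encode(["password recovery"], normalize_embeddings=True)
scores, ids = index.search(query_vec, k=1)
print(docs[ids[0][0]], scores[0][0])  # best-matching knowledge base article
```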
As organizations increasingly rely on data to drive decision-making, the ability to quickly access and analyze data has become essential. However, SQL querying remains a technical hurdle for many, slowing down workflows and limiting data access to specialized team members. This session will demonstrate how LLM-based Text-to-SQL tools can close this gap, enabling non-technical team members—such as product managers and business analysts—to generate SQL queries using natural language. This talk will explore how to implement Text-to-SQL solutions for databases, discuss strategies to improve query accuracy through prompt engineering techniques, and show how to optimize results by incorporating database structure representations. Attendees will walk away with actionable insights on how to empower their teams, streamline query development, and increase overall productivity.
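A minimal sketch of the prompt-engineering technique described above, assuming the OpenAI Python client (any chat-capable LLM works; the schema, model name, and helper function are illustrative): embedding the table DDL in the prompt is the "database structure representation" mentioned in the abstract.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative schema; in practice this is generated from the live database.
SCHEMA = """CREATE TABLE orders (
    id INT, customer_id INT, total NUMERIC, created_at DATE
);"""


def text_to_sql(question: str) -> str:
    # Including the DDL gives the model the database structure it needs
    # to produce valid column and table names.
    prompt = (
        "You translate questions into SQL for the schema below.\n"
        f"{SCHEMA}\n"
        "Return only the SQL query.\n"
        f"Question: {question}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # model choice is illustrative
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


print(text_to_sql("Total revenue per customer in 2024?"))
```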
Recommendation systems are an essential part of most e-commerce businesses, often responsible for a significant portion of revenue. However, every branch of the industry has its own set of exceptions and challenges that affect how recommender systems have to be designed. In airlines, these exceptions become extreme: returning visitors are sparse, many purchases are anonymous, and items such as flight tickets can be sold at different prices depending on the circumstances. To overcome these challenges, we propose a simple method that utilizes information collected about users and items, removing the need to extract user/item embeddings with matrix factorization. Additionally, we will talk about how we used a Feature Store as the foundation for this project and why it could be beneficial to implement one in your Data Science team as well.
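As an illustrative sketch only (the presenters' actual feature set and model are not public), a feature-based recommender can score user-item pairs directly from raw features, with no matrix factorization:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in data; real features would come from a Feature Store.
rng = np.random.default_rng(0)
user_feats = rng.normal(size=(1000, 4))   # e.g. market, device, trip context
item_feats = rng.normal(size=(1000, 3))   # e.g. route popularity, price bucket
X = np.hstack([user_feats, item_feats])   # one row per user-item pair
y = rng.integers(0, 2, size=1000)         # clicked / not clicked

model = GradientBoostingClassifier().fit(X, y)
scores = model.predict_proba(X[:5])[:, 1]  # rank candidate items by this score
```

Because the model consumes plain features, it works for anonymous visitors and cold-start items, which is exactly where matrix factorization struggles.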
We will explore how our system uses a graph-based approach to store transactions and enhance fraud controls with advanced features, boosting the effectiveness of both ML models and static rules. We will present key components of the system, including a real-time feature computation service optimized for low latency, a visualization tool for network analysis, and a mechanism for historical feature reconstruction.
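A toy sketch of the graph idea, assuming `networkx` (the presenters' schema and feature definitions are not public): accounts linked through shared attributes yield simple, explainable fraud features.

```python
import networkx as nx

# Toy transaction graph: accounts linked to the attributes they share.
G = nx.Graph()
G.add_edges_from([
    ("acct_1", "device_A"), ("acct_2", "device_A"),  # two accounts, one device
    ("acct_2", "card_X"), ("acct_3", "card_X"),
])


def shared_neighbor_count(a: str, b: str) -> int:
    """A simple graph feature: how many attributes two accounts share."""
    return len(set(G[a]) & set(G[b]))


print(shared_neighbor_count("acct_1", "acct_2"))  # 1 -> shared device
```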
The evolution of AI has seen a remarkable transition from standalone large language models (LLMs) to compound systems integrating diverse functionalities, culminating in the rise of agentic AI. This talk traces the journey of AI systems, exploring how agentic AI enables autonomous reasoning, planning, and action, making it a pivotal development in solving complex, dynamic problems.
We will dive into the principles of agentic AI, discussing how it works and why it is essential for creating adaptive, task-oriented solutions. The session will then introduce **CrewAI**, an open-source Python package that simplifies the development of intelligent agents. Through a practical use case, participants will learn how to implement their first agent with CrewAI, gaining hands-on insights into leveraging this powerful tool to unlock new possibilities in AI-driven applications.
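A minimal first agent with CrewAI might look like the sketch below (the prompts are illustrative, and an LLM API key such as `OPENAI_API_KEY` is assumed to be configured):

```python
from crewai import Agent, Task, Crew

# A single-agent crew; role, goal, and backstory shape the agent's prompt.
researcher = Agent(
    role="Research Analyst",
    goal="Summarize recent developments in agentic AI",
    backstory="You condense technical topics into short briefs.",
)

brief = Task(
    description="Write a three-bullet summary of what agentic AI is.",
    expected_output="Three concise bullet points.",
    agent=researcher,
)

crew = Crew(agents=[researcher], tasks=[brief])
print(crew.kickoff())  # runs the task and returns the agent's output
```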
Fast Deliveries, a freight forwarding company, faced a challenge when several of their forwarders violated non-competition agreements by working for their competitor, WHILE (names anonymized), and transferring clients during this period. The investigation centered on analyzing data from Trans.eu - a major European logistics platform where freight forwarders post and accept transportation orders, essentially serving as a digital marketplace crucial for day-to-day logistics operations.
Our team was tasked with developing a methodology to identify these fake accounts and connect them to specific former employees. Combining network analysis, geographic patterns, and temporal data allowed us to identify suspicious accounts with high confidence.
Attendees will learn about the specific investigation and gain insights into analytical techniques they can apply to their own data challenges. We will demonstrate our methodology using anonymized data from the actual investigation.
Dive into the practical and strategic considerations when choosing between these two approaches for creating effective AI agents. Prompt engineering has risen as a fast, adaptable, and low-cost way to harness the capabilities of LLMs. However, its performance often correlates directly with the size of the model - larger, more costly models are required to achieve the desired results. This trade-off raises questions about scalability and cost-efficiency, especially for organisations with resource constraints.
On the other hand, fine-tuning offers a path to tailor models for domain-specific tasks or nuanced interactions, delivering consistent performance even with smaller models. While it demands more resources upfront, fine-tuned solutions can lead to significant long-term savings by reducing reliance on oversized models.
The purpose of this presentation is to demonstrate the possibilities of Generative AI in the context of business intelligence. I'll focus on Copilot, AI Skills in MS Fabric, and AI/BI Genie in Databricks. The presentation will contain a description of each tool, a comparison, and a demo at the end. I'll present how to prepare a data model and data to make them available for Gen AI. I'll demonstrate how these tools can support data democratization in an organization.
I'm also considering including Conversation Analytics from GCP.
Parallel roundtable discussions are the part of the conference that engages all participants. They serve a few purposes. First of all, participants have the opportunity to exchange opinions and experiences about a specific issue that is important to the group. Secondly, participants can meet and talk with the leader/host of the roundtable discussion - selected professionals with vast knowledge and experience.
There will be one roundtable session, so every conference participant can take part in one discussion.
This roundtable will explore the pros and cons of using managed APIs (such as OpenAI GPT, Anthropic Claude, or AWS Bedrock) versus self-hosted open LLMs (e.g., Bielik or Llama). We’ll delve into the trade-offs between these two approaches, covering critical factors such as cost, scalability, performance, and control over data. Key discussion points include:
This session is designed for technical experts and decision-makers to exchange experiences, insights, and predictions about the evolving landscape of LLMs. Whether you’re actively using one of these approaches or still considering your options, this presentation/discussion will offer valuable perspectives to help you make informed decisions.
This presentation is about our IQVIA Data Transformation & Engineering department's approach to configuring Snowflake, specifically, but not limited to, the areas of security (network policies / storage policies / password policies / single-sign-on configuration) and database management (new databases / warehouses / roles & grants).
In many cases, companies tend to hire admins who manage these ad hoc, with the best backup being a notepad audit trace of what they run. Configuration in such cases differs per user, and inconsistencies pile up. Some are smarter and implement Git-based solutions, such as Terraform. Tools like Terraform typically have a Snowflake plugin to manage all of this, but they lack templating, are always behind the latest Snowflake SQL extensions, and do not really address self-service (ad-hoc or UI-based) management needs.
At IQVIA DTE we arrived at a fairly good compromise between the various security and auditing needs, leaving space for automation, enforcement, and self-service where appropriate, while keeping the solution very neat and simple. I would like to present it.
The goal is pretty much as described in the official, public description of the roundtable, i.e.:
GenAI has been with us for a while - long enough for a number of actual systems to be deployed into production. Moving beyond the initial hype, this roundtable will tackle the real challenges and best practices of running GenAI systems in the real world, related to:
We invite engineering and business leaders, architects, and AI practitioners who have already deployed GenAI systems to production, as well as those who have yet to turn their proofs of concept into serious deployments and would like to learn how to do so - but knowledge of the basics of such systems and the aspects around them is required.
Essentially, we would like to gather a group of really competent people for discussion, exchange of knowledge, and establishing best practices for GenAI systems.
Join us for a dynamic discussion that begins with insights from a GDPR implementation journey and then expands to broader data governance topics. We will see how privacy compliance can inform and enhance general data management practices: going beyond legal obligations to create an opportunity for upgrading and optimizing data infrastructure and operations, leading to better overall data governance.
Key discussion points will include:
This session welcomes privacy officers, data engineers, architects, and technology leaders at any stage of their GDPR or data governance implementation journey. Come ready to share your experiences, challenges, and successes in balancing compliance needs with operational efficiency. Compliance is not only about avoiding fines; it can also be an opportunity for spreading good design and best practices.
This is a discussion about the transformation from traditional data engineering to a data platform-oriented approach. As organizations scale their data operations to serve hundreds or thousands of users, the conventional centralized data engineering model faces significant challenges. In this roundtable, we'll explore and share experiences about critical shifts in both mindset and technology that enable this evolution.
Key discussion points will include:
This session is ideal for data leaders, engineers, and practitioners who are facing or have overcome similar challenges in their organizations. At the end of this roundtable, you'll hopefully have an idea of how to start a data platform team in your organization and be inspired by the work done by your peers.
The exchange of sensitive data across systems, both within and between organizations, is often constrained by legal documents, such as privacy policies. Verifying the legitimacy of data access requests can be labor-intensive, leading to delays or, worse, undocumented and unauthorized access. As organizations adopt data mesh and face increasing complexity in data protection requirements, the intersection of GenAI and data governance presents exciting opportunities and challenges.
Key discussion points will include:
This session welcomes data governance professionals, architects, legal experts, and technology leaders to share their perspectives on automating data governance and data quality processes, or building data products. This roundtable provides an opportunity to network with peers and discuss emerging patterns at the intersection of AI, data mesh, and governance. Together, we'll explore how these technologies can create more efficient and reliable data governance processes while ensuring compliance with legal and organizational requirements.
Join us for a discussion about designing and implementing real-time analytics platforms and solutions, with a focus on processing user clickstream and interaction data at scale.
Key discussion points will include:
This session is ideal for data engineers, architects, and technical leaders who are either currently working with real-time analytics or planning to implement such systems. Whether you're dealing with clickstream data, financial transactions, IoT sensor data, or other streaming use cases, you should find this discussion valuable.
I've been in the trenches dealing with messy ML pipelines that consume more time than they should. Through hands-on experience, I found practical ways to simplify pipeline orchestration using Flyte. In this session, I will give you a crash course in building ML pipelines: how and where to start, how to scale them up later, and how to deal with all the nasty problems you will encounter along the road.
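As a taste of what the session covers, here is a minimal Flyte pipeline (task names and bodies are illustrative): each `@task` becomes a containerized step and the `@workflow` wires them into a DAG.

```python
from typing import List

from flytekit import task, workflow


@task
def fetch_data(n: int) -> List[int]:
    # Stand-in for a real data-loading step.
    return list(range(n))


@task
def train(data: List[int]) -> float:
    # Stand-in for real model training.
    return sum(data) / len(data)


@workflow
def pipeline(n: int = 10) -> float:
    # Flyte builds the DAG from these calls; keyword args are required.
    return train(data=fetch_data(n=n))


if __name__ == "__main__":
    print(pipeline(n=5))  # runs locally; `pyflyte run` executes on a cluster
```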
This presentation delves into creating a real-time analytics platform by leveraging cost-effective Change Data Capture (CDC) tools like Debezium for seamless data ingestion from sources such as Oracle into Kafka. We’ll explore how to build a resilient data lake and data mesh architecture using Apache Flink, ensuring data loss prevention, point-in-time recovery, and robust schema evolution to support agile data integration. Participants will learn best practices for establishing a scalable, real-time data pipeline that balances performance, reliability, and flexibility, enabling efficient analytics and decision-making.
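As a hedged sketch of the ingestion step (hostnames, credentials, and some config keys vary by Debezium version; this is not the presenters' production setup), a Debezium Oracle connector can be registered through the Kafka Connect REST API:

```python
import requests

# Debezium 2.x-style config; older versions use different key names
# (e.g. database.server.name instead of topic.prefix).
connector = {
    "name": "oracle-cdc",
    "config": {
        "connector.class": "io.debezium.connector.oracle.OracleConnector",
        "database.hostname": "oracle-host",
        "database.port": "1521",
        "database.user": "dbz",
        "database.password": "******",          # placeholder credential
        "database.dbname": "ORCLCDB",
        "topic.prefix": "shop",
        "table.include.list": "INVENTORY.ORDERS",
        "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
        "schema.history.internal.kafka.topic": "schema-history.shop",
    },
}

# Kafka Connect exposes a REST API (default port 8083) for managing connectors.
resp = requests.post("http://connect:8083/connectors", json=connector, timeout=30)
resp.raise_for_status()
```

Once registered, row-level changes from the Oracle tables flow into Kafka topics, where Flink jobs can consume them for the lake and mesh layers described above.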