Two years ago, we embarked on building DoorDash's ad platform from the ground up. Today, our platform handles over 2 trillion events every day, and our advertising business has grown significantly in recent years, becoming a key area of focus for the company. To generate ad metrics and analytics in real time, we built an ad event tracking, attribution, and analysis pipeline on top of Apache Flink, Apache Kafka, Apache Pinot, and our in-house real-time event processing system. This combination enables us to manage a large number of active ad campaigns with reliable ad delivery and timely attribution. It also allows us to share ad metrics with advertisers in real time.
During this session, we will start by introducing the core concepts of an online advertising system to provide a better understanding of the crucial role played by the ad event processing pipeline. We will then present an overview of our end-to-end pipeline and delve into specific challenges we encountered. These include:
- The evolution of our core attribution job to address data races and the lessons we learned from it
- Our approach to ensuring fault tolerance across the entire pipeline

We will also share best practices for designing and developing large-scale Flink streaming pipelines for production environments.
This session will provide you with insights and practical knowledge to help you build robust and efficient streaming pipelines for your ad platforms. By attending, you will gain a deeper understanding of the key challenges involved in building a scalable and fault-tolerant ad event processing pipeline, including data ingestion, real-time processing, attribution, and reporting.
Speaker
Chao Chu
Software Engineer @DoorDash
Chao Chu is a backend engineer at DoorDash. He works on the ads foundation team, focusing on the ad event pipeline and the ad exchange service. Previously, he worked at Morgan Stanley, where he helped build the Fixed Income Risk Infrastructure platform using Scala. He is passionate about building large-scale distributed systems.