27-29 November, Vilnius

Conference about Big Data, High Load, Data Science, Machine Learning & AI



Stripe, USA


Emma Tang is an engineering manager in Data Infrastructure at Stripe, where she helps build scalable data pipelines in Scala and Spark. Before joining Stripe, Emma was a lead software engineer on the Platform team at Neustar.


Airflow, Spark, EMR – Building a Batch Data Pipeline

Robust and user-friendly data pipelines are the foundation of powerful analytics and machine learning, and they are at the core of allowing companies to scale with their data. In this talk, we will walk through how to get started building an end-to-end batch processing data pipeline using Airflow and Spark on EMR. Through real code and live examples, we will explore one of the most popular open-source (OSS) data pipeline stacks.