Apache Spark and Databricks Performance Tuning

APRIL 3-4, 2025 | VILNIUS, LITHUANIA

Are some technical aspects of Apache Spark tricky? Do you need help with performance or troubleshooting? We’ll highlight the nitty-gritty details beyond the SQL in a digestible manner, all to help you resolve your top Apache Spark issues and get the most out of your ecosystem. We’ll share the top takeaways on avoiding failure from our longstanding experience with Spark. Skewed data, Cartesian joins, executor fine-tuning: we will cover all of it.

WHO IS THIS COURSE FOR?

This workshop is for Big Data developers who work with Spark jobs and want to learn new ways of optimizing them and to see how Spark performs its work under the hood.

WHAT DO YOU NEED TO PREPARE?

You must bring a laptop with e-mail access and a working, up-to-date web browser. The workshop will take place in a web environment from Databricks – no development environment, SDKs, or libraries will be needed on your machine.

WHAT WILL YOU GET FROM THIS COURSE?

This workshop will show you how to approach Spark performance problems. You will gain the necessary experience and develop an intuition for how your code is translated and executed in the cluster, and in what ways it can be improved.

Agenda

INTRODUCTION

  1. When to optimize, when not to?
  2. The Databricks environment – our free playground

THE BASICS - WHEN TO USE IT, WHEN NOT TO

  1. Broadcast
  2. Cache
  3. Catalyst

LOADING DATA INTO SPARK

  1. Numerous input files challenge

WINDOWING FUNCTIONS - HOW TO GROUP THE DATA EFFICIENTLY

  1. Working with Spark 2
  2. AQE and its gotchas

JOINS - THE SHUFFLE WE CANNOT AVOID

  1. Balancing skewed data
  2. Full Cartesian join – when and why it happens
  3. Joining intervals – ways to do it right

PARAMETRISED SPARK JOBS AND QUERIES

  1. Avoiding non-obvious bottlenecks

COMPUTATIONS OUTSIDE THE ENGINE - IS IT WORTH REINVENTING THE WHEEL?

  1. Spark & UDFs
  2. PySpark & UDFs

SUMMARY

  1. Summarizing the course

Get your ticket now

Choose this 2-day workshop, or embark on a comprehensive learning journey by selecting the full Introduction to Databricks and Apache Spark and Databricks Performance Tuning course package, held from March 31 to April 4. This complete program offers a seamless progression from foundational concepts to advanced optimization techniques. Learn more about the Introduction to Databricks course HERE. For any questions, contact us at tickets@bigdataconference.eu or call +370 618 00999.

Ticket Information

To receive an invoice or a Proforma invoice for a Full Ticket, please email the following information to tickets@bigdataconference.eu:

  • Company details (Registration code, VAT, Address)
  • Type of ticket (Full Ticket)
  • Number of tickets
  • Email of the attendee(s)
  • Workshop title

To get a Proforma invoice issued, please choose Proforma invoice in the Payment type field in Paysera.

If you have any other questions, please call +370 695 65000.

2 DAY WORKSHOP

TICKET

APACHE SPARK AND DATABRICKS PERFORMANCE TUNING

1999 € excl. VAT

5 DAY WORKSHOP

TICKET

INTRODUCTION TO DATABRICKS + APACHE SPARK AND DATABRICKS PERFORMANCE TUNING

3999 € excl. VAT

meet the speaker

Marcin Szymaniuk is the CEO and Senior Data Engineer at TantusData, as well as an internationally recognized conference speaker. With over two decades of experience in helping clients monetize big data, Marcin leads a team of expert data engineers specializing in Data Engineering, Machine Learning (ML), ML-Ops, and Cloud technologies. He excels in solving both complex, unconventional challenges and more routine problems that require fast and efficient solutions. Marcin’s extensive experience spans various industries and project scales, with a particular focus on AI, ML, and deployment. His speaking engagements have included notable events such as Infoshare, J On the Beach, Devoxx, Huawei Eco-Connect Poland 2023, Berlin Buzzwords, Codestar, GeeCON, and Java Day Istanbul.

 

workshop venue