Apache Spark and Databricks Performance Tuning
APRIL 3-4, 2025 | VILNIUS, LITHUANIA
Are some technical aspects of Apache Spark tricky? Do you need help with performance or troubleshooting? We’ll highlight the nitty-gritty details beyond the SQL, in a digestible manner, all to help you resolve your top Apache Spark issues and get the most out of your ecosystem. We’ll share the top takeaways on avoiding failure from our longstanding experience with Spark. Skewed data, Cartesian joins, executor fine-tuning: we will cover all of it.
WHO IS THIS COURSE FOR?
This workshop is for Big Data developers who work with Spark jobs and want to learn new ways of optimizing them and to see how Spark performs its work under the hood.
WHAT DO YOU NEED TO PREPARE?
You must bring a laptop with e-mail access and a working, up-to-date web browser. The workshop will take place in a web environment from Databricks – no development environment, SDKs, or libraries will be needed on your machine.
WHAT WILL YOU GET FROM THIS COURSE?
This workshop will show you how to approach Spark performance problems. You will gain the experience needed to develop an intuition for how your code is translated and executed on the cluster, and in what ways it can be improved.
Agenda
INTRODUCTION
- When to optimize, when not to?
- The Databricks environment – our free playground
THE BASICS - WHEN TO USE IT, WHEN NOT TO
- Broadcast
- Cache
- Catalyst
LOADING DATA INTO SPARK
- Numerous input files challenge
WINDOWING FUNCTIONS - HOW TO GROUP THE DATA EFFICIENTLY
- Working with Spark 2
- AQE and its gotchas
JOINS - THE SHUFFLE WE CANNOT AVOID
- Balancing skewed data
- Full cartesian join – when and why does it happen
- Joining intervals – ways to do it right
PARAMETRISED SPARK JOBS AND QUERIES
- Avoiding non-obvious bottlenecks
COMPUTATIONS OUTSIDE THE ENGINE - IS IT WORTH REINVENTING THE WHEEL?
- Spark & UDFs
- PySpark & UDFs
SUMMARY
- Summarizing the course
Get your ticket now
Choose this 2-day workshop, or embark on a comprehensive learning journey by selecting the full Introduction to Databricks and Apache Spark and Databricks Performance Tuning course package, held from March 31 to April 4. This complete program offers a seamless progression from foundational concepts to advanced optimization techniques. Learn more about the Introduction to Databricks course HERE. For any questions, contact us at tickets@bigdataconference.eu or call +370 618 00999.
Ticket Information
To receive an invoice or a Proforma invoice for a Full ticket, please send the following information by email to tickets@bigdataconference.eu:
- Company details (Registration code, VAT, Address)
- Type of ticket (Full Ticket)
- Number of tickets
- Email of the attendee(s)
- Workshop title
To get a Proforma invoice issued, please choose Proforma invoice in the Payment type field in Paysera.
If you have any other questions, please call +370 695 65000.
2-DAY WORKSHOP TICKET
APACHE SPARK AND DATABRICKS PERFORMANCE TUNING
1999 € excl. VAT
5-DAY WORKSHOP TICKET
INTRODUCTION TO DATABRICKS + APACHE SPARK AND DATABRICKS PERFORMANCE TUNING
3999 € excl. VAT
Meet the speaker
Marcin Szymaniuk is the CEO and Senior Data Engineer at TantusData, as well as an internationally recognized conference speaker. With over two decades of experience in helping clients monetize Big Data, Marcin leads a team of expert Data engineers specializing in Data Engineering, Machine Learning (ML), ML-Ops, and Cloud technologies. He excels in solving both complex, unconventional challenges and more routine problems that require fast and efficient solutions. Marcin’s extensive experience spans various industries and project scales, with a particular focus on AI, ML, and deployment. His speaking engagements have included notable events such as Infoshare, J On the Beach, Devoxx, Huawei Eco-Connect Poland 2023, Berlin Buzzwords, Codestar, GeeCON, and Java Day Istanbul.