BIG DATA CONFERENCE
EUROPE 2024
November 19-22
Onsite & Online
Lidor Gerstel
DEVOPS-Cloud Architect
Centerity, Israel
Biography
DevOps Team Leader and experienced trainer with a demonstrated history of leading CI/CD and Big Data projects in the industry. Skilled in Kubernetes, Hadoop, AWS, Docker, and Jenkins. AWS Certified Solutions Architect.
Workshop
Spark and HADOOP
Abstract
The workshop covers the basic concepts of Hadoop, focusing mostly on the Cloudera stack: using HBase and Impala to query data, and using Spark to stream data. We will then launch a Cloudera QuickStart and work with datasets of top-rated movies, analyzing and querying the data with Hadoop while explaining and demonstrating MapReduce concepts and RDD partitioning in Spark.
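To make the MapReduce idea concrete before the hands-on part, here is a minimal sketch in plain Python, not Hadoop code. It computes the average rating per movie in three explicit phases: map, shuffle, and reduce. The sample data, movie titles, and function names are illustrative assumptions, not material from the workshop datasets.

```python
from collections import defaultdict

# Hypothetical sample data: (movie_title, rating) pairs.
RATINGS = [
    ("Movie A", 5), ("Movie B", 3), ("Movie A", 4),
    ("Movie C", 2), ("Movie B", 4), ("Movie A", 3),
]


def map_phase(records):
    """Emit one (key, value) pair per input record: title -> (rating, 1)."""
    for title, rating in records:
        yield title, (rating, 1)


def shuffle_phase(pairs):
    """Group all values that share a key, as a framework would do
    between the map and reduce steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine each key's values into a single average rating."""
    result = {}
    for key, values in groups.items():
        total = sum(rating for rating, _ in values)
        count = sum(n for _, n in values)
        result[key] = total / count
    return result


def average_ratings(records):
    """Run the three phases in order on an in-memory list of records."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    for title, avg in sorted(average_ratings(RATINGS).items()):
        print(title, avg)
```

In a real cluster the map and reduce functions run in parallel on different nodes and the shuffle moves data over the network; this sketch only shows the data flow on one machine.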
Agenda
- Part 1: Introduction to Hadoop and MapReduce:
- Hadoop Distributions
- Hadoop vs Traditional Data Storage
- Working with HDFS
- Basic commands
- Architecture
- Part 2: Hive and HBase:
- HiveQL
- Hive Data types
- HBase data model
- HBase vs RDBMS
- Client API and REST
- Part 3: Apache Spark (PySpark):
- Basics and RDD
- Caching & Modules
- Spark Streaming
- Spark SQL
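As a conceptual companion to the RDD topic in Part 3, the following plain-Python sketch illustrates the idea behind hash partitioning of key-value data: records with the same key always land in the same partition. It does not use the Spark API. The function name, the integer movie IDs, and the use of `key % num_partitions` as a stand-in for a hash function are all assumptions made for illustration.

```python
def partition_by_key(pairs, num_partitions):
    """Assign each (key, value) pair to a partition by key.

    Uses `key % num_partitions` on integer keys as a simple stand-in
    for a hash partitioner (an illustrative assumption).
    """
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[key % num_partitions].append((key, value))
    return partitions


# Hypothetical sample data: (movie_id, rating) pairs.
RATINGS_BY_ID = [(1, 5), (2, 3), (1, 4), (3, 2), (2, 4), (4, 5)]

if __name__ == "__main__":
    for index, part in enumerate(partition_by_key(RATINGS_BY_ID, 2)):
        print(index, part)
```

Because all records for a given key sit in one partition, a per-key aggregation can run on each partition independently, which is the property that makes partitioning matter for performance.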
The main goal is to understand what big data is, how to ingest data, the main concepts of a Hadoop data warehouse, and how to use Spark to process and stream big data.
Entry level in Big Data; DBAs, BI engineers; familiarity with open source systems.
docker run --hostname=quickstart.cloudera --privileged=true -t -i -p 8888 -p 80 -p 7180 -d <Name of the Image> /usr/bin/docker-quickstart