BIG DATA CONFERENCE

EUROPE 2024

November 19-22

Onsite & Online

Lidor Gerstel

DEVOPS-Cloud Architect

Centerity, Israel

Biography

DevOps Team Leader and experienced trainer with a demonstrated history of leading CI/CD and Big Data projects in the industry. Skilled in Kubernetes, Hadoop, AWS, Docker, and Jenkins. Certified as an AWS Solutions Architect.

Workshop

Spark and Hadoop

Abstract

The workshop covers the basic concepts of Hadoop, focusing mostly on the Cloudera stack: using HBase and Impala to query data, and using Spark to stream data. We will then launch a Cloudera QuickStart and work with datasets of top-rated movies, analyzing and querying the data with Hadoop while explaining and demonstrating MapReduce concepts and RDD partitioning in Spark.
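As a rough illustration of the MapReduce idea mentioned above, the following is a minimal single-process sketch in plain Python, not Hadoop or Spark code. It computes an average rating per movie in three steps: map, shuffle, and reduce. The sample records and all function names are hypothetical and are not taken from the workshop dataset; in a real cluster each phase would run in parallel across nodes.

```python
from collections import defaultdict

# Hypothetical sample records: (movie title, rating). Not the workshop dataset.
RATINGS = [
    ("Movie A", 5.0),
    ("Movie B", 3.0),
    ("Movie A", 4.0),
    ("Movie C", 4.5),
    ("Movie B", 4.0),
]


def map_phase(records):
    """Map: emit one (key, value) pair per input record."""
    for title, rating in records:
        yield title, rating


def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's values into one result (here, the mean)."""
    return {key: sum(values) / len(values) for key, values in grouped.items()}


def average_ratings(records):
    """Run the three phases in order on an in-memory list of records."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    averages = average_ratings(RATINGS)
    # Print movies from highest to lowest average rating.
    for title, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{title}: {avg:.2f}")
```

The separation into three functions mirrors the conceptual phases only; it says nothing about how Hadoop schedules or distributes the work.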

Agenda

Part 1: Introduction to Hadoop and MapReduce:
- Hadoop Distributors
- Hadoop vs Traditional Data Storage
- Working with HDFS
- Basic commands
- Architecture
Part 2: Hive and HBase:
- HiveQL
- Hive Data types
- HBase data model
- HBase vs RDBMS
- Client API and REST
Part 3: Apache Spark (PySpark):
- Basics and RDD
- Caching & Modules
- Spark Streaming
- Spark SQL

Objectives

The main goal is to understand what big data is, how to ingest data, the main concepts of a Hadoop data warehouse, and how to use Spark, including streaming, with big data.

Target audience

Entry-level Big Data practitioners, DBAs, and BI engineers who are familiar with open source systems

Technical requirements

Installations:

Docker installed on Linux: sudo apt-get install docker.io

Download the Cloudera QuickStart image: docker pull cloudera/quickstart:latest

Start the Cloudera stack container:

docker run --hostname=quickstart.cloudera --privileged=true -t -i -p 8888 -p 80 -p 7180 -d <Name of the Image> /usr/bin/docker-quickstart
