BIG DATA CONFERENCE

EUROPE 2024

November 19-22

Onsite & Online

Lidor Gerstel

DevOps-Cloud Architect

Centerity, Israel

Biography

DevOps team leader and experienced trainer with a demonstrated history of leading CI/CD and Big Data projects in the industry. Skilled in Kubernetes, Hadoop, AWS, Docker, and Jenkins. AWS Certified Solutions Architect.

Workshop

Spark and Hadoop

Abstract

The workshop covers the basic concepts of Hadoop, mostly within the Cloudera stack: using HBase and Impala to query data, and using Spark to stream data. We will then launch a Cloudera QuickStart container and work with a dataset of top-rated movies, analyzing and querying the data with Hadoop while explaining and demonstrating MapReduce concepts and RDD partitioning in Spark.
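To make the MapReduce idea concrete before touching a cluster, the three phases can be sketched in plain Python. This is only an illustration of the concept, not the workshop's own material: the `ratings` records are made-up placeholder values, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical (title, rating) records; made-up values for illustration only.
ratings = [
    ("Movie A", 5.0),
    ("Movie B", 3.0),
    ("Movie A", 4.0),
    ("Movie C", 2.0),
    ("Movie B", 4.0),
]


def map_phase(records):
    """Map: emit one (key, value) pair per input record."""
    for title, rating in records:
        # The value carries (sum, count) so partial results can be combined.
        yield title, (rating, 1)


def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine each key's values into an average rating."""
    averages = {}
    for key, values in grouped.items():
        total = sum(rating for rating, _ in values)
        count = sum(n for _, n in values)
        averages[key] = total / count
    return averages


average_ratings = reduce_phase(shuffle_phase(map_phase(ratings)))
print(average_ratings)
# {'Movie A': 4.5, 'Movie B': 3.5, 'Movie C': 2.0}
```

On a real Hadoop cluster the map and reduce steps run on different nodes and the shuffle moves data across the network. Here everything runs in one process so the data flow is easy to follow.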

Agenda

  • Part 1: Introduction to Hadoop and MapReduce:
    • Hadoop Distributors
    • Hadoop vs. Traditional Data Storage
    • Working with HDFS
    • Basic commands
    • Architecture
  • Part 2: Hive and HBase:
    • HiveQL
    • Hive data types
    • HBase data model
    • HBase vs RDBMS
    • Client API and REST
  • Part 3: Apache Spark (PySpark):
    • Basics and RDD
    • Caching & Modules
    • Spark Streaming
    • Spark SQL
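
The RDD partitioning topic from Part 3 can be previewed without a Spark installation. The sketch below is a plain-Python stand-in, not Spark's actual implementation: it assigns each key to a partition by hashing it, which is the general idea behind hash partitioning of key-value data. The `records` values and the helper names `partition_for` and `partition_records` are hypothetical, and `zlib.crc32` is used here only because it gives the same result on every run.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Map a string key to a partition index in [0, num_partitions)."""
    # crc32 is a deterministic stand-in for a partitioner's hash function.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_records(records, num_partitions):
    """Distribute (key, value) records across partitions by key hash."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return dict(partitions)


# Hypothetical (title, rating) records; made-up values for illustration only.
records = [
    ("Movie A", 5.0),
    ("Movie B", 3.0),
    ("Movie A", 4.0),
    ("Movie C", 2.0),
    ("Movie B", 4.0),
]

parts = partition_records(records, num_partitions=2)
for index in sorted(parts):
    print(index, parts[index])
```

Because the partition depends only on the key, all records for one movie land in the same partition, so a per-key aggregation can run on each partition independently. In PySpark, the comparable operation on a pair RDD is `partitionBy`.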
Objectives

The main goal is to understand what big data is, how to ingest data, the main concepts behind a Hadoop data warehouse, and how to use Spark to process and stream big data.

Target audience

Entry-level Big Data practitioners, DBAs, BI engineers, and anyone familiar with open-source systems.

Technical requirements
Installations:

  • Docker installed on Linux: sudo apt-get install docker.io
  • Download the Cloudera QuickStart image: docker pull cloudera/quickstart:latest
  • Start the Cloudera stack container: docker run --hostname=quickstart.cloudera --privileged=true -t -i -p 8888 -p 80 -p 7180 -d <Name of the Image> /usr/bin/docker-quickstart