BIG DATA CONFERENCE

EUROPE 2022

November 23-24

Online

Catalin Ciobanu

CTO & Co-Founder

Boostrs SAS, France

Biography

Catalin is co-founder and CTO of Boost.rs, an AI startup advancing the science of skills to connect people with jobs and learning. Before founding Boost.rs, Catalin led a data & analytics team in the travel industry, producing several award-winning products and research covered by outlets such as Harvard Business Review and The Economist. Before moving to industry, Catalin worked as an experimental particle physicist; his PhD thesis centered on applying AI to low signal-to-noise problems.

Workshop

Data Science Playbook: A Step-By-Step Guide for Your Journey From Data to Insight

Abstract

This workshop will cover the do’s and don’ts of working with data, from formulating insightful questions to communicating the results to specialists or a non-technical audience. At the end of the workshop, participants will have an analysis blueprint, complete with Python notebooks, which they can repurpose and apply to their own projects at work.

Agenda

  • Part 1: Introduction. What is data science? Three types of problems faced by every data scientist.
    • Learn to recognize the type of problem you are facing.
    • Take an object-oriented approach to data analysis: the O-V-V-D framework.
  • Part 2: Data gathering / cleaning / wrangling.
    • Prepare the data. Avoid the GIGO (garbage in, garbage out) myth.
    • Perform exploratory analysis. Learn guidelines for outlier treatment.
  • Part 3: Feature engineering.
    • Define variable types and variable combinations (linear and non-linear).
    • Perform correlation analysis.
  • Part 4: Multi-variate analysis: regression and classification.
    • Perform linear regression. Quantify predictor importance. Learn to incorporate fixed effects.
    • Apply non-linear techniques. Compare the results obtained from different techniques.
  • Part 5: Prediction using multi-variate models.
    • Quantify model accuracy.
    • Create a “money plot”. Perform final cross checks and verify the robustness of the results.
  • Part 6: Result visualization and communication.
    • Learn several guidelines for making great charts.
    • Make your results work for you through data marketing and thought leadership.
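
To make the Part 3 item on correlation analysis concrete, here is a minimal sketch of a Pearson correlation computed with Python's standard library only. It is not taken from the workshop notebooks: the function name `pearson` and the sample numbers are hypothetical and chosen purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up data, for illustration only.
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]
print(round(pearson(ad_spend, sales), 3))
```

A value near +1 or -1 suggests a strong linear relationship, while a value near 0 suggests none; the coefficient does not capture non-linear dependence.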

Objectives

The main goal of this workshop is to introduce participants to a useful framework for data analysis and to teach them how to solve various challenges inherent to data-related activities in business.

Target audience

The target audience includes data scientists, aspiring data scientists, and anyone interested in working with data in a business environment.

Technical requirements

Anaconda environment with Python 3.7 or above.

Technical knowledge:

  • Basic programming in Python or another language
  • Basic statistics: calculating averages and standard deviations, plotting basic histograms
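
As a rough gauge of the level assumed, participants should be comfortable reading a snippet like the one below. It is an illustrative sketch, not part of the workshop material: it computes an average and a standard deviation with Python's standard library and prints a crude text histogram. The sample values and the bin width are made up.

```python
from collections import Counter
from statistics import mean, stdev

# Made-up sample values, for illustration only.
values = [2, 3, 3, 4, 4, 4, 5, 5, 6, 9]

avg = mean(values)      # arithmetic average
spread = stdev(values)  # sample standard deviation (n - 1 in the denominator)

# Group values into bins of width 2: [2, 4), [4, 6), ...
bin_width = 2
counts = Counter((v // bin_width) * bin_width for v in values)

print(f"mean = {avg:.2f}, standard deviation = {spread:.2f}")
for start in sorted(counts):
    # One '#' per value that falls in the bin.
    print(f"[{start}, {start + bin_width}): {'#' * counts[start]}")
```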

Professional experience: participants hold or have held a data-related role, preferably in a business setting.