BIG DATA CONFERENCE
CTO & Co-Founder
Boostrs SAS, France
Catalin is co-founder and CTO of Boost.rs, an AI startup advancing the science of skills to connect people with jobs and learning. Before founding Boost.rs, Catalin led a data and analytics team in the travel industry, producing several award-winning products and research covered by outlets such as Harvard Business Review and The Economist. Before moving to industry, Catalin worked as an experimental particle physicist; his PhD thesis centered on applying AI to low signal-to-noise problems.
Data Science Playbook: A Step-By-Step Guide for Your Journey From Data to Insight
This workshop covers the do’s and don’ts of working with data, from formulating insightful questions to communicating results to specialists or a non-technical audience. By the end of the workshop, participants will have an analysis blueprint, complete with Python notebooks, that they can repurpose and apply to their own projects at work.
- Part 1: Introduction. What is data science? Three types of problems faced by every data scientist.
  - Learn to recognize the type of problem you are facing.
  - Take an object-oriented approach to data analysis: the O-V-V-D framework.
- Part 2: Data gathering / cleaning / wrangling.
  - Prepare the data. Avoid the GIGO myth.
  - Perform exploratory analysis. Learn guidelines for outlier treatment.
- Part 3: Feature engineering.
  - Define variable types and variable combinations (linear and non-linear).
  - Perform correlation analysis.
- Part 4: Multi-variate analysis: regression and classification.
  - Perform linear regression. Quantify predictor importance. Learn to incorporate fixed effects.
  - Apply non-linear techniques. Compare the results obtained from different techniques.
- Part 5: Prediction using multi-variate models.
  - Quantify model accuracy.
  - Create a “money plot”. Perform final cross checks and verify the robustness of the results.
- Part 6: Result visualization and communication.
  - Learn several guidelines for making great charts.
  - Make your results work for you through data marketing and thought leadership.
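To give a flavor of the steps in Parts 2 to 5, here is a minimal sketch in plain Python. It is not taken from the workshop notebooks: the data is synthetic, the helper names (`iqr_filter`, `pearson`, `fit_line`, `r_squared`) are hypothetical, and the 1.5 x IQR rule is only one common outlier guideline among several.

```python
import math
import random


def iqr_filter(rows, key, k=1.5):
    """Keep rows whose `key` value lies within k * IQR of the quartiles."""
    values = sorted(r[key] for r in rows)

    def quantile(q):
        # Linear interpolation between the two nearest order statistics.
        pos = q * (len(values) - 1)
        lo, hi = math.floor(pos), math.ceil(pos)
        return values[lo] + (values[hi] - values[lo]) * (pos - lo)

    q1, q3 = quantile(0.25), quantile(0.75)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r for r in rows if low <= r[key] <= high]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    slope = cov / var
    return slope, my - slope * mx


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination of the fitted line on (xs, ys)."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Synthetic data (an assumption for illustration): y = 2x + 1 plus noise,
# with one deliberately corrupted record appended.
rng = random.Random(0)
rows = [{"x": float(x), "y": 2.0 * x + 1.0 + rng.gauss(0, 1)} for x in range(50)]
rows.append({"x": 25.0, "y": 500.0})

clean = iqr_filter(rows, "y")          # Part 2: outlier treatment
xs = [r["x"] for r in clean]
ys = [r["y"] for r in clean]
corr = pearson(xs, ys)                 # Part 3: correlation analysis
slope, intercept = fit_line(xs, ys)    # Part 4: linear regression
r2 = r_squared(xs, ys, slope, intercept)  # Part 5: model accuracy

print(f"rows kept: {len(clean)} of {len(rows)}")
print(f"correlation: {corr:.3f}, slope: {slope:.3f}, R^2: {r2:.3f}")
```

In practice the same steps would more likely be written with pandas and scikit-learn inside an Anaconda environment; the pure-Python version is used here only so that each computation stays visible.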
The main goal of this workshop is to introduce participants to a useful framework for data analysis and to teach them how to solve various challenges inherent to data-related activities in business.
The target audience includes data scientists, aspiring data scientists, and anyone interested in working with data in a business environment.
Anaconda environment with Python 3.7 or above.
- Basic programming skills in Python or another language
- Basic statistics: calculating averages and standard deviations, plotting simple histograms
Professional experience: participants occupy or have occupied a data-related role, preferably in a business setting.