ECTS credits
8 credits
Semester
Fall
Prerequisites
 Knowledge in Optimization, Probability and Statistics at Bachelor level
 Knowledge in Python programming and in algorithmics
Learning objectives
 Know how to acquire, aggregate and manipulate data.
 Know how to model standard regression and classification problems and program their solutions with an adequate programming language.
 Know how to use data to take decisions
 Understand the importance of data governance and data quality
 Understand the basic approach of data engineering in data science projects
Description of the programme
This course unit consists of three courses of 24 hours each: Statistical learning, Python for data science and Data-driven decision making. It is complemented by the second part of the data science projects (9 hours of lectures and 12 hours of project work), devoted to data issues.
Statistical learning
 Introduction
 Classical problems: regression, classification
 Supervised, unsupervised and semisupervised learning
 Curse of dimensionality
 Regression
 Multiple linear regression, OLS method
 Shrinkage-type methods (LASSO, Ridge)
 k-nearest neighbors
 Classification
 Logistic regression
 k-nearest neighbors
 SVM
 Rosenblatt perceptron and neural networks
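As an illustration of the regression topics listed above, the sketch below contrasts OLS with Ridge shrinkage using their closed-form solutions in NumPy. It assumes centered features and no intercept; the data and the function names `ols` and `ridge` are hypothetical. LASSO has no closed-form solution and needs an iterative solver (for example coordinate descent), so it is left out here.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares: solve the normal equations X^T X b = X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge(X, y, lam):
    """Ridge regression: adding lam * I to X^T X shrinks the coefficients toward 0."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Hypothetical synthetic data: y depends linearly on two features plus small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
true_beta = np.array([2.0, -1.0])
y = X @ true_beta + 0.1 * rng.normal(size=50)

b_ols = ols(X, y)
b_ridge = ridge(X, y, lam=10.0)
print("OLS:  ", b_ols)
print("Ridge:", b_ridge)
```

Increasing `lam` pulls the Ridge coefficients further toward zero, which is the shrinkage effect that makes these methods useful when the number of variables grows.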
Python for data science
 Dataframe: data exploration and data description
 Spotting patterns using factor analysis
 Principal Component Analysis
 Correspondence analysis
 Prediction using trend analysis
 Linear regression
 Logistic regression
 Data classification
 Classification using partitions
 Hierarchical methods
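To make the Principal Component Analysis item concrete, here is a minimal NumPy sketch that computes the principal axes from the singular value decomposition of the centered data. The function name `pca` and the synthetic data are hypothetical; in practice a library class such as scikit-learn's `sklearn.decomposition.PCA` would typically be used. Correspondence analysis, which works on contingency tables, is not covered by this sketch.

```python
import numpy as np

def pca(X, n_components):
    """Principal Component Analysis via the SVD of the centered data matrix."""
    Xc = X - X.mean(axis=0)                      # center each variable
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]               # principal axes, one per row
    scores = Xc @ components.T                   # coordinates of each sample on the axes
    variances = S ** 2 / (X.shape[0] - 1)        # variance carried by each axis
    ratio = variances / variances.sum()          # explained-variance ratio
    return scores, components, ratio[:n_components]

# Hypothetical data: two strongly correlated variables (the second is about twice the first)
rng = np.random.default_rng(0)
t = rng.normal(size=100)
X = np.column_stack([t, 2 * t + 0.05 * rng.normal(size=100)])

scores, components, ratio = pca(X, n_components=1)
print("first axis:", components[0])
print("explained variance ratio:", ratio)
```

Because the two variables are almost collinear, a single axis carries nearly all of the variance, which is the kind of pattern factor methods are meant to reveal.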
Data-driven decision making
 What is data?
 How do we make decisions?
 Data governance and data quality
 How to develop data-based decision making?
 Data platform and data architecture
Data science projects: data issues
 Starting a data science project
 The constraints of data science projects
 Finding data
 Acquiring information
 Playing with data
Generic central skills and knowledge targeted in the discipline
 Know how to apply standard supervised and unsupervised classification methods and how to compare several models.
 Know how to apply standard regression methods (OLS) and advanced methods to select variables and cope with the curse of dimensionality (Ridge, LASSO, Elastic Net).
 Know how to apply dimension reduction and data description procedures such as PCA and Correspondence Analysis.
 Be able to build indicators of the performance of a model on a dataset.
 Understand and measure the value of data.
 Know which data to use to make decisions.
 Be able to manipulate data to start a data science project.
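One common way to build a performance indicator and compare several models, as the skills above require, is k-fold cross-validation. The sketch below assumes NumPy and uses hypothetical helper names (`knn_predict`, `cross_val_accuracy`); it estimates the mean accuracy of a k-nearest neighbors classifier for several values of k on synthetic two-class data, with labels assumed to be non-negative integers.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    """k-nearest neighbors classifier: majority vote among the k closest training points."""
    # Pairwise squared Euclidean distances, shape (n_test, n_train)
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = y_train[nearest]                     # labels of the k neighbors of each test point
    # Majority vote (labels assumed to be non-negative integers)
    return np.array([np.bincount(v).argmax() for v in votes])

def cross_val_accuracy(X, y, k, n_folds=5, seed=0):
    """Mean accuracy of k-NN over n_folds folds: a simple model-performance indicator."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = []
    for i in range(n_folds):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        pred = knn_predict(X[train], y[train], X[test], k)
        scores.append(np.mean(pred == y[test]))
    return float(np.mean(scores))

# Hypothetical data: two Gaussian clusters labelled 0 and 1
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(60, 2)), rng.normal(3.0, 1.0, size=(60, 2))])
y = np.array([0] * 60 + [1] * 60)

for k in (1, 5, 15):
    print(f"k={k:2d}  CV accuracy={cross_val_accuracy(X, y, k):.3f}")
```

Averaging over folds gives a less noisy estimate than a single train/test split, and the same loop can wrap any other model to compare it against k-NN.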
How knowledge is tested
 Tests and projects (Statistical learning): 30%
 Project (Python for data science): 35%
 Group project and presentation (Datadriven decision making): 35%
Bibliography
Statistical Learning
 James G., Witten D., Hastie T. et al. (2013). An introduction to statistical learning: with applications in R. New York: Springer.
 Hastie T., Tibshirani R. and Friedman J. (2013). The elements of statistical learning: data mining, inference, and prediction. New York: Springer.
 Cornillon P.-A., Matzner-Løber E. et al. (2010). Régression avec R. Paris: Springer.
Python for data science
 Jannach, D., Zanker, M., Felfernig, A. and Friedrich, G. (2010). Recommender Systems: An Introduction. Cambridge.
Data science projects
 Zheng, A. and Casari, A. Feature Engineering for Machine Learning. O'Reilly Media.
 Müller, A. and Guido, S. Introduction to Machine Learning with Python. O'Reilly Media.
Teaching team
 Statistical learning: Christophe Pouet (Centrale Marseille)
 Python for data science: François Brucker (Centrale Marseille), Emmanuel Daucé (Centrale Marseille)
 Data-driven decision making: Mickaël Chalamel (Yves Saint Laurent), Franck Chevalier (EY)
 Data science projects: Maximilien Défourné (Mantiks)
Sustainable Development Goal
Partnerships for the goals
 Total hours of teaching: 100h
 Master class: 81h
 19h