ECTS credits
8 credits
Semester
Fall
Prerequisites
- Bachelor-level knowledge of optimization, probability and statistics
- Knowledge of Python programming and algorithmics
Learning objectives
- Know how to acquire, aggregate and manipulate data.
- Know how to model standard regression and classification problems and program their solutions with an adequate programming language.
- Know how to use data to take decisions
- Understand the importance of data governance and data quality
- Understand the basic approach of data engineering in data science projects
Description of the programme
This course unit consists of three 24-hour courses: Statistical learning, Python for data science, and Data-driven decision making. It is complemented by the second part of the Data science projects (9 hours of lectures and 12 hours of project work), which is devoted to data issues.
Statistical learning
- Introduction
  - Classical problems: regression, classification
  - Supervised, unsupervised and semi-supervised learning
  - Curse of dimensionality
- Regression
  - Multiple linear regression, OLS method
  - Shrinkage-type methods (LASSO, Ridge)
  - k-nearest neighbors
- Classification
  - Logistic regression
  - k-nearest neighbors
  - SVM
  - Rosenblatt perceptron and neural networks
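The regression topics above (OLS and Ridge shrinkage) can be illustrated with a minimal pure-Python sketch of the normal equations, β = (XᵀX + λD)⁻¹Xᵀy, where D is the identity matrix with its intercept entry set to zero so that the intercept is not shrunk; setting λ = 0 recovers OLS. The function names `ridge_fit` and `solve` are hypothetical and written for illustration only; in practice a library implementation would be used. LASSO has no closed-form solution and is usually fitted iteratively (for example by coordinate descent), so this sketch does not cover it.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy [a | b] so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring up the row with the largest absolute pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ridge_fit(X: List[List[float]], y: List[float], lam: float = 0.0) -> List[float]:
    """Return [intercept, coef_1, ..., coef_p] minimising
    ||y - Z b||^2 + lam * (sum of squared coefficients, intercept excluded),
    where Z is X with a leading column of ones. lam = 0 gives OLS."""
    # Prepend a column of ones for the intercept.
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    # Normal equations: (Z^T Z + lam * D) b = Z^T y, with D = identity except D[0][0] = 0.
    ztz = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    for i in range(1, p):
        ztz[i][i] += lam
    zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve(ztz, zty)
```

On the points (0, 1), (1, 3), (2, 5), (3, 7), which lie exactly on y = 1 + 2x, `ridge_fit` with λ = 0 returns an intercept of 1 and a slope of 2 (up to rounding); with λ = 10 the slope shrinks to 2/3 while the unpenalised intercept moves to 3.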
Python for data science
- Dataframe: data exploration and data description
- Spotting patterns using factor analysis
  - Principal Component Analysis
  - Correspondence analysis
- Prediction using trend analysis
  - Linear regression
  - Logistic regression
- Data classification
  - Classification using partitions
  - Hierarchical methods
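Principal Component Analysis, listed above, looks for the direction of maximal variance in the centred data, that is, the leading eigenvector of the covariance matrix. The following pure-Python sketch computes only the first principal component, by power iteration; the function name `first_principal_component` and the default iteration count are illustrative assumptions, and in practice a library routine (typically SVD-based) would be used instead.

```python
import math
from typing import List, Tuple


def first_principal_component(data: List[List[float]], iters: int = 100) -> Tuple[List[float], float]:
    """Return (unit direction, variance) of the first principal component of `data`.

    `data` is a list of n >= 2 observations, each a list of p numeric variables.
    """
    n, p = len(data), len(data[0])
    # Centre each variable on its mean.
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in data]
    # Sample covariance matrix (denominator n - 1).
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)] for i in range(p)]
    # Start from the centred observation with the largest norm: if it is non-zero,
    # cov @ v is non-zero too, so the normalisation below never divides by zero.
    v = max(centred, key=lambda r: sum(x * x for x in r))
    if not any(v):
        raise ValueError("all observations are identical: no direction of variance")
    for _ in range(iters):
        # Power iteration: apply the covariance matrix, then renormalise.
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Eigenvectors are defined up to sign; make the largest-magnitude entry positive.
    if max(v, key=abs) < 0:
        v = [-x for x in v]
    # Rayleigh quotient v^T C v = variance of the data projected onto v.
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    return v, variance
```

For the four points (0, 0), (1, 1), (2, 2), (3, 3), which lie on the line y = x, the sketch returns the direction (1/√2, 1/√2) with a variance of 10/3 along it.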
Data-driven decision making
- What is data?
- How do we take decisions?
- Data governance and data quality
- How can data-based decision making be developed?
- Data platform and data architecture
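Data quality is commonly assessed along dimensions such as completeness and uniqueness. As a hedged illustration only (the choice of these two indicators, the hypothetical function name `quality_report` and the list-of-dictionaries record format are assumptions, not course material), both can be computed in a few lines of Python:

```python
from typing import Any, Dict, List, Sequence


def quality_report(records: List[Dict[str, Any]], key: Sequence[str]) -> Dict[str, float]:
    """Compute two simple data-quality indicators for a list of records.

    completeness: share of (record, field) cells that are present and not None,
                  over all fields seen in any record.
    uniqueness:   share of records whose key fields do not repeat an earlier record.
    """
    fields = sorted({f for r in records for f in r})
    total_cells = len(records) * len(fields)
    # A cell counts as filled if the field exists in the record and is not None.
    filled = sum(1 for r in records for f in fields if r.get(f) is not None)
    seen = set()
    unique = 0
    for r in records:
        k = tuple(r.get(f) for f in key)
        if k not in seen:
            seen.add(k)
            unique += 1
    return {
        "completeness": filled / total_cells if total_cells else 1.0,
        "uniqueness": unique / len(records) if records else 1.0,
    }
```

Tracking such indicators over time is one way to make data quality measurable rather than a matter of opinion.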
Data science projects: data issues
- Starting a data science project
- The constraints of data science projects
- Finding data
- Acquiring information
- Playing with data
Generic central skills and knowledge targeted in the discipline
- Know how to apply standard supervised and unsupervised classification methods and how to compare several models.
- Know how to apply standard regression methods (OLS) and advanced methods to select variables and cope with the curse of dimensionality (Ridge, LASSO, Elastic Net)
- Know how to apply dimension reduction and data description procedures such as PCA and Correspondence Analysis.
- Be able to build indicators of the performance of a model on a dataset
- Understand and measure the value of data
- Know which data to use to take decisions
- Be able to manipulate data to start a data science project
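Standard performance indicators for a binary classifier (accuracy, precision, recall and F1 score) can all be derived from the confusion matrix. The sketch below computes them in plain Python; the function name `binary_metrics` is hypothetical, and returning 0.0 when a ratio is undefined is a convention chosen for this sketch. Computing the same indicators for several models on the same held-out data is one simple way to compare them.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Confusion-matrix-based performance indicators for a binary classifier."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of the same length")
    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn
    # Undefined ratios (zero denominators) are reported as 0.0 in this sketch.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

For `y_true = [1, 1, 1, 0, 0, 0]` and `y_pred = [1, 1, 0, 0, 0, 1]` there are 2 true positives, 1 false positive, 1 false negative and 2 true negatives, so accuracy is 4/6 and precision, recall and F1 are all 2/3.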
How knowledge is tested
- Tests and projects (Statistical learning): 30%
- Project (Python for data science): 35%
- Group project and presentation (Data-driven decision making): 35%
Bibliography
Statistical Learning
- James G., Witten D., Hastie T. et al. (2013). An introduction to statistical learning: with applications in R. New York: Springer.
- Hastie T., Tibshirani R. and Friedman J. (2013). The elements of statistical learning: data mining, inference, and prediction. New York: Springer.
- Cornillon P-A., Matzner-Løber E. et al. (2010). Régression avec R. Paris: Springer.
Python for data science
- Jannach D., Zanker M., Felfernig A. and Friedrich G. (2010). Recommender systems: an introduction. Cambridge: Cambridge University Press.
Data science projects
- Zheng A. and Casari A. (2018). Feature engineering for machine learning. O'Reilly Media.
- Müller A. C. and Guido S. (2016). Introduction to machine learning with Python. O'Reilly Media.
Teaching team
- Statistical learning: Christophe Pouet (Centrale Marseille)
- Python for data science: François Brucker (Centrale Marseille), Emmanuel Daucé (Centrale Marseille)
- Data-driven decision making: Mickaël Chalamel (Yves Saint-Laurent), Franck Chevalier (EY)
- Data science projects: Maximilien Défourné (Mantiks)
Sustainable Development Goal
Partnerships for the goals
- Total hours of teaching: 100h
- Master class: 81h
- 19h