Syllabus 2024/2025 CentraleDigitalLab@LaPlateforme_

Foundations of Machine Learning

Learning objectives

The proliferation of sensors and of data collection and recording systems means that the amount of information available to users is too large to be processed without high-performance computing tools and algorithms. Among the algorithms essential to data processing, classification algorithms are widely used, either to group data into coherent clusters (clustering, or unsupervised classification) or to assign new data automatically to groups that have already been formed (supervised classification).

The learning objectives of this course are to:

- be familiar with and know how to implement the main supervised and unsupervised classification algorithms,

- be familiar with the conditions for implementing these algorithms and the prerequisites for any data pre-processing,

- be able to quantitatively assess the quality of these algorithms,

- be able to filter and predict time series.

Description of the programme

Supervised classification: data management (creation of training, validation and test sets). Metrics for supervised classification (precision, recall, ROC curves and area under the curve, confusion matrices). Details of the main supervised classification algorithms: k-nearest neighbours (k-NN), large-margin classifiers (linear and kernel SVMs), decision trees and random forests, neural networks.
Unsupervised classification: data pre-processing (dimensionality reduction). Details of agglomerative hierarchical clustering (study of dissimilarity criteria), k-means and Gaussian mixtures (EM algorithm).
Time series: statistical models, autoregressive models (ARIMA, ARIMAX, SARIMA, etc., with statsmodels).
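
As an illustration of the autoregressive models mentioned above, the following minimal sketch simulates an AR(2) series, estimates its coefficients by ordinary least squares and compares one-step-ahead forecasts with a naive baseline. It deliberately uses NumPy only rather than statsmodels; the coefficients, sample sizes and the helper names `fit_ar` and `forecast_one_step` are assumptions chosen for the example, not part of the course material.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a stationary AR(2) process: x_t = 0.6 x_{t-1} - 0.3 x_{t-2} + e_t.
true_phi = np.array([0.6, -0.3])  # illustrative coefficients (assumption)
n = 2000
noise = rng.normal(scale=1.0, size=n)
x = np.zeros(n)
for t in range(2, n):
    x[t] = true_phi[0] * x[t - 1] + true_phi[1] * x[t - 2] + noise[t]


def fit_ar(series, p):
    """Estimate AR(p) coefficients by ordinary least squares on lagged values."""
    # Column k holds the series lagged by k + 1 steps.
    lags = np.column_stack(
        [series[p - k - 1 : len(series) - k - 1] for k in range(p)]
    )
    target = series[p:]
    phi, *_ = np.linalg.lstsq(lags, target, rcond=None)
    return phi


def forecast_one_step(series, phi):
    """Predict the next value from the last len(phi) observations."""
    p = len(phi)
    recent = series[-1 : -p - 1 : -1]  # most recent value first
    return float(recent @ phi)


# Fit on the first 1500 points, evaluate on the remaining 500.
split = 1500
train, test = x[:split], x[split:]
phi_hat = fit_ar(train, p=2)

# Each prediction uses the observed values up to time t - 1.
preds = np.array([forecast_one_step(x[:t], phi_hat) for t in range(split, n)])
mse_ar = float(np.mean((test - preds) ** 2))

# Persistence baseline: predict that the next value equals the previous one.
mse_naive = float(np.mean((test - x[split - 1 : -1]) ** 2))

print("estimated coefficients:", np.round(phi_hat, 3))
print("one-step MSE, AR(2):", round(mse_ar, 3), "| naive:", round(mse_naive, 3))
```

Comparing against the persistence baseline is one simple way to check that the fitted model has actually learned the temporal structure of the series.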

Implementation and manipulation of these methods using the Python library scikit-learn (sklearn).
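
To make the supervised part of the programme concrete, here is a minimal sketch of the workflow with scikit-learn: a training/validation/test split, two candidate classifiers (k-NN and a kernel SVM), model selection on the validation set, and a final report of the confusion matrix, precision, recall and ROC AUC on the test set. The dataset (`make_moons`), the 60/20/20 split, the hyperparameters and the helper `positive_scores` are assumptions chosen for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.metrics import (
    confusion_matrix,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Simulated, non-linearly separable data (two interleaving half circles).
X, y = make_moons(n_samples=600, noise=0.25, random_state=0)

# 60 % training, 20 % validation, 20 % test, with stratified classes.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=0
)

# Scaling is part of each pipeline so it is fitted on training data only.
candidates = {
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15)),
    "svm_rbf": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
}


def positive_scores(model, data):
    """Continuous score for the positive class, as needed for ROC analysis."""
    if hasattr(model, "decision_function"):
        return model.decision_function(data)
    return model.predict_proba(data)[:, 1]


# Model selection uses the validation set; the test set stays untouched.
val_auc = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    val_auc[name] = roc_auc_score(y_val, positive_scores(model, X_val))

best_name = max(val_auc, key=val_auc.get)
best_model = candidates[best_name]

# Final, one-off evaluation on the test set.
y_pred = best_model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
test_auc = roc_auc_score(y_test, positive_scores(best_model, X_test))

print("selected model:", best_name)
print("confusion matrix:\n", cm)
print("precision:", round(precision, 3), "recall:", round(recall, 3))
print("test ROC AUC:", round(test_auc, 3))
```

Keeping the test set out of model selection is what allows the final metrics to be read as an estimate of performance on new data.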

Generic central skills and knowledge targeted in the discipline

Data pre-processing
Choice and evaluation of a classification algorithm
Presentation of classification results (presentation of metrics and/or graphical representation)
Good knowledge of the sklearn library

How knowledge is tested

Implementation of classification algorithms on real data (iris, moons, MNIST, telecom churn and cardiovascular disease prediction data) or simulated data (linearly or non-linearly separable data, data drawn from normal or multivariate normal distributions) and critical analysis of the results.
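
As a sketch of what such an exercise might look like on simulated data, the example below draws three groups from multivariate normal distributions, clusters them with k-means, a Gaussian mixture (fitted by EM) and agglomerative hierarchical clustering, then scores each result against the known groups with the adjusted Rand index. The group means, covariance, sample sizes and the choice of the adjusted Rand index are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Three 2-D Gaussian groups of 200 points each, sharing one covariance matrix.
means = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
cov = np.array([[1.0, 0.3], [0.3, 1.0]])
X = np.vstack([rng.multivariate_normal(m, cov, size=200) for m in means])
y_true = np.repeat(np.arange(3), 200)

# Each method receives the data only; the true groups are used for scoring.
labels = {
    "k-means": KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X),
    "gaussian mixture (EM)": GaussianMixture(
        n_components=3, random_state=0
    ).fit_predict(X),
    "agglomerative (Ward)": AgglomerativeClustering(
        n_clusters=3, linkage="ward"
    ).fit_predict(X),
}

# The adjusted Rand index is invariant to the arbitrary numbering of clusters:
# 1.0 means a perfect match, values near 0.0 mean chance-level agreement.
ari = {name: adjusted_rand_score(y_true, pred) for name, pred in labels.items()}

for name, score in ari.items():
    print(f"{name}: adjusted Rand index = {score:.3f}")
```

On well-separated groups like these the three methods should agree closely; a critical analysis becomes more interesting when the groups overlap or have different shapes, which can be explored by changing `means` and `cov`.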

Bibliography

Géron, A. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd edition. O'Reilly Media, 2019, 600 pp. ISBN: 978-1-492-03264-9.
Vapnik, V. Statistical Learning Theory. Wiley-Interscience, New York, 1998.
Dinov, I. D. "Expectation Maximization and Mixture Modeling Tutorial". California Digital Library, Statistics Online Computational Resource, Paper EM_MM.

Teaching team

Valeriya STRIZHKOVA

Total hours of teaching: 22h
Master class: 22h
