Statistical learning with high-dimensional data 7.5 credits
About the course
This course provides broad and deep knowledge of data science and statistical learning. It covers both traditional and state-of-the-art methods and algorithms in these fields, together with the fundamental theory behind them. After passing the course, students should be able to solve problems using data, and to learn and understand newly developed methods and algorithms through self-study.
Module 1 (3 credits): Theory
Three families of dimensionality reduction approaches are covered: spectral-based learning (multidimensional scaling, Isomap, kernel PCA, etc.), manifold learning (locally linear embedding, Hessian eigenmaps, t-distributed stochastic neighbor embedding, etc.), and deep neural network-based methods (autoencoders, variational autoencoders, etc.). As special cases of dimensionality reduction, feature selection methods such as ridge regression, LASSO, and feature importance are also discussed.
Supervised learning approaches are discussed, including kernel-based methods (kernel ridge regression, support vector machines, etc.), ensemble methods (random forests and AdaBoost), neural networks, and various deep learning approaches and architectures.
Unsupervised learning approaches are also included: clustering algorithms such as density-based methods and spectral clustering, and deep learning-based methods such as generative adversarial networks and their variants.
Finally, fundamental mathematical theory is discussed, covering kernel methods, ensemble methods, penalty approaches, shallow networks, the gradient descent algorithm, universal estimators, and the fundamental theorem of learning, among other topics.
Module 2 (4.5 credits): Computer labs
The module covers the analysis of several data sets using the statistical methods included in the course. The analyses are conducted in one of the programming languages R or Python. Students write thorough reports on the analyses and their results.