Established by: Faculty Board of Science and Technology, 2022-03-14
This course provides comprehensive knowledge, in both breadth and depth, of data science and statistical learning. Both traditional and state-of-the-art methods and algorithms in these fields are discussed, together with the related fundamental theory. After passing the course, students should have a strong ability to solve problems using data. Students are also expected to develop a strong capacity for self-study, so that they can understand and learn newly developed methods and algorithms.
Module 1 (3hp): Theory
Three families of approaches for dimensionality reduction are covered: spectral-based learning (multidimensional scaling, Isomap, kernel PCA, etc.), manifold learning (locally linear embedding, Hessian eigenmaps, t-distributed stochastic neighbor embedding, etc.), and deep neural network-based methods (autoencoders, variational autoencoders, etc.). As special cases of dimensionality reduction, different feature selection methods, such as ridge regression, LASSO, and feature importance, are also discussed. Supervised learning approaches are discussed, including kernel-based methods (kernel ridge regression, support vector machines, etc.), ensemble methods (random forests and AdaBoost), neural networks, and different deep learning approaches and architectures. Furthermore, unsupervised learning approaches are included, covering different cluster analysis algorithms such as density-based methods and spectral clustering. Deep learning-based unsupervised learning methods, such as generative adversarial networks and their variants, are also covered. Finally, the fundamental mathematical theory of kernel methods, ensemble methods, penalty approaches, shallow networks, the gradient descent algorithm, universal estimators, the fundamental theorem of learning, etc. is discussed.
Module 2 (4.5hp): Computer labs
The module covers the analysis of several data sets using the statistical methods included in the course. The analyses are conducted in one of the programming languages R or Python. In the module, students write thorough reports on the analyses and their results.
Expected learning outcomes
For a passing grade, the student must be able to
Knowledge and understanding
describe in detail the basic ideas and formulations of different algorithms for dimensionality reduction, supervised learning, and unsupervised learning problems
describe and derive the theoretical results
apply the basic ideas and common techniques for building statistical models and machine learning models
implement methods and algorithms with programming languages like R or Python
identify suitable analysis methods, suitable variable selection methods and dimension reduction methods for given classification and cluster analysis problems
apply validation methods in order to choose among suitable analysis, variable selection and dimension reduction methods, and pick the most suitable one for specific problems
present the results of the analyses in written form
Judgement and approach
critically evaluate classification methods and cluster analysis methods from a scientific point of view
The course requires 90 ECTS including 7.5 ECTS Computer Programming and 12 ECTS Mathematical Statistics, or equivalent. Proficiency in English and Swedish equivalent to the level required for basic eligibility for higher studies.
Form of instruction
The teaching in Module 1 takes the form of lectures and lessons. The teaching in Module 2 takes the form of supervised lab work.
Module 1 is assessed through a written exam and is awarded one of the following grades: Fail (U) or Pass (G). The grade is based on the score on the exam. Each lab report is awarded one of the following grades: Fail (U) or Pass (G), and is also given a score. For Module 2 to be awarded the grade Pass (G), all the lab reports have to be approved. For the course as a whole, one of the following grades is awarded: Fail (U), Pass (3), Pass with merit (4), Pass with distinction (5). The grade for the whole course is determined by the total score on the lab reports and the exam, where the lab reports constitute 2/3 and the written exam 1/3 of the total score.
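As an illustration of the 2/3 and 1/3 weighting only, the total score could be computed as in the minimal Python sketch below. The function name `total_score` is hypothetical, and the sketch assumes that both scores have first been rescaled to the interval 0 to 1; the syllabus does not state the score scales or the thresholds for the grades 3, 4 and 5, so none are encoded here.

```python
def total_score(lab_score: float, exam_score: float) -> float:
    """Weighted total: lab reports count 2/3, the written exam 1/3.

    Assumption (not from the syllabus): both inputs are already
    rescaled to the interval [0, 1].
    """
    for name, value in (("lab_score", lab_score), ("exam_score", exam_score)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    # (2/3) * lab + (1/3) * exam, written with a single division
    return (2.0 * lab_score + exam_score) / 3.0


if __name__ == "__main__":
    # Example: 90 % on the lab reports and 60 % on the exam
    print(round(total_score(0.9, 0.6), 3))
```

With 90 % on the lab reports and 60 % on the exam, the weighted total is (2 × 0.9 + 0.6) / 3, which is about 0.8 on the assumed 0 to 1 scale.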
Deviations from the examination form specified in the syllabus can be made for a student who has been granted pedagogical support due to a disability. Individual adaptation of the examination form shall be considered based on the student's needs. The examination form is adapted within the framework of the expected learning outcomes of the course syllabus. At the request of the student, the course coordinator, in consultation with the examiner, must promptly decide on the adapted examination form. The decision shall then be communicated to the student.
A student who has been awarded a passing grade for the course cannot be re-assessed for a higher grade. Students who do not pass a test or examination on the original date are given another date to retake the examination. A student who has sat two examinations for a course or a part of a course, without passing either examination, has the right to have another examiner appointed, provided there are no specific reasons for not doing so (Chapter 6, Section 22, HEO). The request for a new examiner is made to the Head of the Department of Mathematics and Mathematical Statistics. Examinations based on this course syllabus are guaranteed to be offered for two years after the date of the student's first registration for the course.
Credit transfer
All students have the right to have their previous education or equivalent, and their working life experience, evaluated for possible consideration in the corresponding education at Umeå University. Application forms should be addressed to Student services/Degree evaluation office. More information regarding credit transfer can be found on the student web pages of Umeå University, http://www.student.umu.se, and in the Higher Education Ordinance (chapter 6). If denied, the application can be appealed (as per the Higher Education Ordinance, chapter 12) to Överklagandenämnden för högskolan. This includes partially denied applications.
This course cannot be included in a degree together with another course with similar contents. When in doubt, the student should consult the director of study at the Department of Mathematics and Mathematical Statistics. The course can also be included in the subject area of computational science and engineering.
In the event that the syllabus ceases to apply or undergoes major changes, students are guaranteed at least three examinations (including the regular examination opportunity) according to the regulations in the syllabus that the student was originally registered on for a period of a maximum of two years from the time that the previous syllabus ceased to apply or that the course ended.
2022 week 30
James, Gareth; Witten, Daniela; Hastie, Trevor; Tibshirani, Robert. An Introduction to Statistical Learning: with Applications in R. New York, NY: Springer New York, 2013. xiv, 426 p., 150 ill., 146 ill. in color. ISBN: 9781461471370. Mandatory.