Element 1 (2 hp): Theory. This element discusses what characterizes big data and high-dimensional data, with historical background and examples of applications. Regression analysis, including the maximum likelihood and least squares methods, is reviewed. The general classification problem is introduced, and the goals of classification and how classification performance is measured are discussed. Validation methods, including cross-validation and evaluation on independent test data, are also covered. The theory and applications of logistic regression and of linear and quadratic discriminant analysis (LDA and QDA) are treated. Variable selection for classification problems, ridge regression, the lasso and principal component analysis (PCA) are covered, as well as how these methods can be combined with logistic regression, LDA and QDA. The statistical software R and useful packages for it are introduced, including a discussion of a worked example covering variable selection, classification and evaluation. Furthermore, the methods k-nearest neighbours (KNN), support vector machines (SVM) and random forests are covered. The general problem of cluster analysis is introduced, and the goals of cluster analysis and how its performance (robustness) is measured are discussed. In connection with this, hierarchical cluster analysis, k-means and self-organizing maps (SOM) are treated.
Element 2 (5.5 hp): Computer labs. This element covers the analysis of several data sets using the statistical methods included in the course. The analyses are carried out in the programming language R, and students are required to write thorough reports on the analyses and their results.
This course may not be included in a degree together with another course with similar content. If unsure, students should ask the Director of Studies in Mathematics and Mathematical Statistics. The course can also be included in the subject area of computational science and engineering.
Big Data and high-dimensional data analysis, 7.5 hp
Autumn Term 2018
Lectures begin in the week starting 5 November 2018
Lectures end during the week of 14 January 2019
Language of instruction: English (upon request)
Entry requirements: 90 ECTS credits, including 12 ECTS credits in Mathematical Statistics and 7.5 ECTS credits in Computer Programming, or equivalent. Proficiency in English equivalent to the Swedish upper secondary course English 5/A. Where the language of instruction is Swedish, applicants must prove proficiency in Swedish to the level required for basic eligibility for higher studies.
Applicants in some programmes at Umeå University have guaranteed admission to this course. The number of places available to applicants taking it as a single-subject course may therefore be limited.
Applications are made online via Universityadmissions.se.
Application deadline: 16 April 2018.