
# Big Data and high-dimensional data analysis

• 7.5 Credits
• Master’s level
• Autumn Term 2020


Element 1 (2 hp): Theory.
This element discusses what characterizes big data and high-dimensional data, including historical background and examples of applications. Regression analysis, including the maximum likelihood and least squares methods, is reviewed. The general classification problem is introduced, and the goals of classification and how performance is measured are discussed. Validation methods, including cross-validation and evaluation with independent test data, are also covered. The theory and applications of logistic regression analysis and of linear and quadratic discriminant analysis (LDA and QDA) are covered. Variable selection for classification problems, ridge regression, lasso and principal component analysis (PCA) are treated, as well as how these methods can be used together with logistic regression, LDA and QDA. The statistical software R and interesting program libraries in it are introduced, including a discussion of a worked example involving variable selection, classification and evaluation. Furthermore, the methods K-nearest neighbours (KNN), support vector machines (SVM) and random forests are covered. The general problem of cluster analysis is introduced, and the goals of cluster analysis and how performance (robustness) is measured are discussed. In connection with this, hierarchical cluster analysis, k-means and self-organizing maps (SOM) are treated.

Element 2 (5.5 hp): Computer labs.
This element covers the analysis of several data sets using the statistical methods included in the course. The analyses are carried out in the programming language R. Students are expected to write thorough reports on the analyses and their results.

This course may not be included in a degree together with another course with similar content. If unsure, students should ask the Director of Studies in Mathematics and Mathematical Statistics. The course can also be included in the subject area of computational science and engineering.

## Application and eligibility

### Big Data and high-dimensional data analysis, 7.5 hp

• Term: Autumn Term 2020
• Starts: 2 November 2020
• Ends: 17 January 2021
• Study location: Umeå
• Language: English
• Type of studies: Daytime, 50%

#### Required Knowledge

The course requires 90 ECTS, including 12 ECTS in Mathematical Statistics and 7.5 ECTS in Computer Programming, or equivalent. Proficiency in English equivalent to Swedish upper secondary course English 5/A is also required. Where the language of instruction is Swedish, applicants must prove proficiency in Swedish to the level required for basic eligibility for higher studies.

#### Selection

Guaranteed place: Applicants in some programs at Umeå University have guaranteed admission to this course. The number of places for a single course may therefore be limited.

Application code: UMU-58216

#### Application

The application deadline was 15 April 2020. The application period is closed.

#### Application and Tuition fees

As a citizen of a country outside the European Union (EU), the European Economic Area (EEA) or Switzerland, you are required to pay application and tuition fees for studies at Umeå University.

• Application fee: SEK 900
• Tuition fee, first instalment: SEK 17,850
• Total fee: SEK 17,850