Syllabus:

Big Data and high-dimensional data analysis, 7.5 Credits

Swedish name: Big data och analys av högdimensionella data

This syllabus is valid: from 2020-08-17 until further notice

Course code: 5MS062

Credit points: 7.5

Education level: Second cycle

Main Field of Study and progress level: Mathematical Statistics: Second cycle, has only first-cycle course/s as entry requirements

Grading scale: TH technical grading scale (teknisk betygsskala)

Revised by: Faculty Board of Science and Technology, 2020-05-04

Contents

Element 1 (2 hp): Theory.
In this Element we discuss what characterizes big data and high-dimensional data, including a historical background and examples of applications. Regression analysis, including the maximum likelihood and least squares methods, is reviewed. The general classification problem is introduced, and the goals of classification and how performance is measured are discussed. Validation methods, including cross validation and evaluation with independent test data, are also included. The theory and applications of logistic regression analysis and of linear and quadratic discriminant analysis (LDA and QDA) are covered. Variable selection for classification problems, ridge regression, lasso and principal component analysis (PCA) are treated, as well as how these methods can be used together with logistic regression, LDA and QDA. The statistical software R and interesting program libraries for it are introduced, including a discussion of a worked example covering variable selection, classification and evaluation. Furthermore, the methods K-nearest neighbour (KNN), support vector machines (SVM) and random forest are covered. The general problem of cluster analysis is introduced, and the goals of cluster analysis and how performance (robustness) is measured are discussed. In connection with this, hierarchical cluster analysis, k-means and self-organizing maps (SOM) are treated.

Element 2 (5.5 hp): Computer labs.
This Element covers the analysis of several data sets using the statistical methods included in the course. The analyses are conducted in the programming language R. In this Element, students are expected to write thorough reports on the analyses and their results.

Expected learning outcomes

For a passing grade, the student must be able to

Knowledge and understanding
  • thoroughly describe several classification and cluster analysis algorithms, such as logistic regression, LDA, QDA, KNN, random forest, SVM, k-means, hierarchical cluster analysis and SOM
  • thoroughly describe several methods for variable selection and dimension reduction, such as ridge regression, lasso, PCA and MDS
  • thoroughly describe several validation methods, such as cross validation, evaluation with independent test data and bootstrap methods
Skills
  • analyze data with the methods above and the statistical package R
  • thoroughly describe and interpret the results from the analyses mentioned above
  • conduct variable selection and dimension reduction using R
  • identify suitable analysis methods, suitable variable selection methods and dimension reduction methods for given classification and cluster analysis problems
  • apply validation methods in order to choose among suitable analysis, variable selection and dimension reduction methods, and pick the most suitable one for specific classification and cluster analysis problems
  • present the results of the analyses in written form
Judgement and approach
  • critically evaluate classification methods and cluster analysis methods from a scientific point of view

Required Knowledge

The course requires 90 ECTS, including 12 ECTS in Mathematical Statistics and 7.5 ECTS in Computer Programming, or equivalent. Proficiency in English equivalent to Swedish upper secondary course English 5/A. Where the language of instruction is Swedish, applicants must prove proficiency in Swedish to the level required for basic eligibility for higher studies.

Form of instruction

The teaching in Element 1 takes the form of lectures and lessons. The teaching in Element 2 takes the form of supervised lab work.

Examination modes

Element 1 is assessed through written lab reports and a written exam. Each lab report is graded Fail (U) or Pass (G) and is also given a score. Element 1 and Element 2 are each graded Fail (U) or Pass (G). For Element 2 to be graded Pass (G), all the lab reports have to be approved. For the course as a whole, one of the following grades is awarded: Fail (U), Pass (3), Pass with merit (4), Pass with distinction (5). The grade for the whole course is determined by the total score on the lab reports and the exam, where the lab reports constitute 2/3 and the written exam 1/3 of the total score.
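As an illustration only, the 2/3 and 1/3 weighting described above could be computed as in the following sketch. It assumes that the lab and exam scores are each expressed as a fraction of their maximum (a value between 0 and 1); the syllabus does not state the score scales or the thresholds for grades 3, 4 and 5, so those are not modelled, and the function name total_score is hypothetical.

```python
def total_score(lab_fraction: float, exam_fraction: float) -> float:
    """Weighted total score: lab reports count 2/3, the written exam 1/3.

    Both inputs are ASSUMED to be fractions of the maximum score (0 to 1);
    the syllabus itself does not specify the scales used.
    """
    for name, value in (("lab_fraction", lab_fraction),
                        ("exam_fraction", exam_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    # 2/3 weight on the labs, 1/3 weight on the exam.
    return (2.0 * lab_fraction + exam_fraction) / 3.0


if __name__ == "__main__":
    # Hypothetical student: 90% of the lab points, 60% of the exam points.
    print(total_score(0.9, 0.6))
```

With these hypothetical inputs the weighted total is roughly 0.8, which shows how strongly the lab reports dominate the final score.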

A student who has been awarded a passing grade for the course cannot be reassessed for a higher grade. Students who do not pass a test or examination on the original date are given another date to retake the examination. A student who has sat two examinations for a course or a part of a course, without passing either examination, has the right to have another examiner appointed, provided there are no specific reasons for not doing so (Chapter 6, Section 22, HEO). The request for a new examiner is made to the Head of the Department of Mathematics and Mathematical Statistics. Examinations based on this course syllabus are guaranteed to be offered for two years after the date of the student's first registration for the course.

Credit transfer
All students have the right to have their previous education or equivalent, and their working life experience, evaluated for possible consideration in the corresponding education at Umeå University. Application forms should be addressed to Student Services/Degree Evaluation Office. More information regarding credit transfer can be found on the student web pages of Umeå University, http://www.student.umu.se, and in the Higher Education Ordinance (Chapter 6). If denied, the application can be appealed (as per the Higher Education Ordinance, Chapter 12) to Överklagandenämnden för högskolan (the Higher Education Appeals Board). This includes partially denied applications.

Other regulations

This course cannot be included in a degree together with another course with similar contents. When in doubt, the student should consult the Director of Studies at the Department of Mathematics and Mathematical Statistics. The course can also be included in the subject area of computational science and engineering.

Literature

Valid from: 2020 week 34

An Introduction to Statistical Learning : with Applications in R
James, Gareth; Witten, Daniela; Hastie, Trevor; Tibshirani, Robert
New York, NY: Springer New York, 2013. xiv, 426 p., 150 ill., 146 ill. in color.
ISBN: 9781461471370
Mandatory