Established by: Faculty Board of Science and Technology, 2021-02-25
This course is an introduction to Natural Language Processing (NLP) for students already proficient in programming and machine learning. The aim is to provide a solid background in theory and techniques used to accomplish different NLP tasks such as understanding and generating natural language. As NLP technologies are used by many people every day, and inform many other "AI" systems, special focus will be given to questions of ethics, equity, and the social impact of these technologies.
The course covers a mix of techniques, including rule-based, statistical, and machine learning methods for NLP. Since language data is at the core of many modern NLP techniques, the course also covers the assessment of data quality and develops an understanding of the complex issues of representation and data ownership.
Basic concepts and methodology from linguistics are introduced, including aspects of how language is constructed and used, and the importance of context. These concepts ground an understanding both of how effective solutions to NLP tasks are constructed and of the challenges of doing so for different languages.
Beyond this theoretical grounding, there will be practical exercises and assignments focusing on applying various techniques to address tasks within NLP. The coursework also includes actively participating in seminars and writing reports.
Expected learning outcomes
Knowledge and understanding
After completing the course, the student should be able to:
FSR1: Describe and apply core concepts and methods from the subfields of linguistics (including morphology, syntax, semantics, and pragmatics) to natural language processing.
FSR2: Explain what is required to accomplish typical NLP tasks (e.g. machine translation or natural language generation).
FSR3: Categorize various NLP tools as rule-based, statistical, or machine learning and compare the advantages and disadvantages of each strategy.
Competence and skills
After completing the course, the student should be able to:
FSR4: Design an appropriate pipeline for a given NLP task, and construct parts of such a pipeline.
FSR5: Apply linguistic principles and methods to solve language tasks, e.g. using syntactic analysis to analyze sentences and produce syntax trees.
FSR6: Implement algorithmic solutions to specific language problems, e.g. parsers for producing syntax trees.
FSR7: Evaluate the performance of NLP software for quality and effectiveness using appropriate metrics; interpret and explain the results of these metrics.
Judgement and approach
After completing the course, the student should be able to:
FSR8: Critically assess the social impact of language technology, including evaluating the risks, benefits, and harms of specific technologies.
FSR9: Explain with examples the potential harms of an NLP technology in development, and how such harms might be mitigated.
FSR10: Discuss the ethical and practical issues associated with language data for NLP, including questions of ownership, implicit bias, linguistic discrimination, and representational harms.
Univ: To be admitted you must have 90 ECTS-credits, or equivalent, including 60 ECTS-credits in Computing Science, or three years of completed studies within a study programme (180 ECTS-credits). In both cases, this must include:
* a course (7.5 ECTS-credits) in Machine learning (e.g. 5DV194) that covers Naive Bayes, Hidden Markov Models, Decision Trees, and Neural Networks, including how backpropagation works
* a course (7.5 ECTS-credits) in Formal languages (e.g. 5DV208 CS3: Computations and languages or 5DV037 Fundamentals of Computer Science) that covers automata, Turing machines, regular languages, context-free languages, the pumping lemmas (regular and context-free), and the CYK parser (as well as passing familiarity with shift-reduce parsing)
Some familiarity with Python is recommended. Python is used in the exercises and assignments, so students should either already know how to program in Python or feel confident that they can pick it up quickly.
Proficiency in English equivalent to the Swedish upper secondary course English A/5. Where the language of instruction is Swedish, applicants must prove proficiency in Swedish to the level required for basic eligibility for higher studies.
Form of instruction
This course follows a "flipped classroom" model, in which students engage with the material before class. Class sessions may consist of instructor-led discussions, exercises for applying knowledge, seminars, and supervised computer labs and tutorials. In addition to scheduled activities, individual work with the material is required.
The course gives one of the grades Fail (U), Pass (G), or Pass with Distinction (VG). The student's achievements on the course are assessed through written assignments (FSR1-7, FSR9-10) and seminars (FSR8-10). Some assignments involve programming in Python. All assignments and seminars must be completed to receive a passing grade.
Deviations from the syllabus' modes of assessment can be made for a student who has a decision on pedagogical support due to a disability. Individual adaptation of modes of assessment must be considered based on the student's needs. The mode of assessment is adapted within the framework of the syllabus' expected learning outcomes. At the request of the student, the course coordinator, in consultation with the examiner, shall promptly decide on an adapted mode of assessment. The student must then be notified of the decision.
A student who, without receiving a passing grade, has participated in two tests for a course or part of a course, has the right to have another examiner appointed, unless special reasons militate against it (Higher Education Ordinance, Högskoleförordningen, Chapter 6, Section 22). A request for a new examiner is made to the head of the Department of Computing Science.
This course may not be included in a degree, in whole or in part, at the same time as another course with similar content. In case of doubt, the student should consult the study counsellor at the Department of Computing Science and/or the programme coordinator for their degree programme.
If the syllabus has expired or the course has been discontinued, a student who at some point registered for the course is guaranteed at least three examinations (including the regular examination) according to this syllabus for a maximum period of two years from the syllabus expiring or the course being discontinued.
Literature valid from: 2021 week 1
Jurafsky, Dan; Martin, James H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition. 2nd ed. Upper Saddle River, N.J.: Pearson Education International/Prentice Hall, 2009. 1024 pp. ISBN 9780135041963. Mandatory.
Additional sources, such as research articles and book chapters, as appropriate.