Main Field of Study and progress level:
Computing Science: Second cycle, has second-cycle course/s as entry requirements
Grading scale: Three-grade scale
Responsible department: Department of Computing Science
Established by: Faculty Board of Science and Technology, 2021-02-25
Revised by: Faculty Board of Science and Technology, 2023-03-07
This course is an introduction to Natural Language Processing (NLP) for students already proficient in programming and machine learning. The aim is to provide a solid background in theory and techniques used to accomplish different NLP tasks such as understanding and generating natural language. As NLP technologies are used by many people every day, and inform many other "AI" systems, special focus will be given to questions of ethics, equity, and the social impact of these technologies.
The course covers a mix of techniques, including rule-based, statistical, and machine learning methods for NLP. Since language data is at the core of many modern NLP techniques, the course also covers the assessment of data quality and builds an understanding of complex issues of representation and data ownership.
Basic concepts and methodology from linguistics are introduced, including aspects of how language is constructed and used, and the importance of context. These are used to ground an understanding both of how effective solutions to NLP tasks are constructed and of the challenges of doing so for various languages.
Beyond this theoretical grounding, there will be practical exercises and assignments focusing on applying various techniques to address tasks within NLP. The coursework also includes actively participating in seminars and writing reports.
The programming language Python is used, but the language is not taught during the course.
Expected learning outcomes
Knowledge and understanding
After completing the course, the student should be able to:
(FSR 1) describe and apply core concepts and methods from various disciplines in Linguistics (including morphology, syntax, semantics, and pragmatics) to natural language processing,
(FSR 2) explain what is required to accomplish typical NLP tasks (e.g. machine translation or natural language generation),
(FSR 3) categorize various NLP tools as rule-based, statistical, or machine learning-based, and compare the advantages and disadvantages of each strategy.
Competence and skills
After completing the course, the student should be able to:
(FSR 4) design an appropriate pipeline for a given NLP task, and construct parts of such a pipeline,
(FSR 5) apply linguistic principles and methods to solve language tasks, e.g. using syntactic analysis to analyze sentences and produce syntax trees,
(FSR 6) implement algorithmic solutions to specific language problems, e.g. parsers for producing syntax trees,
(FSR 7) evaluate the performance of NLP software for quality and effectiveness using appropriate metrics; interpret and explain the results of these metrics.
Judgement and approach
After completing the course, the student should be able to:
(FSR 8) critically assess the social impact of language technology, including evaluating the risks, benefits, and harms of specific technologies,
(FSR 9) explain with examples the potential harms of an NLP technology in development, and how such harms might be mitigated,
(FSR 10) discuss the ethical and practical issues associated with language data for NLP, including questions of ownership, implicit bias, linguistic discrimination, and representational harms.
At least 90 ECTS, including 60 ECTS in Computing Science, or at least 120 ECTS within a study programme. At least 7.5 ECTS in data structures and algorithms, 7.5 ECTS in discrete mathematics, 7.5 ECTS in formal languages, and 7.5 ECTS in machine learning. Proficiency in English equivalent to the level required for basic eligibility for higher studies.
Form of instruction
This course follows a "flipped classroom" model, where students engage with the material before class. Class time may consist of instructor-led discussions and exercises for applying knowledge, seminars, and supervised computer labs and tutorials. In addition to scheduled activities, individual work with the material is also required.
The course is graded Fail (U), Pass (G), or Pass with Distinction (VG). The student's achievements in the course are assessed through written assignments (FSR 1-7, 9-10) and seminars (FSR 8-10). Some assignments involve programming in Python. All assignments and seminars must be completed to receive a passing grade.
Adapted examination
The examiner can decide to deviate from the specified forms of examination. Individual adaptation of the examination shall be considered based on the needs of the student. The examination is adapted within the constraints of the expected learning outcomes. A student who needs an adapted examination shall request adaptation from the Department of Computing Science no later than 10 days before the examination. The examiner decides on the adapted examination, and the student is then notified.
This course may not be included in a degree, in whole or in part, at the same time as another course with similar content. In case of doubt, the student should consult the study counsellor at the Department of Computing Science and/or the programme coordinator for their degree programme.
If the syllabus has expired or the course has been discontinued, a student who at some point registered for the course is guaranteed at least three examinations (including the regular examination) according to this syllabus for a maximum period of two years from the syllabus expiring or the course being discontinued.
2023 week 26
Jurafsky, Dan; Martin, James H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition. 2nd ed. Upper Saddle River, N.J.: Pearson Education International/Prentice Hall, cop. 2009. 1024 pp. ISBN 9780135041963. Mandatory.
Additional sources such as research articles, book chapters, etc., as appropriate.