Rules must be broken when computers are to learn human language

NEWS Computers always follow rules and the instructions they have been given. However, this is a weakness when it comes to processing human language. People don’t always say what they mean or do what they say, but they still understand each other. This is difficult for a computer. “We are basically completely incomprehensible,” says Anna Jonsson, Department of Computing Science at Umeå University, whose thesis was approved in mid-June.

Text: Ingrid Söderbergh

Anna Jonsson, Department of Computing Science.

Image: Lina Lidmark

“We are basically completely incomprehensible,” says Anna Jonsson, Department of Computing Science at Umeå University, whose thesis “Best Trees Extraction and Contextual Grammars for Language Processing” was approved in mid-June.

Not infinite

Another problem is that languages are infinite but a computer's memory space is finite. “In language processing, we try to use the finite space to cover as large a part of a language as possible.”

Computers communicate via formal languages that follow a number of logical rules. Human language also follows certain grammatical rules, but these rules are changeable and not strictly necessary for us to understand each other.

In order to analyse human language, the computer needs a language model and data structures that can represent syntax and semantics. The computer then requires some information about what is considered correct syntax and semantics. “It must also be possible to evaluate the language analysis models, and for this, data in the form of human language is needed.”

In her dissertation, Anna Jonsson develops a method that extracts the highest-scoring syntactic analyses from an existing language analysis model.

“We have also chosen to develop our own model for semantic analysis, and I hope and believe that our model will be important in research on semantic language processing,” says Anna Jonsson.

About the dissertation:

On Friday 11 June, Anna Jonsson, Department of Computing Science at Umeå University, defended her thesis entitled: Best Trees Extraction and Contextual Grammars for Language Processing.
The defence took place at 10.00 in room MA316 at Umeå University and was streamed via Zoom.
The faculty opponent was Dr. Hendrik Jan Hoogeboom, Leiden University.

For more information, please contact:

Anna Jonsson, Associate professor

+46 90 786 57 75