New formal methods for natural language processing

NEWS When the semantic meaning of a sentence is represented in a computer, it is common to do so using graphs, consisting of nodes and edges. These are generally computationally hard to verify. By making the structure of the graphs explicit and obvious, many of the difficulties can be alleviated. Petter Ericson will defend the results in his thesis on Monday the 4th of February at Umeå University.

Text: Ingrid Söderbergh

Petter Ericsson, PhD at the Department of Computing science at Umeå university

PhotoUlrika Bergfors

The most common tools for auto-translation, such as Google Translate, tend to be based on very simple statistical models that use massive amounts of data to achieve high quality translations. This shows especially in translations to smaller languages, where quality suffers. By using more complex models, higher quality can be achieved using less available data, but this can easily require impractical amounts of computational resources.

The trick lies in restricting the amount of "guesses" that the algorithms need to make during the verification process. In most formal graph models, these need to be made many times for both the structure of the graph and the order of nodes, and this can easily require exponential amounts of time.

In the models presented in Petter Ericson's thesis, both order and structure are instead explicit in the graph, and the verification can furthermore assume that nothing changes during processing.

There are a numberof prerequisites that need to be fulfilled for their use, which limits where and how they can be applied, but preliminary tests look promising for the semantic graphs that motivate the work. Graphs are also used in many other fields, and there are high hopes that the formalisms may find use in other applications as well.

"A central part of the results lies in creating a direct correspondence between certain graphs and simpler structures for which efficient algorithms are already known. As a bonus, a number of related properties can be shown to hold with relative ease, although unexpected complications have turned up several times", says Petter Ericson.

Taken further, the new models can lead to improvements in both auto-translation and natural language interfaces of various kinds, but being very general, they may find use in almost every field where the structure and composition of graphs need formal verification in some way.

Petter Ericson comes from Holmsjön outside Umeå and has studied the Master of Science programme in Computing Science and Engineering at Umeå university.

Read the whole thesis digitally

About the dissertation:

On Monday the 4th of February, Petter Ercson, the Department of computing science at Umeå university, defends his thesis entitled: Order-preseving Graph Grammars. Swedish title: Ordnade grafgrammatiker.
The dissertation takes place at 13:00 in room MA 121, Umeå University.
The Faculty opponent is Professor Sebastian Maneth, Department of Computer Science, University of Bremen.

Press photo

For more information, please contact:

Petter Ericson Postdoctoral fellow

Email