Published: 2020-06-26

Making sense of complex tree data

Most plant features arise from complex interactions of genes, proteins and metabolites. Identifying and analysing these genetic traits is very challenging, especially when the sequenced genomes are fragmented. In his thesis, Bastian Schiffthaler has improved the genome assembly of European aspen and developed bioinformatic tools that help to analyse complex genetic traits in plants.

Text: Carolin Rebernig/Ingrid Söderbergh

To sequence a genome, the DNA is normally cut into small pieces and the sequence of each piece is read. Bioinformatic software then assembles the full sequence by joining pieces at their overlapping regions, an iterative process that ideally yields full-length chromosomes. Trees often have very complex genomes in which these overlaps are hard to resolve, so most available tree genome assemblies are not very contiguous. Bastian Schiffthaler worked on improving the contiguity of such genomes, focussing on European aspen.
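To make the idea of joining pieces at overlapping regions concrete, here is a toy sketch of greedy overlap assembly. It is an illustration only, not the pipeline used in the thesis: the reads are invented, the minimum overlap of three letters is an arbitrary assumption, and real assemblers use far more elaborate overlap or de Bruijn graph methods that cope with sequencing errors and repeats.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`
    (at least `min_len` letters), or 0 if there is none."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap
    until no overlaps remain; the survivors are the contigs."""
    contigs = list(reads)
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Invented reads cut from the made-up sequence ATGGCCTTAGCAA.
reads = ["ATGGCCT", "GCCTTAG", "TTAGCAA"]
contigs = greedy_assemble(reads)
print(contigs)
```

Here every overlap is unique, so the three reads collapse into one contig. With repetitive sequence, as in many tree genomes, several overlaps can look equally good, and the assembly tends to break into many shorter pieces.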

The genome sequence of European aspen was already of relatively high quality compared with, for example, that of Norway spruce. However, it was still fragmented, which made it difficult to carry out analyses that depend on a highly contiguous assembly. Examples include detecting DNA signatures that are linked to traits through genome-wide association studies, and studying evolutionary history through large-scale genomic rearrangements.

“Our strategy combined modern long-read sequencing, polished with highly accurate short-read data, with an optical map and a genetic map that further linked the initially assembled scaffolds into fully assembled chromosomes. With close to 20,000 genetic markers, the genetic map is one of the most comprehensive created for any organism to date. This was an overwhelming mass of information that most of the commonly used free software programs were not able to handle,” says Bastian Schiffthaler.

Ordering markers on a genetic map is a classic application of the travelling salesman problem. Finding the best order for only sixty markers by checking every possible order would take more calculations than there are atoms in the universe. All software therefore relies on approximations, but even these were too slow for a dataset of this size.
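The sixty-marker figure can be checked with a short calculation. This sketch assumes that an order and its reverse describe the same map, so n markers have n!/2 distinct orders, and it uses the commonly quoted order-of-magnitude estimate of 10^80 atoms in the observable universe. Both are assumptions made for the illustration; the count says nothing about how mapping software actually scores orders.

```python
import math

# Commonly quoted order-of-magnitude estimate (assumption for this sketch).
ATOMS_IN_UNIVERSE = 10**80


def marker_orders(n: int) -> int:
    """Distinct linear orders of n markers. An order and its reverse give
    the same map, so the n! permutations are halved."""
    return math.factorial(n) // 2


def smallest_n_exceeding(limit: int) -> int:
    """Smallest number of markers whose possible orders outnumber `limit`."""
    n = 1
    while marker_orders(n) <= limit:
        n += 1
    return n


n = smallest_n_exceeding(ATOMS_IN_UNIVERSE)
print(n)
print(f"{marker_orders(n):.2e} possible orders for {n} markers")
```

Under these assumptions the search stops at 60 markers: 59 markers give about 6.9 × 10^79 orders, while 60 give about 4.2 × 10^81. A map with close to 20,000 markers is therefore far beyond any exhaustive search.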

To overcome this problem, Bastian Schiffthaler developed “BatchMap”, a software package that speeds up the computations required to find the order of genetic markers with the highest likelihood given their inheritance patterns. The software divides the calculations into small batches that are easy to compute and can run in parallel. This drastically reduced the computation time and allowed Bastian Schiffthaler to produce a dense map of genetic signatures on the European aspen chromosomes. BatchMap has since been adopted by other genome projects, such as those assembling Norway spruce and the octoploid strawberry.
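The batching idea can be illustrated with a generic sketch: split a large set of independent pairwise marker comparisons into small batches and hand the batches to a pool of workers. This is not BatchMap's own code. The mismatch score is a hypothetical, crude stand-in for a recombination fraction, and the genotype data, batch size and worker count are invented.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def mismatch_fraction(a, b):
    """Hypothetical pairwise score: share of individuals whose genotype
    calls differ between two markers."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def score_batch(genotypes, pairs):
    """Score one small, independent batch of marker pairs."""
    return [(i, j, mismatch_fraction(genotypes[i], genotypes[j]))
            for i, j in pairs]


def pairwise_in_batches(genotypes, batch_size=1000, workers=4):
    """Split all marker pairs into batches, score the batches concurrently
    and collect the results into one dictionary keyed by marker pair."""
    pairs = list(combinations(range(len(genotypes)), 2))
    scores = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(score_batch, genotypes, batch)
                   for batch in batched(pairs, batch_size)]
        for future in futures:
            for i, j, r in future.result():
                scores[(i, j)] = r
    return scores


# Invented genotypes: three markers scored in six individuals.
genotypes = [
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 1, 0, 0, 1, 0],
]
scores = pairwise_in_batches(genotypes, batch_size=2, workers=2)
print(scores)
```

Because no batch depends on another, the batches can be processed in any order and on as many workers as are available. Threads keep the sketch portable; for CPU-bound scoring in CPython, a process pool or a compute cluster would be the usual choice.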

“We wanted to evaluate our improved assembly in the context of genome-wide association studies by looking for genes involved in salicinoid metabolism. These metabolites are found only in Populus and Salix species and help to protect the plant against herbivores,” explains Bastian Schiffthaler. “Compared with previous attempts using the more fragmented assembly, we could see that our new genome version greatly improved the analysis of this complex trait, and we were able to gain new insights into the evolution of the different Populus species.”

Identifying the genes that control complex traits is very challenging. Bastian Schiffthaler studied leaf shape variation in European aspen, a complex trait that is inherited from the parents but still highly diverse between individuals. The results show that leaf shape is controlled by a complex network of many different genes, each of which often exerts only a minor influence on the final leaf shape.

Bastian Schiffthaler believes that better understanding how traits like leaf shape work requires an integrative approach, in which traits are analysed at all the stages that contribute to their emergence. He therefore developed “Seidr”, a toolkit for studying the interactions of genes that are actively expressed, that is, being used to make proteins, within an organism. He hopes that integrating “Seidr” with other layers of data will enable scientists to better predict complex traits in trees in the future.



Press photo. Credit: Alena Aliashkevich


About the public defence:

On Friday 12 June at the KBC, Bastian Schiffthaler, Department of Plant Physiology at Umeå University, defended his thesis entitled: Embracing the Data Flood – Integrating Diverse Data to Improve Phenotype Association Discovery in Forest Trees.

The faculty opponent was Marek Mutwil from the School of Biological Sciences at Nanyang Technological University in Singapore, who participated remotely in the defence. The supervisor was Nathaniel Street. The defence was broadcast live, and anyone interested could follow it via Zoom.