Our group develops algorithms, tools, statistical methodology and machine learning approaches within omics and adaptive immunity. Over the last decade, we have published a broad array of methodology and software tools for genomic co-localization analysis. Currently, a main focus is machine learning methodology for diagnosing disease from the immune receptors of adaptive immune cells. We are also interested in causal aspects of adaptive immunity and omics. Finally, we aim to determine the degree to which graph representations can improve the analysis of genomic variation.
The research in our group is highly interdisciplinary – not just through integration of computer science and biomedicine, but also through incorporation of a range of methodological aspects – statistical inference, machine learning, software engineering and more – in order to develop solutions that both represent theoretical advances and are available as robust and accessible software. We eagerly support open source code and open access publication, and place high emphasis on the reproducibility of our work.
Machine learning of immune receptor specificity and disease state
The adaptive immune system is a natural diagnostic and therapeutic – it recognizes threats (such as viruses or bacteria) and helps neutralize them using adaptive immune receptors (B-cell receptors and T-cell receptors). The immune receptors exhibit high specificity and diversity, and decoding them allows for disease detection before clinical symptoms appear, as well as improvements in drug development. However, the disease-specific signal in immune repertoires is very subtle, obscured by noise, the presence of other diseases and the generation probabilities of individual receptors.
Our lab develops machine learning approaches for determining immune receptor specificity and predicting disease state from immune repertoires, with an emphasis on the interpretability of these approaches. At the moment, we are developing an open-source platform for machine learning analysis of immune receptors, and benchmarking different embedding and encoding techniques in combination with different machine learning methods to characterize the capabilities of ML approaches in the immune receptor field. Additionally, we are looking into how gene variations, generation probabilities and the pairing of immune receptor chains influence disease state prediction.
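As an illustration of what an encoding technique can look like, the sketch below turns a repertoire into normalized k-mer frequencies, a simple and commonly used sequence encoding. It is not taken from our platform: the function name `kmer_frequencies`, the choice of k = 3 and the toy sequences are all assumptions made for the example.

```python
from collections import Counter


def kmer_frequencies(sequences, k=3):
    """Encode a repertoire (a list of receptor amino-acid sequences)
    as a dict mapping each k-mer to its relative frequency."""
    counts = Counter()
    for seq in sequences:
        # Slide a window of length k over the sequence.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


# Hypothetical toy repertoire (made-up sequences, for illustration only).
repertoire = ["CASSLGTDTQYF", "CASSLGQAYEQYF", "CASSPGTDTQYF"]
encoding = kmer_frequencies(repertoire, k=3)
```

The resulting frequency vector can be passed to any standard classifier. Which encoding works best for a given task is exactly the kind of question a benchmark is meant to answer.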
Improving analysis of genomic variation using genome graphs
In line with the increasingly common view that traditional linear reference genomes are insufficient for studying and representing genomic variation, we are investigating how graph-based reference genomes can improve common bioinformatics operations, such as read mapping and variant calling. As part of this effort, we have developed the first graph-based peak caller, and investigated how read mapping to graph-based reference genomes can potentially be improved. Currently, we are interested in understanding the relationship between genotyping/variant calling and read mapping to graph-based reference genomes, and are investigating more efficient and accurate ways of performing genotyping using genome graphs. This work is done in collaboration with the CELS 2 research initiative, where the goal is to employ genome graphs to better understand and analyse the intricate cod genome, by studying the sequences of more than 900 cod sequenced at the Centre for Ecological and Evolutionary Synthesis (UiO).
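To make the idea of a graph-based reference concrete, here is a minimal sketch of a genome graph with a single variant site. The node sequences, the adjacency-list representation and the function name `spell_paths` are assumptions made for illustration and do not reflect the data structures used in our tools.

```python
# Nodes hold sequence fragments; edges list which fragments may follow.
# Nodes 2 and 3 are the two alleles of a single-nucleotide variant.
nodes = {1: "ACGT", 2: "A", 3: "G", 4: "TTCA"}
edges = {1: [2, 3], 2: [4], 3: [4], 4: []}


def spell_paths(node, prefix=""):
    """Yield every sequence spelled by a path from `node` to a sink."""
    seq = prefix + nodes[node]
    if not edges[node]:
        yield seq
        return
    for nxt in edges[node]:
        yield from spell_paths(nxt, seq)


haplotypes = list(spell_paths(1))

# A read carrying the G allele matches one path through the graph exactly,
# whereas a linear reference containing only the A allele would mismatch.
read = "GTGTT"
matching = [h for h in haplotypes if read in h]
```

Enumerating all paths is only feasible for toy graphs like this one. Practical graph-based mappers rely on indexing rather than exhaustive enumeration.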
Causal analysis in machine learning, omics and immune repertoire analysis
Knowing the true causes of a disease allows for the development of effective medical therapies. Causality is therefore an important question in medical machine learning applications. In the omics setting there are many molecules to consider, which results in very high-dimensional problems and complicated causal relationships that require novel, specialized techniques. Furthermore, machine learning predictions are vulnerable to confounding, that is, to learning relationships among variables that are particular to a dataset and do not generalize. Causal analysis techniques make it possible to handle such confounding problems, and are an active research area within machine learning.
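The effect of confounding can be illustrated with a small simulation. In the sketch below, a hidden variable z drives both x and y, so the two appear strongly related even though neither influences the other. Adjusting for z via the standard partial-correlation formula removes most of the apparent association. The variable names, noise levels and sample size are assumptions chosen for the example.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(0)  # fixed seed so the simulation is repeatable
n = 5000

# z is the confounder: it influences both x and y; x does not influence y.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

r_xy = pearson(x, y)  # naive association, inflated by z
r_xz = pearson(x, z)
r_yz = pearson(y, z)

# Partial correlation of x and y given z.
partial_xy_given_z = (r_xy - r_xz * r_yz) / math.sqrt(
    (1 - r_xz ** 2) * (1 - r_yz ** 2)
)
```

A model trained to predict y from x alone would exploit the naive association and could fail on data where z is distributed differently. In real omics data the confounders are often unobserved, which is part of what makes the problem hard.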
Statistical genome analysis
Our group has been a main driver of “the Genomic HyperBrowser”, which has been a long-term, coordinated research activity on genome analysis in Oslo, in collaboration with local biologists and statisticians. Through this collaboration we published around ten papers on software and methodology development and around fifteen papers on biomedical applications of genome analysis.
Research directions with auxiliary contributions
- Gene Regulatory Network (Ping-Han)
- Cancer mutation analysis (Daniel Vodak, Sumana Kalyanasundaram)
- Single cell transcriptomics (Ying Yao, Alumni)
- Celiac disease research (Ralf Neumann, Alumni)
- 3D genome structure (Jonas Paulsen, Alumni)
- Genome assembly (Ksenia Khelik, Alumni)
- Gene regulation (Antonio Mora and Ankush Sharma, Alumni)
- Doctor AI: Developing novel machine learning methodology for immune repertoire-based diagnostics.
- ImmunoHub: An end-to-end software service and tool ecosystem for immune receptor sequence storage and analysis of immune receptor sequence data.
- ImmunoLingo: Linguistics-driven machine learning to decipher the molecular language of immunity.