The Robinson lab develops a wide range of algorithms, computational resources, and applications.

The Human Phenotype Ontology (HPO)

The Human Phenotype Ontology (HPO) aims to provide a standardized vocabulary of phenotypic abnormalities encountered in human disease. Terms in the HPO describe individual phenotypic abnormalities such as atrial septal defect. For further details and information please refer to the Human Phenotype Ontology Homepage. The HPO is developed together with the Berlin Phenomics Team as a part of the Monarch Initiative.

Köhler S, Vasilevsky NA, Engelstad M, Foster E, McMurry J, Aymé S, Baynam G, Bello SM, Boerkoel CF, Boycott KM, Brudno M, Buske OJ, Chinnery PF, Cipriani V, Connell LE, Dawkins HJ, DeMare LE, Devereau AD, de Vries BB, Firth HV, Freson K, Greene D, Hamosh A, Helbig I, Hum C, Jähn JA, James R, Krause R, Laulederkind SJF, Lochmüller H, Lyon GJ, Ogishima S, Olry A, Ouwehand WH, Pontikos N, Rath A, Schaefer F, Scott RH, Segal M, Sergouniotis PI, Sever R, Smith CL, Straub V, Thompson R, Turner C, Turro E, Veltman MW, Vulliamy T, Yu J, von Ziegenweidt J, Zankl A, Züchner S, Zemojtel T, Jacobsen JO, Groza T, Smedley D, Mungall CJ, Haendel M, Robinson PN (2017) The Human Phenotype Ontology in 2017. Nucleic Acids Res. 45(D1):D865-D876.


The Exomiser is a Java program that functionally annotates and prioritises variants from whole-exome sequencing data, starting from a VCF file. The Exomiser was developed by our group, Damian Smedley and Jules Jacobsen of the Mouse Informatics Group at the Sanger Institute (now at the 100,000 Genomes Project and Queen Mary University of London), and other members of the Monarch Initiative. The Exomiser is available for download, and an online demo version is available here.

Smedley D, Jacobsen JO, Jäger M, Köhler S, Holtgrewe M, Schubach M, Siragusa E, Zemojtel T, Buske OJ, Washington NL, Bone WP, Haendel MA, Robinson PN. (2015). Next-generation diagnostics and disease-gene discovery with the Exomiser. Nature Protocols 10:2004-15.


The Phenomizer aims to help clinicians identify the correct differential diagnosis in the field of human genetics. The user enters the signs and symptoms of the patient, encoded as terms from the Human Phenotype Ontology. The software then ranks all diseases from OMIM, Orphanet, and DECIPHER by a score that reflects how well the phenotypic profiles of the patient and the disease match each other.
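The ranking idea can be illustrated with a minimal sketch of an information-content-based semantic similarity search. This is not the Phenomizer implementation: the toy ontology, the disease annotations, and all function names below are hypothetical, and the score (the average, over query terms, of the information content of the most informative shared ancestor) is a simplified, Resnik-style measure chosen for illustration.

```python
import math

# Hypothetical toy ontology: term -> list of parent terms.
PARENTS = {
    "abnormality": [],
    "heart_abnormality": ["abnormality"],
    "atrial_septal_defect": ["heart_abnormality"],
    "ventricular_septal_defect": ["heart_abnormality"],
    "skeletal_abnormality": ["abnormality"],
    "arachnodactyly": ["skeletal_abnormality"],
}

# Hypothetical disease annotations: disease -> set of annotated terms.
DISEASES = {
    "disease_A": {"atrial_septal_defect"},
    "disease_B": {"ventricular_septal_defect", "arachnodactyly"},
    "disease_C": {"arachnodactyly"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log(fraction of diseases annotated to t or a descendant)."""
    counts = {term: 0 for term in PARENTS}
    for terms in DISEASES.values():
        implied = set().union(*(ancestors(t) for t in terms))
        for term in implied:
            counts[term] += 1
    n = len(DISEASES)
    return {t: -math.log(c / n) for t, c in counts.items() if c > 0}


def similarity(query, disease_terms, ic):
    """Average, over query terms, of the best shared-ancestor IC."""
    total = 0.0
    for q in query:
        best = 0.0
        for d in disease_terms:
            shared = ancestors(q) & ancestors(d)
            best = max(best, max((ic.get(t, 0.0) for t in shared), default=0.0))
        total += best
    return total / len(query)


def rank_diseases(query):
    """Return (disease, score) pairs sorted from best to worst match."""
    ic = information_content()
    scores = {name: similarity(query, terms, ic)
              for name, terms in DISEASES.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

In this toy setting, `rank_diseases(["atrial_septal_defect"])` places `disease_A` first, because it shares the most specific (and therefore most informative) term with the query.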

Köhler, S., Schulz, M. H., Krawitz, P., Bauer, S., Dölken, S., Ott, C. E., Mundlos, C., Horn, D., Mundlos, S., and Robinson, P. N. (2009). Clinical diagnostics in human genetics with semantic similarity searches in ontologies. Am. J. Hum. Genet. 85, 457–464.


PhenIX, Phenotypic Interpretation of eXomes, is a pipeline for ranking (prioritizing) candidate genes in exomes or NGS panels with comprehensive coverage of human Mendelian disease genes. It ranks genes based on predicted variant pathogenicity as well as phenotypic similarity of diseases associated with the genes harboring these variants to the phenotypic profile of the individual being investigated, based on analysis powered by the Human Phenotype Ontology (HPO). An online demo version of PhenIX is available here.

Zemojtel, T., Köhler, S., Mackenroth, L., Jäger, M., Hecht, J., Krawitz, P., Graul-Neumann, L., Doelken, S., Ehmke, N., Spielmann, M., et al. (2014). Effective diagnosis of genetic disease by computational phenotype analysis of the disease-associated genome. Sci Transl Med 6:252ra123.


PhenogramViz is a tool that automatically analyses and visualizes gene-to-phenotype relations for a set of genes affected by CNVs in a patient and a set of HPO terms representing the symptoms of that patient. The tool makes full use of the cross-species phenotype ontology uberpheno. Video tutorials are available on the PhenogramViz YouTube channel.

Köhler, S., Schoeneberg, U., Czeschik, J. C., Doelken, S. C., Hehir-Kwa, J. Y., Ibn-Salem, J., Mungall, C. J., Smedley, D., Haendel, M. A., and Robinson, P. N. (2014). Clinical interpretation of CNVs with cross-species phenotype data. J. Med. Genet. 51:766–772.

ChIP-seq software

Chromatin immunoprecipitation coupled with next-generation sequencing (ChIP-seq) is a powerful technology to identify the genome-wide locations of transcription factors and other DNA binding proteins. Computational ChIP-seq peak calling infers the location of protein-DNA interactions based on various measures of enrichment of sequence reads.

Our algorithm, Q, uses an assessment of the quadratic enrichment of reads to center candidate peaks followed by statistical analysis of saturation of candidate peaks by 5' ends of reads. We show that our method not only is substantially faster than several competing methods but also demonstrates statistically significant advantages with respect to reproducibility of results and in its ability to identify peaks with reproducible binding site motifs.

Hansen P, Hecht J, Ibrahim DM, Krannich A, Truss M, Robinson PN (2015) Saturation analysis of ChIP-seq data for reproducible identification of binding peaks. Genome Res 25:1391-400.

ChIP-exo and ChIP-nexus are related methods that can characterize genomic binding sites of proteins with higher resolution than standard ChIP-seq, but come with their own methodological and statistical challenges. We developed an algorithm called Q-nexus for quickly evaluating ChIP-exo and ChIP-nexus datasets.

Hansen P, Hecht J, Ibn-Salem J, Menkuec BS, Roskosch S, Truss M, Robinson PN (2016) Q-nexus: a comprehensive and efficient analysis pipeline designed for ChIP-nexus. BMC Genomics 17:873


A Java library for Exome Annotation

Transcript-based annotation and pedigree analysis are two basic steps in the computational analysis of whole-exome sequencing experiments in disease-gene discovery projects or diagnostics. Jannovar is a stand-alone Java application as well as a Java library designed to be used in larger software frameworks for exome analysis. Jannovar uses an interval tree to identify all transcripts affected by a given variant, and provides HGVS-compliant annotations for variants affecting coding sequences and splice junctions as well as UTR sequences and non-coding RNA transcripts. Jannovar can also perform family-based pedigree analysis with VCF files containing data from members of a family segregating a Mendelian disorder. On a desktop computer, Jannovar requires a few seconds to annotate a typical VCF file with exome data.
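The interval-tree lookup mentioned above can be sketched as follows. This is a generic centered interval tree written in Python for brevity, not Jannovar's Java implementation. The class name, the transcript names, and the coordinates are hypothetical, and the sketch handles only single-position queries with closed coordinates.

```python
class IntervalTreeNode:
    """Centered interval tree over (start, end, name) tuples, closed coordinates."""

    def __init__(self, intervals):
        # Use the median endpoint as the center, so at least one interval
        # always contains it and both subtrees are strictly smaller.
        points = sorted(p for start, end, _ in intervals for p in (start, end))
        self.center = points[len(points) // 2]
        self.overlapping = [iv for iv in intervals
                            if iv[0] <= self.center <= iv[1]]
        left = [iv for iv in intervals if iv[1] < self.center]
        right = [iv for iv in intervals if iv[0] > self.center]
        self.left = IntervalTreeNode(left) if left else None
        self.right = IntervalTreeNode(right) if right else None

    def query(self, pos):
        """Return the names of all intervals that contain position pos."""
        hits = [name for start, end, name in self.overlapping
                if start <= pos <= end]
        # Only one subtree can contain further matches.
        if pos < self.center and self.left is not None:
            hits += self.left.query(pos)
        elif pos > self.center and self.right is not None:
            hits += self.right.query(pos)
        return hits


# Hypothetical transcripts on one chromosome: (start, end, name).
transcripts = [
    (100, 500, "TX1"),
    (300, 900, "TX2"),
    (1000, 1200, "TX3"),
]
tree = IntervalTreeNode(transcripts)
affected = sorted(tree.query(350))  # transcripts overlapping a variant at position 350
```

Each query descends into at most one subtree per level, so a lookup avoids scanning every transcript on the chromosome. A production annotator would also need range queries for variants that span more than one position.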


For convenience, we provide a compiled version of the Jannovar executable. The latest version of Jannovar is always available at the project GitHub repository. See the online manual for instructions on how to use Jannovar.

Downloading the source code

Developers can download the entire source code for the Jannovar project in the form of a Maven repository from the GitHub page.

Current version

The version of Jannovar originally published in 2014 has been substantially extended by Manuel Holtgrewe and Max Schubach. Many of the new features are described in the online manual.

Jäger, M., Wang, K., Bauer, S., Smedley, D., Krawitz, P., and Robinson, P. N. (2014). Jannovar: a Java library for exome annotation. Hum. Mutat. 35, 548–555.


Analyze and Visualize High-Throughput Biological Data Using Gene Ontology

The Ontologizer is a tool for the statistical analysis and visualization of high-throughput biological data using Gene Ontology. The Ontologizer can be started as a Java Webstart app from the project's homepage.
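As an illustration of the kind of statistic used in GO term enrichment analysis, here is a minimal sketch of a one-sided hypergeometric test for a single term. It is a textbook calculation and does not reproduce the Ontologizer's implementation or its other analysis methods. The function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_pvalue(population, annotated, study, study_annotated):
    """Probability of seeing at least `study_annotated` annotated genes.

    population:      number of genes in the background set
    annotated:       background genes annotated to the GO term
    study:           number of genes in the study set
    study_annotated: study genes annotated to the GO term
    """
    upper = min(annotated, study)
    total = comb(population, study)
    # Sum the hypergeometric probabilities over the upper tail.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, study - k)
        for k in range(study_annotated, upper + 1)
    )
    return tail / total
```

For example, with a background of 10 genes of which 3 are annotated to a term, drawing a study set of 3 genes that are all annotated has probability `hypergeom_pvalue(10, 3, 3, 3)`, which is 1/120. In a real analysis the p-values for many terms would also need correction for multiple testing.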

Bauer, S., Grossmann, S., Vingron, M., and Robinson, P. N. (2008). Ontologizer 2.0–a multifunctional tool for GO term enrichment analysis and data exploration. Bioinformatics 24, 1650–1651.

Bauer, S., Robinson, P. N., and Gagneur, J. (2011). Model-based gene set analysis for Bioconductor. Bioinformatics 27, 1882–1883.


IMSEQ--a fast and error aware approach to immunogenetic sequence analysis

Recombined T- and B-cell receptor repertoires are increasingly being studied using next generation sequencing (NGS) in order to interrogate the repertoire composition as well as changes in the distribution of receptor clones under different physiological and disease states. This type of analysis requires efficient and unambiguous clonotype assignment to a large number of NGS read sequences, including the identification of the incorporated V and J gene segments and the CDR3 sequence. Current tools have deficits with respect to performance, accuracy and documentation of their underlying algorithms and usage.

IMSEQ is a method to derive clonotype repertoires from NGS data with sophisticated routines for handling errors stemming from PCR and sequencing artefacts. The application can handle different kinds of input data originating from single- or paired-end sequencing in different configurations and is generic regarding the species and gene of interest.
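The notion of a clonotype repertoire can be made concrete with a small sketch. It assumes that each read has already been assigned a V segment, a J segment, and a CDR3 sequence, and it uses a deliberately simple error-handling rule: a rarer clonotype is merged into a more abundant one with the same V and J segments if their CDR3 sequences have equal length and differ by at most one position. This rule, the function names, and the example reads are illustrative assumptions and do not describe IMSEQ's actual algorithms.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_clonotypes(reads):
    """Count reads per clonotype, where a clonotype is a (V, J, CDR3) tuple."""
    return Counter(reads)


def merge_near_duplicates(counts, max_mismatches=1):
    """Fold rare clonotypes into abundant neighbours (illustrative rule only)."""
    merged = Counter()
    # Visit clonotypes from most to least abundant so that rare variants
    # are absorbed by the dominant clonotype they most likely derive from.
    for clonotype, n in counts.most_common():
        v, j, cdr3 = clonotype
        target = clonotype
        for kept_v, kept_j, kept_cdr3 in merged:
            if (
                (kept_v, kept_j) == (v, j)
                and len(kept_cdr3) == len(cdr3)
                and hamming(kept_cdr3, cdr3) <= max_mismatches
            ):
                target = (kept_v, kept_j, kept_cdr3)
                break
        merged[target] += n
    return merged


# Hypothetical reads, already resolved to (V segment, J segment, CDR3).
reads = (
    [("TRBV5-1", "TRBJ2-7", "CASSLGQYEQYF")] * 5
    + [("TRBV5-1", "TRBJ2-7", "CASSLGQYEQYV")]  # one mismatch, likely an error
    + [("TRBV7-9", "TRBJ1-1", "CASSPTGNTEAFF")] * 2
)
repertoire = merge_near_duplicates(count_clonotypes(reads))
```

Here the single read with a one-residue difference is folded into the dominant clonotype, leaving two clonotypes with 6 and 2 reads.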

The software can be downloaded from the project homepage.

Kuchenbecker, L., Nienen, M., Hecht, J., Neumann, A. U., Babel, N., Reinert, K., and Robinson, P. N. (2015). IMSEQ - a fast and error aware approach to immunogenetic sequence analysis. Bioinformatics.


BOQA - The Bayesian Ontology Query Algorithm

BOQA integrates the knowledge stored in an ontology and the accompanying annotations into a Bayesian network in order to implement a search system in which users enter one or more terms of the ontology to get a list of appropriate domain items. This is the companion website to the original Bioinformatics publication. Here, we provide an implementation as well as the benchmark procedure. The source code can be downloaded here.

Bauer, S., Köhler, S., Schulz, M. H., and Robinson, P. N. (2012). Bayesian ontology querying for accurate and noise-tolerant semantic searches. Bioinformatics 28, 2502–2508.