Edit this page on GitHub

Phenoscape KB

The Phenoscape Knowledgebase (KB) is an integrated dataset, and software infrastructure and services, bringing together evolutionary data matrices annotated by Phenoscape curators, model organism phenotype and gene expression annotations, and a collection of ontologies providing a common semantic framework. The entire dataset is expressed in the OWL Web Ontology Language, and, when used as a database, stored in an RDF triplestore.

This page describes the in-development version of the KB built on OWL and RDF technologies, not the legacy Phenoscape Knowledgebase built on OBD.

Data sources

Comparative data

Phenoscape

Model organism data

ZFIN

Gene expression and genetic phenotype annotations for zebrafish (Danio rerio). ZFIN data downloads can be found at http://zfin.org/downloads.

MGI

Gene expression and genetic phenotype annotations for mouse (Mus musculus). MGI data downloads can be found at ftp://ftp.informatics.jax.org/pub/reports/index.html.

Xenbase

Gene expression and genetic phenotype annotations for frog (Xenopus laevis and X. tropicalis). Xenbase data downloads can be found at ftp://ftp.xenbase.org/pub.

HPO

Genetic phenotype annotations for human (Homo sapiens). HPO data downloads can be found at http://www.human-phenotype-ontology.org/contao/index.php/downloads.html.

Ontologies

Production tools

All of the inputs to the Phenoscape KB are processed to create a coherent semantic model in OWL. For OWL-based ontology files, processing may be minimal. For input datasets, we have written conversion tools which generate OWL-formatted data from input tables.

Phenoscape OWL tools

This project provides a collection of Scala classes which implement various ontology manipulations, as well as OWL converters for the various data formats used by our data providers.

KB build process

The Knowledgebase is built using a Scala script within the Phenoscape OWL tools codebase. It orchestrates coordinates OWL conversions and reasoning tasks. The result is a set of OWL files, constituting the Phenoscape KB, which can be loaded together into an RDF triplestore.

An overview of specific components of the KB build process is available.

Reasoning process

The OWL ontologies and data files constituting the Phenoscape KB are ultimately loaded into an RDF triplestore for use as a database back-end for the KB web application. The SPARQL query language provides a natural interface to semantic web data; however, support for query-time reasoning within RDF triplestores is severely limited in both performance (in the face of large, complex ontologies) and OWL expressivity. Further, while SPARQL and RDF are perfectly “aligned” with OWL instance data and property assertions, queries that require direct manipulation of the RDF serialization implementation details of OWL class expressions and complex axioms are over-complicated, error-prone, and bound to result in possibly faulty attempts to embed OWL reasoning logic into the structure of the query.

These considerations have contributed to a number of guiding principles for the production of OWL data for the KB:

Database and web application

The RDF output of the production script is loaded into an RDF triplestore. We are currently using the Bigdata triplestore. In selecting an RDF triplestore for the KB, we have a few basic requirements:

Nice features (not required but possibly needed for certain interface features):