Phenoscape KB
The Phenoscape Knowledgebase (KB) is an integrated dataset, together with its supporting software infrastructure and services, that combines evolutionary data matrices annotated by Phenoscape curators, model organism phenotype and gene expression annotations, and a collection of ontologies providing a common semantic framework. The entire dataset is expressed in the OWL Web Ontology Language and, when used as a database, is stored in an RDF triplestore.
This page describes the in-development version of the KB built on OWL and RDF technologies, not the legacy Phenoscape Knowledgebase built on OBD.
Data sources
Comparative data
Phenoscape
Model organism data
ZFIN
Gene expression and genetic phenotype annotations for zebrafish (Danio rerio). ZFIN data downloads can be found at http://zfin.org/downloads.
- Gene identifiers and names
- Gene expression annotations
- Genetic phenotype annotations
MGI
Gene expression and genetic phenotype annotations for mouse (Mus musculus). MGI data downloads can be found at ftp://ftp.informatics.jax.org/pub/reports/index.html.
- Gene identifiers and names
- Gene expression annotations
- Custom report provided by Terry Hayamizu. We are working with MGI to establish continually updated downloadable reports.
- Genetic phenotype annotations
- Custom report provided by Terry Hayamizu. We are working with MGI to establish continually updated downloadable reports.
Xenbase
Gene expression and genetic phenotype annotations for frog (Xenopus laevis and X. tropicalis). Xenbase data downloads can be found at ftp://ftp.xenbase.org/pub.
- Gene identifiers and names
- Gene expression annotations
- Genetic phenotype annotations
- A preliminary annotation file has been provided by Xenbase curators; it has not yet been imported into the KB.
HPO
Genetic phenotype annotations for human (Homo sapiens). HPO data downloads can be found at http://www.human-phenotype-ontology.org/contao/index.php/downloads.html.
- Genetic phenotype annotations
Ontologies
- Uberon cross-species anatomy ontology, with Phenoscape extensions
- Zebrafish anatomy
- Mouse annotations provided by Terry Hayamizu directly reference Uberon; no mouse-specific ontologies are used at this time.
- Xenopus anatomy
- Human phenotypes
- http://purl.obolibrary.org/obo/hp.owl
- Logical definitions for human phenotype classes (FMA + PATO): http://purl.obolibrary.org/obo/hp/hp-equivalence-axioms.obo
- This file is converted to OWL after introducing a special header, logical-definition-view-relation: involves, which is used by the oboformat converter.
- Bridge axioms from FMA (human anatomy) to Uberon: http://purl.obolibrary.org/obo/uberon/bridge/uberon-bridge-to-fma.owl
- FMA itself is not used (should it be? are there FMA classes used in the HP definitions that don't have direct mappings in the bridge file?)
- Gene Ontology
- Biospatial ontology
- Phenotypic quality ontology
- Vertebrate taxonomy
- http://purl.obolibrary.org/obo/vto.owl
- An individual-based tree is generated from this class hierarchy, and both are loaded into the KB.
- Ontology metadata
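The header insertion described above for the HP logical definitions file can be sketched as a small preprocessing step. This is a minimal illustration, not the code used by the Phenoscape builder: the function name and the choice to place the new tag at the end of the header block are assumptions, and only the tag and value (logical-definition-view-relation: involves) come from the text above.

```python
HEADER_TAG = "logical-definition-view-relation"


def add_view_relation_header(obo_text, relation="involves"):
    """Return OBO text with the view-relation header tag added.

    Hypothetical helper: the tag is appended to the end of the header
    block, i.e. just before the first blank line or first stanza.
    """
    lines = obo_text.splitlines()

    # The header block ends at the first blank line or stanza marker.
    header_end = len(lines)
    for i, line in enumerate(lines):
        if line.strip() == "" or line.startswith("["):
            header_end = i
            break

    # Leave the text untouched if the tag is already in the header.
    if any(l.startswith(HEADER_TAG + ":") for l in lines[:header_end]):
        return obo_text

    new_line = "{}: {}".format(HEADER_TAG, relation)
    result = lines[:header_end] + [new_line] + lines[header_end:]
    return "\n".join(result) + "\n"


if __name__ == "__main__":
    # Placeholder OBO content, for illustration only.
    sample = (
        "format-version: 1.2\n"
        "ontology: hp/hp-equivalence-axioms\n"
        "\n"
        "[Term]\n"
        "id: HP:0000001\n"
    )
    print(add_view_relation_header(sample))
```

The modified file would then be passed to the OBO-to-OWL converter as usual; that conversion step is not shown here.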
Production tools
All of the inputs to the Phenoscape KB are processed to create a coherent semantic model in OWL. For OWL-based ontology files, processing may be minimal. For input datasets, we have written conversion tools which generate OWL-formatted data from input tables.
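As a rough illustration of table-to-RDF conversion, the sketch below turns tab-delimited rows into N-Triples lines. It is not the Phenoscape converter (those tools are written in Scala and produce richer OWL models): the two-column layout, the http://example.org/ namespace, and the expressed_in property are all hypothetical. Only the OBO PURL pattern (replacing ":" with "_" after http://purl.obolibrary.org/obo/) reflects an existing convention.

```python
import csv
import io
from typing import List

OBO_BASE = "http://purl.obolibrary.org/obo/"
EX = "http://example.org/"  # hypothetical namespace, for illustration only


def obo_iri(curie):
    """Map an OBO-style identifier such as 'UBERON:0000001' to its PURL."""
    prefix, local = curie.split(":", 1)
    return "{}{}_{}".format(OBO_BASE, prefix, local)


def rows_to_ntriples(tsv_text) -> List[str]:
    """Convert rows of (gene id, anatomy term id) into N-Triples lines.

    Assumed input layout: two tab-separated columns; lines starting
    with '#' are treated as comments and skipped.
    """
    triples = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        if len(row) < 2 or row[0].startswith("#"):
            continue
        gene_id, anatomy_id = row[0].strip(), row[1].strip()
        subject = "<{}gene/{}>".format(EX, gene_id)
        predicate = "<{}expressed_in>".format(EX)  # hypothetical property
        obj = "<{}>".format(obo_iri(anatomy_id))
        triples.append("{} {} {} .".format(subject, predicate, obj))
    return triples


if __name__ == "__main__":
    # Placeholder identifiers, not real annotation data.
    table = "# gene\tanatomy\nGENE_A\tUBERON:0000001\n"
    for line in rows_to_ntriples(table):
        print(line)
```

A flat triple per row keeps the example short; a real conversion would typically create individuals for each annotation and link them to ontology classes.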
Phenoscape OWL tools
This project provides a collection of Scala classes which implement various ontology manipulations, as well as OWL converters for the various data formats used by our data providers.
Phenoscape builder
This project provides an Ant build script which orchestrates download of data sources and ontologies, conversion to OWL, and pre-reasoning tasks using the Phenoscape OWL tools and other code libraries. The result is a set of OWL files, constituting the Phenoscape KB, which can be loaded together into an RDF triplestore.
Reasoning process
The OWL ontologies and data files constituting the Phenoscape KB are ultimately loaded into an RDF triplestore, which serves as the database back-end for the KB web application. The SPARQL query language provides a natural interface to semantic web data; however, support for query-time reasoning within RDF triplestores is severely limited in both performance (in the face of large, complex ontologies) and OWL expressivity. Further, while SPARQL and RDF are well aligned with OWL instance data and property assertions, queries that must directly manipulate the RDF serialization of OWL class expressions and complex axioms are overcomplicated and error-prone, and they tend to embed ad hoc, possibly faulty OWL reasoning logic in the structure of the query.
These considerations have led to a number of guiding principles for the production of OWL data for the KB:
- Any logical inferences which we wish to make use of in the web application interface must be precomputed and asserted into the triplestore knowledgebase.
- SPARQL queries should be able to obtain the desired results by making use of only named classes; it should not be necessary to describe OWL class expressions via SPARQL triple patterns.
- TODO: TBox vs. ABox reasoning performance
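The first two principles can be illustrated with a toy example: inferred subclass relationships among named classes are computed ahead of time and stored as plain assertions, so that a later query is a simple lookup. This is only a sketch under simplifying assumptions. The KB relies on an OWL reasoner, whereas the code below handles nothing beyond the transitivity of asserted subclass links, and the class names are made up.

```python
from collections import defaultdict


def materialize_subclass_closure(asserted):
    """Compute all (subclass, superclass) pairs implied by transitivity.

    `asserted` is an iterable of (sub, super) pairs over named classes.
    The result includes the asserted pairs as well as the inferred ones.
    """
    parents = defaultdict(set)
    for sub, sup in asserted:
        parents[sub].add(sup)

    closure = set()
    for cls in list(parents):
        stack = list(parents[cls])
        seen = set()
        while stack:
            sup = stack.pop()
            if sup in seen:
                continue
            seen.add(sup)
            closure.add((cls, sup))
            # Walk up to the ancestors of this superclass.
            stack.extend(parents.get(sup, ()))
    return closure


if __name__ == "__main__":
    # Hypothetical class names, for illustration only.
    asserted = [
        ("pectoral_fin", "paired_fin"),
        ("paired_fin", "fin"),
        ("fin", "appendage"),
    ]
    stored = materialize_subclass_closure(asserted)
    # After materialization, "is pectoral_fin a kind of appendage?"
    # is a membership test, with no reasoning at query time.
    print(("pectoral_fin", "appendage") in stored)
```

In a triplestore, the same idea means that a triple pattern over named classes is enough to retrieve inferred results, without encoding class expressions in the query.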