Informatics

Revision as of 19:51, 5 November 2008 by Crk18 (Phenoscape Data Loader)

This page gives a broad overview of our informatics activities. Phenoscape supports open development processes and collaboration. All source code we create is available from open-source repositories such as SourceForge, and we work with existing open-source projects whenever possible. Development plans are described in our software roadmap.

Phenoscape software components

Phenex curation tool

Phenex is an application for annotating character matrix files with ontology terms that describe phenotypes and identify taxa. It is a Java application based on code from the OBO-Edit and Phenote projects. Although still under development, Phenex is ready for use and supports our ongoing data curation activities.

Data services built on OBD

We are adopting OBD as the ontology-driven datastore for our phenotype annotations, and we are collaborating with the Berkeley Bioinformatics Open-source Projects group to drive its future development. We are also developing a suite of web services on top of OBD to serve as a data access API and as the foundation for our user-oriented Phenoscape web application. These web services use the OBD Java API and present a RESTful service interface built with Restlet.

Phenoscape web UI

The Phenoscape web application will allow scientists to browse and query the phenotype annotations as well as the supporting ontologies. Initially, the query capabilities will concentrate on a select set of "use cases": research questions that demonstrate the utility of the approach. Ultimately, we will build interfaces that allow researchers to ask open-ended questions of the data. The web application is being developed with Ruby on Rails and accesses phenotype data and ontology information through our OBD web services.

Phenoscape Data Loader

A data loader application that refreshes the data in the Phenoscape database daily is under development. It is being written as a Perl module that:

  1. Downloads curated NeXML files from the Phenoscape SVN repositories
  2. Drops and recreates the database
  3. Loads the requisite ontologies into the database
  4. Loads the data from the curated NeXML files into the database, and
  5. Invokes the OBD reasoner to infer implicit information from the data and adds the inferences to the database
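
The five steps above can be pictured as a small pipeline. The sketch below is written in Python rather than Perl purely for illustration, and it uses an in-memory SQLite database in place of the OBD datastore. Every function name, table name, and sample record (including the `EX:` identifiers) is hypothetical. The "reasoner" step is only a stand-in that computes the transitive closure of is_a links; it is not the OBD reasoner.

```python
import sqlite3


def fetch_annotations():
    """Step 1 (stand-in): pretend the curated files were downloaded and parsed."""
    # Hypothetical records: (taxon, entity term, quality term)
    return [("Taxon A", "EX:pectoral_fin", "EX:absent")]


def recreate_database(conn):
    """Step 2: drop and recreate the tables so every run starts clean."""
    conn.executescript("""
        DROP TABLE IF EXISTS is_a;
        DROP TABLE IF EXISTS annotation;
        CREATE TABLE is_a (
            child TEXT NOT NULL,
            parent TEXT NOT NULL,
            inferred INTEGER NOT NULL DEFAULT 0,
            UNIQUE (child, parent)
        );
        CREATE TABLE annotation (
            taxon TEXT NOT NULL,
            entity TEXT NOT NULL,
            quality TEXT NOT NULL
        );
    """)


def load_ontologies(conn):
    """Step 3: load the asserted ontology links (hypothetical sample data)."""
    links = [("EX:pectoral_fin", "EX:fin"), ("EX:fin", "EX:appendage")]
    conn.executemany("INSERT INTO is_a (child, parent) VALUES (?, ?)", links)


def load_annotations(conn, annotations):
    """Step 4: load the curated annotations."""
    conn.executemany("INSERT INTO annotation VALUES (?, ?, ?)", annotations)


def count_links(conn):
    return conn.execute("SELECT COUNT(*) FROM is_a").fetchone()[0]


def run_reasoner(conn):
    """Step 5 (stand-in): add implied is_a links until no new ones appear."""
    while True:
        before = count_links(conn)
        conn.execute("""
            INSERT OR IGNORE INTO is_a (child, parent, inferred)
            SELECT a.child, b.parent, 1
            FROM is_a AS a JOIN is_a AS b ON a.parent = b.child
        """)
        if count_links(conn) == before:
            break


def refresh(conn):
    """Run the five steps in order."""
    annotations = fetch_annotations()
    recreate_database(conn)
    load_ontologies(conn)
    load_annotations(conn, annotations)
    run_reasoner(conn)
    conn.commit()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    refresh(connection)
    for row in connection.execute("SELECT child, parent, inferred FROM is_a"):
        print(row)
```

Because the database is dropped and rebuilt on each run, calling `refresh` repeatedly leaves the same contents each time, which is the property a daily refresh relies on.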

Concerns: The data loader depends on the quality of the curated data. Incomplete data that violates database integrity constraints will not be loaded into the database. Checks for the completeness of the data are being implemented in parallel.
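
One way to picture a completeness check of this kind is a filter that runs before loading and sets aside records with missing required fields. This is a minimal Python sketch, not the project's actual check; the field names (`taxon`, `entity`, `quality`) and the dictionary record layout are assumptions made for illustration.

```python
# Hypothetical set of fields every annotation must provide before loading.
REQUIRED_FIELDS = ("taxon", "entity", "quality")


def missing_fields(record):
    """Return the required fields that are absent or blank in one record."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(record.get(field) or "").strip()
    ]


def partition_records(records):
    """Split records into (loadable, rejected).

    Each rejected item is paired with the list of fields it lacks, so a
    curator can see why it was held back.
    """
    loadable, rejected = [], []
    for record in records:
        missing = missing_fields(record)
        if missing:
            rejected.append((record, missing))
        else:
            loadable.append(record)
    return loadable, rejected
```

Reporting the rejected records together with the missing field names, rather than silently skipping them, would let curators fix the source files before the next load.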

Affiliated projects

OBO-Edit

We use the OBO-Edit ontology editor to develop and maintain our ontologies, such as the Teleost Anatomy Ontology and the Teleost Taxonomy Ontology.

NeXML

Phenex saves character matrix data using NeXML, a new standard for evolutionary data. NeXML is defined by an XML Schema and has robust facilities for embedding additional data, such as our phenotype annotations, within a traditional character-by-taxon matrix.
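
To make the idea of annotations embedded in a character-by-taxon matrix more concrete, here is a small Python sketch that reads a deliberately simplified, NeXML-like fragment. The fragment is not validated against the NeXML schema and omits namespaces. Its element names, attribute names, and the `EX:` term identifiers are assumptions for illustration only, and real Phenex output will differ.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment: two taxa, one character, and one
# embedded phenotype annotation attached to that character.
SAMPLE = """
<nexml>
  <otus>
    <otu id="t1" label="Taxon A"/>
    <otu id="t2" label="Taxon B"/>
  </otus>
  <characters>
    <format>
      <char id="c1" label="Pectoral fin">
        <meta entity="EX:pectoral_fin" quality="EX:present_or_absent"/>
      </char>
    </format>
    <matrix>
      <row otu="t1"><cell char="c1" state="0"/></row>
      <row otu="t2"><cell char="c1" state="1"/></row>
    </matrix>
  </characters>
</nexml>
"""


def read_matrix(xml_text):
    """Return (matrix, annotations) from the simplified fragment.

    matrix maps taxon label -> {character id -> state};
    annotations maps character id -> (entity term, quality term).
    """
    root = ET.fromstring(xml_text.strip())
    taxa = {otu.get("id"): otu.get("label") for otu in root.iter("otu")}

    annotations = {}
    for char in root.iter("char"):
        meta = char.find("meta")
        if meta is not None:
            annotations[char.get("id")] = (meta.get("entity"), meta.get("quality"))

    matrix = {}
    for row in root.iter("row"):
        label = taxa[row.get("otu")]
        matrix[label] = {
            cell.get("char"): cell.get("state") for cell in row.iter("cell")
        }
    return matrix, annotations
```

The point of the sketch is that the matrix cells and the ontology annotations live in the same document, so a reader that ignores the annotation elements still sees an ordinary matrix.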