Phenoscape data loader

Revision as of 14:44, 25 March 2009 by Crk18 (Invoking the Phenoscape Data Loader)

The Phenoscape Data Loader is being developed as a Perl module. This page gives an overview of how the Phenoscape Data Loader works. The data loader has been installed on the dev application server at NESCent and can be found at /usr/local/projects/phenoscape-data-loader.

Functioning of the Phenoscape Data Loader

The Phenoscape Data Loader performs the following steps in sequence:

Downloads curated NeXML files from the Phenoscape SVN repositories

Ichthyologists curate scientific publications using the Phenex character matrix annotator, and these curations are stored in the Phenoscape SVN repositories. The data loader downloads these data files daily so that they can subsequently be loaded into the relational database.
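The daily download step can be sketched as follows. This is a minimal Python illustration, not the loader's actual Perl code: it assumes the curated files are fetched into a local working copy with the standard `svn checkout` and `svn update` commands, and the repository URL and the function names `svn_sync_command` and `sync_curation_files` are hypothetical.

```python
import os
import subprocess

# Hypothetical repository URL; the real Phenoscape SVN location is not given on this page.
REPO_URL = "https://svn.example.org/phenoscape/curation-files"


def svn_sync_command(repo_url: str, working_copy: str) -> list[str]:
    """Return the svn command that brings working_copy up to date.

    A fresh checkout is made the first time; afterwards 'svn update'
    fetches only the files that changed since the last run.
    """
    if os.path.isdir(os.path.join(working_copy, ".svn")):
        return ["svn", "update", "--non-interactive", working_copy]
    return ["svn", "checkout", "--non-interactive", repo_url, working_copy]


def sync_curation_files(repo_url: str, working_copy: str) -> None:
    # check=True raises CalledProcessError, so a failed download stops the load early.
    subprocess.run(svn_sync_command(repo_url, working_copy), check=True)
```

Choosing between checkout and update keeps the daily run cheap once the working copy exists.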

Loads the requisite ontologies into the database

The annotations of character matrices use terms that are defined in life science ontologies. These ontologies include the Teleost Taxonomy Ontology (TTO), the Relations Ontology, the Teleost Anatomy Ontology (TAO), and the Phenotype and Trait Ontology (PATO). The data loader loads the definitions from all of these ontologies into the relational database.
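Ontologies such as TAO and PATO are commonly distributed as OBO flat files, though this page does not say which format the loader reads. Assuming OBO input, the sketch below shows how term ids, names, and `is_a` parents could be pulled out of `[Term]` stanzas before being written to the database. The function `parse_obo_terms` and the `EX:` identifiers are made up for illustration; a production loader would use a full OBO parser.

```python
def parse_obo_terms(text: str) -> list[dict]:
    """Collect id, name and is_a parents from the [Term] stanzas of an OBO file."""
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; [Typedef] and other stanza types are skipped.
            current = {"id": "", "name": "", "is_a": []} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ":" in line:
            tag, _, value = line.partition(":")
            value = value.strip()
            if tag == "is_a":
                # Drop the trailing "! label" comment, keeping only the parent id.
                current["is_a"].append(value.split("!")[0].strip())
            elif tag in ("id", "name"):
                current[tag] = value
    return terms


# Made-up terms for illustration; these are not real TAO identifiers.
SAMPLE = """format-version: 1.2

[Term]
id: EX:0000001
name: fin

[Term]
id: EX:0000002
name: pectoral fin
is_a: EX:0000001 ! fin

[Typedef]
id: part_of
name: part of
"""

rows = [(t["id"], t["name"]) for t in parse_obo_terms(SAMPLE)]
print(rows)  # [('EX:0000001', 'fin'), ('EX:0000002', 'pectoral fin')]
```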

Loads data from the ZFIN model organism database

ZFIN hosts data relating mutant phenotypes of Danio rerio (zebrafish) to specific genes and genotypes. The data loader transcribes this data into OBD format and loads it into the database.
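As a rough illustration of this transcription, the sketch below turns tab-delimited phenotype rows into subject, relation, object links, writing each phenotype as a post-composed entity-quality term in the `QUALITY^OBO_REL:inheres_in(ENTITY)` style used by OBO tools. Everything else is assumed: the four-column layout, the relation names `involves_gene` and `exhibits`, and all identifiers are hypothetical placeholders, not ZFIN's actual file format or the loader's actual OBD output.

```python
import csv
import io

# Hypothetical column order; the real ZFIN download files document their own layout.
COLUMNS = ["gene_id", "genotype_id", "entity_id", "quality_id"]


def zfin_rows_to_links(tsv_text: str) -> list[tuple[str, str, str]]:
    """Transcribe tab-delimited phenotype rows into (subject, relation, object) links.

    The phenotype becomes a post-composed term QUALITY^OBO_REL:inheres_in(ENTITY).
    "involves_gene" and "exhibits" are placeholder relation names.
    """
    links = []
    reader = csv.DictReader(io.StringIO(tsv_text), fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        phenotype = f"{row['quality_id']}^OBO_REL:inheres_in({row['entity_id']})"
        links.append((row["genotype_id"], "involves_gene", row["gene_id"]))
        links.append((row["genotype_id"], "exhibits", phenotype))
    return links


# One made-up row: gene, genotype, anatomical entity, quality.
for link in zfin_rows_to_links("GENE-1\tGENO-1\tEX:entity1\tEX:quality1\n"):
    print(link)
```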

Loads the data from the curated NeXML files into the database

The data loader transforms the curated data from NeXML syntax into a set of relational tuples (records), which are then inserted into the database one after another.
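The NeXML-to-tuples step can be sketched with Python's standard XML parser. This is a simplified illustration under several assumptions: the sample document is made up and trimmed (it omits the `<format>` block and is not schema-valid), only taxon, character, and state ids are extracted (Phenex phenotype annotations are ignored), and an in-memory SQLite table named `cell` stands in for the real OBD schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Default namespace of NeXML documents.
NEX = "{http://www.nexml.org/2009}"

# Simplified, made-up matrix: one taxon scored for two characters.
SAMPLE = """<nexml xmlns="http://www.nexml.org/2009" version="0.9">
  <otus id="taxa1"><otu id="t1" label="Danio rerio"/></otus>
  <characters id="m1" otus="taxa1">
    <matrix>
      <row id="r1" otu="t1">
        <cell char="c1" state="s1"/>
        <cell char="c2" state="s3"/>
      </row>
    </matrix>
  </characters>
</nexml>"""


def nexml_to_tuples(xml_text: str) -> list[tuple[str, str, str]]:
    """Flatten matrix cells into (taxon label, character id, state id) tuples."""
    root = ET.fromstring(xml_text)
    labels = {otu.get("id"): otu.get("label") for otu in root.iter(NEX + "otu")}
    tuples = []
    for row in root.iter(NEX + "row"):
        taxon = labels.get(row.get("otu"))
        for cell in row.iter(NEX + "cell"):
            tuples.append((taxon, cell.get("char"), cell.get("state")))
    return tuples


def insert_tuples(conn: sqlite3.Connection, tuples) -> None:
    # Hypothetical table; the real loader writes into the OBD schema.
    conn.execute("CREATE TABLE IF NOT EXISTS cell (taxon TEXT, char_id TEXT, state_id TEXT)")
    conn.executemany("INSERT INTO cell VALUES (?, ?, ?)", tuples)
    conn.commit()


conn = sqlite3.connect(":memory:")
insert_tuples(conn, nexml_to_tuples(SAMPLE))
print(conn.execute("SELECT COUNT(*) FROM cell").fetchone()[0])  # 2
```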

Logs incomplete annotations to a log file

Incomplete annotations contain null values for the taxon, the phenotype, or both. The data loader does not load these annotations from the NeXML files into the database; instead, it logs them separately for each file in the Problem Log Format. Curators can then complete these annotations, which will be loaded into the database the next time the data loader runs.
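The filtering rule described above (an annotation is incomplete if its taxon or phenotype is null) can be sketched as a simple partition. This is an assumption-laden illustration: the `Annotation` class, the `split_annotations` function, and the tab-separated log line are hypothetical, and the real Problem Log Format is documented on its own page.

```python
import io
from dataclasses import dataclass
from typing import Optional, TextIO


@dataclass
class Annotation:
    taxon: Optional[str]      # None when the curator has not yet assigned a taxon
    phenotype: Optional[str]  # None when no phenotype has been annotated


def split_annotations(source_file: str, annotations, log: TextIO) -> list[Annotation]:
    """Return complete annotations; write incomplete ones to the per-file log."""
    complete = []
    for ann in annotations:
        missing = [field for field in ("taxon", "phenotype") if getattr(ann, field) is None]
        if missing:
            # Hypothetical log line; the real Problem Log Format is defined elsewhere.
            log.write(f"{source_file}\tmissing {'+'.join(missing)}\t{ann.taxon}\t{ann.phenotype}\n")
        else:
            complete.append(ann)
    return complete


log = io.StringIO()
kept = split_annotations(
    "example.xml",
    [Annotation("Danio rerio", "EX:p1"), Annotation(None, "EX:p2"), Annotation(None, None)],
    log,
)
print(len(kept))  # 1
print(log.getvalue())
```

Only the complete annotations would go on to the database insert; the log is what curators work from.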

Reasons with the data

Lastly, the data loader invokes the OBD Reasoner to infer new assertions that are implicit in the existing ones. These inferred assertions are also added to the database.
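To make "implicit knowledge" concrete, the sketch below applies a single rule, transitivity of `is_a`: if A is_a B and B is_a C, then A is_a C. The OBD Reasoner's actual rule set is not described here and is certainly broader; the function `infer_transitive` and the anatomy labels are illustrative only and are not taken from TAO.

```python
def infer_transitive(links, relation="is_a"):
    """Return links implied by transitivity of `relation` that are not already asserted."""
    asserted = set(links)
    parents = {}
    for subject, rel, obj in asserted:
        if rel == relation:
            parents.setdefault(subject, set()).add(obj)
    inferred = set()
    for start in parents:
        # Depth-first walk collecting every ancestor reachable from `start`.
        stack, seen = list(parents[start]), set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(parents.get(node, ()))
        for ancestor in seen:
            link = (start, relation, ancestor)
            if link not in asserted:
                inferred.add(link)
    return inferred


# Illustrative labels, not copied from any ontology.
asserted = [
    ("pectoral fin", "is_a", "paired fin"),
    ("paired fin", "is_a", "fin"),
    ("fin", "is_a", "anatomical structure"),
]
for link in sorted(infer_transitive(asserted)):
    print(link)
```

The inferred links are kept separate from the asserted ones so that they could be flagged as inferred when added to the database.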

Invoking the Phenoscape Data Loader

The data loader can be invoked manually by anyone with key-based access to the dev application server at NESCent. Currently these are JB, HL, JA, CK, and possibly TJV. To invoke the data loader, navigate to the scripts subdirectory of the installation directory (/usr/local/projects/phenoscape-data-loader) and run the command shown below. This procedure has been tested with JA.

<bash> sh refreshPhenoscapeDB </bash>

Future plans and concerns

We are working towards turning the database load into a nightly cron job. However, the entire process takes about 70 minutes to complete, and this time can be expected to grow as more curations are loaded. Keeping the main database offline for that long would be inadvisable, so the data loader updates a copy of the database instead. At present, the database administrator must then copy this database to the main database, a step that takes a few seconds. It would be desirable to automate this step as well, but doing so raises access-privilege concerns for the cron process. We are currently investigating feasible ways to accomplish this.
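The load-into-a-copy-then-swap pattern can be sketched as below. This is only an illustration of the idea under stated assumptions: SQLite files stand in for the real databases, `refresh_via_copy` and its `load` callback are hypothetical names, and the administrator's actual copy step may work quite differently. The point is that the slow load touches only the staging copy, and the main database is replaced in one short final step.

```python
import os
import sqlite3


def refresh_via_copy(main_path: str, staging_path: str, load) -> None:
    """Run the slow load against a staging copy, then swap it in for the main database."""
    if os.path.exists(staging_path):
        os.remove(staging_path)  # start each run from an empty staging database
    conn = sqlite3.connect(staging_path)
    try:
        load(conn)  # long-running step; readers of main_path are unaffected meanwhile
        conn.commit()
    finally:
        conn.close()
    # os.replace is atomic when both paths are on the same file system (POSIX).
    os.replace(staging_path, main_path)
```

Automating the final swap from cron is exactly where the access-privilege question arises, since the cron process would need rights to replace the main database.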

OBD API Documentation

For code-specific details, see Specific details of OBD related classes and interfaces, as documented by Cartik Kothari. These pages will be updated frequently and are meant to be used as an addendum to the OBOEdit Javadoc.