Difference between revisions of "Phenoscape data loader"
(New page: One of the chief objectives of the Phenoscape project is to present a centralized repository to store annotations entered by ichthyologists. These annotations can be queried from a user in...) |
(→Functioning of the Phenoscape Data Loader) |
||
(20 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
− | + | The Phenoscape Data Loader is being developed as a Perl module. This section offers an overview of the functioning of the Phenoscape Data Loader. The data loader has been installed on the dev application server at NESCent and can be found at /usr/local/projects/phenoscape-data-loader | |
− | == | + | ==The Phenoscape data loader - A step by step review== |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | + | The Phenoscape data loader performs the following steps in sequence | |
− | |||
+ | ===Downloads curated NeXML files from the Phenoscape SVN repositories=== | ||
+ | Ichthyologists curate scientific publications using the [[Phenex]] character matrix annotator and these curations are stored in an SVN repository. The data loader downloads these data files for subsequent uploading into the relational database | ||
+ | |||
+ | ===Loads the requisite ontologies into the database=== | ||
+ | The annotations of character matrices use terms that are defined in life science ontologies. These ontologies include the [http://bioportal.bioontology.org/ontologies/38703 Teleost Taxonomy Ontology (TTO)], the [http://www.obofoundry.org/ro/ Relations Ontology], the [[Teleost_Anatomy_Ontology|Teleost Anatomy Ontology (TAO)]], and the [http://bioontology.org/wiki/index.php/PATO:Main_Page Phenotype and Trait Ontology (PATO)]. The data loader loads all these definitions into the relational database | ||
+ | |||
+ | ===Loads the data from ZFIN model organism database === | ||
+ | [http://zfin.org ZFIN] hosts data relating mutant phenotypes of the ''Danio Rerio'' (Zebrafish) organism to specific genes and genotypes. The data loader transcribes this data into OBD format and loads it into the database | ||
+ | |||
+ | ===Loads the data from the curated NeXML files into the database=== | ||
+ | The data loader transforms the curated data from NeXML syntax to a set of relational tuples (records), which are then sequentially inserted into the database | ||
+ | |||
+ | ===Reasons with the data === | ||
+ | The data loader invokes the [[OBD Reasoner]] to infer implicit knowledge from the assertions, in the form of new assertions. These inferred assertions are also added to the database. | ||
+ | |||
+ | ===Populates the data warehouse=== | ||
+ | The data warehouse is necessary for speedy execution of queries invoked from the user interface. The data warehouse is populated by running stored procedures on the data, after the reasoning process has completed. As an additional step, the loader invokes a labelmaker procedure, which generates readable labels for "post composed" terms | ||
+ | |||
+ | ===Loads the homology data=== | ||
+ | Homology data from the Phenoscape SVN files are loaded into the database. This includes assertions about homologies, the publications from which these are sourced, and the evidence codes used for these assertions. | ||
+ | |||
+ | ==Refreshing the data with the Phenoscape data loader== | ||
+ | |||
+ | The Phenoscape data loader runs as a cron job on the kb-dev server every Friday at 6 PM EST. Given that the application WAR points to one of two copies of the Phenoscape knowledgebase, the data loader refreshes the other copy of the knowledgebase, so the application is never "offline". After the data refresh is completed, the application WAR is redeployed so that it points to the newer version of the data. | ||
+ | |||
+ | ==[[OBD API Documentation]]== | ||
For code specific details | For code specific details | ||
− | + | Specific details of OBD related classes and interfaces as documented by CK. These will be updated very often and are meant to be used as an addendum to the [http://oboedit.org/?page=javadocs The OBOEdit Javadoc] | |
− | + | ||
+ | [[Category:Informatics]] | ||
+ | [[Category:Database]] |
Latest revision as of 16:05, 29 October 2009
The Phenoscape Data Loader is being developed as a Perl module. This section offers an overview of the functioning of the Phenoscape Data Loader. The data loader has been installed on the dev application server at NESCent and can be found at /usr/local/projects/phenoscape-data-loader
Contents
- 1 The Phenoscape data loader - A step by step review
- 1.1 Downloads curated NeXML files from the Phenoscape SVN repositories
- 1.2 Loads the requisite ontologies into the database
- 1.3 Loads the data from ZFIN model organism database
- 1.4 Loads the data from the curated NeXML files into the database
- 1.5 Reasons with the data
- 1.6 Populates the data warehouse
- 1.7 Loads the homology data
- 2 Refreshing the data with the Phenoscape data loader
- 3 OBD API Documentation
The Phenoscape data loader - A step by step review
The Phenoscape data loader performs the following steps in sequence
Downloads curated NeXML files from the Phenoscape SVN repositories
Ichthyologists curate scientific publications using the Phenex character matrix annotator and these curations are stored in an SVN repository. The data loader downloads these data files for subsequent uploading into the relational database
Loads the requisite ontologies into the database
The annotations of character matrices use terms that are defined in life science ontologies. These ontologies include the Teleost Taxonomy Ontology (TTO), the Relations Ontology, the Teleost Anatomy Ontology (TAO), and the Phenotype and Trait Ontology (PATO). The data loader loads all these definitions into the relational database
Loads the data from ZFIN model organism database
ZFIN hosts data relating mutant phenotypes of the Danio Rerio (Zebrafish) organism to specific genes and genotypes. The data loader transcribes this data into OBD format and loads it into the database
Loads the data from the curated NeXML files into the database
The data loader transforms the curated data from NeXML syntax to a set of relational tuples (records), which are then sequentially inserted into the database
Reasons with the data
The data loader invokes the OBD Reasoner to infer implicit knowledge from the assertions, in the form of new assertions. These inferred assertions are also added to the database.
Populates the data warehouse
The data warehouse is necessary for speedy execution of queries invoked from the user interface. The data warehouse is populated by running stored procedures on the data, after the reasoning process has completed. As an additional step, the loader invokes a labelmaker procedure, which generates readable labels for "post composed" terms
Loads the homology data
Homology data from the Phenoscape SVN files are loaded into the database. This includes assertions about homologies, the publications from which these are sourced, and the evidence codes used for these assertions.
Refreshing the data with the Phenoscape data loader
The Phenoscape data loader runs as a cron job on the kb-dev server every Friday at 6 PM EST. Given that the application WAR points to one of two copies of the Phenoscape knowledgebase, the data loader refreshes the other copy of the knowledgebase, so the application is never "offline". After the data refresh is completed, the application WAR is redeployed so that it points to the newer version of the data.
OBD API Documentation
For code specific details Specific details of OBD related classes and interfaces as documented by CK. These will be updated very often and are meant to be used as an addendum to the The OBOEdit Javadoc