Data Repository and Data Services

Revision as of 15:11, 11 November 2008

Data loader

A data loader application that refreshes the data in the Phenoscape database on a daily basis is under development. The application is being developed as a Perl module which:

  • Downloads curated NeXML files from the Phenoscape SVN repositories
    • Ichthyologists curate scientific publications using the Phenex character matrix annotator, and these curations are stored in an SVN repository. The data loader downloads these data files daily for subsequent loading into the relational database.
  • Drops and recreates the database
  • Loads the requisite ontologies into the database
    • The annotations of character matrices use terms that are defined in life science ontologies. These ontologies include the Teleost Taxonomy Ontology (TTO, http://bioportal.bioontology.org/ontologies/38703), which contains definitions of taxa; the Teleost Anatomy Ontology (TAO, https://www.nescent.org/phenoscape/Teleost_Anatomy_Ontology), which contains definitions of teleost-specific anatomical characteristics; and the Phenotype and Trait Ontology (PATO, http://bioontology.org/wiki/index.php/PATO:Main_Page), which contains definitions of phenotypic qualities. The data loader loads these definitions into the relational database.
  • Loads the data from the curated NeXML files into the database
    • The data loader transforms the curated data from NeXML syntax into a set of relational tuples (records), which are then inserted sequentially into the database.
  • Invokes the OBD reasoner to elicit implicit information from the data and adds it to the database
    • The OBD reasoner uses definitions of transitive relations, relation hierarchies, and relation compositions to infer implicit information. These inferences are added to the relational database in the final step.
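
The kind of inference performed in the last step can be illustrated with a minimal sketch. This is not the OBD reasoner's implementation: the tuple layout, the `infer` function, and the two composition rules are assumptions chosen for illustration. The first two facts come from the Term info example on this page; the other two are hypothetical and exist only to give the rules something to chain over.

```python
from itertools import product

# Relations treated as transitive in this sketch (an assumption).
TRANSITIVE = {"is_a", "part_of"}

# Assumed composition rules: (r1, r2) -> r3 means that
# "x r1 y" and "y r2 z" together imply "x r3 z".
COMPOSITIONS = {
    ("is_a", "part_of"): "part_of",
    ("part_of", "is_a"): "part_of",
}


def infer(facts):
    """Return the closure of (subject, relation, object) tuples under the rules above."""
    closed = set(facts)
    while True:
        new = set()
        for (s1, r1, o1), (s2, r2, o2) in product(closed, repeat=2):
            if o1 != s2:
                continue  # the two facts do not chain
            if r1 == r2 and r1 in TRANSITIVE:
                new.add((s1, r1, o2))
            elif (r1, r2) in COMPOSITIONS:
                new.add((s1, COMPOSITIONS[(r1, r2)], o2))
        if new <= closed:
            return closed  # fixed point reached: nothing new was derived
        closed |= new


asserted = {
    ("caudal-fin stay", "is_a", "bone"),
    ("caudal-fin stay", "part_of", "caudal fin skeleton"),
    # Hypothetical facts, not taken from TAO.
    ("caudal fin skeleton", "part_of", "caudal fin"),
    ("bone", "is_a", "skeletal element"),
}

# Only the implicit tuples; these are what would be added to the database.
inferred = infer(asserted) - asserted
```

Here `inferred` holds two tuples: the caudal-fin stay is `part_of` the caudal fin and `is_a` skeletal element. Iterating to a fixed point keeps the sketch short; a real reasoner would likely use an indexed or incremental strategy.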

Web services

Each service may support multiple media types. The desired media type can be specified by appending ?media=json or similar to the request URL. URI specifications are defined (loosely) using URI Templates.
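
As an illustration of the media-type convention, the sketch below appends `media=json` to a request URL. The helper name `with_media` and the `example.org` base URL are assumptions; only the `?media=json` parameter comes from the description above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_media(url, media="json"):
    """Return url with a media=<media> query parameter, replacing any existing one."""
    parts = urlsplit(url)
    # Keep every existing parameter except a previous "media" value.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "media"]
    query.append(("media", media))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical base URI; the real one is whatever <BASE URI> resolves to.
example = with_media("http://example.org/phenoscape/term/TAO:0001700")
# example == "http://example.org/phenoscape/term/TAO:0001700?media=json"
```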

Term info

URI

<BASE URI>/term/{term_id}

Returns

JSON: <javascript> {

   "id" : "TAO:0001700",
   "name" : "caudal-fin stay",
   "definition" : "Bone that is located anterior to the caudal procurrent rays. Caudal fin stays are unpaired bone.",
   "parents" :
   [
       {
           "relation" : "OBO_REL:is_a",
           "id" : "TAO:0001514",
           "name" : "bone"
       },
       {
           "relation" : "OBO_REL:part_of",
           "id" : "TAO:0000862",
           "name" : "caudal fin skeleton"
       }
   ],
   "children" : [] // if there are children, this content should be in the same format as the parents list

} // open question: how should xrefs, property_value definitions, etc. be represented? </javascript>

OWL-RDF:

Todo...

Error

If there is no term with the given ID, the service should return "404 Not Found".
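
A client might interpret the two outcomes roughly as follows. This is a sketch under assumptions: `read_term` is a hypothetical helper that takes an already-received status code and body (no HTTP library is implied), and `SAMPLE` is the JSON example above with the comments removed so that it parses as strict JSON.

```python
import json


def read_term(status, body):
    """Interpret a term info response: None for 404, otherwise a small summary dict."""
    if status == 404:
        return None  # no term with the given ID
    if status != 200:
        raise ValueError(f"unexpected HTTP status {status}")
    term = json.loads(body)
    # Group parent IDs by the relation that links the term to them.
    parents = {}
    for parent in term.get("parents", []):
        parents.setdefault(parent["relation"], []).append(parent["id"])
    return {"id": term["id"], "name": term["name"], "parents": parents}


SAMPLE = """
{
  "id": "TAO:0001700",
  "name": "caudal-fin stay",
  "parents": [
    {"relation": "OBO_REL:is_a", "id": "TAO:0001514", "name": "bone"},
    {"relation": "OBO_REL:part_of", "id": "TAO:0000862", "name": "caudal fin skeleton"}
  ],
  "children": []
}
"""

summary = read_term(200, SAMPLE)
missing = read_term(404, "")
```

Here `summary["parents"]` maps `OBO_REL:is_a` to `["TAO:0001514"]` and `OBO_REL:part_of` to `["TAO:0000862"]`, and `missing` is `None`.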

Handling of anonymous post-compositions

Autocomplete

URI

<BASE URI>/term/search/{text}?name=[true|false]&syn=[true|false]&def=[true|false]&ontology=[ont1;ont2;...]

All URI parameters are optional. Default values are name=true, syn=false, def=false. The "ontology" parameter should be a semicolon-separated list of ontology prefixes to search within. If not given, the default is to search all ontologies.
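
The template and its defaults can be exercised with a small URI builder. The function name, its argument names, and the `example.org` base URI are assumptions; the path, the parameter names, and the semicolon-separated ontology list follow the template above. Parameters left unset are omitted so that the service defaults apply.

```python
from urllib.parse import quote


def autocomplete_uri(base_uri, text, name=None, syn=None, definition=None, ontologies=None):
    """Build a term search URI; arguments left as None are omitted from the query."""
    params = []
    # "def" is a Python keyword, so the argument is called "definition" here.
    for key, flag in (("name", name), ("syn", syn), ("def", definition)):
        if flag is not None:
            params.append(f"{key}={'true' if flag else 'false'}")
    if ontologies:
        # Ontology prefixes are joined with semicolons, as the template specifies.
        params.append("ontology=" + ";".join(quote(o, safe="") for o in ontologies))
    uri = f"{base_uri}/term/search/{quote(text, safe='')}"
    return uri + ("?" + "&".join(params) if params else "")


BASE = "http://example.org/phenoscape"  # hypothetical base URI

plain = autocomplete_uri(BASE, "caudal fin")
narrowed = autocomplete_uri(BASE, "bone", syn=True, ontologies=["TAO", "PATO"])
# plain    == "http://example.org/phenoscape/term/search/caudal%20fin"
# narrowed == "http://example.org/phenoscape/term/search/bone?syn=true&ontology=TAO;PATO"
```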

Returns

JSON: <javascript> [

   {   // overall format
       "id" : "TAO:0001514",
       "name" : "bone",
       "match_type" : "name" | "syn" | "def",
       "match_text" : "this is the term name, synonym name, or definition that matched"
   },
   {   // a name example
       "id" : "TAO:0001514",
       "name" : "bone",
       "match_type" : "name",
       "match_text" : "bone"
   },
   {   // a synonym example
       "id" : "TAO:0001795",
       "name" : "ceratohyal foramen",
       "match_type" : "syn",
       "match_text" : "bericiform foramen"
   },
   {   // a definition example
       "id" : "TAO:0000488",
       "name" : "ceratobranchial bone",
       "match_type" : "def",
       "match_text" : "Ceratobranchials are bilaterally paired cartilage bones that form part of the ventral branchial arches. They articulate medially with the hypobranchials and laterally and dorsally with the epibranchials.  Ceratobranchials 1-5 ossify in the ceratobranchial cartilages."
   }

] </javascript>

An array is valid as the outermost value in JSON (RFC 4627 allows a JSON text to be an object or an array), so the list above can be returned as-is. If a wrapper is preferred anyway, it should be an object with a single "matches" key.

Error

If there are no terms matching the given input, a document should still be returned, containing an empty results list.
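
On the client side, the response could be read as sketched below. The helper names are hypothetical. Because the outer shape is discussed above as either a bare array or an object with a single "matches" key, the sketch accepts both, and an empty result is treated as a normal response rather than an error.

```python
import json


def read_matches(body):
    """Return the list of matches from a bare array or a {"matches": [...]} wrapper."""
    doc = json.loads(body)
    if isinstance(doc, dict):
        doc = doc.get("matches", [])
    if not isinstance(doc, list):
        raise ValueError("expected a JSON array of matches")
    return doc


def group_by_match_type(matches):
    """Group matched term IDs by match_type ("name", "syn", or "def")."""
    grouped = {"name": [], "syn": [], "def": []}
    for match in matches:
        grouped[match["match_type"]].append(match["id"])
    return grouped


body = """
[
  {"id": "TAO:0001514", "name": "bone",
   "match_type": "name", "match_text": "bone"},
  {"id": "TAO:0001795", "name": "ceratohyal foramen",
   "match_type": "syn", "match_text": "bericiform foramen"}
]
"""

grouped = group_by_match_type(read_matches(body))
no_hits = read_matches("[]")  # an empty list, not an error
```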