Data Repository and Data Services

One of the chief objectives of the Phenoscape project is to provide a centralized repository that stores annotations entered by ichthyologists. These annotations can be queried through a user interface and used to answer [[Driving_Research_Questions]] and to support advanced [[Phenoscape_use_cases]]. The following tools, which are at various stages of development, will serve this objective.
 
  
==Phenoscape Data loader==
 
A data loader application that refreshes the data in the Phenoscape database daily is under development. It is being written as a Perl module that performs the following steps, in order:
 
* '''Downloads curated NeXML files from the Phenoscape SVN repositories'''
 
** Ichthyologists curate scientific publications using the [[Phenex]] character matrix annotator, and these curations are stored in an SVN repository. The data loader downloads these data files daily for subsequent loading into the relational database.
 
* '''Drops and recreates the database'''
 
* '''Loads the requisite ontologies into the database'''
 
** The annotations of character matrices use terms defined in life science ontologies, including the [http://bioportal.bioontology.org/ontologies/38703 Teleost Taxonomy Ontology (TTO)], the [http://www.obofoundry.org/ro/ Relations Ontology], the [[Teleost_Anatomy_Ontology|Teleost Anatomy Ontology (TAO)]], and the [http://bioontology.org/wiki/index.php/PATO:Main_Page Phenotype and Trait Ontology (PATO)]. The data loader loads the definitions from all of these ontologies into the relational database.
 
* '''Loads the data from the curated NeXML files into the database'''
 
** The data loader transforms the curated data from NeXML syntax into a set of relational tuples (records), which are then inserted into the database one by one.
 
* '''Invokes the OBD reasoner to infer implicit information from the data and adds the inferences to the database'''
 
** The OBD reasoner uses definitions of transitive relations, relation hierarchies, and relation compositions to infer implicit information. These inferences are then added to the relational database.
 
* '''Logs incomplete annotations to a log file'''
 
** Incomplete annotations are those with a null value for the taxon, the phenotype, or both. The data loader does not load them into the database; instead, it logs them, file by file, in the [[Problem Log Format]]. Curators can then complete these annotations, which will be loaded into the database the next time the data loader runs.
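The incomplete-annotation rule can be illustrated with a short sketch. It is written in JavaScript rather than the loader's Perl, and the flat record shape (<code>file</code>, <code>taxon</code>, <code>phenotype</code>) and the function name <code>partitionAnnotations</code> are hypothetical simplifications, not the loader's actual data model. It only shows the partitioning logic: complete annotations go to the load list, and incomplete ones are grouped by source file for logging.

```javascript
// Hypothetical sketch (JavaScript, not the loader's Perl): split parsed
// annotations into records to load and per-file lists of incomplete ones.
// The record shape { file, taxon, phenotype } is an assumption.
function partitionAnnotations(annotations) {
  const toLoad = [];
  const problemsByFile = new Map(); // source file -> incomplete annotations

  for (const a of annotations) {
    const missingTaxon = a.taxon == null; // true for null or undefined
    const missingPhenotype = a.phenotype == null;
    if (!missingTaxon && !missingPhenotype) {
      toLoad.push(a);
      continue;
    }
    // Note what is missing so a curator knows what to complete.
    let missing = "phenotype";
    if (missingTaxon && missingPhenotype) missing = "taxon and phenotype";
    else if (missingTaxon) missing = "taxon";
    if (!problemsByFile.has(a.file)) problemsByFile.set(a.file, []);
    problemsByFile.get(a.file).push({ ...a, missing });
  }
  return { toLoad, problemsByFile };
}

// Usage with made-up file names and placeholder IDs.
const result = partitionAnnotations([
  { file: "matrix1.xml", taxon: "TTO:1", phenotype: "P1" },
  { file: "matrix1.xml", taxon: null, phenotype: "P2" },
  { file: "matrix2.xml", taxon: "TTO:2", phenotype: null },
]);
console.log(result.toLoad.length); // 1
console.log(result.problemsByFile.get("matrix1.xml")[0].missing); // taxon
```

Keeping the problems keyed by file mirrors the file-specific logging described above, so each curator only needs to look at the entries for the matrices they maintain.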
 
 
For status updates
 
* '''[[Cartik's notes on the Phenoscape Data Loader]]'''
 
 
For code-specific details
 
* '''[[OBD API Documentation]]'''
 
** Details of the OBD-related classes and interfaces, as documented by Cartik Kothari. These will be updated frequently and are meant to be used as an addendum to the [http://oboedit.org/?page=javadocs OBOEdit Javadoc].
 
 
==Web services==
 
Each service may support multiple media types. The desired media type can be requested with a <code>media</code> query parameter, for example by appending <code>?media=json</code> to the request URL (or <code>&media=json</code> if the URL already contains a query string). URI specifications are loosely defined using [http://bitworking.org/projects/URI-Templates/draft-gregorio-uritemplate-00.html URI Templates].
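A minimal sketch of adding the media-type parameter from a JavaScript client, using the standard <code>URL</code> class so that <code>?</code> versus <code>&</code> is handled automatically. The function name <code>withMediaType</code> and the base URI <code>http://example.org/phenoscape</code> are hypothetical placeholders; the real <code><BASE URI></code> is not given on this page.

```javascript
// Sketch: add the media-type query parameter to a service URL.
// URL.searchParams.set() inserts "?media=..." or "&media=..." as needed
// and replaces any media value that is already present.
function withMediaType(serviceUrl, mediaType) {
  const url = new URL(serviceUrl);
  url.searchParams.set("media", mediaType);
  return url.toString();
}

const BASE_URI = "http://example.org/phenoscape"; // hypothetical placeholder
console.log(withMediaType(`${BASE_URI}/term/TAO:0001700`, "json"));
// http://example.org/phenoscape/term/TAO:0001700?media=json
console.log(withMediaType(`${BASE_URI}/term/search?text=bone`, "json"));
// http://example.org/phenoscape/term/search?text=bone&media=json
```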
 
===Term info===
 
'''URI'''
 
 
<BASE URI>/term/{term_id}
 
 
'''Returns'''
 
 
JSON:
 
<javascript>
{
    "id" : "TAO:0001700",
    "name" : "caudal-fin stay",
    "definition" : "Bone that is located anterior to the caudal procurrent rays. Caudal fin stays are unpaired bone.",
    "parents" :
    [
        {
            "relation" : {
                "id" : "OBO_REL:is_a",
                "name" : "is_a"
            },
            "target" : {
                "id" : "TAO:0001514",
                "name" : "bone"
            }
        },
        {
            "relation" : {
                "id" : "OBO_REL:part_of",
                "name" : "part_of"
            },
            "target" : {
                "id" : "TAO:0000862",
                "name" : "caudal fin skeleton"
            }
        }
    ],
    "children" : [] // if there are children, this content should be in the same format as the parents list
}
// how should xrefs, etc. be represented, property_value definitions?
</javascript>
 
 
OWL-RDF:
 
 
Todo...
 
 
'''Error'''
 
 
If there is no term with the given ID, the service should return "404 Not Found".
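A minimal client-side sketch of consuming the Term info service: a 404 maps to "no such term", and the <code>parents</code> list is grouped by relation name. The function names are hypothetical, and the sketch takes the HTTP status and body text as arguments so it runs without network access. Note that a real response must be strict JSON, so the explanatory <code>//</code> comments in the example above would not be present.

```javascript
// Sketch: interpret a Term info response from its HTTP status and body text.
function parseTermResponse(status, bodyText) {
  if (status === 404) return null; // no term with the given ID
  if (status !== 200) throw new Error(`Unexpected HTTP status ${status}`);
  return JSON.parse(bodyText);
}

// Group a "parents" (or "children") list by relation name, e.g. is_a or part_of.
function targetsByRelation(links) {
  const grouped = {};
  for (const { relation, target } of links) {
    if (!grouped[relation.name]) grouped[relation.name] = [];
    grouped[relation.name].push(target.id);
  }
  return grouped;
}

// The example term from above, as strict JSON (no comments).
const body = JSON.stringify({
  id: "TAO:0001700",
  name: "caudal-fin stay",
  parents: [
    { relation: { id: "OBO_REL:is_a", name: "is_a" }, target: { id: "TAO:0001514", name: "bone" } },
    { relation: { id: "OBO_REL:part_of", name: "part_of" }, target: { id: "TAO:0000862", name: "caudal fin skeleton" } },
  ],
  children: [],
});
const term = parseTermResponse(200, body);
console.log(JSON.stringify(targetsByRelation(term.parents)));
// {"is_a":["TAO:0001514"],"part_of":["TAO:0000862"]}
console.log(parseTermResponse(404, "")); // null
```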
 
 
====Handling of anonymous post-compositions====
 
 
===Autocomplete===
 
'''URI'''
 
 
<BASE URI>/term/search?text=[input]&name=[true|false]&syn=[true|false]&def=[true|false]&ontology=[ont1,ont2,...]
 
 
All URI parameters except <code>text</code> are optional. The defaults are name=true, syn=false, and def=false. The "ontology" parameter should be a comma-separated list of the ontology prefixes to search within; if it is omitted, all ontologies are searched. Specifying "ZFIN" as the ontology should search gene nodes by gene name.
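A sketch of building an Autocomplete request URL from a JavaScript client. The function name <code>autocompleteUrl</code>, the options object, and the base URI are hypothetical. Parameters equal to the documented defaults are simply omitted. <code>URLSearchParams</code> encodes a space as <code>+</code> and a comma as <code>%2C</code>, which standard form decoding on the server turns back into the original characters.

```javascript
// Sketch: build an Autocomplete URL, omitting parameters that equal the
// documented defaults (name=true, syn=false, def=false).
function autocompleteUrl(baseUri, text, options = {}) {
  if (!text) throw new Error("text is required");
  const { name = true, syn = false, def = false, ontologies = [] } = options;
  const url = new URL(`${baseUri}/term/search`);
  url.searchParams.set("text", text);
  if (name !== true) url.searchParams.set("name", String(name));
  if (syn !== false) url.searchParams.set("syn", String(syn));
  if (def !== false) url.searchParams.set("def", String(def));
  // Leaving out "ontology" means "search all ontologies".
  if (ontologies.length > 0) url.searchParams.set("ontology", ontologies.join(","));
  return url.toString();
}

console.log(autocompleteUrl("http://example.org/phenoscape", "caudal fin",
  { syn: true, ontologies: ["TAO", "PATO"] }));
// http://example.org/phenoscape/term/search?text=caudal+fin&syn=true&ontology=TAO%2CPATO
```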
 
 
'''Returns'''
 
 
JSON:
 
<javascript>
{
    "matches" : [
        {  // overall format
            "id" : "TAO:0001514",
            "name" : "bone",
            "match_type" : "name" | "syn" | "def",
            "match_text" : "this is the term name, synonym name, or definition that matched"
        },
        {  // a name example
            "id" : "TAO:0001514",
            "name" : "bone",
            "match_type" : "name",
            "match_text" : "bone"
        },
        {  // a synonym example
            "id" : "TAO:0001795",
            "name" : "ceratohyal foramen",
            "match_type" : "syn",
            "match_text" : "bericiform foramen"
        },
        {  // a definition example
            "id" : "TAO:0000488",
            "name" : "ceratobranchial bone",
            "match_type" : "def",
            "match_text" : "Ceratobranchials are bilaterally paired cartilage bones that form part of the ventral branchial arches. They articulate medially with the hypobranchials and laterally and dorsally with the epibranchials.  Ceratobranchials 1-5 ossify in the ceratobranchial cartilages."
        }
    ]
}
</javascript>
 
 
'''Error'''
 
 
If no terms match the given input, a document should still be returned, containing an empty "matches" list.
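A small sketch of how a client might split Autocomplete results by <code>match_type</code>, for example to list name matches before synonym and definition matches. The function name <code>groupMatches</code> is hypothetical. Because the service returns an empty "matches" list rather than an error when nothing matches, the no-hit case needs no special handling.

```javascript
// Sketch: split Autocomplete matches by match_type. An empty "matches"
// list yields three empty groups; unknown match types are ignored.
function groupMatches(response) {
  const groups = { name: [], syn: [], def: [] };
  for (const m of response.matches || []) {
    if (Object.prototype.hasOwnProperty.call(groups, m.match_type)) {
      groups[m.match_type].push(m);
    }
  }
  return groups;
}

const grouped = groupMatches({
  matches: [
    { id: "TAO:0001514", name: "bone", match_type: "name", match_text: "bone" },
    { id: "TAO:0001795", name: "ceratohyal foramen", match_type: "syn", match_text: "bericiform foramen" },
  ],
});
console.log(grouped.name.length, grouped.syn.length, grouped.def.length); // 1 1 0
console.log(groupMatches({ matches: [] }).name.length); // 0
```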
 
 
===Anatomy search summary===
 
'''URI'''
 
 
<BASE URI>/phenotypes/summary/anatomy/{term_id}
 
 
<code>term_id</code> is the ID of the anatomy term to search for. This service returns a summary of all the phenotype annotations involving that anatomy term (or its descendants, via reasoning). The summary is grouped by quality attribute term: all annotations whose qualities descend from the same nearest attribute should be counted together.
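The grouping rule can be sketched in a few lines of JavaScript. This is a hypothetical illustration, not the service's implementation. It makes two assumptions: a <code>parents</code> map (anatomy term ID to direct parent IDs, standing in for whatever relations the reasoner treats as subsumption, such as is_a and part_of) and an <code>attributeOf</code> map (quality ID to nearest attribute ID). Both stand in for data the database and reasoner would provide. The sketch keeps annotations whose entity is the search term or one of its descendants, then counts annotations and distinct taxa per attribute in the shape of the "taxon_annotations" object. Genotype counts would presumably be computed the same way with genotype IDs in place of taxon IDs.

```javascript
// Hypothetical inputs:
//   parents:     anatomy term ID -> direct parent IDs
//   attributeOf: quality ID -> nearest attribute ID

// True if termId equals ancestorId or reaches it by following parent links.
function isSelfOrDescendant(termId, ancestorId, parents) {
  const seen = new Set();
  const stack = [termId];
  while (stack.length > 0) {
    const current = stack.pop();
    if (current === ancestorId) return true;
    if (seen.has(current)) continue; // guard against cycles
    seen.add(current);
    for (const p of parents[current] || []) stack.push(p);
  }
  return false;
}

// Count annotations and distinct taxa per attribute for one anatomy term.
function summarizeByAttribute(anatomyTermId, annotations, parents, attributeOf) {
  const byAttribute = new Map(); // attribute ID -> { count, taxa }
  for (const a of annotations) {
    if (!isSelfOrDescendant(a.entity, anatomyTermId, parents)) continue;
    const attr = attributeOf[a.quality];
    if (attr === undefined) continue; // no known attribute: skipped in this sketch
    if (!byAttribute.has(attr)) byAttribute.set(attr, { count: 0, taxa: new Set() });
    const entry = byAttribute.get(attr);
    entry.count += 1;
    entry.taxa.add(a.taxon);
  }
  return [...byAttribute].map(([id, e]) => ({
    id,
    taxon_annotations: { annotation_count: e.count, taxon_count: e.taxa.size },
  }));
}

// Made-up qualities (Q:...) and attributes (A:...); the TAO parent links are
// the ones shown in the Term info example.
const parents = { "TAO:0001700": ["TAO:0001514", "TAO:0000862"] };
const attributeOf = { "Q:round": "A:shape", "Q:elongated": "A:shape", "Q:large": "A:size" };
const annotations = [
  { taxon: "T1", entity: "TAO:0001700", quality: "Q:round" },
  { taxon: "T2", entity: "TAO:0001700", quality: "Q:elongated" },
  { taxon: "T1", entity: "TAO:0001700", quality: "Q:large" },
  { taxon: "T3", entity: "X:unrelated", quality: "Q:round" }, // not under the search term
];
console.log(JSON.stringify(summarizeByAttribute("TAO:0000862", annotations, parents, attributeOf)));
// [{"id":"A:shape","taxon_annotations":{"annotation_count":2,"taxon_count":2}},{"id":"A:size","taxon_annotations":{"annotation_count":1,"taxon_count":1}}]
```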
 
 
'''Returns'''
 
 
JSON:
 
<javascript>
{
    "term" : { "id" : "TAO:xxxxx", "name" : "some anatomical part"},
    "qualities" : [
        {
            "id" : "PATO:000xxxxx",
            "name" : "shape",
            "taxon_annotations" : {
                "annotation_count": 5,
                "taxon_count" : 3
            },
            "genotype_annotations" : {
                "annotation_count" : 3,
                "genotype_count" : 2
            }
        },
        {
            // another quality attribute
        } // etc.
    ]
}
</javascript>
 
 
===Anatomy taxon annotations results===
 
'''URI'''
 
 
<BASE URI>/phenotypes/anatomy/{anatomy_term_id}/taxa/{quality_term_id}?attribute=[true|false]
 
 
<code>anatomy_term_id</code> and <code>quality_term_id</code> are the anatomy and quality search terms. This service returns all of the phenotype annotations involving that anatomy term (or its descendants, via reasoning) and that quality term (or its descendants, via reasoning). In the result, the values of the outer "entity" and "quality" keys echo the search inputs.
 
 
The "attribute" query parameter is optional. If it is "true", the service should first find the attribute corresponding to the given quality and search with that attribute instead of the given quality. The default is "false".
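A sketch of building URLs for the two anatomy annotation result services (the <code>taxa</code> and <code>genes</code> path segments). The function name <code>anatomyAnnotationsUrl</code> and the base URI are hypothetical. Term IDs are percent-encoded (":" becomes <code>%3A</code>) so that reserved characters cannot alter the path, and <code>attribute=true</code> is only sent when requested because "false" is the default.

```javascript
// Sketch: build a URL for the Anatomy taxon ("taxa") or genotype ("genes")
// annotations service. The base URI is a placeholder.
function anatomyAnnotationsUrl(baseUri, kind, anatomyTermId, qualityTermId, useAttribute = false) {
  if (kind !== "taxa" && kind !== "genes") throw new Error(`Unknown kind: ${kind}`);
  // Percent-encode the IDs so reserved characters cannot alter the path.
  const path = ["phenotypes", "anatomy", encodeURIComponent(anatomyTermId),
                kind, encodeURIComponent(qualityTermId)].join("/");
  const url = new URL(`${baseUri}/${path}`);
  // "false" is the documented default, so the parameter is only sent when true.
  if (useAttribute) url.searchParams.set("attribute", "true");
  return url.toString();
}

console.log(anatomyAnnotationsUrl("http://example.org/phenoscape", "taxa",
  "TAO:0001700", "PATO:0000001", true));
// http://example.org/phenoscape/phenotypes/anatomy/TAO%3A0001700/taxa/PATO%3A0000001?attribute=true
```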
 
 
'''Returns'''
 
 
JSON:
 
<javascript>
{
    "entity" : { "id" : "TAO:34242", "name" : "some anatomical part" },
    "quality" : { "id" : "PATO:1234", "name" : "some attribute quality"},
    "annotations" : [
        {
            "taxon" : { "id" : "TTO:34242", "name" : "some species" },
            "entity" : { "id" : "TAO:34242", "name" : "some anatomical part" },
            "quality" : { "id" : "PATO:34242", "name" : "some quality" }
        },
        {
            "taxon" : { "id" : "TTO:34242", "name" : "some species" },
            "entity" : { "id" : "TAO:34242", "name" : "some anatomical part" },
            "quality" : { "id" : "PATO:34242", "name" : "some quality" }
        },
        {
            "taxon" : { "id" : "TTO:34242", "name" : "some species" },
            "entity" : { "id" : "TAO:34242", "name" : "some anatomical part" },
            "quality" : { "id" : "PATO:34242", "name" : "some quality" }
        }
    ]
}
</javascript>
 
 
===Anatomy genotypes annotations results===
 
'''URI'''
 
 
<BASE URI>/phenotypes/anatomy/{anatomy_term_id}/genes/{quality_term_id}?attribute=[true|false]
 
 
<code>anatomy_term_id</code> and <code>quality_term_id</code> are the anatomy and quality search terms. This service returns all of the phenotype annotations for genotypes involving that anatomy term (or its descendants, via reasoning) and that quality term (or its descendants, via reasoning). In the result, the values of the outer "entity" and "quality" keys echo the search inputs.
 
 
The "attribute" query parameter is optional. If it is "true", the service should first find the attribute corresponding to the given quality and search with that attribute instead of the given quality. The default is "false".
 
 
'''Returns'''
 
 
JSON:
 
<javascript>
{
    "entity" : { "id" : "TAO:34242", "name" : "some anatomical part" },
    "quality" : { "id" : "PATO:1234", "name" : "some attribute quality"},
    "annotations" : [
        {
            "genotype" : { "id" : "ZFIN:34242", "name" : "some genotype" },
            "entity" : { "id" : "TAO:34242", "name" : "some anatomical part" },
            "quality" : { "id" : "PATO:34242", "name" : "some quality" }
        },
        {
            "genotype" : { "id" : "ZFIN:34242", "name" : "some genotype" },
            "entity" : { "id" : "TAO:34242", "name" : "some anatomical part" },
            "quality" : { "id" : "PATO:34242", "name" : "some quality" }
        },
        {
            "genotype" : { "id" : "ZFIN:34242", "name" : "some genotype" },
            "entity" : { "id" : "TAO:34242", "name" : "some anatomical part" },
            "quality" : { "id" : "PATO:34242", "name" : "some quality" }
        }
    ]
}
</javascript>
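A client will often want one row per genotype rather than one row per annotation. The sketch below, using the hypothetical function name <code>qualitiesByGenotype</code> and made-up placeholder IDs, collects the distinct quality IDs reported for each genotype in a response of the shape above. It assumes every annotation carries "genotype" and "quality" objects with an "id" key, as in the example.

```javascript
// Sketch: collect, per genotype, the distinct quality IDs in an Anatomy
// genotypes annotations response.
function qualitiesByGenotype(response) {
  const rows = new Map(); // genotype ID -> { name, qualities: Set of quality IDs }
  for (const { genotype, quality } of response.annotations) {
    if (!rows.has(genotype.id)) {
      rows.set(genotype.id, { name: genotype.name, qualities: new Set() });
    }
    rows.get(genotype.id).qualities.add(quality.id);
  }
  return rows;
}

// Placeholder IDs in the style of the example above.
const rows = qualitiesByGenotype({
  annotations: [
    { genotype: { id: "ZFIN:g1", name: "genotype 1" }, entity: { id: "TAO:e1" }, quality: { id: "PATO:q1" } },
    { genotype: { id: "ZFIN:g1", name: "genotype 1" }, entity: { id: "TAO:e1" }, quality: { id: "PATO:q2" } },
    { genotype: { id: "ZFIN:g2", name: "genotype 2" }, entity: { id: "TAO:e1" }, quality: { id: "PATO:q1" } },
  ],
});
console.log(rows.size); // 2
console.log([...rows.get("ZFIN:g1").qualities].join(", ")); // PATO:q1, PATO:q2
```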
 
 
===Gene search summary===
 
'''URI'''
 
 
<BASE URI>/phenotypes/summary/gene/{term_id}
 
 
<code>term_id</code> is the ID of the genotype to search for. This service returns all the phenotype annotations involving that genotype.
 
 
'''Returns'''
 
 
JSON:
 
<javascript>
{
    "term" : { "id" : "ZFIN:xxxxx", "name" : "some genotype name"},
    "annotations" : [
        {
            "entity" : { "id" : "TAO:34242", "name" : "some anatomical part" },
            "quality" : { "id" : "PATO:34242", "name" : "some quality" }
        },
        {
            "entity" : { "id" : "TAO:34242", "name" : "some anatomical part" },
            "quality" : { "id" : "PATO:34242", "name" : "some quality" }
        }
    ]
}
</javascript>
 
  
 
[[Category:Informatics]]
 

Latest revision as of 00:44, 9 January 2009