Phenoscape data repository

From phenoscape

Revision as of 18:07, 12 March 2009

The Phenoscape data repository is a relational database that holds phenotypic data on the model organism Danio rerio (zebrafish) and on organisms of evolutionary interest belonging to the clade Ostariophysi. This page describes the schema of this data repository and outlines some of the data transformation techniques used to integrate, in this repository, data captured in different formats at different locations.

Data Repository

The Phenoscape data repository has been implemented as a PostgreSQL relational database and is at present housed on the development database server at NESCent.

Schema

The schema of the Phenoscape data repository is based on the Open Biomedical Database (OBD) data format developed at the Berkeley Bioinformatics Open-source Projects (BBOP). OBD is based upon the Resource Description Framework (RDF) format for capturing metadata about Web (and Semantic Web) resources such as Web pages and Web services.

The philosophy of OBD is to represent every conceptual entity, be it a type or a token (synonymously a class or an object, or a concept or an instance) as a Node. Binary relations between these nodes are represented as Statements, specifically Link Statements. OBD also allows for reification, which is vital to the life sciences with their emphasis on evidence codes and attributions. For this purpose, OBD provides Literal Statements to capture metadata about Nodes and Link Statements, such as the source publication, evidence codes, specimens used, and so forth.
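As a rough illustration of this philosophy, the sketch below models Nodes, Link Statements and Literal Statements as Python dataclasses. All class, field and function names here are hypothetical and are not OBD's actual API; the sketch only shows the idea that once a Link Statement is reified (given a Node of its own), Literal Statements can be made about the link itself. The predicate "example:has_source", the blank-node style uid "_:link1" and the literal value are made-up placeholders.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Any conceptual entity: a class, an instance, a relation, a publication, ..."""
    uid: str                    # identifier, e.g. "TTO:1390"
    label: Optional[str] = None
    is_reiflink: bool = False   # True when the node stands for a Link Statement


@dataclass(frozen=True)
class LinkStatement:
    """A binary relation between two Nodes: subject --predicate--> obj."""
    subject: Node
    predicate: Node
    obj: Node
    reif_node: Optional[Node] = None  # set once the statement has been reified


@dataclass(frozen=True)
class LiteralStatement:
    """Attaches a literal value (evidence code, source, ...) to a Node."""
    subject: Node
    predicate: Node
    value: str


def reify(link: LinkStatement, uid: str) -> LinkStatement:
    """Return a copy of `link` that has a Node of its own, so that
    Literal Statements can be made about the link itself."""
    return replace(link, reif_node=Node(uid, is_reiflink=True))


# Eigenmanniidae is_a Gymnotiformes, plus a (placeholder) note about that link.
eigenmanniidae = Node("TTO:10000005", "Eigenmanniidae")
gymnotiformes = Node("TTO:1390", "Gymnotiformes")
is_a = Node("OBO_REL:is_a", "is_a")
link = reify(LinkStatement(eigenmanniidae, is_a, gymnotiformes), "_:link1")

has_source = Node("example:has_source", "has source")  # hypothetical predicate
note = LiteralStatement(link.reif_node, has_source, "placeholder publication")
```

The point of the design is that metadata never has to be squeezed into the link itself: the reified Node is an ordinary Node, so any number of Literal Statements can refer to it.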

Two relational tables are central to the schema of the Phenoscape data repository: LINK and NODE. The SQL commands that create these tables (and the others) can be found at this Phenoscape Sourceforge page.

The NODE table

The NODE table contains information about every concept, such as its unique identifier, label, and source ontology; this covers all concepts extracted from the source ontologies. In addition, it holds information about scientific publications (in a rudimentary format which will be improved soon), the ontologies themselves, and representations of phenotypes from the ZFIN and NeXML databases, and it will be augmented in the future to hold information about collection specimens. The NODE table also adds to every row a unique identifier of its own, generated from a sequence. The row from the NODE table for the Gymnotiformes term is shown below:

<javascript>

 node_id |   uid    |     label     | uri | metatype | is_anonymous | is_transitive | is_obsolete | is_reiflink | is_metadata | source_id | loaded_from
---------+----------+---------------+-----+----------+--------------+---------------+-------------+-------------+-------------+-----------+-------------
   46050 | TTO:1390 | Gymnotiformes |     | C        | f            | f             | f           | f           | f           |      9630 |

</javascript>

  • The NODE_ID column holds the unique identifier generated by the Phenoscape database
  • The UID column holds the identifier of this term in the Teleost Taxonomy Ontology (TTO); 'TTO' is the namespace prefix
  • The LABEL column displays the label for this term
  • The METATYPE column shows that this term is a Class (C). Other metatypes are Relation (R) and Instance (I).
  • The IS_ANONYMOUS column shows that this term is not anonymous, i.e. it has a unique identifier of its own (from the source ontology in this example)
  • The IS_TRANSITIVE column is used to capture the transitive nature of binary relations. It does not apply to this term
  • The IS_OBSOLETE column tracks obsolete and archaic terms as the ontologies evolve. Here, it shows that this term is not obsolete
  • The IS_REIFLINK column is used to identify Statements that capture metadata. In this case, it shows that this term is not a reification link
  • The IS_METADATA column is used to identify terms that do not have an intrinsic identity of their own, but are dependent upon other terms
  • The SOURCE_ID column holds the NODE_ID of the ontology from which the term was extracted. In this case, the source ontology is the TTO
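To see how the database-generated NODE_ID relates to the UID and SOURCE_ID columns, the sketch below builds a much-reduced NODE table in an in-memory SQLite database. It is an assumption-laden stand-in, not the repository's PostgreSQL DDL (which is on the Sourceforge page mentioned above): only a few columns are kept, SQLite's INTEGER PRIMARY KEY plays the role of the sequence, and the uid "TTO" for the ontology's own node is hypothetical. Because the IDs are generated afresh, they will not be 46050 and 9630 as in the real row.

```python
import sqlite3

# Hypothetical, simplified subset of the NODE table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE node (
        node_id     INTEGER PRIMARY KEY,   -- assigned by the database
        uid         TEXT NOT NULL UNIQUE,  -- identifier from the source, e.g. TTO:1390
        label       TEXT,
        metatype    CHAR(1),               -- 'C' class, 'R' relation, 'I' instance
        is_obsolete BOOLEAN NOT NULL DEFAULT 0,
        source_id   INTEGER REFERENCES node(node_id)  -- node of the source ontology
    )
""")


def add_node(uid, label=None, metatype=None, source_id=None):
    """Insert a node and return the node_id the database assigned to it."""
    cur = conn.execute(
        "INSERT INTO node (uid, label, metatype, source_id) VALUES (?, ?, ?, ?)",
        (uid, label, metatype, source_id),
    )
    return cur.lastrowid


# The ontology itself is a node too; "TTO" as its uid is an assumption.
tto_id = add_node("TTO", "Teleost Taxonomy Ontology")
gym_id = add_node("TTO:1390", "Gymnotiformes", "C", source_id=tto_id)

# Look up a term by its external uid and report its source ontology's label.
row = conn.execute("""
    SELECT n.node_id, n.label, src.label
    FROM node AS n JOIN node AS src ON n.source_id = src.node_id
    WHERE n.uid = ?
""", ("TTO:1390",)).fetchone()
print(row)  # → (2, 'Gymnotiformes', 'Teleost Taxonomy Ontology')
```

Keeping the internal node_id separate from the external uid means every other table can refer to concepts, publications and ontologies through one compact integer key, whatever namespace the concept came from.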

The LINK table

The LINK table contains rows that represent Statements linking Nodes to one another, as well as metadata about these Nodes. The excerpt below shows some of the rows in the LINK table that involve the Gymnotiformes term:

<javascript>

 link_id | reiflink_node_id | node_id | predicate_id | object_id | when_id | is_metadata | is_inferred | is_instantiation | is_negated | applies_to_all
---------+------------------+---------+--------------+-----------+---------+-------------+-------------+------------------+------------+----------------
   23854 |                  |    9637 |          102 |     46050 |         | f           | f           | f                | f          | t
   59897 |                  |   45723 |          102 |     46050 |         | f           | f           | f                | f          | t
   60223 |                  |   46050 |          102 |     46160 |         | f           | f           | f                | f          | t
  501448 |                  |    9932 |          102 |     46050 |         | f           | t           | f                | f          | t

</javascript>

  • The LINK_ID column shows the database-generated identifier for the link
  • The NODE_ID column shows the Subject of the Statement (in RDF parlance). For the Statement discussed below, this is the database-generated identifier for the concept Eigenmanniidae (TTO:10000005)
  • The PREDICATE_ID column shows the Predicate of the Statement. This ID (102 in every row shown) is the database-generated identifier for the relation OBO_REL:is_a
  • The OBJECT_ID column shows the Object of the Statement, here 46050, the ID generated by the database for Gymnotiformes
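The column meanings above can be made concrete by reading the excerpt programmatically. The sketch below assumes the rows were copied verbatim from psql's aligned text output, as the excerpt appears to be; the helper names are hypothetical. It converts each cell to a Python value and then asks which nodes are linked to Gymnotiformes (node_id 46050) through predicate 102 (is_a), and which node Gymnotiformes itself is linked to. Only the four rows shown here are used, so the results say nothing about the rest of the table.

```python
# The LINK excerpt above, copied as plain text.
EXCERPT = """\
 link_id | reiflink_node_id | node_id | predicate_id | object_id | when_id | is_metadata | is_inferred | is_instantiation | is_negated | applies_to_all
---------+------------------+---------+--------------+-----------+---------+-------------+-------------+------------------+------------+----------------
   23854 |                  |    9637 |          102 |     46050 |         | f           | f           | f                | f          | t
   59897 |                  |   45723 |          102 |     46050 |         | f           | f           | f                | f          | t
   60223 |                  |   46050 |          102 |     46160 |         | f           | f           | f                | f          | t
  501448 |                  |    9932 |          102 |     46050 |         | f           | t           | f                | f          | t
"""


def parse_cell(text):
    """Convert one psql cell: '' -> None, 't'/'f' -> bool, digits -> int."""
    text = text.strip()
    if text == "":
        return None
    if text in ("t", "f"):
        return text == "t"
    return int(text) if text.isdigit() else text


def parse_psql(excerpt):
    """Turn aligned psql output into a list of {column: value} dicts."""
    lines = [ln for ln in excerpt.splitlines() if ln.strip()]
    header = [h.strip() for h in lines[0].split("|")]
    rows = []
    for line in lines[2:]:  # skip the header and the ----+---- separator
        cells = [parse_cell(c) for c in line.split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


GYMNOTIFORMES = 46050  # node_id of Gymnotiformes (from the NODE table row)
IS_A = 102             # predicate_id of OBO_REL:is_a

links = parse_psql(EXCERPT)
children = [(r["node_id"], r["is_inferred"]) for r in links
            if r["predicate_id"] == IS_A and r["object_id"] == GYMNOTIFORMES]
parents = [r["object_id"] for r in links
           if r["predicate_id"] == IS_A and r["node_id"] == GYMNOTIFORMES]
print(children)  # → [(9637, False), (45723, False), (9932, True)]
print(parents)   # → [46160]
```

In this excerpt only link 501448 (subject 9932) has IS_INFERRED set to t. The parser splits naively on "|", so it would break on values that themselves contain that character.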

In simple terms, this Statement says that Eigenmanniidae is a subgroup of Gymnotiformes, as expressed by the triple below:

<javascript>

     Eigenmanniidae                                is_a                        Gymnotiformes

</javascript>
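Going from LINK rows back to a readable triple like this one takes one join against NODE per position (subject, predicate, object). The sketch below shows that join on minimal, hypothetical subsets of both tables in an in-memory SQLite database rather than the repository's PostgreSQL schema; node_id and link_id values are generated afresh, so they will not match the IDs in the excerpts, and giving OBO_REL:is_a the metatype R is an assumption.

```python
import sqlite3

# Hypothetical, simplified subsets of the NODE and LINK tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (
        node_id  INTEGER PRIMARY KEY,
        uid      TEXT NOT NULL UNIQUE,
        label    TEXT,
        metatype CHAR(1)
    );
    CREATE TABLE link (
        link_id      INTEGER PRIMARY KEY,
        node_id      INTEGER NOT NULL REFERENCES node(node_id),  -- subject
        predicate_id INTEGER NOT NULL REFERENCES node(node_id),  -- predicate
        object_id    INTEGER NOT NULL REFERENCES node(node_id),  -- object
        is_inferred  BOOLEAN NOT NULL DEFAULT 0
    );
""")


def add_node(uid, label, metatype):
    """Insert a node and return its generated node_id."""
    return conn.execute(
        "INSERT INTO node (uid, label, metatype) VALUES (?, ?, ?)",
        (uid, label, metatype),
    ).lastrowid


subject = add_node("TTO:10000005", "Eigenmanniidae", "C")
predicate = add_node("OBO_REL:is_a", "is_a", "R")  # relation (assumed metatype)
obj = add_node("TTO:1390", "Gymnotiformes", "C")
conn.execute(
    "INSERT INTO link (node_id, predicate_id, object_id) VALUES (?, ?, ?)",
    (subject, predicate, obj),
)

# Resolve every link back to labels: one join against NODE per position.
triples = conn.execute("""
    SELECT s.label, p.label, o.label
    FROM link AS l
    JOIN node AS s ON l.node_id = s.node_id
    JOIN node AS p ON l.predicate_id = p.node_id
    JOIN node AS o ON l.object_id = o.node_id
""").fetchall()
print(triples)  # → [('Eigenmanniidae', 'is_a', 'Gymnotiformes')]
```

Because relations are themselves stored as Nodes, the predicate is resolved with exactly the same kind of join as the subject and object.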