Difference between revisions of "Queries"

From phenoscape
(Anatomical Entity Services)
(Data Services)
 
(96 intermediate revisions by the same user not shown)
Line 1: Line 1:
This section describes the queries that have been (or are to be) implemented for the Phenoscape data services, along with the execution details of each query on the PostgreSQL database on Darwin.
+
This section describes the queries that have been implemented for the Phenoscape data services, along with the execution details of each query on the PostgreSQL database.
 
 
==Status (Jan 20, 09)==
 
The first iteration of the Web Services module for the Phenoscape project (the '''SICB prototype''') was demonstrated at the SICB meeting in Boston, MA in January 2009. This module allowed database searches for Anatomical Entities (Anatomical Entity Services) and Genes (Gene Services). Searches for Taxa (Taxon Services) are to be implemented in the next iteration which will be a part of the next Phenoscape version to be demonstrated at the ASIH meeting in Portland, OR (the '''ASIH prototype''') in July, 2009.
 
 
 
Testing by the Phenoscape project stakeholders (Paula, Todd, and Monte) at the SICB meeting revealed that the Anatomy and Gene Services were functional but very slow. As a result, the data retrieval strategy used in the SICB prototype is being examined for bottlenecks, and the details are presented here.
 
  
 
===Summary===
 
Queries are assembled in a Java program, dispatched through a connection to the database, and executed at the database end. For brevity's sake, the Java program is henceforth called the client side and the database side the backend. The query modules on the client side interface with the database in the backend to execute the queries. The data retrieved by these queries are then processed on the client side. There are two possible bottlenecks in this scheme: one on the client side and the other at the backend.
 
 
The backend bottleneck is the more likely of the two. Each query has to be transmitted over the connection from the client side to the backend, executed there (a process not discussed here), and its results sent back over the connection to the client side. Each of these steps takes time, and the costs add up. As a case in point, the query execution strategy implemented for the SICB prototype spawns a multitude of queries, each of which incurs the cost of connecting, retrieving the results, and transferring them back to the client side. Therefore, a new strategy that tries to obtain all the required data in one query (or a very limited number of queries) is currently being tested. Details of both the old and new strategies can be found in the linked pages.
 
  
To test the efficiency of the new queries, more methods need to be added to the OBD Shard libraries, and the projects have to be compiled and linked prior to testing. This is to be done over the next two weeks (from Jan 20, 09). The details of these strategies can also be found here.
+
In the Phenoscape application, queries are assembled in a Java program, dispatched through a connection to the database, and executed at the database end. For brevity's sake, the Java program is henceforth called the client side and the database side the backend. The database has been implemented using the [http://www.postgresql.org/ PostgreSQL] Relational Database Management System (DBMS).
  
==Database details==
+
Query execution in PostgreSQL occurs in four sequential steps. In the first step, the query is transferred from the client side over the network to the database. In the second step, the PostgreSQL DBMS parses the query and draws up an execution plan to retrieve the data as efficiently as possible in terms of time and memory utilization. In the third step, the DBMS executes the query according to that plan and retrieves the results. In the last step, the retrieved results are sent back over the connection to the client side.
* Last updated: Jan 02, 2009
 
* Size: ~ 600 MB
 
  
==Anatomical Entity Services==
+
==Relations of interest==
  
Queries on anatomical entities retrieve information on the qualities that inhere in them and the taxa that exhibit these entity-quality (or more correctly, character-state) combinations. Querying strategies to retrieve this information leverage a number of relation instances stored in the OBD database. These are detailed below.
+
The relations described in this section are of use in finding information about phenotypes, and are therefore leveraged in the implementation of the phenotype summary and details modules of the Phenoscape application.
  
===Relations of interest===
+
Post compositions of Entities and Qualities are used to relate taxa (and genes) and phenotypes through the ''exhibits'' relation as shown in (1) and (2).
 
 
Post compositions of Entities and Qualities are used to relate taxa and phenotypes through the ''exhibits'' relation as shown in (1).
 
 
<javascript>
Taxon                              exhibits                inheres_in(Quality, Entity)                                      -- (1)
+
Gene                              exhibits                inheres_in(Quality, Entity)                                      -- (2)
</javascript>
In addition, the OBD database also contains information relating post composed phenotypes to both the Quality and the Entity by different relations as shown in (2) and (3) respectively
+
In addition, the OBD database also contains information relating post composed phenotypes to both the Quality and the Entity by different relations as shown in (3) and (4) respectively
 
 
 
<javascript>
inheres_in(Quality, Entity)        is_a                    Quality                                                          -- (2)
+
inheres_in(Quality, Entity)        is_a                    Quality                                                          -- (3)
inheres_in(Quality, Entity)        inheres_in              Entity                                                            -- (3)
+
inheres_in(Quality, Entity)        inheres_in              Entity                                                            -- (4)
</javascript>
Quality can be either a Value or an Attribute (beside other slims) and is related to these by the ''in_subset_of'' relation as shown in (4)
+
Quality is related to Character by the ''value_for'' relation as shown in (5)
 
<javascript>
Quality                            in_subset_of            Slim                                                              -- (4)
+
Quality                            value_for                Character                                                        -- (5)
</javascript>
  
Qualities and Anatomical entities are subclassed in the PATO and TAO hierarchies respectively as shown in (5) and (6)
+
Phenotypes can also be traced back to the publications and datasets they are extracted from as explained below. Phenotype data summaries and details retrieved by the services modules of Phenoscape are filtered by publications as well.
 +
 
 +
Every dataset is associated with a publication as shown in (6). The list of link statements posited by a dataset can be retrieved by traversing the relation shown in (7)
 
<javascript>
Value                              is_a                    Attribute                                                          -- (5)
+
DataSet                            has_publication        Publication                                                        -- (6)
Sub Anatomical Feature            is_a                    Anatomical Feature                                                -- (6)
+
LinkStatement                      posited_by              Dataset                                                            -- (7)
</javascript>
  
  
 
* The details of these queries can be found [[Queries for Phenoscape UI demo'ed at SICB, Boston in Jan 2009|here]]
 
 
* The details of these queries can be found [[Queries to be implemented in the future|here]]
 
 
==Gene Services==
 
The querying strategy for the Gene Services module of the SICB prototype is identical to the strategy for the Anatomy Services module. This strategy also involves the spawning of multiple queries, which adds to the backend bottleneck. The only difference is that this strategy leverages the relationships between genes and genotypes, and then between genotypes and phenotypes (as shown in (1) and (2) below), to retrieve the desired information.
 
 
<javascript>
 
Gene                                  has_allele              Genotype                                                          -- (1)
 
Genotype                              exhibits                inheres_in(Quality, Entity)                                      -- (2)
 
</javascript>
 
 
==Taxon Services==
 
These will be implemented for the first time in the ASIH prototype.
 
  
 
[[Category:Informatics]]
 
 
[[Category:Database]]
 
 +
[[Category:Queries]]

Latest revision as of 18:07, 21 August 2009

This section describes the queries that have been implemented for the Phenoscape data services, along with the execution details of each query on the PostgreSQL database.

Summary

In the Phenoscape application, queries are assembled in a Java program, dispatched through a connection to the database, and executed at the database end. For brevity's sake, the Java program is henceforth called the client side and the database side the backend. The database has been implemented using the PostgreSQL Relational Database Management System (DBMS).

Query execution in PostgreSQL occurs in four sequential steps. In the first step, the query is transferred from the client side over the network to the database. In the second step, the PostgreSQL DBMS parses the query and draws up an execution plan to retrieve the data as efficiently as possible in terms of time and memory utilization. In the third step, the DBMS executes the query according to that plan and retrieves the results. In the last step, the retrieved results are sent back over the connection to the client side.
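As a rough illustration of how these steps look from the client side, the sketch below assembles a parameterized query and dispatches it over JDBC. The class name QueryDispatchSketch, the statement table, and its columns are placeholders assumed for this example; they are not taken from the Phenoscape code or the OBD schema.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical client-side query module; not the actual Phenoscape code. */
final class QueryDispatchSketch {

    /** Assembles the query text on the client side (table and column names are assumed). */
    static String buildQuery() {
        return "SELECT subject_label, object_label "
             + "FROM statement "
             + "WHERE predicate = ?";
    }

    /** Dispatches the query over an open connection and collects the returned rows. */
    static List<String[]> run(Connection conn, String predicate) throws SQLException {
        List<String[]> rows = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(buildQuery())) {
            ps.setString(1, predicate);
            // From the client's point of view, executeQuery() spans steps 1 to 3:
            // the query is sent to the backend, planned, and executed there.
            try (ResultSet rs = ps.executeQuery()) {
                // Step 4: the rows sent back over the connection are read here.
                while (rs.next()) {
                    rows.add(new String[] { rs.getString(1), rs.getString(2) });
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Without a database at hand, only the assembled query text is shown.
        System.out.println(buildQuery());
    }
}
```

Calling run requires a java.sql.Connection obtained from a PostgreSQL JDBC driver, which is outside the scope of this sketch.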

Relations of interest

The relations described in this section are of use in finding information about phenotypes, and are therefore leveraged in the implementation of the phenotype summary and details modules of the Phenoscape application.

Post compositions of Entities and Qualities are used to relate taxa (and genes) and phenotypes through the exhibits relation, as shown in (1) and (2).

<javascript>
Taxon                          exhibits       inheres_in(Quality, Entity)    -- (1)
Gene                           exhibits       inheres_in(Quality, Entity)    -- (2)
</javascript>

In addition, the OBD database also contains information relating post-composed phenotypes to both the Quality and the Entity by different relations, as shown in (3) and (4) respectively.

<javascript>
inheres_in(Quality, Entity)    is_a           Quality                        -- (3)
inheres_in(Quality, Entity)    inheres_in     Entity                         -- (4)
</javascript>

Quality is related to Character by the value_for relation, as shown in (5).

<javascript>
Quality                        value_for      Character                      -- (5)
</javascript>
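To make the traversal of relations (1) to (5) concrete, here is a minimal in-memory sketch. It models each relation instance as a subject-predicate-object link and follows the links by hand; it does not reflect how the OBD database stores or queries them. The class RelationSketch and the sample taxon, gene, and phenotype names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** In-memory sketch of relations (1) to (5); not the OBD schema. */
final class RelationSketch {

    /** One subject-predicate-object statement. */
    static final class Link {
        final String subject, predicate, object;
        Link(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /** Illustrative post-composed phenotype. */
    static final String SAMPLE_PHENOTYPE = "inheres_in(decreased size, dorsal fin)";

    /** Illustrative data only. */
    static List<Link> sampleLinks() {
        return Arrays.asList(
            new Link("Taxon A", "exhibits", SAMPLE_PHENOTYPE),        // (1)
            new Link("Gene B", "exhibits", SAMPLE_PHENOTYPE),         // (2)
            new Link(SAMPLE_PHENOTYPE, "is_a", "decreased size"),     // (3)
            new Link(SAMPLE_PHENOTYPE, "inheres_in", "dorsal fin"),   // (4)
            new Link("decreased size", "value_for", "size"));         // (5)
    }

    /** Subjects of all links with the given predicate and object. */
    static List<String> subjectsOf(List<Link> links, String predicate, String object) {
        List<String> out = new ArrayList<>();
        for (Link l : links) {
            if (l.predicate.equals(predicate) && l.object.equals(object)) out.add(l.subject);
        }
        return out;
    }

    /** Objects of all links with the given subject and predicate. */
    static List<String> objectsOf(List<Link> links, String subject, String predicate) {
        List<String> out = new ArrayList<>();
        for (Link l : links) {
            if (l.subject.equals(subject) && l.predicate.equals(predicate)) out.add(l.object);
        }
        return out;
    }

    /** Taxa and genes exhibiting a phenotype that inheres in the given entity. */
    static List<String> exhibitorsForEntity(List<Link> links, String entity) {
        List<String> out = new ArrayList<>();
        // Relation (4) leads from the entity back to its phenotypes.
        for (String phenotype : subjectsOf(links, "inheres_in", entity)) {
            // Relations (1) and (2) lead from a phenotype to taxa and genes.
            out.addAll(subjectsOf(links, "exhibits", phenotype));
        }
        return out;
    }

    /** Characters reached from a phenotype via relations (3) and (5). */
    static List<String> charactersForPhenotype(List<Link> links, String phenotype) {
        List<String> out = new ArrayList<>();
        for (String quality : objectsOf(links, phenotype, "is_a")) {
            out.addAll(objectsOf(links, quality, "value_for"));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(exhibitorsForEntity(sampleLinks(), "dorsal fin"));
        System.out.println(charactersForPhenotype(sampleLinks(), SAMPLE_PHENOTYPE));
    }
}
```

In a database-backed implementation, the same traversal would presumably be expressed as joins rather than nested loops.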

Phenotypes can also be traced back to the publications and datasets they are extracted from as explained below. Phenotype data summaries and details retrieved by the services modules of Phenoscape are filtered by publications as well.

Every dataset is associated with a publication, as shown in (6). The list of link statements posited by a dataset can be retrieved by traversing the relation shown in (7).

<javascript>
DataSet                        has_publication    Publication                -- (6)
LinkStatement                  posited_by         Dataset                    -- (7)
</javascript>
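The filtering by publication described above can be sketched as a two-hop lookup: relation (7) leads from a link statement to its dataset, and relation (6) from the dataset to its publication. The example below uses plain maps and invented identifiers (statement-1, dataset-1, publication-A), and it assumes each link statement is posited by exactly one dataset; none of this is taken from the Phenoscape implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** In-memory sketch of filtering link statements by publication; names are hypothetical. */
final class PublicationFilterSketch {

    /** Relation (6): dataset -> publication (illustrative data). */
    static Map<String, String> sampleHasPublication() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("dataset-1", "publication-A");
        m.put("dataset-2", "publication-B");
        return m;
    }

    /** Relation (7): link statement -> dataset that posits it (illustrative data). */
    static Map<String, String> samplePositedBy() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("statement-1", "dataset-1");
        m.put("statement-2", "dataset-2");
        m.put("statement-3", "dataset-1");
        return m;
    }

    /** Returns the link statements posited by datasets of the given publication. */
    static List<String> statementsFor(String publication,
                                      Map<String, String> hasPublication,
                                      Map<String, String> positedBy) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : positedBy.entrySet()) {
            String dataset = e.getValue();
            // Follow relation (6) from the dataset and keep matching statements.
            if (publication.equals(hasPublication.get(dataset))) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(
            statementsFor("publication-A", sampleHasPublication(), samplePositedBy()));
    }
}
```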