Difference between revisions of "Queries and Query Execution Plans"

From phenoscape
(Executive Summary)
 
(5 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
This section describes the queries that have been (or are to be) implemented for the Phenoscape data services, in addition to the execution details of each queries on the PostgreSQL database on Darwin.
 
This section describes the queries that have been (or are to be) implemented for the Phenoscape data services, in addition to the execution details of each queries on the PostgreSQL database on Darwin.
  
==Executive Summary==
+
==Status (Jan 20, 09)==
 
The first iteration of the Web Services module for the Phenoscape project (the '''SICB prototype''') was demonstrated at the SICB meeting in Boston, MA in January 2009. This module allowed database searches for Anatomical Entities (Anatomical Entity Services) and Genes (Gene Services). Searches for Taxa (Taxon Services) are to be implemented in the next iteration which will be a part of the next Phenoscape version to be demonstrated at the ASIH meeting in Portland, OR (the '''ASIH prototype''') in July, 2009.
 
The first iteration of the Web Services module for the Phenoscape project (the '''SICB prototype''') was demonstrated at the SICB meeting in Boston, MA in January 2009. This module allowed database searches for Anatomical Entities (Anatomical Entity Services) and Genes (Gene Services). Searches for Taxa (Taxon Services) are to be implemented in the next iteration which will be a part of the next Phenoscape version to be demonstrated at the ASIH meeting in Portland, OR (the '''ASIH prototype''') in July, 2009.
  
Testing by the Phenoscape project stakeholders (Paula, Todd, and Monte) at the SICB meeting revealed that Anatomy and Gene Services were functional, but their execution was very slow in terms of time. As a result, the data retrieval strategy used in the SICB prototype is being examined for bottlenecks and these details are presented here. New data retrieval strategies are being tested for use in the ASIH prototype. The details of these strategies are documented here as well. A few high-level questions
+
Testing by the Phenoscape project stakeholders (Paula, Todd, and Monte) at the SICB meeting revealed that Anatomy and Gene Services were functional, but their execution was very slow in terms of time. As a result, the data retrieval strategy used in the SICB prototype is being examined for bottlenecks and these details are presented here.
  
# What are the possible bottlenecks responsible for the poor execution times of the queries that process the web service inputs?
+
===Summary===
* The query modules on the client-side interface with the database in the backend to execute the queries. The data retrieved by these query executions are then processed at the client-side. There are two possible bottlenecks in this scheme: one at the client-side and the other more likely, at the backend. The backend bottleneck is more likely because of connection times, data transfer times, query execution times, and other related lag times that occur when the client-side invokes the database backend. Therefore, minimizing the backend interface requirements will reduce the execution time substantially.
+
The query modules on the client-side interface with the database in the backend to execute the queries. The data retrieved by these query executions are then processed at the client-side. There are two possible bottlenecks in this scheme: one at the client-side and the other more likely, at the backend. The backend bottleneck is more likely because of connection times, data transfer times, query execution times, and other related lag times that occur when the client-side invokes the database backend. Therefore, minimizing the backend interface requirements will reduce the execution time substantially.
# What has been found so far?
+
 
* The query execution strategy implemented for the SICB prototype spawns a multitude of queries, and the execution of each of these takes up time to connect, retrieve the results, and transfer them back to the client side. Therefore, a new strategy that tries to obtain all the required data in one query (or a very limited number of queries) is being tested as of now. Details of both these strategies can be found in the linked pages
+
The query execution strategy implemented for the SICB prototype spawns a multitude of queries, and the execution of each of these takes up time to connect, retrieve the results, and transfer them back to the client side. Therefore, a new strategy that tries to obtain all the required data in one query (or a very limited number of queries) is being tested as of now. Details of both these strategies can be found in the linked pages
# Is there a timeline available for this work?
+
 
* A rudimentary timeline has been put up on another location on this Wiki. This is expected to take at least one more week from now (Jan 20)
+
To test the efficiency of the new queries, more methods need to be added to the OBD Shard libraries, the projects have to be compiled and linked prior to  testing. This is to be done over the next two weeks from now (Jan 20, 09). The details of these strategies can also be found here.
  
 
==Database details==
 
==Database details==

Latest revision as of 20:30, 21 January 2009

This section describes the queries that have been (or are to be) implemented for the Phenoscape data services, in addition to the execution details of each queries on the PostgreSQL database on Darwin.

Status (Jan 20, 09)

The first iteration of the Web Services module for the Phenoscape project (the SICB prototype) was demonstrated at the SICB meeting in Boston, MA in January 2009. This module allowed database searches for Anatomical Entities (Anatomical Entity Services) and Genes (Gene Services). Searches for Taxa (Taxon Services) are to be implemented in the next iteration which will be a part of the next Phenoscape version to be demonstrated at the ASIH meeting in Portland, OR (the ASIH prototype) in July, 2009.

Testing by the Phenoscape project stakeholders (Paula, Todd, and Monte) at the SICB meeting revealed that Anatomy and Gene Services were functional, but their execution was very slow in terms of time. As a result, the data retrieval strategy used in the SICB prototype is being examined for bottlenecks and these details are presented here.

Summary

The query modules on the client-side interface with the database in the backend to execute the queries. The data retrieved by these query executions are then processed at the client-side. There are two possible bottlenecks in this scheme: one at the client-side and the other more likely, at the backend. The backend bottleneck is more likely because of connection times, data transfer times, query execution times, and other related lag times that occur when the client-side invokes the database backend. Therefore, minimizing the backend interface requirements will reduce the execution time substantially.

The query execution strategy implemented for the SICB prototype spawns a multitude of queries, and the execution of each of these takes up time to connect, retrieve the results, and transfer them back to the client side. Therefore, a new strategy that tries to obtain all the required data in one query (or a very limited number of queries) is being tested as of now. Details of both these strategies can be found in the linked pages

To test the efficiency of the new queries, more methods need to be added to the OBD Shard libraries, the projects have to be compiled and linked prior to testing. This is to be done over the next two weeks from now (Jan 20, 09). The details of these strategies can also be found here.

Database details

  • Database name: obdphenoscape
  • Database server: Darwin
  • Last updated: Jan 02, 2009
  • Size: ~ 600 MB

Anatomical Entity Services

Queries for Phenoscape UI demo'ed at SICB, Boston in Jan 2009

Queries to be implemented in the future

Gene Services

Taxon Services