Difference between revisions of "Queries and Query Execution Plans"

From phenoscape
(Execution details)
 
(23 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
This section describes the queries that have been (or are to be) implemented for the Phenoscape data services, in addition to the execution details of each queries on the PostgreSQL database on Darwin.
 
This section describes the queries that have been (or are to be) implemented for the Phenoscape data services, in addition to the execution details of each queries on the PostgreSQL database on Darwin.
 +
 +
==Status (Jan 20, 09)==
 +
The first iteration of the Web Services module for the Phenoscape project (the '''SICB prototype''') was demonstrated at the SICB meeting in Boston, MA in January 2009. This module allowed database searches for Anatomical Entities (Anatomical Entity Services) and Genes (Gene Services). Searches for Taxa (Taxon Services) are to be implemented in the next iteration which will be a part of the next Phenoscape version to be demonstrated at the ASIH meeting in Portland, OR (the '''ASIH prototype''') in July, 2009.
 +
 +
Testing by the Phenoscape project stakeholders (Paula, Todd, and Monte) at the SICB meeting revealed that Anatomy and Gene Services were functional, but their execution was very slow in terms of time. As a result, the data retrieval strategy used in the SICB prototype is being examined for bottlenecks and these details are presented here.
 +
 +
===Summary===
 +
The query modules on the client-side interface with the database in the backend to execute the queries. The data retrieved by these query executions are then processed at the client-side. There are two possible bottlenecks in this scheme: one at the client-side and the other more likely, at the backend. The backend bottleneck is more likely because of connection times, data transfer times, query execution times, and other related lag times that occur when the client-side invokes the database backend. Therefore, minimizing the backend interface requirements will reduce the execution time substantially.
 +
 +
The query execution strategy implemented for the SICB prototype spawns a multitude of queries, and the execution of each of these takes up time to connect, retrieve the results, and transfer them back to the client side. Therefore, a new strategy that tries to obtain all the required data in one query (or a very limited number of queries) is being tested as of now. Details of both these strategies can be found in the linked pages
 +
 +
To test the efficiency of the new queries, more methods need to be added to the OBD Shard libraries, the projects have to be compiled and linked prior to  testing. This is to be done over the next two weeks from now (Jan 20, 09). The details of these strategies can also be found here.
  
 
==Database details==
 
==Database details==
Line 5: Line 17:
 
* Database server: Darwin
 
* Database server: Darwin
 
* Last updated: Jan 02, 2009
 
* Last updated: Jan 02, 2009
* Size: Approx. 600 MB
+
* Size: ~ 600 MB
  
 
==Anatomical Entity Services==
 
==Anatomical Entity Services==
  
===Using LIKE Wildcard Operator===
+
===[[Queries for Phenoscape UI demo'ed at SICB, Boston in Jan 2009]]===
This query will return all the PHENOTYPES that inhere in a ANATOMICAL ENTITY* and the subclasses of the ANATOMICAL ENTITY*. In addition, all the QUALITIES that are related to each PHENOTYPE are returned. The SUBSET SLIMS of each QUALITY are returned. Finally, all the TAXA that exhibit the PHENOTYPES are returned.
 
 
 
*Search Term (The search term in this example is TAO:0000108, which is, you guessed it, a fin!)
 
 
 
====Query====
 
<javascript>
 
SELECT DISTINCT
 
taxon_node.uid AS taxon,
 
exhibits_pred_node.uid AS exhibits,
 
phenotype_node.uid AS phenotype,
 
inheres_pred_node.uid AS inheres,
 
anatomy_node.uid AS anatomy,
 
is_a_pred_node.uid AS isA,
 
quality_node.uid AS quality,
 
subset_pred_node.uid AS subset,
 
slim_node.uid AS slim
 
FROM
 
link AS inheres_link, link AS is_a_link, link AS subset_link,
 
link AS exhibits_link,
 
node AS phenotype_node, node AS anatomy_node, node AS inheres_pred_node,
 
node AS is_a_pred_node, node AS quality_node,
 
node AS slim_node, node AS subset_pred_node,
 
node AS taxon_node, node AS exhibits_pred_node
 
WHERE
 
exhibits_pred_node.uid LIKE '%exhibits%' AND
 
subset_pred_node.uid LIKE '%inSubset%' AND
 
is_a_pred_node.uid LIKE '%is_a%' AND
 
inheres_pred_node.uid LIKE '%inheres_in' AND
 
anatomy_node.uid LIKE '%TAO:0000108%' AND
 
exhibits_link.node_id = taxon_node.node_id AND
 
exhibits_link.predicate_id = exhibits_pred_node.node_id AND
 
exhibits_link.object_id = phenotype_node.node_id AND
 
subset_link.node_id = quality_node.node_id AND
 
subset_link.predicate_id = subset_pred_node.node_id AND
 
subset_link.object_id = slim_node.node_id AND
 
is_a_link.node_id = phenotype_node.node_id AND
 
is_a_link.predicate_id = is_a_pred_node.node_id AND
 
is_a_link.object_id = quality_node.node_id AND
 
inheres_link.node_id = phenotype_node.node_id AND
 
inheres_link.predicate_id = inheres_pred_node.node_id AND
 
inheres_link.object_id = anatomy_node.node_id;
 
</javascript>
 
 
 
====Query Execution Plan====
 
 
 
<javascript>
 
"Unique  (cost=30576.32..30576.35 rows=1 width=243) (actual time=746.269..754.034 rows=2495 loops=1)"
 
"  ->  Sort  (cost=30576.32..30576.33 rows=1 width=243) (actual time=746.265..747.451 rows=4730 loops=1)"
 
"        Sort Key: taxon_node.uid, exhibits_pred_node.uid, phenotype_node.uid, inheres_pred_node.uid, anatomy_node.uid, is_a_pred_node.uid, quality_node.uid, subset_pred_node.uid, slim_node.uid"
 
"        ->  Nested Loop  (cost=234.53..30576.31 rows=1 width=243) (actual time=100.128..343.130 rows=4730 loops=1)"
 
"              ->  Nested Loop  (cost=234.53..30554.96 rows=7 width=220) (actual time=76.510..309.103 rows=5084 loops=1)"
 
"                    ->  Nested Loop  (cost=234.53..30533.63 rows=7 width=197) (actual time=76.502..278.077 rows=5084 loops=1)"
 
"                          ->  Nested Loop  (cost=234.53..30512.30 rows=7 width=178) (actual time=76.494..248.111 rows=5084 loops=1)"
 
"                                ->  Nested Loop  (cost=234.53..27963.03 rows=1 width=182) (actual time=76.482..236.770 rows=255 loops=1)"
 
"                                      ->  Nested Loop  (cost=234.53..27822.74 rows=46 width=159) (actual time=76.469..231.634 rows=898 loops=1)"
 
"                                            ->  Nested Loop  (cost=234.53..27682.57 rows=46 width=128) (actual time=76.453..226.351 rows=898 loops=1)"
 
"                                                  ->  Nested Loop  (cost=234.53..23794.21 rows=1275 width=105) (actual time=76.129..207.649 rows=3280 loops=1)"
 
"                                                        ->  Nested Loop  (cost=234.53..19909.03 rows=1275 width=82) (actual time=76.116..188.042 rows=3280 loops=1)"
 
"                                                              ->  Nested Loop  (cost=234.53..15182.73 rows=45 width=70) (actual time=76.106..180.003 rows=411 loops=1)"
 
"                                                                    ->  Nested Loop  (cost=234.53..15077.70 rows=1 width=58) (actual time=76.093..179.138 rows=17 loops=1)"
 
"                                                                          ->  Nested Loop  (cost=234.53..15056.36 rows=7 width=35) (actual time=58.846..141.426 rows=6904 loops=1)"
 
"                                                                                ->  Seq Scan on node inheres_pred_node  (cost=0.00..4026.71 rows=1 width=31) (actual time=57.190..128.983 rows=1 loops=1)"
 
"                                                                                      Filter: ((uid)::text ~~ '%inheres_in'::text)"
 
"                                                                                ->  Bitmap Heap Scan on link inheres_link  (cost=234.53..10563.47 rows=37294 width=12) (actual time=1.642..7.526 rows=6904 loops=1)"
 
"                                                                                      Recheck Cond: (inheres_link.predicate_id = "outer".node_id)"
 
"                                                                                      ->  Bitmap Index Scan on link_predicate_object_indx  (cost=0.00..234.53 rows=37294 width=0) (actual time=1.489..1.489 rows=6904 loops=1)"
 
"                                                                                            Index Cond: (inheres_link.predicate_id = "outer".node_id)"
 
"                                                                          ->  Index Scan using node_pkey on node anatomy_node  (cost=0.00..3.04 rows=1 width=31) (actual time=0.005..0.005 rows=0 loops=6904)"
 
"                                                                                Index Cond: ("outer".object_id = anatomy_node.node_id)"
 
"                                                                                Filter: ((uid)::text ~~ '%TAO:0000108%'::text)"
 
"                                                                    ->  Index Scan using link_node_indx on link is_a_link  (cost=0.00..104.68 rows=28 width=12) (actual time=0.006..0.032 rows=24 loops=17)"
 
"                                                                          Index Cond: ("outer".node_id = is_a_link.node_id)"
 
"                                                              ->  Index Scan using link_node_indx on link subset_link  (cost=0.00..104.68 rows=28 width=12) (actual time=0.005..0.012 rows=8 loops=411)"
 
"                                                                    Index Cond: ("outer".object_id = subset_link.node_id)"
 
"                                                        ->  Index Scan using node_pkey on node slim_node  (cost=0.00..3.03 rows=1 width=31) (actual time=0.004..0.004 rows=1 loops=3280)"
 
"                                                              Index Cond: ("outer".object_id = slim_node.node_id)"
 
"                                                  ->  Index Scan using node_pkey on node is_a_pred_node  (cost=0.00..3.04 rows=1 width=31) (actual time=0.005..0.005 rows=0 loops=3280)"
 
"                                                        Index Cond: ("outer".predicate_id = is_a_pred_node.node_id)"
 
"                                                        Filter: ((uid)::text ~~ '%is_a%'::text)"
 
"                                            ->  Index Scan using node_pkey on node phenotype_node  (cost=0.00..3.03 rows=1 width=31) (actual time=0.004..0.004 rows=1 loops=898)"
 
"                                                  Index Cond: ("outer".node_id = phenotype_node.node_id)"
 
"                                      ->  Index Scan using node_pkey on node subset_pred_node  (cost=0.00..3.04 rows=1 width=31) (actual time=0.005..0.005 rows=0 loops=898)"
 
"                                            Index Cond: ("outer".predicate_id = subset_pred_node.node_id)"
 
"                                            Filter: ((uid)::text ~~ '%inSubset%'::text)"
 
"                                ->  Index Scan using link_object_indx on link exhibits_link  (cost=0.00..2537.96 rows=905 width=12) (actual time=0.005..0.027 rows=20 loops=255)"
 
"                                      Index Cond: (exhibits_link.object_id = "outer".node_id)"
 
"                          ->  Index Scan using node_pkey on node quality_node  (cost=0.00..3.03 rows=1 width=31) (actual time=0.004..0.004 rows=1 loops=5084)"
 
"                                Index Cond: ("outer".node_id = quality_node.node_id)"
 
"                    ->  Index Scan using node_pkey on node taxon_node  (cost=0.00..3.03 rows=1 width=31) (actual time=0.004..0.005 rows=1 loops=5084)"
 
"                          Index Cond: ("outer".node_id = taxon_node.node_id)"
 
"              ->  Index Scan using node_pkey on node exhibits_pred_node  (cost=0.00..3.04 rows=1 width=31) (actual time=0.005..0.005 rows=1 loops=5084)"
 
"                    Index Cond: ("outer".predicate_id = exhibits_pred_node.node_id)"
 
"                    Filter: ((uid)::text ~~ '%exhibits%'::text)"
 
</javascript>
 
 
 
====Execution Details====
 
* Rows returned:
 
* Time for first run: 349 seconds
 
* Time for subsequent run: 21 seconds
 
 
 
===Using LIKE Wildcard operator and a restriction on Slims returned===
 
This query will return all the PHENOTYPES that inhere in a ANATOMICAL ENTITY and the subclasses of the ANATOMICAL ENTITY. In addition, all the QUALITIES that are related to each PHENOTYPE are returned. Only those QUALITIES which are either ATTRIBUTE_SLIMS or VALUE_SLIMS are returned. Finally, all the TAXA that exhibit the PHENOTYPES are returned.
 
 
 
====Query====
 
<javascript>
 
SELECT DISTINCT
 
taxon_node.uid AS taxon,
 
exhibits_pred_node.uid AS exhibits,
 
phenotype_node.uid AS phenotype,
 
inheres_pred_node.uid AS inheres,
 
anatomy_node.uid AS anatomy,
 
is_a_pred_node.uid AS isA,
 
quality_node.uid AS quality,
 
subset_pred_node.uid AS subset,
 
slim_node.uid AS slim
 
FROM
 
link AS inheres_link, link AS is_a_link, link AS subset_link,
 
link AS exhibits_link,
 
node AS phenotype_node, node AS anatomy_node, node AS inheres_pred_node,
 
node AS is_a_pred_node, node AS quality_node,
 
node AS slim_node, node AS subset_pred_node,
 
node AS taxon_node, node AS exhibits_pred_node
 
WHERE
 
exhibits_pred_node.uid LIKE '%exhibits%' AND
 
subset_pred_node.uid LIKE '%inSubset%' AND
 
is_a_pred_node.uid LIKE '%is_a%' AND
 
inheres_pred_node.uid LIKE '%inheres_in' AND
 
anatomy_node.uid LIKE '%TAO:0000108%' AND
 
slim_node.uid IN ('attribute_slim', 'value_slim') AND
 
exhibits_link.node_id = taxon_node.node_id AND
 
exhibits_link.predicate_id = exhibits_pred_node.node_id AND
 
exhibits_link.object_id = phenotype_node.node_id AND
 
subset_link.node_id = quality_node.node_id AND
 
subset_link.predicate_id = subset_pred_node.node_id AND
 
subset_link.object_id = slim_node.node_id AND
 
is_a_link.node_id = phenotype_node.node_id AND
 
is_a_link.predicate_id = is_a_pred_node.node_id AND
 
is_a_link.object_id = quality_node.node_id AND
 
inheres_link.node_id = phenotype_node.node_id AND
 
inheres_link.predicate_id = inheres_pred_node.node_id AND
 
inheres_link.object_id = anatomy_node.node_id;
 
</javascript>
 
 
 
====Query Execution Plan====
 
<javascript>
 
"Unique  (cost=27033.30..27033.32 rows=1 width=243) (actual time=95708.710..95715.120 rows=2070 loops=1)"
 
"  ->  Sort  (cost=27033.30..27033.30 rows=1 width=243) (actual time=95708.707..95709.717 rows=3887 loops=1)"
 
"        Sort Key: taxon_node.uid, exhibits_pred_node.uid, phenotype_node.uid, inheres_pred_node.uid, anatomy_node.uid, is_a_pred_node.uid, quality_node.uid, subset_pred_node.uid, slim_node.uid"
 
"        ->  Nested Loop  (cost=9416.69..27033.29 rows=1 width=243) (actual time=70172.832..95404.139 rows=3887 loops=1)"
 
"              ->  Nested Loop  (cost=9416.69..27030.24 rows=1 width=220) (actual time=227.929..79578.381 rows=2919406 loops=1)"
 
"                    ->  Nested Loop  (cost=9416.69..27027.19 rows=1 width=197) (actual time=227.919..62509.808 rows=2919406 loops=1)"
 
"                          ->  Nested Loop  (cost=9416.69..27024.14 rows=1 width=174) (actual time=227.903..38352.678 rows=4115922 loops=1)"
 
"                                ->  Nested Loop  (cost=9416.69..27021.09 rows=1 width=151) (actual time=227.885..12271.892 rows=4115922 loops=1)"
 
"                                      ->  Hash Join  (cost=9414.52..26907.04 rows=1 width=163) (actual time=227.752..2896.269 rows=114658 loops=1)"
 
"                                            Hash Cond: ("outer".object_id = "inner".node_id)"
 
"                                            ->  Nested Loop  (cost=4261.24..21750.61 rows=630 width=109) (actual time=86.733..2633.493 rows=252348 loops=1)"
 
"                                                  ->  Nested Loop  (cost=4261.24..19830.87 rows=630 width=78) (actual time=86.711..1176.546 rows=252348 loops=1)"
 
"                                                        ->  Nested Loop  (cost=234.53..15791.56 rows=315 width=47) (actual time=58.083..587.038 rows=252348 loops=1)"
 
"                                                              ->  Nested Loop  (cost=234.53..15056.36 rows=7 width=35) (actual time=58.071..136.526 rows=6904 loops=1)"
 
"                                                                    ->  Seq Scan on node inheres_pred_node  (cost=0.00..4026.71 rows=1 width=31) (actual time=56.524..124.720 rows=1 loops=1)"
 
"                                                                          Filter: ((uid)::text ~~ '%inheres_in'::text)"
 
"                                                                    ->  Bitmap Heap Scan on link inheres_link  (cost=234.53..10563.47 rows=37294 width=12) (actual time=1.531..6.138 rows=6904 loops=1)"
 
"                                                                          Recheck Cond: (inheres_link.predicate_id = "outer".node_id)"
 
"                                                                          ->  Bitmap Index Scan on link_predicate_object_indx  (cost=0.00..234.53 rows=37294 width=0) (actual time=1.379..1.379 rows=6904 loops=1)"
 
"                                                                                Index Cond: (inheres_link.predicate_id = "outer".node_id)"
 
"                                                              ->  Index Scan using link_node_indx on link is_a_link  (cost=0.00..104.68 rows=28 width=12) (actual time=0.005..0.037 rows=37 loops=6904)"
 
"                                                                    Index Cond: ("outer".node_id = is_a_link.node_id)"
 
"                                                        ->  Materialize  (cost=4026.71..4026.73 rows=2 width=31) (actual time=0.000..0.001 rows=1 loops=252348)"
 
"                                                              ->  Seq Scan on node exhibits_pred_node  (cost=0.00..4026.71 rows=2 width=31) (actual time=28.608..132.502 rows=1 loops=1)"
 
"                                                                    Filter: ((uid)::text ~~ '%exhibits%'::text)"
 
"                                                  ->  Index Scan using node_pkey on node phenotype_node  (cost=0.00..3.03 rows=1 width=31) (actual time=0.004..0.004 rows=1 loops=252348)"
 
"                                                        Index Cond: ("outer".node_id = phenotype_node.node_id)"
 
"                                            ->  Hash  (cost=5153.24..5153.24 rows=14 width=66) (actual time=37.026..37.026 rows=4413 loops=1)"
 
"                                                  ->  Nested Loop  (cost=4.01..5153.24 rows=14 width=66) (actual time=0.067..32.679 rows=4413 loops=1)"
 
"                                                        ->  Nested Loop  (cost=4.01..5110.58 rows=14 width=35) (actual time=0.058..6.879 rows=4413 loops=1)"
 
"                                                              ->  Bitmap Heap Scan on node slim_node  (cost=4.01..12.03 rows=2 width=31) (actual time=0.044..0.046 rows=2 loops=1)"
 
"                                                                    Recheck Cond: (((uid)::text = 'attribute_slim'::text) OR ((uid)::text = 'value_slim'::text))"
 
"                                                                    ->  BitmapOr  (cost=4.01..4.01 rows=2 width=0) (actual time=0.040..0.040 rows=0 loops=1)"
 
"                                                                          ->  Bitmap Index Scan on node_uid_key  (cost=0.00..2.00 rows=1 width=0) (actual time=0.026..0.026 rows=1 loops=1)"
 
"                                                                                Index Cond: ((uid)::text = 'attribute_slim'::text)"
 
"                                                                          ->  Bitmap Index Scan on node_uid_key  (cost=0.00..2.00 rows=1 width=0) (actual time=0.013..0.013 rows=1 loops=1)"
 
"                                                                                Index Cond: ((uid)::text = 'value_slim'::text)"
 
"                                                              ->  Index Scan using link_object_indx on link subset_link  (cost=0.00..2537.96 rows=905 width=12) (actual time=0.013..1.901 rows=2206 loops=2)"
 
"                                                                    Index Cond: (subset_link.object_id = "outer".node_id)"
 
"                                                        ->  Index Scan using node_pkey on node quality_node  (cost=0.00..3.03 rows=1 width=31) (actual time=0.004..0.004 rows=1 loops=4413)"
 
"                                                              Index Cond: ("outer".node_id = quality_node.node_id)"
 
"                                      ->  Bitmap Heap Scan on link exhibits_link  (cost=2.18..113.61 rows=29 width=12) (actual time=0.021..0.052 rows=36 loops=114658)"
 
"                                            Recheck Cond: ((exhibits_link.predicate_id = "outer".node_id) AND (exhibits_link.object_id = "outer".node_id))"
 
"                                            ->  Bitmap Index Scan on link_predicate_object_indx  (cost=0.00..2.18 rows=29 width=0) (actual time=0.016..0.016 rows=36 loops=114658)"
 
"                                                  Index Cond: ((exhibits_link.predicate_id = "outer".node_id) AND (exhibits_link.object_id = "outer".node_id))"
 
"                                ->  Index Scan using node_pkey on node subset_pred_node  (cost=0.00..3.04 rows=1 width=31) (actual time=0.005..0.005 rows=1 loops=4115922)"
 
"                                      Index Cond: ("outer".predicate_id = subset_pred_node.node_id)"
 
"                                      Filter: ((uid)::text ~~ '%inSubset%'::text)"
 
"                          ->  Index Scan using node_pkey on node is_a_pred_node  (cost=0.00..3.04 rows=1 width=31) (actual time=0.004..0.005 rows=1 loops=4115922)"
 
"                                Index Cond: ("outer".predicate_id = is_a_pred_node.node_id)"
 
"                                Filter: ((uid)::text ~~ '%is_a%'::text)"
 
"                    ->  Index Scan using node_pkey on node taxon_node  (cost=0.00..3.03 rows=1 width=31) (actual time=0.004..0.004 rows=1 loops=2919406)"
 
"                          Index Cond: ("outer".node_id = taxon_node.node_id)"
 
"              ->  Index Scan using node_pkey on node anatomy_node  (cost=0.00..3.04 rows=1 width=31) (actual time=0.005..0.005 rows=0 loops=2919406)"
 
"                    Index Cond: ("outer".object_id = anatomy_node.node_id)"
 
"                    Filter: ((uid)::text ~~ '%TAO:0000108%'::text)"
 
</javascript>
 
 
 
====Execution Details====
 
* Rows returned: 2070
 
* Time for first run: 116 seconds
 
* Time for subsequent run: 1.13 seconds
 
 
 
===Using EQUALS (=) Operator===
 
The result of this query is the same as for the previous two queries except the wildcard search is replaced by exact match requirements on the search parameters viz. relation and concept names (IDs, to be specific). This enables leverage of the indexes set up for the ''Link'' and ''Node'' tables in the ''OBDPhenoscape'' database
 
 
 
====Query====
 
<javascript>
 
SELECT DISTINCT
 
taxon_node.uid AS taxon,
 
exhibits_pred_node.uid AS exhibits,
 
phenotype_node.uid AS phenotype,
 
inheres_pred_node.uid AS inheres,
 
anatomy_node.uid AS anatomy,
 
is_a_pred_node.uid AS isA,
 
quality_node.uid AS quality,
 
subset_pred_node.uid AS subset,
 
slim_node.uid AS slim
 
FROM
 
link AS inheres_link, link AS is_a_link, link AS subset_link,
 
link AS exhibits_link,
 
node AS phenotype_node, node AS anatomy_node, node AS inheres_pred_node,
 
node AS is_a_pred_node, node AS quality_node,
 
node AS slim_node, node AS subset_pred_node,
 
node AS taxon_node, node AS exhibits_pred_node
 
WHERE
 
exhibits_pred_node.uid = 'PHENOSCAPE:exhibits' AND
 
subset_pred_node.uid = 'oboInOwl:inSubset' AND
 
is_a_pred_node.uid = 'OBO_REL:is_a' AND
 
inheres_pred_node.uid = 'OBO_REL:inheres_in' AND
 
anatomy_node.uid = 'TAO:0000108' AND
 
exhibits_link.node_id = taxon_node.node_id AND
 
exhibits_link.predicate_id = exhibits_pred_node.node_id AND
 
exhibits_link.object_id = phenotype_node.node_id AND
 
subset_link.node_id = quality_node.node_id AND
 
subset_link.predicate_id = subset_pred_node.node_id AND
 
subset_link.object_id = slim_node.node_id AND
 
is_a_link.node_id = phenotype_node.node_id AND
 
is_a_link.predicate_id = is_a_pred_node.node_id AND
 
is_a_link.object_id = quality_node.node_id AND
 
inheres_link.node_id = phenotype_node.node_id AND
 
inheres_link.predicate_id = inheres_pred_node.node_id AND
 
inheres_link.object_id = anatomy_node.node_id;
 
</javascript>
 
 
 
====Query Execution Plan====
 
<javascript>
 
"Unique  (cost=14937.55..14937.58 rows=1 width=243) (actual time=128953.998..128961.769 rows=2495 loops=1)"
 
"  ->  Sort  (cost=14937.55..14937.55 rows=1 width=243) (actual time=128953.994..128955.175 rows=4730 loops=1)"
 
"        Sort Key: taxon_node.uid, exhibits_pred_node.uid, phenotype_node.uid, inheres_pred_node.uid, anatomy_node.uid, is_a_pred_node.uid, quality_node.uid, subset_pred_node.uid, slim_node.uid"
 
"        ->  Nested Loop  (cost=234.53..14937.54 rows=1 width=243) (actual time=71555.798..128542.683 rows=4730 loops=1)"
 
"              ->  Nested Loop  (cost=234.53..14934.49 rows=1 width=220) (actual time=71555.785..128444.355 rows=17935 loops=1)"
 
"                    ->  Nested Loop  (cost=234.53..14931.44 rows=1 width=197) (actual time=127.978..128127.758 rows=61019 loops=1)"
 
"                          ->  Nested Loop  (cost=234.53..14928.39 rows=1 width=182) (actual time=127.967..127766.052 rows=61019 loops=1)"
 
"                                ->  Nested Loop  (cost=234.53..14925.35 rows=1 width=159) (actual time=127.958..127409.550 rows=61019 loops=1)"
 
"                                      ->  Nested Loop  (cost=234.53..14922.30 rows=1 width=136) (actual time=127.951..127049.206 rows=61019 loops=1)"
 
"                                            Join Filter: ("inner".object_id = "outer".node_id)"
 
"                                            ->  Index Scan using node_uid_key on node anatomy_node  (cost=0.00..4.64 rows=1 width=31) (actual time=0.040..0.042 rows=1 loops=1)"
 
"                                                  Index Cond: ((uid)::text = 'TAO:0000108'::text)"
 
"                                            ->  Nested Loop  (cost=234.53..14907.75 rows=793 width=113) (actual time=113.185..107822.482 rows=58674691 loops=1)"
 
"                                                  ->  Nested Loop  (cost=234.53..11966.94 rows=28 width=101) (actual time=113.178..15223.264 rows=1567627 loops=1)"
 
"                                                        ->  Nested Loop  (cost=234.53..11881.62 rows=28 width=82) (actual time=113.163..6085.954 rows=1567627 loops=1)"
 
"                                                              ->  Nested Loop  (cost=234.53..11776.59 rows=1 width=70) (actual time=113.150..2590.279 rows=171514 loops=1)"
 
"                                                                    Join Filter: ("inner".predicate_id = "outer".node_id)"
 
"                                                                    ->  Index Scan using node_uid_key on node is_a_pred_node  (cost=0.00..4.64 rows=1 width=31) (actual time=0.017..0.018 rows=1 loops=1)"
 
"                                                                          Index Cond: ((uid)::text = 'OBO_REL:is_a'::text)"
 
"                                                                    ->  Nested Loop  (cost=234.53..11769.48 rows=198 width=47) (actual time=113.117..2106.357 rows=1216866 loops=1)"
 
"                                                                          ->  Nested Loop  (cost=234.53..11034.28 rows=7 width=35) (actual time=10.717..85.954 rows=52145 loops=1)"
 
"                                                                                ->  Index Scan using node_uid_key on node exhibits_pred_node  (cost=0.00..4.64 rows=1 width=31) (actual time=0.020..0.021 rows=1 loops=1)"
 
"                                                                                      Index Cond: ((uid)::text = 'PHENOSCAPE:exhibits'::text)"
 
"                                                                                ->  Bitmap Heap Scan on link exhibits_link  (cost=234.53..10563.47 rows=37294 width=12) (actual time=10.688..45.307 rows=52145 loops=1)"
 
"                                                                                      Recheck Cond: (exhibits_link.predicate_id = "outer".node_id)"
 
"                                                                                      ->  Bitmap Index Scan on link_predicate_object_indx  (cost=0.00..234.53 rows=37294 width=0) (actual time=10.112..10.112 rows=52145 loops=1)"
 
"                                                                                            Index Cond: (exhibits_link.predicate_id = "outer".node_id)"
 
"                                                                          ->  Index Scan using link_node_indx on link is_a_link  (cost=0.00..104.68 rows=28 width=12) (actual time=0.004..0.022 rows=23 loops=52145)"
 
"                                                                                Index Cond: (is_a_link.node_id = "outer".object_id)"
 
"                                                              ->  Index Scan using link_node_indx on link subset_link  (cost=0.00..104.68 rows=28 width=12) (actual time=0.004..0.012 rows=9 loops=171514)"
 
"                                                                    Index Cond: ("outer".object_id = subset_link.node_id)"
 
"                                                        ->  Index Scan using node_pkey on node quality_node  (cost=0.00..3.03 rows=1 width=31) (actual time=0.004..0.004 rows=1 loops=1567627)"
 
"                                                              Index Cond: ("outer".node_id = quality_node.node_id)"
 
"                                                  ->  Index Scan using link_node_indx on link inheres_link  (cost=0.00..104.68 rows=28 width=12) (actual time=0.004..0.031 rows=37 loops=1567627)"
 
"                                                        Index Cond: (inheres_link.node_id = "outer".node_id)"
 
"                                      ->  Index Scan using node_pkey on node slim_node  (cost=0.00..3.03 rows=1 width=31) (actual time=0.004..0.004 rows=1 loops=61019)"
 
"                                            Index Cond: ("outer".object_id = slim_node.node_id)"
 
"                                ->  Index Scan using node_pkey on node taxon_node  (cost=0.00..3.03 rows=1 width=31) (actual time=0.004..0.004 rows=1 loops=61019)"
 
"                                      Index Cond: ("outer".node_id = taxon_node.node_id)"
 
"                          ->  Index Scan using node_pkey on node phenotype_node  (cost=0.00..3.03 rows=1 width=31) (actual time=0.004..0.004 rows=1 loops=61019)"
 
"                                Index Cond: ("outer".object_id = phenotype_node.node_id)"
 
"                    ->  Index Scan using node_pkey on node inheres_pred_node  (cost=0.00..3.04 rows=1 width=31) (actual time=0.004..0.004 rows=0 loops=61019)"
 
"                          Index Cond: ("outer".predicate_id = inheres_pred_node.node_id)"
 
"                          Filter: ((uid)::text = 'OBO_REL:inheres_in'::text)"
 
"              ->  Index Scan using node_pkey on node subset_pred_node  (cost=0.00..3.04 rows=1 width=31) (actual time=0.004..0.004 rows=0 loops=17935)"
 
"                    Index Cond: ("outer".predicate_id = subset_pred_node.node_id)"
 
"                    Filter: ((uid)::text = 'oboInOwl:inSubset'::text)"
 
  
</javascript>
+
===[[Queries to be implemented in the future]]===
  
====Execution details====
+
==Gene Services==
* Rows returned: 2495
 
* Time for first run: 22 seconds
 
* Time for subsequent runs: 5 ~ 15 seconds
 
  
===With EQUALS (=) operator and restriction on SLIMS===
+
==Taxon Services==

Latest revision as of 20:30, 21 January 2009

This section describes the queries that have been (or are to be) implemented for the Phenoscape data services, in addition to the execution details of each queries on the PostgreSQL database on Darwin.

Status (Jan 20, 09)

The first iteration of the Web Services module for the Phenoscape project (the SICB prototype) was demonstrated at the SICB meeting in Boston, MA in January 2009. This module allowed database searches for Anatomical Entities (Anatomical Entity Services) and Genes (Gene Services). Searches for Taxa (Taxon Services) are to be implemented in the next iteration which will be a part of the next Phenoscape version to be demonstrated at the ASIH meeting in Portland, OR (the ASIH prototype) in July, 2009.

Testing by the Phenoscape project stakeholders (Paula, Todd, and Monte) at the SICB meeting revealed that Anatomy and Gene Services were functional, but their execution was very slow in terms of time. As a result, the data retrieval strategy used in the SICB prototype is being examined for bottlenecks and these details are presented here.

Summary

The query modules on the client-side interface with the database in the backend to execute the queries. The data retrieved by these query executions are then processed at the client-side. There are two possible bottlenecks in this scheme: one at the client-side and the other more likely, at the backend. The backend bottleneck is more likely because of connection times, data transfer times, query execution times, and other related lag times that occur when the client-side invokes the database backend. Therefore, minimizing the backend interface requirements will reduce the execution time substantially.

The query execution strategy implemented for the SICB prototype spawns a multitude of queries, and the execution of each of these takes up time to connect, retrieve the results, and transfer them back to the client side. Therefore, a new strategy that tries to obtain all the required data in one query (or a very limited number of queries) is being tested as of now. Details of both these strategies can be found in the linked pages

To test the efficiency of the new queries, more methods need to be added to the OBD Shard libraries, the projects have to be compiled and linked prior to testing. This is to be done over the next two weeks from now (Jan 20, 09). The details of these strategies can also be found here.

Database details

  • Database name: obdphenoscape
  • Database server: Darwin
  • Last updated: Jan 02, 2009
  • Size: ~ 600 MB

Anatomical Entity Services

Queries for Phenoscape UI demo'ed at SICB, Boston in Jan 2009

Queries to be implemented in the future

Gene Services

Taxon Services