Difference between revisions of "Queries to be implemented in the future"

From phenoscape
(Anatomy data services)
Line 5: Line 5:
 
Given a specific anatomical entity A, the requirements of the anatomy data service module are as follows:
 
Given a specific anatomical entity A, the requirements of the anatomy data service module are as follows:
  
# Identify ALL unique phenotypes {P} that are composed of the entity A or any of its subtypes.  
+
# Identify ALL unique phenotypes {P} that are composed of the entity A or any of its subtypes.
 
# For each identified phenotype P which is of the form inheres_in(Q, E)
 
# For each identified phenotype P which is of the form inheres_in(Q, E)
## Identify the quality Q. In addition, identify the character C which subsumes Q.  
+
## Identify the quality Q.  
## Identify the entity E. E may be the same as A, or any valid subtype of A.  
+
### In addition, identify the character C which subsumes Q. This requires navigating the PATO hierarchy recursively in some cases
## Identify the set of taxa {T} or genotypes {GT} that exhibit the phenotype P. For a genotype GT, identify the gene G that encodes it.  
+
## Identify the entity E. E may be the same as A, or any valid subtype of A.
 +
## Identify the set of taxa {T} or genotypes {GT} that exhibit the phenotype P.  
 +
### For a genotype GT, identify the gene G that encodes it.
  
 
The result should be a list in the format shown below
 
The result should be a list in the format shown below
Line 18: Line 20:
  
  
==Query details==
+
==Strategy #1==
  
These are some queries being tested out to improve the database performance following the SICB meeting in Boston in Jan 2009. The execution methodology of these queries is to retrieve the information of interest, in this case, all the taxon and quality details associated with an anatomical being searched for, in one run
+
This strategy is the [[Strategy_#1:_Simple_table_joins|simplest way]] to try and get all the necessary information related to the anatomical entity being searched for. Below is the actual query which attempts to do this. This query returns the unique phenotypes associated with the anatomical entity, and the qualities and specific anatomical entities associated with the phenotype, as well as the list of taxa that exhibit each of these phenotypes in one fell table join (uh, swoop).  
 
 
The result of this query is the same as for the previous two queries except the wildcard search is replaced by exact match requirements on the search parameters viz. relation and concept names (IDs, to be specific). This enables leverage of the indexes set up for the ''Link'' and ''Node'' tables in the OBD database
 
  
 
===Query===
 
===Query===
Line 62: Line 62:
  
 
===Query Execution Plan===
 
===Query Execution Plan===
<javascript>
+
 
"Unique  (cost=15397.47..15397.50 rows=1 width=243) (actual time=45692.829..45699.320 rows=2070 loops=1)"
 
"  ->  Sort  (cost=15397.47..15397.47 rows=1 width=243) (actual time=45692.825..45693.795 rows=3887 loops=1)"
 
"        Sort Key: taxon_node.uid, exhibits_pred_node.uid, phenotype_node.uid, inheres_pred_node.uid, anatomy_node.uid, is_a_pred_node.uid, quality_node.uid, subset_pred_node.uid, slim_node.uid"
 
"        ->  Nested Loop  (cost=234.53..15397.46 rows=1 width=243) (actual time=33218.390..45415.489 rows=3887 loops=1)"
 
"              ->  Nested Loop  (cost=234.53..15394.41 rows=1 width=228) (actual time=33218.380..45393.860 rows=3887 loops=1)"
 
"                    ->  Nested Loop  (cost=234.53..15391.36 rows=1 width=205) (actual time=33218.359..45367.897 rows=4730 loops=1)"
 
"                          ->  Nested Loop  (cost=234.53..15388.31 rows=1 width=182) (actual time=1.649..29103.510 rows=3310775 loops=1)"
 
"                                ->  Nested Loop  (cost=234.53..15382.42 rows=1 width=190) (actual time=1.634..16423.157 rows=1443372 loops=1)"
 
"                                      Join Filter: ("inner".predicate_id = "outer".node_id)"
 
"                                      ->  Index Scan using node_uid_key on node exhibits_pred_node  (cost=0.00..4.64 rows=1 width=31) (actual time=0.050..0.051 rows=1 loops=1)"
 
"                                            Index Cond: ((uid)::text = 'PHENOSCAPE:exhibits'::text)"
 
"                                      ->  Nested Loop  (cost=234.53..15377.44 rows=28 width=167) (actual time=1.580..15236.969 rows=1520592 loops=1)"
 
"                                            ->  Nested Loop  (cost=234.53..15292.11 rows=28 width=144) (actual time=1.572..6089.530 rows=1520592 loops=1)"
 
"                                                  ->  Index Scan using node_uid_key on node subset_pred_node  (cost=0.00..4.64 rows=1 width=31) (actual time=0.018..0.018 rows=1 loops=1)"
 
"                                                        Index Cond: ((uid)::text = 'oboInOwl:inSubset'::text)"
 
"                                                  ->  Nested Loop  (cost=234.53..15287.20 rows=28 width=113) (actual time=1.552..5026.391 rows=1520592 loops=1)"
 
"                                                        ->  Nested Loop  (cost=234.53..12737.92 rows=1 width=101) (actual time=1.541..1974.620 rows=37918 loops=1)"
 
"                                                              Join Filter: ("inner".predicate_id = "outer".node_id)"
 
"                                                              ->  Index Scan using node_uid_key on node is_a_pred_node  (cost=0.00..4.64 rows=1 width=31) (actual time=0.015..0.016 rows=1 loops=1)"
 
"                                                                    Index Cond: ((uid)::text = 'OBO_REL:is_a'::text)"
 
"                                                              ->  Nested Loop  (cost=234.53..12729.35 rows=315 width=78) (actual time=1.507..1869.447 rows=252348 loops=1)"
 
"                                                                    ->  Nested Loop  (cost=234.53..11769.48 rows=315 width=47) (actual time=1.495..440.581 rows=252348 loops=1)"
 
"                                                                          ->  Nested Loop  (cost=234.53..11034.28 rows=7 width=35) (actual time=1.485..11.990 rows=6904 loops=1)"
 
"                                                                                ->  Index Scan using node_uid_key on node inheres_pred_node  (cost=0.00..4.64 rows=1 width=31) (actual time=0.017..0.019 rows=1 loops=1)"
 
"                                                                                      Index Cond: ((uid)::text = 'OBO_REL:inheres_in'::text)"
 
"                                                                                ->  Bitmap Heap Scan on link inheres_link  (cost=234.53..10563.47 rows=37294 width=12) (actual time=1.463..6.271 rows=6904 loops=1)"
 
"                                                                                      Recheck Cond: (inheres_link.predicate_id = "outer".node_id)"
 
"                                                                                      ->  Bitmap Index Scan on link_predicate_object_indx  (cost=0.00..234.53 rows=37294 width=0) (actual time=1.316..1.316 rows=6904 loops=1)"
 
"                                                                                            Index Cond: (inheres_link.predicate_id = "outer".node_id)"
 
"                                                                          ->  Index Scan using link_node_indx on link is_a_link  (cost=0.00..104.68 rows=28 width=12) (actual time=0.005..0.035 rows=37 loops=6904)"
 
"                                                                                Index Cond: ("outer".node_id = is_a_link.node_id)"
 
"                                                                    ->  Index Scan using node_pkey on node quality_node  (cost=0.00..3.03 rows=1 width=31) (actual time=0.004..0.004 rows=1 loops=252348)"
 
"                                                                          Index Cond: ("outer".object_id = quality_node.node_id)"
 
"                                                        ->  Index Scan using link_object_indx on link exhibits_link  (cost=0.00..2537.96 rows=905 width=12) (actual time=0.006..0.046 rows=40 loops=37918)"
 
"                                                              Index Cond: ("outer".node_id = exhibits_link.object_id)"
 
"                                            ->  Index Scan using node_pkey on node taxon_node  (cost=0.00..3.03 rows=1 width=31) (actual time=0.004..0.004 rows=1 loops=1520592)"
 
"                                                  Index Cond: ("outer".node_id = taxon_node.node_id)"
 
"                                ->  Index Scan using link_triple_indx on link subset_link  (cost=0.00..5.87 rows=1 width=12) (actual time=0.004..0.006 rows=2 loops=1443372)"
 
"                                      Index Cond: ((subset_link.node_id = "outer".node_id) AND (subset_link.predicate_id = "outer".node_id))"
 
"                          ->  Index Scan using node_pkey on node anatomy_node  (cost=0.00..3.04 rows=1 width=31) (actual time=0.004..0.004 rows=0 loops=3310775)"
 
"                                Index Cond: ("outer".object_id = anatomy_node.node_id)"
 
"                                Filter: ((uid)::text = 'TAO:0000108'::text)"
 
"                    ->  Index Scan using node_pkey on node slim_node  (cost=0.00..3.04 rows=1 width=31) (actual time=0.004..0.004 rows=1 loops=4730)"
 
"                          Index Cond: ("outer".object_id = slim_node.node_id)"
 
"                          Filter: (((uid)::text = 'attribute_slim'::text) OR ((uid)::text = 'value_slim'::text))"
 
"              ->  Index Scan using node_pkey on node phenotype_node  (cost=0.00..3.03 rows=1 width=31) (actual time=0.004..0.004 rows=1 loops=3887)"
 
"                    Index Cond: ("outer".object_id = phenotype_node.node_id)"
 
</javascript>
 
  
 
===Execution Details===
 
===Execution Details===
 
* Rows returned: 2070
 
* Rows returned: 2070
 
* Time: 0.6 ~ 17 s
 
* Time: 0.6 ~ 17 s
 +
 +
===Discussion===
 +
 +
==Strategy #2==
 +
 +
==Strategy #3==
 +
 +
==Strategy #4==
 +
 +
==Strategy #5==
  
 
[[Category:Database]]
 
[[Category:Database]]

Revision as of 17:03, 12 February 2009

This page discusses the requirements of the various data access modules of the Phenoscape application to be demonstrated at the ASIH workshop, and documents the performance of the various querying strategies being tested for these modules.

Anatomy data services

Given a specific anatomical entity A, the requirements of the anatomy data service module are as follows:

  1. Identify ALL unique phenotypes {P} that are composed of the entity A or any of its subtypes.
  2. For each identified phenotype P which is of the form inheres_in(Q, E)
    1. Identify the quality Q.
      1. In addition, identify the character C which subsumes Q. This requires navigating the PATO hierarchy recursively in some cases
    2. Identify the entity E. E may be the same as A, or any valid subtype of A.
    3. Identify the set of taxa {T} or genotypes {GT} that exhibit the phenotype P.
      1. For a genotype GT, identify the gene G that encodes it.

The result should be a list in the format shown below

<javascript> <Search-Anatomical-Entity> <Phenotype> <Entity> <Quality> <Character> <List of taxa>OR<List of genes> </javascript>


Strategy #1

This strategy is the simplest way to try and get all the necessary information related to the anatomical entity being searched for. Below is the actual query which attempts to do this. This query returns the unique phenotypes associated with the anatomical entity, and the qualities and specific anatomical entities associated with the phenotype, as well as the list of taxa that exhibit each of these phenotypes in one fell table join (uh, swoop).

Query

<sql> SELECT DISTINCT taxon_node.uid AS taxon, phenotype_node.uid AS phenotype, anatomy_node.uid AS anatomy, quality_node.uid AS quality, slim_node.uid AS slim FROM link AS inheres_link, link AS is_a_link, link AS subset_link, link AS exhibits_link, node AS phenotype_node, node AS anatomy_node, node AS inheres_pred_node, node AS is_a_pred_node, node AS quality_node, node AS slim_node, node AS subset_pred_node, node AS taxon_node, node AS exhibits_pred_node WHERE exhibits_pred_node.uid = 'PHENOSCAPE:exhibits' AND subset_pred_node.uid = 'oboInOwl:inSubset' AND is_a_pred_node.uid = 'OBO_REL:is_a' AND is_a_link.is_inferred = 'f' AND inheres_pred_node.uid = 'OBO_REL:inheres_in' AND anatomy_node.uid = 'TAO:0000108' AND slim_node.uid IN ('attribute_slim', 'value_slim') AND exhibits_link.node_id = taxon_node.node_id AND exhibits_link.predicate_id = exhibits_pred_node.node_id AND exhibits_link.object_id = phenotype_node.node_id AND subset_link.node_id = quality_node.node_id AND subset_link.predicate_id = subset_pred_node.node_id AND subset_link.object_id = slim_node.node_id AND is_a_link.node_id = phenotype_node.node_id AND is_a_link.predicate_id = is_a_pred_node.node_id AND is_a_link.object_id = quality_node.node_id AND inheres_link.node_id = phenotype_node.node_id AND inheres_link.predicate_id = inheres_pred_node.node_id AND inheres_link.object_id = anatomy_node.node_id; </sql>

Query Execution Plan

Execution Details

  • Rows returned: 2070
  • Time: 0.6 ~ 17 s

Discussion

Strategy #2

Strategy #3

Strategy #4

Strategy #5