Queries to be implemented in the future

From phenoscape
Revision as of 19:37, 22 January 2009 by Hilmar (Execution Details)

These queries are being tested to improve database performance following the SICB meeting in Boston in January 2009. The approach is to retrieve all the information of interest in a single query. In this case, that means all the taxon and quality details associated with the anatomical entity being searched for.

Summary

Post-compositions of Entities and Qualities are used to relate taxa to phenotypes through the exhibits relation, as shown in (1).

<javascript>
Taxon exhibits inheres_in(Quality, Entity) -- (1)
</javascript>

In addition, the OBD database contains links relating each post-composed phenotype to its Quality and to its Entity through different relations, as shown in (2) and (3) respectively.

<javascript>
inheres_in(Quality, Entity) is_a Quality -- (2)
inheres_in(Quality, Entity) inheres_in Entity -- (3)
</javascript>

A Quality can belong to a Value_Slim or an Attribute_Slim (among other slims) and is related to its slim by the in_subset_of relation, as shown in (4).

<javascript>
Quality in_subset_of Slim -- (4)
</javascript>

The queries described on this page traverse these relationships with a combination of table joins to find all the taxa and qualities related to the anatomical entity being searched for. In contrast to the earlier approach, this methodology makes optimal use of the transitive relations that the OBD reasoner derives between Attributes and Values in the PATO hierarchy and between Anatomical Entities in the TAO hierarchy.
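The benefit of reasoner-derived links can be illustrated with a small sketch. It assumes the reasoner materialises `P inheres_in E2` whenever `P inheres_in E1` and `E1 is_a E2` hold (a rule of the form inheres_in ∘ is_a → inheres_in). Under that assumption, a single equality lookup on the searched entity also reaches phenotypes asserted on its subclasses. All term names below are hypothetical placeholders, not real TAO or PATO identifiers.

```python
from collections import defaultdict

# Hypothetical facts: placeholder names, not real TAO/PATO identifiers.
ANATOMY_IS_A = [
    ("pectoral_fin", "paired_fin"),
    ("paired_fin", "fin"),
    ("dorsal_fin", "fin"),
]
ASSERTED_INHERES_IN = [
    ("round^inheres_in(pectoral_fin)", "pectoral_fin"),
    ("absent^inheres_in(dorsal_fin)", "dorsal_fin"),
]


def is_a_closure(edges):
    """Map each term to itself plus all of its is_a ancestors (reflexive-transitive closure)."""
    parents = defaultdict(set)
    terms = set()
    for child, parent in edges:
        parents[child].add(parent)
        terms.update((child, parent))
    closure = {}
    for term in terms:
        seen, stack = {term}, [term]
        while stack:
            for parent in parents[stack.pop()]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        closure[term] = seen
    return closure


def infer_inheres_in(asserted, closure):
    """Assumed rule: P inheres_in E1 and E1 is_a* E2  =>  P inheres_in E2."""
    return {
        (phenotype, ancestor)
        for phenotype, entity in asserted
        for ancestor in closure.get(entity, {entity})
    }


links = infer_inheres_in(ASSERTED_INHERES_IN, is_a_closure(ANATOMY_IS_A))
# One equality lookup on "fin" now reaches phenotypes asserted on its subclasses.
on_fin = sorted(phenotype for phenotype, entity in links if entity == "fin")
print(on_fin)
```

The closure is computed once, ahead of query time, so the SQL itself never needs to recurse through the anatomy hierarchy.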

Query details

Using LIKE Wildcard Operator

This query returns all the PHENOTYPES that inhere in an ANATOMICAL ENTITY* or in any subclass of that ANATOMICAL ENTITY*. It also returns all the QUALITIES related to each PHENOTYPE, the SUBSET SLIMS of each QUALITY and, finally, all the TAXA that exhibit the PHENOTYPES.

*Search Term (The search term in this example is TAO:0000108, which is, you guessed it, a fin!)

Query

<javascript>
SELECT DISTINCT taxon_node.uid AS taxon, exhibits_pred_node.uid AS exhibits,
       phenotype_node.uid AS phenotype, inheres_pred_node.uid AS inheres,
       anatomy_node.uid AS anatomy, is_a_pred_node.uid AS isA,
       quality_node.uid AS quality, subset_pred_node.uid AS subset,
       slim_node.uid AS slim
FROM link AS inheres_link, link AS is_a_link, link AS subset_link, link AS exhibits_link,
     node AS phenotype_node, node AS anatomy_node, node AS inheres_pred_node,
     node AS is_a_pred_node, node AS quality_node, node AS slim_node,
     node AS subset_pred_node, node AS taxon_node, node AS exhibits_pred_node
-- relations and search term, matched by wildcard (substring) patterns
WHERE exhibits_pred_node.uid LIKE '%exhibits%'
  AND subset_pred_node.uid LIKE '%inSubset%'
  AND is_a_pred_node.uid LIKE '%is_a%'
  AND inheres_pred_node.uid LIKE '%inheres_in'
  AND anatomy_node.uid LIKE '%TAO:0000108%'
  AND slim_node.uid IN ('attribute_slim', 'value_slim')
  -- (1) taxon exhibits phenotype
  AND exhibits_link.node_id = taxon_node.node_id
  AND exhibits_link.predicate_id = exhibits_pred_node.node_id
  AND exhibits_link.object_id = phenotype_node.node_id
  -- (4) quality in_subset_of slim
  AND subset_link.node_id = quality_node.node_id
  AND subset_link.predicate_id = subset_pred_node.node_id
  AND subset_link.object_id = slim_node.node_id
  -- (2) phenotype is_a quality
  AND is_a_link.node_id = phenotype_node.node_id
  AND is_a_link.predicate_id = is_a_pred_node.node_id
  AND is_a_link.object_id = quality_node.node_id
  -- (3) phenotype inheres_in anatomical entity
  AND inheres_link.node_id = phenotype_node.node_id
  AND inheres_link.predicate_id = inheres_pred_node.node_id
  AND inheres_link.object_id = anatomy_node.node_id;
</javascript>
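The LIKE patterns in this query are substring tests, so in general they can match more uids than the exact IDs intended, even though both queries on this page return 2070 rows. The sketch below shows the difference using SQLite as a stand-in for PostgreSQL. The uids are hypothetical, including a made-up post-composed uid that merely contains the search term. SQLite's LIKE ignores ASCII case by default whereas PostgreSQL's LIKE is case-sensitive; that difference does not affect these values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (node_id INTEGER PRIMARY KEY, uid TEXT UNIQUE)")
# Hypothetical uids: only the first is the anatomy term being searched for.
conn.executemany(
    "INSERT INTO node (uid) VALUES (?)",
    [
        ("TAO:0000108",),
        ("OTHER_ENTITY",),
        ("QUALITY_Q^OBO_REL:inheres_in(TAO:0000108)",),  # hypothetical post-composed uid
    ],
)


def uids_where(operator: str, value: str) -> list[str]:
    """Return the sorted uids satisfying `uid <operator> value`."""
    sql = f"SELECT uid FROM node WHERE uid {operator} ? ORDER BY uid"
    return [row[0] for row in conn.execute(sql, (value,))]


like_hits = uids_where("LIKE", "%TAO:0000108%")  # substring test, as in the query above
exact_hits = uids_where("=", "TAO:0000108")      # exact match, as in the EQUALS query
print(like_hits)
print(exact_hits)
```

Whether uids like the post-composed one ever occur as objects of inheres_in links in the OBD data is not established here. The sketch only shows that the two predicates are not interchangeable in general.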

Query Execution Plan

<javascript>
Unique (cost=27033.30..27033.32 rows=1 width=243) (actual time=95708.710..95715.120 rows=2070 loops=1)
   -> Sort (cost=27033.30..27033.30 rows=1 width=243) (actual time=95708.707..95709.717 rows=3887 loops=1)
      Sort Key: taxon_node.uid, exhibits_pred_node.uid, phenotype_node.uid, inheres_pred_node.uid, anatomy_node.uid, is_a_pred_node.uid, quality_node.uid, subset_pred_node.uid, slim_node.uid
      -> Nested Loop (cost=9416.69..27033.29 rows=1 width=243) (actual time=70172.832..95404.139 rows=3887 loops=1)
         -> Nested Loop (cost=9416.69..27030.24 rows=1 width=220) (actual time=227.929..79578.381 rows=2919406 loops=1)
            -> Nested Loop (cost=9416.69..27027.19 rows=1 width=197) (actual time=227.919..62509.808 rows=2919406 loops=1)
               -> Nested Loop (cost=9416.69..27024.14 rows=1 width=174) (actual time=227.903..38352.678 rows=4115922 loops=1)
                  -> Nested Loop (cost=9416.69..27021.09 rows=1 width=151) (actual time=227.885..12271.892 rows=4115922 loops=1)
                     -> Hash Join (cost=9414.52..26907.04 rows=1 width=163) (actual time=227.752..2896.269 rows=114658 loops=1)
                        Hash Cond: ("outer".object_id = "inner".node_id)
                        -> Nested Loop (cost=4261.24..21750.61 rows=630 width=109) (actual time=86.733..2633.493 rows=252348 loops=1)
                           -> Nested Loop (cost=4261.24..19830.87 rows=630 width=78) (actual time=86.711..1176.546 rows=252348 loops=1)
                              -> Nested Loop (cost=234.53..15791.56 rows=315 width=47) (actual time=58.083..587.038 rows=252348 loops=1)
                                 -> Nested Loop (cost=234.53..15056.36 rows=7 width=35) (actual time=58.071..136.526 rows=6904 loops=1)
                                    -> Seq Scan on node inheres_pred_node (cost=0.00..4026.71 rows=1 width=31) (actual time=56.524..124.720 rows=1 loops=1)
                                       Filter: ((uid)::text ~~ '%inheres_in'::text)
                                    -> Bitmap Heap Scan on link inheres_link (cost=234.53..10563.47 rows=37294 width=12) (actual time=1.531..6.138 rows=6904 loops=1)
                                       Recheck Cond: (inheres_link.predicate_id = "outer".node_id)
                                       -> Bitmap Index Scan on link_predicate_object_indx (cost=0.00..234.53 rows=37294 width=0) (actual time=1.379..1.379 rows=6904 loops=1)
                                          Index Cond: (inheres_link.predicate_id = "outer".node_id)
                                 -> Index Scan using link_node_indx on link is_a_link (cost=0.00..104.68 rows=28 width=12) (actual time=0.005..0.037 rows=37 loops=6904)
                                    Index Cond: ("outer".node_id = is_a_link.node_id)
                              -> Materialize (cost=4026.71..4026.73 rows=2 width=31) (actual time=0.000..0.001 rows=1 loops=252348)
                                 -> Seq Scan on node exhibits_pred_node (cost=0.00..4026.71 rows=2 width=31) (actual time=28.608..132.502 rows=1 loops=1)
                                    Filter: ((uid)::text ~~ '%exhibits%'::text)
                           -> Index Scan using node_pkey on node phenotype_node (cost=0.00..3.03 rows=1 width=31) (actual time=0.004..0.004 rows=1 loops=252348)
                              Index Cond: ("outer".node_id = phenotype_node.node_id)
                        -> Hash (cost=5153.24..5153.24 rows=14 width=66) (actual time=37.026..37.026 rows=4413 loops=1)
                           -> Nested Loop (cost=4.01..5153.24 rows=14 width=66) (actual time=0.067..32.679 rows=4413 loops=1)
                              -> Nested Loop (cost=4.01..5110.58 rows=14 width=35) (actual time=0.058..6.879 rows=4413 loops=1)
                                 -> Bitmap Heap Scan on node slim_node (cost=4.01..12.03 rows=2 width=31) (actual time=0.044..0.046 rows=2 loops=1)
                                    Recheck Cond: (((uid)::text = 'attribute_slim'::text) OR ((uid)::text = 'value_slim'::text))
                                    -> BitmapOr (cost=4.01..4.01 rows=2 width=0) (actual time=0.040..0.040 rows=0 loops=1)
                                       -> Bitmap Index Scan on node_uid_key (cost=0.00..2.00 rows=1 width=0) (actual time=0.026..0.026 rows=1 loops=1)
                                          Index Cond: ((uid)::text = 'attribute_slim'::text)
                                       -> Bitmap Index Scan on node_uid_key (cost=0.00..2.00 rows=1 width=0) (actual time=0.013..0.013 rows=1 loops=1)
                                          Index Cond: ((uid)::text = 'value_slim'::text)
                                 -> Index Scan using link_object_indx on link subset_link (cost=0.00..2537.96 rows=905 width=12) (actual time=0.013..1.901 rows=2206 loops=2)
                                    Index Cond: (subset_link.object_id = "outer".node_id)
                              -> Index Scan using node_pkey on node quality_node (cost=0.00..3.03 rows=1 width=31) (actual time=0.004..0.004 rows=1 loops=4413)
                                 Index Cond: ("outer".node_id = quality_node.node_id)
                     -> Bitmap Heap Scan on link exhibits_link (cost=2.18..113.61 rows=29 width=12) (actual time=0.021..0.052 rows=36 loops=114658)
                        Recheck Cond: ((exhibits_link.predicate_id = "outer".node_id) AND (exhibits_link.object_id = "outer".node_id))
                        -> Bitmap Index Scan on link_predicate_object_indx (cost=0.00..2.18 rows=29 width=0) (actual time=0.016..0.016 rows=36 loops=114658)
                           Index Cond: ((exhibits_link.predicate_id = "outer".node_id) AND (exhibits_link.object_id = "outer".node_id))
                  -> Index Scan using node_pkey on node subset_pred_node (cost=0.00..3.04 rows=1 width=31) (actual time=0.005..0.005 rows=1 loops=4115922)
                     Index Cond: ("outer".predicate_id = subset_pred_node.node_id)
                     Filter: ((uid)::text ~~ '%inSubset%'::text)
               -> Index Scan using node_pkey on node is_a_pred_node (cost=0.00..3.04 rows=1 width=31) (actual time=0.004..0.005 rows=1 loops=4115922)
                  Index Cond: ("outer".predicate_id = is_a_pred_node.node_id)
                  Filter: ((uid)::text ~~ '%is_a%'::text)
            -> Index Scan using node_pkey on node taxon_node (cost=0.00..3.03 rows=1 width=31) (actual time=0.004..0.004 rows=1 loops=2919406)
               Index Cond: ("outer".node_id = taxon_node.node_id)
         -> Index Scan using node_pkey on node anatomy_node (cost=0.00..3.04 rows=1 width=31) (actual time=0.005..0.005 rows=0 loops=2919406)
            Index Cond: ("outer".object_id = anatomy_node.node_id)
            Filter: ((uid)::text ~~ '%TAO:0000108%'::text)
</javascript>
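To see where the time goes in a plan like this, it helps to multiply each node's actual `rows` by its `loops`, which approximates the total number of rows that node produced. In the plan above, for instance, the filter on `anatomy_node` runs at the top of the join tree against 2,919,406 outer rows, after the nested loops beneath it have built over four million intermediate rows. The sketch below is a minimal helper for this calculation. Its name, `plan_row_totals`, is hypothetical, and its regular expression assumes the one-node-per-line text format shown on this page.

```python
import re

# One plan node: optional "->", node description, estimated cost, then actual stats.
NODE_RE = re.compile(
    r"(?:->\s*)?(?P<node>[A-Za-z][^(]*?)\s*\(cost=[^)]*\)\s*"
    r"\(actual time=\S+ rows=(?P<rows>\d+) loops=(?P<loops>\d+)\)"
)


def plan_row_totals(plan_text: str) -> list[tuple[str, int]]:
    """Return (node description, rows * loops) per plan node, largest total first.

    EXPLAIN ANALYZE reports rows as a per-loop average, so the product is approximate.
    """
    totals = []
    for line in plan_text.splitlines():
        match = NODE_RE.search(line)
        if match:
            total = int(match.group("rows")) * int(match.group("loops"))
            totals.append((match.group("node").strip(), total))
    return sorted(totals, key=lambda item: item[1], reverse=True)


# Three node lines copied from the LIKE plan above.
sample = """\
-> Nested Loop (cost=9416.69..27021.09 rows=1 width=151) (actual time=227.885..12271.892 rows=4115922 loops=1)
-> Bitmap Heap Scan on link exhibits_link (cost=2.18..113.61 rows=29 width=12) (actual time=0.021..0.052 rows=36 loops=114658)
-> Index Scan using node_pkey on node anatomy_node (cost=0.00..3.04 rows=1 width=31) (actual time=0.005..0.005 rows=0 loops=2919406)
"""

for node, total in plan_row_totals(sample):
    print(f"{total:>10,}  {node}")
```

Because `rows` is an average, the exhibits_link total (36 × 114,658) comes out slightly above the 4,115,922 rows its parent Nested Loop actually reports.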

Execution Details

  • Rows returned: 2070
  • Time:

Using EQUALS (=) Operator

This query returns the same result as the LIKE query above. The only difference is that the wildcard searches are replaced by exact-match requirements on the search parameters, viz. the relation and concept names (IDs, to be specific). Exact matches let the query planner use the indexes set up on the link and node tables in the OBD database.
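Why exact matches help can be sketched outside PostgreSQL. A B-tree index on `uid` can serve `uid = '...'` but not a LIKE pattern that begins with `%`. This is why the LIKE plan above reads the predicate nodes with `Seq Scan on node ... Filter: ((uid)::text ~~ ...)`, while the EQUALS plan on this page uses `Index Scan using node_uid_key`. The minimal sketch below reproduces the contrast with SQLite's `EXPLAIN QUERY PLAN`. SQLite is only a stand-in: its planner and output format differ from PostgreSQL's, and the one-table schema is a simplification assumed for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for OBD's node table: UNIQUE gives uid an index, playing the role
# of node_uid_key in the PostgreSQL plans on this page.
conn.execute("CREATE TABLE node (node_id INTEGER PRIMARY KEY, uid TEXT UNIQUE)")


def query_plan(sql: str) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text for sql, one step per line."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "\n".join(str(row[-1]) for row in rows)


# Exact match: the planner can look the uid up directly in the index.
eq_plan = query_plan("SELECT node_id FROM node WHERE uid = 'OBO_REL:inheres_in'")
# Leading wildcard: no index range applies, so every uid has to be examined.
like_plan = query_plan("SELECT node_id FROM node WHERE uid LIKE '%inheres_in'")

print("=    :", eq_plan)
print("LIKE :", like_plan)
```

SQLite reports an index lookup as SEARCH and a full pass as SCAN. These correspond roughly to the Index Scan and Seq Scan nodes in the PostgreSQL plans.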

Query

<javascript>
SELECT DISTINCT taxon_node.uid AS taxon, exhibits_pred_node.uid AS exhibits,
       phenotype_node.uid AS phenotype, inheres_pred_node.uid AS inheres,
       anatomy_node.uid AS anatomy, is_a_pred_node.uid AS isA,
       quality_node.uid AS quality, subset_pred_node.uid AS subset,
       slim_node.uid AS slim
FROM link AS inheres_link, link AS is_a_link, link AS subset_link, link AS exhibits_link,
     node AS phenotype_node, node AS anatomy_node, node AS inheres_pred_node,
     node AS is_a_pred_node, node AS quality_node, node AS slim_node,
     node AS subset_pred_node, node AS taxon_node, node AS exhibits_pred_node
-- relations and search term, matched exactly so the uid index can be used
WHERE exhibits_pred_node.uid = 'PHENOSCAPE:exhibits'
  AND subset_pred_node.uid = 'oboInOwl:inSubset'
  AND is_a_pred_node.uid = 'OBO_REL:is_a'
  AND inheres_pred_node.uid = 'OBO_REL:inheres_in'
  AND anatomy_node.uid = 'TAO:0000108'
  AND slim_node.uid IN ('attribute_slim', 'value_slim')
  -- (1) taxon exhibits phenotype
  AND exhibits_link.node_id = taxon_node.node_id
  AND exhibits_link.predicate_id = exhibits_pred_node.node_id
  AND exhibits_link.object_id = phenotype_node.node_id
  -- (4) quality in_subset_of slim
  AND subset_link.node_id = quality_node.node_id
  AND subset_link.predicate_id = subset_pred_node.node_id
  AND subset_link.object_id = slim_node.node_id
  -- (2) phenotype is_a quality
  AND is_a_link.node_id = phenotype_node.node_id
  AND is_a_link.predicate_id = is_a_pred_node.node_id
  AND is_a_link.object_id = quality_node.node_id
  -- (3) phenotype inheres_in anatomical entity
  AND inheres_link.node_id = phenotype_node.node_id
  AND inheres_link.predicate_id = inheres_pred_node.node_id
  AND inheres_link.object_id = anatomy_node.node_id;
</javascript>
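To sanity-check the join structure, the EQUALS query can be run against a toy SQLite database. In the sketch below, the FROM and WHERE clauses are copied verbatim, minus the trailing semicolon. The two-table schema keeps only the columns the query references and is an assumed simplification of OBD. The uids written in capitals (TAXON_A, PHENOTYPE_1, QUALITY_Q, OTHER_ENTITY and so on) are hypothetical placeholders, and SQLite stands in for PostgreSQL.

```python
import sqlite3

# Toy schema limited to the columns the query touches (assumed, simplified OBD layout).
SCHEMA = """
CREATE TABLE node (node_id INTEGER PRIMARY KEY, uid TEXT UNIQUE NOT NULL);
CREATE TABLE link (node_id INTEGER, predicate_id INTEGER, object_id INTEGER);
"""

# Hypothetical (subject, predicate, object) triples; capitalised uids are placeholders.
TRIPLES = [
    ("TAXON_A", "PHENOSCAPE:exhibits", "PHENOTYPE_1"),
    ("TAXON_B", "PHENOSCAPE:exhibits", "PHENOTYPE_1"),
    ("TAXON_A", "PHENOSCAPE:exhibits", "PHENOTYPE_2"),
    ("PHENOTYPE_1", "OBO_REL:inheres_in", "TAO:0000108"),
    ("PHENOTYPE_2", "OBO_REL:inheres_in", "OTHER_ENTITY"),
    ("PHENOTYPE_1", "OBO_REL:is_a", "QUALITY_Q"),
    ("PHENOTYPE_2", "OBO_REL:is_a", "QUALITY_Q"),
    ("QUALITY_Q", "oboInOwl:inSubset", "value_slim"),
]

QUERY = """
SELECT DISTINCT taxon_node.uid AS taxon, exhibits_pred_node.uid AS exhibits,
       phenotype_node.uid AS phenotype, inheres_pred_node.uid AS inheres,
       anatomy_node.uid AS anatomy, is_a_pred_node.uid AS isA,
       quality_node.uid AS quality, subset_pred_node.uid AS subset,
       slim_node.uid AS slim
FROM link AS inheres_link, link AS is_a_link, link AS subset_link, link AS exhibits_link,
     node AS phenotype_node, node AS anatomy_node, node AS inheres_pred_node,
     node AS is_a_pred_node, node AS quality_node, node AS slim_node,
     node AS subset_pred_node, node AS taxon_node, node AS exhibits_pred_node
WHERE exhibits_pred_node.uid = 'PHENOSCAPE:exhibits'
  AND subset_pred_node.uid = 'oboInOwl:inSubset'
  AND is_a_pred_node.uid = 'OBO_REL:is_a'
  AND inheres_pred_node.uid = 'OBO_REL:inheres_in'
  AND anatomy_node.uid = 'TAO:0000108'
  AND slim_node.uid IN ('attribute_slim', 'value_slim')
  AND exhibits_link.node_id = taxon_node.node_id
  AND exhibits_link.predicate_id = exhibits_pred_node.node_id
  AND exhibits_link.object_id = phenotype_node.node_id
  AND subset_link.node_id = quality_node.node_id
  AND subset_link.predicate_id = subset_pred_node.node_id
  AND subset_link.object_id = slim_node.node_id
  AND is_a_link.node_id = phenotype_node.node_id
  AND is_a_link.predicate_id = is_a_pred_node.node_id
  AND is_a_link.object_id = quality_node.node_id
  AND inheres_link.node_id = phenotype_node.node_id
  AND inheres_link.predicate_id = inheres_pred_node.node_id
  AND inheres_link.object_id = anatomy_node.node_id
"""


def build_db(triples):
    """Load uid triples into node/link tables, creating one node row per distinct uid."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    ids = {}

    def node_id(uid):
        if uid not in ids:
            ids[uid] = conn.execute("INSERT INTO node (uid) VALUES (?)", (uid,)).lastrowid
        return ids[uid]

    for subject, predicate, obj in triples:
        conn.execute("INSERT INTO link VALUES (?, ?, ?)",
                     (node_id(subject), node_id(predicate), node_id(obj)))
    return conn


conn = build_db(TRIPLES)
rows = sorted(conn.execute(QUERY).fetchall())
for row in rows:
    print(row[0], row[2], row[6], row[8])  # taxon, phenotype, quality, slim
```

PHENOTYPE_2 shares QUALITY_Q but inheres in OTHER_ENTITY, so the anatomy constraint drops it. The output therefore contains one row per taxon exhibiting PHENOTYPE_1.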

Query Execution Plan

<javascript>
Unique (cost=15397.47..15397.50 rows=1 width=243) (actual time=45692.829..45699.320 rows=2070 loops=1)
   -> Sort (cost=15397.47..15397.47 rows=1 width=243) (actual time=45692.825..45693.795 rows=3887 loops=1)
      Sort Key: taxon_node.uid, exhibits_pred_node.uid, phenotype_node.uid, inheres_pred_node.uid, anatomy_node.uid, is_a_pred_node.uid, quality_node.uid, subset_pred_node.uid, slim_node.uid
      -> Nested Loop (cost=234.53..15397.46 rows=1 width=243) (actual time=33218.390..45415.489 rows=3887 loops=1)
         -> Nested Loop (cost=234.53..15394.41 rows=1 width=228) (actual time=33218.380..45393.860 rows=3887 loops=1)
            -> Nested Loop (cost=234.53..15391.36 rows=1 width=205) (actual time=33218.359..45367.897 rows=4730 loops=1)
               -> Nested Loop (cost=234.53..15388.31 rows=1 width=182) (actual time=1.649..29103.510 rows=3310775 loops=1)
                  -> Nested Loop (cost=234.53..15382.42 rows=1 width=190) (actual time=1.634..16423.157 rows=1443372 loops=1)
                     Join Filter: ("inner".predicate_id = "outer".node_id)
                     -> Index Scan using node_uid_key on node exhibits_pred_node (cost=0.00..4.64 rows=1 width=31) (actual time=0.050..0.051 rows=1 loops=1)
                        Index Cond: ((uid)::text = 'PHENOSCAPE:exhibits'::text)
                     -> Nested Loop (cost=234.53..15377.44 rows=28 width=167) (actual time=1.580..15236.969 rows=1520592 loops=1)
                        -> Nested Loop (cost=234.53..15292.11 rows=28 width=144) (actual time=1.572..6089.530 rows=1520592 loops=1)
                           -> Index Scan using node_uid_key on node subset_pred_node (cost=0.00..4.64 rows=1 width=31) (actual time=0.018..0.018 rows=1 loops=1)
                              Index Cond: ((uid)::text = 'oboInOwl:inSubset'::text)
                           -> Nested Loop (cost=234.53..15287.20 rows=28 width=113) (actual time=1.552..5026.391 rows=1520592 loops=1)
                              -> Nested Loop (cost=234.53..12737.92 rows=1 width=101) (actual time=1.541..1974.620 rows=37918 loops=1)
                                 Join Filter: ("inner".predicate_id = "outer".node_id)
                                 -> Index Scan using node_uid_key on node is_a_pred_node (cost=0.00..4.64 rows=1 width=31) (actual time=0.015..0.016 rows=1 loops=1)
                                    Index Cond: ((uid)::text = 'OBO_REL:is_a'::text)
                                 -> Nested Loop (cost=234.53..12729.35 rows=315 width=78) (actual time=1.507..1869.447 rows=252348 loops=1)
                                    -> Nested Loop (cost=234.53..11769.48 rows=315 width=47) (actual time=1.495..440.581 rows=252348 loops=1)
                                       -> Nested Loop (cost=234.53..11034.28 rows=7 width=35) (actual time=1.485..11.990 rows=6904 loops=1)
                                          -> Index Scan using node_uid_key on node inheres_pred_node (cost=0.00..4.64 rows=1 width=31) (actual time=0.017..0.019 rows=1 loops=1)
                                             Index Cond: ((uid)::text = 'OBO_REL:inheres_in'::text)
                                          -> Bitmap Heap Scan on link inheres_link (cost=234.53..10563.47 rows=37294 width=12) (actual time=1.463..6.271 rows=6904 loops=1)
                                             Recheck Cond: (inheres_link.predicate_id = "outer".node_id)
                                             -> Bitmap Index Scan on link_predicate_object_indx (cost=0.00..234.53 rows=37294 width=0) (actual time=1.316..1.316 rows=6904 loops=1)
                                                Index Cond: (inheres_link.predicate_id = "outer".node_id)
                                       -> Index Scan using link_node_indx on link is_a_link (cost=0.00..104.68 rows=28 width=12) (actual time=0.005..0.035 rows=37 loops=6904)
                                          Index Cond: ("outer".node_id = is_a_link.node_id)
                                    -> Index Scan using node_pkey on node quality_node (cost=0.00..3.03 rows=1 width=31) (actual time=0.004..0.004 rows=1 loops=252348)
                                       Index Cond: ("outer".object_id = quality_node.node_id)
                              -> Index Scan using link_object_indx on link exhibits_link (cost=0.00..2537.96 rows=905 width=12) (actual time=0.006..0.046 rows=40 loops=37918)
                                 Index Cond: ("outer".node_id = exhibits_link.object_id)
                        -> Index Scan using node_pkey on node taxon_node (cost=0.00..3.03 rows=1 width=31) (actual time=0.004..0.004 rows=1 loops=1520592)
                           Index Cond: ("outer".node_id = taxon_node.node_id)
                  -> Index Scan using link_triple_indx on link subset_link (cost=0.00..5.87 rows=1 width=12) (actual time=0.004..0.006 rows=2 loops=1443372)
                     Index Cond: ((subset_link.node_id = "outer".node_id) AND (subset_link.predicate_id = "outer".node_id))
               -> Index Scan using node_pkey on node anatomy_node (cost=0.00..3.04 rows=1 width=31) (actual time=0.004..0.004 rows=0 loops=3310775)
                  Index Cond: ("outer".object_id = anatomy_node.node_id)
                  Filter: ((uid)::text = 'TAO:0000108'::text)
            -> Index Scan using node_pkey on node slim_node (cost=0.00..3.04 rows=1 width=31) (actual time=0.004..0.004 rows=1 loops=4730)
               Index Cond: ("outer".object_id = slim_node.node_id)
               Filter: (((uid)::text = 'attribute_slim'::text) OR ((uid)::text = 'value_slim'::text))
         -> Index Scan using node_pkey on node phenotype_node (cost=0.00..3.03 rows=1 width=31) (actual time=0.004..0.004 rows=1 loops=3887)
            Index Cond: ("outer".object_id = phenotype_node.node_id)
</javascript>

Execution Details

  • Rows returned: 2070
  • Time: 0.6 to 17 s