Difference between revisions of "Matching Phenotypes"

From phenoscape

Revision as of 19:18, 4 January 2011

This page discusses the method developed and implemented in late 2010 for searching for, and scoring, phenotype matches between taxa (Phenoscape) and zebrafish mutants (ZFIN).

Purpose

An important goal for the Phenoscape project is to be able to suggest candidate genes that may have contributed to evolutionary change. Our proposed approach is to search for changes in phenotype that appear as the result of mutations in model organisms and also appear as phenotype changes on an evolutionary tree.

Selecting Phenotypes from Taxon Annotations

The matching process involves matching changes in phenotype, not directly matching phenotypes. For phenotypes associated with mutants of model organisms, it is understood that they vary with respect to the wild type. For taxa, however, this means looking for taxonomic nodes where variation in a phenotype is observed among the children of the node. For example, there are nine species within the genus Aspidoras with annotations for the shape of the opercle bone. Of these, eight exhibit opercle bones with round shape, but the ninth (A. pauciradiatus) is annotated with a triangular opercle. In contrast, all three annotated species of the related Hoplosternum are annotated with a triangular opercle. Thus there is detectable variation in opercle shape within the children of Aspidoras, but not within Hoplosternum, suggesting that change in opercle shape has occurred somewhere among the descendants of Aspidoras. Once changes are identified, they are treated as variation in the affected entity at the level of the attribute parent of the qualities involved (e.g., shape).
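
The node-selection step can be illustrated with a minimal sketch. It assumes annotations are available as (species, entity, attribute, quality) tuples and that a species' parent node is simply its genus; the species labels other than A. pauciradiatus, the function names, and the data layout are all hypothetical and are not taken from the Phenoscape implementation.

```python
from collections import defaultdict

# Hypothetical toy annotations: (species, entity, attribute, quality).
# Only A. pauciradiatus is named on this page; the other labels are placeholders.
ANNOTATIONS = [
    ("Aspidoras sp. A", "opercle", "shape", "round"),
    ("Aspidoras sp. B", "opercle", "shape", "round"),
    ("Aspidoras pauciradiatus", "opercle", "shape", "triangular"),
    ("Hoplosternum sp. A", "opercle", "shape", "triangular"),
    ("Hoplosternum sp. B", "opercle", "shape", "triangular"),
]


def genus_of(species):
    """Assumption: the parent node of a species is the first word of its name."""
    return species.split()[0]


def varying_characters(annotations, parent_of):
    """Map each parent node to the (entity, attribute) pairs that vary among its children."""
    qualities_seen = defaultdict(set)
    for taxon, entity, attribute, quality in annotations:
        qualities_seen[(parent_of(taxon), entity, attribute)].add(quality)

    changes = defaultdict(set)
    for (node, entity, attribute), qualities in qualities_seen.items():
        # More than one distinct quality among the children signals a change.
        if len(qualities) > 1:
            changes[node].add((entity, attribute))
    return dict(changes)


print(varying_characters(ANNOTATIONS, genus_of))
```

In this toy data only Aspidoras is reported, because all Hoplosternum children share the same quality. The result records the change at the attribute level (opercle, shape) and drops the individual qualities.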

Using attributes to limit the scope of quality comparisons

Because the interest is in change in phenotype, phenotypes are 'reduced' to a change in an attribute of an anatomical entity, rather than a particular quality of a phenotype. Although subsuming qualities all the way to their attributes is not, in principle, necessary (for example, round and triangular are subsumed by 2-D shape), using a consistent set of subsumers simplifies the matching process.

Therefore, matching phenotypes is reduced to matching the entity component of changes that have a common attribute (so shape changes and color changes will never be matched).

ZFIN and Phenoscape use different (though partially overlapping) sets of qualities (from the PATO ontology) in constructing phenotype annotations.
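
As a rough illustration of why attributes make annotations from the two sources comparable, the sketch below maps each quality to its attribute and compares only at that level. The quality-to-attribute table is a hand-written toy, not a lookup against PATO, and the function name is hypothetical.

```python
# Toy mapping from quality to attribute parent (assumption: written by hand
# for illustration, not extracted from the PATO ontology).
ATTRIBUTE_OF = {
    "round": "shape",
    "triangular": "shape",
    "curved": "shape",
    "red": "color",
    "blue": "color",
}


def share_attribute(quality_a, quality_b, attribute_of=ATTRIBUTE_OF):
    """True if both qualities are subsumed by the same attribute."""
    return attribute_of[quality_a] == attribute_of[quality_b]


# Different qualities, same attribute: eligible for entity matching.
print(share_attribute("round", "curved"))
# Shape versus color: never matched.
print(share_attribute("round", "red"))
```

Two sources can therefore use different qualities and still produce comparable changes, as long as those qualities fall under the same attribute.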

Scoring matches of individual phenotypes

Each phenotype is linked to multiple entities via inheres_in_part_of relations.

Phenotypes with different attributes are not matches and (implicitly) are scored 0. Two phenotypes with qualities subsumed under the same attribute can be matched by generating, for each phenotype (EQ), the set of entities linked to it by inheres_in_part_of relations. Take the intersection of these sets of 'neighboring' entities and select the entity with the largest information content (this requires calculating information content for entities as well as phenotypes). The information content of this entity is used as the match score. Note that under this scheme, an exact match may still score poorly if the shared entity is high up in the anatomy hierarchy. For example, round inheres_in bone matched against itself will have a lower score than straight inheres_in neural spine 1 matched against curved inheres_in neural spine 2, because the latter two share the inheres_in_part_of neighbor neural spine.
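
The scoring rule above can be sketched as follows. Phenotypes are represented as (entity, attribute) pairs, the neighbor sets are assumed to include the entity itself, and the information content values are invented for illustration; none of the names or numbers come from the actual implementation.

```python
# Hypothetical inheres_in_part_of neighbor sets (assumed to be reflexive).
NEIGHBORS = {
    "bone": {"bone"},
    "neural spine 1": {"neural spine 1", "neural spine", "bone"},
    "neural spine 2": {"neural spine 2", "neural spine", "bone"},
}

# Invented information content values: more specific entities score higher.
IC = {
    "bone": 1.0,
    "neural spine": 4.0,
    "neural spine 1": 6.0,
    "neural spine 2": 6.0,
}


def match_score(phenotype_a, phenotype_b, neighbors=NEIGHBORS, ic=IC):
    """Score two (entity, attribute) phenotypes by their most informative shared neighbor."""
    entity_a, attribute_a = phenotype_a
    entity_b, attribute_b = phenotype_b
    if attribute_a != attribute_b:
        return 0.0  # different attributes are never matched
    shared = neighbors[entity_a] & neighbors[entity_b]
    # Highest IC among shared entities; 0.0 if nothing is shared.
    return max((ic[entity] for entity in shared), default=0.0)


exact_but_general = match_score(("bone", "shape"), ("bone", "shape"))
inexact_but_specific = match_score(("neural spine 1", "shape"), ("neural spine 2", "shape"))
print(exact_but_general, inexact_but_specific)
```

With these toy values the exact match on bone scores lower than the inexact match between the two neural spines, which is the behavior described in the example above.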

Scoring matches of sets of phenotypes (phenotypic profiles) associated with a gene or taxon node

  • maxIC - the greatest IC of any pair of phenotypes in the set of pairs of phenotypes where one is drawn from the taxon set and the other from the gene set.
  • ICCS - each taxon phenotype is matched against all gene phenotypes and the highest match score for each taxon phenotype is collected. The final score is the mean of these highest matches. (Should the set of scores be accumulated both ways?)
  • simIC -
  • simJ
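
The two measures that are defined in the list above, maxIC and ICCS, can be sketched as below. The pairwise scores are supplied as a hypothetical lookup table standing in for the individual phenotype match score, and ICCS is computed in the taxon-to-gene direction only, as described. simIC and simJ are not sketched because this page does not define them.

```python
from statistics import mean

# Hypothetical pairwise match scores between taxon and gene phenotypes.
PAIR_SCORES = {
    ("t1", "g1"): 2.0,
    ("t1", "g2"): 5.0,
    ("t2", "g1"): 1.0,
    ("t2", "g2"): 0.0,
}


def pair_score(taxon_phenotype, gene_phenotype):
    """Stand-in for the individual phenotype match score."""
    return PAIR_SCORES[(taxon_phenotype, gene_phenotype)]


def max_ic(taxon_profile, gene_profile, score=pair_score):
    """Greatest score over all taxon/gene phenotype pairs."""
    return max(score(t, g) for t in taxon_profile for g in gene_profile)


def iccs(taxon_profile, gene_profile, score=pair_score):
    """Mean, over taxon phenotypes, of each one's best score against the gene profile."""
    best_per_taxon_phenotype = [
        max(score(t, g) for g in gene_profile) for t in taxon_profile
    ]
    return mean(best_per_taxon_phenotype)


print(max_ic(["t1", "t2"], ["g1", "g2"]))
print(iccs(["t1", "t2"], ["g1", "g2"]))
```

Note that ICCS as written is asymmetric: swapping the taxon and gene profiles generally gives a different value, which is the question raised in the ICCS entry.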

Information Content as a metric of term relatedness

Alternatives

Because of its dependence on the set of annotations, there is a real concern that IC may not be the most appropriate measure of similarity between terms in an ontology with associated annotations.
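
The dependence on the annotation set can be made concrete with a small sketch. It assumes the commonly used definition of information content as the negative log (base 2 here) of the fraction of annotations that fall at or below a term; this page does not state which formula was used, so the definition, function name, and counts below are assumptions.

```python
import math


def information_content(annotations_at_or_below_term, total_annotations):
    """IC as -log2 of the term's annotation frequency (assumed definition)."""
    return -math.log2(annotations_at_or_below_term / total_annotations)


# The same term in two hypothetical annotation sets of equal size:
print(information_content(25, 100))  # term rarely annotated
print(information_content(50, 100))  # term annotated twice as often
```

The term's position in the ontology is identical in both cases, yet its IC differs because the annotation counts differ.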