EQ for character matrices

From phenoscape
Revision as of 17:47, 21 May 2007 by Jpb15 (talk | contribs)

This document is under construction... --Jpb15 12:13, 21 May 2007 (EDT)

Encoding evolutionary character matrices using the EQ format (PATO formalism) currently presents some problems that may need to be resolved. I will try to describe the issues here.

The EQ format provides a "phenotype statement" documenting the phenotype of an individual organism (usually a genetic mutant). The anatomical structure being described is represented by the Entity term chosen from an anatomical ontology, and the aspect of that structure being described is the Quality term, chosen from the PATO ontology. These phenotype statements usually describe the value the mutant exhibits:

E="dorsal fin" Q="round" --> This fish has a rounded dorsal fin.

Evolutionary phenotype descriptions are often formatted as a character matrix. For a given set of species, a list of distinguishing characters is formulated, and a species-specific value for each character is entered into the matrix. In this situation, each character (a column in the matrix) represents an entity and an attribute, and the character state cells contain values. Here is a graphical depiction of the relationship between evolutionary characters and the components of the EQ system:

[Figure: EQmatrix.png, the relationship between evolutionary characters and the components of the EQ system]

The character (a column header in the matrix) is composed of an Entity and a Quality representing an attribute (e.g. "shape"). Values for this character are entered into the cells (e.g. "round"). The Q of EQ is therefore represented in both the character and the character state. When EQ is used to describe mutant phenotypes, typically only the value is stated, since one can traverse back up the PATO hierarchy to find the ancestor term representing the attribute - "round" is a child of "shape".
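The traversal from a value back to its attribute can be sketched in a few lines of Python. This is only an illustration: the hierarchy below is a hand-written toy fragment, not the real PATO ontology, and the names TOY_PARENT, ATTRIBUTES, attribute_of and character_for are hypothetical. It assumes each term has a single parent and that the set of attribute terms is known in advance.

```python
# Toy fragment of a quality hierarchy (child -> parent).
# Hand-written for illustration; NOT the real PATO ontology.
TOY_PARENT = {
    "round": "shape",
    "lobate": "shape",
    "shape": "quality",
    "red": "color",
    "color": "quality",
}

# Terms treated as attributes in this sketch (an assumption).
ATTRIBUTES = {"shape", "color"}


def attribute_of(value):
    """Walk up the child -> parent links until an attribute term is reached."""
    term = value
    while term is not None:
        if term in ATTRIBUTES:
            return term
        term = TOY_PARENT.get(term)
    return None  # no attribute ancestor found in the toy hierarchy


def character_for(entity, value):
    """Recover the (entity, attribute) character implied by a value-only EQ statement."""
    return (entity, attribute_of(value))


print(character_for("dorsal fin", "round"))  # ('dorsal fin', 'shape')
```

Given a value-only statement such as E="dorsal fin" Q="round", the sketch recovers the character "dorsal fin" + "shape" that would head a matrix column.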

Clearly, evolutionary characters and character states can be represented using an EQ system, just as mutant phenotypes are. However, there is a major difference between the two data models: the data formats being developed for mutant phenotype EQ statements store only a list of phenotype value statements, with no place to reference an independent character. So if you have this data set:

Genotype   Entity         Quality
fish1      dorsal fin     round
fish2      dorsal fin     lobate
fish2      dorsal fin     red
fish1      pectoral fin   blue