Matrix annotation workflow

From phenoscape
[[Category:Informatics]]

Revision as of 18:46, 1 May 2008

The first Phenoscape data jamboree revealed that our curation workflow, as implemented in Phenote, could be quite cumbersome, particularly when managing the large number of rows needed to hold annotations for every taxon. Discussion after the jamboree settled on a new workflow that divides the effort: experienced curators focus on describing phenotypes using EQ, while specimen lists and character matrices can be entered separately, possibly by assistants.

Workflow activities

This workflow is divided into 3 main activities, each of which can be performed independently and in parallel.

EQ character state coding

This activity involves translating published character states into ontology-based EQ descriptions. It requires deep domain knowledge: full comprehension of the published descriptions and a thorough understanding of the anatomy and quality ontologies. It can be performed independently of annotations to particular taxa; the curator focuses only on describing the character states. As a result, the curator deals with far fewer rows of data in Phenote than when annotating taxa at the same time.

Taxon and specimen lists

This activity includes entering the list of species as named in the publication ("Publication Taxon") and choosing the matching taxon term from the TTO for each. The list of specimens studied is also entered for each taxon at this step. This activity can easily be performed by an assistant with minimal training.

Character matrix

A character matrix assigns each species a particular character state for each described character. Many studies already publish their data in matrix form. If a matrix is not available, the curator will need to prepare one representing the data for that publication. The states in the matrix are linked to those described in "EQ character state coding" by using corresponding character and character state numbers. The taxon names in the character matrix must match the Publication Taxon entries in the taxon list. This activity may also include choosing an evidence code, from the evidence code ontology, for each state assignment (each cell). If a matrix is available for a publication, this activity requires little work on the part of an experienced curator.
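
The name-matching requirement above can be checked mechanically before any merging is attempted. The following is a minimal sketch in Python, not part of any existing Phenoscape tool: the function name, the in-memory data layout, and the sample taxa are assumptions made purely for illustration.

```python
def unmatched_matrix_taxa(matrix, publication_taxa):
    """Return, sorted, the matrix row names that are absent from the taxon list."""
    known = set(publication_taxa)
    return sorted(name for name in matrix if name not in known)


# Hypothetical sample data: the layout is an assumption, not a Phenote or Mesquite format.
publication_taxa = ["Danio rerio", "Ictalurus punctatus"]
matrix = {
    # Publication Taxon -> {character number: character state number}
    "Danio rerio": {1: "0", 2: "1"},
    "Ictalurus punctatus": {1: "1", 2: "0"},
    "Ictalurus sp.": {1: "1", 2: "0"},  # not present in the taxon list
}

print(unmatched_matrix_taxa(matrix, publication_taxa))
```

Any name reported by such a check would need to be corrected in either the matrix or the taxon list before the outputs are combined.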

Output from each of these activities can be merged using a non-interactive script, producing resultant taxon-specific EQ annotations.
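
As a rough illustration of what such a merge might look like, here is a minimal Python sketch. It assumes the three outputs have already been loaded into simple dictionaries; the structures, the `merge_annotations` name, and all sample values (including the placeholder taxon terms) are hypothetical and do not reflect an actual file format or script.

```python
from collections import namedtuple

# One taxon-specific EQ annotation produced by the merge (hypothetical shape).
Annotation = namedtuple("Annotation", "taxon character state entity quality")


def merge_annotations(state_eqs, taxon_list, matrix):
    """Combine the outputs of the three activities.

    state_eqs:  {(character, state): [(entity, quality), ...]}
    taxon_list: {publication taxon name: matching taxon term}
    matrix:     {publication taxon name: {character: state}}
    """
    merged = []
    for publication_taxon, row in matrix.items():
        # Raises KeyError if a matrix name has no entry in the taxon list.
        taxon = taxon_list[publication_taxon]
        for character, state in row.items():
            # A state with no EQ coding yet simply contributes no annotations.
            for entity, quality in state_eqs.get((character, state), []):
                merged.append(Annotation(taxon, character, state, entity, quality))
    return merged


# Hypothetical sample data; labels stand in for real ontology term identifiers.
state_eqs = {
    (1, "0"): [("dorsal fin", "present")],
    (1, "1"): [("dorsal fin", "absent")],
    (2, "0"): [("maxilla", "elongated"), ("maxilla", "curved")],
}
taxon_list = {"Species A": "taxon-term-A", "Species B": "taxon-term-B"}
matrix = {
    "Species A": {1: "0", 2: "0"},
    "Species B": {1: "1"},
}

for annotation in merge_annotations(state_eqs, taxon_list, matrix):
    print(annotation)
```

Because a character state may carry several EQs, one matrix cell can expand into more than one taxon-specific annotation, as character 2 does for "Species A" in this sample.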

Implementation proposal

"EQ character state coding" can continue to be performed within Phenote, since it is Phenote's core capability. A revised Phenoscape configuration will be created that eliminates all taxon-related columns from the annotation table and adds a column for "character state". Curators will enter one or more annotation rows for each character/character state combination, with no reference to any particular taxon. The curator can enter multiple EQs to represent any particular character state. The search filter field at the bottom of the table will be enhanced so that the curator can filter the list down to one character at a time, viewing only the EQ annotations for the states of that character. The curator will save these EQ annotations to a file for later processing.
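
The taxon-free annotation rows and the per-character filter described above can be pictured with a small Python sketch. The row fields and the `eqs_by_state` helper are assumptions for illustration only; they do not describe Phenote's actual configuration or file format.

```python
from collections import defaultdict


def eqs_by_state(rows, character):
    """Keep only the rows for one character, grouped by character state."""
    grouped = defaultdict(list)
    for row in rows:
        if row["character"] == character:
            grouped[row["state"]].append((row["entity"], row["quality"]))
    return dict(grouped)


# Hypothetical annotation rows: no taxon columns, one row per EQ.
rows = [
    {"character": 1, "state": "0", "entity": "dorsal fin", "quality": "present"},
    {"character": 1, "state": "1", "entity": "dorsal fin", "quality": "absent"},
    {"character": 2, "state": "0", "entity": "maxilla", "quality": "elongated"},
    {"character": 2, "state": "0", "entity": "maxilla", "quality": "curved"},
]

print(eqs_by_state(rows, 2))
```

Here state "0" of character 2 is represented by two EQ rows, which corresponds to a curator entering multiple EQs for a single character state.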

For the time being, "Taxon and specimen lists" can continue to be entered using the current Phenote configuration. As described below, entering these will likely be incorporated into the "Character matrix" activity within Mesquite. Phenote-based taxon files will later be processed into a to-be-developed NEXUS block.