EQ Editor

From phenoscape

Revision as of 16:02, 13 June 2007

EQ Editor requirements

The EQ Editor will be used by curators to annotate phenotypic descriptions of Ostariophysi using EQ (Entity-Quality) syntax and ontologies.

The curator will use the EQ Editor to code phenotypic data from an existing publication into the EQ format. This data consists of descriptions of character state values for corresponding species (or, more precisely, specimens). A published character state description may contain the species or higher taxonomic specification, a textual description of the character state value, a reference to a voucher specimen for this description, and an image showing the character.

Workflow

Sources of data

The publication being coded may contain data in one of a few different formats, and each format may suggest its own style of workflow. These publication types take four main forms:

  1. Description of many characters for many species and higher taxonomic levels; no character-by-taxon matrix published.
  2. A data matrix with multiple species and multiple characters. There is a character state value for each cell in this matrix.
  3. A single species description of values for many characters.
  4. Description of a single character for many species (perhaps less common than the other formats?). If focusing on a single character the data may come from multiple publications.


It seems like the single-species and single-character scenarios (3 and 4) can be treated as special cases of the many-character, many-species scenarios (1 and 2). For each character, an interface is required for choosing the Entity and Quality from their respective ontologies, and for entering free text such as the original character descriptions, as well as other relevant data.
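As a rough illustration of why the other formats can be folded into the general case, the Python sketch below reduces three source shapes to the same list of (taxon, character, state) records: a matrix, a single-species description, and a single-character survey. The function names and the triple representation are assumptions made for this example. They are not part of any existing EQ Editor design.

```python
from typing import Dict, List, Tuple

# Hypothetical record shape: (taxon, character, state description).
Record = Tuple[str, str, str]


def from_matrix(taxa: List[str], characters: List[str],
                cells: List[List[str]]) -> List[Record]:
    """A taxon-by-character matrix: one record per cell, row by row."""
    records: List[Record] = []
    for taxon, row in zip(taxa, cells):
        for character, state in zip(characters, row):
            records.append((taxon, character, state))
    return records


def from_single_species(taxon: str, states: Dict[str, str]) -> List[Record]:
    """One taxon described for many characters (character -> state)."""
    return [(taxon, character, state) for character, state in states.items()]


def from_single_character(character: str, states: Dict[str, str]) -> List[Record]:
    """One character described for many taxa (taxon -> state)."""
    return [(taxon, character, state) for taxon, state in states.items()]
```

Once every source is in this shape, the same per-character interface can serve all of them, regardless of how the publication presented its data.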

Since many species will share common values of character states, the interface should have a way of choosing previously entered character states, perhaps by separately enumerating the possible values for a character. EQ coding will be performed at the level of character states, but there may be a facility for dynamically viewing entries in a character matrix format. See "EQ for character matrices" for a discussion; at the June 5 PI meeting we decided to work only with character states.
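Here is a minimal sketch of the two ideas in this paragraph. It assumes character-state annotations are held as (taxon, character, state) triples and shows two operations. The first enumerates the states already entered for each character so they can be offered for reuse. The second builds a character-matrix view on demand from state-level entries. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

Record = Tuple[str, str, str]  # hypothetical: (taxon, character, state)


def states_by_character(records: Iterable[Record]) -> Dict[str, List[str]]:
    """Distinct states already entered for each character, sorted, so the
    interface can offer them as choices instead of having them re-typed."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for _taxon, character, state in records:
        seen[character].add(state)
    return {character: sorted(states) for character, states in seen.items()}


def matrix_view(records: Iterable[Record]
                ) -> Tuple[List[str], List[str], List[List[Optional[str]]]]:
    """Taxon-by-character view derived from state-level entries.

    Cells with no entry are None. If a taxon has several states for one
    character (polymorphism), this sketch keeps only the last one entered.
    """
    rows = list(records)
    taxa = sorted({taxon for taxon, _, _ in rows})
    characters = sorted({character for _, character, _ in rows})
    lookup = {(taxon, character): state for taxon, character, state in rows}
    cells = [[lookup.get((t, c)) for c in characters] for t in taxa]
    return taxa, characters, cells
```

Because the matrix is derived rather than stored, coding itself stays at the level of character states.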

Detailed steps

In one possible workflow, a curator works through a single publication during one session with the EQ Editor. Steps might be:

  1. Create a new EQ Editor document.
  2. Enter document-wide data, such as publication information including author, title, journal, etc.
  3. Create a list of taxa described in the publication. If a taxonomic ontology is available, the taxa could be chosen from it; otherwise, they could simply be entered as free text. (Are other forms of identifier available for fish species?)
  4. Create a list of characters described in the publication. For each character, choose an Entity from the anatomy ontology and a Quality from PATO. If the Quality is a relational quality, choose another Entity to which it refers. Because this is a character, the quality should be an Attribute, not a Value. For each character, the curator can also add free text containing the original character title and notes.
  5. After taxa and characters have been defined, the curator can begin annotating character states for each character for each species. For each character state, the curator will choose a value from a list of the PATO Value terms that descend from the Attribute specified for the character. The curator can also add free text containing the original character state title and notes. Each character state can also be annotated with an image URL and a voucher specimen ID.
  6. At any point, the curator can save the current work to a document (or database). We should investigate requirements regarding the document format (relationship to PhenoXML/PhenoSyntax, NEXUS, etc.), which will depend on the detailed data model.
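Steps 4 and 5 imply a constraint: a character pairs an Entity with a PATO Attribute, and each of its states must use a Value that descends from that Attribute. The sketch below checks this against a tiny hand-written is_a hierarchy. The term labels, the `IS_A` table and the class names are illustrative stand-ins, not real PATO identifiers or an existing API.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

# Toy is_a hierarchy standing in for PATO (child -> parent).
# Labels are illustrative only; real PATO terms have ontology IDs.
IS_A: Dict[str, str] = {
    "color": "quality",
    "red": "color",
    "blue": "color",
    "shape": "quality",
    "round": "shape",
}
# Terms treated as Attributes in this toy example; terms below an
# Attribute that are not themselves Attributes are treated as Values.
ATTRIBUTES: Set[str] = {"color", "shape"}


def ancestors(term: str) -> Set[str]:
    """All terms reachable by following is_a links upward."""
    found: Set[str] = set()
    while term in IS_A:
        term = IS_A[term]
        found.add(term)
    return found


@dataclass
class Character:
    entity: str          # anatomy term, e.g. "dorsal fin"
    attribute: str       # must be an Attribute, not a Value (step 4)
    notes: str = ""      # original character title and notes

    def __post_init__(self) -> None:
        if self.attribute not in ATTRIBUTES:
            raise ValueError(f"{self.attribute!r} is not an Attribute")


def allowed_values(character: Character) -> Set[str]:
    """Choices offered in step 5: Values that descend from the Attribute."""
    return {term for term in IS_A
            if term not in ATTRIBUTES and character.attribute in ancestors(term)}


@dataclass
class CharacterState:
    character: Character
    value: str
    notes: str = ""                   # original character state title and notes
    image_url: Optional[str] = None
    voucher_id: Optional[str] = None

    def __post_init__(self) -> None:
        if self.value not in allowed_values(self.character):
            raise ValueError(f"{self.value!r} is not a Value under "
                             f"{self.character.attribute!r}")
```

In a real editor the hierarchy would come from the loaded PATO ontology rather than a hard-coded dictionary.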

Questions

  • What should the user be able to do if there is not an appropriate term in the ontology?
  • What will be done (in the immediate term) with the annotations produced by the EQ Editor? Will they be stored in separate documents, or compiled into a central repository?
  • Is a "post-composition" capability required?
  • We need to add a "Need a new term" button.

Features perhaps suited for EQSYTE

At the February 2007 PI meeting, some workflow features were discussed that may be most useful once the central database of character states is in place, such as:

  • see what is already present about a particular character or "similar" characters (based on related entities or qualities)
  • see what values have been previously assigned for a character (from other publications)
  • view conflicting character states from different publications, choose to keep both or reconcile
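Once annotations from several publications share one store, the third feature amounts to grouping by taxon and character and flagging groups whose states disagree. Below is a minimal sketch using a hypothetical `Annotation` record.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Annotation(NamedTuple):
    taxon: str
    character: str
    state: str
    publication: str  # citation or identifier of the source publication


def find_conflicts(annotations: List[Annotation]
                   ) -> Dict[Tuple[str, str], List[Annotation]]:
    """Group annotations by (taxon, character) and return only the groups
    in which more than one distinct state was recorded."""
    groups: Dict[Tuple[str, str], List[Annotation]] = defaultdict(list)
    for a in annotations:
        groups[(a.taxon, a.character)].append(a)
    return {key: group for key, group in groups.items()
            if len({a.state for a in group}) > 1}
```

Differing states are only candidates for review. Because a character state may be polymorphic within a taxon, the curator still decides whether to keep both or reconcile them.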

Data Model

The following are essential data elements to be captured by the EQ Editor (format is in parentheses). There is some additional discussion of how to treat character definitions at "EQ for character matrices".

  • species name (free text until taxonomic ontology is available?)
  • EQ statement for character state (Q should be a value)
    • E from fish anatomy (ontology ID)
    • Q from PATO (ontology ID)
    • measurement (number followed by unit name)
    • E2 from fish anatomy (if Q is descendant of "relational quality of continuant" or "relational quality of occurrent") (ontology ID)
    • May be polymorphic within a species/taxon
  • original description (free text)
  • publication/citation (DOI? Older publications may not have a DOI; another possibility is SICI)
  • image or URL for image (image data or URL)
  • voucher specimen ID (format?)
  • evidence statement (confirmation by taxonomic expert, etc.)
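The sketch below shows one way to express these elements as a record type in Python. Field names are hypothetical. JSON is used only as a placeholder serialization, since the actual document format is still open (see step 6 above). The E2 check encodes the rule that a relational quality needs a second entity.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class EQStatement:
    entity: str                           # E: fish anatomy ontology ID
    quality: str                          # Q: PATO ontology ID (should be a Value)
    related_entity: Optional[str] = None  # E2: only for relational qualities
    measurement: Optional[float] = None   # number ...
    unit: Optional[str] = None            # ... followed by a unit name

    def check(self, quality_is_relational: bool) -> None:
        """Raise ValueError if the statement breaks the rules listed above.

        Whether Q descends from a relational quality must be looked up in
        PATO; here the caller passes that answer in.
        """
        if quality_is_relational and self.related_entity is None:
            raise ValueError("relational quality requires E2")
        if (self.measurement is None) != (self.unit is None):
            raise ValueError("measurement needs both a number and a unit")


@dataclass
class CharacterStateRecord:
    species: str                       # free text until a taxonomic ontology exists
    eq: List[EQStatement]              # more than one entry: polymorphic in the taxon
    original_description: str
    publication: str                   # DOI, SICI, or a free-text citation
    image_url: Optional[str] = None
    voucher_id: Optional[str] = None   # format still undecided
    evidence: Optional[str] = None     # e.g. confirmation by a taxonomic expert

    def to_json(self) -> str:
        """Placeholder serialization; asdict() also converts nested EQ statements."""
        return json.dumps(asdict(self), sort_keys=True)
```

Any IDs used with this sketch (for example `"ANAT:fin"`) would be placeholders, not real ontology identifiers.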

Technology

Some technology choices may figure into the application requirements, particularly if integration with another application or system is a desired feature. Possibilities include:

  • Implement EQ editing as a Mesquite plug-in, providing some additional interface to the character editing already in Mesquite. This would require extensions to the NEXUS format for storing ontology term information. There is further discussion of Mesquite on a separate page.
  • Extend Phenote with any additional functionality that is required. Phenote is being actively developed by the model organism community, but it deals with a list of unordered EQ statements rather than a species-by-character matrix. There is further discussion of Phenote on a separate page.
  • Web-based application.
  • Components from all of the above.