EQ Editor

Revision as of 19:26, 13 June 2007 by Jpb15 (Working with EQ statements)

EQ Editor requirements

The EQ Editor will be used by curators to annotate phenotypic descriptions of Ostariophysi using EQ syntax and ontologies.

The curator will use the EQ Editor to code phenotypic data from an existing publication into the EQ format. These data consist of descriptions of character state values for corresponding species (or, more precisely, specimens). A published character state description may contain the species or higher taxonomic specification, a textual description of the character state value, a reference to a voucher specimen for this description, and an image showing the character.

Data Model

The following are essential data elements to be captured by the EQ Editor for each character state description (the format of each is given in parentheses). EQ coding will be performed at the level of character states, but there may be a facility for dynamically viewing entries in a character matrix format. See "EQ for character matrices" for a discussion; at the June 5 PI meeting we decided to work only with character states.

  • species name (free text until taxonomic ontology is available?)
  • voucher specimen lot ID (format?)
  • specimen count
  • specimen preparation (cleared and stained, etc.)
  • EQ statement for character state (Q should be a value)
    • E from fish anatomy (ontology ID) - may be post-composed from multiple ontology terms, especially in conjunction with a spatial modifier ontology
    • Q from PATO (ontology ID)
    • measurement - used if Q is an attribute suitable for measurement (e.g. "length")
      • number - should accept a single value, a range, or a bound (<, >) - can all of this be expressed within the EQ formalism?
      • unit (units ontology ID)
    • E2 from fish anatomy (if Q is descendant of "relational quality of continuant" or "relational quality of occurrent") (ontology ID)
  • original description (free text)
  • publication/citation (DOI? Older publications don't have DOIs, do they? Another possibility is SICI)
  • image or URL for image (image data or URL)
  • evidence statement (confirmation by taxonomic expert, etc.) - this requires discussion
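
As a rough illustration of the data elements listed above, the following Python sketch models one character state description. All class and field names are hypothetical (nothing here is a decided format), and the ontology IDs in the example are placeholders rather than real terms. The measurement type shows one possible way to hold a single value, a range, or a one-sided bound; whether this fits the EQ formalism is still the open question noted above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Measurement:
    """An exact value, a closed range, or a one-sided bound, plus a unit."""
    unit_id: str                    # units ontology ID (placeholder in examples)
    value: Optional[float] = None   # exact value
    lower: Optional[float] = None   # lower bound: start of a range, or "> lower"
    upper: Optional[float] = None   # upper bound: end of a range, or "< upper"

    def __post_init__(self):
        has_bounds = self.lower is not None or self.upper is not None
        # Exactly one of "exact value" and "bounds" must be supplied.
        if (self.value is not None) == has_bounds:
            raise ValueError("give either an exact value or bounds")
        if (self.lower is not None and self.upper is not None
                and self.lower > self.upper):
            raise ValueError("lower bound exceeds upper bound")

    def describe(self) -> str:
        if self.value is not None:
            return f"{self.value} {self.unit_id}"
        if self.lower is not None and self.upper is not None:
            return f"{self.lower}-{self.upper} {self.unit_id}"
        if self.lower is not None:
            return f"> {self.lower} {self.unit_id}"
        return f"< {self.upper} {self.unit_id}"


@dataclass
class EQStatement:
    entity_ids: List[str]                      # E: one or more anatomy term IDs (post-composition)
    quality_id: str                            # Q: PATO term ID
    measurement: Optional[Measurement] = None  # only if Q is a measurable attribute
    related_entity_id: Optional[str] = None    # E2: only if Q is relational


@dataclass
class CharacterStateAnnotation:
    species_name: str                # free text until a taxonomic ontology exists
    voucher_lot_id: str              # format still undecided
    specimen_count: int
    specimen_preparation: str        # e.g. "cleared and stained"
    eq: EQStatement
    original_description: str
    citation: str
    image_url: Optional[str] = None
    evidence: Optional[str] = None   # still requires discussion


# Example with placeholder IDs (not real ontology terms).
example = CharacterStateAnnotation(
    species_name="Danio rerio",
    voucher_lot_id="LOT-0001",
    specimen_count=3,
    specimen_preparation="cleared and stained",
    eq=EQStatement(
        entity_ids=["ANATOMY:0000001"],
        quality_id="PATO:placeholder_length",
        measurement=Measurement(unit_id="UNIT:mm", lower=10.0),
    ),
    original_description="(text copied from the publication)",
    citation="(publication reference)",
)
```

Keeping the measurement separate from the Quality term means a Value-only statement simply leaves `measurement` and `related_entity_id` empty.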

Workflow

Sources of data

The publication being coded may contain data in one of a few different formats. The given data format may suggest its own style of workflow. These publication types include four main forms:

  1. Description of many characters for many species and higher taxonomic levels; no character-by-taxon matrix published.
  2. A data matrix with multiple species and multiple characters. There is a character state value for each cell in this matrix.
  3. A single species description of values for many characters.
  4. Description of a single character for many species (perhaps less common than the other formats?). If focusing on a single character the data may come from multiple publications.


It seems that scenarios 2 and 3 can be treated as special cases of scenario 1. For each character, an interface is required for choosing the Entity and Quality from their respective ontologies, and for entering free text such as the original character descriptions, as well as other relevant data.

Detailed steps

The standard workflow would be a curator dealing with a single publication during a session of working with the EQ Editor. Steps might be:

  1. Create a new EQ Editor document.
  2. Enter document-wide data:
    1. publication information (author, title, journal)
    2. list of specimens - enter specimen info, choose the museum institution from a pick list, and choose a taxon from the taxonomic ontology for each one
  3. Begin making character state annotations:
    1. Select a specimen or multiple specimens to which this character state applies (there should be facilities for efficiently choosing sets of specimens/taxa)
    2. Create a new EQ statement
    3. Choose Entity from anatomy ontology
    4. Choose Quality from PATO (usually a Value term)
    5. If the Quality is an Attribute term (such as "length"), enter a measurement and its units
    6. If the Quality is relational, enter a second Entity from the anatomy ontology
    7. If dealing with a single specimen, enter optional image data (URL). Images may not be applied to multiple specimens at once.
  4. At any point, the curator can save the current work to a document (or database). We should investigate requirements regarding the document format (relationship to PhenoXML/PhenoSyntax, NEXUS, etc.), which will depend on the detailed data model.
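
The conditional parts of step 3 (measurement for Attribute terms, second Entity for relational terms, images only for single specimens) could be checked before an annotation is accepted. The sketch below is one possible way to do that in Python. The function name, its parameters, and the term IDs are all hypothetical, and the sets of Attribute and relational terms are passed in by hand here, whereas a real editor would presumably derive them from PATO.

```python
from typing import List, Optional, Set, Tuple


def check_annotation(
    quality_id: str,
    measurement: Optional[Tuple[float, str]],  # (number, unit ID), if entered
    related_entity_id: Optional[str],          # second Entity, if entered
    specimen_ids: List[str],                   # specimens the statement applies to
    image_url: Optional[str],
    attribute_terms: Set[str],                 # assumed: Qualities that are Attribute terms
    relational_terms: Set[str],                # assumed: Qualities that are relational
) -> List[str]:
    """Return a list of problems; an empty list means the entry looks complete."""
    problems = []
    # Step 3.5: an Attribute term such as "length" needs a measurement and units.
    if quality_id in attribute_terms and measurement is None:
        problems.append("attribute quality needs a measurement and unit")
    # Step 3.6: a relational Quality needs a second Entity.
    if quality_id in relational_terms and related_entity_id is None:
        problems.append("relational quality needs a second entity")
    # Step 3.7: images may not be applied to multiple specimens at once.
    if image_url is not None and len(specimen_ids) != 1:
        problems.append("image can only be attached to a single specimen")
    return problems


# Placeholder term IDs, for illustration only.
ATTRIBUTE_TERMS = {"Q:length"}
RELATIONAL_TERMS = {"Q:attached_to"}

issues = check_annotation(
    quality_id="Q:length",
    measurement=None,
    related_entity_id=None,
    specimen_ids=["specimen-1"],
    image_url=None,
    attribute_terms=ATTRIBUTE_TERMS,
    relational_terms=RELATIONAL_TERMS,
)
print(issues)
```

Returning a list of problems rather than raising on the first one would let the interface show the curator everything that is missing at once.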

Working with EQ statements

  • A particular EQ statement (specific combination of Entity and Quality) will often be applied to multiple specimens within one document. Specimens are likely to share EQ values as a result of phylogenetic history, so facilities for efficiently selecting groups of related specimens should be available. Entries with this EQ statement would be generated for each selected specimen.
    • An EQ entry panel could allow the user to choose a taxon from the taxonomic ontology, either by directly browsing the ontology or through an autocomplete text search field. All specimens within that taxon would be selected.
    • A phylogenetic tree view could be provided, which allows the user to select a node or nodes to which to apply an EQ statement. All specimens descending from that node would be selected. This tree could be initialized by using the taxonomic ontology. The user could manually edit the tree to provide additional resolution (perhaps by following a tree in the paper). Alternatively, the software could allow the specification in Newick format of a tree created by another application.
    • It would be useful to be able to invert selections generated by either of the preceding methods.
  • It should be possible to view the EQ statements entered into the document in several ways, to allow checking of data entry progress. Desired views include:
    • Flat list of all entered EQ statements.
    • Filtered flat list - a search text field can allow quick filtering by any of the data fields.
    • Character-by-taxon matrix - generated from EQ statements by grouping via shared PATO Attributes.
    • Filtered matrix view - what capabilities are needed? Filter by taxon, entity, or quality: if one of these is a higher term in the ontology, show all descendant matches
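
Some of the ideas above (selecting all specimens within a taxon, inverting a selection, and generating a character-by-taxon matrix by grouping on shared Attributes) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the taxonomy is a plain child-to-parent dictionary rather than a real taxonomic ontology, the mapping from Quality values to Attributes is supplied by hand rather than read from PATO, and all function names are hypothetical.

```python
from collections import defaultdict

# Assumed stand-in for the taxonomic ontology: child taxon -> parent taxon.
PARENT = {
    "Cypriniformes": "Ostariophysi",
    "Siluriformes": "Ostariophysi",
    "Danio rerio": "Cypriniformes",
    "Ictalurus punctatus": "Siluriformes",
}


def is_within(taxon, ancestor, parent):
    """True if taxon equals ancestor or descends from it."""
    while taxon is not None:
        if taxon == ancestor:
            return True
        taxon = parent.get(taxon)  # None once the root is passed
    return False


def select_by_taxon(specimens, ancestor, parent):
    """specimens maps specimen ID -> taxon; return the IDs within ancestor."""
    return {sid for sid, taxon in specimens.items()
            if is_within(taxon, ancestor, parent)}


def invert_selection(specimens, selected):
    """Return every specimen ID that is not currently selected."""
    return set(specimens) - set(selected)


def build_matrix(statements, attribute_of):
    """Group (specimen, entity, quality) entries into a matrix.

    A character is taken to be an (entity, attribute) pair, so two Quality
    values that share an Attribute fall into the same column.
    """
    matrix = defaultdict(dict)
    for specimen_id, entity_id, quality_id in statements:
        character = (entity_id, attribute_of[quality_id])
        matrix[specimen_id][character] = quality_id
    return dict(matrix)


specimens = {"s1": "Danio rerio", "s2": "Ictalurus punctatus"}
chosen = select_by_taxon(specimens, "Cypriniformes", PARENT)
others = invert_selection(specimens, chosen)

# Placeholder entity and quality names, not real ontology terms.
statements = [("s1", "fin", "round"), ("s2", "fin", "elongated")]
matrix = build_matrix(statements, {"round": "shape", "elongated": "shape"})
```

Selection from a tree view would work the same way if the tree, whether initialized from the ontology, edited by hand, or read from Newick, is reduced to the same child-to-parent form.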

Questions

  • What should the user be able to do if there is not an appropriate term in the ontology?
  • Need to add a button to say "Need a new term"
  • What will be done (in the immediate term) with the annotations produced by the EQ Editor? Will they be stored in separate documents, or compiled into a central repository?

Technology

Some technology choices may figure into the application requirements, particularly if integration with another application or system is a desired feature. Possibilities include:

  • Implement EQ editing as a Mesquite plug-in, providing some additional interface to the character editing already in Mesquite. This would require extensions to the NEXUS format for storing ontology term information. Here is some more discussion on Mesquite.
  • Extend Phenote with any additional functionality that is required. Phenote is being actively developed by the model organism community, but deals with a list of unordered EQ statements rather than a species-by-character matrix. Here is some more discussion on Phenote.
  • Web-based application.
  • Components from all of the above.