Data Jamboree 1/Annotation Experiment

From phenoscape
Revision as of 20:27, 24 April 2008 by Wasila (Variability of EQ statements)

Background and Participant Preparation

An annotation experiment was conducted on day 2 of the Phenoscape Data Jamboree to assess curation consistency among the four trained participants. Training consisted of a hands-on group annotation exercise on day 1, followed by individual work on each participant's own publications, with assistance from project personnel, on days 1 and 2. Participants were also given an Annotation Guide with examples of character types commonly encountered in the fish systematic literature. For the experiment, participants had 2 hours to annotate 10 characters (plus one extra-credit character) taken from three publications.

Results and Conclusions

Completeness of annotations

Three of the four participants attempted annotations for all 11 characters, while one participant finished only 7 characters. All participants recorded the character number and textual description, and selected the appropriate voucher specimen for each annotation. Only two of the four participants recorded evidence codes for each annotation.

Variability of EQ statements

A summary of annotation consistency among participants is presented in the table below (incomplete annotations due to software issues are excluded).

Character #   # Participants with Completed Annotations*   % Consistency with Key   Variable component of annotation
1             4                                            100
2             3                                            0                        post-composition of Q term for relative length
3             3                                            0                        incorrect recording of count values
4             4                                            0                        TAO term definition confusion (bone vs. cartilage)
5             3                                            33                       E post-composition; choice of appropriate Q
6             4                                            0                        E post-composition
7             4                                            50                       E post-composition
8             3                                            33                       choice of appropriate Q term
9             3                                            0                        E post-composition; choice of appropriate Q term
10            2                                            50                       choice of appropriate Q term
EC            2                                            25                       E post-composition; choice of appropriate Q term

* Incomplete annotations due to software issues were excluded.
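The page does not state how the consistency percentages were calculated. One plausible reading is the share of completed annotations that are identical to the key, rounded to a whole percent. The sketch below illustrates that reading only; the function name, the (entity, quality) tuple representation and the placeholder labels are assumptions, not part of the experiment.

```python
from typing import Optional, Sequence, Tuple

# Assumption: an annotation is reduced to an (entity, quality) pair of labels.
Annotation = Tuple[str, str]


def percent_consistency(
    submitted: Sequence[Optional[Annotation]], key: Annotation
) -> Optional[int]:
    """Percent of completed annotations that are identical to the key.

    Incomplete annotations are passed as None and excluded from the
    denominator, mirroring the exclusion noted under the table.
    """
    completed = [a for a in submitted if a is not None]
    if not completed:
        return None  # no completed annotations, so no percentage is defined
    matches = sum(1 for a in completed if a == key)
    return round(100 * matches / len(completed))


# Hypothetical labels: one of three completed annotations matches the key.
key = ("entity A", "quality X")
submitted = [key, ("entity A", "quality Y"), ("entity B", "quality X"), None]
print(percent_consistency(submitted, key))
```

Under this reading, any difference in the entity or the quality counts as a mismatch, which is stricter than a measure that gives partial credit for a correct entity.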

Participants annotated only one character identically. Variation in the other annotations had several causes:

  • Granularity of annotations. Some participants integrated very detailed information in post-compositions of entities or qualities, whereas others used single anatomy terms or broad term categories for quality. Use of spatial information in post-composition also varied among participants.
  • Creation of post-composed entities. Participants had difficulty deciding which term to use as the genus in post-composition. The relation used in post-composition (for example, part_of versus has_part) also differed among participants.
  • Choice of the appropriate quality term. Participants had difficulty in choosing quality terms among many similar choices. The appropriate use of monadic and relational qualities also differed in the annotations among participants.
  • Confusion regarding the definition of ontology terms. For one character, confusion about the identity of a term resulted in differing annotations, which points to the need for consistent term names for bones in the TAO.
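The sources of variation listed above can be made concrete with a small model of an EQ statement. This is a minimal sketch, assuming a post-composed term can be represented as a genus plus a tuple of (relation, term) differentia; the class names, the helper and all term labels are hypothetical and are not taken from the TAO, from any quality ontology, or from Phenote's data model.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Term:
    """A term, optionally post-composed: a genus refined by differentia."""
    genus: str
    differentia: Tuple[Tuple[str, str], ...] = ()  # (relation, term label) pairs


@dataclass(frozen=True)
class EQ:
    """An Entity-Quality statement."""
    entity: Term
    quality: Term


def differing_components(a: EQ, b: EQ) -> List[str]:
    """Names of the EQ components on which two annotations disagree."""
    return [name for name in ("entity", "quality")
            if getattr(a, name) != getattr(b, name)]


# Two hypothetical annotations of the same character at different granularity.
coarse = EQ(entity=Term("bone A"), quality=Term("size"))
fine = EQ(
    entity=Term("process", (("part_of", "bone A"),)),  # post-composed entity
    quality=Term("length"),
)

print(differing_components(coarse, fine))  # both components differ
```

Because the comparison is exact, a curator who post-composes the entity and a curator who uses a single anatomy term produce annotations that count as inconsistent even when both describe the same structure.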

The results of the annotation experiment identify specific areas for improvement: establishing curation standards to assist in annotation, and streamlining the software interface so that curators are not faced with similar and inapplicable choices for terms and relations. Based on these results, curation standards and improvements to the Annotation Guide are in development, and improvements to the Phenote interface are planned.