Data Jamboree 2/Notes
User interface
User interface demonstration
DATE? SUNDAY MEETING OR DO THESE NOTES ALSO APPLY TO MEETING WITH
INDIVIDUALS IN PAIRS ON MONDAY/TUESDAY/WED?
- Visualization
- Jim Balhoff suggested enabling drill down from higher taxa to
lower ones. Typically, query for annotations will yield more
results at higher levels in the taxonomy. Drilling down into the
lower levels will serve to prune the results and narrow down to
users’ exact requirements. Monte Westerfield, and Paula Mabee
seconded.
- Mark Sabaj suggested using a fish landscape with different
predefined areas for visualization of results and guiding the
search. Paula Mabee and John Lundberg approved.
- Todd Vision suggests displaying only those nodes of a taxonomy
that have been annotated
- Judith Blake suggested using Cytoscape to browse through various
nodes in the tree
- Monte Westerfield suggested linking phenotypes to genes
- Todd Vision suggested character correlations
- Querying
- Todd Vision suggested using auto composition of search terms
- Monte Westerfield suggested using Boolean combinations of query
parameters. Seconded by Judith Blake
- option for hierarchical indexing of results (taxonomy, phylogeny, but
also anatomy ontology)
- mapping characters on a tree: multiple phenotypes may match any
particular query
- map to different colors for indicators? use numbers as indexes?
- mapping phenotypes onto trees cannot typically reconstruct character
state changes, and hence traditional visualizations may be
misleading?
- ability to prune species with no data (values) for export
- search interface: ability to combine taxon/entity/quality
specifications (and, or, not)
- graph navigation: Dbgraphnav, Cytoscape
- clickable fish image for starting navigation
- most common entry point is likely to be a simple one-field form for
entering terms
- phenotype query prototype: how do I get from here to the genes?
- ability to see correlations between phenotypes
MGI batch query demo
- users don’t use complex query forms
- auto-detect type of input tokens
- allow download in different formats
- computationally savvy users
- pre-written SQL as available from GO website
User interface strategy
- Most common entry gateways: search by
- taxon
- gene (from ZFIN)
- anatomical entity
- Use case: find evolutionary phenotypes that match a mutant ZFIN
phenotype
- Query by phenotype, result is species and ZFIN mutants that have
matching annotations
- Alternatively, query by phenotype profile (several phenotypes)
- This could be retrieved by ZFIN mutant rather than typing them all
in
- Inverting this use case: find ZFIN mutants for a set of phenotypes
that differs between two taxa
- Query by two taxa, pull out all phenotype annotations that aren’t
annotated to both taxa (each of which may be a clade)
- User can remove phenotype annotations from that result to create the
search profile
- Use case: find matching taxa and/or genes for a phenotype or phenotype
profile
- Query by anatomical entity to retrieve matching phenotypes
([Q]ualities, essentially)
- Build phenotype profile from that (choosing or removing phenotype
annotations)
- Summarization of results:
- Number of matching taxa, ZFIN genes, publications
- Search by anatomy term
- obtain the qualities (and entities if query is higher-level) for the
query term
- use this to be build query profile for obtaining matching ZFIN genes
- Search by taxon:
- results in phenotypes annotated to this taxon
- list of publications, possibly as a secondary result after narrowing
down phenotype list by anatomy term
- should be able to see where in the tree the currently selected taxon
is
- Search publications:
- Search by author, by taxon, by anatomical term
- Search by identifier (doi)
- Search by ZFIN gene (or mutant)
- Results in phenotypes annotated to this mutant
- Use these to retrieve publications describing such phenotypes
(regardless of taxon or gene)
- Obtain the number of these phenotypes that are used for evolutionary
annotation (i.e., annotated to - presumably normal - taxa)
- Summarizing data per publication
- number of taxa and/or anatomical entities or phenotypes matching the
original query, compared to overall number of taxa or anatomical
entities or phenotypes
Feedback and Suggestions on proposed Phenoscape UI
- Data cube like representation to represent taxa, phenotypes, and genes
on three dimensions. This should allow combinations of two of the
three parameters to be displayed at any given time(Rick Mayden)
- While looking for gene-phenotype associations, display ALL phenotypes
the gene is associated with IN ADDITION to the phenotypes of interest
(Rick Mayden)
- Term Information area for a selected term can be used to display all
synonyms (dbxrefs?) and misspellings (Rick Mayden)
- In publication search results, indicate whether the publication was
curated or not. If curated, display the details of curations such as
curator’s info, date curated, and importantly, versions of the
ontologies that were used in the curation process (Mark Sabaj, Rick
Mayden)
- Provide external links to other morphological databases such as
MorphBank, MorphoBank etc (Rick Mayden, Mark Sabaj, Terry Grande, Eric
Hilton)
- Enable search for related entity to the phenotype (Eric Hilton)
- While displaying publications, include a link to Authors’ page and if
possible, links to related people (Terry Grande)
- Provide links to images (Terry Grande)
- Group phenotype query results by taxa. Higher taxa should be
expandable to display lower taxa (Paula Mabee, John Lundberg)
- Display result distributions among taxa when displaying results from
phenotype queries (Paula Mabee)
- Group results by author, publication date etc while displaying
publications (Paula Mabee)
- Group annotations by author, annotation date (Paula Mabee)
- Allow taxon/gene/phenotype combinations for querying (Paula Mabee)
Taxon Concepts subgroup meeting (Monday, Sept. 29, 2008)
(Present: Peter Midford, Paula Mabee, Todd Vision, Wasila Dahdul, Hilmar
Lapp, John Lundberg, Suzi Lewis, Judy Blake)
Synonym scanner is working well but will never be perfect because CoF
doesn’t list every synonym in existence (because they may not detect
every use in literature, or maybe deemed a name not worthy of addition
as synonym)
TJP (TV?): Have we checked requested taxa not in CoF against UBio? -
that would provide an LSID that could provide a dbxref to anchor the
taxon
Addition of synonyms not in TTO: currently Peter must add by hand.
Need to track - Association between synonym, person who requested and
publication? - Not being tracked right now.
JGL: We don’t want to give this to CoF because these synonyms are picked
up in morphological or phylogenetic studies. These names that appear and
we flag as synonyms to valid names in CoF are not names coming out of
taxonomic research; these are mistakes: OCR, typographical mistakes of
author, or author mistakes of species in wrong genus.
Seems we should track all of this in database?
We want: - author year of the reference (right now it’s in comments) -
want in a searchable context - does CoF have a database of publications?
PM: ed publication information (DOI, SCSI) -CoF has an internal
reference - We also need to record the full citation ourselves (name,
year, unpublished dissertation…)
Peter: dbxref from CoF can be added to TTO; OBO ontologies are
structured to hold dbxrefs, whole references in any structured way.
synonym reference - how should it be handled?
SL: synonym xref can be a person (initials)
to summarize: full text ref would be in db; ontology would have a
pointer to that
Peter: need interface to the db; won’t be visible to obo-edit - make the
dbxref a url link to URI
TV: We need a universal place to resolve that URL
PM: synonym types necessary? misspellings; narrow etc…
Peter: Hoping we can scan stuff in from CoF
JGL: see ANSP collection search: type in a type specimen name and it
pulls in the CoF ref need CoF identifier; CASspec?
PM: who should do this?
JGL: why can’t curator do this?
PM: too much time; can be done in bulk
Peter: synonym types: - current vocab is exact, related, broader,
narrower - all synonyms currently are related (the weakest relation). -
we will want misspelling to be even weaker than related; - distinguish
between published synonomy vs, curator made decision
HL: weighing how trustworthy the evidence vs. designation of
weak/strong; to HL, misspelling is a strong exact synonym not weak
PM: are misspelings the only exact synonyms?
TV: not critical to have the types of synonyms; just use related
all in agreement (misspellings will be taken as exact, other taxonomic
synonyms use related).
Hilmar’s notes
There are taxonomic names that are found in publications and are absent
from TTO.
- These are usually synonyms of existing taxonomic definitions.
- Curators file TTO synonym term requests for those missing terms,
indicating what the currently valid taxon is
- Based on a brief survey, some of these synonyms are in fact contained
in the latest CoF update, but many are not.
- Resolution against uBio has not been attempted yet.
Evidence for synonym to current name assignment needs to be recorded.
- OBO format allows dbxref for the synonym, which is used for storing
the reference, such as the curator (e.g., pers. comm.)
- Also need unique URLs and GUIDs for publications so that these can be
referenced as dbxref.
- OBO edit allows typing in those references, but doesn’t allow
hyperlinking to a display of the source.
- Many of these publications could be imported from CoF, which maintains
a database of publications with identifiers (CoF#).
- Synonym assignments will have relationship type ‘RELATED’ except for
misspellings, for which it would be ‘EXACT’.
Action items
- Peter will try UBIO for synonyms not in CAS (and report to Phenoscape
group)
- Peter will request dbx prefix from OBO for specifying CAS publication
ids (e.g., CASREF)
- Peter will propose scheme for adding the name of the requester to
synonyms added to the TTO
- Wasila/Paula provide full refs for all curated pubs with CAS number
(if cited in CAS); PMID or DOI if not cited in CAS
Notes from Data Curation Sessions
Sylvan Lake Lodge Meeting Room - September 29, 2008 (Monday)
Wasila & Paula’s notes
Comparative phenotypes: discussion of relative size, shape characters:
Problem: Descriptors comparing size, shape among taxa within a pub
cannot be extended to taxa outside the study (e.g., Rick’s size of bone
example with three character states: large (0), small (1), extremely
small (2)) How to do this with ontologies? Judy pointed out that there
is a (dynamic line)in annotation, between the depth of a structured
vocabulary and free text. I.e. where do you stop using an ontology and
begin using free text? Data specific to study should be free-text.
Judy Blake intially suggested simply annotating our complex anatomical
characters to “shape”. This indexing first pass is useful for users to
be able to aggregate the data (vs. full curation). Through further
discussion, we agree that annotating to a more granular level, i.e.
“shape: width” would be better and more informative. The weakness of the
current PATO comes in here, in that there need to be more nodes between
shape and all the terminal nodes (the descriptors such as “narrow”
“broad” etc.). This might allow more depth than just “shape:width”.
We need to index our comparative systematic studies at a level that is
useful for the field. It is like a library, “binning” index to multiple
things.
John’s idea for recording size comparison within a study: Use shape:
width and ALSO apply an internal grading system for these characters
such that the least/smallest value is given 1.
- graded series of lengths, widths, etc.. give 1, 2, 3 …
- least/smallest value is given 1
Eric’s example of incomplete/complete scute series:
- E: scute series; Q: in contact RE: skull
- E: scute series; Q: in contact RE: dorsal fin
- E: scute series; Q: separated from RE: skull
- E: scute series; Q: separated from RE: dorsal fin
Judy: important to separate tasks of 1) anatomy ontology development and
2) annotation of publications
- suggested having ontology development workshops and curation workshops
- her suggestion: Curator needs to enter terms ahead of time and have
experts fix the annotations (problem at our workshop currently is that
people are not finding entities in ontology - need ontology work)
Mark submitted TAO request on Weberian vertebra - relationships that
don’t hold for all taxa
Wasila: batching vs. one term request: ok to submit related terms in one
request
Suzi: suggested having a small PATO workshop with ichthyologists, like
anatomy workshop
curators understand details of system
JB: put 2-3 people together to do anatomical subtree: send out invite to
ontology development
Judy’s observation notes
- Wasila and John talking about anatomical terms and what they should be
- Paula showing Mark how to log into sourceforge
- Rick looking at characters for his paper
- Wasila from Peter: question on batching or one term per request
- Jeff helps Eric remember how to log into sourceforge
- Terry working on her paper
- looking for “right” first … low hanging fruit
- Rick annotating with Wasila’s help
- Mark entering successful s.f. proposal
- Eric looking for term in his paper
- Terry asking Jeff how to enter post-composed terms
Wrap-up discussion with curators
October 1, 2008 (Wed morning)
Wasila + Paula’s notes
Paula: suggestions for improving curation process?
- Terry: would like to go through a paper first to deal with needed
terms
- also liked annotating easy terms to become familiar with ontology
structure
- John: some things, like joints, can be added in bulk (e.g. in ontology
development meeting like the one in Philadelphia)
- All: Had visualization issues - how to know what terms are there, and
their relationships?
- Need large poster of all terms (Paula, Wasila, Jeff Engeman will
make one and send file to all curators to print)
-
- Paula: PDF mark-up tool that highlights terms matching to TAO;
characters not highlighted would likely require new term
- Todd and Paula: use Phenex to highlight ontology terms because
character and state descriptions from pub are pre-entered as free
text (Paula added this to Phenex tracker)
- Mark: how does a curator decide to precompose vs. postcompose?
- Wasila: if entity will be used repeatedly (in single or multiple
pubs) then add to TAO; if not, post-compose
- Mark: when post-composing, would be helpful if Phenex would
autocomplete to pre-composed cross-product term (if present in TAO)
so that curator can be aware of similar term in ontology
- Suzi: tough anatomy stuff: you can discuss as a group of curators then
submit and enter in ontology
- Paula: need to set up svn for Phenex files (Jim will do this) - WD
will send instructions to new curators
- Rick: can we break down single character into multiple characters?
- Paula: make into multiple annotations - don’t break down character
- Pubs with same characters
- Paula: curate it all separately
- Jim: duplicates will come together in database
Paula: Problem: Currently we are not curating of entities that are
always present (e.g. “maxilla” present in all Teleostei). Problem is
that users may search for distribution of a character and may find
variability in very restricted taxon (genus) but have no information as
to the condition in the larger group (teleost fishes). Need to add these
conserved characters. Two approaches - either fine for us - need input
from Jim regarding database feasibility.
- 1. Inherit: annotate 1-2 basal species within each ostariophysan
subgroup whose character states are optimized to ancestral/internal
nodes.
- 2. Propagate: annotate higher level group node (e.g. “Teleostei”)
with a character state (e.g. “presence”) and flag with weak evidence
code. This character state is propagated to leaf
nodes/species/terminals (unless run into different character state).
- Jim: preference for 1 (inherit);
- Jim: relationship between EQ and taxon: “inferred to exhibit”;
phenotypes are exhibited by taxa; congregating “exhibits” to higher
levels seems to be ok to do (e.g., cypriniformes exhibit red and blue
dorsal fin)
- Jim: might need more than one relationship between taxon and a
phenotype; right now we have “exhibits” as the relationship
- John: We want biological principles incorporated here/ examples of
optimization
- Paula: This relates to character mapping/optimization. May need new
methods to optimize EQs?
Action items:
- Paula: Will followup with curators regarding completeness of
literature surveyed –Pmabee@usd.edu 11:05, 14 October 2008
(EDT)Done
- Paula & John: Will work up optimization example for group that
summarizes basic user mapping needs
Curation consistency experiment: Review of Results
Five curators (Engeman, Grande, Hilton, Mayden, Sabaj) participated in
the consistency experiment on September 30, 2008 (Wed.). The results of
the experiment are discussed in more detail
here.
Hilmar’s notes on discussion of experiment with curators and
advisors
Character #2
- difference between depth, length, width
- some failed to post-compose (e.g., increased length)
- need to standardize shape, length, depth, and width definitions
- should then use higher level character (such as ‘shape’)
Character #3
- some use ‘increased count’ which is-a count
Character #4
- question of whether basihyal bone precludes presence (or implies
absence) of cartilage
- question of how to annotate this looks more difficult than in reality
is
- basihyal cartilage absent implies basihyal bone absent (because the
latter develops from the former)
- in fact it can also be that the cartilage is absent b/c it has
developed into the bone (completely ossified)
- hence need to add that basihyal is absent too
- graph view can be very helpful
Character #5
- software should prevent filling in relative entities for qualities
that aren’t relational
- post-composed relative entity because of uncertainty of whether the
full (existing) term would be compatible with the definition the
author may have been using (for a different clade)
- difficulty to find the ‘bony projection’ by auto-completion (because
it doesn’t pop up when typing the beginnings of ‘projection’)
- problem of capturing the ‘overlaps with’ aspect in addition to
‘anterior to’
- full text search of the definitions would be very useful
Character #6
- hypural is also contained within the upper lobe of the caudal fin
- difficulty of finding and comparing definitions of relationship types
Character #7
- Software should ensure that entity starts with an entity term if
post-composed, not a quality
Character #8
- sphenotic needs to go into the comment field
Character #9
- too complex to express the exact nature of the orientation, so choose
just ‘orientation’
Character #10
- triradiate versus tripartite