Data Jamboree 2/Notes

Revision as of 03:54, 7 October 2008 by Hilmar

User interface

User interface demonstration

  • option for hierarchical indexing of results (taxonomy, phylogeny, but also anatomy ontology)
  • mapping characters on a tree: multiple phenotypes may match any particular query
    • map to different colors for indicators? use numbers as indexes?
    • mapping phenotypes onto trees typically cannot reconstruct character state changes, so traditional visualizations may be misleading?
  • ability to prune species with no data (values) for export
  • search interface: ability to combine taxon/entity/quality specifications (and, or, not)
  • graph navigation: Dbgraphnav, Cytoscape
  • clickable fish image for starting navigation
  • most common entry point is likely to be a simple one-field form for entering terms
  • phenotype query prototype: how do I get from here to the genes?
  • ability to see correlations between phenotypes
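The "and, or, not" search combination and the "prune species with no data" export option above can be sketched as composable predicates over annotation records. This is a minimal sketch under stated assumptions: the `Annotation` record (taxon, entity, quality) and all function names are hypothetical and do not describe the Phenoscape implementation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

@dataclass(frozen=True)
class Annotation:
    """Hypothetical record: one phenotype (entity + quality) annotated to a taxon."""
    taxon: str
    entity: str
    quality: str

Predicate = Callable[[Annotation], bool]

def taxon_is(name: str) -> Predicate:
    return lambda a: a.taxon == name

def entity_is(name: str) -> Predicate:
    return lambda a: a.entity == name

def quality_is(name: str) -> Predicate:
    return lambda a: a.quality == name

def all_of(*preds: Predicate) -> Predicate:  # AND
    return lambda a: all(p(a) for p in preds)

def any_of(*preds: Predicate) -> Predicate:  # OR
    return lambda a: any(p(a) for p in preds)

def negate(pred: Predicate) -> Predicate:  # NOT
    return lambda a: not pred(a)

def search(annotations: Iterable[Annotation], query: Predicate) -> List[Annotation]:
    """Return the annotations that satisfy the combined query."""
    return [a for a in annotations if query(a)]

def prune_taxa_without_data(taxa: Iterable[str], annotations: Iterable[Annotation]) -> List[str]:
    """Drop taxa that have no annotation values, e.g. before exporting."""
    with_data = {a.taxon for a in annotations}
    return [t for t in taxa if t in with_data]
```

For example, `all_of(entity_is("dorsal fin"), negate(quality_is("present")))` finds dorsal fin annotations with any quality other than "present". Building queries from small combinators keeps a one-field entry form and an advanced boolean form on the same underlying machinery.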

MGI batch query demo

  • users don't use complex query forms
  • auto-detect type of input tokens
  • allow download in different formats
  • computationally savvy users
  • pre-written SQL queries, as available from the GO website
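Auto-detecting the type of each input token lets a batch form stay simple: users paste a mixed list and the system decides what each entry is. The sketch below is illustrative only; the regular expressions (ZFIN-style `ZDB-…` IDs, `PREFIX:0000000` ontology term IDs, DOIs) are assumptions, not the rules MGI or Phenoscape actually use, and entries are assumed to be separated by newlines, commas, or semicolons so that multi-word terms survive intact.

```python
import re
from typing import Dict, List

# Illustrative patterns (assumptions), tried in order; first full match wins.
TOKEN_PATTERNS = [
    ("zfin_id", re.compile(r"ZDB-[A-Z]+-\d{6}-\d+")),
    ("ontology_term_id", re.compile(r"[A-Za-z]+:\d{7}")),
    ("doi", re.compile(r"10\.\d{4,9}/\S+")),
]

def detect_token_type(token: str) -> str:
    """Return the first matching token type, or 'free_text' for plain terms."""
    for kind, pattern in TOKEN_PATTERNS:
        if pattern.fullmatch(token):
            return kind
    return "free_text"

def classify_batch(text: str) -> Dict[str, List[str]]:
    """Split a pasted batch into entries and group them by detected type."""
    groups: Dict[str, List[str]] = {}
    for raw in re.split(r"[\n,;]+", text):
        token = raw.strip()
        if token:
            groups.setdefault(detect_token_type(token), []).append(token)
    return groups
```

Unrecognized entries fall through to `free_text`, which could then be sent to a term-name lookup rather than rejected.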

User interface strategy

  • Most common entry gateways: search by
    • taxon
    • gene (from ZFIN)
    • anatomical entity
  • Use case: find evolutionary phenotypes that match a ZFIN mutant phenotype
    • Query by phenotype, result is species and ZFIN mutants that have matching annotations
    • Alternatively, query by phenotype profile (several phenotypes)
      • The profile could be retrieved by ZFIN mutant rather than by typing in each phenotype
  • Inverting this use case: find ZFIN mutants for a set of phenotypes that differs between two taxa
    • Query by two taxa and retrieve all phenotype annotations that are not shared by both taxa (each taxon may be a clade)
    • User can remove phenotype annotations from that result to create the search profile
  • Use case: find matching taxa and/or genes for a phenotype or phenotype profile
    • Query by anatomical entity to retrieve matching phenotypes ([Q]ualities, essentially)
    • Build phenotype profile from that (choosing or removing phenotype annotations)
  • Summarization of results:
    • Number of matching taxa, ZFIN genes, publications
  • Search by anatomy term
    • obtain the qualities (and entities if query is higher-level) for the query term
    • use these to build a query profile for obtaining matching ZFIN genes
  • Search by taxon:
    • results in phenotypes annotated to this taxon
    • list of publications, possibly as a secondary result after narrowing down phenotype list by anatomy term
    • should be able to see where in the tree the currently selected taxon is
  • Search publications:
    • Search by author, by taxon, by anatomical term
    • Search by identifier (doi)
  • Search by ZFIN gene (or mutant)
    • Results in phenotypes annotated to this mutant
    • Use these to retrieve publications describing such phenotypes (regardless of taxon or gene)
    • Obtain the number of these phenotypes that are used for evolutionary annotation (i.e., annotated to taxa, which are presumably normal rather than mutant)
  • Summarizing data per publication
    • number of taxa and/or anatomical entities or phenotypes matching the original query, compared to overall number of taxa or anatomical entities or phenotypes
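The inverted use case above (compare two taxa, let the user trim the differing phenotypes into a profile, then find matching ZFIN mutants) reduces to set operations. This is a minimal sketch assuming each phenotype is an (entity, quality) pair and that mutants are ranked by how many profile phenotypes they share; the function names, the ranking rule, and the sample mutant names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Phenotype = Tuple[str, str]  # (entity, quality), an assumed representation

def differing_phenotypes(taxon_a: Set[Phenotype], taxon_b: Set[Phenotype]) -> Set[Phenotype]:
    """Phenotypes annotated to one taxon but not to both (symmetric difference)."""
    return taxon_a ^ taxon_b

def rank_mutants(profile: Set[Phenotype],
                 mutants: Dict[str, Set[Phenotype]]) -> List[Tuple[str, int]]:
    """Rank mutants by the number of profile phenotypes they share; drop non-matches."""
    scores = [(name, len(profile & phenos)) for name, phenos in mutants.items()]
    return sorted((s for s in scores if s[1] > 0), key=lambda s: (-s[1], s[0]))

# Usage with made-up data:
taxon_a = {("dorsal fin", "absent"), ("pelvic fin", "present")}
taxon_b = {("dorsal fin", "present"), ("pelvic fin", "present")}
profile = differing_phenotypes(taxon_a, taxon_b)
profile.discard(("dorsal fin", "present"))  # user removes an annotation from the result
mutants = {
    "mutant_x": {("dorsal fin", "absent"), ("eye", "small")},
    "mutant_y": {("eye", "small")},
}
ranking = rank_mutants(profile, mutants)
```

The match counts also feed the summary view directly, e.g. the number of matching mutants or taxa per query.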

Feedback and Suggestions on proposed Phenoscape UI

  • Data-cube-like representation of taxa, phenotypes, and genes along three dimensions. This should allow any two of the three dimensions to be displayed at a given time (Rick Mayden)
  • While looking for gene-phenotype associations, display ALL phenotypes the gene is associated with IN ADDITION to the phenotypes of interest (Rick Mayden)
  • Term Information area for a selected term can be used to display all synonyms (dbxrefs?) and misspellings (Rick Mayden)
  • In publication search results, indicate whether the publication was curated or not. If curated, display the details of curations such as curator's info, date curated, and importantly, versions of the ontologies that were used in the curation process (Mark Sabaj, Rick Mayden)
  • Provide external links to other morphological databases such as MorphBank, MorphoBank, etc. (Rick Mayden, Mark Sabaj, Terry Grande, Eric Hilton)
  • Enable searching for entities related to the phenotype (Eric Hilton)
  • When displaying publications, include links to the authors' pages and, if possible, links to related people (Terry Grande)
  • Provide links to images (Terry Grande)
  • Group phenotype query results by taxa. Higher taxa should be expandable to display lower taxa (Paula Mabee, John Lundberg)
  • Display result distributions among taxa when displaying results from phenotype queries (Paula Mabee)
  • Group results by author, publication date, etc. when displaying publications (Paula Mabee)
  • Group annotations by author, annotation date (Paula Mabee)
  • Allow taxon/gene/phenotype combinations for querying (Paula Mabee)
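The data-cube suggestion can be sketched by joining taxon-phenotype annotations with gene-phenotype annotations on the shared phenotype, then projecting the resulting triples onto any two dimensions. This assumes phenotypes are compared by exact identifier match and that each projected count means "number of distinct values of the third dimension"; names and data here are hypothetical, not a proposed Phenoscape design.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Set, Tuple

class Cell(NamedTuple):
    taxon: str
    phenotype: str
    gene: str

def build_cube(taxon_annotations: Iterable[Tuple[str, str]],
               gene_annotations: Iterable[Tuple[str, str]]) -> Set[Cell]:
    """Join (taxon, phenotype) and (gene, phenotype) pairs on the phenotype."""
    genes_by_phenotype: Dict[str, Set[str]] = {}
    for gene, phenotype in gene_annotations:
        genes_by_phenotype.setdefault(phenotype, set()).add(gene)
    return {
        Cell(taxon, phenotype, gene)
        for taxon, phenotype in taxon_annotations
        for gene in genes_by_phenotype.get(phenotype, ())
    }

def project(cube: Set[Cell], row: str, col: str) -> Counter:
    """Show two dimensions; each count is the number of distinct values of the third."""
    return Counter((getattr(c, row), getattr(c, col)) for c in cube)
```

Because the cube is a set of distinct triples, switching the displayed pair (taxon × gene, taxon × phenotype, phenotype × gene) needs no re-query, only a different `project` call.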

Taxon Concepts

The synonym scanner is working well but will never be perfect, because CoF doesn't list every synonym in existence (CoF may not detect every use in the literature, or may deem a name not worthy of addition as a synonym).
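How the actual synonym scanner works is not described here. As a hedged illustration of the idea, a scanner could classify each name as valid, a known synonym, a likely misspelling of a valid name (via fuzzy string matching with the standard-library `difflib`), or unknown. The similarity cutoff of 0.85 and all names below are assumptions.

```python
import difflib
from typing import Dict, Iterable, Optional, Set, Tuple

def scan_names(names: Iterable[str], valid_names: Set[str],
               synonyms: Dict[str, str],
               cutoff: float = 0.85) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each name to (status, valid name or None)."""
    valid_list = sorted(valid_names)
    results: Dict[str, Tuple[str, Optional[str]]] = {}
    for name in names:
        if name in valid_names:
            results[name] = ("valid", name)
        elif name in synonyms:
            results[name] = ("synonym", synonyms[name])
        else:
            # Closest valid name above the cutoff is a candidate misspelling target.
            close = difflib.get_close_matches(name, valid_list, n=1, cutoff=cutoff)
            results[name] = ("possible_misspelling", close[0]) if close else ("unknown", None)
    return results
```

Names ending up as "unknown" are the ones a curator would still need to file as TTO synonym requests or check against other sources.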

TJP: Have we checked requested taxa that are not in CoF against uBio? That would provide an LSID, which could serve as a dbxref to anchor the taxon.

Addition of synonyms not in TTO: currently Peter must add it by hand.

Need to track the association between a synonym, the person who requested it, and the publication? This is not being tracked right now.

JGL: We don't want to give this to CoF, because these synonyms are picked up in morphological or phylogenetic studies. The names that appear there and that we flag as synonyms of valid names in CoF do not come out of taxonomic research; they are mistakes: OCR errors, typographical errors by the author, or authors placing a species in the wrong genus.

Seems we should track all of this in database?

We want the author and year of the reference (right now these are in comments) in a searchable context. Does CoF have a database of publications?

PM: ed publication information (DOI, SCSI); CoF has an internal reference; we also need to record the full citation ourselves (name, year, unpublished dissertation, ...).

Peter: add the dbxref from CoF to TTO; TTO just holds the dbxref, not the whole reference (there is no structured place to hold it).

Synonym reference: how should it be handled?

SL: synonym xref can be a person (initials)

To summarize: the full-text reference would be in the database; the ontology would have a pointer to it.

Peter: need an interface to the database; it won't be visible to OBO-Edit. Make the dbxref a URL link to a URI.

TV: We need a universal place to resolve that URL
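One way to get a single place to resolve dbxrefs is a prefix table that expands `PREFIX:local_id` pointers into URLs. This is a sketch under assumptions: the `CoF` URL template is a placeholder (example.org), not a real CoF service; only `https://doi.org/` is a real resolver. Dbxrefs without a prefix, such as a curator's initials, return `None`.

```python
from typing import Dict, Optional

# Hypothetical prefix table; the CoF entry is a placeholder, not a real endpoint.
URL_TEMPLATES: Dict[str, str] = {
    "CoF": "https://example.org/cof/ref/{local_id}",
    "DOI": "https://doi.org/{local_id}",
}

def resolve_dbxref(dbxref: str, templates: Dict[str, str] = URL_TEMPLATES) -> Optional[str]:
    """Expand PREFIX:local_id into a URL; return None if unprefixed or unknown."""
    prefix, sep, local_id = dbxref.partition(":")  # split at the first colon only
    if not sep or not local_id:
        return None
    template = templates.get(prefix)
    return template.format(local_id=local_id) if template else None
```

Keeping the templates in one table means the ontology stores only short pointers, and the target of a prefix can change without editing the ontology.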

PM: Are synonym types necessary? Misspellings, narrow, etc.?

Peter: Hoping we can scan stuff in from CoF

John: see the ANSP collection search: type in a type specimen name and it pulls in the CoF reference. We need the CoF identifier; CASspec?

PM: who should do this?

JGL: why can't curator do this?

PM: too much time; can be done in bulk

Peter: synonym types:
  • the current vocabulary is exact, related, broader, narrower
  • all synonyms are currently related (the weakest relation)
  • we will want misspelling to be even weaker than related
  • distinguish between published synonymy vs. a curator-made decision

HL: weighing how trustworthy the evidence is differs from designating a synonym as weak or strong; to HL, a misspelling is a strong, exact synonym, not a weak one.

PM: Are misspellings the only exact synonyms?

TV: not critical to have the types of synonyms; just use related

All in agreement.

Hilmar's notes

There are taxonomic names that are found in publications and are absent from TTO.

  • These are usually synonyms of existing taxonomic definitions.
  • Curators file TTO synonym term requests for those missing terms, indicating what the currently valid taxon is
  • Based on a brief survey, some of these synonyms are in fact contained in the latest CoF update, but many are not.
  • Resolution against uBio has not been attempted yet.

Evidence for synonym to current name assignment needs to be recorded.

  • OBO format allows dbxref for the synonym, which is used for storing the reference, such as the curator (e.g., pers. comm.)
  • Also need unique URLs and GUIDs for publications so that these can be referenced as dbxref.
  • OBO-Edit allows typing in those references, but doesn't allow hyperlinking to a display of the source.
  • Many of these publications could be imported from CoF, which maintains a database of publications with identifiers (CoF#).
  • Synonym assignments will have relationship type 'RELATED' except for misspellings, for which it would be 'EXACT'.
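The convention above (RELATED for synonym assignments, EXACT for misspellings, evidence recorded as dbxrefs) can be written out as OBO `synonym:` tag lines. This sketch assumes the OBO 1.2 flat-file form `synonym: "text" SCOPE [dbxref, ...]`, without the optional synonym-type field; the function name and the example taxon name and dbxref are hypothetical. Backslashes and double quotes in the name are escaped, but commas inside dbxrefs are not handled.

```python
from typing import List

def obo_synonym_line(name: str, dbxrefs: List[str], misspelling: bool = False) -> str:
    """Format an OBO synonym tag: EXACT for misspellings, RELATED otherwise."""
    scope = "EXACT" if misspelling else "RELATED"
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'synonym: "{escaped}" {scope} [{", ".join(dbxrefs)}]'
```

For example, a misspelled name attributed to a CoF reference would produce `synonym: "Examplus minr" EXACT [CoF:12345]`, while a curator's decision could carry the curator's initials as the dbxref instead.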