Difference between revisions of "Phenoscape Grant Renewal Workshop/Notes"

From phenoscape
Revision as of 14:11, 30 April 2009

Tuesday afternoon

  • Will be extending the taxonomic scope to extant and extinct vertebrates
    • Create an annotation database spanning the taxonomic scope and their anatomy ontologies
    • Three multi-species anatomy ontologies are being developed: teleosts, amphibians, mammalian anatomy
    • Tree mapping: show when an evolutionary phenotype first appeared
  • EQ support in all involved model organism databases
    • Structure of MP is not consistent with anatomy, or with PATO
    • Mapping from MP to EQ syntax is being worked on
    • MGI isn't in a position to use EQ internally, but Phenoscape could, for example, provide an EQ view on mouse phenotypes by decomposing each MP term into a cross-product of an anatomy (entity) term and a quality term
    • Full EQ annotation for OMIM is being planned but not yet funded
    • MGI, Xenbase, and ZFIN all have links from their phenotype data to OMIM
  • Development of anatomy
    • Different approaches between MGI and ZFIN:
      • MGI uses complete anatomy at different developmental stages
      • ZFIN uses one anatomy ontology and adds start and end dates to indicate the developmental period during which it appears (adult structures don't have an end date)
      • Xenbase uses the ZFIN approach
    • Candidate genes for evolutionary change could be derived from anatomy-annotated gene expression studies during development
      • genes responsible for or associated with morphological changes during individual development could be candidates for evolutionary change
      • evolutionary phenotype changes could be used to query morphological changes during development
    • Can we enable queries to generate hypotheses based on ontogeny reflecting evolutionary history?
      • Evolutionary developmental data could come from medaka, stickleback
  • Sequence of steps:
    1. Ontology building
      • Adult phenotype annotations for mutants in Xenbase
      • Presently only 3 species for amphibians: Xenopus, Dermophis, Salamandria
      • Development decoupling from anatomy ontology in mouse
      • Expanding mammalian anatomy to include extinct species
      • Anatomy for amniotes (the clade that includes both mouse and dinosaurs), covering extinct as well as extant taxa
        • Scope of this could be overwhelming
        • Should be strongly driven (or staged) according to the character matrices to be annotated
        • Chicken anatomy could provide a starting point?
        • There is work on bird anatomy that uses a Latin naming scheme
        • Work on any smaller-scope anatomy (or multi-species anatomy) ontology will contribute terms that apply more broadly
        • There may not be very many neomorphs between birds and mammals, though there are a few areas such as the digits in birds where the exact homology relationships aren't as clearly agreed upon.
    2. Mammalian phenotype decomposition
      • Need support for QC'ing the results of automated decomposition
      • Alignment of MP with mouse anatomy
    3. Annotation
      • Which published character matrices are there for dinosaurs?
  • Ontology mapping and alignment
    • Need to align teleost and amphibian anatomy to mouse.
    • Mouse and chicken need to be mapped to amniote anatomy. Mouse is a better start because there are genetic data.
  • Reconciling character-derived trees and character definition and use between different matrices and trees that share taxa (Paul)
    • TaxonSearch
    • Formalizing and generating grammar of character coding and character state definition
    • Analyzing character usage (shared, not shared, rejected) between trees that share taxa
    • Conflicting phylogenies for the same set of taxa often turn out to be based on character codings that are not consistent or not compatible
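
The ZFIN approach noted under "Development of anatomy" above (one anatomy ontology, with each term carrying a start and an end of the developmental period in which it appears, and no end for adult structures) could be sketched roughly as follows. This is only an illustration: the class name `StagedTerm`, the numeric stage values, and the structure names are hypothetical placeholders, not ZFIN's actual schema or stage series.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StagedTerm:
    """One anatomy term with the developmental period in which it exists."""
    name: str
    start: float                  # hypothetical stage index at which the structure appears
    end: Optional[float] = None   # None means no end date (persists into the adult)

    def present_at(self, stage: float) -> bool:
        """True if the structure exists at the given stage."""
        if stage < self.start:
            return False
        return self.end is None or stage <= self.end


def structures_at(stage: float, terms: List[StagedTerm]) -> List[str]:
    """Names of all structures present at a stage, in input order."""
    return [t.name for t in terms if t.present_at(stage)]


# Toy data; names and stage numbers are made up for illustration.
TERMS = [
    StagedTerm("transient structure", start=10, end=20),
    StagedTerm("adult structure", start=48),
]
```

Under this layout, a single ontology answers "what exists at stage X" by filtering, whereas the MGI approach keeps a complete anatomy per developmental stage.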

Wednesday morning

  • Nomenclatural history of terms in ontologies
    • Current exchange format standards don't really support this beyond term obsoletion
    • ZFIN tracks within the database the complete nomenclatural history of gene names
    • RDBOM tracks literature and author attribution for anatomy terms but not yet nomenclatural history
  • Character quality profiles for data matrices
    • Addresses the question of "Where in the skeleton are the changes occurring that drive the phylogeny"
    • Distribution of characters across the skeletal anatomy, at different levels of the hierarchy
    • Distribution of missing data across the anatomy, at different hierarchy levels
    • How many taxa have how much missing data, distributed on which anatomical parts
  • Use case: Character redundancy
    • Deriving a character profile across the anatomy could help visualize the redundancy between annotations
  • Use case: Mapping the evolutionary characters that lead to (and distinguish) a model organism. Subsequently, see whether these evolutionary changes, and the order in which they appear, correspond to the ontogeny of the organism.
    • Will need to deal with character gaps and with character redundancy for this.
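
A character profile of the kind described above could be computed along these lines. This is a minimal sketch under stated assumptions: each character is taken to be annotated with exactly one anatomical entity, the hierarchy is a simple child-to-parent `part_of` map, and missing data is coded as `"?"`. The term names, the function names, and the toy matrix are all hypothetical.

```python
from collections import Counter

# Hypothetical part_of hierarchy (child -> parent); a real anatomy ontology
# would be a graph with multiple relation types.
PART_OF = {
    "frontal bone": "cranium",
    "parietal bone": "cranium",
    "cranium": "skeleton",
    "femur": "hindlimb skeleton",
    "hindlimb skeleton": "skeleton",
}


def self_and_ancestors(term):
    """The term followed by all of its part_of ancestors."""
    chain = [term]
    while chain[-1] in PART_OF:
        chain.append(PART_OF[chain[-1]])
    return chain


def character_profile(character_entities):
    """Number of characters per anatomical term, rolled up the hierarchy."""
    profile = Counter()
    for entity in character_entities.values():
        profile.update(self_and_ancestors(entity))
    return profile


def missing_profile(matrix, character_entities, missing="?"):
    """Number of missing cells per anatomical term, rolled up the hierarchy."""
    profile = Counter()
    for states in matrix.values():
        for char, state in states.items():
            if state == missing:
                profile.update(self_and_ancestors(character_entities[char]))
    return profile


def missing_per_taxon(matrix, missing="?"):
    """Number of missing cells for each taxon."""
    return {taxon: sum(1 for s in states.values() if s == missing)
            for taxon, states in matrix.items()}


# Toy data: three characters, two taxa.
CHARACTER_ENTITIES = {"c1": "frontal bone", "c2": "parietal bone", "c3": "femur"}
MATRIX = {
    "taxon A": {"c1": "0", "c2": "?", "c3": "1"},
    "taxon B": {"c1": "?", "c2": "?", "c3": "0"},
}
```

Rolling the counts up the hierarchy is what gives the profile at different levels: the same characters are counted once at the bone level and again at the cranium and skeleton levels.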

Specific goals:

  1. Expand taxonomic coverage of Phenoscape to Vertebrata
    • Mouse is done but needs to be rearranged (MP decomposition)
    • Amniotes need to be done from the ground up

Wednesday afternoon

  • Problem of lossy transformation of legacy character data
    • EQ annotation is often at a lesser granularity (or level of detail) than the original free text description
    • This isn't due to any technological obstacles but rather due to the limited resources available for annotation.
    • Annotation effort is best focused at the level of granularity that suffices to solve a use case of interest.
    • Annotating at very granular levels is entirely possible but can take a lot of time because many of the ontology terms needed are likely not to be present yet in the ontology.
    • How does this impact phylogeny and data matrix reconciliation, or phylogenetic reconstruction on the basis of the EQ annotations?
  • Use-case for development
    • Heterochrony would be interesting if one can pull it out of the database
    • Developmental ordering relationships currently used are confined to develops_from and transforms-into.
      • This is still missing for mouse.
      • Wouldn't really allow inferring heterochrony.
      • Would, however, allow placing phenotypes into a developmental chain of transformations and comparing their placement.
  • Demonstration of Phenoscape KB
    • Some of the features we are developing (the 3 principal queries) would be very useful to MOD users
  • Informatics goals for Phenoscape 2:
    • Supporting units for measurements
    • Allowing others to compare their phenotype data to the knowledge base
    • Generic ontological queries, such as a SPARQL endpoint
    • Private data overlay
    • Linked Data and semweb integration
    • Tree visualization
    • Triple store technology evaluation
    • Species ID
    • Phylogenetically informative distance metric based on EQ assertions
    • NLP-based text processing, mass curation

Idea bin

  1. Intermine connection (multiple model organisms, AJAX-based widget for displaying protein family tree)
  2. Some phenotypes imply developmental abnormalities, differences, or variation.
    • For example, a "poorly ossified cranium" in an adult amphibian implies that the developmental process was delayed or did not complete.
  3. How does logical inference of homology propagate over develops-from relationships?
    • E.g., if A and B are asserted homologous, and C develops from A and D develops from B, are C and D then inferred as homologous, and are C and B inferred as homologous?
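
To make the last question concrete, the sketch below enumerates the pairs that a naive propagation rule would propose. It does not answer whether such a rule is biologically or logically valid, which is exactly the open question; the function name `propose_homologies` and the `cross_level` switch are hypothetical, and each structure is assumed to develop from at most one precursor.

```python
def propose_homologies(homologous, develops_from, cross_level=False):
    """Pairs a naive rule would propose from asserted homologies.

    homologous:    set of frozensets, each holding two distinct structures
    develops_from: dict mapping a structure to the structure it develops from
    cross_level:   if True, also pair each derived structure with the
                   homolog of its precursor (the "C and B" case)
    """
    # Invert develops_from: precursor -> set of derived structures.
    derived_of = {}
    for derived, precursor in develops_from.items():
        derived_of.setdefault(precursor, set()).add(derived)

    proposed = set()
    for pair in homologous:
        a, b = sorted(pair)
        # Same-level rule: anything derived from A with anything derived from B.
        for c in derived_of.get(a, ()):
            for d in derived_of.get(b, ()):
                proposed.add(frozenset((c, d)))
        if cross_level:
            for c in derived_of.get(a, ()):
                proposed.add(frozenset((c, b)))
            for d in derived_of.get(b, ()):
                proposed.add(frozenset((d, a)))
    return proposed


# The example from the notes: A ~ B, C develops from A, D develops from B.
ASSERTED = {frozenset(("A", "B"))}
DEVELOPS_FROM = {"C": "A", "D": "B"}
```

Keeping the two rules behind separate switches lets each be evaluated on its own, since accepting the C and D inference does not commit one to the C and B inference.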