Difference between revisions of "Project Plan"

From phenoscape
Much of the schedule below is based on the [[:File:Phenoscape_Project_description_refs.pdf| 1 July 2011 funded NSF grant]]. Our development schedule has been revised, as shown in the figure below, to reflect budget cuts. Relative to the original proposal, support for curation, particularly in the model organism databases, is significantly reduced. This will limit our ability to respond to problems that arise in integrating the evolutionary data, and will somewhat compromise our ability to rigorously evaluate the tools being developed. Additionally, we will scale back the development goals for the semantic similarity search engine, focusing our efforts on achieving scalability and speedup. We have also front-loaded the NLP work in order to achieve a scalable workflow as early as possible in the process.
[[Image:Phenoscape2fig.png|center|800px]]
=1. Scalable workflow=
==1.1 NLP in aid of ontology building and EQ annotation (Hong) ==
 
Curation of legacy phenotypes from the literature is a major bottleneck. The overall objective of this part of our work is to improve the efficiency with which curators can find accurate terms, add missing terms, etc.
 
  
Working group: Hong Cui, Jim, Todd*, Wasila, Judy, Hong's MS student Zilong Chang, Paula
*coordinator: Todd
 
  
=== Target milestone: First quarter: October 1, 2011  ===
Objective: Develop NLP tools to generate potential ontology terms and candidate EQs in Phenex

* Generate list of terms to be added to ontologies (Hong, done)

* Begin developing algorithms for "term to ontology" mapping

* Identify a corpus of publications and extract character descriptions (who's responsible for this?)
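For illustration only, one naive baseline for "term to ontology" mapping normalizes a candidate term and looks it up among ontology labels and synonyms, falling back to fuzzy string matching with Python's standard `difflib`. This is a sketch under stated assumptions, not the project's NLP method. The `EX:` IDs, the sample labels, and the 0.85 cutoff are all hypothetical. A term that returns no match would be a candidate for the list of terms to add to the ontologies.

```python
import difflib
from typing import Dict, List, Optional, Tuple

# Hypothetical mini-ontology: term ID -> label plus synonyms (IDs are made up).
ONTOLOGY: Dict[str, List[str]] = {
    "EX:0000001": ["radius", "radius bone"],
    "EX:0000002": ["pectoral fin", "pectoral appendage"],
}


def normalize(text: str) -> str:
    """Lowercase, treat hyphens as spaces, and collapse runs of whitespace."""
    return " ".join(text.lower().replace("-", " ").split())


def build_index(ontology: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every normalized label or synonym to its term ID."""
    return {normalize(name): term_id
            for term_id, names in ontology.items()
            for name in names}


def map_term(candidate: str, index: Dict[str, str],
             cutoff: float = 0.85) -> Optional[Tuple[str, str]]:
    """Return (term ID, matched name) for an exact or close match, else None."""
    key = normalize(candidate)
    if key in index:
        return index[key], key
    close = difflib.get_close_matches(key, list(index), n=1, cutoff=cutoff)
    if close:
        return index[close[0]], close[0]
    return None  # no match: candidate for a new ontology term


INDEX = build_index(ONTOLOGY)
print(map_term("Pectoral-fin", INDEX))   # exact match after normalization
print(map_term("pectoral fins", INDEX))  # fuzzy match
print(map_term("scapula", INDEX))        # no match
```

Exact lookup runs first so that curated synonyms always win over fuzzy guesses. The fuzzy step only catches spelling and inflection variants, not true synonyms.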
 
  
  
Action items:
*Wasila to send Hong the PDFs containing characters from the 50-character list - done
*Action item: Wasila to send Paul a curated publication - done

*Action item: Jim will send Hong a full database report with character text and EQ assignments

*Hong: Hire MS student at AZ (done)

*Todd: Hire UNC postdoc

*Jim: Enumerate pros and cons of MX (web-based) and Phenex for the first phone call, and estimate development time

*Interface between Hong’s tools and Phenex (or MX)
 
  
=== Target milestone: Second quarter: January 1, 2012  ===
  
=== Target milestone: End of Year 1: July 1, 2012  ===
* Generate entities and qualities that can be mapped to ontologies
 
* Evaluate accuracy of automated EQs on 50 character test set, refine testing set and methodology (Hong)
 
* Begin development of Phenex ‘EQ suggestion’ interface requirements/specifications (Todd)
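One option for evaluating automated EQs on the 50-character test set is to compute exact-match precision, recall, and F1 against the curator-assigned EQs. The sketch below assumes each EQ is a (character, entity, quality) triple of IDs. The triple layout and the `EX:`/`Q:` identifiers are hypothetical. A real evaluation might also give partial credit for ontologically close terms, which this sketch does not do.

```python
from typing import Set, Tuple

# Hypothetical EQ representation: (character ID, entity term ID, quality term ID)
EQ = Tuple[str, str, str]


def precision_recall_f1(predicted: Set[EQ], gold: Set[EQ]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 of automated EQs against curator EQs."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Made-up example: 4 curator EQs, 3 automated EQs of which 2 are correct.
GOLD = {
    ("char1", "EX:radius", "Q:absent"),
    ("char2", "EX:ulna", "Q:elongated"),
    ("char3", "EX:humerus", "Q:curved"),
    ("char4", "EX:fin_ray", "Q:branched"),
}
PREDICTED = {
    ("char1", "EX:radius", "Q:absent"),
    ("char2", "EX:ulna", "Q:elongated"),
    ("char3", "EX:humerus", "Q:straight"),
}

print(precision_recall_f1(PREDICTED, GOLD))  # precision 2/3, recall 1/2
```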
 
  
=== Target milestone: Year 2: July 1, 2013  ===
=== Target milestone: Year 3: July 1, 2014  ===
=== Target milestone: Year 4: July 1, 2015  ===
 
  
==1.2 Term broker, in collaboration with NCBO (Jim) ==
The overall goal here is to obtain temporary IDs for anatomy ontology terms, communicate them to Phenex, and automatically replace them with permanent terms when these become available.
  
Working group: Jim*, Natasha, Hilmar, Judy, Wasila
* coordinator: Jim
* Judy, Wasila, and Hilmar to form a committee to decide on next steps
 
* We envisioned curators working in WebProtege within the NCBO environment, where temporary IDs are captured; once a term is vetted, could it be replaced within the ontology automatically?
 
* Review specification (done at June meeting; BioPortal API: bioontology.org/wiki/index.php/BioPortal_Provisional_terms)

* Find out how privileges are handled (not stated in the documentation).

* Judy will look into TermGenie in relation to this project
 
* Evaluate move from OBO to Protege: (1) necessary for NCBO coordination with term broker, (2) not necessary for import/export of vertebrate subontologies (OBO equivalent)
 
* Address the disconnect at the start of the community vetting process, and integrate that process into term provenance. We had proposed to use the NCBO provenance mechanism.
 
* Wasila to document current term change process and requirements for term broker
 
* '''Update:''' NCBO BioPortal has implemented a provisional term service that can be used as the term broker. Natasha Noy of Stanford has worked with Hilmar on our requirements for this. --[[User:Balhoff|Balhoff]] 10:17, 13 October 2011 (EDT)
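The provisional-term idea above can be sketched as a small in-memory registry: capture a temporary ID, then swap in the permanent one after vetting. This does not model the BioPortal provisional term API. The `TermBroker` class, the `PROVISIONAL:` prefix, and the example IDs are hypothetical. They only show how annotations made with temporary IDs could be updated automatically.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class TermBroker:
    """Hypothetical in-memory stand-in for a provisional term service."""
    labels: Dict[str, str] = field(default_factory=dict)     # provisional ID -> label
    permanent: Dict[str, str] = field(default_factory=dict)  # provisional ID -> permanent ID
    _ids: Iterator[int] = field(default_factory=lambda: itertools.count(1))

    def request_term(self, label: str) -> str:
        """Issue a temporary ID so curation can continue before the term is vetted."""
        prov_id = f"PROVISIONAL:{next(self._ids):07d}"
        self.labels[prov_id] = label
        return prov_id

    def promote(self, prov_id: str, permanent_id: str) -> None:
        """Record the permanent ID assigned once the term is accepted into an ontology."""
        if prov_id not in self.labels:
            raise KeyError(f"unknown provisional ID: {prov_id}")
        self.permanent[prov_id] = permanent_id

    def resolve(self, term_id: str) -> str:
        """Return the permanent ID if one has been assigned, else the ID unchanged."""
        return self.permanent.get(term_id, term_id)


def update_annotations(annotations: List[Dict[str, str]],
                       broker: TermBroker) -> List[Dict[str, str]]:
    """Rewrite entity IDs in annotations, replacing vetted provisional IDs."""
    return [{**ann, "entity": broker.resolve(ann["entity"])} for ann in annotations]


broker = TermBroker()
temp_id = broker.request_term("hypothetical fin element")
annotations = [{"character": "char1", "entity": temp_id, "quality": "Q:absent"}]
broker.promote(temp_id, "EX:0009999")  # made-up permanent ID
print(update_annotations(annotations, broker))
```

Annotations keep pointing at the temporary ID until `resolve` finds a permanent mapping. Curation therefore never blocks on term vetting.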
 
  
=== Target milestone: First quarter: October 1, 2011  ===
Priorities and milestones (3 mos, 6 mos, 1 yr)
 
* Decide on technologies for longer term
 
* Decide if we can use a short-term patch (e.g. in Brix)
 
* Get requirements to NCBO for both term request and provenance
 
* Dependency on closing curation cycle and tying ourselves in to it
 
* Suitability of WebProtege: insufficient resources are available to build plug-ins
 
  
=== Target milestone: Second quarter: January 1, 2012 ===
=== Target milestone: End of Year 1: July 1, 2012  ===
=== Target milestone: Year 2: July 1, 2013  ===

=== Target milestone: Year 3: July 1, 2014  ===

=== Target milestone: Year 4: July 1, 2015  ===
 
  
=2. Ontology development and coordination=
==2.1 Anatomy ontologies (Wasila)==
The objective is to coordinate the development and alignment of multispecies and single species ontologies for vertebrates.  The lead curators of the zebrafish, Xenopus, and mouse anatomy ontologies, the teleost and amphibian multi-species ontologies, and the proposed amniote anatomy ontology will meet regularly to review terms from the musculoskeletal branch, and update and synchronize ontologies accordingly. The focus of ontology development in year 1 is the limb/fin skeletal branch. Specific ontology development plans are documented [[Anatomy_Ontology_Development_Plan | here.]]
 
  
Working group: Wasila*, Nizar, Lauren, Paul, David, Terry, Yvonne, Ceri, Christina, VG, Paula, Chris
*Coordinator
  
=== Target milestone: First quarter: October 1, 2011  ===
Action items:
  
* Set up weekly anatomy ontology conference calls and mailing lists -- ''done''
* Development of synchronization plug-in in 3-6 mo window -- ''ongoing''
* Synchronization:
 
** Communication with ZFIN and Xenbase about moving to a GO-like model, probably a long-term goal ''-- done, August 2011''
 
** incorporate MIREOT into ontologies - ''done: modified import/MIREOT strategy used in new TAO; will be used in AMAO''
 
* XAO update (led by Christina, VG, Erik) -- ''done, August 2011:''

**Complete definitions; add CARO terms to make the is_a hierarchy complete; review overall structure (review structure from 2008)
 
**Terms from AAO and VAO integrated into XAO (including xrefs); looked to ZFA and UBERON for additional terms needed
 
* Training in ontology development and curation
 
** ''-- done: Paul and Lauren visited NESCent in October 2011''
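MIREOT (Minimum Information to Reference an External Ontology Term) brings in only the external terms an ontology actually needs, rather than importing the whole source ontology. Many MIREOT-style workflows also pull in the is_a ancestors of each needed term so that it can be placed in the hierarchy. The sketch below shows only that closure step. It assumes the source ontology is available as a child-to-parents map and uses made-up `EX:` IDs. It is not the modified import/MIREOT strategy actually used in the TAO.

```python
from typing import Dict, List, Set


def mireot_closure(seeds: Set[str], parents: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return the is_a edges for the seed terms and all of their ancestors."""
    keep: Set[str] = set()
    stack = list(seeds)
    while stack:  # depth-first walk up the is_a graph
        term = stack.pop()
        if term in keep:
            continue
        keep.add(term)
        stack.extend(parents.get(term, []))
    # Keep only edges whose endpoints were both retained.
    return {t: [p for p in parents.get(t, []) if p in keep] for t in sorted(keep)}


# Hypothetical source ontology fragment (child -> is_a parents).
SOURCE_PARENTS = {
    "EX:fin_ray": ["EX:dermal_bone"],
    "EX:scale": ["EX:dermal_bone"],
    "EX:dermal_bone": ["EX:bone"],
    "EX:bone": [],
    "EX:muscle": [],
}

print(mireot_closure({"EX:fin_ray"}, SOURCE_PARENTS))
```

Unrelated source terms such as `EX:scale` and `EX:muscle` stay out of the import. This is the point of the approach.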
 
  
=== Target milestone: Second quarter: January 1, 2012  ===
Action items:
* VAO update: cloning of fin terms from TAO and cross-referencing or importing non-skeletal Uberon terms -- ''in progress''
 
* Review cell and process terms in VAO, CL, and GO; submit term proposals to CL and GO - ''ongoing with Alex Diehl and Melissa Haendel''
 
* Training in ontology development and annotation planned for Nizar at NESCent in November, 2011
 
* Review of limb and cranial skeletal terms by Nizar and Paul
 
* initiate Amniote Anatomy Ontology (AMAO) -- ''draft OBO file created''
 
  
=== Target milestone: End of Year 1: July 1, 2012  ===
* submit AMAO to the OBO Foundry
 
 
 
=== Target milestone: Year 2: July 1, 2013  ===
 
* Annotation for AAO to begin in Year 2 (after cloning XAO)
 
=== Target milestone: Year 3: July 1, 2014  ===
 
=== Target milestone: Year 4: July 1, 2015  ===
 
 
 
==2.2 Vertebrate Taxonomy Ontology (Peter)==
 
The objective is to develop a taxonomic ontology that includes all fossil and extant vertebrate taxa from community-vetted sources (PBDB, Paul's database, AmphibiaWeb).  This ontology is required for curation of phenotypes and querying.
 
 
 
Working group: Peter*, Paul, David, Nizar, Wasila
 
*Coordinator
 
 
 
=== Target milestone: First quarter: October 1, 2011  ===
 
Action Items
 
* Paul will generate taxonomy and bring to NESCent in August  -- ''In progress week of October 2011''
 
* David will update the ATO based on his in-progress revision of AmphibiaWeb (with outreach to ???)
 
* initiate Vertebrate Taxonomy Ontology (VTO) -- ''Initial version generated, refinements in progress week of October 2011''
 
* Obtain names for extinct taxa from PaleoDB -- ''Done''
 
** Add PaleoDB terms to VTO -- ''In progress''
 
 
 
=== Target milestone: Second quarter: January 1, 2012  ===
 
*Tool for small batch updates from PaleoDB
 
**update VTO from list of taxa previously added
 
 
 
=== Target milestone: End of Year 1: July 1, 2012  ===
 
*Support generation of OWL (individual-based) taxonomies (Peter?, Jim?)

*Finalize workflow for bulk updates from CoF (via TTO), ATO, and PaleoDB
 
*Guide for updating taxonomy ontologies with curation driven additions
 
 
 
==2.3 Sync Tool (Jim)==
 
Working group: Jim*, Wasila, Chris, Ceri, Yvonne
 
=== Target milestone: Second quarter: January 1, 2012  ===
 
* Sync Tool 0.5
 
** management of terms excluded from syncing
 
 
 
=3. Phenotype annotation=
 
==3.1 Evolutionary phenotypes (Paula)==
 
The objective is to transform the characters and character states from published phylogenetic studies into ontology-based descriptions ('Evolutionary phenotypes'), with a focus on fin and limb morphology.  This will require the development of a list of papers to be curated, re-evaluation of software curation tool, training of personnel in use of curation software and ontology development, and development of appropriate ontologies.
 
 
 
Working group: Paula*, David, Paul, Wasila, Jim, Nizar
 
*Coordinator
 
 
 
=== Target milestone: First quarter: October 1, 2011  ===
 
Objective: Develop a prioritized list of phylogenetic papers containing vertebrate fin/limb data for curation; evaluate curation tool; prepare KB for vertebrate data
 
 
 
Action items:
 
* Develop a list of priority papers (pdfs) to be curated. - ''done except archosaurs''
 
* Training (ontology editor; annotation tool) for Paul & Nizar at NESCent - ''Oct 10-13, Paul and Lauren at NESCent''
 
* Paul and Nizar will document in lists any additional terms and definitions before meeting; prepare to add them to AmAO at NESCent. - ''ongoing''
 
 
 
=== Target milestone: Second quarter: January 1, 2012  ===
 
* Annotation for AmAO to begin after AmAO is developed (August 2011)
 
 
 
=== Target milestone: End of Year 1: July 1, 2012  ===
 
 
 
=== Target milestone: Year 2: July 1, 2013  ===
 
* Annotation for AAO to begin in Year 2 (after cloning XAO)
 
=== Target milestone: Year 3: July 1, 2014  ===
 
=== Target milestone: Year 4: July 1, 2015  ===
 
 
 
==3.2 Model organism phenotypes (Monte) ==
 
Objective: To annotate the skeletal phenotypes for fin and limb for genetic mutants of zebrafish, Xenopus, and mouse. The model organism (MOD) curators will initially prioritize comprehensive annotation of skeletal phenotypes for the fin and limb, and subsequently of skeletal phenotypes in general.
 
 
 
Working group: ZFIN (Monte*, Ceri, Yvonne), Xenbase (Aaron, Christina), MGI (Judy, Terry)
 
* coordinator
 
 
 
=== Target milestone: First quarter: October 1, 2011  ===
 
* Curation of expression and phenotypes
 
* Investigate additional funding through NSF (Todd) and NIH (Monte, Judy)
 
* Determine whether the current MP->EQ mapping (i.e., the mapping that had been done with George) is sufficient. In particular, is the limb and limb girdle mapping complete?
 
 
 
Action items:
 
* Judy will check with Martin to see if the limb mapping has been done

* Determine who is responsible for completing the mapping and keeping it up to date (Cindy?), and how to coordinate with PATO
 
* Are there outstanding problems with developmental phenotypes for mouse, i.e. how to incorporate the abstract mouse?
 
* Determine timeline for:
 
** Developing pipelines for uploading Xenbase and MGI phenotype data to Phenoscape
 
** Incorporating expression data into Phenoscape KB
 
 
 
=== Target milestone: Second quarter: January 1, 2012  ===
 
=== Target milestone: End of Year 1: July 1, 2012  ===
 
=== Target milestone: Year 2: July 1, 2013  ===
 
* In year 2, Xenbase will begin curating phenotypes
 
 
 
=== Target milestone: Year 3: July 1, 2014  ===
 
=== Target milestone: Year 4: July 1, 2015  ===
 
 
 
=4. Homology (Hilmar)=
 
Objective: The legacy homology assertions for the fin-limb skeleton, including assertions of both phylogenetic and iterative (serial) homology [50], and the genes involved in growth and patterning of the limb at various stages, e.g., Bmps, Fgfs, Gdf5, Sox9 are also well known, e.g., [20
 
 
 
Working group:  Hilmar*, Chris, Paula, David, Paul, Nizar
 
*coordinator
 
* Use cases and requirements for querying and reasoning; decisions on reasoning are important because the system architecture depends on them.
 
* Collecting, curating, and annotating homology assertions
 
* Logical model for homology in the KB, including for serial homology
 
* Exposing homologies on UI
 
* Integration of homology into reasoning
 
* Exposing homology inferences (for display and for querying/filtering) through UI
 
* Handling ‘default homology’: the same term used in different taxa is considered homologous when there is no evidence to the contrary.
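Read literally, the "default homology" rule above treats structures annotated with the same anatomy term in different taxa as homologous, unless there is explicit evidence to the contrary. The sketch below implements only that rule. The data structures, taxon names, and `EX:` term IDs are hypothetical. The actual logical model for homology in the KB (including serial homology) was still to be worked out.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Structure = Tuple[str, str]  # (taxon, anatomy term ID)


def default_homologies(annotations: Dict[str, Set[str]],
                       non_homologous: Set[FrozenSet[Structure]]
                       ) -> List[Tuple[Structure, Structure]]:
    """Pair same-term structures across taxa unless explicitly asserted non-homologous."""
    pairs = []
    for taxon_a, taxon_b in combinations(sorted(annotations), 2):
        for term in sorted(annotations[taxon_a] & annotations[taxon_b]):
            pair = ((taxon_a, term), (taxon_b, term))
            if frozenset(pair) not in non_homologous:  # evidence to the contrary wins
                pairs.append(pair)
    return pairs


# Hypothetical data: which anatomy terms are used in each taxon's annotations.
ANNOTATIONS = {
    "taxon_A": {"EX:humerus", "EX:radius"},
    "taxon_B": {"EX:humerus", "EX:radius"},
    "taxon_C": {"EX:radius"},
}
# One explicit assertion that the "radius" of taxon_A and taxon_C are not homologous.
NON_HOMOLOGOUS = {frozenset({("taxon_A", "EX:radius"), ("taxon_C", "EX:radius")})}

for a, b in default_homologies(ANNOTATIONS, NON_HOMOLOGOUS):
    print(a, "~", b)
```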
 
 
 
=== Target milestone: First quarter: October 1, 2011  ===
 
Action items
 
* Collect use cases to develop and test reasoning - on wiki (look for examples of structures that are symmetric in some species, and where serial structures are separate vs. fused in some taxa).
 
* Make sure taxa in Vertebrate Homologies document are in VTO (Peter)
 
* Work out initial reasoning for serial homology
 
* Establish matrix of fin/limb homologies, evidence codes -- Matrix established (google doc: https://docs.google.com/spreadsheet/ccc?key=0Apgi__7Z2km5dHQ2eTRWeEpCOWwtS2VIZC1tM2lPbUE&hl=en_US#gid=0)
 
 
 
=== Target milestone: Second quarter: January 1, 2012  ===
 
* Ensure testing can be accomplished within Protege, 3-6 months (Hilmar, Chris)
 
* Finalize small test matrix of fin/limb homologies, evidence codes
 
 
 
=== Target milestone: End of Year 1: July 1, 2012  ===
 
* Plan bake-off of homology models after testing is complete? (Practicality not clear)
 
* Define tests for knowledgebase based on OWL-DL test suite (Jim)
 
* Establish use cases for homology reasoning, especially iterative homology
 
* Plan homology jamboree for experts on the fin/limb in year 2
 
 
 
=== Target milestone: Year 2: July 1, 2013  ===
 
* Ensure that early data annotation has coverage over test homologies
 
* Convene homology jamboree for experts
 
 
 
=== Target milestone: Year 3: July 1, 2014  ===
 
=== Target milestone: Year 4: July 1, 2015  ===
 
 
 
Communication and coordination issues
 
* Plan on homology theme at RCN no earlier than Yr3
 
* Touch base with Parkinson re: VBO plans (Monte)
 
 
 
=5. Semantic similarity search engine (aka Phenoblast) and OBD/OWL (Todd)=
 
Objective: Provide the ability for users to take a phenotype collection (of terms) and look across all collections for those that semantically match the terms. Like BLAST, highest ‘hits’ would be ranked first, and user could drill down.
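One kind of metric the literature review would cover works as follows. Each phenotype collection is expanded to its terms plus all of their is_a ancestors. Pairs of collections are then scored with Jaccard similarity, so that related but non-identical terms still earn partial credit. Finally, the hits are ranked BLAST-style from best to worst. The sketch below shows this one approach under stated assumptions. The toy hierarchy, collection names, and `EX:` IDs are hypothetical, and this is not the Phenoblast implementation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical is_a hierarchy of phenotype terms (child -> parents).
PARENTS: Dict[str, List[str]] = {
    "EX:radius_absent": ["EX:radius_phenotype"],
    "EX:radius_phenotype": ["EX:limb_phenotype"],
    "EX:ulna_absent": ["EX:ulna_phenotype"],
    "EX:ulna_phenotype": ["EX:limb_phenotype"],
    "EX:limb_phenotype": ["EX:phenotype"],
    "EX:fin_ray_absent": ["EX:fin_phenotype"],
    "EX:fin_phenotype": ["EX:phenotype"],
    "EX:phenotype": [],
}


def closure(terms: Set[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Return the terms together with all of their is_a ancestors."""
    seen: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(parents.get(term, []))
    return seen


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_collections(query: Set[str], collections: Dict[str, Set[str]],
                     parents: Dict[str, List[str]]) -> List[Tuple[str, float]]:
    """Score every collection against the query and sort best hits first."""
    q = closure(query, parents)
    scores = [(name, jaccard(q, closure(terms, parents)))
              for name, terms in collections.items()]
    return sorted(scores, key=lambda hit: (-hit[1], hit[0]))


COLLECTIONS = {
    "mutant_1": {"EX:radius_absent"},
    "mutant_2": {"EX:ulna_absent"},
    "taxon_1": {"EX:fin_ray_absent"},
}

for name, score in rank_collections({"EX:radius_absent"}, COLLECTIONS, PARENTS):
    print(f"{name}\t{score:.3f}")
```

This loop compares the query against every collection in turn. At the scale of billions of phenotypes, that all-against-all comparison is exactly the scalability problem the engine has to solve.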
 
 
 
Working group: Todd*, Hilmar, Jim, Chris, Judy, Paula, Peter
 
*coordinator
 
 
 
Action items:
 
=== Target milestone: First quarter: October 1, 2011  ===
 
* literature review of existing semantic similarity metrics and algorithms
 
* compilation of test sets for benchmarking and evaluation
 
=== Target milestone: Second quarter: January 1, 2012  ===
 
* installation of existing tools (standalone, R libraries, etc.)
 
* elaboration of alternative graph-based metrics
 
* R implementation of graph-based metrics
 
=== Target milestone: End of Year 1:July 1, 2012  ===
 
* initial scalability/performance testing of tools
 
* begin the work required on algorithm optimization
 
=== Target milestone: Year 2: July 1, 2013  ===
 
* software development, testing, documentation
 
=== Target milestone: Year 3: July 1, 2014  ===
 
* integration of software into Phenoscape KB
 
=== Target milestone: Year 4: July 1, 2015  ===
 
* use in capstone
 
 
 
=6. Knowledgebase reasoning and development (Hilmar)=
 
 
 
Objective:
 
 
 
Working group: Hilmar*, Todd, Jim, Chris [+Wasila and others for UI development]
 
*Coordinator
 
 
 
Action items:
 
 
 
* Jim will develop a KB instance for vertebrates in parallel (a new beta) this summer
 
 
 
=== Target milestone: First quarter: October 1, 2011  ===
 
* Public release of KB (Oct. 31, 2011)
 
 
 
=== Target milestone: Second quarter: January 1, 2012  ===
 
* Knowledgebase Encyclopedia of Life integration release
 
* Phenoscape 1 KB user testing of new functionality - conducted at NESCent
 
* Phenoscape OWL datamodel for phenotype annotations
 
** formalize “Phenoscape” ontology serving as datamodel framework
 
 
 
===Target milestone: Third quarter: April 1, 2012===
 
* Reasoning framework proof of concept
 
** OWL scalability
 
** absence-over-develops-from
 
 
 
=== Target milestone: End of Year 1: July 1, 2012  ===
 
* Reasoning framework homology integration
 
 
 
=== Target milestone: Year 2: July 1, 2013  ===
 
* Replace KB website backend with the newly developed reasoning and storage model.
 
 
 
=== Target milestone: Year 3: July 1, 2014  ===
 
=== Target milestone: Year 4: July 1, 2015  ===
 
 
 
=7. Capstone=
 
As a capstone, in years 3 and 4 of the project, we will validate the capabilities of the above suite of tools by testing how well known developmental pathways for the well-studied fin/limb skeletal transition in vertebrate evolution are identified, and how well the system scales to a datastore containing billions of phenotypes.
 
 
 
=NSF Phenoscape project abstracts=
 
 
DBI 1062404 and 1062542: Collaborative research: ABI Development: Ontology-enabled reasoning across phenotypes from evolution and model organisms
 
  

Latest revision as of 17:11, 28 August 2012

Goals in progress, coming up, and scheduled

We are keeping our high-level goals, milestones, and deliverables in a public Trello Board. (Click this link to get the board in its own window.)

<embedurl>https://trello.com/b/xYkl0qmC</embedurl>

Aims and working groups

1. Scalable workflow 
1.1 NLP in aid of ontology building and EQ annotation 
Curation of legacy phenotypes from the literature is a major bottleneck. The overall objective of this part of our work is to improve the efficiency with which curators can find accurate terms, add missing terms, etc.
  • Participants: Hong Cui (lead), Jim, Todd, Wasila, Judy, Hong's MS student Zilong Chang, Paula
1.2 Term broker, in collaboration with NCBO 
The overall goal here is to obtain temporary IDs for anatomy ontology terms, communicate them to Phenex, and automatically replace them with permanent terms when these become available.
  • Participants: Jim (lead), Natasha, Hilmar, Judy, Wasila
2. Ontology development and coordination
2.1 Anatomy ontologies 
The objective is to coordinate the development and alignment of multispecies and single species ontologies for vertebrates. The lead curators of the zebrafish, Xenopus, and mouse anatomy ontologies, the teleost and amphibian multi-species ontologies, and the proposed amniote anatomy ontology will meet regularly to review terms from the skeletal branch, and update and synchronize ontologies accordingly. The focus of ontology development in year 1 is the limb/fin skeletal branch. Specific ontology development plans are on the Ontologies page, and the ontology development workflow is on the Ontology workflow page.
  • Participants: Wasila (lead), Nizar, Lauren, Paul, David, Terry, Yvonne, Ceri, Christina, VG, Paula, Chris
2.2 Vertebrate Taxonomy Ontology 
The objective is to develop a taxonomic ontology that includes all fossil and extant vertebrate taxa from community-vetted sources (PBDB, Paul's database, AmphibiaWeb). This ontology is required for curation of phenotypes and querying.
  • Participants: Peter (lead), Paul, David, Nizar, Wasila
2.3 Sync Tool 
  • Participants: Jim (lead), Wasila, Chris, Ceri, Yvonne
3. Phenotype annotation 
3.1 Evolutionary phenotypes 
The objective is to transform the characters and character states from published phylogenetic studies into ontology-based descriptions ('Evolutionary phenotypes'), with a focus on fin and limb morphology. This will require the development of a list of papers to be curated, re-evaluation of the curation software tool, training of personnel in use of curation software and ontology development, and development of appropriate ontologies. Annotation workflow is described on the Curation workflow page.
  • Participants: Paula (lead), David, Paul, Wasila, Jim, Nizar
3.2 Model organism phenotypes (Monte) 
The objective is to annotate the skeletal phenotypes of the fin and limb for genetic mutants of zebrafish, Xenopus, and mouse. The model organism (MOD) curators will initially prioritize comprehensive annotation of skeletal phenotypes for the fin and limb, and subsequently of skeletal phenotypes in general.
  • Participants: Monte (lead), ZFIN (Monte, Ceri, Yvonne), Xenbase (Aaron, Christina), MGI (Judy, Terry)
4. Homology 
The legacy homology assertions for the fin-limb skeleton, including assertions of both phylogenetic and iterative (serial) homology, and the genes involved in growth and patterning of the limb at various stages (e.g., Bmps, Fgfs, Gdf5, Sox9) are well known.
  • Participants: Hilmar (lead), Chris, Paula, David, Paul, Nizar
5. Semantic similarity search engine (aka Phenoblast) 
Provide the ability for users to take a phenotype collection (of terms) and look across all collections for those that semantically match the terms. Like BLAST, highest ‘hits’ would be ranked first, and user could drill down.
  • Participants: Todd (lead), Hilmar, Jim, Chris, Judy, Paula, Peter
6. Knowledgebase reasoning and development 
  • Participants: Hilmar (lead), Todd, Jim, Chris, Paula, Wasila and others for UI development
7. Capstone 
As a capstone, in years 3 and 4 of the project, we will validate the capabilities of the above suite of tools by testing how well known developmental pathways for the well-studied fin/limb skeletal transition in vertebrate evolution are identified, and how well the system scales to a datastore containing billions of phenotypes.
  • Participants: Todd (lead), everyone in project

NSF Phenoscape project abstracts

DBI 1062404 and 1062542: Collaborative research: ABI Development: Ontology-enabled reasoning across phenotypes from evolution and model organisms

1. Technical description of the project.

An award is made to the University of South Dakota and the University of North Carolina to develop ontology-driven tools for machine reasoning over large volumes of phenotype data. A fast semantic similarity engine will be developed to allow searches for evolutionary transitions and mutant genes characterized by similar phenotypic profiles. An ontological framework for reasoning over homology will be developed to allow rigorous reasoning across evolutionarily diverse lineages. Natural language processing tools will be developed to improve the efficiency of mining phenotype data from the literature and to improve data consistency. This suite of tools will be tested on a large number of skeletal phenotypes from diverse fossil and modern vertebrates. Taxonomic and anatomical ontologies for vertebrates will be augmented and hypotheses of anatomical homology formally encoded. The ontologies and software tools, together with phenotypes extracted from the vertebrate systematic literature, will be integrated in the knowledgebase with genetic and phenotype data from three vertebrate model organisms: zebrafish (Danio rerio), African clawed frog (Xenopus laevis), and mouse (Mus musculus). The knowledgebase will be exposed to generic reasoners using semantic web standards. The system will be validated by its success in retrieving candidate genes for the well-studied vertebrate fin-limb transition and other major events in skeletal evolution.

2. Non-technical explanation of the project's broader significance and importance.

Human-readable descriptions of “phenotypic” properties such as anatomy and behavior are not well-suited to computational analysis. Yet, in evolutionary biology, genetics and development, computational assistance is necessary to discover patterns within the enormous volumes of descriptive phenotype data that are being reported in the literature and in online databases. Ontologies are structured, controlled vocabularies that can be applied to collections of descriptive data to permit logical reasoning to be used. Using the evolutionary transition from fins to limbs as a test system, this project will develop ontologically-aware software that allows users to discover similar sets of phenotypes for different taxa or mutant genes within large and diverse datasets. The evolutionary breadth of the test data requires the development of a rigorous framework for reasoning over hypotheses of homology. Another goal is to develop and evaluate natural language processing tools for efficiently capturing ontological descriptions of phenotype from the descriptions available in the published literature. Phenotype data from the systematic literature for both extinct and extant vertebrates will be combined with mutant phenotype data from three vertebrate genetic models: zebrafish (Danio rerio), frog (Xenopus laevis), and mouse (Mus musculus). The suite of tools will be validated by recovering developmental genetic pathways that underlie the evolutionary transition from fin to limb in vertebrates, and refined by iterative testing with domain bioinformaticians on the project and biologists from the broader user community.

3. Indicate how your project addresses criteria specific to Development

A broad community of users will participate through the lifecycle of this project in the development of community standards and resources for the interoperability and computability of phenotypic knowledge. This will be achieved through workshops, usability testing sessions, and coordination with key research networks. Stakeholder ownership will be enhanced by rapid and open release of a variety of products that we anticipate to be of immediate and enduring value to the greater biology community, including tools for streamlining data curation and performing large-scale semantic similarity searches, high quality vertebrate taxonomy and anatomy ontologies, and standards for reasoning over homology. We will provide a unique training environment for students, postdocs and summer interns, including Native Americans through outreach at the University of South Dakota and minority and female students through a collaboration with Project Exploration at the University of Chicago. Project progress and outcomes will be disseminated through both traditional and online outlets for scholarly communication (including blog posts and mailing lists); the primary web presence will be at https://www.phenoscape.org/wiki/.