https://wiki.phenoscape.org/wg/phenoscape/api.php?action=feedcontributions&user=Crk18&feedformat=atomphenoscape - User contributions [en]2024-03-28T18:55:56ZUser contributionsMediaWiki 1.31.10https://wiki.phenoscape.org/wg/phenoscape/index.php?title=Informatics&diff=7170Informatics2010-03-03T18:53:07Z<p>Crk18: /* Source code */</p>
<hr />
<div>This page provides a broad overview of our informatics activities. Phenoscape supports open development processes and collaboration. All source code we create is available from open source repositories such as Sourceforge, and we work with existing open-source projects whenever possible. Development plans can be found at our [[software roadmap]].<br />
<br />
One of the chief objectives of the Phenoscape project is to provide a centralized repository for annotations entered by ichthyologists. These annotations can be queried from a user interface and used to answer [[Driving Research Questions]] and to support advanced [[Phenoscape use cases]]. The following tools, which are at various stages of development, will serve to realize this objective.<br />
<br />
==Phenoscape software components==<br />
<br />
===Phenex curation tool===<br />
[[Phenex]] is a platform-independent desktop application for annotating character-by-taxon matrices with ontology terms. Phenex allows a user to describe phenotypic variation among taxa (or specimens) using Entity-Quality syntax and write the data to a NeXML file. Phenex is written in Java, based on code from the [http://oboedit.org/ OBO-Edit] and [http://www.phenote.org/ Phenote] projects, and is being released under an open-source license. It can be configured to load user-selected ontologies, and in this way can be adapted to data curation in different taxonomic groups.<br />
<br />
===Phenoscape data repository===<br />
We are adopting [http://www.bioontology.org/wiki/index.php/OBD:Main_Page OBD] as the ontology-driven datastore for our phenotype annotations. We are collaborating with the [http://www.berkeleybop.org/ Berkeley Bioinformatics Open-source Projects] group in driving future development of OBD. The ontological definitions of all the terms used in the annotation stage (in Phenex) and the annotations themselves are stored in the [[Phenoscape data repository]]. The repository is loaded in three stages. First, the term definitions in all the ontologies are downloaded and stored in the database. Next, the annotations of phenotypes are downloaded from [http://zfin.org ZFIN] and [http://phenoscape.svn.sourceforge.net Phenoscape], post-composed, and stored in the database. An outline of these two stages is given in the [[Phenoscape data loader]] section. Finally, the [[OBD Reasoner]] is used to extract inferences from the annotations and definitions in the OBD Phenoscape database. These inferences are added to the database as well.<br />
<br />
===Data services built on OBD===<br />
We are also developing a suite of web services on top of OBD to serve as a [http://apidocs.phenoscape.org data access API] and foundation for our user-oriented Phenoscape web application. These web services make use of mostly standard SQL [[Queries]] and present a [http://en.wikipedia.org/wiki/Representational_State_Transfer RESTful] service interface using [http://www.restlet.org/ Restlet]. The specifications of these services are detailed in [[Data Services]].<br />
<br />
===Phenoscape web UI===<br />
The [[Phenoscape web UI|Phenoscape web application]] will allow scientists to browse and query the phenotype annotations as well as the supporting ontologies. Initially, the query capabilities will concentrate on implementing a select set of "use-cases", research questions that show the utility of the approach. Ultimately, we will build interfaces that allow researchers to ask open-ended questions of the data. The web application is being developed using [http://www.rubyonrails.org/ Ruby on Rails] and accesses phenotype data and ontology information via our OBD web services.<br />
<br />
===Synchronization Tool===<br />
The [[Synchronization Tool]] is a plug-in for [http://oboedit.org/ OBO-Edit] which aids in keeping the Teleost Anatomy Ontology and the Zebrafish Anatomy Ontology consistent with each other.<br />
<br />
== Source code ==<br />
<br />
The software source code is in part being contributed back directly to a variety of existing projects we build upon, and deposited in the [http://obo.svn.sourceforge.net/viewvc/obo/ OBO] and [http://phenoscape.svn.sourceforge.net/viewvc/phenoscape Phenoscape] Subversion repositories on SourceForge.<br />
* Phenex:<br />
** [http://obo.svn.sourceforge.net/viewvc/obo/phenex/trunk/ Browse source]<br />
** Code checkout: <code>svn co https://obo.svn.sourceforge.net/svnroot/obo/phenex/trunk</code><br />
* Database and database loading:<br />
** [http://obo.svn.sourceforge.net/viewvc/obo/OBDAPI/trunk/scripts Browse source]<br />
** Code checkout: <code>svn co https://obo.svn.sourceforge.net/svnroot/obo/OBDAPI/trunk/scripts</code><br />
** Unit tests<br />
*** [http://phenoscape.svn.sourceforge.net/viewvc/phenoscape/trunk/src/PhenoscapeDbTests/test/org/phenoscape/db/test Browse source]<br />
*** Code checkout: <code>svn co https://phenoscape.svn.sourceforge.net/svnroot/phenoscape/trunk/src/PhenoscapeDbTests/test</code><br />
* Data services and middleware:<br />
** Data model and database access API (OBD-API):<br />
*** [http://obo.svn.sourceforge.net/viewvc/obo/OBDAPI/trunk Browse source]<br />
*** Code checkout: <code>svn co https://obo.svn.sourceforge.net/svnroot/obo/OBDAPI/trunk</code><br />
** Web-services (OBD-WS):<br />
*** [http://obo.svn.sourceforge.net/viewvc/obo/OBD-WS/trunk/ Browse source]<br />
*** Code checkout: <code>svn co https://obo.svn.sourceforge.net/svnroot/obo/OBD-WS/trunk</code><br />
* Web-based user interface ([http://kb.phenoscape.org Knowledge Base]):<br />
** [http://phenoscape.svn.sourceforge.net/viewvc/phenoscape/trunk/src/PhenoscapeWeb/ Browse source]<br />
** Code checkout: <code>svn co https://phenoscape.svn.sourceforge.net/svnroot/phenoscape/trunk/src/PhenoscapeWeb</code><br />
* Ontology conversion and management:<br />
** TTOUpdate<br />
*** [http://obo.svn.sourceforge.net/viewvc/phenoscape/trunk/tools/TTOUpdate Browse source]<br />
*** Code checkout: <code>svn co https://obo.svn.sourceforge.net/svnroot/phenoscape/trunk/tools/TTOUpdate</code><br />
** Synchronization Tool<br />
*** [http://obo.svn.sourceforge.net/viewvc/phenoscape/trunk/src/SynchronizationTool/ Browse source]<br />
*** Code checkout: <code>svn co https://phenoscape.svn.sourceforge.net/svnroot/phenoscape/trunk/src/SynchronizationTool</code><br />
* Other scripts and small tools:<br />
<br />
==Affiliated projects==<br />
<br />
===OBO-Edit===<br />
We are using the [http://oboedit.org/ OBO-Edit] ontology editor to develop and maintain our [[ontologies]] such as the Teleost Anatomy Ontology and the Teleost Taxonomy Ontology.<br />
<br />
===NeXML===<br />
Phenex saves character matrix data using the new evolutionary data standard [http://www.nexml.org/ NeXML]. NeXML is an XML Schema and has robust facilities for embedding additional data, such as our phenotype annotations, within a traditional character-by-taxon matrix.<br />
<br />
[[Category:Informatics]]</div>Crk18https://wiki.phenoscape.org/wg/phenoscape/index.php?title=Phenoscape_web_UI&diff=7158Phenoscape web UI2010-03-01T18:14:56Z<p>Crk18: /* April 2010 */</p>
<hr />
<div>Phenoscape web UI development plan.<br />
<br />
==2009==<br />
===February 2009===<br />
*Mockup changes to existing interfaces required to incorporate 1/5/09 team feedback on existing interface components<br />
* Taxon search interface '''Done'''<br />
** Mockup interface using dummy data within web application - request group feedback<br />
*** Front page search interface and results table - may include intermediate results summary page<br />
* Interface demo and feedback session - DECAP meeting, Feb. 27, 2009 '''Done'''<br />
** Present screenshots and mockups<br />
** Possible live demo depending on data service performance progress<br />
*** Services should return in no more than 2 seconds<br />
*** Evaluate feasibility 2 weeks before meeting<br />
* Mockup publication data interface within web application '''Done'''<br />
* Mockup "splashy" search entry page - we should have a more visually engaging entry gateway for exploring the data content<br />
** One of the following:<br />
***Hierarchical term explorer<br />
***Visual explorer, using schematic drawings ("the prototypical fish")<br />
***Visual explorer using 3D scans of catfish (and eventually zebrafish when done).<br />
**Should explore various prototypes of these ideas using HTML mockups and get feedback from project team<br />
* Mockup taxonomy-based tree-mapped data perspective<br />
** View taxonomic phenotype annotation results organized by a phylogenetic tree<br />
** Group phenotypes as simplistic union of descendant nodes, rather than ancestral reconstruction<br />
** Use taxonomy as basis for phylogenetic tree<br />
**Develop HTML mockups in web application<br />
<br />
===March 2009===<br />
* Incorporate 1/5/09 team feedback on existing interface components '''In progress'''<br />
** Anatomy term search results - reorganize according to [[:Image:Phenotype_search_page_mockup.jpg|sketch]]<br />
** Gene search results - reorganize according to [[:Image:Gene_search_page_mockup.jpg|sketch]]<br />
** Taxonomic phenotype results page - add Order column for taxonomic grouping<br />
** Present various phenotypic results grouped by custom "character slim" done<br />
*** Requires slim development by team members (Wasila, Paula) Done--[[User:Pmabee@usd.edu|Pmabee@usd.edu]] 12:20, 20 March 2009 (EDT)done and service implementation by Cartik<br />
* Taxon search interface<br />
** Design data service schema to be implemented by Cartik<br />
** Implement interface using live data service once developed by Cartik<br />
<br />
===April 2009===<br />
* Implement "splashy" search entry page as defined by mockup work<br />
* Incorporate publication data into user interface<br />
** Incorporate publication links into annotation results displays<br />
*** Columns referencing numbers of taxonomic phenotype or mutant phenotype match results will be accompanied by a column including number of publications referencing the search item<br />
*** Link from publication count to a publication listing<br />
** Publication listing includes brief citation each of which links to a publication detail page<br />
** Publication detail page(s)<br />
*** Full citation information<br />
*** Display curator credits<br />
*** Display or link to original matrix including free-text data and specimen listing<br />
*** Display or link to all phenotype annotations resulting from publication<br />
** Requires publication data implementation in OBD by Cartik<br />
** Requires specimen data implementation in OBD by Cartik<br />
* Possible user testing session in conjunction with RCN meeting at NESCent<br />
<br />
===May 2009===<br />
* Implement taxonomy-based tree-mapped data perspective<br />
*Implement appropriate scalable deployment of web application '''Done'''<br />
**Multiple application instances and load balancing '''Done'''<br />
<br />
===Release candidate - June 2009===<br />
*Project team testing<br />
*Bug fixes<br />
*Overall performance evaluation<br />
<br />
===Public launch: ASIH meeting, Portland, Oregon - July 22, 2009===<br />
<br />
==2010==<br />
===February 2010===<br />
* Develop overall site revision plan based on feedback from Knowledgebase Beta 1 interface '''[Done]'''<br />
* Knowledgebase 2.0 site revision mockup testing '''[Done]'''<br />
** Present mockups to naive users in Eugene, Oregon, February 16-17 '''[Done]'''<br />
* Revise mockups using testing session user feedback<br />
* Develop 2.0-beta implementation plans with feedback from Phenoscape stakeholders<br />
<br />
===March 2010===<br />
* Annotation search results pages with editable data filters<br />
* Working taxonomy cladogram data interfaces<br />
* Complete mockups of advanced query interfaces<br />
* Phenoscape meeting at Field Museum, Chicago<br />
** Conduct user feedback sessions with mockups of forthcoming interfaces, and live testing of implemented pages<br />
* Generate permanent unique ID for each publication ['''Done''']<br />
* Enter unique IDs for publications into Endnote ['''Done''']<br />
* Data imported into knowledgebase via Endnote XML files<br />
*** NOTE: This part was done quite a while ago. However, a slight modification needs to be made to use the new unique IDs for publications<br />
** citation information as individual semantic pieces (individual authors, title, journal, etc.)<br />
** abstract<br />
** DOI if available<br />
* Investigate importing previous names of ZFIN genes<br />
* Gene symbol as primary name <br />
** NOTE: This was the status quo before we decided to display the complete name of the gene; we can easily revert to it<br />
<br />
===April 2010===<br />
* Term search results pages incorporating ontology tree browser<br />
* Publication pages with citation and abstract, links to publications from other data types<br />
* Advanced query interface, data download implementation<br />
* type of mutation/defect producing phenotype result (requires feasibility investigation)<br />
* ZFIN publication ID for each phenotype annotation<br />
* import common names for taxa<br />
<br />
===May 2010===<br />
* Publication pages with data matrix (table view and downloadable)<br />
* Publication pages with specimens<br />
* Link to taxonomy cladogram interfaces via data on other pages<br />
* Official release of Phenoscape Knowledgebase 2.0-beta<br />
<br />
===June 2010===<br />
* User testing sessions of Knowledgebase 2.0-beta<br />
* Bug and feedback revisions<br />
<br />
===July 2010===<br />
* Official release of Phenoscape Knowledgebase 2.0<br />
<br />
[[Category:Roadmaps]]<br />
[[Category:User Interface]]<br />
[[Category:Informatics]]</div>Crk18https://wiki.phenoscape.org/wg/phenoscape/index.php?title=Phenoscape_web_UI&diff=7157Phenoscape web UI2010-03-01T18:14:04Z<p>Crk18: /* March 2010 */</p>
<hr />
<div>Phenoscape web UI development plan.<br />
<br />
==2009==<br />
===February 2009===<br />
*Mockup changes to existing interfaces required to incorporate 1/5/09 team feedback on existing interface components<br />
* Taxon search interface '''Done'''<br />
** Mockup interface using dummy data within web application - request group feedback<br />
*** Front page search interface and results table - may include intermediate results summary page<br />
* Interface demo and feedback session - DECAP meeting, Feb. 27, 2009 '''Done'''<br />
** Present screenshots and mockups<br />
** Possible live demo depending on data service performance progress<br />
*** Services should return in no more than 2 seconds<br />
*** Evaluate feasibility 2 weeks before meeting<br />
* Mockup publication data interface within web application '''Done'''<br />
* Mockup "splashy" search entry page - we should have a more visually engaging entry gateway for exploring the data content<br />
** One of the following:<br />
***Hierarchical term explorer<br />
***Visual explorer, using schematic drawings ("the prototypical fish")<br />
***Visual explorer using 3D scans of catfish (and eventually zebrafish when done).<br />
**Should explore various prototypes of these ideas using HTML mockups and get feedback from project team<br />
* Mockup taxonomy-based tree-mapped data perspective<br />
** View taxonomic phenotype annotation results organized by a phylogenetic tree<br />
** Group phenotypes as simplistic union of descendant nodes, rather than ancestral reconstruction<br />
** Use taxonomy as basis for phylogenetic tree<br />
**Develop HTML mockups in web application<br />
<br />
===March 2009===<br />
* Incorporate 1/5/09 team feedback on existing interface components '''In progress'''<br />
** Anatomy term search results - reorganize according to [[:Image:Phenotype_search_page_mockup.jpg|sketch]]<br />
** Gene search results - reorganize according to [[:Image:Gene_search_page_mockup.jpg|sketch]]<br />
** Taxonomic phenotype results page - add Order column for taxonomic grouping<br />
** Present various phenotypic results grouped by custom "character slim" done<br />
*** Requires slim development by team members (Wasila, Paula) Done--[[User:Pmabee@usd.edu|Pmabee@usd.edu]] 12:20, 20 March 2009 (EDT)done and service implementation by Cartik<br />
* Taxon search interface<br />
** Design data service schema to be implemented by Cartik<br />
** Implement interface using live data service once developed by Cartik<br />
<br />
===April 2009===<br />
* Implement "splashy" search entry page as defined by mockup work<br />
* Incorporate publication data into user interface<br />
** Incorporate publication links into annotation results displays<br />
*** Columns referencing numbers of taxonomic phenotype or mutant phenotype match results will be accompanied by a column including number of publications referencing the search item<br />
*** Link from publication count to a publication listing<br />
** Publication listing includes brief citation each of which links to a publication detail page<br />
** Publication detail page(s)<br />
*** Full citation information<br />
*** Display curator credits<br />
*** Display or link to original matrix including free-text data and specimen listing<br />
*** Display or link to all phenotype annotations resulting from publication<br />
** Requires publication data implementation in OBD by Cartik<br />
** Requires specimen data implementation in OBD by Cartik<br />
* Possible user testing session in conjunction with RCN meeting at NESCent<br />
<br />
===May 2009===<br />
* Implement taxonomy-based tree-mapped data perspective<br />
*Implement appropriate scalable deployment of web application '''Done'''<br />
**Multiple application instances and load balancing '''Done'''<br />
<br />
===Release candidate - June 2009===<br />
*Project team testing<br />
*Bug fixes<br />
*Overall performance evaluation<br />
<br />
===Public launch: ASIH meeting, Portland, Oregon - July 22, 2009===<br />
<br />
==2010==<br />
===February 2010===<br />
* Develop overall site revision plan based on feedback from Knowledgebase Beta 1 interface '''[Done]'''<br />
* Knowledgebase 2.0 site revision mockup testing '''[Done]'''<br />
** Present mockups to naive users in Eugene, Oregon, February 16-17 '''[Done]'''<br />
* Revise mockups using testing session user feedback<br />
* Develop 2.0-beta implementation plans with feedback from Phenoscape stakeholders<br />
<br />
===March 2010===<br />
* Annotation search results pages with editable data filters<br />
* Working taxonomy cladogram data interfaces<br />
* Complete mockups of advanced query interfaces<br />
* Phenoscape meeting at Field Museum, Chicago<br />
** Conduct user feedback sessions with mockups of forthcoming interfaces, and live testing of implemented pages<br />
* Generate permanent unique ID for each publication ['''Done''']<br />
* Enter unique IDs for publications into Endnote ['''Done''']<br />
* Data imported into knowledgebase via Endnote XML files<br />
*** NOTE: This part was done quite a while ago. However, a slight modification needs to be made to use the new unique IDs for publications<br />
** citation information as individual semantic pieces (individual authors, title, journal, etc.)<br />
** abstract<br />
** DOI if available<br />
* Investigate importing previous names of ZFIN genes<br />
* Gene symbol as primary name <br />
** NOTE: This was the status quo before we decided to display the complete name of the gene; we can easily revert to it<br />
<br />
===April 2010===<br />
* Term search results pages incorporating ontology tree browser<br />
* Publication pages with citation and abstract, links to publications from other data types<br />
* Advanced query interface, data download implementation<br />
<br />
===May 2010===<br />
* Publication pages with data matrix (table view and downloadable)<br />
* Publication pages with specimens<br />
* Link to taxonomy cladogram interfaces via data on other pages<br />
* Official release of Phenoscape Knowledgebase 2.0-beta<br />
<br />
===June 2010===<br />
* User testing sessions of Knowledgebase 2.0-beta<br />
* Bug and feedback revisions<br />
<br />
===July 2010===<br />
* Official release of Phenoscape Knowledgebase 2.0<br />
<br />
[[Category:Roadmaps]]<br />
[[Category:User Interface]]<br />
[[Category:Informatics]]</div>Crk18https://wiki.phenoscape.org/wg/phenoscape/index.php?title=Phenoscape_web_UI&diff=7156Phenoscape web UI2010-03-01T16:16:41Z<p>Crk18: /* March 2010 */</p>
<hr />
<div>Phenoscape web UI development plan.<br />
<br />
==2009==<br />
===February 2009===<br />
*Mockup changes to existing interfaces required to incorporate 1/5/09 team feedback on existing interface components<br />
* Taxon search interface '''Done'''<br />
** Mockup interface using dummy data within web application - request group feedback<br />
*** Front page search interface and results table - may include intermediate results summary page<br />
* Interface demo and feedback session - DECAP meeting, Feb. 27, 2009 '''Done'''<br />
** Present screenshots and mockups<br />
** Possible live demo depending on data service performance progress<br />
*** Services should return in no more than 2 seconds<br />
*** Evaluate feasibility 2 weeks before meeting<br />
* Mockup publication data interface within web application '''Done'''<br />
* Mockup "splashy" search entry page - we should have a more visually engaging entry gateway for exploring the data content<br />
** One of the following:<br />
***Hierarchical term explorer<br />
***Visual explorer, using schematic drawings ("the prototypical fish")<br />
***Visual explorer using 3D scans of catfish (and eventually zebrafish when done).<br />
**Should explore various prototypes of these ideas using HTML mockups and get feedback from project team<br />
* Mockup taxonomy-based tree-mapped data perspective<br />
** View taxonomic phenotype annotation results organized by a phylogenetic tree<br />
** Group phenotypes as simplistic union of descendant nodes, rather than ancestral reconstruction<br />
** Use taxonomy as basis for phylogenetic tree<br />
**Develop HTML mockups in web application<br />
<br />
===March 2009===<br />
* Incorporate 1/5/09 team feedback on existing interface components '''In progress'''<br />
** Anatomy term search results - reorganize according to [[:Image:Phenotype_search_page_mockup.jpg|sketch]]<br />
** Gene search results - reorganize according to [[:Image:Gene_search_page_mockup.jpg|sketch]]<br />
** Taxonomic phenotype results page - add Order column for taxonomic grouping<br />
** Present various phenotypic results grouped by custom "character slim" done<br />
*** Requires slim development by team members (Wasila, Paula) Done--[[User:Pmabee@usd.edu|Pmabee@usd.edu]] 12:20, 20 March 2009 (EDT)done and service implementation by Cartik<br />
* Taxon search interface<br />
** Design data service schema to be implemented by Cartik<br />
** Implement interface using live data service once developed by Cartik<br />
<br />
===April 2009===<br />
* Implement "splashy" search entry page as defined by mockup work<br />
* Incorporate publication data into user interface<br />
** Incorporate publication links into annotation results displays<br />
*** Columns referencing numbers of taxonomic phenotype or mutant phenotype match results will be accompanied by a column including number of publications referencing the search item<br />
*** Link from publication count to a publication listing<br />
** Publication listing includes brief citation each of which links to a publication detail page<br />
** Publication detail page(s)<br />
*** Full citation information<br />
*** Display curator credits<br />
*** Display or link to original matrix including free-text data and specimen listing<br />
*** Display or link to all phenotype annotations resulting from publication<br />
** Requires publication data implementation in OBD by Cartik<br />
** Requires specimen data implementation in OBD by Cartik<br />
* Possible user testing session in conjunction with RCN meeting at NESCent<br />
<br />
===May 2009===<br />
* Implement taxonomy-based tree-mapped data perspective<br />
*Implement appropriate scalable deployment of web application '''Done'''<br />
**Multiple application instances and load balancing '''Done'''<br />
<br />
===Release candidate - June 2009===<br />
*Project team testing<br />
*Bug fixes<br />
*Overall performance evaluation<br />
<br />
===Public launch: ASIH meeting, Portland, Oregon - July 22, 2009===<br />
<br />
==2010==<br />
===February 2010===<br />
* Develop overall site revision plan based on feedback from Knowledgebase Beta 1 interface '''[Done]'''<br />
* Knowledgebase 2.0 site revision mockup testing '''[Done]'''<br />
** Present mockups to naive users in Eugene, Oregon, February 16-17 '''[Done]'''<br />
* Revise mockups using testing session user feedback<br />
* Develop 2.0-beta implementation plans with feedback from Phenoscape stakeholders<br />
<br />
===March 2010===<br />
* Annotation search results pages with editable data filters<br />
* Working taxonomy cladogram data interfaces<br />
* Complete mockups of advanced query interfaces<br />
* Phenoscape meeting at Field Museum, Chicago<br />
** Conduct user feedback sessions with mockups of forthcoming interfaces, and live testing of implemented pages<br />
* Generate permanent unique ID for each publication ['''Done''']<br />
* Enter unique IDs for publications into Endnote ['''Done''']<br />
* Data imported into knowledgebase via Endnote XML files<br />
*** NOTE: This part was done quite a while ago. However, a slight modification needs to be made to use the new unique IDs for publications<br />
** citation information as individual semantic pieces (individual authors, title, journal, etc.)<br />
** abstract<br />
** DOI if available<br />
<br />
===April 2010===<br />
* Term search results pages incorporating ontology tree browser<br />
* Publication pages with citation and abstract, links to publications from other data types<br />
* Advanced query interface, data download implementation<br />
<br />
===May 2010===<br />
* Publication pages with data matrix (table view and downloadable)<br />
* Publication pages with specimens<br />
* Link to taxonomy cladogram interfaces via data on other pages<br />
* Official release of Phenoscape Knowledgebase 2.0-beta<br />
<br />
===June 2010===<br />
* User testing sessions of Knowledgebase 2.0-beta<br />
* Bug and feedback revisions<br />
<br />
===July 2010===<br />
* Official release of Phenoscape Knowledgebase 2.0<br />
<br />
[[Category:Roadmaps]]<br />
[[Category:User Interface]]<br />
[[Category:Informatics]]</div>Crk18https://wiki.phenoscape.org/wg/phenoscape/index.php?title=Logic_and_Reasoning_Challenges&diff=7136Logic and Reasoning Challenges2010-02-22T17:18:03Z<p>Crk18: /* A possible simpler solution (Update: Feb 22, 2010) */</p>
<hr />
<div>This page discusses issues to be resolved in the near future. These issues pertain to relation semantics as well as inference procedures.<br />
<br />
==Inferring in both directions on the taxonomy==<br />
<br />
We would like annotations to higher taxa in the taxonomy to be propagated to the lower taxa they subsume, i.e. classical top-down inference. Because the reasoner already reasons bottom-up, associating phenotype annotations of lower-level taxa with the higher-level taxa that subsume them, adding top-down inference may cause widespread inconsistencies in the data if left unchecked.<br />
<br />
The OBD reasoner can reason from annotations at the lower levels of the taxonomy to the higher levels. Given that ''Danio rerio'' exhibits a phenotype P, the OBD reasoner infers that ''Danio'' exhibits the same phenotype P. This is reasoning up the taxonomy, using the subsumption relationship between ''Danio rerio'' and ''Danio''. This is possible because the annotations to each taxon are (implicitly) existentially quantified. The annotation "''Danio rerio'' exhibits increased length of maxillary barbel towards orbit" is shown in (1). The semantics are in (2).<br />
<br />
<javascript><br />
TTO:1001979 PHENOSCAPE:exhibits PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967) -- (1)<br />
</javascript><br />
<br />
<br />
<math>\exists</math> X : ''instance_of''(X, TTO:1001979) <math>\and</math> ''PHENOSCAPE:exhibits''(X, PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967)) -- (2)<br />
<br />
<br />
Given that ''Danio rerio'' (TTO:1001979) is subsumed by the genus ''Danio'' (TTO:101040) in the Teleost Taxonomy as shown in (3), it is possible to infer that "''Danio'' exhibits increased length of maxillary barbel towards orbit" (4).<br />
<br />
<javascript><br />
TTO:1001979 OBO_REL:is_a TTO:101040 -- (3)<br />
TTO:101040 PHENOSCAPE:exhibits PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967) -- (4)<br />
</javascript><br />
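<br />
The upward inference from (1) and (3) to (4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary <code>IS_A</code>, the set <code>asserted</code>, and the function <code>infer_up</code> are hypothetical names chosen for this example and are not part of the OBD reasoner or its API.<br />

```python
# Illustrative sketch only; these names are assumptions, not the OBD API.

# Child taxon -> parent taxon, taken from statement (3).
IS_A = {
    "TTO:1001979": "TTO:101040",  # Danio rerio is_a Danio
}

PHENOTYPE = (
    "PATO:0000573^OBO_REL:inheres_in(TAO:0001938)"
    "^OBO_REL:towards(TAO:0001967)"
)

# Statement (1): an existentially quantified 'exhibits' annotation.
asserted = {("TTO:1001979", PHENOTYPE)}


def infer_up(exhibits, is_a):
    """Copy each existential 'exhibits' statement to every ancestor taxon."""
    inferred = set(exhibits)
    for taxon, phenotype in exhibits:
        parent = is_a.get(taxon)
        while parent is not None:
            inferred.add((parent, phenotype))
            parent = is_a.get(parent)
    return inferred


result = infer_up(asserted, IS_A)
# Statement (4) is now among the results.
print(("TTO:101040", PHENOTYPE) in result)
```

The sketch relies on the existential reading: one instance of ''Danio rerio'' with the phenotype is also an instance of ''Danio'', so the statement is safe to copy upward.<br />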
<br />
Inferring down the taxonomy, that is, using assertions at higher levels to extract inferences at lower levels, requires universal quantification. For example, the assertion that "all Siluriformes exhibit decreased width of mesethmoid bone" can be captured using OBD semantics as shown in (5). The universal semantics of this assertion are shown in (6). Siluriformes directly subsumes Ictaluridae as shown in (7). From (5) and (7), it is straightforward to infer that "Ictaluridae exhibit decreased width of mesethmoid bone" as shown in (8).<br />
<br />
<javascript><br />
TTO:1380 PHENOSCAPE:exhibits PATO:0000599^OBO_REL:inheres_in(TAO:0000323) -- (5)<br />
</javascript><br />
<br />
<br />
<math>\forall</math> X : ''instance_of''(X, TTO:1380) <math>\rightarrow</math> ''PHENOSCAPE:exhibits''(X, PATO:0000599^OBO_REL:inheres_in(TAO:0000323)) -- (6)<br />
<br />
<br />
<javascript><br />
TTO:10930 OBO_REL:is_a TTO:1380 -- (7)<br />
TTO:10930 PHENOSCAPE:exhibits PATO:0000599^OBO_REL:inheres_in(TAO:0000323) -- (8)<br />
</javascript><br />
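<br />
The downward inference from (5) and (7) to (8) can be sketched in the same way. Again, this is a minimal illustration and not the OBD implementation: <code>IS_A</code>, <code>universal</code>, and <code>infer_down</code> are hypothetical names, and the sketch assumes that statement (5) is known to be universally quantified.<br />

```python
# Illustrative sketch only; these names are assumptions, not the OBD API.

# Child taxon -> parent taxon, taken from statement (7).
IS_A = {
    "TTO:10930": "TTO:1380",  # Ictaluridae is_a Siluriformes
}

PHENOTYPE = "PATO:0000599^OBO_REL:inheres_in(TAO:0000323)"

# Statement (5), read here as universally quantified.
universal = {("TTO:1380", PHENOTYPE)}


def infer_down(all_exhibit, is_a):
    """Copy each universal statement to every descendant taxon."""
    # Invert the child -> parent map so children can be looked up by parent.
    children = {}
    for child, parent in is_a.items():
        children.setdefault(parent, []).append(child)

    inferred = set(all_exhibit)
    stack = list(all_exhibit)
    while stack:
        taxon, phenotype = stack.pop()
        for child in children.get(taxon, []):
            if (child, phenotype) not in inferred:
                inferred.add((child, phenotype))
                stack.append((child, phenotype))
    return inferred


result = infer_down(universal, IS_A)
# Statement (8) is now among the results.
print(("TTO:10930", PHENOTYPE) in result)
```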
<br />
The problem with drawing top-down inferences from universally quantified statements is that there is currently no way to distinguish them from existentially quantified statements. We use the ''PHENOSCAPE:exhibits'' relation for existentially quantified statements. Using the same relation for universally quantified statements would allow the reasoner to draw incorrect inferences. Consider the subsumption relationship between ''Danio'' and ''Danio choprai'' shown in (9). If existentially and universally quantified statements are not distinguished, (4) can be read as a universal statement about ''Danio'', and from (9) and (4) the reasoner would reach the erroneous conclusion that "''Danio choprai'' exhibits increased length of maxillary barbel towards orbit" (10). Nothing supports this conclusion: at present, there are no annotations to ''Danio choprai''.<br />
<br />
<javascript><br />
TTO:1052801 OBO_REL:is_a TTO:101040 -- (9)<br />
TTO:1052801 PHENOSCAPE:exhibits PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967) -- (10)<br />
</javascript><br />
<br />
Recall that the reasoner works in sweeps. It extracts one set of inferences (Inf-1) from the assertions (A) in its first sweep. In the next sweep, the reasoner extracts a further set of inferences (Inf-2) from the assertions A '''AS WELL AS''' the inferences Inf-1 from the previous sweep. The reasoner repeats these sweeps until no new inferences are added. This is why the reasoner will likely infer that all taxa exhibit all phenotypes if it is used to reason both up and down the taxonomy without distinguishing universal from existential semantics.<br />
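<br />
The effect of the sweeps can be reproduced with a small Python sketch. It is a toy model under stated assumptions, not the OBD reasoner: <code>sweep</code> and <code>saturate</code> are hypothetical names, and both the upward and the downward rule are deliberately applied to the same undifferentiated ''exhibits'' facts to show how the erroneous statement (10) arises.<br />

```python
# Toy model of sweep-based reasoning; names are assumptions, not the OBD API.

# (child, parent) pairs from statements (3) and (9).
IS_A = [
    ("TTO:1001979", "TTO:101040"),  # Danio rerio is_a Danio
    ("TTO:1052801", "TTO:101040"),  # Danio choprai is_a Danio
]

PHENOTYPE = (
    "PATO:0000573^OBO_REL:inheres_in(TAO:0001938)"
    "^OBO_REL:towards(TAO:0001967)"
)

assertions = {("TTO:1001979", PHENOTYPE)}  # statement (1)


def sweep(facts):
    """One sweep: apply the upward AND the downward rule to every known fact."""
    new = set()
    for taxon, phenotype in facts:
        for child, parent in IS_A:
            if child == taxon:
                new.add((parent, phenotype))  # bottom-up rule
            if parent == taxon:
                new.add((child, phenotype))  # top-down rule (unchecked)
    return new - facts


def saturate(assertions):
    """Repeat sweeps until a sweep adds nothing new."""
    facts = set(assertions)
    while True:
        new = sweep(facts)
        if not new:
            return facts
        facts |= new


facts = saturate(assertions)
# The unsupported statement (10) about Danio choprai has been derived.
print(("TTO:1052801", PHENOTYPE) in facts)
```

In this toy taxonomy, the first sweep derives (4) for ''Danio'' and the second sweep pushes it back down to ''Danio choprai'', so a single annotation ends up attached to every taxon in the tree.<br />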
<br />
===Possible solutions===<br />
<br />
In this section, we discuss possible approaches to resolving this issue with reasoning both up and down the taxonomy.<br />
<br />
====Different relations for different purposes====<br />
<br />
In classical first-order logic (FOL), all relations and properties asserted upon concepts (or taxa in the case of Phenoscape) are inherited by the subsumed concepts. This is because by default, all assertions about the concepts are universally quantified, i.e. hold true for ALL instances of the concept. If all cars have four wheels, and if all SUVs are cars, then all SUVs have four wheels. This is the way of top-down, classical FOL inferencing.<br />
<br />
In Phenoscape, we have adopted the OBD schema of modeling concepts, wherein all assertions to the concepts are existentially quantified, i.e. the assertion is true of at least one instance of the concept. This is very convenient for the life sciences, where exceptions are so prevalent. As a ready example, consider how the duck-billed platypus, an egg-laying mammal, overrules the "all mammals are viviparous" rule. Further, existential quantification allows us to reason up the taxonomy. If some Ostariophysi exhibit round fins, and all Ostariophysi are Teleostei, then some Teleostei exhibit round fins.<br />
<br />
By default, we use the ''PHENOSCAPE:exhibits'' relation to link taxa to phenotypes using existential semantics. Using the same relation to model universally quantified relationships between taxa and phenotypes would cause incorrect inferencing and loss of data integrity. The easiest way to address this issue is to use different relations: one for universally quantified relations and the other for existentially quantified relations. Let us call these relations ''PHENOSCAPE:all_exhibit'' and ''PHENOSCAPE:some_exhibit'' respectively.<br />
<br />
At present, the OBD reasoner uses the following rule (Rule-1) to extract inferences up the taxonomy using the ''PHENOSCAPE:exhibits'' relation.<br />
<br />
'''Rule-1:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''exhibits''(A, x) <math>\Rightarrow</math> ''exhibits''(B, x)<br />
<br />
This can be replaced with the following two rules, which use the two new relations, ''PHENOSCAPE:all_exhibit'' and ''PHENOSCAPE:some_exhibit''. (Please suggest better names for these if you can think of them).<br />
<br />
'''Rule-2:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''some_exhibit''(A, x) <math>\Rightarrow</math> ''some_exhibit''(B, x)<br />
<br />
'''Rule-3:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''all_exhibit''(B, x) <math>\Rightarrow</math> ''all_exhibit''(A, x)<br />
<br />
This will keep the inferences from getting mixed up. Let us consider the scenario where species Sp1 and Sp2 (from genus Gen1) are asserted to exhibit phenotype Phen1. These assertions are shown in (A-1) and (A-2). The subsumption relations are shown in (A-3) and (A-4).<br />
<br />
<javascript><br />
Sp1 PHENOSCAPE:some_exhibit Phen1 -- (A-1)<br />
Sp2 PHENOSCAPE:some_exhibit Phen1 -- (A-2)<br />
Sp1 OBO_REL:is_a Gen1 -- (A-3)<br />
Sp2 OBO_REL:is_a Gen1 -- (A-4)<br />
</javascript><br />
<br />
The reasoner makes the inference (I-1) from the assertions (A-1) ~ (A-4) and the inference rule Rule-2.<br />
<br />
<javascript><br />
Gen1 PHENOSCAPE:some_exhibit Phen1 -- (I-1)<br />
</javascript><br />
<br />
Now, given this new inference (I-1), the reasoner cannot infer that the other species of Gen1 (let us say ten more, Sp3 ~ Sp12) also exhibit Phen1, because Rule-2 only propagates ''some_exhibit'' up the taxonomy, never down. Next, consider the assertion that ALL instances of genus Gen1 exhibit a phenotype Phen2, as shown in (A-5).<br />
<br />
<javascript><br />
Gen1 PHENOSCAPE:all_exhibit Phen2 -- (A-5)<br />
</javascript><br />
<br />
Given (A-5) and all the subsumption relations between Gen1 and the hypothetical twelve species under Gen1 (including A-3 and A-4), the reasoner uses inference rule Rule-3 to infer (I-2) ~ (I-13).<br />
<br />
<javascript><br />
Sp1 PHENOSCAPE:all_exhibit Phen2 -- (I-2)<br />
Sp2 PHENOSCAPE:all_exhibit Phen2 -- (I-3)<br />
..<br />
..<br />
Sp12 PHENOSCAPE:all_exhibit Phen2 -- (I-13)<br />
</javascript><br />
<br />
Again, cyclical inferences are ruled out because there are no inference rules to infer up the taxonomy using the ''all_exhibit'' relation.<br />
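<br />
A minimal Python sketch of Rule-2 and Rule-3 working side by side is given here. It assumes a hypothetical genus Gen1 with three species, and the function name <code>infer</code> and the tuple layout are illustrative choices rather than part of OBD. The point it shows is that ''some_exhibit'' only moves up and ''all_exhibit'' only moves down, so the sweeps terminate without the two kinds of statement contaminating each other.<br />

```python
# Hypothetical taxonomy: child -> parent (direct is_a links).
IS_A = {"Sp1": "Gen1", "Sp2": "Gen1", "Sp3": "Gen1"}


def infer(asserted, is_a):
    """Return the statements inferred by Rule-2 and Rule-3.

    Statements are (taxon, relation, phenotype) triples.
    """
    facts = set(asserted)
    while True:
        new = set()
        for taxon, relation, phenotype in facts:
            if relation == "some_exhibit" and taxon in is_a:
                # Rule-2: some_exhibit propagates UP to the parent.
                new.add((is_a[taxon], "some_exhibit", phenotype))
            elif relation == "all_exhibit":
                # Rule-3: all_exhibit propagates DOWN to each child.
                for child, parent in is_a.items():
                    if parent == taxon:
                        new.add((child, "all_exhibit", phenotype))
        if new <= facts:  # fixpoint: the sweep added nothing
            return facts - set(asserted)
        facts |= new


asserted = {
    ("Sp1", "some_exhibit", "Phen1"),   # (A-1)
    ("Sp2", "some_exhibit", "Phen1"),   # (A-2)
    ("Gen1", "all_exhibit", "Phen2"),   # (A-5)
}
inferred = infer(asserted, IS_A)
for statement in sorted(inferred):
    print(statement)
```

In this sketch Sp3 receives ''all_exhibit'' Phen2 from Gen1 but never receives Phen1, because nothing propagates ''some_exhibit'' downward.<br />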
<br />
=====What has to change?=====<br />
<br />
To implement this strategy, two new relations can be defined in the Phenoscape Vocab ontology, where the current definition of the ''PHENOSCAPE:exhibits'' relation is found. At the curation level, curators have to qualify their assertions as being either existentially or universally quantified. Specifically, the Phenex UI could tap the curator's shoulder and ask, "Ahem, does this annotation hold true for all specimens belonging to this taxon or just some specimens?" This needs some changes (no less!) to the Phenex interface and also to the character matrix format in which the data is exported. The data loader module of Phenoscape has to know this information so that the appropriate relation is used in creating the taxon-phenotype statement to be loaded into the knowledgebase. The query module will have to be modified to retrieve both inferred and asserted taxon-phenotype statements using the two different relations. The JSON format in which the data is exported needs to be modified to accommodate the two different kinds of relation statements, and lastly the UI will have to explicitly distinguish between the two.<br />
<br />
======A possible simpler solution (Update: Feb 22, 2010)======<br />
<br />
It is possible to check the rank of the taxon to which the phenotype assertion is made. If the rank of the taxon is not "species", then the new relation ''all_exhibit'' can be used in the top-down reasoning as shown below. The new rule is shown in (Rule-4).<br />
<br />
'''Rule-4:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''exhibits''(B, x) <math>\and</math> ¬''rank''(B, "species") <math>\Rightarrow</math> ''all_exhibit''(A, x)<br />
<br />
Assertion (A-6) is a phenotype assertion to a taxon of a higher rank, let us say a Genus Gen1. Now, let's assume Gen1 has 2 species Sp1 and Sp2. Inferences to Sp1 and Sp2 from the assertion (A-6) may use the new relation ''all_exhibit'' as shown in (I-14) and (I-15).<br />
<br />
<javascript><br />
Gen1 PHENOSCAPE:exhibits Phen1 -- (A-6)<br />
Sp1 OBO_REL:is_a Gen1 -- (A-7)<br />
Sp2 OBO_REL:is_a Gen1 -- (A-8)<br />
<br />
Sp1 PHENOSCAPE:all_exhibit Phen1 -- (I-14)<br />
Sp2 PHENOSCAPE:all_exhibit Phen1 -- (I-15)<br />
</javascript><br />
<br />
Inferences made with the ''all_exhibit'' relation are never propagated back up to the higher taxa.<br />
<br />
Similarly, for any taxon to which a phenotype assertion has been made, the phenotype can be inferred on the higher taxa using the ''some_exhibit'' relation as shown in (Rule-5).<br />
<br />
'''Rule-5:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''exhibits''(A, x) <math>\Rightarrow</math> ''some_exhibit''(B, x)<br />
<br />
<br />
Given assertion (A-6) to a genus and assuming it belongs to a family Fam1, the inference (I-16) can be made about the family Fam1.<br />
<javascript><br />
Gen1 PHENOSCAPE:exhibits Phen1 -- (A-6)<br />
Gen1 OBO_REL:is_a Fam1 -- (A-9)<br />
<br />
Fam1 PHENOSCAPE:some_exhibit Phen1 -- (I-16)<br />
</javascript><br />
<br />
<br />
The relation ''some_exhibit'' cannot be used to infer down the taxonomy. Therefore, we can see that this new methodology can use the existing assertions that use the ''exhibits'' relation, but can distinguish the inferences that are made in both top-down and bottom-up reasoning. The relations ''all_exhibit'' and ''some_exhibit'' are only inferred, never asserted. Moreover, they can never be used to create new inferences. This distinction averts the extraction of incorrect inferences.<br />
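<br />
The following Python sketch illustrates Rule-4 and Rule-5 under a few assumptions: the taxa, the <code>RANK</code> table, and the function name <code>infer_from_assertions</code> are hypothetical, and only direct ''is_a'' links are followed (a real reasoner would also have the transitive closure of ''is_a'' available). Because the rules read asserted ''exhibits'' statements only, a single pass is enough and no sweep ever consumes an inferred statement.<br />

```python
# Hypothetical taxonomy and ranks.
IS_A = {"Sp1": "Gen1", "Sp2": "Gen1", "Gen1": "Fam1"}
RANK = {"Sp1": "species", "Sp2": "species", "Gen1": "genus", "Fam1": "family"}


def infer_from_assertions(asserted, is_a, rank):
    """One pass over asserted (taxon, phenotype) 'exhibits' pairs.

    Inferred triples use all_exhibit / some_exhibit and are never
    fed back in as premises.
    """
    inferred = set()
    for taxon, phenotype in asserted:
        # Rule-5: the parent taxon some_exhibit the phenotype.
        parent = is_a.get(taxon)
        if parent is not None:
            inferred.add((parent, "some_exhibit", phenotype))
        # Rule-4: if the asserted taxon is not a species,
        # each child taxon all_exhibit the phenotype.
        if rank[taxon] != "species":
            for child, child_parent in is_a.items():
                if child_parent == taxon:
                    inferred.add((child, "all_exhibit", phenotype))
    return inferred


asserted = {("Gen1", "Phen1")}  # (A-6)
inferred = infer_from_assertions(asserted, IS_A, RANK)
for statement in sorted(inferred):
    print(statement)
```

For assertion (A-6) this yields the counterparts of (I-14), (I-15) and (I-16) and nothing else.<br />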
<br />
In this solution, no changes are necessary to the NeXML format or to Phenex. Changes are needed in the OBD reasoner, which must use these two new relations, and in the data query module, which must handle them as well. The REST services would not need to be modified for this purpose.<br />
<br />
====Probabilistic assertions====<br />
<br />
Uncertainties are everywhere in the life sciences. The taxon-phenotype assertions can be augmented with uncertainty factors to address this issue. Inferences could use uncertainty calculi such as the [http://en.wikipedia.org/wiki/Dempster%E2%80%93Shafer_theory Dempster-Shafer method] or Bayes' rule of conditional probability to derive uncertainty factors of the inferences given the uncertainty factors of the assertions.<br />
<br />
The advantage of this strategy will be that we can continue to use the ''PHENOSCAPE:exhibits'' relation for taxon-phenotype statements, and at the same time display the uncertainty values associated with every assertion displayed at the UI; far more intuitive than "Taxon T exhibits increased size of E AND decreased size of E."<br />
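<br />
As one illustration of what such a calculus looks like (offered as an example only, not as the method chosen for Phenoscape), Dempster's rule of combination merges two independent mass functions over the same set of hypotheses. The symbols are assumptions introduced here: m<sub>1</sub> and m<sub>2</sub> are the mass functions attached to two sources of evidence, A, B and C are sets of hypotheses with A non-empty, and K measures the conflict between the two sources.<br />
<br />
<math>m_{1,2}(A) = \frac{1}{1-K} \sum_{B \cap C = A} m_1(B)\, m_2(C), \qquad K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C)</math><br />
<br />
The combined mass of the empty set is defined to be zero, and the rule is undefined when K = 1, i.e. when the two sources are in total conflict.<br />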
<br />
=====What needs to change?=====<br />
<br />
Curators will have to manually enter uncertainty factors (UFs) of the assertions in the Phenex UI, which needs modification to handle these. The character matrix format needs to be modified to accommodate UFs. The data loader module needs to use reified statements around assertions to store UFs. The OBD reasoner will have to be augmented with an implementation of uncertainty calculus. The query module needs to retrieve the UF associated with every assertion, and export this in a modified JSON format. Lastly, the UI will have to add a provision to display uncertainty factors.<br />
<br />
==The problem with absence of features==<br />
<br />
Descriptions of phenotypes as used in the Phenoscape project (and a plethora of phenomena in the real world) are replete with exceptions, or aberrations from what is considered to be "normal." While canonical ontologies like the [http://sig.biostr.washington.edu/projects/fm/ FMA] and the [http://www.berkeleybop.org/ontologies/obo-all/teleost_anatomy/teleost_anatomy.obo TAO] contain ontological definitions of ideal specimens, observations in the life sciences are full of aberrations to these general rules.<br />
<br />
Phenoscape has some typical issues dealing with absence of anatomical features in certain species of Ostariophysian fishes. For example, the basihyal cartilage is found in all species of Ostariophysian fishes, except the Siluriformes. At present, this information is captured in Phenoscape using the combination of the PATO term for "absent in organism" (PATO:0000462), the "inheres_in" relation from the OBO Relations Ontology, the TAO term for "basihyal cartilage" (TAO:0001510), the "exhibits" relation from the PHENOSCAPE ontology, and the TTO term for Siluriformes (TTO:1380). This is shown below.<br />
<br />
<javascript><br />
TTO:1380 PHENOSCAPE:exhibits PATO:0000462^OBO_REL:inheres_in(TAO:0001510)<br />
</javascript><br />
<br />
In plain English, this translates to "Siluriformes exhibit absence in organism which inheres in basihyal cartilage." The semantics of this sentence are vague to say the least. Going by this methodology, it is impossible to state that basihyal cartilage is absent in Siluriformes without referring to ''at least one'' instance of basihyal cartilage. Combining a quality ''absent'' with a ''feature'' through the ''inheres_in'' property is very misleading in itself (e.g. "absence inheres in cartilage"), contorting the intrinsic semantics of the ''inheres_in'' relation. These problems have been discussed in [http://www.ncbi.nlm.nih.gov/pubmed/17369081 Ceusters et al.] and [http://www.biomedcentral.com/1471-2105/8/377 Hoehndorf et al.]. Both these publications propose solutions to integrate these aberrant observations with canonical definitions, without causing inconsistencies in reasoning procedures.<br />
<br />
[[Media:PhenotypesInPhenoscape.ppt]]<br />
<br />
[[Discussion about the Absence of Phenotypes issue]]<br />
<br />
Another issue specific to the Phenoscape project was raised by Paula at the SICB workshop. Given that basihyal cartilage is absent in Siluriformes, basihyal bone should be absent in Siluriformes as well. This is because basihyal bone develops from basihyal cartilage. This may be inferred by adding the new relation chaining rule shown below to the OBD reasoner.<br />
<br />
'''Rule:''' <math>\forall</math>F1, F2, S: ''absent_in''(F1, S) <math>\and</math> ''develops_from''(F2, F1) <math>\Rightarrow</math> ''absent_in''(F2, S)<br />
<br />
This relation chain corresponds to the observation: GIVEN THAT Basihyal_Cartilage ''absent_in'' Siluriformes AND Basihyal_Bone ''develops_from'' Basihyal_Cartilage, THEN Basihyal_Bone ''absent_in'' Siluriformes. This and other similar relation chains (as per identified requirements) are to be implemented for the Phenoscape project in the future. Strategies to deal with absent features in general are also to be implemented in the near future.<br />
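<br />
The chaining rule can be sketched in Python as follows. The function name <code>propagate_absence</code> and the pair-based encoding of ''absent_in'' and ''develops_from'' are assumptions made for illustration; neither relation is implemented this way in OBD today. The loop repeats so that absence also follows longer ''develops_from'' chains.<br />

```python
def propagate_absence(absent_in, develops_from):
    """Apply: absent_in(F1, S) and develops_from(F2, F1) => absent_in(F2, S).

    absent_in:     set of (feature, taxon) pairs
    develops_from: set of (derived_feature, source_feature) pairs
    Repeats until no new absences are found.
    """
    facts = set(absent_in)
    while True:
        new = {
            (derived, taxon)
            for (feature, taxon) in facts
            for (derived, source) in develops_from
            if source == feature
        }
        if new <= facts:  # fixpoint reached
            return facts
        facts |= new


absent_in = {("basihyal cartilage", "Siluriformes")}
develops_from = {("basihyal bone", "basihyal cartilage")}

result = propagate_absence(absent_in, develops_from)
for feature, taxon in sorted(result):
    print(feature, "absent_in", taxon)
```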
<br />
Differences between the [[The exhibits relation conundrum|existing semantics]] and [[Relating taxa to phenotypes|desired semantics]] of the ''exhibits'' relation need to be resolved to address this issue. Potential strategies to address the absence of features problem are discussed [[Novel reasoning strategies|here]].<br />
<br />
[[Category:EQ Annotation]]<br />
[[Category:Informatics]]<br />
[[Category:Ontology]]<br />
[[Category:Queries]]<br />
[[Category:Reasoning]]<br />
[[Category:Data]]<br />
[[Category:Curation]]<br />
[[Category:Taxonomy]]<br />
[[Category:OBD]]</div>Crk18https://wiki.phenoscape.org/wg/phenoscape/index.php?title=Logic_and_Reasoning_Challenges&diff=7135Logic and Reasoning Challenges2010-02-22T17:14:55Z<p>Crk18: /* Different relations for different purposes */</p>
<hr />
<div>This page discusses issues to be resolved in the near future. These issues pertain to relation semantics as well as inference procedures.<br />
<br />
==Inferring in both directions on the taxonomy==<br />
<br />
It is desired that annotations to higher taxa in the taxonomy be propagated to the lower taxa that are subsumed by the higher taxon, i.e. classical top-down inferences. Given that the reasoner already reasons bottom-up, associating phenotype annotations from the lower level taxa to the higher level taxa, adding top-down inferencing may cause widespread inconsistencies in the data if unchecked.<br />
<br />
The OBD reasoner can reason from annotations at the lower levels of the taxonomy to the higher levels. Given that ''Danio rerio'' exhibits a phenotype P, the OBD reasoner infers that ''Danio'' exhibits the same phenotype P. This is reasoning up the taxonomy, using the subsumption relationship between ''Danio rerio'' and ''Danio''. This is possible because the annotations to each taxon are (implicitly) existentially quantified. The annotation "''Danio rerio'' exhibits increased length of maxillary barbel towards orbit" is shown in (1). The semantics are in (2).<br />
<br />
<javascript><br />
TTO:1001979 PHENOSCAPE:exhibits PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967) -- (1)<br />
</javascript><br />
<br />
<br />
<math>\exists</math> X : ''instance_of''(X, TTO:1001979) <math>\and</math> ''PHENOSCAPE:exhibits''(X, PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967)) -- (2)<br />
<br />
<br />
Given that ''Danio rerio'' (TTO:1001979) is subsumed by the genus ''Danio'' (TTO:101040) in the Teleost Taxonomy as shown in (3), it is possible to infer that "''Danio'' exhibits increased length of maxillary barbel towards orbit" (4).<br />
<br />
<javascript><br />
TTO:1001979 OBO_REL:is_a TTO:101040 -- (3)<br />
TTO:101040 PHENOSCAPE:exhibits PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967) -- (4)<br />
</javascript><br />
<br />
Inferring down the taxonomy, that is using assertions at higher levels to extract inferences at lower levels, requires universal quantification. For example, the assertion that "all Siluriformes exhibit decreased width of mesethmoid bone" can be captured using OBD semantics as shown in (5). The universal semantics of this assertion are shown in (6). Siluriformes directly subsumes Ictaluridae as shown in (7). From (5) and (7), it is straightforward to infer that "Ictaluridae exhibit decreased width of mesethmoid bone" as shown in (8).<br />
<br />
<javascript><br />
TTO:1380 PHENOSCAPE:exhibits PATO:0000599^OBO_REL:inheres_in(TAO:0000323) -- (5)<br />
</javascript><br />
<br />
<br />
<math>\forall</math> X : ''instance_of''(X, TTO:1380) <math>\Rightarrow</math> ''PHENOSCAPE:exhibits''(X, PATO:0000599^OBO_REL:inheres_in(TAO:0000323)) -- (6)<br />
<br />
<br />
<javascript><br />
TTO:10930 OBO_REL:is_a TTO:1380 -- (7)<br />
TTO:10930 PHENOSCAPE:exhibits PATO:0000599^OBO_REL:inheres_in(TAO:0000323) -- (8)<br />
</javascript><br />
<br />
The problem with using top-down inferences using universally quantified statements is that currently there is no way to distinguish these from existentially quantified statements. We use the ''PHENOSCAPE:exhibits'' relation for existentially quantified statements. Using the same relation for universally quantified statements would make it possible to extract incorrect inferences given the current configuration. Consider the subsumption relationship between ''Danio'' and ''Danio choprai'' shown in (9). If there is no distinction between existentially and universally quantified statements, it is possible to infer from (9) and (4) the erroneous conclusion that "''Danio choprai'' exhibits increased length of maxillary barbel towards orbit" (10). At present, there are no annotations to ''Danio choprai''.<br />
<br />
<javascript><br />
TTO:1052801 OBO_REL:is_a TTO:101040 -- (9)<br />
TTO:1052801 PHENOSCAPE:exhibits PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967) -- (10)<br />
</javascript><br />
<br />
Recall that the reasoner works in sweeps. It extracts one set of inferences (Inf-1) from the assertions (A) in its first sweep. In the next sweep, the reasoner pulls out a different set of inferences (Inf-2) from the assertions A '''AS WELL AS''' the inferences Inf-1 from the previous sweep. The reasoner repeats these sweeps until no new inferences are added. This is why the reasoner will likely infer all taxa exhibit all phenotypes if it is used to reason both up and down the taxonomy without checking for universal and existential semantics.<br />
<br />
===Possible solutions===<br />
<br />
In this section, we discuss possible approaches to resolving this issue with reasoning both up and down the taxonomy.<br />
<br />
====Different relations for different purposes====<br />
<br />
In classical first-order logic (FOL), all relations and properties asserted upon concepts (or taxa in the case of Phenoscape) are inherited by the subsumed concepts. This is because by default, all assertions about the concepts are universally quantified, i.e. hold true for ALL instances of the concept. If all cars have four wheels, and if all SUVs are cars, then all SUVs have four wheels. This is the way of top-down, classical FOL inferencing.<br />
<br />
In Phenoscape, we have adopted the OBD schema of modeling concepts, wherein all assertions to the concepts are existentially quantified, i.e. the assertion is true of at least one instance of the concept. This is very convenient for the life sciences, where exceptions are so prevalent. As a ready example, consider how the duck-billed platypus, an egg-laying mammal, overrules the "all mammals are viviparous" rule. Further, existential quantification allows us to reason up the taxonomy. If some Ostariophysi exhibit round fins, and all Ostariophysi are Teleostei, then some Teleostei exhibit round fins.<br />
<br />
By default, we use the ''PHENOSCAPE:exhibits'' relation to link taxa to phenotypes using existential semantics. Using the same relation to model universally quantified relationships between taxa and phenotypes would cause incorrect inferencing and loss of data integrity. The easiest way to address this issue is to use different relations: one for universally quantified relations and the other for existentially quantified relations. Let us call these relations ''PHENOSCAPE:all_exhibit'' and ''PHENOSCAPE:some_exhibit'' respectively.<br />
<br />
At present, the OBD reasoner uses the following rule (Rule-1) to extract inferences up the taxonomy using the ''PHENOSCAPE:exhibits'' relation.<br />
<br />
'''Rule-1:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''exhibits''(A, x) <math>\Rightarrow</math> ''exhibits''(B, x)<br />
<br />
This can be replaced with the following two rules, which use the two new relations, ''PHENOSCAPE:all_exhibit'' and ''PHENOSCAPE:some_exhibit''. (Please suggest better names for these if you can think of them).<br />
<br />
'''Rule-2:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''some_exhibit''(A, x) <math>\Rightarrow</math> ''some_exhibit''(B, x)<br />
<br />
'''Rule-3:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''all_exhibit''(B, x) <math>\Rightarrow</math> ''all_exhibit''(A, x)<br />
<br />
This will keep the inferences from getting mixed up. Let us consider the scenario where species Sp1 and Sp2 (from genus Gen1) are asserted to exhibit phenotype Phen1. These assertions are shown in (A-1) and (A-2). The subsumption relations are shown in (A-3) and (A-4).<br />
<br />
<javascript><br />
Sp1 PHENOSCAPE:some_exhibit Phen1 -- (A-1)<br />
Sp2 PHENOSCAPE:some_exhibit Phen1 -- (A-2)<br />
Sp1 OBO_REL:is_a Gen1 -- (A-3)<br />
Sp2 OBO_REL:is_a Gen1 -- (A-4)<br />
</javascript><br />
<br />
The reasoner makes the inference (I-1) from the assertions (A-1) ~ (A-4) and the inference rule Rule-2.<br />
<br />
<javascript><br />
Gen1 PHENOSCAPE:some_exhibit Phen1 -- (I-1)<br />
</javascript><br />
<br />
Now, given this new inference (I-1), the reasoner cannot infer that the other species of Gen1 (let us say ten more, Sp3 ~ Sp12) also exhibit Phen1, because Rule-2 only propagates ''some_exhibit'' up the taxonomy, never down. Next, consider the assertion that ALL instances of genus Gen1 exhibit a phenotype Phen2, as shown in (A-5).<br />
<br />
<javascript><br />
Gen1 PHENOSCAPE:all_exhibit Phen2 -- (A-5)<br />
</javascript><br />
<br />
Given (A-5) and all the subsumption relations between Gen1 and the hypothetical twelve species under Gen1 (including A-3 and A-4), the reasoner uses inference rule Rule-3 to infer (I-2) ~ (I-13).<br />
<br />
<javascript><br />
Sp1 PHENOSCAPE:all_exhibit Phen2 -- (I-2)<br />
Sp2 PHENOSCAPE:all_exhibit Phen2 -- (I-3)<br />
..<br />
..<br />
Sp12 PHENOSCAPE:all_exhibit Phen2 -- (I-13)<br />
</javascript><br />
<br />
Again, cyclical inferences are ruled out because there are no inference rules to infer up the taxonomy using the ''all_exhibit'' relation.<br />
<br />
=====What has to change?=====<br />
<br />
To implement this strategy, two new relations can be defined in the Phenoscape Vocab ontology, where the current definition of the ''PHENOSCAPE:exhibits'' relation is found. At the curation level, curators have to qualify their assertions as being either existentially or universally quantified. Specifically, the Phenex UI could tap the curator's shoulder and ask, "Ahem, does this annotation hold true for all specimens belonging to this taxon or just some specimens?" This needs some changes (no less!) to the Phenex interface and also to the character matrix format in which the data is exported. The data loader module of Phenoscape has to know this information so that the appropriate relation is used in creating the taxon-phenotype statement to be loaded into the knowledgebase. The query module will have to be modified to retrieve both inferred and asserted taxon-phenotype statements using the two different relations. The JSON format in which the data is exported needs to be modified to accommodate the two different kinds of relation statements, and lastly the UI will have to explicitly distinguish between the two.<br />
<br />
======A possible simpler solution (Update: Feb 22, 2010)======<br />
<br />
It is possible to check the rank of the taxon to which the phenotype assertion is made. If the rank of the taxon is not "species", then the new relation ''all_exhibit'' can be used in the top-down reasoning as shown below. The new rule is shown in (Rule-4).<br />
<br />
'''Rule-4:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''exhibits''(B, x) <math>\and</math> ¬''rank''(B, "species") <math>\Rightarrow</math> ''all_exhibit''(A, x)<br />
<br />
Assertion (A-6) is a phenotype assertion to a taxon of a higher rank, let us say a Genus Gen1. Now, let's assume Gen1 has 2 species Sp1 and Sp2. Inferences to Sp1 and Sp2 from the assertion (A-6) may use the new relation ''all_exhibit'' as shown in (I-14) and (I-15).<br />
<br />
<javascript><br />
Gen1 PHENOSCAPE:exhibits Phen1 -- (A-6)<br />
Sp1 OBO_REL:is_a Gen1 -- (A-7)<br />
Sp2 OBO_REL:is_a Gen1 -- (A-8)<br />
<br />
Sp1 PHENOSCAPE:all_exhibit Phen1 -- (I-14)<br />
Sp2 PHENOSCAPE:all_exhibit Phen1 -- (I-15)<br />
</javascript><br />
<br />
Inferences made with the ''all_exhibit'' relation are never propagated back up to the higher taxa.<br />
<br />
Similarly, for any taxon to which a phenotype assertion has been made, the phenotype can be inferred on the higher taxa using the ''some_exhibit'' relation as shown in (Rule-5).<br />
<br />
'''Rule-5:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''exhibits''(A, x) <math>\Rightarrow</math> ''some_exhibit''(B, x)<br />
<br />
<br />
Given assertion (A-6) to a genus and assuming it belongs to a family Fam1, the inference (I-16) can be made about the family Fam1. <br />
<javascript><br />
Gen1 PHENOSCAPE:exhibits Phen1 -- (A-6)<br />
Gen1 OBO_REL:is_a Fam1 -- (A-9)<br />
<br />
Fam1 PHENOSCAPE:some_exhibit Phen1 -- (I-16)<br />
</javascript><br />
<br />
<br />
The relation ''some_exhibit'' cannot be used to infer down the taxonomy. Therefore, we can see that this new methodology can use the existing assertions that use the ''exhibits'' relation, but can distinguish the inferences that are made in both top-down and bottom-up reasoning. The relations ''all_exhibit'' and ''some_exhibit'' are only inferred, never asserted. Moreover, they can never be used to create new inferences. This distinction averts the extraction of incorrect inferences.<br />
<br />
====Probabilistic assertions====<br />
<br />
Uncertainties are everywhere in the life sciences. The taxon-phenotype assertions can be augmented with uncertainty factors to address this issue. Inferences could use uncertainty calculi such as the [http://en.wikipedia.org/wiki/Dempster%E2%80%93Shafer_theory Dempster-Shafer method] or Bayes' rule of conditional probability to derive uncertainty factors of the inferences given the uncertainty factors of the assertions.<br />
<br />
The advantage of this strategy will be that we can continue to use the ''PHENOSCAPE:exhibits'' relation for taxon-phenotype statements, and at the same time display the uncertainty values associated with every assertion displayed at the UI; far more intuitive than "Taxon T exhibits increased size of E AND decreased size of E."<br />
<br />
=====What needs to change?=====<br />
<br />
Curators will have to manually enter uncertainty factors (UFs) of the assertions in the Phenex UI, which needs modification to handle these. The character matrix format needs to be modified to accommodate UFs. The data loader module needs to use reified statements around assertions to store UFs. The OBD reasoner will have to be augmented with an implementation of uncertainty calculus. The query module needs to retrieve the UF associated with every assertion, and export this in a modified JSON format. Lastly, the UI will have to add a provision to display uncertainty factors.<br />
<br />
==The problem with absence of features==<br />
<br />
Descriptions of phenotypes as used in the Phenoscape project (and a plethora of phenomena in the real world) are replete with exceptions, or aberrations from what is considered to be "normal." While canonical ontologies like the [http://sig.biostr.washington.edu/projects/fm/ FMA] and the [http://www.berkeleybop.org/ontologies/obo-all/teleost_anatomy/teleost_anatomy.obo TAO] contain ontological definitions of ideal specimens, observations in the life sciences are full of aberrations to these general rules.<br />
<br />
Phenoscape has some typical issues dealing with absence of anatomical features in certain species of Ostariophysian fishes. For example, the basihyal cartilage is found in all species of Ostariophysian fishes, except the Siluriformes. At present, this information is captured in Phenoscape using the combination of the PATO term for "absent in organism" (PATO:0000462), the "inheres_in" relation from the OBO Relations Ontology, the TAO term for "basihyal cartilage" (TAO:0001510), the "exhibits" relation from the PHENOSCAPE ontology, and the TTO term for Siluriformes (TTO:1380). This is shown below.<br />
<br />
<javascript><br />
TTO:1380 PHENOSCAPE:exhibits PATO:0000462^OBO_REL:inheres_in(TAO:0001510)<br />
</javascript><br />
<br />
In plain English, this translates to "Siluriformes exhibit absence in organism which inheres in basihyal cartilage." The semantics of this sentence are vague to say the least. Going by this methodology, it is impossible to state that basihyal cartilage is absent in Siluriformes without referring to ''at least one'' instance of basihyal cartilage. Combining a quality ''absent'' with a ''feature'' through the ''inheres_in'' property is very misleading in itself (e.g. "absence inheres in cartilage"), contorting the intrinsic semantics of the ''inheres_in'' relation. These problems have been discussed in [http://www.ncbi.nlm.nih.gov/pubmed/17369081 Ceusters et al.] and [http://www.biomedcentral.com/1471-2105/8/377 Hoehndorf et al.]. Both these publications propose solutions to integrate these aberrant observations with canonical definitions, without causing inconsistencies in reasoning procedures.<br />
<br />
[[Media:PhenotypesInPhenoscape.ppt]]<br />
<br />
[[Discussion about the Absence of Phenotypes issue]]<br />
<br />
Another issue specific to the Phenoscape project was raised by Paula at the SICB workshop. Given that basihyal cartilage is absent in Siluriformes, basihyal bone should be absent in Siluriformes as well. This is because basihyal bone develops from basihyal cartilage. This may be inferred by adding the new relation chaining rule shown below to the OBD reasoner.<br />
<br />
'''Rule:''' <math>\forall</math>F1, F2, S: ''absent_in''(F1, S) <math>\and</math> ''develops_from''(F2, F1) <math>\Rightarrow</math> ''absent_in''(F2, S)<br />
<br />
This relation chain corresponds to the observation: GIVEN THAT Basihyal_Cartilage ''absent_in'' Siluriformes AND Basihyal_Bone ''develops_from'' Basihyal_Cartilage, THEN Basihyal_Bone ''absent_in'' Siluriformes. This and other similar relation chains (as per identified requirements) are to be implemented for the Phenoscape project in the future. Strategies to deal with absent features in general are also to be implemented in the near future.<br />
<br />
Differences between the [[The exhibits relation conundrum|existing semantics]] and [[Relating taxa to phenotypes|desired semantics]] of the ''exhibits'' relation need to be resolved to address this issue. Potential strategies to address the absence of features problem are discussed [[Novel reasoning strategies|here]].<br />
<br />
[[Category:EQ Annotation]]<br />
[[Category:Informatics]]<br />
[[Category:Ontology]]<br />
[[Category:Queries]]<br />
[[Category:Reasoning]]<br />
[[Category:Data]]<br />
[[Category:Curation]]<br />
[[Category:Taxonomy]]<br />
[[Category:OBD]]</div>Crk18https://wiki.phenoscape.org/wg/phenoscape/index.php?title=Logic_and_Reasoning_Challenges&diff=7131Logic and Reasoning Challenges2010-02-22T16:58:15Z<p>Crk18: /* A possible simpler solution (Update: Feb 22, 2010) */</p>
<hr />
<div>This page discusses issues to be resolved in the near future. These issues pertain to relation semantics as well as inference procedures.<br />
<br />
==Inferring in both directions on the taxonomy==<br />
<br />
It is desired that annotations to higher taxa in the taxonomy be propagated to the lower taxa that are subsumed by the higher taxon; i.e. classical top down inferences. Given that the reasoner already reasons bottom upward, associating phenotype annotations from the lower level taxa to the higher level taxa, adding top-down inferencing may cause widespread inconsistencies in the data if unchecked.<br />
<br />
The OBD reasoner can reason from annotations at the lower levels of the taxonomy to the higher levels. Given that ''Danio rerio'' exhibits a phenotype P, the OBD reasoner infers that ''Danio'' exhibits the same phenotype P. This is reasoning up the taxonomy, using the subsumption relationship between ''Danio rerio'' and ''Danio''. This is possible because the annotations to each taxon are (implicitly) existentially quantified. The annotation "''Danio rerio'' exhibits increased length of maxillary barbel towards orbit" is shown in (1). The semantics are in (2).<br />
<br />
<javascript><br />
TTO:1001979 PHENOSCAPE:exhibits PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967) -- (1)<br />
</javascript><br />
<br />
<br />
<math>\exists</math> X : ''instance_of''(X, TTO:1001979) <math>\and</math> ''PHENOSCAPE:exhibits''(X, PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967)) -- (2)<br />
<br />
<br />
Given that ''Danio rerio'' (TTO:1001979) is subsumed by the genus ''Danio'' (TTO:101040) in the Teleost Taxonomy as shown in (3), it is possible to infer that "''Danio'' exhibits increased length of maxillary barbel towards orbit" (4).<br />
<br />
<javascript><br />
TTO:1001979 OBO_REL:is_a TTO:101040 -- (3)<br />
TTO:101040 PHENOSCAPE:exhibits PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967) -- (4)<br />
</javascript><br />
<br />
Inferring down the taxonomy, that is using assertions at higher levels to extract inferences at lower levels, requires universal quantification. For example, the assertion that all "Siluriformes exhibit decreased width of mesethmoid bone" can be captured using OBD semantics as shown in (5). The universal semantics of this assertion is shown in (6). Siluriformes directly subsumes Ictaluridae as shown in (7). From (5) and (7), it is straightforward to infer that "Ictaluridae exhibit decreased width of mesethmoid bone" as shown in (8).<br />
<br />
<javascript><br />
TTO:1380 PHENOSCAPE:exhibits PATO:0000599^OBO_REL:inheres_in(TAO:0000323) -- (5)<br />
</javascript><br />
<br />
<br />
<math>\forall</math> X : ''instance_of''(X, TTO:1380) <math>\Rightarrow</math> ''PHENOSCAPE:exhibits''(X, PATO:0000599^OBO_REL:inheres_in(TAO:0000323)) -- (6)<br />
<br />
<br />
<javascript><br />
TTO:10930 OBO_REL:is_a TTO:1380 -- (7)<br />
TTO:10930 PHENOSCAPE:exhibits PATO:0000599^OBO_REL:inheres_in(TAO:0000323) -- (8)<br />
</javascript><br />
<br />
The problem with using top-down inferences using universally quantified statements is that currently there is no way to distinguish these from existentially quantified statements. We use the ''PHENOSCAPE:exhibits'' relation for existentially quantified statements. Using the same relation for universally quantified statements would make it possible to extract incorrect inferences given the current configuration. Consider the subsumption relationship between ''Danio'' and ''Danio choprai'' shown in (9). If there is no distinction between existentially and universally quantified statements, it is possible to infer from (9) and (4) the erroneous conclusion that "''Danio choprai'' exhibits increased length of maxillary barbel towards orbit" (10). At present, there are no annotations to ''Danio choprai''.<br />
<br />
<javascript><br />
TTO:1052801 OBO_REL:is_a TTO:101040 -- (9)<br />
TTO:1052801 PHENOSCAPE:exhibits PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967) -- (10)<br />
</javascript><br />
<br />
Recall that the reasoner works in sweeps. It extracts one set of inferences (Inf-1) from the assertions (A) in its first sweep. In the next sweep, the reasoner pulls out a different set of inferences (Inf-2) from the assertions A '''AS WELL AS''' the inferences Inf-1 from the previous sweep. The reasoner repeats these sweeps until no new inferences are added. This is why the reasoner will likely infer all taxa exhibit all phenotypes if it is used to reason both up and down the taxonomy without checking for universal and existential semantics.<br />
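The sweep behaviour described above can be illustrated with a minimal Python sketch. This is not the OBD reasoner's code: the function name <code>saturate</code> and the tuple-based data layout are assumptions made for illustration only. It applies bottom-up and unchecked top-down propagation over the same ''exhibits'' facts until a sweep adds nothing new.<br />

```python
def saturate(is_a, exhibits):
    """Repeat sweeps until no new (taxon, phenotype) pair is added.

    is_a:     set of (child, parent) subsumption pairs
    exhibits: set of asserted (taxon, phenotype) pairs
    Hypothetical helper; both directions are applied with no check
    for existential versus universal semantics.
    """
    facts = set(exhibits)
    sweeps = 0
    while True:
        new = set()
        for child, parent in is_a:
            for taxon, phenotype in facts:
                if taxon == child:
                    new.add((parent, phenotype))  # bottom-up inference
                if taxon == parent:
                    new.add((child, phenotype))   # unchecked top-down inference
        sweeps += 1
        if new <= facts:  # nothing new in this sweep: fixed point reached
            return facts, sweeps
        facts |= new


is_a = {("Danio rerio", "Danio"), ("Danio choprai", "Danio")}
asserted = {("Danio rerio", "P")}
facts, sweeps = saturate(is_a, asserted)
for taxon, phenotype in sorted(facts):
    print(taxon, "exhibits", phenotype)
```

In this toy taxonomy the single assertion to ''Danio rerio'' reaches ''Danio'' in the first sweep and then leaks down to ''Danio choprai'' in the second, which is the erroneous conclusion (10) discussed above.<br />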
<br />
===Possible solutions===<br />
<br />
In this section, we discuss possible approaches to resolving this issue with reasoning both up and down the taxonomy.<br />
<br />
====Different relations for different purposes====<br />
<br />
In classical first-order logic (FOL), all relations and properties asserted upon concepts (or taxa in the case of Phenoscape) are inherited by the subsumed concepts. This is because by default, all assertions about the concepts are universally quantified, i.e. hold true for ALL instances of the concept. If all cars have four wheels, and if all SUVs are cars, then all SUVs have four wheels. This is the way of top-down, classical FOL inferencing.<br />
<br />
In Phenoscape, we have adopted the OBD schema of modeling concepts, wherein all assertions to the concepts are existentially quantified, i.e. the assertion is true for at least one instance of the concept. This is very convenient for the life sciences, where exceptions are so prevalent. As a ready example, consider how the duck-billed platypus easily overrules the "all mammals are viviparous" rule. Further, existential quantification allows us to reason up the taxonomy. If some Ostariophysi exhibit round fins, and all Ostariophysi are Teleostei, then some Teleostei exhibit round fins.<br />
<br />
By default, we use the ''PHENOSCAPE:exhibits'' relation to link taxa to phenotypes using existential semantics. Using the same relation to model universally quantified relationships between taxa and phenotypes, would cause incorrect inferencing and loss of data integrity. The easiest way to address this issue is to use different relations; one for universally quantified relations and the other for existentially quantified relations. Let us call these relations ''PHENOSCAPE:all_exhibit'' and ''PHENOSCAPE:some_exhibit'' respectively.<br />
<br />
Currently, the OBD reasoner uses the following rule (Rule-1) to extract inferences up the taxonomy using the ''PHENOSCAPE:exhibits'' relation.<br />
<br />
'''Rule-1:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''exhibits''(A, x) <math>\Rightarrow</math> ''exhibits''(B, x)<br />
<br />
This can be replaced with the following two rules, which use the two new relations, ''PHENOSCAPE:all_exhibit'' and ''PHENOSCAPE:some_exhibit''. (Please suggest better names for these if you can think of them).<br />
<br />
'''Rule-2:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''some_exhibit''(A, x) <math>\Rightarrow</math> ''some_exhibit''(B, x)<br />
<br />
'''Rule-3:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''all_exhibit''(B, x) <math>\Rightarrow</math> ''all_exhibit''(A, x)<br />
<br />
This will keep the inferences from getting mixed up. Let us consider the scenario where species Sp1 and Sp2 (from genus Gen1) are asserted to exhibit phenotype Phen1. These assertions are shown in (A-1) and (A-2). The subsumption relations are shown in (A-3) and (A-4)<br />
<br />
<javascript><br />
Sp1 PHENOSCAPE:some_exhibit Phen1 -- (A-1)<br />
Sp2 PHENOSCAPE:some_exhibit Phen1 -- (A-2)<br />
Sp1 OBO_REL:is_a Gen1 -- (A-3)<br />
Sp2 OBO_REL:is_a Gen1 -- (A-4)<br />
</javascript><br />
<br />
The reasoner makes the inference (I-1) from the assertions (A-1) ~ (A-4) and the inference rule Rule-2.<br />
<br />
<javascript><br />
Gen1 PHENOSCAPE:some_exhibit Phen1 -- (I-1)<br />
</javascript><br />
<br />
Now, given this new inference (I-1), the reasoner cannot infer that all the species Sp1, Sp2, and let us say 10 other species Sp3 ~ Sp12 also exhibit Phen1, because the inference rule for ''some_exhibit'' cannot be used to infer down the taxonomy. Next, consider the assertion that ALL instances of genus Gen1 exhibit a phenotype Phen2, as shown in (A-5).<br />
<br />
<javascript><br />
Gen1 PHENOSCAPE:all_exhibit Phen2 -- (A-5)<br />
</javascript><br />
<br />
Given (A-5) and all the subsumption relations between Gen1 and the hypothetical twelve species under Gen1 (including A-3 and A-4), the reasoner uses inference rule Rule-3 to infer (I-2) ~ (I-13)<br />
<br />
<javascript><br />
Sp1 PHENOSCAPE:all_exhibit Phen2 -- (I-2)<br />
Sp2 PHENOSCAPE:all_exhibit Phen2 -- (I-3)<br />
..<br />
..<br />
Sp12 PHENOSCAPE:all_exhibit Phen2 -- (I-13)<br />
</javascript><br />
<br />
Again, cyclical inferences are ruled out because there are no inference rules to infer up the taxonomy using the ''all_exhibit'' relation.<br />
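The worked example in this section can be replayed with a short Python sketch of Rule-2 and Rule-3. The function name <code>infer</code> and the set-of-tuples representation are hypothetical and are not part of OBD; the point is only that the two relations are kept in separate sets, so a fact inferred by one rule never feeds the other.<br />

```python
def infer(is_a, some_exhibit, all_exhibit):
    """Apply Rule-2 (upward) and Rule-3 (downward) until nothing changes.

    is_a:         set of (child, parent) pairs
    some_exhibit: set of (taxon, phenotype) pairs, existential semantics
    all_exhibit:  set of (taxon, phenotype) pairs, universal semantics
    Hypothetical sketch; the two relations never feed each other.
    """
    some_facts = set(some_exhibit)
    all_facts = set(all_exhibit)
    changed = True
    while changed:
        changed = False
        for child, parent in is_a:
            # Rule-2: is_a(A, B) and some_exhibit(A, x) => some_exhibit(B, x)
            for taxon, phenotype in list(some_facts):
                if taxon == child and (parent, phenotype) not in some_facts:
                    some_facts.add((parent, phenotype))
                    changed = True
            # Rule-3: is_a(A, B) and all_exhibit(B, x) => all_exhibit(A, x)
            for taxon, phenotype in list(all_facts):
                if taxon == parent and (child, phenotype) not in all_facts:
                    all_facts.add((child, phenotype))
                    changed = True
    return some_facts, all_facts


# Twelve hypothetical species Sp1 ~ Sp12 under genus Gen1
is_a = {(f"Sp{i}", "Gen1") for i in range(1, 13)}
some_result, all_result = infer(
    is_a,
    some_exhibit={("Sp1", "Phen1"), ("Sp2", "Phen1")},  # (A-1), (A-2)
    all_exhibit={("Gen1", "Phen2")},                     # (A-5)
)
print(("Gen1", "Phen1") in some_result)  # (I-1) is inferred
print(("Sp3", "Phen1") in some_result)   # Phen1 does not leak down to Sp3
```

Because ''some_exhibit'' facts only travel up and ''all_exhibit'' facts only travel down, the loop terminates without ever attributing Phen1 to Sp3 ~ Sp12.<br />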
<br />
=====What has to change?=====<br />
<br />
To implement this strategy, two new relations can be defined in the Phenoscape Vocab ontology, where the current definition of the ''PHENOSCAPE:exhibits'' relation is found. At the curation level, curators have to qualify their assertions as being either existentially or universally quantified. Specifically, the Phenex UI could tap the curator's shoulder and ask, "Ahem, does this annotation hold true for all specimens belonging to this taxon or just some specimens?" This needs some changes (no less!) to the Phenex interface and also to the character matrix format in which the data is exported. The data loader module of Phenoscape has to know this information so that the appropriate relation is used in creating the taxon-phenotype statement to be loaded into the knowledgebase. The query module will have to be modified to retrieve both inferred and asserted taxon-phenotype statements using the two different relations. The JSON format in which the data is exported needs to be modified to accommodate the two different kinds of relation statements, and lastly the UI will have to explicitly distinguish between the two.<br />
<br />
======A possible simpler solution (Update: Feb 22, 2010)======<br />
<br />
It is possible to check the rank of the taxon to which the phenotype assertion is made. If the rank of the taxon is not "species", then the ''all_exhibit'' relation can be used in the top-down reasoning as shown below. The new rule is shown in (Rule-4).<br />
<br />
'''Rule-4:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''exhibits''(B, x) <math>\and</math> ¬''rank''(B, "species") <math>\Rightarrow</math> ''all_exhibit''(A, x)<br />
<br />
Assertion (A-6) is a phenotype assertion to a taxon of a higher rank, let us say a Genus G1. Now, G1 has 6 species S1 ~ S6. Inferences to S1 ~ S6 from the assertion (A-6) may use the new relation ''all_exhibit'' as shown.<br />
<br />
<javascript><br />
<br />
</javascript><br />
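A minimal Python sketch of how Rule-4 might be evaluated follows. The function <code>rule_4</code>, the <code>rank</code> lookup table, the phenotypes P and Q, and the taxon name <code>S1_sub</code> are all hypothetical and introduced only for illustration; the sketch assumes every taxon has a known rank.<br />

```python
def rule_4(is_a, exhibits, rank):
    """Rule-4: is_a(A, B) and exhibits(B, x) and not rank(B, "species")
    => all_exhibit(A, x).

    is_a:     set of (child, parent) pairs
    exhibits: set of asserted (taxon, phenotype) pairs
    rank:     dict mapping taxon name to rank (assumed complete)
    Returns the set of inferred all_exhibit pairs.
    """
    inferred = set()
    for child, parent in is_a:
        for taxon, phenotype in exhibits:
            if taxon == parent and rank[parent] != "species":
                inferred.add((child, phenotype))
    return inferred


# Genus G1 with species S1 ~ S6; S1_sub is a hypothetical taxon below S1
is_a = {(f"S{i}", "G1") for i in range(1, 7)} | {("S1_sub", "S1")}
rank = {"G1": "genus", "S1_sub": "subspecies"}
rank.update({f"S{i}": "species" for i in range(1, 7)})

exhibits = {("G1", "P"), ("S1", "Q")}
all_exhibit = rule_4(is_a, exhibits, rank)
for taxon, phenotype in sorted(all_exhibit):
    print(taxon, "all_exhibit", phenotype)
```

The genus-level assertion is pushed down to S1 ~ S6, while the species-level assertion of Q to S1 is left alone because the rank check fails.<br />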
<br />
====Probabilistic assertions====<br />
<br />
Uncertainties are everywhere in the life sciences. The taxon-phenotype assertions can be augmented with uncertainty factors to address this issue. Inferences could use uncertainty calculi such as the [http://en.wikipedia.org/wiki/Dempster%E2%80%93Shafer_theory Dempster-Shafer method] or Bayes' rule of conditional probability to derive uncertainty factors of the inferences given the uncertainty factors of the assertions.<br />
<br />
The advantage of this strategy will be that we can continue to use the ''PHENOSCAPE:exhibits'' relation for taxon-phenotype statements, and at the same time display the uncertainty values associated with every assertion displayed at the UI; far more intuitive than "Taxon T exhibits increased size of E AND decreased size of E."<br />
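As one illustration of such an uncertainty calculus, the Python sketch below applies Dempster's rule of combination to two independent sources that report on the same taxon-phenotype statement. How uncertainty factors would actually be assigned and combined in Phenoscape is not specified here; the mass values and the names <code>dempster_combine</code>, <code>TRUE</code>, <code>FALSE</code> and <code>EITHER</code> are assumptions made for this example.<br />

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to a mass,
    and the masses of one function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass assigned to contradictory pairs
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise by the non-conflicting mass
    return {h: mass / (1.0 - conflict) for h, mass in combined.items()}


TRUE = frozenset({"exhibits"})
FALSE = frozenset({"does_not_exhibit"})
EITHER = TRUE | FALSE  # ignorance: mass not committed to either outcome

# Two hypothetical sources supporting "Taxon T exhibits phenotype P"
source_1 = {TRUE: 0.6, EITHER: 0.4}
source_2 = {TRUE: 0.7, EITHER: 0.3}

combined = dempster_combine(source_1, source_2)
print(round(combined[TRUE], 2), round(combined[EITHER], 2))
```

With these illustrative masses the combined support for the statement is higher than that of either source alone, and the remaining mass stays uncommitted.<br />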
<br />
=====What needs to change?=====<br />
<br />
Curators will have to manually enter uncertainty factors (UFs) of the assertions in the Phenex UI, which needs modification to handle these. The character matrix format needs to be modified to accommodate UFs. The data loader module needs to use reified statements around assertions to store UFs. The OBD reasoner will have to be augmented with an implementation of uncertainty calculus. The query module needs to retrieve the UF associated with every assertion, and export this in a modified JSON format. Lastly, the UI will have to add a provision to display uncertainty factors.<br />
<br />
== The problem with absence of features==<br />
<br />
Descriptions of phenotypes as used in the Phenoscape project (and a plethora of phenomena in the real world) are replete with exceptions, or aberrations from what is considered to be "normal." While canonical ontologies like the [http://sig.biostr.washington.edu/projects/fm/ FMA] and the [http://www.berkeleybop.org/ontologies/obo-all/teleost_anatomy/teleost_anatomy.obo TAO] contain ontological definitions of ideal specimens, observations in the life sciences are full of aberrations to these general rules.<br />
<br />
Phenoscape has some typical issues dealing with absence of anatomical features in certain species of Ostariophysian fishes. For example, the basihyal cartilage is found in all species of Ostariophysian fishes, except the Siluriformes. At present, this information is captured in Phenoscape using the combination of the PATO term for "absent in organism" (PATO:0000462), the "inheres_in" relation from the OBO Relations Ontology, the TAO term for "basihyal cartilage" (TAO:0001510), the "exhibits" relation from the PHENOSCAPE ontology, and the TTO term for Siluriformes (TTO:1380). This is shown below.<br />
<br />
<javascript><br />
TTO:1380 PHENOSCAPE:exhibits PATO:0000462^OBO_REL:inheres_in(TAO:0001510)<br />
</javascript><br />
<br />
In plain English, this translates to "Siluriformes exhibit absence in organism which inheres in basihyal cartilage." The semantics of this sentence are vague to say the least. Going by this methodology, it is impossible to state that basihyal cartilage is absent in Siluriformes without referring to ''at least one'' instance of basihyal cartilage. Combining a quality ''absent'' with a ''feature'' through the ''inheres_in'' property is very misleading in itself (ex: absence inheres in cartilage), contorting the intrinsic semantics of the ''inheres_in'' relation. These problems have been discussed in [http://www.ncbi.nlm.nih.gov/pubmed/17369081 Ceusters et al] and [http://www.biomedcentral.com/1471-2105/8/377 Hoehndorf et al]. Both these publications propose solutions to integrate these aberrant observations with canonical definitions, without causing inconsistencies in reasoning procedures.<br />
<br />
[[Media:PhenotypesInPhenoscape.ppt]]<br />
<br />
[[Discussion about the Absence of Phenotypes issue]]<br />
<br />
Another issue specific to the Phenoscape project was raised by Paula at the SICB workshop. Given that basihyal cartilage is absent in Siluriformes, basihyal bone should be absent in Siluriformes as well. This is because basihyal bone develops from basihyal cartilage. This may be inferred by adding the new relation chaining rule shown below to the OBD reasoner.<br />
<br />
'''Rule:''' <math>\forall</math>F1, F2, S: ''absent_in''(F1, S) <math>\and</math> ''develops_from''(F2, F1) <math>\Rightarrow</math> ''absent_in''(F2, S)<br />
<br />
This relation chain corresponds to the observation GIVEN THAT Basihyal_Cartilage ''absent_in'' Siluriformes AND Basihyal_Bone ''develops_from'' Basihyal_cartilage, THEN Basihyal_Bone ''absent_in'' Siluriformes. This and other similar relation chains (as per identified requirements) are to be implemented for the Phenoscape project in the future. Strategies to deal with absent features in general are also to be implemented in the near future.<br />
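The relation chain can be sketched in Python as follows. The function name <code>propagate_absence</code> and the tuple layout are hypothetical; since this rule is not yet implemented for Phenoscape, the code is only an illustration of the rule as stated.<br />

```python
def propagate_absence(absent_in, develops_from):
    """Apply: absent_in(F1, S) and develops_from(F2, F1) => absent_in(F2, S).

    absent_in:     set of (feature, taxon) pairs
    develops_from: set of (later_feature, earlier_feature) pairs
    Repeats until no new pair is added, so chains of development
    (F3 develops from F2, which develops from F1) are followed as well.
    """
    absent = set(absent_in)
    changed = True
    while changed:
        changed = False
        for later, earlier in develops_from:
            for feature, taxon in list(absent):
                if feature == earlier and (later, taxon) not in absent:
                    absent.add((later, taxon))
                    changed = True
    return absent


absent = propagate_absence(
    absent_in={("basihyal cartilage", "Siluriformes")},
    develops_from={("basihyal bone", "basihyal cartilage")},
)
print(("basihyal bone", "Siluriformes") in absent)
```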
<br />
Differences between the [[The exhibits relation conundrum|existing semantics]] and [[Relating taxa to phenotypes|desired semantics]] of the ''exhibits'' relation need to be resolved to address this issue. Potential strategies to implement the absence of features problem are discussed [[Novel reasoning strategies|here]].<br />
<br />
[[Category:EQ Annotation]]<br />
[[Category:Informatics]]<br />
[[Category:Ontology]]<br />
[[Category:Queries]]<br />
[[Category:Reasoning]]<br />
[[Category:Data]]<br />
[[Category:Curation]]<br />
[[Category:Taxonomy]]<br />
[[Category:OBD]]</div>Crk18https://wiki.phenoscape.org/wg/phenoscape/index.php?title=Logic_and_Reasoning_Challenges&diff=7130Logic and Reasoning Challenges2010-02-22T16:57:17Z<p>Crk18: /* A possible simpler solution (Update: Feb 22, 2010) */</p>
<hr />
<div>This page discusses issues to be resolved in the near future. These issues pertain to relation semantics as well as inference procedures.<br />
<br />
==Inferring in both directions on the taxonomy==<br />
<br />
It is desired that annotations to higher taxa in the taxonomy be propagated to the lower taxa that are subsumed by the higher taxon; i.e. classical top down inferences. Given that the reasoner already reasons bottom upward, associating phenotype annotations from the lower level taxa to the higher level taxa, adding top-down inferencing may cause widespread inconsistencies in the data if unchecked.<br />
<br />
The OBD reasoner can reason from annotations at the lower levels of the taxonomy to the higher levels. Given that ''Danio rerio'' exhibits a phenotype P, the OBD reasoner infers that ''Danio'' exhibits the same phenotype P. This is reasoning up the taxonomy, using the subsumption relationship between ''Danio rerio'' and ''Danio''. This is possible because the annotations to each taxon are (implicitly) existentially quantified. The annotation "''Danio rerio'' exhibits increased length of maxillary barbel towards orbit" is shown in (1). The semantics are in (2).<br />
<br />
<javascript><br />
TTO:1001979 PHENOSCAPE:exhibits PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967) -- (1)<br />
</javascript><br />
<br />
<br />
<math>\exists</math> X : ''instance_of''(X, TTO:1001979) <math>\and</math> ''PHENOSCAPE:exhibits''(X, PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967)) -- (2)<br />
<br />
<br />
Given that ''Danio rerio'' (TTO:1001979) is subsumed by the genus ''Danio'' (TTO:101040) in the Teleost Taxonomy as shown in (3), it is possible to infer that "''Danio'' exhibits increased length of maxillary barbel towards orbit" (4).<br />
<br />
<javascript><br />
TTO:1001979 OBO_REL:is_a TTO:101040 -- (3)<br />
TTO:101040 PHENOSCAPE:exhibits PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967) -- (4)<br />
</javascript><br />
<br />
Inferring down the taxonomy, that is using assertions at higher levels to extract inferences at lower levels, requires universal quantification. For example, the assertion that all "Siluriformes exhibit decreased width of mesethmoid bone" can be captured using OBD semantics as shown in (5). The universal semantics of this assertion is shown in (6). Siluriformes directly subsumes Ictaluridae as shown in (7). From (5) and (7), it is straightforward to infer that "Ictaluridae exhibit decreased width of mesethmoid bone" as shown in (8).<br />
<br />
<javascript><br />
TTO:1380 PHENOSCAPE:exhibits PATO:0000599^OBO_REL:inheres_in(TAO:0000323) -- (5)<br />
</javascript><br />
<br />
<br />
<math>\forall</math> X : ''instance_of''(X, TTO:1380) <math>\Rightarrow</math> ''PHENOSCAPE:exhibits''(X, PATO:0000599^OBO_REL:inheres_in(TAO:0000323)) -- (6)<br />
<br />
<br />
<javascript><br />
TTO:10930 OBO_REL:is_a TTO:1380 -- (7)<br />
TTO:10930 PHENOSCAPE:exhibits PATO:0000599^OBO_REL:inheres_in(TAO:0000323) -- (8)<br />
</javascript><br />
<br />
The problem with using top-down inferences using universally quantified statements is that currently there is no way to distinguish these from existentially quantified statements. We use the ''PHENOSCAPE:exhibits'' relation for existentially quantified statements. Using the same relation for universally quantified statements would make it possible to extract incorrect inferences given the current configuration. Consider the subsumption relationship between ''Danio'' and ''Danio choprai'' shown in (9). If there is no distinction between existentially and universally quantified statements, it is possible to infer from (9) and (4) the erroneous conclusion that "''Danio choprai'' exhibits increased length of maxillary barbel towards orbit" (10). At present, there are no annotations to ''Danio choprai''.<br />
<br />
<javascript><br />
TTO:1052801 OBO_REL:is_a TTO:101040 -- (9)<br />
TTO:1052801 PHENOSCAPE:exhibits PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967) -- (10)<br />
</javascript><br />
<br />
Recall that the reasoner works in sweeps. It extracts one set of inferences (Inf-1) from the assertions (A) in its first sweep. In the next sweep, the reasoner pulls out a different set of inferences (Inf-2) from the assertions A '''AS WELL AS''' the inferences Inf-1 from the previous sweep. The reasoner repeats these sweeps until no new inferences are added. This is why the reasoner will likely infer all taxa exhibit all phenotypes if it is used to reason both up and down the taxonomy without checking for universal and existential semantics.<br />
<br />
===Possible solutions===<br />
<br />
In this section, we discuss possible approaches to resolving this issue with reasoning both up and down the taxonomy.<br />
<br />
====Different relations for different purposes====<br />
<br />
In classical first-order logic (FOL), all relations and properties asserted upon concepts (or taxa in the case of Phenoscape) are inherited by the subsumed concepts. This is because by default, all assertions about the concepts are universally quantified, i.e. hold true for ALL instances of the concept. If all cars have four wheels, and if all SUVs are cars, then all SUVs have four wheels. This is the way of top-down, classical FOL inferencing.<br />
<br />
In Phenoscape, we have adopted the OBD schema of modeling concepts, wherein all assertions to the concepts are existentially quantified, i.e. the assertion is true for at least one instance of the concept. This is very convenient for the life sciences, where exceptions are so prevalent. As a ready example, consider how the duck-billed platypus easily overrules the "all mammals are viviparous" rule. Further, existential quantification allows us to reason up the taxonomy. If some Ostariophysi exhibit round fins, and all Ostariophysi are Teleostei, then some Teleostei exhibit round fins.<br />
<br />
By default, we use the ''PHENOSCAPE:exhibits'' relation to link taxa to phenotypes using existential semantics. Using the same relation to model universally quantified relationships between taxa and phenotypes, would cause incorrect inferencing and loss of data integrity. The easiest way to address this issue is to use different relations; one for universally quantified relations and the other for existentially quantified relations. Let us call these relations ''PHENOSCAPE:all_exhibit'' and ''PHENOSCAPE:some_exhibit'' respectively.<br />
<br />
Currently, the OBD reasoner uses the following rule (Rule-1) to extract inferences up the taxonomy using the ''PHENOSCAPE:exhibits'' relation.<br />
<br />
'''Rule-1:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''exhibits''(A, x) <math>\Rightarrow</math> ''exhibits''(B, x)<br />
<br />
This can be replaced with the following two rules, which use the two new relations, ''PHENOSCAPE:all_exhibit'' and ''PHENOSCAPE:some_exhibit''. (Please suggest better names for these if you can think of them).<br />
<br />
'''Rule-2:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''some_exhibit''(A, x) <math>\Rightarrow</math> ''some_exhibit''(B, x)<br />
<br />
'''Rule-3:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''all_exhibit''(B, x) <math>\Rightarrow</math> ''all_exhibit''(A, x)<br />
<br />
This will keep the inferences from getting mixed up. Let us consider the scenario where species Sp1 and Sp2 (from genus Gen1) are asserted to exhibit phenotype Phen1. These assertions are shown in (A-1) and (A-2). The subsumption relations are shown in (A-3) and (A-4)<br />
<br />
<javascript><br />
Sp1 PHENOSCAPE:some_exhibit Phen1 -- (A-1)<br />
Sp2 PHENOSCAPE:some_exhibit Phen1 -- (A-2)<br />
Sp1 OBO_REL:is_a Gen1 -- (A-3)<br />
Sp2 OBO_REL:is_a Gen1 -- (A-4)<br />
</javascript><br />
<br />
The reasoner makes the inference (I-1) from the assertions (A-1) ~ (A-4) and the inference rule Rule-2.<br />
<br />
<javascript><br />
Gen1 PHENOSCAPE:some_exhibit Phen1 -- (I-1)<br />
</javascript><br />
<br />
Now, given this new inference (I-1), the reasoner cannot infer that all the species Sp1, Sp2, and let us say 10 other species Sp3 ~ Sp12 also exhibit Phen1, because the inference rule for ''some_exhibit'' cannot be used to infer down the taxonomy. Next, consider the assertion that ALL instances of genus Gen1 exhibit a phenotype Phen2, as shown in (A-5).<br />
<br />
<javascript><br />
Gen1 PHENOSCAPE:all_exhibit Phen2 -- (A-5)<br />
</javascript><br />
<br />
Given (A-5) and all the subsumption relations between Gen1 and the hypothetical twelve species under Gen1 (including A-3 and A-4), the reasoner uses inference rule Rule-3 to infer (I-2) ~ (I-13)<br />
<br />
<javascript><br />
Sp1 PHENOSCAPE:all_exhibit Phen2 -- (I-2)<br />
Sp2 PHENOSCAPE:all_exhibit Phen2 -- (I-3)<br />
..<br />
..<br />
Sp12 PHENOSCAPE:all_exhibit Phen2 -- (I-13)<br />
</javascript><br />
<br />
Again, cyclical inferences are ruled out because there are no inference rules to infer up the taxonomy using the ''all_exhibit'' relation.<br />
<br />
=====What has to change?=====<br />
<br />
To implement this strategy, two new relations can be defined in the Phenoscape Vocab ontology, where the current definition of the ''PHENOSCAPE:exhibits'' relation is found. At the curation level, curators have to qualify their assertions as being either existentially or universally quantified. Specifically, the Phenex UI could tap the curator's shoulder and ask, "Ahem, does this annotation hold true for all specimens belonging to this taxon or just some specimens?" This needs some changes (no less!) to the Phenex interface and also to the character matrix format in which the data is exported. The data loader module of Phenoscape has to know this information so that the appropriate relation is used in creating the taxon-phenotype statement to be loaded into the knowledgebase. The query module will have to be modified to retrieve both inferred and asserted taxon-phenotype statements using the two different relations. The JSON format in which the data is exported needs to be modified to accommodate the two different kinds of relation statements, and lastly the UI will have to explicitly distinguish between the two.<br />
<br />
======A possible simpler solution (Update: Feb 22, 2010)======<br />
<br />
It is possible to check the rank of the taxon to which the phenotype assertion is made. If the rank of the taxon is not "species", then the new relation can be used in the top-down reasoning as shown below. Assertion (A-6) is a phenotype assertion to a taxon of a higher rank, let us say a Genus G1. Now, G1 has 6 species S1 ~ S6. Inferences to S1 ~ S6 from the assertion (A-6) may use the new relation ''all_exhibit'' as shown.<br />
<br />
The new rule is shown in (Rule-4).<br />
<br />
'''Rule-4:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''exhibits''(B, x) <math>\and</math> ¬''rank''(B, "species") <math>\Rightarrow</math> ''all_exhibit''(A, x)<br />
<br />
====Probabilistic assertions====<br />
<br />
Uncertainties are everywhere in the life sciences. The taxon-phenotype assertions can be augmented with uncertainty factors to address this issue. Inferences could use uncertainty calculi such as the [http://en.wikipedia.org/wiki/Dempster%E2%80%93Shafer_theory Dempster-Shafer method] or Bayes' rule of conditional probability to derive uncertainty factors of the inferences given the uncertainty factors of the assertions.<br />
<br />
The advantage of this strategy will be that we can continue to use the ''PHENOSCAPE:exhibits'' relation for taxon-phenotype statements, and at the same time display the uncertainty values associated with every assertion displayed at the UI; far more intuitive than "Taxon T exhibits increased size of E AND decreased size of E."<br />
<br />
=====What needs to change?=====<br />
<br />
Curators will have to manually enter uncertainty factors (UFs) of the assertions in the Phenex UI, which needs modification to handle these. The character matrix format needs to be modified to accommodate UFs. The data loader module needs to use reified statements around assertions to store UFs. The OBD reasoner will have to be augmented with an implementation of uncertainty calculus. The query module needs to retrieve the UF associated with every assertion, and export this in a modified JSON format. Lastly, the UI will have to add a provision to display uncertainty factors.<br />
<br />
== The problem with absence of features==<br />
<br />
Descriptions of phenotypes as used in the Phenoscape project (and a plethora of phenomena in the real world) are replete with exceptions, or aberrations from what is considered to be "normal." While canonical ontologies like the [http://sig.biostr.washington.edu/projects/fm/ FMA] and the [http://www.berkeleybop.org/ontologies/obo-all/teleost_anatomy/teleost_anatomy.obo TAO] contain ontological definitions of ideal specimens, observations in the life sciences are full of aberrations to these general rules.<br />
<br />
Phenoscape has some typical issues dealing with absence of anatomical features in certain species of Ostariophysian fishes. For example, the basihyal cartilage is found in all species of Ostariophysian fishes, except the Siluriformes. At present, this information is captured in Phenoscape using the combination of the PATO term for "absent in organism" (PATO:0000462), the "inheres_in" relation from the OBO Relations Ontology, the TAO term for "basihyal cartilage" (TAO:0001510), the "exhibits" relation from the PHENOSCAPE ontology, and the TTO term for Siluriformes (TTO:1380). This is shown below.<br />
<br />
<javascript><br />
TTO:1380 PHENOSCAPE:exhibits PATO:0000462^OBO_REL:inheres_in(TAO:0001510)<br />
</javascript><br />
<br />
In plain English, this translates to "Siluriformes exhibit absence in organism which inheres in basihyal cartilage." The semantics of this sentence are vague to say the least. Going by this methodology, it is impossible to state that basihyal cartilage is absent in Siluriformes without referring to ''at least one'' instance of basihyal cartilage. Combining a quality ''absent'' with a ''feature'' through the ''inheres_in'' property is very misleading in itself (ex: absence inheres in cartilage), contorting the intrinsic semantics of the ''inheres_in'' relation. These problems have been discussed in [http://www.ncbi.nlm.nih.gov/pubmed/17369081 Ceusters et al] and [http://www.biomedcentral.com/1471-2105/8/377 Hoehndorf et al]. Both these publications propose solutions to integrate these aberrant observations with canonical definitions, without causing inconsistencies in reasoning procedures.<br />
<br />
[[Media:PhenotypesInPhenoscape.ppt]]<br />
<br />
[[Discussion about the Absence of Phenotypes issue]]<br />
<br />
Another issue specific to the Phenoscape project was raised by Paula at the SICB workshop. Given that basihyal cartilage is absent in Siluriformes, basihyal bone should be absent in Siluriformes as well. This is because basihyal bone develops from basihyal cartilage. This may be inferred by adding the new relation chaining rule shown below to the OBD reasoner.<br />
<br />
'''Rule:''' <math>\forall</math>F1, F2, S: ''absent_in''(F1, S) <math>\and</math> ''develops_from''(F2, F1) <math>\Rightarrow</math> ''absent_in''(F2, S)<br />
<br />
This relation chain corresponds to the observation GIVEN THAT Basihyal_Cartilage ''absent_in'' Siluriformes AND Basihyal_Bone ''develops_from'' Basihyal_cartilage, THEN Basihyal_Bone ''absent_in'' Siluriformes. This and other similar relation chains (as per identified requirements) are to be implemented for the Phenoscape project in the future. Strategies to deal with absent features in general are also to be implemented in the near future.<br />
<br />
Differences between the [[The exhibits relation conundrum|existing semantics]] and [[Relating taxa to phenotypes|desired semantics]] of the ''exhibits'' relation need to be resolved to address this issue. Potential strategies to implement the absence of features problem are discussed [[Novel reasoning strategies|here]].<br />
<br />
[[Category:EQ Annotation]]<br />
[[Category:Informatics]]<br />
[[Category:Ontology]]<br />
[[Category:Queries]]<br />
[[Category:Reasoning]]<br />
[[Category:Data]]<br />
[[Category:Curation]]<br />
[[Category:Taxonomy]]<br />
[[Category:OBD]]</div>Crk18https://wiki.phenoscape.org/wg/phenoscape/index.php?title=Logic_and_Reasoning_Challenges&diff=7128Logic and Reasoning Challenges2010-02-22T16:50:19Z<p>Crk18: /* A possible simpler solution (Update: Feb 22, 2010) */</p>
<hr />
<div>This page discusses issues to be resolved in the near future. These issues pertain to relation semantics as well as inference procedures.<br />
<br />
==Inferring in both directions on the taxonomy==<br />
<br />
We want annotations on a higher taxon to be propagated to the lower taxa that it subsumes, i.e. classical top-down inference. Because the reasoner already reasons bottom-up, associating phenotype annotations on lower-level taxa with the higher-level taxa, adding top-down inference without safeguards may cause widespread inconsistencies in the data.<br />
<br />
The OBD reasoner can reason from annotations at the lower levels of the taxonomy to the higher levels. Given that ''Danio rerio'' exhibits a phenotype P, the OBD reasoner infers that ''Danio'' exhibits the same phenotype P. This is reasoning up the taxonomy, using the subsumption relationship between ''Danio rerio'' and ''Danio''. This is possible because the annotations to each taxon are (implicitly) existentially quantified. The annotation "''Danio rerio'' exhibits increased length of maxillary barbel towards orbit" is shown in (1). The semantics are in (2).<br />
<br />
<javascript><br />
TTO:1001979 PHENOSCAPE:exhibits PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967) -- (1)<br />
</javascript><br />
<br />
<br />
<math>\exists</math> X : ''instance_of''(X, TTO:1001979) <math>\and</math> ''PHENOSCAPE:exhibits''(X, PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967)) -- (2)<br />
<br />
<br />
Given that ''Danio rerio'' (TTO:1001979) is subsumed by the genus ''Danio'' (TTO:101040) in the Teleost Taxonomy as shown in (3), it is possible to infer that "''Danio'' exhibits increased length of maxillary barbel towards orbit" (4).<br />
<br />
<javascript><br />
TTO:1001979 OBO_REL:is_a TTO:101040 -- (3)<br />
TTO:101040 PHENOSCAPE:exhibits PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967) -- (4)<br />
</javascript><br />
<br />
Inferring down the taxonomy, that is, using assertions at higher levels to derive inferences at lower levels, requires universal quantification. For example, the assertion that "all Siluriformes exhibit decreased width of mesethmoid bone" can be captured using OBD semantics as shown in (5). The universal semantics of this assertion are shown in (6). Siluriformes directly subsumes Ictaluridae as shown in (7). From (5) and (7), it is straightforward to infer that "Ictaluridae exhibit decreased width of mesethmoid bone" as shown in (8).<br />
<br />
<javascript><br />
TTO:1380 PHENOSCAPE:exhibits PATO:0000599^OBO_REL:inheres_in(TAO:0000323) -- (5)<br />
</javascript><br />
<br />
<br />
<math>\forall</math> X : ''instance_of''(X, TTO:1380) <math>\Rightarrow</math> ''PHENOSCAPE:exhibits''(X, PATO:0000599^OBO_REL:inheres_in(TAO:0000323)) -- (6)<br />
<br />
<br />
<javascript><br />
TTO:10930 OBO_REL:is_a TTO:1380 -- (7)<br />
TTO:10930 PHENOSCAPE:exhibits PATO:0000599^OBO_REL:inheres_in(TAO:0000323) -- (8)<br />
</javascript><br />
<br />
The problem with using top-down inferences using universally quantified statements is that currently there is no way to distinguish these from existentially quantified statements. We use the ''PHENOSCAPE:exhibits'' relation for existentially quantified statements. Using the same relation for universally quantified statements would make it possible to extract incorrect inferences given the current configuration. Consider the subsumption relationship between ''Danio'' and ''Danio choprai'' shown in (9). If there is no distinction between existentially and universally quantified statements, it is possible to infer from (9) and (4) the erroneous conclusion that "''Danio choprai'' exhibits increased length of maxillary barbel towards orbit" (10). At present, there are no annotations to ''Danio choprai''.<br />
<br />
<javascript><br />
TTO:1052801 OBO_REL:is_a TTO:101040 -- (9)<br />
TTO:1052801 PHENOSCAPE:exhibits PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967) -- (10)<br />
</javascript><br />
<br />
Recall that the reasoner works in sweeps. It extracts one set of inferences (Inf-1) from the assertions (A) in its first sweep. In the next sweep, the reasoner pulls out a different set of inferences (Inf-2) from the assertions A '''AS WELL AS''' the inferences Inf-1 from the previous sweep. The reasoner repeats these sweeps until no new inferences are added. This is why the reasoner will likely infer all taxa exhibit all phenotypes if it is used to reason both up and down the taxonomy without checking for universal and existential semantics.<br />
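<br />
The saturation effect can be illustrated with a minimal sketch. It is not the OBD reasoner's implementation: the function name <code>sweep_to_fixpoint</code> and the tuple-based data layout are assumptions made for illustration only. It applies the bottom-up rule and an unchecked top-down rule to a single relation, sweep after sweep, until no new statement appears.<br />

```python
def sweep_to_fixpoint(is_a, exhibits):
    """Run reasoner-style sweeps until a sweep adds nothing new.

    is_a:     set of (child_taxon, parent_taxon) pairs
    exhibits: set of (taxon, phenotype) pairs, all stored under ONE relation
    (Hypothetical data layout; the real OBD schema is not modeled here.)
    """
    facts = set(exhibits)
    sweeps = 0
    while True:
        sweeps += 1
        new = set()
        for child, parent in is_a:
            for taxon, phenotype in facts:
                if taxon == child:
                    # bottom-up step (existential reading)
                    new.add((parent, phenotype))
                if taxon == parent:
                    # unchecked top-down step (universal reading)
                    new.add((child, phenotype))
        if new <= facts:
            return facts, sweeps
        facts |= new


is_a = {("Danio rerio", "Danio"), ("Danio choprai", "Danio")}
asserted = {("Danio rerio", "increased length of maxillary barbel towards orbit")}
facts, sweeps = sweep_to_fixpoint(is_a, asserted)
```

In this toy run the single assertion about ''Danio rerio'' reaches ''Danio'' in the first sweep and ''Danio choprai'' in the second, which is the erroneous inference (10).<br />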
<br />
===Possible solutions===<br />
<br />
In this section, we discuss possible approaches to resolving this issue with reasoning both up and down the taxonomy.<br />
<br />
====Different relations for different purposes====<br />
<br />
In classical first-order logic (FOL), all relations and properties asserted upon concepts (or taxa in the case of Phenoscape) are inherited by the subsumed concepts. This is because by default, all assertions about the concepts are universally quantified, i.e. hold true for ALL instances of the concept. If all cars have four wheels, and if all SUVs are cars, then all SUVs have four wheels. This is the way of top-down, classical FOL inferencing.<br />
<br />
In Phenoscape, we have adopted the OBD schema of modeling concepts, wherein all assertions about concepts are existentially quantified, i.e. the assertion is true of at least one instance of the concept. This is very convenient for the life sciences, where exceptions are so prevalent. As a ready example, consider how the duck-billed platypus, an egg-laying mammal, overrules the "all mammals are viviparous" rule. Further, existential quantification allows us to reason up the taxonomy. If some Ostariophysi exhibit round fins, and all Ostariophysi are Teleostei, then some Teleostei exhibit round fins.<br />
<br />
By default, we use the ''PHENOSCAPE:exhibits'' relation to link taxa to phenotypes using existential semantics. Using the same relation to model universally quantified relationships between taxa and phenotypes would cause incorrect inferences and loss of data integrity. The easiest way to address this issue is to use two different relations: one for universally quantified statements and the other for existentially quantified statements. Let us call these relations ''PHENOSCAPE:all_exhibit'' and ''PHENOSCAPE:some_exhibit'' respectively.<br />
<br />
Currently, the OBD reasoner uses the following rule (Rule-1) to extract inferences up the taxonomy using the ''PHENOSCAPE:exhibits'' relation.<br />
<br />
'''Rule-1:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''exhibits''(A, x) <math>\Rightarrow</math> ''exhibits''(B, x)<br />
<br />
This can be replaced with the following two rules, which use the two new relations, ''PHENOSCAPE:all_exhibit'' and ''PHENOSCAPE:some_exhibit''. (Please suggest better names for these if you can think of them).<br />
<br />
'''Rule-2:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''some_exhibit''(A, x) <math>\Rightarrow</math> ''some_exhibit''(B, x)<br />
<br />
'''Rule-3:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''all_exhibit''(B, x) <math>\Rightarrow</math> ''all_exhibit''(A, x)<br />
<br />
This will keep the inferences from getting mixed up. Let us consider the scenario where species Sp1 and Sp2 (from genus Gen1) are asserted to exhibit phenotype Phen1. These assertions are shown in (A-1) and (A-2). The subsumption relations are shown in (A-3) and (A-4).<br />
<br />
<javascript><br />
Sp1 PHENOSCAPE:some_exhibit Phen1 -- (A-1)<br />
Sp2 PHENOSCAPE:some_exhibit Phen1 -- (A-2)<br />
Sp1 OBO_REL:is_a Gen1 -- (A-3)<br />
Sp2 OBO_REL:is_a Gen1 -- (A-4)<br />
</javascript><br />
<br />
The reasoner makes the inference (I-1) from the assertions (A-1) ~ (A-4) and the inference rule Rule-2.<br />
<br />
<javascript><br />
Gen1 PHENOSCAPE:some_exhibit Phen1 -- (I-1)<br />
</javascript><br />
<br />
Now, given this new inference (I-1), the reasoner cannot infer that the species Sp1 and Sp2, together with, let us say, 10 other species Sp3 ~ Sp12, all exhibit Phen1, because the inference rule for ''some_exhibit'' cannot be used to infer down the taxonomy. Next, consider the assertion that ALL instances of genus Gen1 exhibit a phenotype Phen2, as shown in (A-5).<br />
<br />
<javascript><br />
Gen1 PHENOSCAPE:all_exhibit Phen2 -- (A-5)<br />
</javascript><br />
<br />
Given (A-5) and all the subsumption relations between Gen1 and the hypothetical twelve species under Gen1 (including A-3 and A-4), the reasoner uses inference rule Rule-3 to infer (I-2) ~ (I-13)<br />
<br />
<javascript><br />
Sp1 PHENOSCAPE:all_exhibit Phen2 -- (I-2)<br />
Sp2 PHENOSCAPE:all_exhibit Phen2 -- (I-3)<br />
..<br />
..<br />
Sp12 PHENOSCAPE:all_exhibit Phen2 -- (I-13)<br />
</javascript><br />
<br />
Again, cyclical inferences are ruled out because there are no inference rules to infer up the taxonomy using the ''all_exhibit'' relation.<br />
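<br />
The scenario above can be sketched as follows. This is only an illustration under stated assumptions: the function <code>infer_with_two_relations</code> and the set-of-tuples representation are hypothetical, and only three of the twelve species are listed to keep the example short. Rule-2 is applied only to ''some_exhibit'' statements and Rule-3 only to ''all_exhibit'' statements.<br />

```python
def infer_with_two_relations(is_a, some_exhibit, all_exhibit):
    """Apply Rule-2 (upward) and Rule-3 (downward) until nothing changes.

    is_a:         set of (child_taxon, parent_taxon) pairs
    some_exhibit: set of (taxon, phenotype) pairs, existential statements
    all_exhibit:  set of (taxon, phenotype) pairs, universal statements
    (Hypothetical representation, not the OBD data model.)
    """
    some = set(some_exhibit)
    every = set(all_exhibit)
    changed = True
    while changed:
        changed = False
        for child, parent in is_a:
            # Rule-2: is_a(A, B) and some_exhibit(A, x) => some_exhibit(B, x)
            for taxon, phenotype in list(some):
                if taxon == child and (parent, phenotype) not in some:
                    some.add((parent, phenotype))
                    changed = True
            # Rule-3: is_a(A, B) and all_exhibit(B, x) => all_exhibit(A, x)
            for taxon, phenotype in list(every):
                if taxon == parent and (child, phenotype) not in every:
                    every.add((child, phenotype))
                    changed = True
    return some, every


is_a = {("Sp1", "Gen1"), ("Sp2", "Gen1"), ("Sp3", "Gen1")}
some, every = infer_with_two_relations(
    is_a,
    some_exhibit={("Sp1", "Phen1"), ("Sp2", "Phen1")},  # (A-1), (A-2)
    all_exhibit={("Gen1", "Phen2")},                    # (A-5)
)
```

Here Gen1 gains a ''some_exhibit'' statement for Phen1, every listed species gains an ''all_exhibit'' statement for Phen2, and Sp3 never acquires Phen1 because no rule moves ''some_exhibit'' statements downward.<br />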
<br />
=====What has to change?=====<br />
<br />
To implement this strategy, two new relations can be defined in the Phenoscape Vocab ontology, where the current definition of the ''PHENOSCAPE:exhibits'' relation is found. At the curation level, curators have to qualify their assertions as being either existentially or universally quantified. Specifically, the Phenex UI could tap the curator's shoulder and ask, "Ahem, does this annotation hold true for all specimens belonging to this taxon or just some specimens?" This needs some changes (no less!) to the Phenex interface and also to the character matrix format in which the data is exported. The data loader module of Phenoscape has to know this information so that the appropriate relation is used in creating the taxon-phenotype statement to be loaded into the knowledgebase. The query module will have to be modified to retrieve both inferred and asserted taxon-phenotype statements using the two different relations. The JSON format in which the data is exported needs to be modified to accommodate the two different kinds of relation statements, and lastly the UI will have to explicitly distinguish between the two.<br />
<br />
======A possible simpler solution (Update: Feb 22, 2010)======<br />
<br />
It is possible to check the rank of the taxon to which the phenotype assertion is made. If the rank of the taxon is not "species", then the new relation can be used in the top-down reasoning as shown below. Assertion (A-6) is a phenotype assertion to a taxon of a higher rank, let us say a Genus G1. Now, G1 has 6 species S1 ~ S6. Inferences to S1 ~ S6 from the assertion (A-6) may use the new relation ''all_exhibit'' as shown.<br />
<br />
The new rules are shown in (Rule-4) and (Rule-5)<br />
<br />
'''Rule-4:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''exhibits''(B, x) <math>\and </math> <math>\neg</math> ''rank''(B, "species") <math>\Rightarrow</math> ''all_exhibit''(A, x)<br />
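<br />
A minimal sketch of Rule-4 follows. It rests on assumptions that are not settled on this page: the function name <code>apply_rule_4</code> is hypothetical, the rank of each taxon is assumed to be available as a simple lookup table, and the rule is applied only to ''asserted'' statements (if it also fired on statements inferred bottom-up, the saturation problem described earlier would return).<br />

```python
def apply_rule_4(is_a, rank, asserted_exhibits):
    """Rule-4: is_a(A, B) and exhibits(B, x) and not rank(B, "species")
    => all_exhibit(A, x).

    is_a:              set of (child_taxon, parent_taxon) pairs
    rank:              dict mapping taxon -> rank name (assumed lookup table)
    asserted_exhibits: set of (taxon, phenotype) pairs asserted by curators
    Returns the inferred all_exhibit statements as (taxon, phenotype) pairs.
    """
    inferred = set()
    for child, parent in is_a:
        for taxon, phenotype in asserted_exhibits:
            if taxon == parent and rank.get(parent) != "species":
                inferred.add((child, phenotype))
    return inferred


species = ["S1", "S2", "S3", "S4", "S5", "S6"]
is_a = {(s, "G1") for s in species}
rank = {"G1": "genus", **{s: "species" for s in species}}
asserted = {("G1", "P"), ("S1", "Q")}  # ("G1", "P") plays the role of (A-6)
all_exhibit = apply_rule_4(is_a, rank, asserted)
```

The assertion on the genus G1 yields one ''all_exhibit'' statement per species, while the species-level assertion on S1 produces nothing.<br />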
<br />
====Probabilistic assertions====<br />
<br />
Uncertainties are everywhere in the life sciences. The taxon-phenotype assertions can be augmented with uncertainty factors to address this issue. Inferences could use uncertainty calculi such as the [http://en.wikipedia.org/wiki/Dempster%E2%80%93Shafer_theory Dempster-Shafer method] or Bayes' rule for conditional probability to derive uncertainty factors of the inferences given the uncertainty factors of the assertions.<br />
<br />
The advantage of this strategy will be that we can continue to use the ''PHENOSCAPE:exhibits'' relation for taxon-phenotype statements, and at the same time display the uncertainty values associated with every assertion displayed at the UI; far more intuitive than "Taxon T exhibits increased size of E AND decreased size of E."<br />
<br />
=====What needs to change?=====<br />
<br />
Curators will have to manually enter uncertainty factors (UFs) of the assertions in the Phenex UI, which needs modification to handle these. The character matrix format needs to be modified to accommodate UFs. The data loader module needs to use reified statements around assertions to store UFs. The OBD reasoner will have to be augmented with an implementation of uncertainty calculus. The query module needs to retrieve the UF associated with every assertion, and export this in a modified JSON format. Lastly, the UI will have to add a provision to display uncertainty factors.<br />
<br />
== The problem with absence of features==<br />
<br />
Descriptions of phenotypes as used in the Phenoscape project (and a plethora of phenomena in the real world) are replete with exceptions, or aberrations from what is considered to be "normal." While canonical ontologies like the [http://sig.biostr.washington.edu/projects/fm/ FMA] and the [http://www.berkeleybop.org/ontologies/obo-all/teleost_anatomy/teleost_anatomy.obo TAO] contain ontological definitions of ideal specimens, observations in the life sciences are full of aberrations to these general rules.<br />
<br />
Phenoscape has some typical issues dealing with absence of anatomical features in certain species of Ostariophysian fishes. For example, the basihyal cartilage is found in all species of Ostariophysian fishes, except the Siluriformes. At present, this information is captured in Phenoscape using the combination of the PATO term for "absent in organism" (PATO:0000462), the "inheres_in" relation from the OBO Relations Ontology, the TAO term for "basihyal cartilage" (TAO:0001510), the "exhibits" relation from the PHENOSCAPE ontology, and the TTO term for Siluriformes (TTO:1380). This is shown below.<br />
<br />
<javascript><br />
TTO:1380 PHENOSCAPE:exhibits PATO:0000462^OBO_REL:inheres_in(TAO:0001510)<br />
</javascript><br />
<br />
In plain English, this translates to "Siluriformes exhibit absence in organism which inheres in basihyal cartilage." The semantics of this sentence are vague at best. Under this approach, it is impossible to state that basihyal cartilage is absent in Siluriformes without referring to ''at least one'' instance of basihyal cartilage. Combining the quality ''absent'' with a ''feature'' through the ''inheres_in'' relation is misleading in itself (e.g. "absence inheres in cartilage") and distorts the intrinsic semantics of ''inheres_in''. These problems have been discussed by [http://www.ncbi.nlm.nih.gov/pubmed/17369081 Ceusters et al.] and [http://www.biomedcentral.com/1471-2105/8/377 Hoehndorf et al.] Both publications propose ways to integrate such aberrant observations with canonical definitions without causing inconsistencies in reasoning.<br />
<br />
[[Media:PhenotypesInPhenoscape.ppt]]<br />
<br />
[[Discussion about the Absence of Phenotypes issue]]<br />
<br />
Another issue specific to the Phenoscape project was raised by Paula at the SICB workshop. Given that basihyal cartilage is absent in Siluriformes, basihyal bone should be absent in Siluriformes as well, because basihyal bone develops from basihyal cartilage. This could be inferred by adding the relation-chaining rule shown below to the OBD reasoner.<br />
<br />
'''Rule:'''<math>\forall</math>F1, F2, S: ''absent_in''(F1, S) <math>\and</math> ''develops_from''(F2, F1) <math>\Rightarrow</math> ''absent_in''(F2, S)<br />
<br />
This relation chain corresponds to the observation GIVEN THAT Basihyal_Cartilage ''absent_in'' Siluriformes AND Basihyal_Bone ''develops_from'' Basihyal_cartilage, THEN Basihyal_Bone ''absent_in'' Siluriformes. This and other similar relation chains (as per identified requirements) are to be implemented for the Phenoscape project in the future. Strategies to deal with absent features in general are also to be implemented in the near future.<br />
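<br />
The relation chain can be sketched as below. The function <code>propagate_absence</code> and the pair-based encoding of ''absent_in'' and ''develops_from'' are illustrative assumptions, not part of the OBD reasoner; the loop repeats so that absence also follows longer ''develops_from'' chains.<br />

```python
def propagate_absence(absent_in, develops_from):
    """Apply: absent_in(F1, S) and develops_from(F2, F1) => absent_in(F2, S).

    absent_in:     set of (feature, taxon) pairs
    develops_from: set of (later_feature, earlier_feature) pairs
    (Hypothetical encoding chosen for this sketch.)
    """
    absent = set(absent_in)
    changed = True
    while changed:
        changed = False
        for later, earlier in develops_from:
            for feature, taxon in list(absent):
                if feature == earlier and (later, taxon) not in absent:
                    absent.add((later, taxon))
                    changed = True
    return absent


absent = propagate_absence(
    absent_in={("basihyal cartilage", "Siluriformes")},
    develops_from={("basihyal bone", "basihyal cartilage")},
)
```

With these inputs the result contains both the asserted absence of basihyal cartilage and the inferred absence of basihyal bone in Siluriformes.<br />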
<br />
Differences between the [[The exhibits relation conundrum|existing semantics]] and [[Relating taxa to phenotypes|desired semantics]] of the ''exhibits'' relation need to be resolved to address this issue. Potential strategies to implement the absence of features problem are discussed [[Novel reasoning strategies|here]].<br />
<br />
[[Category:EQ Annotation]]<br />
[[Category:Informatics]]<br />
[[Category:Ontology]]<br />
[[Category:Queries]]<br />
[[Category:Reasoning]]<br />
[[Category:Data]]<br />
[[Category:Curation]]<br />
[[Category:Taxonomy]]<br />
[[Category:OBD]]</div>Crk18https://wiki.phenoscape.org/wg/phenoscape/index.php?title=Logic_and_Reasoning_Challenges&diff=7127Logic and Reasoning Challenges2010-02-22T16:49:57Z<p>Crk18: /* A possible simpler solution (Update: Feb 22, 2010) */</p>
<hr />
<div>This page discusses issues to be resolved in the near future. These issues pertain to relation semantics as well as inference procedures.<br />
<br />
==Inferring in both directions on the taxonomy==<br />
<br />
We want annotations on a higher taxon to be propagated to the lower taxa that it subsumes, i.e. classical top-down inference. Because the reasoner already reasons bottom-up, associating phenotype annotations on lower-level taxa with the higher-level taxa, adding top-down inference without safeguards may cause widespread inconsistencies in the data.<br />
<br />
The OBD reasoner can reason from annotations at the lower levels of the taxonomy to the higher levels. Given that ''Danio rerio'' exhibits a phenotype P, the OBD reasoner infers that ''Danio'' exhibits the same phenotype P. This is reasoning up the taxonomy, using the subsumption relationship between ''Danio rerio'' and ''Danio''. This is possible because the annotations to each taxon are (implicitly) existentially quantified. The annotation "''Danio rerio'' exhibits increased length of maxillary barbel towards orbit" is shown in (1). The semantics are in (2).<br />
<br />
<javascript><br />
TTO:1001979 PHENOSCAPE:exhibits PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967) -- (1)<br />
</javascript><br />
<br />
<br />
<math>\exists</math> X : ''instance_of''(X, TTO:1001979) <math>\and</math> ''PHENOSCAPE:exhibits''(X, PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967)) -- (2)<br />
<br />
<br />
Given that ''Danio rerio'' (TTO:1001979) is subsumed by the genus ''Danio'' (TTO:101040) in the Teleost Taxonomy as shown in (3), it is possible to infer that "''Danio'' exhibits increased length of maxillary barbel towards orbit" (4).<br />
<br />
<javascript><br />
TTO:1001979 OBO_REL:is_a TTO:101040 -- (3)<br />
TTO:101040 PHENOSCAPE:exhibits PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967) -- (4)<br />
</javascript><br />
<br />
Inferring down the taxonomy, that is, using assertions at higher levels to derive inferences at lower levels, requires universal quantification. For example, the assertion that "all Siluriformes exhibit decreased width of mesethmoid bone" can be captured using OBD semantics as shown in (5). The universal semantics of this assertion are shown in (6). Siluriformes directly subsumes Ictaluridae as shown in (7). From (5) and (7), it is straightforward to infer that "Ictaluridae exhibit decreased width of mesethmoid bone" as shown in (8).<br />
<br />
<javascript><br />
TTO:1380 PHENOSCAPE:exhibits PATO:0000599^OBO_REL:inheres_in(TAO:0000323) -- (5)<br />
</javascript><br />
<br />
<br />
<math>\forall</math> X : ''instance_of''(X, TTO:1380) <math>\Rightarrow</math> ''PHENOSCAPE:exhibits''(X, PATO:0000599^OBO_REL:inheres_in(TAO:0000323)) -- (6)<br />
<br />
<br />
<javascript><br />
TTO:10930 OBO_REL:is_a TTO:1380 -- (7)<br />
TTO:10930 PHENOSCAPE:exhibits PATO:0000599^OBO_REL:inheres_in(TAO:0000323) -- (8)<br />
</javascript><br />
<br />
The problem with using top-down inferences using universally quantified statements is that currently there is no way to distinguish these from existentially quantified statements. We use the ''PHENOSCAPE:exhibits'' relation for existentially quantified statements. Using the same relation for universally quantified statements would make it possible to extract incorrect inferences given the current configuration. Consider the subsumption relationship between ''Danio'' and ''Danio choprai'' shown in (9). If there is no distinction between existentially and universally quantified statements, it is possible to infer from (9) and (4) the erroneous conclusion that "''Danio choprai'' exhibits increased length of maxillary barbel towards orbit" (10). At present, there are no annotations to ''Danio choprai''.<br />
<br />
<javascript><br />
TTO:1052801 OBO_REL:is_a TTO:101040 -- (9)<br />
TTO:1052801 PHENOSCAPE:exhibits PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967) -- (10)<br />
</javascript><br />
<br />
Recall that the reasoner works in sweeps. It extracts one set of inferences (Inf-1) from the assertions (A) in its first sweep. In the next sweep, the reasoner pulls out a different set of inferences (Inf-2) from the assertions A '''AS WELL AS''' the inferences Inf-1 from the previous sweep. The reasoner repeats these sweeps until no new inferences are added. This is why the reasoner will likely infer all taxa exhibit all phenotypes if it is used to reason both up and down the taxonomy without checking for universal and existential semantics.<br />
<br />
===Possible solutions===<br />
<br />
In this section, we discuss possible approaches to resolving this issue with reasoning both up and down the taxonomy.<br />
<br />
====Different relations for different purposes====<br />
<br />
In classical first-order logic (FOL), all relations and properties asserted upon concepts (or taxa in the case of Phenoscape) are inherited by the subsumed concepts. This is because by default, all assertions about the concepts are universally quantified, i.e. hold true for ALL instances of the concept. If all cars have four wheels, and if all SUVs are cars, then all SUVs have four wheels. This is the way of top-down, classical FOL inferencing.<br />
<br />
In Phenoscape, we have adopted the OBD schema of modeling concepts, wherein all assertions about concepts are existentially quantified, i.e. the assertion is true of at least one instance of the concept. This is very convenient for the life sciences, where exceptions are so prevalent. As a ready example, consider how the duck-billed platypus, an egg-laying mammal, overrules the "all mammals are viviparous" rule. Further, existential quantification allows us to reason up the taxonomy. If some Ostariophysi exhibit round fins, and all Ostariophysi are Teleostei, then some Teleostei exhibit round fins.<br />
<br />
By default, we use the ''PHENOSCAPE:exhibits'' relation to link taxa to phenotypes using existential semantics. Using the same relation to model universally quantified relationships between taxa and phenotypes would cause incorrect inferences and loss of data integrity. The easiest way to address this issue is to use two different relations: one for universally quantified statements and the other for existentially quantified statements. Let us call these relations ''PHENOSCAPE:all_exhibit'' and ''PHENOSCAPE:some_exhibit'' respectively.<br />
<br />
Currently, the OBD reasoner uses the following rule (Rule-1) to extract inferences up the taxonomy using the ''PHENOSCAPE:exhibits'' relation.<br />
<br />
'''Rule-1:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''exhibits''(A, x) <math>\Rightarrow</math> ''exhibits''(B, x)<br />
<br />
This can be replaced with the following two rules, which use the two new relations, ''PHENOSCAPE:all_exhibit'' and ''PHENOSCAPE:some_exhibit''. (Please suggest better names for these if you can think of them).<br />
<br />
'''Rule-2:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''some_exhibit''(A, x) <math>\Rightarrow</math> ''some_exhibit''(B, x)<br />
<br />
'''Rule-3:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''all_exhibit''(B, x) <math>\Rightarrow</math> ''all_exhibit''(A, x)<br />
<br />
This will keep the inferences from getting mixed up. Let us consider the scenario where species Sp1 and Sp2 (from genus Gen1) are asserted to exhibit phenotype Phen1. These assertions are shown in (A-1) and (A-2). The subsumption relations are shown in (A-3) and (A-4).<br />
<br />
<javascript><br />
Sp1 PHENOSCAPE:some_exhibit Phen1 -- (A-1)<br />
Sp2 PHENOSCAPE:some_exhibit Phen1 -- (A-2)<br />
Sp1 OBO_REL:is_a Gen1 -- (A-3)<br />
Sp2 OBO_REL:is_a Gen1 -- (A-4)<br />
</javascript><br />
<br />
The reasoner makes the inference (I-1) from the assertions (A-1) ~ (A-4) and the inference rule Rule-2.<br />
<br />
<javascript><br />
Gen1 PHENOSCAPE:some_exhibit Phen1 -- (I-1)<br />
</javascript><br />
<br />
Now, given this new inference (I-1), the reasoner cannot infer that the species Sp1 and Sp2, together with, let us say, 10 other species Sp3 ~ Sp12, all exhibit Phen1, because the inference rule for ''some_exhibit'' cannot be used to infer down the taxonomy. Next, consider the assertion that ALL instances of genus Gen1 exhibit a phenotype Phen2, as shown in (A-5).<br />
<br />
<javascript><br />
Gen1 PHENOSCAPE:all_exhibit Phen2 -- (A-5)<br />
</javascript><br />
<br />
Given (A-5) and all the subsumption relations between Gen1 and the hypothetical twelve species under Gen1 (including A-3 and A-4), the reasoner uses inference rule Rule-3 to infer (I-2) ~ (I-13)<br />
<br />
<javascript><br />
Sp1 PHENOSCAPE:all_exhibit Phen2 -- (I-2)<br />
Sp2 PHENOSCAPE:all_exhibit Phen2 -- (I-3)<br />
..<br />
..<br />
Sp12 PHENOSCAPE:all_exhibit Phen2 -- (I-13)<br />
</javascript><br />
<br />
Again, cyclical inferences are ruled out because there are no inference rules to infer up the taxonomy using the ''all_exhibit'' relation.<br />
<br />
=====What has to change?=====<br />
<br />
To implement this strategy, two new relations can be defined in the Phenoscape Vocab ontology, where the current definition of the ''PHENOSCAPE:exhibits'' relation is found. At the curation level, curators have to qualify their assertions as being either existentially or universally quantified. Specifically, the Phenex UI could tap the curator's shoulder and ask, "Ahem, does this annotation hold true for all specimens belonging to this taxon or just some specimens?" This needs some changes (no less!) to the Phenex interface and also to the character matrix format in which the data is exported. The data loader module of Phenoscape has to know this information so that the appropriate relation is used in creating the taxon-phenotype statement to be loaded into the knowledgebase. The query module will have to be modified to retrieve both inferred and asserted taxon-phenotype statements using the two different relations. The JSON format in which the data is exported needs to be modified to accommodate the two different kinds of relation statements, and lastly the UI will have to explicitly distinguish between the two.<br />
<br />
======A possible simpler solution (Update: Feb 22, 2010)======<br />
<br />
It is possible to check the rank of the taxon to which the phenotype assertion is made. If the rank of the taxon is not "species", then the new relation can be used in the top-down reasoning as shown below. Assertion (A-6) is a phenotype assertion to a taxon of a higher rank, let us say a Genus G1. Now, G1 has 6 species S1 ~ S6. Inferences to S1 ~ S6 from the assertion (A-6) may use the new relation ''all_exhibit'' as shown.<br />
<br />
The new rules are shown in (Rule-4) and (Rule-5)<br />
<br />
'''Rule-4:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''exhibits''(B, x) <math>\and </math> <math>\neg</math> ''rank''(B, "species") <math>\Rightarrow</math> ''all_exhibit''(A, x)<br />
<br />
====Probabilistic assertions====<br />
<br />
Uncertainties are everywhere in the life sciences. The taxon-phenotype assertions can be augmented with uncertainty factors to address this issue. Inferences could use uncertainty calculi such as the [http://en.wikipedia.org/wiki/Dempster%E2%80%93Shafer_theory Dempster-Shafer method] or Bayes' rule for conditional probability to derive uncertainty factors of the inferences given the uncertainty factors of the assertions.<br />
<br />
The advantage of this strategy will be that we can continue to use the ''PHENOSCAPE:exhibits'' relation for taxon-phenotype statements, and at the same time display the uncertainty values associated with every assertion displayed at the UI; far more intuitive than "Taxon T exhibits increased size of E AND decreased size of E."<br />
<br />
=====What needs to change?=====<br />
<br />
Curators will have to manually enter uncertainty factors (UFs) of the assertions in the Phenex UI, which needs modification to handle these. The character matrix format needs to be modified to accommodate UFs. The data loader module needs to use reified statements around assertions to store UFs. The OBD reasoner will have to be augmented with an implementation of uncertainty calculus. The query module needs to retrieve the UF associated with every assertion, and export this in a modified JSON format. Lastly, the UI will have to add a provision to display uncertainty factors.<br />
<br />
== The problem with absence of features==<br />
<br />
Descriptions of phenotypes as used in the Phenoscape project (and a plethora of phenomena in the real world) are replete with exceptions, or aberrations from what is considered to be "normal." While canonical ontologies like the [http://sig.biostr.washington.edu/projects/fm/ FMA] and the [http://www.berkeleybop.org/ontologies/obo-all/teleost_anatomy/teleost_anatomy.obo TAO] contain ontological definitions of ideal specimens, observations in the life sciences are full of aberrations to these general rules.<br />
<br />
Phenoscape has some typical issues dealing with absence of anatomical features in certain species of Ostariophysian fishes. For example, the basihyal cartilage is found in all species of Ostariophysian fishes, except the Siluriformes. At present, this information is captured in Phenoscape using the combination of the PATO term for "absent in organism" (PATO:0000462), the "inheres_in" relation from the OBO Relations Ontology, the TAO term for "basihyal cartilage" (TAO:0001510), the "exhibits" relation from the PHENOSCAPE ontology, and the TTO term for Siluriformes (TTO:1380). This is shown below.<br />
<br />
<javascript><br />
TTO:1380 PHENOSCAPE:exhibits PATO:0000462^OBO_REL:inheres_in(TAO:0001510)<br />
</javascript><br />
<br />
In plain English, this translates to "Siluriformes exhibit absence in organism which inheres in basihyal cartilage." The semantics of this sentence are vague to say the least. Going by this methodology, it is impossible to state that basihyal cartilage is absent in Siluriformes without referring to ''at least one'' instance of basihyal cartilage. Combining a quality ''absent'' with a ''feature'' through the ''inheres_in'' property is very misleading in itself (ex: absence inheres in cartilage), contorting the intrinsic semantics of the ''inheres_in'' relation. These problems have been discussed in [http://www.ncbi.nlm.nih.gov/pubmed/17369081 Ceusters et al] and [http://www.biomedcentral.com/1471-2105/8/377 Hoehndorf et al]. Both these publications propose solutions to integrate these aberrant observations with canonical definitions, without causing inconsistencies in reasoning procedures.<br />
<br />
[[Media:PhenotypesInPhenoscape.ppt]]<br />
<br />
[[Discussion about the Absence of Phenotypes issue]]<br />
<br />
Another issue specific to the Phenoscape project was raised by Paula at the SICB workshop. Given that basihyal cartilage is absent in Siluriformes, basihyal bone should be absent in Siluriformes as well. This is because basihyal bone develops from basihyal cartilage. This may be inferred by adding the new relation-chaining rule shown below to the OBD reasoner.<br />
<br />
'''Rule:'''<math>\forall</math>F1, F2, S: ''absent_in''(F1, S) <math>\and</math> ''develops_from''(F2, F1) <math>\Rightarrow</math> ''absent_in''(F2, S)<br />
<br />
This relation chain corresponds to the observation GIVEN THAT Basihyal_Cartilage ''absent_in'' Siluriformes AND Basihyal_Bone ''develops_from'' Basihyal_cartilage, THEN Basihyal_Bone ''absent_in'' Siluriformes. This and other similar relation chains (as per identified requirements) are to be implemented for the Phenoscape project in the future. Strategies to deal with absent features in general are also to be implemented in the near future.<br />
<br />
Differences between the [[The exhibits relation conundrum|existing semantics]] and [[Relating taxa to phenotypes|desired semantics]] of the ''exhibits'' relation need to be resolved to address this issue. Potential strategies to implement the absence of features problem are discussed [[Novel reasoning strategies|here]].<br />
<br />
[[Category:EQ Annotation]]<br />
[[Category:Informatics]]<br />
[[Category:Ontology]]<br />
[[Category:Queries]]<br />
[[Category:Reasoning]]<br />
[[Category:Data]]<br />
[[Category:Curation]]<br />
[[Category:Taxonomy]]<br />
[[Category:OBD]]</div>Crk18https://wiki.phenoscape.org/wg/phenoscape/index.php?title=Logic_and_Reasoning_Challenges&diff=7126Logic and Reasoning Challenges2010-02-22T16:49:16Z<p>Crk18: /* Different relations for different purposes */</p>
<hr />
<div>This page discusses issues to be resolved in the near future. These issues pertain to relation semantics as well as inference procedures.<br />
<br />
==Inferring in both directions on the taxonomy==<br />
<br />
Annotations to a higher taxon should be propagated to the lower taxa it subsumes, i.e. classical top-down inference. However, the reasoner already reasons bottom-up, associating phenotype annotations from lower-level taxa with higher-level taxa, so adding top-down inferencing may cause widespread inconsistencies in the data if left unchecked.<br />
<br />
The OBD reasoner can reason from annotations at the lower levels of the taxonomy to the higher levels. Given that ''Danio rerio'' exhibits a phenotype P, the OBD reasoner infers that ''Danio'' exhibits the same phenotype P. This is reasoning up the taxonomy, using the subsumption relationship between ''Danio rerio'' and ''Danio''. This is possible because the annotations to each taxon are (implicitly) existentially quantified. The annotation "''Danio rerio'' exhibits increased length of maxillary barbel towards orbit" is shown in (1). The semantics are in (2).<br />
<br />
<javascript><br />
TTO:1001979 PHENOSCAPE:exhibits PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967) -- (1)<br />
</javascript><br />
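<br />
The post-composed phenotype in (1) can be read as a quality followed by relation(filler) differentia joined with "^". The following Python sketch splits such an expression into its parts. The function name <code>parse_post_composition</code> is hypothetical, and the sketch assumes a flat expression without nested post-compositions; it is not the OBD parser.<br />

```python
import re

# One differentium looks like RELATION_ID(FILLER_ID), e.g. OBO_REL:inheres_in(TAO:0001938).
_DIFFERENTIUM = re.compile(r"([A-Za-z_]+:\w+)\(([^()]+)\)")

def parse_post_composition(expr):
    """Split 'QUALITY^rel(filler)^rel(filler)' into (quality, [(rel, filler), ...]).

    Hypothetical helper for illustration; nested expressions are not handled.
    """
    quality, *rest = expr.split("^")
    differentia = []
    for part in rest:
        match = _DIFFERENTIUM.fullmatch(part)
        if match is None:
            raise ValueError(f"unrecognised differentium: {part!r}")
        differentia.append((match.group(1), match.group(2)))
    return quality, differentia

phenotype = "PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967)"
quality, differentia = parse_post_composition(phenotype)
```

Here <code>quality</code> holds the PATO term and <code>differentia</code> holds the ''inheres_in'' and ''towards'' pairs in the order they were written.<br />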
<br />
<br />
<math>\exists</math> X : ''instance_of''(X, TTO:1001979) <math>\and</math> ''PHENOSCAPE:exhibits''(X, PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967)) -- (2)<br />
<br />
<br />
Given that ''Danio rerio'' (TTO:1001979) is subsumed by the genus ''Danio'' (TTO:101040) in the Teleost Taxonomy as shown in (3), it is possible to infer that "''Danio'' exhibits increased length of maxillary barbel towards orbit" (4).<br />
<br />
<javascript><br />
TTO:1001979 OBO_REL:is_a TTO:101040 -- (3)<br />
TTO:101040 PHENOSCAPE:exhibits PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967) -- (4)<br />
</javascript><br />
<br />
Inferring down the taxonomy, that is, using assertions at higher levels to extract inferences at lower levels, requires universal quantification. For example, the assertion that "all Siluriformes exhibit decreased width of mesethmoid bone" can be captured using OBD semantics as shown in (5). The universal semantics of this assertion are shown in (6). Siluriformes directly subsumes Ictaluridae as shown in (7). From (5) and (7), it is straightforward to infer that "Ictaluridae exhibit decreased width of mesethmoid bone" as shown in (8).<br />
<br />
<javascript><br />
TTO:1380 PHENOSCAPE:exhibits PATO:0000599^OBO_REL:inheres_in(TAO:0000323) -- (5)<br />
</javascript><br />
<br />
<br />
<math>\forall</math> X : ''instance_of''(X, TTO:1380) <math>\Rightarrow</math> ''PHENOSCAPE:exhibits''(X, PATO:0000599^OBO_REL:inheres_in(TAO:0000323)) -- (6)<br />
<br />
<br />
<javascript><br />
TTO:10930 OBO_REL:is_a TTO:1380 -- (7)<br />
TTO:10930 PHENOSCAPE:exhibits PATO:0000599^OBO_REL:inheres_in(TAO:0000323) -- (8)<br />
</javascript><br />
<br />
The problem with using top-down inferences using universally quantified statements is that currently there is no way to distinguish these from existentially quantified statements. We use the ''PHENOSCAPE:exhibits'' relation for existentially quantified statements. Using the same relation for universally quantified statements would make it possible to extract incorrect inferences given the current configuration. Consider the subsumption relationship between ''Danio'' and ''Danio choprai'' shown in (9). If there is no distinction between existentially and universally quantified statements, it is possible to infer from (9) and (4) the erroneous conclusion that "''Danio choprai'' exhibits increased length of maxillary barbel towards orbit" (10). At present, there are no annotations to ''Danio choprai''.<br />
<br />
<javascript><br />
TTO:1052801 OBO_REL:is_a TTO:101040 -- (9)<br />
TTO:1052801 PHENOSCAPE:exhibits PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967) -- (10)<br />
</javascript><br />
<br />
Recall that the reasoner works in sweeps. It extracts one set of inferences (Inf-1) from the assertions (A) in its first sweep. In the next sweep, the reasoner pulls out a different set of inferences (Inf-2) from the assertions A '''AS WELL AS''' the inferences Inf-1 from the previous sweep. The reasoner repeats these sweeps until no new inferences are added. This is why the reasoner will likely infer all taxa exhibit all phenotypes if it is used to reason both up and down the taxonomy without checking for universal and existential semantics.<br />
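<br />
The sweep behaviour described above can be sketched as a small fixed-point loop in Python. This is an illustration only, not the OBD reasoner's implementation; the function names <code>run_sweeps</code>, <code>up</code> and <code>down</code> and the phenotype label "P" are assumptions made for the example.<br />

```python
def up(is_a, facts):
    # Bottom-up: child exhibits P and child is_a parent => parent exhibits P.
    return {(parent, p) for (child, parent) in is_a for (taxon, p) in facts if taxon == child}

def down(is_a, facts):
    # Top-down: parent exhibits P and child is_a parent => child exhibits P.
    return {(child, p) for (child, parent) in is_a for (taxon, p) in facts if taxon == parent}

def run_sweeps(is_a, asserted, rules):
    """Apply every rule once per sweep until a sweep adds nothing new."""
    facts = set(asserted)
    while True:
        new = set()
        for rule in rules:
            new |= rule(is_a, facts)
        if new <= facts:
            return facts
        facts |= new

is_a = {("Danio rerio", "Danio"), ("Danio choprai", "Danio")}
asserted = {("Danio rerio", "P")}

up_only = run_sweeps(is_a, asserted, [up])
both_ways = run_sweeps(is_a, asserted, [up, down])
```

With only the bottom-up rule, <code>up_only</code> gains the statement for ''Danio''. With both rules sharing one relation, <code>both_ways</code> also contains the unsupported statement for ''Danio choprai'', which is the erroneous inference (10).<br />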
<br />
===Possible solutions===<br />
<br />
In this section, we discuss possible approaches to resolving this issue with reasoning both up and down the taxonomy.<br />
<br />
====Different relations for different purposes====<br />
<br />
In classical first-order logic (FOL), all relations and properties asserted upon concepts (or taxa in the case of Phenoscape) are inherited by the subsumed concepts. This is because by default, all assertions about the concepts are universally quantified, i.e. hold true for ALL instances of the concept. If all cars have four wheels, and if all SUVs are cars, then all SUVs have four wheels. This is the way of top-down, classical FOL inferencing.<br />
<br />
In Phenoscape, we have adopted the OBD schema of modeling concepts, wherein all assertions to the concepts are existentially quantified, i.e. the assertion is true of at least one instance of the concept. This is very convenient for the life sciences, where exceptions are so prevalent. As a ready example, consider how the duck-billed platypus, an egg-laying mammal, overrules the "all mammals are viviparous" rule. Further, existential quantification allows us to reason up the taxonomy. If some Ostariophysi exhibit round fins, and all Ostariophysi are Teleostei, then some Teleostei exhibit round fins.<br />
<br />
By default, we use the ''PHENOSCAPE:exhibits'' relation to link taxa to phenotypes using existential semantics. Using the same relation to model universally quantified relationships between taxa and phenotypes would cause incorrect inferencing and loss of data integrity. The easiest way to address this issue is to use different relations: one for universally quantified relations and the other for existentially quantified relations. Let us call these relations ''PHENOSCAPE:all_exhibit'' and ''PHENOSCAPE:some_exhibit'' respectively.<br />
<br />
At present, the OBD reasoner uses the following rule (Rule-1) to extract inferences up the taxonomy using the ''PHENOSCAPE:exhibits'' relation.<br />
<br />
'''Rule-1:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''exhibits''(A, x) <math>\Rightarrow</math> ''exhibits''(B, x)<br />
<br />
This can be replaced with the following two rules, which use the two new relations, ''PHENOSCAPE:all_exhibit'' and ''PHENOSCAPE:some_exhibit''. (Please suggest better names for these if you can think of them).<br />
<br />
'''Rule-2:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''some_exhibit''(A, x) <math>\Rightarrow</math> ''some_exhibit''(B, x)<br />
<br />
'''Rule-3:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''all_exhibit''(B, x) <math>\Rightarrow</math> ''all_exhibit''(A, x)<br />
<br />
This will keep the inferences from getting mixed up. Let us consider the scenario where species Sp1 and Sp2 (from genus Gen1) are asserted to exhibit phenotype Phen1. These assertions are shown in (A-1) and (A-2). The subsumption relations are shown in (A-3) and (A-4).<br />
<br />
<javascript><br />
Sp1 PHENOSCAPE:some_exhibit Phen1 -- (A-1)<br />
Sp2 PHENOSCAPE:some_exhibit Phen1 -- (A-2)<br />
Sp1 OBO_REL:is_a Gen1 -- (A-3)<br />
Sp2 OBO_REL:is_a Gen1 -- (A-4)<br />
</javascript><br />
<br />
The reasoner makes the inference (I-1) from the assertions (A-1) ~ (A-4) and the inference rule Rule-2.<br />
<br />
<javascript><br />
Gen1 PHENOSCAPE:some_exhibit Phen1 -- (I-1)<br />
</javascript><br />
<br />
Now, given this new inference (I-1), the reasoner cannot infer that Sp1, Sp2, and, let us say, 10 other species Sp3 ~ Sp12 all exhibit Phen1, because the inference rule for ''some_exhibit'' cannot be used to infer down the taxonomy. Next, consider the assertion that ALL instances of genus Gen1 exhibit a phenotype Phen2, as shown in (A-5).<br />
<br />
<javascript><br />
Gen1 PHENOSCAPE:all_exhibit Phen2 -- (A-5)<br />
</javascript><br />
<br />
Given (A-5) and all the subsumption relations between Gen1 and the hypothetical twelve species under Gen1 (including A-3 and A-4), the reasoner uses inference rule Rule-3 to infer (I-2) ~ (I-13).<br />
<br />
<javascript><br />
Sp1 PHENOSCAPE:all_exhibit Phen2 -- (I-2)<br />
Sp2 PHENOSCAPE:all_exhibit Phen2 -- (I-3)<br />
..<br />
..<br />
Sp12 PHENOSCAPE:all_exhibit Phen2 -- (I-13)<br />
</javascript><br />
<br />
Again, cyclical inferences are ruled out because there are no inference rules to infer up the taxonomy using the ''all_exhibit'' relation.<br />
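<br />
The worked example above can be reproduced with a short Python sketch of Rule-2 and Rule-3. It is a minimal illustration under the assumption that statements are stored as (taxon, relation, phenotype) triples; the function names are hypothetical and this is not the OBD reasoner's code.<br />

```python
def rule_2(is_a, facts):
    # is_a(A, B) and some_exhibit(A, x) => some_exhibit(B, x): propagates upward only.
    return {(b, "some_exhibit", x)
            for (a, b) in is_a
            for (taxon, rel, x) in facts
            if rel == "some_exhibit" and taxon == a}

def rule_3(is_a, facts):
    # is_a(A, B) and all_exhibit(B, x) => all_exhibit(A, x): propagates downward only.
    return {(a, "all_exhibit", x)
            for (a, b) in is_a
            for (taxon, rel, x) in facts
            if rel == "all_exhibit" and taxon == b}

def closure(is_a, asserted):
    """Repeat both rules until no new statements are produced."""
    facts = set(asserted)
    while True:
        new = (rule_2(is_a, facts) | rule_3(is_a, facts)) - facts
        if not new:
            return facts
        facts |= new

species = [f"Sp{i}" for i in range(1, 13)]           # Sp1 ~ Sp12
is_a = {(sp, "Gen1") for sp in species}               # includes (A-3) and (A-4)
asserted = {
    ("Sp1", "some_exhibit", "Phen1"),                 # (A-1)
    ("Sp2", "some_exhibit", "Phen1"),                 # (A-2)
    ("Gen1", "all_exhibit", "Phen2"),                 # (A-5)
}
facts = closure(is_a, asserted)
```

The closure contains (I-1) for Gen1 and (I-2) ~ (I-13) for the twelve species, but no ''some_exhibit'' statement for Sp3 ~ Sp12, because neither rule can turn an upward inference back into a downward one.<br />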
<br />
=====What has to change?=====<br />
<br />
To implement this strategy, two new relations can be defined in the Phenoscape Vocab ontology, where the current definition of the ''PHENOSCAPE:exhibits'' relation is found. At the curation level, curators have to qualify their assertions as being either existentially or universally quantified. Specifically, the Phenex UI could tap the curator's shoulder and ask, "Ahem, does this annotation hold true for all specimens belonging to this taxon or just some specimens?" This needs some changes (no less!) to the Phenex interface and also to the character matrix format in which the data is exported. The data loader module of Phenoscape has to know this information so that the appropriate relation is used in creating the taxon-phenotype statement to be loaded into the knowledgebase. The query module will have to be modified to retrieve both inferred and asserted taxon-phenotype statements using the two different relations. The JSON format in which the data is exported needs to be modified to accommodate the two different kinds of relation statements, and lastly the UI will have to explicitly distinguish between the two.<br />
<br />
======A possible simpler solution (Update: Feb 22, 2010)======<br />
<br />
It is possible to check the rank of the taxon to which the phenotype assertion is made. If the rank of the taxon is not "species", then the new relation can be used in the top-down reasoning as shown below. Assertion (A-6) is a phenotype assertion to a taxon of a higher rank, let us say a Genus G1. Now, G1 has 6 species S1 ~ S6. Inferences to S1 ~ S6 from the assertion (A-6) may use the new relation ''all_exhibit'' as shown.<br />
<br />
The rule is shown in (Rule-4).<br />
<br />
'''Rule-4:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''exhibits''(B, x) <math>\and </math> <math>\neg</math> ''rank''(B, "species") <math>\Rightarrow</math> ''all_exhibit''(A, x)<br />
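<br />
A rank-checked rule of this kind might look as follows in Python. This is a sketch under stated assumptions: the <code>rank</code> lookup table, the phenotype labels Phen3 and Phen4, and the taxon "S1_variant" placed beneath species S1 are all hypothetical and serve only to show the rank check taking effect.<br />

```python
def rule_4(is_a, rank, facts):
    # is_a(A, B) and exhibits(B, x) and B is not of rank "species" => all_exhibit(A, x).
    return {(a, "all_exhibit", x)
            for (a, b) in is_a
            for (taxon, rel, x) in facts
            if rel == "exhibits" and taxon == b and rank.get(b) != "species"}

species = [f"S{i}" for i in range(1, 7)]                  # S1 ~ S6
is_a = {(sp, "G1") for sp in species} | {("S1_variant", "S1")}
rank = {"G1": "genus", **{sp: "species" for sp in species}}

asserted = {
    ("G1", "exhibits", "Phen3"),   # assertion to a higher-rank taxon
    ("S1", "exhibits", "Phen4"),   # assertion to a species: must not propagate down
}
inferred = rule_4(is_a, rank, asserted)
```

Only the genus-level assertion is pushed down, giving one ''all_exhibit'' statement for each of S1 ~ S6. The species-level assertion on S1 is left alone because of the rank check.<br />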
<br />
====Probabilistic assertions====<br />
<br />
Uncertainties are everywhere in the life sciences. The taxon-phenotype assertions can be augmented with uncertainty factors to address this issue. Inferences could use uncertainty calculi such as the [http://en.wikipedia.org/wiki/Dempster%E2%80%93Shafer_theory Dempster-Shafer method] or Bayes' rule for conditional probability to derive uncertainty factors of the inferences given the uncertainty factors of the assertions.<br />
<br />
The advantage of this strategy will be that we can continue to use the ''PHENOSCAPE:exhibits'' relation for taxon-phenotype statements, and at the same time display the uncertainty values associated with every assertion displayed at the UI; far more intuitive than "Taxon T exhibits increased size of E AND decreased size of E."<br />
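<br />
To make the idea of an uncertainty calculus concrete, here is a minimal Python sketch of Dempster's rule of combination, the core operation of the Dempster-Shafer method mentioned above. The two sources of evidence and their mass values are invented for illustration, and how uncertainty factors would actually be assigned in Phenoscape is left open.<br />

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule: multiply masses, pool them on set intersections, renormalise."""
    pooled = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        overlap = a & b
        if overlap:
            pooled[overlap] = pooled.get(overlap, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b      # mass assigned to contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    return {hypothesis: mass / (1.0 - conflict) for hypothesis, mass in pooled.items()}

YES = frozenset({"exhibits"})
NO = frozenset({"does_not_exhibit"})
EITHER = YES | NO                            # mass left uncommitted

# Hypothetical evidence from two sources about one taxon-phenotype assertion.
source_1 = {YES: 0.6, EITHER: 0.4}
source_2 = {YES: 0.5, NO: 0.2, EITHER: 0.3}
combined = combine(source_1, source_2)
```

The combined masses sum to one after the conflicting portion (here 0.6 × 0.2 = 0.12) is discarded and the remainder is renormalised. A single number per assertion, such as the mass on <code>YES</code>, could then be shown as its uncertainty factor.<br />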
<br />
=====What needs to change?=====<br />
<br />
Curators will have to manually enter uncertainty factors (UFs) of the assertions in the Phenex UI, which needs modification to handle these. The character matrix format needs to be modified to accommodate UFs. The data loader module needs to use reified statements around assertions to store UFs. The OBD reasoner will have to be augmented with an implementation of uncertainty calculus. The query module needs to retrieve the UF associated with every assertion, and export this in a modified JSON format. Lastly, the UI will have to add a provision to display uncertainty factors.<br />
<br />
== The problem with absence of features==<br />
<br />
Descriptions of phenotypes as used in the Phenoscape project (and a plethora of phenomena in the real world) are replete with exceptions, or aberrations from what is considered to be "normal." While canonical ontologies like the [http://sig.biostr.washington.edu/projects/fm/ FMA] and the [http://www.berkeleybop.org/ontologies/obo-all/teleost_anatomy/teleost_anatomy.obo TAO] contain ontological definitions of ideal specimens, observations in the life sciences are full of aberrations to these general rules.<br />
<br />
Phenoscape has some typical issues dealing with absence of anatomical features in certain species of Ostariophysian fishes. For example, the basihyal cartilage is found in all species of Ostariophysian fishes, except the Siluriformes. At present, this information is captured in Phenoscape using the combination of the PATO term for "absent in organism" (PATO:0000462), the "inheres_in" relation from the OBO Relations Ontology, the TAO term for "basihyal cartilage" (TAO:0001510), the "exhibits" relation from the PHENOSCAPE ontology, and the TTO term for Siluriformes (TTO:1380). This is shown below.<br />
<br />
<javascript><br />
TTO:1380 PHENOSCAPE:exhibits PATO:0000462^OBO_REL:inheres_in(TAO:0001510)<br />
</javascript><br />
<br />
In plain English, this translates to "Siluriformes exhibit absence in organism which inheres in basihyal cartilage." The semantics of this sentence are vague to say the least. Going by this methodology, it is impossible to state that basihyal cartilage is absent in Siluriformes without referring to ''at least one'' instance of basihyal cartilage. Combining a quality ''absent'' with a ''feature'' through the ''inheres_in'' property is very misleading in itself (ex: absence inheres in cartilage), contorting the intrinsic semantics of the ''inheres_in'' relation. These problems have been discussed in [http://www.ncbi.nlm.nih.gov/pubmed/17369081 Ceusters et al] and [http://www.biomedcentral.com/1471-2105/8/377 Hoehndorf et al]. Both these publications propose solutions to integrate these aberrant observations with canonical definitions, without causing inconsistencies in reasoning procedures.<br />
<br />
[[Media:PhenotypesInPhenoscape.ppt]]<br />
<br />
[[Discussion about the Absence of Phenotypes issue]]<br />
<br />
Another issue specific to the Phenoscape project was raised by Paula at the SICB workshop. Given that basihyal cartilage is absent in Siluriformes, basihyal bone should be absent in Siluriformes as well. This is because basihyal bone develops from basihyal cartilage. This may be inferred by adding the new relation-chaining rule shown below to the OBD reasoner.<br />
<br />
'''Rule:'''<math>\forall</math>F1, F2, S: ''absent_in''(F1, S) <math>\and</math> ''develops_from''(F2, F1) <math>\Rightarrow</math> ''absent_in''(F2, S)<br />
<br />
This relation chain corresponds to the observation GIVEN THAT Basihyal_Cartilage ''absent_in'' Siluriformes AND Basihyal_Bone ''develops_from'' Basihyal_cartilage, THEN Basihyal_Bone ''absent_in'' Siluriformes. This and other similar relation chains (as per identified requirements) are to be implemented for the Phenoscape project in the future. Strategies to deal with absent features in general are also to be implemented in the near future.<br />
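<br />
The relation chain above can be sketched in a few lines of Python. The pair-based representation and the function name <code>infer_absences</code> are assumptions for illustration; the sketch is not the OBD reasoner's rule syntax.<br />

```python
def infer_absences(absent_in, develops_from):
    """absent_in(F1, S) and develops_from(F2, F1) => absent_in(F2, S), applied to a fixed point."""
    facts = set(absent_in)
    while True:
        new = {(f2, taxon)
               for (f1, taxon) in facts
               for (f2, precursor) in develops_from
               if precursor == f1} - facts
        if not new:
            return facts
        facts |= new

absent_in = {("basihyal cartilage", "Siluriformes")}
develops_from = {("basihyal bone", "basihyal cartilage")}
absences = infer_absences(absent_in, develops_from)
```

Because the rule is applied until nothing new appears, a longer developmental chain (a structure developing from basihyal bone, say) would inherit the absence as well.<br />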
<br />
Differences between the [[The exhibits relation conundrum|existing semantics]] and [[Relating taxa to phenotypes|desired semantics]] of the ''exhibits'' relation need to be resolved to address this issue. Potential strategies to implement the absence of features problem are discussed [[Novel reasoning strategies|here]].<br />
<br />
[[Category:EQ Annotation]]<br />
[[Category:Informatics]]<br />
[[Category:Ontology]]<br />
[[Category:Queries]]<br />
[[Category:Reasoning]]<br />
[[Category:Data]]<br />
[[Category:Curation]]<br />
[[Category:Taxonomy]]<br />
[[Category:OBD]]</div>Crk18https://wiki.phenoscape.org/wg/phenoscape/index.php?title=Logic_and_Reasoning_Challenges&diff=7124Logic and Reasoning Challenges2010-02-22T16:46:03Z<p>Crk18: /* A possible simpler solution (Update: Feb 22, 2010) */</p>
<hr />
<div>This page discusses issues to be resolved in the near future. These issues pertain to relation semantics as well as inference procedures.<br />
<br />
==Inferring in both directions on the taxonomy==<br />
<br />
Annotations to a higher taxon should be propagated to the lower taxa it subsumes, i.e. classical top-down inference. However, the reasoner already reasons bottom-up, associating phenotype annotations from lower-level taxa with higher-level taxa, so adding top-down inferencing may cause widespread inconsistencies in the data if left unchecked.<br />
<br />
The OBD reasoner can reason from annotations at the lower levels of the taxonomy to the higher levels. Given that ''Danio rerio'' exhibits a phenotype P, the OBD reasoner infers that ''Danio'' exhibits the same phenotype P. This is reasoning up the taxonomy, using the subsumption relationship between ''Danio rerio'' and ''Danio''. This is possible because the annotations to each taxon are (implicitly) existentially quantified. The annotation "''Danio rerio'' exhibits increased length of maxillary barbel towards orbit" is shown in (1). The semantics are in (2).<br />
<br />
<javascript><br />
TTO:1001979 PHENOSCAPE:exhibits PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967) -- (1)<br />
</javascript><br />
<br />
<br />
<math>\exists</math> X : ''instance_of''(X, TTO:1001979) <math>\and</math> ''PHENOSCAPE:exhibits''(X, PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967)) -- (2)<br />
<br />
<br />
Given that ''Danio rerio'' (TTO:1001979) is subsumed by the genus ''Danio'' (TTO:101040) in the Teleost Taxonomy as shown in (3), it is possible to infer that "''Danio'' exhibits increased length of maxillary barbel towards orbit" (4).<br />
<br />
<javascript><br />
TTO:1001979 OBO_REL:is_a TTO:101040 -- (3)<br />
TTO:101040 PHENOSCAPE:exhibits PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967) -- (4)<br />
</javascript><br />
<br />
Inferring down the taxonomy, that is, using assertions at higher levels to extract inferences at lower levels, requires universal quantification. For example, the assertion that "all Siluriformes exhibit decreased width of mesethmoid bone" can be captured using OBD semantics as shown in (5). The universal semantics of this assertion are shown in (6). Siluriformes directly subsumes Ictaluridae as shown in (7). From (5) and (7), it is straightforward to infer that "Ictaluridae exhibit decreased width of mesethmoid bone" as shown in (8).<br />
<br />
<javascript><br />
TTO:1380 PHENOSCAPE:exhibits PATO:0000599^OBO_REL:inheres_in(TAO:0000323) -- (5)<br />
</javascript><br />
<br />
<br />
<math>\forall</math> X : ''instance_of''(X, TTO:1380) <math>\Rightarrow</math> ''PHENOSCAPE:exhibits''(X, PATO:0000599^OBO_REL:inheres_in(TAO:0000323)) -- (6)<br />
<br />
<br />
<javascript><br />
TTO:10930 OBO_REL:is_a TTO:1380 -- (7)<br />
TTO:10930 PHENOSCAPE:exhibits PATO:0000599^OBO_REL:inheres_in(TAO:0000323) -- (8)<br />
</javascript><br />
<br />
The problem with using top-down inferences using universally quantified statements is that currently there is no way to distinguish these from existentially quantified statements. We use the ''PHENOSCAPE:exhibits'' relation for existentially quantified statements. Using the same relation for universally quantified statements would make it possible to extract incorrect inferences given the current configuration. Consider the subsumption relationship between ''Danio'' and ''Danio choprai'' shown in (9). If there is no distinction between existentially and universally quantified statements, it is possible to infer from (9) and (4) the erroneous conclusion that "''Danio choprai'' exhibits increased length of maxillary barbel towards orbit" (10). At present, there are no annotations to ''Danio choprai''.<br />
<br />
<javascript><br />
TTO:1052801 OBO_REL:is_a TTO:101040 -- (9)<br />
TTO:1052801 PHENOSCAPE:exhibits PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967) -- (10)<br />
</javascript><br />
<br />
Recall that the reasoner works in sweeps. It extracts one set of inferences (Inf-1) from the assertions (A) in its first sweep. In the next sweep, the reasoner pulls out a different set of inferences (Inf-2) from the assertions A '''AS WELL AS''' the inferences Inf-1 from the previous sweep. The reasoner repeats these sweeps until no new inferences are added. This is why the reasoner will likely infer all taxa exhibit all phenotypes if it is used to reason both up and down the taxonomy without checking for universal and existential semantics.<br />
<br />
===Possible solutions===<br />
<br />
In this section, we discuss possible approaches to resolving this issue with reasoning both up and down the taxonomy.<br />
<br />
====Different relations for different purposes====<br />
<br />
In classical first-order logic (FOL), all relations and properties asserted upon concepts (or taxa in the case of Phenoscape) are inherited by the subsumed concepts. This is because by default, all assertions about the concepts are universally quantified, i.e. hold true for ALL instances of the concept. If all cars have four wheels, and if all SUVs are cars, then all SUVs have four wheels. This is the way of top-down, classical FOL inferencing.<br />
<br />
In Phenoscape, we have adopted the OBD schema of modeling concepts, wherein all assertions to the concepts are existentially quantified, i.e. the assertion is true of at least one instance of the concept. This is very convenient for the life sciences, where exceptions are so prevalent. As a ready example, consider how the duck-billed platypus, an egg-laying mammal, overrules the "all mammals are viviparous" rule. Further, existential quantification allows us to reason up the taxonomy. If some Ostariophysi exhibit round fins, and all Ostariophysi are Teleostei, then some Teleostei exhibit round fins.<br />
<br />
By default, we use the ''PHENOSCAPE:exhibits'' relation to link taxa to phenotypes using existential semantics. Using the same relation to model universally quantified relationships between taxa and phenotypes would cause incorrect inferencing and loss of data integrity. The easiest way to address this issue is to use different relations: one for universally quantified relations and the other for existentially quantified relations. Let us call these relations ''PHENOSCAPE:all_exhibit'' and ''PHENOSCAPE:some_exhibit'' respectively.<br />
<br />
At present, the OBD reasoner uses the following rule (Rule-1) to extract inferences up the taxonomy using the ''PHENOSCAPE:exhibits'' relation.<br />
<br />
'''Rule-1:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''exhibits''(A, x) <math>\Rightarrow</math> ''exhibits''(B, x)<br />
<br />
This can be replaced with the following two rules, which use the two new relations, ''PHENOSCAPE:all_exhibit'' and ''PHENOSCAPE:some_exhibit''. (Please suggest better names for these if you can think of them).<br />
<br />
'''Rule-2:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''some_exhibit''(A, x) <math>\Rightarrow</math> ''some_exhibit''(B, x)<br />
<br />
'''Rule-3:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''all_exhibit''(B, x) <math>\Rightarrow</math> ''all_exhibit''(A, x)<br />
<br />
This will keep the inferences from getting mixed up. Let us consider the scenario where species Sp1 and Sp2 (from genus Gen1) are asserted to exhibit phenotype Phen1. These assertions are shown in (A-1) and (A-2). The subsumption relations are shown in (A-3) and (A-4).<br />
<br />
<javascript><br />
Sp1 PHENOSCAPE:some_exhibit Phen1 -- (A-1)<br />
Sp2 PHENOSCAPE:some_exhibit Phen1 -- (A-2)<br />
Sp1 OBO_REL:is_a Gen1 -- (A-3)<br />
Sp2 OBO_REL:is_a Gen1 -- (A-4)<br />
</javascript><br />
<br />
The reasoner makes the inference (I-1) from the assertions (A-1) ~ (A-4) and the inference rule Rule-2.<br />
<br />
<javascript><br />
Gen1 PHENOSCAPE:some_exhibit Phen1 -- (I-1)<br />
</javascript><br />
<br />
Now, given this new inference (I-1), the reasoner cannot infer that Sp1, Sp2, and, let us say, 10 other species Sp3 ~ Sp12 all exhibit Phen1, because the inference rule for ''some_exhibit'' cannot be used to infer down the taxonomy. Next, consider the assertion that ALL instances of genus Gen1 exhibit a phenotype Phen2, as shown in (A-5).<br />
<br />
<javascript><br />
Gen1 PHENOSCAPE:all_exhibit Phen2 -- (A-5)<br />
</javascript><br />
<br />
Given (A-5) and all the subsumption relations between Gen1 and the hypothetical twelve species under Gen1 (including A-3 and A-4), the reasoner uses inference rule Rule-3 to infer (I-2) ~ (I-13).<br />
<br />
<javascript><br />
Sp1 PHENOSCAPE:all_exhibit Phen2 -- (I-2)<br />
Sp2 PHENOSCAPE:all_exhibit Phen2 -- (I-3)<br />
..<br />
..<br />
Sp12 PHENOSCAPE:all_exhibit Phen2 -- (I-13)<br />
</javascript><br />
<br />
Again, cyclical inferences are ruled out because there are no inference rules to infer up the taxonomy using the ''all_exhibit'' relation.<br />
<br />
=====What has to change?=====<br />
<br />
To implement this strategy, two new relations can be defined in the Phenoscape Vocab ontology, where the current definition of the ''PHENOSCAPE:exhibits'' relation is found. At the curation level, curators have to qualify their assertions as being either existentially or universally quantified. Specifically, the Phenex UI could tap the curator's shoulder and ask, "Ahem, does this annotation hold true for all specimens belonging to this taxon or just some specimens?" This needs some changes (no less!) to the Phenex interface and also to the character matrix format in which the data is exported. The data loader module of Phenoscape has to know this information so that the appropriate relation is used in creating the taxon-phenotype statement to be loaded into the knowledgebase. The query module will have to be modified to retrieve both inferred and asserted taxon-phenotype statements using the two different relations. The JSON format in which the data is exported needs to be modified to accommodate the two different kinds of relation statements, and lastly the UI will have to explicitly distinguish between the two.<br />
<br />
======A possible simpler solution (Update: Feb 22, 2010)======<br />
<br />
It is possible to check the rank of the taxon to which the phenotype assertion is made. If the rank of the taxon is not "species", then the new relation can be used in the top-down reasoning as shown below. Assertion (A-6) is a phenotype assertion to a taxon of a higher rank, let us say a Genus G1. Now, G1 has 6 species S1 ~ S6. Inferences to S1 ~ S6 from the assertion (A-6) may use the new relation ''all_exhibit'' as shown.<br />
<br />
====Probabilistic assertions====<br />
<br />
Uncertainties are everywhere in the life sciences. The taxon-phenotype assertions can be augmented with uncertainty factors to address this issue. Inferences could use uncertainty calculi such as the [http://en.wikipedia.org/wiki/Dempster%E2%80%93Shafer_theory Dempster-Shafer method] or Bayes' rule for conditional probability to derive uncertainty factors of the inferences given the uncertainty factors of the assertions.<br />
<br />
The advantage of this strategy will be that we can continue to use the ''PHENOSCAPE:exhibits'' relation for taxon-phenotype statements, and at the same time display the uncertainty values associated with every assertion displayed at the UI; far more intuitive than "Taxon T exhibits increased size of E AND decreased size of E."<br />
<br />
=====What needs to change?=====<br />
<br />
Curators will have to manually enter uncertainty factors (UFs) of the assertions in the Phenex UI, which needs modification to handle these. The character matrix format needs to be modified to accommodate UFs. The data loader module needs to use reified statements around assertions to store UFs. The OBD reasoner will have to be augmented with an implementation of uncertainty calculus. The query module needs to retrieve the UF associated with every assertion, and export this in a modified JSON format. Lastly, the UI will have to add a provision to display uncertainty factors.<br />
<br />
== The problem with absence of features==<br />
<br />
Descriptions of phenotypes as used in the Phenoscape project (and a plethora of phenomena in the real world) are replete with exceptions, or aberrations from what is considered to be "normal." While canonical ontologies like the [http://sig.biostr.washington.edu/projects/fm/ FMA] and the [http://www.berkeleybop.org/ontologies/obo-all/teleost_anatomy/teleost_anatomy.obo TAO] contain ontological definitions of ideal specimens, observations in the life sciences are full of aberrations to these general rules.<br />
<br />
Phenoscape has some typical issues dealing with absence of anatomical features in certain species of Ostariophysian fishes. For example, the basihyal cartilage is found in all species of Ostariophysian fishes, except the Siluriformes. At present, this information is captured in Phenoscape using the combination of the PATO term for "absent in organism" (PATO:0000462), the "inheres_in" relation from the OBO Relations Ontology, the TAO term for "basihyal cartilage" (TAO:0001510), the "exhibits" relation from the PHENOSCAPE ontology, and the TTO term for Siluriformes (TTO:1380). This is shown below.<br />
<br />
<javascript><br />
TTO:1380 PHENOSCAPE:exhibits PATO:0000462^OBO_REL:inheres_in(TAO:0001510)<br />
</javascript><br />
<br />
In plain English, this translates to "Siluriformes exhibit absence in organism which inheres in basihyal cartilage." The semantics of this sentence are vague, to say the least. Going by this methodology, it is impossible to state that basihyal cartilage is absent in Siluriformes without referring to ''at least one'' instance of basihyal cartilage. Combining the quality ''absent'' with a ''feature'' through the ''inheres_in'' relation is misleading in itself (e.g., "absence inheres in cartilage") and contorts the intrinsic semantics of ''inheres_in''. These problems have been discussed in [http://www.ncbi.nlm.nih.gov/pubmed/17369081 Ceusters et al.] and [http://www.biomedcentral.com/1471-2105/8/377 Hoehndorf et al.]. Both publications propose solutions that integrate these aberrant observations with canonical definitions without causing inconsistencies in reasoning procedures.<br />
<br />
[[Media:PhenotypesInPhenoscape.ppt]]<br />
<br />
[[Discussion about the Absence of Phenotypes issue]]<br />
<br />
Another issue specific to the Phenoscape project was raised by Paula at the SICB workshop. Given that basihyal cartilage is absent in Siluriformes, basihyal bone should be absent in Siluriformes as well. This is because basihyal bone develops from basihyal cartilage. This may be inferred by adding the new relation-chaining rule shown below to the OBD reasoner.<br />
<br />
'''Rule:''' <math>\forall</math>F1, F2, S: ''absent_in''(F1, S) <math>\and</math> ''develops_from''(F2, F1) <math>\Rightarrow</math> ''absent_in''(F2, S)<br />
<br />
This relation chain corresponds to the observation GIVEN THAT Basihyal_Cartilage ''absent_in'' Siluriformes AND Basihyal_Bone ''develops_from'' Basihyal_cartilage, THEN Basihyal_Bone ''absent_in'' Siluriformes. This and other similar relation chains (as per identified requirements) are to be implemented for the Phenoscape project in the future. Strategies to deal with absent features in general are also to be implemented in the near future.<br />
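<br />
The relation chain can be sketched in a few lines. This is only an illustration of the rule, not the OBD reasoner's implementation; the function name <code>infer_absences</code> and the tuple-based data layout are assumptions made for the example.<br />
<br />
```python
def infer_absences(absent_in, develops_from):
    """Apply absent_in(F1, S) AND develops_from(F2, F1) => absent_in(F2, S).

    absent_in:     set of (feature, taxon) pairs
    develops_from: set of (later_feature, earlier_feature) pairs
    Returns only the newly inferred (feature, taxon) pairs.
    """
    known = set(absent_in)
    changed = True
    while changed:  # repeat until a sweep adds nothing new
        changed = False
        for later, earlier in develops_from:
            for feature, taxon in list(known):
                if feature == earlier and (later, taxon) not in known:
                    known.add((later, taxon))
                    changed = True
    return known - set(absent_in)


asserted = {("basihyal cartilage", "Siluriformes")}
development = {("basihyal bone", "basihyal cartilage")}
inferred = infer_absences(asserted, development)
```
<br />
The loop repeats so that longer developmental chains (a structure that develops from basihyal bone, for instance) would be covered as well.<br />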
<br />
Differences between the [[The exhibits relation conundrum|existing semantics]] and [[Relating taxa to phenotypes|desired semantics]] of the ''exhibits'' relation need to be resolved to address this issue. Potential strategies to implement the absence of features problem are discussed [[Novel reasoning strategies|here]].<br />
<br />
[[Category:EQ Annotation]]<br />
[[Category:Informatics]]<br />
[[Category:Ontology]]<br />
[[Category:Queries]]<br />
[[Category:Reasoning]]<br />
[[Category:Data]]<br />
[[Category:Curation]]<br />
[[Category:Taxonomy]]<br />
[[Category:OBD]]</div>Crk18https://wiki.phenoscape.org/wg/phenoscape/index.php?title=Logic_and_Reasoning_Challenges&diff=7123Logic and Reasoning Challenges2010-02-22T16:43:59Z<p>Crk18: /* What has to change? */</p>
<hr />
<div>This page discusses issues to be resolved in the near future. These issues pertain to relation semantics as well as inference procedures.<br />
<br />
==Inferring in both directions on the taxonomy==<br />
<br />
It is desired that annotations to higher taxa in the taxonomy be propagated to the lower taxa that are subsumed by the higher taxon; i.e. classical top down inferences. Given that the reasoner already reasons bottom upward, associating phenotype annotations from the lower level taxa to the higher level taxa, adding top-down inferencing may cause widespread inconsistencies in the data if unchecked.<br />
<br />
The OBD reasoner can reason from annotations at the lower levels of the taxonomy to the higher levels. Given that ''Danio rerio'' exhibits a phenotype P, the OBD reasoner infers that ''Danio'' exhibits the same phenotype P. This is reasoning up the taxonomy, using the subsumption relationship between ''Danio rerio'' and ''Danio''. This is possible because the annotations to each taxon are (implicitly) existentially quantified. The annotation "''Danio rerio'' exhibits increased length of maxillary barbel towards orbit" is shown in (1). The semantics are in (2).<br />
<br />
<javascript><br />
TTO:1001979 PHENOSCAPE:exhibits PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967) -- (1)<br />
</javascript><br />
<br />
<br />
<math>\exists</math> X : ''instance_of''(X, TTO:1001979) <math>\and</math> ''PHENOSCAPE:exhibits''(X, PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967)) -- (2)<br />
<br />
<br />
Given that ''Danio rerio'' (TTO:1001979) is subsumed by the genus ''Danio'' (TTO:101040) in the Teleost Taxonomy as shown in (3), it is possible to infer that "''Danio'' exhibits increased length of maxillary barbel towards orbit" (4).<br />
<br />
<javascript><br />
TTO:1001979 OBO_REL:is_a TTO:101040 -- (3)<br />
TTO:101040 PHENOSCAPE:exhibits PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967) -- (4)<br />
</javascript><br />
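<br />
The bottom-up step from (1) and (3) to (4) can be sketched as follows. The function <code>propagate_up</code> and the dictionary layout are assumptions for illustration only; the sketch also assumes that the ''is_a'' hierarchy contains no cycles.<br />
<br />
```python
def propagate_up(is_a, exhibits):
    """Rule: is_a(A, B) AND exhibits(A, x) => exhibits(B, x).

    is_a:     dict mapping each taxon ID to its parent taxon ID
    exhibits: set of asserted (taxon ID, phenotype) pairs
    Returns the asserted pairs plus the inferred ones.
    """
    facts = set(exhibits)
    for taxon, phenotype in exhibits:
        parent = is_a.get(taxon)
        while parent is not None:  # walk up until there is no parent left
            facts.add((parent, phenotype))
            parent = is_a.get(parent)
    return facts


PHENOTYPE = "PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967)"
is_a = {"TTO:1001979": "TTO:101040"}      # statement (3)
asserted = {("TTO:1001979", PHENOTYPE)}   # statement (1)
facts = propagate_up(is_a, asserted)      # now also contains statement (4)
```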
<br />
Inferring down the taxonomy, that is, using assertions at higher levels to extract inferences at lower levels, requires universal quantification. For example, the assertion that "all Siluriformes exhibit decreased width of mesethmoid bone" can be captured using OBD semantics as shown in (5). The universal semantics of this assertion are shown in (6). Siluriformes directly subsumes Ictaluridae as shown in (7). From (5) and (7), it is straightforward to infer that "Ictaluridae exhibit decreased width of mesethmoid bone" as shown in (8).<br />
<br />
<javascript><br />
TTO:1380 PHENOSCAPE:exhibits PATO:0000599^OBO_REL:inheres_in(TAO:0000323) -- (5)<br />
</javascript><br />
<br />
<br />
<math>\forall</math> X : ''instance_of''(X, TTO:1380) <math>\Rightarrow</math> ''PHENOSCAPE:exhibits''(X, PATO:0000599^OBO_REL:inheres_in(TAO:0000323)) -- (6)<br />
<br />
<br />
<javascript><br />
TTO:10930 OBO_REL:is_a TTO:1380 -- (7)<br />
TTO:10930 PHENOSCAPE:exhibits PATO:0000599^OBO_REL:inheres_in(TAO:0000323) -- (8)<br />
</javascript><br />
<br />
The problem with drawing top-down inferences from universally quantified statements is that there is currently no way to distinguish them from existentially quantified statements. We use the ''PHENOSCAPE:exhibits'' relation for existentially quantified statements. Using the same relation for universally quantified statements would make it possible to extract incorrect inferences given the current configuration. Consider the subsumption relationship between ''Danio'' and ''Danio choprai'' shown in (9). If there is no distinction between existentially and universally quantified statements, it is possible to infer from (9) and (4) the erroneous conclusion that "''Danio choprai'' exhibits increased length of maxillary barbel towards orbit" (10). Nothing supports this conclusion: at present, there are no annotations to ''Danio choprai''.<br />
<br />
<javascript><br />
TTO:1052801 OBO_REL:is_a TTO:101040 -- (9)<br />
TTO:1052801 PHENOSCAPE:exhibits PATO:0000573^OBO_REL:inheres_in(TAO:0001938)^OBO_REL:towards(TAO:0001967) -- (10)<br />
</javascript><br />
<br />
Recall that the reasoner works in sweeps. It extracts one set of inferences (Inf-1) from the assertions (A) in its first sweep. In the next sweep, the reasoner pulls out a different set of inferences (Inf-2) from the assertions A '''AS WELL AS''' the inferences Inf-1 from the previous sweep. The reasoner repeats these sweeps until no new inferences are added. This is why the reasoner will likely infer all taxa exhibit all phenotypes if it is used to reason both up and down the taxonomy without checking for universal and existential semantics.<br />
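<br />
The runaway effect is easy to reproduce in a toy model. The sketch below is not the OBD reasoner; it is a minimal sweep loop, with the hypothetical name <code>saturate</code>, that applies the upward rule and a naive downward rule to the same relation.<br />
<br />
```python
def saturate(is_a, exhibits):
    """Sweep until no new statements appear, reasoning up AND down.

    is_a:     set of (child, parent) pairs
    exhibits: set of asserted (taxon, phenotype) pairs
    Returns (all statements, number of sweeps that added something).
    """
    facts = set(exhibits)
    sweeps = 0
    while True:
        new = set()
        for child, parent in is_a:
            for taxon, phenotype in facts:
                if taxon == child:    # bottom-up, as the reasoner does now
                    new.add((parent, phenotype))
                if taxon == parent:   # naive top-down with the same relation
                    new.add((child, phenotype))
        new -= facts
        if not new:
            return facts, sweeps
        facts |= new
        sweeps += 1


is_a = {("Danio rerio", "Danio"), ("Danio choprai", "Danio")}
asserted = {("Danio rerio", "P")}
facts, sweeps = saturate(is_a, asserted)
```
<br />
Starting from a single assertion about ''Danio rerio'', the first sweep lifts P to ''Danio'' and the second pushes it down to ''Danio choprai'', which was never annotated.<br />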
<br />
===Possible solutions===<br />
<br />
In this section, we discuss possible approaches to resolving this issue with reasoning both up and down the taxonomy.<br />
<br />
====Different relations for different purposes====<br />
<br />
In classical first-order logic (FOL), all relations and properties asserted upon concepts (or taxa in the case of Phenoscape) are inherited by the subsumed concepts. This is because by default, all assertions about the concepts are universally quantified, i.e. hold true for ALL instances of the concept. If all cars have four wheels, and if all SUVs are cars, then all SUVs have four wheels. This is the way of top-down, classical FOL inferencing.<br />
<br />
In Phenoscape, we have adopted the OBD schema of modeling concepts, wherein all assertions to the concepts are existentially quantified, i.e. the assertion is true for at least one instance of the concept. This is very convenient for the life sciences, where exceptions are so prevalent. As a ready example, consider how the duck-billed platypus easily overrules the "all mammals are viviparous" rule. Further, existential quantification allows us to reason up the taxonomy. If some Ostariophysi exhibit round fins, and all Ostariophysi are Teleostei, then some Teleostei exhibit round fins.<br />
<br />
By default, we use the ''PHENOSCAPE:exhibits'' relation to link taxa to phenotypes using existential semantics. Using the same relation to model universally quantified relationships between taxa and phenotypes would cause incorrect inferencing and loss of data integrity. The easiest way to address this issue is to use different relations: one for universally quantified relations and the other for existentially quantified relations. Let us call these relations ''PHENOSCAPE:all_exhibit'' and ''PHENOSCAPE:some_exhibit'' respectively.<br />
<br />
At present, the OBD reasoner uses the following rule (Rule-1) to extract inferences up the taxonomy using the ''PHENOSCAPE:exhibits'' relation.<br />
<br />
'''Rule-1:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''exhibits''(A, x) <math>\Rightarrow</math> ''exhibits''(B, x)<br />
<br />
This can be replaced with the following two rules, which use the two new relations, ''PHENOSCAPE:all_exhibit'' and ''PHENOSCAPE:some_exhibit''. (Please suggest better names for these if you can think of them).<br />
<br />
'''Rule-2:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''some_exhibit''(A, x) <math>\Rightarrow</math> ''some_exhibit''(B, x)<br />
<br />
'''Rule-3:''' <math>\forall</math>A, B, x: ''is_a''(A, B) <math>\and </math>''all_exhibit''(B, x) <math>\Rightarrow</math> ''all_exhibit''(A, x)<br />
<br />
This will keep the inferences from getting mixed up. Let us consider the scenario where species Sp1 and Sp2 (from genus Gen1) are asserted to exhibit phenotype Phen1. These assertions are shown in (A-1) and (A-2). The subsumption relations are shown in (A-3) and (A-4).<br />
<br />
<javascript><br />
Sp1 PHENOSCAPE:some_exhibit Phen1 -- (A-1)<br />
Sp2 PHENOSCAPE:some_exhibit Phen1 -- (A-2)<br />
Sp1 OBO_REL:is_a Gen1 -- (A-3)<br />
Sp2 OBO_REL:is_a Gen1 -- (A-4)<br />
</javascript><br />
<br />
The reasoner makes the inference (I-1) from the assertions (A-1) ~ (A-4) and the inference rule Rule-2.<br />
<br />
<javascript><br />
Gen1 PHENOSCAPE:some_exhibit Phen1 -- (I-1)<br />
</javascript><br />
<br />
Now, given this new inference (I-1), the reasoner cannot infer that the other species of Gen1, let us say ten further species Sp3 ~ Sp12, also exhibit Phen1, because the inference rule for ''some_exhibit'' cannot be used to infer down the taxonomy. Next, consider the assertion that ALL instances of genus Gen1 exhibit a phenotype Phen2, as shown in (A-5).<br />
<br />
<javascript><br />
Gen1 PHENOSCAPE:all_exhibit Phen2 -- (A-5)<br />
</javascript><br />
<br />
Given (A-5) and all the subsumption relations between Gen1 and the hypothetical twelve species under Gen1 (including A-3 and A-4), the reasoner uses inference rule Rule-3 to infer (I-2) ~ (I-13).<br />
<br />
<javascript><br />
Sp1 PHENOSCAPE:all_exhibit Phen2 -- (I-2)<br />
Sp2 PHENOSCAPE:all_exhibit Phen2 -- (I-3)<br />
..<br />
..<br />
Sp12 PHENOSCAPE:all_exhibit Phen2 -- (I-13)<br />
</javascript><br />
<br />
Again, cyclical inferences are ruled out because there are no inference rules to infer up the taxonomy using the ''all_exhibit'' relation.<br />
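<br />
A minimal sketch of Rule-2 and Rule-3 working side by side is given below. The function <code>reason</code>, the set-based data layout, and the extra taxa Sp3 and Fam1 are assumptions added for illustration; they are not part of the OBD reasoner.<br />
<br />
```python
def reason(is_a, some_exhibit, all_exhibit):
    """Apply Rule-2 (upward) and Rule-3 (downward) until nothing changes.

    is_a:         set of (child, parent) pairs
    some_exhibit: set of (taxon, phenotype) pairs, existentially quantified
    all_exhibit:  set of (taxon, phenotype) pairs, universally quantified
    Returns the saturated (some_exhibit, all_exhibit) sets.
    """
    some, every = set(some_exhibit), set(all_exhibit)
    while True:
        # Rule-2: is_a(A, B) AND some_exhibit(A, x) => some_exhibit(B, x)
        new_some = {(parent, p) for child, parent in is_a
                    for taxon, p in some if taxon == child} - some
        # Rule-3: is_a(A, B) AND all_exhibit(B, x) => all_exhibit(A, x)
        new_every = {(child, p) for child, parent in is_a
                     for taxon, p in every if taxon == parent} - every
        if not new_some and not new_every:
            return some, every
        some |= new_some
        every |= new_every


# Sp3 and Fam1 are hypothetical additions to the scenario in the text.
is_a = {("Sp1", "Gen1"), ("Sp2", "Gen1"), ("Sp3", "Gen1"), ("Gen1", "Fam1")}
some, every = reason(
    is_a,
    some_exhibit={("Sp1", "Phen1"), ("Sp2", "Phen1")},
    all_exhibit={("Gen1", "Phen2")},
)
```
<br />
Because each relation travels in one direction only, Phen1 never reaches Sp3 and Phen2 never reaches Fam1, so the loop terminates without the blow-up described earlier.<br />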
<br />
=====What has to change?=====<br />
<br />
To implement this strategy, two new relations can be defined in the Phenoscape Vocab ontology, where the current definition of the ''PHENOSCAPE:exhibits'' relation is found. At the curation level, curators have to qualify their assertions as being either existentially or universally quantified. Specifically, the Phenex UI could tap the curator's shoulder and ask, "Ahem, does this annotation hold true for all specimens belonging to this taxon or just some specimens?" This needs some changes (no less!) to the Phenex interface and also to the character matrix format in which the data is exported. The data loader module of Phenoscape has to know this information so that the appropriate relation is used in creating the taxon-phenotype statement to be loaded into the knowledgebase. The query module will have to be modified to retrieve both inferred and asserted taxon-phenotype statements using the two different relations. The JSON format in which the data is exported needs to be modified to accommodate the two different kinds of relation statements, and lastly the UI will have to explicitly distinguish between the two.<br />
<br />
======A possible simpler solution (Update: Feb 22, 2010)======<br />
<br />
It is possible to check the rank of the taxon to which the phenotype assertion is made. If the rank of the taxon is not "species", then the new relation can be used in the top-down reasoning as shown in (15) below<br />
<br />
====Probabilistic assertions====<br />
<br />
Uncertainties are everywhere in the life sciences. The taxon-phenotype assertions can be augmented with uncertainty factors to address this issue. Inferences could use uncertainty calculi such as the [http://en.wikipedia.org/wiki/Dempster%E2%80%93Shafer_theory Dempster-Shafer method] or Bayes' rule for conditional probability to derive the uncertainty factors of the inferences from the uncertainty factors of the assertions.<br />
<br />
The advantage of this strategy will be that we can continue to use the ''PHENOSCAPE:exhibits'' relation for taxon-phenotype statements, and at the same time display the uncertainty values associated with every assertion displayed at the UI; far more intuitive than "Taxon T exhibits increased size of E AND decreased size of E."<br />
<br />
=====What needs to change?=====<br />
<br />
Curators will have to manually enter uncertainty factors (UFs) of the assertions in the Phenex UI, which needs modification to handle these. The character matrix format needs to be modified to accommodate UFs. The data loader module needs to use reified statements around assertions to store UFs. The OBD reasoner will have to be augmented with an implementation of uncertainty calculus. The query module needs to retrieve the UF associated with every assertion, and export this in a modified JSON format. Lastly, the UI will have to add a provision to display uncertainty factors.<br />
<br />
==The problem with absence of features==<br />
<br />
Descriptions of phenotypes as used in the Phenoscape project (and a plethora of phenomena in the real world) are replete with exceptions, or aberrations from what is considered to be "normal." While canonical ontologies like the [http://sig.biostr.washington.edu/projects/fm/ FMA] and the [http://www.berkeleybop.org/ontologies/obo-all/teleost_anatomy/teleost_anatomy.obo TAO] contain ontological definitions of ideal specimens, observations in the life sciences are full of aberrations to these general rules.<br />
<br />
Phenoscape has some typical issues dealing with absence of anatomical features in certain species of Ostariophysian fishes. For example, the basihyal cartilage is found in all species of Ostariophysian fishes, except the Siluriformes. At present, this information is captured in Phenoscape using the combination of the PATO term for "absent in organism" (PATO:0000462), the "inheres_in" relation from the OBO Relations Ontology, the TAO term for "basihyal cartilage" (TAO:0001510), the "exhibits" relation from the PHENOSCAPE ontology, and the TTO term for Siluriformes (TTO:1380). This is shown below.<br />
<br />
<javascript><br />
TTO:1380 PHENOSCAPE:exhibits PATO:0000462^OBO_REL:inheres_in(TAO:0001510)<br />
</javascript><br />
<br />
In plain English, this translates to "Siluriformes exhibit absence in organism which inheres in basihyal cartilage." The semantics of this sentence are vague, to say the least. Going by this methodology, it is impossible to state that basihyal cartilage is absent in Siluriformes without referring to ''at least one'' instance of basihyal cartilage. Combining the quality ''absent'' with a ''feature'' through the ''inheres_in'' relation is misleading in itself (e.g., "absence inheres in cartilage") and contorts the intrinsic semantics of ''inheres_in''. These problems have been discussed in [http://www.ncbi.nlm.nih.gov/pubmed/17369081 Ceusters et al.] and [http://www.biomedcentral.com/1471-2105/8/377 Hoehndorf et al.]. Both publications propose solutions that integrate these aberrant observations with canonical definitions without causing inconsistencies in reasoning procedures.<br />
<br />
[[Media:PhenotypesInPhenoscape.ppt]]<br />
<br />
[[Discussion about the Absence of Phenotypes issue]]<br />
<br />
Another issue specific to the Phenoscape project was raised by Paula at the SICB workshop. Given that basihyal cartilage is absent in Siluriformes, basihyal bone should be absent in Siluriformes as well. This is because basihyal bone develops from basihyal cartilage. This may be inferred by adding the new relation-chaining rule shown below to the OBD reasoner.<br />
<br />
'''Rule:''' <math>\forall</math>F1, F2, S: ''absent_in''(F1, S) <math>\and</math> ''develops_from''(F2, F1) <math>\Rightarrow</math> ''absent_in''(F2, S)<br />
<br />
This relation chain corresponds to the observation GIVEN THAT Basihyal_Cartilage ''absent_in'' Siluriformes AND Basihyal_Bone ''develops_from'' Basihyal_cartilage, THEN Basihyal_Bone ''absent_in'' Siluriformes. This and other similar relation chains (as per identified requirements) are to be implemented for the Phenoscape project in the future. Strategies to deal with absent features in general are also to be implemented in the near future.<br />
<br />
Differences between the [[The exhibits relation conundrum|existing semantics]] and [[Relating taxa to phenotypes|desired semantics]] of the ''exhibits'' relation need to be resolved to address this issue. Potential strategies to implement the absence of features problem are discussed [[Novel reasoning strategies|here]].<br />
<br />
[[Category:EQ Annotation]]<br />
[[Category:Informatics]]<br />
[[Category:Ontology]]<br />
[[Category:Queries]]<br />
[[Category:Reasoning]]<br />
[[Category:Data]]<br />
[[Category:Curation]]<br />
[[Category:Taxonomy]]<br />
[[Category:OBD]]</div>Crk18https://wiki.phenoscape.org/wg/phenoscape/index.php?title=Data_Services&diff=6754Data Services2010-01-12T16:44:38Z<p>Crk18: /* Timestamp */</p>
<hr />
<div>This section details the data services that query the OBD Phenoscape database and transfer the retrieved results to the Phenoscape UI. Each service may support multiple media types. The desired media type can be specified by appending <code>?media=json</code> or similar to the request URL. URI specifications are defined (loosely) using [http://bitworking.org/projects/URI-Templates/draft-gregorio-uritemplate-00.html URI Templates].<br />
<br />
==Timestamp==<br />
'''URI'''<br />
<br />
<BASE URI>/timestamp<br />
<br />
'''Returns'''<br />
<br />
JSON:<br />
<javascript><br />
{<br />
"timestamp":"2010-01-09"<br />
}<br />
<br />
</javascript><br />
<br />
==Term info==<br />
'''URI'''<br />
<br />
<BASE URI>/term/{term_id}<br />
<br />
'''Returns'''<br />
<br />
JSON:<br />
<javascript><br />
{<br />
 "id" : "TAO:0001700",<br />
 "name" : "caudal-fin stay",<br />
 "definition" : "Bone that is located anterior to the caudal procurrent rays. Caudal fin stays are unpaired bone.",<br />
 "comment" : "Some comment text that may be in there.",<br />
"parents" :<br />
[<br />
{<br />
"relation" : {<br />
"id" : "OBO_REL:is_a",<br />
"name" : "is_a"<br />
},<br />
"target" : {<br />
"id" : "TAO:0001514",<br />
"name" : "bone"<br />
}<br />
<br />
},<br />
{<br />
"relation" : {<br />
"id" : "OBO_REL:part_of",<br />
"name" : "part_of"<br />
},<br />
"target" : {<br />
"id" : "TAO:0000862",<br />
"name" : "caudal fin skeleton"<br />
}<br />
}<br />
],<br />
"children" : [] // if there are children, this content should be in the same format as the parents list<br />
}<br />
// how should xrefs, etc. be represented, property_value definitions?<br />
</javascript><br />
<br />
OWL-RDF:<br />
<br />
Todo...<br />
<br />
'''Error'''<br />
<br />
If there is no term with the given ID, the service should return "404 Not Found".<br />
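<br />
A client could read the parent links from the JSON document above roughly as follows. This is a sketch only: the helper name <code>links</code> is made up, and the embedded document is an abbreviated copy of the example with the <code>//</code> comments removed, since strict JSON parsers such as Python's <code>json</code> module reject comments.<br />
<br />
```python
import json

# Abbreviated copy of the term info example, without // comments.
TERM_INFO = """
{
  "id": "TAO:0001700",
  "name": "caudal-fin stay",
  "parents": [
    {"relation": {"id": "OBO_REL:is_a", "name": "is_a"},
     "target": {"id": "TAO:0001514", "name": "bone"}},
    {"relation": {"id": "OBO_REL:part_of", "name": "part_of"},
     "target": {"id": "TAO:0000862", "name": "caudal fin skeleton"}}
  ],
  "children": []
}
"""


def links(term, key):
    """Return (relation name, target ID, target name) for 'parents' or 'children'."""
    return [
        (link["relation"]["name"], link["target"]["id"], link["target"]["name"])
        for link in term.get(key, [])
    ]


term = json.loads(TERM_INFO)
for relation, target_id, target_name in links(term, "parents"):
    print(f'{term["name"]} {relation} {target_name} ({target_id})')
```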
<br />
===Handling of anonymous post-compositions===<br />
<br />
==Autocomplete==<br />
'''URI'''<br />
<br />
<BASE URI>/term/search?text=[input]&name=[true|false]&syn=[true|false]&def=[true|false]&ontology=[ont1,ont2,...]&limit=[count]<br />
<br />
All URI parameters are optional except for <code>text</code>. Default values are name=true, syn=false, def=false. The "ontology" parameter should be a comma-separated list of ontology prefixes to search within. If not given, the default is to search all ontologies. Specifying "ZFIN" for the ontology should be a search for gene nodes, by gene name. The "limit" parameter limits the number of results to the given integer.<br />
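<br />
For illustration, a client might assemble the search URI as sketched below. The function <code>search_url</code> and the base URI used in the usage line are placeholders, not part of the service; only the parameter names and defaults are taken from the description above.<br />
<br />
```python
from urllib.parse import urlencode


def search_url(base_uri, text, name=True, syn=False, definition=False,
               ontologies=None, limit=None):
    """Build an autocomplete request URI with the documented defaults."""
    if not text:
        raise ValueError("'text' is the only required parameter")
    params = [
        ("text", text),
        ("name", str(name).lower()),
        ("syn", str(syn).lower()),
        ("def", str(definition).lower()),
    ]
    if ontologies:
        # Comma-separated list of ontology prefixes, e.g. TAO,PATO
        params.append(("ontology", ",".join(ontologies)))
    if limit is not None:
        params.append(("limit", str(limit)))
    # safe="," keeps the commas in the ontology list unescaped.
    return f"{base_uri}/term/search?{urlencode(params, safe=',')}"


# "http://example.org/ws" stands in for <BASE URI>.
url = search_url("http://example.org/ws", "bone", ontologies=["TAO", "PATO"], limit=10)
```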
<br />
'''Returns'''<br />
<br />
JSON:<br />
<javascript><br />
{<br />
"matches" : [<br />
{ // overall format<br />
"id" : "TAO:0001514",<br />
"name" : "bone",<br />
"match_type" : "name" | "syn" | "def",<br />
"match_text" : "this is the term name, synonym name, or definition that matched"<br />
},<br />
{ // a name example<br />
"id" : "TAO:0001514",<br />
"name" : "bone",<br />
"match_type" : "name",<br />
"match_text" : "bone"<br />
},<br />
{ // a synonym example<br />
"id" : "TAO:0001795",<br />
"name" : "ceratohyal foramen",<br />
"match_type" : "syn",<br />
"match_text" : "bericiform foramen"<br />
},<br />
{ // a definition example<br />
"id" : "TAO:0000488",<br />
"name" : "ceratobranchial bone",<br />
"match_type" : "def",<br />
"match_text" : "Ceratobranchials are bilaterally paired cartilage bones that form part of the ventral branchial arches. They articulate medially with the hypobranchials and laterally and dorsally with the epibranchials. Ceratobranchials 1-5 ossify in the ceratobranchial cartilages."<br />
}<br />
],<br />
"search_term" : "bone",<br />
"total" : 1859<br />
}<br />
</javascript><br />
<br />
'''Error'''<br />
<br />
If there are no terms matching the given input, a document should still be returned, containing an empty results list.<br />
<br />
==Annotations summary==<br />
'''URI'''<br />
<br />
 <BASE URI>/phenotypes/summary?subject={subject_id}&entity={entity_id}&quality={quality_id}&pub={publication_id}&examples={examples_count}<br />
<br />
This service returns a summary of the phenotype annotations involving the given terms. The annotations are grouped by "character", which is a unique combination of an entity and the "character quality" to which the quality in the annotation corresponds.<br />
<br />
All parameters are optional. If no parameters are specified, the service should return a summary of all the phenotype annotations in the database. Otherwise, any term ID given for the various parameters restricts the annotations to only those concerning those terms (where "concerning" is defined for each type of parameter).<br />
* '''subject''' - an ID for a taxon or gene - the summarized characters are only those which this term ''exhibits''<br />
* '''entity''' - an ID for an anatomical term - the summarized characters are only those whose phenotype ''inheres_in'' this entity (note - maybe we should add another parameter to include phenotypes which ''inhere_in_part_of'' the entity)<br />
* '''quality''' - an ID for a quality term - the summarized characters are only those whose phenotype ''is_a'' type of this term<br />
* '''pub''' - an ID for a publication - the summarized characters are only those contributed by the given publication<br />
* '''examples''' - the number of example terms to include in the output - default is zero<br />
<br />
For each character, find all taxa and genes which exhibit phenotypes corresponding to that character. If "subject" is a taxon, limit the included taxa to those within the given subject taxon. If "pub" is specified, limit the included taxa to those within the given publication.<br />
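<br />
The grouping described above could be sketched as follows. Several simplifications are assumptions of this example rather than service behaviour: annotations are plain dictionaries of ID strings, the "character quality" has already been looked up for each annotation, and a subject counts as a gene when its ID starts with <code>ZDB</code>. The name <code>summarize</code> is hypothetical.<br />
<br />
```python
from collections import defaultdict


def summarize(annotations, examples=0):
    """Group annotations by (entity, character quality) and count members."""
    groups = defaultdict(lambda: {"qualities": set(), "taxa": set(), "genes": set()})
    for a in annotations:
        group = groups[(a["entity"], a["character_quality"])]
        group["qualities"].add(a["quality"])
        # Assumption: gene IDs carry a ZDB prefix, everything else is a taxon.
        kind = "genes" if a["subject"].startswith("ZDB") else "taxa"
        group[kind].add(a["subject"])

    characters = []
    for (entity, character_quality), group in sorted(groups.items()):
        entry = {"entity": entity, "character_quality": character_quality}
        for key in ("qualities", "taxa", "genes"):
            ids = sorted(group[key])
            entry[key] = {"count": len(ids), "examples": ids[:examples]}
        characters.append(entry)
    return {"characters": characters}


# Made-up annotations; none of these IDs refer to real terms.
annotations = [
    {"subject": "TTO:1", "entity": "TAO:10", "character_quality": "PATO:50", "quality": "PATO:51"},
    {"subject": "TTO:2", "entity": "TAO:10", "character_quality": "PATO:50", "quality": "PATO:52"},
    {"subject": "ZDB:7", "entity": "TAO:10", "character_quality": "PATO:50", "quality": "PATO:51"},
    {"subject": "TTO:1", "entity": "TAO:11", "character_quality": "PATO:60", "quality": "PATO:61"},
]
summary = summarize(annotations, examples=1)
```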
<br />
'''Returns'''<br />
<br />
JSON:<br />
<javascript><br />
{<br />
"characters" : [<br />
{<br />
"entity" : { "id" : "TAO:34242", "name" : "basihyal bone" },<br />
"character_quality" : { "id" : "PATO:34242", "name" : "texture" },<br />
"qualities" : {<br />
"count" : 14,<br />
"examples" : [<br />
{ "id" : "PATO:34242", "name" : "some quality" },<br />
{ "id" : "PATO:34242", "name" : "some quality" }<br />
]<br />
},<br />
"taxa" : {<br />
"count" : 2,<br />
"examples" : [<br />
{ "id" : "TTO:34242", "name" : "some taxon" },<br />
{ "id" : "TTO:34246", "name" : "some taxon" }<br />
]<br />
},<br />
"genes" : {<br />
"count" : 2,<br />
"examples" : [<br />
{ "id" : "ZDB:34242", "name" : "some gene" },<br />
{ "id" : "ZDB:34246", "name" : "some gene" }<br />
]<br />
}<br />
},<br />
{<br />
"entity" : { "id" : "TAO:34242", "name" : "dorsal fin ray" },<br />
"character_quality" : { "id" : "PATO:34242", "name" : "shape" },<br />
"qualities" : {<br />
"count" : 9,<br />
"examples" : [<br />
{ "id" : "PATO:34242", "name" : "some quality" },<br />
{ "id" : "PATO:34242", "name" : "some quality" }<br />
]<br />
},<br />
"taxa" : {<br />
"count" : 5,<br />
"examples" : [<br />
{ "id" : "TTO:34242", "name" : "some taxon" },<br />
{ "id" : "TTO:34246", "name" : "some taxon" }<br />
]<br />
},<br />
"genes" : {<br />
"count" : 7,<br />
"examples" : [<br />
{ "id" : "ZDB:34242", "name" : "some gene" },<br />
{ "id" : "ZDB:34246", "name" : "some gene" }<br />
]<br />
}<br />
}<br />
]<br />
}<br />
</javascript><br />
<br />
==Annotations results==<br />
'''URI'''<br />
<br />
 <BASE URI>/phenotypes?subject={subject_id}&entity={entity_id}&quality={quality_id}&pub={publication_id}&type=[evo|devo]&group=[root|{taxon_id}]<br />
<br />
This service returns the phenotype annotations involving the given terms.<br />
<br />
All parameters are optional. If no parameters are specified, the service should return all the phenotype annotations in the database. Otherwise, any term ID given for the various parameters restricts the annotations to only those concerning those terms (where "concerning" is defined for each type of parameter).<br />
* '''subject''' - an ID for a taxon or gene - the returned annotations are only those which this term ''exhibits''<br />
* '''entity''' - an ID for an anatomical term - the returned annotations are only those whose phenotype ''inheres_in'' this entity (note - maybe we should add another parameter to include phenotypes which ''inhere_in_part_of'' the entity)<br />
* '''quality''' - an ID for a quality term - the returned annotations are only those whose phenotype ''is_a'' type of this term<br />
* '''pub''' - an ID for a publication - the returned annotations are only those contributed by the given publication<br />
* '''type''' - either "evo" or "devo" - return either evolutionary data (evo) or model organism data (devo) or both if not specified<br />
* '''group''' - if "root", return, as the subject, only the most recent common ancestor taxon for all taxa found in the query, along with the set of phenotypes found in the results. If a taxon id, return the results grouped into immediate children of that taxon (only those that have results). Only include the annotation ID if that phenotype-to-taxon link was actually asserted.<br />
<br />
'''Returns'''<br />
<br />
JSON:<br />
<javascript><br />
{<br />
"subjects" : [<br />
{ "id" : "TTO:34242",<br />
"name" : "some taxon",<br />
"leaf" : false, //whether it is useful to query for more specific sub-taxa<br />
"phenotypes" : [<br />
{<br />
"id" : "internal annotation ID",<br />
"entity" : { "id" : "TAO:34242", "name" : "some entity" },<br />
"quality" : { "id" : "PATO:34242", "name" : "some quality" },<br />
"count" : 22<br />
},<br />
{<br />
"id" : "internal annotation ID",<br />
"entity" : { "id" : "TAO:34242", "name" : "some entity" },<br />
"quality" : { "id" : "PATO:34242", "name" : "some quality" },<br />
"count" : ""<br />
},<br />
{<br />
"id" : "internal annotation ID",<br />
"entity" : { "id" : "TAO:34242", "name" : "some entity" },<br />
"quality" : { "id" : "PATO:34242", "name" : "some quality" },<br />
"count" : ""<br />
}<br />
]<br />
},<br />
{ "id" : "TTO:34243",<br />
"name" : "some taxon",<br />
"phenotypes" : [<br />
{<br />
"id" : "internal annotation ID",<br />
"entity" : { "id" : "TAO:34242", "name" : "some entity" },<br />
"quality" : { "id" : "PATO:34242", "name" : "some quality" },<br />
"count" : ""<br />
},<br />
{<br />
"id" : "internal annotation ID",<br />
"entity" : { "id" : "TAO:34242", "name" : "some entity" },<br />
"quality" : { "id" : "PATO:34242", "name" : "some quality" },<br />
"count" : ""<br />
},<br />
{<br />
"id" : "internal annotation ID",<br />
"entity" : { "id" : "TAO:34242", "name" : "some entity" },<br />
"quality" : { "id" : "PATO:34242", "name" : "some quality" },<br />
"count" : ""<br />
}<br />
]<br />
},<br />
{ "id" : "TTO:34244",<br />
"name" : "some taxon",<br />
"phenotypes" : [<br />
{<br />
"id" : "internal annotation ID",<br />
"entity" : { "id" : "TAO:34242", "name" : "some entity" },<br />
"quality" : { "id" : "PATO:34242", "name" : "some quality" },<br />
"count" : ""<br />
},<br />
{<br />
"id" : "internal annotation ID",<br />
"entity" : { "id" : "TAO:34242", "name" : "some entity" },<br />
"quality" : { "id" : "PATO:34242", "name" : "some quality" },<br />
"count" : ""<br />
},<br />
{<br />
"id" : "internal annotation ID",<br />
"entity" : { "id" : "TAO:34242", "name" : "some entity" },<br />
"quality" : { "id" : "PATO:34242", "name" : "some quality" },<br />
"count" : ""<br />
}<br />
]<br />
}<br />
]<br />
}<br />
</javascript><br />
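<br />
A consumer of this document might flatten it into rows and decide which subjects to expand next, as in the sketch below. Both helper names are hypothetical, and the reading of <code>leaf</code> (an explicit <code>false</code> means a follow-up query with <code>group={taxon_id}</code> is worthwhile; a missing key is treated as unknown) is an assumption based on the comment in the example.<br />
<br />
```python
def phenotype_rows(response):
    """Flatten the response into (subject ID, entity ID, quality ID) tuples."""
    rows = []
    for subject in response.get("subjects", []):
        for phenotype in subject.get("phenotypes", []):
            rows.append(
                (subject["id"], phenotype["entity"]["id"], phenotype["quality"]["id"])
            )
    return rows


def expandable_subjects(response):
    """Return IDs of subjects explicitly marked as non-leaf."""
    return [s["id"] for s in response.get("subjects", []) if s.get("leaf") is False]


# Made-up response in the shape of the example above.
response = {
    "subjects": [
        {"id": "TTO:34242", "name": "some taxon", "leaf": False,
         "phenotypes": [
             {"id": "a1", "entity": {"id": "TAO:1", "name": "some entity"},
              "quality": {"id": "PATO:1", "name": "some quality"}, "count": 22},
         ]},
        {"id": "TTO:34243", "name": "some taxon",
         "phenotypes": [
             {"id": "a2", "entity": {"id": "TAO:2", "name": "some entity"},
              "quality": {"id": "PATO:2", "name": "some quality"}, "count": ""},
         ]},
    ]
}
rows = phenotype_rows(response)
to_expand = expandable_subjects(response)
```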
<br />
==Homology info==<br />
Return any homology statements which the given term participates in. Regardless of the direction of the homologous link in the database, the queried term should be returned as the "subject" in the returned data - the homologous term would be the "target".<br />
<br />
'''URI'''<br />
<br />
<BASE URI>/term/{term_id}/homology<br />
<br />
'''Returns'''<br />
<br />
JSON:<br />
<javascript><br />
{"homologies" : [<br />
{<br />
"subject" : {<br />
"entity" : {"id" : "TAO:xxxx", "name" : "some anatomical thing"},<br />
"taxon" : {"id" : "TTO:xxxx", "name" : "some taxon"}<br />
},<br />
"target" : {<br />
"entity" : {"id" : "TAO:xxxx", "name" : "some anatomical thing"},<br />
"taxon" : {"id" : "TTO:xxxx", "name" : "some taxon"}<br />
},<br />
"source" : {<br />
"publication" : "citation text",<br />
"evidence" : {"id" : "ECO:xxxx", "name" : "some evidence code"}<br />
}<br />
},<br />
{<br />
"subject" : {<br />
"entity" : {"id" : "TAO:xxxx", "name" : "some anatomical thing"},<br />
"taxon" : {"id" : "TTO:xxxx", "name" : "some taxon"}<br />
},<br />
"target" : {<br />
"entity" : {"id" : "TAO:xxxx", "name" : "some anatomical thing"},<br />
"taxon" : {"id" : "TTO:xxxx", "name" : "some taxon"}<br />
},<br />
"source" : {<br />
"publication" : "citation text",<br />
"evidence" : {"id" : "ECO:xxxx", "name" : "some evidence code"}<br />
}<br />
}<br />
]<br />
}<br />
</javascript><br />
<br />
==Annotation source information==<br />
<br />
NOTE: The URI spec has been changed because of resolution problems on the Eryops server. (Cartik)<br />
<br />
'''URI'''<br />
<br />
<BASE URI>/phenotypes/source/{annotation_id}<br />
<br />
'''Returns'''<br />
<br />
JSON:<br />
<javascript><br />
{<br />
"phenotype" : {<br />
"subject" : { "id" : "TTO:34242", "name" : "some taxon" },<br />
"entity" : { "id" : "TAO:34242", "name" : "some entity" },<br />
"quality" : { "id" : "PATO:34242", "name" : "some quality" }<br />
},<br />
"sources" : [<br />
{<br />
"publication" : "citation text",<br />
"character_text" : "blah",<br />
"character_comment" : "blah",<br />
"state_text" : "blah",<br />
"curated_by" : "blah"<br />
}<br />
]<br />
}<br />
</javascript><br />
<br />
[[Category:Informatics]]<br />
[[Category:API]]</div>Crk18https://wiki.phenoscape.org/wg/phenoscape/index.php?title=Data_Services&diff=6753Data Services2010-01-12T16:44:05Z<p>Crk18: </p>
<hr />
<div>This section details the data services that query the OBD Phenoscape database and transfer the retrieved results to the Phenoscape UI. Each service may support multiple media types. The desired media type can be specified by appending <code>?media=json</code> or similar to the request URL. URI specifications are defined (loosely) using [http://bitworking.org/projects/URI-Templates/draft-gregorio-uritemplate-00.html URI Templates].<br />
<br />
==Timestamp==<br />
'''URI'''<br />
<br />
<BASE URI>/term/{term_id}<br />
<br />
'''Returns'''<br />
<br />
JSON:<br />
<javascript><br />
{<br />
"timestamp":"2010-01-09"<br />
}<br />
<br />
</javascript><br />
<br />
==Term info==<br />
'''URI'''<br />
<br />
<BASE URI>/term/{term_id}<br />
<br />
'''Returns'''<br />
<br />
JSON:<br />
<javascript><br />
{<br />
"id" : "TAO:0001700",<br />
"name" : "caudal-fin stay",<br />
"definition" : "Bone that is located anterior to the caudal procurrent rays. Caudal fin stays are unpaired bone.",<br />
 "comment" : "Some comment text that may be in there.",<br />
"parents" :<br />
[<br />
{<br />
"relation" : {<br />
"id" : "OBO_REL:is_a",<br />
"name" : "is_a"<br />
},<br />
"target" : {<br />
"id" : "TAO:0001514",<br />
"name" : "bone"<br />
}<br />
<br />
},<br />
{<br />
"relation" : {<br />
"id" : "OBO_REL:part_of",<br />
"name" : "part_of"<br />
},<br />
"target" : {<br />
"id" : "TAO:0000862",<br />
"name" : "caudal fin skeleton"<br />
}<br />
}<br />
],<br />
"children" : [] // if there are children, this content should be in the same format as the parents list<br />
}<br />
// how should xrefs, etc. be represented, property_value definitions?<br />
</javascript><br />
<br />
OWL-RDF:<br />
<br />
Todo...<br />
<br />
'''Error'''<br />
<br />
If there is no term with the given ID, the service should return "404 Not Found".<br />
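A client might consume the term info document roughly as follows. This is only a sketch: the function name <code>parents_by_relation</code> is hypothetical, and the sample dictionary simply mirrors the JSON example above with the editorial comments removed.

```python
def parents_by_relation(term_info):
    """Group the IDs of a term's parents by the ID of the linking relation."""
    grouped = {}
    for parent in term_info.get("parents", []):
        relation_id = parent["relation"]["id"]
        grouped.setdefault(relation_id, []).append(parent["target"]["id"])
    return grouped


# Sample document copied from the example response above.
caudal_fin_stay = {
    "id": "TAO:0001700",
    "name": "caudal-fin stay",
    "parents": [
        {"relation": {"id": "OBO_REL:is_a", "name": "is_a"},
         "target": {"id": "TAO:0001514", "name": "bone"}},
        {"relation": {"id": "OBO_REL:part_of", "name": "part_of"},
         "target": {"id": "TAO:0000862", "name": "caudal fin skeleton"}},
    ],
    "children": [],
}

print(parents_by_relation(caudal_fin_stay))
```

Because "children" is specified to use the same format as "parents", the same loop could be reused for the children list.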
<br />
===Handling of anonymous post-compositions===<br />
<br />
==Autocomplete==<br />
'''URI'''<br />
<br />
<BASE URI>/term/search?text=[input]&name=[true|false]&syn=[true|false]&def=[true|false]&ontology=[ont1,ont2,...]&limit=[count]<br />
<br />
All URI parameters are optional except for <code>text</code>. Default values are name=true, syn=false, def=false. The "ontology" parameter should be a comma-separated list of ontology prefixes to search within. If not given, the default is to search all ontologies. Specifying "ZFIN" for the ontology should be a search for gene nodes, by gene name. The "limit" parameter limits the number of results to the given integer.<br />
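The parameter rules above could be applied on the client side as in the following sketch. The base URI shown is a placeholder (the real value is not given on this page), and the function name <code>autocomplete_url</code> is hypothetical.

```python
from urllib.parse import urlencode

# Placeholder only; substitute the actual <BASE URI> of the deployment.
BASE_URI = "http://example.org/OBD-WS"


def autocomplete_url(text, name=True, syn=False, definition=False,
                     ontologies=None, limit=None, base_uri=BASE_URI):
    """Build a /term/search URL using the documented defaults."""
    if not text:
        raise ValueError("'text' is the only required parameter")
    params = {
        "text": text,
        "name": str(name).lower(),
        "syn": str(syn).lower(),
        "def": str(definition).lower(),
    }
    if ontologies:
        # Comma-separated list of ontology prefixes, e.g. TAO,PATO
        params["ontology"] = ",".join(ontologies)
    if limit is not None:
        params["limit"] = int(limit)
    # safe="," keeps the comma separator readable in the query string
    return base_uri + "/term/search?" + urlencode(params, safe=",")


print(autocomplete_url("bone", ontologies=["TAO", "PATO"], limit=10))
```

Sending the default values explicitly is a design choice of this sketch; omitting them should be equivalent under the rules stated above.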
<br />
'''Returns'''<br />
<br />
JSON:<br />
<javascript><br />
{<br />
"matches" : [<br />
{ // overall format<br />
"id" : "TAO:0001514",<br />
"name" : "bone",<br />
"match_type" : "name" | "syn" | "def",<br />
"match_text" : "this is the term name, synonym name, or definition that matched"<br />
},<br />
{ // a name example<br />
"id" : "TAO:0001514",<br />
"name" : "bone",<br />
"match_type" : "name",<br />
"match_text" : "bone"<br />
},<br />
{ // a synonym example<br />
"id" : "TAO:0001795",<br />
"name" : "ceratohyal foramen",<br />
"match_type" : "syn",<br />
"match_text" : "bericiform foramen"<br />
},<br />
{ // a definition example<br />
"id" : "TAO:0000488",<br />
"name" : "ceratobranchial bone",<br />
"match_type" : "def",<br />
"match_text" : "Ceratobranchials are bilaterally paired cartilage bones that form part of the ventral branchial arches. They articulate medially with the hypobranchials and laterally and dorsally with the epibranchials. Ceratobranchials 1-5 ossify in the ceratobranchial cartilages."<br />
}<br />
],<br />
"search_term" : "bone",<br />
"total" : 1859<br />
}<br />
</javascript><br />
<br />
'''Error'''<br />
<br />
If there are no terms matching the given input, a document should still be returned, containing an empty results list.<br />
<br />
==Annotations summary==<br />
'''URI'''<br />
<br />
 <BASE URI>/phenotypes/summary?subject={subject_id}&entity={entity_id}&quality={quality_id}&pub={publication_id}&examples={examples_count}<br />
<br />
This service returns a summary of the phenotype annotations involving the given terms. The annotations are grouped by "character", which is a unique combination of an entity and the "character quality" to which the quality in the annotation corresponds.<br />
<br />
All parameters are optional. If no parameters are specified, the service should return a summary of all the phenotype annotations in the database. Otherwise, any term ID given for the various parameters restricts the annotations to only those concerning those terms (where "concerning" is defined for each type of parameter).<br />
* '''subject''' - an ID for a taxon or gene - the summarized characters are only those which this term ''exhibits''<br />
* '''entity''' - an ID for an anatomical term - the summarized characters are only those whose phenotype ''inheres_in'' this entity (note - maybe we should add another parameter to include phenotypes which ''inhere_in_part_of'' the entity)<br />
* '''quality''' - an ID for a quality term - the summarized characters are only those whose phenotype ''is_a'' type of this term<br />
* '''pub''' - an ID for a publication - the summarized characters are only those contributed by the given publication<br />
* '''examples''' - the number of example terms to include in the output - default is zero<br />
<br />
For each character, find all taxa and genes which exhibit phenotypes corresponding to that character. If "subject" is a taxon, limit the included taxa to those within the given subject taxon. If "pub" is specified, limit the included taxa to those within the given publication.<br />
<br />
'''Returns'''<br />
<br />
JSON:<br />
<javascript><br />
{<br />
"characters" : [<br />
{<br />
"entity" : { "id" : "TAO:34242", "name" : "basihyal bone" },<br />
"character_quality" : { "id" : "PATO:34242", "name" : "texture" },<br />
"qualities" : {<br />
"count" : 14,<br />
"examples" : [<br />
{ "id" : "PATO:34242", "name" : "some quality" },<br />
{ "id" : "PATO:34242", "name" : "some quality" }<br />
]<br />
},<br />
"taxa" : {<br />
"count" : 2,<br />
"examples" : [<br />
{ "id" : "TTO:34242", "name" : "some taxon" },<br />
{ "id" : "TTO:34246", "name" : "some taxon" }<br />
]<br />
},<br />
"genes" : {<br />
"count" : 2,<br />
"examples" : [<br />
{ "id" : "ZDB:34242", "name" : "some gene" },<br />
{ "id" : "ZDB:34246", "name" : "some gene" }<br />
]<br />
}<br />
},<br />
{<br />
"entity" : { "id" : "TAO:34242", "name" : "dorsal fin ray" },<br />
"character_quality" : { "id" : "PATO:34242", "name" : "shape" },<br />
"qualities" : {<br />
"count" : 9,<br />
"examples" : [<br />
{ "id" : "PATO:34242", "name" : "some quality" },<br />
{ "id" : "PATO:34242", "name" : "some quality" }<br />
]<br />
},<br />
"taxa" : {<br />
"count" : 5,<br />
"examples" : [<br />
{ "id" : "TTO:34242", "name" : "some taxon" },<br />
{ "id" : "TTO:34246", "name" : "some taxon" }<br />
]<br />
},<br />
"genes" : {<br />
"count" : 7,<br />
"examples" : [<br />
{ "id" : "ZDB:34242", "name" : "some gene" },<br />
{ "id" : "ZDB:34246", "name" : "some gene" }<br />
]<br />
}<br />
}<br />
]<br />
}<br />
</javascript><br />
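As an illustration of how a consumer might flatten the summary document above into table rows, consider the sketch below. The function name <code>character_rows</code> and the sample data are hypothetical; only the field names come from the example response.

```python
def character_rows(summary):
    """Return one (entity, character, qualities, taxa, genes) row per character."""
    rows = []
    for character in summary.get("characters", []):
        rows.append((
            character["entity"]["name"],
            character["character_quality"]["name"],
            character["qualities"]["count"],
            character["taxa"]["count"],
            character["genes"]["count"],
        ))
    return rows


# Trimmed-down sample following the structure of the example response.
sample_summary = {
    "characters": [
        {
            "entity": {"id": "TAO:34242", "name": "basihyal bone"},
            "character_quality": {"id": "PATO:34242", "name": "texture"},
            "qualities": {"count": 14, "examples": []},
            "taxa": {"count": 2, "examples": []},
            "genes": {"count": 2, "examples": []},
        }
    ]
}

for row in character_rows(sample_summary):
    print(row)
```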
<br />
==Annotations results==<br />
'''URI'''<br />
<br />
 <BASE URI>/phenotypes?subject={subject_id}&entity={entity_id}&quality={quality_id}&pub={publication_id}&type=[evo|devo]&group=[root|{taxon_id}]<br />
<br />
This service returns the phenotype annotations involving the given terms.<br />
<br />
All parameters are optional. If no parameters are specified, the service should return all the phenotype annotations in the database. Otherwise, any term ID given for the various parameters restricts the annotations to only those concerning those terms (where "concerning" is defined for each type of parameter).<br />
* '''subject''' - an ID for a taxon or gene - the returned annotations are only those which this term ''exhibits''<br />
* '''entity''' - an ID for an anatomical term - the returned annotations are only those whose phenotype ''inheres_in'' this entity (note - maybe we should add another parameter to include phenotypes which ''inhere_in_part_of'' the entity)<br />
* '''quality''' - an ID for a quality term - the returned annotations are only those whose phenotype ''is_a'' type of this term<br />
* '''pub''' - an ID for a publication - the returned annotations are only those contributed by the given publication<br />
* '''type''' - either "evo" or "devo" - return either evolutionary data (evo) or model organism data (devo) or both if not specified<br />
* '''group''' - if "root", return, as the subject, only the most recent common ancestor taxon for all taxa found in the query, along with the set of phenotypes found in the results. If a taxon id, return the results grouped into immediate children of that taxon (only those that have results). Only include the annotation ID if that phenotype-to-taxon link was actually asserted.<br />
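The parameter list above could be turned into a request URL as sketched here. The base URI is a placeholder and <code>phenotypes_url</code> is a hypothetical helper name; the only validation performed is the "evo"/"devo" restriction stated above.

```python
from urllib.parse import urlencode

# Placeholder only; substitute the actual <BASE URI> of the deployment.
BASE_URI = "http://example.org/OBD-WS"


def phenotypes_url(subject=None, entity=None, quality=None, pub=None,
                   type_=None, group=None, base_uri=BASE_URI):
    """Build a /phenotypes URL; every parameter is optional."""
    if type_ not in (None, "evo", "devo"):
        raise ValueError("type must be 'evo', 'devo', or omitted for both")
    candidates = [
        ("subject", subject), ("entity", entity), ("quality", quality),
        ("pub", pub), ("type", type_), ("group", group),
    ]
    params = [(key, value) for key, value in candidates if value is not None]
    url = base_uri + "/phenotypes"
    if params:
        # safe=":" keeps term IDs such as TTO:1390 readable
        url += "?" + urlencode(params, safe=":")
    return url


print(phenotypes_url(subject="TTO:1390", type_="evo", group="root"))
```

Called without arguments, the helper returns the bare <code>/phenotypes</code> URL, which corresponds to requesting all annotations in the database.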
<br />
'''Returns'''<br />
<br />
JSON:<br />
<javascript><br />
{<br />
"subjects" : [<br />
{ "id" : "TTO:34242",<br />
"name" : "some taxon",<br />
"leaf" : false, //whether it is useful to query for more specific sub-taxa<br />
"phenotypes" : [<br />
{<br />
"id" : "internal annotation ID",<br />
"entity" : { "id" : "TAO:34242", "name" : "some entity" },<br />
"quality" : { "id" : "PATO:34242", "name" : "some quality" },<br />
"count" : 22<br />
},<br />
{<br />
"id" : "internal annotation ID",<br />
"entity" : { "id" : "TAO:34242", "name" : "some entity" },<br />
"quality" : { "id" : "PATO:34242", "name" : "some quality" },<br />
"count" : ""<br />
},<br />
{<br />
"id" : "internal annotation ID",<br />
"entity" : { "id" : "TAO:34242", "name" : "some entity" },<br />
"quality" : { "id" : "PATO:34242", "name" : "some quality" },<br />
"count" : ""<br />
}<br />
]<br />
},<br />
{ "id" : "TTO:34243",<br />
"name" : "some taxon",<br />
"phenotypes" : [<br />
{<br />
"id" : "internal annotation ID",<br />
"entity" : { "id" : "TAO:34242", "name" : "some entity" },<br />
"quality" : { "id" : "PATO:34242", "name" : "some quality" },<br />
"count" : ""<br />
},<br />
{<br />
"id" : "internal annotation ID",<br />
"entity" : { "id" : "TAO:34242", "name" : "some entity" },<br />
"quality" : { "id" : "PATO:34242", "name" : "some quality" },<br />
"count" : ""<br />
},<br />
{<br />
"id" : "internal annotation ID",<br />
"entity" : { "id" : "TAO:34242", "name" : "some entity" },<br />
"quality" : { "id" : "PATO:34242", "name" : "some quality" },<br />
"count" : ""<br />
}<br />
]<br />
},<br />
{ "id" : "TTO:34244",<br />
"name" : "some taxon",<br />
"phenotypes" : [<br />
{<br />
"id" : "internal annotation ID",<br />
"entity" : { "id" : "TAO:34242", "name" : "some entity" },<br />
"quality" : { "id" : "PATO:34242", "name" : "some quality" },<br />
"count" : ""<br />
},<br />
{<br />
"id" : "internal annotation ID",<br />
"entity" : { "id" : "TAO:34242", "name" : "some entity" },<br />
"quality" : { "id" : "PATO:34242", "name" : "some quality" },<br />
"count" : ""<br />
},<br />
{<br />
"id" : "internal annotation ID",<br />
"entity" : { "id" : "TAO:34242", "name" : "some entity" },<br />
"quality" : { "id" : "PATO:34242", "name" : "some quality" },<br />
"count" : ""<br />
}<br />
]<br />
}<br />
]<br />
}<br />
</javascript><br />
<br />
==Homology info==<br />
Return any homology statements which the given term participates in. Regardless of the direction of the homologous link in the database, the queried term should be returned as the "subject" in the returned data - the homologous term would be the "target".<br />
<br />
'''URI'''<br />
<br />
<BASE URI>/term/{term_id}/homology<br />
<br />
'''Returns'''<br />
<br />
JSON:<br />
<javascript><br />
{"homologies" : [<br />
{<br />
"subject" : {<br />
"entity" : {"id" : "TAO:xxxx", "name" : "some anatomical thing"},<br />
"taxon" : {"id" : "TTO:xxxx", "name" : "some taxon"}<br />
},<br />
"target" : {<br />
"entity" : {"id" : "TAO:xxxx", "name" : "some anatomical thing"},<br />
"taxon" : {"id" : "TTO:xxxx", "name" : "some taxon"}<br />
},<br />
"source" : {<br />
"publication" : "citation text",<br />
"evidence" : {"id" : "ECO:xxxx", "name" : "some evidence code"}<br />
}<br />
},<br />
{<br />
"subject" : {<br />
"entity" : {"id" : "TAO:xxxx", "name" : "some anatomical thing"},<br />
"taxon" : {"id" : "TTO:xxxx", "name" : "some taxon"}<br />
},<br />
"target" : {<br />
"entity" : {"id" : "TAO:xxxx", "name" : "some anatomical thing"},<br />
"taxon" : {"id" : "TTO:xxxx", "name" : "some taxon"}<br />
},<br />
"source" : {<br />
"publication" : "citation text",<br />
"evidence" : {"id" : "ECO:xxxx", "name" : "some evidence code"}<br />
}<br />
}<br />
]<br />
}<br />
</javascript><br />
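The requirement that the queried term always appears as the "subject" could be met with a small normalization step like the one sketched below. It assumes, purely for illustration, that a stored homology statement has the same shape as one element of the "homologies" list; <code>orient_homology</code> is a hypothetical name.

```python
def orient_homology(statement, queried_id):
    """Return the statement with the queried term placed on the subject side."""
    subject, target = statement["subject"], statement["target"]
    if subject["entity"]["id"] == queried_id:
        return statement
    if target["entity"]["id"] == queried_id:
        # The link was stored in the other direction: swap the two sides.
        return {**statement, "subject": target, "target": subject}
    raise ValueError("queried term does not participate in this statement")


stored = {
    "subject": {"entity": {"id": "TAO:0000001", "name": "thing A"},
                "taxon": {"id": "TTO:0000001", "name": "taxon A"}},
    "target": {"entity": {"id": "TAO:0000002", "name": "thing B"},
               "taxon": {"id": "TTO:0000002", "name": "taxon B"}},
    "source": {"publication": "citation text",
               "evidence": {"id": "ECO:0000001", "name": "some evidence code"}},
}

oriented = orient_homology(stored, "TAO:0000002")
print(oriented["subject"]["entity"]["name"], "->", oriented["target"]["entity"]["name"])
```

The IDs in the sample are made up; the source block is carried over unchanged when the sides are swapped.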
<br />
==Annotation source information==<br />
<br />
NOTE: The URI spec has been changed because of resolution problems on the Eryops server. (Cartik)<br />
<br />
'''URI'''<br />
<br />
<BASE URI>/phenotypes/source/{annotation_id}<br />
<br />
'''Returns'''<br />
<br />
JSON:<br />
<javascript><br />
{<br />
"phenotype" : {<br />
"subject" : { "id" : "TTO:34242", "name" : "some taxon" },<br />
"entity" : { "id" : "TAO:34242", "name" : "some entity" },<br />
"quality" : { "id" : "PATO:34242", "name" : "some quality" }<br />
},<br />
"sources" : [<br />
{<br />
"publication" : "citation text",<br />
"character_text" : "blah",<br />
"character_comment" : "blah",<br />
"state_text" : "blah",<br />
"curated_by" : "blah"<br />
}<br />
]<br />
}<br />
</javascript><br />
<br />
[[Category:Informatics]]<br />
[[Category:API]]</div>Crk18https://wiki.phenoscape.org/wg/phenoscape/index.php?title=Phenoscape_data_repository&diff=6702Phenoscape data repository2009-12-14T21:46:36Z<p>Crk18: /* The data warehouse schema */</p>
<hr />
<div>The Phenoscape data repository is a relational database that holds phenotypic data from the model organism ''Danio rerio'' (zebrafish) and from evolutionary species belonging to the clade [http://tolweb.org/Ostariophysi/15077 Ostariophysi]. This page describes the schema of this data repository and outlines the methods used to load and query it.<br />
<br />
== Data Repository ==<br />
<br />
The Phenoscape data repository has been implemented as a [http://www.postgresql.org/ PostgreSQL] relational database, and at present is housed on a dedicated database server.<br />
<br />
== Schema ==<br />
<br />
The schema of the Phenoscape data repository is based upon the Open Biomedical Database (OBD) data format developed at the [http://www.berkeleybop.org/ Berkeley Bioinformatics Open-source Projects (BBOP)]. OBD is based upon the [http://www.w3.org/RDF Resource Description Framework (RDF)] format for capturing metadata about Web (and Semantic Web) resources such as Web pages and Web services.<br />
<br />
The philosophy of OBD is to represent every conceptual entity, be it a type or a token (synonymously a class or an object, or a concept or an instance) or a relation definition, as a Node. Instances of relations between these nodes are represented as Statements, specifically Link Statements. OBD also allows for [http://en.wikipedia.org/wiki/Reification_(computer_science) reification], which is vital to the life sciences with their emphasis on evidence codes and attributions (provenance). For this purpose, OBD provides Literal Statements (and Annotation Statements) to capture metadata about Nodes and Link Statements, such as the source publication, evidence codes, specimens used, and so forth.<br />
<br />
=== Tables ===<br />
<br />
Two relational tables are central to the schema of the Phenoscape data repository. These are: LINK and NODE. The SQL commands for the creation of these tables (and the others) can be found at this [http://phenoscape.svn.sourceforge.net/viewvc/phenoscape/trunk/src/Database/sql/obd/obd-core-schema.sql?view=markup Phenoscape Sourceforge] page.<br />
<br />
==== The NODE table ====<br />
<br />
The NODE table contains information about every concept such as its unique identifier, label, and source ontology. The NODE table contains this information about concepts extracted from the source [[Ontologies]]. In addition, it also holds information about scientific publications (in a rudimentary format which will be improved upon soon), the ontologies themselves, and the representation of phenotypes from the ZFIN and NeXML databases. It will be augmented in the future to hold information about collection specimens. The NODE table adds a unique identifier (generated from a sequence) of its own to every row. An excerpt of the row from the NODE table for the ''Gymnotiformes'' term is shown below<br />
<br />
<javascript><br />
<br />
node_id | uid | label | metatype | source_id<br />
---------+----------+---------------+----------+-----------<br />
46050 | TTO:1390 | Gymnotiformes | C | 9630<br />
<br />
</javascript><br />
<br />
* The NODE_ID column holds the unique identifier generated by the Phenoscape database for this term<br />
* The UID column holds the identifier of this term that is obtained from the Teleost Taxonomy Ontology (TTO). The 'TTO' is the namespace prefix<br />
* The LABEL column displays the label for this term<br />
* The METATYPE column shows that the term is a Class (C). Other metatypes are Relation (R) and Instance (I).<br />
* The SOURCE_ID column holds the NODE_ID of the ontology from which the term was extracted. In this case, the source ontology is the TTO<br />
<br />
==== The LINK table ====<br />
<br />
The LINK table contains rows which represent Statements which link the Nodes to one another, and also the metadata about these Nodes. The excerpt below shows some of the rows in the LINK table about the Gymnotiformes term<br />
<br />
<javascript><br />
<br />
link_id | node_id | predicate_id | object_id<br />
---------+---------+--------------+-----------<br />
23854 | 9637 | 102 | 46050<br />
59897 | 45723 | 102 | 46050<br />
60223 | 46050 | 102 | 46160<br />
501448 | 9932 | 102 | 46050<br />
<br />
</javascript><br />
<br />
* The LINK_ID column shows the database generated identifier for the link<br />
* The NODE_ID column shows the Subject of the Statement (in RDF parlance). This ID is the database-generated identifier for the concept ''Eigenmanniidae'' (TTO:10000005)<br />
* The PREDICATE_ID column shows the Predicate of the Statement. This ID is the database generated identifier for the relation ''OBO_REL:is_a''.<br />
* The OBJECT_ID column shows the Object of the Statement which is the ID generated by the database for concept ''Gymnotiformes''<br />
<br />
In simple terms, this Statement asserts that Eigenmanniidae is a subtaxon of Gymnotiformes, as shown in the triple below<br />
<br />
<javascript><br />
Eigenmanniidae is_a Gymnotiformes<br />
</javascript><br />
<br />
Similarly, the third row of the excerpt shows that Gymnotiformes is an Otophysian, as shown below<br />
<br />
<javascript><br />
Gymnotiformes is_a Otophysi<br />
</javascript><br />
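Resolving a LINK row into a readable triple amounts to joining the NODE table three times, once each for subject, predicate, and object. The sketch below reproduces this with Python's built-in <code>sqlite3</code> module rather than PostgreSQL, and uses simplified column types that are assumptions; only the third LINK row from the excerpt is loaded, since it is the one whose three nodes are named on this page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for the real OBD tables (types are assumed).
    CREATE TABLE node (node_id INTEGER PRIMARY KEY, uid TEXT, label TEXT,
                       metatype TEXT, source_id INTEGER);
    CREATE TABLE link (link_id INTEGER PRIMARY KEY, node_id INTEGER,
                       predicate_id INTEGER, object_id INTEGER);
""")
conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?, ?)", [
    (46050, "TTO:1390", "Gymnotiformes", "C", 9630),
    (46160, None, "Otophysi", "C", None),      # uid not shown in the excerpt
    (102, "OBO_REL:is_a", "is_a", "R", None),  # source not shown in the excerpt
])
conn.execute("INSERT INTO link VALUES (60223, 46050, 102, 46160)")


def readable_triples(connection):
    """Resolve every LINK row to (subject label, predicate label, object label)."""
    return connection.execute("""
        SELECT s.label, p.label, o.label
        FROM link
        JOIN node AS s ON s.node_id = link.node_id
        JOIN node AS p ON p.node_id = link.predicate_id
        JOIN node AS o ON o.node_id = link.object_id
        ORDER BY link.link_id
    """).fetchall()


print(readable_triples(conn))
```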
<br />
==== Other important tables ====<br />
<br />
* The ALIAS table keeps track of the various aliases (alternate labels) of the concepts and relations, which are sourced from the ontologies<br />
* The DESCRIPTION table stores rich text descriptions of the concepts and relations, which are extracted from the ontologies<br />
* The OBD_SCHEMA_METADATA stores metadata about OBD such as the version of the OBD format in use, and also the last refresh date and time for the database<br />
<br />
=== Views ===<br />
<br />
The Phenoscape data repository also generates several views from the tables. These views are used in querying the database, some of which are part of the [[OBD API Documentation | OBDAPI]]<br />
<br />
=== Procedures ===<br />
<br />
Stored procedures are used in populating the database with defined terms from the ontologies, and with phenotypic descriptions obtained from curators. In addition, they are also used in generating inferences from the asserted data. In the future, stored procedures may be used as necessary for speedier data retrieval.<br />
<br />
== Loading the data ==<br />
<br />
The repository will be periodically refreshed to include the latest ontology definitions and curated data. At present, curated data is obtained from two different sources:<br />
# The ZFIN data repository (model organism database) containing descriptions of mutant phenotypes and the related genes and genotypes of zebrafish. This data exists primarily as tab delimited simple text files<br />
# Annotations from a set of selected publications, which describe, in unstructured rich text, the observed phenotypes of about 25000 different species of fish belonging to the clade Ostariophysi. These annotations are entered by curators using the [[Phenex]] annotation tool and are saved in the [http://www.nexml.org/ NeXML] data format.<br />
<br />
A complete database refresh using the [[Phenoscape data loader]] can be started off by running the "refresh-database" target in the Ant build file in the 'Phenoscape' folder of the [https://obo.svn.sourceforge.net/svnroot/obo/OBDAPI/trunk OBDAPI project].<br />
<br />
== Querying the data ==<br />
<br />
Queries have been implemented for retrieving phenotype information (summaries and details), homology information, summaries of search terms, metadata about phenotype assertions, and auto complete suggestions for search terms as they are being entered. Data retrieved by these queries are accessed by the various [[Data Services | Phenoscape data services]]. The details about these queries are presented here. <br />
<br />
===Relations of interest===<br />
<br />
This section discusses the various entities and binary directed links between these entities, which are leveraged by the database queries. Assertions about the model organism (from ZFIN) and the evolutionary species are converted into the ''exhibits'' link specified in (1) below. Note the right hand side of the ''exhibits'' link. It is a post composition of an Entity and a Quality, which makes up a description of a phenotype.<br />
<br />
<javascript><br />
Taxon exhibits inheres_in(Quality, Entity) --(1)<br />
</javascript><br />
<br />
The post composed phenotype is related to its components by the ''is_a'' and the ''inheres_in'' relation as shown in (2) and (3) below<br />
<br />
<javascript><br />
inheres_in(Quality, Entity) inheres_in Entity --(2)<br />
inheres_in(Quality, Entity) is_a Quality --(3)<br />
</javascript><br />
<br />
The Quality is related to a Character by an inferred ''value_for'' relation as shown in (4)<br />
<br />
<javascript><br />
Quality value_for Character --(4)<br />
</javascript><br />
<br />
An example should make this clearer. Consider the statement, "In Siluriformes, the shape of the dorsal surface of the basihyal bone is flat or convex" from [Albert, 2001]. This statement can be represented as in (1ex) below. Note the similar form to (1). Siluriformes is the taxon, flat is the quality, and basihyal bone is the entity.<br />
<br />
<javascript><br />
Siluriformes exhibits inheres_in(flat, basihyal bone) --(1ex)<br />
</javascript><br />
<br />
Now the post composed phenotype is related to its entity and quality components as in (2ex) and (3ex). Note the similarity to (2) and (3)<br />
<br />
<javascript><br />
inheres_in(flat, basihyal bone) inheres_in basihyal bone --(2ex)<br />
inheres_in(flat, basihyal bone) is_a flat --(3ex)<br />
</javascript><br />
<br />
Finally, the quality 'flat' is related to the character 'shape' by (4ex). Note that 'flat' is just one of the values for 'shape'. Other values may be 'rounded', 'curved', etc.<br />
<br />
<javascript><br />
flat value_for shape --(4ex)<br />
</javascript><br />
<br />
Moving on, the database also stores provenance information (metadata) about the assertion that Siluriformes exhibits flat basihyal bones. At the very minimum, we need to know the publication from which the assertion was extracted. If the curators have specifically cited the text from the publication which forms the basis of their assertion, we need to know that as well.<br />
<br />
The database provides a handle to access this metadata from the assertion itself. The LINK table includes a reiflink_node_id attribute, from which publication, curator names, character and state text, and all other relevant metadata can be accessed. Without going into more database-specific details, conceptually the statement (1ex) is linked to a reification identifier, which is linked to the actual metadata. Transparently, the statement (1ex) can be linked with a publication as shown in (5ex) below. The linkage to the other facets of the metadata is done similarly.<br />
<br />
<javascript><br />
(1ex) posited_by Albert, 2001 --(5ex)<br />
</javascript><br />
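The statements (1ex) through (5ex) can be modeled in memory to see how a query walks from a taxon to its characters and to the provenance of an assertion. The following is a minimal sketch, not the OBD implementation: the <code>Triple</code> type and the helper names are hypothetical, and reification is represented simply by using the assertion itself as the subject of (5ex).

```python
from collections import namedtuple

Triple = namedtuple("Triple", "subject predicate object")

phenotype = "inheres_in(flat, basihyal bone)"
assertion = Triple("Siluriformes", "exhibits", phenotype)   # (1ex)

triples = [
    assertion,
    Triple(phenotype, "inheres_in", "basihyal bone"),        # (2ex)
    Triple(phenotype, "is_a", "flat"),                       # (3ex)
    Triple("flat", "value_for", "shape"),                    # (4ex)
    Triple(assertion, "posited_by", "Albert, 2001"),         # (5ex)
]


def objects(subject, predicate):
    """All objects linked to the subject by the given predicate."""
    return [t.object for t in triples
            if t.subject == subject and t.predicate == predicate]


def characters_of(taxon):
    """Follow exhibits, inheres_in, is_a and value_for links from a taxon."""
    found = []
    for phen in objects(taxon, "exhibits"):
        for entity in objects(phen, "inheres_in"):
            for quality in objects(phen, "is_a"):
                for character in objects(quality, "value_for"):
                    found.append((entity, character, quality))
    return found


print(characters_of("Siluriformes"))
print(objects(assertion, "posited_by"))
```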
<br />
The schema of the relations is shown below [[Image:PhenoscapeInTriples.jpg]]<br />
<br />
===Speeding up the queries: The data warehouse===<br />
<br />
Queries used in the Phenoscape data services module were found to be intolerably slow in returning, especially when asked to retrieve and [[Data_Services#Annotations_summary | summarize annotation data]] for genes and teleost species. The slow query execution times were primarily due to the large number of JOINs in the queries and the extensive volume of data that needed to be processed in various facets of the query execution plan.<br />
<br />
To address this issue, it was decided to create summaries of the annotations in the database in simple data warehouse tables. New queries which were tested on these summary tables executed much faster, having dispensed with the numerous JOINs between the [[#The_NODE_table | NODE]] and [[#The_LINK_table | LINK]] tables, aliased several times over.<br />
<br />
====[[conceptual_schema | The data warehouse schema ]]====<br />
<br />
<br />
<br />
The data warehouse has been designed with the intent of maximizing the efficiency of queries executed on the Phenoscape knowledge base. For phenotype queries, we need to know the phenotype in question, the taxa or genes which are associated with that phenotype, as well as the entity and quality associated with that phenotype. We also need to find the character, which the quality is associated with. For example if the quality is ''reduced number of'', the character in question would be ''count''.<br />
<br />
To effectively execute this query, the phenotype centric model of the data warehouse is designed as follows (concepts and attributes are capitalized). A taxon or gene may be associated with one or more PHENOTYPE(s) and a PHENOTYPE may be associated with one or more genes or taxa. A PHENOTYPE is associated with exactly one ENTITY and one QUALITY. A QUALITY may be associated with one or more PHENOTYPE(s). Further, a QUALITY is associated with exactly one CHARACTER, which is a QUALITY as well.<br />
<br />
For queries for provenance data about taxon to phenotype assertions, we need to find the publication the assertion is extracted from, the specific text from the publication about character and state, as well as the curators' comments about the assertion.<br />
<br />
To effectively execute these 'metadata' queries, the provenance data is modeled as an association attribute. For every instance of the association between a TAXON and a PHENOTYPE, we capture CHARACTER, STATE, CURATORS, and PUBLICATION. The PUBLICATION entity with all its attributes is linked to the REIF entity, which is the link to the metadata of the TAXON and PHENOTYPE.<br />
<br />
This data warehouse can be reduced to the logical schema shown below<br />
<br />
<table width=500 align=left border=1><br />
<tr><td align=left colspan=3><b>Gene</b></td></tr><br />
<tr><br />
<td align=center>Gene_id {PK} </td><br />
<td align=center>Gene_Uid</td><br />
<td align=center>Gene_Label</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=500 align=left border=1><br />
<tr><td align=left colspan=2><b>Gene_Alias</b></td></tr><br />
<tr><br />
<td align=center>Gene_id{FK}</td><br />
<td align=center>Alias</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=500 align=left border=1><br />
<tr><td align=left colspan=3><b>Genotype</b></td></tr><br />
<tr><br />
<td align=center>Genotype_id{PK}</td><br />
<td align=center>Genotype_Uid</td><br />
<td align=center>Genotype_Label</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=700 align=left border=1><br />
<tr><td align=left colspan=3><b>Taxon</b></td></tr><br />
<tr><br />
<td align=center>Taxon_id {PK}</td><br />
<td align=center>Taxon_Uid</td><br />
<td align=center>Taxon_Label</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=700 align=left border=1><br />
<tr><td align=left colspan=2><b>Taxon_Alias</b></td></tr><br />
<tr><br />
<td align=center>Taxon_id {FK}</td><br />
<td align=center>Alias</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=700 align=left border=1><br />
<tr><td align=left colspan=2><b>Taxon_Is_A_Taxon</b></td></tr><br />
<tr><br />
<td align=center>Taxon_id{FK}</td><br />
<td align=center>Taxon_id{FK}</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=700 align=left border=1><br />
<tr><td align=left colspan=3><b>Entity</b></td></tr><br />
<tr><br />
<td align=center>Entity_Id {PK}</td><br />
<td align=center>Entity_Uid</td><br />
<td align=center>Entity_Label</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=700 align=left border=1><br />
<tr><td align=left colspan=2><b>Entity_Is_A_Entity</b></td></tr><br />
<tr><br />
<td align=center>Entity_Id {FK}</td><br />
<td align=center>Entity_Id {FK}</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=700 align=left border=1><br />
<tr><td align=left colspan=2><b>Entity_Part_Of_Entity</b></td></tr><br />
<tr><br />
<td align=center>Entity_Id {FK}</td><br />
<td align=center>Entity_Id {FK}</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=700 align=left border=1><br />
<tr><td align=left colspan=3><b>Quality</b></td></tr><br />
<tr><br />
<td align=center>Quality_Id {PK}</td><br />
<td align=center>Quality_Uid </td><br />
<td align=center>Quality_Label </td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=700 align=left border=1><br />
<tr><td align=left colspan=7><b>Phenotype</b></td></tr><br />
<tr><br />
<td align=center>Phenotype_Id {PK}</td><br />
<td align=center>Phenotype_Uid</td><br />
<td align=center>Inheres_In_Entity_id {FK}</td><br />
<td align=center>Towards_Entity_id {FK}</td><br />
<td align=center>Is_A_Quality_id {FK}</td><br />
<td align=center>Is_A_Character_id {FK}</td><br />
<td align=center>Has_count</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=300 align=left border=1><br />
<tr><td align=left colspan=3><b>Gene_Genotype_Phenotype</b></td></tr><br />
<tr><br />
<td align=center>Gene_Id {FK}</td><br />
<td align=center>Genotype_Id {FK}</td><br />
<td align=center>Phenotype_Id {FK}</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=300 align=left border=1><br />
<tr><td align=left colspan=3><b>Taxon_Phenotype</b></td></tr><br />
<tr><br />
<td align=center>Taxon_Id {FK}</td><br />
<td align=center>Phenotype_Id {FK}</td><br />
<td align=center>Reif_Id {FK}</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=300 align=left border=1><br />
<tr><td align=left colspan=5><b>Taxon_Phenotype_Metadata</b></td></tr><br />
<tr><br />
<td align=center>Reif_Id {PK}</td><br />
<td align=center>Character_Text</td><br />
<td align=center>State_Text </td><br />
<td align=center>Curators </td><br />
<td align=center>Curator_Comments </td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=300 align=left border=1><br />
<tr><td align=left colspan=7><b>Publication</b></td></tr><br />
<tr><br />
<td align=center>Publication {PK}</td><br />
<td align=center>Primary_Title</td><br />
<td align=center>Secondary_Title </td><br />
<td align=center>Pages </td><br />
<td align=center>Volume </td><br />
<td align=center>Abstract </td><br />
<td align=center>Year </td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=300 align=left border=1><br />
<tr><td align=left colspan=2><b>Publication_Reif_id</b></td></tr><br />
<tr><br />
<td align=center>Publication {FK}</td><br />
<td align=center>Reif_Id {FK}</td><br />
</tr><br />
</table><br />
<br />
[[Category:OBD]]<br />
[[Category:Database]]<br />
[[Category:Informatics]]<br />
[[Category:Data]]</div>Crk18https://wiki.phenoscape.org/wg/phenoscape/index.php?title=Phenoscape_data_repository&diff=6701Phenoscape data repository2009-12-14T21:45:59Z<p>Crk18: /* The data warehouse schema */</p>
<hr />
<div>The Phenoscape data repository is a relational database that holds phenotypic data for the model organism ''Danio rerio'' (zebrafish) and for the evolutionary species that belong to the clade of [http://tolweb.org/Ostariophysi/15077 Ostariophysi]. This page describes the schema of the repository and outlines the methods used to load and query it.<br />
<br />
== Data Repository ==<br />
<br />
The Phenoscape data repository has been implemented as a [http://www.postgresql.org/ PostgreSQL] relational database, and at present is housed on a dedicated database server.<br />
<br />
== Schema ==<br />
<br />
The schema of the Phenoscape data repository is based upon the Open Biomedical Database (OBD) data format developed at the [http://www.berkeleybop.org/ Berkeley Bioinformatics Open-source Projects (BBOP)]. OBD is based upon the [http://www.w3.org/RDF Resource Description Framework (RDF)] format for capturing metadata about Web (and Semantic Web) resources such as Web pages and Web services.<br />
<br />
The philosophy of OBD is to represent every conceptual entity, be it a type or a token (synonymously a class or an object, or a concept or an instance) or a relation definition, as a Node. Instances of relations between these nodes are represented as Statements, specifically Link Statements. OBD also allows for [http://en.wikipedia.org/wiki/Reification_(computer_science) reification], which is vital to the life sciences with their emphasis on evidence codes and attributions (provenance). For this purpose, OBD provides Literal Statements (and Annotation Statements) to capture metadata about Nodes and Link Statements, such as the source publication, evidence codes, specimens used, and so forth.<br />
<br />
=== Tables ===<br />
<br />
Two relational tables are central to the schema of the Phenoscape data repository. These are: LINK and NODE. The SQL commands for the creation of these tables (and the others) can be found at this [http://phenoscape.svn.sourceforge.net/viewvc/phenoscape/trunk/src/Database/sql/obd/obd-core-schema.sql?view=markup Phenoscape Sourceforge] page.<br />
<br />
==== The NODE table ====<br />
<br />
The NODE table contains information about every concept extracted from the source [[Ontologies]], such as its unique identifier, label, and source ontology. It also holds information about scientific publications (in a rudimentary format which will be improved upon soon), the ontologies themselves, and the representation of phenotypes from the ZFIN and NeXML databases. It will be augmented in the future to hold information about collection specimens. The NODE table adds a unique identifier (generated from a sequence) of its own to every row. The row from the NODE table for the ''Gymnotiformes'' term is shown below:<br />
<br />
<javascript><br />
<br />
node_id | uid | label | metatype | source_id<br />
---------+----------+---------------+----------+-----------<br />
46050 | TTO:1390 | Gymnotiformes | C | 9630<br />
<br />
</javascript><br />
<br />
* The NODE_ID column holds the unique identifier generated by the Phenoscape database for this term<br />
* The UID column holds the identifier of this term that is obtained from the Teleost Taxonomy Ontology (TTO). The 'TTO' is the namespace prefix<br />
* The LABEL column displays the label for this term<br />
* The METATYPE column shows that the term is a Class (C). Other metatypes are Relation (R) and Instance (I).<br />
* The SOURCE_ID column holds the NODE_ID of the ontology from which the term was extracted. In this case, the source ontology is the TTO<br />
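The row above can be reproduced in a throwaway database. The following is a minimal sketch that uses Python's built-in <code>sqlite3</code> module instead of the production PostgreSQL server. The column types are assumptions, and the <code>uid</code>, <code>label</code>, and <code>metatype</code> values given to the ontology node (node_id 9630) are hypothetical placeholders; the authoritative table definition is the one in obd-core-schema.sql.

```python
import sqlite3

# In-memory database; the real repository runs on PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE node (
        node_id   INTEGER PRIMARY KEY,               -- generated by the database
        uid       TEXT UNIQUE,                       -- identifier from the source ontology
        label     TEXT,
        metatype  TEXT,                              -- C = Class, R = Relation, I = Instance
        source_id INTEGER REFERENCES node(node_id)   -- node_id of the source ontology
    )
    """
)
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?, ?, ?)",
    [
        # Hypothetical row standing in for the TTO ontology node itself.
        (9630, "teleost_taxonomy", "Teleost Taxonomy Ontology", "I", None),
        # The row shown in the excerpt above.
        (46050, "TTO:1390", "Gymnotiformes", "C", 9630),
    ],
)


def lookup(uid):
    """Return (node_id, label, metatype, source ontology label) for a term uid."""
    return conn.execute(
        "SELECT n.node_id, n.label, n.metatype, s.label "
        "FROM node n LEFT JOIN node s ON s.node_id = n.source_id "
        "WHERE n.uid = ?",
        (uid,),
    ).fetchone()


print(lookup("TTO:1390"))
```

The self-join on <code>source_id</code> illustrates that the ontology is itself stored as a row in NODE.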
<br />
==== The LINK table ====<br />
<br />
The LINK table contains rows that represent Statements linking the Nodes to one another, as well as metadata about these Nodes. The excerpt below shows some of the rows in the LINK table that involve the ''Gymnotiformes'' term:<br />
<br />
<javascript><br />
<br />
link_id | node_id | predicate_id | object_id<br />
---------+---------+--------------+-----------<br />
23854 | 9637 | 102 | 46050<br />
59897 | 45723 | 102 | 46050<br />
60223 | 46050 | 102 | 46160<br />
501448 | 9932 | 102 | 46050<br />
<br />
</javascript><br />
<br />
* The LINK_ID column shows the database generated identifier for the link<br />
* The NODE_ID column shows the Subject of the Statement (in RDF parlance). This ID is the database-generated identifier for the concept ''Eigenmanniidae'' (TTO:10000005)<br />
* The PREDICATE_ID column shows the Predicate of the Statement. This ID is the database generated identifier for the relation ''OBO_REL:is_a''.<br />
* The OBJECT_ID column shows the Object of the Statement which is the ID generated by the database for concept ''Gymnotiformes''<br />
<br />
In simple terms, this Statement says that Eigenmanniidae is a subtype (child taxon) of Gymnotiformes, as shown in the triple below<br />
<br />
<javascript><br />
Eigenmanniidae is_a Gymnotiformes<br />
</javascript><br />
<br />
Similarly, the third row in the excerpt shows that Gymnotiformes is an Otophysian, as shown below<br />
<br />
<javascript><br />
Gymnotiformes is_a Otophysi<br />
</javascript><br />
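Turning a LINK row back into a readable triple means joining LINK to NODE once for each of the subject, predicate, and object. The sketch below shows this for the third row of the excerpt, again using Python's <code>sqlite3</code> as a stand-in for PostgreSQL. The reduced column set, the <code>is_a</code> label for node 102, and the uid given to Otophysi are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE node (node_id INTEGER PRIMARY KEY, uid TEXT, label TEXT);
    CREATE TABLE link (
        link_id      INTEGER PRIMARY KEY,
        node_id      INTEGER REFERENCES node(node_id),  -- subject
        predicate_id INTEGER REFERENCES node(node_id),  -- relation
        object_id    INTEGER REFERENCES node(node_id)   -- object
    );
    """
)
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?)",
    [
        (102, "OBO_REL:is_a", "is_a"),             # label is an assumption
        (46050, "TTO:1390", "Gymnotiformes"),
        (46160, "TTO:PLACEHOLDER", "Otophysi"),    # uid is a hypothetical placeholder
    ],
)
# Third row of the LINK excerpt above.
conn.execute("INSERT INTO link VALUES (60223, 46050, 102, 46160)")


def triples_about(label):
    """Render every link whose subject or object carries the given label."""
    rows = conn.execute(
        """
        SELECT s.label, p.label, o.label
        FROM link l
        JOIN node s ON s.node_id = l.node_id
        JOIN node p ON p.node_id = l.predicate_id
        JOIN node o ON o.node_id = l.object_id
        WHERE s.label = ? OR o.label = ?
        ORDER BY l.link_id
        """,
        (label, label),
    ).fetchall()
    return [" ".join(row) for row in rows]


print(triples_about("Gymnotiformes"))
```

Even this simple lookup aliases NODE three times, which is the join pattern that motivates the summary tables described under the data warehouse section.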
<br />
==== Other important tables ====<br />
<br />
* The ALIAS table keeps track of the various aliases (alternate labels) of the concepts and relations, which are sourced from the ontologies<br />
* The DESCRIPTION table stores rich text descriptions of the concepts and relations, which are extracted from the ontologies<br />
* The OBD_SCHEMA_METADATA table stores metadata about OBD such as the version of the OBD format in use, and also the last refresh date and time for the database<br />
<br />
=== Views ===<br />
<br />
The Phenoscape data repository also generates several views from the tables. These views are used in querying the database, and some of them are part of the [[OBD API Documentation | OBDAPI]].<br />
<br />
=== Procedures ===<br />
<br />
Stored procedures are used in populating the database with defined terms from the ontologies, and with phenotypic descriptions obtained from curators. In addition, they are also used in generating inferences from the asserted data. In the future, stored procedures may be used as necessary for speedier data retrieval.<br />
<br />
== Loading the data ==<br />
<br />
The repository will be periodically refreshed to include the latest ontology definitions and curated data. At present, curated data is obtained from two different sources:<br />
# The ZFIN data repository (model organism database) containing descriptions of mutant phenotypes and the related genes and genotypes of zebrafish. This data exists primarily as tab delimited simple text files<br />
# Annotations from a set of selected publications, which describe, in unstructured rich text, the observed phenotypes of about 25000 different species of fish belonging to the clade of Ostariophysi. These annotations are entered by curators using the [[Phenex]] annotation tool and are saved in the [http://www.nexml.org/ NeXML] data format.<br />
<br />
A complete database refresh using the [[Phenoscape data loader]] can be started off by running the "refresh-database" target in the Ant build file in the 'Phenoscape' folder of the [https://obo.svn.sourceforge.net/svnroot/obo/OBDAPI/trunk OBDAPI project].<br />
<br />
== Querying the data ==<br />
<br />
Queries have been implemented for retrieving phenotype information (summaries and details), homology information, summaries of search terms, metadata about phenotype assertions, and autocomplete suggestions for search terms as they are being entered. Data retrieved by these queries are accessed by the various [[Data Services | Phenoscape data services]]. The details about these queries are presented here.<br />
<br />
===Relations of interest===<br />
<br />
This section discusses the various entities and binary directed links between these entities, which are leveraged by the database queries. Assertions about the model organism (from ZFIN) and the evolutionary species are converted into the ''exhibits'' link specified in (1) below. Note the right hand side of the ''exhibits'' link. It is a post composition of an Entity and a Quality, which makes up a description of a phenotype.<br />
<br />
<javascript><br />
Taxon exhibits inheres_in(Quality, Entity) --(1)<br />
</javascript><br />
<br />
The post composed phenotype is related to its components by the ''inheres_in'' and ''is_a'' relations, as shown in (2) and (3) below, respectively<br />
<br />
<javascript><br />
inheres_in(Quality, Entity) inheres_in Entity --(2)<br />
inheres_in(Quality, Entity) is_a Quality --(3)<br />
</javascript><br />
<br />
The Quality is related to a Character by an inferred ''value_for'' relation as shown in (4)<br />
<br />
<javascript><br />
Quality value_for Character --(4)<br />
</javascript><br />
<br />
An example should make this clearer. Consider the statement, "In Siluriformes, the shape of the dorsal surface of the basihyal bone is flat or convex" from [Albert, 2001]. This statement can be represented as in (1ex) below. Note the similar form to (1). Siluriformes is the taxon, flat is the quality, and basihyal bone is the entity.<br />
<br />
<javascript><br />
Siluriformes exhibits inheres_in(flat, basihyal bone) --(1ex)<br />
</javascript><br />
<br />
Now the post composed phenotype is related to its entity and quality components as in (2ex) and (3ex). Note the similarity to (2) and (3)<br />
<br />
<javascript><br />
inheres_in(flat, basihyal bone) inheres_in basihyal bone --(2ex)<br />
inheres_in(flat, basihyal bone) is_a flat --(3ex)<br />
</javascript><br />
<br />
Finally, the quality 'flat' is related to the character 'shape' by (4ex). Note that 'flat' is just one of the values for 'shape'. Other values may be 'rounded', 'curved', etc.<br />
<br />
<javascript><br />
flat value_for shape --(4ex)<br />
</javascript><br />
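The expansion of one assertion into relations (1) through (4) can be sketched in a few lines of Python. This is only an illustration of the pattern: the <code>Triple</code> type and the <code>post_compose</code> and <code>expand</code> functions are hypothetical names, and the character is passed in by hand here, whereas in the database the ''value_for'' link is inferred.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def post_compose(quality: str, entity: str) -> str:
    """Name the post composed phenotype in the notation used on this page."""
    return f"inheres_in({quality}, {entity})"


def expand(taxon: str, quality: str, entity: str, character: str) -> List[Triple]:
    """Expand one curated assertion into the four relations (1)-(4)."""
    phenotype = post_compose(quality, entity)
    return [
        Triple(taxon, "exhibits", phenotype),      # (1)
        Triple(phenotype, "inheres_in", entity),   # (2)
        Triple(phenotype, "is_a", quality),        # (3)
        Triple(quality, "value_for", character),   # (4), inferred in the real database
    ]


triples = expand("Siluriformes", "flat", "basihyal bone", "shape")
for t in triples:
    print(t.subject, t.predicate, t.obj)
```

The four printed lines correspond to (1ex), (2ex), (3ex), and (4ex).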
<br />
Moving on, the database also stores provenance information (metadata) about the assertion that Siluriformes exhibits flat basihyal bones. At the very minimum, we need to know the publication from which the assertion was extracted. If the curators have specifically cited the text from the publication which forms the basis of their assertion, we need to know that as well.<br />
<br />
The database provides a handle to access this metadata from the assertion itself. The LINK table includes a reiflink_node_id attribute, from which publication, curator names, character and state text, and all other relevant metadata can be accessed. Without going into more database specific details, conceptually the statement (1ex) is linked to a reification identifier, which is linked to the actual metadata. Transparently, the statement (1ex) can be linked with a publication as shown in (5ex) below. The linkage to the other facets of the metadata is done similarly.<br />
<br />
<javascript><br />
(1ex) posited_by Albert, 2001 --(5ex)<br />
</javascript><br />
<br />
The schema of the relations is shown below [[Image:PhenoscapeInTriples.jpg]]<br />
<br />
===Speeding up the queries: The data warehouse===<br />
<br />
Queries used in the Phenoscape data services module were found to be intolerably slow in returning, especially when asked to retrieve and [[Data_Services#Annotations_summary | summarize annotation data]] for genes and teleost species. The slow query execution times were primarily due to the large number of JOINs in the queries and the extensive volume of data that had to be processed at various stages of the query execution plan.<br />
<br />
To address this issue, it was decided to create summaries of the annotations in the database in simple data warehouse tables. New queries which were tested on these summary tables executed much faster, having dispensed with the numerous JOINs between the [[#The_NODE_table | NODE]] and [[#The_LINK_table | LINK]] tables, aliased several times over.<br />
<br />
====[[conceptual_schema | The data warehouse schema ]]====<br />
<br />
<br />
<br />
The data warehouse has been designed with the intent of maximizing the efficiency of queries executed on the Phenoscape knowledge base. For phenotype queries, we need to know the phenotype in question, the taxa or genes which are associated with that phenotype, as well as the entity and quality associated with that phenotype. We also need to find the character with which the quality is associated. For example, if the quality is ''reduced number of'', the character in question would be ''count''.<br />
<br />
To effectively execute this query, the phenotype centric model of the data warehouse is designed as follows (concepts and attributes are capitalized). A taxon or gene may be associated with one or more PHENOTYPE(s) and a PHENOTYPE may be associated with one or more genes or taxa. A PHENOTYPE is associated with exactly one ENTITY and one QUALITY. A QUALITY may be associated with one or more PHENOTYPE(s). Further, a QUALITY is associated with exactly one CHARACTER, which is a QUALITY as well.<br />
<br />
For queries for provenance data about taxon to phenotype assertions, we need to find the publication the assertion is extracted from, the specific text from the publication about character and state, as well as the curators' comments about the assertion.<br />
<br />
To effectively execute these 'metadata' queries, the provenance data is modeled as an association attribute. For every instance of the association between a TAXON and a PHENOTYPE, we capture CHARACTER, STATE, CURATORS, and PUBLICATION. The PUBLICATION entity with all its attributes is linked to the REIF entity, which is the link to the metadata of the TAXON and PHENOTYPE.<br />
<br />
This data warehouse can be reduced to the logical schema shown below<br />
<br />
<table width=500 align=left border=1><br />
<tr><td align=left colspan=3><b>Gene</b></td></tr><br />
<tr><br />
<td align=center>Gene_id {PK} </td><br />
<td align=center>Gene_Uid</td><br />
<td align=center>Gene_Label</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=500 align=left border=1><br />
<tr><td align=left colspan=2><b>Gene_Alias</b></td></tr><br />
<tr><br />
<td align=center>Gene_id{FK}</td><br />
<td align=center>Alias</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=500 align=left border=1><br />
<tr><td align=left colspan=3><b>Genotype</b></td></tr><br />
<tr><br />
<td align=center>Genotype_id{PK}</td><br />
<td align=center>Genotype_Uid</td><br />
<td align=center>Genotype_Label</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=700 align=left border=1><br />
<tr><td align=left colspan=3><b>Taxon</b></td></tr><br />
<tr><br />
<td align=center>Taxon_id {PK}</td><br />
<td align=center>Taxon_Uid</td><br />
<td align=center>Taxon_Label</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=700 align=left border=1><br />
<tr><td align=left colspan=2><b>Taxon_Alias</b></td></tr><br />
<tr><br />
<td align=center>Taxon_id {FK}</td><br />
<td align=center>Alias</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=700 align=left border=1><br />
<tr><td align=left colspan=2><b>Taxon_Is_A_Taxon</b></td></tr><br />
<tr><br />
<td align=center>Taxon_id{FK}</td><br />
<td align=center>Taxon_id{FK}</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=700 align=left border=1><br />
<tr><td align=left colspan=3><b>Entity</b></td></tr><br />
<tr><br />
<td align=center>Entity_Id {PK}</td><br />
<td align=center>Entity_Uid</td><br />
<td align=center>Entity_Label</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=700 align=left border=1><br />
<tr><td align=left colspan=2><b>Entity_Is_A_Entity</b></td></tr><br />
<tr><br />
<td align=center>Entity_Id {FK}</td><br />
<td align=center>Entity_Id {FK}</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=700 align=left border=1><br />
<tr><td align=left colspan=2><b>Entity_Part_Of_Entity</b></td></tr><br />
<tr><br />
<td align=center>Entity_Id {FK}</td><br />
<td align=center>Entity_Id {FK}</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=700 align=left border=1><br />
<tr><td align=left colspan=3><b>Quality</b></td></tr><br />
<tr><br />
<td align=center>Quality_Id {PK}</td><br />
<td align=center>Quality_Uid </td><br />
<td align=center>Quality_Label </td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=700 align=left border=1><br />
<tr><td align=left colspan=7><b>Phenotype</b></td></tr><br />
<tr><br />
<td align=center>Phenotype_Id {PK}</td><br />
<td align=center>Phenotype_Uid</td><br />
<td align=center>Inheres_In_Entity_id {FK}</td><br />
<td align=center>Towards_Entity_id {FK}</td><br />
<td align=center>Is_A_Quality_id {FK}</td><br />
<td align=center>Is_A_Character_id {FK}</td><br />
<td align=center>Has_count</td><br />
</tr><br />
</table><br />
<br><br />
<table width=300 align=left border=1><br />
<tr><td align=left colspan=3><b>Gene_Genotype_Phenotype</b></td></tr><br />
<tr><br />
<td align=center>Gene_Id {FK}</td><br />
<td align=center>Genotype_Id {FK}</td><br />
<td align=center>Phenotype_Id {FK}</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=300 align=left border=1><br />
<tr><td align=left colspan=3><b>Taxon_Phenotype</b></td></tr><br />
<tr><br />
<td align=center>Taxon_Id {FK}</td><br />
<td align=center>Phenotype_Id {FK}</td><br />
<td align=center>Reif_Id {FK}</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=300 align=left border=1><br />
<tr><td align=left colspan=5><b>Taxon_Phenotype_Metadata</b></td></tr><br />
<tr><br />
<td align=center>Reif_Id {PK}</td><br />
<td align=center>Character_Text</td><br />
<td align=center>State_Text </td><br />
<td align=center>Curators </td><br />
<td align=center>Curator_Comments </td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=300 align=left border=1><br />
<tr><td align=left colspan=7><b>Publication</b></td></tr><br />
<tr><br />
<td align=center>Publication {PK}</td><br />
<td align=center>Primary_Title</td><br />
<td align=center>Secondary_Title </td><br />
<td align=center>Pages </td><br />
<td align=center>Volume </td><br />
<td align=center>Abstract </td><br />
<td align=center>Year </td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=300 align=left border=1><br />
<tr><td align=left colspan=2><b>Publication_Reif_id</b></td></tr><br />
<tr><br />
<td align=center>Publication {FK}</td><br />
<td align=center>Reif_Id {FK}</td><br />
</tr><br />
</table><br />
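
A phenotype query against these summary tables might look like the sketch below. It builds a handful of the tables from the logical schema in an in-memory SQLite database (the production system is PostgreSQL) and loads the Siluriformes example. All identifiers and uid values are hypothetical, the column types are assumptions, and <code>Is_A_Character_id</code> is assumed to reference the Quality table because a CHARACTER is itself a QUALITY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE taxon   (taxon_id INTEGER PRIMARY KEY, taxon_uid TEXT, taxon_label TEXT);
    CREATE TABLE entity  (entity_id INTEGER PRIMARY KEY, entity_uid TEXT, entity_label TEXT);
    CREATE TABLE quality (quality_id INTEGER PRIMARY KEY, quality_uid TEXT, quality_label TEXT);
    CREATE TABLE phenotype (
        phenotype_id         INTEGER PRIMARY KEY,
        phenotype_uid        TEXT,
        inheres_in_entity_id INTEGER REFERENCES entity(entity_id),
        towards_entity_id    INTEGER REFERENCES entity(entity_id),
        is_a_quality_id      INTEGER REFERENCES quality(quality_id),
        is_a_character_id    INTEGER REFERENCES quality(quality_id),
        has_count            INTEGER
    );
    CREATE TABLE taxon_phenotype (
        taxon_id     INTEGER REFERENCES taxon(taxon_id),
        phenotype_id INTEGER REFERENCES phenotype(phenotype_id),
        reif_id      INTEGER
    );
    -- Hypothetical sample data for the Siluriformes example.
    INSERT INTO taxon   VALUES (1, 'taxon:1', 'Siluriformes');
    INSERT INTO entity  VALUES (1, 'entity:1', 'basihyal bone');
    INSERT INTO quality VALUES (1, 'quality:1', 'flat');
    INSERT INTO quality VALUES (2, 'quality:2', 'shape');
    INSERT INTO phenotype VALUES (1, 'phenotype:1', 1, NULL, 1, 2, NULL);
    INSERT INTO taxon_phenotype VALUES (1, 1, 100);
    """
)


def phenotypes_of(taxon_label):
    """Return (entity, character, quality) labels for every phenotype of a taxon."""
    return conn.execute(
        """
        SELECT e.entity_label, c.quality_label, q.quality_label
        FROM taxon t
        JOIN taxon_phenotype tp ON tp.taxon_id = t.taxon_id
        JOIN phenotype p ON p.phenotype_id = tp.phenotype_id
        JOIN entity e    ON e.entity_id = p.inheres_in_entity_id
        JOIN quality q   ON q.quality_id = p.is_a_quality_id
        JOIN quality c   ON c.quality_id = p.is_a_character_id
        WHERE t.taxon_label = ?
        """,
        (taxon_label,),
    ).fetchall()


print(phenotypes_of("Siluriformes"))
```

Each join here follows a single key into a small, purpose-built table, instead of repeatedly self-joining the generic NODE and LINK tables.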
<br />
[[Category:OBD]]<br />
[[Category:Database]]<br />
[[Category:Informatics]]<br />
[[Category:Data]]</div>Crk18https://wiki.phenoscape.org/wg/phenoscape/index.php?title=Phenoscape_data_repository&diff=6700Phenoscape data repository2009-12-14T21:45:17Z<p>Crk18: /* The data warehouse schema */</p>
<hr />
<div>The Phenoscape data repository is a relational database that holds phenotypic data for the model organism ''Danio rerio'' (zebrafish) and for the evolutionary species that belong to the clade of [http://tolweb.org/Ostariophysi/15077 Ostariophysi]. This page describes the schema of the repository and outlines the methods used to load and query it.<br />
<br />
== Data Repository ==<br />
<br />
The Phenoscape data repository has been implemented as a [http://www.postgresql.org/ PostgreSQL] relational database, and at present is housed on a dedicated database server.<br />
<br />
== Schema ==<br />
<br />
The schema of the Phenoscape data repository is based upon the Open Biomedical Database (OBD) data format developed at the [http://www.berkeleybop.org/ Berkeley Bioinformatics Open-source Projects (BBOP)]. OBD is based upon the [http://www.w3.org/RDF Resource Description Framework (RDF)] format for capturing metadata about Web (and Semantic Web) resources such as Web pages and Web services.<br />
<br />
The philosophy of OBD is to represent every conceptual entity, be it a type or a token (synonymously a class or an object, or a concept or an instance) or a relation definition, as a Node. Instances of relations between these nodes are represented as Statements, specifically Link Statements. OBD also allows for [http://en.wikipedia.org/wiki/Reification_(computer_science) reification], which is vital to the life sciences with their emphasis on evidence codes and attributions (provenance). For this purpose, OBD provides Literal Statements (and Annotation Statements) to capture metadata about Nodes and Link Statements, such as the source publication, evidence codes, specimens used, and so forth.<br />
<br />
=== Tables ===<br />
<br />
Two relational tables are central to the schema of the Phenoscape data repository. These are: LINK and NODE. The SQL commands for the creation of these tables (and the others) can be found at this [http://phenoscape.svn.sourceforge.net/viewvc/phenoscape/trunk/src/Database/sql/obd/obd-core-schema.sql?view=markup Phenoscape Sourceforge] page.<br />
<br />
==== The NODE table ====<br />
<br />
The NODE table contains information about every concept extracted from the source [[Ontologies]], such as its unique identifier, label, and source ontology. It also holds information about scientific publications (in a rudimentary format which will be improved upon soon), the ontologies themselves, and the representation of phenotypes from the ZFIN and NeXML databases. It will be augmented in the future to hold information about collection specimens. The NODE table adds a unique identifier (generated from a sequence) of its own to every row. The row from the NODE table for the ''Gymnotiformes'' term is shown below:<br />
<br />
<javascript><br />
<br />
node_id | uid | label | metatype | source_id<br />
---------+----------+---------------+----------+-----------<br />
46050 | TTO:1390 | Gymnotiformes | C | 9630<br />
<br />
</javascript><br />
<br />
* The NODE_ID column holds the unique identifier generated by the Phenoscape database for this term<br />
* The UID column holds the identifier of this term that is obtained from the Teleost Taxonomy Ontology (TTO). The 'TTO' is the namespace prefix<br />
* The LABEL column displays the label for this term<br />
* The METATYPE column shows that the term is a Class (C). Other metatypes are Relation (R) and Instance (I).<br />
* The SOURCE_ID column holds the NODE_ID of the ontology from which the term was extracted. In this case, the source ontology is the TTO<br />
<br />
==== The LINK table ====<br />
<br />
The LINK table contains rows that represent Statements linking the Nodes to one another, as well as metadata about these Nodes. The excerpt below shows some of the rows in the LINK table that involve the ''Gymnotiformes'' term:<br />
<br />
<javascript><br />
<br />
link_id | node_id | predicate_id | object_id<br />
---------+---------+--------------+-----------<br />
23854 | 9637 | 102 | 46050<br />
59897 | 45723 | 102 | 46050<br />
60223 | 46050 | 102 | 46160<br />
501448 | 9932 | 102 | 46050<br />
<br />
</javascript><br />
<br />
* The LINK_ID column shows the database generated identifier for the link<br />
* The NODE_ID column shows the Subject of the Statement (in RDF parlance). This ID is the database-generated identifier for the concept ''Eigenmanniidae'' (TTO:10000005)<br />
* The PREDICATE_ID column shows the Predicate of the Statement. This ID is the database generated identifier for the relation ''OBO_REL:is_a''.<br />
* The OBJECT_ID column shows the Object of the Statement which is the ID generated by the database for concept ''Gymnotiformes''<br />
<br />
In simple terms, this Statement says that Eigenmanniidae is a subtype (child taxon) of Gymnotiformes, as shown in the triple below<br />
<br />
<javascript><br />
Eigenmanniidae is_a Gymnotiformes<br />
</javascript><br />
<br />
Similarly, the third row in the excerpt shows that Gymnotiformes is an Otophysian, as shown below<br />
<br />
<javascript><br />
Gymnotiformes is_a Otophysi<br />
</javascript><br />
<br />
==== Other important tables ====<br />
<br />
* The ALIAS table keeps track of the various aliases (alternate labels) of the concepts and relations, which are sourced from the ontologies<br />
* The DESCRIPTION table stores rich text descriptions of the concepts and relations, which are extracted from the ontologies<br />
* The OBD_SCHEMA_METADATA table stores metadata about OBD such as the version of the OBD format in use, and also the last refresh date and time for the database<br />
<br />
=== Views ===<br />
<br />
The Phenoscape data repository also generates several views from the tables. These views are used in querying the database, and some of them are part of the [[OBD API Documentation | OBDAPI]].<br />
<br />
=== Procedures ===<br />
<br />
Stored procedures are used in populating the database with defined terms from the ontologies, and with phenotypic descriptions obtained from curators. In addition, they are also used in generating inferences from the asserted data. In the future, stored procedures may be used as necessary for speedier data retrieval.<br />
<br />
== Loading the data ==<br />
<br />
The repository will be periodically refreshed to include the latest ontology definitions and curated data. At present, curated data is obtained from two different sources:<br />
# The ZFIN data repository (model organism database) containing descriptions of mutant phenotypes and the related genes and genotypes of zebrafish. This data exists primarily as tab delimited simple text files<br />
# Annotations from a set of selected publications, which describe, in unstructured rich text, the observed phenotypes of about 25000 different species of fish belonging to the clade of Ostariophysi. These annotations are entered by curators using the [[Phenex]] annotation tool and are saved in the [http://www.nexml.org/ NeXML] data format.<br />
<br />
A complete database refresh using the [[Phenoscape data loader]] can be started off by running the "refresh-database" target in the Ant build file in the 'Phenoscape' folder of the [https://obo.svn.sourceforge.net/svnroot/obo/OBDAPI/trunk OBDAPI project].<br />
<br />
== Querying the data ==<br />
<br />
Queries have been implemented for retrieving phenotype information (summaries and details), homology information, summaries of search terms, metadata about phenotype assertions, and autocomplete suggestions for search terms as they are being entered. Data retrieved by these queries are accessed by the various [[Data Services | Phenoscape data services]]. The details about these queries are presented here.<br />
<br />
===Relations of interest===<br />
<br />
This section discusses the various entities and binary directed links between these entities, which are leveraged by the database queries. Assertions about the model organism (from ZFIN) and the evolutionary species are converted into the ''exhibits'' link specified in (1) below. Note the right hand side of the ''exhibits'' link. It is a post composition of an Entity and a Quality, which makes up a description of a phenotype.<br />
<br />
<javascript><br />
Taxon exhibits inheres_in(Quality, Entity) --(1)<br />
</javascript><br />
<br />
The post composed phenotype is related to its components by the ''inheres_in'' and ''is_a'' relations, as shown in (2) and (3) below, respectively<br />
<br />
<javascript><br />
inheres_in(Quality, Entity) inheres_in Entity --(2)<br />
inheres_in(Quality, Entity) is_a Quality --(3)<br />
</javascript><br />
<br />
The Quality is related to a Character by an inferred ''value_for'' relation as shown in (4)<br />
<br />
<javascript><br />
Quality value_for Character --(4)<br />
</javascript><br />
<br />
An example should make this clearer. Consider the statement, "In Siluriformes, the shape of the dorsal surface of the basihyal bone is flat or convex" from [Albert, 2001]. This statement can be represented as in (1ex) below. Note the similar form to (1). Siluriformes is the taxon, flat is the quality, and basihyal bone is the entity.<br />
<br />
<javascript><br />
Siluriformes exhibits inheres_in(flat, basihyal bone) --(1ex)<br />
</javascript><br />
<br />
Now the post composed phenotype is related to its entity and quality components as in (2ex) and (3ex). Note the similarity to (2) and (3)<br />
<br />
<javascript><br />
inheres_in(flat, basihyal bone) inheres_in basihyal bone --(2ex)<br />
inheres_in(flat, basihyal bone) is_a flat --(3ex)<br />
</javascript><br />
<br />
Finally, the quality 'flat' is related to the character 'shape' by (4ex). Note that 'flat' is just one of the values for 'shape'. Other values may be 'rounded', 'curved', etc.<br />
<br />
<javascript><br />
flat value_for shape --(4ex)<br />
</javascript><br />
<br />
Moving on, the database also stores provenance information (metadata) about the assertion that Siluriformes exhibits flat basihyal bones. At the very minimum, we need to know the publication from which the assertion was extracted. If the curators have specifically cited the text from the publication which forms the basis of their assertion, we need to know that as well.<br />
<br />
The database provides a handle to access this metadata from the assertion itself. The LINK table includes a reiflink_node_id attribute, from which publication, curator names, character and state text, and all other relevant metadata can be accessed. Without going into more database specific details, conceptually the statement (1ex) is linked to a reification identifier, which is linked to the actual metadata. Transparently, the statement (1ex) can be linked with a publication as shown in (5ex) below. The linkage to the other facets of the metadata is done similarly.<br />
<br />
<javascript><br />
(1ex) posited_by Albert, 2001 --(5ex)<br />
</javascript><br />
<br />
The schema of the relations is shown below [[Image:PhenoscapeInTriples.jpg]]<br />
<br />
===Speeding up the queries: The data warehouse===<br />
<br />
Queries used in the Phenoscape data services module were found to be intolerably slow, especially when asked to retrieve and [[Data_Services#Annotations_summary | summarize annotation data]] for genes and teleost species. The slow execution was primarily due to the large number of JOINs in the queries and the large volume of data that had to be processed at various stages of the query execution plan.<br />
<br />
To address this issue, it was decided to create summaries of the annotations in the database in simple data warehouse tables. New queries which were tested on these summary tables executed much faster, having dispensed with the numerous JOINs between the [[#The_NODE_table | NODE]] and [[#The_LINK_table | LINK]] tables, aliased several times over.<br />
<br />
====[[conceptual_schema | The data warehouse schema ]]====<br />
<br />
<br />
<br />
The data warehouse has been designed with the intent of maximizing the efficiency of queries executed on the Phenoscape knowledge base. For phenotype queries, we need to know the phenotype in question, the taxa or genes which are associated with that phenotype, as well as the entity and quality associated with that phenotype. We also need to find the character with which the quality is associated. For example, if the quality is ''reduced number of'', the character in question would be ''count''.<br />
<br />
To effectively execute this query, the phenotype centric model of the data warehouse is designed as follows (concepts and attributes are capitalized). A taxon or gene may be associated with one or more PHENOTYPE(s) and a PHENOTYPE may be associated with one or more genes or taxa. A PHENOTYPE is associated with exactly one ENTITY and one QUALITY. A QUALITY may be associated with one or more PHENOTYPE(s). Further, a QUALITY is associated with exactly one CHARACTER, which is a QUALITY as well.<br />
<br />
For queries for provenance data about taxon to phenotype assertions, we need to find the publication the assertion is extracted from, the specific text from the publication about character and state, as well as the curators' comments about the assertion.<br />
<br />
To effectively execute these 'metadata' queries, the provenance data is modeled as an association attribute. For every instance of the association between a TAXON and a PHENOTYPE, we capture CHARACTER, STATE, CURATORS, and PUBLICATION. The PUBLICATION entity with all its attributes is linked to the REIF entity, which is the link to the metadata of the TAXON and PHENOTYPE.<br />
<br />
This data warehouse can be reduced to the logical schema shown below<br />
<br />
<table width=500 align=left border=1><br />
<tr><td align=left colspan=3><b>Gene</b></td></tr><br />
<tr><br />
<td align=center>Gene_id {PK} </td><br />
<td align=center>Gene_Uid</td><br />
<td align=center>Gene_Label</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=500 align=left border=1><br />
<tr><td align=left colspan=2><b>Gene_Alias</b></td></tr><br />
<tr><br />
<td align=center>Gene_id{FK}</td><br />
<td align=center>Alias</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=500 align=left border=1><br />
<tr><td align=left colspan=3><b>Genotype</b></td></tr><br />
<tr><br />
<td align=center>Genotype_id{PK}</td><br />
<td align=center>Genotype_Uid</td><br />
<td align=center>Genotype_Label</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=700 align=left border=1><br />
<tr><td align=left colspan=3><b>Taxon</b></td></tr><br />
<tr><br />
<td align=center>Taxon_id {PK}</td><br />
<td align=center>Taxon_Uid</td><br />
<td align=center>Taxon_Label</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=700 align=left border=1><br />
<tr><td align=left colspan=2><b>Taxon_Alias</b></td></tr><br />
<tr><br />
<td align=center>Taxon_id {FK}</td><br />
<td align=center>Alias</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=700 align=left border=1><br />
<tr><td align=left colspan=2><b>Taxon_Is_A_Taxon</b></td></tr><br />
<tr><br />
<td align=center>Taxon_id{FK}</td><br />
<td align=center>Taxon_id{FK}</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=700 align=left border=1><br />
<tr><td align=left colspan=3><b>Entity</b></td></tr><br />
<tr><br />
<td align=center>Entity_Id {PK}</td><br />
<td align=center>Entity_Uid</td><br />
<td align=center>Entity_Label</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=700 align=left border=1><br />
<tr><td align=left colspan=2><b>Entity_Is_A_Entity</b></td></tr><br />
<tr><br />
<td align=center>Entity_Id {FK}</td><br />
<td align=center>Entity_Id {FK}</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=700 align=left border=1><br />
<tr><td align=left colspan=2><b>Entity_Part_Of_Entity</b></td></tr><br />
<tr><br />
<td align=center>Entity_Id {FK}</td><br />
<td align=center>Entity_Id {FK}</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=700 align=left border=1><br />
<tr><td align=left colspan=3><b>Quality</b></td></tr><br />
<tr><br />
<td align=center>Quality_Id {PK}</td><br />
<td align=center>Quality_Uid </td><br />
<td align=center>Quality_Label </td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=700 align=left border=1><br />
<tr><td align=left colspan=7><b>Phenotype</b></td></tr><br />
<tr><br />
<td align=center>Phenotype_Id {PK}</td><br />
<td align=center>Phenotype_Uid</td><br />
<td align=center>Inheres_In_Entity_id {FK}</td><br />
<td align=center>Towards_Entity_id {FK}</td><br />
<td align=center>Is_A_Quality_id {FK}</td><br />
<td align=center>Is_A_Character_id {FK}</td><br />
<td align=center>Has_count</td><br />
</tr><br />
</table><br />
<br><br />
<table width=300 align=left border=1><br />
<tr><td align=left colspan=3><b>Gene_Genotype_Phenotype</b></td></tr><br />
<tr><br />
<td align=center>Gene_Id {FK}</td><br />
<td align=center>Genotype_Id {FK}</td><br />
<td align=center>Phenotype_Id {FK}</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=300 align=left border=1><br />
<tr><td align=left colspan=3><b>Taxon_Phenotype</b></td></tr><br />
<tr><br />
<td align=center>Taxon_Id {FK}</td><br />
<td align=center>Phenotype_Id {FK}</td><br />
<td align=center>Reif_Id {FK}</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=300 align=left border=1><br />
<tr><td align=left colspan=5><b>Taxon_Phenotype_Metadata</b></td></tr><br />
<tr><br />
<td align=center>Reif_Id {PK}</td><br />
<td align=center>Character_Text</td><br />
<td align=center>State_Text </td><br />
<td align=center>Curators </td><br />
<td align=center>Curator_Comments </td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=300 align=left border=1><br />
<tr><td align=left colspan=7><b>Publication</b></td></tr><br />
<tr><br />
<td align=center>Publication {PK}</td><br />
<td align=center>Primary_Title</td><br />
<td align=center>Secondary_Title </td><br />
<td align=center>Pages </td><br />
<td align=center>Volume </td><br />
<td align=center>Abstract </td><br />
<td align=center>Year </td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=300 align=left border=1><br />
<tr><td align=left colspan=2><b>Publication_Reif_id</b></td></tr><br />
<tr><br />
<td align=center>Publication {FK}</td><br />
<td align=center>Reif_Id {FK}</td><br />
</tr><br />
</table><br />
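<br />
As a rough illustration of how the summary tables above replace repeated self-joins on NODE and LINK, the sketch below builds a cut-down version of the logical schema and runs one phenotype query. It is a sketch under stated assumptions, not the project's implementation: it uses Python's <code>sqlite3</code> module in place of PostgreSQL, leaves out the Uid, Towards_Entity_id and Has_count columns, and all numeric ids are made up.<br />
<br />
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Taxon   (Taxon_Id   INTEGER PRIMARY KEY, Taxon_Label   TEXT);
    CREATE TABLE Entity  (Entity_Id  INTEGER PRIMARY KEY, Entity_Label  TEXT);
    CREATE TABLE Quality (Quality_Id INTEGER PRIMARY KEY, Quality_Label TEXT);
    CREATE TABLE Phenotype (
        Phenotype_Id         INTEGER PRIMARY KEY,
        Inheres_In_Entity_Id INTEGER REFERENCES Entity(Entity_Id),
        Is_A_Quality_Id      INTEGER REFERENCES Quality(Quality_Id),
        -- a CHARACTER is itself a QUALITY, so it points at the same table
        Is_A_Character_Id    INTEGER REFERENCES Quality(Quality_Id)
    );
    CREATE TABLE Taxon_Phenotype (
        Taxon_Id     INTEGER REFERENCES Taxon(Taxon_Id),
        Phenotype_Id INTEGER REFERENCES Phenotype(Phenotype_Id),
        Reif_Id      INTEGER
    );
    -- Illustrative rows only; the ids are invented for this sketch.
    INSERT INTO Taxon   VALUES (1, 'Siluriformes');
    INSERT INTO Entity  VALUES (1, 'basihyal bone');
    INSERT INTO Quality VALUES (1, 'flat');
    INSERT INTO Quality VALUES (2, 'shape');
    INSERT INTO Phenotype VALUES (1, 1, 1, 2);
    INSERT INTO Taxon_Phenotype VALUES (1, 1, 1);
    """
)

PHENOTYPE_QUERY = """
    SELECT t.Taxon_Label, e.Entity_Label, q.Quality_Label, c.Quality_Label
    FROM Taxon_Phenotype AS tp
    JOIN Taxon     AS t ON t.Taxon_Id     = tp.Taxon_Id
    JOIN Phenotype AS p ON p.Phenotype_Id = tp.Phenotype_Id
    JOIN Entity    AS e ON e.Entity_Id    = p.Inheres_In_Entity_Id
    JOIN Quality   AS q ON q.Quality_Id   = p.Is_A_Quality_Id
    JOIN Quality   AS c ON c.Quality_Id   = p.Is_A_Character_Id
    WHERE t.Taxon_Label = ?
"""

rows = conn.execute(PHENOTYPE_QUERY, ("Siluriformes",)).fetchall()
print(rows)
```
<br />
Every join here is a direct foreign-key lookup on a narrow table, which is the property the warehouse design relies on.<br />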
<br />
[[Category:OBD]]<br />
[[Category:Database]]<br />
[[Category:Informatics]]<br />
[[Category:Data]]</div>Crk18https://wiki.phenoscape.org/wg/phenoscape/index.php?title=Phenoscape_data_repository&diff=6699Phenoscape data repository2009-12-14T21:43:32Z<p>Crk18: /* The data warehouse schema */</p>
<hr />
<div>The Phenoscape data repository is a relational database that holds phenotypic data from the model organism ''Danio rerio'' (zebrafish) and from evolutionary species belonging to the clade [http://tolweb.org/Ostariophysi/15077 Ostariophysi]. This page describes the schema of this data repository and outlines the methods to load and query it.<br />
<br />
== Data Repository ==<br />
<br />
The Phenoscape data repository has been implemented as a [http://www.postgresql.org/ PostgreSQL] relational database, and at present is housed on a dedicated database server.<br />
<br />
== Schema ==<br />
<br />
The schema of the Phenoscape data repository is based upon the Open Biomedical Database (OBD) data format developed at the [http://www.berkeleybop.org/ Berkeley Bioinformatics Open-source Projects (BBOP)]. OBD is based upon the [http://www.w3.org/RDF Resource Description Framework (RDF)] format for capturing metadata about Web (and Semantic Web) resources such as Web pages and Web services.<br />
<br />
The philosophy of OBD is to represent every conceptual entity, be it a type or a token (synonymously a class or an object, or a concept or an instance) or a relation definition, as a Node. Instances of relations between these nodes are represented as Statements, specifically Link Statements. OBD also allows for [http://en.wikipedia.org/wiki/Reification_(computer_science) reification], which is vital to the life sciences with their emphasis on evidence codes and attributions (provenance). For this purpose, OBD provides Literal Statements (and Annotation Statements) to capture metadata about Nodes and Link Statements, such as the source publication, evidence codes, specimens used, and so forth.<br />
<br />
=== Tables ===<br />
<br />
Two relational tables are central to the schema of the Phenoscape data repository. These are: LINK and NODE. The SQL commands for the creation of these tables (and the others) can be found at this [http://phenoscape.svn.sourceforge.net/viewvc/phenoscape/trunk/src/Database/sql/obd/obd-core-schema.sql?view=markup Phenoscape Sourceforge] page.<br />
<br />
==== The NODE table ====<br />
<br />
The NODE table contains information about every concept such as its unique identifier, label, and source ontology. The NODE table contains this information about concepts extracted from the source [[Ontologies]]. In addition, it also holds information about scientific publications (in a rudimentary format which will be improved upon soon), the ontologies themselves, and the representation of phenotypes from the ZFIN and NeXML databases. It will be augmented in the future to hold information about collection specimens. The NODE table adds a unique identifier (generated from a sequence) of its own to every row. An excerpt of the row from the NODE table for the ''Gymnotiformes'' term is shown below<br />
<br />
<javascript><br />
<br />
node_id | uid | label | metatype | source_id<br />
---------+----------+---------------+----------+-----------<br />
46050 | TTO:1390 | Gymnotiformes | C | 9630<br />
<br />
</javascript><br />
<br />
* The NODE_ID column holds the unique identifier generated by the Phenoscape database for this term<br />
* The UID column holds the identifier of this term that is obtained from the Teleost Taxonomy Ontology (TTO). The 'TTO' is the namespace prefix<br />
* The LABEL column displays the label for this term<br />
* The METATYPE column shows that the term is a Class (C). Other metatypes are Relation (R) and Instance (I).<br />
* The SOURCE_ID column holds the NODE_ID of the ontology from which the term was extracted. In this case, the source ontology is the TTO<br />
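<br />
To make the column roles concrete, here is a minimal sketch that recreates the excerpt above and looks a term up by its ontology identifier. It assumes Python's <code>sqlite3</code> module as a stand-in for PostgreSQL and shows only the five columns from the excerpt; the helper name <code>find_node</code> is hypothetical.<br />
<br />
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE node (
        node_id   INTEGER PRIMARY KEY,  -- generated by the database
        uid       TEXT UNIQUE,          -- identifier from the source ontology
        label     TEXT,
        metatype  TEXT,                 -- 'C' class, 'R' relation, 'I' instance
        source_id INTEGER               -- node_id of the source ontology
    )
    """
)
# The row shown in the excerpt above.
conn.execute(
    "INSERT INTO node VALUES (?, ?, ?, ?, ?)",
    (46050, "TTO:1390", "Gymnotiformes", "C", 9630),
)


def find_node(connection, uid):
    """Return (node_id, label, metatype, source_id) for a uid, or None."""
    return connection.execute(
        "SELECT node_id, label, metatype, source_id FROM node WHERE uid = ?",
        (uid,),
    ).fetchone()


gymnotiformes = find_node(conn, "TTO:1390")
print(gymnotiformes)
```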
<br />
==== The LINK table ====<br />
<br />
The LINK table contains rows which represent Statements which link the Nodes to one another, and also the metadata about these Nodes. The excerpt below shows some of the rows in the LINK table about the Gymnotiformes term<br />
<br />
<javascript><br />
<br />
link_id | node_id | predicate_id | object_id<br />
---------+---------+--------------+-----------<br />
23854 | 9637 | 102 | 46050<br />
59897 | 45723 | 102 | 46050<br />
60223 | 46050 | 102 | 46160<br />
501448 | 9932 | 102 | 46050<br />
<br />
</javascript><br />
<br />
* The LINK_ID column shows the database generated identifier for the link<br />
* The NODE_ID column shows the Subject of the Statement (in RDF parlance). This ID is the database-generated identifier for the concept ''Eigenmanniidae'' (TTO:10000005)<br />
* The PREDICATE_ID column shows the Predicate of the Statement. This ID is the database generated identifier for the relation ''OBO_REL:is_a''.<br />
* The OBJECT_ID column shows the Object of the Statement which is the ID generated by the database for concept ''Gymnotiformes''<br />
<br />
In simple terms, this Statement asserts that Eigenmanniidae is a subtaxon of Gymnotiformes, as shown in the triple below<br />
<br />
<javascript><br />
Eigenmanniidae is_a Gymnotiformes<br />
</javascript><br />
<br />
Similarly, the third row in the display shows that Gymnotiformes is an Otophysian, as shown below<br />
<br />
<javascript><br />
Gymnotiformes is_a Otophysi<br />
</javascript><br />
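<br />
Turning a LINK row back into a readable triple means joining LINK to NODE three times, once each for subject, predicate and object. The sketch below shows this for the third row of the excerpt. It is an illustration only: it uses Python's <code>sqlite3</code> module instead of PostgreSQL, keeps just three NODE columns, and the uid given to Otophysi and the label given to the ''is_a'' relation are placeholders, since the excerpts do not show them.<br />
<br />
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE node (node_id INTEGER PRIMARY KEY, uid TEXT, label TEXT);
    CREATE TABLE link (
        link_id      INTEGER PRIMARY KEY,
        node_id      INTEGER REFERENCES node(node_id),  -- subject
        predicate_id INTEGER REFERENCES node(node_id),
        object_id    INTEGER REFERENCES node(node_id)
    );
    INSERT INTO node VALUES (46050, 'TTO:1390', 'Gymnotiformes');
    -- Placeholder uid and assumed label: not shown in the excerpts.
    INSERT INTO node VALUES (46160, 'TTO:placeholder', 'Otophysi');
    INSERT INTO node VALUES (102, 'OBO_REL:is_a', 'is_a');
    -- Third row of the LINK excerpt.
    INSERT INTO link VALUES (60223, 46050, 102, 46160);
    """
)

# NODE is aliased three times: subject (s), predicate (p), object (o).
TRIPLE_QUERY = """
    SELECT s.label, p.label, o.label
    FROM link AS l
    JOIN node AS s ON s.node_id = l.node_id
    JOIN node AS p ON p.node_id = l.predicate_id
    JOIN node AS o ON o.node_id = l.object_id
    WHERE s.uid = ?
"""

triples = conn.execute(TRIPLE_QUERY, ("TTO:1390",)).fetchall()
print(triples)
```
<br />
Even this simplest lookup needs three aliases of NODE, and queries that follow several links in a row multiply the aliases accordingly.<br />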
<br />
==== Other important tables ====<br />
<br />
* The ALIAS table keeps track of the various aliases (alternate labels) of the concepts and relations, which are sourced from the ontologies<br />
* The DESCRIPTION table stores rich text descriptions of the concepts and relations, which are extracted from the ontologies<br />
* The OBD_SCHEMA_METADATA stores metadata about OBD such as the version of the OBD format in use, and also the last refresh date and time for the database<br />
<br />
=== Views ===<br />
<br />
The Phenoscape data repository also generates several views from the tables. These views are used in querying the database, some of which are part of the [[OBD API Documentation | OBDAPI]]<br />
<br />
=== Procedures ===<br />
<br />
Stored procedures are used in populating the database with defined terms from the ontologies, and with phenotypic descriptions obtained from curators. In addition, they are also used in generating inferences from the asserted data. In the future, stored procedures may be used as necessary for speedier data retrieval.<br />
<br />
== Loading the data ==<br />
<br />
The repository will be periodically refreshed to include the latest ontology definitions and curated data. At present, curated data is obtained from two different sources:<br />
# The ZFIN data repository (model organism database) containing descriptions of mutant phenotypes and the related genes and genotypes of zebrafish. This data exists primarily as tab delimited simple text files<br />
# Annotations from a set of selected publications, which describe, in unstructured rich text, the observed phenotypes of about 25000 different species of fish belonging to the clade Ostariophysi. These annotations are entered by curators using the [[Phenex]] annotation tool and are saved in the [http://www.nexml.org/ NeXML] data format.<br />
<br />
A complete database refresh using the [[Phenoscape data loader]] can be started off by running the "refresh-database" target in the Ant build file in the 'Phenoscape' folder of the [https://obo.svn.sourceforge.net/svnroot/obo/OBDAPI/trunk OBDAPI project].<br />
<br />
== Querying the data ==<br />
<br />
Queries have been implemented for retrieving phenotype information (summaries and details), homology information, summaries of search terms, metadata about phenotype assertions, and auto complete suggestions for search terms as they are being entered. Data retrieved by these queries are accessed by the various [[Data Services | Phenoscape data services]]. The details about these queries are presented here. <br />
<br />
===Relations of interest===<br />
<br />
This section discusses the various entities and binary directed links between these entities, which are leveraged by the database queries. Assertions about the model organism (from ZFIN) and the evolutionary species are converted into the ''exhibits'' link specified in (1) below. Note the right hand side of the ''exhibits'' link. It is a post composition of an Entity and a Quality, which makes up a description of a phenotype.<br />
<br />
<javascript><br />
Taxon exhibits inheres_in(Quality, Entity) --(1)<br />
</javascript><br />
<br />
The post composed phenotype is related to its components by the ''is_a'' and the ''inheres_in'' relation as shown in (2) and (3) below<br />
<br />
<javascript><br />
inheres_in(Quality, Entity) inheres_in Entity --(2)<br />
inheres_in(Quality, Entity) is_a Quality --(3)<br />
</javascript><br />
<br />
The Quality is related to a Character by an inferred ''value_for'' relation as shown in (4)<br />
<br />
<javascript><br />
Quality value_for Character --(4)<br />
</javascript><br />
<br />
An example should make this clearer. Consider the statement, "In Siluriformes, the shape of the dorsal surface of the basihyal bone is flat or convex" from [Albert, 2001]. This statement can be represented as in (1ex) below. Note the similar form to (1). Siluriformes is the taxon, flat is the quality, and basihyal bone is the entity.<br />
<br />
<javascript><br />
Siluriformes exhibits inheres_in(flat, basihyal bone) --(1ex)<br />
</javascript><br />
<br />
The post-composed phenotype is then related to its entity and quality components as in (2ex) and (3ex), which mirror (2) and (3).<br />
<br />
<javascript><br />
inheres_in(flat, basihyal bone) inheres_in basihyal bone --(2ex)<br />
inheres_in(flat, basihyal bone) is_a flat --(3ex)<br />
</javascript><br />
<br />
Finally, the quality 'flat' is related to the character 'shape' by (4ex). Note that 'flat' is just one of the values for 'shape'. Other values may be 'rounded', 'curved', etc.<br />
<br />
<javascript><br />
flat value_for shape --(4ex)<br />
</javascript><br />
<br />
Moving on, the database also stores provenance information (metadata) about the assertion that Siluriformes exhibits flat basihyal bones. At the very minimum, we need to know the publication from which the assertion was extracted. If the curators have specifically cited the text from the publication which forms the basis of their assertion, we need to know that as well.<br />
<br />
The database provides a handle to access this metadata from the assertion itself. The LINK table includes a reiflink_node_id attribute, from which the publication, curator names, character and state text, and all other relevant metadata can be accessed. Without going into database-specific details, the statement (1ex) is conceptually linked to a reification identifier, which in turn is linked to the actual metadata. In effect, the statement (1ex) can be linked with a publication as shown in (5ex) below. The other facets of the metadata are linked similarly.<br />
<br />
<javascript><br />
(1ex) posited_by Albert, 2001 --(5ex)<br />
</javascript><br />
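<br />
The reification step can be pictured as an extra identifier carried by the statement, with the metadata hanging off that identifier. The following is a conceptual sketch in Python, not the database layout: the <code>Statement</code> class, the <code>provenance</code> helper and the id value 1 are invented for illustration, and <code>reif_id</code> merely plays the role that reiflink_node_id plays in the LINK table.<br />
<br />
```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str
    obj: str
    reif_id: Optional[int] = None  # stands in for reiflink_node_id


def provenance(statement, metadata_by_reif_id):
    """Return the (predicate, value) metadata pairs for a statement."""
    if statement.reif_id is None:
        return []  # the statement was never reified
    return metadata_by_reif_id.get(statement.reif_id, [])


# Statement (1ex), carrying a made-up reification id.
assertion = Statement(
    "Siluriformes", "exhibits", "inheres_in(flat, basihyal bone)", reif_id=1
)
# Statement (5ex): the metadata is attached to the reification id.
metadata_by_reif_id = {1: [("posited_by", "Albert, 2001")]}

print(provenance(assertion, metadata_by_reif_id))
```
<br />
Other facets of the metadata, such as curator names or character text, would be further pairs under the same reification id.<br />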
<br />
The schema of the relations is shown below [[Image:PhenoscapeInTriples.jpg]]<br />
<br />
===Speeding up the queries: The data warehouse===<br />
<br />
Queries used in the Phenoscape data services module were found to be intolerably slow, especially when asked to retrieve and [[Data_Services#Annotations_summary | summarize annotation data]] for genes and teleost species. The slow execution was primarily due to the large number of JOINs in the queries and the large volume of data that had to be processed at various stages of the query execution plan.<br />
<br />
To address this issue, it was decided to create summaries of the annotations in the database in simple data warehouse tables. New queries which were tested on these summary tables executed much faster, having dispensed with the numerous JOINs between the [[#The_NODE_table | NODE]] and [[#The_LINK_table | LINK]] tables, aliased several times over.<br />
<br />
====[[conceptual_schema | The data warehouse schema ]]====<br />
<br />
<br />
<br />
The data warehouse has been designed with the intent of maximizing the efficiency of queries executed on the Phenoscape knowledge base. For phenotype queries, we need to know the phenotype in question, the taxa or genes which are associated with that phenotype, as well as the entity and quality associated with that phenotype. We also need to find the character with which the quality is associated. For example, if the quality is ''reduced number of'', the character in question would be ''count''.<br />
<br />
To effectively execute this query, the phenotype centric model of the data warehouse is designed as follows (concepts and attributes are capitalized). A taxon or gene may be associated with one or more PHENOTYPE(s) and a PHENOTYPE may be associated with one or more genes or taxa. A PHENOTYPE is associated with exactly one ENTITY and one QUALITY. A QUALITY may be associated with one or more PHENOTYPE(s). Further, a QUALITY is associated with exactly one CHARACTER, which is a QUALITY as well. <br />
<br />
For queries for provenance data about taxon to phenotype assertions, we need to find the publication the assertion is extracted from, the specific text from the publication about character and state, as well as the curators' comments about the assertion.<br />
<br />
To effectively execute these 'metadata' queries, the provenance data is modeled as an association attribute. For every instance of the association between a TAXON and a PHENOTYPE, we capture CHARACTER, STATE, CURATORS, and PUBLICATION. The PUBLICATION entity with all its attributes is linked to the REIF entity, which is the link to the metadata of the TAXON and PHENOTYPE.<br />
<br />
This data warehouse can be reduced to the logical schema shown below<br />
<br />
<table width=500 align=left border=1><br />
<tr><td align=left colspan=3><b>Gene</b></td></tr><br />
<tr><br />
<td align=center>Gene_id {PK} </td><br />
<td align=center>Gene_Uid</td><br />
<td align=center>Gene_Label</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=500 align=left border=1><br />
<tr><td align=left colspan=2><b>Gene_Alias</b></td></tr><br />
<tr><br />
<td align=center>Gene_id{FK}</td><br />
<td align=center>Alias</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=500 align=left border=1><br />
<tr><td align=left colspan=3><b>Genotype</b></td></tr><br />
<tr><br />
<td align=center>Genotype_id{PK}</td><br />
<td align=center>Genotype_Uid</td><br />
<td align=center>Genotype_Label</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=700 align=left border=1><br />
<tr><td align=left colspan=3><b>Taxon</b></td></tr><br />
<tr><br />
<td align=center>Taxon_id {PK}</td><br />
<td align=center>Taxon_Uid</td><br />
<td align=center>Taxon_Label</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=700 align=left border=1><br />
<tr><td align=left colspan=2><b>Taxon_Alias</b></td></tr><br />
<tr><br />
<td align=center>Taxon_id {FK}</td><br />
<td align=center>Alias</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=700 align=left border=1><br />
<tr><td align=left colspan=2><b>Taxon_Is_A_Taxon</b></td></tr><br />
<tr><br />
<td align=center>Taxon_id{FK}</td><br />
<td align=center>Taxon_id{FK}</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=700 align=left border=1><br />
<tr><td align=left colspan=3><b>Entity</b></td></tr><br />
<tr><br />
<td align=center>Entity_Id {PK}</td><br />
<td align=center>Entity_Uid</td><br />
<td align=center>Entity_Label</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=700 align=left border=1><br />
<tr><td align=left colspan=2><b>Entity_Is_A_Entity</b></td></tr><br />
<tr><br />
<td align=center>Entity_Id {FK}</td><br />
<td align=center>Entity_Id {FK}</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=700 align=left border=1><br />
<tr><td align=left colspan=2><b>Entity_Part_Of_Entity</b></td></tr><br />
<tr><br />
<td align=center>Entity_Id {FK}</td><br />
<td align=center>Entity_Id {FK}</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=700 align=left border=1><br />
<tr><td align=left colspan=3><b>Quality</b></td></tr><br />
<tr><br />
<td align=center>Quality_Id {PK}</td><br />
<td align=center>Quality_Uid </td><br />
<td align=center>Quality_Label </td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=700 align=left border=1><br />
<tr><td align=left colspan=7><b>Phenotype</b></td></tr><br />
<tr><br />
<td align=center>Phenotype_Id {PK}</td><br />
<td align=center>Phenotype_Uid</td><br />
<td align=center>Inheres_In_Entity_id {FK}</td><br />
<td align=center>Towards_Entity_id {FK}</td><br />
<td align=center>Is_A_Quality_id {FK}</td><br />
<td align=center>Is_A_Character_id {FK}</td><br />
<td align=center>Has_count</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=300 align=left border=1><br />
<tr><td align=left colspan=3><b>Gene_Genotype_Phenotype</b></td></tr><br />
<tr><br />
<td align=center>Gene_Id {FK}</td><br />
<td align=center>Genotype_Id {FK}</td><br />
<td align=center>Phenotype_Id {FK}</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=300 align=left border=1><br />
<tr><td align=left colspan=3><b>Taxon_Phenotype</b></td></tr><br />
<tr><br />
<td align=center>Taxon_Id {FK}</td><br />
<td align=center>Phenotype_Id {FK}</td><br />
<td align=center>Reif_Id {FK}</td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=300 align=left border=1><br />
<tr><td align=left colspan=5><b>Taxon_Phenotype_Metadata</b></td></tr><br />
<tr><br />
<td align=center>Reif_Id {PK}</td><br />
<td align=center>Character_Text</td><br />
<td align=center>State_Text </td><br />
<td align=center>Curators </td><br />
<td align=center>Curator_Comments </td><br />
</tr><br />
</table><br />
<br><br><br><br><br />
<table width=300 align=left border=1><br />
<tr><td align=left colspan=7><b>Publication</b></td></tr><br />
<tr><br />
<td align=center>Publication {PK}</td><br />
<td align=center>Primary_Title</td><br />
<td align=center>Secondary_Title </td><br />
<td align=center>Pages </td><br />
<td align=center>Volume </td><br />
<td align=center>Abstract </td><br />
<td align=center>Year </td><br />
</tr><br />
</table><br />
<br />
<br><br><br><br><br />
<table width=300 align=left border=1><br />
<tr><td align=left colspan=2><b>Publication_Reif_id</b></td></tr><br />
<tr><br />
<td align=center>Publication {FK}</td><br />
<td align=center>Reif_Id {FK}</td><br />
</tr><br />
</table><br />
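<br />
The provenance side of the logical schema above can be exercised in the same spirit. The sketch below is a hedged illustration rather than the project's code: it uses Python's <code>sqlite3</code> module in place of PostgreSQL, keeps only the Publication key and Year columns of the Publication table, and the ids, the curator name and the character and state text are invented sample values loosely based on the Siluriformes example.<br />
<br />
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Taxon_Phenotype (
        Taxon_Id INTEGER, Phenotype_Id INTEGER, Reif_Id INTEGER
    );
    CREATE TABLE Taxon_Phenotype_Metadata (
        Reif_Id          INTEGER PRIMARY KEY,
        Character_Text   TEXT,
        State_Text       TEXT,
        Curators         TEXT,
        Curator_Comments TEXT
    );
    CREATE TABLE Publication (Publication TEXT PRIMARY KEY, Year INTEGER);
    CREATE TABLE Publication_Reif_Id (Publication TEXT, Reif_Id INTEGER);
    -- Sample rows; every value here is illustrative.
    INSERT INTO Taxon_Phenotype VALUES (1, 1, 1);
    INSERT INTO Taxon_Phenotype_Metadata VALUES (
        1, 'shape of the dorsal surface of the basihyal bone',
        'flat or convex', 'example curator', NULL
    );
    INSERT INTO Publication VALUES ('Albert, 2001', 2001);
    INSERT INTO Publication_Reif_Id VALUES ('Albert, 2001', 1);
    """
)

# Reif_Id is the single key that ties an assertion to all of its metadata.
PROVENANCE_QUERY = """
    SELECT m.Character_Text, m.State_Text, m.Curators, pub.Publication, pub.Year
    FROM Taxon_Phenotype AS tp
    JOIN Taxon_Phenotype_Metadata AS m  ON m.Reif_Id  = tp.Reif_Id
    JOIN Publication_Reif_Id      AS pr ON pr.Reif_Id = tp.Reif_Id
    JOIN Publication              AS pub ON pub.Publication = pr.Publication
    WHERE tp.Taxon_Id = ? AND tp.Phenotype_Id = ?
"""

provenance_rows = conn.execute(PROVENANCE_QUERY, (1, 1)).fetchall()
print(provenance_rows)
```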
<br />
[[Category:OBD]]<br />
[[Category:Database]]<br />
[[Category:Informatics]]<br />
[[Category:Data]]</div>Crk18https://wiki.phenoscape.org/wg/phenoscape/index.php?title=File:DataWarehouseLogicalModel.png&diff=6698File:DataWarehouseLogicalModel.png2009-12-14T21:18:43Z<p>Crk18: uploaded a new version of "Image:DataWarehouseLogicalModel.png"</p>
<hr />
<div></div>Crk18https://wiki.phenoscape.org/wg/phenoscape/index.php?title=File:NewDatawarehouseSchema121409.png&diff=6697File:NewDatawarehouseSchema121409.png2009-12-14T21:16:13Z<p>Crk18: </p>
<hr />
<div></div>Crk18