Difference between revisions of "KB-OWL"

From phenoscape
(Data model)
 
(26 intermediate revisions by the same user not shown)
Line 1: Line 1:
 +
{{Obsolete|See [[Phenoscape KB]] for more current information.}}
 +
 
This page describes the data model, data loading process, and query methods being developed for a new version of the Phenoscape Knowledgebase built on top of RDF and making use of standard OWL reasoners.
 
This page describes the data model, data loading process, and query methods being developed for a new version of the Phenoscape Knowledgebase built on top of RDF and making use of standard OWL reasoners.
  
 
==Data model==
 
==Data model==
Phenotypes in the new data model are modeled as OWL class expressions. They are now constructed in an inverse fashion from the form in the original KB - rather than describing classes of particular qualities which inhere in particular structures, we describe parts which bear particular qualities (at the level of instance data the model is the same ''inheres_in'' is the inverse of ''bearer_of'').
+
===Phenotypes===
 +
Phenotypes in the new data model are modeled as OWL class expressions. They are now constructed in an inverse fashion from the form in the original KB rather than describing classes of particular qualities which inhere in particular structures, we describe parts which bear particular qualities (at the level of instance data the model is the same: ''inheres_in'' is the inverse of ''bearer_of'').
  
 
For example, "serrated dorsal fin":
 
For example, "serrated dorsal fin":
* old: ''exhibits'' some (serrated and ''inheres_in'' some dorsal_fin)
+
* '''old:''' ''exhibits'' some (serrated and ''inheres_in'' some dorsal_fin)
* new: ''has_part'' some (dorsal_fin and ''bearer_of'' some serrated)
+
* '''new:''' ''has_part'' some (dorsal_fin and ''bearer_of'' some serrated)
 +
 
 +
We can now model absence correctly. The previous version of the KB used a quality class called 'absent' in a way which produces unintended inferences. In the new model we use negation. For "dorsal fin, absent":
 +
* not (''has_part'' some dorsal_fin)
 +
 
 +
For a localized absence we would use a nested ''has_part'':
 +
* ''has_part'' some (head and not (''has_part'' some scale))
 +
 
 +
The ''has_part'' formulation also provides the benefit of automatic "present" inferences for structures used in any phenotype, not just "present" phenotypes.
 +
 
 +
===Taxa and phenotype annotations===
 +
Taxa in the new data model are modeled as OWL individuals, rather than OWL classes. Thus, the taxonomic hierarchy is represented directly in RDF as a tree of ''subclade_of'' property relationships between taxa, rather than OWL subclass axioms. This model allows for richer modeling of how phenotypes are propagated due to taxon membership and ancestral states than a class-based taxonomy ([http://www.mendeley.com/download/public/2280031/4561422715/faae92a74742bc35ceca9b3f5862cb94a4cde61e/dl.pdf]).
 +
 
 +
A given phenotype expression (''has_part'' some ...) describes the condition of an individual organism, not a whole taxon. So to create a phenotype annotation for a taxon, the relationship of this organism to the taxon must be captured. We use one of two forms, [[Semantics_of_phenotype_annotations|depending on the intended meaning]]:
 +
* '''Observation annotation:''' ''has_member'' some (''has_part'' some (dorsal_fin and ''bearer_of'' some serrated))
 +
* '''Ancestral state annotation:''' ''has_progenitor'' some (''has_part'' some (dorsal_fin and ''bearer_of'' some serrated))
 +
 
 +
The actual phenotype annotation can be created as a class assertion axiom for the given taxon:
 +
* Ictalurus_punctatus Type (''has_member'' some (''has_part'' some (dorsal_fin and ''bearer_of'' some serrated)))
  
We can now model absence correctly. The previous version of the KB uses a quality class called 'absent' in a way which produces unintended inferences. In the new model we use universal quantification with negation. For "dorsal fin, absent":
+
===Genes and phenotype annotations===
* ''has_part'' only (not dorsal_fin)
 
  
 
==Reasoning==
 
==Reasoning==
 +
Currently all reasoning is conducted during the data load process, in which required inferences are precomputed and then asserted in the data. This allows for more expressive reasoning than is available at query time (for example, RDFS reasoners in triplestore SPARQL engines), and also allows for faster queries (since all need facts are asserted).
 +
 +
''Need more here about advantages and limitations.''
  
 
==Data loader==
 
==Data loader==
 +
The data loader is a set of Scala classes which manipulate ontologies in various ways. They can be called from within an Ant file which describes the build process. The data loader classes are developed within the [https://github.com/phenoscape/phenoscape-owl-tools phenoscape-owl-tools] project. The Ant file which builds the RDF files constituting the KB triplestore is part of the [https://github.com/phenoscape/phenoscape-builder phenoscape-builder] project.
  
 
==Data store==
 
==Data store==

Latest revision as of 22:47, 9 January 2014

The tool or documentation in this page is obsolete. See Phenoscape KB for more current information.

This page describes the data model, data loading process, and query methods being developed for a new version of the Phenoscape Knowledgebase built on top of RDF and making use of standard OWL reasoners.

Data model

Phenotypes

Phenotypes in the new data model are modeled as OWL class expressions. They are now constructed in an inverse fashion from the form in the original KB — rather than describing classes of particular qualities which inhere in particular structures, we describe parts which bear particular qualities (at the level of instance data the model is the same: inheres_in is the inverse of bearer_of).

For example, "serrated dorsal fin":

  • old: exhibits some (serrated and inheres_in some dorsal_fin)
  • new: has_part some (dorsal_fin and bearer_of some serrated)

We can now model absence correctly. The previous version of the KB used a quality class called 'absent' in a way which produces unintended inferences. In the new model we use negation. For "dorsal fin, absent":

  • not (has_part some dorsal_fin)

For a localized absence we would use a nested has_part:

  • has_part some (head and not (has_part some scale))

The has_part formulation also provides the benefit of automatic "present" inferences for structures used in any phenotype, not just "present" phenotypes.

Taxa and phenotype annotations

Taxa in the new data model are modeled as OWL individuals, rather than OWL classes. Thus, the taxonomic hierarchy is represented directly in RDF as a tree of subclade_of property relationships between taxa, rather than OWL subclass axioms. This model allows for richer modeling of how phenotypes are propagated due to taxon membership and ancestral states than a class-based taxonomy ([1]).

A given phenotype expression (has_part some ...) describes the condition of an individual organism, not a whole taxon. So to create a phenotype annotation for a taxon, the relationship of this organism to the taxon must be captured. We use one of two forms, depending on the intended meaning:

  • Observation annotation: has_member some (has_part some (dorsal_fin and bearer_of some serrated))
  • Ancestral state annotation: has_progenitor some (has_part some (dorsal_fin and bearer_of some serrated))

The actual phenotype annotation can be created as a class assertion axiom for the given taxon:

  • Ictalurus_punctatus Type (has_member some (has_part some (dorsal_fin and bearer_of some serrated)))

Genes and phenotype annotations

Reasoning

Currently all reasoning is conducted during the data load process, in which required inferences are precomputed and then asserted in the data. This allows for more expressive reasoning than is available at query time (for example, RDFS reasoners in triplestore SPARQL engines), and also allows for faster queries (since all need facts are asserted).

Need more here about advantages and limitations.

Data loader

The data loader is a set of Scala classes which manipulate ontologies in various ways. They can be called from within an Ant file which describes the build process. The data loader classes are developed within the phenoscape-owl-tools project. The Ant file which builds the RDF files constituting the KB triplestore is part of the phenoscape-builder project.

Data store