OBD Reasoner

Revision as of 21:02, 21 January 2009 by Crk18 (talk | contribs) (The problem with absence of features)

The OBD reasoner uses definitions of transitive relations, relation hierarchies, and relation compositions to infer implicit information. These inferences are added to the OBD Phenoscape database. This section documents the inherited code, written in Perl with embedded SQL, that extracts implicit inferences from the downloaded ontologies and from the ZFIN and Phenoscape phenotype annotations.

Notation

When describing rules below, we use the following notations:

  • A, B, C: classes (as subjects or objects). Note that relationship concepts can also appear as subject or object in an assertion.
  • a, b, c: individuals (as subjects or objects)
  • R: relationship (predicate)
  • R(A, B): A R B, for example is_a(A, B) is equivalent to A is_a B. This is the functional form of assertions.
  • Reification: assertions about assertions. I.e., A, B, ... may also be assertions. For example, the yellow that inheres_in a particular dorsal fin is_a yellow, which we can write formally as: is_a(inheres_in(yellow, dorsal_fin), yellow).

Implemented Relation Properties

Relation Transitivity

Rule: <math>\forall</math>A, B, C, R and R transitive: R(A, B) <math>\and</math> R(B, C) <math>\Rightarrow</math> R(A, C)

Transitive relations yield the simplest inferences to extract, and these make up the majority of the new assertions added by the reasoner. Transitive relations include (source ontology in parentheses):

  • is_a (OBO Relations)
  • has_part (OBO Relations)
  • part_of (OBO Relations)
  • integral_part_of (OBO Relations)
  • has_integral_part (OBO Relations)
  • proper_part_of (OBO Relations)
  • has_proper_part (OBO Relations)
  • improper_part_of (OBO Relations)
  • has_improper_part (OBO Relations)
  • location_of (OBO Relations)
  • located_in (OBO Relations)
  • derives_from (OBO Relations)
  • derived_into (OBO Relations)
  • precedes (OBO Relations)
  • preceded_by (OBO Relations)
  • develops_from (Zebrafish Anatomy)
  • anterior_to (Spatial Ontology)
  • posterior_to (Spatial Ontology)
  • proximal_to (Spatial Ontology)
  • distal_to (Spatial Ontology)
  • dorsal_to (Spatial Ontology)
  • ventral_to (Spatial Ontology)
  • surrounds (Spatial Ontology)
  • surrounded_by (Spatial Ontology)
  • superficial_to (Spatial Ontology)
  • deep_to (Spatial Ontology)
  • left_of (Spatial Ontology)
  • right_of (Spatial Ontology)
  • complete_evidence_for_feature (Sequence Ontology)
  • evidence_for_feature (Sequence Ontology)
  • derives_from (Sequence Ontology)
  • member_of (Sequence Ontology)
  • exhibits (Phenoscape Ontology)
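
The transitivity rule can be illustrated with a small sketch. This is not the reasoner's Perl and SQL code; it assumes assertions are held in memory as (subject, relation, object) tuples, and the function name `transitive_step` is hypothetical.

```python
def transitive_step(triples, transitive_relations):
    """Apply the transitivity rule once; return only the newly inferred triples.

    `triples` is a set of (subject, relation, object) tuples. This in-memory
    representation is an assumption made for illustration only.
    """
    inferred = set()
    for (a, r1, b) in triples:
        if r1 not in transitive_relations:
            continue
        for (b2, r2, c) in triples:
            # R(A, B) and R(B, C) => R(A, C)
            if r2 == r1 and b2 == b:
                inferred.add((a, r1, c))
    return inferred - triples


asserted = {
    ("A", "part_of", "B"),
    ("B", "part_of", "C"),
}
new_triples = transitive_step(asserted, {"is_a", "part_of"})
print(new_triples)  # {('A', 'part_of', 'C')}
```

A single call covers only one step of the rule; longer chains need repeated application until nothing new is produced.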

Relation (role) compositions

Rule: <math>\forall</math>A, B, C, R: R(A, B) <math>\and</math> is_a(B, C) <math>\Rightarrow</math> R(A, C)

Rule: <math>\forall</math>A, B, C, R: is_a(A, B) <math>\and</math> R(B, C) <math>\Rightarrow</math> R(A, C)

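Relation (role) compositions are of the form A R1 B, B R2 C => A (R1|R2) C, where, per the two rules above, one of R1 and R2 is is_a and the other relation is carried over to the conclusion. For example, given A is_a B and B part_of C, then A part_of C. The reasoner extracts such inferences and adds them to the database.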
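
As a rough sketch of the two composition rules, assuming the same in-memory (subject, relation, object) tuples rather than the database tables the reasoner actually works on (the function name `compose_with_is_a` is hypothetical):

```python
def compose_with_is_a(triples):
    """Apply both is_a composition rules once; return only the new triples."""
    inferred = set()
    for (a, r1, b) in triples:
        for (b2, r2, c) in triples:
            if b2 != b:
                continue
            if r2 == "is_a":
                # R(A, B) and is_a(B, C) => R(A, C)
                inferred.add((a, r1, c))
            elif r1 == "is_a":
                # is_a(A, B) and R(B, C) => R(A, C)
                inferred.add((a, r2, c))
    return inferred - triples


asserted = {
    ("A", "is_a", "B"),
    ("B", "part_of", "C"),
}
composed = compose_with_is_a(asserted)
print(composed)  # {('A', 'part_of', 'C')}
```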

is_a Relation Reflexivity

Rule: <math>\forall</math>A, R and R reflexive <math>\Rightarrow</math> R(A, A)

Reflexive relations relate their arguments to themselves. A good example: "A rose is_a rose." The is_a relation is reflexive. In the database, every class, instance, or relation (that is, everything with an identifier in the Node table) is inferred by the reasoner to be related to itself through the is_a relation. Given a class called Siluriformes (with identifier TTO:302), the reasoner adds the assertion TTO:302 is_a TTO:302 to the database. This is the only reflexive relation handled by the reasoner. Other reflexive relations abound in the real world; subset_of is a good mathematical example from set theory, since every set is a subset of itself. Such relations are NOT dealt with by the reasoner.
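
A minimal sketch of is_a reflexivity, assuming the identifiers from the Node table have already been read into a Python list (the function name `reflexive_is_a` is hypothetical):

```python
def reflexive_is_a(node_ids):
    """Return one is_a(A, A) triple for every node identifier."""
    return {(node, "is_a", node) for node in node_ids}


# Identifiers taken from the examples on this page.
nodes = ["TTO:302", "TAO:0001173"]
reflexive = reflexive_is_a(nodes)
print(("TTO:302", "is_a", "TTO:302") in reflexive)  # True
```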

Relation Hierarchies

Rule: <math>\forall</math>A, B, R1, R2: R1(A, B) <math>\and</math> is_a(R1, R2) <math>\Rightarrow</math> R2(A, B)

An example: IF A father_of B AND father_of is_a parent_of, THEN A parent_of B.
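
The hierarchy rule can be sketched as follows, assuming that is_a assertions between relations are stored as ordinary (subject, relation, object) tuples alongside all other assertions. The function name `relation_hierarchy_step` is hypothetical.

```python
def relation_hierarchy_step(triples):
    """Apply R1(A, B) and is_a(R1, R2) => R2(A, B) once; return the new triples."""
    # Every is_a assertion is a candidate (R1, R2) pair; pairs whose subject
    # is not used as a relation anywhere simply never match below.
    is_a_pairs = {(r1, r2) for (r1, p, r2) in triples if p == "is_a"}
    inferred = set()
    for (a, r, b) in triples:
        for (r1, r2) in is_a_pairs:
            if r == r1:
                inferred.add((a, r2, b))
    return inferred - triples


asserted = {
    ("A", "father_of", "B"),
    ("father_of", "is_a", "parent_of"),
}
lifted = relation_hierarchy_step(asserted)
print(lifted)  # {('A', 'parent_of', 'B')}
```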

Relation Chains

Rule: <math>\forall</math>A, B, C: inheres_in(A, B) <math>\and</math> part_of(B, C) <math>\Rightarrow</math> inheres_in_part_of(A, C)

Relation chains are a special case of relation composition. Component relations are accumulated into an assembly relation. Specifically, instances of the relation inheres_in_part_of are accumulated from instances of the relations inheres_in and part_of: IF A inheres_in B and B part_of C, THEN A inheres_in_part_of C. This relation chain is specified by a holds_over_chain property in the inheres_in_part_of stanza of the Relation Ontology. However, the actual rule is hard-coded into the OBD reasoner and not derived from the ontology.
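
A sketch of this chain rule, with the rule hard-coded in the same spirit as in the reasoner. The tuple representation and the function name `inheres_in_part_of_step` are assumptions for illustration.

```python
def inheres_in_part_of_step(triples):
    """Apply inheres_in(A, B) and part_of(B, C) => inheres_in_part_of(A, C)."""
    inferred = set()
    for (a, r1, b) in triples:
        if r1 != "inheres_in":
            continue
        for (b2, r2, c) in triples:
            if r2 == "part_of" and b2 == b:
                inferred.add((a, "inheres_in_part_of", c))
    return inferred - triples


asserted = {
    ("A", "inheres_in", "B"),
    ("B", "part_of", "C"),
}
chained = inheres_in_part_of_step(asserted)
print(chained)  # {('A', 'inheres_in_part_of', 'C')}
```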

Relation Intersections

Rule: <math>\forall</math>Q, E: inheres_in(Q, E) <math>\Rightarrow</math> inheres_in(inheres_in(Q, E), E)

Rule: <math>\forall</math>Q, E: inheres_in(Q, E) <math>\Rightarrow</math> is_a(inheres_in(Q, E), Q)

Phenotype annotations are typically "post-composed", where an entity and a quality are combined into a Compositional Description. For example, an annotation about the quality decreased size (PATO:0000587) of the entity Dorsal Fin (TAO:0001173) may be post-composed into a Compositional Description that looks like PATO:0000587^OBO_REL:inheres_in(TAO:0001173). Instances of the is_a and inheres_in relations are extracted from post-compositions like this. In the above example, the reasoner extracts:

  1. PATO:0000587^OBO_REL:inheres_in(TAO:0001173) OBO_REL:inheres_in TAO:0001173, and
  2. PATO:0000587^OBO_REL:inheres_in(TAO:0001173) OBO_REL:is_a PATO:0000587
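
The extraction of these two assertions can be sketched as below. The regular expression only handles the simple, non-nested form quality^relation(entity) shown above; how the reasoner itself parses Compositional Descriptions is not documented here, so the pattern and the function name `decompose` are assumptions.

```python
import re

# Matches the simple form QUALITY^RELATION(ENTITY) with no nesting.
POST_COMPOSITION = re.compile(
    r"^(?P<quality>[^^]+)\^(?P<relation>[^()]+)\((?P<entity>[^()]+)\)$"
)


def decompose(description):
    """Return the two triples implied by a simple post-composed description."""
    match = POST_COMPOSITION.match(description)
    if match is None:
        raise ValueError("not a simple post-composition: " + description)
    return {
        # The composed description is related to its entity by the stated relation.
        (description, match["relation"], match["entity"]),
        # The composed description is_a instance of the bare quality.
        (description, "OBO_REL:is_a", match["quality"]),
    }


phenotype = "PATO:0000587^OBO_REL:inheres_in(TAO:0001173)"
extracted = decompose(phenotype)
for triple in sorted(extracted):
    print(triple)
```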

Relation Properties to be implemented

The following relation properties may be implemented in the reasoner in the future if necessary.

Relation Symmetry

Rule: <math>\forall</math>A, B, R and R symmetric: R(A, B) <math>\Rightarrow</math> R(B, A)

An example of a symmetric relation is the neighbor relation. IF Jim neighbor_of Ryan, THEN Ryan neighbor_of Jim. A more biologically relevant example is the in_contact_with relation. IF middle_nuchal_plate in_contact_with spinelet, THEN spinelet in_contact_with middle_nuchal_plate.

NOTE: There is no direct relationship between relation symmetry and relation reflexivity.
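
Since symmetry is not yet implemented, the following is only a sketch of what such a rule could look like over in-memory (subject, relation, object) tuples. The function name `symmetric_step` and the way symmetric relations are passed in are hypothetical.

```python
def symmetric_step(triples, symmetric_relations):
    """Apply R(A, B) => R(B, A) for symmetric R; return only the new triples."""
    inferred = {(b, r, a) for (a, r, b) in triples if r in symmetric_relations}
    return inferred - triples


asserted = {("middle_nuchal_plate", "in_contact_with", "spinelet")}
mirrored = symmetric_step(asserted, {"in_contact_with"})
print(mirrored)  # {('spinelet', 'in_contact_with', 'middle_nuchal_plate')}
```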

Relation Inversion

Rule: <math>\forall</math>A, B, R1, R2: R1(A, B) <math>\and</math> inverse_of(R1, R2) <math>\Rightarrow</math> R2(B, A)

An example of relation inversion can be found in the posterior_to and anterior_to relations. IF anterior_nuchal_plate anterior_to middle_nuchal_plate AND anterior_to inverse_of posterior_to, THEN middle_nuchal_plate posterior_to anterior_nuchal_plate.
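
A possible shape for the inversion rule, again only a sketch: it assumes inverse_of assertions are stored as ordinary tuples next to the other assertions, and the function name `inverse_step` is hypothetical.

```python
def inverse_step(triples):
    """Apply R1(A, B) and inverse_of(R1, R2) => R2(B, A); return the new triples."""
    inverse_pairs = {(r1, r2) for (r1, p, r2) in triples if p == "inverse_of"}
    inferred = set()
    for (a, r, b) in triples:
        for (r1, r2) in inverse_pairs:
            if r == r1:
                inferred.add((b, r2, a))
    return inferred - triples


asserted = {
    ("anterior_nuchal_plate", "anterior_to", "middle_nuchal_plate"),
    ("anterior_to", "inverse_of", "posterior_to"),
}
inverted = inverse_step(asserted)
print(inverted)  # {('middle_nuchal_plate', 'posterior_to', 'anterior_nuchal_plate')}
```

Note that the sketch follows the rule literally: it derives posterior_to from anterior_to, but not the other way round unless posterior_to inverse_of anterior_to is also asserted.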

The problem with absence of features

Descriptions of phenotypes as used in the Phenoscape project (and a plethora of phenomena in the real world) are replete with exceptions, or aberrations from what is considered to be "normal." While canonical ontologies like the FMA and the TAO contain ontological definitions of ideal specimens, observations in the life sciences are full of aberrations to these general rules.

Phenoscape has some typical issues dealing with absence of anatomical features in certain species of Ostariophysian fishes. For example, the basihyal cartilage is found in all species of Ostariophysian fishes, except the Siluriformes. At present, this information is captured in Phenoscape using the combination of the PATO term for "absent or present in fewer numbers in the organism" (PATO:0000587), the "inheres_in" relation from the OBO Relations Ontology, the TAO term for "basihyal cartilage" (TAO:0001510), the "exhibits" relation from the PHENOSCAPE ontology, and the TTO term for Siluriformes (TTO:302). This is shown below.

<javascript> TTO:302 PHENOSCAPE:exhibits PATO:0000587^OBO_REL:inheres_in(TAO:0001510) </javascript>

In plain English, this translates to "Siluriformes exhibit the absence or presence in fewer number in organism which inheres in basihyal cartilage." The semantics of this sentence are vague, to say the least. With this method, it is impossible to state that basihyal cartilage is absent in Siluriformes without referring to at least one instance of basihyal cartilage. Combining the quality absent with a feature through the inheres_in relation is itself misleading (e.g., absence inheres in cartilage) and contorts the intrinsic semantics of inheres_in. These problems have been discussed in Ceusters et al. and Hoehndorf et al. Both publications propose solutions that integrate such aberrant observations with canonical definitions without causing inconsistencies in reasoning procedures.

Another issue specific to the Phenoscape project was raised by Paula at the SICB workshop. Given that basihyal cartilage is absent in Siluriformes, basihyal bone should be absent in Siluriformes as well, because basihyal bone develops from basihyal cartilage. This may be inferred by adding the new relation chaining rule shown below to the OBD reasoner.

Rule: <math>\forall</math>F1, F2, S: absent_in(F1, S) <math>\and</math> develops_from(F2, F1) <math>\Rightarrow</math> absent_in(F2, S)

This relation chain corresponds to the observation: GIVEN THAT Basihyal_Cartilage absent_in Siluriformes AND Basihyal_Bone develops_from Basihyal_Cartilage, THEN Basihyal_Bone absent_in Siluriformes. This and other similar relation chains (as per identified requirements) are to be implemented for the Phenoscape project in the future. Strategies to deal with absent features in general are also to be implemented in the near future.
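
The proposed rule could be sketched as follows. The absent_in relation is the one proposed above and does not yet exist in the reasoner; the tuple representation and the function name `absent_in_step` are assumptions.

```python
def absent_in_step(triples):
    """Apply absent_in(F1, S) and develops_from(F2, F1) => absent_in(F2, S)."""
    inferred = set()
    for (f1, r1, s) in triples:
        if r1 != "absent_in":
            continue
        for (f2, r2, source_feature) in triples:
            if r2 == "develops_from" and source_feature == f1:
                inferred.add((f2, "absent_in", s))
    return inferred - triples


asserted = {
    ("Basihyal_Cartilage", "absent_in", "Siluriformes"),
    ("Basihyal_Bone", "develops_from", "Basihyal_Cartilage"),
}
absent = absent_in_step(asserted)
print(absent)  # {('Basihyal_Bone', 'absent_in', 'Siluriformes')}
```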

Sweeps

The reasoner runs in several sweeps. In each sweep, new implicit inferences are derived from the explicit annotations (as described in the previous sections) and added to the database. In the following sweep, the inferences added in the previous sweep are used to derive further inferences. This process continues until a sweep adds no new inferences. At that point the deductive closure has been reached: no further inferences are possible, and the reasoner exits.
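
The sweep loop amounts to a fixed-point iteration. The sketch below assumes rules are Python functions that map a set of (subject, relation, object) tuples to a set of derived tuples; the names `run_sweeps` and `part_of_transitivity` are hypothetical, and the real reasoner works against the database rather than an in-memory set.

```python
def run_sweeps(asserted, rules):
    """Apply all rules sweep by sweep until a sweep adds nothing new.

    Returns the deductive closure and the number of sweeps performed,
    counting the final sweep that found no new inferences.
    """
    closure = set(asserted)
    sweeps = 0
    while True:
        sweeps += 1
        new = set()
        for rule in rules:
            new |= rule(closure)
        new -= closure  # keep only inferences not already known
        if not new:
            return closure, sweeps
        closure |= new


def part_of_transitivity(triples):
    """Example rule: part_of(A, B) and part_of(B, C) => part_of(A, C)."""
    return {
        (a, "part_of", c)
        for (a, r1, b) in triples
        for (b2, r2, c) in triples
        if r1 == r2 == "part_of" and b == b2
    }


asserted = {
    ("A", "part_of", "B"),
    ("B", "part_of", "C"),
    ("C", "part_of", "D"),
}
closure, sweeps = run_sweeps(asserted, [part_of_transitivity])
print(len(closure), sweeps)  # 6 3
```

In this example the first sweep adds A part_of C and B part_of D, the second adds A part_of D, and the third adds nothing, so the loop stops.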