
Ontology and Taxonomy Development

The development of an ontology must be carefully considered. It must be designed to expand as the need arises (a property called extensibility). It must consider every facet of a question to maintain coherence. It must be flexible enough to define general concepts but rigid enough to provide a consistent framework. Remember, an ontology is not just a categorization of terms but an attempt to understand the nature of a "thing".

For our Recall example, we seek to answer the question: which recall notices could affect life or limb? It seems simple enough: generate a set of lists that declare text variables and run them as a gazetteer PR in GATE Developer. There may be some inherent ambiguities, but we can refine the model until they are reduced and the highest possible figure of merit is reached. Yet I think that misses the point. Ontology seeks to fundamentally understand the nature of being, not just an ordered set of terms.

Using the concept of abstraction, we can reduce our question to its most basic elements. The question being asked is not substantively about the recall notice. The notice is the modality for conveying information and is therefore functionally irrelevant. We are not looking at different types of notices, so we are not dealing with potential differences in content or context. The notice identifies a defect in a vehicle, a vehicle component, or the implementation of a vehicle. The defect causes some effect on the operation of the vehicle or component. We can also state that the component defect has some outcome. So really, we are dealing with a cause and effect problem. If we view it as a logical statement, then the subject (vehicle) has some predicate (the defect or faulty implementation) that affects an object (component or system).

∀r (∃v ⇒ ∃d ⇒ ∃o)

Ontology Development

Not only do we establish the foundations for an ontology but also the relationship elements. The "Thing" (or Recall Notice) contains a "vehicle" and a "component" being recalled. It contains a "defect" and there is a potential "outcome". This forms the superclasses for the <recall>: <component>, <defect>, <outcome>, and <vehicle> classes.

There are connections that define the general relationships between the classes. Vehicles are composed of components and component sub-systems. Conversely, components join together to make a vehicle. But a component can also stand as an object by itself, just as a vehicle can stand as an object by itself. We can therefore state that there are two object properties: <hasComponent> and <componentOf>. These two properties are asymmetric and mirror each other (in OWL terms, each is the inverse of the other). This forms an existential restriction such that:

AsymmetricObjectProperty(a:hasComponent)
ObjectPropertyAssertion( a:hasComponent a:vehicle a:component)

AsymmetricObjectProperty(a:componentOf)
ObjectPropertyAssertion( a:componentOf a:component a:vehicle)
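To make the pairing of <hasComponent> and <componentOf> concrete, here is a minimal Python sketch. It is not how GATE or an OWL reasoner stores these properties; the `ComponentGraph` class and its method names are hypothetical, chosen only for illustration. It stores <hasComponent> assertions, rejects any assertion that would violate asymmetry, and derives <componentOf> by reading the same assertions in the opposite direction.

```python
from collections import defaultdict


class ComponentGraph:
    """Hypothetical store for hasComponent assertions (illustration only)."""

    def __init__(self):
        # Maps a whole (e.g. "vehicle") to the set of its parts.
        self._has_component = defaultdict(set)

    def assert_has_component(self, whole, part):
        """Record `whole hasComponent part`, enforcing asymmetry."""
        if whole == part:
            # An asymmetric property can never relate a thing to itself.
            raise ValueError(f"{whole!r} cannot be a component of itself")
        if whole in self._has_component.get(part, set()):
            # The reverse assertion already exists, so this would be symmetric.
            raise ValueError(f"{part!r} already hasComponent {whole!r}")
        self._has_component[whole].add(part)

    def has_component(self, whole):
        """Return the parts asserted for `whole`."""
        return set(self._has_component.get(whole, set()))

    def component_of(self, part):
        """Derive componentOf by reading hasComponent backwards."""
        return {w for w, parts in self._has_component.items() if part in parts}


graph = ComponentGraph()
graph.assert_has_component("vehicle", "component")
print(graph.has_component("vehicle"))
print(graph.component_of("component"))
```

Keeping a single set of assertions and deriving the second property from it avoids the two directions drifting out of sync, which is the risk when both are asserted by hand.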

If we state that the <vehicle> has a defect, then it is asserted that the component has a defect, and vice versa. Therefore, we can form an object property <hasA> to define that a <defect> <hasA> <vehicle> and a <vehicle> <hasA> <defect>. Since <hasA> applies to <component> as well, we state that the object property is transitive.

SymmetricObjectProperty( a:hasA)
ObjectPropertyAssertion( a:hasA a:defect a:vehicle)
ObjectPropertyAssertion( a:hasA a:vehicle a:defect)
TransitiveObjectProperty(a:hasA)
ObjectPropertyAssertion( a:hasA a:defect a:vehicle)
ObjectPropertyAssertion( a:hasA a:defect a:component)

Finally, we use the same object property <hasA> between <outcome> and <defect> with the same statement:

SymmetricObjectProperty( a:hasA)
ObjectPropertyAssertion( a:hasA a:outcome a:defect)
ObjectPropertyAssertion( a:hasA a:defect a:outcome)
TransitiveObjectProperty(a:hasA)
ObjectPropertyAssertion( a:hasA a:outcome a:defect)
ObjectPropertyAssertion( a:hasA a:outcome a:vehicle)
ObjectPropertyAssertion( a:hasA a:outcome a:component)
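The effect of declaring <hasA> both symmetric and transitive can be checked with a small Python sketch. This is only an approximation of what a reasoner would infer, not a reasoner itself; the function name `close_symmetric_transitive` and the plain string labels are assumptions made for the example. Starting from three asserted pairs, it repeatedly adds the reversed pairs and the chained pairs until nothing new appears.

```python
def close_symmetric_transitive(pairs):
    """Return the closure of `pairs` under symmetry and transitivity."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # Symmetry: if (a, b) holds, then (b, a) holds.
        new = {(b, a) for a, b in closure}
        # Transitivity: if (a, b) and (b, d) hold, then (a, d) holds.
        new |= {(a, d) for a, b in closure for c, d in closure if b == c}
        if not new <= closure:
            closure |= new
            changed = True
    return closure


# Three of the hasA assertions from the listings above.
asserted = {
    ("defect", "vehicle"),
    ("defect", "component"),
    ("outcome", "defect"),
}

inferred = close_symmetric_transitive(asserted)
print(("outcome", "vehicle") in inferred)
print(("outcome", "component") in inferred)
print(len(inferred))
```

The pairs for <outcome> with <vehicle> and <component> fall out of the closure without being asserted. One consequence worth noting: a property that is both symmetric and transitive also relates every connected individual to itself, so the closure here contains pairs such as ("vehicle", "vehicle").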

Conceptualization

Now that we have conceptualized our upper ontology, we can look at the structure and determine what is necessary for further development. Within the four superclasses, two are taxonomical (meaning there is a defined hierarchy of terms) and two are concepts. Unlike the <vehicle> and <component> classes (we will discuss these later), the <defect> and <outcome> classes have no defined hierarchy. Using the process of abstraction, we can determine that there are additional sub-classes that can hold terms or "individuals".

The <defect> class can be further divided to include a <compliance> defect, <part> defect, and a <function> defect. The <outcome> class can be further divided to include a <human> outcome and a <material> outcome.

- <defect> class identifies defect types.
  - <compliance> class identifies defect terms where something fails to meet a standard or measurement.
  - <function> class identifies defect terms where a function or functionality is compromised.
  - <part> class identifies defect terms that occur at the component level during the manufacture or installation of a component.
- <outcome> class specifies outcome types.
  - <human> class identifies outcome terms that relate only to humans.
  - <material> class identifies outcome terms that relate to the material.
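The two conceptual classes and their sub-classes listed above can be written down as a small nested structure. The Python sketch below is one possible representation, assuming nothing beyond the class names given in the list; the names `CONCEPT_CLASSES` and `class_paths` are hypothetical. It walks the structure and yields the path from each superclass down to each sub-class.

```python
# Conceptual classes from the list above; an empty dict means "no sub-classes".
CONCEPT_CLASSES = {
    "defect": {"compliance": {}, "function": {}, "part": {}},
    "outcome": {"human": {}, "material": {}},
}


def class_paths(tree, prefix=()):
    """Yield the path to every class in `tree`, parents before children."""
    for name, children in tree.items():
        path = prefix + (name,)
        yield path
        yield from class_paths(children, path)


paths = list(class_paths(CONCEPT_CLASSES))
for path in paths:
    print(" > ".join(path))
```

A path such as `defect > compliance` is the kind of label a matched term could carry later, so that each term stays tied to both its sub-class and its superclass.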

The terms associated with our conceptual classes are related to the concepts during text-mining operations.

Taxonomy Development

The <component> and <vehicle> classes are composed of hierarchically constrained sub-classes. The <component> class was derived by observing the hierarchical structure within the source data. The key to this class is to define sub-classes at the highest level without specific component terms. The <vehicle> class was derived directly from the NHTSA technical documentation. The results are:

Ontology Implementation

There are several methods for implementing an ontology in GATE Developer. The easiest method is to use ontological concepts to form gazetteer elements. Each super-class is formed into a definition file (.def). Sub-classes are formed into list files (.lst) with annotated terms. The process is laborious but provides a high degree of fidelity. The <vehicle> class is not included in this implementation as it is not required.
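Since writing the files by hand is laborious, the layout described above can be generated from a dictionary. The following Python sketch is an illustration under several assumptions: the sample terms are invented placeholders (the actual terms are not listed here), the file names such as `defect_compliance.lst` are hypothetical, and each definition-file line uses the common `file.lst:majorType:minorType` form. Per-term annotations (features) are not modeled.

```python
import tempfile
from pathlib import Path

# Hypothetical placeholder terms, NOT the terms used in the actual gazetteer.
SAMPLE_TERMS = {
    "defect": {
        "compliance": ["fails to conform", "noncompliance"],
        "function": ["loss of steering", "stall"],
        "part": ["cracked", "improperly installed"],
    },
    "outcome": {
        "human": ["injury", "death"],
        "material": ["fire", "property damage"],
    },
}


def write_gazetteer(terms, out_dir):
    """Write one .def file per super-class and one .lst file per sub-class."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    def_paths = []
    for major, minors in terms.items():
        def_lines = []
        for minor, entries in minors.items():
            lst_name = f"{major}_{minor}.lst"
            # One term per line in the list file.
            (out_dir / lst_name).write_text(
                "\n".join(entries) + "\n", encoding="utf-8"
            )
            # Definition line: list file, major type, minor type.
            def_lines.append(f"{lst_name}:{major}:{minor}")
        def_path = out_dir / f"{major}.def"
        def_path.write_text("\n".join(def_lines) + "\n", encoding="utf-8")
        def_paths.append(def_path)
    return def_paths


with tempfile.TemporaryDirectory() as tmp:
    written = [p.name for p in write_gazetteer(SAMPLE_TERMS, tmp)]
    defect_def = (Path(tmp) / "defect.def").read_text(encoding="utf-8")
    human_lst = (Path(tmp) / "outcome_human.lst").read_text(encoding="utf-8")

print(written)
print(defect_def)
```

Using the super-class as the major type and the sub-class as the minor type means each matched term carries both levels of the ontology in its annotation. The <vehicle> class is left out here, matching the implementation described above.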

The result of this effort is a set of annotations that specifically identify terms of interest. The fact that they are built on ontological principles is a benefit during the analysis process.

Recall Gazetteer Annotations