Large scale ontology engineering
Alan Rector


Developing "ontologies" to support information systems is essentially a software development task. It is concerned with representing the entities used in information systems in special languages. As with all software development tasks, it is both constrained and informed by the power of those languages.

As software, in health informatics, ontologies perform at least three roles: as support for terminologies and terminological engineering, for information integration, and as the skeleton for reference information systems - e.g. drug information, web resources and services. These roles are not always clearly distinguished. Furthermore, because the word "ontology" is borrowed from philosophy and widely used by both computational linguists and computer scientists, it carries many different meanings. Because "ontologies" have become fashionable, expectations have grown, and what is often referred to as an "ontology" is in fact a much broader "reference information system" including much information that none of the three groups would view as "ontological".

The most general of these is the general reference information system. Such systems are inevitably large, with requirements that exceed, in principle, the capacity of any single existing software paradigm. Hence the engineering design task is to factor the problem into tractable units. Furthermore, such systems are inevitably complex and difficult for users to understand, so that providing user-accessible "views" becomes a major concern. Finally, they are highly dynamic, reflecting rapid change in biomedical understanding and conceptualisation. The task of engineering such systems requires a comprehensive approach to the life cycle, including software engineering, logical design, a metamodel of the underlying meaning, and metadata concerning the provenance, editorial history and usage of the entities represented.

We propose an architecture, many of whose features were tested in GALEN, which carefully separates:
  • Meta-ontology - the meaning of the high level concepts and relations used
  • Central ontology - the core model of the meaning of the concepts used in the domain
  • Prototypical knowledge - of what is normally or typically true
  • Generic facts
  • Linguistic knowledge
  • Metadata extending across all of the above.
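The separation of layers above can be sketched in code. The following is a minimal illustrative sketch only, not GALEN's actual representation: all class and attribute names here are hypothetical, chosen to show how definitional knowledge (the central ontology), prototypical knowledge, and cross-cutting metadata can be kept in distinct structures.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Metadata:
    """Cross-cutting metadata: provenance, editorial history, usage."""
    author: str
    status: str = "draft"

@dataclass
class OntologyEntry:
    """Central-ontology concept: the core meaning of a domain concept."""
    name: str
    parents: List[str] = field(default_factory=list)  # definitional is-a links
    meta: Optional[Metadata] = None

@dataclass
class PrototypicalFact:
    """What is normally or typically true -- defeasible, not definitional."""
    subject: str
    relation: str
    value: str

# Central ontology: definitional knowledge, with its editorial metadata
heart = OntologyEntry("Heart", parents=["Organ"], meta=Metadata("editor-1"))

# Prototypical knowledge kept in a separate layer: typically true, overridable
typical = PrototypicalFact("Heart", "has_chamber_count", "4")
```

Keeping the layers in separate structures, rather than mixing typical facts into the definitional hierarchy, is what makes the effects of a change to one layer predictable for the others.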
Furthermore, we propose that the key issue is to produce not a product but an effective process that manages changes as they are required. Experience makes it clear that we cannot anticipate all requirements in advance. There must be a balance between what Coiera calls "grounding costs" and "clean-up" costs. Since comprehensive resources cannot, in principle, be produced in advance, mechanisms must exist to extend the resources "just in time" as they are required. Paradoxically, this is only possible if the logical and ontological foundations of the system are explicit and rigorous, so that the effects of changes are predictable and controllable. As in any branch of software engineering, ad hoc changes to ad hoc systems produce chaos.