Using text mining / automated entity recognition for semi-automated ontology construction
Position paper for round table discussion

Martin Hofmann



Ontologies are often discussed as the “silver bullet” for semantic integration of data, improved annotation of objects and processes, easier navigation through knowledge, and enhanced indexing capabilities.

In most of these discussions, the effort required to construct ontologies is largely underestimated. In the life sciences, ontology construction has become a major activity of the front runners in gene and genome annotation. Ontologies such as GO (Gene Ontology) and SO (Sequence Ontology), as well as the MGED ontology for describing microarray experiments, have been developed by communities of experts; they are widely used even though they are still incomplete.

The construction of GO, SO, MGED and other ontologies was, and still is, a time- and labor-intensive process involving whole communities of scientists. One way to speed up ontology construction would be to combine existing methods of semantic text analysis / text mining with ontology editing and construction technology. Because knowledge in most scientific domains is represented mainly in unstructured text, text mining methods could help us identify and extract entities of interest. In a second step, we should be able to define methods that qualify extracted entities for a role as concepts in an ontology (for example, based on their semantic neighbourhood).
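The two steps above can be illustrated with a minimal sketch in plain Python. It rests on several assumptions that are not part of this paper: a naive frequency-based n-gram extractor stands in for real entity recognition, and "semantic neighbourhood" is approximated as sentence-level word co-occurrence, compared by cosine similarity against concepts already present in an ontology (the seed concepts). The function names, the stopword list, and the threshold are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list; a real system would rely on
# a proper NLP toolkit for tokenisation and entity recognition.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "is", "are", "to", "by",
             "for", "with", "on", "that", "this", "as", "be", "from"}


def sentences(text):
    """Split text into sentences and yield each one as a list of lowercase tokens."""
    for sent in re.split(r"[.!?]+", text):
        tokens = re.findall(r"[a-z0-9][a-z0-9-]*", sent.lower())
        if tokens:
            yield tokens


def candidate_terms(text, max_len=2, min_freq=2):
    """Step 1 (stand-in for entity recognition): recurring n-grams without stopwords."""
    counts = Counter()
    for toks in sentences(text):
        for n in range(1, max_len + 1):
            for i in range(len(toks) - n + 1):
                gram = toks[i:i + n]
                if not any(t in STOPWORDS for t in gram):
                    counts[" ".join(gram)] += 1
    return {term for term, c in counts.items() if c >= min_freq}


def context_vectors(text, terms):
    """Describe each term by the non-stopword words sharing a sentence with it."""
    vectors = defaultdict(Counter)
    for toks in sentences(text):
        joined = " " + " ".join(toks) + " "
        context = [t for t in toks if t not in STOPWORDS]
        for term in terms:
            if f" {term} " in joined:
                term_words = set(term.split())
                vectors[term].update(w for w in context if w not in term_words)
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def suggest_concepts(text, seed_concepts, threshold=0.5):
    """Step 2: keep candidates whose neighbourhood resembles a known concept's."""
    seeds = set(seed_concepts)
    candidates = candidate_terms(text) - seeds
    vectors = context_vectors(text, candidates | seeds)
    scored = []
    for term in candidates:
        best = max((cosine(vectors[term], vectors[s]) for s in seeds), default=0.0)
        if best >= threshold:
            scored.append((term, round(best, 2)))
    # Highest similarity first; ties broken alphabetically.
    return sorted(scored, key=lambda x: (-x[1], x[0]))


if __name__ == "__main__":
    corpus = ("Protein kinase phosphorylates the substrate. Protein kinase binds ATP. "
              "Protein phosphatase dephosphorylates the substrate. "
              "Protein phosphatase binds ATP.")
    for term, score in suggest_concepts(corpus, ["protein kinase"]):
        print(f"{term}\t{score}")
```

On this toy corpus, with "protein kinase" as the only seed concept, "protein phosphatase" is ranked among the top suggestions because it occurs in sentences very similar to those of the seed. The output also shows the weakness of such a naive approach: generic words such as "protein" score just as highly, which is why the suggestions are meant to be reviewed by domain experts rather than added to an ontology automatically.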

A “text mining machine” that proposes candidate concepts for expert review would have a major impact on ontology generation and could significantly speed up the process of ontology construction.