Oncology Ontology and Clinical Bioinformatics
Anand Kumar


Oncology ontology is not a new field in itself; rather, it spans those aspects of life-science data integration and representation in oncology where formal ontological principles can play a role in the organization and dissemination of information.

In our experience with ontological representations of common carcinomas such as carcinoma of the colon, we have found that if clinical bioinformatics is to do justice to the diverse professional groups involved in the field, the foundational structure of such a representation must be robust and extensible.

Annotations to publicly available ontologies such as the Gene Ontology (GO) and databases such as UniProt provide a very useful means of finding information about gene products. OMIM and LocusLink provide disease-gene integration, while pathway databases such as KEGG and protein-protein interaction databases such as DIP and BIND provide pieces of information about processes. On the other hand, there are ontologies such as the Foundational Model of Anatomy (FMA), which deals primarily with coarser levels of granularity, and ICD and the disease axis of SNOMED CT, which deal with diseases. In addition, oncology-specific vocabularies are being developed in the Enterprise terminology projects at the National Cancer Institute, and Clinical Genomics information models are being developed by HL7 groups for linkage to electronic health records.

These resources provide very important inputs for an oncology ontology. The ontology itself, however, needs clearer distinctions between processes and the bearers of those processes, between unitary processes and their collections, between the various anatomical levels of granularity, and between a class and its instances at different locations and time periods.

In our work on representing entities related to colon carcinoma, a joint project with the Swiss Institute of Bioinformatics, we addressed these issues using the Basic Formal Ontology and the Theory of Granular Partitions, both being developed at IFOMIS. We applied statistical, probabilistic and linguistic methods to Gene Ontology Annotations to find relations between terms in GO. We then formalized the relations between those GO terms that are relevant to colon carcinoma. This portion was then integrated with SNOMED CT and the FMA within the Protégé framework.
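
As an illustration only, one simple statistical signal of the kind mentioned above is how often two GO terms annotate the same gene product. The sketch below scores term pairs by pointwise mutual information over co-annotation; it is not the method used in the project, and the function name, the input layout and the placeholder term identifiers are all assumptions made for the example.

```python
import math
from collections import Counter
from itertools import combinations


def coannotation_pmi(annotations):
    """Score GO term pairs by pointwise mutual information (PMI).

    `annotations` maps a gene product ID to the set of GO term IDs
    annotated to it (a hypothetical, simplified view of annotation data).
    Only pairs that co-occur at least once receive a score.
    """
    n = len(annotations)
    term_counts = Counter()   # number of gene products annotated with each term
    pair_counts = Counter()   # number of gene products annotated with both terms
    for terms in annotations.values():
        term_counts.update(terms)
        pair_counts.update(combinations(sorted(terms), 2))

    scores = {}
    for (a, b), joint in pair_counts.items():
        # PMI = log2( P(a, b) / (P(a) * P(b)) )
        scores[(a, b)] = math.log2(joint * n / (term_counts[a] * term_counts[b]))
    return scores


# Made-up toy data; the IDs are placeholders, not real GO terms or proteins.
toy = {
    "P1": {"GO:A", "GO:B"},
    "P2": {"GO:A", "GO:B"},
    "P3": {"GO:C"},
    "P4": {"GO:A", "GO:C"},
}
scores = coannotation_pmi(toy)
```

A positive score means the two terms are co-annotated more often than chance would suggest, which makes the pair a candidate for curator review; it does not establish an ontological relation between the terms.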

The next phase of the project will include a strong Natural Language Processing component to help enrich and update the ontology semi-automatically. We will expand the work to include the ten most common male and female carcinomas in the European population. We will also integrate the ontology with electronic health records and are currently exploring HL7’s Version 3 clinical genomics representation for that purpose.