Evaluating the effect of annotation size on measures of semantic similarity

M. Kulmanov, R. Hoehndorf
Journal of Biomedical Semantics, 8:7, (2017)

Keywords

Semantic similarity, Ontology, Gene ontology

Abstract

Background

Ontologies are widely used as metadata in biological and biomedical datasets. Measures of semantic similarity use ontologies to determine how similar two entities are, based on the ontology classes with which they are annotated. Semantic similarity is increasingly used in applications ranging from disease diagnosis to the investigation of gene networks and the functions of gene products.

Results

Here, we analyze a large number of semantic similarity measures and the sensitivity of similarity values to the number of annotations of entities, the difference in annotation size, and the depth or specificity of annotation classes. We find that most similarity measures are sensitive to all three factors. As a result, well-studied and richly annotated entities will usually show higher similarity than entities with only a few annotations, even in the absence of any biological relation.
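The annotation-size effect described above can be illustrated with a small simulation. The sketch below is not the authors' experiment: it assumes a toy ontology (a complete binary tree rather than the Gene Ontology), purely random annotations, and one simple measure, the Jaccard index over ancestor-closed annotation sets. All names in it (`ancestors`, `closure`, `jaccard`, `mean_random_similarity`) are hypothetical and introduced only for illustration.

```python
import random

# Toy ontology (an assumption, not the Gene Ontology): a complete binary
# tree whose classes are numbered 1 .. 2**(DEPTH + 1) - 1, root = 1,
# and the parent of class n is n // 2.
DEPTH = 7
CLASSES = range(1, 2 ** (DEPTH + 1))


def ancestors(cls):
    """Return the class itself together with all of its superclasses."""
    result = set()
    while cls >= 1:
        result.add(cls)
        cls //= 2
    return result


def closure(annotations):
    """Propagate annotations upward: union of the ancestors of each class."""
    closed = set()
    for cls in annotations:
        closed |= ancestors(cls)
    return closed


def jaccard(annotations_a, annotations_b):
    """Jaccard index of the two ancestor-closed annotation sets."""
    closed_a, closed_b = closure(annotations_a), closure(annotations_b)
    return len(closed_a & closed_b) / len(closed_a | closed_b)


def mean_random_similarity(size, trials=500, seed=0):
    """Average similarity of two random entities with `size` annotations each."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        entity_a = rng.sample(CLASSES, size)
        entity_b = rng.sample(CLASSES, size)
        total += jaccard(entity_a, entity_b)
    return total / trials


if __name__ == "__main__":
    # The annotations are random, so there is no "biological" relation;
    # any trend in the averages comes from annotation size alone.
    for size in (2, 10, 30):
        print(size, round(mean_random_similarity(size), 3))
```

Under these assumptions, the average similarity between unrelated random entities grows with the number of annotations, because larger annotation sets cover more of the shared upper part of the hierarchy. Other measures, ontologies, or sampling schemes may behave differently.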

Conclusions

Our findings may have a significant impact on the interpretation of results that rely on measures of semantic similarity, and we demonstrate how the sensitivity to annotation size can lead to a bias when using semantic similarity to predict protein-protein interactions.

DOI: 10.1186/s13326-017-0119-z
