Improving Query Results with Automatic Duplicate Detection
Abstract
Ontology-based data integration poses significant challenges. One is
that the ontology used as the global reference model during integration
can contain duplicate attributes, which can easily lead to incorrect
query results. This problem arises when merging similar or
overlapping information from ontologies extracted from distributed digital
libraries into a single global ontology. To solve the problem, we propose a
novel context-based approach that analyzes a workload of queries over the
global ontology to automatically compute semantic distances between
attributes, which are then used for duplicate detection. We present experimental
results to demonstrate the quality of our approach.
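
To make the idea of a workload-derived distance concrete, the following is a minimal sketch and not the method evaluated in this paper. It assumes that each query can be reduced to the set of attributes it references, that the "context" of an attribute is the collection of attributes it co-occurs with across the workload, and that cosine distance between context vectors is an acceptable stand-in for the semantic distance. The function names, the sample workload, and the threshold of 0.2 are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def context_vectors(workload):
    """Map each attribute to counts of the attributes it co-occurs with."""
    contexts = {}
    for query in workload:
        attrs = set(query)
        for attr in attrs:
            # The context of attr is every other attribute in the same query.
            contexts.setdefault(attr, Counter()).update(attrs - {attr})
    return contexts


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    if norm == 0:
        return 1.0  # an empty context is treated as maximally distant
    return max(0.0, 1.0 - dot / norm)  # clamp floating-point rounding error


def duplicate_candidates(workload, threshold=0.2):
    """List attribute pairs whose context distance is at most threshold."""
    contexts = context_vectors(workload)
    pairs = []
    for a, b in combinations(sorted(contexts), 2):
        d = cosine_distance(contexts[a], contexts[b])
        if d <= threshold:
            pairs.append((a, b, d))
    return pairs


# Hypothetical workload: each query is the set of attributes it references.
workload = [
    {"author", "title", "year"},
    {"creator", "title", "year"},
    {"author", "title"},
    {"creator", "title"},
    {"publisher", "year"},
]

for a, b, d in duplicate_candidates(workload):
    print(f"{a} ~ {b} (distance {d:.3f})")
```

In this toy workload, "author" and "creator" never appear in the same query but are always used alongside the same attributes, so they are the only pair that falls under the threshold. A real workload would likely need weighting and a tuned threshold.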