Improving Query Results with Automatic Duplicate Detection
Ontology-based data integration poses significant challenges. One is that the ontology used as the global reference model during integration can contain duplicate attributes, which can easily lead to incorrect query results. The problem arises when similar or overlapping information from ontologies extracted from distributed digital libraries is merged into a single global ontology. To address it, we propose a novel context-based approach that analyzes a workload of queries over the global ontology to automatically calculate (semantic) distances between attributes, which are then used for duplicate detection. We present experimental results that demonstrate the quality of our approach.
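
To make the idea of a workload-derived distance concrete, the following is a minimal illustrative sketch and not the method evaluated here. It assumes that each query can be reduced to the set of attribute names it references, that an attribute's "context" is the multiset of attributes it co-occurs with across the workload, and that cosine distance between contexts is an acceptable stand-in for the semantic distance. The function names, the cosine measure, and the threshold value are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def context_vectors(workload):
    """Map each attribute to counts of the attributes it co-occurs with.

    `workload` is an iterable of queries, each given as a collection of
    attribute names (an assumed, simplified view of a real query).
    """
    ctx = defaultdict(lambda: defaultdict(int))
    for query in workload:
        attrs = set(query)
        for a in attrs:
            ctx[a]  # register the attribute even if it has no neighbours
            for b in attrs:
                if a != b:
                    ctx[a][b] += 1
    return ctx


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two sparse count vectors (dicts)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # no context available: treat as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def duplicate_candidates(workload, threshold=0.1):
    """List attribute pairs whose contexts are closer than `threshold`.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    ctx = context_vectors(workload)
    pairs = []
    for a, b in combinations(sorted(ctx), 2):
        d = cosine_distance(ctx[a], ctx[b])
        if d <= threshold:
            pairs.append((a, b, d))
    return sorted(pairs, key=lambda p: p[2])


# Toy workload: "author" and "creator" are used interchangeably.
workload = [
    ["author", "title", "year"],
    ["creator", "title", "year"],
    ["author", "title"],
    ["creator", "title"],
    ["publisher", "isbn"],
]
candidates = duplicate_candidates(workload)
```

In this toy workload, `author` and `creator` never appear in the same query but are always used alongside the same attributes, so their contexts coincide and the pair is flagged as a duplicate candidate. A real workload would likely need query parsing, weighting, and a tuned threshold, none of which this sketch attempts.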