Using Semantic Technologies in Digital Libraries – A Roadmap to Quality Evaluation
Abstract
In digital libraries, semantic techniques are often deployed to reduce
the expensive manual overhead for indexing documents, maintaining metadata,
or caching for future search. However, using such techniques may cause a decrease
in a collection’s quality due to their statistical nature. Since data quality
is a major concern in digital libraries, it is important to be able to measure the
(loss of) quality of metadata automatically generated by semantic techniques. In
this paper, we present a user study based on typical semantic techniques used
for automatic metadata creation, namely taxonomies of author keywords and
tag clouds. We observed experts assessing typical relations between keywords
and documents over a small corpus in the field of chemistry. Based on the
evaluation of this experiment, we focus on commonalities in the experts’
perceptions and draw up a first roadmap for evaluating semantic
techniques by proposing some preliminary metrics.