Enabling Content-Based Image Retrieval in Very Large Digital Libraries
Date
2009
Author
Bolettieri, Paolo
Esuli, Andrea
Falchi, Fabrizio
Lucchese, Claudio
Perego, Raffaele
Rabitti, Fausto
Abstract
Enabling effective and efficient Content-Based Image Retrieval (CBIR) on
Very Large Digital Libraries (VLDLs) is today an important research issue.
While there exist well-known approaches for information retrieval on
textual content for VLDLs, the search for an effective CBIR method that
can also scale to very large collections is still open. A practical effect
of this situation is that most of the image retrieval services currently
available for VLDLs are based only on textual metadata. In this paper, we
report on our experience in creating a collection of 106 million images,
i.e., the CoPhIR collection, the largest currently available to the
scientific community for research purposes. We discuss the various issues
arising from working with such a large collection and dealing with a
complex retrieval model on information-rich features. We present the
non-trivial process of image crawling and descriptive feature extraction,
using the European EGEE computer GRID. The feature extraction phase is
often ignored when discussing the scalability issue while, as we show in
this work, it could be one of the toughest issues to be solved in order to
make CBIR feasible on VLDLs.