Enabling Content-Based Image Retrieval in Very Large Digital Libraries

Bolettieri, Paolo; Esuli, Andrea; Falchi, Fabrizio; Lucchese, Claudio; Perego, Raffaele; Rabitti, Fausto

Date

2009

Abstract

Enabling effective and efficient Content-Based Image Retrieval (CBIR) on Very Large Digital Libraries (VLDLs) is today an important research issue. While there exist well-known approaches for information retrieval on textual content for VLDLs, the research for an effective CBIR method that is also able to scale to very large collections is still open. A practical effect of this situation is that most of the image retrieval services currently available for VLDLs are based only on textual metadata. In this paper, we report on our experience in creating a collection of 106 million images, i.e., the CoPhIR collection, the largest currently available to the scientific community for research purposes. We discuss the various issues arising from working with such a large collection and dealing with a complex retrieval model on information-rich features. We present the non-trivial process of image crawling and descriptive feature extraction, using the European EGEE computer GRID. The feature extraction phase is often ignored when discussing the scalability issue while, as we show in this work, it could be one of the toughest issues to be solved in order to make CBIR feasible on VLDLs.

URI

http://hdl.handle.net/10797/14059

Collections

Presentations and talks at conferences, two-day workshops, one-day workshops, and seminars [2236]