“Catch me if you can”: Visual Analysis of Coherence Defects in Web Archiving
Date
2009
Author
Weikum, Gerhard
Denev, Dimitar
Mazeika, Arturas
Spaniol, Marc
Abstract
The World Wide Web is a continuously evolving network of
contents (e.g., Web pages, images, and sound files) and an
interconnecting link structure. Hence, an archivist may never
be sure if the contents collected so far are still consistent
with those contents she needs to retrieve next. Therefore,
questions arise about detecting, measuring, and finally
understanding coherence defects. To this end, visualization
strategies are presented that can be applied at
different levels of granularity: working with (in the ideal case)
properly set last-modified timestamps, based on metadata
extracted from the crawler in accelerated crawl-revisit pairs,
or from the Internet Archive’s WARC files. To help
the archivist understand the nature of these defects,
this paper investigates means for visualizing change behavior
and archive coherence.