“Catch me if you can”: Visual Analysis of Coherence Defects in Web Archiving
The World Wide Web is a continuously evolving network of contents (e.g. Web pages, images, sound files) and an interconnecting link structure. Hence, an archivist can never be sure whether the contents collected so far are still consistent with those she needs to retrieve next. This raises the questions of how to detect, measure and, finally, understand coherence defects. To this end, this paper presents visualization strategies that might be applied at different levels of granularity: working with (in the ideal case) properly set last-modified timestamps, with metadata extracted from the crawler in accelerated crawl-revisit pairs, or with the Internet Archive’s WARC files. To help the archivist understand the nature of these defects, the paper investigates means for visualizing change behavior and archive coherence.
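As a rough illustration of the timestamp-based granularity mentioned above, the following minimal sketch flags a page as a possible coherence defect when its last-modified timestamp, observed at the revisit, is later than the time of the first download. The record layout (`CrawlRecord`), the function name `classify` and the three labels are assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class CrawlRecord:
    """One crawl-revisit pair for a single URL (hypothetical layout)."""
    url: str
    visit: datetime                    # time of the first download
    revisit: datetime                  # time of the second download
    last_modified: Optional[datetime]  # Last-Modified seen at revisit, if any


def classify(record: CrawlRecord) -> str:
    """Label a crawl-revisit pair as 'coherent', 'defect' or 'unknown'.

    Assumption: a page counts as a possible defect if it was modified
    after the first visit, because the archived copy is then outdated.
    """
    if record.last_modified is None:
        # No usable timestamp, so nothing can be concluded at this level.
        return "unknown"
    if record.last_modified > record.visit:
        return "defect"
    return "coherent"
```

Pages labeled "unknown" would need one of the coarser strategies, such as comparing the content of the two downloads, since a missing or unreliable timestamp says nothing about change.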