dc.contributor.author | Ben Saad, Myriam | en |
dc.contributor.author | Pehlivan, Zeynep | en |
dc.contributor.author | Gançarski, Stéphane | fra |
dc.coverage.spatial | GR - Κέρκυρα | en |
dc.date.available | 2014-03-21T11:18:35Z | |
dc.date.issued | 2009 | |
dc.identifier.uri | http://hdl.handle.net/10797/14073 | en |
dc.description | Contains the full text | en |
dc.description.abstract | Due to the growing importance of the World Wide Web,
archiving the web has become a cultural necessity for preserving
knowledge. To keep a web archive up to date,
crawlers harvest the web by iteratively downloading new versions
of documents. However, crawlers frequently
retrieve pages with unimportant changes, such as continually
updated advertisements. Hence, web archive
systems waste time and space indexing and storing useless
page versions. In this paper, we present a novel approach
that detects important changes between versions in order to
archive the web efficiently. Our approach combines the concept
of visual page segmentation with the concept of
importance when detecting changes between versions. The
approach consists of archiving the visual layout structure
of a web page, represented as semantic blocks. We propose
a change detection algorithm suited to computing differences
between these visual layout structures of documents.
We also describe a method to evaluate the importance of detected
changes. Tests were conducted to evaluate the feasibility
of our approach. Experimental results show promising
performance. | en |
dc.language.iso | eng | en |
dc.relation.ispartof | The 9th International Web Archiving Workshop (IWAW 2009) | en |
dc.rights | info:eu-repo/semantics/openAccess | en |
dc.source | 13th European Conference, ECDL 2009 | en |
dc.title | A Novel Web Archiving Approach based on Visual Pages Analysis | en |
dc.type | Workshop | en |
dc.subject.uncontrolledterm | Web archiving | en |
dc.subject.uncontrolledterm | Change detection | en |
dc.subject.uncontrolledterm | Visual pages analysis | en |
dc.subject.JITA | Διαχείριση υπηρεσιών, λειτουργιών και τεχνικών πληροφόρησης | el_GR |
dc.subject.JITA | Information treatment for information services, Information functions and techniques | en |
dc.contributor.conferenceorganizer | Laboratory on Digital Libraries and Electronic Publishing, Department of Archives and Library Sciences, Ionian University | en |
dc.identifier.JITA | IZ | en |