Show simple item record

dc.contributor.author: Ben Saad, Myriam [en]
dc.contributor.author: Pehlivan, Zeynep [en]
dc.contributor.author: Gançarski, Stéphane [fra]
dc.coverage.spatial: GR - Corfu (Κέρκυρα) [en]
dc.date.available: 2014-03-21T11:18:35Z
dc.date.issued: 2009
dc.identifier.uri: http://hdl.handle.net/10797/14073 [en]
dc.description: Contains the full text [el_GR]
dc.description.abstract: Due to the growing importance of the World Wide Web, archiving the web has become a cultural necessity for preserving knowledge. To keep a web archive up to date, crawlers harvest the web by iteratively downloading new versions of documents. However, crawlers frequently retrieve pages whose only changes are unimportant, such as continually updated advertisements. As a result, web archive systems waste time and space indexing and storing useless page versions. In this paper, we present a novel approach that detects important changes between versions in order to archive the web efficiently. Our approach combines visual page segmentation with a notion of importance when detecting changes between versions. It consists of archiving the visual layout structure of a web page, represented by semantic blocks. We propose a change detection algorithm that computes the differences between the visual layout structures of documents. We also describe a method for evaluating the importance of detected changes. Tests were conducted to evaluate the feasibility of our approach. Experimental results show promising performance. [en]
dc.language.iso: eng [en]
dc.relation.ispartof: The 9th International Web Archiving Workshop (IWAW 2009) [en]
dc.rights: info:eu-repo/semantics/openAccess [en]
dc.source: 13th European Conference, ECDL 2009 [en]
dc.title: A Novel Web Archiving Approach based on Visual Pages Analysis [en]
dc.type: Workshop [en]
dc.subject.uncontrolledterm: Web archiving [en]
dc.subject.uncontrolledterm: Change detection [en]
dc.subject.uncontrolledterm: Visual pages analysis [en]
dc.subject.JITA: Management of information services, functions and techniques [el_GR]
dc.subject.JITA: Information treatment for information services, Information functions and techniques [en]
dc.contributor.conferenceorganizer: Laboratory on Digital Libraries and Electronic Publishing, Department of Archives and Library Sciences, Ionian University [en]
dc.identifier.JITA: IZ [en]

