Rewiring Peer Connections over Semantic Overlay Networks
Today's content providers are naturally distributed and produce large amounts of information every day, which makes peer-to-peer data management a promising approach that offers scalability, adaptivity to network dynamics, and resilience to failures. The management of large volumes of data in peer-to-peer networks has generated additional interest in methods for organising the network effectively according to the providers' contents and, consequently, in methods that support information retrieval. Overlay architectures (or overlay networks) have been introduced as tools for organising and sharing information that resides in a network of peers. Semantic overlay networks go further and partition the overlay layer by clustering the data sources. This is achieved through a rewiring protocol that each peer executes independently. The purpose of this protocol is to establish connections among peers that are semantically, thematically, or socially similar (i.e., peers that share similar interests). In this way, the problem of finding the most relevant resources reduces to locating the clusters most similar to the query. In this thesis, we present iCluster, a generic architecture for supporting full-fledged information retrieval in large-scale peer-to-peer networks, along with its associated organisation and query forwarding protocols. The main focus of this work is on rewiring. We study the functional issues related to peer rewiring and discuss a number of choices in designing the rewiring strategy. In addition, we introduce the concept of clustering efficiency, a measure that quantifies the quality of the network organisation by exploiting the underlying network structure. Clustering efficiency is used to evaluate the performance of rewiring. We study the system's performance and identify the rewiring protocol that remains efficient under peer churn. Finally, we apply our architecture to a digital library use case and support both searching and filtering using the same infrastructure.
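The abstract does not specify iCluster's actual rewiring or forwarding rules, so the following is only a generic sketch of the idea it describes: each peer independently keeps links to its most similar peers, and queries are forwarded only towards neighbours whose content resembles the query. It assumes, hypothetically, that each peer summarises its content as a sparse term-weight vector, that similarity is cosine similarity, and that a peer keeps a fixed number `k` of links. The names `cosine`, `rewire`, `forward_query`, `k`, and `fanout` are illustrative and not taken from the thesis.

```python
import math
from typing import Dict, List

# Hypothetical peer profile: term -> weight vector summarising a peer's content.
Profile = Dict[str, float]


def cosine(a: Profile, b: Profile) -> float:
    """Cosine similarity between two sparse term-weight vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rewire(peer: str, profiles: Dict[str, Profile], candidates: List[str], k: int) -> List[str]:
    """One local rewiring step (assumed rule): keep links to the k most similar candidates."""
    others = [c for c in candidates if c != peer]
    others.sort(key=lambda c: cosine(profiles[peer], profiles[c]), reverse=True)
    return others[:k]


def forward_query(query: Profile, neighbours: List[str],
                  profiles: Dict[str, Profile], fanout: int) -> List[str]:
    """Forward a query only to the `fanout` neighbours whose content is closest to it."""
    ranked = sorted(neighbours, key=lambda n: cosine(query, profiles[n]), reverse=True)
    return ranked[:fanout]


# Usage with made-up profiles: p1 and p2 share a music/jazz interest, p3 is about biology.
profiles = {
    "p1": {"music": 1.0, "jazz": 0.5},
    "p2": {"music": 0.9, "jazz": 0.6},
    "p3": {"biology": 1.0},
    "p4": {"music": 0.2, "biology": 0.8},
}
print(rewire("p1", profiles, ["p2", "p3", "p4"], k=2))                        # ['p2', 'p4']
print(forward_query({"biology": 1.0}, ["p1", "p2", "p3", "p4"], profiles, 1))  # ['p3']
```

Repeating `rewire` at every peer gradually groups peers with similar profiles into clusters, which is what lets `forward_query` reach relevant content without flooding. A real protocol would also have to discover candidates (e.g. through random walks or gossip) and cope with churn, which this sketch ignores.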
The experimental evaluation with real-world data and queries demonstrates that iCluster achieves significant performance improvements (in terms of communication load and retrieval accuracy) over a well-known state-of-the-art peer-to-peer clustering method. Compared to exhaustive search by flooding, iCluster trades a small loss in retrieval accuracy for a much lower message load. In addition, the proposed distributed protocols prove efficient under peer churn, achieve high retrieval accuracy, and scale well to large networks. Our experimental results confirm that retrieval performance depends on the rewiring strategy, and give insights into the trade-offs involved in selecting a rewiring strategy.