Minersoft: Searching software resources in large-scale grid and cloud infrastructures

Date
2009-09
Author
Κατσιφοδήμος, Αστέριος
Katsifodimos, Asterios
Abstract
Software retrieval is concerned with locating and identifying appropriate software resources
to satisfy users' requirements. It is considered to be one of the key technical issues in
software reuse since "You must find it before you can reuse it". In this thesis, we investigate
the problem of supporting keyword-based searching for the discovery of software resources
that are installed on the nodes of large-scale, federated Grid and Cloud computing infrastructures.
We address a number of challenges that arise from the unstructured nature
of software and the unavailability of software-related metadata on large-scale networked
environments. We present Minersoft, a harvester that visits Grid/Cloud infrastructures,
crawls their file systems, identifies and classifies software resources, and discovers implicit
associations between them. The results of Minersoft harvesting are encoded in a weighted,
typed graph, named the Software Graph. A number of IR algorithms are used to enrich
this graph with structural and content associations, to annotate software resources with
keywords, and to build inverted indexes to support keyword-based searching for software. We
present an evaluation study of our approach on a real testbed, using data extracted
from production-quality Grid and Cloud computing infrastructures. Experimental results
show that Minersoft is a powerful tool for software retrieval.
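
The pipeline described above (a typed, weighted graph of software resources plus an inverted index for keyword search) can be illustrated with a minimal sketch. This is not Minersoft's implementation: the file paths, keywords, edge types, weights, and the function names `build_inverted_index`, `search`, and `neighbours` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical harvested resources: file path -> keywords annotated on it.
resources = {
    "/usr/bin/gcc": ["compiler", "c"],
    "/usr/lib/libm.so": ["math", "library"],
    "/usr/share/doc/gcc/README": ["compiler", "documentation"],
}

# Toy stand-in for a weighted, typed graph:
# (source, target) -> (edge type, weight). Types and weights are made up.
edges = {
    ("/usr/share/doc/gcc/README", "/usr/bin/gcc"): ("describes", 0.8),
    ("/usr/bin/gcc", "/usr/lib/libm.so"): ("links_to", 0.5),
}


def build_inverted_index(resources):
    """Map each lowercased keyword to the set of paths annotated with it."""
    index = defaultdict(set)
    for path, keywords in resources.items():
        for keyword in keywords:
            index[keyword.lower()].add(path)
    return index


def search(index, query):
    """Return the sorted paths that match every term in the query."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)


def neighbours(edges, path):
    """Return (target, edge type, weight) for each edge leaving `path`."""
    return [
        (target, edge_type, weight)
        for (source, target), (edge_type, weight) in edges.items()
        if source == path
    ]


index = build_inverted_index(resources)
print(search(index, "compiler"))
print(neighbours(edges, "/usr/bin/gcc"))
```

A real harvester would derive the keywords and associations from file contents and paths rather than from hand-written dictionaries, and would rank results instead of returning a plain intersection.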
Collections
- Department of Computer Science [73]