Text Segmentation Using Named Entity Recognition and Co-Reference Resolution in Greek Texts
In this paper we examine the benefit of performing named entity recognition and co-reference resolution on a Greek corpus used for text segmentation. Each segment is a portion of one of 300 documents published by ten different authors in the Greek newspaper "To Vima". The aim is to examine whether combining text segmentation with information extraction (specifically, its named entity recognition and co-reference resolution steps) is beneficial for identifying the various topics that appear in a document. Named entity recognition was performed using an existing tool that had been trained on a similar corpus. The produced annotations were manually corrected and enriched to cover four types of named entities (person name, organization, location, and time). Co-reference resolution, and more specifically the substitution of every reference to the same instance with the same named entity identifier, was performed in a subsequent step. Evaluation using three well-known text segmentation algorithms leads to the conclusion that the benefit depends strongly on the segment's topic, the number of named entity instances appearing in it, and the segment's length.
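The substitution step described above, in which every reference to the same instance is replaced by one named entity identifier, can be illustrated with a minimal sketch. It is not the tool used in the paper. The annotation format (character offsets paired with an identifier), the identifier names such as `PERSON_1`, the function name `substitute_mentions`, and the English sample sentence are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical annotation format: (start offset, end offset, entity identifier).
# Offsets are character positions, with the end offset exclusive.
Mention = Tuple[int, int, str]


def substitute_mentions(text: str, mentions: List[Mention]) -> str:
    """Replace each annotated span with its entity identifier."""
    pieces = []
    cursor = 0
    # Process mentions from left to right so the untouched text between
    # them can be copied over unchanged.
    for start, end, entity_id in sorted(mentions):
        if start < cursor:
            raise ValueError("overlapping mentions are not supported")
        pieces.append(text[cursor:start])
        pieces.append(entity_id)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


# Illustrative input: two mentions of a person and two of a location,
# already grouped by an assumed co-reference step.
text = "Maria Papadopoulou visited Athens. She met the mayor there."
mentions = [
    (0, 18, "PERSON_1"),     # "Maria Papadopoulou"
    (27, 33, "LOCATION_1"),  # "Athens"
    (35, 38, "PERSON_1"),    # "She"
    (53, 58, "LOCATION_1"),  # "there"
]

print(substitute_mentions(text, mentions))
```

After substitution, both sentences share the tokens `PERSON_1` and `LOCATION_1`, which is the kind of added lexical overlap a segmentation algorithm based on word repetition could exploit.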