GROBID: Combining Automatic Bibliographic Data Recognition and Term Extraction for Scholarship Publications
Abstract
Based on state-of-the-art machine learning techniques, GROBID
(GeneRation Of BIbliographic Data) performs reliable bibliographic
data extraction from scholarly articles, combined with multi-level term
extraction. These two types of extraction are synergistic and yield
complementary descriptions of an article. The tool is intended as
a component for enhancing existing and future large repositories
of technical and scientific publications.