Using Named Entity Recognition to Map Methods, Methodologies, Tools, Tasks and Data in the Digital Humanities
Summary
Methodologies and data cannot currently be extracted readily from digital humanities papers. Biology and medicine have a tradition of using named entity recognition to address this gap. In this thesis, a named entity recognition model is employed to read digital humanities methodologies multifocally. The model is trained on a sample of 197 relevant, annotated five-sentence windows extracted from a broad corpus of 692 English-language Digital Humanities Quarterly articles, and it labels entities as ‘methodology’, ‘method’, ‘tool’, ‘task’ or ‘data’. Multifocal reading practices are found to be promising, but the performance and external validity of the named entity recognition model are lacking.