dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Paperno, D.
dc.contributor.author: Brandt, A.
dc.date.accessioned: 2021-08-24T18:00:13Z
dc.date.available: 2021-08-24T18:00:13Z
dc.date.issued: 2021
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/41157
dc.description.abstract: The Dutch Expertise Center of Human Trafficking and Human Smuggling aims to use online media archives to extract articles that are relevant to its work. One approach to this is to create a custom Named Entity Recognition model. Named Entity Recognition (NER) is a subtask of Information Extraction (IE). Its goal is to extract certain ‘named entities’ from unstructured text. These entities used to be only proper names, but today NER encompasses the extraction of all important entities within a given context [17]. When creating a custom NER model, the entities that are extracted are by definition very domain-specific. Because of this, large annotated training corpora usually do not exist for custom NER models, and annotation is done by hand. This raises the question of whether annotation should be done by people with knowledge of the given domain or by people with knowledge of NER. In this report, a custom NER model created with the spaCy library is trained on a dataset annotated either by a fourth-year AI student or by employees of the Expertise Center. This was done to assess the importance of domain-specific knowledge in annotating data for custom NER models. Different properties of the annotated datasets are analyzed, as well as the performance of the models. The models trained on the dataset annotated by the AI student slightly outperformed those trained on the dataset annotated by the Expertise Center, but not by a great margin. Above all, the outcome of the research suggests a trade-off between extracting certain extremely specific entities and creating a model that performs and generalizes well.
dc.description.sponsorship: Utrecht University
dc.format.extent: 530999
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.title: The importance of domain-specific expertise in training customized Named Entity Recognition models
dc.type.content: Bachelor Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: Named Entity Recognition, Annotation
dc.subject.courseuu: Kunstmatige Intelligentie

