Analysis of question type classification and disambiguation
Summary
This thesis presents research on question context analysis using Natural Language Processing (NLP) techniques. In the past, Question & Answer interactions were handled manually. Recent advancements in NLP and Machine Learning (ML) have made it possible to implement a wide range of Question Answering Systems (QAS). These systems often use a classifier to assign questions to types before answering them. Historically, the two most prominent classification approaches are Rule-based models and Machine Learning models. Rule-based models use grammar rules and a dictionary to classify questions into categories and map these categories to the correct answer. Machine Learning models, such as neural networks, learn from a Question & Answer data set with the aim of minimizing classification errors when matching question and answer pairs. This research investigates how popular classification techniques perform in a restricted-domain setting with regard to question type recognition and question enrichment. These two tasks are carried out on a large structured Dutch data set and a public English data set. To measure how the classifiers score on these two tasks, two metrics are used to quantify classification performance: the F1 score and the Area Under the ROC Curve (AUC), both defined below. Results suggest that question ambiguity can be recognized with an F1 score upwards of 90%. ML techniques based on deep learning perform best on both question type detection and question enrichment.
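For reference, a minimal sketch of the two evaluation metrics, using the standard definitions of precision $P$ and recall $R$ and a classifier score function $s(\cdot)$ (these symbols are introduced here for illustration and are not taken from the thesis itself):

\[
F_1 = \frac{2 \cdot P \cdot R}{P + R},
\qquad
\mathrm{AUC} = \Pr\bigl( s(x^{+}) > s(x^{-}) \bigr),
\]

where $x^{+}$ and $x^{-}$ denote a randomly drawn positive and negative example, respectively. The F1 score balances precision against recall, while AUC measures how well the classifier ranks positive examples above negative ones independently of any single decision threshold.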