Analysis of question type classification and disambiguation
Summary
This thesis presents research on question context analysis using Natural Language Processing (NLP) techniques. In the past, Question & Answer interactions were handled manually. Recent advancements in NLP and Machine Learning (ML) have made it possible to implement a wide range of Question Answering Systems (QAS). These systems often use a classifier to assign questions to types before answering them. Historically, the two most prominent classification approaches are Rule-based models and Machine Learning models. Rule-based models use grammar rules and a dictionary to classify questions into categories and map these categories to the correct answer. Machine Learning models, such as neural networks, learn from a Question & Answer data set with the aim of minimizing classification errors when matching question and answer pairs. This research investigates how popular classification techniques perform in a restricted-domain setting with regard to question type recognition and question enrichment. These two tasks are carried out on a large structured Dutch data set and a public English data set. To measure how the classifiers score on these two tasks, two metrics are used to quantify classification performance: the F1 score and the Area Under the ROC Curve (AUC), both defined below. Results suggest that question ambiguity can be recognized with an F1 score upwards of 90%. ML techniques based on deep learning perform best on both question type detection and question enrichment.
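For reference, a minimal sketch of the two evaluation metrics, using the standard definitions of precision $P$ and recall $R$ and a classifier score function $s(\cdot)$ (these symbols are introduced here for illustration and are not taken from the thesis itself):

\[
F_1 = \frac{2 \cdot P \cdot R}{P + R},
\qquad
\mathrm{AUC} = \Pr\bigl( s(x^{+}) > s(x^{-}) \bigr),
\]

where $x^{+}$ and $x^{-}$ denote a randomly drawn positive and negative example, respectively. The F1 score balances precision against recall, while AUC measures how well the classifier ranks positive examples above negative ones independently of any single decision threshold.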