Machine Learning tense classification in Dutch conditional sentences
Summary
In Time in Translation, a project that currently investigates the translation of tenses in conditional sentences, tenses are annotated manually to
serve as research data. This paper evaluates how well
Machine Learning algorithms can automate this classification, specifically
the classification of the tense of the antecedent in Dutch conditionals.
Four algorithms were trained on previously annotated conditionals, using
features that were found to be relevant to the tense, and were then tested under varying parameter settings. Using cross-validation, MLP (multilayer perceptron), K-NN (k-nearest neighbours) and
SVM (support vector machine) reached an accuracy of around 93%, while NB (naive Bayes) reached 80%. The features that were
found to contribute positively all measured the occurrence or presence
of certain types of verbs in the antecedent. This suggests that
these algorithms, which require only a small data set
and a few simple features, can be used to classify the tenses for the
project, and that their performance might improve when they are provided with feedback.
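
As a rough illustration of the evaluation set-up summarised above, the following minimal sketch runs a k-nearest-neighbours classifier under k-fold cross-validation in plain Python. It is not the implementation used in this paper: the feature vectors (counts of present-tense verbs, past-tense verbs, and a flag for a perfect auxiliary), the tense labels, and the function names knn_predict and cross_val_accuracy are all made up for the example.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest neighbours."""
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, folds=5, k=3):
    """Mean accuracy over interleaved folds (sample i belongs to fold i % folds)."""
    scores = []
    for f in range(folds):
        test_idx = [i for i in range(len(X)) if i % folds == f]
        train_idx = [i for i in range(len(X)) if i % folds != f]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx
        )
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Hypothetical toy data, NOT taken from the annotated corpus.
# Each vector: (present-tense verbs, past-tense verbs, perfect auxiliary present)
X = [(1, 0, 0), (2, 0, 0), (1, 0, 0), (2, 0, 0), (1, 0, 0),
     (0, 1, 0), (0, 2, 0), (0, 1, 0), (0, 2, 0), (0, 1, 0),
     (0, 1, 1), (0, 2, 1), (0, 1, 1), (0, 2, 1), (0, 1, 1)]
y = ["present"] * 5 + ["past"] * 5 + ["past perfect"] * 5

if __name__ == "__main__":
    print(f"mean accuracy: {cross_val_accuracy(X, y):.2f}")
```

The toy classes are cleanly separable, so the score here says nothing about real conditionals; the sketch only shows how verb-based features and cross-validated accuracy fit together.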