Easy Data Augmentation Techniques for Traditional Machine Learning Models on Text Classification Tasks

Santing, L.B.

dc.rights.license	CC-BY-NC-ND
dc.contributor.advisor	Nguyen, D.
dc.contributor.author	Santing, L.B.
dc.date.accessioned	2021-08-09T18:00:16Z
dc.date.available	2021-08-09T18:00:16Z
dc.date.issued	2021
dc.identifier.uri	https://studenttheses.uu.nl/handle/20.500.12932/40651
dc.description.abstract	Data augmentation techniques have been used increasingly in NLP in recent years to build more robust models. Wei & Zou (2019) proposed Easy Data Augmentation (EDA), a simple set of techniques for augmenting small text classification datasets that showed promising results. Most research on data augmentation for NLP has focused on deep learning models rather than traditional machine learning models, even though the latter are still commonly used for text classification. This research tests the effect of EDA on the performance of three traditional machine learning models (logistic regression, naïve Bayes and decision tree) on three text classification tasks. Results show that EDA marginally improves the performance of these classifiers on both small and large datasets.
dc.description.sponsorship	Utrecht University
dc.format.extent	400770
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.title	Easy Data Augmentation Techniques for Traditional Machine Learning Models on Text Classification Tasks
dc.type.content	Bachelor Thesis
dc.rights.accessrights	Open Access
dc.subject.keywords	artificial intelligence, natural language processing, text classification, machine learning, data augmentation, easy data augmentation, logistic regression, naive bayes, decision tree, classifier, AI, NLP, ML, EDA
dc.subject.courseuu	Kunstmatige Intelligentie

Files in this item

Name: Santing_5727456.pdf
Size: 391.3Kb
Format: PDF

This item appears in the following Collection(s)

Theses
