Easy Data Augmentation Techniques for Traditional Machine Learning Models on Text Classification Tasks
Summary
Data augmentation techniques have seen increasing use in NLP in recent years as a way to build more robust models. Wei & Zou (2019) proposed Easy Data Augmentation (EDA), a simple set of techniques for augmenting small text classification datasets that showed promising results. Most research on data augmentation for NLP has focused on deep learning models, yet traditional machine learning models are still commonly used for text classification. This research tests the effect of EDA on the performance of three traditional machine learning models (logistic regression, naïve Bayes, and decision tree) on three text classification tasks. Results show that EDA marginally improves the performance of these classifiers on both small and large datasets.