Classifying digital documents reliably using machine learning
Digital documents are the de facto container for transferring information in our age, and as the number of documents grows, organizations may struggle to sort them into their respective categories. In this work, we show how a system that uses real-life documents can use scaling of the neural network architecture EfficientNet and of its input images, and we compare its performance with the widely used VGG-16 architecture. We explore how the input data can be augmented with RandAugment, a data augmentation strategy with only 2 parameters, which allows the user to search the space of possible augmentations faster. We show that transforming the input images can cause a classifier that was not trained on transformed images to perform poorly. We adapt tensor fusion, a multimodal fusion method that has not, to this point, been applied to document image classification. We propose suitable parameters for this approach and compare it to the more standard early and late fusion approaches by training our models on a balanced subset of the widely used Tobacco-3482 dataset. We then apply this approach to our own dataset and draw conclusions on which approach is the most appropriate. Our results show that tensor fusion can be more successful than late fusion but falls short of the performance of early fusion, which is simpler to implement. Additionally, tensor fusion requires more hyperparameters to be considered, which makes it more difficult to implement. By comparing our results with those of humans on the ImageNet dataset, we argue that the top 3 predictions of our approach can provide reliable category recommendations.
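
To make the three fusion strategies and the top-3 recommendation concrete, the following is a minimal sketch in plain Python, not the implementation used in this work. It assumes each document has already been reduced to an image feature vector and a text feature vector. The function names (`early_fusion`, `late_fusion`, `tensor_fusion`, `top_k`) are hypothetical. Averaging is only one common way to realize late fusion, and the constant 1 appended before the outer product follows the usual formulation of tensor fusion rather than any parameter choice reported here.

```python
from itertools import product


def early_fusion(image_vec, text_vec):
    """Early fusion: concatenate the modality features before classification."""
    return list(image_vec) + list(text_vec)


def late_fusion(image_probs, text_probs):
    """Late fusion (assumed variant): average the per-class probabilities
    produced by two separately trained classifiers."""
    return [(a + b) / 2 for a, b in zip(image_probs, text_probs)]


def tensor_fusion(image_vec, text_vec):
    """Tensor fusion: flattened outer product of the two vectors.

    Each vector is extended with a constant 1 so that the result contains
    the unimodal features as well as every pairwise product.
    """
    a = list(image_vec) + [1.0]
    b = list(text_vec) + [1.0]
    return [x * y for x, y in product(a, b)]


def top_k(probs, k=3):
    """Return the indices of the k most probable classes, best first."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    image_vec, text_vec = [2.0, 3.0], [5.0]
    print(early_fusion(image_vec, text_vec))   # length 2 + 1
    print(tensor_fusion(image_vec, text_vec))  # length (2 + 1) * (1 + 1)
    print(top_k([0.1, 0.5, 0.05, 0.3, 0.05]))  # three category recommendations
```

The fused vector grows with the product of the input dimensions rather than their sum, which is one reason tensor fusion brings additional hyperparameters, such as the size of each modality embedding, into play.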