Classifying digital documents reliably using machine learning

Hristov, H.S.

dc.rights.license	CC-BY-NC-ND
dc.contributor.advisor	Feelders, A.J.
dc.contributor.author	Hristov, H.S.
dc.date.accessioned	2020-08-28T18:00:12Z
dc.date.available	2020-08-28T18:00:12Z
dc.date.issued	2020
dc.identifier.uri	https://studenttheses.uu.nl/handle/20.500.12932/37123
dc.description.abstract	Digital documents are the de facto container for transferring information in our age, and as the volume of documents grows, organizations may struggle to organize and categorize them. In this work, we show how a system that uses real-life documents can make use of scaling both the EfficientNet neural network architecture and its input images, and we compare its performance with the widely used VGG-16 architecture. We explore how the input data can be augmented through RandAugment, a data augmentation strategy with only two parameters, which lets the user search the space of possible augmentations faster. We show that transforming the input images can cause a classifier that was not trained on transformed images to perform poorly. We adapt tensor fusion, a multimodal fusion method that has not previously been applied to document image classification. We propose suitable parameters for this approach and compare it to the more standard early and late fusion approaches by training our models on a balanced subset of the widely used Tobacco-3482 dataset. We then apply this approach to our own dataset and draw conclusions on which approach is the most appropriate. Our results show that tensor fusion can be more successful than late fusion but falls short of the performance of early fusion, which is simpler to implement. Additionally, tensor fusion requires more hyperparameters to be considered, which makes this approach more difficult to implement. By comparison with the results of humans on the ImageNet dataset, we argue that the top 3 predictions of our approach can provide reliable category recommendations.
dc.description.sponsorship	Utrecht University
dc.format.extent	6571156
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.title	Classifying digital documents reliably using machine learning
dc.type.content	Master Thesis
dc.rights.accessrights	Open Access
dc.subject.keywords	document image classification, multi modal fusion, machine learning
dc.subject.courseuu	Computing Science

Files in this item

Name: Master_s_thesis_COSC___Hristo_ ...
Size: 6.266 MB
Format: PDF

This item appears in the following Collection(s)

Theses
