dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Karnstedt-Hulpus, I.R.
dc.contributor.author: Acosta, Christian
dc.date.accessioned: 2023-09-06T09:40:36Z
dc.date.available: 2023-09-06T09:40:36Z
dc.date.issued: 2023
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/44959
dc.description.abstract: Automatic Speech Recognition (ASR) models have made great progress in recent years. Whisper is one of the latest models and shows state-of-the-art performance on a wide range of unseen datasets. This makes it useful for many applications, such as converting audio files into text transcripts. Detectives of the National Police Corps have a large amount of audio data to process for their investigations, and manual processing is tedious and resource-intensive. Whisper can help speed up investigations and reduce this workload. Although Whisper performs well out of the box, its performance can be improved further. By tuning hyperparameters and comparing different implementations of Whisper, we optimized processing time, memory usage, and accuracy. First, we show that reducing computational precision improved performance in all models tested. Second, reducing the beam size toward a greedier decoding strategy reduced processing time and memory usage with minimal effect on accuracy. Third, larger batch sizes decreased processing time and increased accuracy, but also increased memory usage. Finally, adding Voice Activity Detection increased accuracy and decreased processing time without increasing memory usage. We conclude that Faster-Whisper is the best-performing model overall for the current use case, as it offers the best trade-off between processing time, memory usage, and accuracy. Consequently, it allows the greatest transcription throughput when multiple instances of the model run in parallel.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: Improvement of the performance and accuracy of the Whisper Automatic Speech Recognition model and its variants through tuning of various hyperparameters. The aim of this study was to increase processing throughput without negatively affecting model accuracy.
dc.title: Improving the effectiveness of different Automatic Speech Recognition models with hyperparameter tuning
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: Automatic Speech Recognition; Speech to Text; Whisper
dc.subject.courseuu: Applied Data Science
dc.thesis.id: 23516