Applying Image Recognition to Automatic Speech Recognition: Determining Suitability of Spectrograms for Training a Deep Neural Network for Speech Recognition

Lambooij, N.L.C.

View/Open

Bachelor Scriptie - Applying Image Recognition to Automatic Speech Recognition.pdf (872.4Kb)

Publication date

2017

Summary

In speech recognition, neural networks are used to recognise the sequence of phonemes in an audio signal. These networks are trained on audio data pre-processed into some type of spectral vector. We present an alternative method that pre-processes speech utterances into visual representations, called spectrograms, and trains a neural network suited to image recognition to identify phonemes. The resulting network correctly classified 99.73% of a vowel set containing samples of ‘iy’, ‘ah’ and ‘uw’, 91.87% of a vowel set containing samples of ‘iy’, ‘ih’ and ‘eh’, and 75.97% of the full dataset of twelve vowels. These results show that using image recognition in automatic speech recognition is worth investigating further.
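
The pre-processing step described in the summary, turning an utterance into a spectrogram, can be illustrated with a minimal sketch. The summary does not state the frame length, hop size, window, or tooling used in the thesis, so everything below is an assumption: the function name `spectrogram`, the parameters `frame_len` and `hop`, the Hann window, and the synthetic 1 kHz test tone are all hypothetical. The sketch uses a naive discrete Fourier transform from the standard library for clarity; a real pipeline would more likely use an FFT routine.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Return a list of frames, each a list of log-magnitudes per frequency bin.

    frame_len and hop are illustrative defaults, not values from the thesis.
    """
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # keep non-negative frequencies only
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Naive DFT of the windowed frame at frequency bin k.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            # Log scale; the small offset avoids log(0).
            row.append(math.log10(abs(acc) + 1e-10))
        frames.append(row)
    return frames


# Hypothetical input: a 1 kHz sine tone sampled at 8 kHz (not thesis data).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]

spec = spectrogram(tone)
# Index of the strongest frequency bin in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

Each row of `spec` is one time frame and each column one frequency bin, so the result is the two-dimensional grid that would be rendered as an image before being passed to an image-recognition network. With these assumed settings the bin spacing is 8000 / 64 = 125 Hz, so the tone's energy should concentrate around the bin corresponding to 1 kHz.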

URI

https://studenttheses.uu.nl/handle/20.500.12932/27440

Collections

Theses