Exploring Arabic Automatic Speech Recognition Bias

Tussenbroek, R. van

Publication date

2021

Summary

Research on Arabic Automatic Speech Recognition (ASR) systems, and on bias in these systems, is limited. This paper contributes to the field through a literature review that examines how the Arabic language and its dialects can introduce difficulties for ASR models. It also describes the current social situation in Arabic-speaking countries and its perception by the media. In addition, we explain ASR and discuss its limitations for under-resourced languages such as Arabic. Lastly, we discuss various datasets, data distributors and software for Arabic and compare them with the norm for English. We conclude that the dialects and the general complexity of Arabic pose a challenge for ASR models. Furthermore, the conflicts in the Arab world have intensified negative stereotyping of Arabs, which was emphasized by the media. Because Arabic is an under-resourced language, labeled data for training ASR models is hard to find. Moreover, compared to English datasets, Arabic datasets are less widely available in larger sizes, are less diverse, and contain fewer types of content related to natural speech. Future work could analyse these datasets extensively using data analysis methods to uncover further potential sources of bias.

URI

https://studenttheses.uu.nl/handle/20.500.12932/40679

Collections

Theses