Exploring Arabic Automatic Speech Recognition Bias
Tussenbroek, R. van
Research on Arabic Automatic Speech Recognition (ASR) systems, and on the bias in these systems, is limited. This paper expands on this field by conducting a literature review with the aim of discovering how the Arabic language and its dialects could introduce difficulties for ASR models. Additionally, it describes the current social situation in Arabic-speaking countries and how the media perceives it. We also explain ASR and discuss its limitations for under-resourced languages such as Arabic. Lastly, we discuss various datasets, data distributors and software for Arabic and compare them with the norm for English. We conclude that the dialects and the general complexity of Arabic pose a challenge for ASR models. Furthermore, the conflicts in the Arab world have intensified negative stereotyping of Arabs, which the media has emphasized. Moreover, because Arabic is an under-resourced language, labeled data for training ASR models is hard to find. Finally, compared to English, large Arabic datasets are less widely available, and the available datasets are less diverse and contain fewer types of content related to natural speech. Future work could analyse datasets extensively using data analysis methods to uncover further potential causes of bias.