Voice Quality: An empirical assessment & a computational model
Summary
Current objective assessments of speech signals show little correlation with the listener's perceived voice quality (VQ), that is, with their quality of experience. To address this gap in our knowledge of the voice, a survey was conducted with 102 listeners, who each provided Self-Assessment Manikin (SAM) ratings on 100 (i.e., 4 × 25) speech samples of two males and two females. These samples were either high quality or degraded by pink noise, impulse noise, packet loss, or bandwidth reduction. A repeated-measures analysis of variance (ANOVA) on the obtained SAM ratings, speaker gender, and signal quality revealed that the listeners preferred one female voice and that the degradations influenced the SAM ratings. The SAM ratings were also compared with the Perceptual Objective Listening Quality Assessment (POLQA) of the International Telecommunication Union Telecommunication standardization sector (ITU-T), which handled the degradations excellently but was unable to assess VQ adequately. To resolve POLQA's weak spot, we developed initial computational models, based solely on paralinguistic parameters. These models correctly predicted VQ in 87.84% (4 levels) and 70.58% (8 levels) of the cases. The VQ of unknown speakers was predicted correctly in 88.71% (4 levels) and 70.42% (8 levels) of the cases. The results of this empirical study emphasize that VQ is a complex, multidimensional construct, which is influenced by several types of common noise. Moreover, they show that ITU-T's POLQA can be provided with an add-on that enables it to predict VQ as well. As such, this study provides a major step towards understanding VQ and including it in ITU-T's standards.