Using meta-text to improve intelligibility of speech-synthesized e-texts
Summary
Current Text-to-Speech software such as Vocalizer is able to produce fairly natural speech from texts that do not contain meta-text. Meta-text, which is part of most modern electronic text formats, is ignored by Vocalizer, resulting in unnatural output or loss of structural information. The present research designs and tests a method to preserve meta-text information in Text-to-Speech conversion. Preservation was done by mapping various structural elements in the e-text to speech, non-speech audio, and pauses. A listening experiment with 23 participants was performed to measure this method's effectiveness in improving three aspects: listening comfort, perceived speech intelligibility, and perceived synthesis quality. For list structures, significant improvements of between 18% and 30% were measured in all three aspects. Omitting a large data table likewise resulted in significant improvements of between 21% and 61% in all three aspects. Mappings for headings, images, and page breaks did not result in significant improvements.
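To make the idea of mapping structural elements to speech, non-speech audio, and pauses more concrete, the following is a minimal sketch in Python. It is not the implementation used in this research: the element names, the `Cue` type, the `render` function, the sound names, and the pause lengths are all hypothetical and chosen only for illustration. The sketch assumes the e-text has already been parsed into (element type, text) pairs, and it realizes table omission by simply skipping table elements.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    kind: str   # "speech", "sound", or "pause"
    value: str  # text to speak, sound name, or pause length in milliseconds


def render(elements):
    """Turn (element_type, text) pairs into a flat list of cues.

    All concrete cue choices below are illustrative assumptions,
    not the mappings evaluated in the listening experiment.
    """
    cues = []
    for element_type, text in elements:
        if element_type == "heading":
            # Set the heading apart with pauses before and after it.
            cues += [Cue("pause", "600"), Cue("speech", text), Cue("pause", "600")]
        elif element_type == "list_item":
            # Mark each list item with a short sound, then pause after it.
            cues += [Cue("sound", "tick"), Cue("speech", text), Cue("pause", "300")]
        elif element_type == "image":
            # Announce the image using its textual description.
            cues.append(Cue("speech", "Image: " + text))
        elif element_type == "page_break":
            cues.append(Cue("sound", "page_turn"))
        elif element_type == "table":
            # Omit the table entirely rather than reading its cells aloud.
            continue
        else:
            # Plain text is passed to the synthesizer unchanged.
            cues.append(Cue("speech", text))
    return cues


example = [
    ("heading", "Ingredients"),
    ("list_item", "Two eggs"),
    ("table", "1 2 3 4"),
    ("paragraph", "Mix well."),
]
cue_list = render(example)
```

A downstream step would then hand the speech cues to the synthesizer and play the sound and pause cues in between; how that is done depends on the Text-to-Speech system and is left open here.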