dc.rights.license CC-BY-NC-ND
dc.contributor.advisor Klink, Chris
dc.contributor.author Wiemer, Mercylyn
dc.date.accessioned 2024-07-25T23:01:20Z
dc.date.available 2024-07-25T23:01:20Z
dc.date.issued 2024
dc.identifier.uri https://studenttheses.uu.nl/handle/20.500.12932/46926
dc.description.abstract Individuals with neurological conditions, including brainstem stroke or progressive amyotrophic lateral sclerosis (ALS), often experience severe speech and motor impairment. In some cases, this results in a complete loss of the ability to speak, as observed in locked-in syndrome (LIS). To restore communication for people with LIS, assistive tools such as brain-computer interfaces (BCIs) can provide an alternative communication channel by using signals recorded directly from the brain. Direct word decoding, which records brain activity during attempted speech, can offer a more natural form of communication. The current study investigated speaker generalization using real-time Magnetic Resonance Imaging (rtMRI) data capturing the speech dynamics of the vocal tract. We trained an autoencoder model to generate compact representations of rtMRI videos containing individual words from multiple speakers. Rather than focusing solely on data reconstruction, we also designed the compact representations to encode phoneme information of the corresponding words, applying a custom loss function, adapted from the Levenshtein distance, to calculate the phonemic distance. We compared two types of models: a speaker-invariant model, trained on data from all speakers, and speaker-specific models, each trained on data from a single speaker. The speaker-invariant model reduced the total loss (reconstruction plus phoneme loss) by a factor of approximately 10 compared to the speaker-specific models, accurately reconstructing the data and effectively encoding phoneme information. Analyzing the compact representations by calculating Euclidean distances between vectors and comparing these distances across models revealed significant positive correlations, suggesting that the models process word articulations similarly. Another finding was the impact of data quantity: correlations between speaker-specific and speaker-invariant models were weaker for participants with less available data. Future research should investigate the relationship between neural representations and the compact representations of generalized word articulations to better understand the connection between articulation patterns and neural activity.
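
The abstract describes two technical components that a brief sketch can make concrete: a combined loss (reconstruction plus a phoneme term based on a Levenshtein distance over phoneme sequences) and a latent-space analysis (correlating pairwise Euclidean distances between models). The Python sketch below is a minimal illustration under stated assumptions, not the thesis implementation: the function names, the pairing of latent Euclidean distance with phonemic distance, and the 0.5 weight are all hypothetical.

    import torch
    import torch.nn.functional as F
    from scipy.spatial.distance import pdist
    from scipy.stats import pearsonr

    def phoneme_levenshtein(a, b):
        # Edit distance over phoneme sequences (lists of phoneme symbols),
        # not raw characters; standard dynamic-programming formulation.
        prev = list(range(len(b) + 1))
        for i, pa in enumerate(a, start=1):
            curr = [i]
            for j, pb in enumerate(b, start=1):
                cost = 0 if pa == pb else 1
                curr.append(min(prev[j] + 1,          # deletion
                                curr[j - 1] + 1,      # insertion
                                prev[j - 1] + cost))  # substitution
            prev = curr
        return prev[-1]

    def combined_loss(recon, target, z1, z2, phons1, phons2, weight=0.5):
        # Reconstruction term: how well the decoder reproduces the rtMRI input.
        recon_loss = F.mse_loss(recon, target)
        # Phoneme term (hypothetical formulation): push the Euclidean distance
        # between two compact representations toward the phonemic distance of
        # their words. The Levenshtein target is a constant, so the term stays
        # differentiable with respect to the latent vectors.
        target_dist = float(phoneme_levenshtein(phons1, phons2))
        latent_dist = torch.norm(z1 - z2, p=2)
        return recon_loss + weight * (latent_dist - target_dist) ** 2

    def compare_models(z_invariant, z_specific):
        # z_*: (n_words, latent_dim) arrays of compact representations from
        # the speaker-invariant and a speaker-specific model, same word order.
        d_inv = pdist(z_invariant, metric="euclidean")
        d_spec = pdist(z_specific, metric="euclidean")
        # A significant positive correlation between the distance profiles
        # suggests the two models organize word articulations similarly.
        return pearsonr(d_inv, d_spec)

For example, phoneme_levenshtein(["k", "ae", "t"], ["b", "ae", "t"]) returns 1 (one substitution), so under this formulation the words "cat" and "bat" would be pulled toward nearby points in the latent space.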
dc.description.sponsorship Utrecht University
dc.language.iso EN
dc.subject In this thesis, we explored speaker generalization using real-time Magnetic Resonance Imaging (rtMRI) to capture the speech dynamics of the vocal tract. We developed an autoencoder model to generate compact representations of rtMRI videos featuring individual words spoken by multiple speakers. Beyond data reconstruction, these compact representations were designed to encode the phoneme information of the words.
dc.title Speaker Generalization Using Autoencoders for Reconstructing Word Articulations
dc.type.content Master Thesis
dc.rights.accessrights Open Access
dc.subject.keywords brain-computer interface; speech production; autoencoder; speaker generalization
dc.subject.courseuu Artificial Intelligence
dc.thesis.id 34959

