dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Schnack, Hugo
dc.contributor.author: Singh, Vedant
dc.date.accessioned: 2025-08-21T00:05:17Z
dc.date.available: 2025-08-21T00:05:17Z
dc.date.issued: 2025
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/49880
dc.description.abstract: Background: Formal thought disorder in schizophrenia spectrum disorders (SSD) may manifest as disorganized speech patterns that are traditionally assessed through clinical evaluation. Such evaluation is subjective and time-consuming, especially when it is used for frequent monitoring of patients to detect relapses early. This study therefore compared two computational approaches, sentence embedding analysis and large language model perplexity analysis, for objective assessment of speech coherence in Dutch-speaking populations. An objective system that captures different aspects of incoherent speech in SSD patients could be useful for tracking the exact state a patient is in.
Methods: We analyzed interview transcripts from 216 participants (85 healthy controls, 131 SSD patients) using two parallel computational frameworks. Sentence embedding analysis employed four models, including Sentence-BERT variants and Dutch-specific RobBERT architectures, to calculate semantic similarity between sentence combinations. Perplexity analysis utilized three Dutch language models (ChocoLlama-2-7B, Fietje-2-instruct, GPT2-small-dutch) to assess linguistic predictability. Both approaches extracted 14 statistical features characterizing coherence patterns, temporal dynamics, and distribution properties. Support Vector Machine classifiers were trained on the statistical features extracted from the transcripts to distinguish between groups. Correlations with Positive and Negative Syndrome Scale (PANSS) scores were examined.
Results: Sign Slope Change (SSC) emerged as the most robust discriminative feature across both computational approaches, with large effect sizes (Cohen’s d = -0.819 to -1.155 for embeddings; d = -0.792 to -0.860 for perplexity) indicating reduced temporal consistency in patient speech. Beyond SSC, the best linear sentence embedding model (AUC = 0.825) identified mean coherence as the most discriminative feature, with patients showing reduced average semantic similarity between consecutive sentences, higher variability in coherence patterns (elevated RMS values), and lower peak semantic relatedness (reduced robust maximum coherence) compared to controls. Machine learning classification achieved strong performance, with the best sentence embedding model (all-MiniLM-L6-v2) reaching AUC = 0.848 and the best perplexity model (Fietje) achieving AUC = 0.875. Multilingual models performed optimally for semantic similarity analysis, while Dutch-specific models excelled for perplexity assessment. Temporal analysis did not reveal significant progressive coherence changes within individual interviews for either group. Clinical correlations showed that 10.5% of sentence embedding features and 19.3% of perplexity features correlated significantly with PANSS scores (uncorrected p < 0.05).
Conclusions: Both computational approaches effectively distinguished SSD patients from healthy controls, with complementary strengths suggesting value for multimodal assessment frameworks. The robust performance of SSC across methodologies establishes temporal inconsistency as a core computational marker of speech disruption. The absence of progressive deterioration patterns indicates potential applicability to shorter clinical assessments. These findings demonstrate substantial potential for developing objective, automated tools to complement traditional clinical evaluation of formal thought disorder in psychiatric populations.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: Analysis of mental health diseases (Schizophrenia) using semantic embeddings.
dc.title: SPEECH TO SCHIZOPHRENIA SPECTRUM DISORDER: MACHINE LEARNING CLASSIFICATION USING NLP FEATURES
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.courseuu: Artificial Intelligence
dc.thesis.id: 52013
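
The abstract names several features computed over a per-transcript coherence series (mean coherence, RMS, Sign Slope Change) and reports group differences as Cohen's d. The sketch below illustrates one plausible way to compute them. It is not the thesis's implementation: the function names are hypothetical, the embeddings are assumed to be plain lists of floats produced by some sentence embedding model, SSC is assumed to follow the common signal-processing definition (the number of times the slope between consecutive values changes sign), and Cohen's d is assumed to use the pooled standard deviation.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def coherence_series(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Similarity between each sentence embedding and the next one."""
    return [cosine_similarity(a, b) for a, b in zip(embeddings, embeddings[1:])]


def mean_coherence(series: Sequence[float]) -> float:
    """Average similarity across the series."""
    return sum(series) / len(series)


def rms(series: Sequence[float]) -> float:
    """Root mean square of the series."""
    return math.sqrt(sum(x * x for x in series) / len(series))


def slope_sign_changes(series: Sequence[float]) -> int:
    """Count sign flips between consecutive first differences.

    Assumed definition; the thesis may normalize or threshold differently.
    """
    diffs = [b - a for a, b in zip(series, series[1:])]
    return sum(1 for d1, d2 in zip(diffs, diffs[1:]) if d1 * d2 < 0)


def cohens_d(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Cohen's d with pooled sample standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled_sd
```

Under these assumptions, each transcript yields one value per feature, and those per-transcript values are what a group comparison or a classifier would consume. The same feature functions could be applied to a series of per-sentence perplexities instead of similarities.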

