        Utrecht University Student Theses Repository

        SPEECH TO SCHIZOPHRENIA SPECTRUM DISORDER: MACHINE LEARNING CLASSIFICATION USING NLP FEATURES

        MSCThesis1587099.pdf (32.69Mb)
        Publication date
        2025
        Author
        Singh, Vedant
        Summary
        Background: Formal thought disorder in schizophrenia spectrum disorders (SSD) may manifest as disorganized speech patterns that are traditionally assessed through clinical evaluation. Such evaluation is subjective and time-consuming, which limits its use for frequent patient monitoring and early detection of relapse. This study therefore compared two computational approaches, sentence embedding analysis and large language model perplexity analysis, for objective assessment of speech coherence in Dutch-speaking populations. An objective system that captures different aspects of incoherent speech in SSD patients could be useful for tracking a patient's current state.

        Methods: We analyzed interview transcripts from 216 participants (85 healthy controls, 131 SSD patients) using two parallel computational frameworks. Sentence embedding analysis employed four models, including Sentence-BERT variants and Dutch-specific RobBERT architectures, to calculate semantic similarity between sentence combinations. Perplexity analysis used three Dutch language models (ChocoLlama-2-7B, Fietje-2-instruct, GPT2-small-dutch) to assess linguistic predictability. Both approaches extracted 14 statistical features characterizing coherence patterns, temporal dynamics, and distribution properties. Support Vector Machine classifiers were trained on the statistical features extracted from the transcripts to distinguish between the groups. Correlations with Positive and Negative Syndrome Scale (PANSS) scores were examined.

        Results: Sign Slope Change (SSC) emerged as the most robust discriminative feature across both computational approaches, with large effect sizes (Cohen's d = -0.819 to -1.155 for embeddings; d = -0.792 to -0.860 for perplexity) indicating reduced temporal consistency in patient speech. Beyond SSC, the best linear sentence embedding model (AUC = 0.825) identified mean coherence as the most discriminative feature: compared to controls, patients showed reduced average semantic similarity between consecutive sentences, higher variability in coherence patterns (elevated RMS values), and lower peak semantic relatedness (reduced robust maximum coherence). Machine learning classification achieved strong performance, with the best sentence embedding model (all-MiniLM-L6-v2) reaching AUC = 0.848 and the best perplexity model (Fietje) achieving AUC = 0.875. Multilingual models performed best for semantic similarity analysis, while Dutch-specific models performed best for perplexity assessment. Temporal analysis did not reveal significant progressive coherence changes within individual interviews for either group. Clinical correlations showed that 10.5% of sentence embedding features and 19.3% of perplexity features correlated significantly with PANSS scores (uncorrected p < 0.05).

        Conclusions: Both computational approaches effectively distinguished SSD patients from healthy controls, with complementary strengths suggesting value for multimodal assessment frameworks. The robust performance of SSC across methodologies establishes temporal inconsistency as a core computational marker of speech disruption. The absence of progressive deterioration patterns indicates potential applicability to shorter clinical assessments. These findings demonstrate substantial potential for developing objective, automated tools to complement traditional clinical evaluation of formal thought disorder in psychiatric populations.
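
The feature types named in the summary (mean coherence, RMS, slope sign changes, perplexity) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names are hypothetical, the slope-sign-change definition is the common signal-processing one and is assumed here, and the toy vectors stand in for embeddings that a sentence embedding model would produce. The full set of 14 features is not listed on this page, so only three are shown.

```python
import numpy as np

def consecutive_cosine(embeddings):
    """Cosine similarity between each sentence embedding and the next one."""
    e = np.asarray(embeddings, dtype=float)
    e = e / np.linalg.norm(e, axis=1, keepdims=True)  # unit-normalize each row
    return np.sum(e[:-1] * e[1:], axis=1)

def slope_sign_changes(series, threshold=0.0):
    """Count interior points where the slope changes sign (local peaks or troughs).

    Assumed definition: a point counts when the differences to its left and
    right neighbours have the same sign, i.e. their product exceeds `threshold`.
    """
    x = np.asarray(series, dtype=float)
    left = x[1:-1] - x[:-2]
    right = x[1:-1] - x[2:]
    return int(np.sum(left * right > threshold))

def coherence_features(series):
    """A few summary statistics of a coherence (or perplexity) time series."""
    x = np.asarray(series, dtype=float)
    return {
        "mean": float(x.mean()),
        "rms": float(np.sqrt(np.mean(x ** 2))),
        "ssc": slope_sign_changes(x),
    }

def perplexity(token_log_probs):
    """Perplexity as exp of the negative mean natural-log token probability."""
    lp = np.asarray(token_log_probs, dtype=float)
    return float(np.exp(-lp.mean()))

# Toy example: three 2-dimensional "sentence embeddings" (hypothetical values).
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.0, 2.0]]
toy_similarities = consecutive_cosine(toy_embeddings)
toy_features = coherence_features([0.1, 0.9, 0.2, 0.8])
```

In a pipeline like the one described, per-transcript feature dictionaries of this kind would be stacked into a feature matrix and passed to a classifier such as a Support Vector Machine; that step is omitted here because the page does not give the classifier settings.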
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/49880
        Collections
        • Theses