dc.description.abstract | Background: Formal thought disorder in schizophrenia spectrum disorders (SSD) may
manifest as disorganized speech patterns that are traditionally assessed through clinical evaluation.
Such evaluation is subjective and time-consuming, especially for the frequent patient
monitoring needed for early relapse detection. This study therefore compared two computational
approaches, sentence embedding analysis and large language model perplexity analysis,
for objective assessment of speech coherence in Dutch-speaking populations. An objective
system that captures different aspects of incoherent speech in SSD patients could help
track the patient's exact clinical state.
Methods: We analyzed interview transcripts from 216 participants (85 healthy controls,
131 SSD patients) using two parallel computational frameworks. Sentence embedding analysis
employed four models, including Sentence-BERT variants and Dutch-specific RobBERT
architectures, to calculate semantic similarity between sentence combinations. Perplexity
analysis utilized three Dutch language models (ChocoLlama-2-7B, Fietje-2-instruct, GPT2-small-dutch)
to assess linguistic predictability. Both approaches extracted 14 statistical features
characterizing coherence patterns, temporal dynamics, and distribution properties. Support
Vector Machine classifiers were trained on these features to distinguish
between groups. Correlations with Positive and Negative Syndrome Scale (PANSS)
scores were examined.
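The coherence side of this pipeline can be illustrated with a minimal sketch. The study used Sentence-BERT and RobBERT models to produce the embeddings; the sketch below assumes the embedding matrix has already been computed and shows only the consecutive-sentence similarity series plus an illustrative subset of the summary statistics (the feature names here are placeholders, not the study's exact feature set).

```python
import numpy as np

def coherence_series(embeddings):
    """Cosine similarity between each pair of consecutive sentence embeddings.

    `embeddings` is an (n_sentences, dim) array, e.g. from a
    Sentence-BERT model; returns an (n_sentences - 1,) series.
    """
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return np.sum(e[:-1] * e[1:], axis=1)

def summary_features(series):
    """Illustrative subset of per-interview statistical features."""
    series = np.asarray(series, dtype=float)
    return {
        "mean_coherence": float(np.mean(series)),        # average similarity
        "rms": float(np.sqrt(np.mean(series ** 2))),     # variability proxy
        "robust_max": float(np.percentile(series, 95)),  # peak relatedness
    }
```

Feature dictionaries like this, one per transcript, would then form the design matrix for the SVM classifiers described above.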
Results: Sign Slope Change (SSC) emerged as the most robust discriminative feature
across both computational approaches, with large effect sizes (Cohen’s d = -0.819 to -1.155 for
embeddings; d = -0.792 to -0.860 for perplexity) indicating reduced temporal consistency in
patient speech. Beyond SSC, the best linear sentence embedding model (AUC = 0.825) identi-
fied mean coherence as the most discriminative feature, with patients showing reduced average
semantic similarity between consecutive sentences, higher variability in coherence patterns
(elevated RMS values), and lower peak semantic relatedness (reduced robust maximum co-
herence) compared to controls. Machine learning classification achieved strong performance,
with the best sentence embedding model (all-MiniLM-L6-v2) reaching AUC = 0.848 and the
best perplexity model (Fietje) achieving AUC = 0.875. Multilingual models performed optimally for semantic similarity analysis, while Dutch-specific models excelled for perplexity
assessment. Temporal analysis did not reveal significant progressive coherence changes within
individual interviews for either group. Clinical correlations showed 10.5% of sentence embed-
ding features and 19.3% of perplexity features significantly correlated with PANSS scores
(uncorrected p < 0.05).
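The SSC feature highlighted above is, in common signal-processing usage, the number of times the slope of a sequence changes sign; applied to a per-interview coherence or perplexity series, it captures temporal (in)consistency. A minimal sketch, assuming the study's definition matches this standard one (the actual implementation may, for example, apply a threshold before counting):

```python
import numpy as np

def sign_slope_change(series):
    """Count sign changes in the slope of a 1-D sequence.

    Assumed (standard) definition of SSC; the study's exact
    variant is not specified in the abstract.
    """
    slopes = np.diff(np.asarray(series, dtype=float))
    signs = np.sign(slopes)
    signs = signs[signs != 0]  # ignore flat segments
    return int(np.sum(signs[:-1] != signs[1:]))
```

A steadily rising or falling coherence series yields an SSC of 0, while a series that oscillates sentence-to-sentence yields a high SSC, which is one plausible reading of "reduced temporal consistency" in patient speech.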
Conclusions: Both computational approaches effectively distinguished SSD patients from
healthy controls, with complementary strengths suggesting value for multimodal assessment
frameworks. The robust performance of SSC across methodologies establishes temporal in-
consistency as a core computational marker of speech disruption. The absence of progressive
deterioration patterns indicates potential applicability to shorter clinical assessments. These
findings demonstrate substantial potential for developing objective, automated tools to com-
plement traditional clinical evaluation of formal thought disorder in psychiatric populations. | |