Differentiating psychotic patients by linguistic features: clustering patients with psychotic disorder to explore the relationship between diagnostic and linguistic properties
Summary
Psychotic disorder causes high social costs due to its impact on patients and its high
prevalence, especially among adolescents. No reliable biological indicator exists for the
diagnosis of psychotic disorder, although research shows language has potential to become a
biomarker. One symptom of psychotic disorder is incoherent language. This paper explores
whether incoherent language can be used to differentiate between groups of patients.
Incoherent language was represented by a feature set extracted from the interviews
of 50 patients and 50 healthy controls (N=100) processed with word2vec semantic analysis.
Features were chosen by their ability to separate patients from controls. We then used those
language coherence features to group psychotic patients using unsupervised clustering.
Multiple cluster models successfully clustered the patients with up to four features. The general
symptom score was significantly different between clusters and no confounding factors were
found. This exploration shows the usefulness of clustering techniques for this particular use
case. It provides some of the first evidence that symptom severity measures of psychotic disorder
and linguistic coherence may be related. This could be the first step towards the detection of
illness severity by language coherence, which could help provide timely care for the patient.
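
As an illustration of what a language coherence feature might look like, the following minimal sketch scores coherence as the cosine similarity between the averaged word vectors of consecutive sentences. This is an assumption made for illustration: the summary does not specify how the coherence features were computed. The embedding table TOY_VECTORS and the function names are hypothetical stand-ins for a trained word2vec model and the actual feature extraction.

```python
import math

# Hypothetical toy embeddings standing in for a trained word2vec model.
TOY_VECTORS = {
    "dog": [1.0, 0.0],
    "cat": [0.9, 0.1],
    "tax": [0.0, 1.0],
}


def sentence_vector(tokens, vectors):
    """Average the vectors of known tokens; return None if no token is known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def coherence_features(sentences, vectors):
    """Summarise the similarity between each pair of consecutive sentences."""
    vecs = [sentence_vector(s.lower().split(), vectors) for s in sentences]
    vecs = [v for v in vecs if v is not None]  # skip sentences with no known words
    sims = [cosine(a, b) for a, b in zip(vecs, vecs[1:])]
    if not sims:
        return {"mean_similarity": None, "min_similarity": None}
    return {
        "mean_similarity": sum(sims) / len(sims),
        "min_similarity": min(sims),
    }


features = coherence_features(["dog cat", "cat dog", "tax"], TOY_VECTORS)
print(features)
```

Under these assumptions, an abrupt topic shift lowers the minimum similarity, and one such feature vector per patient could then be passed to an unsupervised clustering method.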