Speech-based Depression Prediction with Symptoms as Interpretable Intermediate Features
Summary
Mood disorders in general, and depression in particular, are common, and their impact on individuals and society is high. Roughly 5% of adults worldwide suffer from depression. Depression is commonly diagnosed using questionnaires, either clinician-rated or self-reported. Because questionnaire methods are subjective and incur high human-related costs, there are ongoing efforts to find more objective and more easily obtainable markers of depression. As with recent audio, visual and linguistic applications, state-of-the-art approaches to automated depression severity prediction depend heavily on deep learning and black-box modeling, with little consideration for explainability and interpretability. However, for reasons ranging from regulatory requirements to the need to know a model's scope and limitations, clinicians must understand the model's decision-making process before they can rely on it with confidence. In this work, I focus on speech-based depression severity prediction on the DAIC-WOZ corpus and use the PHQ-8 questionnaire items to predict symptoms as interpretable high-level features. I show that, by using a multi-task regression approach with state-of-the-art text-based features to predict the depression symptoms, it is possible to reach a test-set Concordance Correlation Coefficient comparable to that of state-of-the-art systems while improving the interpretability of the overall prediction system.
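To make the evaluation metric and the role of the symptom-level outputs concrete, the following is a minimal Python sketch, not the implementation used in this work. It computes the Concordance Correlation Coefficient from its standard definition and aggregates eight predicted symptom scores into a PHQ-8 total (each item is scored 0 to 3, so the total ranges from 0 to 24). The function names `concordance_cc` and `phq8_total` are hypothetical, and obtaining the severity score by clipping and summing the predicted items is an assumption made for illustration.

```python
import numpy as np


def concordance_cc(y_true, y_pred):
    """Concordance Correlation Coefficient between two 1-D score arrays.

    Uses population (biased) variance and covariance:
    CCC = 2 * cov / (var_true + var_pred + (mean_true - mean_pred)^2)
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = ((y_true - mean_t) * (y_pred - mean_p)).mean()
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


def phq8_total(symptom_scores):
    """Sum eight per-symptom predictions into a PHQ-8 total.

    Assumption: predictions are clipped to the valid item range [0, 3]
    before summing; the last axis holds the eight items.
    """
    scores = np.clip(np.asarray(symptom_scores, dtype=float), 0.0, 3.0)
    return scores.sum(axis=-1)
```

With this layout, a multi-task regressor would output one value per PHQ-8 item for each participant, so a clinician can inspect which predicted symptoms drive a given total, while `concordance_cc` is applied to the summed scores against the reported PHQ-8 totals.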