Constructing an NLP-pipeline to extract diagnoses and diagnostic evolutions from clinical notes in Psychiatry
Summary
While research on using natural language processing to mine text and coherent concepts from electronic health records in general healthcare is steadily increasing, psychiatric and mental health care demands a more sophisticated approach because its texts are more ambiguous, and it is therefore considered more difficult. This difficulty is most pronounced when analyzing large quantities of letters. Specialists in the UMC Utrecht psychiatry department recognize the advantage of being able to quantify and conceptualize disorders on a large scale, which yields disorder profiles that can be used for further research.
This paper proposes several techniques to extract psychiatric disorders from outpatient and discharge letters, built on existing frameworks and transparent, model-independent processes. These model-independent processes consist of rule-based segmentation and extraction of annotated disorders and disorder status within these letters. Mapping disorders and disorder status has been tackled with different techniques, ranging from the least complex to the most complex.
The least complex method, a syntactic pattern-matching algorithm called ContextD supplemented with several rules that capture more of the nuance in psychiatric texts, performed best on average. Only on precision did the ensemble method, based on a majority-voting classifier, outperform this purely rule-based approach. The other models within this ensemble classifier are a two-stage method that uses an SVM to classify the disorders on a POS-tagged TF-IDF token window in which the patterns from the ContextD rule set are masked, and a pre-trained transformer model called MedRobBERTa, fine-tuned for negation, that classifies the disorders on the token window. The last model performed worst overall, with recall below 0.5.
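As a minimal sketch of how a majority-voting ensemble over three classifiers can combine per-disorder predictions, consider the Python fragment below. The label names ("negated", "affirmed"), the function name majority_vote, and the fallback to the rule-based prediction when no strict majority exists are illustrative assumptions, not details taken from this work.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[str], fallback: str) -> str:
    """Return the label chosen by a strict majority of models.

    If no label is predicted by more than half of the models,
    return the fallback label (an assumed tie-breaking policy).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    label, count = Counter(predictions).most_common(1)[0]
    return label if count > len(predictions) / 2 else fallback


# Hypothetical status predictions for one disorder mention.
rule_based = "negated"     # e.g. from the pattern-matching rules
svm = "affirmed"           # e.g. from the TF-IDF + SVM model
transformer = "negated"    # e.g. from the fine-tuned transformer

final_status = majority_vote([rule_based, svm, transformer], fallback=rule_based)
```

With three voters and two labels a strict majority always exists, so the fallback only matters when more than two status labels are in use.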
Analyzing psychiatric outpatient and discharge letters in this way has not been done before within this context. This project acts as the foundation for a platform that helps clinical specialists advance their analysis.