Identifying topical trends in relation to common questions, concerns and doubts from the general public to enhance healthcare policies
Summary
This thesis evaluates the effectiveness of various topic tracking and trend identification methods in detecting trending topics in open-ended healthcare policy questionnaire responses. Identifying these trending topics is crucial for providing decision-makers with insights into public concerns and important policy areas.
Two primary model types are used in this study. First, topic tracking models, namely Length Weighted Topic Chains (LWTC) and Single Pass Clustering (SPC), are used to identify topics and generate a time series of their occurrences. Second, trend identification models detect trending topics either from these time series, using Most Occurring Topics Selection, Moving Average (MA), or AutoRegressive Integrated Moving Average (ARIMA), or directly from word frequencies, using Bursty Term Extraction (BTE).
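As an illustration of how a topic time series can feed a moving-average trend detector, the following is a minimal sketch in Python. It is not the implementation used in this thesis: the function names, the `window` and `ratio` parameters, and the rule "flag a topic when its count exceeds a multiple of the mean of the preceding periods" are all assumptions made for the example.

```python
def topic_counts(assignments, n_periods):
    """Build a per-topic time series of occurrence counts.

    assignments: iterable of (period_index, topic_id) pairs, one per response.
    (Hypothetical input format, assumed for this sketch.)
    """
    series = {}
    for period, topic in assignments:
        series.setdefault(topic, [0] * n_periods)[period] += 1
    return series


def trending_by_moving_average(series, window=3, ratio=1.5):
    """Flag (topic, period) pairs whose count exceeds `ratio` times the
    mean of the preceding `window` periods (an assumed decision rule)."""
    flagged = []
    for topic, counts in series.items():
        for t in range(window, len(counts)):
            baseline = sum(counts[t - window:t]) / window
            if counts[t] > 0 and counts[t] > ratio * baseline:
                flagged.append((topic, t))
    return flagged


# Toy data: "cost" spikes in the last period, "wait" stays flat.
assignments = [
    (0, "cost"), (1, "cost"), (2, "cost"), (3, "cost"), (3, "cost"), (3, "cost"),
    (0, "wait"), (1, "wait"), (2, "wait"), (3, "wait"),
]
series = topic_counts(assignments, n_periods=4)
print(trending_by_moving_average(series))
```

In this toy example only the topic with a sudden increase relative to its own recent history is flagged; a topic that occurs often but steadily is not.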
Topic tracking methods are evaluated with the word intrusion task, while trend identification models are assessed using precision and F1-score on two labeled timelines. The findings indicate that SPC outperforms LWTC in topic modeling but lacks generalizability, whereas LWTC performs consistently. For trend identification, the Moving Average method is the most effective, achieving the best combination of precision and F1-score, followed by ARIMA, Most Occurring Topics Selection, and BTE.
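To make the evaluation metrics concrete, the sketch below computes precision, recall, and F1-score for a set of detected trending topics against a labeled timeline. The representation of both as sets of (topic, period) pairs and the function name are assumptions for illustration only; the thesis does not prescribe this format.

```python
def precision_recall_f1(predicted, labeled):
    """Compare detected trends with a labeled timeline.

    Both arguments are collections of (topic, period) pairs
    (an assumed representation for this sketch).
    """
    predicted, labeled = set(predicted), set(labeled)
    true_positives = len(predicted & labeled)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(labeled) if labeled else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: one of two detections matches one of three labeled trends.
detected = [("cost", 3), ("wait", 2)]
timeline = [("cost", 3), ("access", 1), ("staff", 2)]
print(precision_recall_f1(detected, timeline))
```

Because F1 is the harmonic mean of precision and recall, a method that reports few but accurate trends can score high on precision while still scoring low on F1, which is why both metrics are reported together.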
A notable result is that combining LWTC with the Moving Average method yields the best overall performance for identifying trending topics in open-ended healthcare policy questionnaire responses. This combination achieves both high precision and a high F1-score, making it the most robust approach in this context. However, the thesis also reveals a disconnect between topic modeling quality and trend detection effectiveness: more interpretable topics do not necessarily lead to better trend identification.
This thesis highlights several limitations, including subjectivity in topic labeling and in the comparison with labeled timelines, and the lack of expert involvement in the evaluation tasks. Future studies should address these limitations by involving domain experts and developing expert-based evaluation metrics to improve the practical utility and accuracy of these methods. Additionally, improving the topic modeling performance of LWTC through BERTopic or the Biterm Topic Model (BTM) could further refine trending topic identification.
In conclusion, this thesis contributes insights into the application of topic tracking and trend identification models in healthcare policy analysis, and offers a promising approach for extracting useful information from open-ended questionnaire responses.