dc.description.abstract | The increasing volume and complexity of tabular data generated in clinical trials have outpaced traditional manual review workflows, which typically rely on univariate thresholds and two-dimensional visualizations. Individual anomalous measurements, so-called cellwise outliers, can evade such marginal checks and compromise entire records, underscoring the need for automated, scalable detection pipelines that also provide explainability to clinical data managers.
The primary objective of this thesis is to evaluate the classification performance of both univariate and multivariate cellwise anomaly detection methods on a tabular dataset spanning multiple clinical studies.
Clinical data from 50 placebo-arm studies, comprising 1,104 subjects and spanning January 2016 to September 2022 across vital signs, laboratory, ECG, and demographic domains, were injected with two mutually exclusive synthetic outlier types, small (±3 SD) and extreme (×10), each at a 1% frequency. The univariate approach employed the STAR_outlier algorithm to identify marginal deviations. In parallel, the multivariate workflow applied within-day last-observation-carried-forward and between-day iterative imputation, followed by a self-supervised LightGBM gradient boosting regression model that predicted each feature from all other parameters (including lagged and lead timepoints). Reconstruction errors were transformed into anomaly scores, and cellwise anomalies were flagged based on a quantile threshold.
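A minimal sketch of the injection and multivariate scoring steps is given below, assuming a purely numeric feature matrix; the function names, LightGBM hyperparameters, and the omission of the imputation and lag/lead feature steps are illustrative simplifications rather than the thesis implementation.

```python
import numpy as np
import pandas as pd
import lightgbm as lgb

rng = np.random.default_rng(0)

def inject_outliers(df, frac=0.01):
    """Inject two mutually exclusive synthetic cellwise outlier types into
    numeric columns: 'small' (value shifted by +/-3 SD) and 'extreme'
    (value multiplied by 10), each affecting `frac` of the cells per column."""
    out = df.copy()
    labels = pd.DataFrame("", index=df.index, columns=df.columns)
    for j, col in enumerate(df.columns):
        n = int(frac * len(df))
        pos = rng.permutation(len(df))
        small, extreme = pos[:n], pos[n:2 * n]     # disjoint cell sets
        sd = df[col].std()
        vals = df[col].to_numpy()
        out.iloc[small, j] = vals[small] + rng.choice([-3.0, 3.0], size=n) * sd
        out.iloc[extreme, j] = vals[extreme] * 10
        labels.iloc[small, j] = "small"
        labels.iloc[extreme, j] = "extreme"
    return out, labels

def flag_cells(df, quantile=0.99):
    """Self-supervised cellwise scoring: each column is predicted from all
    other columns with a LightGBM regressor, and cells whose absolute
    reconstruction error exceeds a per-column quantile threshold are flagged."""
    flags = pd.DataFrame(False, index=df.index, columns=df.columns)
    for col in df.columns:
        model = lgb.LGBMRegressor(n_estimators=200, learning_rate=0.05)
        model.fit(df.drop(columns=[col]), df[col])
        err = (df[col] - model.predict(df.drop(columns=[col]))).abs()
        flags[col] = err > err.quantile(quantile)  # quantile-based flagging
    return flags
```

In this kind of setup, the cellwise flags can then be compared against the injected labels to compute the cell-level classification metrics referred to above.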
When evaluated across studies, the multivariate LightGBM model consistently flagged extreme anomalies with high reliability but struggled to detect subtle deviations when extremes were present, prompting threshold adjustments that improved small-anomaly recall. Study-specific models modestly enhanced small-outlier detection but still fell short of operational requirements, and the univariate STAR_outlier method delivered intermediate results: it outperformed multivariate detection of minor anomalies in the presence of extreme outliers but did not match the multivariate model's sensitivity to the latter.
Importantly, the LightGBM model was intrinsically capable of detecting small outliers: evaluation on a dataset containing only small outliers yielded classification performance comparable to that obtained for extreme outliers.
In conclusion, while both automated multivariate and marginal univariate techniques can effectively flag gross cellwise anomalies in clinical trial data, the reliable detection of subtle anomalies alongside extreme values remains challenging. Future efforts should focus on refined threshold strategies or two-stage approaches, enriched feature engineering (including temporal-difference and rolling-window statistics), and targeted hyperparameter optimization to advance explainable, scalable anomaly detection in clinical data review.
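As one possible instance of the enriched feature engineering suggested above, the following sketch derives per-subject temporal-difference and rolling-window statistics; the column names (subject_id and the entries of value_cols), the window length, and the function itself are hypothetical and not part of the thesis pipeline.

```python
import pandas as pd

def add_temporal_features(df, value_cols, window=3):
    """Add per-subject temporal-difference and rolling-window statistics.
    Assumes `df` is sorted by visit date within each 'subject_id'
    (both column names are hypothetical)."""
    out = df.copy()
    grp = out.groupby("subject_id")
    for col in value_cols:
        out[f"{col}_diff"] = grp[col].diff()                    # change since previous visit
        out[f"{col}_roll_mean"] = grp[col].transform(
            lambda s: s.rolling(window, min_periods=1).mean())  # recent-visit mean
        out[f"{col}_roll_std"] = grp[col].transform(
            lambda s: s.rolling(window, min_periods=1).std())   # recent-visit variability
    return out
```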