Identifying patterns in GvHD patients
Summary
Acute Graft-versus-Host Disease (aGvHD) is a common and serious complication following allogeneic hematopoietic cell transplantation (allo-HCT) in pediatric patients. Although corticosteroids are the standard treatment, nearly half of the affected children do not respond adequately.
This thesis aims to identify diagnostic features that distinguish steroid non-responsive patients using high-dimensional clinical data collected post-transplant. A dataset of 607 pediatric allo-HCT patients was analyzed; 266 of them had aGvHD, and 41 of those were steroid non-responders. After preprocessing, features were selected using mutual information, and Random Forest and XGBoost classifiers were trained on the selected features. Random Forest discriminated well between steroid responders and non-responders (AUC = 0.85) and outperformed XGBoost. Key predictive features included reticulocyte fractions, albumin, and bilirubin levels. These findings suggest that machine learning models can effectively support early risk stratification and personalized treatment strategies in pediatric aGvHD care. Limitations include class imbalance and the need for external validation. Future work should focus on prospective studies and time-to-event modeling before the models can be applied clinically.
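The summary names two steps that do not depend on a particular classifier: ranking features by mutual information with the response label, and scoring predictions by AUC. The following is a minimal, standard-library-only sketch of both steps. It is not the thesis pipeline, and the Random Forest and XGBoost models are not reproduced. Several assumptions apply. Features are taken to be already discretized (for example, lab values binned into clinical ranges), whereas library estimators such as scikit-learn's `mutual_info_classif` use a nearest-neighbor estimator for continuous features. The label 1 marks a steroid non-responder. The feature names and toy values are hypothetical and are not taken from the study data.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information (in nats) between a discrete feature and discrete labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))  # counts of (feature value, label) pairs
    px = Counter(feature)                  # marginal counts of feature values
    py = Counter(labels)                   # marginal counts of labels
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_features(columns, labels, k):
    """Return the names of the k features with the highest mutual information."""
    scores = {name: mutual_information(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive (label 1) outscores a
    random negative (label 0); tied scores count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 1 = steroid non-responder, 0 = responder.
    y = [1, 1, 0, 0, 0, 0]
    features = {
        "albumin_bin": ["low", "low", "normal", "normal", "normal", "low"],
        "noise": ["a", "b", "a", "b", "a", "b"],  # independent of y by construction
    }
    print(rank_features(features, y, k=1))               # ['albumin_bin']
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))   # 0.75
```

With roughly 41 non-responders among 266 aGvHD cases, the pairwise form of AUC makes explicit what the metric measures: how well the model ranks non-responders above responders, independent of any decision threshold. In a real pipeline, the mutual-information ranking would be recomputed inside each training fold, so that outcome information from the held-out patients does not leak into feature selection.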