Machine Learning Analysis of Gut Microbiome Profiles in Infants from Different Feeding Practices and Their Association with Health Outcomes
Summary
Breast milk is widely recognized for its critical role in infant health, with emerging evidence
indicating that early nutrition shapes the infant gut microbiome, immune system development, and
long-term health. Feeding practices, particularly the choice between human milk and formula,
substantially influence the composition of the infant gut microbiota. In this study, we applied
machine learning-based feature selection algorithms to identify taxa that differ between breastfed
and formula-fed infants and to examine their relationship to infant health. Using three microbiome
datasets, we identified sixteen key bacterial taxa in the discovery dataset PRJNA633365. At the
genus and species levels, these include four Bifidobacterium species, two Clostridium sensu
stricto species, Streptococcus, Dialister, Dysgonomonas, and Clostridium innocuum. At the family
level, Enterobacteriaceae, Clostridiaceae, and Peptostreptococcaceae were identified, while
Lactobacillales was observed at the order level and Firmicutes at the phylum level. In the testing
datasets PRJDB7295 and PRJNA562650, thirteen and twelve of the sixteen taxa, respectively, were
confirmed. The multilayer perceptron (MLP) classifier achieved the best performance, with areas
under the ROC curve (AUCs) of 0.93 for PRJNA633365 and 0.92 for PRJNA562650, while the AdaBoost
classifier achieved an AUC of 0.69 for PRJDB7295. A heatmap shows the differential abundance of
taxa between the breastfed and formula-fed groups. Among the taxa associated with the breastfed
group, Bifidobacterium and Dialister have been reported to protect against allergic diseases such
as food allergy and atopic dermatitis. Future work will extend this analysis to additional datasets
and further explore the link between the infant microbiome and health outcomes in relation to
breast milk and formula feeding.
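The summary names the classifiers (MLP, AdaBoost) and the evaluation metric (AUC) but not the specific feature selection procedure. A minimal sketch of the general discovery-then-validation workflow is given here under stated assumptions. It swaps in a simple univariate filter, ranking each taxon by how far its single-taxon AUC departs from 0.5, in place of the authors' unspecified machine learning-based selectors. It computes AUC with the rank-based Mann-Whitney formulation and checks which selected taxa are also detected in a test dataset. The functions `roc_auc`, `select_taxa`, and `confirmed_in_test` and all abundance values are hypothetical and invented for illustration. They are not the study's code or data.

```python
from typing import Dict, List, Sequence, Set


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: 1 for the positive class (here, breastfed) and 0 otherwise.
    Tied scores receive their average rank, so ties count as half a win.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one sample from each class")

    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_pos = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_pos / (n_pos * n_neg)


def select_taxa(abundance: Dict[str, List[float]], labels: Sequence[int], k: int) -> List[str]:
    """Hypothetical filter: keep the k taxa whose abundance alone best separates the groups.

    Separation is |AUC - 0.5|, so taxa enriched in either group can be selected.
    Ties in separation are broken alphabetically for a deterministic result.
    """
    separation = {t: abs(roc_auc(labels, v) - 0.5) for t, v in abundance.items()}
    return sorted(separation, key=lambda t: (-separation[t], t))[:k]


def confirmed_in_test(selected: Sequence[str], test_taxa: Set[str]) -> List[str]:
    """Taxa selected in the discovery set that are also detected in a test set."""
    return [t for t in selected if t in test_taxa]


# Invented toy data: relative abundances for 3 breastfed (1) and 3 formula-fed (0) infants.
labels = [1, 1, 1, 0, 0, 0]
abundance = {
    "Bifidobacterium": [0.40, 0.35, 0.50, 0.05, 0.10, 0.02],
    "Enterobacteriaceae": [0.05, 0.10, 0.08, 0.30, 0.25, 0.20],
    "Streptococcus": [0.10, 0.02, 0.05, 0.04, 0.08, 0.03],
    "Dialister": [0.03, 0.04, 0.06, 0.01, 0.02, 0.05],
}

selected = select_taxa(abundance, labels, k=3)
print(selected)  # ['Bifidobacterium', 'Enterobacteriaceae', 'Dialister']

# Hypothetical set of taxa detected in a testing dataset.
test_taxa = {"Bifidobacterium", "Dialister", "Streptococcus"}
print(confirmed_in_test(selected, test_taxa))  # ['Bifidobacterium', 'Dialister']

# AUC of made-up classifier probabilities on a held-out set.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

In a fuller pipeline, the MLP and AdaBoost models could be fit with scikit-learn's `MLPClassifier` and `AdaBoostClassifier`, and their predicted probabilities on each testing dataset passed to an AUC function such as `roc_auc`. Restricting the test features to the confirmed taxa is one way to keep model inputs consistent across datasets. This is an assumption about a reasonable design, not a description of the study's method.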