Show simple item record

dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Kenna, Kevin
dc.contributor.author: Ballieux, Rutger
dc.date.accessioned: 2025-02-01T00:01:57Z
dc.date.available: 2025-02-01T00:01:57Z
dc.date.issued: 2025
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/48441
dc.description.abstract: Background: Neurodegenerative diseases like Alzheimer's disease (AD) and Amyotrophic Lateral Sclerosis (ALS) pose significant public health challenges due to their complex aetiologies involving genetic, environmental, and biological factors. Single-cell RNA sequencing (scRNA-seq) enables detailed analysis of cellular heterogeneity in these diseases. However, the high dimensionality and sparsity of scRNA-seq data complicate the classification of diseased versus healthy cells, necessitating systematic evaluation of machine learning models and feature engineering strategies. Methods: We analysed scRNA-seq datasets from the dorsolateral prefrontal cortex of AD patients and controls, and from the primary motor cortex of C9orf72-associated ALS (C9ALS) patients, sporadic ALS (SALS) patients, and shared controls. Logistic Regression and Random Forest classifiers were trained to distinguish diseased from healthy cells using various feature extraction methods: random feature selection, dimensionality reduction (Most Variable Features and Principal Component Analysis), and a biologically focussed approach combining Differential Expression (DE) analysis and Weighted Gene Co-expression Network Analysis (WGCNA). Five-fold cross-validation ensured robust evaluation. Results: Classification accuracy was exceptionally high across all datasets. The biologically focussed method achieved the highest performance, with Logistic Regression attaining peak test AUCs of 0.980 in SALS and 0.976 in C9ALS. Dimensionality reduction was also effective, particularly in AD, where fewer significant features limited the biologically focussed method. Classifiers identified 104 genes shared among AD, C9ALS, and SALS that are implicated in neurodegeneration. Pathway enrichment analysis of these genes highlighted associations with neurodegenerative pathways, mitochondrial dysfunction, and synaptic processes. Machine learning classifiers identified additional critical genes and pathways beyond those detected by DE analysis alone. Conclusions: Accurate classification of single-cell transcriptomic data in AD and ALS is feasible, with performance significantly influenced by feature extraction methods. Biologically focussed approaches and dimensionality reduction techniques enhanced classifier accuracy and identified key transcriptomic features distinguishing diseased and healthy cells. These findings deepen our understanding of molecular mechanisms in neurodegenerative diseases and may inform the development of novel diagnostic and therapeutic strategies. Further validation and functional studies are needed to translate these insights into clinical applications.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: This study applies machine learning to classify single cells from Alzheimer’s and ALS patients using scRNA-seq data. Logistic Regression and Random Forest models, with feature extraction methods like DE analysis and WGCNA, achieved high accuracy (AUC ~0.98). A biologically focused approach identified 104 shared neurodegeneration-related genes and key pathways. These results enhance understanding of AD and ALS mechanisms and may aid in developing diagnostics and therapies.
dc.title: Decoding Neurodegeneration: Leveraging Machine Learning Approaches to Classify Single Cells and Identify Transcriptomic Features in Alzheimer’s Disease and ALS
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.courseuu: Bioinformatics and Biocomplexity
dc.thesis.id: 42596


