Using deep learning to predict children’s age and risk of dyslexia from the event related potential
Summary
This thesis uses electroencephalography (EEG) data to predict the age and the risk of developmental dyslexia of young children. Diagnosing dyslexia at a young age is useful because it allows interventions that reduce reading and writing difficulties later in life. Two datasets are used: the ePodium dataset and the Dutch Dyslexia Project (DDP) dataset. Both use the auditory oddball paradigm to elicit a standard and a deviant event-related potential (ERP). The EEG data is pre-processed with the autoreject library, which removes many artifacts from the data. The cleaned trials around an auditory event are averaged to create ERPs. Deep learning models then use these ERPs to predict age and risk of dyslexia from patterns within the data. The results of a previous master's thesis affiliated with the ePodium project are reproduced. That thesis trained a deep learning model that found a correlation between age and the standard ERP signals of children in the DDP dataset. The reproduction confirms that the encoder model is the state-of-the-art model for age prediction from ERPs. Trained models can already make reasonable age predictions from a small subset of the standard trials within an experiment. In addition, adding a substantial amount of Gaussian noise to each ERP signal does not significantly alter the performance of the models. These observations indicate that the models base their predictions on the global pattern of the ERP and not on local voltage differences in the millisecond range. Transfer learning between datasets is possible: models trained on the DDP dataset found a correlation between age and the standard-event ERPs in the ePodium dataset, despite the differences between the two datasets. The results differed between models that used standardised ERPs and models that used non-standardised ERPs. When trained on the ePodium dataset alone, the encoder model was unable to find patterns for predicting age and dyslexia. This dataset may be too small for deep learning to make predictions. Possible solutions are to use more data-efficient methods, such as time-frequency analysis on raw EEG data, or to create simulated data that artificially increases the size of the dataset.
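The ERP operations mentioned in this summary (averaging cleaned trials, using only a subset of trials, adding Gaussian noise, and standardising) can be illustrated with a minimal NumPy sketch. This is not the code used in the thesis: the function names, the array layout (trials x channels x samples), the noise levels, and the synthetic sine waveform are all assumptions chosen for illustration, and per-channel z-scoring is only one possible reading of "standardised".

```python
import numpy as np


def make_erp(trials, n_used=None, rng=None):
    """Average trials of shape (n_trials, n_channels, n_samples) into one ERP.

    If n_used is given, only a random subset of that many trials is averaged.
    """
    trials = np.asarray(trials, dtype=float)
    if n_used is not None:
        if rng is None:
            rng = np.random.default_rng()
        idx = rng.choice(trials.shape[0], size=n_used, replace=False)
        trials = trials[idx]
    return trials.mean(axis=0)


def add_gaussian_noise(erp, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return erp + rng.normal(0.0, sigma, size=erp.shape)


def standardise(erp):
    """Z-score each channel over time (assumed meaning of 'standardised')."""
    mean = erp.mean(axis=-1, keepdims=True)
    std = erp.std(axis=-1, keepdims=True)
    return (erp - mean) / np.where(std == 0, 1.0, std)


# Synthetic stand-in for cleaned EEG trials: a hypothetical waveform
# buried in strong trial-level noise. No real data is used here.
rng = np.random.default_rng(0)
n_trials, n_channels, n_samples = 200, 4, 100
t = np.linspace(0.0, 1.0, n_samples)
signal = np.sin(2 * np.pi * 3 * t)
trials = signal + rng.normal(0.0, 5.0, size=(n_trials, n_channels, n_samples))

erp_full = make_erp(trials)                        # all trials
erp_subset = make_erp(trials, n_used=50, rng=rng)  # small subset of trials
erp_noisy = add_gaussian_noise(erp_full, sigma=0.5, rng=rng)
erp_std = standardise(erp_full)
```

Averaging suppresses trial-level noise, so `erp_full` lies much closer to the underlying waveform than any single trial does. Arrays such as `erp_subset`, `erp_noisy`, and `erp_std` are the kind of model inputs that the experiments described above compare.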