Utrecht University Student Theses Repository

        Feature selection for biomarker discovery

        View/Open
        Master_Thesis_Kristof_Fellegi_Reviewed.pdf (1.338Mb)
        Publication date
        2018
        Author
        Fellegi, K.
        Summary
        This research is a comparative study of feature selection methods for biomarker discovery. Ten machine learning techniques were considered for feature selection. The main assumption behind the research was that certain biomarkers reflect the perceived strenuousness of different exercise levels. Perceived exercise intensity was measured on the Borg scale. Taking the 10 most expressive biomarkers selected by each model yielded 39 distinct biomarkers out of the 64 in total. The most frequently selected was "factord", chosen by 7 models. "trp" and "CORT" were each selected by 6 models, and "ifabp", "LEUCO" and "BICARB" by 5 models. In general, the predictive power of the applied machine learning techniques does not vary much. Among the models using all predictors, the highest accuracy, 78%, was achieved by logistic regression. In terms of the area under the ROC curve, the best result was also achieved by the full logistic regression model, with an AUC of 0.72. With feature selection, however, better performance can be achieved than with the models using all predictors. Recursive feature elimination on the random forest model yielded 81% accuracy, and the Lasso on logistic regression yielded an even higher 84% accuracy. All in all, considering the criteria for selecting candidate models, logistic regression offers a balanced mix of model performance and interpretability.
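The tallying step described in the summary (take each model's top-ranked biomarkers, then count how many models selected each one) can be sketched as follows. This is a minimal illustration, not the thesis code: the helper names `top_k` and `selection_counts`, the model labels, and all importance scores are hypothetical, and `k=2` is used instead of 10 to keep the toy data small.

```python
from collections import Counter


def top_k(scores, k):
    """Return the k feature names with the largest importance scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def selection_counts(model_scores, k):
    """Count, for each feature, how many models rank it in their top k."""
    counts = Counter()
    for scores in model_scores.values():
        counts.update(top_k(scores, k))
    return counts


# Toy importance scores, invented for illustration only (not thesis results).
model_scores = {
    "lasso_logistic": {"factord": 0.9, "trp": 0.7, "CORT": 0.2, "LEUCO": 0.1},
    "random_forest_rfe": {"factord": 0.8, "trp": 0.1, "CORT": 0.6, "LEUCO": 0.3},
    "hypothetical_model_3": {"factord": 0.5, "trp": 0.6, "CORT": 0.1, "LEUCO": 0.4},
}

counts = selection_counts(model_scores, k=2)
for name, n in counts.most_common():
    print(f"{name}: selected by {n} of {len(model_scores)} models")
print(f"{len(counts)} distinct features selected")
```

Features that no model places in its top k never enter the counter, so `len(counts)` gives the number of distinct selected features, the analogue of the 39 out of 64 reported in the summary.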
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/31758
        Collections
        • Theses