Show simple item record

dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Luijken, Kim
dc.contributor.author: Fang, Zicheng
dc.date.accessioned: 2023-07-20T00:01:00Z
dc.date.available: 2023-07-20T00:01:00Z
dc.date.issued: 2023
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/44206
dc.description.abstract: Abstract

Introduction: The one standard error (1 SE) criterion is widely used for tuning hyperparameters (for example, the penalty term in L1 and L2 regularization) in cross-validation (CV). Although it is common in clinical classification prediction model building, its performance in the following respects remains to be investigated: (1) estimating the standard error of the CV error, (2) variable selection, and (3) out-of-sample prediction error.

Methods: I conducted a full-factorial simulation study to evaluate the performance of the 1 SE criterion in logistic models in three respects: accuracy of estimating the standard error of the CV error, the impact on variable selection, and out-of-sample prediction error. The parameters varied in this simulation were: data dimensionality, proportion of noise predictors, covariance between predictors, and number of CV folds. A polynomial regression metamodel was developed to predict the true standard error of the CV error; this piece-wise metamodel was built iteratively, evaluating R2, CV error, and p-values of the simulation-parameter coefficients to arrive at the best-fitting metamodel. The metamodel was externally validated on test datasets containing different values of the simulation parameters and the corresponding true standard errors of the CV error; the parameter values in the test datasets were randomly drawn from different ranges.

Results: For the three aspects of performance of the 1 SE criterion in logistic models: (1) the 1 SE criterion generally underestimated the 1 SE of the CV error; (2) in lasso, the 1 SE criterion selected fewer true predictors than models with the lowest CV error; and (3) models developed using the 1 SE criterion generally had worse out-of-sample prediction error than models with the lowest CV error, although they performed well in data with a decaying coefficient structure and correlated predictors.

The metamodel consisted of four pieces: constant coefficient structure with independent predictors, decaying coefficient structure with independent predictors, constant coefficient structure with correlated predictors, and decaying coefficient structure with correlated predictors. The out-of-sample mean absolute percentage errors (MAPEs) for the four pieces were 0.57%, 0.59%, 1.1%, and 1.3%; the R2 values on the test datasets were 0.9955, 0.9969, 0.9886, and 0.9895, respectively.

Discussion: Based on the three simulation experiments, I recommend applying the 1 SE criterion to data with high dimensionality, correlated predictors, small predictor coefficient values, or a decaying coefficient structure. I estimated a metamodel to predict the true standard error of the CV error, but further theoretical and simulation studies are needed to explain the non-linear effects of data dimensionality on the true standard error of the CV error.
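The 1 SE criterion described in the abstract can be sketched as follows. This is a minimal illustration, not code from the thesis; the function name `one_se_rule` and its inputs are hypothetical. Given per-fold CV errors for each candidate penalty, it picks the strongest penalty whose mean CV error lies within one standard error of the minimum:

```python
import numpy as np

def one_se_rule(lambdas, fold_errors):
    """Select the largest penalty whose mean CV error is within one
    standard error of the minimum mean CV error.

    lambdas     : 1-D array of candidate penalty values
    fold_errors : (n_lambdas, n_folds) array of per-fold CV errors
    """
    mean_err = fold_errors.mean(axis=1)
    # SE of the mean CV error across folds -- the quantity the thesis
    # finds is generally underestimated by this criterion
    se_err = fold_errors.std(axis=1, ddof=1) / np.sqrt(fold_errors.shape[1])
    best = np.argmin(mean_err)
    threshold = mean_err[best] + se_err[best]
    # among candidates within the threshold, prefer the strongest penalty,
    # i.e. the sparsest model in the lasso case
    return lambdas[mean_err <= threshold].max()
```

In the lasso setting studied in the thesis, a larger penalty means fewer selected predictors, which is why this rule tends to select fewer true predictors than the minimum-CV-error model.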
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: This thesis evaluates the performance of the one standard error cross-validation criterion (1 SE criterion) for tuning a logistic regression model. The study comprises Monte Carlo simulations and polynomial metamodel building to investigate the 1 SE criterion's performance in: 1) estimating the standard error of the cross-validation error, 2) variable selection, 3) out-of-sample prediction error estimation, and 4) feasibility of predicting the true standard error of the cross-validation error.
dc.title: One standard error cross validation criterion for tuning a classification prediction model
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.courseuu: Epidemiology
dc.thesis.id: 19400

