Show simple item record

dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Luijken, Kim
dc.contributor.author: Fang, Zicheng
dc.date.accessioned: 2023-07-20T00:01:00Z
dc.date.available: 2023-07-20T00:01:00Z
dc.date.issued: 2023
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/44206
dc.description.abstract: Abstract

Introduction: The one standard error (1 SE) criterion is widely used for tuning hyperparameters (for example, the penalty term in L1 and L2 regularization) in cross-validation (CV). Although it is common in clinical classification prediction model building, its performance in the following respects remains to be investigated: (1) estimating the standard error of the CV error, (2) variable selection, and (3) out-of-sample prediction error.

Methods: I conducted a full-factorial simulation study to evaluate the performance of the 1 SE criterion in logistic models in three respects: accuracy of estimating the standard error of the CV error, the impact on variable selection, and out-of-sample prediction error. The parameters varied in this simulation were: data dimensionality, proportion of noise predictors, covariance between predictors, and number of CV folds. A polynomial regression metamodel was developed to predict the true standard error of the CV error; this piece-wise metamodel was built iteratively, evaluating R2, CV error, and p-values of the simulation-parameter coefficients to arrive at the best-fitting metamodel. The metamodel was externally validated on test datasets containing different values of the simulation parameters and the corresponding true standard errors of the CV error; the parameter values in the test datasets were randomly drawn from different ranges.

Results: For the three aspects of performance of the 1 SE criterion in logistic models: (1) the 1 SE criterion generally underestimated the 1 SE of the CV error; (2) in lasso, the 1 SE criterion selected fewer true predictors than models with the lowest CV error; and (3) models developed using the 1 SE criterion generally had worse out-of-sample prediction error than models with the lowest CV error, although they performed well in data with a decaying coefficient structure and correlated predictors.

The metamodel consisted of four pieces: constant coefficient structure with independent predictors, decaying coefficient structure with independent predictors, constant coefficient structure with correlated predictors, and decaying coefficient structure with correlated predictors. The out-of-sample mean absolute percentage errors (MAPEs) for the four pieces were 0.57%, 0.59%, 1.1%, and 1.3%; the R2 values on the test datasets were 0.9955, 0.9969, 0.9886, and 0.9895, respectively.

Discussion: Based on the three simulation experiments, I recommend applying the 1 SE criterion to data with high dimensionality, correlated predictors, small predictor coefficient values, or a decaying coefficient structure. I estimated a metamodel to predict the true standard error of the CV error, but further theoretical and simulation studies are needed to explain the non-linear effects of data dimensionality on the true standard error of the CV error.
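The 1 SE criterion described in the abstract can be sketched as follows. This is a minimal illustration, not code from the thesis; the function name `one_se_rule` and its inputs are hypothetical. Given per-fold CV errors for each candidate penalty, it picks the strongest penalty whose mean CV error lies within one standard error of the minimum:

```python
import numpy as np

def one_se_rule(lambdas, fold_errors):
    """Select the largest penalty whose mean CV error is within one
    standard error of the minimum mean CV error.

    lambdas     : 1-D array of candidate penalty values
    fold_errors : (n_lambdas, n_folds) array of per-fold CV errors
    """
    mean_err = fold_errors.mean(axis=1)
    # SE of the mean CV error across folds -- the quantity the thesis
    # finds is generally underestimated by this criterion
    se_err = fold_errors.std(axis=1, ddof=1) / np.sqrt(fold_errors.shape[1])
    best = np.argmin(mean_err)
    threshold = mean_err[best] + se_err[best]
    # among candidates within the threshold, prefer the strongest penalty,
    # i.e. the sparsest model in the lasso case
    return lambdas[mean_err <= threshold].max()
```

In the lasso setting studied in the thesis, a larger penalty means fewer selected predictors, which is why this rule tends to select fewer true predictors than the minimum-CV-error model.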
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: This thesis evaluates the performance of the one standard error cross-validation criterion (1 SE criterion) for tuning a logistic regression model. The study comprises Monte Carlo simulations and polynomial metamodel building to investigate the 1 SE criterion's performance in: 1) estimating the standard error of the cross-validation error, 2) variable selection, 3) out-of-sample prediction error estimation, and 4) feasibility of predicting the true standard error of the cross-validation error.
dc.title: One standard error cross validation criterion for tuning a classification prediction model
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.courseuu: Epidemiology
dc.thesis.id: 19400

