View Item 
        •   Utrecht University Student Theses Repository Home
        • UU Theses Repository
        • Theses
        • View Item


        One standard error cross validation criterion for tuning a classification prediction model

        View/Open
        Paper_ZF.pdf (2.131Mb)
        Publication date
        2023
        Author
        Fang, Zicheng
        Summary
        Abstract

        Introduction. The one standard error (1 SE) criterion is widely used for tuning hyperparameters (for example, the penalty term in L1 and L2 regularization) in cross validation (CV). Although it is common in building clinical classification prediction models, its performance in the following aspects remained to be investigated: (1) estimating the standard error of the CV error, (2) variable selection, and (3) out-of-sample prediction error.

        Methods. I conducted a full-factorial simulation study to evaluate the performance of the 1 SE criterion in logistic models in three aspects: accuracy of estimating the standard error of the CV error, the impact on variable selection, and prediction error performance. The parameters varied in this simulation were data dimensionality, the proportion of noise predictors, the covariance between predictors, and the number of CV folds. A polynomial regression metamodel was developed to predict the true standard error of the CV error. This piece-wise metamodel was built iteratively, evaluating R2, CV error, and the p-values of the simulation parameter coefficients to arrive at the best-fitting metamodel. The metamodel was externally validated on test datasets that contained different values of the simulation parameters and the corresponding true standard errors of the CV error. The values of the simulation parameters in the test datasets were randomly selected from different ranges.

        Results. The results for the three aspects of performance of the 1 SE criterion in logistic models were: (1) the 1 SE criterion generally underestimated the standard error of the CV error; (2) with the lasso, the 1 SE criterion selected fewer true predictors than models with the lowest CV error; and (3) models developed using the 1 SE criterion generally had worse out-of-sample prediction error than models with the lowest CV error, although they performed well in data with a decaying coefficient structure and correlated predictors.

        The metamodel consisted of 4 pieces: a constant coefficient structure with independent predictors, a decaying coefficient structure with independent predictors, a constant coefficient structure with correlated predictors, and a decaying coefficient structure with correlated predictors. The out-of-sample mean absolute percentage errors (MAPEs) for the 4 pieces of the metamodel were 0.57%, 0.59%, 1.1%, and 1.3%. The R2 values on the test datasets were 0.9955, 0.9969, 0.9886, and 0.9895 for the 4 pieces, respectively.

        Discussion. Based on the three simulation experiments, I recommend applying the 1 SE criterion to data with high dimensionality, correlated predictors, small predictor coefficient values, or a decaying coefficient structure. I estimated a metamodel to predict the true standard error of the CV error, but further theoretical and simulation studies are needed to explain the non-linear effects of data dimensionality on the true standard error of the CV error.
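
For readers unfamiliar with the criterion, the following is a minimal sketch of how the 1 SE rule is commonly applied when choosing a penalty value from K-fold CV results. It is not taken from the thesis. The function name `one_se_rule`, the toy error values, and the standard error estimate (standard deviation of the per-fold errors divided by the square root of the number of folds) are assumptions made for illustration; the abstract does not state which estimator was studied.

```python
from math import sqrt
from statistics import mean, stdev


def one_se_rule(fold_errors, penalties):
    """Pick a penalty with the one standard error rule (illustrative sketch).

    fold_errors: list of rows, one per CV fold; each row holds the CV error
                 for every candidate penalty (hypothetical toy input).
    penalties:   candidate penalty values, one per column of fold_errors.
    Returns (index of minimum-error penalty, index chosen by 1 SE, threshold).
    """
    n_folds = len(fold_errors)
    columns = list(zip(*fold_errors))  # one tuple of fold errors per penalty

    mean_err = [mean(col) for col in columns]
    # Assumed estimator: SD of fold errors / sqrt(K). Other estimators exist.
    se_err = [stdev(col) / sqrt(n_folds) for col in columns]

    # Penalty with the lowest mean CV error.
    best = min(range(len(penalties)), key=lambda j: mean_err[j])
    threshold = mean_err[best] + se_err[best]

    # Among penalties within one SE of the minimum, take the strongest one.
    eligible = [j for j in range(len(penalties)) if mean_err[j] <= threshold]
    chosen = max(eligible, key=lambda j: penalties[j])
    return best, chosen, threshold


# Toy data: 5 folds (rows) by 4 candidate penalties (columns).
penalties = [0.01, 0.1, 1.0, 10.0]
fold_errors = [
    [0.30, 0.24, 0.30, 0.45],
    [0.34, 0.32, 0.30, 0.47],
    [0.32, 0.26, 0.28, 0.43],
    [0.36, 0.30, 0.28, 0.49],
    [0.33, 0.28, 0.29, 0.46],
]

best, chosen, threshold = one_se_rule(fold_errors, penalties)
print("minimum-error penalty:", penalties[best])
print("1 SE penalty:", penalties[chosen])
```

With these toy numbers the minimum-error penalty is 0.1, while the 1 SE rule moves to the stronger penalty 1.0 because its mean CV error (0.29) lies within one estimated standard error of the minimum (0.28). If the standard error is underestimated, as the Results report, the threshold sits closer to the minimum and fewer penalties qualify.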
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/44206
        Collections
        • Theses

        Related items

        Showing items related by title, author, creator and subject.

        • Escaping error theory? An investigation into the justification for the conceptual claim of error theory (original title: Ontsnappen aan error theory? Een onderzoek naar de onderbouwing voor de conceptuele stelling van error theory)

          Volberda, H.W. (2019)
          Error theory holds that all moral judgments are false. The theory bases this on a conceptual claim and an ontological claim. The conceptual claim holds that moral judgments refer to objective and ...
        • Phonological errors in fluent and non-fluent aphasia: A literature research comparing the phonological speech production errors of people with fluent and non-fluent aphasia 

          Leerdam, A.S. van (2011)
        • Interlingual and developmental errors in the spoken production of young Dutch learners of English: an error analysis 

          Verploegen, Ymke (2022)