Probabilistic thunderstorm forecasts using statistical post-processing: Comparison of logistic regression and quantile regression forests and an investigation of physical predictors
Summary
Probabilities of thunderstorm occurrence and conditional probabilities of lightning intensity over the Netherlands are forecast using statistical post-processing with predictors derived from the operational non-hydrostatic numerical weather prediction model Harmonie, at lead times up to 45 hours. Quantile regression forests (QRF) are compared with logistic regression (LR) for thunderstorm occurrence forecasts and with extended LR for lightning intensity forecasts. Using different sets of predictors from which these statistical methods may select, it is demonstrated that pre-selecting predictors based on physical understanding, while simultaneously exploiting QRF as a machine learning tool, can help improve statistical post-processing models. QRF is demonstrated to be beneficial for the predictions, with more skillful forecasts than LR for thunderstorm occurrence. Lightning intensity predictions are influenced by inhomogeneity of lightning detection datasets; despite this inhomogeneity, skillful predictions can be made with both extended LR and QRF. The regional maximum of the Modified Jefferson index and most unstable CAPE are found to be the best thunderstorm occurrence predictors, and the regional minimum of the Bradbury index and the maximum of the K-index emerge as the best for lightning intensity. Neither most unstable CAPE nor microphysical predictors (graupel, snow) are essential for thunderstorm occurrence prediction.
