
Machine Learning-Based Temperature and Wind Forecasts in the Zhangjiakou Competition Zone during the Beijing 2022 Winter Olympic Games


Funds: 
Supported by the National Key Research and Development Program of China (2018YDD0300104), Key Research and Development Program of Hebei Province of China (21375404D), and After-Action-Review Project of China Meteorological Administration (FPZJ2023-014).


  • Weather forecasting for the Zhangjiakou competition zone of the Beijing 2022 Winter Olympic Games is a challenging task due to its complex terrain. Numerical weather prediction models generally perform poorly for cold air pools and winds over complex terrain, owing to their low spatiotemporal resolution and limitations in describing the dynamics, thermodynamics, and microphysics of mountainous areas. This study proposes an ensemble-learning model, named ENSL, for surface temperature and wind forecasts at the venues of the Zhangjiakou competition zone, by integrating five individual models (linear regression, random forest, gradient boosting decision tree, support vector machine, and artificial neural network, or ANN) with ridge regression as the meta model. ENSL employs predictors from the high-resolution ECMWF model forecast (ECMWF-HRES) data and topography data, and targets from automatic weather station observations. Four categories of predictors (synoptic-pattern related fields, surface element fields, terrain, and temporal features) are fed into ENSL. The results demonstrate that ENSL achieves better performance and generalization than the individual models. The root-mean-square error (RMSE) of the temperature and wind speed predictions is reduced by 48.2% and 28.5%, respectively, relative to ECMWF-HRES. For gust speed, the performance of ENSL is consistent with that of ANN (the best individual model) on the whole dataset, whereas ENSL outperforms ANN on extreme gust samples (an RMSE reduction of 42.7% relative to ECMWF-HRES, compared with 38.7% for ANN). Sensitivity analysis of the predictors in the four categories shows that ENSL effectively fits their feature importance rankings and physical explanations.


  • The Beijing 2022 Winter Olympics held Nordic skiing and ski-jumping events in the Zhangjiakou competition zone. Weather is a key factor determining the success of the Winter Games. Accurate and reliable near-surface meteorological predictions of variables like temperature and wind speed are crucial for event scheduling, course preparation, and athlete safety (Chen et al., 2018). Nevertheless, forecasting in the Zhangjiakou competition zone, with its complex terrain, presents significant challenges.

    Jia et al. (2019), Liu et al. (2020), Fu et al. (2021), Li et al. (2022), and Wang et al. (2022) analyzed the temperature and wind in the Zhangjiakou competition zone from three perspectives (temporal and spatial distribution patterns, formation mechanisms, and forecast targets) and found the following: (1) Nighttime temperatures in the Zhangjiakou competition zone are frequently affected by cold air pools and are difficult to forecast. (2) Winds, particularly gusts, exhibit intricate evolution due to the complex mountainous terrain and are likewise challenging to forecast. Furthermore, the low spatiotemporal resolutions and the limitations in describing the dynamics, thermodynamics, and microphysics within mountainous areas inherent in numerical weather prediction (NWP) models contribute to their persistently low forecast accuracy for temperature and wind over complex terrain (Reeves and Stensrud, 2009; Smith et al., 2010).

    Application of machine learning (ML) approaches is gaining significant momentum in the earth system sciences. The potential applicability of ML approaches extends to all aspects of the NWP workflow, including observations, data assimilation, forecast models, and product generation (Bonavita et al., 2021). Numerous studies have employed ML techniques as a post-processing method for NWP models to improve the forecast accuracy. The method takes advantage of both the physical simulation (NWP models) and data-driven (ML models) approaches to provide more accurate objective forecast products (Xia et al., 2020).

    Linear regression (LR) models, the most widely used technique in model output statistics, are effective in correcting systematic biases of NWP models (Li et al., 2019). Random forest (RF) and gradient boosting decision tree (GBDT) are decision tree (DT)-based ensemble-learning models that use bagging and boosting ensemble strategies, respectively. Sun et al. (2019) and Goutham et al. (2021) improved temperature and wind forecasting performances from the outputs of NWP models by using RF models. Ren et al. (2020), Zhou et al. (2022), and Fang et al. (2023) employed GBDT models to produce wind forecasts. Artificial neural networks (ANNs), support vector machines (SVMs), and deep learning (DL) have likewise been used in the statistical post-processing of NWP models (Men et al., 2019; Zhang et al., 2022). These research achievements show that ML algorithms have certain advantages over conventional methods. Furthermore, different ML models exhibit different forecasting performances in identical scenarios due to their distinct model structures (Zhang and Ye, 2021).

    Ensemble forecasts from multiple NWP models have been extensively employed in NWP post-processing, with numerous studies demonstrating their superiority over forecasts from individual NWP models (Fritsch et al., 2000; Woodcock and Engel, 2005; Mass et al., 2008). Allard et al. (2012), Gneiting and Ranjan (2013), Baran and Lerch (2016), and Zamo et al. (2021) developed probabilistic forecasts issued as cumulative distribution functions, calculated from both raw and post-processed ensembles. Sequential aggregation, a technique that dynamically weights NWP models based on their evolving forecast performance, has been applied to integrate various types of forecasts (Cesa-Bianchi and Lugosi, 2006; Mallet et al., 2009).

    Similarly, numerous studies have established ensemble forecasts generated by multiple ML models, effectively integrating their advantages and improving the accuracy of forecasting products. Cho et al. (2020) established an ensemble model composed of RF, SVM, and ANN models by using the simple equal-weight averaging method to produce day-ahead maximum and minimum temperature forecasts, demonstrating better performance than the individual models. Cho et al. (2022) developed an ensemble method to improve the performance and generalization by combining multivariate LR, SVM, convolutional neural network, and recurrent neural network (RNN) models based on the forecast skill score. Chen et al. (2020) created a stacking ensemble model by using ANN and RNN models to correct NWP model forecasts for station temperatures. In terms of correction capacity, ensemble models are particularly effective for regions where NWP models involve large forecasting errors, as well as for peaks in weather variables (e.g., temperature and wind).

    An ensemble-learning algorithm, named ENSL, is therefore proposed in this study for forecasting temperature, gust speed, and wind speed in the Zhangjiakou competition zone, through the integration of five individual ML models. The primary focus is to downscale the NWP model results based on the terrain features, thereby improving the forecast accuracy for meteorological variables at the competition venues. The main objectives of this study are as follows: (1) to establish five ML correction models and evaluate their performance, (2) to improve the forecast accuracy through retraining outputs from the five individual models based on ENSL, and (3) to analyze the sensitivity of predictors and weighting coefficients of individual models in ENSL, and explain its effectiveness and limitations.

    The high-resolution ECMWF model (ECMWF-HRES) forecast data are employed in this study. ECMWF-HRES is initialized twice a day at 0800 and 2000 Beijing Time (BT) and it generates surface and atmospheric pressure level products. The surface products utilized in this study possess a spatial resolution of 0.125° × 0.125° and a temporal resolution of 3 h during 12–72 h and 6 h during 72–84 h. The atmospheric pressure level products cover 5 levels (500, 700, 850, 925, and 950 hPa) with a spatial resolution of 0.25° × 0.25° and the same temporal resolution as the surface products. The spatial resolution of the terrain data in ECMWF-HRES is consistent with that (0.125° × 0.125°) of the surface products.

    This study utilizes hourly observations of temperature, wind speed, and gust speed from automatic weather stations (AWSs) within the Zhangjiakou competition zone. The data are from the China Meteorological Administration Data as a Service (CMADaaS) of Hebei Province, China. Nine observation stations (Fig. 1) are selected as experimental stations, namely, Genting stations 1–6 (termed G1–6) within the Genting venue cluster, along with stations 2 and 3 in the cross-country skiing venue (termed CCS2 and CCS3) and station 1 in the Biathlon venue (termed BIA1) within the Guyangshu venue cluster, based on the Beijing 2022 Winter Olympics Forecast Demonstration Project, Sciences of Meteorology using Artificial-intelligence in Research and Technology (SMART2022-FDP) (Chen et al., 2021). These stations are spatially representative and provide continuous time series of observations.

    Fig  1.  Terrain and distribution of AWSs in the Zhangjiakou competition zone: (a) the Zhangjiakou competition zone (black triangles indicate AWS locations) and (b) location of the Zhangjiakou competition zone within Hebei Province, China. CRH: China Railway High-speed.

    The two venue clusters in the Zhangjiakou competition zone are located over 40.9°–41°N, 115.4°–115.5°E spanning approximately 10 km both from north to south and from east to west (Fig. 1). The coarse spatial resolution of ECMWF-HRES forecasts precludes an accurate depiction of different patterns of meteorological variables between adjacent stations. Similarly, the spatial resolution of the model terrain makes it impossible to achieve an accurate description of the complex terrain features in the Zhangjiakou competition zone (Fig. 1a).

    The terrain data used in this study consist of geographic information (i.e., longitude, latitude, and elevation information) at each point of observation, and the underlying surface conditions [e.g., slope, aspect, observation location (on a slope or in a valley), and direction of the mountain range] surrounding each point of observation. The slope and aspect (Florinsky, 2017) data were calculated based on the Advanced Land Observing Satellite digital elevation model data, with 12.5-m spatial resolution, from the Japan Aerospace Exploration Agency, for the Zhangjiakou competition zone. The location of each observation (on a slope or in a valley) and the direction of the mountain range are manually labeled. Table 1 summarizes the terrain data for each point of observation used in this study.

    Table  1.  Terrain data for the observation sites
    Station name | Model elevation (m) | Actual elevation (m) | Slope (°) | Aspect (°) | Underlying surface condition
    G1   | 1640.58 | 1923.44 | 0.68 | 207.47 | Slope (northern)
    G2   | 1641.90 | 1853.96 | 0.36 | 203.96 | Slope (northern)
    G3   | 1648.87 | 2075.85 | 1.07 |  65.22 | Slope (northern)
    G4   | 1646.53 | 2012.13 | 9.11 |   8.13 | Slope (southern)
    G5   | 1645.80 | 1949.45 | 5.94 |  30.96 | Slope (northern)
    G6   | 1644.26 | 1886.83 | 7.25 |  46.02 | Slope (southern)
    BIA1 | 1613.78 | 1650.25 | 2.04 | 175.91 | Valley (east–west)
    CCS2 | 1615.24 | 1687.57 | 0.80 | 330.07 | Slope (southern)
    CCS3 | 1619.73 | 1622.86 | 3.12 |   7.13 | Valley (east–west)
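    The slope and aspect values in Table 1 are standard finite-difference quantities derived from the gridded DEM. As a rough illustration (our sketch, not the authors' code), they can be computed with numpy as follows; the array name dem, the south-to-north row ordering, and the compass convention for aspect are assumptions, and operational geomorphometric formulas (e.g., Florinsky, 2017) differ in detail:

        import numpy as np

        def slope_aspect(dem, cell_size=12.5):
            """Slope (deg) and aspect (deg, clockwise from north) from a gridded DEM.

            Assumes square cells of size cell_size (m), rows ordered south-to-north,
            and columns ordered west-to-east (12.5 m for the ALOS DEM used here).
            """
            dz_dnorth, dz_deast = np.gradient(dem.astype(float), cell_size)
            slope = np.degrees(np.arctan(np.hypot(dz_deast, dz_dnorth)))
            # Aspect: compass bearing of the downslope direction, i.e., of -grad(z)
            aspect = np.degrees(np.arctan2(-dz_deast, -dz_dnorth)) % 360.0
            return slope, aspect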

    Based on the forecast targets of surface temperature and wind, four categories of predictors are selected for the ML models by using domain knowledge, including synoptic-pattern related fields, surface element fields, terrain features, and temporal features. The synoptic-pattern fields and surface element fields are extracted from the NWP model. The synoptic-pattern fields, including physical variables [e.g., temperature, zonal/meridional (U/V) wind components, relative humidity (RH), and 24-h temperature variation] at different pressure levels, describe large-scale synoptic systems. These features convey information about the strength of synoptic systems and the intensity of cold air, which affect the formation and development of cold air pools and mountain–valley winds.

    The surface element fields (e.g., the U/V wind component at 10 and 100 m, 10-m gust speed, and 2-m temperature) are direct outputs of the NWP model with higher spatial resolution. Terrain features are derived from refined terrain data and the geographic information of AWSs including longitude, latitude, elevation, slope, and aspect. They are able to accurately characterize the location, elevation, and underlying surface conditions of each AWS. In addition, they can reflect the refined terrain features of the mountains in the Zhangjiakou competition zone. Temporal features consist of the forecast valid and lead times of the NWP models, conveying information about the diurnal variation of each meteorological variable and forecast errors for different lead times. Table 2 summarizes the predictors for temperature, gust speed, and wind speed used in the ML models.

    Table  2.  Predictors used in the ML models
    Target | Category | Predictor | Pressure level (hPa) | Unit
    Temperature | Synoptic-pattern field | 24-h temperature variation | 850 | °C
    Temperature | Synoptic-pattern field | RH | 500/700/850 | %
    Temperature | Synoptic-pattern field | U component of wind | 850 | m s−1
    Temperature | Synoptic-pattern field | V component of wind | 850 | m s−1
    Temperature | Synoptic-pattern field | Temperature | 500/700/850/925/950 | °C
    Temperature | Surface element field | 2-m temperature | – | °C
    Temperature | Surface element field | Maximum 2-m temperature | – | °C
    Temperature | Surface element field | Minimum 2-m temperature | – | °C
    Temperature | Terrain feature | Longitude, latitude | – | °
    Temperature | Terrain feature | Elevation, model terrain elevation | – | m
    Temperature | Terrain feature | Slope, aspect | – | °
    Temperature | Terrain feature | Terrain (slope/valley), direction of the mountain range | – | –
    Temperature | Temporal feature | Valid time, lead time | – | h
    Wind/gust speed | Synoptic-pattern field | 24-h temperature variation | 850 | °C
    Wind/gust speed | Synoptic-pattern field | U wind component | 500/700/850 | m s−1
    Wind/gust speed | Synoptic-pattern field | V wind component | 500/700/850 | m s−1
    Wind/gust speed | Surface element field | 10-/100-m U wind component | – | m s−1
    Wind/gust speed | Surface element field | 10-/100-m V wind component | – | m s−1
    Wind/gust speed | Surface element field | 10-m gust | – | m s−1
    Wind/gust speed | Terrain feature | Longitude, latitude | – | °
    Wind/gust speed | Terrain feature | Elevation, model terrain elevation | – | m
    Wind/gust speed | Terrain feature | Slope, aspect | – | °
    Wind/gust speed | Terrain feature | Terrain (slope/valley), direction of the mountain range | – | –
    Wind/gust speed | Temporal feature | Valid time, lead time | – | h

    The ML dataset in this study is constructed based on the predictors by feature engineering (Section 2.4) along with ground-truth observations as labels. ECMWF-HRES products are interpolated to each observation by using a bilinear interpolation algorithm, with their valid forecast times being matched with the observation times. The ML correction forecasts are initiated at 0800 and 2000 BT every day, consistent with ECMWF-HRES. Considering delays in the assimilation of the NWP model and data transmission, the forecast lead times are between 12 and 84 h, and the temporal resolution of ML data is the same as that of ECMWF-HRES. Lead times of 12–36, 36–60, and 60–84 h represent the forecasts for Days 1, 2, and 3, respectively.
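    As a minimal sketch of this step (not the operational code), a single ECMWF-HRES field can be bilinearly interpolated to the AWS locations with scipy; the function and argument names are illustrative assumptions, and the matching of valid forecast times to observation times is assumed to be done beforehand:

        import numpy as np
        from scipy.interpolate import RegularGridInterpolator

        def interp_to_stations(field2d, grid_lat, grid_lon, stn_lat, stn_lon):
            """Bilinear interpolation of one 2-D model field to station points.

            grid_lat and grid_lon must be 1-D, strictly ascending coordinate vectors;
            field2d has shape (len(grid_lat), len(grid_lon)).
            """
            interp = RegularGridInterpolator((grid_lat, grid_lon), field2d,
                                             method="linear")
            return interp(np.column_stack([stn_lat, stn_lon]))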

    Data from November through March of the following year, spanning 2018–2020 (excluding February 2020), are selected in this study to form a training dataset. Data from February 2020 are used as the verification dataset to identify the most accurate model for February, as the Beijing 2022 Winter Olympics were held in that month. Data during 3–20 February 2022 are used as the testing dataset to evaluate the practical forecast performance for the Beijing 2022 Winter Olympics. The sample sizes of the training, verification, and testing datasets are provided in Table 3.

    Table  3.  Sample sizes of the training, verification, and testing datasets
    Variable | Training dataset | Verification dataset | Testing dataset
    Temperature | 84,421 | 9328 | 7623
    Gust speed | 78,913 | 9138 | 7623
    Wind speed | 78,929 | 9186 | 7623

    Linear regression (LR) is a statistical method commonly used in post-processing due to its ability to efficiently handle linear relationships. LR models combine m features through the following equation: f\left({\boldsymbol{x}}\right)={\omega }_{1}{x}_{1}+{\omega }_{2}{x}_{2}+\cdots +{\omega }_{m}{x}_{m}+b, where {\boldsymbol{\omega }}=\left({\omega }_{1},{\omega }_{2},\cdots ,{\omega }_{m}\right) denotes the weight of each feature. However, LR models are susceptible to overfitting in the presence of a large number of sample features and a small sample size (Zhou, 2016). LASSO regression and ridge regression reduce the risk of model overfitting by introducing a regularization term. They represent more realistic regression methods at the expense of information loss and reduced accuracy (Men et al., 2019; Sun et al., 2019).

    A first-order norm is used as the regularization term in LASSO regression, with a loss function expressed as follows: \mathrm{loss}\left({\boldsymbol{\omega }}\right)={\left\|f\left({\boldsymbol{x}}\right)-y\right\|}^{2}+\alpha {\left\|{\boldsymbol{\omega }}\right\|}_{1}. A second-order norm is used as the regularization term in ridge regression, with a loss function expressed as follows: \mathrm{loss}\left({\boldsymbol{\omega }}\right)={\left\|f\left({\boldsymbol{x}}\right)-y\right\|}^{2}+\alpha {\left\|{\boldsymbol{\omega }}\right\|}_{2}^{2}. In both loss functions, y is the label. LR models are simply interpretable ML models, as their weighting coefficients can indicate the importance of the predictors. In this study, LASSO regression is used as one of the individual learners, while ridge regression is used as the meta learner. Therefore, m in LASSO regression is the number of features used to generate temperature and wind forecasts, whereas m in ridge regression is the number of individual models.
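    For clarity, the two loss functions can be written out directly; this is a minimal numpy sketch mirroring the equations above (the regularization strength alpha is tuned in practice and is not reported in the paper):

        import numpy as np

        def lasso_loss(w, X, y, b=0.0, alpha=1.0):
            """loss(w) = ||Xw + b - y||^2 + alpha * ||w||_1 (LASSO, first-order norm)."""
            r = X @ w + b - y
            return np.dot(r, r) + alpha * np.sum(np.abs(w))

        def ridge_loss(w, X, y, b=0.0, alpha=1.0):
            """loss(w) = ||Xw + b - y||^2 + alpha * ||w||_2^2 (ridge, second-order norm)."""
            r = X @ w + b - y
            return np.dot(r, r) + alpha * np.dot(w, w)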

    RF is a bagging-based ensemble-learning algorithm with DTs as individual learners, which randomly selects features during the DT training process. RF has low computational cost and exhibits good forecasting performance in model post-processing (Li et al., 2019; Men et al., 2019). In an RF regression problem, each individual learner is a regression DT that searches for the optimal splitting feature and node, such that

    \min\limits_{j,{a}_{j}}\left[\min\limits_{{c}_{1}}{\sum }_{x\in {R}_{1}\left(j,{a}_{j}\right)}{\left(y-{c}_{1}\right)}^{2}+\min\limits_{{c}_{2}}{\sum }_{x\in {R}_{2}\left(j,{a}_{j}\right)}{\left(y-{c}_{2}\right)}^{2}\right], (1)

    where {R}_{1}\left(j,{a}_{j}\right)=\left\{x|{x}_{j} < {a}_{j}\right\} and {R}_{2}\left(j,{a}_{j}\right)=\left\{x|{x}_{j}\geqslant {a}_{j}\right\}. The output {c}_{m}=E\left[y|x\in {R}_{m}\right], where m=1, 2. The above steps are repeated to obtain the result of a single regression DT. The final prediction offered by an RF is the average prediction given by all DTs. In this study, the number of trees associated with the RF models is selected as 100. To achieve good results, the maximum depth of each tree is not constrained and the minimum number of samples required to split an internal node is set to 2, because the sample sizes and numbers of features in the datasets are not excessively large.
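    A minimal scikit-learn configuration matching the settings stated above (100 trees, unrestricted depth, minimum of 2 samples to split an internal node) might look as follows; unreported options are left at their defaults:

        from sklearn.ensemble import RandomForestRegressor

        # Each tree is grown on a bootstrap sample with random feature subsets;
        # the RF forecast is the average of the individual tree predictions.
        rf = RandomForestRegressor(n_estimators=100, max_depth=None,
                                   min_samples_split=2)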

    Like RF, GBDT is an ensemble-learning algorithm of multiple DTs. However, a boosting strategy is employed to improve the performance through sequential weak learners. In GBDT, a gradient histogram of features is established to calculate the optimal splitting point. The tree nodes are split based on the optimal splitting point to start iterations (Jiang et al., 2019). GBDT creates an additive ensemble, where the kth DT is fitted to the residual from the first (k − 1) DTs (Lagerquist et al., 2017). This study utilizes LightGBM, a distributed GBDT algorithm developed by Microsoft (Ke et al., 2017). LightGBM offers several advantages over other GBDT algorithms, including high computational speed, low memory consumption, and good generalization ability. These benefits stem from the implementation of two innovative sampling algorithms—namely, Exclusive Feature Bundling (EFB) and Gradient-based One-Side Sampling (GOSS). EFB reduces the number of data features by bundling mutually exclusive features. GOSS eliminates most samples with small gradients (Liu et al., 2021). In this study, the maximum number of trees is selected as 100, and the maximum depth of each tree is not constrained.
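    The corresponding LightGBM configuration (at most 100 trees, no depth constraint) can be sketched as follows; the EFB and GOSS behaviour and all other options keep the library defaults:

        from lightgbm import LGBMRegressor

        # max_depth=-1 is LightGBM's convention for an unconstrained tree depth.
        gbdt = LGBMRegressor(n_estimators=100, max_depth=-1)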

    Although SVM is fundamentally a binary classification model, it can also be used to solve multi-class classification and regression problems. SVM maps the samples to a high-dimensional feature space by using a kernel function, enabling a linear learner to tackle nonlinear classification and regression problems. Moreover, SVM strives to determine the optimal solution within this feature space. There are numerous kernel functions for SVMs. In this study, the radial basis function is selected as the kernel to map the feature vectors to a high-dimensional space (Mercer et al., 2008; Han et al., 2017; Yang et al., 2018).
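    A minimal radial-basis-function SVM regressor is shown below; wrapping it in a standardization step is our addition rather than a detail stated in the paper, since kernel methods are sensitive to feature scales:

        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVR

        # The RBF kernel maps the (standardized) features to a high-dimensional space.
        svm = make_pipeline(StandardScaler(), SVR(kernel="rbf"))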

    ANNs consist of multiple layers of interconnected neurons, organized sequentially. Information flows in one direction, from the input layer to the hidden layers and then to the output layer. Each neuron in the network receives inputs from the previous layer and applies a weighted sum and an activation function to produce an output (Flora et al., 2024). This process allows the ANN to fit intricate patterns and relationships within the dataset. An ANN with multiple hidden layers constitutes a typical DL architecture (Men et al., 2019; Chen et al., 2020). In this study, we choose one hidden layer with 100 neurons and a maximum of 200 iterations. Models are fitted by using the Adam optimizer, with a learning rate of 0.001.
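    The stated ANN configuration (one hidden layer of 100 neurons, at most 200 iterations, Adam optimizer, learning rate of 0.001) corresponds to the following scikit-learn sketch:

        from sklearn.neural_network import MLPRegressor

        ann = MLPRegressor(hidden_layer_sizes=(100,), max_iter=200,
                           solver="adam", learning_rate_init=0.001)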

    Ensemble learning completes learning tasks by combining multiple individual learners, which leads to a higher generalization ability than individual models. Ensemble learning can be divided into two categories: homogeneous and heterogeneous. Homogeneous ensembles (e.g., RF and GBDT models) employ the same type of individual learners (e.g., DTs). Heterogeneous ensembles consist of different types of individual learners (e.g., DTs and ANNs) (Zhou, 2016). Hansen and Salamon (1990) noted that an ensemble learning model generally produces forecasts with higher accuracy than those produced by the best individual learner or the average of forecasts produced by all the individual learners. The stacking ensemble learning method, a heterogeneous ensemble method, is adopted in this study to combine multiple ML correction models. It consists of individual learners and a meta learner. Clarke (2003) reported that stacking has greater robustness than Bayesian model averaging.

    Ensemble diversity has a direct impact on the performance of an ensemble learning model. Hence, five individual learners of different types are selected in ENSL: a linear model (a LASSO regression model), two tree models using different ensemble strategies (an RF model and a GBDT model), a soft-margin-based model (an SVM model), and an ANN model. All five individual learners have different advantages, as detailed in the preceding subsections. For instance, LR handles linear relationships, RF delivers low computational cost and good performance, GBDT guarantees high computational speed and good generalization ability, SVM excels in fitting nonlinear patterns, and ANN captures complex patterns and relationships in datasets. This combination enables ENSL to capture a wider range of relationships within datasets and improve the overall predictive performance. To improve the generalization ability of the ensemble learning model, relatively simple models (e.g., LR and ANN models) are often used as the meta learner. A ridge regression model is used to combine the five models into an ensemble model in ENSL.

    ENSL comprises two stages of model training (Fig. 2). In Stage 1, the five individual ML models are trained to generate five forecasts. Five-fold cross-validation is used to avoid overfitting. Individual ML models are trained on four subsets, with the last one being kept for prediction. In Stage 2, an ensemble forecasting model is obtained through retraining with the forecasts generated by the five individual ML models in Stage 1 as the input. Leave-one-out cross-validation is used to find the optimal hyperparameters in this stage. ENSL is trained by using all observations with different initial times and lead times together in order to obtain more samples in the training dataset. Also, it is implemented by using the scikit-learn Python library (https://scikit-learn.org/).

    Fig  2.  The structure of ENSL.
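    A minimal sketch of this two-stage structure with scikit-learn's StackingRegressor is given below. It is an illustration rather than the authors' implementation: the base learners use the hyperparameters stated in the preceding sections, out-of-fold predictions from 5-fold cross-validation feed the ridge meta learner, and the leave-one-out tuning of the meta-learner hyperparameters mentioned above is not shown; unreported settings (e.g., the LASSO and ridge regularization strengths) are illustrative.

        from sklearn.ensemble import RandomForestRegressor, StackingRegressor
        from sklearn.linear_model import Lasso, Ridge
        from sklearn.neural_network import MLPRegressor
        from sklearn.svm import SVR
        from lightgbm import LGBMRegressor

        def build_ensl():
            base_learners = [
                ("lr", Lasso(alpha=0.1)),
                ("rf", RandomForestRegressor(n_estimators=100, max_depth=None,
                                             min_samples_split=2)),
                ("gbdt", LGBMRegressor(n_estimators=100, max_depth=-1)),
                ("svm", SVR(kernel="rbf")),
                ("ann", MLPRegressor(hidden_layer_sizes=(100,), max_iter=200,
                                     solver="adam", learning_rate_init=0.001)),
            ]
            # Stage 1: 5-fold out-of-fold predictions of the five base learners;
            # Stage 2: ridge regression combines the five prediction columns.
            return StackingRegressor(estimators=base_learners,
                                     final_estimator=Ridge(), cv=5)

        # Usage (X_train, y_train, X_test are assumed feature/label arrays):
        # ensl = build_ensl().fit(X_train, y_train)
        # y_hat = ensl.predict(X_test)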

    Forecasts of temperature, gust speed, and wind speed are regression problems. In this study, the root-mean-square error (RMSE), bias, correlation coefficient (R), and skill score (SS) [see Eqs. (2)–(5) for their calculation] are used as metrics to evaluate model forecasts. All four of these metrics are commonly used for evaluation in regression problems and can effectively evaluate the model performance:

    \mathrm{RMSE}=\sqrt{\frac{1}{N}{\sum }_{i=1}^{N}{\left({F}_{i}-{O}_{i}\right)}^{2}} , (2)
    \mathrm{Bias}=\frac{1}{N}{\sum }_{i=1}^{N}\left({F}_{i}-{O}_{i}\right), (3)
    R=\frac{{\sum }_{i=1}^{N}({F}_{i}-\overline{F})({O}_{i}-\overline{O})}{\sqrt{{\sum }_{i=1}^{N}{({F}_{i}-\overline{F})}^{2}}\sqrt{{\sum }_{i=1}^{N}{({O}_{i}-\overline{O})}^{2}}}, (4)
    \mathrm{SS}=\frac{{\mathrm{RMSE}}_{\mathrm{ECMWF}{\text{-}}\mathrm{HRES}}-{\mathrm{RMSE}}_{\mathrm{ML}}}{{\mathrm{RMSE}}_{\mathrm{ECMWF}{\text{-}}\mathrm{HRES}}}\times 100\%, (5)

    where N is the total number of samples, Fi is the forecast for sample i, Oi is the observation of sample i, and RMSEML and RMSEECMWF-HRES are the RMSEs of the ML models and ECMWF-HRES, respectively.
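    A direct numpy implementation of Eqs. (2)–(5) is sketched below; the function name and the dictionary output are our own conventions:

        import numpy as np

        def verification_metrics(f, o, rmse_ecmwf_hres=None):
            """RMSE, bias, and correlation coefficient of forecasts f against
            observations o; if the ECMWF-HRES RMSE is supplied, the skill score
            (SS, %) of Eq. (5) is returned as well."""
            f = np.asarray(f, dtype=float)
            o = np.asarray(o, dtype=float)
            err = f - o
            out = {"RMSE": np.sqrt(np.mean(err ** 2)),
                   "Bias": np.mean(err),
                   "R": np.corrcoef(f, o)[0, 1]}
            if rmse_ecmwf_hres is not None:
                out["SS"] = 100.0 * (rmse_ecmwf_hres - out["RMSE"]) / rmse_ecmwf_hres
            return out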

    Figure 3 compares the correction models (i.e., ENSL and individual models) and ECMWF-HRES in terms of their forecast performance for temperature. The correction models improve appreciably compared to ECMWF-HRES in terms of forecast accuracy. The performance of the correction models is directly correlated with the forecast performance of the NWP model. Out of the five individual models, the LR and GBDT models represent the poorest and best performers, respectively. Integrating the advantages of the individual models, ENSL is superior to the individual models in terms of correction performance. Overall, the individual models reduce the RMSE by 36.7%–47.8% relative to ECMWF-HRES, compared to 48.2% for ENSL.

    Fig  3.  Performances of temperature forecasts generated by ML correction models and ECMWF-HRES (x-axis denotes the forecast lead time, left- and right-hand y-axes indicate the RMSE and SS, respectively; histograms and curves show RMSEs and SS for the different ML correction models, respectively).

    An analysis of the model performance (in terms of skill score) reveals that the correction performance of the tree models (i.e., RF and GBDT) remains consistent for each lead time, whereas the LR model's performance significantly increases with the lead time. The other two nonlinear models (i.e., ANN and SVM) remain similar for the first 2 days, but decrease considerably for Day 3. ENSL emerges as the highest-performing model across all forecast time ranges, particularly on Day 1, exhibiting a remarkable 49.4% reduction in RMSE compared to ECMWF-HRES and exceeding the five individual models by 1.4–2.6 percentage points. The correction performance of the ENSL model is consistent with that of the GBDT model for Day 3. The analysis of the evolution with the lead time reveals that the forecasts generated by the ML models are similar to ECMWF-HRES in terms of trends, but have significantly lower RMSEs and amplitudes.

    One-day-ahead forecasts produced by each correction model or the NWP model demonstrate the highest accuracy. To evaluate the spatial and temporal performance of the correction models, these forecasts are selected to eliminate the effects of the forecast lead time. Figure 4 shows the performance of ECMWF-HRES and each correction model on Day 1. Evidently, ECMWF-HRES markedly overestimates the temperature, with a bias of 1.93°C (Fig. 4a). A better fit is found between the forecasts generated by each nonlinear correction model and the ground-truth observations, as evidenced by a slope close to 1 and an R of 0.96 (Figs. 4c–f). However, the fit is slightly worse for the linear model (Fig. 4b). The biases for the LR and RF models are −0.09 and −0.08°C, while the biases for the ANN, SVM, and GBDT models range from 0.1 to 0.23°C. However, the RMSEs (1.78 and 1.52°C, respectively) for LR and RF are higher than those for the ANN, SVM, and GBDT models (1.4–1.47°C). These results suggest that the LR and RF models are effective in reducing the systematic bias of the NWP model, but significantly increase the dispersion of forecast errors. Compared to the LR and RF models, the ANN, SVM, and GBDT models yield a lower randomness of forecast error, albeit with a slightly higher systematic bias. The effective combination of the advantages of individual models enables the ENSL model to produce the best forecasts, as reflected by a bias of 0.084°C and an RMSE of 1.37°C (Fig. 4g).

    Fig  4.  Scatterplots between forecasts and observations based on (a) ECMWF-HRES and correction models by using (b) LR, (c) RF, (d) SVM, (e) ANN, (f) GBDT, and (g) ENSL. The blue dotted line depicts y = x. The red solid line indicates the LR line for forecasts of ECMWF-HRES and correction models and observations.

    Figure 5 illustrates the diurnal variation of the forecast error for each correction model. As shown in the box plots, during the Winter Olympics (3–21 February), the largest range of temperatures (−28.8 to −5°C) occurs in the second half of the night at 0200 BT, and the smallest temperature span (−26.5 to −7.8°C) is apparent in the morning at 0800 BT. As shown in the curves, the forecast accuracy of ECMWF-HRES is the highest for 0800 BT (RMSE = 2.07°C) and the lowest for 1400 and 1700 BT (RMSE = 3.13°C). All the correction models perform similarly in terms of the diurnal variation of the forecast error, and are better during the daytime than at nighttime. The forecast accuracy of ENSL is the highest (RMSE = 1.09°C) at 1100 BT and the lowest (RMSE = 1.49°C) at 0500 BT. In addition, the performance of ENSL is evaluated separately for the stations on the slopes and in the valleys. On average, the RMSE of the stations on the slopes (1.2°C) is lower than that of the stations in the valleys (1.82°C). These results confirm that the performance of the correction models is related to the predictability. The cold air pools that occur mostly over the stations in the valleys at night increase the difficulty of forecasting at nighttime. The forecasts in cold air pools are further analyzed in Section 4.4.

    Fig  5.  Diurnal variation performance of each ML correction model and ECMWF-HRES. Box plots show temperature observations, in which triangles represent their mean values, and curves indicate the RMSE of forecasts.

    Figure 6a shows the gust-speed forecast performances of the correction models and ECMWF-HRES in different time periods. Evidently, each ML model improves substantially over the NWP model in terms of the forecast performance. As the lead time increases, the forecast error of ECMWF-HRES for gust speed decreases gradually, whereas the forecast error of each ML model increases slightly. The RMSE of ENSL is 1.87 m s−1, a reduction of 65.3% relative to ECMWF-HRES. Of the five individual models, the LR model delivers the poorest performance, with an RMSE of 2.5 m s−1 (a 53.7% reduction), whereas the ANN model performs the best, with an RMSE of 1.87 m s−1 (a 65.3% reduction). The forecast performance of the ENSL model is generally consistent with that of the ANN model. Nonlinear regression models are appreciably advantageous over the LR model in terms of their gust-speed forecasts.

    Fig  6.  As in Fig. 3, but for gust-speed forecasts of the (a) whole testing dataset and (b) extreme testing dataset (above 11 m s−1).

    Extremes are of greater concern in gust-speed forecasting. In the Winter Olympic Games, a gust speed of 11 m s−1 is the threshold for determining whether certain competitions can be held. Therefore, the forecast performance of the correction models for extreme gusts is analyzed based on the threshold of 11 m s−1. As shown in Figs. 6a, b, ECMWF-HRES is generally consistent in its forecast performance on the extreme testing dataset (RMSE = 5.2 m s−1) and the whole testing dataset (RMSE = 5.4 m s−1), whereas the forecast performance of the ML correction models differs significantly. The forecast error of the correction models is markedly higher for the extreme testing dataset than for the whole testing dataset, and it increases more markedly with increasing lead time. The RMSE of the forecasts produced by ENSL for extreme gusts is 2.98 m s−1, a reduction of 42.7% relative to ECMWF-HRES.

    Each correction model exhibits generally similar trends in forecast errors. This phenomenon is significantly related to the predictability of gusts. More extreme gust speeds and longer lead times degrade the gust predictability, thereby compromising the ability of correction models to improve the forecast accuracy. Nevertheless, the ML models still offer considerable improvements over the NWP model in terms of forecast performance. Of the five individual models, the LR model performs the poorest, with an RMSE of 3.37 m s−1 (35% reduction), whereas the SVM model performs best, with an RMSE of 2.93 m s−1 (43.6% reduction). Considering the extreme testing dataset, the RMSE of the ANN model, at 3.19 m s−1 (38.7% reduction), is higher than those of the other nonlinear models, whereas its performance is best for the whole testing dataset. This suggests that the ANN model, while effective for general gust prediction, might underestimate the extreme samples. The performance of the ENSL model is slightly poorer than that of the SVM model for the extreme testing dataset, but better for the whole testing dataset. In summary, ENSL achieves the best performance, along with better generalization.

    The box plots of 3-h gust speeds throughout each day (Fig. 7) reveal that gust speeds are highest at 1100–1700 BT, during which period extreme gust speeds are likely to occur. These are potentially associated with the rise in boundary layer height and the enhancement of thermal and turbulence effects. The RMSE for ECMWF-HRES is the highest at 1700 and lowest at 2000 BT. Except for the LR model, whose performance fluctuates differently throughout the day, both the nonlinear regression models and the ENSL model exhibit a consistent pattern: their RMSE is lowest at 1700 and highest at 0200 BT.

    Fig  7.  As in Fig. 5, but for gust-speed forecasts.

    Wind speed is less challenging to predict compared to temperature and gust speed. While ECMWF-HRES exhibits a satisfactory forecast performance (RMSE = 1.7 m s−1), the ML models perform well in correcting the forecast bias of the NWP model. As shown in Fig. 8, the forecast error of ECMWF-HRES gradually decreases with increasing lead time, whereas that of each ML model remains generally consistent, with a slight increase. In terms of forecast accuracy, the nonlinear models perform better than the LR model, among which the ENSL model exhibits the best performance with an RMSE of 1.22 m s−1 (a 28.5% reduction).

    Fig  8.  As in Fig. 3, but for wind-speed forecasts.

    The ML models and ECMWF-HRES show some similarities in their diurnal variations of forecast error (Fig. 9). The forecast accuracy of ECMWF-HRES is the highest at 0800 and 2000 BT. The correction performance of the ML models is likewise the best for these two time points. However, at 1400 BT, when peak daily wind speeds occur, the ECMWF-HRES forecast exhibits the highest error, with an RMSE of 2.1 m s−1. The ML models exhibit high correction performance at 1400 BT, among which the ENSL model performs the best with an RMSE of 1.2 m s−1. In summary, ECMWF-HRES performs well in forecasting the wind speed, and each ML model likewise performs markedly well in correcting the forecast bias of ECMWF-HRES.

    Fig  9.  As in Fig. 5, but for wind-speed forecasts.

    Cold air pools generally occur in valleys over mountainous regions on calm and clear nights under the control of a weak high-pressure ridge. They are characterized by a mid-level warming air mass and a low wind speed in the boundary layer. In the feature engineering process, the whole-layer RH, the temperature variation, and wind are used to characterize the synoptic patterns under which cold air pools occur. The combination of these predictors and terrain features is effective in predicting cold air pools. During a typical cold air pool event, the minimum temperature generally occurs before sunrise. Daily temperature forecasts at 0500 BT from 3 to 21 February, generated by ENSL and ECMWF-HRES, are compared with observations at G1 (2011 m), CCS2 (1687.5 m), and CCS3 (1622.8 m) (Fig. 10). The intensity of the cold air pool is defined as the temperature at or near the top of the cold air pool minus the temperature at the bottom of the cold air pool (Kelsey et al., 2019). We calculate the temperature differences among G1, CCS2, and CCS3 to represent the intensity of cold air pools.

    Fig  10.  Daily temperatures (a) observed at three stations, (b) forecasted by the ENSL model, and (c) forecasted by ECMWF-HRES at 0500 BT 3–21 February 2022. Comparison of 3-h temperatures (d) observed at stations, (e) forecasted by ENSL, and (f) forecasted by ECMWF-HRES during the cold-air-pool events between 9 and 10 February.

    As illustrated in Fig. 10a, cold air pool events occurred during 7–10 February. During these four days, the daily temperatures of the station located on the mountain top (G1) are higher than those of the station located at the bottom of the valley (CCS3), by 7, 2.5, 6, and 4°C, respectively. The forecasts of the ENSL model reveal daily temperature differences of 4, 2, 1, and 3°C between G1 and CCS3 (Fig. 10b). This indicates that the ENSL model is generally effective in forecasting the development trend of the cold air pool events, whereas it slightly underestimates their intensity.

    Further analysis of the 3-h observations and forecasts corresponding to the cold air pools between 9 and 10 February is conducted (Figs. 10d, e). The highest cold air pool intensity on 9 February occurs in the second half of the night, whereas ENSL forecasts it in the first half of the night. The occurrence time of the cold air pool on 10 February is accurately predicted. This reveals that the underestimation can be primarily attributed to complex factors affecting the intensity and occurrence time of cold air pools. A cold air pool may reach its highest intensity in the first or second half of the night. The NWP model employed for feature engineering rarely reflects these fine characteristics. Moreover, the temperatures in the valley during cold air pools are significantly lower than during non-cold-air-pool situations, which likewise causes the underestimation of intensity.

    Overall, the ENSL model is effective in forecasting the different temperature patterns during cold-air-pool and non-cold-air-pool events. Considering cold-air-pool events, the forecast performance of ENSL is considerably better than for non-cold-air-pool events (e.g., the event between 19 and 21 February; Figs. 10a, b).

    ML models are considered “black boxes”, as forecasters cannot understand their internal workings. To build human forecasters’ trust in ML predictions, it is important to explain the “why” of an ML model prediction (Linardatos et al., 2021; Silva et al., 2022; Flora et al., 2024). In this section, the interpretability of ENSL is discussed in terms of the physical meaning of its predictors and the weighting coefficients of the individual models in the ridge regression.

    The predictors of ENSL established by feature engineering are divided into synoptic-pattern fields, surface element fields, terrain features, and temporal features (Table 2). To examine the feature importance of each category to ENSL, the model is retrained after removing the features of each category in turn. A larger decrease in performance suggests that the corresponding category of features has a greater influence on the model. After separately removing the terrain, surface element field, synoptic-pattern field, and temporal features, the RMSE of ENSL increases by 21.23%, 19.02%, 14.46%, and 1.23%, respectively, relative to the original ENSL with all features (Table 4). The terrain features are the most crucial category. Surface element fields are the direct temperature forecasts given by the NWP model and represent the second most important features. Temporal features demonstrate the least influence among all categories.

    Table  4.  RMSE change rate after removing each category of features relative to the original ENSL model
    Variable | Excluding surface element fields | Excluding synoptic-pattern fields | Excluding terrain features | Excluding temporal features
    Temperature | 19.02% | 14.46% | 21.23% | 1.23%
    Gust speed | 5.38% | 0.52% | 4.18% | 3.50%
    Wind speed | 2.43% | 2.85% | −0.50% | 2.59%
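    The ablation procedure behind Table 4 can be sketched as follows; build_model stands for a factory returning a fresh ENSL instance, and the feature matrices are assumed to be pandas DataFrames whose column lists per category are known:

        import numpy as np

        def rmse(f, o):
            return float(np.sqrt(np.mean((np.asarray(f) - np.asarray(o)) ** 2)))

        def category_ablation(build_model, X_train, y_train, X_test, y_test, categories):
            """Relative RMSE change (%) after excluding each predictor category.

            categories maps a category name to the list of its column names.
            """
            full = build_model().fit(X_train, y_train)
            base = rmse(full.predict(X_test), y_test)
            changes = {}
            for name, cols in categories.items():
                reduced = build_model().fit(X_train.drop(columns=cols), y_train)
                r = rmse(reduced.predict(X_test.drop(columns=cols)), y_test)
                changes[name] = 100.0 * (r - base) / base
            return changes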

    The performance of ENSL before and after removing each category of features is analyzed to discuss the physical mechanisms involved in each case. Different categories of features vary considerably in terms of their impacts on the forecast performance of the correction model. As shown in Figs. 10b, 11b, in the absence of terrain features, the forecasts for all stations are generally consistent, suggesting that the terrain features downscale the NWP model in ENSL and effectively describe the complex underlying surface conditions around the stations. Comparison of the forecasts produced by ENSL with and without the synoptic-pattern fields (Figs. 10b, 11a) reveals that the synoptic-pattern fields effectively describe the different synoptic patterns during cold-air-pool and non-cold-air-pool events. For non-cold-air-pool events (e.g., the events between 20 and 21 February), the forecasts for G1, CCS2, and CCS3 are almost the same, and the correction model fails to accurately forecast the temperatures in the Genting and Guyangshu competition zones, which vary due to different elevations. For cold-air-pool events, the ENSL model is somewhat effective in forecasting strong cold air pools (e.g., the event that occurred between 7 and 10 February). However, including the synoptic-pattern fields improves the fit between the observed and forecast temperatures in terms of their spatial distribution. Meanwhile, removing the synoptic-pattern fields generally renders the model ineffective in forecasting weak cold air pools (e.g., the events that occurred on 15 and 17 February).

    Fig  11.  Daily forecasted temperatures at 0500 BT by the ENSL model after removal of the (a) synoptic-pattern fields (Ts) and (b) terrain features (Tt).

    The feature importance of the different categories for gust speed and wind speed forecasts is relatively similar (Table 4), with the removal of each category changing the RMSE by less than 0.1 m s−1. The surface element field features are the most important predictors for gust speed forecasts and the synoptic-pattern field features contribute the least, while the two categories contribute similarly for wind speed. This can be attributed to the high altitudes of the stations in the Zhangjiakou competition zone, where the surface wind speeds forecast by ECMWF-HRES are close to those at the lower pressure levels. Nevertheless, the gust speed forecast in the surface element fields is an important feature in the ENSL gust forecasts.

    Terrain features play a key role in gust forecasts but minimally impact wind predictions, potentially due to their different statistical patterns. In the Zhangjiakou competition zone, wind speed variations between stations are relatively small, but gust speeds vary significantly, especially during high wind events. Temporal features are essential for both gust and wind forecasts, as they exhibit strong diurnal variability (Li et al., 2022; Fang et al., 2023). It is important to note that the analysis of feature importance is based on a testing dataset from the Beijing 2022 Winter Olympic Games. While the findings align with established physical principles for each feature category, some degree of randomness may be present due to the limited data.

    The weighting coefficients of LR can reveal the relative importance of predictors within the model (Jergensen et al., 2020; Linardatos et al., 2021). To assess the importance of individual models, the weighting coefficients of the meta learner (ridge regression) are analyzed. As Table 5 illustrates, the GBDT model dominates the temperature prediction with the highest coefficient, while LR and SVM contribute less. RF and GBDT lead for gust and wind forecasts, with LR having a weaker influence.

    Table  5.  Coefficients of each individual learner in ridge regression
    Variable | LR | RF | GBDT | ANN | SVM
    Temperature | 0.060 | 0.184 | 0.471 | 0.290 | 0.002
    Gust speed | −0.075 | 0.377 | 0.312 | 0.194 | 0.176
    Wind speed | −0.050 | 0.388 | 0.347 | 0.168 | 0.130

    The weighting coefficients of the individual models align with the performance analysis in Section 4, but are not completely consistent with it. Among the individual learners, the nonlinear models show significantly better prediction performance than the linear model. This demonstrates that the ensemble of models is effective, and that ENSL improves the performance by offsetting errors of opposite sign among the forecasts of the individual models (Fritsch et al., 2000), rather than by selecting the best one. The performance analysis likewise confirms the better generalization of ENSL.
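    With the scikit-learn stacking sketch given earlier, these weights can be read directly from the fitted meta learner; the helper below assumes the final estimator is a ridge regression:

        from sklearn.ensemble import StackingRegressor

        def meta_coefficients(ensl: StackingRegressor) -> dict:
            """Map each base-learner name to its ridge weight in a fitted
            StackingRegressor (e.g., the build_ensl() sketch after fitting)."""
            names = [name for name, _ in ensl.estimators]
            return dict(zip(names, ensl.final_estimator_.coef_))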

    An ensemble-learning correction model, named ENSL, is proposed in this study to yield the temperature, gust-speed, and wind-speed forecasts for the Winter Olympic venues in the Zhangjiakou competition zone based on ECMWF-HRES. Specifically, five individual models (LASSO regression, RF, GBDT, SVM, and ANN) are combined into an ensemble model with a ridge regression model as the meta learner. By effectively integrating the advantages of the five individual models, the ENSL model performs satisfactorily in correcting the bias of ECMWF-HRES forecasts for temperature, gust speed, and wind speed. The conclusions of this study can be summarized as follows:

    (1) The ENSL model produces satisfactory temperature and wind forecasts, with RMSEs of 1.63°C, 1.87 m s−1, and 1.22 m s−1 for the temperature, gust-speed, and wind-speed forecasts, respectively, constituting reductions of 48.2%, 65.3%, and 28.5% relative to ECMWF-HRES.

    (2) Of the five individual models, the nonlinear models are superior to the linear model in terms of correction performance. As temperature and wind speed are continuous variables, GBDT demonstrates greater applicability owing to its additive ensemble, which fits the forecast errors gradually. For gust speed, the ANN model performs the best on the whole testing dataset, while its performance falters when forecasting extreme samples (above 11 m s−1). The SVM model, known for its computational approach to finding optimal hyperplanes, demonstrates superior forecasting accuracy for extreme samples.

    (3) The ENSL model exhibits better performance and generalization than the individual models. For temperature and wind speed, the ENSL model outperforms all individual models in correcting the forecast bias of the NWP model. In terms of gust-speed forecasting, the performance of ENSL is consistent with ANN (best individual model) for the whole testing dataset, but performs better on extreme samples.

    (4) The ENSL model achieves good forecasting performance for cold-air-pool events that frequently occur in the Zhangjiakou competition zone. It is generally effective in forecasting the development trend of the cold-air-pool events, but slightly underestimates their intensity.

    (5) Predictors established by feature engineering are divided into four categories, including synoptic-pattern-field, surface-element-field, terrain, and temporal features. Experimental results show that the ENSL model is effective in fitting physical explanations of features in each category, suggesting that ENSL is an interpretable model.

    Future research should expand the ENSL correction model to larger regions, such as the whole of Hebei Province, to evaluate its performance. Additionally, incorporating regional NWP models as predictors for ENSL would facilitate multi-scale meteorological information fusion and potentially enhance its performance.


  • Allard, D., A. Comunian, and P. Renard, 2012: Probability aggregation methods in geoscience. Math. Geosci., 44, 545–581, doi: 10.1007/s11004-012-9396-3.
    Baran, S., and S. Lerch, 2016: Mixture EMOS model for calibrating ensemble forecasts of wind speed. Environmetrics, 27, 116–130, doi: 10.1002/env.2380.
    Bonavita, M., R. Arcucci, A. Carrassi, et al., 2021: Machine learning for Earth system observation and prediction. Bull. Amer. Meteor. Soc., 102, E710–E716, doi: 10.1175/BAMS-D-20-0307.1.
    Cesa-Bianchi, N., and G. Lugosi, 2006: Prediction, Learning, and Games. Cambridge University Press, Cambridge, 394 pp, doi: 10.1017/CBO9780511546921.
    Chen, M. X., J. N. Quan, S. G. Miao, et al., 2018: Enhanced weather research and forecasting in support of the Beijing 2022 Winter Olympic and Paralympic Games. WMO Bull., 67, 58–61.
    Chen, M. X., Z. Y. Fu, F. Liang, et al., 2021: A review of SMART2022-FDP progress. Adv. Meteor. Sci. Technol., 11, 8–13, doi: 10.3969/j.issn.2095-1973.2021.06.002. (in Chinese)
    Chen, Y. W., X. M. Huang, Y. Li, et al., 2020: Ensemble learning for bias correction of station temperature forecast based on ECMWF products. J. Appl. Meteor. Sci., 31, 494–503, doi: 10.11898/1001-7313.20200411. (in Chinese)
    Cho, D., C. Yoo, J. Im, et al., 2020: Comparative assessment of various machine learning-based bias correction methods for numerical weather prediction model forecasts of extreme air temperatures in urban areas. Earth Space Sci., 7, e2019EA000740, doi: 10.1029/2019EA000740.
    Cho, D., C. Yoo, B. Son, et al., 2022: A novel ensemble learning for post-processing of NWP Model’s next-day maximum air temperature forecast in summer using deep learning and statistical approaches. Wea. Climate Extrem., 35, 100410, doi: 10.1016/j.wace.2022.100410.
    Clarke, B., 2003: Comparing Bayes model averaging and stacking when model approximation error cannot be ignored. J. Mach. Learn. Res., 4, 683–712.
    Fang, Y., Y. F. Wu, F. M. Wu, et al., 2023: Short-term wind speed forecasting bias correction in the Hangzhou area of China based on a machine learning model. Atmos. Oceanic Sci. Lett., 16, 100339, doi: 10.1016/j.aosl.2023.100339.
    Flora, M. L., C. K. Potvin, A. McGovern, et al., 2024: A machine learning explainability tutorial for atmospheric sciences. Artif. Intell. Earth Syst., 3, e230018, doi: 10.1175/AIES-D-23-0018.1.
    Florinsky, I. V., 2017: An illustrated introduction to general geomorphometry. Prog. Phys. Geogr. Earth Environ., 41, 723–752, doi: 10.1177/0309133317733667.
    Fritsch, J. M., J. Hilliker, J. Ross, et al., 2000: Model consensus. Wea. Forecasting, 15, 571–582, doi: 10.1175/1520-0434(2000)015<0571:MC>2.0.CO;2.
    Fu, X. M., H. Y. Liu, T. T. Li, et al., 2021: Analysis of mountain-valley breeze on Biathlon venue of 2022 Beijing Winter Olympics. Desert Oasis Meteor., 15, 24–29. (in Chinese)
    Gneiting, T., and R. Ranjan, 2013: Combining predictive distributions. Electron. J. Statist., 7, 1747–1782, doi: 10.1214/13-EJS823.
    Goutham, N., B. Alonzo, A. Dupré, et al., 2021: Using machine-learning methods to improve surface wind speed from the outputs of a numerical weather prediction model. Bound.-Layer Meteor., 179, 133–161, doi: 10.1007/s10546-020-00586-x.
    Han, L., J. Z. Sun, W. Zhang, et al., 2017: A machine learning nowcasting method based on real-time reanalysis data. J. Geophys. Res. Atmos., 122, 4038–4051, doi: 10.1002/2016JD025783.
    Hansen, L. K., and P. Salamon, 1990: Neural network ensembles. IEEE Trans. Pattern Anal. Mach. Intell., 12, 993–1001, doi: 10.1109/34.58871.
    Jergensen, G. E., A. McGovern, R. Lagerquist, et al., 2020: Classifying convective storms using machine learning. Wea. Forecasting, 35, 537–559, doi: 10.1175/WAF-D-19-0170.1.
    Jia, C. H., J. J. Dou, S. G. Miao, et al., 2019: Analysis of characteristics of mountain-valley winds in the complex terrain area over Yanqing–Zhangjiakou in the winter. Acta Meteor. Sinica, 77, 475–488, doi: 10.11676/qxxb2019.033. (in Chinese)
    Jiang, J. W., F. C. Fu, Y. X. Shao, et al., 2019: Distributed gradient boosting decision tree algorithm for high-dimensional and multi-classification problems. J. Software, 30, 784–798, doi: 10.13328/j.cnki.jos.005690. (in Chinese)
    Ke, G. L., Q. Meng, T. Finley, et al., 2017: LightGBM: A highly efficient gradient boosting decision tree. Proceedings of the 31st International Conference on Neural Information Processing Systems, Curran Associates Inc., Long Beach, USA, 3149−3157.
    Kelsey, E. P., M. D. Cann, K. M. Lupo, et al., 2019: Synoptic to microscale processes affecting the evolution of a cold-air pool in a northern New England forested mountain valley. J. Appl. Meteor. Climatol., 58, 1309–1324, doi: 10.1175/JAMC-D-17-0329.1.
    Lagerquist, R., A. McGovern, and T. Smith, 2017: Machine learning for real-time prediction of damaging straight-line convective wind. Wea. Forecasting, 32, 2175–2193, doi: 10.1175/WAF-D-17-0038.1.
    Li, H. C., C. Yu, J. J. Xia, et al., 2019: A model output machine learning method for grid temperature forecasts in the Beijing area. Adv. Atmos. Sci., 36, 1156–1170, doi: 10.1007/s00376-019-9023-z.
    Li, J. R., J. L. Fu, Y. W. Tao, et al., 2022: Temperature and wind characteristic analysis in Zhangjiakou Olympic area for the Winter Olympic Games. Meteor. Mon., 48, 149–161, doi: 10.7519/j.issn.1000-0526.2022.010301. (in Chinese)
    Linardatos, P., V. Papastefanopoulos, and S. Kotsiantis, 2021: Explainable AI: A review of machine learning interpretability methods. Entropy, 23, 18, doi: 10.3390/e23010018.
    Liu, H. Y., Y. H. Duan, T. T. Li, et al., 2020: Observation analysis on cold air lake structure in the biathlon venue for Beijing 2022 Winter Olympic Games. J. Arid Meteor., 38, 929–936, doi: 10.11755/j.issn.1006-7639(2020)-06-0929. (in Chinese)
    Liu, X. W., W. B. Huang, Y. S. Jiang, et al., 2021: Study of the classified identification of the strong convective weathers based on the LightGBM Algorithm. Plateau Meteor., 40, 909–918. (in Chinese)
    Mallet, V., G. Stoltz, and B. Mauricette, 2009: Ozone ensemble forecast with machine learning algorithms. J. Geophys. Res. Atmos., 114, D05307, doi: 10.1029/2008JD009978.
    Mass, C. F., J. Baars, G. Wedam, et al., 2008: Removal of systematic model bias on a model grid. Wea. Forecasting, 23, 438–459, doi: 10.1175/2007WAF2006117.1.
    Men, X. L., R. L. Jiao, D. Wang, et al., 2019: A temperature correction method for multi-model ensemble forecast in North China based on machine learning. Climatic Environ. Res., 24, 116–124, doi: 10.3878/j.issn.1006-9585.2018.18049. (in Chinese)
    Mercer, A. E., M. B. Richman, H. B. Bluestein, et al., 2008: Statistical modeling of downslope windstorms in Boulder, Colorado. Wea. Forecasting, 23, 1176–1194, doi: 10.1175/2008WAF2007067.1.
    Reeves, H. D., and D. J. Stensrud, 2009: Synoptic-scale flow and valley cold pool evolution in the western United States. Wea. Forecasting, 24, 1625–1643, doi: 10.1175/2009WAF2222234.1.
    Ren, P., M. X. Chen, W. H. Cao, et al., 2020: Error analysis and correction of short-term numerical weather prediction under complex terrain based on machine learning. Acta Meteor. Sinica, 78, 1002–1020, doi: 10.11676/qxxb2020.060. (in Chinese)
    Silva, S. J., C. A. Keller, and J. Hardin, 2022: Using an explainable machine learning approach to characterize Earth System model errors: Application of SHAP analysis to modeling lightning flash occurrence. J. Adv. Model. Earth Syst., 14, e2021MS002881, doi: 10.1029/2021MS002881.
    Smith, S. A., A. R. Brown, S. B. Vosper, et al., 2010: Observations and simulations of cold air pooling in valleys. Bound.-Layer Meteor., 134, 85–108, doi: 10.1007/s10546-009-9436-9.
    Sun, Q. D., R. L. Jiao, J. J. Xia, et al., 2019: Adjusting wind speed prediction of numerical weather forecast model based on machine learning methods. Meteor. Mon., 45, 426–436. (in Chinese)
    Wang, Y. F., G. P. Li, Z. M. Wang, et al., 2022: Numerical simulation of the formation and dissipation of a cold air pool in the Chongli Winter Olympic Games Area. Chinese J. Atmos. Sci., 46, 206–224. (in Chinese)
    Woodcock, F., and C. Engel, 2005: Operational consensus forecasts. Wea. Forecasting, 20, 101–111, doi: 10.1175/WAF-831.1.
    Xia, J. J., H. C. Li, Y. Y. Kang, et al., 2020: Machine learning-based weather support for the 2022 Winter Olympics. Adv. Atmos. Sci., 37, 927–932, doi: 10.1007/s00376-020-0043-5.
    Yang, L., F. Han, M. X. Chen, et al., 2018: Thunderstorm gale identification method based on support vector machine. J. Appl. Meteor. Sci., 29, 680–689, doi: 10.11898/1001-7313.20180604. (in Chinese)
    Zamo, M., L. Bel, and O. Mestre, 2021: Sequential aggregation of probabilistic forecasts—Application to wind speed ensemble forecasts. J. Roy. Stat. Soc. Ser. C: Appl. Stat., 70, 202–225, doi: 10.1111/rssc.12455.
    Zhang, Y. B., M. X. Chen, L. Han, et al., 2022: Multi-element deep learning fusion correction method for numerical weather prediction. Acta Meteor. Sinica, 80, 153–167, doi: 10.11676/qxxb2021.066. (in Chinese)
    Zhang, Y. H., and A. Z. Ye, 2021: Machine learning for precipitation forecasts postprocessing: Multimodel comparison and experimental investigation. J. Hydrometeorol., 22, 3065–3085, doi: 10.1175/jhm-d-21-0096.1.
    Zhou, C. S., H. C. Li, C. Yu, et al., 2022: A station-data-based model residual machine learning method for fine-grained meteorological grid prediction. Appl. Math. Mech., 43, 155–166, doi: 10.1007/s10483-022-2822-9.
    Zhou, Z. H., 2016: Machine Learning. Tsinghua University Press, Beijing, 425 pp. (in Chinese)