The probabilistic forecasting of wind gusts poses a significant challenge in the post-processing of numerical model outputs. Comparative analysis of probabilistic forecasting methods plays a crucial role in enhancing forecast accuracy. Within the context of meteorological services for alpine skiing at the 2022 Beijing Winter Olympics, the ECMWF ensemble products were used to evaluate six post-processing methods: ensemble model output statistics (EMOS), backpropagation neural networks (BP), the particle swarm optimization algorithm combined with backpropagation neural networks (PSO), truncated normal distributions, truncated logistic distributions, and generalized extreme value (GEV) distributions. The performance of these methods in predicting gust probabilities at five observation points along a ski track was compared. All six methods exhibited a substantial reduction in forecast errors compared with the original ECMWF products; however, their ability to correct the model forecasts varied significantly across wind speed ranges. Specifically, the EMOS, truncated normal, truncated logistic, and GEV distributions demonstrated advantages in low wind-speed ranges, whereas the BP and PSO methods exhibited lower forecast errors for high wind-speed events. Furthermore, this study affirms the rationality of using the statistical characteristics derived from ensemble forecasts as probabilistic forecast factors. Probability integral transform (PIT) and quantile–quantile (QQ) plots demonstrate that gust variations at the majority of observation sites conform to the GEV distribution, indicating the potential for further enhancement of forecast accuracy. The results also underscore the utility of the PSO hybrid model, which combines particle swarm optimization with a BP neural network, in the probabilistic forecasting of strong winds.
The 24th Winter Olympic Games took place in Beijing during 4–20 February 2022. Among the various sporting events, alpine skiing was one of the most captivating and closely watched disciplines, and it is highly demanding in terms of meteorological conditions. The event was hosted in a narrow, elongated competition area constructed on Haituo Mountain (Yanqing District, Beijing), with a remarkable vertical drop of 900 m and a slope length exceeding 3000 m. Meteorological conditions play a paramount role in determining the success and safety of alpine ski competitions; gusty winds (the maximum 3-s instantaneous wind within a 6-h period), in particular, are a critical factor that determines whether events can be conducted smoothly and directly affects athletes’ performance and wellbeing. Because the observation points along each track exhibit substantial variations in meteorological elements, attaining enhanced precision in gust forecasting has become the primary focus of the meteorological services for this sport.
Errors in numerical weather prediction (NWP) come from several sources (Vannitsem et al., 2021), including the initial conditions (Vannitsem, 2017), boundary conditions (Collins and Allen, 2002), and model structure (Lorenz, 1982). The NWP errors induced by these deficiencies grow rapidly with lead time. In recent decades, improvements in initial conditions and model structure have substantially advanced the development of NWP (Magnusson and Källén, 2013); despite these developments, significant errors remain (Bauer et al., 2015). Statistical post-processing methods are therefore used as an integral part of operational weather forecasting to improve forecast quality.
Prominent methods of statistical post-processing are EMOS (Gneiting et al., 2005) and Bayesian model averaging (BMA; Raftery et al., 2005). Probabilistic forecasts take the form of probability density functions (PDF) for EMOS techniques and a weighted average of PDFs for BMA techniques. Based on machine learning (ML) tools, many of the post-processing methods were proposed to calibrate numerical weather forecasts (Haupt et al., 2021; Vannitsem et al., 2021). Quantile regression methods have been proposed and extended in several ways (Bremnes, 2004; Wahl, 2015; Velthoen et al., 2019). In combination with random forests, quantile regression forests (QRF) have been applied to weather forecasting (Taillardat et al., 2016; Whan and Schmeits, 2018).
In the field of wind forecasting, neural networks, as an essential part of ML techniques, have provided marked improvements (Haupt et al., 2021). McCann (1992) demonstrated that the neural network method provided significant skill in thunderstorm forecasting. A neural network was designed for tornado prediction, and probabilities were obtained from the network outputs (Marzban and Stumpf, 1996); an appropriate neural network structure was subsequently devised for tornado prediction and the PDF of the outputs was assessed (Marzban and Stumpf, 1998). Neural networks were used for post-processing as distributional regression networks (DRN; Rasp and Lerch, 2018). Bremnes (2020) proposed the use of a highly adaptable neural network for probabilistic distributions based on quantile functions. Based on a neural network and convolutional neural network framework, a post-processing approach was developed to improve forecasting skill (Scheuerer et al., 2020). Veldkamp et al. (2021) applied convolutional neural networks to probabilistic forecasting.
Although these methods are used to correct wind speed forecasts of the NWP, applications to wind gust speeds remain scarce (Schulz and Lerch, 2022). Pantillon et al. (2018) investigated potential improvements in the predictability of wind gusts through statistical post-processing based on a truncated logistic distribution. Schulz and Lerch (2022) compared eight post-processing methods for probabilistic wind gust forecasting and concluded that these methods can improve the forecast skill of raw high-resolution ensemble prediction products.
When providing gust forecasting services, two key aspects need to be analyzed. The first is the selection of a forecasting model, which involves comparing various modeling methods to determine appropriate post-processing techniques for numerical models. A numerical forecast product with 100-km horizontal resolution is a reasonable choice because it primarily captures large-scale weather processes while filtering out local effects; combining the modeling techniques with observational data allows a more accurate analysis of the strengths and weaknesses of the different methods. The second is the advantage of a hybrid model [combining the particle swarm optimization (PSO) algorithm with neural networks (NNs)] in improving the accuracy of strong-wind forecasts. In the context of meteorological services for alpine skiing, accurate forecasting of strong gusts is crucial to ensure smooth competition, and the hybrid PSO–NN model demonstrates an advantage in forecasting strong winds. Therefore, selecting an appropriate forecasting model and leveraging the advantages of a hybrid model can enhance the accuracy of strong-wind predictions, thereby supporting the successful execution of alpine skiing competitions.
2. Data
Wind velocity forecasts at 850 hPa at 12-h intervals were used from the ensemble prediction system (EPS), which has a 100-km horizontal resolution and 51 members and is issued at 0800 BT (Beijing Time). In operational forecast practice, the EPS product lags by 24 h; for example, a 12-h post-processing forecast issued today uses the 36-h EPS product from the previous day. Because the EPS was received at 12-h intervals (12-, 24-, 36-, and 48-h lead times), the wind gust predictors at 6-h intervals (6-, 18-, 30-, and 42-h lead times) were calculated by bilinear interpolation from the EPS product.
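As an illustration of the temporal interpolation step, the sketch below (with hypothetical wind values for one ensemble member) linearly interpolates the 12-h EPS lead times to the intermediate 6-h lead times; the actual operational procedure may differ in detail.

```python
import numpy as np

# Hypothetical 12-h EPS lead times (h) and wind speeds (m/s) for one member
eps_leads = np.array([12.0, 24.0, 36.0, 48.0])
eps_wind = np.array([6.2, 8.1, 7.4, 9.0])

# Interpolate to the intermediate 6-h lead times used for gust post-processing
target_leads = np.array([18.0, 30.0, 42.0])
wind_6h = np.interp(target_leads, eps_leads, eps_wind)
```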
Five observation stations were established on Xiao Haituo Mountain in Yanqing District, Beijing, for alpine skiing during the 2022 Winter Olympics. Table 1 indicates a small horizontal distance but a large vertical drop among the five stations: above 2000 m for station identifier (ID) 1701 and below 1000 m for station ID 1489. Figure 1 shows that the 100-km horizontal model resolution is so coarse that the direct model output (DMO) calculated by interpolation for the five stations takes similar values.
Fig. 1. Spatial distributions of grid points (black squares) and five stations (black stars). (a) Grid point (station) locations and topographic map and (b) enlarged map of station locations. Terrain height [m MSL (mean sea level)] is indicated by color shading.
The training set consists of 250 observation–DMO pairs from 15 January to 15 March 2018, 2019, 2020, and 2021. Several models based on the training set are used for wind gust forecasts at 8 lead times (6-, 12-, 18-, 24-, 30-, 36-, 42-, and 48-h probabilistic forecasts).
3. Post-processing methods
Six predictors are used as input variables in post-processing: ensemble mean (x1), maximum (x2), minimum (x3), 25% fractile (x4), 50% fractile (x5), and 70% fractile (x6). The six modeling methods with distributional assumptions are as follows:
The multiple linear regression method is used to estimate the magnitude (y) of wind gusts at 6-h intervals:
y = w_0 + w_1 x_1 + \cdots + w_6 x_6 + \varepsilon,
(1)
where w_0, w_1, \cdots, w_6 are coefficients, and \varepsilon is an error term whose estimated expectation is zero.
{\rm Var}(\varepsilon) = c + d s^2.
(2)
A linear function is applied for the error variance prediction, where {\rm Var}(\varepsilon) is the estimated error variance, c and d are nonnegative coefficients, and s^2 is the ensemble spread, with s^2 = \dfrac{1}{6}\sum_{i=1}^{6}(x_i - \bar{x})^2 and \bar{x} = \dfrac{1}{6}\sum_{i=1}^{6} x_i.
3.1 Nonhomogeneous normal regression (NM)
Using the normal distribution for wind gust speed forecasting and combining Eqs. (1) and (2) yields Eq. (3) (Gneiting et al., 2005) as follows:
N(w_0 + w_1 x_1 + \cdots + w_6 x_6,\; c + d s^2).
(3)
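Eq. (3) can be fitted by minimizing the mean CRPS over training pairs, as described in Section 3.7. The following sketch, using synthetic data and a single ensemble-mean predictor for brevity (the paper uses six predictors), illustrates the idea with scipy; the closed-form normal CRPS follows Gneiting et al. (2005):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def crps_normal(mu, sigma, obs):
    # Closed-form CRPS of a normal predictive distribution (Gneiting et al., 2005)
    z = (obs - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

rng = np.random.default_rng(0)
n = 250                                   # size of the training set in the paper
x1 = rng.uniform(2, 20, n)                # hypothetical ensemble-mean predictor
s2 = rng.uniform(0.5, 4.0, n)             # hypothetical ensemble variance
obs = 1.5 + 1.2 * x1 + rng.normal(0, np.sqrt(1 + 0.5 * s2))  # synthetic gusts

def mean_crps(params):
    a, b, c, d = params
    mu = a + b * x1
    sigma = np.sqrt(np.maximum(c + d * s2, 1e-6))  # keep the variance positive
    return crps_normal(mu, sigma, obs).mean()

res = minimize(mean_crps, x0=[0.0, 1.0, 1.0, 0.1], method="Nelder-Mead")
a, b, c, d = res.x
```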
3.2 Truncated normal predictive distribution (TN)
As the wind gust speed is nonnegative, a normal distribution left-truncated at zero (Thorarinsdottir and Gneiting, 2010) is employed. The predictive density of the EMOS is:
P_{\text{TN}}(x|\mu,\sigma) = \left[\dfrac{1}{\sigma}\varphi\left(\dfrac{x-\mu}{\sigma}\right)\right] \Big/ \Phi\left(\dfrac{\mu}{\sigma}\right), \quad x \geqslant 0,
(4)
and P_{\text{TN}}(x|\mu,\sigma) = 0 otherwise, where \varphi and \mathit{\Phi} denote the standard normal probability density function (PDF) and cumulative distribution function (CDF), respectively.
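A hand-rolled implementation of Eq. (4) can be checked against scipy.stats.truncnorm, which parameterizes the truncation bounds in standardized units; the parameter values below are hypothetical:

```python
import numpy as np
from scipy.stats import truncnorm, norm

def tn_pdf(x, mu, sigma):
    # Eq. (4): normal density rescaled by Phi(mu/sigma), zero below x = 0
    x = np.asarray(x, dtype=float)
    pdf = norm.pdf((x - mu) / sigma) / (sigma * norm.cdf(mu / sigma))
    return np.where(x >= 0, pdf, 0.0)

mu, sigma = 8.0, 3.0                      # hypothetical location and scale
x = np.array([-1.0, 0.0, 8.0, 12.0])
ours = tn_pdf(x, mu, sigma)
# scipy expresses the lower bound 0 as a = (0 - mu) / sigma
ref = truncnorm.pdf(x, a=-mu / sigma, b=np.inf, loc=mu, scale=sigma)
```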
3.3 Truncated logistic predictive distribution (LN)
The truncated logistic distribution (LN) is more appropriate for wind speed forecasting (Scheuerer and Möller, 2015). P_{\text{LN}}(x|\mu,\sigma) denotes the PDF of the LN with location \mu and scale \sigma > 0 :
P_{\text{LN}}(x|\mu,\sigma) = \left[\dfrac{1}{\sigma}f\left(\dfrac{x-\mu}{\sigma}\right)\right] \Big/ F\left(\dfrac{\mu}{\sigma}\right), \quad x \geqslant 0,
and P_{\text{LN}}(x|\mu,\sigma) = 0 otherwise, where f and F denote the standard logistic PDF and CDF, respectively.
3.4 Generalized extreme value (GEV)
The GEV distribution is a reasonable choice for the wind speed distribution. The CDF of the GEV distribution (Scheuerer, 2014), with \mu , \sigma , and \xi characterizing the location, scale, and shape of the GEV, is
{C_{\rm GEV}}(x|\mu,\sigma,\xi) = \exp\left\{-\left[1+\xi\left(\dfrac{x-\mu}{\sigma}\right)\right]^{-1/\xi}\right\}, \quad 1+\xi\dfrac{x-\mu}{\sigma} > 0.
If \xi > 0 and x < \mu - \dfrac{\sigma}{\xi} , {C_{\rm GEV}}(x | \mu , \sigma ,\xi ) = 0 .
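For illustration (with hypothetical parameters), the GEV CDF can be evaluated and cross-checked against scipy.stats.genextreme; note that scipy's shape parameter follows the opposite sign convention, c = −ξ:

```python
import numpy as np
from scipy.stats import genextreme

def gev_cdf(x, mu, sigma, xi):
    # GEV CDF in the paper's (mu, sigma, xi) convention
    t = 1 + xi * (x - mu) / sigma
    tt = np.where(t > 0, t, 1.0)          # placeholder to avoid invalid powers
    core = np.exp(-tt ** (-1.0 / xi))
    # outside the support: 0 below the lower endpoint (xi > 0),
    # 1 above the upper endpoint (xi < 0)
    return np.where(t > 0, core, 0.0 if xi > 0 else 1.0)

mu, sigma, xi = 10.0, 3.0, -0.1           # negative shape: Weibull type, as in Fig. 4
x = np.linspace(2, 25, 5)
ours = gev_cdf(x, mu, sigma, xi)
ref = genextreme.cdf(x, c=-xi, loc=mu, scale=sigma)  # scipy uses c = -xi
```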
3.5 Backpropagation neural network (BP)
The BP network method was used with a 6-15-1 net structure: the 6 statistical variables listed above as the input nodes, 15 nodes in the hidden layer, and the prediction error as the output layer (Xiong, 2017, 2020). In the BP method, the input flows through the hidden layer and produces an output via connection weights. If the output (forecast bias) is not sufficiently small, it is backpropagated through the network by adjusting the connection weights. The learning rate of the NN was set to 0.05, and the weights were optimized during iterative processing by using the Levenberg–Marquardt (L–M) algorithm.
Based on the 6-15-1 net structure, the output {\mu _{{\text{BP}}}} was calculated, which represents the post-processing products of the 6-h wind gust speed over each lead time and station. Using the normal distribution N(\mu ,\sigma ) with \mu (mean) and \sigma (standard deviation), the predictive density of the EMOS for the BP method was N({\mu _{\text{BP}}},c + d{s^2}).
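A rough sketch of a 6-15-1 network on synthetic data is given below. scikit-learn does not provide the Levenberg–Marquardt solver used in the paper, so L-BFGS stands in here, and the input data and target relationship are invented for the demonstration:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
n = 250
X = rng.uniform(0, 25, size=(n, 6))       # six ensemble statistics as inputs
y = 0.8 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(0, 0.5, n)  # synthetic target

# 6-15-1 structure; L-BFGS stands in for the paper's L-M training
net = MLPRegressor(hidden_layer_sizes=(15,), solver="lbfgs",
                   max_iter=2000, random_state=0)
net.fit(X, y)
mu_bp = net.predict(X)                    # post-processed gust forecasts
```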
3.6 Particle swarm optimization (PSO) algorithm with BP network
Many techniques have been applied to NN models to improve forecast accuracy. A PSO algorithm was introduced to optimize the weights of the NN (Sheikhan and Mohammadi, 2013; Xiong et al., 2022). The principle of PSO is that the fitness of particles without mass or volume is determined by the optimization function, and the optimal fitness is then obtained by continuous iterations. Each particle at a certain speed and position can be expressed as follows:
V_j^i(k+1) = \omega(k)V_j^i(k) + \phi_1\,{\rm rand}(0,a_1)\left[p_j^i(k) - \delta_j^i(k)\right] + \phi_2\,{\rm rand}(0,a_2)\left[p_j^g(k) - \delta_j^i(k)\right],
\delta_j^i(k+1) = \delta_j^i(k) + V_j^i(k+1),
where \delta_j\ (j=1,2,\cdots,n_1) is the j{\text{th}} weight of the BP neural network, \delta^i(k)=[\delta_1^i,\delta_2^i,\cdots,\delta_{n_1}^i] is the position of the i{\text{th}} particle, p^{i}(k) = [p_1^i,p_2^i, \cdots ,p_{n_1}^i] is its personal best position, p^{g}(k) = [p_1^g,p_2^g, \cdots, p_{n_1}^g] is the global best position, and V^{i}(k) = [V_1^i,V_2^i, \cdots ,V_{n_1}^i] is the velocity of the i{\text{th}} particle. The velocity update V_j^i(k+1) is the sum of three terms: the inertia at the k{\text{th}} step, \omega(k)V_j^i(k) ; the individual cognitive term, \phi_1\,{\rm rand}(0,a_1)[p_j^i(k) - \delta_j^i(k)] ; and the social cognitive term, \phi_2\,{\rm rand}(0,a_2)[p_j^g(k) - \delta_j^i(k)] . Here \omega(k) is an inertia function calculated as \omega(k) = \omega_{\max} - \dfrac{(\omega_{\max} - \omega_{\min})k}{K_{\max}} , \phi_1 and \phi_2 are equal to 1.5, and a_1 and a_2 are equal to 1.0. The population size ( m_1 ) is 30, and the maximum number of iterations ( K_{\max} ) is 500. The PSO hybrid optimization algorithm is used to avoid becoming trapped in a local minimum, as can occur with a traditional NN.
The variable {\mu _{{\text{PSO}}}} refers to the forecast values obtained by the PSO algorithm. Using the normal distribution for the wind gust speed forecast, the predictive density of PSO model is N({\mu _{\text{PSO}}},c + d{s^2}).
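The PSO update rule described above can be sketched as a short loop. Here a simple sphere function stands in for the network-weight objective actually optimized in the paper, and the parameter values follow the text (φ1 = φ2 = 1.5, a1 = a2 = 1.0, population 30, Kmax = 500); ωmax and ωmin are not stated in the text, so typical values of 0.9 and 0.4 are assumed:

```python
import numpy as np

def pso(f, dim, m=30, k_max=500, w_max=0.9, w_min=0.4,
        phi1=1.5, phi2=1.5, a1=1.0, a2=1.0, seed=0):
    # Minimal PSO loop following the update rule in the text:
    # v <- w(k)*v + phi1*rand(0,a1)*(p_best - x) + phi2*rand(0,a2)*(g_best - x)
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (m, dim))      # particle positions
    v = np.zeros((m, dim))                # particle velocities
    p = x.copy()                          # personal best positions
    p_val = np.array([f(xi) for xi in x])
    g = p[p_val.argmin()].copy()          # global best position
    for k in range(k_max):
        w = w_max - (w_max - w_min) * k / k_max   # linearly decaying inertia
        r1 = rng.uniform(0, a1, (m, dim))
        r2 = rng.uniform(0, a2, (m, dim))
        v = w * v + phi1 * r1 * (p - x) + phi2 * r2 * (g - x)
        x = x + v
        val = np.array([f(xi) for xi in x])
        better = val < p_val
        p[better], p_val[better] = x[better], val[better]
        g = p[p_val.argmin()].copy()
    return g, p_val.min()

best, best_val = pso(lambda z: np.sum(z ** 2), dim=5)
```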
3.7 Parameter estimation
The continuous ranked probability score (CRPS) is widely used to evaluate the reliability and sharpness of probabilistic forecasts. Given C (predictive CDF) and o (observation), the CRPS is defined as (Gneiting et al., 2005) {\text{CRPS}}(C,o) = \displaystyle\int_{ - \infty }^\infty [C(x) - H(x - o)]^2{\text{d}}x, where H(x - o) denotes the Heaviside function, whose value equals 0 when x < o and 1 otherwise, and C is the CDF of the distribution with parameters \mu and \sigma , which are functions of the ensemble members and ensemble variance, depending on w_i, c , and d . To obtain the optimal EMOS coefficients for each station and lead time, the scoring rule is the CRPS, minimized in the mean over the dataset of forecast–observation pairs in the previous month around the date of interest. For the BP and PSO models, \mu is estimated directly by the NN algorithm, whereas the optimal parameters c and d are calculated by minimizing the CRPS.
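The CRPS definition can be verified numerically against the closed form for a normal predictive distribution; the parameter and observation values below are hypothetical:

```python
import numpy as np
from scipy.stats import norm

def crps_numeric(cdf, obs, lo=-40.0, hi=60.0, num=200001):
    # Brute-force evaluation of CRPS(C, o) = integral of [C(x) - H(x - o)]^2 dx
    x = np.linspace(lo, hi, num)
    heaviside = (x >= obs).astype(float)
    return np.sum((cdf(x) - heaviside) ** 2) * (x[1] - x[0])

mu, sigma, obs = 9.0, 2.5, 12.0           # hypothetical forecast and observation
numeric = crps_numeric(lambda x: norm.cdf(x, mu, sigma), obs)

# Closed form for a normal predictive distribution (Gneiting et al., 2005)
z = (obs - mu) / sigma
closed = sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))
```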
To assess the quality of predictions, a 2 × 2 contingency table (Table 2) is widely utilized (Roebber, 2009). The evaluation metrics are defined as follows: \mathrm{POD}=\dfrac{A}{A+C} , \mathrm{FAR}=\dfrac{B}{A+B} , \mathrm{CSI}=\dfrac{A}{A+B+C} .
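Assuming the conventional layout of the 2 × 2 contingency table (A hits, B false alarms, C misses), the metrics can be computed as:

```python
def pod_far_csi(a, b, c):
    # a: hits, b: false alarms, c: misses (2x2 contingency table, Table 2)
    pod = a / (a + c)          # probability of detection
    far = b / (a + b)          # false alarm ratio
    csi = a / (a + b + c)      # critical success index
    return pod, far, csi

pod, far, csi = pod_far_csi(a=40, b=10, c=20)
```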
4. Evaluation
4.1 Difference between DMO and observations at each station
The DMO refers to the ensemble mean. Owing to the coarse resolution (100 km), the DMO of each station interpolated from the EPS product had the same value. In Fig. 2, the DMO is represented by a bar chart along the date coordinates from 4 to 20 February 2022, and the observations of the five weather stations are shown by curves in a different color for each station. The horizontal axis represents the date at 6-h intervals (four values per day). The DMOs for the four time periods of the day (0200, 0800, 1400, and 2000 BT) correspond to the EPS 6-h forecast released at 2000 BT on the previous day, the EPS initial field at 0800 BT on the current day, the EPS 6-h forecast released at 0800 BT on the current day, and the EPS initial field at 2000 BT on the current day. For ease of comparison, the 6-h forecast was approximated as the analysis field and treated as the DMO.
Fig. 2. Comparison of 6-h wind gust observations at 5 stations (colorful lines) and DMO (gray columns) for 4–20 February 2022. The inset histograms display the ME (left; ME = predicted gust − observed gust) and MAE (right) of the DMO at each station.
The gust observations at all stations were significantly higher than the DMO (Fig. 2); for example, the maximum gust at the high-altitude measurement points exceeded 25 m s–1, and even at the low-altitude points it reached 15 m s–1, whereas the maximum DMO value was only 10 m s–1. The oscillation of the columns represents the EPS prediction of the environmental wind field, but changes in the strength of the environmental wind and of the local gusts at the observation stations are not consistent; where the columns are higher, the prediction error of the DMO is larger. The phase differences in gust fluctuations among stations reflect the uncertainty of gust changes in areas with complex terrain. For example, the mean error (ME) and mean absolute error (MAE) of the three high-altitude stations (1701, 1703, and 1705) were above 5 m s–1 (insets of Fig. 2).
4.2 Probability integral transform (PIT) histogram
Figure 3a indicates that the wind speeds forecasted by the DMO are significantly weaker and exhibit insufficient spread. This characteristic is consistent with the “observation–forecast” discrepancies shown in Fig. 2. For the six methods (Figs. 3b–g), a relatively uniform rank histogram was obtained, and the shape was slightly different from one method to another.
Fig. 3. Verification rank histogram of (a) the raw ensemble and PIT histograms of the EMOS post-processed forecasts for all lead times at five stations: (b) NM, (c) LN, (d) TN, (e) GEV, and (f) BP, and (g) PSO models.
The nominal coverage of the six input factors is 71% (5/7). The empirical mean of a prediction interval was used to assess the sharpness of the rank and PIT histograms for each method. Table 3 summarizes the coverage values, indicating the predictive performance of the six methods at the five weather stations. The average coverage of the raw forecast is 45.56%, which is lower than the nominal value (71%). Calibration by the six post-processing methods showed a significant improvement over the raw ensemble. The best performance was that of the GEV, for which the maximum coverage occurred at three stations (1703, 1705, and 1708), followed by PSO for 1701 and LN for 1489.
Table 3. Coverage for raw products and six post-processing methods of all lead times at each station
From the perspective of the assumed normal and GEV distributions, the coverage values of the GEV at the five observation stations reached a maximum (85.14%), indicating that the changing characteristics of gusts at the observation points can be predicted accurately. It is then necessary to analyze the fitting ability of the GEV at each measurement point based on the actual gusts and theoretical distributions. The probability density distributions and quantile comparison graphs visually display the degree of fit between the GEV and the actual observations.
Figure 4 shows the distribution of the observations at each station and the three GEV parameters fitted to the observations; the GEV shape parameter differs only slightly among stations, varying between −0.14 and −0.025. The GEV fitted curve is plotted against the observation histogram for comparison. The histograms of observations for each station were calculated from the frequencies in 4-m s−1 speed intervals: 0–4, 4–8, 8–12, and 12–16 m s−1.
Fig. 4. Relationships between wind gust speed and frequency. The black curve represents the probability density of the GEV at the five stations, with the three corresponding parameters shown in the top right. The dotted columns refer to the frequency of the wind gust observations.
From the perspective of the actual gusts at the five observation points, most showed a skewed distribution. The GEV curves and gust observation histograms are broadly consistent, achieving a relatively good fit. The calculated shape parameters are all less than zero, indicating the Weibull type, and the location and scale parameters reflect changes in gust magnitude. The agreement between the curve and the heights of adjacent bars in the histogram indicates the rationality of the GEV. However, in the 8–16 m s−1 interval at station 1489, the heights of the two corresponding bars were the same, so the theoretical GEV curve did not reproduce the actual gust distribution characteristics in this interval.
A quantile–quantile (QQ) plot is a graphical method for testing the distribution of a random variable. The abscissa in Fig. 5 refers to the quantiles of the GEV distribution, and the vertical axis to the corresponding sample quantiles. In Fig. 5, the black dots cluster closely around a straight line, indicating that the fitted values of the GEV closely match the empirical distribution values. For most observation stations, the GEV accurately reflected the changing patterns of gusts. As in the probability density distribution graphs, when the wind speed was greater than 10 m s−1, there was a certain deviation between the black dots and the straight line at observation point 1489.
Fig. 5. QQ plots of 6-h wind gust speed at the 5 observation stations. The black dots are plotted against the corresponding empirical quantiles from the fitted GEV model. A symmetry line is shown in green.
Based on Table 3 and Figs. 4 and 5, except for one observation point that showed some deviation, the GEV fitted the gusts well at the other observation points. Probability forecasts under the GEV assumption can thus achieve high forecasting skill.
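The GEV fitting and quantile comparison underlying Figs. 4 and 5 can be sketched with scipy on synthetic data (the parameter values below are invented, chosen to give a negative shape as in Fig. 4; recall scipy's shape convention c = −ξ):

```python
import numpy as np
from scipy.stats import genextreme

# Synthetic gusts from a Weibull-type GEV (paper's xi = -0.1, i.e. scipy c = 0.1)
sample = genextreme.rvs(c=0.1, loc=10.0, scale=3.0, size=2000, random_state=2)

# Maximum-likelihood fit of the three GEV parameters
c_hat, loc_hat, scale_hat = genextreme.fit(sample)

# QQ check: empirical quantiles vs fitted GEV quantiles
probs = np.linspace(0.01, 0.99, 25)
emp_q = np.quantile(sample, probs)
fit_q = genextreme.ppf(probs, c_hat, loc=loc_hat, scale=scale_hat)
```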
4.4 Comparison of forecast errors with varying wind speeds
Wind gusts not only play an important role in race performance but also determine whether alpine ski races can be held on schedule. The differences in predictive skill under different wind speeds are discussed to evaluate the performance of each model.
The wind gusts ranged from 2 to 26 m s−1 and were divided into 13 segments at intervals of 2 m s–1. The dashed columns in Fig. 6 show the frequency of each speed segment, with the peak between 10 and 12 m s–1. Several characteristics of the bias of the ensemble median prediction are evident in Fig. 6. First, the bias was positive when the wind speed was lower than 10 m s–1 and negative when it was higher than 12 m s–1; the ensemble median prediction was always lower than the observations during high winds. Second, the best models were LN for weak winds (< 10 m s–1) and PSO for strong winds (> 10 m s–1), as illustrated in the insets of Fig. 6. Third, the bias was small near the prevailing wind speed and increased as the wind strengthened or weakened away from it.
Fig. 6. ME distributions of the forecast median of the six models (colorful lines) averaged over all lead times and stations. The dashed columns illustrate the frequency of 6-h wind gust speed observations over all stations. The inset histograms indicate the ME of the forecast median of the six models for weak wind (< 10 m s–1 in 6 h, left) and strong wind ( \geqslant 10 m s–1 in 6 h, right).
The MAE exhibited a tendency similar to that of the bias. When the wind speed was greater than 18 m s−1, the best scores were obtained by the BP and PSO models. Figure 7 also illustrates a daily cycle of the MAE, with peaks at the 12- and 36-h lead times.
Fig. 7. The curves of MAE variations for the median forecasts of the six models in different wind speed ranges. The top-left subplot shows the curves of MAE variations with forecast lead time. The bar chart on the right compares the average MAE of the six models at wind speeds below and above 18 m s–1.
In terms of the probabilistic CRPS, the predictive performance of the PSO model presented a significant advantage over the other models for strong winds (> 14 m s–1). The features of the daily cycle are presented in the subplot of Fig. 8, where high CRPS values occur at the 12- and 36-h lead times.
Fig. 8. The CRPS curves for the six models as a function of wind speed (main plot) and forecast lead time (subplot). The insets compare the average CRPS of the six models in two wind speed categories: below 14 m s–1 (top) and above 14 m s–1 (bottom).
If the predictive distributions are categorized into two groups, the normal-type distributions (NM, LN, and TN) and the GEV, the models with normal-type distributions show higher forecast skill in low winds, whereas the GEV provides better performance when high winds are expected (Figs. 6, 7, and 8).
For high winds, the results above show that the BP and PSO methods have an advantage in forecast accuracy over the other methods. To better understand the forecast skill at each station, a performance diagram was used to evaluate the reliability of strong-wind forecasts. We selected the 60th percentile as the threshold for strong wind gust speed at each station, where the 60th percentile equals 17.10 m s–1 (1701), 12.87 m s–1 (1703), 14.20 m s–1 (1705), 9.80 m s–1 (1708), and 9.58 m s–1 (1489), respectively. The performance diagram summarizes the categorical indicators in a single plot, in which the best value lies in the top right and the worst in the bottom left. The results (Fig. 9) indicate that the BP and PSO models performed better than the other models at three stations (1703, 1705, and 1489). All six models yielded better accuracy than the raw forecasts at each station.
Fig. 9. Performance diagram of wind gust forecasts for high winds above the 60th percentile, for the raw ensemble mean (DMO) and the six method outputs (colorful lines) at each station (indicated above each panel). The POD (probability of detection), FAR (false alarm ratio), bias (dashed lines), and CSI (critical success index, labeled solid contours) are given in the diagram. The crosshairs indicate the sampling uncertainty.
4.5 Comparative analysis of different predictors
The selection of predictors was closely related to the performance of the forecast models. In the previous section, the statistical characteristics of the ensemble members were used as predictors for modeling; however, if we consider the geopotential height and wind speed fields from different pressure levels as input factors, there would be a total of 204 (4 × 51) candidate factors from the four pressure levels (500, 700, 850, and 925 hPa) and 51 ensemble members.
The basic calculation process for predictor selection through stepwise regression is as follows. First, a regression equation is established between the target variable (observed values) and a subset of predictor variables (ensemble forecast members). Next, among the predictors not yet included in the equation, the one with the highest variance contribution is introduced, provided that it reaches a given level of statistical significance. At the same time, whenever a new variable is introduced, the variance contribution of each predictor already in the equation is recalculated; predictors whose variance contribution is no longer significant after the introduction of new variables are removed, forming a new set of predictors. In this way, factors with significant variance contributions are gradually introduced, whereas those with insignificant contributions are eliminated. Finally, the predictors with p-values less than 0.05 and nonzero regression coefficients are retained, forming the so-called "optimal" set of predictor variables.
For each observation point, the stepwise regression method was employed on the 204 forecast members to obtain the selected factors. For example, if the geopotential height forecast at point 1701 is selected as the predictor, there are 204 forecast members over the four pressure levels. Using the "forecast–observation" pairs consisting of the daily 0800 BT forecasts from 15 January to 15 March of 2018–2021 and the corresponding observations, a total of 32 members were selected by the stepwise regression method (Table 4). Subsequently, these 32 factors were used as input variables to train the PSO forecast model for geopotential height. Similarly, a PSO forecast model was obtained using wind speed as the input variable.
Table 4. Forecast factors obtained through stepwise regression
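A simplified, forward-only version of the stepwise selection described above can be sketched as follows; the paper's procedure also removes variables that lose significance, which is omitted here, and the candidate members and target are synthetic:

```python
import numpy as np
from scipy import stats

def forward_stepwise(X, y, alpha=0.05):
    # Greedy forward selection: at each step add the candidate with the
    # smallest partial F-test p-value, while it stays below alpha
    n, p = X.shape
    selected = []
    while True:
        best = None
        for j in range(p):
            if j in selected:
                continue
            cols = selected + [j]
            A = np.column_stack([np.ones(n), X[:, cols]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = np.sum((y - A @ beta) ** 2)
            if selected:
                A0 = np.column_stack([np.ones(n), X[:, selected]])
                b0, *_ = np.linalg.lstsq(A0, y, rcond=None)
                rss0 = np.sum((y - A0 @ b0) ** 2)
            else:
                rss0 = np.sum((y - y.mean()) ** 2)
            df2 = n - len(cols) - 1
            f = (rss0 - rss) / (rss / df2)      # partial F statistic (1, df2)
            pval = stats.f.sf(f, 1, df2)
            if best is None or pval < best[1]:
                best = (j, pval)
        if best is None or best[1] >= alpha:
            break
        selected.append(best[0])
    return selected

rng = np.random.default_rng(3)
X = rng.normal(size=(250, 20))           # 20 hypothetical candidate members
y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + rng.normal(0, 1.0, 250)
chosen = forward_stepwise(X, y)
```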
Figure 10 illustrates the impact of the different predictors on forecast performance. During the Winter Olympics, with five observation points and eight forecast lead times, the forecasts and corresponding observations were plotted on a single graph. Treating the observed values x_{\rm obs} as the independent variable and the forecast values y_{\rm fcst} as the dependent variable, a simple linear regression equation can be derived. At the 95% confidence level, with \alpha = 0.05 and n - 2 degrees of freedom, the prediction interval of \hat{y}_{\rm fcst} , using t_{n-2,1-\frac{\alpha}{2}} from the t-distribution table, is \hat{y}_{\rm fcst} \pm t_{n-2,1-\frac{\alpha}{2}}\sqrt{\sigma^2\left[1+\dfrac{1}{n}+\dfrac{(x_{\rm obs}-\overline{x}_{\rm obs})^2}{\sigma_x^2}\right]} , where \hat{y}_{\rm fcst} is the estimated value of y_{\rm fcst} , \sigma^2 = \dfrac{1}{n-2}\left[\sigma_y^2 - \dfrac{(\sigma_{xy}^2)^2}{\sigma_x^2}\right] is the residual variance, and \sigma_x^2 = \sum (x_{\rm obs}-\overline{x}_{\rm obs})^2 , \sigma_{xy}^2 = \sum (x_{\rm obs}-\overline{x}_{\rm obs})(y_{\rm fcst}-\overline{y}_{\rm fcst}) , \sigma_y^2 = \sum (y_{\rm fcst}-\overline{y}_{\rm fcst})^2 . By establishing this functional relationship, the deviation of the predicted values can be inferred from the actual observations.
Fig. 10. Distributions of predicted and observed values obtained by modeling with different predictors: (a) geopotential height, (b) wind speed, and (c) six statistical features. The gray line represents the regression line, while the dashed line represents the symmetry line. The shaded area represents the 95% confidence interval.
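The prediction-interval formula above can be sketched as follows, using synthetic "forecast–observation" pairs (the regression slope, noise level, and evaluation point are invented for the demonstration):

```python
import numpy as np
from scipy import stats

def prediction_interval(x, y, x_new, alpha=0.05):
    # t-based prediction interval for a simple linear regression of y on x,
    # following the formula in the text
    n = len(x)
    sxx = np.sum((x - x.mean()) ** 2)
    sxy = np.sum((x - x.mean()) * (y - y.mean()))
    syy = np.sum((y - y.mean()) ** 2)
    b = sxy / sxx                                  # regression slope
    a = y.mean() - b * x.mean()                    # intercept
    sigma2 = (syy - sxy ** 2 / sxx) / (n - 2)      # residual variance
    t = stats.t.ppf(1 - alpha / 2, n - 2)
    y_hat = a + b * x_new
    half = t * np.sqrt(sigma2 * (1 + 1 / n + (x_new - x.mean()) ** 2 / sxx))
    return y_hat - half, y_hat + half

rng = np.random.default_rng(4)
x = rng.uniform(2, 30, 200)                        # synthetic observed gusts
y = 1.1 * x + rng.normal(0, 2.0, 200)              # synthetic forecasts
lo, hi = prediction_interval(x, y, x_new=15.0)
```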
Based on this analysis, the following key findings can be observed. First, the model based on geopotential height as the predictor exhibits significant outliers in the predicted values. The upper boundaries of the confidence intervals for wind speed were in the ranges of 25–40 (Fig. 10a), 15–25 (Fig. 10b), and 8–15 m s–1 (Fig. 10c), whereas the observed wind speeds ranged from 2 to 30 m s–1, indicating that the reliability of wind speed predictions using geopotential height as the independent variable was relatively low. Second, the model based on geopotential height tended to overestimate the wind speed; the intersection points between the regression and symmetry lines occur at approximately 24 (Fig. 10a), 13 (Fig. 10b), and 11 m s–1 (Fig. 10c). Finally, the model utilizing the ensemble statistical features as input factors demonstrated higher prediction accuracy. When individual ensemble members are used as input factors, the model may fail to fully capture the characteristics of the ensemble forecast and may be more susceptible to the influence of individual members. Conversely, by incorporating ensemble statistical features, the model can effectively leverage the overall information of the ensemble, resulting in improved predictive performance and greater stability. Overall, these findings highlight the importance of selecting appropriate predictor variables and utilizing ensemble statistical features to enhance the accuracy and reliability of wind speed predictions in the studied context.
5. Conclusions
This research studied gust forecasting methods and their application to alpine skiing. A comprehensive analysis was conducted by comparing six modeling methods and two distribution assumptions, with emphasis on forecast performance at five observation points along the ski track. PIT analysis revealed differences in the effectiveness of these forecasting methods, and the suitability of the GEV distribution was demonstrated both theoretically and empirically through probability density distributions and QQ plots. The study also examined how the performance of the different methods varies with wind speed, with a detailed analysis of their ability to forecast strong winds. Finally, a comparative analysis using the ensemble forecast members of geopotential height and the wind field as forecast factors, versus the six statistical characteristics described in this paper as input factors, confirmed the suitability of these six features as input variables. In summary, the following conclusions are drawn:
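Turning ensemble members into a small set of summary statistics is the preprocessing step behind the input factors discussed above. The paper defines its six statistical characteristics earlier in the article; the set used in the sketch below (mean, standard deviation, minimum, maximum, median, skewness) is only a plausible placeholder, not the paper's actual choice:

```python
import numpy as np

def ensemble_features(members):
    """Summarize an ensemble of gust forecasts as six statistics.
    NOTE: the specific six features here are hypothetical stand-ins
    for the six statistical characteristics used in the paper."""
    m = np.asarray(members, dtype=float)
    mean = m.mean()
    std = m.std(ddof=1)                 # sample standard deviation
    # Standardized third moment as a simple skewness measure.
    skew = np.mean(((m - mean) / std) ** 3) if std > 0 else 0.0
    return np.array([mean, std, m.min(), m.max(), np.median(m), skew])
```

Feeding such summaries to a model, rather than all raw members, reduces the input dimension and makes the predictors less sensitive to any single outlying member, which is consistent with the stability argument made above.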
(1) All six methods exhibited significant error correction capabilities. First, from a quantitative perspective, the post-processing methods of the numerical models, as indicated by PIT graphs, average error, average absolute error, and CRPS tests, substantially reduced the error of gust forecasts compared with Direct Model Output (DMO) forecasts. Second, with observation points having a horizontal spacing of tens of meters and vertical drops of nearly one kilometer, there were significant differences in wind speeds among the measurement points. The corrected gust forecasts closely matched the observed values at each point and reflected the characteristics of gust variations with height. Finally, although different modeling methods were used, the input factor values at the five points were exactly the same. This highlights the importance of forecasting methods and facilitates a comparison of their merits and drawbacks.
(2) Different methods yielded different forecast results as wind speed varied. LN exhibited the smallest average error at low wind speeds; at high wind speeds, PSO had the smallest average error, whereas LN had the largest. In the prevailing wind zone (10–12 m s–1), the average bias was smallest. Similar characteristics were observed for the mean absolute error and CRPS evaluation metrics.
(3) Ensemble statistical features are appropriate input factors. The selection of forecast factors is a key step in model post-processing. Using the ensemble forecast members of geopotential height and the wind field as candidate predictors and applying a stepwise regression method, an optimal set of factors was obtained. Modeling and forecasting based on the PSO method indicated a significant advantage for the six statistical features used in this study.
(4) The ability to forecast strong winds is of great concern for alpine skiing. The performance diagram (using the 60th percentile threshold as an example) demonstrates that the neural-network-based post-processing methods (BP and PSO) have higher forecasting skill, and multiple evaluation metrics show that PSO slightly outperforms BP.
(5) The assumed gust distribution affects the performance of the probabilistic forecasts. The PIT, probability density, and QQ plots reveal that the observed gust distributions at most measurement locations closely resemble the GEV distribution and exhibit left skewness. The modeling methods employed in this study fall roughly into two distribution assumptions: normal-type distributions (NM, LN, TN, BP, and PSO) and the GEV. In the analysis of relative error, absolute error, and CRPS, the GEV exhibited forecasting advantages over LN in the high wind zone, indicating the good performance of the GEV distribution in extreme value prediction.
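The QQ comparison underlying conclusion (5) is straightforward to reproduce. The sketch below uses scipy's `genextreme` (whose shape parameter `c` is the negative of the usual GEV shape ξ) on synthetic gusts, which are illustrative only and not the paper's observations:

```python
import numpy as np
from scipy.stats import genextreme

# Synthetic 6-h gust sample drawn from a GEV, standing in for observations.
rng = np.random.default_rng(42)
gusts = genextreme.rvs(c=-0.1, loc=10, scale=3, size=500, random_state=rng)

# Maximum-likelihood fit of the GEV parameters.
shape, loc, scale = genextreme.fit(gusts)

# QQ plot ingredients: empirical quantiles vs. fitted-GEV quantiles.
p = (np.arange(1, gusts.size + 1) - 0.5) / gusts.size
empirical = np.sort(gusts)
fitted = genextreme.ppf(p, shape, loc=loc, scale=scale)
# Points lying near the 1:1 (symmetry) line indicate a good GEV fit.
```

In a QQ plot like Fig. 5, systematic departures from the symmetry line in the upper tail would signal that the fitted distribution misrepresents exactly the strong-wind events that matter most here.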
The exploration of hybrid forecasting models that combine the PSO algorithm, the BP neural network, and the GEV distribution will be the focus of future research, with the aim of progressively improving forecasting skill for strong winds.
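Several of the evaluations summarized above rely on the CRPS. For a Gaussian predictive distribution, as assumed by the normal-type methods, the CRPS has a well-known closed form (Gneiting et al., 2005); a minimal sketch, with names of our own choosing:

```python
import numpy as np
from scipy.stats import norm

def crps_gaussian(mu, sigma, obs):
    """CRPS of a Gaussian predictive distribution N(mu, sigma^2) at
    observation obs, via the closed form
    sigma * [z(2*Phi(z) - 1) + 2*phi(z) - 1/sqrt(pi)], z = (obs-mu)/sigma.
    Illustrative sketch; the paper's implementation may differ."""
    z = (obs - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1)
                    + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))
```

Because CRPS reduces to the absolute error for a point forecast, averaging it across cases gives a score directly comparable to MAE, which is why the two metrics behave similarly across the wind-speed ranges discussed above.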
Fig. 4. Relationships between wind gust speed and frequency. The black curve represents the GEV probability density at the five stations, with the three corresponding parameters shown at the top right. The dotted columns show the frequency of the wind gust observations.
Fig. 1. Spatial distributions of the grid point (black square) and five stations (black stars). (a) Grid point (station) locations and topographic map and (b) enlarged map of station locations. Terrain height [m MSL (mean sea level)] is indicated by color shading.
Fig. 2. Comparison of 6-h wind gust observations at the 5 stations (colored lines) and DMO (gray columns) for 4–20 February 2022. The inset histograms display the ME (left; ME = predicted gust − observed gust) and MAE (right) of DMO at each station.
Fig. 3. Verification rank histograms of (a) the raw ensemble and PIT histograms of the EMOS post-processed forecasts for all lead times at the five stations for the (b) NM, (c) LN, (d) TN, (e) GEV, (f) BP, and (g) PSO models.
Fig. 5. QQ plots of 6-h wind gust speed at the 5 observation stations. The black dots are plotted against the corresponding quantiles from the fitted GEV model. A symmetry line is shown in green.
Fig. 6. ME distributions of the median wind speed forecasts of the six models (colored lines) averaged over all lead times and stations. The dashed columns illustrate the frequency of 6-h wind gust observations over all stations. The inset histograms show the ME of the six models' median forecasts for weak wind (< 10 m s–1 at 6 h, left) and strong wind (≥ 10 m s–1 at 6 h, right).
Fig. 7. MAE of the median forecasts of the six models in different wind speed ranges. The top-left subplot shows the variation of MAE with forecast lead time. The bar chart on the right compares the average MAE of the six models for wind speeds below and above 18 m s–1.
Fig. 8. CRPS of the six models as a function of wind speed (main plot) and forecast lead time (subplot). Average CRPS of the six models is compared in two wind speed categories: below 14 m s–1 (top subplot) and above 14 m s–1 (bottom subplot).
Fig. 9. Performance diagram of wind gust forecasts, for high winds above the 60th percentile, from the raw ensemble means (DMO) and the six method outputs (colored lines) at each station (indicated above each panel). POD (probability of detection), FAR (false alarm ratio), bias (dashed lines), and CSI (critical success index, labeled solid contours) are given in the diagram. The crosshairs indicate sampling uncertainty.