To improve the accuracy of hail forecasting for mountainous and plateau areas in China, this study presents a novel fusion forecast model based on machine learning techniques. Specifically, features based on known mechanisms of hail formation and two newly proposed elevation features, calculated from radar data, sounding data, automatic station data, and terrain data, are first combined, and a hail/short-duration heavy rainfall (SDHR) classification model based on the random forest (RF) algorithm is built from them. Then, we construct a hail/SDHR probability identification (PI) model based on the Bayesian minimum error decision and principal component analysis methods. Finally, an “and” fusion strategy for coupling the RF and PI models is proposed. In addition to the mechanism features, the new elevation features improve the models’ performance significantly. Experimental results show that the fusion strategy is particularly effective at reducing the number of false alarms while maintaining the hit rate. A comparison with two classical hail indexes shows that our proposed algorithm has a higher forecasting accuracy for hail in mountainous and plateau areas. All 19 hail cases used for testing could be identified, and our algorithm is able to provide an early warning for 89.5% (17 cases) of hail cases, among which 52.6% (10 cases) receive an early warning of more than 42 minutes in advance. The PI model sheds new light on using Bayesian classification approaches for high-dimensional problems.
China is recognized as one of the most hail-prone regions in the world (Hand and Cappelluti, 2011). Hail is one of the most serious types of severe convective weather (SCW) in China, characterized by strong locality, suddenness, and rapid spatiotemporal evolution (Guan et al., 2015; Yu and Zheng, 2020). It can severely impact agriculture and transportation, and even endanger people’s lives (Brimelow et al., 2017). Therefore, studies on hail identification and forecasting are extremely important for disaster prevention and reduction. However, the characteristics of hail differ greatly between the western plateau areas and the eastern plain areas of China.
Zhang et al. (2008) analyzed hail cases in China from 1961 to 2005 and found that hail generally appears in mountainous and plain areas. Cao et al. (2018) used an altitude of 1 km as the threshold to divide China into a “first-step region” and a “second-step region”, analyzed the environmental conditions of hail in these two regions, and suggested that moisture, heating power, and unstable energy differ significantly between them. The frequency of hail cases in the western plateau areas is higher than that in the eastern plain areas, but the hail is smaller (Zhao et al., 2015). Based on observational data from 1961 to 2015, Xue et al. (2019) analyzed the spatiotemporal variation and spatial distribution characteristics of hail cases. The results showed that areas with a high frequency of hail cases are distributed in the plateau areas and surrounding mountains of China.
Nevertheless, the formation mechanism of hailstones is similar across regions. Dennis and Kumjian (2017) summarized four key conditions: (1) suitable updraft strength and breadth to facilitate the suspension of hailstones or embryos; (2) sufficient supercooled liquid water; (3) an appropriate temperature that is conducive to the growth of hail; and (4) an appropriate size, number, and growth trajectory of embryos. Based on forecasting experience and knowledge of the mechanism of hail formation mentioned above, independent meteorological variables or parameters can be subjectively extracted for use in hail forecasting via different technical methods.
Forecasting of SCW often relies on remote sensing data, such as weather radars, meteorological satellites, lightning positioning systems, and automatic weather stations (Zhang X. L. et al., 2020). In existing radar-based studies, maximum reflectivity (Kunz and Puskeiler, 2010), the Waldvogel parameter (Waldvogel et al., 1979), and the severe hail index (SHI; Witt et al., 1998), designed by specific methods, have been applied to forecast hail cases. López and Sánchez (2009) built a linear discriminant analysis and logistic regression model to predict hail cases based on C-band radar data. A diagnostic bivariate analysis of 52 sounding-derived indices was presented by Manzato (2012). Shi et al. (2019) proposed a quasi-real-time weak echo region morphology identification algorithm via radar echo bottom height images and presented a parameter that can describe the scale of a weak echo region to identify the hail thunderstorms.
The forecasting methods mentioned above are all based on single meteorological factors or linear correlation analysis. However, SCW is a low-probability event, and hail is no exception. The methods mentioned above may have limitations. Multiple meteorological parameters, complex image processing techniques, and nonlinear algorithms may solve the problem (Manzato, 2013), such as the application of wavelet transform and support vector machine approaches in quantitative precipitation estimation, the identification of gust fronts via the local binary with dual-template method, and the forecasting of surface air temperature via machine learning (ML; Yuan et al., 2018; Dai et al., 2019; Zhang C. J. et al., 2020).
Learning from data, ML usually trains multiple implicit relationships and makes decisions without being explicitly programmed (Ahmad, 2019). ML has been proven to be effective in SCW forecasting (Haberlie and Ashley, 2018; Scher and Messori, 2018). Wang and Pan (2013) proposed overhang (OH), kurtosis (K), and the strong echo ratio (SER) by analyzing the mechanism and structure of hail. Wang et al. (2016) conducted a study on hail and short-duration heavy rainfall (SDHR) caused by thunderstorms within 50 km and put forward several meteorological features. Based on these features, ML models such as support vector machine and random forest (RF) models yielded good forecasting results in the plain area of China. In addition, RF, gradient boosting trees, and linear regression were employed by Gagne II et al. (2015) to predict hail cases, and the experimental results indicated that the RF model achieved the best forecasting performance. Wang et al. (2018) proposed an identification algorithm for the early stages of hail cases by combining hail case time series and two comprehensive features transformed from 10 features. In addition to radar data, the fifth-generation ECMWF Reanalysis (ERA5) data have also been applied to hail forecasting in the past two years (Czernecki et al., 2019; Yao et al., 2020). Furthermore, to overcome the difficulties involved in collecting hail cases and the inaccuracies of datasets, a radar-based hail-producing storm detection method based on positive unlabeled learning was proposed by Shi et al. (2020).
Recently, the use of deep neural networks has achieved remarkable success in related fields—for example, a dynamic convolutional layer for short-range weather prediction (Klein et al., 2015), a convolutional long short-term memory for precipitation nowcasting (Shi et al., 2015), and a deep convolutional neural network algorithm to forecast SCW (Zhou et al., 2019). Although these deep learning frameworks have achieved advanced results in the forecasting of SCW, they lack interpretability and require large amounts of data.
Although great progress has been made in hail forecasting, few quantitative studies on the forecasting of hail cases in plateau areas have been carried out. Based on the known mechanisms of hail characteristics in plain areas, this paper introduces novel elevation features and combines them to train an RF model. Additionally, a hail probability identification (PI) model is proposed based on the principal component analysis (PCA) and Bayesian minimum error decision (BMED) methods. Finally, with the aim of reducing the false alarm rate, an “and” fusion strategy for coupling the RF and PI models is also developed.
The paper is structured as follows. The data and their processing are specified in Section 2. Section 3 elaborates the proposed methods. The validity of the new features and the fusion strategy, along with results from experiments comparing the new approach with classical hail indexes, as well as case analysis results, are presented in Section 4. Finally, Section 5 briefly outlines the main conclusions drawn from this study.
2. Data
2.1 Data sources
The data used in this paper include Doppler weather radar data, radiosonde sounding data, automatic station data, and terrain data. The radar data are generated by five single-polarization C-band radars deployed in Guizhou Province. The area covered by the five radars is shown in Fig. 1. Guizhou Province is mainly composed of mountainous and plateau areas. The radars perform volume scans once every six minutes, and each volume scan includes nine elevations. The radar images are created by transforming radar data from polar coordinates to Cartesian coordinates via nearest-neighbor interpolation, and the resolution of an image is 0.25 km × 0.25 km. The sounding data derived from nearby radiosonde stations are used to calculate the heights of the 0°C and −20°C layers.
Fig. 1. Geographical locations of the radars. The yellow circles represent the scan ranges of the radars.
The hail cases are obtained from the hail reports of Zou (2017), which provide detailed records of hail cases. This paper studies the identification of hail in the plateau areas of China, so hail cases are selected as positive samples. SDHR cases, which are easily confused with hail, are selected as negative samples and are derived from the automatic station data. If the rainfall rate reported by an automatic station is greater than 20 mm h−1, the event is considered an SDHR case. The radar data and automatic station data are all provided by the Public Meteorological Service Center of the China Meteorological Administration.
Terrain data are mainly used to calculate the elevations. Specifically, the Shuttle Radar Topography Mission, jointly measured by NASA and the National Imagery and Mapping Agency, which is the most complete, finest-resolution digital elevation model of the earth (Farr et al., 2007), is employed in this study. After several years of data processing and interpolation algorithms used to fill in holes in the data, a dataset with a resolution of 90 m is now available online at http://www.gscloud.cn.
2.2
Data handling
In this study, all the algorithms are applied to convective cells. An expansion of the collision avoidance algorithm (Wang et al., 2013) is utilized to separate convective cells, and then anomalous propagation (AP) clutter filtering (Wang et al., 2014) is employed. Finally, 95 hail cases (1287 hail cells) are obtained from Zou (2017), and 110 SDHR cases (1210 SDHR cells) are obtained from 392 SDHR observation stations. All radar data come from five radars in Guizhou Province from 2010 to 2015. The SDHR cases occur in the same or a similar period as the hail cases. From all 95 hail cases and 110 SDHR cases, a training set, validation set, and testing set are formed according to the ratio of 6 : 2 : 2.
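As an illustration of the 6 : 2 : 2 partition, the following minimal sketch splits case identifiers (not individual cells) so that all cells of one case stay in the same subset. The random shuffle, the seeds, and the function name `split_cases` are assumptions made for this example; the text does not state how cases were assigned to the subsets.

```python
import random

def split_cases(case_ids, ratios=(6, 2, 2), seed=0):
    """Shuffle case IDs and split them into train/validation/test by ratio.

    Splitting is done per case so that cells from one storm never appear
    in two subsets (an assumption of this sketch).
    """
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    total = sum(ratios)
    n_train = round(len(ids) * ratios[0] / total)
    n_val = round(len(ids) * ratios[1] / total)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

# 95 hail cases and 110 SDHR cases, as reported in the text.
hail_train, hail_val, hail_test = split_cases(range(95), seed=0)
sdhr_train, sdhr_val, sdhr_test = split_cases(range(110), seed=1)
print(len(hail_train), len(hail_val), len(hail_test))
print(len(sdhr_train), len(sdhr_val), len(sdhr_test))
```

With these case counts, a 20% share corresponds to 19 hail cases and 22 SDHR cases, which agrees with the sizes of the testing set analyzed in the case evaluation.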
3. Proposed method
A fusion forecast model of hail weather in plateau areas is proposed. First, six mechanism features and two elevation features are combined to represent convection cells. Second, an RF model is built to identify hail cells. Third, a PI model based on PCA and BMED is obtained. Finally, an “and” fusion strategy is proposed that combines the RF model and PI model. A flowchart that sets out the overall process of forecasting hail cells is presented in Fig. 2.
Fig. 2. Flowchart of the proposed hail forecasting algorithm.
3.1 Feature representation
According to meteorological forecasting experience and the basic research work of our team, we combine the known mechanisms of hail formation with the proposed elevation features to represent convection cells. All eight features are shown in Table 1, and specific descriptions of them are provided in the following subsections. A visualized representation is given in Fig. 3, which includes two typical cross-section and range height indicator (RHI) images of hail and SDHR.
Fig. 3. Two typical cross-section and RHI images of hail and SDHR.
3.1.1 Mechanism features of hail
(1) OH (Wang and Pan, 2013): Strong, upward-moving air in a thunderstorm, which is known as the updraft, is the main reason for the weak echo region. A tilted echo above the weak echo region is called an OH echo, which is considered an indicator of particularly large hail in Doppler weather radar observations (Yu et al., 2005; Shi et al., 2019).
(2) ET (Li, 2014): Strong echoes extending vertically above the height of the −20°C layer isotherm have been proven to be an effective feature in hail forecasting (Xu, 1991; Yu et al., 2005, 2020).
(3) LRN (Li, 2014): Vertically integrated liquid (VIL) water and its density have been demonstrated to be feasible in predicting hail cases (Diao et al., 2008).
(4) ARN (Li, 2014): The maximum connected region with reflectivity greater than 45 dBZ is called the cell nucleus, and the ARN is used to reflect the average intensity of the cell nucleus.
(5) SER (Wang and Pan, 2013): The SER is designed to describe the proportion of strong echoes above the −20°C layer.
(6) K (Wang and Pan, 2013): In order to represent the reflectivity distribution difference between the hail cell and SDHR cell in composite reflectivity, K is calculated by using the statistical method outlined in Wang and Pan (2013).
As shown in Fig. 3, the 45-dBZ echo of hail appears above the melting level in radar observations. It can be seen that the OH, ET, and ARN of hail are higher than those of SDHR. The VIL content of hail is significantly lower than that of SDHR, and so the LRN is lower than that of SDHR. K and the SER are calculated from the composite reflectivity and the composite reflectivity above the −20°C layer, respectively.
3.1.2 Construction of elevation features
The 0°C layer is also called the melting layer. Once hail falls below this layer, it gradually melts during its descent. Therefore, if the falling distance is too long, the hail cannot be observed on the ground. Under the same trigger mechanism, hail is more likely to be observed at higher altitudes owing to the shorter falling distance of the hailstones (Cao et al., 2018). A more intuitive illustration of why hail is easier to observe in high-altitude regions is shown in Fig. 3.
In view of the influence of terrain on the forecasting of hail, the maximum elevation and average elevation of the ground corresponding to the convection cell are calculated. The maximum elevation represents the highest altitude of the ground area, and the average elevation represents the average altitude of the ground. The SMD and AMD of the hail landing can be calculated as follows:
{\rm SMD} = {H_0} - {H_{\max }},
(1)
{\rm AMD} = {H_0} - {H_{\rm average}},
(2)
where H0 is the height of the 0°C layer, Haverage is the average elevation, and Hmax is the maximum elevation.
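A minimal sketch of how the two elevation features might be computed for one cell is given below. It assumes that SMD and AMD are the differences between the 0°C-layer height and the maximum and average ground elevation, respectively, and that the terrain elevations under the cell footprint are already available as a flat list in metres; the function name and the sample values are hypothetical.

```python
def melting_distances(h0_m, footprint_elevations_m):
    """Return (SMD, AMD) in metres for one convective cell.

    h0_m: height of the 0 degC layer (m above sea level).
    footprint_elevations_m: terrain elevations (m) of the grid points
        lying under the cell; how the footprint is chosen is not
        specified here and is left to the caller.
    """
    h_max = max(footprint_elevations_m)
    h_avg = sum(footprint_elevations_m) / len(footprint_elevations_m)
    smd = h0_m - h_max  # assumed: distance from 0 degC layer to highest ground
    amd = h0_m - h_avg  # assumed: distance from 0 degC layer to mean ground
    return smd, amd

# Hypothetical values: 0 degC layer at 4500 m, four terrain grid points.
smd, amd = melting_distances(4500, [1000, 1200, 1400, 1600])
print(smd, amd)
```

Because the maximum elevation is never below the average elevation, SMD is never larger than AMD for the same cell.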
3.2 Classification and identification based on RF algorithm
3.2.1 RF algorithm
The RF algorithm, proposed by Breiman (2001), is an ensemble learning method and is currently among the best-performing classifiers. Not only can it directly handle classification problems described by thousands of features, but it also provides a means to evaluate the importance of the features (Lahouar and Slama, 2017). The RF algorithm has been widely applied and studied in hail forecasting (Gagne II et al., 2017; Czernecki et al., 2019; Yao et al., 2020). For these reasons, the hail/SDHR classification model is first constructed based on the RF algorithm.
3.2.2 Feature importance evaluation
During the construction of each base classifier, some samples are not selected for training. These samples are called “out-of-bag” (OOB) samples. The importance of each feature can be obtained from the classification error on the OOB samples. In order to evaluate the importance of the mth feature, it is necessary to keep the remaining features unchanged and add random disturbances to this feature in all OOB samples. Setting ei as the OOB error of the ith decision tree before adding random disturbances to the feature, and {{e'}_{i}} as the OOB error after adding random disturbances to the feature, the importance of the mth feature can then be measured as follows:
V = \Delta \bar e = \frac{1}{{{n_{\rm tree}}}}\sum\limits_{i = 1}^{{n_{\rm tree}}} {({{e'}_{i}}} - {e_i}),
(3)
where ntree is the number of decision trees. The larger the V, the more important that feature is (Malek et al., 2018).
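The following sketch illustrates Eq. (3) on toy data. It assumes that the "random disturbance" is a random permutation of the feature among each tree's OOB samples (the usual choice for RFs, but not confirmed by the text), and it replaces real decision trees with a hand-written threshold rule so that the example stays self-contained; all names and values are hypothetical.

```python
import random

def oob_error(tree, X, y, oob_idx):
    """Fraction of OOB samples that the tree misclassifies."""
    wrong = sum(1 for i in oob_idx if tree(X[i]) != y[i])
    return wrong / len(oob_idx)

def feature_importance(trees, oob_sets, X, y, m, seed=0):
    """V = mean over trees of (e'_i - e_i) for feature index m, as in Eq. (3)."""
    rng = random.Random(seed)
    deltas = []
    for tree, oob_idx in zip(trees, oob_sets):
        e_before = oob_error(tree, X, y, oob_idx)
        # Permute feature m among this tree's OOB samples only.
        values = [X[i][m] for i in oob_idx]
        rng.shuffle(values)
        X_perm = list(X)
        for i, v in zip(oob_idx, values):
            X_perm[i] = X[i][:m] + [v] + X[i][m + 1:]
        e_after = oob_error(tree, X_perm, y, oob_idx)
        deltas.append(e_after - e_before)
    return sum(deltas) / len(deltas)

# Toy data: feature 0 decides the class, feature 1 is noise.
X = [[0.1, 5], [0.9, 3], [0.2, 8], [0.8, 1], [0.3, 2], [0.7, 9]]
y = [0, 1, 0, 1, 0, 1]
stump = lambda x: 1 if x[0] > 0.5 else 0   # stand-in for a trained tree
trees = [stump, stump]
oob_sets = [[0, 1, 2, 3], [2, 3, 4, 5]]    # hypothetical OOB indices per tree

v_informative = feature_importance(trees, oob_sets, X, y, m=0)
v_noise = feature_importance(trees, oob_sets, X, y, m=1)
print(v_informative, v_noise)
```

In this toy setting the noise feature receives an importance of exactly zero because the stand-in trees never read it, whereas permuting the informative feature can only increase the OOB error.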
3.3 PI model based on PCA and BMED
Since the eight features form a high-dimensional space, it is difficult to intuitively analyze the validity of each feature for forecasting hail, and there is information redundancy among the features. Therefore, the eight features are first transformed by PCA, which has been applied previously to the forecasting of hail (Mallafre et al., 2009; Wang et al., 2018).
3.3.1 Feature space transformation
In order to avoid the influence of feature range differences on the identification result, all features are normalized to zi, i = 1, 2, ..., 8. New eigenvectors are obtained by PCA as follows:
The PCA transforms the original eight-dimensional vector into a new eight-dimensional vector PC. Each element of PC is called a principal component, and the principal components are mutually uncorrelated. In Eq. (5), [aij]8 × 8 is the coefficient matrix, and the eigenvalues λi satisfy λ1 ≥ λ2 ≥ … ≥ λ7 ≥ λ8, as shown in Table 2. We can see that the joint contribution rate of pc1, pc2, and pc3 accounts for nearly 80%, and their coefficient matrix is shown in Table 3.
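The transformation can be illustrated with a short stdlib-only sketch that extracts the first principal component. Two assumptions are made here: the normalization is taken to be a z-score (the text only says the features are normalized), and power iteration is used in place of a full eigendecomposition purely to keep the example dependency-free. The two-feature data are invented for the demonstration.

```python
import math

def standardize(rows):
    """Z-score each column (assumed normalization)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def first_component(z_rows, iters=100):
    """Return (weights, eigenvalue) of the leading principal component."""
    n, d = len(z_rows), len(z_rows[0])
    cov = [[sum(r[i] * r[j] for r in z_rows) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Asymmetric start vector; power iteration fails only if the start is
    # exactly orthogonal to the leading eigenvector.
    v = [float(j + 1) for j in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return v, eigenvalue

rows = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2], [5.0, 9.8]]
z_rows = standardize(rows)
weights, lam1 = first_component(z_rows)
contribution = lam1 / len(weights)  # trace of a correlation matrix equals d
pc1 = [sum(a * z for a, z in zip(weights, row)) for row in z_rows]
print(weights, contribution)
```

For standardized features the eigenvalues sum to the number of features, so the contribution rate of a component is its eigenvalue divided by that number.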
Table 2. Eigenvalues and contribution rates of all principal components
The distribution histograms of pc1, pc2, and pc3 are shown in Fig. 4. Although the contribution rates of pc2 and pc3 are relatively high, the two types of samples almost overlap, so their classification ability is insufficient. However, the peak values of hail and SDHR are obviously separated for pc1, and the overlapping area accounts for less than 40%. Thus, pc1 has a certain classification ability. According to the weights of pc1 in Table 3, the weight of K is relatively low, and the weights of the other seven features are between 0.25 and 0.44. Hence, we can conclude that the proportions of these seven features in this principal component (pc1) are relatively balanced. Finally, we take pc1 as the basis for constructing the hail PI model.
Fig. 4. Distribution histograms of (a) pc1, (b) pc2, and (c) pc3.
3.3.2 PI model
The Bayesian classification model approach can also be used for the supervised classification of SCW, such as in the hydrometeor classification algorithm (Marzano et al., 2008; Yang et al., 2019).
According to the Bayesian formula,
P({\omega _i}\left| x \right.) =\frac{{P({\omega _i})P(\left. x \right|{\omega _i})}}{{\sum\nolimits_{j = 1}^2 {P(x|{\omega _j})P({\omega _j})} }}, \quad i = 1,2,
(6)
and assuming that the prior probability P(ωi) is known, the Bayesian classifier can be transformed into a Bayesian classifier based on the class conditional probability. From the perspective of technical implementation, we can use all samples to estimate the conditional probability density function. The number of samples should be much larger than m^n, where m is the number of discrete values into which each continuous feature is divided, and n is the feature dimension. Setting m = 10, as long as the feature dimension exceeds 4, an optimal (minimum classification error) classification model cannot be trained in practice, because not enough samples are available.
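Both points can be checked numerically. The sketch below evaluates Eq. (6) for two classes and counts the histogram cells needed to estimate the class-conditional density directly. The priors are taken from the cell counts reported in Section 2.2; the likelihood values are hypothetical and serve only to exercise the formula.

```python
def posterior(priors, likelihoods):
    """Two-class (or multi-class) Bayes rule, Eq. (6)."""
    evidence = sum(p * l for p, l in zip(priors, likelihoods))
    return [p * l / evidence for p, l in zip(priors, likelihoods)]

def histogram_cells(m, n):
    """Number of cells when each of n features is split into m bins."""
    return m ** n

n_hail, n_sdhr = 1287, 1210            # cell counts reported in Section 2.2
total = n_hail + n_sdhr
priors = [n_hail / total, n_sdhr / total]
likelihoods = [0.30, 0.10]             # hypothetical P(x | hail), P(x | SDHR)
post = posterior(priors, likelihoods)
print(post)

for n in (1, 3, 4, 8):
    print(n, histogram_cells(10, n), histogram_cells(10, n) < total)
```

With m = 10, four features already require 10 000 cells, which exceeds the 2497 available cells, whereas a single feature requires only 10. This is the motivation for reducing the problem to one dimension.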
With the help of PCA, pc1 in this paper shows a good inter-class distribution and the ability to reflect the original features in a balanced manner on hail and SDHR (see Fig. 4a and Table 3). Thus, pc1 (one dimension) is used to describe hail cells and SDHR cells, as shown in Eq. (7):
To simplify the analysis, we transform Fig. 4a into a percentage stacking diagram, as shown in Fig. 5. For each pc1, the upper and lower range lengths represent the SDHR ratio and hail ratio, respectively. The continuous fitting curve is as follows:
where N1 is the number of hail cells and N2 is the number of SDHR cells. Additionally, S1 and S2 represent the numbers misidentified by the PI model.
It can be seen from Eq. (9) that determining the optimal β is equivalent to finding the minimum value of S1 + S2. Therefore, we calculate pc1 for all samples in the validation set by Eq. (8) and move the horizontal dividing line y = β to obtain different values of S1 + S2. Then, we plot them point by point, as shown in Fig. 7. The optimal probability threshold is 0.35, which corresponds to pc1 = −1. Eq. (8), together with this threshold, constitutes the PI model based on PCA and BMED.
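The threshold search can be sketched as a simple sweep. Here S1 is assumed to count hail cells assigned a probability below β and S2 to count SDHR cells assigned a probability of at least β; the probabilities and labels are invented toy values, not the validation data of this study.

```python
def best_threshold(hail_probs, labels, candidates):
    """Return (beta, S1 + S2) minimizing the number of misidentified cells.

    hail_probs: PI-model hail probability of each validation cell.
    labels: 1 for hail, 0 for SDHR.
    Ties are resolved in favour of the first (lowest) candidate.
    """
    best_beta, best_errors = None, None
    for beta in candidates:
        s1 = sum(1 for p, y in zip(hail_probs, labels) if y == 1 and p < beta)
        s2 = sum(1 for p, y in zip(hail_probs, labels) if y == 0 and p >= beta)
        if best_errors is None or s1 + s2 < best_errors:
            best_beta, best_errors = beta, s1 + s2
    return best_beta, best_errors

# Toy validation set: five hail cells followed by five SDHR cells.
probs = [0.9, 0.8, 0.6, 0.4, 0.2, 0.5, 0.3, 0.2, 0.1, 0.1]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
candidates = [k / 100 for k in range(5, 100, 5)]
beta, errors = best_threshold(probs, labels, candidates)
print(beta, errors)
```

Several thresholds may yield the same minimum; in that case the sweep keeps the lowest one, which favours the hit rate over the false alarm count.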
Hail cases are low-probability events. While ensuring an accurate forecast, the false alarm rate should be reduced as far as possible. Therefore, we propose the following fusion strategy: a cell is considered a hail cell if and only if both models (RF and PI) identify it as hail. The test results for this fusion strategy are shown in Section 4.3.
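The fusion rule amounts to a logical conjunction of the two model outputs. A minimal sketch follows, with the alternative "or" rule included for contrast; the per-cell decisions are hypothetical.

```python
def fuse_and(rf_is_hail, pi_is_hail):
    """Hail only if both the RF and PI models say hail."""
    return rf_is_hail and pi_is_hail

def fuse_or(rf_is_hail, pi_is_hail):
    """Hail if at least one of the two models says hail."""
    return rf_is_hail or pi_is_hail

# Hypothetical decisions of the two models for five cells.
rf = [True, True, False, False, True]
pi = [True, False, True, False, True]
and_result = [fuse_and(a, b) for a, b in zip(rf, pi)]
or_result = [fuse_or(a, b) for a, b in zip(rf, pi)]
print(and_result)
print(or_result)
```

Every cell flagged by the "and" rule is also flagged by each individual model, so this rule can never produce more hits or more false alarms than either model alone; the "or" rule behaves in the opposite way.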
4. Experiments and results
4.1 Evaluation measures
For binary events, the classification results of the proposed algorithm are arranged in a 2 × 2 contingency table, as shown in Table 4.
Based on the contingency table, seven verification measures are calculated to assess the performance of the model, as presented in Table 5, and the optimal value for each measure is given. In Table 5, n represents the total number of samples.
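Three of these measures (POD, FAR, and CSI) can be computed from the contingency table as sketched below, using their standard meteorological definitions, with FAR taken as the false alarm ratio. The counts in the example are not quoted from any table; they are reconstructed by us from the percentages reported for the “and” strategy on the testing set (88.2% of 204 hail cells hit; 69 SDHR cells misidentified) and should be read as an illustration.

```python
def skill_scores(hits, misses, false_alarms):
    """Return (POD, FAR, CSI) from contingency-table counts.

    POD = hits / (hits + misses)
    FAR = false_alarms / (hits + false_alarms)   # false alarm ratio
    CSI = hits / (hits + misses + false_alarms)
    """
    pod = hits / (hits + misses)
    far = false_alarms / (hits + false_alarms)
    csi = hits / (hits + misses + false_alarms)
    return pod, far, csi

# Reconstructed counts for the "and" strategy (see lead-in for caveats).
hits, misses, false_alarms = 180, 24, 69
pod, far, csi = skill_scores(hits, misses, false_alarms)
print(round(pod, 3), round(far, 3), round(csi, 3))
```

Correct negatives do not enter any of the three scores, which is why they are well suited to rare events such as hail.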
Table 5. Measures used to evaluate hail forecasting
We denote the features used in this paper as yi (i = 1, 2, ..., 8), as shown in Table 6. To test the classification performance of each feature, all samples are used to form their respective distribution histograms, as shown in Fig. 8. Meanwhile, the following statistical tests are performed. Assuming that the samples of the two classes come from two normal distributions with the same variance, the null hypothesis is that a feature has no significant difference between the two classes, while the alternative hypothesis is that there is a significant difference. The Student’s t-distribution is used to analyze the validity of the features, as shown in Eq. (10):
Table 6. Significant-difference test results of the eight features
where {\bar x_1} and {\bar x_2} denote the sample means; s_1^2 and s_2^2 denote the corresponding sample variances; and n1 and n2 are the numbers of hail and SDHR samples. Setting the significance level to α = 0.01, the look-up table gives tα/2(n1 + n2 − 2) = t0.005(2495) ≈ t0.005(∞) = 2.576. That is, if |ti| > 2.576, i = 1, 2, ..., 8, the null hypothesis is rejected at the confidence level of 1 − α, and the alternative hypothesis is accepted.
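For reference, the test can be sketched as follows, assuming the standard pooled-variance two-sample t statistic, which is the usual form under the equal-variance assumption with n1 + n2 − 2 degrees of freedom. The two short samples are invented for the demonstration.

```python
import math

def pooled_t(sample1, sample2):
    """Return (t, degrees of freedom) for the equal-variance two-sample t-test."""
    n1, n2 = len(sample1), len(sample2)
    mean1 = sum(sample1) / n1
    mean2 = sum(sample2) / n2
    var1 = sum((x - mean1) ** 2 for x in sample1) / (n1 - 1)
    var2 = sum((x - mean2) ** 2 for x in sample2) / (n2 - 1)
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    t = (mean1 - mean2) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

t_value, dof = pooled_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
print(t_value, dof)

THRESHOLD = 2.576  # large-sample critical value for alpha = 0.01, two-sided
print(abs(t_value) > THRESHOLD)
```

With 1287 hail cells and 1210 SDHR cells, the degrees of freedom are 2495, for which the critical value is practically equal to the large-sample value of 2.576.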
According to the eight distribution histograms (Fig. 8) and the statistical test results (Table 6), it can be seen that (i) all eight features, both the selected mechanism features and the newly proposed elevation features, differ significantly between the hail samples and SDHR samples, (ii) the test statistics of the seven features other than K are at least 6.81 times the significance threshold [t0.005(∞) = 2.576], and (iii) the classification ability of K is relatively weak (its test statistic is less than twice the threshold).
4.2.2 Determination of RF model parameters and the importance of features
In the RF model, the adjustable parameters include the number of base classifiers C1, the number of features assigned to each base classifier (decision tree) C2, the maximum decision tree depth C3, the minimum number of samples C4 required to split a node, and the minimum number of samples C5 in a leaf node. Considering that the positive and negative samples in this paper together number only slightly more than 2000 and that the number of features is less than 10, we set C2 to the total number of features and place no restriction on C3. In addition, C4 takes the default value of 2. Only C1 and C5 need to be tuned.
To avoid blind parameter selection, a grid search is applied to obtain the optimal values of the two parameters on the validation set, with CSI as the scoring measure. C1 = 17 and C5 = 12 yield the highest CSI score. Figure 9 shows the variation of CSI with C1 and C5 when the remaining parameters are fixed, and the importance of each feature is shown in Fig. 10.
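The search itself can be sketched independently of any particular library. In the example below, `score_fn` is a hypothetical callback that would train an RF with the given C1 and C5 and return the validation CSI; here it is replaced by an artificial surface that peaks at (17, 12) so that the code runs on its own. The search ranges are also assumptions. If scikit-learn were used (the text does not name a library), C1 to C5 would correspond to the `n_estimators`, `max_features`, `max_depth`, `min_samples_split`, and `min_samples_leaf` arguments of `RandomForestClassifier`.

```python
from itertools import product

def grid_search(score_fn, c1_values, c5_values):
    """Return (best score, C1, C5) over the Cartesian grid of candidates."""
    best = None
    for c1, c5 in product(c1_values, c5_values):
        score = score_fn(c1, c5)
        if best is None or score > best[0]:
            best = (score, c1, c5)
    return best

def toy_csi(c1, c5):
    """Artificial stand-in for 'train RF, return validation CSI'."""
    return 0.66 - 0.001 * (c1 - 17) ** 2 - 0.001 * (c5 - 12) ** 2

best_score, best_c1, best_c5 = grid_search(toy_csi, range(1, 51), range(1, 31))
print(best_score, best_c1, best_c5)
```

Scoring on the validation set rather than the training set keeps the chosen parameters from simply reflecting how well deep, numerous trees memorize the training cells.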
Fig. 10. Variable importance of features applied in the RF model.
It is worth mentioning that: (1) The two proposed elevation features rank third and fourth in importance, and together they account for 31.3%. (2) LRN, OH, and SER rank first, second, and fifth in importance, respectively. These three features jointly account for nearly half (48.8%). Therefore, the cell morphology and strong echo structure caused by the strong updraft show an excellent ability to distinguish hail cells from SDHR cells. (3) The importance of ARN and ET amounts to only 15.1%, indicating that, in the plateau area, because of the shorter melting distance, some slightly weak convective cells or strong updrafts that are not particularly high can still cause hail. (4) The importance of K is only 4.8%, which is consistent with the statistical test results (Table 6).
To further verify the validity of the elevation features, we use only the six mechanism features on the same testing set (204 positive samples and 271 negative samples) for comparison, as shown in Table 7. We can see that the two elevation features improve the performance of hail identification in the plateau areas by a large margin.
Table 7. Test results for different combinations of features
To verify the effectiveness of the fusion strategy, we first test the RF model (Section 3.2) and the PI model (Section 3.3) separately. We report the classification performance of the two models on the testing set, which is not used in training or validation. Table 8 presents the POD, FAR, and CSI test results, which are the most common measures used in meteorology.
Table 8. Skill scores for different models using the testing set
It can be seen that the performances of the RF and PI models are comparable. To further analyze the results of the RF and PI models on the testing set, we divide the results in a more detailed way, as shown in Fig. 11. Taking Fig. 11a as an example, the red part refers to both models making a correct identification; yellow refers to neither model making a correct identification; and green and blue refer to only the RF model or only the PI model making a correct identification, respectively.
Fig. 11. A more detailed test result of the (a) hail testing set and (b) SDHR testing set.
Table 8 shows the separate test results of the RF and PI models, and now we combine Table 8 and Fig. 11 to analyze the effectiveness of the fusion strategy. It can be seen from Fig. 11a that, among the 204 hail samples, 95.1% are hit by at least one model, and 88.2% are hit by both models. In addition, we can see from Fig. 11b that, among the 271 SDHR samples, 25.5% are misidentified by both models, and 42.9% are misidentified by at least one model. Based on this, Table 9 presents the results of the fusion strategy using the “and” approach (a cell is considered hail if and only if both models identify it as hail) and the “or” approach (a cell is considered hail as long as one of the models identifies it as hail).
Table 9. Skill scores for different fusion strategies using the testing set
Relative to the RF and PI models, the “and” fusion strategy reduces the POD by 1.0% and 5.9%, respectively, but it also reduces the FAR by 3.6% and 7.0% (compare the “and” strategy in Table 9 with Table 8). Conversely, the “or” fusion strategy increases the POD by 5.9% and 1.0%, but the FAR is raised by 6.1% and 2.7% (compare the “or” strategy in Table 9 with Table 8). At the same time, the CSI of the “and” strategy is higher than that of the “or” strategy. In summary, the “and” fusion strategy, which largely preserves the hit rate while reducing the false alarm rate, proves to be more effective. A more detailed evaluation of the fusion strategy based on case results is presented in Section 4.5.
4.4 Comparison with classical hail indexes
The hail forecasting algorithm proposed in this paper is compared with two other methods that are widely used at present. The first is the Waldvogel parameter (Waldvogel et al., 1979), and the second is probability of severe hail (POSH; Witt et al., 1998). The curves in Fig. 12 show the changes in POD, FAR, and CSI for the training set with different threshold parameters. As shown in Fig. 12a, the best Waldvogel parameter threshold is 2 km, which corresponds to the largest CSI score. Similarly, Fig. 12b shows that the best POSH threshold is 6%.
Fig. 12. Frequency distributions of the (a) Waldvogel parameter and (b) POSH methods.
We choose seven skill measures to evaluate the performance of the algorithm, as shown in Table 5. The forecasting of hail cells with the testing set using the different algorithms is summarized in Table 10, from which it can be seen that the algorithms have different classification skills. Compared to the traditional algorithms, the proposed algorithm achieves the best performance for hail forecasting in terms of the seven skill measures. The POD and CSI are increased by at least 13.2% and 12.6%, respectively, and the FAR is reduced by at least 7.5%. The other four measures of the proposed algorithm are also closer to their optimal values than those of the two classical hail indexes.
Table 10. Comparative skill scores for three methods
To test the performance of the fusion forecast model in real cases, 204 hail cells (from 19 cases) and 271 SDHR cells (from 22 cases) are analyzed case by case. First, we take the moment that hail begins in each case as the origin of the time axis. Each case in Fig. 13a is shown from the first identification of hail until two sweeps after the hail ends. All 22 SDHR cases are shown in Fig. 13b. The blue boxes represent cells identified as hail, and the gray boxes represent cells identified as SDHR.
Fig. 13. Case evaluation results of (a) 19 hail cases and (b) 22 SDHR cases.
The following conclusions can be drawn from Fig. 13a: (1) The proposed algorithm can identify all 19 hail cases, and only one of them (hail case #3) misses three cells after the first identification. (2) The proposed algorithm can provide a warning of more than 42 min for more than half (52.6%) of the hail cases, and more detailed early warning capability information is summarized in Table 11. (3) For hail case #5, the proposed algorithm does not give an advance warning. For hail case #14, hail is not identified until 6 min after the hail begins. The pc1, three-dimensional cell, and composite reflectivity of the cells in cases #5 and #14 are shown in Figs. 14 and 15, respectively.
Fig. 15. As in Fig. 14, but for hail case #14, which occurred in Jinsha, Guizhou, on 8 May 2012.
Figure 14 shows the evolution of the hail case in Panxian, Guizhou, observed by Xingyi radar on 3 June 2015. The hailstones were recorded by observers from 0817 UTC, and hailstones stopped falling after 2 min. The proposed algorithm detects a convection cell at 0810 UTC, but the reflectivity of the cell is weak and the height of the cell nucleus fails to reach the −20°C layer. Therefore, the algorithm does not correctly identify it (the convection cell at 0810 UTC) as hail. Starting from 0815 UTC, the strength, height, and OH state of the cell nucleus are all high, and the algorithm in this paper also successfully identifies them as hail.
Figure 15 shows the evolution of the hail case in Jinsha, Guizhou, observed by the Bijie radar on 8 May 2012. Observers recorded hailstones from 0918 UTC, and the hail stopped after 3 min. The four cells before 0918 UTC (0901, 0906, 0912, and 0917 UTC) have limited vertical extent, a loose structure, and weak strength, height, and volume, which are the main reasons why the algorithm does not identify them as hail. Because the radar data are corrupted, the algorithm does not provide an identification result for the first cell after 0918 UTC (0922 UTC). In the subsequent cells, the strength, height, and OH state increase and remain high, so the proposed algorithm successfully identifies them as hail cells.
The following conclusions can be drawn from Fig. 13b:
(1) The proposed algorithm has a hit rate of 74.5% for SDHR cells and 81.8% for SDHR cases (a case counts as a hit when at least 50% of its cells are hits). (2) Among the 69 SDHR cells that are mistakenly identified, 47 cells (more than two-thirds) are concentrated in cases #2, #7, #13, and #20 (fewer than one-fifth of the cases). The pc1, three-dimensional cell structure, and composite reflectivity of the cells in case #2 are shown in Fig. 16. The reflectivity and height of the nucleus are generally high, and the height of the nucleus exceeds the height of the −20°C layer. Starting from the second cell (0808 UTC), pc1 exceeds −1 and continues to increase. Therefore, the algorithm identifies this SDHR case as a hail case.
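The cell-level and case-level hit rates in conclusion (1) follow a simple counting rule, which might be sketched as below. The dictionary layout, function names, and the three sample cases are hypothetical. Only the 50% case threshold and the counts of 271 SDHR cells and 69 misidentified cells come from the text.

```python
from typing import Dict, List


def cell_hit_rate(cases: Dict[str, List[bool]]) -> float:
    """Fraction of all cells, pooled over cases, that are correctly identified."""
    cells = [hit for hits in cases.values() for hit in hits]
    return sum(cells) / len(cells)


def case_hit_rate(cases: Dict[str, List[bool]], threshold: float = 0.5) -> float:
    """Fraction of cases in which at least `threshold` of the cells are hits."""
    hit_cases = [
        name for name, hits in cases.items()
        if sum(hits) / len(hits) >= threshold
    ]
    return len(hit_cases) / len(cases)


# Hypothetical SDHR cases: True means the cell was correctly identified as SDHR.
sdhr_cases = {
    "A": [True, True, True, False],
    "B": [False, False, True],
    "C": [True, True],
}

print(round(cell_hit_rate(sdhr_cases), 3))  # 0.667
print(round(case_hit_rate(sdhr_cases), 3))  # 0.667

# Check of the cell-level figure quoted in the text: 271 cells, 69 misidentified.
print(round((271 - 69) / 271 * 100, 1))     # 74.5
```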
Fig. 16. As in Fig. 14, but for SDHR case #2, which occurred in Anlong, Guizhou on 29 April 2013.
The above testing and analysis reveal that the fusion forecast model can distinguish about 90% of hail and SDHR cases. However, for hail cases with a sudden increase in intensity, height, and OH, the model cannot give an advance warning. Likewise, SDHR with high intensity, height, and OH will be misidentified as hail. These, however, account for only about 10% of the cases.
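As a rough check on the "about 90%" figure, assume that it pools the case-level results for hail (19 of 19 cases identified) and SDHR (81.8% of 22 cases, i.e., 18 cases). This pooling is an assumption made here, not a calculation stated in the text. The FAR line further assumes the standard definition, false alarms divided by the sum of hits and false alarms.

```latex
\begin{align}
\text{cases distinguished} &= \frac{19 + 18}{19 + 22} = \frac{37}{41} \approx 90.2\% \\
\text{SDHR cases misidentified as hail} &= \frac{22 - 18}{41} = \frac{4}{41} \approx 9.8\% \\
\text{FAR (hail cases)} &= \frac{4}{19 + 4} = \frac{4}{23} \approx 17.4\%
\end{align}
```

Under these assumptions the numbers agree with the roughly 10% of problematic cases noted above and with the case-level FAR of less than 18% reported in the conclusions.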
5. Conclusions
Focusing on the mountainous and plateau areas of China, 6 yr of historical data, including radar data, sounding data, automatic station data, and terrain data, are used to develop a fusion forecast model for hail cases. The major conclusions from this study can be summarized as follows:
(1) In addition to six known mechanism features of hail formation, two elevation features are proposed, which reflect the melting distance of hailstones.
(2) Using a set of more than 2000 hail cells and SDHR cells, we first construct an RF model. Then, we construct a PI model based on BMED and PCA. Finally, an “and” fusion strategy is proposed. That is, a cell is considered a hail cell if and only if both models (RF and PI) identify it as hail.
(3) Based on the feature analysis, we can conclude that all eight features have a high classification quality between hail cells and SDHR cells, which lays an important foundation for training classification and identification models. Among them, the two newly proposed elevation features play a crucial role in hail identification.
(4) The experimental results show that the proposed algorithm distinguishes hail from SDHR better than the classical hail indexes. While maintaining the hail hit rate, the proposed fusion strategy helps reduce the number of false alarms. In addition, all 19 hail cases in the test are “hits”, and an early warning 42 minutes in advance is given for more than half of the hail cases. If more than half of the cells in an SDHR case are identified as hail, the case is regarded as a false alarm for hail. Accordingly, the FAR of the proposed algorithm for hail cases is less than 18%.
(5) This paper proposes a BMED classifier based on PCA (Section 3.3), in which good classification performance of pc1 (the first principal component) is a necessary prerequisite for applying Bayesian classification. The experimental results verify its effectiveness, and the approach suggests a way to build Bayesian classifiers for high-dimensional problems.
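The PI decision in conclusion (5) and the “and” fusion in conclusion (2) can be illustrated with a minimal sketch. It makes several assumptions that are not from the paper: the class-conditional densities of pc1 are modeled as Gaussians, the priors are taken from class frequencies in the training sample, the RF vote is passed in as a precomputed boolean, and the class name, function name, and sample pc1 values are hypothetical.

```python
from statistics import NormalDist
from typing import Sequence


class Pc1BayesClassifier:
    """Two-class Bayesian minimum error decision on a 1-D score such as pc1.

    Assumption: class-conditional densities are Gaussian. The density
    estimate actually used in the paper may differ.
    """

    def __init__(self, hail_scores: Sequence[float], sdhr_scores: Sequence[float]):
        # Fit one Gaussian per class from the training scores.
        self.hail = NormalDist.from_samples(hail_scores)
        self.sdhr = NormalDist.from_samples(sdhr_scores)
        # Priors from class frequencies (an assumption of this sketch).
        total = len(hail_scores) + len(sdhr_scores)
        self.prior_hail = len(hail_scores) / total
        self.prior_sdhr = len(sdhr_scores) / total

    def hail_probability(self, pc1: float) -> float:
        """Posterior P(hail | pc1) from Bayes' rule."""
        joint_hail = self.prior_hail * self.hail.pdf(pc1)
        joint_sdhr = self.prior_sdhr * self.sdhr.pdf(pc1)
        return joint_hail / (joint_hail + joint_sdhr)

    def is_hail(self, pc1: float) -> bool:
        """Minimum error decision: choose the class with the larger posterior."""
        return self.hail_probability(pc1) > 0.5


def fuse_and(rf_is_hail: bool, pi_is_hail: bool) -> bool:
    """'And' fusion: a cell is hail only if both models say hail."""
    return rf_is_hail and pi_is_hail


# Hypothetical pc1 training scores, not the paper's data.
pi_model = Pc1BayesClassifier(hail_scores=[1.0, 2.0, 3.0],
                              sdhr_scores=[-3.0, -2.0, -1.0])

print(fuse_and(True, pi_model.is_hail(1.5)))   # True: both models agree on hail
print(fuse_and(True, pi_model.is_hail(-1.5)))  # False: PI model rejects hail
print(fuse_and(False, pi_model.is_hail(1.5)))  # False: RF model rejects hail
```

Because the fused output is positive only when both models agree, the fusion can remove false alarms raised by either model but cannot add hits that either model misses. This matches the reported effect of fewer false alarms at a maintained hit rate.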
When training the hail identification model, the method in this paper selects SDHR, which is easily confused with hail, as the negative sample. Therefore, if additional rules are added to filter out non-SDHR cells, or if discriminating features are added to distinguish SDHR from non-SDHR, a classification model that also separates these classes can be obtained, which is precisely what we aim to achieve in future work.
Acknowledgments. The authors wish to thank the Public Meteorological Service Center of the China Meteorological Administration for providing the data used in this paper.
Fig. 3. Two typical cross-section and RHI images of hail and SDHR.
References
Brimelow, J. C., W. R. Burrows, and J. M. Hanesiak, 2017: The changing hail threat over North America in response to anthropogenic climate change. Nat. Climate Change, 7, 516–522. doi: 10.1038/nclimate3321
Cao, Y. C., F. Y. Tian, Y. G. Zheng, et al., 2018: Statistical characteristics of environmental parameters for hail over the two-step terrains of China. Plateau Meteor., 37, 185–196. (in Chinese) doi: 10.7522/j.issn.1000-0534.2017.00044
Czernecki, B., M. Taszarek, M. Marosz, et al., 2019: Application of machine learning to large hail prediction-The importance of radar reflectivity, lightning occurrence and convective parameters derived from ERA5. Atmos. Res., 227, 249–262. doi: 10.1016/j.atmosres.2019.05.010
Dai, Y., N. He, Z. Y. Fu, et al., 2019: Beijing intelligent grid temperature objective prediction method (BJTM) and verification of forecast result. J. Arid Meteor., 37, 339–344, 350. (in Chinese)
Dennis, E. J., and M. R. Kumjian, 2017: The impact of vertical wind shear on hail growth in simulated supercells. J. Atmos. Sci., 74, 641–663. doi: 10.1175/jas-d-16-0066.1
Diao, X. G., J. J. Zhu, X. S. Huang, et al., 2008: Application of VIL and VIL density in warning criteria for hailstorm. Plateau Meteor., 27, 1131–1139. (in Chinese)
Farr, T. G., P. A. Rosen, E. Caro, et al., 2007: The shuttle radar topography mission. Rev. Geophys., 45, RG2004. doi: 10.1029/2005rg000183
Gagne II, D. J., A. McGovern, J. Brotzge, et al., 2015: Day-ahead hail prediction integrating machine learning with storm-scale numerical weather models. Proc. 29th AAAI Conference on Artificial Intelligence, AAAI, Austin, TX, USA, 3954–3960.
Gagne II, D. J., A. McGovern, S. E. Haupt, et al., 2017: Storm-based probabilistic hail forecasting with machine learning applied to convection-allowing ensembles. Wea. Forecasting, 32, 1819–1840. doi: 10.1175/waf-d-17-0010.1
Guan, Y. H., F. L. Zheng, P. Zhang, et al., 2015: Spatial and temporal changes of meteorological disasters in China during 1950–2013. Nat. Hazards, 75, 2607–2623. doi: 10.1007/s11069-014-1446-3
Haberlie, A. M., and W. S. Ashley, 2018: A method for identifying midlatitude mesoscale convective systems in radar mosaics. Part I: Segmentation and classification. J. Appl. Meteor. Climatol., 57, 1575–1598. doi: 10.1175/jamc-d-17-0293.1
Hand, W. H., and G. Cappelluti, 2011: A global hail climatology using the UK Met Office convection diagnosis procedure (CDP) and model analyses. Meteor. Appl., 18, 446–458. doi: 10.1002/met.236
Klein, B., L. Wolf, and Y. Afek, 2015: A dynamic convolutional layer for short range weather prediction. Proc. 2015 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Boston, MA, USA, 4840–4848, doi: 10.1109/CVPR.2015.7299117.
Kunz, M., and M. Puskeiler, 2010: High-resolution assessment of the hail hazard over complex terrain from radar and insurance data. Meteor. Z., 19, 427–439. doi: 10.1127/0941-2948/2010/0452
Lahouar, A., and J. B. H. Slama, 2017: Hour-ahead wind power forecast based on random forests. Renew. Energy, 109, 529–541. doi: 10.1016/j.renene.2017.03.064
Li, C., 2014: Research on severe hail automatic identification and hail suppression decision technology. Master dissertation, University of Tianjin, Tianjin, 57 pp. (in Chinese)
López, L., and J. L. Sánchez, 2009: Discriminant methods for radar detection of hail. Atmos. Res., 93, 358–368. doi: 10.1016/j.atmosres.2008.09.028
Malek, S., R. Gunalan, S. Y. Kedija, et al., 2018: Random forest and self organizing maps application for analysis of pediatric fracture healing time of the lower limb. Neurocomputing, 272, 55–62. doi: 10.1016/j.neucom.2017.05.094
Mallafre, M. C., T. R. Ribas, M. del Carmen Llasat Botija, et al., 2009: Improving hail identification in the Ebro Valley region using radar observations: Probability equations and warning thresholds. Atmos. Res., 93, 474–482. doi: 10.1016/j.atmosres.2008.09.039
Manzato, A., 2012: Hail in northeast Italy: Climatology and bivariate analysis with the sounding-derived indices. J. Appl. Meteor. Climatol., 51, 449–467. doi: 10.1175/jamc-d-10-05012.1
Manzato, A., 2013: Hail in northeast Italy: A neural network ensemble forecast using sounding-derived indices. Wea. Forecasting, 28, 3–28. doi: 10.1175/waf-d-12-00034.1
Marzano, F. S., D. Scaranari, M. Montopoli, et al., 2008: Supervised classification and estimation of hydrometeors from C-band dual-polarized radars: A Bayesian approach. IEEE Trans. Geosci. Remote Sens., 46, 85–98. doi: 10.1109/tgrs.2007.906476
Scher, S., and G. Messori, 2018: Predicting weather forecast uncertainty with machine learning. Quart. J. Roy. Meteor. Soc., 144, 2830–2841. doi: 10.1002/qj.3410
Shi, J. Z., P. Wang, D. Wang, et al., 2019: Radar-based automatic identification and quantification of weak echo regions for hail nowcasting. Atmosphere, 10, 325. doi: 10.3390/atmos10060325
Shi, J. Z., P. Wang, D. Wang, et al., 2020: Radar-based hail-producing storm detection using positive unlabeled classification. Teh. Vjesn., 27, 941–950. doi: 10.17559/tv-20190903094335
Shi, X. J., Z. R. Chen, H. Wang, et al., 2015: Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Proc. 28th International Conference on Neural Information Processing Systems, ACM, Montreal, Canada, 802–810.
Wang, P., and Y. Pan, 2013: Severe hail identification model based on saliency characteristics. Acta Phys. Sinica, 62, 069202. (in Chinese) doi: 10.7498/aps.62.069202
Wang, P., C. Li, and Y. Zhang, 2013: An adaptive segmentation arithmetic adapted to intertwined irregular convective storm images. Proc. 2013 International Conference on Machine Learning and Cybernetics, IEEE, Tianjin, China, 896–900, doi: 10.1109/ICMLC.2013.6890410.
Wang, P., Y. Zhang, C. Li, et al., 2014: Feature construction and AP clutter filtering based on gray lever co-occurrence matrix. Comput. Technol. Dev., 24, 1–5. (in Chinese)
Wang, P., Y. Gao, and C. Li, 2016: Method study of classification and recognition of thunderstorm system less than 50 km. Meteor. Mon., 42, 230–237. (in Chinese)
Wang, P., J. Y. Shi, J. Y. Hou, et al., 2018: The identification of hail storms in the early stage using time series analysis. J. Geophys. Res. Atmos., 123, 929–947. doi: 10.1002/2017jd027449
Witt, A., M. D. Eilts, G. J. Stumpf, et al., 1998: An enhanced hail detection algorithm for the WSR-88D. Wea. Forecasting, 13, 286–303. doi: 10.1175/1520-0434(1998)013<0286:AEHDAF>2.0.CO;2
Xu, Y. C., 1991: A comprehensive indexes recogniting hail cloud by using weather radar in mountain area of south Ningxia. Plateau Meteor., 10, 420–425. (in Chinese)
Xue, X. Y., G. Y. Ren, X. B. Sun, et al., 2019: Climatological characteristics of meso-scale and micro-scale strong convective weather events in China. Climatic Environ. Res., 24, 199–213. (in Chinese) doi: 10.3878/j.issn.1006-9585.2018.17148
Yang, J., K. Zhao, G. F. Zhang, et al., 2019: A Bayesian hydrometeor classification algorithm for C-band polarimetric radar. Remote Sens., 11, 1884. doi: 10.3390/rs11161884
Yao, H., X. D. Li, H. J. Pang, et al., 2020: Application of random forest algorithm in hail forecasting over Shandong peninsula. Atmos. Res., 244, 105093. doi: 10.1016/j.atmosres.2020.105093
Yu, X. D., and Y. G. Zheng, 2020: Advances in severe convection research and operation in China. J. Meteor. Res., 34, 189–217. doi: 10.1007/s13351-020-9875-2
Yu, X.-D., Y.-C. Wang, M.-X. Chen, et al., 2005: Severe convective weather warnings and its improvement with the introduction of the NEXRAD. Plateau Meteor., 24, 456–464. (in Chinese) doi: 10.3321/j.issn:1000-0534.2005.03.025
Yu, X. D., X. M. Wang, W. L. Li, et al., 2020: Thunderstorm and Severe Convection Nowcasting. China Meteorological Press, Beijing, 416 pp. (in Chinese)
Yuan, Y., P. Wang, D. Wang, et al., 2018: An algorithm for automated identification of gust fronts from Doppler radar data. J. Meteor. Res., 32, 444–455. doi: 10.1007/s13351-018-7089-7
Zhang, C. J., H. Y. Wang, J. Zeng, et al., 2020: Short-term dynamic radar quantitative precipitation estimation based on wavelet transform and support vector machine. J. Meteor. Res., 34, 413–426. doi: 10.1007/s13351-020-9036-7
Zhang, C. X., Q. H. Zhang, and Y. Q. Wang, 2008: Climatology of hail in China: 1961–2005. J. Appl. Meteor. Climatol., 47, 795–804. doi: 10.1175/2007jamc1603.1
Zhang, X. L., J. H. Sun, Y. G. Zheng, et al., 2020: Progress in severe convective weather forecasting in China since the 1950s. J. Meteor. Res., 34, 699–719. doi: 10.1007/s13351-020-9146-2
Zhao, J.-T., Y.-J. Yue, J.-A. Wang, et al., 2015: Study on spatio-temporal pattern of hail disaster in China mainland from 1950 to 2009. Chinese J. Agrometeor., 36, 83–92. (in Chinese) doi: 10.3969/j.issn.1000-6362.2015.01.011
Zhou, K. H., Y. G. Zheng, B. Li, et al., 2019: Forecasting different types of convective weather: A deep learning approach. J. Meteor. Res., 33, 797–809. doi: 10.1007/s13351-019-8162-6
Zou, S. P., 2017: Guizhou Hail Cloud Radar Echo Atlas. China Meteorological Press, Beijing, 423 pp. (in Chinese)
Citation
Zhang, Y., Z. Ji, B. Xue, et al., 2021: A novel fusion forecast model for hail weather in plateau areas based on machine learning. J. Meteor. Res., 35(5), 896–910, doi: 10.1007/s13351-021-1021-2.
Manuscript History
Received: 26 January 2021
Revised: 02 June 2021
Accepted: 08 June 2021
Available online: 15 July 2021
Final form: 17 June 2021
Typeset Proofs: 21 July 2021
Issue in Progress: 26 August 2021
Published online: 25 October 2021