Feature Construction and Identification of Convective Wind from Doppler Radar Data

  • Convective wind is one of the common types of severe convective weather. Identification and forecasting of convective wind are essential. In this paper, five kinds of features are first constructed from characteristics of typical convective wind-related echo phenomena based on Doppler radar data. The features include storm motion, high-value reflectivity, high-value velocity, velocity shear, and velocity texture. A severe convective wind (SCW) identification model is then built by applying the above features to the random forest model. With convective wind samples collected over 13 cities of China in June–August 2016, it is found that the probability of detection (POD) of SCW is 78.9%, the false alarm ratio (FAR) is 26.4%, and the critical success index (CSI) is 61.5%. For the convective wind samples that carry typical echo features, the POD, FAR, and CSI range from 89.4% to 99.3%, 4.2% to 16.0%, and 76.4% to 95.1%, respectively. Meanwhile, the POD and negative-case POD (NPOD) of samples without typical echo features are 66.8% and 85.4%, respectively. The experimental results demonstrate that the SCW identification model can effectively separate SCW from non-SCW samples, and that it performs better on SCW samples carrying typical echo features than on those without.
  • Fig. 1.  The parameter r and the positional relationship between points p and i.

    Fig. 2.  Illustration of association points for circular ${\rm{LBP}}_q^R$ codes: (a) R = 1, q = 8; (b) R = 2, q = 16; and (c) R = 2, q = 8.

    Fig. 3.  Frequency distribution histogram of each feature on SCW (red) and NSCW (blue) samples: (a) vCMS, (b) vCDS, (c) Ref99, (d) Rref, (e) Vel99, (f) Rvel, (g) shear features’ first principal component (Shearpca_1), and (h) texture features’ first principal component (VLBPpca_1).

    Fig. 4.  Frequency distribution histograms of radial velocity high-value features in the sample set of SCW samples with SWAs (black) and without SWAs (red): (a) Rvel and (b) Vel99.

    Fig. 5.  Performance of (a) $\text{Shear}_{\text{pca\_1}}$ and (b) $\text{VLBP}_{\text{pca\_1}}$ in classifying SCW samples with/without MARC.

    Fig. 6.  Performance of (a) $\text{Shear}_{\text{pca\_1}}$ and (b) $\text{VLBP}_{\text{pca\_1}}$ on SCW samples with/without mesocyclones.

    Fig. 7.  Performance of the moving speed feature in SCW sample sets with/without squall lines.

    Fig. 8.  Detailed timing information for three cases, including the maximum wind speed in each individual volume scan, the occurrence of typical echo phenomena, and the model’s prediction results.

    Fig. 9.  Scatter diagrams for the three cases: (a) reflectivity high-value features, (b) radial velocity high-value and shear, and (c) shear and texture.

    Table 1.  Data sources

    | Source | Data type | Resolution | Time period |
    |---|---|---|---|
    | CMA Public Meteorological Service Center | Radar data | 1 km × 1 km, 6 min | June–August 2016 |
    | ERA5 | Sounding data | 0.25° × 0.25°, 1 h | |
    | NCEP | Sounding data | 2.5° × 2.5°, 6 h | |
    | Automatic meteorological station | Wind speed recording data | Spatially variable, 5 min | |

    Table 2.  Sample number of each city

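    | ID | City | Sample number | ID | City | Sample number |
    |---|---|---|---|---|---|
    | 1 | Tianjin | 889 | 8 | Changzhou | 1801 |
    | 2 | Nanjing | 1184 | 9 | Jinan | 1077 |
    | 3 | Shijiazhuang | 755 | 10 | Qingdao | 515 |
    | 4 | Cangzhou | 1316 | 11 | Yantai | 586 |
    | 5 | Yancheng | 998 | 12 | Weifang | 1167 |
    | 6 | Xuzhou | 1085 | 13 | Binzhou | 1340 |
    | 7 | Lianyungang | 999 | | Total | 13,712 |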

    Table 3.  Statistics of phenomena associated with convective wind in samples

    | Sample set | Squall line | SWA | MARC | Mesocyclone | No relevant phenomenon |
    |---|---|---|---|---|---|
    | NSCW sample (8381) | 80 | 133 | 23 | 370 | 7980 |
    | SCW sample (5331) | 206 | 556 | 493 | 1592 | 3782 |
    | Total (13,712) | 286 | 689 | 516 | 1962 | 11,762 |

    Table 4.  MI values of environmental parameters

    | Parameter | MI |
    |---|---|
    | CAPE | 0.076 |
    | K index | 0.025 |
    | LI | 0.161 |
    | Dewpoint | 0.050 |
    | DCI | 0.023 |
    | LR | 0.124 |
    | VWS | 0.054 |

    Table 5.  Sample number of each sample set

    | | Training set | Validation set | Testing set | Total |
    |---|---|---|---|---|
    | Number of SCW samples | 2518 | 1568 | 1245 | 5331 |
    | Number of NSCW samples | 4400 | 1989 | 1992 | 8381 |

    Table 6.  Comparison test results

    | Model | POD | FAR | CSI | NPOD |
    |---|---|---|---|---|
    | cmp-model | 0.702 | 0.258 | 0.564 | 0.846 |
    | Radar feature model | 0.789 | 0.264 | 0.615 | 0.823 |
    | cmp and EP model | 0.713 | 0.239 | 0.583 | 0.865 |
    | Radar feature and EP model | 0.798 | 0.217 | 0.653 | 0.861 |

    Table 7.  Numbers of samples in each phenomenon group

    | Testing set | Squall line | SWA | MARC | Mesocyclone | No relevant phenomenon |
    |---|---|---|---|---|---|
    | Number of SCW samples | 47 | 115 | 138 | 386 | 719 |
    | Number of NSCW samples | 26 | 21 | 8 | 113 | 1817 |
    | Total | 73 | 136 | 146 | 499 | 2536 |

    Table 8.  Test results of the radar feature model and phenomenon-based model

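    | Metric | Model | Squall line | SWA | MARC | Mesocyclone | No relevant phenomenon | All samples |
    |---|---|---|---|---|---|---|---|
    | POD | Phenomenon-based model | 100% | 100% | 100% | 100% | 0% | 42.8% |
    | POD | Radar feature model | 89.4% | 93.0% | 99.3% | 97.2% | 66.8% | 83.5% |
    | NPOD | Phenomenon-based model | 0% | 0% | 0% | 0% | 100% | 91.9% |
    | NPOD | Radar feature model | 69.2% | 35.0% | 25.0% | 40.7% | 85.4% | 80.1% |
    | FAR | Phenomenon-based model | 35.6% | 14.8% | 5.5% | 22.6% | | 19.5% |
    | FAR | Radar feature model | 16.0% | 10.8% | 4.2% | 15.1% | 35.6% | 23.0% |
    | CSI | Phenomenon-based model | 64.4% | 85.2% | 94.5% | 77.4% | | 38.7% |
    | CSI | Radar feature model | 76.4% | 83.6% | 95.1% | 82.8% | 48.8% | 66.8% |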

    Table 9.  Information percentages of shear features

    | Principal component | p1 | p2 | p3 +…+ p54 |
    |---|---|---|---|
    | Information percentage | 86.3% | 4.2% | 9.5% |

    Table 10.  Information percentages of texture features

    | Principal component | p1 | p2 | p3 | p4 +…+ p18 |
    |---|---|---|---|---|
    | Information percentage | 60.2% | 14.0% | 9.5% | 16.3% |

    Table 11.  Detection radar, time, and continuous scan number of three cases (BT: Beijing Time)

    | Case name | Radar location | Date | Start time (BT) | End time (BT) | Scan number |
    |---|---|---|---|---|---|
    | Case 1 | Binzhou | 20160613 | 0740 | 0902 | 15 |
    | Case 2 | Lianyungang | 20160722 | 1104 | 1215 | 13 |
    | Case 3 | Xuzhou | 20160718 | 0642 | 0734 | 10 |
  • [1] Augros, C., P. Tabary, A. Anquez, et al., 2013: Development of a nationwide, low-level wind shear mosaic in France. Wea. Forecasting, 28, 1241–1260. doi: 10.1175/WAF-D-12-00115.1.
    [2] Cao, C. Y., Y. Z. Chen, D. H. Liu, et al., 2015: The optical flow method and its application to nowcasting. Acta Meteor. Sinica, 73, 471–480. doi: 10.11676/qxxb2015.034. (in Chinese)
    [3] Hou, J. Y., and P. Wang, 2017: Mesocyclone automatic recognition method based on detection of velocity couplets. J. Tianjin Univ. (Sci. Technol.), 50, 1176–1184. (in Chinese)
    [4] Klimowski, B. A., M. J. Bunkers, M. R. Hjelmfelt, et al., 2003: Severe convective windstorms over the northern High Plains of the United States. Wea. Forecasting, 18, 502–519. doi: 10.1175/1520-0434(2003)18<502:SCWOTN>2.0.CO;2.
    [5] Kraskov, A., H. Stögbauer, and P. Grassberger, 2004: Estimating mutual information. Phys. Rev. E, 69, 066138. doi: 10.1103/PhysRevE.69.066138.
    [6] Lagerquist, R., A. McGovern, and T. Smith, 2017: Machine learning for real-time prediction of damaging straight-line convective wind. Wea. Forecasting, 32, 2175–2193. doi: 10.1175/WAF-D-17-0038.1.
    [7] Li, C., 2015: Research on severe hail automatic identification and hail suppression decision technology. Master dissertation, Tianjin University, Tianjin, 6–8. (in Chinese)
    [8] Niu, Z. Y., 2014: The extraction and relevance study of strong convective weather disasters based on the radar velocity image. Master dissertation, Tianjin University, Tianjin, 55–58. (in Chinese)
    [9] Ojala, T., M. Pietikäinen, and D. Harwood, 1994: Performance evaluation of texture measures with classification based on Kullback discrimination of distributions. Proceedings of 12th International Conference on Pattern Recognition, IEEE, Jerusalem, 582–585. doi: 10.1109/ICPR.1994.576366.
    [10] Ojala, T., M. Pietikäinen, and D. Harwood, 1996: A comparative study of texture measures with classification based on featured distributions. Pattern Recogn., 29, 51–59. doi: 10.1016/0031-3203(95)00067-4.
    [11] Ojala, T., M. Pietikäinen, and T. Mäenpää, 2002: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell., 24, 971–987. doi: 10.1109/TPAMI.2002.1017623.
    [12] Przybylinski, R. W., 1995: The bow echo: Observations, numerical simulations, and severe weather detection methods. Wea. Forecasting, 10, 203–218. doi: 10.1175/1520-0434(1995)010<0203:TBEONS>2.0.CO;2.
    [13] Schmocker, G. K., R. W. Przybylinski, and Y.-J. Lin, 1996: Forecasting the initial onset of damaging downburst winds associated with a mesoscale convective system (MCS) using the mid-altitude radial convergence (MARC) signature. 15th Conference on Weather Analysis and Forecasting, Amer. Meteor. Soc., Norfolk, 306–311.
    [14] Wang, J., J.-G. Zhang, Y.-B. Wang, et al., 2009: Characters of Doppler radar velocity on intense gust in the east of Hubei Province. Torr. Rain Dis., 28, 143–146. doi: 10.3969/j.issn.1004-9045.2009.02.008. (in Chinese)
    [15] Wang, P., and Z.-Y. Niu, 2014: Automatic recognition of mid-altitude radial convergence and study on the relationship between the convergence and strong convective weather based on Doppler weather radar data. Acta Phys. Sinica, 63, 019201. doi: 10.7498/aps.63.019201. (in Chinese)
    [16] Wang, P., and B. J. Dou, 2018: Recognition of strong convergence field based on Doppler radar data. J. Tianjin Univ. (Sci. Technol.), 51, 797–809. (in Chinese)
    [17] Wapler, K., T. Hengstebeck, and P. Groenemeijer, 2016: Mesocyclones in Central Europe as seen by radar. Atmos. Res., 168, 112–120. doi: 10.1016/j.atmosres.2015.08.023.
    [18] Yang, L., F. Han, M. X. Chen, et al., 2018: Thunderstorm gale identification method based on support vector machine. J. Appl. Meteor. Sci., 29, 680–689. doi: 10.11898/1001-7313.20180604. (in Chinese)
    [19] Yang, X. L., and J. H. Sun, 2018: Organizational modes of severe wind-producing convective systems over North China. Adv. Atmos. Sci., 35, 540–549. doi: 10.1007/s00376-017-7114-2.
    [20] Yu, X. D., 2011: Detection and warnings of severe convection with Doppler weather radar. Adv. Meteor. Sci. Technol., 1, 31–41. (in Chinese)
    [21] Yu, X. D., and Y. G. Zheng, 2020: Advances in severe convection research and operation in China. J. Meteor. Res., 34, 189–217. doi: 10.1007/s13351-020-9875-2.
    [22] Yu, X. D., A. M. Zhang, Y. Y. Zheng, et al., 2006: Doppler radar analysis on a series of downburst events. J. Appl. Meteor. Sci., 17, 385–393. doi: 10.3969/j.issn.1001-7313.2006.04.001. (in Chinese)
    [23] Yu, X. D., X. G. Zhou, and X. M. Wang, 2012: The advances in the nowcasting techniques on thunderstorms and severe convection. Acta Meteor. Sinica, 70, 311–337. doi: 10.11676/qxxb2012.030. (in Chinese)
    [24] Yuan, Y., and P. Wang, 2018: Automatic detection of linear mesoscale convective systems. 2018 13th World Congress on Intelligent Control and Automation (WCICA), IEEE, Changsha, China, 170–174.
    [25] Yuan, Y., P. Wang, D. Wang, et al., 2020: A velocity dealiasing scheme based on minimization of velocity differences between regions. Adv. Meteor., 2020, 6157636. doi: 10.1155/2020/6157636.
    [26] Zhang, J., L. F. Yan, and J. Y. Hou, 2019: Evolution relationship between parameters of mesocyclone and severe convective storm. J. Tianjin Univ. (Sci. Technol.), 52, 277–284. (in Chinese)
    [27] Zhou, J. L., M. Wei, T. Wu, et al., 2011: Research on the identification method of Doppler radar data for convective windy weather. The 28th Annual Meeting of the Chinese Meteorological Society, Chinese Meteorological Society, Xiamen, China, 1096–1104. (in Chinese)
  • Di WANG and Yuchen BAO.pdf

    Corresponding author: Di WANG, wangdi2015@tju.edu.cn
  • School of Electrical and Information Engineering, Tianjin University, Tianjin 300072
Funds: Supported by the Applied Foundation and Frontier Technology Research Program (Youth Project) of Tianjin, China (16JQNJC07500)

    • Convective wind is one of the common types of strong convective weather (which also includes hail, short-term heavy precipitation, tornadoes, etc.). It is defined as ground-level wind with a speed greater than or equal to 17.2 m s−1 that is caused by strong atmospheric convection (Yu and Zheng, 2020). It is local, sudden, and destructive, and it results in substantial economic losses every year. Therefore, forecasting convective wind is of great significance.

      Doppler weather radar produces high spatial and temporal resolution data (Yu, 2011) and is an important data source for observing and forecasting convective wind. A variety of echo phenomena in Doppler weather radar images are associated with convective wind, such as the low-elevation severe wind area (SWA; Niu, 2014; Lagerquist et al., 2017), mid-altitude radial convergence (MARC; Schmocker et al., 1996), mesocyclones (Wapler et al., 2016), outflow boundaries, and squall lines (Klimowski et al., 2003; Yang and Sun, 2018).

      The SWA refers to the region of the radial velocity image at low elevation, whose velocities are higher than a certain threshold. A low-altitude SWA often indicates catastrophic convective winds at ground level (Lagerquist et al., 2017). Niu (2014) conducted a statistical analysis of the relationship between SWA and strong convective weather and showed that SWAs are good indicators for convective wind forecasting.

      MARC refers to the radial convergence zone in the mid-altitudes of a convective storm. MARC is considered significant when radial velocity differences of 25 m s−1 or more occur at altitudes of 3–7 km above the ground, in which case the probability of low-level winds is significantly increased (Przybylinski, 1995). Wang and Niu (2014) examined 456 samples from 30 thunderstorm and gale processes in Tianjin, China, found that 362 of them contained significant MARC, and proposed an automatic MARC identification method. Wang and Dou (2018) improved this automatic MARC identification algorithm and detected significant MARC in 39 out of 57 strong convective processes that triggered convective winds. Yu et al. (2012) concluded that squall lines in strong vertical wind shear (WS) environments, bow echoes, supercell storms, multi-cell strong storms, and pulsating storms in weak vertical WS environments are all accompanied by MARC before producing strong surface gales. A mesocyclone is a storm-scale (1–10 km) Rankine vortex (Wapler et al., 2016; Zhang et al., 2019). It is a critical basis for forecasting severe convective disasters, including convective winds (Wang et al., 2009).

      In Doppler radar data, the SWA is a large area of high radial velocity on the radial velocity image with elevation angles of 0.5°–1.5°. MARC or mesocyclones lead to strong shear in the radial velocity image, and correspond to a strong echo cell in the reflectivity image. Therefore, high values and high shear are distinctive features of the convective wind storm region, which lays the foundation for constructing features based on radial velocity images and even reflectivity images for convective wind storm identification.

      However, identifying convective wind storms based on convective wind-related echo phenomena requires accurate identification algorithms for these phenomena. Moreover, in statistical analyses of convective winds conducted by meteorologists, weakly organized storms without typical echo structures can also trigger convective winds (Yang and Sun, 2018). Therefore, more adaptable convective wind forecasting models need to be developed. Augros et al. (2013) used horizontal WS combined with four other warning rules for convective wind forecasting, thereby improving the forecasting effectiveness. Yang et al. (2018) used nine predictors for identifying convective winds and developed a forecast model using the support vector machine method. Lagerquist et al. (2017) collected U.S. radar data from 2000 to 2011 and combined them with sounding data to design and collate 431 predictors to build a convective wind forecasting model, with positive results.

      The above predictive algorithms were mostly derived from predictors for identifying strong convective weather, instead of being designed for forecasting convective wind. In this respect, the present paper reports on work in which relevant features are constructed from typical radar echo phenomena of convective wind storms to establish a more adaptable model for identifying convective wind storms.

      The rest of the paper is organized as follows. Section 2 introduces the data sources, data pre-processing, and calculation of labels. Section 3 introduces the methods employed, including the feature construction, principal component analysis (PCA), and the random forest model. Section 4 reports the testing and analysis, including comparing the proposed model and the severe convective wind identification model based on echo phenomena (phenomenon-based model) along with a thorough analysis of the constructed features in this paper. Section 5 summarizes the major findings of this study and suggests avenues for future work.

    2.   Data
    • This paper focuses on developing a wind speed identification model with high spatial and temporal resolution using radar data. To quantify the effect of adding environmental information, versions of the model with and without the sounding data are also compared experimentally. Thus, we use three types of input data, as shown in Table 1: radar data, sounding data, and wind speed recording data. The radar data, obtained from the CINRAD (China New Generation Weather Radar) network deployed by the China Meteorological Administration (CMA) Public Meteorological Service Center, are used to obtain convection samples and construct features; the sounding data, from the fifth major global reanalysis produced by ECMWF (ERA5) and the NCEP Reanalysis, are used to calculate environmental parameters as features; and the wind speed recording data, from automatic meteorological stations, are used to calculate the labels of samples. We use these three types of data for 13 cities in China from June to August 2016 to construct a convective wind identification dataset. The 13 cities are Tianjin, Nanjing, Shijiazhuang, Cangzhou, Yancheng, Xuzhou, Lianyungang, Changzhou, Jinan, Qingdao, Yantai, Weifang, and Binzhou.

      Radar images are created by transforming the radar data from polar coordinates to Cartesian coordinates with nearest neighbor interpolation, and the resolution of an image is 1 km × 1 km in this study. The radar images include reflectivity images and radial velocity images. Each radar scans nine detection elevation angles: 0.5°, 1.5°, 2.4°, 3.4°, 4.3°, 6.0°, 9.9°, 14.6°, and 19.5°. The sounding data include temperature, humidity, and wind speed data for each pressure layer and some composite variables such as convective available potential energy (CAPE) and lifting index (LI) within the radar range. The wind speed recording data comprise the maximum wind velocity per hour and its occurrence time.
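      The polar-to-Cartesian step described above can be sketched as follows. This is a minimal illustration, not the processing chain used in the paper: the function name `polar_to_cartesian`, the data layout (`polar[ray][gate]`, rays evenly spaced clockwise from north, gate g centered at range g × gate_km), and the neglect of elevation-angle slant range are all assumptions made for the example.

```python
import math

def polar_to_cartesian(polar, gate_km, half_size_km):
    """Resample polar[ray][gate] onto a 1 km Cartesian grid (nearest neighbor).

    Assumed layout: rays are evenly spaced in azimuth, clockwise from north;
    gate g is centered at range g * gate_km. The radar sits at the grid center.
    """
    n_rays, n_gates = len(polar), len(polar[0])
    size = 2 * half_size_km + 1
    grid = [[None] * size for _ in range(size)]
    for row in range(size):
        for col in range(size):
            x = col - half_size_km          # km east of the radar
            y = half_size_km - row          # km north of the radar
            rng = math.hypot(x, y)
            az = math.degrees(math.atan2(x, y)) % 360.0
            gate = round(rng / gate_km)                    # nearest gate
            ray = round(az / (360.0 / n_rays)) % n_rays    # nearest ray
            if gate < n_gates:              # pixels beyond the last gate stay None
                grid[row][col] = polar[ray][gate]
    return grid
```

      Each Cartesian pixel simply takes the value of the closest ray and gate, so no smoothing is introduced; pixels beyond the last gate are left empty.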

      As the ultimate aim here is to be able to provide warning for severe convective wind (SCW) by identifying severe convective storm cells, storm cell segmentation is performed to obtain the storm cells in the radar reflectivity images and filter out hyper-reflectivity via the expansion avoidance algorithm proposed by Li (2015). The reflectivity factor threshold taken for storm cell segmentation in this study is 30 dBZ. For the radar velocity data, the velocity dealiasing scheme of Yuan et al. (2020) is applied before the feature construction procedure. In order to ensure the validity of the radial velocity data, only storms located within 150 km of the radar are used as samples. For the data from automatic meteorological stations, only those records in which the wind velocity is higher than 9 m s−1 are retained to calculate the labels of convective storms, as described in the following subsection.

    • A wind event is considered a convective wind event if it is observed by an automatic station within the area of a convective storm. To ensure the validity of the positive and negative sample sets, the rules for labeling convective wind samples are as follows:

      Rule 1: If the maximum wind speed among all wind events reported by automatic stations within the storm cell area and within 12 min of the radar scan time is higher than or equal to 17.2 m s−1, the storm cell is marked as an SCW sample.

      Rule 2: If a storm cell satisfies Rule 1 and there is no splitting, merging, initiation, or dissipation between it and its counterpart at the previous or next scan, that counterpart is also marked as an SCW sample.

      Rule 3: If the maximum wind speed among all wind events reported by automatic stations within the storm cell area and within 12 min of the radar scan time is within 9–15 m s−1, and all wind events within 20 km of the boundary of the storm cell (including the area of the storm) and within 60 min of the radar scan time are less than 15 m s−1, the storm is marked as a non-strong convective wind (NSCW) sample.
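      Rules 1 and 3 can be expressed as a small labeling function. The sketch below is illustrative only: the `WindEvent` record and its fields are hypothetical, Rule 2 (propagation to the previous or next scan) is omitted because it needs storm tracking, and the upper bound of the 9–15 m s−1 range is treated as exclusive, since Rule 3 also requires every nearby event to be below 15 m s−1.

```python
from dataclasses import dataclass
from typing import List, Optional

SCW_THRESHOLD = 17.2              # m/s, Rule 1
NSCW_LOW, NSCW_HIGH = 9.0, 15.0   # m/s, Rule 3

@dataclass
class WindEvent:                  # hypothetical record, not from the paper
    speed: float                  # station-reported wind speed (m/s)
    minutes_from_scan: float      # absolute time offset from the radar scan
    km_from_cell: float           # distance to the cell boundary; 0 if inside

def label_cell(events: List[WindEvent]) -> Optional[str]:
    """Return 'SCW', 'NSCW', or None (cell not used as a sample)."""
    inside = [e.speed for e in events
              if e.km_from_cell == 0 and e.minutes_from_scan <= 12]
    if not inside:
        return None
    peak = max(inside)
    if peak >= SCW_THRESHOLD:                       # Rule 1
        return "SCW"
    nearby = [e.speed for e in events
              if e.km_from_cell <= 20 and e.minutes_from_scan <= 60]
    if NSCW_LOW <= peak < NSCW_HIGH and all(s < NSCW_HIGH for s in nearby):
        return "NSCW"                               # Rule 3
    return None                                     # ambiguous: discarded
```

      Cells whose peak wind falls between the two classes, or that have a stronger gust nearby, receive no label, which keeps a margin between the positive and negative sets.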

      According to the above rules, 13,712 convective wind samples are obtained, of which 5331 are SCW samples (positive samples) and 8381 are NSCW samples (negative samples). The sample number for each city is shown in Table 2. It should be noted that the convective wind records used in this paper are those observed by meteorological stations; convective winds that do not occur at a station (such as gales confirmed only by field investigation) may not be completely recorded.

      Some samples in the sample set exhibit echo phenomena such as a low-elevation SWA, MARC, a mesocyclone, or a squall line. The detailed statistics are shown in Table 3. If more than one phenomenon accompanies a convective wind sample, the sample is categorized in the priority order of squall line, SWA, MARC, and mesocyclone. In this paper, squall lines (Yuan and Wang, 2018), SWAs (Yuan and Wang, 2018), MARC (Wang and Dou, 2018), and mesocyclones (Hou and Wang, 2017) are obtained by automatic identification methods.

    3.   Methods
    • This section elaborates on the construction procedure of the radar image features used for the identification of convective wind. Five kinds of features based on the radar images are constructed by referring to the characteristics of typical convective wind-related echo phenomena, including the storm’s (1) moving speed and core sinking speed, (2) high-value reflectivity features, (3) high-value radial velocity features, (4) radial velocity shear features, and (5) radial velocity texture features.

    • The storm’s moving speed and core sinking speed are indicative of strong convective wind (Yang et al., 2018). Zhou et al. (2011) indicated that thunderstorms move slowly during the incipient and developing phases, whereas in the mature and waning phases they are influenced by fast-moving cold surface outflows and move fast, especially in squall lines. A straight-line wind is also usually generated at ground level when a low-altitude convective core descends rapidly toward the ground (Yu et al., 2006). Accordingly, two features, cell move speed (vCMS) and cell down speed (vCDS), are constructed.

    • The optical flow method (Cao et al., 2015) is used to determine the predecessor of the convective storm. Then, the movement velocity ${{\boldsymbol{v}}_p}({v_p},{\theta _p})$ of each pixel p within the storm cell in the radar images is obtained, where $ {v_p} $ is the pixel speed and $ {\theta _p} $ is the direction of motion. The cell move speed is the magnitude of the mean pixel velocity vector, computed as follows, where n is taken to be the number of pixels in the storm cell:

      $$ {v_{\rm{CMS}}} = \frac{1}{n}\sqrt {{{\Bigg(\sum\limits_{p \in {\rm{cell}}} {{v_p}} \cos {\theta _p}\Bigg)}^2} + {{\Bigg(\sum\limits_{p \in {\rm{cell}}} {{v_p}} {\rm{sin}}{\theta _p}\Bigg)}^2}} . $$ (1)
    • The cell down speed is computed as

      $$ {v_{\rm{CDS}}} = \frac{h(t) - h(t - \Delta t)}{\Delta t} , $$ (2)

      where $h(t)$ and $h(t - \Delta t)$ are the storm core heights at the current time $ t $ and at the earlier time $ t - \Delta t $, respectively.
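      Equations (1) and (2) can be sketched in a few lines. The function names are hypothetical, directions are assumed to be in radians, and n is assumed to be the number of pixels in the cell; the sign convention of Eq. (2) is kept as written, so a descending core gives a negative value.

```python
import math
from typing import Sequence, Tuple

def cell_move_speed(pixels: Sequence[Tuple[float, float]]) -> float:
    """Eq. (1): magnitude of the mean pixel velocity vector.

    pixels holds (v_p, theta_p) pairs, with theta_p in radians (assumed).
    """
    n = len(pixels)                                  # assumed: pixel count
    sum_x = sum(v * math.cos(theta) for v, theta in pixels)
    sum_y = sum(v * math.sin(theta) for v, theta in pixels)
    return math.hypot(sum_x, sum_y) / n

def cell_down_speed(h_now: float, h_prev: float, dt: float) -> float:
    """Eq. (2): change of storm core height per unit time."""
    return (h_now - h_prev) / dt
```

      Because the velocities are summed as vectors before taking the magnitude, pixels moving in opposite directions cancel, so the feature reflects the coherent motion of the whole cell rather than the average pixel speed.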

    • The high-value reflectivity of storm cells is one of the essential features in strong convective weather forecasting. In general, the higher the reflectivity, the more likely it is that a convective hazard will occur.

    • In the composite reflectivity (CR) image, the area of reflectivity intensity greater than or equal to 50 dBZ in the storm cells is $ {S_{50}} $, and the area greater than or equal to 40 dBZ is $ {S_{40}} $. Thus, the percentage of high-value reflectivity can be calculated as:

      $$ {R_{\rm{ref}}} = \frac{{{S_{50}}}}{{{S_{40}}}} . $$ (3)
    • The points in $ {S_{40}} $ are sorted in ascending order according to their reflectivity values (${p_1},\;{p_2},\; \ldots$). Then, the 99th percentile of the maximum reflectivity of storm cells is:

      $$ {{\rm{Ref}}_{99}} = R({p_K}),\quad \frac{K}{{{S_{40}}}} \geqslant 0.99 , $$ (4)

      where $ K $ is the serial number of points after sorting and $ R({p_K}) $ is the reflectivity value of pixel $ {p_K} $.
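      A minimal sketch of Eqs. (3) and (4) follows, assuming one pixel per unit area so that areas reduce to pixel counts, and taking K as the smallest index that satisfies the inequality in Eq. (4). The function name is hypothetical.

```python
from typing import Sequence, Tuple

def high_value_reflectivity(cr_values: Sequence[float]) -> Tuple[float, float]:
    """Return (R_ref, Ref99) for the composite reflectivity pixels of one cell."""
    s40 = sorted(z for z in cr_values if z >= 40.0)   # points in S40, ascending
    if not s40:
        raise ValueError("cell has no pixels at or above 40 dBZ")
    s50 = sum(1 for z in s40 if z >= 50.0)
    r_ref = s50 / len(s40)                            # Eq. (3)
    # Smallest K with K / S40 >= 0.99, i.e. ceil(0.99 * S40), in integer math.
    k = (99 * len(s40) + 99) // 100
    return r_ref, s40[k - 1]                          # Eq. (4), 1-based K
```

      Using the 99th percentile instead of the raw maximum makes the feature less sensitive to a single spurious high-reflectivity pixel.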

    • The region of high values at low altitudes on the radial velocity images often corresponds to high winds at ground level (Wang et al., 2009). Therefore, the percentage of high-value velocity and the 99th percentile maximum velocity are constructed as features, collectively referred to as the storm’s high-value velocity features.

    • $$ {R_{\rm{vel}}} = \frac{{{S_{{{v}} \geqslant {\text{15}}}}}}{{{S_{30}}}} , $$ (5)

      where $ {S_{30}} $ represents the area of storm cell reflectivity intensity exceeding 30 dBZ and ${S_{{{v}} \geqslant 15}}$ is the area of radial velocity values $ v $ greater than or equal to 15 m s−1 in the storm cells.

    • Similar in purpose and approach to the design of Ref99, Vel99 is generated on a radial velocity image at an elevation angle of 0.5°:

      $$ {{\rm{Vel}}_{99}} = V({p_K}),\quad \frac{K}{{{S_{40}}}} \geqslant 0.99 , $$ (6)

      where $ K $ is the serial number of points after sorting and $ V({p_K}) $ is the radial velocity value at pixel $ {p_K} $.
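      The velocity counterparts, Eqs. (5) and (6), differ from the reflectivity features mainly in that two fields are combined: reflectivity selects the pixels and radial velocity supplies the values. The sketch below is an illustration under stated assumptions: the function name is hypothetical, the magnitude of the signed radial velocity is used (the text does not say how inbound velocities are handled), and the 30 dBZ bound is treated as inclusive.

```python
from typing import Sequence, Tuple

def high_value_velocity(pixels: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (R_vel, Vel99) from (reflectivity dBZ, radial velocity m/s) pairs."""
    s30 = [abs(v) for z, v in pixels if z >= 30.0]           # assumed: |v|
    s40 = sorted(abs(v) for z, v in pixels if z >= 40.0)     # sorted as in Eq. (6)
    if not s30 or not s40:
        raise ValueError("cell has no pixels in the required reflectivity range")
    r_vel = sum(1 for v in s30 if v >= 15.0) / len(s30)      # Eq. (5)
    k = (99 * len(s40) + 99) // 100                          # smallest K, K/S40 >= 0.99
    return r_vel, s40[k - 1]                                 # Eq. (6)
```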

    • The shear features correspond to the characteristics of atmospheric motion that readily cause convective wind, such as convergence, divergence, and rotation. The construction method is as follows.

      First, the value of radial velocity WS at point p is obtained in the radial velocity image according to

      $$ {\rm{WS}}(p, r)=\frac{{\rm{max}}\{|v(p)-v(i)|\}}{r},i=1,2,3,4 , $$ (7)

      where v is the point radial velocity value, and r∈{1, 2, 3} is the distance between points p and i. The positional relationship between points p and i is shown in Fig. 1.

      The number of pixels in the storm cell whose WS(p, r) at elevation angle α is greater than or equal to a given threshold ρ is $ N_{r, \rho , \alpha } $, and the long side of the minimum bounding rectangle of the storm region above 40 dBZ is $ {L_{40}} $. The velocity shear features of the storm are then calculated as

      $$ R_{{\rm{WS}}\_r\_\rho\_\alpha} = \frac{{N_{r, \rho , \alpha }}}{L_{40}} , $$ (8)

      where r∈{1, 2, 3}, ρ∈{2, 4, 6}, and α∈{0.5°, 1.5°, 2.4°, 3.4°, 4.3°, 6.0°}.
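      Equations (7) and (8) can be sketched for a single elevation angle as follows. Several details are assumptions made for illustration: the four points i are taken to be the neighbors at distance r above, below, left, and right of p (the actual layout is defined in Fig. 1), neighbors outside the image are skipped, and the bounding rectangle is taken to be axis-aligned. The function name is hypothetical.

```python
from typing import List

def shear_feature(vel: List[List[float]], refl: List[List[float]],
                  r: int, rho: float) -> float:
    """R_WS for one elevation angle: count of pixels with WS >= rho, over L40."""
    rows, cols = len(vel), len(vel[0])
    offsets = ((-r, 0), (r, 0), (0, -r), (0, r))   # assumed positions of i = 1..4
    n_shear = 0
    for y in range(rows):
        for x in range(cols):
            diffs = [abs(vel[y][x] - vel[y + dy][x + dx])
                     for dy, dx in offsets
                     if 0 <= y + dy < rows and 0 <= x + dx < cols]
            if diffs and max(diffs) / r >= rho:    # Eq. (7) against threshold rho
                n_shear += 1
    strong = [(y, x) for y in range(rows) for x in range(cols)
              if refl[y][x] >= 40.0]
    if not strong:
        raise ValueError("no pixels at or above 40 dBZ")
    ys = [p[0] for p in strong]
    xs = [p[1] for p in strong]
    # Long side of the axis-aligned bounding box (assumption), in pixels.
    l40 = max(max(ys) - min(ys) + 1, max(xs) - min(xs) + 1)
    return n_shear / l40                           # Eq. (8)
```

      Calling this for each r, ρ, and elevation angle α listed above yields the 3 × 3 × 6 = 54 shear features.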

      Because the WS field extends spatially across the convective region, the shear features are highly similar to one another. A mutual information (MI) calculation is therefore introduced to reduce the number of dimensions of the similar features and retain those with higher separability.

      The MI can be used to evaluate the correlation between variables (Kraskov et al., 2004) and is calculated as follows:

      $$ {\rm{MI}}({x_i},y) = \sum\limits_{{x_i} \in X} {\sum\limits_{y \in Y} {p({x_i},y)} } \log \frac{{p({x_i},y)}}{{p({x_i})p(y)}} , $$ (9)

      where $ {x_i} $ is the feature to be evaluated and y is the sample label; $ p({x_i}) $, $ p(y) $, and $ p({x_i},y) $ are probabilities that can be estimated from the sample set. MI measures the dependence between $ {x_i} $ and $ y $: it is zero when they are independent and increases as the feature carries more information about the label.
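      For discrete (or binned) feature values, Eq. (9) can be evaluated directly from sample counts. The sketch below is a plug-in estimate with the natural logarithm; it is not the nearest-neighbor estimator of Kraskov et al. (2004), and how the paper discretizes continuous features is not stated, so any binning is left to the caller. The function name is hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence

def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Plug-in estimate of Eq. (9) from paired discrete samples (in nats)."""
    n = len(x)
    joint = Counter(zip(x, y))      # counts for p(x_i, y)
    count_x = Counter(x)            # counts for p(x_i)
    count_y = Counter(y)            # counts for p(y)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) / (p(a) p(b)) simplifies to c * n / (count_x[a] * count_y[b])
        mi += (c / n) * math.log(c * n / (count_x[a] * count_y[b]))
    return mi
```

      A feature that perfectly determines a balanced binary label gives log 2 ≈ 0.693, while a feature independent of the label gives 0.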

      Based on the MI values, one feature is selected per elevation angle, which gives six shear features with the following parameter combinations:

      $$ \left[ {\begin{array}{*{20}{c}} \alpha \\ r \\ \rho \end{array}} \right] = \left[ {\begin{array}{*{20}{c}} {0.5^\circ }&{1.5^\circ }&{2.4^\circ }&{3.4^\circ }&{4.3^\circ }&{6.0^\circ } \\ 3&3&3&1&3&3 \\ 4&4&4&4&6&6 \end{array}} \right] . $$
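      The per-elevation selection can be sketched as follows, assuming it amounts to keeping the (r, ρ) pair with the largest MI at each elevation angle. The function name is hypothetical and the MI scores in the example are invented for illustration; they are not values from this study.

```python
def select_per_elevation(mi_scores):
    """mi_scores maps (alpha, r, rho) -> MI; return the best (r, rho) per alpha."""
    best = {}
    for (alpha, r, rho), score in mi_scores.items():
        if alpha not in best or score > best[alpha][1]:
            best[alpha] = ((r, rho), score)
    return {alpha: params for alpha, (params, _) in best.items()}


# Invented scores for two elevation angles (degrees); not results from the paper.
example_scores = {
    (0.5, 3, 4): 0.12,
    (0.5, 1, 2): 0.08,
    (3.4, 1, 4): 0.10,
    (3.4, 3, 4): 0.09,
}
selected = select_per_elevation(example_scores)
```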
    • Compared with an NSCW storm, the region of an SCW storm related to mesocyclones and MARC contains rougher texture. To extract the texture features of the radial velocity image of strong convective wind, the rotation-invariant local binary pattern (RILBP) descriptor is chosen. The RILBP was proposed by Ojala et al. (2002) and is based on the circular local binary pattern (LBP) (Ojala et al., 1994, 1996), which is computed as follows.

      First, in the $ (2R + 1) \times (2R + 1) $ window, calculate the difference between the gray value $f({p_{\rm{c}}})$ of the center point ${p_{\rm{c}}}$ and the gray value $ f(i) $ of each of the q points located at distance R from it. If $f(i) - f({p_{\rm{c}}}) > \lambda$, label the result “1”; otherwise, label it “0”. This yields a q-bit 0/1 code, called the ${\rm{LBP}}_q^R$ code, which is used as the LBP feature of the window center point ${p_{\rm{c}}}$. Because the difference between adjacent gray levels in radar radial velocity images after numerical discretization is 5 m s$^{-1}$, the parameter λ is set to 5.
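      The thresholding step can be sketched as below. The q neighbour values are passed in already sampled, so no assumption is made about their positions in the window; the bit order (first neighbour as the most significant bit) and the function name are assumptions made for illustration.

```python
def lbp_code(center, neighbours, lam=5):
    """Return the q-bit LBP code as an integer.

    A bit is 1 when neighbour - center > lam, otherwise 0.
    ASSUMPTION: the first neighbour becomes the most significant bit.
    """
    code = 0
    for value in neighbours:
        code = (code << 1) | (1 if value - center > lam else 0)
    return code


# Hypothetical velocities (m/s): two neighbours exceed the centre by more than 5.
code = lbp_code(0, [10, 10, 0, 0, 0, 0, 0, 0])
# A difference of exactly 5 does not exceed the threshold.
flat = lbp_code(0, [5] * 8)
```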

      The LBP is a gray-scale invariant method. Ojala et al. (2002) achieved the RILBP by:

      $$ {\rm{RILBP}}_q^R = \min \{ {\rm{ROR}}({\rm{LBP}}_q^R,i)|i = 0,1,...,q - 1\} , $$ (10)

      where ${\rm{ROR}}(x,i)$ performs a circular bit-wise right shift i times on the q-bit number x.
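      Eq. (10) maps every rotation of a code to the same value by taking the minimum over all circular shifts. A minimal sketch, with hypothetical function names and the code represented as a Python integer:

```python
def ror(x, i, q):
    """Circular bit-wise right shift of the q-bit number x by i positions."""
    i %= q
    mask = (1 << q) - 1
    return ((x >> i) | (x << (q - i))) & mask


def rilbp(code, q=8):
    """Eq. (10): minimum over all q circular right shifts of the LBP code."""
    return min(ror(code, i, q) for i in range(q))


# Two rotations of the same pattern share one rotation-invariant value.
a = rilbp(0b11000000)
b = rilbp(0b00000110)
```

      Because the minimum is zero only when every bit is zero, a nonzero RILBP value simply means that at least one neighbour exceeded the threshold.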

      Three illustrations of ${\rm{LBP}}_q^R$ descriptors are given in Fig. 2. Here, the parameters R = 2 and q = 8 are chosen, and two radial velocity image texture features are constructed based on the RILBP descriptor.

      Figure 2.  Illustration of association points for circular ${\rm{LBP}}_q^R$ codes: (a) R = 1, q = 8; (b) R = 2, q = 16; and (c) R = 2, q = 8.

    • Because the radial velocity image contains rougher texture in the region of strong convective storms, texture feature 1 is defined as the proportion of points with an RILBP value of at least 1 in the storm cell region ${\varOmega _{{\rm{cell}}_\alpha }}$ at each of the six elevation angles:

      $$\begin{aligned} & R_{{\rm{nz}}\_\alpha} = \frac{1}{n}\sum\limits_{i = 1}^n I({\rm{RILBP}}_8^2({p_i}) \geqslant 1, {p_i} \in {\varOmega _{{\text{cel}}{{\text{l}}_\alpha }}}) , \end{aligned}$$ (11)

      where $ I(X) $ is the indicator function, whose value equals 1 when $ X $ is true and 0 when $ X $ is false, and n is the number of points in ${\varOmega _{{\rm{cell}}_\alpha }}$ at elevation angle α∈{0.5°, 1.5°, 2.4°, 3.4°, 4.3°, 6.0°}.

    • For each point within the storm, count the number of jumps m between 0 and 1 in the circular LBP code (0–1 and 1–0 transitions); m can take five values. Let

      $$ m'({p_i}) = \left\{ \begin{gathered} m({p_i}),\quad m({p_i}) < 4 \hfill \\ 4,\quad \quad\;\;\, \;m({p_i}) \geqslant 4 \hfill \\ \end{gathered} \right. , $$ (12)

      where $m({p_i}) $∈{0, 2, 4, 6, 8}. Obviously, $ m' $ is limited to three values by Eq. (12). Then, texture feature 2 is defined as:

      $$ R_{{\rm{J}}\_{m'}\_\alpha} = \frac{1}{{{n}}}\sum\limits_{i = 1}^n {I[{m'}({p_i})= {m'},{p_i} \in {\varOmega _{{\text{cel}}{{\text{l}}_\alpha }}}]} , $$ (13)

      where α∈{0.5°, 1.5°, 2.4°, 3.4°, 4.3°, 6.0°}.
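      The two texture ratios of Eqs. (11) and (13) can be sketched together for one elevation angle. The sketch assumes the LBP codes of all points in the storm cell are available as integers and uses the fact that the RILBP value is at least 1 exactly when the LBP code is nonzero; the function names and the example codes are hypothetical.

```python
def jumps(code, q=8):
    """Number m of 0-1 and 1-0 transitions in the circular q-bit code."""
    bits = [(code >> k) & 1 for k in range(q)]
    return sum(bits[k] != bits[(k + 1) % q] for k in range(q))


def capped_jumps(code, q=8):
    """m' of Eq. (12): m itself below 4, otherwise 4."""
    return min(jumps(code, q), 4)


def texture_features(codes, q=8):
    """Return (R_nz, {m': R_J_m'}) for the LBP codes of one storm cell."""
    n = len(codes)
    r_nz = sum(1 for c in codes if c != 0) / n  # RILBP >= 1 iff code != 0
    r_j = {m: sum(1 for c in codes if capped_jumps(c, q) == m) / n
           for m in (0, 2, 4)}
    return r_nz, r_j


# Hypothetical codes for four points: 0, 2, 4, and 8 jumps respectively.
r_nz, r_j = texture_features([0b00000000, 0b11000000, 0b10100000, 0b10101010])
```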

      Overall, there are 6 Rnz_α and 18 RJ_m'_α features. To reduce the number of features, they are evaluated by the MI calculation method outlined in Section 3.1.4. Based on the MI values, three low-elevation-angle Rnz_α features are selected (Rnz_0.5°, Rnz_1.5°, and Rnz_2.4°), together with three low-elevation-angle ${R_{{\rm{J}}\_{m'}\_\alpha}}$ features (RJ_0_0.5°, RJ_0_1.5°, and RJ_0_2.4°).

      In summary, 18 features are constructed to describe a sample. Among them, 14 features are built from the radial velocity image (2 high-value features, 6 shear features, and 6 texture features), and 4 features are built from the reflectivity image (2 high-value features and 2 speed features). The frequency distribution histogram of each feature is shown in Fig. 3. Because of the poor classification ability of ${v_{\rm{CMS}}}$ and ${v_{\rm{CDS}}}$, these two speed features are not included in the model training, which leaves 16 radar features as model inputs.

      Figure 3.  Frequency distribution histogram of each feature on SCW (red) and NSCW (blue) samples: (a) vCMS, (b) vCDS, (c) Ref99, (d) Rref, (e) Vel99, (f) Rvel, (g) shear features’ first principal component (Shearpca_1), and (h) texture features’ first principal component (VLBPpca_1).

    • Convective parameters including CAPE, convective inhibition energy (CIN), and vertical wind shear (VWS) are generally accepted variables used in the forecasting of the likelihood and severity of an impending storm. CAPE, CIN, and K index are extracted from ERA5; LI is extracted from the NCEP dataset; and the dewpoint, deep convective index (DCI), 700–500-hPa lapse rate (LR), and VWS are calculated from other available ERA5 products. Since there are missing values for CIN, only the MI values for the remaining seven environmental parameters are calculated. Four environmental parameters (CAPE, LI, LR, and VWS) are selected according to the MI results in Table 4.

      | Parameter | MI |
      |---|---|
      | CAPE | 0.076 |
      | K index | 0.025 |
      | LI | 0.161 |
      | Dewpoint | 0.050 |
      | DCI | 0.023 |
      | LR | 0.124 |
      | VWS | 0.054 |

      Table 4.  MI values of environmental parameters
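      The choice of four parameters is consistent with simply keeping the four largest MI values in Table 4, as the short check below shows. The values are copied from Table 4; that the selection rule is exactly "top four" is an assumption.

```python
# MI values from Table 4.
mi_values = {
    "CAPE": 0.076,
    "K index": 0.025,
    "LI": 0.161,
    "Dewpoint": 0.050,
    "DCI": 0.023,
    "LR": 0.124,
    "VWS": 0.054,
}

# ASSUMPTION: the selection keeps the four parameters with the largest MI.
selected = sorted(mi_values, key=mi_values.get, reverse=True)[:4]
```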

    • To extract the primary information of the features, PCA is applied to the shear features and to the texture features. The validity and expression skill tests are then performed on the feature set in which these features are represented by the principal components obtained.

      PCA transforms the original n-dimensional features into n composite features, called principal components. Each principal component is a linear combination of the original n features and is uncorrelated with the others. PCA also gives the percentage of information (variance) carried by each principal component. The principal component with the largest percentage is the first principal component, the next largest is the second principal component, and so on. In general, after transforming the original features into principal components, the feature dimension can be compressed to just a few or even one while retaining most of the information. The higher the correlation between the original features, the fewer principal components are needed to retain most of the information.
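      The information percentage of the first principal component is the largest eigenvalue of the feature covariance matrix divided by the sum of all eigenvalues (the trace). The sketch below estimates it with power iteration in plain Python; the function name and toy data are hypothetical, and the start vector is assumed not to be orthogonal to the leading eigenvector.

```python
def first_pc_share(rows, iters=200):
    """Share of total variance carried by the first principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    vec = [1.0] * d  # ASSUMPTION: not orthogonal to the leading eigenvector
    for _ in range(iters):
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the unit vector gives the leading eigenvalue.
    eigenvalue = sum(vec[a] * sum(cov[a][b] * vec[b] for b in range(d))
                     for a in range(d))
    return eigenvalue / sum(cov[a][a] for a in range(d))


# Perfectly correlated features: one component carries everything.
share_correlated = first_pc_share([[1, 2], [2, 4], [3, 6]])
# Uncorrelated features with equal variance: the first component carries half.
share_uncorrelated = first_pc_share([[1, 0], [-1, 0], [0, 1], [0, -1]])
```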

    • Random forest (RF) is an ensemble model that uses decision trees as base classifiers. Each base classifier is trained on a randomly sampled subset of the training samples. The trees therefore differ from one another, which enhances the generalization ability of the RF model.

      This paper adopts the RF approach to construct an SCW identification model, which employs the features designed in Sections 3.1 and 3.2 as its inputs.
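      The resampling-and-voting idea behind RF can be illustrated with a deliberately simplified sketch: bagged decision stumps (depth-1 trees) with majority voting. This is not a full random forest (there is no random feature subset at each split) and it is not the configuration used in this study; all names, the number of trees, and the toy samples are hypothetical.

```python
import random


def fit_stump(samples):
    """Pick (feature, threshold) so that 'x[feature] >= threshold -> class 1'
    misclassifies the fewest samples."""
    best = None
    for f in range(len(samples[0][0])):
        for x, _ in samples:
            t = x[f]
            errors = sum((xi[f] >= t) != bool(yi) for xi, yi in samples)
            if best is None or errors < best[0]:
                best = (errors, f, t)
    return best[1], best[2]


def fit_bagged_stumps(samples, n_trees=25, seed=0):
    """Train each stump on its own bootstrap sample of the training set."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(samples) for _ in samples])
            for _ in range(n_trees)]


def predict(forest, x):
    """Majority vote of the stumps: 1 = SCW, 0 = NSCW."""
    votes = sum(1 for f, t in forest if x[f] >= t)
    return 1 if 2 * votes > len(forest) else 0


# Toy samples: ([feature_0, feature_1], label); feature_0 separates the classes.
toy = [([1.0, 5.0], 0), ([2.0, 4.0], 0), ([8.0, 5.0], 1), ([9.0, 4.0], 1)]
forest = fit_bagged_stumps(toy)
high = predict(forest, [9.5, 4.5])
low = predict(forest, [0.5, 3.0])
```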

    4.   Experiments and analysis
    • The full dataset obtained as described in Section 2 is divided into a training set, a validation set, and a testing set. Storm cells on the same day are highly similar to each other. Therefore, in order to test the models properly, all samples from the same day are allocated to the same dataset. The specific partitioning is shown in Table 5.

      | Sample type | Training set | Validation set | Testing set | Total |
      |---|---|---|---|---|
      | Number of SCW samples | 2518 | 1568 | 1245 | 5331 |
      | Number of NSCW samples | 4400 | 1989 | 1992 | 8381 |

      Table 5.  Sample number of each sample set
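      Keeping all samples from one day in the same subset can be sketched as a split over days rather than over samples. The split fractions, the random seed, and the function name below are hypothetical; the actual allocation used in this study is the one given in Table 5.

```python
import random


def split_by_day(samples, fractions=(0.6, 0.2), seed=0):
    """samples: list of (day, payload). Whole days go to one subset only.

    ASSUMPTION: fractions are the shares of days for training and validation;
    the remaining days form the testing set.
    """
    days = sorted({day for day, _ in samples})
    random.Random(seed).shuffle(days)
    n_train = round(fractions[0] * len(days))
    n_val = round(fractions[1] * len(days))
    train_days = set(days[:n_train])
    val_days = set(days[n_train:n_train + n_val])
    train = [s for s in samples if s[0] in train_days]
    val = [s for s in samples if s[0] in val_days]
    test = [s for s in samples
            if s[0] not in train_days and s[0] not in val_days]
    return train, val, test


# Ten hypothetical days with three storm-cell samples each.
data = [(f"2016-06-{d:02d}", i) for d in range(1, 11) for i in range(3)]
train, val, test = split_by_day(data)
```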

    • In this paper, we use the probability of detection (POD), the false alarm ratio (FAR), the critical success index (CSI), and negative-case POD (NPOD) to evaluate the model, which are calculated as follows:

      $$\hspace{40pt} {\rm{POD}} = \frac{A}{{ {A + B} }} , $$ (14)
      $$\hspace{40pt} {\rm{FAR}} = \frac{C}{{ {A + C} }} , $$ (15)
      $$\hspace{40pt} {\rm{CSI}} = \frac{A}{{ {A + B + C} }} , $$ (16)
      $$\hspace{40pt} {\rm{NPOD}}=\frac{D}{C+D} , $$ (17)

      where A is the number of SCW samples being identified as SCW, B is the number of SCW samples being identified as NSCW, C is the number of NSCW samples being identified as SCW, and D is the number of NSCW samples being identified as NSCW.
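      Eqs. (14)–(17) translate directly into code. As a worked example, the sketch below applies them to the squall line group of Table 7 under the idealized phenomenon-based model, which labels every sample of that group as SCW (A = 47, B = 0, C = 26, D = 0); the function names are hypothetical, and an undefined ratio is returned as `None`.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the ratio is undefined."""
    return numerator / denominator if denominator else None


def scores(a, b, c, d):
    """a: SCW as SCW, b: SCW as NSCW, c: NSCW as SCW, d: NSCW as NSCW."""
    return {
        "POD": ratio(a, a + b),
        "FAR": ratio(c, a + c),
        "CSI": ratio(a, a + b + c),
        "NPOD": ratio(d, c + d),
    }


# Squall line group (Table 7), every sample identified as SCW.
squall = scores(a=47, b=0, c=26, d=0)
```

      The result reproduces the squall line column of the phenomenon-based model in Table 8: a POD of 100%, an FAR of 35.6%, a CSI of 64.4%, and an NPOD of 0%.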

    • Experiments on the constructed datasets are carried out. Three models are trained and compared for their performance: (1) the RF model with 16 radar features constructed as described in Section 3.1, which is referred to as the radar feature model; (2) the RF model with 16 radar features and 4 environmental parameters as described in Section 3.2, which is referred to as the radar feature and EP model; and (3) the strong convective wind identification model of Lagerquist et al. (2017) as a comparison model, which is referred to as cmp-model. The cmp-model uses radar statistics, storm motion, shape parameters, and sounding indices to predict convective wind based on gradient-boosted ensembles or random forests. Except for the sounding indices, the other three kinds of predictors are utilized to construct the cmp-model in this paper to compare with the radar feature model.

      All models are trained on the training set in Table 5. Hyperparameters are chosen according to the CSI score on the validation set, and the POD, FAR, CSI, and NPOD are then computed on the testing set. Table 6 lists the results for four models: the three described above and a cmp and EP model, which, as its name indicates, combines the cmp-model predictors with the environmental parameters.

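      | Model | POD | FAR | CSI | NPOD |
      |---|---|---|---|---|
      | cmp-model | 0.702 | 0.258 | 0.564 | 0.846 |
      | Radar feature model | 0.789 | 0.264 | 0.615 | 0.823 |
      | cmp and EP model | 0.713 | 0.239 | 0.583 | 0.865 |
      | Radar feature and EP model | 0.798 | 0.217 | 0.653 | 0.861 |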

      Table 6.  Comparison test results
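      As a consistency check on the scores (a derivation added here for illustration, not taken from the original analysis), Eqs. (14)–(16) imply that the CSI is fully determined by the POD and the FAR. Since $ 1/{\rm{POD}} = (A + B)/A $ and $ 1/(1 - {\rm{FAR}}) = (A + C)/A $,

      $$ \frac{1}{{\rm{CSI}}} = \frac{A + B + C}{A} = \frac{1}{{\rm{POD}}} + \frac{1}{1 - {\rm{FAR}}} - 1 . $$

      For the radar feature model, POD = 0.789 and FAR = 0.264 give

      $$ \frac{1}{{\rm{CSI}}} \approx 1.267 + 1.359 - 1 = 1.626, \quad {\rm{CSI}} \approx 0.615 , $$

      which agrees with the CSI reported for this model.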

      From the results, it can be seen that the radar feature model detects SCW better than the cmp-model on the testing set. According to Table 6, its POD is 8.7 percentage points higher (0.789 vs. 0.702) and its CSI is 5.1 percentage points higher (0.615 vs. 0.564), while the FAR values of the two models are close (0.264 vs. 0.258). In addition, the radar feature and EP model performs better than the radar feature model, which indicates that environmental parameters contribute to convective wind identification. However, sounding data usually have low spatial and temporal resolution, whereas the purpose here is to develop a convective wind identification model with high spatial and temporal resolution based on radar data. Therefore, the following analysis focuses on the radar features and the radar feature model.

    • Low-level SWAs, MARC, mesocyclones, squall lines, and other related phenomena often accompany strong convective ground-level wind, and are therefore often used in practice as a basis for judging the occurrence of convective winds. Accordingly, a phenomenon-based dataset is constructed with samples accompanied by SWAs, MARC, mesocyclones, and squall lines, as well as samples with no relevant phenomena. All samples in the testing set are divided into five groups: a squall line group, an SWA group, a MARC group, a mesocyclone group, and a no-relevant-phenomena group. The numbers of samples in each group are shown in Table 7.

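      | Testing set | Phenomenon: squall line | Phenomenon: SWA | Phenomenon: MARC | Phenomenon: mesocyclone | No relevant phenomenon |
      |---|---|---|---|---|---|
      | Number of SCW samples | 47 | 115 | 138 | 386 | 719 |
      | Number of NSCW samples | 26 | 21 | 8 | 113 | 1817 |
      | Total | 73 | 136 | 146 | 499 | 2536 |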

      Table 7.  Numbers of samples in each phenomenon group

      For this section, experiments are conducted to compare the performance of the radar feature model and the phenomenon-based model on the phenomena dataset. The phenomenon-based model identifies SCW based on recognition of typical echo phenomena. If typical echo phenomena appear in radar images, the storm is judged as an SCW storm by the phenomenon-based model. It is assumed that the phenomenon-based model is ideal, i.e., the recognition rate of the echo phenomena is 100%. Comparisons with this idealized model can reveal the performance quality of the developed model when the convective wind is not caused by the relevant phenomena. Each group is tested separately by using the radar feature model, and the test results are shown in Table 8.

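      | Metric | Model | Squall line | SWA | MARC | Mesocyclone | No relevant phenomenon | All samples |
      |---|---|---|---|---|---|---|---|
      | POD | Phenomenon-based model | 100% | 100% | 100% | 100% | 0% | 42.8% |
      | POD | Radar feature model | 89.4% | 93.0% | 99.3% | 97.2% | 66.8% | 83.5% |
      | NPOD | Phenomenon-based model | 0% | 0% | 0% | 0% | 100% | 91.9% |
      | NPOD | Radar feature model | 69.2% | 35.0% | 25.0% | 40.7% | 85.4% | 80.1% |
      | FAR | Phenomenon-based model | 35.6% | 14.8% | 5.5% | 22.6% | – | 19.5% |
      | FAR | Radar feature model | 16.0% | 10.8% | 4.2% | 15.1% | 35.6% | 23.0% |
      | CSI | Phenomenon-based model | 64.4% | 85.2% | 94.5% | 77.4% | – | 38.7% |
      | CSI | Radar feature model | 76.4% | 83.6% | 95.1% | 82.8% | 48.8% | 66.8% |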

      Table 8.  Test results of the radar feature model and phenomenon-based model

      From the results in Table 8, it is apparent that:

      (1) For the phenomenon-based model, the POD for each phenomenon group and the NPOD for the no-relevant-phenomena group are 100%. This is inevitable under the assumption that the recognition algorithm is perfectly correct. The NPOD for each phenomenon group and the POD for the no-relevant-phenomena group are zero. This is because the phenomenon-based model does not have the ability to identify SCW samples without phenomena, and will identify all NSCW samples with phenomena as SCW, which are its major drawbacks.

      (2) For the radar feature model, the POD and FAR for each phenomenon group are better than those for the no-relevant-phenomena group. The reason is that the features of the radar feature model are constructed according to the characteristics of typical phenomena related to convective wind. Meanwhile, there is a higher NPOD for the no-relevant-phenomena group.

      (3) For each phenomenon group, the POD of the phenomenon-based model is better than that of the radar feature model. However, the FAR of the radar feature model is lower than that of the phenomenon-based model in every group, which results in a higher CSI for the squall line, MARC, and mesocyclone groups; the SWA group is the exception (83.6% vs. 85.2%). This makes sense because the radar feature model has some ability to identify NSCW samples with phenomena.

      (4) For the no-relevant-phenomena group, the POD of the radar feature model is 66.8 percentage points higher than that of the phenomenon-based model (66.8% vs. 0%), at the cost of a decrease in NPOD of 14.6 percentage points (85.4% vs. 100%). This indicates that the radar feature model has an advantage over the phenomenon-based model in identifying SCW samples without relevant phenomena.

      (5) For all samples in the phenomenon dataset, with an SCW : NSCW sample ratio of 1245 : 1992, the radar feature model delivers substantially higher POD (83.5% vs. 42.8%) and CSI (66.8% vs. 38.7%) than the phenomenon-based model, at the cost of a higher FAR (23.0% vs. 19.5%) and a lower NPOD (80.1% vs. 91.9%), which is acceptable. It can be concluded that the model developed here has a strong ability to identify SCW samples with phenomena, whilst at the same time having some ability to discriminate between SCW and NSCW samples without phenomena.

    • Two experiments with all samples in Table 3 are conducted in this part of the study to test the validity of the speed features, high-value reflectivity and velocity features, shear features, and texture features: experiment 1 tests how well the features separate SCW samples from NSCW samples, and experiment 2 tests how well they separate SCW samples with phenomena from SCW samples without phenomena.

    • The shear features obtained according to the method described in Section 3.1.4 have 54 dimensions. Furthermore, the interdependency among the shear features creates a high degree of information redundancy. Therefore, PCA is applied as described in Section 3.3 to transform the 54-dimensional features and reduce the dimensionality. The information percentage of each principal component is obtained, as shown in Table 9. It can be seen that the information content of the first principal component is 86.3%. Therefore, the shear features’ first principal component ${\rm{Shear}}_{{\text{pca\_1}}}$ is chosen to represent the 54-dimensional original shear features for the feature validity analysis.

      | Principal component | p1 | p2 | p3 + … + p54 |
      |---|---|---|---|
      | Information percentage | 86.3% | 4.2% | 9.5% |

      Table 9.  Information percentages of shear features

    • The texture features obtained according to the method described in Section 3.1.5 have 18 dimensions. The PCA method is again applied, and each principal component’s information percentage is obtained, as shown in Table 10. The first principal component accounts for more than 60% of the information, so the first principal component ${\rm{VLBP}}_{{\text{pca\_1}}}$ of the texture features is chosen to represent the 18-dimensional original texture features in the feature validity analysis.

      | Principal component | p1 | p2 | p3 | p4 + … + p18 |
      |---|---|---|---|---|
      | Information percentage | 60.2% | 14.0% | 9.5% | 16.3% |

      Table 10.  Information percentages of texture features

    • The classification capabilities of the different features including vCMS, vCDS, Ref99, Rref, Vel99, Rvel, Shearpca_1, and VLBPpca_1 need to be analyzed. To this end, the frequency distribution histogram of each feature is shown in Fig. 3.

      As can be seen from the distributions shown in Fig. 3, in terms of the separability of strong convective wind samples from non-strong convective wind samples:

      (1) The best separability appears with $ {\text{Shea}}{{\text{r}}_{{\text{pca\_1}}}} $ and $ {\text{VLB}}{{\text{P}}_{{\text{pca\_1}}}} $; that is, the shear and texture features based on radial velocity images differ most between the two sample sets. This is because storms that generate convective winds are usually accompanied by strong atmospheric motion, resulting in more shear in the radial velocity images.

      (2) The Vel99 and Rvel directly reflect the speed and high-value ratio of particle motion within the storm. The Ref99 and Rref directly reflect the storm intensity. Their performances are acceptable. As intensity increases, the possibility of convective gales will increase.

      (3) The histogram distributions of vCMS and vCDS are almost indistinguishable, indicating that for convective wind storms, these two features have little ability to classify SCW samples and NSCW samples.

    • To evaluate the performance of features in classifying SCW samples with SWAs, MARC, mesocyclones, and squall lines, histogram statistics are calculated and compared with the feature distributions of SCW samples without these phenomena.

    • Table 3 shows that, among the 689 convective wind samples with SWAs, SCW is observed in 556 samples, accounting for 80.7%. To verify how well the features describe SWAs, the radial velocity high-value features Rvel and ${{\rm{Vel}}_{99}}$, which are associated with SWAs, are chosen. The feature distributions are shown in Fig. 4 for the 4775 SCW samples without SWAs and the 556 SCW samples with SWAs in Table 3.

      Figure 4.  Frequency distribution histograms of radial velocity high-value features in the sample set of SCW samples with SWAs (black) and without SWAs (red): (a) Rvel and (b) Vel99.

      Radial velocity high-value features for SCW samples with SWAs have overall higher values than those for SCW samples without SWAs. This indicates that these two features, which already have some ability to distinguish between SCW and NSCW samples, are even better at identifying SCW samples accompanied by SWAs.

    • From the samples collected in Table 3, it can be seen that, among the 516 convective wind samples with significant MARC, 493 are SCW samples, accounting for 95.5%; and among the 1962 convective wind samples with mesocyclones, 1592 reach the strong convective wind level, accounting for 81.1%. Both MARC and mesocyclones are associated with stronger atmospheric motions such as convergence, divergence, and rotation. Therefore, the shear feature $ {\text{Shea}}{{\text{r}}_{{\text{pca\_1}}}} $ and texture feature $ {\text{VLB}}{{\text{P}}_{{\text{pca\_1}}}} $ are chosen. Figure 5 shows the feature distributions of the 4838 SCW samples without MARC and the 493 SCW samples with MARC. Figure 6 shows the feature distributions of the 3739 SCW samples without mesocyclones and the 1592 SCW samples with mesocyclones, as in Table 3.

      Figure 5.  Performance of (a) $ {\text{Shea}}{{\text{r}}_{{\text{pca\_1}}}} $ and (b) $ {\text{VLB}}{{\text{P}}_{{\text{pca\_1}}}} $ in classifying SCW samples with/without MARC.

      Figure 6.  Performance of (a) $ {\text{Shea}}{{\text{r}}_{{\text{pca\_1}}}} $ and (b) $ {\text{VLB}}{{\text{P}}_{{\text{pca\_1}}}} $ on SCW samples with/without mesocyclones.

      From the results, it can be seen that: (1) the texture feature values of the SCW samples containing MARC or mesocyclones are overall higher than those of the SCW samples without MARC or mesocyclones; and (2) combined with Figs. 3g, h, SCW samples containing MARC or mesocyclones can be distinguished even more clearly from NSCW samples in terms of shear and texture features.

    • Squall lines are usually accompanied by thunderstorms, high winds (or tornadoes), and hail, and are characterized by high energy and destructive power. As shown in Table 3, a total of 286 squall line samples are included in the dataset, among which 206 are SCW samples, accounting for 72.0%. Because the low-level SWAs, MARC, and mesocyclones that may occur within a squall line have been tested and discussed separately, and because squall lines are characterized by high moving speeds, the focus here is only on the storm cell moving speed feature vCMS. The distribution of this feature for the samples in Table 3 is shown in Fig. 7.

      Figure 7.  Performance of the moving speed feature in SCW sample sets with/without squall lines.

      Looking back at Fig. 3a, the vCMS is not very skilled at distinguishing between SCW and NSCW samples, while Fig. 7 demonstrates that SCW caused by a squall line tends to show a stronger vCMS, thus illustrating the usefulness of the “storm moving speed” feature.

      In summary, except for the moving speed features, whose ability is weak, the other three kinds of features constructed in this study (i.e., high-value, shear, and texture features) perform well in separating SCW from NSCW. Also, the feature values of SCW samples accompanied by SWAs, MARC, or mesocyclones are generally higher than those of SCW samples without these phenomena. Moreover, the model has some ability to characterize SCW samples that do not show these phenomena.

    • From the test sample set, three convective wind cases with continuous volume scans are selected to highlight the model’s ability to identify convective storms in advance and to compare the main features used. The first two cases are convective storms with and without the typical echo phenomenon (MARC), and the third case is a convective storm whose wind does not reach the SCW level. The detection radar, time, and number of volume scans are shown in Table 11. Figure 8 presents the detailed timing information for each case, scan by scan: the maximum wind speed under the storm cell (within 6 min after the volume scan time), the occurrence of typical echo phenomena, and the model’s prediction results.

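      | Case name | Radar location | Date | Start time (BT) | End time (BT) | Scan number |
      |---|---|---|---|---|---|
      | Case 1 | Binzhou | 20160613 | 0740 | 0902 | 15 |
      | Case 2 | Lianyungang | 20160722 | 1104 | 1215 | 13 |
      | Case 3 | Xuzhou | 20160718 | 0642 | 0734 | 10 |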

      Table 11.  Detection radar, time, and continuous scan number of three cases (BT: Beijing Time)

      Figure 8.  Detailed timing information for three cases, including the maximum wind speed under the storm cell for each volume scan, the occurrence of typical echo phenomena, and the model’s prediction results.

      As shown in Fig. 8, the model correctly identifies the convective storms with and without MARC: in Case 1, it identifies the storm six volume scans before MARC appears, and in Case 2, four volume scans before strong winds appear on the ground. The model also correctly identifies Case 3 as NSCW at all times.

      To illustrate intuitively how the extracted features behave in different types of convective wind sequences, six of the 16 features are selected from the four feature categories: both reflectivity high-value features (2/2), two radial velocity shear features (2/6), one radial velocity texture feature (1/6), and one radial velocity high-value feature (1/2). They are used to form three two-dimensional feature diagrams for a comparative analysis.

      First, the reasons for selecting the six features mentioned above are as follows:

      (1) Compared with the number of radial velocity features, fewer features are based on single reflectivity. All the high-value reflectivity features, Rref and Ref99, are selected and made to form a two-dimensional feature diagram (Fig. 9a).

      Figure 9.  Scatter diagrams for the three cases: (a) reflectivity high-value features, (b) radial velocity high-value and shear, and (c) shear and texture.

      (2) For the high-value radial velocity, shear, and texture features, two two-dimensional feature diagrams are formed: “high-value radial velocity and shear” (Fig. 9b) and “shear and texture” (Fig. 9c). In the “high-value and shear” diagram, the high-value feature uses Vel99 and the shear feature uses RWS_3_6_4.3°. In the “shear and texture” diagram, the texture feature uses RJ_0_0.5° and the shear feature uses RWS_3_4_0.5°, which is at the same elevation angle.

      Scatter diagrams for the three cases are shown in Fig. 9, from which it can be seen that:

      (1) All four types of features show very high discrimination between SCW and NSCW in all three cases.

      (2) For the high-value reflectivity features (Fig. 9a), the ability to distinguish the presence or absence of typical echo phenomena mainly results from the Ref99 feature.

      (3) For both the high-value radial velocity feature and the radial velocity shear feature (Fig. 9b), convective storms accompanied by MARC are stronger than those without typical echoes. At the same time, these two types of features do not show a strong correlation for convective storms without typical echoes.

      (4) The nonlinear distribution of the case sample in the “shear and texture” feature plane presented in Fig. 9c indicates that these two types of features are not highly correlated, and their roles in the model cannot be replaced by each other.

    5.   Summary
    • The objective of this study is to construct features for severe convective wind (SCW) and to build a model that identifies it. To this end, a storm cell with a maximum ground wind speed of more than 17.2 m s$^{-1}$ is regarded as an SCW sample, while a storm cell with a maximum ground wind speed within 9–15 m s$^{-1}$ is regarded as an NSCW sample.

      A dataset is constructed by using the radar images and data from automatic meteorological stations in 13 cities of China from June to August 2016. Five kinds of features are constructed by referring to the characteristics of typical convective wind-related echo phenomena, and a model for identifying SCW is built.

      Results from testing the model and validity of the features indicate that most features perform well in distinguishing SCW and NSCW, and excel at distinguishing SCW from NSCW when carrying typical phenomena.

      At the same time, according to the results of the radar feature and EP model in Table 6, environmental parameters have a positive effect on the identification of convective wind. However, sounding data usually have a low temporal resolution (recorded at 0800 and 2000 BT), so it is not easy to guarantee their validity for real-time forecasting in practical use. It would be worthwhile to explore how the information provided by environmental physical fields can be used effectively in jointly training a stronger convective wind recognition model, which is the next step for our group.

      Acknowledgments. The authors thank the CMA Public Meteorological Service Center for providing the source data.
