QpefBD: A Benchmark Dataset Applied to Machine Learning for Minute-Scale Quantitative Precipitation Estimation and Forecasting



  • Nowcasts of strong convective precipitation and radar-based quantitative precipitation estimations have always been hot yet challenging issues in meteorological sciences. Data-driven machine learning, especially deep learning, provides a new technical approach for the quantitative estimation and forecasting of precipitation. A high-quality, large-sample, and labeled training dataset is critical for the successful application of machine-learning technology to a specific field. The present study develops a benchmark dataset that can be applied to machine learning for minute-scale quantitative precipitation estimation and forecasting (QpefBD), containing 231,978 samples of 3185 heavy precipitation events that occurred in 6 provinces of central and eastern China from April to October 2016–2018. Each individual sample consists of 8 products of weather radars at 6-min intervals within the time window of the corresponding event and products of 27 physical quantities at hourly intervals that describe the atmospheric dynamic and thermodynamic conditions. Two data labels, i.e., ground precipitation intensity and areal coverage of heavy precipitation at 6-min intervals, are also included. The present study describes the basic components of the dataset and data processing and provides metrics for the evaluation of model performance on precipitation estimation and forecasting. Based on these evaluation metrics, some simple and commonly used methods are applied to evaluate precipitation estimates and forecasts. The results can serve as the benchmark reference for the performance evaluation of machine-learning models using this dataset. This paper also gives some suggestions and scenarios for the application of QpefBD. We believe that the application of this benchmark dataset will promote interdisciplinary collaboration between meteorological sciences and artificial intelligence sciences, providing a new way for the identification and forecast of heavy precipitation.
  • Fig. 1.  Examples of 2D patterns of several variables contained in a sample from a heavy precipitation event [2337 BT (Beijing Time) 18 to 1226 BT 19 June 2016]: (a) composite radar reflectivity and (b) radar reflectivity at an altitude of 3 km in Wuhan at 0803 BT 19 June, (c) lifted condensation level and (d) level of free convection at 0800 BT 19 June, (e) 6-min precipitation (label data), and (f) heavy rainfall area (label data) at 0803 BT 19 June. The dashed, black squares in (c) and (d) show areas where radar data and label data are consistent.

    Fig. 2.  Locations of national-level surface stations (solid, colored dots) and weather radars (open, black circles). Rainfall observations made at these surface stations are used to determine heavy precipitation events. The colors show the occurrence frequencies of single-station, short-term, heavy precipitation events from 2016 to 2018. The black box denotes the area covered by a single radar.

    Fig. 3.  Distribution of ground precipitation observation stations.

    Fig. 4.  Percentage of missing values in the precipitation intensity label data before (black bars) and after (gray bars) their filling for the six provinces, i.e., Anhui (AH), Fujian (FJ), Hubei (HB), Hunan (HN), Zhejiang (ZJ), Jiangxi (JX), and the average (TOTAL).

    Fig. B1.  Threat scores (TSs) for the persistence forecasts of 6-min precipitation with different magnitudes [0, ≥ 0.1, ≥ 1, ≥ 2, and ≥ 3 mm (6 min)−1] as a function of forecast lead time.

    Fig. B2.  Probability of detection (POD), false alarm ratio (FAR), and threat score (TS) for persistence forecasts of 6-min precipitation as a function of forecast lead time.

    Fig. C1.  Threat scores (TSs) for forecasts of 6-min precipitation with different magnitudes [0, ≥ 0.1, ≥ 1, ≥ 2, and ≥ 3 mm (6 min)−1] as a function of forecast lead time using the optical flow extrapolation method.

    Fig. C2.  False alarm ratio (FAR), probability of detection (POD), and threat score (TS) for consecutive forecasts of 6-min precipitation as a function of forecast lead time using the optical flow extrapolation method.

    Table 1.  The number of CHRE-Rs (NCHRE-R) and the number of samples (Nsample) for each year

    | Year  | NCHRE-R | Nsample |
    |-------|---------|---------|
    | 2016  | 1121    | 89,268  |
    | 2017  | 881     | 62,551  |
    | 2018  | 1183    | 80,159  |
    | Total | 3185    | 231,978 |

    Table 2.  Radar data products contained in the benchmark dataset. ASL stands for above sea level

    | Name                                       | Identifier | Unit    |
    |--------------------------------------------|------------|---------|
    | Composite reflectivity                     | CR         | dBZ     |
    | Hybrid reflectivity                        | HBR        | dBZ     |
    | Reflectivity at 2 km ASL                   | CAP02      | dBZ     |
    | Reflectivity at 3 km ASL                   | CAP03      | dBZ     |
    | Reflectivity at 4 km ASL                   | CAP04      | dBZ     |
    | Reflectivity at 5 km ASL                   | CAP05      | dBZ     |
    | Echo top (18 dBZ)                          | ET         | km      |
    | Vertically integrated liquid water content | VIL        | kg m−2  |

    Table 3.  The 27 atmospheric condition parameters contained in QpefBD

    | Water vapor | Uplifting | Static instability |
    |---|---|---|
    | Mixing ratio (Q) at 1000, 850, and 700 hPa | Divergence at 925, 850, and 200 hPa | K index (KI) |
    | | Wind speed at 925, 700, and 500 hPa | |
    | Relative humidity (RH) at 1000, 700, and 500 hPa | | Convective available potential energy (CAPE) |
    | Precipitable water (PWAT) | | Lifting index (LI) at 500 and 300 hPa |
    | Precipitation efficiency (PE) | | Elevation at the 0°C level |
    | Average dew point temperature from the ground to 850 hPa | | Elevation at the −10°C level |
    | 700–400-hPa mean difference between temperature and dew point temperature | | Elevation at the wet bulb temperature of 0°C |
    | 700–400-hPa maximum difference between temperature and dew point temperature | | Level of free convection (LFC) |
    | | | Mean lifted condensation level below 100 hPa (LCL) |
    | | | Mean unstable lifted condensation level below 100 hPa (muLCL) |

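    Table A1.  RMSEs of radar precipitation estimates based on the ZR relationship

    |                   | 0 mm       | ≥ 0.1 mm   | ≥ 5 mm    | ≥ 10 mm   | ≥ 15 mm   | ≥ 20 mm   | Average |
    |-------------------|------------|------------|-----------|-----------|-----------|-----------|---------|
    | Number of samples | 31,699,469 | 39,861,690 | 9,966,287 | 4,884,580 | 2,729,276 | 1,565,381 |         |
    | POD (%)           | 77.68      | 78.47      | 37.32     | 25.57     | 19.34     | 15.11     | 67.57   |
    | FAR (%)           | 30.78      | 23.78      | 40.63     | 48.95     | 55.39     | 62.62     | 29.20   |
    | TS (%)            | 57.74      | 63.04      | 29.72     | 20.54     | 15.59     | 12.06     | 52.84   |
    | RMSE (mm h−1)     | 1.88       | 6.35       | 11.8      | 15.76     | 19.43     | 23.23     | 8.27    |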

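    Table B1.  RMSEs of 6-min precipitation persistence forecasts

    | Lead time (min)     | 6    | 12   | 18   | 24   | 30   | 36   | 42   | 48   | 54   | 60   | 66   | 72   | 78   | 84   | 90   | 96   | 102  | 108  | 114  | 120  |
    |---------------------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|
    | RMSE [mm (6 min)−1] | 0.35 | 0.49 | 0.56 | 0.61 | 0.64 | 0.66 | 0.68 | 0.69 | 0.70 | 0.71 | 0.72 | 0.73 | 0.73 | 0.74 | 0.74 | 0.75 | 0.75 | 0.76 | 0.76 | 0.76 |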

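    Table C1.  RMSEs of 6-min precipitation forecasts generated by the optical flow extrapolation method

    | Lead time (min)     | 6    | 12   | 18   | 24   | 30   | 36   | 42   | 48   | 54   | 60   | 66   | 72   | 78   | 84   | 90   | 96   | 102  | 108  | 114  | 120  |
    |---------------------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|
    | RMSE [mm (6 min)−1] | 0.36 | 0.48 | 0.55 | 0.59 | 0.62 | 0.64 | 0.65 | 0.66 | 0.67 | 0.68 | 0.69 | 0.69 | 0.70 | 0.70 | 0.70 | 0.71 | 0.71 | 0.71 | 0.71 | 0.72 |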
  • [1]

    Blumberg, W. G., K. T. Halbert, T. A. Supinie, et al., 2017: SHARPpy: An open-source sounding analysis toolkit for the atmospheric sciences. Bull. Amer. Meteor. Soc., 98, 1625–1636. doi: 10.1175/BAMS-D-15-00309.1.
    [2]

    Chen, X. C., K. Zhao, and M. Xue, 2014: Spatial and temporal characteristics of warm season convection over Pearl River Delta region, China, based on 3 years of operational radar data. J. Geophys. Res. Atmos., 119, 12,447–12,465. doi: 10.1002/2014JD021965.
    [3]

    Dixon, M., and G. Wiener, 1993: TITAN: Thunderstorm identification, tracking, analysis, and nowcasting—A radar-based methodology. J. Atmos. Oceanic Technol., 10, 785–797. doi: 10.1175/1520-0426(1993)010<0785:TTITAA>2.0.CO;2.
    [4]

    Doswell, C. A. III, H. E. Brooks, and R. A. Maddox, 1996: Flash flood forecasting: An ingredients-based methodology. Wea. Forecasting, 11, 560–581. doi: 10.1175/1520-0434(1996)011<0560:FFFAIB>2.0.CO;2.
    [5]

    Foresti, L., I. V. Sideris, D. Nerini, et al., 2019: Using a 10-year radar archive for nowcasting precipitation growth and decay: A probabilistic machine learning approach. Wea. Forecasting, 34, 1547–1569. doi: 10.1175/WAF-D-18-0206.1.
    [6]

    Gagné, D. J. II, A. McGovern, S. E. Haupt, et al., 2017: Storm-based probabilistic hail forecasting with machine learning applied to convection-allowing ensembles. Wea. Forecasting, 32, 1819–1840. doi: 10.1175/WAF-D-17-0010.1.
    [7]

    Gupta, R., R. Hosfelt, S. Sajeev, et al., 2019: xBD: A dataset for assessing building damage from satellite imagery. Available online at https://arxiv.org/abs/1911.09296. Accessed on 30 December 2021.
    [8]

    Han, L., J. Z. Sun, W. Zhang, et al., 2017: A machine learning nowcasting method based on real-time reanalysis data. J. Geophys. Res. Atmos., 122, 4038–4051. doi: 10.1002/2016JD025783.
    [9]

    Hersbach, H., B. Bell, P. Berrisford, et al., 2020: The ERA5 global reanalysis. Quart. J. Roy. Meteor. Soc., 146, 1999–2049. doi: 10.1002/qj.3803.
    [10]

    Jing, J. R., Q. Li, and X. Peng, 2019: MLC-LSTM: Exploiting the spatiotemporal correlation between multi-level weather radar echoes for echo sequence extrapolation. Sensors, 19, 3988. doi: 10.3390/s19183988.
    [11]

    Johnson, J. T., P. L. MacKeen, A. Witt, et al., 1998: The storm cell identification and tracking algorithm: An enhanced WSR-88D algorithm. Wea. Forecasting, 13, 263–276. doi: 10.1175/1520-0434(1998)013<0263:TSCIAT>2.0.CO;2.
    [12]

    Lagerquist, R., A. McGovern, and T. Smith, 2017: Machine learning for real-time prediction of damaging straight-line convective wind. Wea. Forecasting, 32, 2175–2193. doi: 10.1175/WAF-D-17-0038.1.
    [13]

    Leng, L., X. Y. Huang, H. P. Yang, et al., 2012: Recognition and application of Doppler weather radar clear air echoes. Meteor. Sci. Technol., 40, 534–541. doi: 10.3969/j.issn.1671-6345.2012.04.004. (in Chinese)
    [14]

    Liu, L. P., L. L. Wu, and Y. M. Yang, 2007: Development of fuzzy-logical two-step ground clutter detection algorithm. Acta Meteor. Sinica, 65, 252–260. doi: 10.3321/j.issn:0577-6619.2007.02.011. (in Chinese)
    [15]

    Liu, Y., D. G. Xi, Z. L. Li, et al., 2015: A new methodology for pixel-quantitative precipitation nowcasting using a pyramid Lucas Kanade optical flow approach. J. Hydrol., 529, 354–364. doi: 10.1016/j.jhydrol.2015.07.042.
    [16]

    Marzban, C., and G. J. Stumpf, 1996: A neural network for tornado prediction based on Doppler radar-derived attributes. J. Appl. Meteor. Climatol., 35, 617–626. doi: 10.1175/1520-0450(1996)035<0617:ANNFTP>2.0.CO;2.
    [17]

    Marzban, C., and A. Witt, 2001: A Bayesian neural network for severe-hail size prediction. Wea. Forecasting, 16, 600–610. doi: 10.1175/1520-0434(2001)016<0600:ABNNFS>2.0.CO;2.
    [18]

    Mecikalski, J. R., J. K. Williams, C. P. Jewett, et al., 2015: Probabilistic 0–1-h convective initiation nowcasts that combine geostationary satellite observations and numerical weather prediction model data. J. Appl. Meteor. Climatol., 54, 1039–1059. doi: 10.1175/JAMC-D-14-0129.1.
    [19]

    Pan, Y., Y. Shen, J. J. Yu, et al., 2015: An experiment of high-resolution gauge-radar-satellite combined precipitation retrieval based on the Bayesian merging method. Acta Meteor. Sinica, 73, 177–186. doi: 10.11676/qxxb2015.010. (in Chinese)
    [20]

    Perler, D., and O. Marchand, 2009: A study in weather model output postprocessing: Using the boosting method for thunderstorm detection. Wea. Forecasting, 24, 211–222. doi: 10.1175/2008WAF2007047.1.
    [21]

    Rasp, S., P. D. Dueben, S. Scher, et al., 2020: WeatherBench: A benchmark data set for data-driven weather forecasting. J. Adv. Model. Earth Syst., 12, e2020MS002203. doi: 10.1029/2020MS002203.
    [22]

    Reichstein, M., G. Camps-Valls, B. Stevens, et al., 2019: Deep learning and process understanding for data-driven Earth system science. Nature, 566, 195–204. doi: 10.1038/s41586-019-0912-1.
    [23]

    Russakovsky, O., J. Deng, H. Su, et al., 2015: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis., 115, 211–252. doi: 10.1007/s11263-015-0816-y.
    [24]

    Shi, X. J., Z. R. Chen, H. Wang, et al., 2015: Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Proceedings of the 28th International Conference on Neural Information Processing Systems, MIT, Montréal, Canada, 802–810.
    [25]

    Shi, X. J., Z. H. Gao, L. Lausen, et al., 2017: Deep learning for precipitation nowcasting: A benchmark and a new model. Proceedings of the 31st International Conference on Neural Information Processing Systems, Curran Associates Inc., Long Beach, CA, USA, 5622–5632.
    [26]

    Sønderby, C. K., L. Espeholt, J. Heek, et al., 2020: MetNet: A neural weather model for precipitation forecasting. Available online at https://arxiv.org/pdf/2003.12140.pdf. Accessed on 30 December 2021.
    [27]

    Su, H., J. Deng, and F.-F. Li, 2012: Crowdsourcing annotations for visual object detection. Available online at http://vision.stanford.edu/pdf/bbox_submission.pdf. Accessed on 30 December 2021.
    [28]

    Sun, J. Z., M. Xue, J. W. Wilson, et al., 2014: Use of NWP for nowcasting convective precipitation: Recent progress and challenges. Bull. Amer. Meteor. Soc., 95, 409–426. doi: 10.1175/BAMS-D-11-00263.1.
    [29]

    Tan, X., L. P. Liu, and S. R. Fan, 2013: Statistical characteristics of sea clutter and its identification with the CINRAD. Acta Meteor. Sinica, 71, 962–975. doi: 10.11676/qxxb2013.074. (in Chinese)
    [30]

    Tang, X. W., J. P. Tang, and X. L. Zhang, 2010: An ingredient-based operational heavy rain quantitative forecast system. J. Nanjing Univ. (Nat. Sci.), 46, 277–283. (in Chinese)
    [31]

    Weber, E., and H. Kané, 2020: Building disaster damage assessment in satellite imagery with multi-temporal fusion. Available online at https://arxiv.org/pdf/2004.05525.pdf. Accessed on 30 December 2021.
    [32]

    Wen, H., L. P. Liu, C. A. Zhang, et al., 2016: Operational evaluation of radar data quality control for ground clutter and electromagnetic interference. J. Meteor. Sci., 36, 789–799. doi: 10.3969/2015jms.0085. (in Chinese)
    [33]

    Xiao, Y. J., and L. P. Liu, 2006: Study of methods for interpolating data from weather radar network to 3-D grid and mosaics. Acta Meteor. Sinica, 64, 647–657. doi: 10.3321/j.issn:0577-6619.2006.05.011. (in Chinese)
    [34]

    Xiao, Y. J., L. P. Liu, and H. P. Yang, 2008: Technique for generating hybrid reflectivity field based on 3-D mosaicked reflectivity of weather radar network. Acta Meteor. Sinica, 66, 470–473. doi: 10.3321/j.issn:0577-6619.2008.03.016. (in Chinese)
    [35]

    Ying, M., W. Zhang, H. Yu, et al., 2014: An overview of the China meteorological administration tropical cyclone database. J. Atmos. Oceanic Technol., 31, 287–301. doi: 10.1175/JTECH-D-12-00119.1.
    [36]

    Yu, X. D., and Y. G. Zheng, 2020: Advances in severe convection research and operation in China. J. Meteor. Res., 34, 189–217. doi: 10.1007/s13351-020-9875-2.
    [37]

    Yu, X. D., X. P. Yao, T. N. Xiong, et al., 2006: The Principle and Operational Application of Doppler Weather Radar. China Meteorological Press, Beijing, 185 pp. (in Chinese)
    [38]

    Zhang, W., L. Han, J. Z. Sun, et al., 2019: Application of multi-channel 3D-cube successive convolution network for convective storm nowcasting. 2019 IEEE International Conference on Big Data (Big Data), IEEE, Los Angeles, CA, USA, 1705–1710.
    [39]

    Zhang, X. L., S. Y. Tao, and J. H. Sun, 2010: Ingredients-based heavy rainfall forecasting. Chinese J. Atmos. Sci., 34, 754–766. (in Chinese)
    [40]

    Zhang, X. L., Y. Chen, and T. Zhang, 2012: Meso-scale convective weather analysis and severe convective weather forecasting. Acta Meteor. Sinica, 70, 642–654. doi: 10.11676/qxxb2012.052. (in Chinese)
    [41]

    Zhang, X. L., J. H. Sun, Y. G. Zheng, et al., 2020: Progress in severe convective weather forecasting in China since the 1950s. J. Meteor. Res., 34, 699–719. doi: 10.1007/s13351-020-9146-2.
    [42]

    Zhou, K. H., Y. G. Zheng, B. Li, et al., 2019: Forecasting different types of convective weather: A deep learning approach. J. Meteor. Res., 33, 797–809. doi: 10.1007/s13351-019-8162-6.
    [43]

    Zhou, K. H., Y. G. Zheng, W. S. Dong, et al., 2020: A deep learning network for cloud-to-ground lightning nowcasting with multisource data. J. Atmos. Oceanic Technol., 37, 927–942. doi: 10.1175/JTECH-D-19-0146.1.

QpefBD: A Benchmark Dataset Applied to Machine Learning for Minute-Scale Quantitative Precipitation Estimation and Forecasting

    Corresponding author: Anyuan XIONG, xay@cma.gov.cn
  • 1. National Meteorological Information Center, China Meteorological Administration, Beijing 100081
  • 2. Jiangxi Meteorological Observatory, Nanchang 330096
  • 3. Anhui Weather Modification Office, Hefei 230061
Funds: Supported by the National Key Research and Development Program of China (2018YFC1507305)

    • Data-driven machine-learning models have been widely used in weather forecasting in recent years. Using ground-based and airborne observations, such as lightning data and observations from radar systems, aircraft, and earth-observing satellites, combined with mesoscale numerical weather prediction (NWP) products, machine-learning techniques have produced better forecasts than operational forecasting for severe convective weather such as thunderstorms (Perler and Marchand, 2009), cloud-to-ground lightning (Zhou et al., 2020), straight-line wind storms (Lagerquist et al., 2017), hail (Marzban and Witt, 2001; Gagné et al., 2017), tornadoes (Marzban and Stumpf, 1996), and convective initiation (Mecikalski et al., 2015). End-to-end deep-learning techniques, which have developed rapidly in recent years, e.g., convolutional neural network (CNN) models, have great potential for weather forecasting. For example, the deep-learning models Convolutional Long Short-Term Memory (ConvLSTM; Shi et al., 2015), Trajectory Gated Recurrent Unit (TrajGRU; Shi et al., 2017), and Multi-Level Correlation Long Short-Term Memory (MLC-LSTM; Jing et al., 2019) use two- or three-dimensional (2D or 3D) gridded data of previous radar reflectivity to predict the intensity of radar echoes (or precipitation) in the subsequent 0–2 h. Compared with traditional radar extrapolation techniques, these models show much better forecasting skill. The threat score (TS) of forecasts of severe convective weather such as thunderstorms, heavy precipitation, hail, and gales by a CNN model that uses physical quantities derived from NWP products as input is well above the TS of forecasts by meteorologists in operational service (Zhou et al., 2019).

      Data-driven machine learning, especially deep learning, is mostly based on labeled training datasets. The relationship between the predictands and predictors can be learned from historical datasets and serves as the basis for developing forecast models for target variables. The larger the sample size of the training dataset, the more complete the knowledge learned by the model. Therefore, a high-quality, large-sample, and labeled training dataset is the key issue in machine learning. The construction of a training dataset takes great effort and is financially costly and time-consuming, especially when the training samples need to be annotated by humans. For example, ImageNet (Su et al., 2012), developed at Princeton University, contains more than 14 million annotated 2D images in 21,841 synonym sets. Each image is manually labeled by using the Amazon Mechanical Turk platform. Since its public release in 2009, ImageNet has been widely applied. It has been the standard dataset (Russakovsky et al., 2015) of the International ImageNet Large Scale Visual Recognition Challenge since 2010, greatly promoting the international development of artificial intelligence image recognition. The xBD dataset for building locations and damage assessments (Gupta et al., 2019) contains 22,068 images from the WorldView-3 satellite, with a spatial resolution of 0.3 m, and 19 different categories of labeled events, e.g., earthquakes, floods, wildfires, volcanic eruptions, and car accidents. It is one of the largest and highest quality public datasets of high-resolution satellite imagery. Models trained on this dataset can effectively identify building damage due to various disasters and support the evaluation of economic losses based on satellite remote-sensing data (Weber and Kané, 2020).

      Severe convective weather forecasting has made progress in China since the 1950s (Zhang et al., 2020). In spite of this, 0–6-h nowcasting of precipitation has always been a challenge because of the spin-up problem in NWP (Sun et al., 2014). Traditional methods for precipitation nowcasting are based on various radar-echo extrapolation algorithms, such as thunderstorm identification, tracking, analysis, and nowcasting (Dixon and Wiener, 1993). However, weather systems are characterized by nonlinear features and large variabilities, and extrapolation algorithms often fail when weather systems change drastically. In recent years, scientists have begun to apply machine-learning techniques to precipitation nowcasting. For example, deep recurrent neural networks have been used for 0–2-h precipitation nowcasting based on previous radar echoes (Shi et al., 2015, 2017; Jing et al., 2019). Due to the lack of ground-based precipitation information, the label data in such training datasets can only be radar echoes. Therefore, the models can only predict radar echoes, and the predicted radar echo intensity (radar reflectivity) needs to be converted into precipitation. However, large uncertainties exist in quantitative precipitation estimation (QPE) due to uncertainties in the parameters of the empirical ZR relationship between radar reflectivity (Z) and ground precipitation (R) (Foresti et al., 2019). For this reason, when applying deep-learning techniques to ground precipitation forecasting, the label data in the training dataset should be ground precipitation data having the same spatiotemporal attributes as the predictands. In addition, it is not enough to use radar echoes from an earlier time as the only predictor in a machine-learning model because environmental conditions have important impacts on the occurrence and development of weather systems conducive to precipitation. Many studies that attempt to apply deep learning to weather forecasting use multi-source data, especially integrated datasets of radar observations and NWP products. For example, Zhang et al. (2019) derived a physical quantity based on Doppler weather radar reflectivity and outputs of the variational Doppler-radar assimilation system, which was then used to train a CNN model for 30-min nowcasting of convective thunderstorms. It is therefore necessary to include multi-source information that affects the target variable in machine-learning training data.
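
      The sensitivity of radar-based QPE to the ZR parameters can be illustrated with a minimal sketch that inverts the power law Z = aR^b for two coefficient pairs. The pairs used here (a = 200, b = 1.6, the classic Marshall–Palmer values, and a = 300, b = 1.4, a pair often quoted for convective rain) are illustrative assumptions and are not stated to be the values used for QpefBD; the function name is hypothetical.

```python
import numpy as np

def rain_rate_from_dbz(dbz, a=200.0, b=1.6):
    """Invert Z = a * R**b. dbz is reflectivity in dBZ; result is rain rate in mm/h."""
    z_linear = 10.0 ** (np.asarray(dbz, dtype=float) / 10.0)  # dBZ -> linear Z (mm^6 m^-3)
    return (z_linear / a) ** (1.0 / b)

dbz = np.array([20.0, 35.0, 50.0])
r_mp = rain_rate_from_dbz(dbz, a=200.0, b=1.6)    # assumed Marshall-Palmer coefficients
r_conv = rain_rate_from_dbz(dbz, a=300.0, b=1.4)  # assumed convective coefficients

# The same echo maps to different rain rates depending on (a, b)
relative_spread = np.abs(r_conv - r_mp) / r_mp
```

      With these two assumed pairs, the same reflectivity yields noticeably different rain rates at both the weak and the strong end of the range, which is the kind of uncertainty that motivates using ground precipitation directly as label data.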

      To encourage and promote the development of artificial intelligence models for specific applications, it is very important to establish a benchmark dataset. Different models can be developed and compared based on the same benchmark dataset and unified evaluation metrics. For example, WeatherBench (Rasp et al., 2020), released in 2020, is a machine-learning benchmark dataset for weather and climate simulations and forecasting. This dataset contains 8 variables at 13 isobaric levels and 6 surface variables based on the ERA5 global reanalysis product, which can be used for the training and testing of deep-learning models. Based on this benchmark dataset, complex and diverse deep neural network models can be developed for weather forecasting with various lead times, and their forecasting performances can be compared and evaluated against the WeatherBench benchmark.

      The present study develops a benchmark dataset that can be applied to machine learning for minute-scale quantitative precipitation estimation and forecasting (QpefBD). The dataset contains Doppler radar products for 3185 heavy convective precipitation events in heavy-rainfall-prone areas of eastern China and numerical model outputs or derived physical quantities that are closely related to ground precipitation. Ground precipitation data whose spatiotemporal resolutions are consistent with those of the radar products are used as label data in QpefBD. Several commonly used verification metrics are implemented as the evaluation benchmark. Section 2 describes various data contained in QpefBD and the procedures to process these data. Section 3 gives the benchmarks for QPE and quantitative precipitation forecast (QPF) evaluation based on commonly used methods. Section 4 describes three application scenarios of the dataset. The availability of the dataset is given in Section 5. Conclusions and summary are provided in Section 6.

    2.   Description of the dataset
    • The benchmark QpefBD dataset mainly supports precipitation estimation and quantitative precipitation nowcasting based on weather radar observations. QpefBD contains Doppler weather radar products, atmospheric condition parameters (ACP), and two types of ground precipitation data as labels within the time windows of 9832 convective heavy rainfall events (CHREs) that occurred from April to October 2016–2018 in 6 provinces of China (Hubei, Hunan, Anhui, Jiangxi, Zhejiang, and Fujian). These CHREs are single-station CHREs (called CHRE-S), identified from the 60-min precipitation at a station (details can be found in Section 2.1). The time window of a CHRE-S is defined as 2 h before the event start time to 2 h after the event end time. The time window of a heavy precipitation event is extended by 4 h for two main reasons: (1) the sample size of the training data is increased, and (2) a heavy precipitation event is a continuous mesoscale weather process, so the data sampled within the extended time window describe the genesis, development, and dissipation of a weather system more completely.

      Because radar data are central to precipitation estimation and forecasting, an S-band weather radar station adjacent to the ground station is selected for each CHRE-S and used as the center of the spatial domain within which label data, radar products, and ACP are extracted. As a result, different CHRE-Ss may share the same radar as their spatial center, which would produce large amounts of duplicate data. QpefBD removes this duplication: when CHRE-Ss assigned to the same radar have overlapping time windows, they are combined into a single CHRE (called CHRE-R). A single CHRE-R may therefore contain multiple CHRE-Ss. Eventually, 3185 CHRE-Rs were obtained. The time window of more than 80% of the CHRE-Rs is 4–8 h long, and the longest time window is up to 60 h.
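
      The merging of single-station events into radar-centered events can be sketched as a standard interval-merging step. The code below is an illustration under stated assumptions, not the processing code used to build QpefBD: the function name, the tuple layout, and the radar identifiers are hypothetical, and windows that merely touch are treated as overlapping. Only the 2-h extension on each side comes from the text.

```python
from collections import defaultdict
from datetime import datetime, timedelta

PAD = timedelta(hours=2)  # extension before the start and after the end (from the text)

def merge_events(station_events):
    """station_events: iterable of (radar_id, start, end) for single-station events.
    Returns {radar_id: [(window_start, window_end, n_station_events), ...]}."""
    by_radar = defaultdict(list)
    for radar_id, start, end in station_events:
        by_radar[radar_id].append((start - PAD, end + PAD))

    merged = {}
    for radar_id, windows in by_radar.items():
        windows.sort()
        out = []
        for w_start, w_end in windows:
            if out and w_start <= out[-1][1]:
                # Overlaps the previous window of the same radar: extend that window
                prev_start, prev_end, count = out[-1]
                out[-1] = (prev_start, max(prev_end, w_end), count + 1)
            else:
                out.append((w_start, w_end, 1))
        merged[radar_id] = out
    return merged

t0 = datetime(2016, 6, 19, 0, 0)
events = [
    ("R1", t0, t0 + timedelta(hours=1)),
    ("R1", t0 + timedelta(hours=3), t0 + timedelta(hours=4)),    # extended windows overlap
    ("R1", t0 + timedelta(hours=12), t0 + timedelta(hours=13)),  # separate event
    ("R2", t0, t0 + timedelta(hours=1)),
]
result = merge_events(events)
```

      In this toy input, the first two events of radar R1 collapse into one window, while the third R1 event and the R2 event remain separate.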

      Weather radar products include eight radar variables at about 6-min intervals within the time window of the event. ACP are physical variables or derived quantities obtained from the ERA5 reanalysis product (Hersbach et al., 2020), representing atmospheric dynamic and thermodynamic conditions during a CHRE. Two types of ground precipitation data, i.e., 6-min rainfall intensity and the area covered by heavy rainfall, serve as the label data (or true values) used in machine learning. Sections 2.2, 2.3, and 2.4 provide detailed descriptions of these data. Figure 1 shows examples of 2D patterns of several variables contained in a sample.

      Figure 1.  Examples of 2D patterns of several variables contained in a sample from a heavy precipitation event [2337 BT (Beijing Time) 18 to 1226 BT 19 June 2016]: (a) composite radar reflectivity and (b) radar reflectivity at an altitude of 3 km in Wuhan at 0803 BT 19 June, (c) lifted condensation level and (d) level of free convection at 0800 BT 19 June, (e) 6-min precipitation (label data), and (f) heavy rainfall area (label data) at 0803 BT 19 June. The dashed, black squares in (c) and (d) show areas where radar data and label data are consistent.

      All data are saved as 2D gridded fields with equal longitudinal and latitudinal grid spacings. The horizontal resolution of the radar and ground precipitation data is 0.01°, and their temporal resolution is 6 min. The background weather data (i.e., ACP) are on 0.25° × 0.25° grids with a temporal resolution of 1 h. To facilitate data processing by general machine-learning platforms, all 2D gridded data are stored in the NumPy binary format, which can be read directly in the Python programming language.
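
      Reading such files requires only NumPy. The sketch below writes a synthetic field to a temporary directory and reads it back, because the actual file names, directory layout, and array shapes of QpefBD are not given here; the file name and the 400 × 400 shape are hypothetical. The last step shows one possible way (an assumption, not a documented preprocessing step) to bring a 0.25° ACP field onto the 0.01° grid by nearest-neighbour repetition, using the fact that 0.25° / 0.01° = 25.

```python
import os
import tempfile
import numpy as np

# Hypothetical shape: a 4.0 x 4.0 degree box at 0.01 degree spacing (not from the paper)
ny, nx = 400, 400
rng = np.random.default_rng(0)
cr = rng.uniform(0.0, 60.0, size=(ny, nx)).astype(np.float32)  # synthetic reflectivity (dBZ)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "CR_201606190803.npy")  # hypothetical file name
    np.save(path, cr)        # NumPy binary format
    loaded = np.load(path)   # array is fully read into memory here

# Nearest-neighbour upsampling of a coarse ACP field: 25 fine cells per coarse cell
acp_coarse = rng.normal(size=(16, 16)).astype(np.float32)
acp_fine = np.repeat(np.repeat(acp_coarse, 25, axis=0), 25, axis=1)
```

      Repetition keeps the coarse values unchanged; whether interpolation would be preferable depends on the model and is left open here.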

      Table 1 lists the number of CHRE-Rs and the number of samples for each individual year.

      Table 1.  The number of CHRE-Rs (NCHRE-R) and the number of samples (Nsample) for each year

      | Year  | NCHRE-R | Nsample |
      |-------|---------|---------|
      | 2016  | 1121    | 89,268  |
      | 2017  | 881     | 62,551  |
      | 2018  | 1183    | 80,159  |
      | Total | 3185    | 231,978 |

      The number of samples for a single CHRE-R refers to the number of data collection times within the time window of the CHRE-R. A sample is generally taken every six minutes. If the time window of a CHRE-R is 5 h, the number of samples for this CHRE-R should be 50. Each sample includes radar products, atmospheric condition parameters, and labels of ground precipitation data. The former two are the input data (or predictors) for a machine-learning model, and the latter are the labels corresponding to those inputs.
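
      The sample count follows directly from the window length and the 6-min sampling interval. The short sketch below reproduces the 5-h example; the function name is hypothetical, and counting slots on a half-open interval (start included, end excluded) is an assumption.

```python
from datetime import datetime, timedelta

SAMPLE_INTERVAL = timedelta(minutes=6)

def nominal_sample_count(window_start, window_end):
    """Number of 6-min sampling slots in [window_start, window_end)."""
    return (window_end - window_start) // SAMPLE_INTERVAL

start = datetime(2016, 6, 18, 23, 37)
n = nominal_sample_count(start, start + timedelta(hours=5))  # 5-h window from the text

# Average number of samples per CHRE-R implied by the totals in Table 1
mean_samples_per_event = 231_978 / 3185
```

      By the same arithmetic, the totals in Table 1 imply about 73 samples per CHRE-R on average, i.e., roughly 7.3 h of data per event.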

      When training a machine-learning model, the datasets used are usually divided into a training dataset and a testing dataset. The former is used for model training to determine various model parameters, and the latter is used as independent data for model performance evaluation. We use the sampled data collected in the six provinces in May, July, and September 2018 as testing data and the remaining data as training data. The testing data contain samples from different seasons to test seasonal differences in the model performance. The training data contain 2544 heavy precipitation events, and the number of samples is 189,039. There are 641 heavy precipitation events and 42,939 samples in the testing dataset, accounting for 18.5% of the total number of samples. The reference benchmark for the model performance evaluation given in Section 3 is calculated based on the testing data.
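
      The split rule above can be sketched in a few lines. This is a minimal illustration that assumes each sample carries a timestamp and that samples are assigned by their own time (not by event); the function names and the sample times are hypothetical and not part of the dataset.

```python
from datetime import datetime

# Held-out period stated in the text: May, July, and September 2018.
TEST_YEAR = 2018
TEST_MONTHS = {5, 7, 9}

def is_test_sample(sample_time: datetime) -> bool:
    """Return True if a sample time falls in the held-out testing period."""
    return sample_time.year == TEST_YEAR and sample_time.month in TEST_MONTHS

def split_samples(sample_times):
    """Split sample times into (training, testing) lists."""
    train = [t for t in sample_times if not is_test_sample(t)]
    test = [t for t in sample_times if is_test_sample(t)]
    return train, test

# Hypothetical sample times, for illustration only.
times = [
    datetime(2016, 6, 19, 8, 3),
    datetime(2018, 5, 2, 14, 0),
    datetime(2018, 6, 1, 0, 0),
    datetime(2018, 9, 30, 23, 54),
]
train_times, test_times = split_samples(times)

# Share of testing samples reported in the text: 42,939 of 231,978.
test_share_percent = round(42939 / 231978 * 100, 1)
```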

      The ground precipitation data and radar observations are provided by the National Meteorological Information Center, China Meteorological Administration. The ERA5 reanalysis product is downloaded from the ECMWF Copernicus Climate Change Service (https://cds.climate.copernicus.eu/).

    • QpefBD contains CHREs that occurred in Hubei, Hunan, Anhui, Jiangxi, Zhejiang, and Fujian provinces from April to October 2016–2018. These provinces are located in eastern and central China, regions prone to heavy convective precipitation. Heavy rainstorms and floods in the summertime pose a major threat to local economic development and human safety.

      Short-term heavy precipitation usually refers to precipitation events with hourly rainfall amounts equal to or greater than 20 mm. To match the temporal resolution of radar observations, 1-min precipitation observations collected at all 491 national automatic weather stations in the 6 provinces are used to calculate 6-min precipitation information in accordance with the radar observation period. Operational quality control was conducted on the 1-min precipitation observations, and those data flagged as incorrect or suspicious were excluded. Here, a severe precipitation event (CHRE-S) is identified when a 60-min rainfall amount is greater than or equal to 20 mm at a single station. A sliding calculation is applied to obtain 60-min precipitation amounts at each individual station based on time series of 1-min precipitation amounts at each station. The start time of a heavy precipitation event at a station is the beginning minute when the 60-min precipitation amount starts to meet the criterion (≥ 20 mm), and the end time of the event is the ending minute when the last 60 minutes of cumulative precipitation (≥ 20 mm) ends.
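
      A minimal sketch of the sliding 60-min calculation follows. It assumes a gap-free list of 1-min rainfall amounts for one station and reads the definition as follows: the event starts at the first minute of the first qualifying 60-min window and ends at the last minute of the last qualifying window. The function name is hypothetical, and separate events in one series are not distinguished.

```python
def find_heavy_rain_window(rain_1min, threshold=20.0, window=60):
    """Return (start, end) minute indices of a single-station event, or None.

    start: first minute of the first window whose total is >= threshold.
    end:   last minute of the last window whose total is >= threshold.
    """
    n = len(rain_1min)
    if n < window:
        return None
    total = sum(rain_1min[:window])
    qualifying = []  # start indices of windows that meet the criterion
    for start in range(n - window + 1):
        if start > 0:
            # Slide the window by one minute: add the new minute, drop the old one.
            total += rain_1min[start + window - 1] - rain_1min[start - 1]
        if total >= threshold:
            qualifying.append(start)
    if not qualifying:
        return None
    return qualifying[0], qualifying[-1] + window - 1

# Hypothetical series: 50 minutes of 0.5 mm/min rain (25 mm) inside 3 hours.
rain = [0.0] * 60 + [0.5] * 50 + [0.0] * 70
event = find_heavy_rain_window(rain)
```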

      The study area in China is frequently affected by western Pacific typhoons. Single-station heavy convective precipitation events caused by typhoons are excluded from QpefBD. Here, typhoon-induced heavy precipitation refers to the precipitation that occurs within a radius of 400 km around the typhoon center. Typhoon track data are extracted from the typhoon best-track dataset produced by the Shanghai Typhoon Institute of the China Meteorological Administration (CMA-STI Best Track Dataset for Tropical Cyclones over the western North Pacific; Ying et al., 2014), available at http://tcdata.typhoon.org.cn.
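
      The 400-km exclusion rule can be illustrated with a great-circle distance check. This is a sketch only: it assumes a spherical Earth with a mean radius of 6371 km, and the station and typhoon-center coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def is_typhoon_induced(station, typhoon_center, radius_km=400.0):
    """True if the station lies within radius_km of the typhoon center."""
    return great_circle_km(*station, *typhoon_center) <= radius_km

# Hypothetical (lat, lon) positions.
station = (30.0, 114.0)
near_center = (30.0, 116.0)
far_center = (30.0, 120.0)
excluded = is_typhoon_induced(station, near_center)
kept = not is_typhoon_induced(station, far_center)
```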

      Figure 2 shows the spatial distribution of stations with CHRE-S in the 6 provinces from April to October 2016–2018 and the number of heavy precipitation events at each station.

      Figure 2.  Locations of national-level surface stations (solid, colored dots) and weather radars (open, black circles). Rainfall observations made at these surface stations are used to determine heavy precipitation events. The colors show the occurrence frequencies of single-station, short-term, heavy precipitation events from 2016 to 2018. The black box denotes the area coverage by a single radar.

    • For each individual CHRE, the weather radar station with the most complete data and closest to the station where the CHRE occurred is first identified. All 6-min radar data collected at this radar station during the time window of the CHRE are then obtained. Quality control is performed to remove various non-precipitation echoes, including noise (outlier) filtering, radial interference recognition, ground/super-refraction echo recognition, ocean-wave echo recognition, and clear-sky echo elimination. The quality-control algorithms used are referenced in the literature (Liu et al., 2007; Leng et al., 2012; Tan et al., 2013; Wen et al., 2016). Multiple physical quantities are obtained from the quality-controlled radar data to represent different meteorological features. The benchmark dataset developed in the present study contains eight products that are closely related to surface precipitation (Table 2). The eight radar quantities listed in Table 2 have clear physical significance and are the most frequently used products in meteorological operations. They are well related to the spatial distribution of and variation in surface precipitation and are useful for identifying heavy precipitation weather conditions (Yu et al., 2006).

      Name                                         Identifier   Unit
      Composite reflectivity                       CR           dBZ
      Hybrid reflectivity                          HBR          dBZ
      Reflectivity at 2 km ASL                     CAP02        dBZ
      Reflectivity at 3 km ASL                     CAP03        dBZ
      Reflectivity at 4 km ASL                     CAP04        dBZ
      Reflectivity at 5 km ASL                     CAP05        dBZ
      Echo top (18 dBZ)                            ET           km
      Vertically integrated liquid water content   VIL          kg m−2

      Table 2.  Radar data products contained in the benchmark dataset. ASL stands for above sea level

      The composite reflectivity (CR) is the maximum reflectivity of scans at all elevations, representing the strongest echo in the 2D space obtained by the radar. If the strongest echo occurs at a lower altitude, ground precipitation can be well reflected by radar echoes. The hybrid reflectivity (HBR) is the echo intensity observed by the radar at the lowest elevation angle above the terrain height, which can well reflect surface precipitation (Xiao et al., 2008) and is the main radar product used in QPE. Radar reflectivities at the four altitude levels between 2 and 5 km shown by the constant altitude plan position indicator (CAPPI) at each level are obtained by 3D interpolation of radial data at different elevation angles using the algorithms proposed by Xiao and Liu (2006). Only CAPPIs at these 4 levels are selected here because echoes between 2 and 5 km are more closely associated with ground precipitation. The higher-level CAPPI has a bright band of echoes due to solid precipitation particles, which may contaminate the precipitation echo signal. CAPPIs below 2 km are mostly affected by the side-lobe echo, ground clutter, and the super-refraction echo. For example, the CAPPI at the 3-km height is commonly used to analyze the climatological characteristics of convective storms (Chen et al., 2014). The echo top (ET) is the highest altitude at which a target with a reflectivity factor above 18 dBZ can be detected by radar. ET products can be used to detect storms (Yu et al., 2006). The 18-dBZ value approximates the radar echo intensity that may generate surface precipitation. It is generally used as the default value for echo-top-height products in operational nowcasting systems in the United States and China. The vertically integrated liquid water content (VIL) is the sum of equivalent liquid water contents derived from radar reflectivities at each scanning elevation angle. For this calculation, it is assumed that all reflectivities are caused by liquid water drops (Yu et al., 2006).
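
      The definitions of CR and ET can be made concrete with a small sketch. It assumes a reflectivity volume already interpolated to constant-height levels with shape (level, y, x), which is a simplification: the operational products are computed from the polar scans at each elevation angle. Function names and the sample values are hypothetical.

```python
import numpy as np

def composite_reflectivity(volume_dbz):
    """Column maximum of reflectivity; volume has shape (level, y, x)."""
    return np.nanmax(volume_dbz, axis=0)

def echo_top_km(volume_dbz, heights_km, threshold_dbz=18.0):
    """Highest level height with reflectivity >= threshold; NaN where none."""
    heights = np.asarray(heights_km, dtype=float)
    above = volume_dbz >= threshold_dbz
    # Keep the level height where the threshold is met, -inf elsewhere.
    level_height = np.where(above, heights[:, None, None], -np.inf)
    top = level_height.max(axis=0)
    return np.where(np.isfinite(top), top, np.nan)

# Hypothetical 3-level volume on a 1 x 2 grid.
heights = [2.0, 5.0, 8.0]
volume = np.array([
    [[30.0, 5.0]],
    [[20.0, 10.0]],
    [[10.0, 15.0]],
])
cr = composite_reflectivity(volume)
et = echo_top_km(volume, heights)
```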

      The radar product is gridded data of equal latitudinal and longitudinal intervals, covering a rectangular area of 3° × 3° centered on the radar station with 301 × 301 grids, and the resolution is 0.01° × 0.01°.

      The benchmark dataset contains data collected by 43 weather radars. All 43 radars are S-band Doppler weather radars with a wavelength of 10 cm. Figure 2 shows the locations of the 43 radar stations and an example of area coverage by a single radar.

    • Short-term, heavy rainfall is generally a weather phenomenon that occurs in mesoscale convective systems, but not all mesoscale convective systems can generate heavy rainfall. Only under favorable weather conditions can mesoscale systems easily develop and produce strong surface precipitation. Zhang et al. (2012) reported that the background weather conditions for the occurrence and development of strong mesoscale convective systems include four elements: water vapor, static instability, uplift, and vertical wind shear. Yu and Zheng (2020) systematically reviewed the environmental conditions of static instability, moisture, and lifting for severe convective weather. A diagnostic analysis of the environmental conditions for the occurrence and development of convection based on radiosonde data or numerical model products is helpful for forecasting short-term, severe convective weather. For example, Doswell et al. (1996) proposed an “ingredients-based” method to diagnose and analyze three types of weather conditions, including water vapor, convective instability, and uplift for heavy precipitation events. This method was eventually used for the operational forecasting of heavy rainstorms in China (Tang et al., 2010; Zhang et al., 2010).

      To account for the weather conditions governing the occurrence and development of heavy rainfall in an operational forecast, weather-condition data at the forecast time and over the preceding period are needed as input for the machine-learning model. QpefBD contains 27 physical parameters, i.e., ACPs, describing atmospheric water vapor, static instability, and dynamic lifting conditions (Table 3). The parameters are closely related to weather conditions associated with the occurrence and development of heavy rainfall and have explicit physical significance. ACPs are calculated based on the ERA5 reanalysis product, which has a spatial resolution of 0.25° and a temporal resolution of 1 h. Some ACPs are direct model output, while others are derived from model output at the surface and at various isobaric levels using the Sounding and Hodograph Analysis and Research Program in Python (SHARPpy) software package (Blumberg et al., 2017). SHARPpy is available at https://github.com/sharppy/SHARPpy.

      Category: Water vapor
      Mixing ratio (Q) at 1000, 850, and 700 hPa
      Relative humidity (RH) at 1000, 700, and 500 hPa
      Precipitable water (PWAT)
      Precipitation efficiency (PE)
      Average dew point temperature from the ground to 850 hPa
      700–400-hPa mean difference between temperature and dew point temperature
      700–400-hPa maximum difference between temperature and dew point temperature

      Category: Uplifting
      Divergence at 925, 850, and 200 hPa
      Wind speed at 925, 700, and 500 hPa

      Category: Static instability
      K index (KI)
      Convective available potential energy (CAPE)
      Lifting index (LI) at 500 and 300 hPa
      Elevation at the 0°C level
      Elevation at the −10°C level
      Elevation at the wet bulb temperature of 0°C
      Level of free convection (LFC)
      Mean lifted condensation level below 100 hPa (LCL)
      Mean unstable lifted condensation level below 100 hPa (muLCL)

      Table 3.  The 27 atmospheric condition parameters contained in QpefBD

      All ACPs are 2D rectangular gridded data, with spatial and temporal resolutions the same as those of ERA5. Considering the possible impacts of weather conditions around the forecast target area (i.e., the area defined by radar products and label data) on precipitation in the target area, the spatial range of ACP is a 5° × 5° rectangular area. This is larger than the label data area, and its center point is the grid point closest to the radar station. ACPs are used as input to the machine-learning model. However, the area and resolution of ACPs are different from those of the radar products and label data, challenging the development of a machine-learning technique.
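
      One simple way to reconcile the two grids is a nearest-neighbor lookup of the coarse ACP field on the radar grid. The sketch below is an assumption-laden illustration, not the dataset's own procedure: it assumes a 21 × 21 ACP grid (5° at 0.25° spacing, end points included), ascending latitude and longitude axes, and hypothetical radar and grid-center coordinates.

```python
import numpy as np

def nearest_index(coarse_axis, fine_axis):
    """Index of the nearest coarse-grid coordinate for each fine-grid coordinate."""
    coarse = np.asarray(coarse_axis, dtype=float)
    fine = np.asarray(fine_axis, dtype=float)
    return np.abs(fine[:, None] - coarse[None, :]).argmin(axis=1)

def acp_to_radar_grid(acp, acp_lats, acp_lons, radar_lats, radar_lons):
    """Sample a coarse ACP field at every radar grid point (nearest neighbor)."""
    iy = nearest_index(acp_lats, radar_lats)
    ix = nearest_index(acp_lons, radar_lons)
    return acp[np.ix_(iy, ix)]

# Hypothetical radar position and the nearest 0.25-degree ERA5 grid point.
radar_lat0, radar_lon0 = 30.52, 114.38
acp_lat0, acp_lon0 = 30.5, 114.5

acp_lats = acp_lat0 + 0.25 * np.arange(-10, 11)       # 21 points, 5 degrees
acp_lons = acp_lon0 + 0.25 * np.arange(-10, 11)
radar_lats = radar_lat0 + 0.01 * np.arange(-150, 151)  # 301 points, 3 degrees
radar_lons = radar_lon0 + 0.01 * np.arange(-150, 151)

acp = np.arange(21 * 21, dtype=float).reshape(21, 21)  # placeholder field
acp_on_radar = acp_to_radar_grid(acp, acp_lats, acp_lons, radar_lats, radar_lons)
```

      Nearest-neighbor sampling keeps the original ERA5 values unchanged; bilinear interpolation, or feeding the two grids to separate network branches, are alternative design choices.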

    • Most machine-learning methods are supervised-learning methods, i.e., each sample in the dataset needs to be given a true value (a label) representing the forecast, which is called label data. For example, in ImageNet (Su et al., 2012), each image is manually labeled (such as “motorbike” and “bicycle”). An object-recognition model can be obtained through learning and training with a large number of labeled samples. In the field of geoscience, the dataset often contains huge amounts of samples, while labeling these samples requires complicated professional skills and a great deal of time. For this reason, Reichstein et al. (2019) listed the lack of labeled datasets as one of the five major challenges in the application of artificial intelligence to geosciences. The quality of the label data is the most critical factor affecting the performance of machine-learning models.

      The present study produces two types of label data, i.e., rainfall intensity data showing the spatial distribution of 6-min precipitation and heavy rainfall area data showing where heavy rainfall occurs.

    • For precipitation estimation and forecasting, QpefBD needs to provide an actual ground precipitation distribution for each training sample. However, it is hard to obtain “true” surface precipitation distributions. Fortunately, 15,652 ground precipitation observation stations are densely distributed in the area covered by this dataset (Fig. 3). These include all national-level weather stations (Fig. 2) and regional-level weather stations in the six provinces. The temporal resolution of the precipitation observational data is 1 min, which can be used to derive precipitation data at 6-min intervals, matching the time period of a complete radar-volume scan.

      Figure 3.  Distribution of ground precipitation observation stations.

      To obtain label data consistent with the temporal and spatial attributes of the main input data (i.e., radar products), ground precipitation observations collected at weather stations need to be remapped to 0.01° × 0.01° grids. Here, the piecewise inverse distance squared weighted interpolation method is used, expressed as

      $$ {P}_{i}=\left[{\sum }_{k=1}^{n}\left({W}_{ik}\times {P}_{ik}\right)\right]/{\sum }_{k=1}^{n}{W}_{ik} , $$ (1)

      where Pi is the precipitation at the ith grid, Pik is the precipitation of the kth ground station found around the ith grid, n is the number of stations searched, and $ {W}_{ik} $ is the weighting coefficient:

      $$ {W}_{ik}=1/{R}_{ik}^{2}, $$ (2)

      where $ {R}_{ik} $ is the distance from the ith grid point to the kth station.

      The search radii are 1, 5, and 10 km. If stations can be found within a 1-km radius, the labeled value is the precipitation recorded at the station closest to the grid point. If no valid station can be found within a 1-km radius, the search radius is sequentially expanded to 5 and 10 km, and Eq. (1) is used to calculate precipitation at the grid point. If no station is found within 10 km, precipitation at that grid point is given the value of −9, indicating a missing value. Multiple search radii are used here mainly for the purpose of preserving as much as possible real precipitation information from the nearest neighboring area of the grid, making sure heavy precipitation at stations close to the grid will not be underestimated due to interpolation. As a result, the gridded precipitation data interpolated from observations retain extreme precipitation information observed at the stations.
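
      The piecewise rule and Eqs. (1) and (2) can be sketched for a single grid point as follows. The sketch assumes station positions are already expressed as planar offsets in km, and that for the 5- and 10-km radii all stations inside the radius enter Eq. (1); the function name and the station lists are hypothetical.

```python
import math

MISSING = -9.0  # missing-value flag used in the label data

def piecewise_idw(grid_xy, stations, radii_km=(1.0, 5.0, 10.0)):
    """Interpolate precipitation at one grid point.

    stations: list of (x_km, y_km, precip_mm) tuples.
    """
    gx, gy = grid_xy
    dists = [(math.hypot(x - gx, y - gy), p) for x, y, p in stations]

    # Innermost radius: take the value of the nearest station directly.
    inner = [(d, p) for d, p in dists if d <= radii_km[0]]
    if inner:
        return min(inner, key=lambda dp: dp[0])[1]

    # Larger radii: inverse-distance-squared weighting, Eqs. (1) and (2).
    for radius in radii_km[1:]:
        near = [(d, p) for d, p in dists if d <= radius]
        if near:
            weight_sum = sum(1.0 / d ** 2 for d, _ in near)
            return sum(p / d ** 2 for d, p in near) / weight_sum

    return MISSING

# Hypothetical stations around a grid point at the origin.
value_nearest = piecewise_idw((0.0, 0.0), [(0.5, 0.0, 12.0), (3.0, 0.0, 2.0)])
value_idw = piecewise_idw((0.0, 0.0), [(2.0, 0.0, 8.0), (4.0, 0.0, 0.0)])
value_missing = piecewise_idw((0.0, 0.0), [(20.0, 0.0, 5.0)])
```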

      The label data cover a 2D rectangular area with the same size as that of radar products, with the center of the area corresponding to the position of the radar station. The temporal and spatial resolutions of the dataset are the same as those of radar products.

      Since there are no ground precipitation observations in some uninhabited mountainous areas and over large water bodies, label data still contain many missing values, affecting how machine-learning models learn. To reduce the proportion of missing values, missing values are filled by zeros at those grids identified as having no precipitation based on the CR in the same area and the hourly, 5-km resolution precipitation product called CMA Multi-source merged Precipitation Analysis System (CMPAS), which is a product fusing three sources of observations, namely, observations from radar, satellite, and ground rain gauges (Pan et al., 2015). The criteria for identifying non-precipitation grids are: (1) CR is valid but no more than 10 dBZ, and (2) when criterion (1) cannot be satisfied, CMPAS’s 1-h precipitation is less than 0.1 mm. After the above processing, the proportion of missing values in the label data is reduced from 21.68% to 5.97%, indicating that the amount of missing data has been greatly reduced. Figure 4 shows the frequency distribution of the proportion of missing values before and after the processing.
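
      A minimal sketch of the zero-filling step is given below. It assumes that all fields are on the same 0.01° grid (i.e., the 5-km CMPAS product has been resampled beforehand) and reads "criterion (1) cannot be satisfied" as "CR is not valid at that grid point"; both are assumptions, and the function names are hypothetical.

```python
import numpy as np

MISSING = -9.0

def fill_no_rain(label, cr_dbz, cr_valid, cmpas_1h_mm):
    """Replace missing labels by 0 where the grid point is judged rain-free."""
    filled = label.copy()
    missing = label == MISSING
    # Criterion (1): CR is valid and does not exceed 10 dBZ.
    crit1 = cr_valid & (cr_dbz <= 10.0)
    # Criterion (2): CR is unavailable and CMPAS 1-h precipitation is below 0.1 mm.
    crit2 = ~cr_valid & (cmpas_1h_mm < 0.1)
    filled[missing & (crit1 | crit2)] = 0.0
    return filled

def missing_fraction(label):
    """Fraction of grid points flagged as missing."""
    return float(np.mean(label == MISSING))

# Hypothetical 1 x 4 example.
label = np.array([[-9.0, -9.0, -9.0, 2.5]])
cr_dbz = np.array([[5.0, 30.0, 0.0, 40.0]])
cr_valid = np.array([[True, True, False, True]])
cmpas = np.array([[0.0, 0.0, 0.05, 3.0]])
filled = fill_no_rain(label, cr_dbz, cr_valid, cmpas)
```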

      Figure 4.  Percentage of missing values in the precipitation intensity label data before (black bars) and after (gray bars) their filling for the six provinces, i.e., Anhui (AH), Fujian (FJ), Hubei (HB), Hunan (HN), Zhejiang (ZJ), Jiangxi (JX), and the average (TOTAL).

    • Based on radar reflectivity data at various elevation angles, storm cells whose size exceeds a given threshold are identified by using the storm cell identification and tracking algorithm (Johnson et al., 1998). The algorithm is applied at all elevation angles, and the results are combined by a union operation. All storm cells in the union are projected onto the grids in turn, and all the grid points enclosed by a storm cell are used to denote that cell. The threshold of the echo intensity is 35 dBZ. Heavy precipitation areas are mainly labeled based on the radar storm cell size and 6-min accumulated precipitation at stations. If the 6-min precipitation amount is greater than or equal to 3 mm at a single station within the storm cell, all the grid points covered by the storm cell are given the label 1. If no 6-min precipitation amount is greater than or equal to 3 mm at any station within the storm cell, then all the grid points covered by the storm cell are given the label 0. If there is no observation station within the storm cell, the grid points covered by the storm cell are all given the missing value label, i.e., −9. Grid points outside the storm cell are labeled 0 regardless of whether there is heavy precipitation or not.
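
      The labeling rule can be sketched as follows, assuming the storm cells have already been identified and rasterized into an integer mask (0 outside any cell, k > 0 for cell k). The storm cell identification itself is not reproduced here, and the function name and sample arrays are hypothetical.

```python
import numpy as np

def label_heavy_rain_area(cell_ids, station_rows, station_cols,
                          station_rain_6min, threshold=3.0, missing=-9):
    """Assign 1, 0, or the missing flag to every grid point.

    cell_ids: 2D integer array, 0 = outside any storm cell, k > 0 = cell k.
    """
    labels = np.zeros(cell_ids.shape, dtype=np.int16)
    station_cells = cell_ids[station_rows, station_cols]
    for cell in np.unique(cell_ids):
        if cell == 0:
            continue  # grid points outside storm cells keep the label 0
        in_cell = station_cells == cell
        if not in_cell.any():
            labels[cell_ids == cell] = missing   # no station inside the cell
        elif (station_rain_6min[in_cell] >= threshold).any():
            labels[cell_ids == cell] = 1         # heavy rain observed in the cell
        # otherwise the cell keeps the label 0
    return labels

# Hypothetical mask with three cells and three stations.
cell_ids = np.array([[1, 1, 0, 2],
                     [1, 0, 0, 2],
                     [0, 3, 3, 0]])
rows = np.array([0, 2, 1])
cols = np.array([0, 1, 1])
rain = np.array([3.5, 1.0, 5.0])  # the third station lies outside all cells
area_labels = label_heavy_rain_area(cell_ids, rows, cols, rain)
```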

      The spatial range and spatiotemporal resolutions of the label data of heavy precipitation areal coverage and precipitation intensity are the same.

    3.   Reference benchmark for performance evaluation of a machine-learning model
    • To build a machine-learning model based on QpefBD, the performance of the model needs to be evaluated. To provide a comparable reference benchmark for the evaluation of different models developed by different research-and-development institutions, QpefBD provides several unified evaluation metrics. In addition, a few commonly used methods or methods used in operational weather services are applied to precipitation estimates and forecasts. Skill scores are given, serving as the benchmark reference. Precipitation estimation and forecasting methods include radar quantitative precipitation estimation, persistence forecasting, and optical flow extrapolation based on semi-Lagrangian extrapolation.

    • The following metrics are used to evaluate the model performance on precipitation estimation and forecasting:

      (1) Root-mean-square error (RMSE)

      $$ {\rm{RMSE}}=\sqrt{\frac{1}{N}\sum\nolimits_{n=1}^{N}{\left({y}_{n}-{\hat{y}}_{n}\right)}^{2}} , $$ (3)

      where $ {y}_{n} $ and $ {\hat{y}}_{n} $ are the observed and forecasted precipitation, respectively, and $ N $ is the number of pairs of observations and forecasts.

      (2) Probability of detection (POD), false alarm ratio (FAR), and threat score (TS or CSI)

      Based on precipitation observations, the metrics for evaluating modeled precipitation estimates and forecasts for different precipitation intensity thresholds are as follows:

      $$ {\rm POD=TP (TP+FN)^{-1}}, $$ (4)
      $$ {\rm FAR=FP (TP+FP)^{-1} } , $$ (5)
      $$ {\rm TS=CSI= TP (TP+FN+FP) ^{-1} }, $$ (6)

      where TP (true positive) is the number of forecasts and observations that are both true, FN (false negative) is the number of false forecasts and true observations, and FP (false positive) is the number of true forecasts and false observations.
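
      The metrics in Eqs. (3) to (6) can be computed as in the sketch below. It assumes that grid points carrying the missing flag (−9) in the observations are excluded and that an event is counted when the value is greater than or equal to the threshold; the function names and sample values are hypothetical.

```python
import numpy as np

def rmse(obs, fcst, missing=-9.0):
    """Root-mean-square error over valid observations, Eq. (3)."""
    valid = obs != missing
    return float(np.sqrt(np.mean((obs[valid] - fcst[valid]) ** 2)))

def categorical_scores(obs, fcst, threshold, missing=-9.0):
    """Return (POD, FAR, CSI) for one intensity threshold, Eqs. (4)-(6)."""
    valid = obs != missing
    o = obs[valid] >= threshold
    f = fcst[valid] >= threshold
    tp = int(np.sum(o & f))    # forecast yes, observed yes
    fn = int(np.sum(o & ~f))   # forecast no, observed yes
    fp = int(np.sum(~o & f))   # forecast yes, observed no
    nan = float("nan")
    pod = tp / (tp + fn) if tp + fn else nan
    far = fp / (tp + fp) if tp + fp else nan
    csi = tp / (tp + fn + fp) if tp + fn + fp else nan
    return pod, far, csi

# Hypothetical observations and forecasts (mm); the fourth value is missing.
obs = np.array([0.0, 6.0, 12.0, -9.0, 3.0])
fcst = np.array([0.0, 4.0, 10.0, 50.0, 7.0])
error = rmse(obs, fcst)
pod, far, csi = categorical_scores(obs, fcst, threshold=5.0)
```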

    • Calculating precipitation based on the power-law relationship between Z and R (i.e., the ZR relationship) is an effective method to quickly obtain the distribution of surface precipitation in meteorological services. The algorithms used in operational weather services are implemented in the present study to estimate hourly precipitation (QPE) using the testing dataset. The steps taken are as follows.

      (1) Calculate the first guess of 1-h precipitation (R): About 10 hybrid reflectivity data samples in the previous hour are used to calculate the precipitation rate based on the ZR relationship, which is expressed as $ Z = a{R}^{b} $. The sum of the results is the 1-h QP estimate. Here, Z ($ {\rm{mm}}^{6}\;{\rm{m}}^{-3} $) is the radar hybrid reflectivity, R ($ {\rm{mm}}\;{\rm{h}}^{-1} $) is the estimated precipitation, a = 300, and b = 1.4.

      (2) Use the 1-h precipitation data collected at weather stations to correct the first guess of hourly precipitation, R: Assume that there are n weather stations in the area covered by the radar, that the arithmetic average of 1-h precipitation at these stations is $ {R}_{g} $, and that the average first-guess precipitation at the n grid points closest to the stations is $ \bar{R} $. The weather stations used include all the stations shown in Fig. 3. The correction factor is $ L = {R}_{g}{\bar{R}}^{-1} $.

      The corrected precipitation estimate is QPE = L × R. Hourly QP estimates during the period covered by the testing dataset can then be obtained.
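
      The two steps can be sketched as follows. This is an illustration under stated assumptions, not the operational code: reflectivity is taken to be given in dBZ and converted with Z = 10^(dBZ/10), each scan's rain rate is weighted by its 6-min duration before summing, invalid echoes are assumed to have been removed, and the mean first guess at the gauges is assumed to be nonzero. Function names and sample values are hypothetical.

```python
import numpy as np

A, B = 300.0, 1.4  # ZR coefficients given in the text (Z = A * R**B)

def rain_rate_from_dbz(dbz):
    """Invert Z = A * R**B for R (mm/h), with Z = 10**(dBZ / 10)."""
    z = 10.0 ** (np.asarray(dbz, dtype=float) / 10.0)
    return (z / A) ** (1.0 / B)

def hourly_first_guess(hbr_dbz_scans, scan_minutes=6.0):
    """Sum per-scan accumulations; input has shape (scan, y, x)."""
    rates = rain_rate_from_dbz(hbr_dbz_scans)
    return (rates * scan_minutes / 60.0).sum(axis=0)

def gauge_corrected_qpe(first_guess, gauge_rows, gauge_cols, gauge_1h_mm):
    """Scale the first guess by L = mean gauge amount / mean first guess at gauges."""
    r_mean = first_guess[gauge_rows, gauge_cols].mean()
    factor = float(np.mean(gauge_1h_mm) / r_mean)
    return factor * first_guess, factor

# Hypothetical input: 10 scans with Z = 300 everywhere, i.e., R = 1 mm/h.
scans = np.full((10, 2, 2), 10.0 * np.log10(300.0))
first_guess = hourly_first_guess(scans)
qpe, factor = gauge_corrected_qpe(
    first_guess, np.array([0, 1]), np.array([0, 1]), np.array([1.5, 2.5])
)
```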

      One-hourly precipitation values are first derived from the 6-min precipitation information contained in the label data of precipitation intensity and taken as the true values. Forecast skill scores of RMSE, POD, FAR, and TS for different precipitation intensity intervals of the radar QPE (i.e., 0, ≥ 0.1, ≥ 5, ≥ 10, ≥ 15, and ≥ 20 mm h−1) are finally calculated.

      Appendix A presents detailed results, which are used as the baseline for evaluating the performance of radar precipitation estimation models.

    • A persistence forecast assumes that the forecasted quantity at all forecast times is the same as the quantity at the initial time of the forecast. Here, persistence refers to Eulerian persistence instead of Lagrangian persistence. Persistence forecast results are often used as a benchmark to evaluate the forecast skills of other methods.

      Based on the testing dataset, consecutive persistence forecasts of 6-min ground precipitation within the time window of each heavy precipitation event are carried out. Statistics of skill scores for the model performance are calculated by using the label data of precipitation intensity as true values.
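
      A minimal sketch of an Eulerian persistence forecast and its evaluation by lead time is given below. The number of lead times and the handling of the missing flag are assumptions, and the function names and sample fields are hypothetical.

```python
import numpy as np

def persistence_forecast(initial_field, n_steps):
    """Repeat the initial 6-min precipitation field for every lead time."""
    return np.repeat(initial_field[None, ...], n_steps, axis=0)

def rmse_by_lead_time(forecasts, truths, missing=-9.0):
    """RMSE for each lead time, ignoring grid points with missing labels."""
    scores = []
    for fcst, obs in zip(forecasts, truths):
        valid = obs != missing
        scores.append(float(np.sqrt(np.mean((fcst[valid] - obs[valid]) ** 2))))
    return scores

# Hypothetical 2 x 2 fields: the "truth" grows by 1 mm per step.
field = np.array([[0.0, 1.0], [2.0, 3.0]])
forecasts = persistence_forecast(field, 3)
truths = np.stack([field, field + 1.0, field + 2.0])
truths[2, 0, 0] = -9.0  # one missing label at the last lead time
scores = rmse_by_lead_time(forecasts, truths)
```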

      Appendix B presents detailed results.

    • The optical flow method assumes that the moving target (such as precipitation) has Lagrangian persistence (Liu et al., 2015). The 2D fields of the target at two consecutive times are used to calculate the advection field (i.e., the optical flow field), which is then used to extrapolate the target to its position at the forecast time. There are many algorithms for optical flow extrapolation. The present study uses the algorithm called the real-time optical flow by variational method for echoes of radar (ROVER) developed at the Hong Kong Observatory (HKO). Shi et al. (2015) provide the code, with details found at https://github.com/sxjscience/HKO-7.

      Using the 6-min precipitation at the start time of the forecast and the previous time in the testing dataset, the optical flow extrapolation method is applied to predict precipitation in the subsequent 20 consecutive 6-min periods (6–120 min). Statistics of the performance for these forecasts are calculated by using the label data of precipitation intensity as true values.
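
      The idea of Lagrangian persistence can be illustrated with a deliberately simplified stand-in for ROVER: a single uniform motion vector is estimated from the two most recent fields by a brute-force search, and the latest field is then translated along that vector. This is not the ROVER algorithm and does not compute a per-pixel flow field; all names and the sample fields are hypothetical.

```python
import numpy as np

def shift_field(field, dy, dx, fill=0.0):
    """Translate a 2D field by (dy, dx) grid cells; uncovered cells get `fill`."""
    out = np.full_like(field, fill)
    ny, nx = field.shape
    src_y = slice(max(0, -dy), max(0, min(ny, ny - dy)))
    dst_y = slice(max(0, dy), max(0, min(ny, ny + dy)))
    src_x = slice(max(0, -dx), max(0, min(nx, nx - dx)))
    dst_x = slice(max(0, dx), max(0, min(nx, nx + dx)))
    out[dst_y, dst_x] = field[src_y, src_x]
    return out

def estimate_uniform_motion(prev, curr, max_shift=5):
    """Find the integer shift of `prev` that best matches `curr` (minimum MSE)."""
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            err = np.mean((shift_field(prev, dy, dx) - curr) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

def extrapolate(prev, curr, n_steps=20):
    """Advect `curr` along the estimated motion for n_steps lead times."""
    dy, dx = estimate_uniform_motion(prev, curr)
    return np.stack([shift_field(curr, k * dy, k * dx) for k in range(1, n_steps + 1)])

# Hypothetical rain cell moving 1 row and 2 columns per 6-min step.
prev = np.zeros((20, 20))
prev[5:8, 5:8] = 4.0
curr = shift_field(prev, 1, 2)
motion = estimate_uniform_motion(prev, curr)
nowcast = extrapolate(prev, curr, n_steps=20)
```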

      Appendix C presents detailed results.

    4.   Dataset application scenarios
    • QpefBD can be used for minute-scale radar precipitation estimations and 0–2-h precipitation nowcasting. Three application scenarios are given next. Section 4.4 gives a few suggestions for further data processing.

    • Based on the radar precipitation echo intensity and the physical relationship between the echo intensity and ground precipitation (i.e., the ZR relationship), a radar-based QP estimate can be obtained. This has been the commonly used method in operational weather services in recent decades. Data-driven machine-learning technology has the potential to improve precipitation estimation. Using QpefBD, a machine-learning model can find the connection between radar observations and ground precipitation. For example, a deep neural network can learn the complicated nonlinear relationship between 2D radar products and 2D ground precipitation from the training dataset and directly output the spatial distribution of 2D ground precipitation. Input data for the deep-learning model include one or more of the eight radar products listed in Table 2. They are used as the model’s multi-channel, 2D spatial data. Since the 2D spatial samples in the training dataset contain data from different regions, seasons, and radars, factors such as region, terrain, season, and diurnal variation can be added to the input data to consider their possible influences on precipitation. This may improve the accuracy and the generalization ability of the model. However, it will also increase the complexity of the model, requiring more computing resources. The model outputs are 2D gridded datasets of 6-min ground precipitation (or precipitation rate), covering the same spatial area as the radar products.

    • Zero-to-two-hour nowcasting of surface precipitation is a challenging issue in operational weather forecasting. It is also of interest to other scientific research efforts. QpefBD provides a standard dataset for precipitation nowcasting on a minute scale based on machine learning. We recommend applying QpefBD in deep neural network models, such as the recurrent neural network and its various derivative models (e.g., LSTM and GRU), which are able to handle time-series prediction. Using the large number of training samples provided by QpefBD, such as the sequences of radar products at the initial time of the forecast and the previous periods, plus background ACP data at the time of the forecast and the “true values” of ground precipitation represented by label data, a deep machine-learning model can be established. The neural network model can learn (1) the temporal variation regularities of precipitation from previous radar products and (2) the relationship between weather conditions and ground precipitation from the ACP physical quantities at the forecast time, eventually realizing 6-min precipitation forecasting.

    • The model output can be converted into a forecast of ground elements by establishing relationships between the forecast products of numerical weather models and ground elements (e.g., conventional elements like temperature, humidity, and wind, or strong convective weather phenomena, such as storms, heavy precipitation, and hail). Traditional statistics-based model interpretation technology [e.g., the Model Output Statistics (MOS) method] can hardly determine the nonlinear relationships between model products and ground elements. Data-driven machine-learning technology offers a new approach. A machine-learning model is established by using a large number of training data samples. The relationship between model output variables and ground forecast variables can then be obtained regardless of whether the relationship is linear or nonlinear. At present, a large number of results have been generated by using machine-learning technology. For example, 57 characteristic quantities closely related to the occurrence of thunderstorms are obtained from the Swiss mesoscale weather model called aLMo and used as input to the boosting-based machine-learning algorithm AdaBoost. The model is then used to predict the occurrence probability of thunderstorms, generating better predictions than those of an expert system (Perler and Marchand, 2009). Whether lightning will occur or not on each grid point in the Korean Peninsula can be predicted by using the ECMWF global forecast products at 3-h intervals (Han et al., 2017). Zhou et al. (2019) used a deep network model to establish the relationship between the output of the US NCEP Global Forecast System (GFS) model and the occurrence of severe convective weather (i.e., thunderstorms, heavy precipitation, hail, and high convective winds) on the ground, realizing the potential forecast of strong convective weather on the ground based on the GFS real-time forecast.

      QpefBD contains 6-min surface precipitation distributions within the time window of heavy precipitation events and 27 hourly physical quantities produced by numerical models. The deep neural network model can then be used to determine the relationship between ACP and ground precipitation based on this dataset. Since the label data of surface precipitation are 6-min precipitation amounts and ACP are hourly data, label data need to be converted into hourly data, with the target of the forecast being the hourly precipitation field. Also, the spatial resolution of the ground precipitation data is 0.01°, whereas the spatial resolution of ACP is 0.25°. The spatial ranges of the two datasets are inconsistent. The area of the labeled data is within the area covered by ACP, which has a larger spatial area than the label data. How to establish the spatial mapping relationship between the two datasets poses a challenge to the development of deep-learning models.
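
      The conversion of 6-min labels to an hourly target can be sketched as follows. The sketch assumes ten consecutive 6-min fields per hour and marks an hourly value as missing whenever any of its 6-min inputs is missing; this conservative rule and the function name are assumptions.

```python
import numpy as np

def hourly_accumulation(labels_6min, missing=-9.0):
    """Sum ten 6-min fields of shape (10, y, x) into one hourly field."""
    is_missing = labels_6min == missing
    total = np.where(is_missing, 0.0, labels_6min).sum(axis=0)
    # An hourly value is kept only if all of its 6-min inputs are valid.
    return np.where(is_missing.any(axis=0), missing, total)

# Hypothetical 1 x 3 grid: light rain, one missing step, heavy rain.
labels = np.full((10, 1, 3), 0.5)
labels[3, 0, 1] = -9.0
labels[:, 0, 2] = 2.0
hourly = hourly_accumulation(labels)
```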

    • The radar products in QpefBD have some non-valid data due to ground clutter and other non-precipitation echoes. These are denoted by the integers −32,768 for missing data and −1280 for no valid echoes. When radar products are used as input to the deep-learning model, these data should be preprocessed. For example, invalid values in the CR, HBR, and CAPPI products could be replaced with 0 dBZ so that surface precipitation is not produced. Similarly, label data have some missing data due to the lack of enough ground precipitation stations, denoted by −9. These missing data cannot be involved in the calculation of loss during model training.
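
      The preprocessing described above can be sketched as follows. The flag values are those given in the text; the assumption is that reflectivity is stored directly in dBZ alongside these flags, and the mean-squared-error loss is only one possible choice. Function names are hypothetical.

```python
import numpy as np

RADAR_MISSING = -32768  # missing radar data
RADAR_NO_ECHO = -1280   # no valid echo
LABEL_MISSING = -9      # missing label value

def clean_reflectivity(dbz):
    """Replace invalid radar values by 0 dBZ."""
    dbz = np.asarray(dbz, dtype=np.float32)
    invalid = (dbz == RADAR_MISSING) | (dbz == RADAR_NO_ECHO)
    return np.where(invalid, 0.0, dbz)

def masked_mse(pred, label):
    """Mean-squared error computed only where the label is not missing."""
    valid = label != LABEL_MISSING
    if not valid.any():
        return 0.0
    return float(np.mean((pred[valid] - label[valid]) ** 2))

# Hypothetical values.
cleaned = clean_reflectivity([35, -32768, -1280, 12])
loss = masked_mse(np.array([1.0, 2.0, 3.0]), np.array([1.0, -9.0, 5.0]))
```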

      When machine-learning models are trained, input data and label data need to be normalized during data preprocessing due to inconsistencies in the units of measure between the different input data and between the input data and output data.

      Topography has an important influence on the occurrence and development of surface precipitation. Machine-learning models for precipitation estimation and forecasting often include topographic features (e.g., elevation, slope, and aspect) as important input data, e.g., Google’s precipitation forecasting model (Sønderby et al., 2020). We suggest that for precipitation estimation and forecasting models developed for regions with complex topography, topographic and geographic factors could be added to the input of the model that uses radar products. The topographic and geographic data should be processed with the same spatial attributes as the radar products. The merging of topographic and geographic data with other input data could improve the forecasting performance of the model, thus enhancing the generalization ability of the model to different regions.

    5.   Data availability
    • The dataset developed in this study is open to domestic users in China for free, available at http://10.1.64.154/idata/web/data/index. It can be used for scientific research and operational services for the public welfare. Users wishing to use this dataset for commercial purposes must obtain permission from the owner of the dataset, i.e., the National Meteorological Information Center. The dataset will be updated over the next year to cover more time periods, e.g., 2019 and 2020, and more regions, e.g., 11 more provinces.

    6.   Summary
    • The present study has developed a benchmark dataset, QpefBD, which can be used in machine-learning models for ground precipitation estimation and forecasting. The basic characteristics of the dataset are as follows.

      (1) Data samples are taken from 3185 heavy precipitation events that occurred during April–October of 2016–2018 in 6 provinces in central and eastern China. In total, there are 231,978 samples.

      (2) The dataset includes Doppler weather radar data, ACP, and two kinds of labels (precipitation intensity and heavy precipitation area). The radar data contain eight radar products closely related to the occurrence and development of ground precipitation. The ACP data are from ERA5 hourly reanalysis data and their derived physical quantities. The label data have the same spatiotemporal attributes as the radar data, i.e., a horizontal resolution of 0.01° and a temporal resolution of 6 min.

      (3) The samples contained in the dataset are two-dimensional gridded raster data with equal latitudinal and longitudinal intervals, directly usable for the training of machine-learning models, especially deep-learning models.

      The present study provides metrics for the evaluation of machine-learning-model performance. The results of model evaluation based on these metrics can serve as the baseline for the performance evaluation of machine-learning models using this dataset.
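      For readers who want to compare their own models against the baselines, the sketch below computes POD, FAR, TS, and RMSE for a single "≥ threshold" category. It assumes the standard contingency-table definitions (hits, misses, false alarms), which may differ in detail from the definitions given earlier in the paper, and it does not cover the separate "0 mm" (no-rain) category. The function names are hypothetical.

```python
import numpy as np

LABEL_MISSING = -9  # missing-label flag given in the text


def categorical_scores(pred, obs, threshold):
    """Return (POD, FAR, TS) for events defined as value >= threshold."""
    pred = np.asarray(pred, dtype=np.float64)
    obs = np.asarray(obs, dtype=np.float64)
    valid = obs != LABEL_MISSING
    p = pred[valid] >= threshold
    o = obs[valid] >= threshold
    hits = int(np.sum(p & o))
    misses = int(np.sum(~p & o))
    false_alarms = int(np.sum(p & ~o))
    nan = float("nan")
    pod = hits / (hits + misses) if hits + misses else nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else nan
    ts = hits / (hits + misses + false_alarms) if hits + misses + false_alarms else nan
    return pod, far, ts


def rmse(pred, obs):
    """Root-mean-square error over points with a valid observation."""
    pred = np.asarray(pred, dtype=np.float64)
    obs = np.asarray(obs, dtype=np.float64)
    valid = obs != LABEL_MISSING
    return float(np.sqrt(np.mean((pred[valid] - obs[valid]) ** 2)))


# Toy example with one missing observation (last element).
pred = [0.0, 1.0, 6.0, 12.0, 0.2]
obs = [0.0, 0.5, 7.0, 3.0, -9]
pod, far, ts = categorical_scores(pred, obs, threshold=5.0)
error = rmse(pred, obs)
print(pod, far, ts, error)
```

      Scores are returned as NaN when their denominator is zero, so that empty categories are not mistaken for perfect or failed forecasts.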

      QpefBD can be widely used in scenarios such as single-station Doppler weather radar quantitative precipitation estimation, minute-scale precipitation nowcasting, and precipitation forecast interpretation based on products from numerical weather models. We believe that extensive application of this dataset can effectively promote collaborative studies in various atmospheric science fields, upgrade the application of artificial intelligence in the meteorological sciences, and improve ground precipitation forecasts (especially short-term heavy precipitation forecasts). We also hope that more artificial intelligence scientists will work with experts in the atmospheric sciences to develop algorithms and models that can effectively solve problems specifically associated with the atmospheric sciences.

      Appendix A: The baseline performance of operational radar QPE

                         0 mm        ≥ 0.1 mm    ≥ 5 mm     ≥ 10 mm    ≥ 15 mm    ≥ 20 mm    Average
      Number of samples  31,699,469  39,861,690  9,966,287  4,884,580  2,729,276  1,565,381
      POD (%)            77.68       78.47       37.32      25.57      19.34      15.11      67.57
      FAR (%)            30.78       23.78       40.63      48.95      55.39      62.62      29.20
      TS (%)             57.74       63.04       29.72      20.54      15.59      12.06      52.84
      RMSE (mm h−1)      1.88        6.35        11.8       15.76      19.43      23.23      8.27

      Table A1.  RMSEs of radar precipitation estimates based on the ZR relationship
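      A Z–R relationship has the power-law form Z = aR^b, where Z is the reflectivity factor (mm^6 m^−3) and R is the rain rate (mm h^−1). The sketch below inverts it to estimate rain rate from reflectivity in dBZ. The coefficients a = 200 and b = 1.6 (the classic Marshall–Palmer values) are an assumption made for illustration; the coefficients behind the baseline in Table A1 are not stated in this section and may differ.

```python
import numpy as np


def zr_rain_rate(dbz, a=200.0, b=1.6):
    """Rain rate (mm/h) from reflectivity (dBZ) via Z = a * R**b.

    a and b default to the Marshall-Palmer values, used here only as an assumption.
    """
    dbz = np.asarray(dbz, dtype=np.float64)
    z = 10.0 ** (dbz / 10.0)          # dBZ -> linear reflectivity factor
    return (z / a) ** (1.0 / b)


def rate_to_6min(rate_mm_per_h):
    """Convert an hourly rate to a 6-min accumulation, assuming a constant rate."""
    return np.asarray(rate_mm_per_h, dtype=np.float64) * (6.0 / 60.0)


dbz_values = np.array([20.0, 35.0, 50.0])
rates = zr_rain_rate(dbz_values)
six_min = rate_to_6min(rates)
print(rates, six_min)
```

      Note that 0 dBZ maps to a small but nonzero rain rate under this formula, so a minimum-reflectivity cutoff may be needed if flagged pixels are filled with 0 dBZ.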

      Appendix B: The baseline performance from the persistence forecast of rainfall

      Lead time (min)        6     12    18    24    30    36    42    48    54    60    66    72    78    84    90    96    102   108   114   120
      RMSE [mm (6 min)−1]    0.35  0.49  0.56  0.61  0.64  0.66  0.68  0.69  0.70  0.71  0.72  0.73  0.73  0.74  0.74  0.75  0.75  0.76  0.76  0.76

      Table B1.  RMSEs of 6-min precipitation persistence forecasts
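      A persistence forecast simply carries the latest observed field forward unchanged. The sketch below computes its RMSE for each lead step on a sequence of 6-min precipitation frames. It is an assumed evaluation setup (every frame is used as an initial time, and points flagged −9 are skipped), which may not match the exact procedure used to produce Table B1.

```python
import numpy as np

LABEL_MISSING = -9  # missing-label flag given in the text


def persistence_rmse(frames, max_lead):
    """RMSE of persistence forecasts for lead steps 1..max_lead.

    frames: array of shape (T, H, W) with T > max_lead; the forecast valid at
    step t + k is taken to be the observed frame at step t.
    """
    frames = np.asarray(frames, dtype=np.float64)
    scores = []
    for k in range(1, max_lead + 1):
        fcst = frames[:-k]   # frames used as the forecast
        obs = frames[k:]     # frames observed k steps later
        valid = (fcst != LABEL_MISSING) & (obs != LABEL_MISSING)
        scores.append(float(np.sqrt(np.mean((fcst[valid] - obs[valid]) ** 2))))
    return scores


# Toy sequence in which every pixel increases by 1 each step,
# so the persistence error at lead step k is exactly k.
frames = np.arange(5, dtype=np.float64)[:, None, None] * np.ones((1, 2, 2))
scores = persistence_rmse(frames, max_lead=3)
print(scores)
```

      With 6-min frames, lead step k corresponds to a lead time of 6k min, so 20 steps cover the 120-min range shown in the table.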

      Figure B1.  Threat scores (TSs) for the persistence forecasts of 6-min precipitation with different magnitudes [0, ≥ 0.1, ≥ 1, ≥ 2, and ≥ 3 mm (6 min)−1] as a function of forecast lead time.

      Figure B2.  Probability of detection (POD), false alarm ratio (FAR), and threat score (TS) for persistence forecasts of 6-min precipitation as a function of forecast lead time.

      Appendix C: The baseline performance of the rainfall forecast using optical flow extrapolation

      Lead time (min)        6     12    18    24    30    36    42    48    54    60    66    72    78    84    90    96    102   108   114   120
      RMSE [mm (6 min)−1]    0.36  0.48  0.55  0.59  0.62  0.64  0.65  0.66  0.67  0.68  0.69  0.69  0.70  0.70  0.70  0.71  0.71  0.71  0.71  0.72

      Table C1.  RMSEs of 6-min precipitation forecasts generated by the optical flow extrapolation method

      Figure C1.  Threat scores (TSs) for forecasts of 6-min precipitation with different magnitudes [0, ≥ 0.1, ≥ 1, ≥ 2, and ≥ 3 mm (6 min)−1] as a function of forecast lead time using the optical flow extrapolation method.

      Figure C2.  False alarm ratio (FAR), probability of detection (POD), and threat score (TS) for consecutive forecasts of 6-min precipitation as a function of forecast lead time using the optical flow extrapolation method.
