Nowcasting of strong convective precipitation and radar-based quantitative precipitation estimation have long been active yet challenging topics in the meteorological sciences. Data-driven machine learning, especially deep learning, offers a new technical approach to the quantitative estimation and forecasting of precipitation. A high-quality, large-sample, labeled training dataset is critical for successfully applying machine learning to a specific field. The present study develops a benchmark dataset for machine-learning-based, minute-scale quantitative precipitation estimation and forecasting (QpefBD). It contains 231,978 samples from 3,185 heavy precipitation events that occurred in 6 provinces of central and eastern China from April to October of 2016–2018. Each sample consists of 8 weather radar products at 6-min intervals within the time window of the corresponding event, together with 27 physical quantities at hourly intervals that describe the atmospheric dynamic and thermodynamic conditions. Two data labels at 6-min intervals are also included: ground precipitation intensity and areal coverage of heavy precipitation. The study describes the basic components of the dataset and the data processing, and provides metrics for evaluating model performance in precipitation estimation and forecasting. Using these metrics, several simple and commonly used estimation and forecasting methods are evaluated. The results can serve as a benchmark reference for evaluating machine learning models that use this dataset.
This paper also offers suggestions and scenarios for applying QpefBD. We believe that this benchmark dataset will promote interdisciplinary collaboration between the meteorological sciences and artificial intelligence, providing a new way to identify and forecast heavy precipitation.