-
Climate variables play essential roles in the components of the global water cycle. Specifically, rainfall has a dominant influence on the hydrological cycle and thus affects the modeling of extreme hydrological events (floods and droughts), water management, and natural disasters (landslides and earthquakes) (Tan et al., 2018; Gyasi-Agyei, 2020; Satgé et al., 2020). Similarly, temperature has a significant influence on heat and cold waves (Costa et al., 2020). However, the variability of precipitation and temperature in space and time makes both variables difficult to measure and estimate.
The temporal and spatial coverages of gauge observations and their quality standards highly influence the accuracy of hydrometeorological studies (Li et al., 2020). The quality and stability of climatic data from ground meteorological stations may not be sufficient due to a limited number of stations, uneven spatial distribution, and vulnerability to human and environmental factors (Wang et al., 2020). These issues are more pronounced in developing countries (Boke, 2017). Economic and political problems often result in sparse and nonuniformly distributed gauge networks that are unable to capture temporal and spatial climatic variability (Le Coz and van de Giesen, 2020). For example, the recommended rain gauge density is one station per 600–900 km² for flatlands and one station per 100–250 km² for topographically rugged areas (Zeng et al., 2018). Africa has only 744 stations, of which only a quarter meet the international standards, whereas it would ideally require 10,000 uniformly distributed standard stations to capture its spatiotemporal climate variability (Satgé et al., 2020). Moreover, the gauges that exist in Africa are concentrated in a few regions, such as South Africa (Le Coz and van de Giesen, 2020). In Ethiopia, the gauge networks are very sparse and of low quality. For example, in the upper Tekeze River basin (UTB), one rain gauge station covers an area of approximately 1400 km², which is far sparser than the recommended density (Gebremicael et al., 2019). Thus, investigating relevant data sources that can capture the spatiotemporal climatic variability of the UTB is required for high-quality hydrometeorological studies.
Recently, the availability of high-quality gridded climatic dataset products (satellite and reanalysis-based) has increased, and these products are increasingly used in hydrological and water management studies (Hu et al., 2017; Fang et al., 2019; Azimi et al., 2020; Gebremicael et al., 2020; Wang et al., 2020). In this context, gridded climatic datasets at both regional and global scales become essential alternative data sources (Li et al., 2020). This could be a result of the fast development of remote sensing and data assimilation technologies (Zhang and Ma, 2018). However, the performance of these datasets in representing the ground-based observed data and in hydrological modeling applications is inconsistent between different regions and basins.
Many studies have evaluated and validated different datasets against gauge measurements (Dembélé and Zwart, 2016; Sahlu et al., 2017; Aslami et al., 2019; Lockhoff et al., 2019; Ayoub et al., 2020; Islam and Cartwright, 2020). Moreover, satellite and reanalysis-based datasets have also been used to drive hydrological models in different basins around the world (Zhao et al., 2015; Tan et al., 2017; Li et al., 2018; Roy et al., 2018; Awange et al., 2019; Azimi et al., 2020). These studies suggested that the various datasets may have different applicability across regions. For instance, the climatic research unit (CRU) time series (TS) 2.1 shows good agreement over most parts of China as compared to other reanalysis rainfall estimates (Zhao and Fu, 2006). A similar study by Dembélé and Zwart (2016) indicated that the Climate Hazards Group InfraRed Precipitation with Stations (CHIRPSv8) precipitation dataset better captured the observed extremes in West Africa. Notable improvements of interpolated surface temperature have been achieved through topographic correction for the 40-yr ECMWF Re-Analysis (ERA-40) temperature dataset over China (Zhao et al., 2008). The different meteorological datasets are inevitably subject to uncertainties resulting from the factors associated with climate zones, seasonal changes, ground-surface conditions, geographical positions, and the embedded algorithms used to derive the datasets (Fang et al., 2019; Gebremicael et al., 2019; Wang et al., 2020). Assessment of the accuracy and uncertainties in historical meteorological dataset products based on ground observations is helpful for water resource and hydrological studies.
Some studies have already evaluated satellite- and reanalysis-based global datasets in Ethiopia, e.g., in the Lake Tana basin (Worqlul et al., 2014), the Gilgel Abbay watershed (Lakew et al., 2017), and the upper Blue Nile (Bayissa et al., 2017; Sahlu et al., 2017). These studies focused on rainfall datasets only in the upper Blue Nile basin. Gebremicael et al. (2019) evaluated eight precipitation products in the UTB. However, their study focused only on satellite rainfall products despite the fact that temperature is also a key parameter for hydro-climatic studies in the region. Reanalysis-based climatic datasets often provide better spatial coverage than non-reanalysis-based datasets. Based on the above, our study aims to evaluate nine satellite and reanalysis rainfall and near surface air temperature (SAT) datasets against gauge-observed data in the UTB. The UTB is representative of the mesoscale basins in northern Ethiopia, and the Tekeze River is the most important water source for irrigation and hydropower in both Ethiopia and the downstream countries such as Sudan and Egypt (Fentaw et al., 2018; Gebremicael et al., 2019). With a comprehensive evaluation of the temperature and precipitation products, this study will contribute to UTB water resource management for the sustainable development of the basin region and the downstream countries.
-
The Tekeze River basin (Fig. 1), situated in the northwestern part of Ethiopia, is one of the major tributaries of the Nile River (Abrha, 2009). The Tekeze headwaters originate in the Meket Mountains near Lalibela, and the river flows northwards before turning westward along the Ethiopia–Eritrea border, covering a distance of 600 km until it crosses the Ethiopia–Sudan border near Humera (MoWR, 1998). The basin is characterized by complex topography consisting of mountains, highlands, and lowlands with gently sloping terrain. This complex topography endows the basin with a high potential for hydropower production in the mountainous areas and for irrigation in the lowland areas.
Figure 1. Map of the study area: (a) Ethiopia, (b) geographical locations and distributions of meteorological stations in the UTB, (c) average monthly maximum, mean, and minimum temperatures (Tmax, Tmean, and Tmin), and (d) mean monthly rainfall of the UTB for dry and wet seasons based on gauge records from 1980 to 2019.
The northern and eastern parts of the UTB are categorized as semi-arid, and the southern part is categorized as semi-humid (Fentaw et al., 2018). The annual rainfall ranges from 400 mm yr−1 in the eastern parts to 1200 mm yr−1 in the southwestern parts of the basin (Gebremicael et al., 2019). The rainy season is from June to September, during which more than 70% of the total annual rainfall occurs in the UTB. The variables Tmax and Tmin vary from 3 to 21°C in the high-elevation areas (around the Semien Mountains) and from 19 to 43°C in the flat and low-elevation areas (around Humera). At the basin level, the average minimum and maximum temperatures (Tmin and Tmax) of the UTB are 11 and 32°C, respectively (Fentaw et al., 2018).
-
Gauge observations of daily maximum and minimum SAT and precipitation were collected from the National Meteorological Agency of Ethiopia (NMAE) for 21 stations located inside and near the UTB (Fig. 1). These daily data span from 1980 to 2019, with the lengths of records varying from station to station. In this study, only the stations having at least 35 years of continuous records with less than 2.5% missing values are considered. Most of the meteorological stations are in the northeastern part of the UTB (mostly highland area), and only a few are in the western and southwestern parts (dominantly lowland areas) of the basin (Fig. 1). This may introduce uncertainties into the spatially aggregated values of the climate variables. Thus, the point data are not interpolated into gridded time series. However, for the purpose of evaluation/validation of satellite and reanalysis climatic datasets, the distribution of stations can be considered sufficient for a point-to-pixel comparison.
Quality control, including outlier exclusion and homogenization, was applied to the gauge observation TS data. Monthly precipitation and SAT series were calculated for each gauge station and then tested for homogeneity using the Standard Normal Homogeneity Test. This test assumes that precipitation amounts at the station being tested (test station) and some regional average values are proportional to each other. In our study, the test was applied to a series of ratios comparing the observations at each test station with the averaged observations at its four nearest stations. This relationship is expressed as the ratio between the normalized precipitation values of the test station and those of a regional TS defined as a weighted average of the four neighboring reference stations. See Section 1 in supplementary information for more details. The test indicated no homogeneity problems in the rainfall and temperature gauge observations, which, after applying the screening criteria, are thus considered reliable for statistical analyses.
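The homogeneity test described above can be illustrated with a minimal sketch. It assumes the commonly used form of the Standard Normal Homogeneity Test statistic, T(k) = k·z̄₁² + (n − k)·z̄₂², computed on the standardized ratio series; the exact variant and weights used in this study are given in its supplementary information and are not reproduced here. The function names `reference_ratio` and `snht_statistic`, and the use of the population standard deviation, are illustrative assumptions.

```python
from statistics import mean, pstdev


def reference_ratio(test, neighbours, weights):
    """Ratio of test-station values to a weighted mean of neighbour values.

    `neighbours` is a list of series (one per reference station) and
    `weights` holds one weight per reference station (hypothetical inputs).
    Assumes the weighted reference value is never zero.
    """
    total_w = sum(weights)
    ratios = []
    for i, value in enumerate(test):
        ref = sum(w * nb[i] for w, nb in zip(weights, neighbours)) / total_w
        ratios.append(value / ref)
    return ratios


def snht_statistic(ratios):
    """Return (T0, k): the maximum SNHT statistic and its break position.

    T(k) = k * mean(z[:k])**2 + (n - k) * mean(z[k:])**2, where z is the
    standardized ratio series. A large T0 suggests a shift after element k.
    """
    n = len(ratios)
    mu, sd = mean(ratios), pstdev(ratios)
    if sd == 0:
        return 0.0, None  # constant series: nothing to detect
    z = [(q - mu) / sd for q in ratios]
    best_t, best_k = 0.0, None
    for k in range(1, n):
        t = k * mean(z[:k]) ** 2 + (n - k) * mean(z[k:]) ** 2
        if t > best_t:
            best_t, best_k = t, k
    return best_t, best_k
```

In practice T0 would be compared against a tabulated critical value for the series length; a series whose T0 stays below that value is treated as homogeneous.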
-
We selected nine climatic datasets that include satellite, reanalysis, gauge-based temperature and rainfall observations, and the merged estimates based on the three types of datasets (Table 1). The choice of these products was mainly based on their temporal coverage, data integrity, availability of near-real-time data, accessibility, and popularity in the literature.
| Dataset | Resolution | Frequency | Coverage | Period | Variable | Reference |
| --- | --- | --- | --- | --- | --- | --- |
| EWEMBI v1.1 | 0.5° | Daily | Global | 1979–2016 | Pre, Tmax, Tmin, Tmean | Lange (2019) |
| CRU TS v4.03 | 0.5° | Monthly | Global | 1901–2018 | Pre, Tmax, Tmin, Tmean | Harris et al. (2020) |
| ERA5-land | 0.1° | Hourly | Global | 1982–2020 | Pre, Tmean | C3S (2019) |
| GPCC v2018 | 1° | Daily | Global | 1982–2016 | Pre | Ziese et al. (2018) |
| CPC | 0.5° | Daily | Global | 1979–2019 | Pre, Tmax, Tmin, Tmean | Chen et al. (2008) |
| NCEP Reanalysis 2 | 2° | 6 hourly | Global | 1979–2019 | Pre, Tmax, Tmin, Tmean | Kanamitsu et al. (2002) |
| WFDEI | 0.5° | Daily | Global | 1979–2016 | Pre | Weedon et al. (2018) |
| CHIRPS v8 | 0.05° | Daily | 50°S–50°N | 1981–2019 | Pre | Funk et al. (2015) |
| CHIRTS | 0.05° | Daily | Global | 1983–2016 | Tmax, Tmin, Tmean | Funk et al. (2019) |

Note:
EWEMBI: the EartH2Observe, WFDEI, and ERA-Interim reanalysis data Merged and Bias-corrected for the Inter-Sectoral Impact Model Intercomparison Project;
CRU: Climatic Research Unit;
ERA5: ECMWF Re-Analysis version 5;
GPCC: Global Precipitation Climatology Centre;
CPC: Climate Prediction Center;
NCEP: National Centers for Environmental Prediction;
WFDEI: WATCH Forcing Data methodology applied to ERA-Interim data;
CHIRPS: Climate Hazards Group InfraRed Precipitation with Station data;
CHIRTS: Climate Hazards Group InfraRed Temperature with Station data. Pre denotes precipitation.
Table 1. Summary of dataset characteristics considered in this study
The CRU gridded TS (CRU TS) dataset is produced by the CRU at the University of East Anglia based on the World Meteorological Organization (WMO) global stations (Harris et al., 2014). Many studies have used this dataset in diverse research areas in Africa since it was first released in 2000 (e.g., Haile et al., 2020; Mahmood et al., 2020; Peng et al., 2020).
ERA5-land is a global reanalysis-based product produced by the Copernicus Climate Change Service (C3S; Maidment et al., 2013). The variable Tmean at 2 m above the ground and the total precipitation were obtained from this dataset. These two variables are based on inputs from satellite radiances and in-situ data provided by the WMO information system (WIS). This product has been applied in Africa, e.g., in East Africa (Agutu et al., 2017) and Uganda (Maidment et al., 2013).
CHIRPS is a quasi-global rainfall dataset developed by the Climate Hazards Center (CHC) at the University of California, Santa Barbara and the US Geological Survey (USGS; Funk et al., 2015). This dataset is based on the CHC’s precipitation climatology dataset and quasi-global geostationary thermal infrared satellite observations. Rainfall data from the NOAA Climate Forecast System and the TRMM 3B42 (Tropical Rainfall Measuring Mission level 3 output) product are also inputs for CHIRPS (Funk et al., 2015). This product has been widely used in East Africa, as reported in many studies (e.g., Bayissa et al., 2017; Lemma et al., 2019).
The Climate Hazards Group InfraRed Temperature with Station (CHIRTS) daily SAT dataset was developed by the CHC at the global scale (60°S–70°N) with a spatial resolution of 0.05° × 0.05° (approximately 5 km). This dataset contains daily maximum and minimum SAT and was produced by disaggregating the monthly fields of the CHIRTS Tmax (CHIRTSmax1) TS and by synthesizing daily temperatures from ERA5 (Funk et al., 2019). The input data for CHIRTSmax1 are the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA2), CRU TS, and ground station data (from approximately 15,000 stations) from the Berkeley Earth database.
The Global Precipitation Climatology Centre (GPCC) product v2018 of global daily precipitation was produced based on rainfall data provided by national meteorological and hydrological station services, regional and global data collections and WMO GTS (Global Telecommunication System) data (Schamm et al., 2014). More than 85,000 rain gauge stations across the globe were used in the GPCC dataset. This dataset has been widely used in Africa, e.g., in the Horn of Africa (Dinku et al., 2007; Funk et al., 2015) and sub-Saharan Africa (Harrison et al., 2019).
The Climate Prediction Center (CPC) dataset is part of a product suite from the CPC Unified Precipitation Project that is underway at the NOAA (Chen et al., 2008). The precipitation and temperature variables were produced by merging observations from gauge stations with precipitation estimates from several satellite-based (infrared and microwave) datasets.
NCEP Reanalysis-2, produced by the NOAA-CIRES Climate Diagnostics Center, is a new version of the NCEP Reanalysis-1 model that has been modified through the update of the physical process parameterization and the fixing of errors (Marques et al., 2009). NCEP Reanalysis-2 performed data assimilation using the Rapid Radiative Transfer Model (RRTM) developed by the Atmospheric and Environmental Research (AER) group using different data sources.
WFDEI was generated by using the same methodology as the WATCH (WATer and global CHange) Forcing Data by making use of the ERA-Interim reanalysis data (Weedon et al., 2014). The ERA-Interim data improved the temperature and precipitation data compared to WATCH. WFDEI has been widely used for studies on climate change and hydrological modeling (Liu et al., 2017; Lange, 2019).
EWEMBI is a recently compiled reference dataset whose name stands for EartH2Observe, WFDEI, and ERA-Interim reanalysis data Merged and Bias-corrected for the Inter-Sectoral Impact Model Intercomparison Project (ISIMIP) phase 2b (Lange, 2018). The sources of the precipitation and temperature data of the EWEMBI dataset are the WFDEI and GPCC v5 data over the land surface, and EartH2Observe forcing data (E2OBS) over the ocean (Lange, 2016).
-
The climatic variables of precipitation and mean, maximum, and minimum SAT were evaluated at daily and monthly timescales. The temperature and precipitation products with hourly temporal frequencies were aggregated to daily and monthly TS. Evaluation of the selected temperature and precipitation products was performed for the 35-yr period covered by all the products. A point-to-pixel comparison approach was used to evaluate the products against their corresponding observations at each gauge station. In this approach, the precipitation and temperature values of a product grid cell were compared to the observed data of the gauge located within that cell. For evaluating such products at small spatial scales with varying climatic factors, this approach is better suited to rugged topography and complex terrain such as the UTB than to areas with homogeneous topography (Gebremicael et al., 2019; Lemma et al., 2019; Li et al., 2020).
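The point-to-pixel pairing and the hourly-to-daily aggregation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing chain used in the study: the grid is assumed to be regular with cell centres at `lat0 + row * res` and `lon0 + col * res`, missing values are assumed to be stored as `None`, and the function names and the coordinates in the usage note are hypothetical.

```python
import math


def nearest_cell(lat, lon, lat0, lon0, res):
    """Row and column of the regular grid cell that contains a station.

    Cell centres are assumed to sit at lat0 + row * res and lon0 + col * res,
    so each cell spans half a resolution step on either side of its centre.
    """
    row = math.floor((lat - lat0) / res + 0.5)
    col = math.floor((lon - lon0) / res + 0.5)
    return row, col


def hourly_to_daily(values, how="sum"):
    """Aggregate an hourly series into daily values.

    Use how="sum" for precipitation totals and how="mean" for temperature.
    A trailing incomplete day is dropped.
    """
    n_days = len(values) // 24
    days = [values[d * 24:(d + 1) * 24] for d in range(n_days)]
    if how == "sum":
        return [sum(day) for day in days]
    return [sum(day) / 24 for day in days]


def pair_series(gauge, product_cell):
    """Keep only the time steps where both gauge and product have data."""
    return [(g, p) for g, p in zip(gauge, product_cell)
            if g is not None and p is not None]
```

For example, with a hypothetical 0.5° grid whose first cell centre lies at 10.0°N, 36.0°E, a station at 13.3°N, 38.6°E falls in the cell returned by `nearest_cell(13.3, 38.6, 10.0, 36.0, 0.5)`, and the product series of that cell is then paired with the gauge series.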
The precipitation products were further evaluated for monthly cumulative precipitation and different rainfall intensity groups. The monthly standardized precipitation index (SPI) was used to compare the performances of the rainfall datasets in capturing historical drought events. SPI is an index for characterizing and monitoring drought conditions on a range of timescales based on the probability distribution of a long-term precipitation TS (McKee et al., 1995; Liu et al., 2012). The calculation of SPI involves transforming one frequency distribution (e.g., gamma) into another (normal or Gaussian). Specifically, a precipitation TS is first fitted with a gamma (or an incomplete beta) probability distribution and then transformed to a normal (Gaussian) distribution. The gamma distribution has been widely used in previous studies because it is regarded as a reliable fit to rainfall data. The transformed variable has a mean of zero and a standard deviation of one, so its values can be used to indicate dry or wet conditions. The detailed SPI calculation procedures can be found in the supplementary information. The SPI was calculated for 3-month (SPI3), 6-month (SPI6), 9-month (SPI9), and 12-month (SPI12) timescales for each precipitation product. A threshold value of −1 is usually used as an indicator of drought conditions for the SPI (Tefera et al., 2019). Hence, −1.0 ≥ SPI > −1.5 indicates moderate drought, −1.5 ≥ SPI > −2.0 indicates severe drought, and SPI ≤ −2.0 indicates extreme drought (Liu et al., 2012). The trend of extreme heat and the number of hot days per year (the number of days exceeding the extreme value) were also evaluated for the temperature datasets against the gauge-observed data. In this study, extreme temperature was defined as SAT > the 95th percentile for daily maximum and mean SAT and SAT < the 5th percentile for minimum SAT.
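The gamma-to-normal transformation behind the SPI can be sketched in a few lines. The following is a minimal illustration, not the procedure documented in the supplementary information: it assumes Thom's approximation for the gamma parameters, a mixed distribution to handle zero totals, and a power-series evaluation of the gamma cumulative distribution. All function names are hypothetical, and `spi_from_sample` is meant to be applied to the accumulated totals of one calendar month across years.

```python
import math
from statistics import NormalDist, mean


def rolling_sum(monthly, k):
    """k-month accumulated precipitation (e.g. k = 3 for SPI3)."""
    return [sum(monthly[i - k + 1:i + 1]) for i in range(k - 1, len(monthly))]


def gamma_cdf(x, alpha, beta):
    """Gamma CDF with shape alpha and scale beta, via a power series."""
    if x <= 0:
        return 0.0
    t = x / beta
    term = total = 1.0
    n = 0
    while term > 1e-15 * total and n < 10000:
        n += 1
        term *= t / (alpha + n)
        total += term
    return total * math.exp(-t + alpha * math.log(t) - math.lgamma(alpha + 1))


def fit_gamma(positive):
    """Thom's approximation to the maximum-likelihood gamma fit (assumed).

    Requires positive values that are not all identical.
    """
    m = mean(positive)
    a = math.log(m) - mean([math.log(v) for v in positive])
    alpha = (1 + math.sqrt(1 + 4 * a / 3)) / (4 * a)
    return alpha, m / alpha


def spi_from_sample(totals):
    """SPI for each accumulated total in a sample of one calendar month."""
    positive = [v for v in totals if v > 0]
    q = 1 - len(positive) / len(totals)  # probability of a zero total
    alpha, beta = fit_gamma(positive)
    std_normal = NormalDist()
    spi = []
    for v in totals:
        p = q + (1 - q) * gamma_cdf(v, alpha, beta)
        p = min(max(p, 1e-6), 1 - 1e-6)  # keep inv_cdf in its valid domain
        spi.append(std_normal.inv_cdf(p))
    return spi


def drought_class(spi):
    """Drought category using the thresholds quoted in the text."""
    if spi <= -2.0:
        return "extreme"
    if spi <= -1.5:
        return "severe"
    if spi <= -1.0:
        return "moderate"
    return "none"
```

A typical use would be to build the SPI3 input with `rolling_sum(monthly, 3)`, group the accumulated totals by calendar month, call `spi_from_sample` on each group, and label the results with `drought_class`.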
-
Four statistical metrics (Table 2), namely, the Pearson correlation coefficient (CC), percentage of bias (Pbias), mean absolute error (MAE), and root mean square error (RMSE), were used for the evaluation of temperature and precipitation products. These statistical measures are commonly used for hydro-climatic dataset evaluation (Gebremicael et al., 2019; Costa et al., 2020; Satgé et al., 2020; Wang et al., 2020). Detailed descriptions of these metrics can be found in Chai and Draxler (2014), Toté et al. (2015), and Saccenti et al. (2020). Negative (positive) values of Pbias indicate underestimation (overestimation) of the observations. High CC and low RMSE and MAE values indicate that the estimates in a climatic dataset are close to the gauge-based observations.
| Metric name | Function | Optimum value |
| --- | --- | --- |
| Pearson correlation coefficient (CC) | $ {\rm CC}=\dfrac{\sum ({x}_{i}-\bar{x})({y}_{i}-\bar{y})}{\sqrt{\sum {\left({x}_{i}-\bar{x}\right)}^{2}}\sqrt{\sum {\left({y}_{i}-\bar{y}\right)}^{2}}} $ | 1 |
| Percentage of bias (Pbias) | $ {\rm Pbias}=\left(\dfrac{\sum {y}_{i}-\sum {x}_{i}}{\sum {x}_{i}}\right)\times 100 $ | 0 |
| Mean absolute error (MAE) | $ {\rm MAE}=\dfrac{1}{n}\sum _{i=1}^{n}\lvert {y}_{i}-{x}_{i}\rvert $ | 0 |
| Root mean square error (RMSE) | $ {\rm RMSE}=\sqrt{\dfrac{1}{n}\sum _{i=1}^{n}{\left({y}_{i}-{x}_{i}\right)}^{2}} $ | 0 |

Note: $ {x}_{i} $ is the temperature or precipitation from gauge observations, $ {y}_{i} $ is the corresponding value from the precipitation or temperature product, n is the number of observations, and $ \bar{x} $ and $ \bar{y} $ are the averages of the gauge-observed and meteorological product values, respectively. The evaluation was performed at daily and monthly timescales for all the selected gauge stations.
Table 2. Statistical indices used for evaluating multiple meteorological products
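The four metrics in Table 2 can be computed directly from paired series. The sketch below is a plain-Python illustration of those formulas; the function name `evaluate` and the dictionary layout of the result are illustrative choices, and the inputs are assumed to be equally long lists without missing values.

```python
import math


def evaluate(obs, est):
    """CC, Pbias (%), MAE, and RMSE of a product (est) against gauges (obs)."""
    n = len(obs)
    mean_x = sum(obs) / n
    mean_y = sum(est) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(obs, est))
    sxx = sum((x - mean_x) ** 2 for x in obs)
    syy = sum((y - mean_y) ** 2 for y in est)
    return {
        "CC": sxy / math.sqrt(sxx * syy),
        "Pbias": (sum(est) - sum(obs)) / sum(obs) * 100,
        "MAE": sum(abs(y - x) for x, y in zip(obs, est)) / n,
        "RMSE": math.sqrt(sum((y - x) ** 2 for x, y in zip(obs, est)) / n),
    }
```

A product that doubles every observed value, for instance, keeps a CC of 1 but has a Pbias of 100%, which is why CC alone cannot reveal systematic overestimation.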
-
The statistical metrics from the comparison of precipitation products against the observed data across all gauging stations of the UTB are shown in Fig. 2. The EWEMBI dataset shows a reasonably good correspondence with the rain gauge observations at the daily timescale, with average values of 0.58, 2.3%, 6.4 mm, and 2.7 mm for CC, Pbias, RMSE, and MAE, respectively. This may be due to the climatological bias adjustments applied to EWEMBI. EWEMBI was produced from the E2OBS, WFDEI, and ERAI data sources, which are bias-corrected by using GPCPv2.1 monthly precipitation totals over the ocean (Balsamo et al., 2015) and GPCCv5/v6 monthly precipitation totals over the land (Weedon et al., 2014). The CC, Pbias, RMSE, and MAE values of EWEMBI for the different gauge stations range from 0.48 to 0.67, −23.6% to 37%, 5.3 to 7.6 mm, and 2 to 3.5 mm, respectively. Alongside EWEMBI, the CHIRPS and CPC products also perform comparatively well, with average CC values of at least 0.5. The better performance of CHIRPSv8 is likely due to its high spatial resolution (0.05°) compared to the other products and to the CHIRPS algorithm, which combines the bias-corrected CHIRP with station observations. Similar studies in the upper Blue Nile basin, Ethiopia, by Sahlu et al. (2017), Gebremicael et al. (2019), and Lemma et al. (2019) also showed that the CHIRPSv8 product better captures the ground observations compared to other satellite products. The averages and range values (in brackets) of CC, Pbias, RMSE, and MAE of CHIRPS for all stations are 0.57 (0.45 to 0.7), −4% (−24.4 to 22%), 7.1 mm (5.8 to 10.6 mm), and 2.8 mm (2.1 to 3.8 mm), respectively. Similarly, average values and ranges of 0.5 (0.3 to 0.7), −16.8% (−31.1 to 23.5%), 6.8 mm (5.5 to 7.7 mm), and 2.7 mm (1.8 to 3.6 mm) for CC, Pbias, RMSE, and MAE were obtained for the CPC dataset. The CC (Fig. 2a), Pbias (Fig. 2b), RMSE (Fig. 2c), and MAE (Fig. 2d) values for each precipitation product except EWEMBI, CHIRPS, and CPC indicate poor agreement with the majority of the rain gauge networks at the daily timescale. The larger gap between some of the datasets and the observed data at the daily timescale could be due to the complex and rugged topography in the UTB. This, in turn, could contribute to failures in detecting more localized convective rainfall events.
Figure 2. Boxplots of evaluation metrics for the seven precipitation products at daily timescales during 1982–2016. (a) Pearson correlation coefficient (CC), (b) percentage of bias (Pbias), (c) root mean square error (RMSE), and (d) mean absolute error (MAE). Each boxplot also indicates the range of variation from minimum to maximum (vertical line), the quartiles and median (horizontal lines), mean (points), and outliers (+) of the precipitation estimates at the 21 stations.
Among the precipitation datasets, the ERA5 and NCEP products overestimate the rainfall at all stations. The ERA5 product yields a high CC (0.6) but largely overestimates the rainfall at all stations, with an average (range) Pbias of 150% (88% to 242%). This indicates the poor performance of the ERA5 product in capturing the ground rainfall over the basin, which is consistent with the previous study by Lemma et al. (2019). Likewise, NCEP strongly overestimates (Pbias = 98.6%) the daily observed rainfall. NCEP also shows relatively poor skill throughout the basin, with an average CC value of 0.33. The RMSE and MAE values for both the ERA5 and NCEP products are also very large compared to those of the other datasets (Fig. 2). The performances of the remaining precipitation products lie between those of the rainfall products discussed above.
Figure 3 presents a comparison between the precipitation estimates and the observed rainfall using the spatial distribution of CC. The spatial distributions of CC, Pbias, RMSE, and MAE (Table S1a–c) indicated by the range values show that EWEMBI and CHIRPSv8 products have better agreement with ground rainfall, especially at the stations in the northeastern part of the basin. The NCEP, GPCC, and WFDEI show poor agreement with CC values less than 0.5 at almost all the stations.
-
The daily precipitation products were then evaluated at the monthly timescale in order to determine how much the evaluation metrics improve when moving from the daily to the monthly scale. This also allows a comparison with the well-known monthly precipitation product CRU TS v4.03. Table 3 summarizes the performances of the different rainfall estimates using the various statistical indices at the monthly timescale. The statistical indices of all products in estimating the monthly rainfall are significantly improved compared to those at the daily timescale. As shown in Table 3, all products except NCEP (0.67) have average CC values greater than 0.7. This result agrees with previous studies showing improvements in detecting rainfall events using satellite and reanalysis-based datasets when the temporal scale was changed from daily to monthly (Roy et al., 2018; Ayoub et al., 2020; Islam and Cartwright, 2020). This could be because variabilities are counterbalanced and errors offset each other when the data are aggregated from shorter to longer timescales. Even though the estimation accuracies are improved for most of the precipitation datasets, the EWEMBI, CHIRPS, CRU TS v4.03, and CPC datasets outperform the others, with higher CC values and lower Pbias, RMSE, and MAE values. In addition, the performance rank of these products at the monthly timescale is in line with the results obtained at the daily timescale.
| Meteorological forcing | CC | Pbias (%) | RMSE (mm month−1) | MAE (mm month−1) |
| --- | --- | --- | --- | --- |
| CPC | 0.78 (0.65, 0.85) | −12 (−36, 75) | 62 (40, 80) | 37 (22, 48) |
| ERA5 | 0.84 (0.7, 0.9) | 146 (88, 242) | 189 (102, 339) | 103 (65, 163) |
| EWEMBI | 0.86 (0.77, 0.93) | 5 (−24, 25) | 51 (34, 82) | 31 (18, 53) |
| GPCC | 0.80 (0.65, 0.87) | 25 (−19, 89) | 67 (55, 83) | 42 (34, 55) |
| NCEP | 0.67 (0.58, 0.75) | 99 (41, 196) | 122 (101, 139) | 80 (65, 91) |
| WFDEI | 0.78 (0.69, 0.85) | 13 (−23, 64) | 69 (53, 103) | 40 (26, 63) |
| CHIRPS | 0.85 (0.78, 0.91) | −7 (−35, 22) | 51 (34, 90) | 30 (20, 57) |
| CRU TS | 0.81 (0.69, 0.90) | −9 (−35, 54) | 57 (38, 76) | 34 (21, 43) |

Table 3. Ranges and means of the statistical metrics derived from the comparison between the different precipitation datasets and the observed data at the monthly timescale. The range values are shown in parentheses
As the observations are most accurate at the station locations, an additional case was evaluated in which the gridded data were interpolated to the station locations and compared directly with the station observations. A slight performance improvement was found for most of the precipitation products (Table 4). CHIRPS (CC = 0.9) and EWEMBI (CC = 0.89) perform better than the others, while NCEP, ERA5, and WFDEI show relatively poor agreement with the ground observations.
| Meteorological forcing | CC | Pbias (%) | RMSE (mm month−1) | MAE (mm month−1) |
| --- | --- | --- | --- | --- |
| CPC | 0.76 | −15 | 71 | 39 |
| ERA5 | 0.81 | 127 | 127 | 96 |
| EWEMBI | 0.89 | 5.2 | 47 | 26 |
| GPCC | 0.86 | 20.5 | 67 | 40 |
| NCEP | 0.73 | 102 | 128 | 83 |
| WFDEI | 0.79 | 16 | 60 | 33 |
| CHIRPS | 0.9 | −6.6 | 48 | 26 |
| CRU TS | 0.85 | −11 | 52 | 28 |

Table 4. Mean of the statistical metrics resulting from the comparison between the different interpolated gridded rainfall datasets and the observed data at the monthly timescale
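The text does not state which interpolation scheme was used to bring the gridded values to the station locations. As one possible illustration only, the sketch below uses bilinear interpolation between the four surrounding cell centres; the choice of method, the function name `bilinear`, and the grid layout (values at `lat0 + row * res`, `lon0 + col * res`) are all assumptions.

```python
import math


def bilinear(lat, lon, lat0, lon0, res, grid):
    """Bilinearly interpolate grid[row][col] to a station location.

    Cell centres are assumed at lat0 + row * res and lon0 + col * res,
    and the grid must hold at least 2 x 2 values. Stations outside the
    grid are evaluated with the nearest pair of rows and columns.
    """
    fy = (lat - lat0) / res
    fx = (lon - lon0) / res
    r0 = min(max(math.floor(fy), 0), len(grid) - 2)
    c0 = min(max(math.floor(fx), 0), len(grid[0]) - 2)
    wy = fy - r0  # fractional distance towards the next row
    wx = fx - c0  # fractional distance towards the next column
    return (grid[r0][c0] * (1 - wy) * (1 - wx)
            + grid[r0][c0 + 1] * (1 - wy) * wx
            + grid[r0 + 1][c0] * wy * (1 - wx)
            + grid[r0 + 1][c0 + 1] * wy * wx)
```

Compared with taking the value of the enclosing cell, this weights the neighbouring cells by their distance to the station, which may explain small gains in agreement where rainfall gradients are strong.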
-
To further understand whether the different precipitation datasets could capture rainfall events within various intensity groups, we divided the daily precipitation intensity (PI) of the gauge data (OBS) into 6 groups (0 ≤ PI < 1, 1 ≤ PI < 5, 5 ≤ PI < 10, 10 ≤ PI < 20, 20 ≤ PI < 30, and PI ≥ 30 mm day−1) following Wang et al. (2020). Figure 4 shows the total cumulative rainfall values of the eight precipitation products under different PI groups. The 0–1 and 1–5 mm day−1 PI groups are overestimated by all precipitation products. However, Fig. 4 clearly shows that ERA5 and NCEP highly overestimate all rainfall magnitudes (PI groups). In particular, ERA5 and NCEP fail to capture the PI groups with values below 20 mm day−1 but capture the rain gauge observations above 20 mm day−1 reasonably well. This result is contrary to the findings of Islam and Cartwright (2020), who reported that ERA5 outperformed other datasets in predicting rainfall accumulation below 20 mm day−1 but seriously underestimated higher rainfall values. This suggests that the same dataset can perform differently in different regions. Most of the remaining precipitation products, especially EWEMBI, CPC, and CHIRPS, captured the PI groups greater than 5 mm day−1 better than ERA5 and NCEP.
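The intensity grouping can be sketched as a simple binning step. The example below assumes that days are assigned to a PI group by the gauge-observed value and that observed and product rainfall are then accumulated over the same days; this pairing is an interpretation of the text, and the names `pi_group` and `cumulative_by_group` are hypothetical.

```python
from bisect import bisect_right

# Lower edges of the second to sixth PI groups (mm per day), as in the text.
EDGES = [1, 5, 10, 20, 30]
LABELS = ["0-1", "1-5", "5-10", "10-20", "20-30", ">=30"]


def pi_group(intensity):
    """Label of the PI group for one daily value (lower edge inclusive)."""
    return LABELS[bisect_right(EDGES, intensity)]


def cumulative_by_group(obs, est):
    """Accumulate observed and product rainfall per observed PI group.

    Returns {label: [observed_total, product_total]}.
    """
    totals = {label: [0.0, 0.0] for label in LABELS}
    for o, e in zip(obs, est):
        group = pi_group(o)
        totals[group][0] += o
        totals[group][1] += e
    return totals
```

A product total above the observed total in a given group corresponds to the overestimation pattern described for the lowest intensity groups.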
The precipitation products were further evaluated by using statistical metrics to examine whether the datasets can capture the rainfall of the different PI groups. The CC and Pbias indices indicate that the different precipitation datasets poorly captured almost all PI groups. However, CHIRPS (for 0 ≤ PI < 1 and 1 ≤ PI < 5), EWEMBI (for 5 ≤ PI < 10 and 10 ≤ PI < 20), and CPC (for 20 ≤ PI < 30 and PI ≥ 30 mm day−1) showed relatively better performance, with higher CCs and lower Pbias values (Table 5), compared to the other datasets. This result is in agreement with a previous study by Wang et al. (2020), which showed that reanalysis and satellite rainfall products capture rainfall less well when rainfall events are grouped into different intensity ranges. The results of the statistical metrics (CC, Pbias, RMSE, and MAE) of all the precipitation datasets for the considered PI groups are summarized in the supplementary files (Table S2a–d).
| PI (mm day−1) | EWEMBI CC | EWEMBI Pbias (%) | CPC CC | CPC Pbias (%) | CHIRPS CC | CHIRPS Pbias (%) |
| --- | --- | --- | --- | --- | --- | --- |
| 0–1 | 0.20 | 46.5 | 0.42 | 32.5 | 0.52 | 28 |
| 1–5 | 0.35 | 14.4 | 0.15 | 29.3 | 0.47 | −2.3 |
| 5–10 | 0.57 | −11.7 | 0.08 | 10.8 | 0.39 | −15 |
| 10–20 | 0.51 | −34.3 | 0.11 | −8 | 0.36 | −28.1 |
| 20–30 | −0.39 | −55.3 | 0.47 | −26.1 | −0.44 | −42.3 |
| ≥ 30 | −0.46 | −72.5 | 0.48 | −38.2 | −0.46 | −33.6 |

Table 5. Statistical summary of EWEMBI, CPC, and CHIRPS precipitation products in capturing different PI groups
-
The ability of the different rainfall datasets to detect temporal drought was analyzed through comparisons to the drought indicated by the gauge-observed rainfall records (OBS). Figure 5 shows the monthly average SPI values over the UTB during 1982–2016 for each of the precipitation products using the 12-month SPI TS (SPI12). The SPI3, SPI6, and SPI9 TS are also given in the supplementary materials (Fig. S1). The discrepancies in the SPI results among the datasets were further supported by the time-specific historical exceptional drought records during the study period. For example, there was a severe drought in Ethiopia and Sudan during 1984–1985, induced by persistent rainfall shortages, which caused many deaths and migrations (Degefu and Bewket, 2015; Haile et al., 2020). The observed data detected this case as an extreme drought (SPI < −2; Fig. 5). EWEMBI, CRU TS v4.03, and CHIRPS detected a severe drought, while NCEP, ERA5, WFDEI, and GPCC identified a moderate drought for that specific period. Furthermore, CPC did not detect a drought in this particular period. Similarly, during 2002–2003, there was a countrywide drought in Ethiopia that affected more than 14 million people and resulted in severe damages (Muller, 2014; Nicholson, 2017). Almost all products detected this drought event better than the drought event in 1984; the datasets detected drought events for this case from moderate to extreme, except GPCC and WFDEI, which detected a very light drought (0 > SPI > −1).
Figure 5. Temporal variations in the monthly standardized precipitation index (SPI) over the UTB for a 12-month SPI timescale (SPI12) for all precipitation products.
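The drought classes used in this comparison can be made explicit with a short Python sketch. The limits for extreme drought (SPI < −2), drought events (SPI ≤ −1), and very light drought (0 > SPI > −1) follow the text; the −1.5 boundary between severe and moderate drought is an assumed conventional limit that the text does not state. The function names are hypothetical, and the SPI12 values are made up for illustration.

```python
def classify_spi(spi):
    """Map an SPI value to a drought class.

    The -2 and -1 limits follow the text; the -1.5 split between
    severe and moderate drought is an assumed conventional limit.
    """
    if spi < -2.0:
        return "extreme"
    if spi <= -1.5:
        return "severe"
    if spi <= -1.0:
        return "moderate"
    if spi < 0.0:
        return "light"
    return "no drought"


def event_class(spi_values):
    """Class of the driest month in a window (a hypothetical event summary)."""
    return classify_spi(min(spi_values))


# Made-up SPI12 values for one drought window (not data from the study).
window = {
    "OBS": [-1.8, -2.3, -2.1],
    "ProductA": [-1.2, -1.7, -1.6],
    "ProductB": [-0.2, 0.1, -0.4],
}
for name, values in window.items():
    print(name, event_class(values))
```

Summarizing an event by its driest month is only one possible choice; an average over the window would give a milder class for the same series.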
Figure 6 shows the statistical metrics (Pbias, CC, RMSE, and MAE) for the drought events (for which the observed SPI is ≤ −1) at the SPI3, SPI6, SPI9, and SPI12 TS for each product. Generally, all precipitation products underestimated the drought pattern compared to that detected in the observed data. This is revealed by the negative values of Pbias for all SPI TS (Fig. 6b). Drought estimates from EWEMBI, CRU TS v4.03, and CHIRPS are closer to the observed drought (SPI ≤ −1) for SPI3, SPI6, SPI9, and SPI12 than those of the other datasets. For SPI12, the Pbias values of EWEMBI, CRU TS v4.03, and CHIRPS are −28%, −33%, and −44%, and the corresponding CC values are 0.71, 0.67, and 0.73, respectively. The drought metrics at the SPI12 TS driven by NCEP, GPCC, ERA5, and WFDEI are significantly underestimated compared to the drought observations. The underestimates are within the range from −50% (ERA5) to more than −112% (NCEP). Overall, most precipitation products detect the meteorological, agricultural, and hydrological droughts in the UTB reasonably well, although they underestimate drought severity.
Figure 6. Evaluation metrics for drought estimated from precipitation products compared with observed drought data (SPI ≤ −1). (a) RMSE, (b) Pbias, (c) CC, and (d) MAE.
The Pbias (CC) values of EWEMBI are −40% (0.52), −37% (0.57), −32% (0.61), and −28% (0.71) for the drought metrics at the SPI3, SPI6, SPI9, and SPI12 TS, respectively. These values show a consistent improvement as the SPI timescale lengthens. This may reflect the improved ability of the precipitation products to detect rainfall events as the timescale lengthens from daily to monthly. However, WFDEI and NCEP are exceptions: even though their correlations with the observed data increase with the SPI TS, their underestimation of drought grows as the SPI timescale lengthens from SPI3 to SPI6, SPI9, and SPI12 (Fig. 6). This may be attributed to the different retrieval methodologies and assumptions of the products. The results suggest that precipitation-based drought metrics can vary significantly with the choice of precipitation product, its spatial resolution, and its record length. These relationships also vary with the severity of drought events.
In summary, the EWEMBI, CRU TS v4.03, and CHIRPS datasets show better agreement with the observed drought at each SPI timescale (SPI3, SPI6, SPI9, and SPI12) across the basin than the other products. The relatively better performance of these products is also confirmed by the lowest Pbias, RMSE, and MAE values and the highest CC values at the monthly timescale.
-
Comparisons based on the exact daily Tmean at each gauge station indicate that the estimates from the ERA5, CHIRTS, and EWEMBI products are in agreement with the corresponding observations. As presented in Fig. 7, the average CC values for the ERA5, CHIRTS, and EWEMBI products are 0.65, 0.55, and 0.55, respectively. The daily Tmean patterns derived from these products agree more consistently with the ground Tmean than those of the NCEP and CPC products, which have large RMSE and MAE values. The discrepancies in the metrics are large across stations for NCEP and CHIRTS. In particular, the ranges of the metrics are large for NCEP, e.g., 1.62–6.38°C (MAE) and 2.04–6.82°C (RMSE). The CC metric was also calculated based on the anomaly values of Tmean. Similar to the exact values, the ERA5, CHIRTS, and EWEMBI products are in agreement with the corresponding observations of Tmean anomalies, with CC values of 0.62, 0.53, and 0.52, respectively (Fig. S2).
Figure 7. Boxplots showing comparisons of Tmean between the five product estimates and the ground measurements at the daily timescale: (a) Pearson CC, (b) mean absolute error (MAE), and (c) root mean square error (RMSE).
Daily Tmax and Tmin were evaluated for CHIRTS, CPC, and EWEMBI, and monthly Tmax and Tmin were additionally evaluated for CRU TS v4.03. Table 6 shows how well each of these products captures the corresponding gauge values; the values shown are the averages of the metrics over all the stations within the basin. The Tmax and Tmin estimates of CHIRTS and EWEMBI are relatively more accurate at the daily scale than those of the other datasets. Likewise, the Tmax and Tmin estimates of CHIRTS, CRU TS v4.03, and EWEMBI agree better with the observed values at the monthly timescale (Table 6) than those of the other datasets. However, analogous to the precipitation products, all the temperature products perform more poorly at the daily timescale than at the monthly timescale. The better performance at the monthly timescale could arise because errors in the daily data partly offset each other when the data are aggregated to monthly values.
Metrics | Timescale | Tmax (°C) CHIRTS | Tmax (°C) CPC | Tmax (°C) EWEMBI | Tmax (°C) CRU TS | Tmin (°C) CHIRTS | Tmin (°C) CPC | Tmin (°C) EWEMBI | Tmin (°C) CRU TS |
CC | Daily | 0.54 | 0.36 | 0.43 | − | 0.56 | 0.37 | 0.49 | − |
CC | Monthly | 0.69 | 0.61 | 0.62 | 0.68 | 0.72 | 0.63 | 0.68 | 0.7 |
RMSE | Daily | 3.7 | 4.9 | 4.4 | − | 4.07 | 4.11 | 4.6 | − |
RMSE | Monthly | 3.0 | 4.5 | 3.8 | 3.2 | 3.12 | 3.54 | 3.45 | 3.26 |
MAE | Daily | 2.84 | 3.74 | 3.45 | − | 3.41 | 4.53 | 3.42 | − |
MAE | Monthly | 2.6 | 3.4 | 3.1 | 2.7 | 3.0 | 4.07 | 3.05 | 3.0 |
Table 6. Comparison of Tmax and Tmin between the four products and the observed values at daily and monthly timescales.
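The suggestion that daily errors partly offset each other under monthly aggregation can be illustrated with a small Python sketch. It is a constructed example rather than an analysis of the study data: the series are synthetic, the product error is built as a constant bias plus a part that alternates in sign, and the function names are hypothetical. Real daily errors would cancel only to the extent that they are not systematic.

```python
import math
from collections import defaultdict
from datetime import date, timedelta


def rmse(obs, est):
    """Root mean square error of paired values."""
    return math.sqrt(sum((e - o) ** 2 for o, e in zip(obs, est)) / len(obs))


def monthly_means(dates, values):
    """Average the daily values within each (year, month), in calendar order."""
    groups = defaultdict(list)
    for day, value in zip(dates, values):
        groups[(day.year, day.month)].append(value)
    return [sum(v) / len(v) for _, v in sorted(groups.items())]


# Synthetic series (not study data): 60 days covering January and February 2000.
dates = [date(2000, 1, 1) + timedelta(days=i) for i in range(60)]
obs = [20.0 + 0.1 * (i % 7) for i in range(60)]
# Product error = constant +0.5 bias plus a part alternating between +2 and -2.
est = [o + 0.5 + (2.0 if i % 2 == 0 else -2.0) for i, o in enumerate(obs)]

daily_rmse = rmse(obs, est)
monthly_rmse = rmse(monthly_means(dates, obs), monthly_means(dates, est))
print(f"daily RMSE = {daily_rmse:.2f}, monthly RMSE = {monthly_rmse:.2f}")
```

In this construction the alternating part of the error almost vanishes in the monthly means, so the monthly RMSE is close to the constant bias, while the daily RMSE also carries the day-to-day scatter.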
The CHIRTS, CPC, and EWEMBI products overestimate Tmax and Tmin at both daily and monthly timescales. In contrast, the CRU TS temperature product underestimates Tmin and overestimates Tmax (Table S3a–c). The temperature products capture the Tmean of the gauge observations better than Tmax and Tmin, with relatively higher CC values (Fig. 7 and Table 6). Moreover, the SAT estimates from all products improve consistently as the temporal scale increases. This improvement could be because the temperature datasets capture the ground measurements better at larger temporal scales than at smaller ones. The temporal variations in Tmean, Tmax, and Tmin at the seasonal timescale are given in the supplementary information (Table S4) for spring, autumn, winter, and summer. Generally, all the temperature products overestimate the seasonal daily Tmean, Tmax, and Tmin, except ERA5 (which underestimates summer Tmean) and CPC (which underestimates autumn Tmax).
-
Table 7 presents the daily 95th percentile values of Tmax and Tmean and the 5th percentile values of Tmin for all the SAT datasets. CPC and ERA5 are closest to the gauge-observed 95th percentile values of Tmax and Tmean, respectively. The high extreme SAT values (95th percentile) from all the products are larger than those of the observations. EWEMBI gives a relatively better estimate of the observed low extreme (5th percentile) of Tmin than the other products (Table 7).
Dataset | 5th percentile Tmin (°C) | 95th percentile Tmax (°C) | 95th percentile Tmean (°C) |
OBS | 8.2 | 28.4 | 21.1 |
CHIRTS | 10.5 | 31 | 23.9 |
CPC | 10 | 29.9 | 23.9 |
ERA5 | − | − | 22 |
EWEMBI | 9 | 31.8 | 23.7 |
NCEP | − | − | 25.4 |
Table 7. 95th percentile record for daily Tmax and Tmean and 5th percentile record for daily Tmin of the SAT products over UTB during 1982–2016.
In this study, SAT values are considered high extremes when they exceed the 95th percentile of the gauge SAT data (values greater than 28.4 and 21.1°C for Tmax and Tmean, respectively). Similarly, Tmin values are considered low extremes when they fall below the 5th percentile of the gauge SAT data (values < 8.2°C).
In terms of the long-term annual changes in the 95th percentile of Tmax and Tmean presented in Fig. 8, each product consistently overestimates the SAT variables. With regard to the changing tendencies, OBS, CHIRTS, EWEMBI, ERA5, and NCEP show an increasing pattern, whilst the CPC product shows a significantly decreasing pattern. According to the CC results, the annual TS of the 95th percentile Tmean values of ERA5 and EWEMBI show comparatively better correlations with the observed data than the other datasets, with CC values of 0.71 and 0.66, respectively. CPC shows the lowest correlation with the observed data, with a CC value of 0.29 for the 95th percentile of Tmean (Fig. 8b). CHIRTS exhibits relatively better accuracy in the annual TS of the 95th percentile of Tmax (CC value of 0.67). Although the estimated 95th percentile Tmax values of CPC are closer to the observed data than those of CHIRTS and EWEMBI, the time-series correlation of CPC is poor, with a CC value of less than 0.26 (Fig. 8a). The SAT datasets poorly match the temporal variability of the Tmin extremes (5th percentile of Tmin) in the corresponding observed values. Among them, EWEMBI performs better (CC = 0.52) than CHIRTS (CC = 0.26) and CPC (CC = 0.30). EWEMBI, CHIRTS, and CPC also overestimate the yearly Tmin extremes (Fig. 8).
Figure 8. Temporal changes in the annual 95th percentile values of (a) Tmax and (b) Tmean, and temporal changes in the 5th percentile for (c) Tmin of the different SAT datasets.
The number of days per year on which each SAT dataset exceeds the observed 95th percentile of Tmax (28.4°C) and Tmean (21.1°C), or falls below the observed 5th percentile of Tmin (8.2°C), is also evaluated. The mean estimated number of days with Tmax above 28.4°C is 50, 113, 55, and 126 days per year for OBS, CHIRTS, CPC, and EWEMBI, respectively (Fig. 9a). All SAT products overestimate the number of days with SAT greater than the 95th percentile value of Tmax (28.4°C). Among the products, the CPC estimate is the closest to the observed values. The highest daily Tmax, which reaches a value of 37.7°C, is estimated by EWEMBI, followed by the observed data (36.7°C). Figure 9b shows the average number of days with Tmean values higher than 21.1°C; the highest value is estimated by CHIRTS (182.9 days yr−1), and the lowest by ERA5 (56.6 days yr−1), which is closer to the ground observations (49.8 days yr−1) than the estimates of the other datasets. The largest daily Tmean value from the observed data is 30.8°C, whereas the largest daily Tmean from the different datasets ranges from 26.6°C (EWEMBI) to 31°C (CPC). Similarly, the average number of days per year with Tmin less than 8.2°C is 20.4, 2.5, 4.9, and 10.5 days for OBS, CHIRTS, CPC, and EWEMBI, respectively. The number of days with Tmin below the 5th percentile value of the observed data is underestimated by all products, in contrast to the overestimation seen for Tmax and Tmean (Fig. 9c).
Figure 9. Comparisons between observed values and the various daily SAT datasets in the number of days estimated above the 95th percentile for Tmax and Tmean and below the 5th percentile for Tmin: (a) annual number of days with daily Tmax above 28.4°C, (b) annual number of days with daily Tmean above 21.1°C, and (c) annual number of days with Tmin below 8.2°C. In this study, the scores over the specified SAT (95th percentile values of the observed data) denote occurrence of extreme SAT.
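The threshold definition and the day counts described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the percentile is computed with linear interpolation between order statistics, which the text does not specify; the function names are hypothetical; and the temperature values are made up. As in the text, the threshold is taken from the gauge data and then applied to both the gauge and the product series.

```python
def percentile(values, q):
    """q-th percentile using linear interpolation (an assumed convention)."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lower = int(pos)  # floor, since pos is non-negative
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (pos - lower) * (data[upper] - data[lower])


def mean_days_per_year(years, values, threshold, above=True):
    """Average annual number of days beyond a fixed threshold."""
    counts = {}
    for year, value in zip(years, values):
        hit = value > threshold if above else value < threshold
        counts[year] = counts.get(year, 0) + int(hit)
    return sum(counts.values()) / len(counts)


# Made-up daily Tmax values in degrees Celsius for two short "years".
years = [2000] * 5 + [2001] * 5
obs_tmax = [24.0, 26.0, 27.0, 28.0, 30.0, 23.0, 25.0, 27.0, 29.0, 31.0]
prod_tmax = [26.0, 28.0, 29.0, 31.0, 32.0, 25.0, 27.0, 30.0, 31.0, 33.0]

# The threshold comes from the gauge data and is applied to every dataset.
threshold = percentile(obs_tmax, 95)
print("threshold:", round(threshold, 2))
print("OBS days per year:", mean_days_per_year(years, obs_tmax, threshold))
print("product days per year:", mean_days_per_year(years, prod_tmax, threshold))
```

For the low extreme, the same counting function can be called with `above=False` and the 5th percentile of the gauge Tmin as the threshold.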
In the UTB, Tmean is better captured by ERA5, while Tmax and Tmin estimated by CHIRTS are in good agreement with the observed values. This implies that the best estimates of Tmean, Tmax, and Tmin come from different SAT products. This result agrees with the study of Nechita et al. (2019). The TS of SAT extremes of all the datasets, including the observed data, show increasing trends. This could be an indication of the impact of climate change in the UTB. This interpretation is also supported by the climate change study of Fentaw et al. (2018) in the UTB, which found an increasing trend in SAT.
-
In this study, nine satellite and reanalysis-based precipitation and SAT datasets were statistically evaluated over the UTB. The EWEMBI, ERA5-land, GPCC v2018, CPC, WFDEI, NCEP reanalysis-2, CRU TS v4.03, CHIRPS v8, and CHIRTS products were evaluated against observed data. The precipitation, Tmax, Tmin, and Tmean were evaluated against the ground measurements at daily and monthly timescales by using different statistical indices, including CC, Pbias, RMSE, and MAE. These products were further evaluated by using different precipitation intensity groups and drought indices. Accordingly, the following conclusions are derived from this study.
EWEMBI, CHIRPS v8, and CRU TS v4.03 show good performances in representing daily and monthly precipitation. At the monthly timescale, EWEMBI (CC = 0.86) shows good agreement with the observed values, followed by the CHIRPS (CC = 0.85) and CRU TS v4.03 (CC = 0.85) products. The rainfall estimates from ERA5 and NCEP perform poorly, as these products largely overestimate the daily rainfall. The monthly precipitation estimates usually perform better than the daily estimates. The precipitation products overestimate the lower precipitation intensity events (less than 5 mm day−1) but better estimate higher precipitation intensity events (greater than 20 mm day−1). Moreover, the precipitation products underestimate historical drought events compared to the observed data. Specifically, the EWEMBI, CRU TS v4.03, and CHIRPS products show better agreement with the gauge-based drought (SPI ≤ −1) estimation for each SPI timescale. Overall, the EWEMBI, CHIRPS v8, and CRU TS v4.03 precipitation products better represent the rainfall in the UTB than the other products.
ERA5, CHIRTS, and EWEMBI provide relatively better SAT estimates than the other products. Tmean estimates from ERA5, CHIRTS, and EWEMBI show good agreement with gauge measurements. Similarly, Tmax and Tmin estimates of CHIRTS and EWEMBI at daily timescales and CHIRTS, CRU TS v4.03, and EWEMBI at monthly timescales have better agreement with observed values over the UTB than the other products. The SAT products perform more poorly at the daily timescale than at the monthly timescale. In addition, Tmean is better represented than Tmax and Tmin by these products. SAT extremes are well represented by ERA5 and EWEMBI (for Tmean), CHIRTS (for Tmax), and EWEMBI (for Tmin), while the CPC product shows poor performance in capturing the temperature extremes. The number of days with extremely high Tmax and Tmean is overestimated by all products, whereas the number of days with extremely low Tmin is underestimated. The overall deviation between the SAT products and the observed values is higher in the Tmin estimates than in the Tmax and Tmean estimates. In summary, based on the agreement of the daily and monthly time series and on the accuracy of the estimated temperature extremes, the ERA5, CHIRTS, and EWEMBI temperature products are relatively better than the other products at representing the gauge-observed values in the UTB.
Acknowledgments. The first author was sponsored by the Chinese Academy of Sciences (CAS)—The World Academy of Sciences (TWAS) President’s Fellowship Programme for his PhD study at the University of Chinese Academy of Sciences. The authors appreciate the Ethiopian National Meteorological Agency for providing the weather data.
Dataset | Resolution | Frequency | Coverage | Period | Variable | Reference |
EWEMBI v1.1 | 0.5° | Daily | Global | 1979–2016 | Pre, Tmax, Tmin, Tmean | Lange (2019) |
CRU TS v4.03 | 0.5° | Monthly | Global | 1901–2018 | Pre, Tmax, Tmin, Tmean | Harris et al. (2020) |
ERA5-land | 0.1° | Hourly | Global | 1982–2020 | Pre, Tmean | C3S (2019) |
GPCC v2018 | 1° | Daily | Global | 1982–2016 | Pre | Ziese et al. (2018) |
CPC | 0.5° | Daily | Global | 1979–2019 | Pre, Tmax, Tmin, Tmean | Chen et al. (2008) |
NCEP Reanalysis 2 | 2° | 6 hourly | Global | 1979–2019 | Pre, Tmax, Tmin, Tmean | Kanamitsu et al. (2002) |
WFDEI | 0.5° | Daily | Global | 1979–2016 | Pre | Weedon et al. (2018) |
CHIRPS v8 | 0.05° | Daily | 50°S–50°N | 1981–2019 | Pre | Funk et al. (2015) |
CHIRTS | 0.05° | Daily | Global | 1983–2016 | Tmax, Tmin, Tmean | Funk et al. (2019) |
Note: EWEMBI: the EartH2Observe, WFDEI, and ERA-Interim reanalysis data Merged and Bias-corrected for the Inter-Sectoral Impact Model Intercomparison Project; CRU: Climatic Research Unit; ERA5: ECMWF Re-Analysis version 5; GPCC: Global Precipitation Climatology Centre; CPC: Climate Prediction Center; NCEP: National Centers for Environmental Prediction; WFDEI: WATCH Forcing Data methodology applied to ERA-Interim data; CHIRPS: Climate Hazards Group InfraRed Precipitation with Station data; CHIRTS: Climate Hazards Group InfraRed Temperature with Station data. Pre denotes precipitation. |