The runoff estimates from five retrospective datasets based on the offline LSMs (VIC-CN05.1, CLM-CFSR, CLM-ERAI, CLM-MERRA, and CLM-NCEP) and three reanalysis datasets [ERA-Interim/Land, hereafter ERAI/Land; Japanese 55-yr reanalysis (JRA55); and MERRA-2] are used in this study. Table 1 summarizes the eight runoff products. Because their durations differ, the routing simulations are performed for the common period of 1980–2009.
| No. | Name | LSM | Forcing | Precipitation | Resolution | Duration | Source |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | VIC-CN05.1 | VIC4.2.d | CN05.1 | – | 0.25° × 0.25° | 1961–2017 | |
| 2 | CLM-CFSR | CLM4.5 | CFSR | GPCP | 0.5° × 0.5° | 1979–2009 | Wang et al. (2016) |
| 3 | CLM-ERAI | CLM4.5 | ERAI | GPCP | 0.5° × 0.5° | 1979–2009 | Wang et al. (2016) |
| 4 | CLM-MERRA | CLM4.5 | MERRA | GPCP | 0.5° × 0.5° | 1979–2009 | Wang et al. (2016) |
| 5 | CLM-NCEP | CLM4.5 | NCEP–NCAR | CRU TS | 0.5° × 0.5° | 1979–2009 | Wang et al. (2016) |
| 6 | ERAI/Land | HTESSEL | ERAI | GPCP | 0.75° × 0.75° | 1979–2010 | Balsamo et al. (2015) |
| 7 | JRA55 | SiB | JRA55 | – | T319 (~55 km) | 1958–2012 | Ebita et al. (2011) |
| 8 | MERRA-2 | Catchment | MERRA-2 | CPCU, CMAP | 0.5° × 0.625° | 1980–present | Gelaro et al. (2017) |

*CLM: NCAR Community Land Model; CFSR: NCEP Climate Forecast System Reanalysis; CPCU: NOAA Climate Prediction Center (CPC) unified gauge analysis; CMAP: CPC Merged Analysis of Precipitation. Other acronyms can be found in the main text of this paper.
Table 1. Summary of simulated runoff products from offline land surface models (LSMs) and reanalysis datasets
The VIC-CN05.1 runoff product is based on VIC 4.2.d, driven by a purely station-based daily atmospheric forcing dataset (precipitation, maximum and minimum temperature, and wind speed) called CN05.1 (Wu and Gao, 2013). The CN05.1 atmospheric forcing dataset was constructed from more than 2400 stations in China and has been widely used in model evaluation and long-term analysis (Gao et al., 2013; Peng and Zhou, 2017). The physical parameters in the offline simulation of VIC-CN05.1 were derived from the high-resolution soil properties and hydraulic characteristics datasets in China (Dai et al., 2013; Shangguan et al., 2013). The empirical parameters were calibrated and validated against monthly naturalized streamflow at major river basins in China by Zhang et al. (2014), through a multi-objective global optimization method with Nash–Sutcliffe efficiency and relative error used as objective functions. Moreover, the terrestrial water budget components (e.g., soil moisture, runoff, and evaporation) in the VIC-CN05.1 dataset have been extensively evaluated and perform well against in situ measurements and satellite observations.
The other four retrospective runoff products (CLM-CFSR, CLM-ERAI, CLM-MERRA, and CLM-NCEP) developed by Wang et al. (2016) are based on the Community Land Model (CLM) version 4.5, driven by the CFSR, ERA-Interim, MERRA, and NCEP atmospheric reanalysis forcing datasets, respectively, each with bias-corrected precipitation. The monthly precipitation in the NCEP was adjusted by the Climatic Research Unit Time Series (CRU TS; Mitchell and Jones, 2005), while the monthly precipitation in the CFSR, ERA-Interim, and MERRA reanalysis datasets was bias-corrected by the Global Precipitation Climatology Project product (GPCP; Adler et al., 2003; Huffman et al., 2009). Wang et al. (2016) suggested that the CLM-CFSR, CLM-ERAI, CLM-MERRA, and CLM-NCEP products reproduce soil moisture and snow depth over China well.
ERAI/Land is a global land surface reanalysis dataset from the ECMWF (Balsamo et al., 2015). The simulated runoff is derived from the latest HTESSEL (Hydrology-Tiled ECMWF Scheme for Surface Exchanges over Land) LSM driven by ERA-Interim meteorological forcing data with monthly precipitation adjusted by the GPCP. The JRA55 atmospheric reanalysis is produced by the Japan Meteorological Agency (JMA; Ebita et al., 2011). To generate land surface analysis fields (including runoff), three-hourly atmospheric forcing data from the forecast model are used to force the Simple Biosphere model (SiB) during the assimilation cycle. The MERRA-2 atmospheric reanalysis is the latest product developed by NASA’s Global Modeling and Assimilation Office (GMAO; Gelaro et al., 2017) based on the Catchment LSM. Compared to MERRA (the early version), the model-generated precipitation in MERRA-2 is adjusted by the station-based CPCU product (Xie et al., 2007; Chen et al., 2008) and merged CMAP product (Xie and Arkin, 1997) over Africa within the coupled system.
Figure 2 shows the locations of 26 hydrological stations over 9 river basins in China, at which monthly streamflow records are available from 1980 to 2008. In this study, we concentrate on the Huai River, the Yellow River, and the Yangtze River basins because they are prone to floods. Their drainage areas are 270,000, 750,000, and 1,800,000 km², respectively. In each river basin, we select two stations, one at the upper reach of the stream (1-Huai_Wangjiaba, 3-Yangtze_Zhimenda, and 7-Yellow_Tangnaihai in Fig. 2) and the other at the outlet (2-Huai_Bengbu, 6-Yangtze_Datong, and 8-Yellow_Huayuankou in Fig. 2), to perform extensive evaluations. Because the Yangtze River is the longest river with the largest drainage area in China, two more stations (4-Yangtze_Pingshan and 5-Yangtze_Yichang in Fig. 2) in the middle reach are also selected for detailed analysis. These stations are chosen partly because they have relatively complete streamflow observations during the study period.
Figure 2. Locations of 26 hydrological stations in China with records available from 1980 to 2008. Eight stations without missing data in the Huai River, Yangtze River, and Yellow River basins are selected and labeled as 1, 2, ..., 8 (in red) in the map. The corresponding stations are referred to as 1-Huai_Wangjiaba, 2-Huai_Bengbu, 3-Yangtze_Zhimenda, 4-Yangtze_Pingshan, 5-Yangtze_Yichang, 6-Yangtze_Datong, 7-Yellow_Tangnaihai, and 8-Yellow_Huayuankou, with a naming convention of “Number-RiverName_LocationName.” Note that the stations 2-Huai_Bengbu, 6-Yangtze_Datong, and 8-Yellow_Huayuankou are at the outlet of each river.
In addition to errors from the routing model and DEM-related parameters, the accuracy of simulated streamflow is also determined by the runoff input. Thus, the eight runoff products are first evaluated against a composite runoff dataset and intercompared to show the differences among them. The spatial pattern, seasonal cycle, and interannual variability in simulated streamflow are then assessed against the gauged streamflow. The mean magnitude and standard deviation (STD) of runoff are computed to depict spatial features quantitatively, while the interannual variability is described by the coefficient of variation (CV). The CV (STD/mean) reflects the degree of dispersion among the datasets and facilitates intercomparison over different hydrological regimes. To quantify the performances of the eight runoff products, we calculated the correlation coefficient (R), standard deviation (STD), Nash–Sutcliffe efficiency coefficient (NSE), and relative error (RE) for the monthly simulated and observed streamflow at all the hydrological stations. The NSE, which is sensitive to high values, reveals the ability of the models to simulate the magnitude and timing of the peak flow (Moriasi et al., 2007). An NSE of 1 indicates a perfect simulation, whereas an NSE less than 0 indicates an unreliable simulation. Values of R and normalized STD (the ratio of the simulated STD to the observed STD) closer to 1 and an RE closer to 0 correspond to better performance.
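As a minimal sketch of the four metrics, the following Python function computes R, normalized STD, NSE, and RE for one pair of monthly series. The function name and the sample values are illustrative only. The RE definition used here (relative bias of the mean) and the population form of the STD are assumptions, since the text does not spell them out.

```python
import numpy as np

def streamflow_metrics(sim, obs):
    """Return R, normalized STD, NSE, and RE for simulated vs. observed flow."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # Pearson correlation coefficient between the two series.
    r = np.corrcoef(sim, obs)[0, 1]
    # Normalized STD: simulated STD divided by observed STD (population form).
    norm_std = sim.std() / obs.std()
    # NSE: 1 minus squared error relative to the variance of the observations.
    nse = 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)
    # RE (assumed definition): relative bias of the simulated mean.
    re = (sim.mean() - obs.mean()) / obs.mean()
    return {"R": r, "STD": norm_std, "NSE": nse, "RE": re}

# Hypothetical monthly flows; not taken from any station in this study.
obs = np.array([10.0, 20.0, 40.0, 30.0, 15.0, 5.0])
perfect = streamflow_metrics(obs, obs)        # simulation equals observation
damped = streamflow_metrics(0.5 * obs, obs)   # simulation at half amplitude
```

In the damped case the correlation stays at 1 while the normalized STD and RE reveal the underestimation, which is why the four metrics are read together rather than in isolation.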
To intercompare the overall performances of the eight products, a simplified ranking method is introduced (Wang and Zeng, 2012). The ranking score ranges from 1 to 8, with 1 indicating the best performance. First, the statistical metrics of each product at all hydrological stations are averaged (see columns 3–6 in Table 2). According to each averaged metric, the eight products are scored from the best (1) to the worst (8). Then, the scores of the four statistical metrics are averaged to give the mean score of each product (see column 7 in Table 2). Finally, the mean scores of the eight products are ranked to determine their relative performances.
| No. | Name | NSE | RE | STD | R | Mean score |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | VIC-CN05.1 | 0.24 (4) | 0.00 (1) | 0.72 (2) | 0.85 (1) | 2.00 |
| 2 | CLM-CFSR | 0.26 (2) | −0.19 (4) | 0.49 (4) | 0.65 (7) | 4.25 |
| 3 | CLM-ERAI | 0.26 (3) | −0.18 (3) | 0.48 (5) | 0.61 (8) | 4.75 |
| 4 | CLM-MERRA | −0.19 (6) | −0.51 (6) | 0.26 (7) | 0.67 (6) | 6.25 |
| 5 | CLM-NCEP | −0.53 (8) | −0.74 (8) | 0.20 (8) | 0.75 (4) | 7.00 |
| 6 | ERAI/Land | 0.41 (1) | −0.24 (5) | 0.55 (3) | 0.78 (3) | 3.00 |
| 7 | JRA55 | 0.09 (5) | 0.10 (2) | 0.87 (1) | 0.72 (5) | 3.25 |
| 8 | MERRA-2 | −0.29 (7) | −0.69 (7) | 0.39 (6) | 0.82 (2) | 5.50 |
Table 2. The average (across eight hydrological stations) statistical quantities (NSE, RE, STD, and R) for eight runoff products based on the monthly streamflow observations. All R values are significant at p = 0.02, and the numbers in the parentheses represent the ranking scores of each statistical quantity
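The ranking procedure can be sketched in Python using the station-averaged values listed in Table 2. The names `TABLE2`, `BADNESS`, and `mean_scores` are illustrative, not from the original study. One assumption is needed: CLM-CFSR and CLM-ERAI share the same rounded NSE (0.26), so the sketch breaks that tie by table order, which happens to match the published scores.

```python
# Station-averaged metrics copied from Table 2: (NSE, RE, STD, R).
TABLE2 = {
    "VIC-CN05.1": (0.24, 0.00, 0.72, 0.85),
    "CLM-CFSR": (0.26, -0.19, 0.49, 0.65),
    "CLM-ERAI": (0.26, -0.18, 0.48, 0.61),
    "CLM-MERRA": (-0.19, -0.51, 0.26, 0.67),
    "CLM-NCEP": (-0.53, -0.74, 0.20, 0.75),
    "ERAI/Land": (0.41, -0.24, 0.55, 0.78),
    "JRA55": (0.09, 0.10, 0.87, 0.72),
    "MERRA-2": (-0.29, -0.69, 0.39, 0.82),
}

# One function per metric column; a smaller returned value means better.
BADNESS = (
    lambda nse: 1.0 - nse,        # NSE: closer to 1 is better
    lambda re: abs(re),           # RE: closer to 0 is better
    lambda std: abs(1.0 - std),   # normalized STD: closer to 1 is better
    lambda r: 1.0 - r,            # R: closer to 1 is better
)

def mean_scores(table):
    """Score products 1 (best) to N (worst) per metric, then average."""
    names = list(table)
    totals = {name: 0 for name in names}
    for col, badness in enumerate(BADNESS):
        # sorted() is stable, so ties keep the table order (assumption).
        ordered = sorted(names, key=lambda n: badness(table[n][col]))
        for score, name in enumerate(ordered, start=1):
            totals[name] += score
    return {name: totals[name] / len(BADNESS) for name in names}

scores = mean_scores(TABLE2)
# Final ranking: products ordered by mean score, best first.
ranking = sorted(scores, key=scores.get)
```

With these inputs the sketch reproduces the mean scores in the last column of Table 2, for example 2.00 for VIC-CN05.1 and 7.00 for CLM-NCEP.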
3.1. Runoff products
3.2. Gauged streamflow
3.3. Analysis methods
The climatological (monthly and annual) composite runoff field from the Global Runoff Data Centre (GRDC) at 0.5° × 0.5° resolution is adopted to validate the simulated runoff products. The GRDC composite runoff dataset combines gauged discharge with the output of a water balance model, thereby preserving both the accuracy of the observations and the spatiotemporal pattern of the simulations (Fekete et al., 2002). The GRDC includes 21 hydrological stations in China; the record length varies by station, and the records are continuously updated. Although the GRDC is not based purely on observations, it is widely used as a reference runoff dataset for regional and global research (Wang et al., 2016; Lv et al., 2018). Figure 3 shows the mean spatial patterns of the GRDC and eight runoff products in summer (June–July–August) of 1980–2009. Runoff from all the products is relatively large in southeastern China (> 5 mm day⁻¹) and small in other regions (< 1 mm day⁻¹). The spatial mean and STD of the GRDC runoff over China are 1.28 and 2.36 mm day⁻¹, respectively, both larger than those of all eight runoff products. Among the eight products, JRA55, VIC-CN05.1, and CLM-CFSR have relatively large mean runoff values (1.19, 1.12, and 1.10 mm day⁻¹, respectively), and JRA55, CLM-CFSR, and ERAI/Land have relatively large STDs (1.78, 1.59, and 1.50 mm day⁻¹, respectively). The JRA55 product has the highest mean and STD, which are the closest to those of the GRDC. Compared to the GRDC and the other products, CLM-MERRA, CLM-NCEP, and MERRA-2 have both smaller means and smaller STDs (Fig. 3).
Figure 3. Spatial patterns of (a) the Global Runoff Data Centre (GRDC) composite runoff field and (b–i) the eight runoff products in summer during 1980–2009. The mean values (mean) and spatial standard deviations (STD) over China are indicated above each panel.
Studies have indicated that precipitation is the main source of errors in the runoff products (Fekete et al., 2004; Sheffield et al., 2006; Wang and Zeng, 2011). To investigate the reason for the differences among the eight runoff products, Fig. 4 presents the mean spatial patterns of precipitation in summer during 1980–2009. Although all six precipitation datasets have assimilated observations (in situ or remote sensing) to varying degrees, the CN05.1 dataset is the most accurate and can be regarded as the ground truth because it was derived solely from abundant station observations in China. The spatial mean and STD of precipitation in the CN05.1 dataset over China are 3.53 and 2.30 mm day⁻¹, respectively. Among the six datasets, the means and STDs of precipitation are largest in JRA55, ERAI, and GPCP. Except for the CRU (3.48 mm day⁻¹) and MERRA-2 (3.18 mm day⁻¹), the mean precipitation of the other datasets is larger than that of the CN05.1. Only MERRA-2 (2.23 mm day⁻¹) has a smaller spatial STD than the CN05.1. Hence, the likely underestimation of runoff in the CLM-NCEP and MERRA-2 datasets might result from low precipitation. Since the precipitation for the CLM-MERRA dataset has been adjusted by the GPCP, its small runoff values are attributed to the other forcing variables in the MERRA dataset and to the LSM.
Figure 4. Spatial patterns of climatological summer precipitation for each runoff product of (a) CN05.1, (b) GPCP, (c) CRU, (d) ERA-Interim, (e) JRA55, and (f) MERRA-2, during 1980–2009. The mean values (mean) and spatial standard deviations (STD) over China are indicated above each panel.
The CV (STD/mean) of the mean precipitation across the six datasets is 0.06, whereas the CV of the mean runoff across the eight products is 0.30. The greater consistency among the precipitation datasets than among the runoff products indicates that simulated runoff is also affected by other atmospheric forcing variables and by model parameterization schemes. For example, the only difference between the CLM-ERAI and ERAI/Land products is the LSM, as they share the same combination of reanalysis forcing data and precipitation. The biases in the CLM-CFSR, CLM-ERAI, and CLM-MERRA products, which share identical precipitation and LSM, illustrate the effect of other meteorological forcing variables on the runoff simulations (Wang et al., 2016).
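The CV used for this comparison is simply the STD of the per-dataset means divided by their average. A minimal sketch follows; the function name and the sample values are illustrative (the values are not the dataset means from this study), and the population form of the STD (`ddof=0`) is an assumption because the text does not state which form was used.

```python
import numpy as np

def coefficient_of_variation(values, ddof=0):
    """CV = STD / mean of a set of values (population STD by default)."""
    values = np.asarray(values, dtype=float)
    return values.std(ddof=ddof) / values.mean()

# Hypothetical set of spatial means, one per dataset.
example_means = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
cv = coefficient_of_variation(example_means)  # STD 2.0 / mean 5.0
```

Because the CV is dimensionless, it allows the spread among precipitation datasets and the spread among runoff products to be compared directly even though their magnitudes differ.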
The runoff from the LSMs and reanalyses is routed to the river channels using the CaMa-Flood model. We compare the spatial patterns of climatological simulated streamflow against observations in summer during 1980–2008 (Fig. 5). In most areas of China, the distributions of simulated and observed streamflow match well with each other. In the middle and lower reaches of the Yangtze River basin, only the VIC-CN05.1 and CLM-CFSR products capture the magnitude of the observed streamflow (> 30,000 m³ s⁻¹). The streamflow in the Yellow River basin is clearly underestimated by the CLM-MERRA, CLM-NCEP, and MERRA-2 products (< 500 m³ s⁻¹).
Figure 5. Spatial patterns of mean (a) observed and (b–i) simulated streamflow in boreal summer during 1980–2008.
The seasonal cycles of the simulated and observed streamflow during 1980–2008 are presented in Fig. 6. The performance of the simulated seasonal streamflow varies with the source runoff product, station location (upstream or downstream), and river basin. The peak flow is significantly underestimated by most products, especially at the upstream stations in the Yangtze and Yellow River basins (Figs. 6c, g). However, the peak flow is noticeably overestimated by VIC-CN05.1 and JRA55 at Huayuankou station at the outlet of the Yellow River basin (Fig. 6h). The simulated streamflow at the upstream stations shows clear seasonal variations with peak flow in summer, while the peak timing of all the products at the downstream stations lags behind the observed peak (Figs. 6b, f, h). Except for the CLM-MERRA, CLM-NCEP, and MERRA-2 products, the seasonal cycles of the other products match well with measurements at the Yichang and Datong sites in the middle reach of the Yangtze River (Figs. 6d, e). In general, the simulations from the VIC-CN05.1, JRA55, and ERAI/Land products track the observed seasonal cycles more closely than those from the other products. The seasonal streamflow in the Huai River and Yangtze River basins is simulated better than that in the Yellow River basin.
Figure 6. Seasonal cycles of simulated and observed streamflow during 1980–2008 at eight selected hydrological stations in the Huai River, Yangtze River, and Yellow River basins.
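The seasonal cycle and the peak-timing lag discussed above can be computed along the following lines. This is a sketch under stated assumptions: the monthly series starts in January and covers complete years, the function names are hypothetical, and the two series are synthetic stand-ins rather than station data.

```python
import numpy as np

def seasonal_cycle(monthly, n_months=12):
    """Average a monthly series (complete years, starting in January)
    into a 12-value climatological seasonal cycle."""
    monthly = np.asarray(monthly, dtype=float)
    if monthly.size % n_months != 0:
        raise ValueError("series must cover complete years")
    return monthly.reshape(-1, n_months).mean(axis=0)

def peak_lag_months(sim_cycle, obs_cycle):
    """Months by which the simulated peak follows the observed peak.
    Wrap-around across the year boundary is not handled in this sketch."""
    return int(np.argmax(sim_cycle)) - int(np.argmax(obs_cycle))

# Synthetic 29-year series (the length of 1980-2008): the observed flow
# peaks in July (index 6) and the simulated flow one month later.
months = np.arange(12)
obs_year = 100.0 + 50.0 * np.cos(2.0 * np.pi * (months - 6) / 12.0)
sim_year = 100.0 + 50.0 * np.cos(2.0 * np.pi * (months - 7) / 12.0)
obs_cycle = seasonal_cycle(np.tile(obs_year, 29))
sim_cycle = seasonal_cycle(np.tile(sim_year, 29))
lag = peak_lag_months(sim_cycle, obs_cycle)
```

A positive lag corresponds to the situation described for the downstream stations, where the simulated peak arrives after the observed one.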
The interannual variability, described by the coefficients of variation (CVs) of the simulated and observed streamflow during 1980–2008, is compared in Fig. 7. In the Huai River basin, the observed CV is captured only by the MERRA-2, VIC-CN05.1, and ERAI/Land products at Wangjiaba station (Fig. 7a) and is underestimated by all simulations at Bengbu station (Fig. 7b). In the Yangtze and Yellow River basins (Figs. 7c–h), the CVs of the simulated streamflow agree well with the observations except for MERRA-2, which shows apparent overestimation.
Figure 7. Coefficients of variation (CVs) of monthly streamflow during 1980–2008 at eight selected hydrological stations in the Huai River, Yangtze River, and Yellow River basins.
Monthly streamflow values at Yichang and Datong stations in the middle and lower reaches of the Yangtze River basin during 1980–2008 are taken as an example (Fig. 8). The amplitude of the observed streamflow is largely underestimated by most simulations, although JRA55 sometimes captures the magnitude of the peak flow at Yichang station. The simulated streamflow from the MERRA-2 and CLM-NCEP products is much smaller than the observed streamflow, which is attributed to small input runoff. Nevertheless, the timing of the simulated peak flow from all the products is consistent with that in the observations.
Figure 8. Comparisons of monthly observed and simulated streamflow during 1980–2008 at the stations (a) 5-Yangtze_Yichang and (b) 6-Yangtze_Datong in the middle and lower reaches of the Yangtze River basin, respectively.
The normalized STDs and R values between the simulated and observed monthly streamflow during 1980–2008 are summarized in the Taylor diagram (Fig. 9). Most normalized STDs are less than 1, particularly for the CLM-NCEP and CLM-MERRA products, and most of the R values are within 0.60–0.90. The simulated streamflow at Huayuankou station in the Yellow River basin is the worst (see number 8 in Fig. 9), with all R values smaller than 0.60. Of all the products, VIC-CN05.1 has more points close to the reference point and therefore performs better than the other products. To quantitatively compare the performance of each runoff product, the STDs and R values are averaged over the eight hydrological stations in Table 2. In terms of the mean STD, the JRA55 product performs best (0.87), followed by the VIC-CN05.1 (0.72) and ERAI/Land (0.55), while the MERRA-2 (0.39), CLM-MERRA (0.26), and CLM-NCEP (0.20) products perform poorly. The smaller variations in the monthly simulated streamflow from the eight products may be caused by the absence of human activities in the LSMs and the limitations of the routing parameterization scheme (e.g., diffusive wave) in the CaMa-Flood model. The mean R value of the VIC-CN05.1 product is the highest at 0.85, while that of the CLM-ERAI product is the lowest at 0.61. The MERRA-2 product obtains the second largest R value of 0.82. The mean R values of CLM-MERRA (0.67) and CLM-NCEP (0.75) are moderate.
Figure 9. Taylor diagram for monthly streamflow at eight hydrological stations during 1980–2008, which shows correlation coefficient (R), normalized standard deviation (STD), and the normalized centered root mean square error (distance to the point marked REF). All R values are significant at p = 0.02. The numbers 1–8 refer to the same stations as in Fig. 2.
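The distance to the REF point in a Taylor diagram follows from the law of cosines applied to the normalized STD and R. The sketch below uses that standard relation; the function name is illustrative, and applying it to station-averaged values from Table 2 is for illustration only, since each point in Fig. 9 corresponds to a single station and product.

```python
import math

def normalized_centered_rmse(norm_std, r):
    """Distance to REF in a Taylor diagram:
    E'^2 = sigma^2 + 1 - 2 * sigma * R, with sigma the normalized STD."""
    squared = norm_std ** 2 + 1.0 - 2.0 * norm_std * r
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(squared, 0.0))

# Illustration with the station-averaged VIC-CN05.1 values (STD 0.72, R 0.85).
distance_vic = normalized_centered_rmse(0.72, 0.85)
# A perfect simulation (STD 1, R 1) coincides with the REF point.
distance_ref = normalized_centered_rmse(1.0, 1.0)
```

The relation shows why a high R alone does not place a point near REF: a product with a strongly damped amplitude remains far from the reference even when its correlation is high.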
Figure 10 illustrates histograms of the RE and NSE for monthly simulated and observed streamflow during 1980–2008. The NSE values for the CLM-NCEP product are smaller than 0 at all stations, which means that the CLM-NCEP product has no skill in simulating peak flow. MERRA-2 performs relatively well in the Huai River basin, with positive NSE values at Wangjiaba and Bengbu stations (0.27 and 0.22). Large uncertainties exist in the NSE and RE values from the eight products at Huayuankou station. For example, most NSE values range from −1 to 1, except for those of VIC-CN05.1 and JRA55 at Huayuankou station (−2.86 and −3.81). The RE values of VIC-CN05.1 and JRA55 at Huayuankou station are 1.23 and 1.17, respectively. The overestimations and negative NSEs of VIC-CN05.1 and JRA55 at Huayuankou station may result from the frequent anthropogenic activities (e.g., dams and water withdrawals) in the Yellow River basin, which are not included in the LSMs. In terms of the mean NSE in Table 2, the ERAI/Land product simulates peak flow the best with the highest NSE value of 0.41, followed by the CLM-CFSR product (0.26). Due to the large negative values at Huayuankou station, the mean NSE values for VIC-CN05.1 (0.24) and JRA55 (0.09) are no longer satisfactory. The peak flow skill of CLM-MERRA, MERRA-2, and CLM-NCEP remains poor, with negative mean NSE values. On the whole, the simulated streamflow is smaller than the observed streamflow, with most RE values less than 0 (Fig. 10b). The systematic underestimation of the monthly streamflow by all products may be related to inaccurate parameters in the CaMa-Flood model, such as river width and depth (Yamazaki et al., 2012). The seemingly perfect performances of VIC-CN05.1 and JRA55, with mean RE values near 0, are unreliable because the large positive values at Huayuankou station compensate for the negative values at the other hydrological stations. The performances of CLM-MERRA (−0.51), MERRA-2 (−0.69), and CLM-NCEP (−0.74) are still unsatisfactory, with the three largest negative biases.
Figure 10. (a) Nash–Sutcliffe efficiency coefficient (NSE) and (b) relative error (RE) for monthly streamflow at the eight hydrological stations during 1980–2008. The numbers 1–8 on the x-axis refer to the same stations as in Fig. 2.
In summary, the overall performances of the eight runoff products are ranked based on the mean scores of the four statistical quantities (Table 2). From best to worst, the products are VIC-CN05.1, ERAI/Land, JRA55, CLM-CFSR, CLM-ERAI, MERRA-2, CLM-MERRA, and CLM-NCEP.
We also rank the simulations at the eight hydrological stations based on the four statistical quantities averaged over the eight runoff products (Table 3). From best to worst, the stations are Pingshan, Yichang, Wangjiaba, Bengbu, Huayuankou, Datong, Tangnaihai, and Zhimenda. The ranking score for Huayuankou station is unreliable because large uncertainties among the eight products counteract each other. The Yellow River basin, located in semiarid and arid regions, has relatively small discharge but high evaporation, and it is heavily affected by human activities (Liu et al., 2019; Wang et al., 2019). Hence, simulated streamflow at Huayuankou station at the Yellow River outlet suffers larger errors from the absence of human activities in the LSMs and routing process. According to the Chinese Glacier Inventory, glacier runoff contributes 9.2% of the streamflow above Zhimenda station in the Yangtze River (Wu et al., 2013). Wu and Gao (2013) illustrated that precipitation stations were sparse in western China, especially in the northern part of the Tibetan Plateau. Glacial meltwater, which is ignored in the LSMs, and inaccurate precipitation in the source region of the Yangtze River may account for the poor simulations at Zhimenda station. On average, all products perform best at Pingshan and Yichang stations, which are located in the middle reach of the Yangtze River basin with abundant water resources. With the exception of Huayuankou and Zhimenda stations, the simulations at the upstream stations are better than those at the downstream stations because of relatively fewer human activities in the upper reaches.
| Station | NSE | RE | STD | R | Mean score |
| --- | --- | --- | --- | --- | --- |
| 1-Huai_Wangjiaba | 0.28 (2) | −0.43 (6) | 0.45 (6) | 0.73 (5) | 4.75 |
| 2-Huai_Bengbu | 0.27 (3) | −0.17 (2) | 0.38 (7) | 0.68 (7) | 4.75 |
| 3-Yangtze_Zhimenda | 0.03 (5) | −0.61 (8) | 0.34 (8) | 0.85 (3) | 6.00 |
| 4-Yangtze_Pingshan | 0.39 (1) | −0.29 (4) | 0.66 (1) | 0.85 (1) | 1.75 |
| 5-Yangtze_Yichang | 0.25 (4) | −0.32 (5) | 0.55 (3) | 0.85 (2) | 3.50 |
| 6-Yangtze_Datong | −0.07 (7) | −0.28 (3) | 0.54 (4) | 0.72 (6) | 5.00 |
| 7-Yellow_Tangnaihai | −0.04 (6) | −0.44 (7) | 0.46 (5) | 0.76 (4) | 5.50 |
| 8-Yellow_Huayuankou | −0.86 (8) | 0.11 (1) | 0.58 (2) | 0.40 (8) | 4.75 |
Table 3. The average (across eight runoff products) statistical quantities (NSE, RE, STD, and R) for eight hydrological stations based on the monthly streamflow observations. All R values are significant at p = 0.02, and the numbers in the parentheses represent the ranking scores of each statistical quantity