Quality Control and Evaluation of the Observed Daily Data in the North American Soil Moisture Database

  • Corresponding author: Dagang WANG, wangdag@mail.sysu.edu.cn
  • Funds:

    Supported by the National Key Research and Development Program of China (2017YFA0604300), National Natural Science Foundation of China (51779278, 51379224, and 41671398), and NOAA/CPO Modeling, Analyses, Predictions, and Projections (MAP) Program

  • doi: 10.1007/s13351-019-8121-2


  • Abstract: The North American Soil Moisture Database (NASMD) was initiated in 2011 to assemble and homogenize in situ soil moisture measurements from 32 observational networks in the United States and Canada, encompassing more than 1800 stations. Although statistical quality control (QC) procedures have been applied in the NASMD, in situ sensors tend to systematically underestimate the soil moisture content in frozen soils, and a single maximum threshold (i.e., 0.6 m³ m⁻³) may not be sufficient for robust QC because of the diverse soil textures in North America. In this study, the simple automated QC method is revised, based on the in situ soil porosity and the North American Land Data Assimilation System phase 2 (NLDAS-2) Noah soil temperature, to supplement the existing QC approach. The revised QC method is first validated at 78 Soil Climate Analysis Network (SCAN) stations where manually checked data are available, and is then applied to all stations in the NASMD to produce a more strictly quality-controlled dataset. The results show that the revised automated QC procedure can flag spurious and erroneous soil moisture measurements at the SCAN stations, especially those located at high altitudes and latitudes. Relative to station measurements in the original NASMD, the quality-controlled data show slightly better agreement with the manually checked soil moisture content. It should be noted that some valid soil moisture measurements may be over-flagged in this quality-controlled dataset because of potential errors in the soil temperature and soil porosity data, and that validation in this study is limited by the availability of benchmark soil moisture data. Updated QC and additional validation will be desirable to boost confidence in the product when high-quality data become available in the future.
  • Fig. 1.  (a) Map of observation stations included in the North American Soil Moisture Database (NASMD), with the background map indicating topography; (b) locations of the example stations in the Soil Climate Analysis Network (SCAN) used for the evaluation and generation of the automated quality control (QC), with the background map indicating the NLDAS-2 Noah soil porosity (m³ m⁻³). Data from a total of 78 example stations in SCAN are used in the evaluation process.

    Fig. 2.  Spatial distributions of the anomaly correlation (AC) quality-controlled soil moisture observations from Exp1 and raw observations at (a–e) 5-, 10-, 20-, 50-, and 100-cm soil layers for the 78 SCAN stations. Solid circles represent the statistically significant difference at the 95% confidence level.

    Fig. 3.  As in Fig. 2, but for the Taylor skill score (S).

    Fig. 4.  As in Fig. 2, but for the bias.

    Fig. 5.  Boxplots of the metrics of (a) anomaly correlation (AC), (b) Taylor skill score (S), (c) bias, (d) mean absolute error (MAE), and (e) root-mean-square error (RMSE) at the five soil measurement layers for the 78 SCAN stations. The filled circles represent mean values for the 78 SCAN stations. Black represents the calculation based on raw observations, red based on quality-controlled data from Exp1, blue based on quality-controlled data from Exp2, and green based on quality-controlled data from Exp3.

    Fig. 6.  (a, c, e) The multi-year (2000–12) monthly variations and (b, d, f) monthly means of the soil moisture (m³ m⁻³) at the 5- (a, b), 10- (c, d), and 20-cm (e, f) soil depths.

    Fig. 7.  The 13-yr (2000–12) monthly averaged soil moisture for the stations (a–c) MS_2110, (d–f) MS_2032, and (g–i) MO_2061 at the 5- (a, d, g), 10- (b, e, h), and 20-cm (c, f, i) soil depths.

    Fig. 8.  The spatial distribution of percentages of flag occurrences by the (a) GR check, (b) SP check, and (c) ST0 check based on all the checked stations in NASMD for different soil depths.

    Fig. 9.  The comparison of averaged raw observations (black line) and quality-controlled observations (red line) at the top soil layer for the (a) CRN, (b) MAWN, (c) OM, (d) SNOTEL, (e) SCAN, and (f) WTM, spatially averaged from 53 checked stations in WTM to 335 checked stations in SNOTEL.

    Fig. 10.  The comparison of averaged raw observations (black line) and quality-controlled observations (red line) at the shallow soil layers (< 50 cm) for (a–c) CRN, (d) MAWN, (e–f) SNOTEL, and (g–i) SCAN.

    Table 1.  List of the 22 measurement networks including the number of stations, period of record, and depths at which the soil moisture is measured

    | Network name (Abbreviation) | Number of stations | Start year | End year | Measurement depth (cm) |
    |---|---|---|---|---|
    | Alberta Agriculture and Rural Development (AARD) | 37 | 2003 | Present | 5, 20, 50, 100 |
    | AmeriFlux Network (AmeriFlux) | 57 | 1996 | Present | Varies (5–200) |
    | Atmospheric Radiation Measurement (ARM) | 17 | 1996 | Present | 5, 15, 25, 35, 60, 85, 125, 175 |
    | Automated Weather Data Network (AWDN) | 43 | 2006 | Present | 10, 25, 50, 100 |
    | Central Plains Experimental Range (CPER) | 1 | 2004 | Present | 5, 20, 50, 100 |
    | Center for Hurricane Intensity and Landfall Investigation (CHILI) | 25 | 2006 | Present | 100 |
    | Climate Reference Network (CRN) | 113 | 2009 | Present | 5, 10, 20, 50, 100 |
    | Cosmic-ray Soil Moisture Observing Station (COSMOS) | 54 | 2008 | Present | Varies (10–30) |
    | Delaware Environmental Observing System (DEOS) | 26 | 2005 | Present | 5 |
    | Environment and Climate Observing Network (ECONET) | 36 | 1999 | Present | 20 |
    | Illinois Climate Network (ICN) | 19 | 2004 | Present | 5, 10, 20, 50, 100, 150 |
    | International Sorghum Grain Mold Nursery (ISGMN) | 6 | 2013 | Present | 30, 60, 90, 120, 150, 180 |
    | Missouri Agricultural Weather Database (MAW-Missouri) | 8 | 2000 | Present | 5 |
    | Michigan Automated Weather Network (MAWN) | 80 | 1999 | Present | 4, 10 |
    | National Oceanic and Atmospheric Administration Hydrometeorological Testbed (NOAA HMT) | 25 | 2004 | Present | Varies (5–100) |
    | Oklahoma Mesonet (OM) | 104 | 1998 | Present | 5, 25, 60, 75 |
    | Snow Telemetry (SNOTEL) | 352 | 1994 | Present | 5, 20, 50, 100 |
    | Soil Climate Analysis Network (SCAN) | 187 | 1994 | Present | 5, 10, 20, 50, 100 |
    | Soilscape | 6 | 2011 | Present | Varies (5–50) |
    | South Dakota Automated Weather Network (SDAWN) | 11 | 2001 | Present | 5, 10, 20, 50, 100 |
    | Water and Environmental Research Center (WERC) | 24 | 1998 | 2012 | Varies (5–50) |
    | West Texas Mesonet (WTM) | 59 | 2002 | Present | 5, 20, 60, 75 |

    Table 2.  The porosity table used in the Noah land surface model

    | Soil texture | Porosity value (m³ m⁻³) |
    |---|---|
    | Sand | 0.395 |
    | Loamy sand | 0.421 |
    | Sandy loam | 0.434 |
    | Silt loam | 0.476 |
    | Silt | 0.476 |
    | Loam | 0.439 |
    | Sandy clay loam | 0.404 |
    | Silty clay loam | 0.464 |
    | Clay loam | 0.465 |
    | Sandy clay | 0.406 |
    | Silty clay | 0.468 |
    | Clay | 0.457 |
    | Organic materials | 0.464 |
    | Bedrock | 0.200 |

    Table 3.  Design of the three sensitivity experiments

    | Exp name | QC name | QC criterion |
    |---|---|---|
    | Exp1 | GR check | Measurements are within a geophysical range between 0.0 and 0.6 m³ m⁻³. |
    | | SP check | Measurements do not exceed the in situ soil porosity based on the soil texture recorded for each depth in NASMD. |
    | | ST0 check | Measurements are not taken in frozen soils based on the NLDAS-2 Noah soil temperature. |
    | Exp2 | GR check | Measurements are within a geophysical range between 0.0 and 0.6 m³ m⁻³. |
    | | SP check | / |
    | | ST0 check | Measurements are not taken in frozen soils based on the NLDAS-2 Noah soil temperature. |
    | Exp3 | GR check | Measurements are within a geophysical range between 0.0 and 0.6 m³ m⁻³. |
    | | SP check | Measurements do not exceed the porosity based on the NLDAS-2 Noah soil porosity. |
    | | ST0 check | Measurements are not taken in frozen soils based on the NLDAS-2 Noah soil temperature. |

    Table 4.  Percentages of flag occurrences at the five measured soil layers for the 78 SCAN stations from the three experiments

    | Exp name | QC criterion | 5 cm (%) | 10 cm (%) | 20 cm (%) | 50 cm (%) | 100 cm (%) |
    |---|---|---|---|---|---|---|
    | Exp1* | GR check | 1.61 | 0.32 | 0.07 | 0.32 | 0.29 |
    | | SP check | 3.17 | 2.69 | 3.68 | 6.33 | 12.55 |
    | | ST0 check | 11.17 | 10.64 | 9.58 | 8.73 | 8.00 |
    | Exp2* | GR check | 1.61 | 0.32 | 0.07 | 0.32 | 0.29 |
    | | SP check | / | / | / | / | / |
    | | ST0 check | 11.31 | 10.71 | 9.67 | 8.99 | 8.10 |
    | Exp3* | GR check | 1.61 | 0.32 | 0.07 | 0.32 | 0.29 |
    | | SP check | 2.85 | 2.46 | 3.78 | 6.47 | 12.24 |
    | | ST0 check | 11.17 | 10.64 | 9.59 | 8.76 | 8.00 |

    Notes: *Exp1 uses the geophysical range, NLDAS-2 Noah soil temperature, and NASMD soil texture; Exp2 only uses the geophysical range and NLDAS-2 Noah soil temperature; Exp3 uses the geophysical range, NLDAS-2 Noah soil temperature, and NLDAS-2 Noah soil porosity.

    Table 5.  Percentage of flag occurrences based on 22 networks of NASMD that provide measurements on a daily basis

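    | Network name | No. of stations | No. of checked stations | No. of total measurement records | GR check (%) | SP check (%) | ST0 check (%) |
    |---|---|---|---|---|---|---|
    | AARD | 37 | 18 | 185,552 | 0.00 | 0.00 | 39.68 |
    | AmeriFlux | 57 | 48 | 268,650 | 0.87 | 2.91 | 13.73 |
    | ARM | 17 | 17 | 533,636 | 0.00 | 0.07 | 4.61 |
    | AWDN | 43 | 41 | 298,927 | 0.01 | 0.02 | 24.60 |
    | CPER | 1 | 0 | / | / | / | / |
    | CHILI | 25 | 25 | 12,826 | 3.27 | 0.55 | 0.00 |
    | CRN | 113 | 113 | 298,784 | 1.03 | 6.75 | 10.19 |
    | COSMOS | 54 | 54 | 18,426 | 2.32 | 4.48 | / |
    | DEOS | 26 | 26 | 29,761 | 0.00 | 1.26 | 10.64 |
    | ECONET | 36 | 31 | 98,260 | 2.62 | 10.30 | 1.46 |
    | ICN | 19 | 19 | 345,127 | 0.37 | 11.27 | 12.12 |
    | ISGMN | 6 | 0 | / | / | / | / |
    | MAW-Missouri | 8 | 8 | 18,462 | 5.42 | 16.31 | 13.17 |
    | MAWN | 80 | 80 | 410,192 | 5.02 | 6.94 | 21.32 |
    | NOAA HMT | 25 | 25 | 83,030 | 3.12 | 6.48 | 11.67 |
    | OM | 104 | 104 | 1,322,736 | 0.00 | 0.10 | 2.80 |
    | SNOTEL | 352 | 335 | 2,529,487 | 0.36 | 0.47 | 46.72 |
    | SCAN | 187 | 115 | 1,329,115 | 0.46 | 5.27 | 9.85 |
    | Soilscape | 6 | 0 | / | / | / | / |
    | SDAWN | 11 | 11 | 79,511 | 5.69 | 6.05 | 29.47 |
    | WERC | 24 | 0 | / | / | / | / |
    | WTM | 59 | 53 | 460,728 | 0.42 | 2.92 | 1.17 |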
  • [1]

    Anderson, W. B., B. F. Zaitchik, C. R. Hain, et al., 2012: Towards an integrated soil moisture drought monitor for East Africa. Hydrol. Earth Syst. Sci., 16, 2893–2913. doi:  10.5194/hess-16-2893-2012.
    [2]

    Balsamo, G., A. Beljaars, K. Scipal, et al., 2009: A revised hydrology for the ECMWF model: Verification from field site to terrestrial water storage and impact in the integrated forecast system. J. Hydrometeor., 10, 623–643. doi:  10.1175/2008JHM1068.1.
    [3]

    Brocca, L., L. Ciabatta, C. Massari, et al., 2017: Soil moisture for hydrological applications: Open questions and new opportunities. Water, 9, 140. doi:  10.3390/w9020140.
    [4]

    Cai, X. T., Z.-L. Yang, Y. L. Xia, et al., 2014: Assessment of simulated water balance from Noah, Noah-MP, CLM, and VIC over CONUS using the NLDAS test bed. J. Geophys. Res. Atmos., 119, 13751–13770. doi:  10.1002/2014JD022113.
    [5]

    Collow, T. W., A. Robock, J. B. Basara, et al., 2012: Evaluation of SMOS retrievals of soil moisture over the central United States with currently available in situ observations. J. Geophys. Res. Atmos., 117, D09113. doi:  10.1029/2011JD017095.
    [6]

    De Lannoy, G. J. M., and R. H. Reichle, 2016: Assimilation of SMOS brightness temperatures or soil moisture retrievals into a land surface model. Hydrol. Earth Syst. Sci., 20, 4895–4911. doi:  10.5194/hess-20-4895-2016.
    [7]

    Dirmeyer, P. A., 2011: The terrestrial segment of soil moisture–climate coupling. Geophys. Res. Lett., 38, L16702. doi:  10.1029/2011GL048268.
    [8]

    Dorigo, W. A., W. Wagner, R. Hohensinn, et al., 2011: The international soil moisture network: A data hosting facility for global in situ soil moisture measurements. Hydrol. Earth Syst. Sci., 15, 1675–1698. doi:  10.5194/hess-15-1675-2011.
    [9]

    Dorigo, W. A., A. Xaver, M. Vreugdenhil, et al., 2013: Global automated quality control of in situ soil moisture data from the international soil moisture network. Vadose Zone J., 12, 1–21. doi:  10.2136/vzj2012.0097.
    [10]

    El Sharif, H., J. F. Wang, and A. P. Georgakakos, 2015: Modeling regional crop yield and irrigation demand using SMAP type of soil moisture data. J. Hydrometeor., 16, 904–916. doi:  10.1175/JHM-D-14-0034.1.
    [11]

    Ford, T. W., and S. M. Quiring, 2014: In situ soil moisture coupled with extreme temperatures: A study based on the Oklahoma Mesonet. Geophys. Res. Lett., 41, 4727–4734. doi:  10.1002/2014GL060949.
    [12]

    Gruhier, C., P. de Rosnay, S. Hasenauer, et al., 2010: Soil moisture active and passive microwave products: Intercomparison and evaluation over a Sahelian site. Hydrol. Earth Syst. Sci., 14, 141–156. doi:  10.5194/hess-14-141-2010.
    [13]

    Hallikainen, M. T., F. T. Ulaby, M. C. Dobson, et al., 1985: Microwave dielectric behavior of wet soil-part 1: Empirical models and experimental observations. IEEE Trans. Geosci. Remote Sens., GE-23, 25–34. doi:  10.1109/TGRS.1985.289497.
    [14]

    Holgate, C. M., R. A. M. De Jeu, A. I. J. M. van Dijk, et al., 2016: Comparison of remotely sensed and modelled soil moisture data sets across Australia. Remote Sens. Environ., 186, 479–500. doi:  10.1016/j.rse.2016.09.015.
    [15]

    Kishné, A. S., Y. T. Yimam, C. L. S. Morgan, et al., 2017: Evaluation and improvement of the default soil hydraulic parameters for the Noah land surface model. Geoderma, 285, 247–259. doi:  10.1016/j.geoderma.2016.09.022.
    [16]

    Koster, R. D., Y. H. Chang, H. L. Wang, et al., 2016: Impacts of local soil moisture anomalies on the atmospheric circulation and on remote surface meteorological fields during boreal summer: A comprehensive analysis over North America. J. Climate, 29, 7345–7364. doi:  10.1175/JCLI-D-16-0192.1.
    [17]

    Kumar, S. V., C. D. Peters-Lidard, D. Mocko, et al., 2014a: Assimilation of remotely sensed soil moisture and snow depth retrievals for drought estimation. J. Hydrometeor., 15, 2446–2469. doi:  10.1175/JHM-D-13-0132.1.
    [18]

    Kumar, S., P. A. Dirmeyer, D. M. Lawrence, et al., 2014b: Effects of realistic land surface initializations on subseasonal to seasonal soil moisture and temperature predictability in North America and in changing climate simulated by CCSM4. J. Geophys. Res. Atmos., 119, 13250–13270. doi:  10.1002/2014JD022110.
    [19]

    Kumar, S. V., B. F. Zaitchik, C. D. Peters-Lidard, et al., 2016: Assimilation of gridded GRACE terrestrial water storage estimates in the North American land data assimilation system. J. Hydrometeor., 17, 1951–1972. doi:  10.1175/JHM-D-15-0157.1.
    [20]

    Legates, D. R., R. Mahmood, D. F. Levia, et al., 2011: Soil moisture: A central and unifying theme in physical geography. Prog. Phys. Geog. Earth Environ., 35, 65–86. doi:  10.1177/0309133310386514.
    [21]

    Liao, W. L., A. J. Rigden, and D. Li, 2018: Attribution of local temperature response to deforestation. J. Geophys. Res. Biogeosci., 123, 1572–1587. doi:  10.1029/2018JG004401.
    [22]

    Liu, Q., R. H. Reichle, R. Bindlish, et al., 2011: The contributions of precipitation and soil moisture observations to the skill of soil moisture estimates in a land data assimilation system. J. Hydrometeor., 12, 750–765. doi:  10.1175/JHM-D-10-05000.1.
    [23]

    Morgan, C. L. S., Y. T. Yimam, M. Barlage, et al., 2017: Valuing of soil capability in land surface modeling. Global Soil Security, D. J. Field, C. L. S. Morgan, and A. B. McBratney, Eds., Springer International Publishing, Cham, doi: 10.1007/978-3-319-43394-3_5.
    [24]

    Pal, J. S., and E. A. B. Eltahir, 2001: Pathways relating soil moisture conditions to future summer rainfall within a model of the land–atmosphere system. J. Climate, 14, 1227–1242. doi:  10.1175/1520-0442(2001)014<1227:PRSMCT>2.0.CO;2.
    [25]

    Parrens, M., E. Zakharova, S. Lafont, et al., 2012: Comparing soil moisture retrievals from SMOS and ASCAT over France. Hydrol. Earth Syst. Sci., 16, 423–440. doi:  10.5194/hess-16-423-2012.
    [26]

    Pozzi, W., J. Sheffield, R. Stefanski, et al., 2013: Toward global drought early warning capability: Expanding international cooperation for the development of a framework for monitoring and forecasting. Bull. Amer. Meteor. Soc., 94, 776–785. doi:  10.1175/BAMS-D-11-00176.1.
    [27]

    Quiring, S. M., T. W. Ford, J. K. Wang, et al., 2016: The North American soil moisture database: Development and applications. Bull. Amer. Meteor. Soc., 97, 1441–1459. doi:  10.1175/BAMS-D-13-00263.1.
    [28]

    Seneviratne, S. I., M. Wilhelm, T. Stanelle, et al., 2013: Impact of soil moisture–climate feedbacks on CMIP5 projections: First results from the GLACE-CMIP5 experiment. Geophys. Res. Lett., 40, 5212–5217. doi:  10.1002/grl.50956.
    [29]

    Stéfanon, M., P. Drobinski, F. D’Andrea, et al., 2014: Soil moisture–temperature feedbacks at meso-scale during summer heat waves over Western Europe. Climate Dyn., 42, 1309–1324. doi:  10.1007/s00382-013-1794-9.
    [30]

    Tang, C. L., and T. C. Piechota, 2009: Spatial and temporal soil moisture and drought variability in the Upper Colorado River Basin. J. Hydrol., 379, 122–135. doi:  10.1016/j.jhydrol.2009.09.052.
    [31]

    Taylor, K. E., 2001: Summarizing multiple aspects of model performance in a single diagram. J. Geophys. Res. Atmos., 106, 7183–7192. doi:  10.1029/2000JD900719.
    [32]

    Taylor, C. M., R. A. M. de Jeu, F. Guichard, et al., 2012: Afternoon rain more likely over drier soils. Nature, 489, 423–426. doi:  10.1038/nature11377.
    [33]

    Wanders, N., D. Karssenberg, A. de Roo, et al., 2014: The suitability of remotely sensed soil moisture for improving operational flood forecasting. Hydrol. Earth Syst. Sci., 18, 2343–2357. doi:  10.5194/hess-18-2343-2014.
    [34]

    Wang, A. H., D. P. Lettenmaier, and J. Sheffield, 2011: Soil moisture drought in China, 1950–2006. J. Climate, 24, 3257–3271. doi:  10.1175/2011JCLI3733.1.
    [35]

    Wang, Z. L., P. W. Xie, C. G. Lai, et al., 2017: Spatiotemporal variability of reference evapotranspiration and contributing climatic factors in China during 1961–2013. J. Hydrol., 544, 97–108. doi:  10.1016/j.jhydrol.2016.11.021.
    [36]

    Wu, X. S., S. L. Guo, J. B. Yin, et al., 2018: On the event-based extreme precipitation across China: Time distribution patterns, trends, and return levels. J. Hydrol., 562, 305–317. doi:  10.1016/j.jhydrol.2018.05.028.
    [37]

    Xia, Y. L., K. Mitchell, M. Ek, et al., 2012a: Continental-scale water and energy flux analysis and validation for the North American Land Data Assimilation System project phase 2 (NLDAS-2): 1. Intercomparison and application of model products. J. Geophys. Res. Atmos., 117, D03109. doi:  10.1029/2011JD016048.
    [38]

    Xia, Y. L., K. Mitchell, M. Ek, et al., 2012b: Continental-scale water and energy flux analysis and validation for North American Land Data Assimilation System project phase 2 (NLDAS-2): 2. Validation of model-simulated streamflow. J. Geophys. Res. Atmos., 117, D03110. doi:  10.1029/2011JD016051.
    [39]

    Xia, Y. L., M. Ek, J. Sheffield, et al., 2013: Validation of Noah-simulated soil temperature in the North American Land Data Assimilation System phase 2. J. Appl. Meteor. Climatol., 52, 455–471. doi:  10.1175/JAMC-D-12-033.1.
    [40]

    Xia, Y. L., J. Sheffield, M. B. Ek, et al., 2014: Evaluation of multi-model simulated soil moisture in NLDAS-2. J. Hydrol., 512, 107–125. doi:  10.1016/j.jhydrol.2014.02.027.
    [41]

    Xia, Y. L., M. B. Ek, Y. H. Wu, et al., 2015a: Comparison of NLDAS-2 simulated and NASMD observed daily soil moisture. Part I: Comparison and analysis. J. Hydrometeor., 16, 1962–1980. doi:  10.1175/JHM-D-14-0096.1.
    [42]

    Xia, Y. L., M. B. Ek, Y. H. Wu, et al., 2015b: Comparison of NLDAS-2 simulated and NASMD observed daily soil moisture. Part II: Impact of soil texture classification and vegetation type mismatches. J. Hydrometeor., 16, 1981–2000. doi:  10.1175/JHM-D-14-0097.1.
    [43]

    Xia, Y. L., T. W. Ford, Y. H. Wu, et al., 2015c: Automated quality control of in situ soil moisture from the North American soil moisture database using NLDAS-2 products. J. Appl. Meteor. Climatol., 54, 1267–1282. doi:  10.1175/JAMC-D-14-0275.1.
  • 1. School of Geography and Planning, Sun Yat-sen University, Guangzhou 510275, China
  • 2. Guangdong Key Laboratory for Urbanization and Geo-simulation, Sun Yat-sen University, Guangzhou 510275, China
  • 3. Key Laboratory of Water Cycle and Water Security in Southern China of Guangdong High Education Institute, Sun Yat-sen University, Guangzhou 510275, China
  • 4. Department of Civil and Environmental Engineering, University of Connecticut, Storrs, CT 06269, USA
  • 5. I. M. Systems Group, Environmental Modeling Center, National Centers for Environmental Prediction, College Park, MD 20740, USA

    1.   Introduction
    • Soil moisture is a critical component of the climate system and serves as an integrative and unifying theme in physical geography (Legates et al., 2011). Evapotranspiration, near-surface atmospheric moisture availability and temperature, and planetary boundary layer instability are all influenced by soil moisture (Pal and Eltahir, 2001; Taylor et al., 2012; Ford and Quiring, 2014; Stéfanon et al., 2014; Wang et al., 2017; Liao et al., 2018). In situ soil moisture has been widely used to investigate the impacts of soil moisture on climate on a variety of timescales (Dirmeyer, 2011; Seneviratne et al., 2013), to validate land surface models (LSMs; Balsamo et al., 2009; Xia et al., 2014) as well as satellite soil moisture products (Gruhier et al., 2010; Collow et al., 2012; Parrens et al., 2012; Holgate et al., 2016), and to support drought monitoring and early warning (Tang and Piechota, 2009; Wang et al., 2011; Pozzi et al., 2013). Despite the importance of soil moisture in various fields, little has been done to collect and harmonize in situ soil moisture measured by different networks. The International Soil Moisture Network (ISMN) was initiated in 2009 to assemble and homogenize in situ soil moisture measurements from operational networks and validation campaigns on a global scale (Dorigo et al., 2011). In total, the ISMN makes available 55 networks containing over 2200 stations worldwide (https://ismn.geo.tuwien.ac.at/en/). In addition, the North American Soil Moisture Database (NASMD) was initiated in 2011 to provide harmonized and quality-controlled in situ soil moisture data for users with various purposes (Quiring et al., 2016). Unlike the ISMN, the NASMD is established primarily for North America and has approximately twice as many stations in North America as the ISMN.

      Several quality control (QC) procedures have been applied to NASMD (Quiring et al., 2016). They flag soil moisture measurements that fall outside the range of 0 to 0.6 m³ m⁻³, record the same value over 10 days, or deviate by more than three times the standard deviation during a 30-day period. However, these procedures do not address all sources of error, an important one of which is related to complications in frozen soil. Sensors other than those based on the cosmic-ray method cannot accurately measure the soil moisture content when the soil is partially or completely frozen. In addition, since the dielectric constant of ice is significantly lower than that of liquid water, in situ sensors in frozen soil systematically underestimate the soil water content (Hallikainen et al., 1985). Nevertheless, quality checks based on the frozen/unfrozen soil status were not conducted when NASMD was developed. Furthermore, a single maximum threshold, such as 0.6 m³ m⁻³, may be inadequate for robust soil moisture QC, because the soil porosity differs substantially among the observational stations in North America (Xia et al., 2015c).
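      The three statistical checks described above can be illustrated with a minimal sketch. It is not the NASMD implementation: the function name `statistical_qc_flags`, the reading of "the same value over 10 days" as a run of more than 10 identical daily values, and the use of a trailing 30-day window for the mean and standard deviation are all assumptions made for illustration. The series is assumed to be daily, in m³ m⁻³, with no missing values.

```python
from statistics import mean, pstdev


def statistical_qc_flags(values, lo=0.0, hi=0.6, flat_days=10,
                         window=30, n_sigma=3.0):
    """Return one set of flag names per daily value (hypothetical helper)."""
    flags = [set() for _ in values]

    # Range check: values outside [lo, hi] m3 m-3.
    for i, v in enumerate(values):
        if not (lo <= v <= hi):
            flags[i].add("range")

    # Flatline check (assumed reading): a run of more than `flat_days`
    # identical consecutive values flags every member of the run.
    start = 0
    for i in range(1, len(values) + 1):
        if i == len(values) or values[i] != values[start]:
            if i - start > flat_days:
                for j in range(start, i):
                    flags[j].add("flatline")
            start = i

    # Spike check (assumed reading): deviation from the mean of the
    # preceding `window` days exceeds n_sigma standard deviations.
    for i in range(window, len(values)):
        ref = values[i - window:i]
        sd = pstdev(ref)
        if sd > 0 and abs(values[i] - mean(ref)) > n_sigma * sd:
            flags[i].add("spike")

    return flags
```

      None of these checks uses soil temperature or soil porosity, so a frozen-soil reading that lies within the 0–0.6 m³ m⁻³ range and varies smoothly would pass unflagged.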

      The North American Land Data Assimilation System (NLDAS) aims to generate spatially and temporally consistent, high-quality LSM forcing datasets from the best available observations, and to provide multi-model output in support of modeling activities. After the observation-based atmospheric forcing, parameters, and LSM code were upgraded, NLDAS phase 2 (NLDAS-2) produced a soil temperature dataset that agrees better with observations over much of the conterminous United States than that of NLDAS phase 1 (Xia et al., 2012a, b, 2013). With the QC procedures of Dorigo et al. (2013), soil moisture measurements in the ISMN are quality-controlled 1) based on the geophysical dynamic range and consistency of the observations, and 2) by detecting outliers, breaks, saturation of the signal, and unresponsive sensors based on the spectrum of the soil moisture time series. Analogous to the algorithm that Dorigo et al. (2013) developed to flag spurious soil moisture observations in the ISMN, Xia et al. (2015c) proposed an automated QC methodology to identify spurious and erroneous soil moisture observations at 421 NASMD stations (approximately 23% of all the NASMD stations) based on the NLDAS-2 Noah soil temperature and soil porosity. With the QC procedures of Xia et al. (2015c), soil moisture measurements are flagged if 1) the soil moisture content does not fall within the geophysical range of 0–0.6 m³ m⁻³, 2) the value in the shallow soil layers (< 60 cm) exceeds the corresponding Noah soil porosity, or 3) the corresponding soil temperature is below zero. The results suggest that the QC approach processes the in situ soil moisture in NASMD efficiently, and that the quality-controlled data indeed have a positive influence on the assessment of the simulated soil moisture in the four NLDAS-2 LSMs [i.e., Mosaic, Noah, Sacramento (SAC), and Variable Infiltration Capacity (VIC)]. However, Kishné et al. (2017) and Morgan et al. (2017) recently pointed out that the soil hydraulic parameters used in the Noah model are out of date, and that the vertically homogeneous soil properties applied in the Noah model could lead to significant biases in model-simulated fluxes. Therefore, using the same soil porosity value for different soil layers in the QC method of Xia et al. (2015c) may reduce the effectiveness of the QC. To address these concerns, this study builds on prior experience in soil moisture QC and applies an updated automated QC procedure to all NASMD stations to provide a quality-controlled in situ soil moisture dataset that serves a diverse user community.
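      The three flagging conditions of Xia et al. (2015c) summarized above can be written as a short sketch. The function `qc_flags` and its argument names are hypothetical, the flag labels GR, SP, and ST0 are borrowed from Table 3, and "below zero" is assumed to mean below 0°C; the code is an illustration of the stated rules rather than the authors' implementation.

```python
def qc_flags(theta, depth_cm, porosity, soil_temp_c,
             gr_range=(0.0, 0.6), sp_max_depth_cm=60.0):
    """Return the list of checks failed by one soil moisture value.

    theta        soil moisture content (m3 m-3)
    depth_cm     measurement depth (cm)
    porosity     soil porosity for this depth (m3 m-3), or None if unknown
    soil_temp_c  soil temperature for this depth (deg C, assumed unit),
                 or None if unavailable
    """
    flags = []
    lo, hi = gr_range

    # GR check: value outside the geophysical range.
    if not (lo <= theta <= hi):
        flags.append("GR")

    # SP check: value in a shallow layer (< 60 cm) exceeds the porosity.
    if (depth_cm < sp_max_depth_cm and porosity is not None
            and theta > porosity):
        flags.append("SP")

    # ST0 check: measurement taken while the soil is frozen.
    if soil_temp_c is not None and soil_temp_c < 0.0:
        flags.append("ST0")

    return flags
```

      For example, a 5-cm reading of 0.50 m³ m⁻³ in a loam (porosity 0.439 m³ m⁻³ in Table 2) at 4°C would pass the GR check but fail the SP check, which is the case that a single 0.6 m³ m⁻³ threshold cannot catch.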

      The rest of the paper is organized as follows. Section 2 describes the NASMD, NLDAS-2, and the datasets used for evaluating and generating the QC procedures, followed by the experimental design and the evaluation of the QC procedures. The results of evaluating the various QC processes against the benchmark data are shown in Section 3, and the application of QC to NASMD is described in Section 4, followed by the summary and discussion in Section 5.

    2.   Data, experimental design, and evaluation process
    • The data used in this study include the daily in situ soil moisture observations and soil properties provided by NASMD, and the simulated daily soil temperature and soil porosity from the NLDAS-2 Noah LSM. NASMD provides a harmonized, dense, and quality-controlled soil moisture dataset for North America (Quiring et al., 2016). It has integrated data from more than 32 observation networks comprising over 1800 observation stations in the United States and Canada. However, only 1290 observation stations from 22 networks are currently available for download from the NASMD website (http://nationalsoilmoisture.com/About.html). Figure 1a shows the locations of the 1290 observation stations included in NASMD, and Table 1 lists the measurement information of the 22 networks, i.e., the number of stations, record periods, and measurement depths. The number of stations varies from 1 in the Central Plains Experimental Range (CPER) to 352 in Snow Telemetry (SNOTEL). Because each network tends to address a unique objective, measurement techniques and depths vary among networks. Metadata, including the location, country, state, observation network, measurement depths, percent sand/silt/clay, soil texture, and bulk density at each depth, have been reported for each station in NASMD. However, in situ observed soil texture data are available for only approximately 69% of the stations included in NASMD (Quiring et al., 2016). Soil texture information for the remaining stations is estimated from the Natural Resources Conservation Service (NRCS) Soil Survey Geographic Database (SSURGO).


      NLDAS-2 has generated land surface water/energy products from four LSMs (i.e., Mosaic, Noah, SAC, and VIC). The NLDAS-2 Noah soil temperatures are simulated at the middle of the four Noah soil layers (5 cm for the 0–10-cm layer, 25 cm for the 10–40-cm layer, 70 cm for the 40–100-cm layer, and 150 cm for the 100–200-cm layer) (Xia et al., 2015a). Mosaic and SAC do not simulate soil temperature, whereas VIC typically has three soil layers whose depths vary from grid to grid. The Noah simulated soil temperatures are chosen in this study because the number of layers in the Noah model is the largest among the four models, and the depths are uniform over the NLDAS domain (Xia et al., 2014). The NLDAS-2 Noah model is forced by meteorological data that integrate the North American Regional Reanalysis dataset with gauge observations. The NLDAS domain (25°–53°N, 67°–125°W) with a 0.125° grid covers the central United States, southern Canada, and northern Mexico. Simulations span from 1979 to the present with an hourly temporal resolution. The NLDAS-2 Noah soil temperature has been validated against in situ observations and agrees reasonably well with the observed soil temperature for different soil layers and on different temporal scales (Xia et al., 2013). Furthermore, Xia et al. (2015b) found that the NLDAS-2 soil texture matches well with observed soil texture. Therefore, the soil temperature simulated from the NLDAS-2 Noah model will be used to flag spurious soil moisture observations recorded under frozen conditions, and the Noah soil porosity data supplemented with the vertical soil porosity data from selected/corresponding stations will be used to flag those exceeding the soil porosity.
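      The domain and grid spacing quoted above are enough to sketch how a station coordinate could be mapped to a grid cell. The following is a minimal illustration, not the NLDAS-2 software itself: the function names are hypothetical, and it assumes that cell (0, 0) has its south-west corner at 25°N, 125°W, so that the nearest cell centre to a point is the centre of the cell containing it.

```python
import math

# Domain bounds and grid spacing as stated in the text.
LAT_MIN, LAT_MAX = 25.0, 53.0
LON_MIN, LON_MAX = -125.0, -67.0
RES = 0.125  # degrees; gives 224 rows x 464 columns over this domain


def nearest_cell(lat, lon):
    """Return (row, col) of the cell containing the point, or None if outside.

    Assumption (not from the paper): rows count northward from 25N and
    columns count eastward from 125W.
    """
    if not (LAT_MIN <= lat < LAT_MAX and LON_MIN <= lon < LON_MAX):
        return None
    row = int(math.floor((lat - LAT_MIN) / RES))
    col = int(math.floor((lon - LON_MIN) / RES))
    return row, col


def cell_centre(row, col):
    """Latitude and longitude of the centre of a grid cell."""
    return LAT_MIN + (row + 0.5) * RES, LON_MIN + (col + 0.5) * RES
```

      A station returning `None` here would fall outside the modeled domain and could not be paired with simulated soil temperature.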

      For the in situ soil moisture observations from 78 Soil Climate Analysis Network (SCAN) stations (Fig. 1b), extensive quality control procedures including a visual inspection and automatic detection of spurious observations were conducted by Liu et al. (2011). Problematic observations, such as data beyond the physically reasonable range, data showing discontinuity without physical causes, data with inconsistency due to sensor-related changes, and data for frozen soil, were all excluded from the dataset. This manually quality-controlled dataset has previously been used as the ground truth to evaluate the accuracy of soil moisture simulations from LSMs (Cai et al., 2014; Kumar et al., 2014a, 2016; Xia et al., 2014; De Lannoy and Reichle, 2016), and is used in this study as the benchmark to assess the performance of the automated QC methods.

    • To exclude spurious and erroneous soil moisture observations from the records, the most direct approach is to manually inspect the observed records one by one. However, it is not practical to apply the visual inspection to datasets containing large volumes of records. The more feasible solution is to develop a methodology to automatically identify and flag spurious observations. Xia et al. (2015c) developed an automatic methodology to flag spurious and erroneous soil moisture observations in 421 NASMD stations by using the NLDAS-2 Noah soil temperature and soil porosity. In this study we extend their methodology, with modifications, to all the available NASMD stations (1290 stations in total on the NASMD website). Rules of the QC algorithm are delineated as follows:

      (1) Ensuring that measurements are within a geophysical range (GR) between 0.0 and 0.6 m3 m–3 (GR check). If a soil moisture measurement is less than the minimum threshold 0.0 m3 m–3 or larger than the maximum threshold 0.6 m3 m–3, it is flagged as a spurious observation.

      (2) Ensuring that measurements do not exceed the in situ soil porosity (SP). Based on the soil texture recorded in NASMD for each depth at which the soil moisture is measured, we estimate the soil porosity at each depth of each station from the porosity table (Table 2) used in the Noah LSM. If a soil moisture measurement is larger than the soil porosity at the corresponding measurement level and site, it is flagged as a spurious observation.

      | Soil texture | Porosity value (m3 m–3) |
      |---|---|
      | Sand | 0.395 |
      | Loamy sand | 0.421 |
      | Sandy loam | 0.434 |
      | Silt loam | 0.476 |
      | Silt | 0.476 |
      | Loam | 0.439 |
      | Sandy clay loam | 0.404 |
      | Silty clay loam | 0.464 |
      | Clay loam | 0.465 |
      | Sandy clay | 0.406 |
      | Silty clay | 0.468 |
      | Clay | 0.457 |
      | Organic materials | 0.464 |
      | Bedrock | 0.200 |

      Table 2.  The porosity table used in the Noah land surface model

      (3) Ensuring that measurements are not taken in frozen soils (ST0). Based on the NLDAS-2 Noah soil temperature simulated at the four soil layers (i.e., 5, 25, 70, and 150 cm), we generate the daily soil temperature series for each station according to its measurement site by using the nearest-neighbor method. However, depths at which the soil moisture is measured are network-dependent in NASMD (Table 1). Therefore, for each given station, we linearly interpolate the modeled soil temperature to depths of the soil moisture measurement. A soil moisture measurement is flagged as a spurious observation if the corresponding soil temperature drops below 0°C.
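      The three rules above can be sketched for a single daily value as follows. This is an illustrative sketch rather than the authors' code: the function names are hypothetical, the porosity values are copied from Table 2, and sensor depths shallower than 5 cm or deeper than 150 cm are assumed to take the nearest Noah layer temperature, since the text does not describe extrapolation.

```python
# Porosity values (m3 m-3) copied from Table 2.
POROSITY = {
    "sand": 0.395, "loamy sand": 0.421, "sandy loam": 0.434,
    "silt loam": 0.476, "silt": 0.476, "loam": 0.439,
    "sandy clay loam": 0.404, "silty clay loam": 0.464,
    "clay loam": 0.465, "sandy clay": 0.406, "silty clay": 0.468,
    "clay": 0.457, "organic materials": 0.464, "bedrock": 0.200,
}

# Mid-points (cm) of the four Noah soil layers.
NOAH_DEPTHS_CM = (5.0, 25.0, 70.0, 150.0)


def interp_temperature(depth_cm, temps_c):
    """Linearly interpolate the four Noah temperatures to a sensor depth.

    Assumption: outside 5-150 cm the nearest layer value is used.
    """
    d = NOAH_DEPTHS_CM
    if depth_cm <= d[0]:
        return temps_c[0]
    if depth_cm >= d[-1]:
        return temps_c[-1]
    for k in range(len(d) - 1):
        if d[k] <= depth_cm <= d[k + 1]:
            w = (depth_cm - d[k]) / (d[k + 1] - d[k])
            return temps_c[k] + w * (temps_c[k + 1] - temps_c[k])


def qc_flags(sm, texture, depth_cm, noah_temps_c):
    """Return the set of checks ('GR', 'SP', 'ST0') failed by one value."""
    flags = set()
    if sm < 0.0 or sm > 0.6:                              # GR check
        flags.add("GR")
    if sm > POROSITY[texture.lower()]:                    # SP check
        flags.add("SP")
    if interp_temperature(depth_cm, noah_temps_c) < 0.0:  # ST0 check
        flags.add("ST0")
    return flags
```

      For example, `qc_flags(0.50, "sand", 10, (2.0, 4.0, 6.0, 8.0))` fails only the SP check, because 0.50 m3 m–3 is inside the geophysical range but above the 0.395 m3 m–3 porosity of sand, and the interpolated 10-cm temperature is above freezing.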

      The above methodology is similar to that of Xia et al. (2015c), but with several differences. First, Xia et al. (2015c) used 12 SCAN stations, whereas we use the 78 SCAN stations that pass the manual inspection to assess the performance of the aforementioned QC approach including GR, SP, and ST0 checks (the first experiment, hereafter called Exp1; see Table 3). Then we repeat these QC procedures for all the stations in NASMD. Second, Xia et al. (2015c) assumed that the soil porosity was vertically uniform in the shallow (< 60 cm) soil layers for each station, while we estimate the soil porosity for each specific soil layer at a given station based on the corresponding soil texture recorded in NASMD. To diagnose the effect of soil porosity on quality-controlled data, we conduct a sensitivity analysis (the second experiment, referred to as Exp2 hereafter) in which the SP check is removed from Exp1. In other words, only GR and ST0 checks are implemented in Exp2. Furthermore, to test the effect of soil texture datasets (vertically uniform in NLDAS-2 versus inhomogeneous in NASMD) on the quality-controlled data, another sensitivity analysis (the third experiment, referred to as Exp3 hereafter) is added. The NLDAS-2 Noah soil porosity is interpolated to each station by using the nearest-neighbor method and then applied to all the soil depths at each station from the various networks in the SP check of Exp3.

      | Exp name | QC name | QC criterion |
      |---|---|---|
      | Exp1 | GR check | Measurements are within a geophysical range between 0.0 and 0.6 m3 m–3. |
      | Exp1 | SP check | Measurements do not exceed the in situ soil porosity based on the soil texture recorded for each depth in NASMD. |
      | Exp1 | ST0 check | Measurements are not taken in frozen soils based on the NLDAS-2 Noah soil temperature. |
      | Exp2 | GR check | Measurements are within a geophysical range between 0.0 and 0.6 m3 m–3. |
      | Exp2 | SP check | / |
      | Exp2 | ST0 check | Measurements are not taken in frozen soils based on the NLDAS-2 Noah soil temperature. |
      | Exp3 | GR check | Measurements are within a geophysical range between 0.0 and 0.6 m3 m–3. |
      | Exp3 | SP check | Measurements do not exceed the porosity based on the NLDAS-2 Noah soil porosity. |
      | Exp3 | ST0 check | Measurements are not taken in frozen soils based on the NLDAS-2 Noah soil temperature. |

      Table 3.  Design of the three sensitivity experiments
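      The experiment design in Table 3 amounts to switching the SP check off (Exp2) or changing its porosity source (Exp3). A compact way to express this is sketched below; the dictionary layout, the labels `nasmd_texture` and `noah_porosity`, and the function name are hypothetical and only illustrate the logic.

```python
# Which checks each experiment applies; the SP entry names the porosity source.
EXPERIMENTS = {
    "Exp1": {"GR": True, "SP": "nasmd_texture", "ST0": True},
    "Exp2": {"GR": True, "SP": None, "ST0": True},
    "Exp3": {"GR": True, "SP": "noah_porosity", "ST0": True},
}


def is_flagged(exp, sm, soil_temp_c, porosity_nasmd, porosity_noah):
    """Return True if a daily value is flagged under the given experiment.

    porosity_nasmd: porosity from the in situ texture at the sensor depth.
    porosity_noah:  NLDAS-2 Noah porosity at the nearest grid cell.
    """
    cfg = EXPERIMENTS[exp]
    if cfg["GR"] and not (0.0 <= sm <= 0.6):
        return True
    if cfg["SP"] == "nasmd_texture" and sm > porosity_nasmd:
        return True
    if cfg["SP"] == "noah_porosity" and sm > porosity_noah:
        return True
    if cfg["ST0"] and soil_temp_c < 0.0:
        return True
    return False
```

      With a value of 0.45 m3 m–3 in unfrozen soil, an in situ porosity of 0.434, and a Noah porosity of 0.476, only Exp1 flags the observation, which is the kind of difference the sensitivity experiments are designed to expose.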

    • To evaluate how well the QC method performs, a direct comparison between the soil moisture observations before and after the QC and benchmark data at each individual site is conducted for the 78 SCAN stations. Evaluation metrics used in this study include the bias, mean absolute error (MAE), root-mean-square error (RMSE), anomaly correlation (AC), and Taylor skill score (S). The analysis period is from 2000 to 2012, and the observed soil moisture anomaly is the temporal anomaly after removing the mean seasonal cycle. Soil moisture observations before and after QC are compared to benchmarks based on the bias, MAE, RMSE, AC, and S criterion (Taylor, 2001):

      $$ \mathrm{bias} = \frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{OBS}_i - \mathrm{BM}_i\right), $$ (1)
      $$ \mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\mathrm{OBS}_i - \mathrm{BM}_i\right|, $$ (2)
      $$ \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{OBS}_i - \mathrm{BM}_i\right)^2}, $$ (3)
      $$ \mathrm{AC} = \frac{\frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{OBS}_{\mathrm{A}_i} - \overline{\mathrm{OBS}}_{\mathrm{A}}\right)\left(\mathrm{BM}_{\mathrm{A}_i} - \overline{\mathrm{BM}}_{\mathrm{A}}\right)}{\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{OBS}_{\mathrm{A}_i} - \overline{\mathrm{OBS}}_{\mathrm{A}}\right)^2}\,\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{BM}_{\mathrm{A}_i} - \overline{\mathrm{BM}}_{\mathrm{A}}\right)^2}}, $$ (4)
      $$ S = \frac{4\left(1 + R\right)}{\left(\sigma + \dfrac{1}{\sigma}\right)^2\left(1 + R_0\right)}, $$ (5)

      where $N$ is the total number of days of the soil moisture observation, $\mathrm{OBS}_i$ ($\mathrm{OBS}_{\mathrm{A}_i}$) is the observed soil moisture (anomaly), $\overline{\mathrm{OBS}}_{\mathrm{A}}$ is the temporal average of observed soil moisture anomaly, $\mathrm{BM}_i$ ($\mathrm{BM}_{\mathrm{A}_i}$) is the benchmark (anomaly), $\overline{\mathrm{BM}}_{\mathrm{A}}$ is the temporal average of benchmark anomaly, $R$ is the correlation coefficient between the observed soil moisture and benchmark, $R_0$ is the maximum correlation coefficient in theory (assumed to be 1), and $\sigma$ is the standard deviation of observed soil moisture normalized by the standard deviation of benchmarks.

      Overall, the bias, MAE, and RMSE are evaluated based on absolute values, whereas AC and S are based on the anomalies. Bias is used to evaluate the systematic error in quality-controlled data, whereas MAE and RMSE are used to assess its overall error. AC is used to evaluate the capacity of quality-controlled data to capture the daily variability of benchmark data, whereas S is used to evaluate its capacity to capture the seasonal variance of benchmark data. If the variability and variance of quality-controlled data approach those of benchmarks, AC and S approach 1 (a perfect score). Otherwise, AC and S decrease toward zero, and AC can even be below zero (no skill). In addition, the significance of the differences in these evaluation metrics between soil moisture observations after and before QC is tested by using a two-sample t test at the 95% confidence level.
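      Equations (1)–(5) translate directly into a few lines of code. The sketch below uses only the Python standard library; the function names are hypothetical, the inputs are assumed to be equal-length sequences of paired daily values with flagged days already removed, and the grouping key used to remove the seasonal cycle (for example the day of year) is an assumption because the text does not specify it.

```python
import math


def _mean(x):
    return sum(x) / len(x)


def _std(x):
    m = _mean(x)
    return math.sqrt(_mean([(v - m) ** 2 for v in x]))


def bias(obs, bm):                                   # Eq. (1)
    return _mean([o - b for o, b in zip(obs, bm)])


def mae(obs, bm):                                    # Eq. (2)
    return _mean([abs(o - b) for o, b in zip(obs, bm)])


def rmse(obs, bm):                                   # Eq. (3)
    return math.sqrt(_mean([(o - b) ** 2 for o, b in zip(obs, bm)]))


def correlation(x, y):
    """Pearson correlation; undefined (division by zero) for constant series."""
    mx, my = _mean(x), _mean(y)
    cov = _mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (_std(x) * _std(y))


def remove_seasonal_cycle(values, season_keys):
    """Subtract the mean of all values sharing a key (e.g. day of year)."""
    sums, counts = {}, {}
    for v, k in zip(values, season_keys):
        sums[k] = sums.get(k, 0.0) + v
        counts[k] = counts.get(k, 0) + 1
    return [v - sums[k] / counts[k] for v, k in zip(values, season_keys)]


def anomaly_correlation(obs_anom, bm_anom):          # Eq. (4)
    return correlation(obs_anom, bm_anom)


def taylor_skill(obs, bm, r0=1.0):                   # Eq. (5)
    r = correlation(obs, bm)
    sigma = _std(obs) / _std(bm)
    return 4.0 * (1.0 + r) / ((sigma + 1.0 / sigma) ** 2 * (1.0 + r0))
```

      A series that differs from the benchmark by a constant offset has a nonzero bias but AC and S equal to 1, which illustrates why the absolute-error metrics and the anomaly-based metrics are reported separately.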

    3.   Evaluation of various QC processes based on the benchmarks for 78 SCAN stations
    • Table 4 summarizes the overall percentages of flag occurrences at the five measured soil layers for the 78 SCAN stations. Although the geophysical range of soil moisture (between 0.0 and 0.6 m3 m–3) has already been used as a rule in the QC procedure in the NASMD development (Quiring et al., 2016), we still detect some observations that fall outside the range. Percentages of the observations flagged as beyond the geophysical range are 1.61%, 0.32%, 0.07%, 0.32%, and 0.29% at the 5-, 10-, 20-, 50-, and 100-cm soil layer, respectively. To investigate this in more detail, we analyze the NASMD observations (before the QC procedure of this study is applied, referred to as raw observations hereafter) for the 78 SCAN stations, and find that the flagged measurements mainly come from stations ND_2020 and MS_2110, which are located in North Dakota and Mississippi, respectively. Approximately 80% of the soil moisture measurements at the 5-cm soil layer at station ND_2020 fall outside the geophysical range, whereas the percentage varies from 9.57% to 30.27% at the five measured soil layers at station MS_2110. The result suggests that the GR check for NASMD is still necessary.

      Percentage of flag occurrences (%) by soil layer:

      | Exp name | QC criterion | 5 cm | 10 cm | 20 cm | 50 cm | 100 cm |
      |---|---|---|---|---|---|---|
      | Exp1* | GR check | 1.61 | 0.32 | 0.07 | 0.32 | 0.29 |
      | Exp1* | SP check | 3.17 | 2.69 | 3.68 | 6.33 | 12.55 |
      | Exp1* | ST0 check | 11.17 | 10.64 | 9.58 | 8.73 | 8.00 |
      | Exp2* | GR check | 1.61 | 0.32 | 0.07 | 0.32 | 0.29 |
      | Exp2* | SP check | / | / | / | / | / |
      | Exp2* | ST0 check | 11.31 | 10.71 | 9.67 | 8.99 | 8.10 |
      | Exp3* | GR check | 1.61 | 0.32 | 0.07 | 0.32 | 0.29 |
      | Exp3* | SP check | 2.85 | 2.46 | 3.78 | 6.47 | 12.24 |
      | Exp3* | ST0 check | 11.17 | 10.64 | 9.59 | 8.76 | 8.00 |

      Notes: *Exp1 uses the geophysical range, NLDAS-2 Noah soil temperature, and NASMD soil texture; Exp2 only uses the geophysical range and NLDAS-2 Noah soil temperature; Exp3 uses the geophysical range, NLDAS-2 Noah soil temperature, and NLDAS-2 Noah soil porosity.

      Table 4.  Percentages of flag occurrences at the five measured soil layers for the 78 SCAN stations from the three experiments

      For the SP check in Exp1, percentages of the observations flagged as exceeding the in situ soil porosity are 3.17%, 2.69%, 3.68%, 6.33%, and 12.55% at the 5-, 10-, 20-, 50-, and 100-cm soil layer, respectively. However, percentages of the flag occurrences at the first four soil layers are larger than those reported by Xia et al. (2015c) (0.20%, 0.40%, 0.20%, 0.80%, and 9.06% at the 5-, 10-, 20-, 50-, and 100-cm soil layer, respectively). One of the reasons for this difference is that stations with a significant portion of the flagged records in this study are not included in Xia et al. (2015c). Most of the flagged observation records come from the SCAN stations located in Mississippi and Missouri, whereas no station from these two states is included in Xia et al. (2015c). Percentages of the flag occurrences reach up to 50% at some SCAN stations, e.g., stations MS_2032, MS_2110, and MS_2064.

      For the ST0 check in Exp1, percentages of the observations flagged for the 78 SCAN stations are 11.17%, 10.64%, 9.58%, 8.73%, and 8.00% at the 5-, 10-, 20-, 50-, and 100-cm soil layer, respectively. Percentages of the flag occurrences monotonically decrease as the soil depth increases, which indicates that the impacts of frozen soils on soil moisture observations are reduced at deeper soil layers because soil at greater depth freezes less frequently. In addition, the flagged measurements due to the ST0 check mainly come from the SCAN stations in the western United States, e.g., Montana, North Dakota, Utah, and Kansas. Because of the high altitude in the western mountainous area, soil freezing occurs more frequently and hence influences more observations. Overall, the flagging process in Exp1 removes about 15.81% of measurements at the 78 SCAN stations.
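      Because one observation can fail more than one check, the overall fraction removed is the share of values failing at least one check, not the sum of the per-check percentages. In particular, every value above 0.6 m3 m–3 also exceeds every porosity in Table 2. The sketch below shows this bookkeeping; the function name and the input layout (one set of failed checks per observation) are hypothetical.

```python
def flag_percentages(flag_sets):
    """Per-check and overall flag percentages.

    flag_sets: list with one set of failed check names per observation;
    an empty set means the observation passes every check.
    """
    n = len(flag_sets)
    pct = {
        name: 100.0 * sum(name in f for f in flag_sets) / n
        for name in ("GR", "SP", "ST0")
    }
    # An observation is removed if it fails at least one check.
    pct["any"] = 100.0 * sum(bool(f) for f in flag_sets) / n
    return pct


# Four toy observations: one fails GR and SP together, one fails ST0.
example = [set(), {"GR", "SP"}, {"ST0"}, set()]
result = flag_percentages(example)
```

      In this toy case each check flags 25% of the values, yet only 50% of the values are removed in total.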

    • Daily AC at the five measured soil layers for the 78 SCAN stations is calculated by using soil moisture observations before and after QC against benchmarks for the period 2000–12. Stations located in Colorado (i.e., CO_2017), Mississippi (i.e., MS_2024 and MS_2025), and Alabama (i.e., AL_2056, AL_2057, and AL_2059) show considerably low AC values (less than 0.5) for the 5-cm soil moisture. By contrast, most of the other stations show good performance with AC values up to 0.9 at the five measured soil layers before the QC procedure, and further improvement is derived from QC in Exp1. Figure 2 shows the spatial distribution of the AC difference between quality-controlled soil moisture observations and raw observations at the five measurement depths for the 78 SCAN stations. Some decrease of AC values, albeit small, is detected at several stations. Instead of reflecting potential weakness of the automatic QC implemented in this study, it may well indicate that even the manual QC may not be able to capture all spurious data, as the rules implemented here are rather fundamental to the physical realism of soil moisture. At most stations, AC values tend to increase for the quality-controlled data, especially for the stations located in Montana, Utah, Missouri, and Mississippi, where the AC difference is significant at the 95% confidence level at the 10-, 20-, and 50-cm soil depths.

      Figure 2.  Spatial distributions of the anomaly correlation (AC) difference between quality-controlled soil moisture observations from Exp1 and raw observations at (a–e) 5-, 10-, 20-, 50-, and 100-cm soil layers for the 78 SCAN stations. Solid circles represent the statistically significant difference at the 95% confidence level.

      The metric S also indicates an improvement in soil moisture observations after QC (Fig. 3). Numbers of stations with positive and negative differences are almost equal at the first four soil layers. However, negative differences of S values are relatively small and not significant. In contrast, positive differences of S values are larger, especially for the stations located in Missouri and Mississippi where the S difference is significant at the 95% confidence level at the first four soil layers.

      Figure 3.  As in Fig. 2, but for the Taylor skill score (S).

      To evaluate the observation error of soil moisture data before and after the QC, bias, MAE, and RMSE at the five measured soil layers are calculated for the 78 SCAN stations. For raw observations, the bias values show a considerable overestimation at the 5-cm soil layer at stations in North Dakota (because of falling outside the geophysical range), and at the 10-, 20-, 50-, and 100-cm depths in Mississippi. In contrast, the 5- and 10-cm soil moistures are underestimated at the stations in Alabama and Colorado. Bias of the observations generally becomes closer to zero after QC. Figure 4 shows the spatial distribution of the bias difference between quality-controlled soil moisture observations and raw observations at the five measured soil layers for the 78 SCAN stations. As a whole, the bias values consistently decrease at the five soil layers, especially for the stations located in Mississippi where the bias difference is significant at the 95% confidence level. MAE and RMSE values for quality-controlled soil moisture observations are also smaller than those for the NASMD dataset (figure omitted). Overall, the QC conducted in this study is effective in improving soil moisture observations for the 78 SCAN stations according to the metrics of AC, S, bias, MAE, and RMSE.

      Figure 4.  As in Fig. 2, but for the bias.

    • Figure 5 shows boxplots of the metrics of AC, S, bias, MAE, and RMSE at the five soil layers for all observational records of the 78 SCAN stations. Mean values of all metrics of soil moisture observations at all the five layers after the QC procedure are better than those for NASMD. AC (S) values increase by 0.017 (0.025), 0.005 (0.013), 0.008 (0.013), 0.005 (0.029), and 0.001 (0.019) at the 5-, 10-, 20-, 50-, and 100-cm soil layer, respectively, after the QC procedure of Exp1 is conducted. Mean values of bias (MAE, RMSE) decrease by 0.027 (0.026, 0.028), 0.003 (0.003, 0.003), 0.003 (0.003, 0.003), 0.006 (0.004, 0.005), and 0.009 (0.007, 0.007) m3 m–3 at the 5-, 10-, 20-, 50-, and 100-cm soil layer, respectively, when compared with NASMD. Additionally, ranges and numbers of outliers of the five metrics decrease at the five soil layers after the QC procedure. Overall, mean values of AC and S are closer to 1, whereas bias, MAE, and RMSE are closer to 0 after the QC procedure for the 78 SCAN stations.

      Figure 5.  Boxplots of the metrics of (a) anomaly correlation (AC), (b) Taylor skill score (S), (c) bias, (d) mean absolute error (MAE), and (e) root-mean-square error (RMSE) at the five soil measurement layers for the 78 SCAN stations. The filled circles represent mean values for the 78 SCAN stations. Black represents the calculation based on raw observations, red based on quality-controlled data from Exp1, blue based on quality-controlled data from Exp2, and green based on quality-controlled data from Exp3.

      To evaluate the seasonal cycle and interannual variability of the quality-controlled soil moisture observations at different depths, quality-controlled data are averaged over all the 78 SCAN stations. The monthly mean soil moisture and its climatology from the benchmarks, raw observations, and quality-controlled observations for the 5-, 10-, and 20-cm soil layers are shown in Fig. 6. Comparisons for the 50- and 100-cm soil layers are excluded because few SCAN stations have the benchmark data at either of these two depths. The results show that raw observations cannot capture the peak of benchmarks at different soil depths, especially at the 5-cm soil depth. However, quality-controlled data closely follow the monthly mean benchmarks for the top three soil depths. For the climatological seasonal cycle, quality-controlled data are much closer to benchmarks, especially in winter and early spring as the QC procedure has flagged unrealistic soil moisture observations caused by soil freezing when compared with raw observations.

      Figure 6.  (a, c, e) The multi-year (2000–12) monthly variations and (b, d, f) monthly means of the soil moisture (m3 m–3) at the 5- (a, b), 10- (c, d), and 20-cm (e, f) soil depths.
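      The monthly series and climatological cycle compared in Fig. 6 can be derived from daily values along the following lines. This is a hedged sketch: the function names and the record layout are hypothetical, flagged or missing days are represented by `None`, and the climatology is assumed to be the mean of the monthly means for each calendar month, which the text does not state explicitly.

```python
from collections import defaultdict


def monthly_means(records):
    """Average daily values per (year, month), skipping flagged days.

    records: iterable of (year, month, value) with value None when the
    observation is flagged or missing.
    """
    acc = defaultdict(list)
    for year, month, value in records:
        if value is not None:
            acc[(year, month)].append(value)
    return {ym: sum(v) / len(v) for ym, v in acc.items()}


def monthly_climatology(monthly):
    """Average the monthly means over all years for each calendar month."""
    acc = defaultdict(list)
    for (_year, month), mean in monthly.items():
        acc[month].append(mean)
    return {m: sum(v) / len(v) for m, v in acc.items()}


daily = [
    (2000, 1, 0.2), (2000, 1, 0.4), (2000, 1, None),  # one flagged day
    (2001, 1, 0.5), (2001, 2, 0.1),
]
series = monthly_means(daily)
cycle = monthly_climatology(series)
```

      Months in which every day is flagged simply drop out of the series, which is consistent with winter values disappearing from the quality-controlled curves at stations with frozen soil.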

      Figure 7 compares the benchmarks, raw observations, and quality-controlled observations at stations MS_2110, MS_2032, and MO_2061 at the 5-, 10-, and 20-cm soil layers. Most spurious and erroneous observations are flagged by GR and SP checks at stations MS_2110 and MS_2032. Taking the 5-cm soil layer at station MS_2110 as an example, values of bias, MAE, and RMSE decrease from 0.073, 0.084, and 0.109 to –0.001, 0.023, and 0.026, respectively, whereas values of AC and S increase from 0.865 and 0.617 to 0.946 and 0.959, respectively. As a result, the monthly averaged soil moisture becomes closer to benchmarks after the QC procedure for these stations. For station MO_2061, most of the spurious observations are flagged by SP and ST0 checks. The monthly averaged quality-controlled observations are much closer to benchmarks, especially in winter and early spring. Overall, these results further suggest that the QC approach is effective in detecting spurious and erroneous observations in NASMD.

      Figure 7.  The 13-yr (2000–12) monthly averaged soil moisture for the stations (a–c) MS_2110, (d–f) MS_2032, and (g–i) MO_2061 at the 5- (a, d, g), 10- (b, e, h), and 20-cm (c, f, i) soil depths.

    • With the SP check removed, percentages of the observations flagged due to the effect of frozen soils in Exp2 are slightly larger than those in Exp1 (Table 4). Mean values of AC and S for the 78 SCAN stations in Exp2 are consistently smaller than those in Exp1, whereas mean values of bias, MAE, and RMSE are consistently larger (Fig. 5). For stations where the soil moisture measurement is larger than soil porosity, e.g., MS_2110 and MS_2032 (Fig. 7), many spurious observations cannot be detected by either the GR or the ST0 check. As a result, the monthly averaged quality-controlled data are consistently larger than benchmarks at these stations. The quality-controlled data without the SP check are not as good as the data that go through the SP check.

      With the soil porosity estimated based on the in situ soil texture provided by NASMD replaced with the NLDAS-2 Noah soil porosity in the SP check, Exp3 still shows very similar results to Exp1. Percentages of the observations flagged as exceeding the soil porosity differ only slightly between Exp1 and Exp3 (Table 4): they are slightly larger in Exp1 at the 5-, 10-, and 100-cm soil layers and slightly smaller at the 20- and 50-cm soil layers. Interestingly, we notice that the percentage of flag occurrences at the 100-cm soil layer is still quite large in this test. The result suggests that for deep soil moisture measurements, there are still a large number of observations exceeding the soil porosity. The performance of Exp3 is as good as Exp1, even at deep soil layers. In addition, there are no obvious differences in the monthly mean soil moisture and climatologically averaged cycle between Exp1 and Exp3 at different soil depths for the 78 SCAN stations (Figs. 6, 7). Similar results from different soil texture datasets suggest that applying the Noah soil porosity throughout the whole soil column for all stations in Exp3, while not accurate, may be less problematic than expected. However, the soil texture does vary vertically throughout the soil column (0–200 cm). As a more detailed reflection, the mean absolute difference between the NLDAS-2 Noah soil porosity and estimated soil porosity averaged for the 78 SCAN stations is 0.018 m3 m–3, whereas the maximum difference (0.056 m3 m–3) is found at station UT_2132. Therefore, for better precision, in the QC procedure we use the soil porosity estimated from the in situ soil texture at each measured depth in NASMD for the SP check, as opposed to applying the 10-cm soil porosity from the Noah model to all layers throughout the whole soil column as done in Xia et al. (2015c).

    • The metrics presented above evaluate the effectiveness of the QC methodology against the benchmark data. Obviously, how accurate they are in reflecting the QC effectiveness depends on the accuracy of benchmark data. As indicated earlier, even the benchmark data may not be free of errors. To check this, the QC methodology was applied to the benchmark data as well. A small fraction of the benchmark data are indeed flagged, with a vast majority of the flagged data found in winter, which is related to uncertainties in the temperature data used; in summer, very few data points are flagged, and 62 out of the 78 benchmark stations are not flagged at all. To examine how potential errors in the benchmark data may influence the perceived effectiveness of the QC methodology and to eliminate uncertainties related to the quality of soil temperature data, the performance metrics are re-evaluated based on data from the 62 error-free stations during summer only. While quantitative differences are expected, qualitatively the results are the same as those based on all the benchmark data, and applying QC to NASMD data improves performance metrics. For example, for the 5-cm soil layer at station MS_2032, values of bias, MAE, and RMSE decrease from 0.055, 0.065, and 0.079 to 0.011, 0.029, and 0.034, respectively, whereas values of AC and S increase from 0.870 and 0.404 to 0.923 and 0.702, respectively. In comparison, for the same station and soil layer, when performance metrics are evaluated based on all the benchmark data (including the flagged data in winter), values of bias, MAE, and RMSE decrease from 0.076, 0.080, and 0.090 to 0.022, 0.033, and 0.038, respectively, whereas values of AC and S increase from 0.831 and 0.313 to 0.886 and 0.630, respectively. This indicates that the impact of potential errors in the benchmark data on the perceived effectiveness of the QC methodology is rather small.

    4.   Application of QC to NASMD
    • There are 1290 observation stations from 22 networks that are currently available for download from NASMD. However, 81 of these stations are located outside the NLDAS domain ranging from 25° to 53°N and 125° to 67°W, and 86 of these stations do not have soil moisture measurement records. Therefore, the automated QC approach is applied to the remaining 1123 NASMD stations. For COSMOS, since cosmic-ray techniques can measure the total soil moisture content in frozen soil, the ST0 check of the QC procedure is skipped. The spatial distribution of the percentage of flagging by GR, SP, and ST0 checks based on the 1123 NASMD stations for different soil depths is shown in Fig. 8. Stations flagged by the GR check are relatively few in number and are distributed randomly in space. Spurious measurements falling outside the geophysical range may be caused by erroneous readings or sensor failures, and the GR check can effectively remove these spurious values. In general, stations flagged by the SP check are mainly located in the eastern United States, especially for MAW-Missouri, ICN, and ECONET, whose soil types are silt loam or sandy loam and whose percentages of flagging by the SP check are 16.31%, 11.27%, and 10.30%, respectively. By contrast, stations flagged by the ST0 check are mainly located in the western and northern United States. Similar to the evaluation results for the 78 SCAN stations, data for the 1123 NASMD stations are more reliable in states with soils frozen less frequently than those in cold regions. As a result, higher percentages of flagged values occur at high altitudes and latitudes.

      Figure 8.  The spatial distribution of percentages of flag occurrences by the (a) GR check, (b) SP check, and (c) ST0 check based on all the checked stations in NASMD for different soil depths.
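      The station selection described above reduces to a domain test, a record-count test, and a network-specific exception for COSMOS. The sketch below encodes that logic; the function name and argument layout are hypothetical, and the treatment of stations lying exactly on the domain boundary is an assumption.

```python
def checks_for_station(network, lat, lon, n_records):
    """Return the QC checks to run for a station, or () if it is excluded.

    Stations outside the NLDAS domain (25-53N, 125-67W) or without soil
    moisture records are not processed; the ST0 check is skipped for COSMOS.
    """
    in_domain = 25.0 <= lat <= 53.0 and -125.0 <= lon <= -67.0
    if not in_domain or n_records == 0:
        return ()
    if network == "COSMOS":
        return ("GR", "SP")
    return ("GR", "SP", "ST0")


# Station counts quoted in the text.
total, outside_domain, no_records = 1290, 81, 86
checked = total - outside_domain - no_records
```

      The last line reproduces the 1123 checked stations reported in the text.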

      Table 5 presents the percentages of flag occurrences for different networks in NASMD. The number of available soil moisture measurements varies from 12,826 in the CHILI network to 2,529,487 in the SNOTEL network, depending on the years of record, number of stations available in each network, and number of the measured soil depths. Compared with SP and ST0 checks, the percentage of flag occurrences in the GR check is the lowest, varying from 0.00% in AARD, ARM, DEOS, and OM to 5.69% in SDAWN. The percentage of flag occurrences in the SP check varies from 0.00% in AARD to 16.31% in MAW-Missouri, whereas the percentage in the ST0 check varies from 0.00% in CHILI to 46.72% in SNOTEL. Less than 1% of the observations in AARD, AWDN, and SNOTEL are flagged by GR and SP checks together. In contrast, 39.68%, 24.54%, and 46.68% of the observations in these three networks are flagged by the ST0 check, respectively. For networks located in warmer climates, such as CHILI, ECONET, and WTM, the ST0 check flags less than 2% of the observations. Generally, including the GR, SP, and ST0 checks, percentages of the flagged observations in ARM, CHILI, OM, and WTM, which are minimally affected by frozen soils, are the smallest (< 5%). We compare the percentages of observations flagged in NASMD with results from Dorigo et al. (2013), who applied similar QC procedures to several soil moisture measurement networks including the ARM, ICN, SNOTEL, and SCAN over North America in ISMN. Percentages of the flag occurrences in GR and SP checks are comparable in these networks, whereas percentages of flag occurrences in the ST0 check of our study are larger than those in Dorigo et al. (2013). This difference may be caused by the different soil temperature datasets used [Global Land Data Assimilation System (GLDAS) Noah soil temperature used in Dorigo et al. (2013) versus NLDAS-2 Noah soil temperature used in this study] and different spatial resolutions (0.25° in GLDAS versus 0.125° in NLDAS-2). Overall, approximately 24.47% of the soil moisture observations in NASMD are removed by the QC procedure.

      Network name    No. of     No. of checked   No. of total          GR check   SP check   ST0 check
                      stations   stations         measurement records   (%)        (%)        (%)
      AARD            37         18               185,552               0.00       0.00       39.68
      AmeriFlux       57         48               268,650               0.87       2.91       13.73
      ARM             17         17               533,636               0.00       0.07       4.61
      AWDN            43         41               298,927               0.01       0.02       24.60
      CPER            1          0                /                     /          /          /
      CHILI           25         25               12,826                3.27       0.55       0.00
      CRN             113        113              298,784               1.03       6.75       10.19
      COSMOS          54         54               18,426                2.32       4.48       /
      DEOS            26         26               29,761                0.00       1.26       10.64
      ECONET          36         31               98,260                2.62       10.30      1.46
      ICN             19         19               345,127               0.37       11.27      12.12
      ISGMN           6          0                /                     /          /          /
      MAW-Missouri    8          8                18,462                5.42       16.31      13.17
      MAWN            80         80               410,192               5.02       6.94       21.32
      NOAA HMT        25         25               83,030                3.12       6.48       11.67
      OM              104        104              1,322,736             0.00       0.10       2.80
      SNOTEL          352        335              2,529,487             0.36       0.47       46.72
      SCAN            187        115              1,329,115             0.46       5.27       9.85
      Soilscape       6          0                /                     /          /          /
      SDAWN           11         11               79,511                5.69       6.05       29.47
      WERC            24         0                /                     /          /          /
      WTM             59         53               460,728               0.42       2.92       1.17

      Table 5.  Percentage of flag occurrences based on 22 networks of NASMD that provide measurements on a daily basis

      To further evaluate the effect of the QC procedure, we select six networks with 50 or more checked stations and compare the averaged soil moisture observations before and after the QC procedure. Figure 9 compares the averaged quality-controlled observations (with the QC procedure) and the averaged raw observations (without the QC procedure) at the top soil layer for CRN, MAWN, OM, SNOTEL, SCAN, and WTM. Compared with the raw observations, soil moisture contents in CRN, MAWN, and SCAN generally decrease after the QC check, whereas the effects of the QC procedure are relatively small in OM and WTM. For SNOTEL, whose stations are located in mountainous areas, most of the flags at the 5-cm soil depth are, as expected, due to the ST0 check, so soil moisture measurements during winter and early spring are removed by the QC procedure.

      Figure 9.  Comparison of averaged raw observations (black line) and quality-controlled observations (red line) at the top soil layer for (a) CRN, (b) MAWN, (c) OM, (d) SNOTEL, (e) SCAN, and (f) WTM. Each curve is a spatial average over the checked stations of the network, which range in number from 53 in WTM to 335 in SNOTEL.
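
      The before-and-after comparison described above can be illustrated with a minimal sketch, assuming that flagged values are stored as –9999 and are simply excluded from the station average on each day. The function name and the sample values below are hypothetical and are not taken from NASMD.

```python
MISSING = -9999.0  # flag value assumed for measurements removed by QC


def network_mean(station_values):
    """Average one day's soil moisture over stations, skipping flagged values.

    Returns None when every station is flagged on that day.
    """
    valid = [v for v in station_values if v != MISSING]
    if not valid:
        return None
    return sum(valid) / len(valid)


# Hypothetical day with three stations (m3 m-3); the third value is implausibly
# high and is replaced by the flag value after QC.
raw_day = [0.20, 0.30, 0.70]
qc_day = [0.20, 0.30, MISSING]

raw_mean = network_mean(raw_day)
qc_mean = network_mean(qc_day)
print(raw_mean, qc_mean)
```

      In this toy case the quality-controlled mean is lower than the raw mean because the removed value is unrealistically wet, which is the same direction of change reported for CRN, MAWN, and SCAN.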

      In addition, comparisons of the quality-controlled observations and the averaged raw observations at the shallow soil layers (≤ 50 cm) for CRN, MAWN, SNOTEL, and SCAN are shown in Fig. 10. Effects of the QC procedure on raw observations at the deeper soil layers are relatively small for the stations in CRN and SCAN, particularly during the warm season. However, the effects are substantial for the 10-cm soil water content in MAWN and the 50-cm soil water content in SNOTEL. To investigate why the averaged raw observations are so wet at the 50-cm soil layer in SNOTEL (Fig. 10f), we analyzed the original NASMD data and found extremely large values at several SNOTEL stations during certain periods. The quality-controlled averages decline because these erroneous measurements are removed by the GR check. In MAWN, the GR and SP checks together flag approximately 12% of the raw observations (Table 5), so the quality-controlled averages are lower (drier) than the raw averages (Figs. 9b, 10d). Generally, the comparison results suggest that the QC procedure can effectively detect and flag the spurious and erroneous soil moisture measurements in NASMD.

      Figure 10.  Comparison of averaged raw observations (black line) and quality-controlled observations (red line) at the shallow soil layers (≤ 50 cm) for (a–c) CRN, (d) MAWN, (e–f) SNOTEL, and (g–i) SCAN.

    • We have generated a quality-controlled daily in situ soil moisture database based on NASMD. The quality-controlled database includes 1123 observation stations from 18 networks that are located in the NLDAS domain and have soil moisture measurement records in the original NASMD. A soil moisture measurement is flagged as “–9999” if 1) the value falls outside the geophysical range of 0 to 0.6 m3 m–3, 2) the value exceeds the corresponding soil porosity estimated from the in situ soil texture, or 3) the corresponding soil temperature drops below 0°C. The quality-controlled database is available for free download at http://www.geosimulation.cn/QualityControlNASMD.html.
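
      The three flagging rules listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the processing code used to build the database: the function name `qc_flag` is hypothetical, soil temperature is assumed to have been converted to °C beforehand, and a check is assumed to be skipped when its ancillary value (porosity or soil temperature) is unavailable.

```python
from typing import Optional

MISSING = -9999.0           # flag value used in the quality-controlled database
GR_MIN, GR_MAX = 0.0, 0.6   # geophysical range of soil moisture (m3 m-3)


def qc_flag(theta: float,
            porosity: Optional[float],
            soil_temp_c: Optional[float]) -> float:
    """Return theta if it passes the GR, SP, and ST0 checks, else MISSING.

    theta        volumetric soil moisture (m3 m-3)
    porosity     soil porosity at the same depth (m3 m-3), or None if unknown
    soil_temp_c  soil temperature in deg C (assumed unit), or None if unknown
    """
    # GR check: value must lie inside the geophysical range.
    if theta < GR_MIN or theta > GR_MAX:
        return MISSING
    # SP check: value must not exceed the soil porosity (skipped if unknown).
    if porosity is not None and theta > porosity:
        return MISSING
    # ST0 check: remove measurements taken when the soil is below freezing.
    if soil_temp_c is not None and soil_temp_c < 0.0:
        return MISSING
    return theta


# Hypothetical measurements: (soil moisture, porosity, soil temperature)
samples = [(0.25, 0.45, 5.0), (0.70, 0.45, 5.0), (0.50, 0.45, 5.0), (0.30, 0.45, -1.0)]
flagged = [qc_flag(*s) for s in samples]
print(flagged)
```

      Applying the checks in a fixed order means each removed measurement is attributed to a single check, which is one way the per-check percentages in Table 5 could be kept additive; whether the original processing attributes flags this way is an assumption here.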

    5.   Summary and discussion
    • In this study, we revise the automated QC methodology proposed by Xia et al. (2015c), and extend the revised methodology to all 1123 NASMD stations to develop a more rigorously quality-controlled in situ soil moisture dataset for North America. The revised methodology includes a geophysical range check (GR check), a soil porosity check (SP check), and a soil temperature check (ST0 check). Soil moisture observations at the 78 example stations in SCAN, which have been used to assess the accuracy of soil moisture simulations from LSMs in many previous studies, are used here as the benchmark to evaluate the performance of the QC procedure. The evaluation results show that errors of the quality-controlled data are generally reduced, and that the anomaly correlation (AC) and Taylor skill score (S) generally increase after the QC procedure. This improvement suggests that the GR, SP, and ST0 checks can effectively detect and flag most spurious and erroneous soil moisture observations at all 78 stations.

      The revised QC approach is further applied to the 1123 NASMD stations in the 18 networks, out of the 22 listed in Table 5, that have checked stations. The QC results indicate that the reliability of raw soil moisture observations is lowest in SNOTEL (47.55% flagged), SDAWN (41.21% flagged), AARD (39.68% flagged), and MAW-Missouri (34.90% flagged), and highest in OM (2.90% flagged), CHILI (3.82% flagged), WTM (4.51% flagged), and ARM (4.68% flagged). As expected, the networks located at high altitudes and latitudes have the highest percentages of flagged observations. Overall, the QC checks effectively ensure a geophysically realistic range of soil moisture dynamics (GR and SP checks), and remove the observational records negatively affected by soil freezing (ST0 check).
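
      The total flagged percentages quoted above can be reproduced from Table 5 by adding the three per-check percentages for each network. The sketch below assumes, as that arithmetic implies, that each flagged record is counted under only one check; the dictionary and function names are illustrative only.

```python
# Per-check percentages (GR, SP, ST0) copied from Table 5 for the eight
# networks named in the text.
TABLE5 = {
    "SNOTEL":       (0.36, 0.47, 46.72),
    "SDAWN":        (5.69, 6.05, 29.47),
    "AARD":         (0.00, 0.00, 39.68),
    "MAW-Missouri": (5.42, 16.31, 13.17),
    "OM":           (0.00, 0.10, 2.80),
    "CHILI":        (3.27, 0.55, 0.00),
    "WTM":          (0.42, 2.92, 1.17),
    "ARM":          (0.00, 0.07, 4.61),
}


def total_flagged(network: str) -> float:
    """Sum of the GR, SP, and ST0 percentages, rounded to two decimals."""
    return round(sum(TABLE5[network]), 2)


# Networks ordered from the smallest to the largest total flagged percentage.
ranking = sorted(TABLE5, key=total_flagged)
for name in ranking:
    print(f"{name:13s} {total_flagged(name):6.2f}%")
```

      Summing the columns this way gives the same ordering as the text, with OM at the low end and SNOTEL at the high end.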

      Unlike Xia et al. (2015c), the SP check in this study uses the soil porosity estimated from the in situ soil texture at all soil layers for all available NASMD stations. A sensitivity test that removes the SP check from the QC procedure shows that the SP check can effectively flag soil moisture measurements that exceed the soil porosity, and that the quality-controlled soil moisture values decrease considerably compared with the raw observations. In addition, another sensitivity test, which replaces the soil porosity estimated from the NASMD measured soil texture with the NLDAS-2 Noah soil porosity in the SP check, suggests that the impact of different soil texture datasets on the quality-controlled data is limited, and that the 10-cm NLDAS-2 soil porosity can adequately detect and flag the spurious observations in the top four soil layers. However, soil texture varies to some degree with depth throughout the soil column (0–200 cm). Using the soil porosity estimated from the in situ soil texture at each measured depth in NASMD for the SP check is therefore more realistic and desirable.

      It is important to note that the manually checked data from the 78 SCAN stations used as benchmarks probably still contain errors, yet they are the best data available for the purpose of this study. Field samples processed with the thermo-gravimetric method would provide the best benchmark data, but unfortunately such data are not available for the NASMD stations. On the other hand, this study focuses on flagging the data points that violate the physical realism of soil moisture, and does not involve re-calibrating the NASMD data against the benchmark data. Therefore, while potential errors in the benchmark data certainly influence the perceived effectiveness of the QC methodology, they do not influence the QC results. For the accuracy of the QC results, the quality of the soil temperature and soil porosity data is all that matters. Potential errors in the soil temperature and soil porosity data may cause the soil moisture data to be over-flagged or under-flagged. To this end, the National Soil Moisture Network is collaborating with its partners to collect the measured soil temperature for the NASMD sites where such data are available. When these measured soil temperatures become available in the future, the data will be re-processed with them to further enhance the soil moisture quality. Improving the quality of the soil moisture data is an ongoing process that requires continuous efforts from the community. At the current stage, we believe that the QC product documented here is suitable for use in evaluating soil moisture from remote sensing, numerical models, and reanalysis data.

      Soil moisture is an important component of the coupled land–atmosphere system. It can influence subsequent meteorological conditions (Koster et al., 2016), and hence a more realistic soil moisture initialization can help improve seasonal and subseasonal climate predictions (Kumar et al., 2014b) and operational flood forecasting (Wanders et al., 2014). Reliable soil moisture data are critical for agricultural drought monitoring (Anderson et al., 2012) and irrigation demand estimation (El Sharif et al., 2015), and therefore benefit water resources planning and management. Other applications of soil moisture data include landslide and erosion prediction, epidemic risk monitoring, and rainfall estimation (Brocca et al., 2017; Wu et al., 2018). As an important data source for these application systems, high-quality soil moisture observations are essential and can influence the results of application studies. Accurate soil moisture observations over large areas and long periods are very important to both practical applications and fundamental research. To this end, this study contributes by quality controlling an important observational soil moisture database for North America.

      Recently, remote sensing-based soil moisture data have gained increasing recognition, and have been incorporated into many application systems owing to their more complete coverage in both space and time. However, the uncertain data quality and limited soil depth of remote sensing data impede their broader application. In contrast, in situ observations are usually more accurate and can reach greater soil depths, but their limited spatial coverage and relatively coarse temporal resolution are limiting factors. Combining in situ and remotely sensed data can lead to better soil moisture products that address the challenges of both types of data. Such efforts should take advantage of quality-controlled products such as the one produced in this study.

      Acknowledgments. We thank the scientists who maintain the soil moisture networks used in this study. We appreciate Steven M. Quiring and his group for collecting in situ soil moisture observations to form NASMD. Without their efforts and support, it would be impossible to accomplish this work. We are also grateful to the four anonymous reviewers for their constructive comments that have improved this article significantly.
