Development of an Integrated Global Land Surface Dataset from 1901 to 2018

Development of Integrated Global Surface Data for 1901–2018

Corresponding author: Su YANG, yangsu@cma.gov.cn
Funds: Supported by the National Natural Science Foundation of China (41805128), National Key Research and Development Program of China (2017YFC1501801), National Natural Science Foundation of China (42093190043), and National Innovation Project for Meteorological Science and Technology (CMAGGTD003-5)

doi: 10.1007/s13351-021-1058-2

Abstract: We developed an integrated global land surface dataset (IGLD) at the National Meteorological Information Center of China Meteorological Administration. The IGLD consists of hourly data for 75 variables from five data sources. It contains not only the most widely used variables (e.g., pressure, temperature, dew-point temperature, and precipitation) but also others such as visibility, cloud cover, and snow depth. A hierarchy of data sources was created to resolve duplicate records, and records from sources higher in the hierarchy were adopted preferentially in the IGLD. A comprehensive quality control procedure, including an extreme value test, an internal consistency check, and a spatiotemporal consistency check, was applied to the IGLD. The IGLD consists of land surface observations at more than 20,000 sites worldwide from 1901 to 2018, of which about 17,000 are currently active. The number of stations generally increased over time, except during the 1960s–1970s, rising from about 2300 in 1951 to about 17,000 in 2018. Observations over the United States, Europe, and eastern Asia consistently showed high temporal integrity and dense spatial coverage, whereas measurements were sparser in South America, Africa, Russia, and the Mediterranean regions. In general, the standard and intermediate standard observation times recommended by the World Meteorological Organization (WMO) were followed worldwide, except in Australia, where few data were measured on the WMO schedule. The IGLD has been used in China’s first-generation global atmospheric reanalysis product (CRA) and a global daily precipitation dataset.
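    The National Meteorological Information Center of the China Meteorological Administration has developed a century-long (1901–2018) global hourly integrated land surface dataset (IGLD, Integrated Global Land Surface Dataset). The IGLD integrates observations of 75 meteorological variables, including pressure, temperature, wind, precipitation, visibility, cloud cover, and snow depth, from five data sources: hourly surface observations in China, global surface synoptic reports, the US ISD (Integrated Surface Database), CFSR (Climate Forecast System Reanalysis), and GDAS (Global Data Assimilation System). During integration, duplicate records were identified in the massive data volume on the basis of station name and report type, and the multisource data were merged hierarchically according to the reliability and completeness of each data source, improving data completeness while removing interference from redundant information. The IGLD contains more than 20,000 surface stations worldwide for 1901–2018, of which about 17,000 are active. Over the past 100 years, the number of stations capable of hourly observation and publishing their data has increased continuously, and particularly rapidly from 1951 to 2018, from about 2300 to about 17,000. Hourly observations over the United States, Europe, and East Asia have always had high spatial coverage and good temporal completeness, whereas observations over South America, Africa, Russia, and the Mediterranean region have always been sparse, with poor temporal completeness. An analysis of the distribution of observation times in the IGLD shows that most countries follow the observation times recommended by the World Meteorological Organization (WMO). Australia is an exception, with a large amount of hourly data at times not recommended by the WMO. The IGLD has been used in China’s first-generation global atmospheric reanalysis product (CRA) and a global daily precipitation dataset.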

    Fig. 3.  Spatial distributions of P_OBS (%) above 10% for prep_6h (top row), prep_12h (middle row), and prep_24h (bottom row) for each 1° grid over the whole globe. (I) Left column: 1931–1960; (II) middle column: 1961–1990; and (III) right column: 1991–2018. Note that the ISD is the only data source for the IGLD before 1951, when there were only a few measurements of prep_6h, prep_12h, and prep_24h and no site with continuous prep_6h measurements (P_OBS > 10%). In addition, there were almost no prep_12h data in North America in any of the three periods because measurements in this region were unstable and discontinuous (P_OBS < 10%).

    Fig. 4.  Spatial distributions of hourly P_OBS (%) above 10% for temperature in each 1° grid over the whole globe for a 24-h period (from 0000 to 2300 UTC) during 2008–2018.

    Table 3.  Observational times recommended by the WMO in seven regions

    Columns (WMO regions): Region I (Africa); Region II (Asia); Region III (South America); Region IV (North and Central America, the Caribbean); Region V (Southwest Pacific); Region VI (Europe); Region VII (Antarctica)
    Rows (observation times): standard times; intermediate standard times
    Note: Standard times are 0000, 0600, 1200, and 1800 UTC. Intermediate standard times are 0300, 0900, 1500, and 2100 UTC.
    [1] Brown, P. J., and A. T. DeGaetano, 2009: A method to detect inhomogeneities in historical dewpoint temperature series. J. Appl. Meteor. Climatol., 48, 2362–2376. doi: 10.1175/2009jamc2123.1.

    [2] Camalier, L., W. Cox, and P. Dolwick, 2007: The effects of meteorology on ozone in urban areas and their use in assessing ozone trends. Atmos. Environ., 41, 7127–7137. doi: 10.1016/j.atmosenv.2007.04.061.

    [3] Compo, G. P., J. S. Whitaker, P. D. Sardeshmukh, et al., 2011: The Twentieth Century Reanalysis project. Quart. J. Roy. Meteor. Soc., 137, 1–28. doi: 10.1002/qj.776.

    [4] Dai, A. G., 2001a: Global precipitation and thunderstorm frequencies. Part I: Seasonal and interannual variations. J. Climate, 14, 1092–1111. doi: 10.1175/1520-0442(2001)014<1092:gpatfp>2.0.co;2.

    [5] Dai, A. G., 2001b: Global precipitation and thunderstorm frequencies. Part II: Diurnal variations. J. Climate, 14, 1112–1128. doi: 10.1175/1520-0442(2001)014<1112:gpatfp>2.0.co;2.

    [6] Dai, A. G., 2006: Recent climatology, variability, and trends in global surface humidity. J. Climate, 19, 3589–3606. doi: 10.1175/jcli3816.1.

    [7] Dai, A. G., and C. Deser, 1999: Diurnal and semidiurnal variations in global surface wind and divergence fields. J. Geophys. Res. Atmos., 104, 31109–31125. doi: 10.1029/1999jd900927.

    [8] Dai, A. G., and J. H. Wang, 1999: Diurnal and semidiurnal tides in global surface pressure fields. J. Atmos. Sci., 56, 3874–3891. doi: 10.1175/1520-0469(1999)056<3874:dastig>2.0.co;2.

    [9] Dai, A. G., T. R. Karl, B. M. Sun, et al., 2006: Recent trends in cloudiness over the United States: A tale of monitoring inadequacies. Bull. Amer. Meteor. Soc., 87, 597–606. doi: 10.1175/bams-87-5-597.

    [10] Dunn, R. J. H., K. M. Willett, P. W. Thorne, et al., 2012: HadISD: A quality-controlled global synoptic report database for selected variables at long-term stations from 1973–2011. Climate Past, 8, 1649–1679. doi: 10.5194/cp-8-1649-2012.

    [11] Ilyas, M., C. M. Brierley, and S. Guillas, 2017: Uncertainty in regional temperatures inferred from sparse global observations: Application to a probabilistic classification of El Niño. Geophys. Res. Lett., 44, 9068–9074. doi: 10.1002/2017gl074596.

    [12] Liang, X., L. P. Jiang, Y. Pan, et al., 2020: A 10-yr global land surface reanalysis interim dataset (CRA-Interim/Land): Implementation and preliminary evaluation. J. Meteor. Res., 34, 101–116. doi: 10.1007/s13351-020-9083-0.

    [13] Liu, Z. Q., C. X. Shi, Z. J. Zhou, et al., 2017: CMA global reanalysis (CRA-40): Status and plans. Proc. 5th International Conference on Reanalysis, 13–17 November 2017, Rome, Italy, 16 pp. Available online at https://climate.copernicus.eu/sites/default/files/repository/Events/ICR5/Talks/zhinqua%20liu_13pm.pdf. Accessed on 21 October 2021.

    [14] Lott, N., 2004: The quality control of the integrated surface hourly database. Preprints, 84th AMS Annual Meeting, Amer. Meteor. Soc., Seattle, WA, 1–7. Available online at https://ams.confex.com/ams/84Annual/webprogram/Paper71929.html. Accessed on 7 September 2021.

    [15] Saha, S., S. Moorthi, H.-L. Pan, et al., 2010: The NCEP climate forecast system reanalysis. Bull. Amer. Meteor. Soc., 91, 1015–1058. doi: 10.1175/2010bams3001.1.

    [16] Smith, A., N. Lott, and R. Vose, 2011: The integrated surface database: Recent developments and partnerships. Bull. Amer. Meteor. Soc., 92, 704–708. doi: 10.1175/2011bams3015.1.

    [17] Willett, K. M., N. P. Gillett, P. D. Jones, et al., 2007: Attribution of observed surface humidity changes to human influence. Nature, 449, 710–712. doi: 10.1038/nature06207.

    [18] Willett, K. M., P. D. Jones, N. P. Gillett, et al., 2008: Recent changes in surface humidity: Development of the HadCRUH dataset. J. Climate, 21, 5364–5383. doi: 10.1175/2008jcli2274.1.

    [19] WMO, 2011: Manual on Codes—Regional Codes and National Coding Practices. Volume II. World Meteorological Organization, WMO-No. 306, 1–352. Available online at https://library.wmo.int/doc_num.php?explnum_id=5730. Accessed on 7 September 2021.

    [20] WMO, 2019: Manual on Codes—International Codes. Volume I.1. Part A—Alphanumeric Codes. World Meteorological Organization, WMO-No. 306, 1–480. Available online at https://library.wmo.int/doc_num.php?explnum_id=10235. Accessed on 7 September 2021.

    [21] Yang, S., P. D. Jones, H. Jiang, et al., 2020: Development of a near-real-time global in situ daily precipitation dataset for 0000–0000 UTC. Int. J. Climatol., 40, 2795–2810. doi: 10.1002/joc.6367.

    [22] Zou, B., 2010: How should environmental exposure risk be assessed? A comparison of four methods for exposure assessment of air pollutions. Environ. Monit. Assess., 166, 159–167. doi: 10.1007/s10661-009-0992-8.

National Meteorological Information Center, China Meteorological Administration, Beijing 100081
1.   Introduction
    Hourly surface-based meteorological observations are the most used and most requested type of climatological data. They are useful both for studying changes in the earth’s climate and for reanalyzing individual meteorological events. For example, surface synoptic data have been used to quantify the frequency of precipitation (Dai, 2001a) and its diurnal cycle (Dai, 2001b), the diurnal variations of surface wind and divergence fields (Dai and Deser, 1999), recent changes in surface humidity (Dai, 2006; Willett et al., 2008), and variations in cloudiness (Dai et al., 2006) and global surface pressure (Dai and Wang, 1999). Willett et al. (2007) derived a homogenized gridded dataset of surface humidity from hourly data to examine changes in surface specific humidity during the late twentieth century. Surface meteorological data extracted from the Integrated Surface Database (ISD), then maintained at the US National Climatic Data Center (NCDC), were used to study the effects of meteorology on ozone in urban areas of the eastern United States (Camalier et al., 2007). Zou (2010) used the ISD in a comparative evaluation of the accuracy of methods for assessing environmental exposure risk. Hourly dew-point temperature data from 10 stations in the contiguous United States were used to develop a method for detecting inhomogeneities (Brown and DeGaetano, 2009). The International Surface Pressure Databank was used to develop a global reanalysis dataset for the twentieth century from surface pressure observations (Compo et al., 2011).

    During the last few decades, the Global Telecommunication System (GTS), operated under the auspices of the World Meteorological Organization (WMO), has allowed national meteorological and hydrological services (NMHSs) to share a wide variety of meteorological data regionally and worldwide. However, not all meteorological data transmitted by NMHSs reach all the other nodes (including operational meteorological centers).

    The ISD, hosted by the National Centers for Environmental Information (NCEI, formerly NCDC) of NOAA, is one of the world’s most extensive global hourly datasets. It archives hourly surface observations from a large number of stations worldwide (Lott, 2004; Smith et al., 2011; www.ncei.noaa.gov/products/land-based-station/integrated-surface-database). Although the ISD contains over two billion surface observations from more than 20,000 stations, its station density over Asia (where its data arrive via the GTS), and especially over China, is relatively low compared with the United States and Europe. In studies of global surface temperature, the uncertainties arising from poor spatial coverage tend to increase at local scales (Ilyas et al., 2017).

    The National Meteorological Information Center (NMIC) of the China Meteorological Administration (CMA) is responsible for meteorological data in China. NMIC has been improving the integrity and quality of China’s meteorological observations by digitizing historical paper archives and applying systematic quality control procedures. The aim of this study was to develop a comprehensive integrated global land surface dataset (IGLD) for 1901–2018. The dataset has been established and is now conditionally open to the public; users may apply for access by email or telephone, using the contact details provided at http://data.cma.cn/. The IGLD has been used in China’s first-generation global atmospheric reanalysis product (Liu et al., 2017) and a global daily precipitation dataset (Yang et al., 2020).

    The rest of the paper is organized as follows. Section 2 gives a brief introduction to the data sources. Section 3 describes the methods used for the integration of multiple data sources, the quality control algorithms, and the assessment of the product. Section 4 discusses the performance of the IGLD. The conclusions from this study are presented in Section 5.

2.   Data sources
    We collected data from five sources, four global and one regional, to build a compiled dataset (the IGLD) that combines the best features of the individual sources. Table 1 lists the basic provenance information for the IGLD. The data archived by NMIC include the GTS data received in Beijing since 1980 (NMIC_GTS) and the hourly surface data from about 2400 national meteorological stations across China since 1951 (NMIC_China) in the CMA Net, which are updated in near-real time. We refer to these data collectively as the NMIC data.

    Table 1.  Basic provenance information for IGLD data sources

    Priority score | Data source | Description | Spatial coverage | Report type | Time period | Quality control | Provider/Reference
    8 | ISD (https://www.ncdc.noaa.gov/isd) | Integrated surface database | Global | FM12/FM15 | 1901–2018 | Systematic quality control | NCEI/Smith et al. (2011)
    6 | NCEP (CFSR) (https://rda.ucar.edu/datasets/ds099.0/) | Data used in CFSR | Global | FM12/FM15 | 1979–2014 | Pressure gross check | NCEP/Saha et al. (2010)
    6 | NCEP (GDAS) (https://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/) | Data used in GDAS | Global | FM12/FM15 | 2015–2018 | Pressure gross check | NCEP
    5 | NMIC (GTS) (http://data.cma.cn/data/detail/dataCode/A.0013.0001.html) | GTS reports received by NMIC | Global | FM12 | 1980–2018 | Simple quality control | NMIC
    # | NMIC (China) (http://data.cma.cn/data/cdcdetail/dataCode/A.0012.0001.html) | Data archived by NMIC | China | FM12 | 1951–2018 | Systematic quality control and expert diagnosis | NMIC
    Note: # means that NMIC (China) has top priority (no priority score) because it is the only data source used for China. The priority scores of the other datasets apply only to regions outside China.

    We also collected the data assimilated in the Climate Forecast System Reanalysis (CFSR) for 1979–2014 (Saha et al., 2010) and in the Global Data Assimilation System (GDAS) for 2015–2018 from the operational data archives of NCEP/NOAA. The available variables include station pressure, temperature, dew-point temperature, and wind speed and direction. CFSR and GDAS use the NCEP operational observation quality control procedures, which perform only rudimentary limit checks of surface pressure observations against the background (Saha et al., 2010). We refer to the data assimilated in CFSR and GDAS as the NCEP data.

    The NCEI ISD contains data from over 100 original data sources that collectively archive hundreds of meteorological variables. Its period of record currently extends from 1901 to the present. The number of active stations has reached 13,000, making the ISD one of the world’s most extensive global datasets of sub-daily observations; updates are delayed by about two days. The most common variables in the ISD include station pressure, wind speed and direction, temperature, dew-point temperature, cloud data, sea-level pressure, altimeter setting, weather phenomena, visibility, precipitation amounts for various accumulation periods, and snow depth. The ISD quality control algorithms include a series of validity checks, extreme value checks, internal consistency checks, and external continuity checks (versus another observation for the same station), but no spatial quality control (Smith et al., 2011).

3.   Methods
    We aimed to build a global hourly meteorological dataset whose records were as comprehensive as possible by combining the best features of the datasets described in Section 2. The surface synoptic report (FM12; WMO, 2019) and aerodrome meteorological report (FM15; WMO, 2019) in the NCEI ISD and NCEP data were used in the integration. Because FM15 reports are currently not decoded for the NMIC data, only FM12 was considered for that source. The spatial coverage and volume of FM12 data are markedly low over the United States (see Fig. S1 in the online supplemental material) because the United States does not generally use the synoptic format (FM12) and its reports are augmented with FM15 data (www.webaugur.com/dave/weather/ref/metar/WeatherObDefFormat/OMF-SYNOP.htm). Adding FM15 data to the IGLD considerably increases the coverage and volume of data over the United States (see Fig. S2 in the online supplemental material). Compared with the NCEP data, the FM15 data in the ISD have more stations and longer records (see Fig. S3 in the online supplemental material). The ISD was therefore used as the sole source of FM15 reports in the IGLD.

    We focused on integrating the FM12 data from the data sources. A hierarchy of the five datasets was created before integration. In China, NMIC_China was treated as the sole data source with top priority, because NMIC is responsible for all meteorological data and their quality in China. For regions outside China, when several data sources were available for one site, records from sources higher in the hierarchy were preferentially incorporated into the IGLD. The priority of each dataset other than NMIC_China was determined by its quality control procedures, data stability, number of stations, and applications (Table 1). The priority score (PS) of each data source was defined as

    $${\rm{PS}} = {P_1} + {P_2} + {P_3} + {P_4},$$ (1)

    where P1, P2, P3, and P4 are the stability score, observation site score, quality score, and usage score, respectively (see Table 2 for details).

    Table 2.  Components of the priority score (PS) of a data source

    Component | Definition | Note
    Stability score (P1) | 2: stab ≥ 90%; 1: stab < 90% | $\mathrm{stab} = \dfrac{1}{\mathrm{ny}}\displaystyle\sum_{i=1}^{\mathrm{ny}}\left(\dfrac{1}{\mathrm{ns}}\sum_{j=1}^{\mathrm{ns}}\mathrm{integ}_j\right)_i$; $\mathrm{integ} = \dfrac{\mathrm{DD}_{\mathrm{obs}}}{\mathrm{DD}_{\mathrm{all}}}$
    Observation site score (P2) | 2: CS ≥ 7000 sites; 1: CS < 7000 sites | CS: annual average number of global synoptic stations, excluding China, from 1980 to 2018
    Quality score (P3) | 2: more than two steps of quality control (extreme value check, internal consistency check, and temporal consistency check); 1: primary quality control (extreme value check) |
    Usage score (P4) | 2: used in both climatology and prior reanalysis; 1: used in climatology or prior reanalysis |
    Note: stab is the overall integrity of a data source during 2001–2018; integ is the integrity of a single station in that data source; subscript j denotes the jth station, and ns is the number of stations (outside China) common to the four data sources; subscript i denotes the ith year, and ny is the total number of years in 2001–2018; DD_obs is the number of days in a year with temperature measurements at a common station (outside China) in the data source, counted over a start and end time that is consistent across the four data sources; and DD_all is the number of days in that year.
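    The scoring of Eq. (1) with the thresholds of Table 2 can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function and class names and the input layout (one list of per-station integ values per year) are hypothetical. The example profiles use made-up inputs chosen only to reproduce the component scores stated in the text (P1 to P4 for ISD, NCEP, and NMIC_GTS); they are not measured values.

```python
from dataclasses import dataclass


def station_integrity(dd_obs: int, dd_all: int) -> float:
    """integ = DD_obs / DD_all for one common station in one year."""
    return dd_obs / dd_all


def stability(integ_by_year: list[list[float]]) -> float:
    """stab: average over the ny years of the mean integ of the ns common stations."""
    yearly_means = [sum(year) / len(year) for year in integ_by_year]
    return sum(yearly_means) / len(yearly_means)


@dataclass
class SourceProfile:
    stab: float             # overall integrity, 2001-2018, as a fraction (0-1)
    cs: float               # annual mean number of synoptic stations outside China, 1980-2018
    qc_steps: int           # number of quality control steps the source applies
    climatology: bool       # used in climatology studies
    prior_reanalysis: bool  # used in a previous reanalysis


def priority_score(src: SourceProfile) -> int:
    """PS = P1 + P2 + P3 + P4 with the thresholds of Table 2."""
    p1 = 2 if src.stab >= 0.90 else 1
    p2 = 2 if src.cs >= 7000 else 1
    p3 = 2 if src.qc_steps > 2 else 1
    # Table 2 gives P4 = 1 for "climatology or prior reanalysis"; a source used in
    # neither is not covered there and also gets 1 here (an assumption).
    p4 = 2 if (src.climatology and src.prior_reanalysis) else 1
    return p1 + p2 + p3 + p4


# Hypothetical inputs, chosen only to reproduce the component scores stated in the text.
isd = SourceProfile(stab=0.95, cs=8000, qc_steps=3, climatology=True, prior_reanalysis=True)
ncep = SourceProfile(stab=0.95, cs=8000, qc_steps=1, climatology=False, prior_reanalysis=True)
nmic_gts = SourceProfile(stab=0.95, cs=5000, qc_steps=1, climatology=False, prior_reanalysis=True)
print(priority_score(isd), priority_score(ncep), priority_score(nmic_gts))  # 8 6 5
```

    Because every component is either 1 or 2, the PS ranges from 4 to 8, and the difference between NCEP (6) and NMIC_GTS (5) comes entirely from the observation site score P2.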

    The first column of Table 1 gives the final priority of each data source in the integration. The ISD was given the highest priority (PS = 8) because of its systematic quality control checks, largest number of observations, and extensive applications. The ISD has the most stations, the largest data volumes, and the highest global density of sites, especially in Europe, Japan, and Australia (see Figs. S1 and S3 in the online supplemental material). We gave the NCEP data the second-highest priority (PS = 6; P1 = 2, P2 = 2, P3 = 1, and P4 = 1) because they have higher spatial coverage and more observation sites than NMIC_GTS. With fewer stations and only simple quality control procedures, NMIC_GTS was given the lowest priority (PS = 5; P1 = 2, P2 = 1, P3 = 1, and P4 = 1).

    To facilitate integration, the chosen datasets and accompanying station metadata were reformatted into a common format, and duplicate records were removed. The NCEP and NMIC data identify stations by WMO five-digit station numbers, whereas the ISD uses six-digit numbers. We unified the identifiers by converting all station numbers into the ISD type. When records from different data sources shared the same station identifier, observation date, and observation time, they were integrated according to the priority determined above, with the record from the higher-priority source adopted preferentially.
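    The matching and priority-based merge described above can be sketched as follows. The paper does not state its identifier conversion rule; this sketch assumes the common convention that a WMO five-digit index maps to a six-digit ISD identifier by appending a trailing zero (e.g., 54511 becomes 545110). The Record layout, field names, and source labels are hypothetical, and keeping a single whole record per key is a simplification; the priorities are those of Table 1, with NMIC_China used exclusively for stations in China.

```python
from typing import NamedTuple, Optional

# Priority scores from Table 1 for sources used outside China.
PRIORITY = {"ISD": 8, "NCEP_CFSR": 6, "NCEP_GDAS": 6, "NMIC_GTS": 5}


class Record(NamedTuple):
    source: str      # hypothetical source label, e.g. "ISD" or "NMIC_China"
    station_id: str  # identifier as delivered by the source
    obs_time: str    # observation date and time, e.g. "2018-07-01T06:00Z"
    in_china: bool   # whether the station lies in China
    values: dict     # variable name -> observed value


def to_isd_id(station_id: str) -> str:
    """Assumed rule: a 5-digit WMO index becomes a 6-digit ISD id by appending '0'."""
    if len(station_id) == 5 and station_id.isdigit():
        return station_id + "0"
    return station_id


def source_rank(rec: Record) -> Optional[float]:
    """Rank used to resolve duplicates; None means the record is not used."""
    if rec.in_china:
        # NMIC_China is the only source used for stations in China.
        return float("inf") if rec.source == "NMIC_China" else None
    return PRIORITY.get(rec.source)


def merge(records):
    """Keep one record per (station, time): the one from the highest-priority source."""
    best = {}
    for rec in records:
        rank = source_rank(rec)
        if rank is None:
            continue
        key = (to_isd_id(rec.station_id), rec.obs_time)
        # Strict '>' keeps the first record on ties (CFSR and GDAS do not overlap in time).
        if key not in best or rank > best[key][0]:
            best[key] = (rank, rec)
    return {key: rec for key, (rank, rec) in best.items()}
```

    For example, an NMIC_GTS record for station 12345 and an ISD record for station 123450 at the same time collapse to one key, and the ISD record is kept.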

    Drawing on the quality control methods of the NCEP Meteorological Assimilation Data Ingest System (MADIS; madis.ncep.noaa.gov/madis_sfc_qc_notes.shtml) and those applied to the UK Met Office Hadley Centre observational datasets (Dunn et al., 2012), we implemented a quality control flow for the IGLD, comprising an extreme value check, an internal consistency check, a temporal consistency check, and a spatial consistency check, to identify gross errors. Each record was assigned one of three quality levels: credible, suspicious, or erroneous. The results of all steps jointly determined the final level. Only records that passed every test were classed as credible; records failing one or more tests were classed as erroneous, and records with one or more suspicious results were classed as suspicious.
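    A minimal sketch of how per-step results could be combined into the final level follows. It assumes that an erroneous result takes precedence over a suspicious one, which the text implies but does not state. The three example checks for temperature are illustrative: the bounds and the hourly step threshold are hypothetical, not the IGLD settings, and the spatial consistency check is omitted.

```python
from enum import IntEnum
from typing import Iterable, Optional


class QC(IntEnum):
    CREDIBLE = 0
    SUSPICIOUS = 1
    ERRONEOUS = 2


def final_level(step_results: Iterable[QC]) -> QC:
    """Erroneous if any step failed, else suspicious if any step was suspicious, else credible."""
    return max(step_results, default=QC.CREDIBLE)


def extreme_value_check(temp_c: float, low: float = -90.0, high: float = 60.0) -> QC:
    # Hypothetical physical bounds for air temperature (degC).
    return QC.CREDIBLE if low <= temp_c <= high else QC.ERRONEOUS


def internal_consistency_check(temp_c: float, dewpoint_c: float) -> QC:
    # The dew point cannot exceed the air temperature.
    return QC.CREDIBLE if dewpoint_c <= temp_c else QC.ERRONEOUS


def temporal_consistency_check(prev_c: Optional[float], temp_c: float, max_step: float = 10.0) -> QC:
    # Hypothetical rule: an hour-to-hour jump larger than max_step degC is suspicious.
    if prev_c is None:
        return QC.CREDIBLE
    return QC.SUSPICIOUS if abs(temp_c - prev_c) > max_step else QC.CREDIBLE


level = final_level([
    extreme_value_check(25.0),
    internal_consistency_check(25.0, 18.0),
    temporal_consistency_check(12.0, 25.0),
])
print(level.name)  # SUSPICIOUS
```

    Ordering the levels as integers lets the combination rule reduce to taking the maximum over all steps.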

    We evaluated the spatial coverage and integrity of the IGLD by analyzing P_OBS, calculated as follows:

    $${\rm{P\_OBS}} = \frac{D_{\rm{obs}}}{D_{\rm{all}}} \times 100\%,$$ (2)

    where D_obs is the number of days with meteorological measurements in a 1° grid box and D_all is the total number of days in the evaluation period. The value of P_OBS represents the integrity of the data in each grid box, and the spatial coverage of grid boxes with P_OBS > 0 represents the spatial coverage of the IGLD.
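    Equation (2) can be sketched for one variable as follows. The input layout (one (lat, lon, day) tuple per station-day with a measurement) and the choice of cells anchored at integer degrees are assumptions; a day is counted once per grid box even if several stations in that box report on it.

```python
import math
from collections import defaultdict
from datetime import date


def grid_cell(lat: float, lon: float) -> tuple:
    """1-degree cell containing (lat, lon); cells start at integer degrees."""
    return (math.floor(lat), math.floor(lon))


def p_obs(observations, start: date, end: date) -> dict:
    """P_OBS (%) per 1-degree cell over the period start..end (inclusive).

    observations: iterable of (lat, lon, day) tuples, one per station-day with a
    measurement of the variable.
    """
    d_all = (end - start).days + 1
    days_in_cell = defaultdict(set)
    for lat, lon, day in observations:
        if start <= day <= end:
            days_in_cell[grid_cell(lat, lon)].add(day)
    return {cell: 100.0 * len(days) / d_all for cell, days in days_in_cell.items()}
```

    Maps such as Figs. 2 and 3 would then show only the cells with P_OBS above 10%, e.g. `{c: v for c, v in result.items() if v > 10}`.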

4.   Results
    This section discusses the performance of IGLD temperature, dew-point temperature, pressure, wind direction, wind speed, 6-h accumulated precipitation (prep_6h), 12-h accumulated precipitation (prep_12h), and 24-h accumulated precipitation (prep_24h), which are among the most widely used of the 75 IGLD variables (see Table S1 in the online supplemental material), by analyzing the spatiotemporal distributions of data volume and data integrity.

    The IGLD contains data from more than 20,000 stations worldwide from 1901 to 2018, and over 17,000 active stations continue to be updated. Figure 1a shows the annual number of stations with measurements of more than one variable from 1901 to 2018. The number of stations increased over the 118 years, except during the 1960s–1970s; this dip resulted from the transition from keyed-in data to the digital transmission and receipt of data (Smith et al., 2011).

    Figure 1.  Temporal (1901–2018) evolution of (a) annual number of stations with measurements of more than one variable in the IGLD in regions outside China (orange shading) and in China (green shading), and (b) monthly volume of data for different variables in the IGLD. The measurements of pressure and prep_6h in the IGLD started from the early 1950s.

    Figure 1b shows the monthly data volumes for different variables in the IGLD during 1901–2018. The data volumes for pressure, temperature, dew-point temperature, wind direction, and wind speed follow the same temporal pattern and are higher than those for prep_6h, prep_12h, and prep_24h. The IGLD had few stations and low data volumes before 1930. Figure 1b also shows gaps in the early 1970s, before the GTS came into existence.

    There were two marked jumps in the data volumes for pressure, temperature, dew-point temperature, wind direction, and wind speed: (1) around 1997, when automated Meteorological Aviation Reports began (Saha et al., 2010), and (2) in the early 2000s, when the observation pattern in China changed from four manual observations per day to 24 hourly automatic observations.

    The prep_6h data volumes fluctuated from 1998 to 2001, when the active stations in the United States were unstable (data not shown). A notable increase in the prep_24h data volume occurred during 2000–2006, when prep_24h observations in China changed from once a day to eight times a day and Europe went from no stations to high spatial coverage (data not shown).

    A significant gap in the prep_24h data occurred during 2006–2014 because of deteriorating data integrity in China. The volumes of prep_6h, prep_12h, and prep_24h data increased around 2016, when, for China, the cumulative precipitation for all 24-h periods was obtained by summing the 1-h cumulative precipitation.
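    Deriving prep_6h, prep_12h, and prep_24h from 1-h totals, as described above for China, amounts to a trailing window sum, and because the window can end at any hour it also covers accumulations reported several times a day. The sketch below is a minimal illustration under our own assumptions: a series of 1-h totals in millimetres indexed by hour, a function named `accumulate`, and a rule that no value is reported if any hour in the window is missing. None of these details come from the IGLD procedure itself.

```python
from typing import List, Optional


def accumulate(hourly: List[Optional[float]], end: int, window: int) -> Optional[float]:
    """Sum the `window` 1-h precipitation totals ending at index `end` (inclusive).

    Assumed rule: return None if the window starts before the series or if any
    hourly value in it is missing, so a partial sum is never reported as a
    full 6-, 12-, or 24-h accumulation.
    """
    start = end - window + 1
    if start < 0:
        return None
    values = hourly[start:end + 1]
    if any(v is None for v in values):
        return None
    return sum(values)


# Illustrative hourly totals (mm) for one day, hours 0-23; not real data.
hourly = [0.0] * 24
hourly[3] = 1.5
hourly[20] = 2.0
hourly[22] = 0.5

print(accumulate(hourly, 23, 6))   # 2.5  (hours 18-23)
print(accumulate(hourly, 23, 24))  # 4.0  (whole day)
print(accumulate(hourly, 4, 6))    # None (window starts before the series)
```

    Treating a single missing hour as invalidating the whole window is a conservative choice; an operational procedure might instead tolerate short gaps, which the text does not specify.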

    Figure 2 shows the global distributions of P_OBS above 10% for pressure, temperature, dew-point temperature, wind direction, and wind speed. Panels I, II, and III show the results for 1931–1960, 1961–1990, and 1991–2018, respectively. The spatial coverage and integrity of surface pressure differ from those of temperature, dew-point temperature, wind direction, and wind speed. Before 1960, the latter four variables had high spatial coverage over America, Europe, India, and eastern Asia and low spatial coverage over Africa, South America, and the Mediterranean regions (Fig. 2). In contrast, before 1960 there were no surface pressure observations outside China, and P_OBS for pressure averaged only about 30% in most regions of China.

    Figure 2.  Spatial distributions of P_OBS (%) above 10% for pressure (top row), temperature (second row), dew-point temperature (third row), wind direction (fourth row), and wind speed (bottom row) for each 1° grid over the whole globe. (I) Left column: 1931–1960; (II) middle column: 1961–1990; and (III) right column: 1991–2018. The ISD is the only data source for the IGLD before 1951, when there were measurements of temperature and wind but not of pressure.
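    The gridded maps in Figs. 2–4 can be approximated by assigning each station to a 1° cell and keeping cells whose P_OBS passes the 10% threshold. The exact definition of P_OBS is not restated in this section, so the sketch assumes it is the percentage of expected observations actually present, and that a cell's value is the mean over its stations; the function names `p_obs` and `grid_p_obs` and the tuple layout are hypothetical.

```python
import math
from collections import defaultdict


def p_obs(n_actual: int, n_expected: int) -> float:
    """Percentage of expected observations actually present (assumed definition)."""
    return 100.0 * n_actual / n_expected if n_expected else 0.0


def grid_p_obs(stations, threshold=10.0):
    """Map station P_OBS onto 1-degree cells and keep cells meeting `threshold`.

    stations: iterable of (lat, lon, n_actual, n_expected) tuples (hypothetical).
    A cell's value is the mean P_OBS of its stations (an assumption).
    Returns a dict mapping (lat_index, lon_index) -> P_OBS, where the indices
    are the south-west corner of the cell in whole degrees.
    """
    cells = defaultdict(list)
    for lat, lon, n_actual, n_expected in stations:
        key = (math.floor(lat), math.floor(lon))
        cells[key].append(p_obs(n_actual, n_expected))

    result = {}
    for key, values in cells.items():
        mean = sum(values) / len(values)
        if mean >= threshold:  # the text writes this criterion as P_OBS >= 10%
            result[key] = mean
    return result


# Illustrative stations only; not IGLD data.
stations = [
    (39.9, 116.4, 900, 1000),   # cell (39, 116): 90%
    (39.5, 116.9, 700, 1000),   # same cell: 70%, so the cell mean is 80%
    (-33.9, 151.2, 50, 1000),   # cell (-34, 151): 5%, dropped
]
print(grid_p_obs(stations))  # {(39, 116): 80.0}
```

    Averaging is only one plausible aggregation; taking the best station in each cell would make sparse but reliable sites stand out more, so the choice changes how the maps look.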

    During 1961–1990, measurements were available over all global land surfaces, despite sparse coverage in South America, Africa, and the Mediterranean regions. Compared with 1931–1960, the coverage in these low-density regions improved significantly, and the integrity reached about 60%. In the densely covered regions of America, Europe, and China, the integrity was high and increased by about 60% relative to 1931–1960.

    Compared with Panels I and II, Panel III shows significantly higher spatial coverage, especially over America, Europe, and eastern Asia, where the measurements covered almost the whole land surface and P_OBS in each grid cell reached about 90% in 1991–2018. The integrity of most grid cells in South America, Australia, Africa, the Mediterranean regions, and eastern Russia reached 80%–90%, although the measurements did not completely cover these regions. Relative to 1961–1990, the integrity improved by 20%–40% globally, especially in Europe, South America, Africa, and Australia.

    The spatial distributions of prep_6h, prep_12h, and prep_24h were not consistent with those of pressure, temperature, dew-point temperature, wind direction, and wind speed. Figure 3 shows that during 1931–1960 there was no site in the world with continuous prep_6h measurements (P_OBS ≥ 10%), while the prep_12h and prep_24h records were mainly located in China, with a low integrity of 10%–20%, because NMIC data were introduced into the IGLD from 1951.

    Figure 3.  Spatial distributions of P_OBS (%) above 10% for prep_6h (top row), prep_12h (middle row), and prep_24h (bottom row) for each 1° grid over the whole globe. (I) Left column: 1931–1960; (II) middle column: 1961–1990; and (III) right column: 1991–2018. Note that the ISD is the only data source for the IGLD before 1951, when there were only a few measurements of prep_6h, prep_12h, and prep_24h, and no site had continuous prep_6h measurements (P_OBS ≥ 10%). In addition, there were almost no prep_12h data in North America in any of the three time periods because measurements there were unstable and sporadic (P_OBS < 10%).

    From 1961 to 1990, prep_6h measurements covered almost the whole globe, with high spatial coverage in America, Europe, and eastern Asia, where P_OBS reached 20%–50%. Relative to 1931–1960, the spatial coverage of prep_12h with P_OBS above 10% expanded only in parts of Europe and Russia. Its integrity was significantly high in China, where the measurements covered almost every grid cell and P_OBS reached about 90% after 1961. Continuous prep_24h measurements (P_OBS ≥ 10%) covered almost all land areas of the globe, apart from Europe and the Mediterranean regions (P_OBS < 10%). However, the integrity was low globally (see the small P_OBS values denoted by dark blue dots), except in China, where P_OBS reached about 90%. The coverage and integrity of prep_6h, prep_12h, and prep_24h in 1961–1990 were better than those in 1931–1960.

    Panel III in Fig. 3 shows significantly high spatial coverage and integrity of prep_6h in America, Europe, and eastern Asia, where the measurements covered almost the whole land surface during 1991–2018. High integrity (P_OBS around 90%) occurred only in Europe, Australia, and parts of the Americas. Compared with 1961–1990, the integrity improved significantly worldwide during 1991–2018: the P_OBS of prep_6h increased by about 40% in Europe; 30% in eastern America, South America, and southern Africa; 50% in parts of India and Australia; and 60% in China. The spatial coverage of prep_12h was significantly high in Asia, Europe, Russia, and Australia, with noteworthy gaps in America and Africa, and only China and Europe had high integrity. The observations of prep_24h covered almost the whole globe; their distribution is similar to that of prep_6h but with fewer high-integrity sites (red dots) in Africa, Europe, South America, and Australia. Compared with 1961–1990, the integrity of prep_24h improved significantly in Europe, Australia, the Mediterranean regions, and Russia during 1991–2018, with P_OBS increasing by about 60%.

    The spatial coverages of temperature, dew-point temperature, wind direction, wind speed, prep_6h, and prep_24h have been similar since the 1990s, when the measurements in the IGLD covered almost all of America, Europe, and eastern Asia. However, there were notable gaps in South America, Africa, Russia, and the Mediterranean regions, where the observation sites need upgrading. The integrities of pressure, temperature, dew-point temperature, wind direction, and wind speed reached about 90% in most regions, significantly higher than those of prep_6h, prep_12h, and prep_24h.

    The WMO advises that observations be made across the globe at standard times or intermediate standard times (WMO, 2011; Table 3). Figure 4 shows the hourly spatial distributions of P_OBS above 10% for temperature on a 1° grid over the 24-h day during 2008–2018. The spatial coverage was high in America, Europe, and eastern Asia, where the integrity reached about 90%–100% throughout the 24-h period. The spatial coverage in eastern South America was high throughout the 24-h period, but the integrity was low because many new observation sites have been added there since 2016 (figure omitted). High-integrity observations in parts of South America occurred at both standard and intermediate standard times. In India, the observations were generally concentrated at standard and intermediate standard times, with high coverage at 0300 and 1200 UTC. High coverage in southern and northwestern Africa occurred at standard and intermediate standard times. The observations in southeastern Australia were concentrated at non-standard times, and those in northeastern Australia were made every 3 h starting at 0500 UTC.

    Table 3.  Observational times recommended by the WMO in seven regions
    Columns: Standard times; Intermediate standard times
    Rows: Region I (Africa); Region II (Asia); Region III (South America); Region IV (North and Central America, the Caribbean); Region V (Southwest Pacific); Region VI (Europe); Region VII (Antarctica)
    Standard times include 0000, 0600, 1200, and 1800 UTC.
    Intermediate standard times include 0300, 0900, 1500, and 2100 UTC.

    Figure 4.  Spatial distributions of hourly P_OBS (%) above 10% for temperature in each 1° grid over the whole globe for a 24-h period (from 0000 to 2300 UTC) during 2008–2018.

    We found that the observations in most regions, apart from Australia, followed the observational times advised by the WMO. Table 3 shows that the WMO recommends standard or intermediate standard times for Region V, which includes Australia. However, the observations in Australia were concentrated at non-standard times, for example, 0200, 0800, 1400, and 2000 UTC. In addition, the observations in the Americas, Europe, and eastern Asia were made not only at the recommended times but also at other times. Most observations in India were concentrated at the recommended times (0300 and 1200 UTC), with a few observations made at all hours of the 24-h period.
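    Checking adherence to the WMO schedule reduces to classifying each observation hour against the two sets of times listed with Table 3. The sketch below does this for whole UTC hours and reports the fraction of observation hours on the WMO schedule; the function names `classify_hour` and `wmo_share` are hypothetical, and treating an hour as a single observation time is a simplifying assumption.

```python
# WMO observation times from Table 3, as whole UTC hours.
STANDARD_HOURS = {0, 6, 12, 18}      # 0000, 0600, 1200, 1800 UTC
INTERMEDIATE_HOURS = {3, 9, 15, 21}  # 0300, 0900, 1500, 2100 UTC


def classify_hour(hour_utc: int) -> str:
    """Label a UTC hour as 'standard', 'intermediate', or 'non-standard'."""
    if not 0 <= hour_utc <= 23:
        raise ValueError(f"hour out of range: {hour_utc}")
    if hour_utc in STANDARD_HOURS:
        return "standard"
    if hour_utc in INTERMEDIATE_HOURS:
        return "intermediate"
    return "non-standard"


def wmo_share(hours) -> float:
    """Fraction of observation hours falling on standard or intermediate standard times."""
    hours = list(hours)
    if not hours:
        return 0.0
    on_schedule = sum(1 for h in hours if classify_hour(h) != "non-standard")
    return on_schedule / len(hours)


# Australian example from the text: 0200, 0800, 1400, and 2000 UTC.
print([classify_hour(h) for h in (2, 8, 14, 20)])
print(wmo_share([2, 8, 14, 20]))  # 0.0
# Indian example from the text: 0300 and 1200 UTC.
print(wmo_share([3, 12]))  # 1.0
```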

5.   Summary
    We established an hourly integrated global land surface dataset (IGLD) containing 75 surface meteorological variables by integrating five international datasets. To make the best use of these data sources, the data formats and station identification codes were unified, and a hierarchy of the five datasets was established based on their quality control procedures, data stability, number of observatories, and applications (e.g., as input data for reanalysis products). Where several sources provided data, the records ranked higher in the hierarchy were adopted in the IGLD. A comprehensive quality control procedure was applied to the IGLD, including tests of extreme values, internal consistency, temporal consistency, and spatial consistency.
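    The hierarchy-based selection of duplicate records can be sketched as follows, assuming station identifiers and time stamps have already been unified as described above. The source names, their order, the record layout, and the function name `merge_by_hierarchy` are placeholders; this section does not give the actual ranking of the five IGLD sources.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical source names, highest priority first (the real order is not given here).
HIERARCHY: List[str] = ["source_1", "source_2", "source_3", "source_4", "source_5"]
RANK: Dict[str, int] = {name: i for i, name in enumerate(HIERARCHY)}

# (source, station_id, utc_time, values): a hypothetical record layout.
Record = Tuple[str, str, str, dict]


def merge_by_hierarchy(records: Iterable[Record]) -> Dict[Tuple[str, str], Record]:
    """Keep one record per (station_id, utc_time): the one from the highest-ranked source."""
    merged: Dict[Tuple[str, str], Record] = {}
    for rec in records:
        source, station_id, utc_time, _values = rec
        key = (station_id, utc_time)
        current = merged.get(key)
        # A lower rank index means a higher position in the hierarchy.
        if current is None or RANK[source] < RANK[current[0]]:
            merged[key] = rec
    return merged


# Illustrative records only; not IGLD data.
records = [
    ("source_3", "STN_A", "2018-07-01T00:00", {"temperature": 25.1}),
    ("source_1", "STN_A", "2018-07-01T00:00", {"temperature": 25.0}),  # duplicate, higher rank
    ("source_2", "STN_A", "2018-07-01T03:00", {"temperature": 27.4}),  # no duplicate
]
merged = merge_by_hierarchy(records)
print(merged[("STN_A", "2018-07-01T00:00")][0])  # source_1
print(len(merged))  # 2
```

    Replacing whole records keeps each adopted observation internally consistent. Filling individual missing variables from lower-ranked sources would be an alternative, but the text does not say whether the IGLD does this.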

    The IGLD includes over 20,000 stations worldwide from 1901 to 2018, more than 17,000 of which are active and regularly updated. After the 1990s, the measurements in the IGLD covered almost all of America, Europe, and eastern Asia, whereas South America, Africa, Russia, and the Mediterranean regions have always had low spatial coverage. The IGLD contains not only the most widely used variables (e.g., pressure, temperature, dew-point temperature, wind direction, wind speed, prep_6h, prep_12h, and prep_24h), but also visibility, cloud cover, and snow depth. The monthly data volumes have increased over time, especially after the 1960s.

    Based on the IGLD, we found that most countries carry out their observations at the standard and intermediate standard times suggested by the WMO. In particular, America, Europe, and eastern Asia exhibit a high data integrity at all measurement times. Australia seems to prefer a local measurement schedule.

    The IGLD has been used in China’s first-generation global atmospheric reanalysis product (Liu et al., 2017) and land reanalysis product (Liang et al., 2020). It is also the data source for the global daily precipitation dataset (Yang et al., 2020). These applications lend some confidence to the accuracy and stability of the IGLD. The IGLD is updated in real time with global surface data from NMIC_GTS and NMIC_China on the CMA Data-as-a-Service platform.

    Acknowledgments. The authors would like to thank Zhisen Zhang for his assistance in programming and the relevant agencies for providing the source data for the IGLD of NMIC.

Reference (22)
