Fusion of In-Situ Soil Moisture and Land Surface Model Estimates Using Localized Ensemble Optimum Interpolation over China

Land data assimilation (DA) is an effective method for producing high-quality, spatially and temporally continuous soil moisture datasets, which are crucial for weather, climate, hydrological, and agricultural research. However, most existing land DA applications assimilate remote sensing observations and are based on one-dimensional (1D) analysis, so they cannot be directly employed to assimilate the recently expanded in-situ soil moisture observations in China. In this paper, a two-dimensional (2D) localized ensemble-based optimum interpolation (EnOI) scheme is introduced for assimilating in-situ soil moisture observations from over 2200 stations into land surface models (LSMs). This scheme uses historical LSM simulations as ensemble samples to provide the soil moisture background error covariance, allowing the in-situ observation information to be propagated to surrounding pixels. It is also computationally efficient because no additional ensemble simulations are needed. A set of sensitivity experiments on ensemble sampling and localization length scale is performed. The EnOI performs best for in-situ soil moisture fusion over China with ensemble samples drawn from hourly soil moisture over the previous 7 days and a localization length scale of 100 km. Using this configuration, in-situ soil moisture fusion experiments are then performed from May 2016 to September 2016. The EnOI analysis is notably better than the simulation without in-situ observation fusion: the wet bias of 0.02 m³ m⁻³ is removed, the root-mean-square error (RMSE) is reduced by about 37%, and the correlation coefficient is increased by about 25%. Independent evaluation shows that the EnOI analysis performs considerably better than the simulation without fusion in terms of bias, and marginally better in terms of RMSE and correlation.


Introduction
High-quality, spatially and temporally continuous soil moisture datasets are urgently needed because soil moisture plays an important role in weather, climate, hydrology, agriculture, and many other fields (Yeh et al., 1984; Engman, 1991; Scipal et al., 2008). There are two main ways to obtain soil moisture information: (1) in-situ measurements or satellite remote sensing observations; and (2) land surface model (LSM) simulations (Moradkhani, 2008). Remote sensing provides the ability to continuously monitor soil moisture over large regions. Active and passive microwave measurements are the two main approaches used in soil moisture remote sensing; examples include observations from the Soil Moisture Active and Passive (SMAP) mission (Entekhabi et al., 2009), the Soil Moisture and Ocean Salinity (SMOS) mission (Kerr et al., 2001), and the scatterometer and advanced scatterometer onboard the European Remote Sensing satellites (ERS-1 and ERS-2) and the Meteorological Operational (MetOp) satellites, respectively (Bartalis et al., 2007; Naeimi et al., 2009). Measurements are also available from the Advanced Microwave Scanning Radiometer for Earth Observing System (AMSR-E; Kawanishi et al., 2003), the Advanced Microwave Scanning Radiometer 2 (AMSR2; Kim et al., 2015), the Special Sensor Microwave/Imager (Paloscia et al., 2001), and the Chinese Fengyun-3 (FY-3) satellites (Sun et al., 2014). LSMs are another important source of temporally and spatially continuous soil moisture information. Many global and regional land surface analysis or reanalysis datasets have been produced based on LSMs, such as the land surface dataset of the ECMWF's Interim Reanalysis (ERA-Interim/Land; Balsamo et al., 2015), the land surface dataset of the Modern-Era Retrospective Analysis for Research and Applications (MERRA-Land; Reichle et al., 2011), and the weakly coupled NCEP Global Land Data Assimilation System for the Climate Forecast System Reanalysis (CFSR/GLDAS; Meng et al., 2012).
Data assimilation (DA) is a powerful tool for combining observations with model data within a mathematical framework. To date, many operational regional and global land DA systems (LDASs) have been developed, such as the North American LDAS (Cosgrove et al., 2003; Xia et al., 2014); the Canadian LDAS, which assimilates L-band passive brightness temperature to reduce surface and root-zone soil moisture errors (Carrera et al., 2015); the China Meteorological Administration (CMA) LDAS (Shi et al., 2011); NASA's GLDAS (Rodell et al., 2004); NCEP's GLDAS (Meng et al., 2012); and ECMWF's GLDAS, which includes a soil moisture DA based on a simplified extended Kalman filter (De Rosnay et al., 2013). Numerous studies and applications have focused on the assimilation of remotely sensed observations, including SMOS brightness temperature or soil moisture retrievals (Lievens et al., 2015; De Lannoy and Reichle, 2016), AMSR-E or AMSR2 observations or retrievals (Jia et al., 2009; Tian et al., 2009), and SMAP observations (Draper et al., 2012; Kolassa et al., 2017). Among these applications, the ensemble Kalman filter (EnKF; Evensen, 2003) is one of the most successfully applied DA methods. To assimilate in-situ observations directly, Gruber et al. (2018) introduced a two-dimensional (2D) Kalman filter that uses spatial error information provided by triple collocation techniques to assimilate spatially sparse in-situ soil moisture observations into a simplified linear model that considers only precipitation accumulation and time-independent soil moisture loss coefficients (Gruber et al., 2015, 2018).
The availability of soil moisture data from the CMA has grown considerably since the observation network was transformed in July 2013 from 10-day manual measurements to hourly automatic monitoring (Wang and He, 2015). A network of more than 2200 in-situ soil moisture monitoring stations over China currently provides soil moisture observations operationally. Because they measure soil moisture more directly than remote sensing does, these in-situ observations have the potential to improve soil moisture analyses. However, most existing land DA applications are one-dimensional (1D) analyses, which update only the LSM grid cell collocated with each observation. Such 1D analyses are well suited to the assimilation of remote sensing data, which are spatially distributed over the model grid, but they cannot be used directly to assimilate sparse point-scale in-situ observations, whose information must be spread to the surrounding grid cells. Tools for merging in-situ measurements with LSMs are therefore urgently required to test and evaluate the potential value of these rapidly developing in-situ observations over China.
The main aim of this study was to propose a computationally efficient method to merge in-situ measurements into an LSM, allowing observations at each site to update the surrounding LSM grid cells. The ensemble-based optimum interpolation (EnOI) scheme first introduced by Evensen (2003) was used as the DA method because of its low computational cost and its performance comparable to that of the EnKF (Blyverket et al., 2019). We determined the ensemble samples and localization length scale through a set of sensitivity experiments. In-situ soil moisture fusion experiments using the selected ensemble samples and localization length scale were then performed for May to September 2016, followed by a detailed evaluation.
The remainder of this paper is organized as follows. Section 2 describes the in-situ soil moisture observations and the atmospheric forcing used to drive the LSM in the analysis. Section 3 provides background on the formulation of EnOI and introduces the EnOI-based scheme for blending in-situ soil moisture observations and LSM estimates. Results are presented in Section 4 and summarized in Section 5.

In-situ observations of soil moisture
The CMA's soil moisture monitoring network was transformed in July 2013 from 10-day manual measurements to hourly automatic monitoring. Currently, observations from more than 2200 stations are collected and archived by the National Meteorological Information Center (NMIC) of the CMA in real time. The observation profile of each station includes 10 vertical layers: 0-10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90, and 90-100 cm. The observations of the first layer (0-10 cm) were used in this study. The spatial distribution of the observation stations is shown in Fig. 1. The observations are dense in Southeast and Central China and relatively sparse in Northwest China. Among all the stations, 230 stations (red points in Fig. 1), which are evenly distributed in space and continuously monitored in time, were selected for independent evaluation in Section 4.3.4.

Atmospheric forcing
The CMA Land DA System (CLDAS) was put into operation by the NMIC in 2013 and has since provided real-time hourly near-surface atmospheric forcing data (e.g., air temperature, specific humidity, surface pressure, wind speed, precipitation, and radiation) and land surface products (e.g., ground temperature, soil temperature, and soil moisture) at a resolution of 0.0625° (Shi et al., 2019). The CLDAS near-surface atmospheric forcing data are generated by merging multisource data, including ground-based observations, satellite-retrieved products, and numerical weather prediction model outputs. The CLDAS land surface products are then simulated by multiple LSMs [e.g., Community Land Model version 3.5 (CLM3.5), Common Land Model (CoLM), and Noah LSM with multiparameterization options (Noah-MP)] driven by the CLDAS near-surface atmospheric forcing data. CLDAS products have been widely used and validated by many research institutes, universities, and industries. In this study, CLDAS forcing data were used to drive the LSM in the experiments described below.

Land surface model
This study used the community Noah-MP LSM (Niu et al., 2011) for soil moisture simulation. Noah-MP was designed to facilitate climate predictions with physics-based ensembles and was developed with substantial upgrades from the Noah LSM to better represent several processes, including surface-layer radiation balance, snow depth, soil moisture and heat fluxes, leaf area-rainfall interaction, the distinction between vegetation and canopy temperatures, the soil column and soil drainage, and runoff. Multiple parameterization options are available in Noah-MP for key land-atmosphere interaction processes, such as snow, dynamic vegetation, surface water infiltration, and runoff. For climate prediction, Noah-MP can be coupled with NCEP's Global Forecast System (GFS) and Climate Forecast System (CFS). Noah-MP contains four soil layers with thicknesses of 10, 30, 60, and 100 cm. In this paper, the default parameterization options of Noah-MP (Table 1) were used to simulate soil moisture.
To obtain a reasonable initial condition, a land surface model requires a spin-up period to reach an equilibrium state. We used the CLDAS atmospheric for-

Localized EnOI system
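According to the EnOI scheme proposed by Evensen (2003), the analysis $\mathbf{X}^{a}$ is given by
$$\mathbf{X}^{a} = \mathbf{X}^{f} + \mathbf{K}\left(\mathbf{y} - \mathbf{H}\mathbf{X}^{f}\right),$$
where $\mathbf{X}^{f} \in \mathbb{R}^{N_m}$ is the model forecast state, $\mathbf{X}^{a} \in \mathbb{R}^{N_m}$ is the analysis, $N_m$ is the dimension of the model state vector, $\mathbf{y} \in \mathbb{R}^{N_y}$ is the observation vector, $N_y$ is the number of observations, $\mathbf{K}$ is the gain matrix, and $\mathbf{H}$ is the observation operator. The gain matrix $\mathbf{K}$ is calculated by
$$\mathbf{K} = \alpha\,(\boldsymbol{\rho}\circ\mathbf{B})\,\mathbf{H}^{\mathrm{T}}\left[\alpha\,\mathbf{H}(\boldsymbol{\rho}\circ\mathbf{B})\mathbf{H}^{\mathrm{T}} + \mathbf{R}\right]^{-1},$$
where $\mathbf{B}$ is the ensemble-estimated background error covariance matrix; $\mathbf{R}$ is the observation error covariance matrix; the localized ensemble-estimated background error covariance matrix $\boldsymbol{\rho}\circ\mathbf{B}$ is the Schur (element-wise) product of the matrices $\boldsymbol{\rho}$ and $\mathbf{B}$, whose $(i, j)$ entries are given by $(\boldsymbol{\rho}\circ\mathbf{B})_{ij} = \rho_{ij}B_{ij}$; and $\alpha$ is the parameter used to tune the relative weights given to the ensemble and the observations. The ensemble-estimated background error covariance is estimated from
$$\mathbf{B} = \frac{\mathbf{A}'\mathbf{A}'^{\mathrm{T}}}{N-1},$$
where $\mathbf{A}' = \left(\mathbf{X}'_{1}, \ldots, \mathbf{X}'_{N}\right) \in \mathbb{R}^{N_m \times N}$, $N$ is the number of ensemble samples, and the $k$th column of $\mathbf{A}'$ is calculated by $\mathbf{X}'_{k} = \mathbf{X}_{k} - \overline{\mathbf{X}}$, with $\overline{\mathbf{X}} = \frac{1}{N}\sum_{k=1}^{N}\mathbf{X}_{k}$.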
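To make the update concrete, the following is a minimal NumPy sketch of a localized EnOI analysis step of the form described above. It assumes a linear observation operator stored as a dense matrix and a dense localization matrix, which is only practical for small toy domains; the function name `enoi_analysis` and its arguments are hypothetical and are not taken from the authors' system. The gain is applied through a linear solve rather than an explicit matrix inverse.

```python
import numpy as np


def enoi_analysis(x_f, ens, y, H, R, rho, alpha=1.0):
    """Localized EnOI analysis step (illustrative sketch).

    x_f   : (Nm,) forecast state
    ens   : (Nm, N) stationary ensemble of historical model states
    y     : (Ny,) observation vector
    H     : (Ny, Nm) linear observation operator
    R     : (Ny, Ny) observation error covariance
    rho   : (Nm, Nm) localization matrix
    alpha : scalar weight on the ensemble covariance
    """
    n_members = ens.shape[1]
    # Ensemble anomalies: each member minus the ensemble mean
    anomalies = ens - ens.mean(axis=1, keepdims=True)
    # Sample background error covariance B = A' A'^T / (N - 1)
    B = anomalies @ anomalies.T / (n_members - 1)
    # Localization via the Schur (element-wise) product
    B_loc = rho * B
    BHt = alpha * B_loc @ H.T   # alpha (rho o B) H^T, shape (Nm, Ny)
    S = H @ BHt + R             # alpha H (rho o B) H^T + R, shape (Ny, Ny)
    # K = BHt S^{-1}; S is symmetric, so solve S K^T = BHt^T instead
    K = np.linalg.solve(S, BHt.T).T
    return x_f + K @ (y - H @ x_f)
```

With a 168-member `ens` refreshed every analysis hour, this mirrors the configuration selected later in the study; on the full 0.0625° grid, a local or sparse formulation would be needed instead of dense $N_m \times N_m$ matrices.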
In the EnOI scheme, a relatively stationary ensemble of model state samples can be taken from a long-term ensemble of model perturbations (anomalies) generated from a long-term model run (Evensen, 2003). Because no ensemble forecast is needed, the EnOI scheme requires only a single model integration instead of N, which typically reduces the computational cost by a factor of about N relative to the EnKF. In fact, many previous studies have employed similar historical ensemble methods to simplify the ensemble generation procedure in assimilation. For example, Pan et al. (2009) used downscaled forcing ensemble forecasts from the NOAA/NCEP Climate Forecast System (CFS) as the input forcing ensembles in their hydrological assimilation system. Pan and Wood (2009) proposed a pattern-based sampling approach in which random samples were drawn from a historical rainfall database according to the pattern of the satellite rainfall, and Pan and Wood (2010) directly used rainfall data from the Tropical Rainfall Measuring Mission (TRMM) satellite products as rainfall ensembles to force their assimilation experiments. The selection of ensemble samples in this study is described in Section 4.2.
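Another critical issue in ensemble-based DA is localization, a widely used technique for reducing sampling error, especially when the ensemble size is small (Hamill et al., 2001; Oke et al., 2007). We used the following fifth-order piecewise rational function (Gaspari and Cohn, 1999) to construct the localization matrix $\boldsymbol{\rho}$:
$$
\rho_{ij} =
\begin{cases}
-\frac{1}{4}z^{5} + \frac{1}{2}z^{4} + \frac{5}{8}z^{3} - \frac{5}{3}z^{2} + 1, & 0 \le z \le 1,\\
\frac{1}{12}z^{5} - \frac{1}{2}z^{4} + \frac{5}{8}z^{3} + \frac{5}{3}z^{2} - 5z + 4 - \frac{2}{3}z^{-1}, & 1 < z \le 2,\\
0, & z > 2,
\end{cases}
\qquad z = \frac{r_{ij}}{d},
$$
in which $d$ is the localization length scale and $r_{ij}$ is the horizontal distance between the $i$th and $j$th grid points. The localization length scale $d$ controls the range over which each measurement influences the analysis.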
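A minimal NumPy sketch of the Gaspari and Cohn (1999) taper follows. It assumes that the scaled distance is the horizontal distance divided by the length scale d, so weights vanish beyond 2d; if the length scale were instead defined as the cutoff radius, `d` would need to be halved. The function name `gaspari_cohn` is hypothetical.

```python
import numpy as np


def gaspari_cohn(r, d):
    """Gaspari and Cohn (1999) fifth-order piecewise rational taper.

    r : distance(s), any array shape, same units as d
    d : localization length scale (assumes z = r / d; zero for z >= 2)
    """
    z = np.atleast_1d(np.abs(np.asarray(r, dtype=float)) / d)
    rho = np.zeros_like(z)
    inner = z <= 1.0
    outer = (z > 1.0) & (z < 2.0)
    zi = z[inner]
    rho[inner] = (-0.25 * zi**5 + 0.5 * zi**4 + 0.625 * zi**3
                  - (5.0 / 3.0) * zi**2 + 1.0)
    zo = z[outer]
    rho[outer] = (zo**5 / 12.0 - 0.5 * zo**4 + 0.625 * zo**3
                  + (5.0 / 3.0) * zo**2 - 5.0 * zo + 4.0
                  - (2.0 / 3.0) / zo)
    return rho
```

Applied element-wise to a matrix of grid-point distances, it yields a localization matrix; for example, `gaspari_cohn(dist_km, 100.0)` corresponds to the 100 km length scale adopted in this study. Both branches equal 5/24 at z = 1, so the taper is continuous.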

Evaluation methods
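The evaluation criteria used in this study were the bias (Bias), root-mean-square error (RMSE), and correlation coefficient (Corr), which are calculated as follows:
$$\mathrm{Bias} = \frac{1}{n}\sum_{i=1}^{n}\left(S_i - O_i\right),$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(S_i - O_i\right)^{2}},$$
$$\mathrm{Corr} = \frac{\sum_{i=1}^{n}\left(S_i - \overline{S}\right)\left(O_i - \overline{O}\right)}{\sqrt{\sum_{i=1}^{n}\left(S_i - \overline{S}\right)^{2}}\sqrt{\sum_{i=1}^{n}\left(O_i - \overline{O}\right)^{2}}},$$
where $S$ is the simulated (merged) soil moisture to be evaluated, $O$ represents the in-situ soil moisture observations used for the evaluation, $n$ is the number of observations, $O_i$ is the $i$th observation, $S_i$ is the simulated (merged) soil moisture collocated with the $i$th observation, $\overline{O}$ is the average of all observations used for the evaluation, and $\overline{S}$ is the average of the simulated (merged) soil moisture at all the collocated locations.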
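The three metrics can be sketched in NumPy as below. The sign convention (simulation minus observation, so that a positive value indicates a wet bias) and the removal of pairs with missing values are assumptions made for illustration; `evaluate` is a hypothetical helper name.

```python
import numpy as np


def evaluate(sim, obs):
    """Bias, RMSE and Pearson correlation of collocated sim/obs pairs."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # Keep only pairs where both values are present
    valid = np.isfinite(sim) & np.isfinite(obs)
    s, o = sim[valid], obs[valid]
    diff = s - o                      # positive difference = wet bias
    bias = diff.mean()
    rmse = np.sqrt(np.mean(diff ** 2))
    sa, oa = s - s.mean(), o - o.mean()
    corr = np.sum(sa * oa) / np.sqrt(np.sum(sa ** 2) * np.sum(oa ** 2))
    return bias, rmse, corr
```

A constant offset between simulation and observations shows up in Bias and RMSE but leaves Corr at 1, which is why the three metrics are reported together.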

Optimal localization length scale for in-situ soil moisture fusion over China
The localization technique is useful for reducing ensemble sampling error, and one of its most important parameters is the length scale. We began by analyzing the characteristics of soil moisture spatial correlation using in-situ observations over China. Then, a series of experiments with different localization length scales was performed to determine the optimal localization length scale.

Spatial correlation analysis based on the in-situ observation network in China
Using hourly in-situ observations in July 2016 from 1185 sites whose records were continuous and valid throughout the month, we examined the characteristics of soil moisture spatial correlation over China. First, the nearest neighboring site of each site was identified; then, the spatial distance and the correlation coefficient between each site and its nearest neighboring site were calculated.
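The nearest-neighbor analysis described above could be sketched as follows, assuming station coordinates in degrees, great-circle (haversine) distances on a spherical Earth of radius 6371 km, and a station-by-time array of gap-free hourly observations. The function names are hypothetical, and a brute-force distance matrix is used, which is adequate for roughly a thousand stations.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def nearest_neighbour_stats(lat, lon, series):
    """Nearest neighbor index, distance (km) and correlation per station.

    lat, lon : (S,) station coordinates in degrees
    series   : (S, T) gap-free hourly soil moisture, one row per station
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    series = np.asarray(series, dtype=float)
    # Pairwise distances via broadcasting, shape (S, S)
    dist = haversine_km(lat[:, None], lon[:, None], lat[None, :], lon[None, :])
    np.fill_diagonal(dist, np.inf)  # a station is not its own neighbor
    nn = np.argmin(dist, axis=1)
    nn_dist = dist[np.arange(lat.size), nn]
    nn_corr = np.array([np.corrcoef(series[i], series[j])[0, 1]
                        for i, j in enumerate(nn)])
    return nn, nn_dist, nn_corr
```

Summary fractions like those reported for Fig. 2 would then follow from expressions such as `np.mean(nn_dist <= 100.0)` and `np.mean(nn_corr > 0.5)`.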
As shown in Fig. 2, the numbers of stations within 0-10 km and 0-30 km of their nearest neighboring site are 215 and 683, respectively, and the distances between most stations (95.6%) and their nearest neighbors are within 100 km. The numbers of stations with correlations of 0.6-0.8 and 0.8-1 are 343 and 420, respectively, and 78.8% of the observation stations have correlations above 0.5 with their nearest neighboring site. Figure 3 shows the spatial distribution of the correlation coefficients between each site and its nearest neighboring site. Most of the correlations are higher than 0.4 in regions with dense observations, such as North, South, Southwest, and Central China. In contrast, the correlations are relatively weak in regions where the sites are spatially sparse, such as Inner Mongolia, Gansu, Tibet, and Xinjiang.

Localization length scale
Based on the above analysis of the spatial distances and correlations of the in-situ soil moisture monit-

Impact of ensemble sampling on soil moisture fusion
Appropriate construction of the ensemble members is crucial for accurate estimation of the background error covariance. A series of soil moisture fusion experiments covering the period 1-31 July 2016 was implemented using different historical ensemble samples, namely hourly samples from the previous 7 days (7 × 24 = 168 members, marked as "168_en"), the previous 5 days (5 × 24 = 120 members, marked as "120_en"), the previous 3 days (3 × 24 = 72 members, marked as "72_en"), and the previous 1 day (1 × 24 = 24 members, marked as "24_en"); hourly samples from the same Julian day during 1998-2015 (18 × 24 = 432 members, marked as "432_en"); and samples from the same hour and Julian day during 1998-2015 (18 × 1 = 18 members, marked as "18_en"). Boxplots of the bias, RMSE, and correlation of the merged soil moisture using different ensemble samples are presented in Fig. 5. In terms of bias, the merged soil moisture using hourly samples from the previous 1 day (24_en) has the largest bias, while that using hourly samples from the previous 7 days (168_en) has the smallest. In terms of RMSE, the experiment using samples from the same hour and Julian day during 1998-2015 (18_en) performs the worst, while the experiment using hourly samples from the previous 7 days (168_en) performs the best. The correlation coefficients of the merged soil moisture are near 0.7 and vary little among the different ensemble configurations, although the experiments using hourly samples from the previous 5 days (120_en) and the previous 7 days (168_en) have higher correlations. In summary, the experiment with ensemble samples from the previous 7 days of hourly simulations performs best.
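The six sampling strategies can be written down as lists of valid times for the historical Noah-MP states that form the ensemble. This sketch assumes that the window strategies use the hours immediately preceding the analysis time (excluding it) and that "Julian day" means day of year; both are assumptions, as is the hypothetical helper `sample_times`.

```python
from datetime import datetime, timedelta

# Preceding-window strategies: label -> number of hourly members
WINDOW_HOURS = {"168_en": 168, "120_en": 120, "72_en": 72, "24_en": 24}


def sample_times(t, strategy, clim_years=range(1998, 2016)):
    """Valid times of the historical states used as ensemble members.

    t          : analysis time (datetime on the hour)
    strategy   : "168_en", "120_en", "72_en", "24_en", "432_en" or "18_en"
    clim_years : years for the same-Julian-day strategies (1998-2015)
    """
    if strategy in WINDOW_HOURS:
        n_hours = WINDOW_HOURS[strategy]
        # Hourly states from the n_hours before t (t itself excluded)
        return [t - timedelta(hours=h) for h in range(n_hours, 0, -1)]
    if strategy not in ("432_en", "18_en"):
        raise ValueError(f"unknown strategy: {strategy}")
    # "Julian day" taken as day of year; after February this shifts by one
    # calendar day between leap and non-leap years.
    doy = t.timetuple().tm_yday
    hours = range(24) if strategy == "432_en" else [t.hour]
    times = []
    for year in clim_years:
        day = datetime(year, 1, 1) + timedelta(days=doy - 1)
        times.extend(day + timedelta(hours=h) for h in hours)
    return times
```

The member counts follow directly from the window length or the number of climatological years: 18 years × 24 hours gives 432, and 18 years × 1 hour gives 18.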

Soil moisture fusion experiments
Based on the results of the preceding sensitivity experiments, three further experiments were designed to evaluate the value of in-situ soil moisture observation fusion. The first was a Noah-MP open-loop experiment (hereafter OL), which did not use in-situ observations. The second experiment (ANL-2200) merged in-situ observations from 2200 sites (red and black dots in Fig. 1) with the output of OL, while the third (ANL-1970) merged in-situ observations from only 1970 sites (black dots in Fig. 1), also with the output of OL. In the ANL-2200 and ANL-1970 experiments, the previous 7 days of hourly Noah-MP simulations were used as ensemble samples for calculating the background error covariance (168 ensemble members), and the horizontal localization length scale was set to 100 km. The experiments were performed from 1 May to 30 September 2016. Observations from the 230 stations withheld from ANL-1970 (red dots in Fig. 1) were used for evaluation.

Spatial distributions of OL versus ANL-2200
Monthly means of OL and ANL-2200 were calculated for each month from May to September 2016. Figure 6 shows that both OL and ANL-2200 reproduce a reasonable soil moisture distribution over China in July 2016 (humid in Southeast China and dry in Northwest China and Inner Mongolia). ANL-2200 provides more spatial detail and is drier than OL in Inner Mongolia, the Sichuan Basin, and South China. ANL-2200 and OL are quite similar over the Tibetan Plateau and West China, where in-situ observations are sparse.

Comparison of daily statistics over China
Daily average soil moisture values from OL and ANL-2200 were calculated and then evaluated against the daily mean of the hourly observations used in EnOI. ANL-2200 also has a higher correlation (around 0.9) than OL (0.5-0.8). Overall, the EnOI analysis is notably better than the simulation without in-situ observation fusion: the wet bias of 0.02 m³ m⁻³ is removed, the RMSE is reduced by about 37% (from 0.071 to 0.045 m³ m⁻³), and the correlation is increased by about 25% (from 0.71 to 0.89).
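The relative changes quoted above can be checked directly from the reported OL and ANL-2200 statistics (RMSE of 0.071 and 0.045 m³ m⁻³; correlation of 0.71 and 0.89):
$$
\frac{0.071 - 0.045}{0.071} \approx 0.366 \approx 37\%, \qquad \frac{0.89 - 0.71}{0.71} \approx 0.254 \approx 25\%.
$$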

Comparison of statistics over different subregions
We also calculated the bias, RMSE, and correlation against the observations used in EnOI from May to September 2016 over eight subregions (Fig. 1), namely Northeast China, North China, the JiangHuai subregion, Southeast China, Inner Mongolia, Southwest China, the Xinjiang subregion, and the Tibetan Plateau, following the subregion definitions of Ma et al. (2005).

Independent evaluation
As described in Section 2.1, 230 stations that are evenly distributed in space were selected for independent evaluation. Figure 9 shows the averages of soil mois-    is also smaller than that of OL (0.071 m³ m⁻³), and the correlation of ANL-1970 (0.73) is marginally higher than that of OL (0.71). Overall, the independent evaluation results show that ANL-1970 is considerably better than OL in terms of bias and performs marginally better with respect to RMSE and correlation.

Summary
Given the recent rapid development of the in-situ soil moisture measurement network in China, these observations can and should be used more widely, not only for calibration and validation but also for direct fusion into soil moisture products. In this context, the present study introduces an EnOI-based 2D soil moisture analysis scheme that allows in-situ observations at each site to update the surrounding LSM grid cells. The scheme uses a relatively stationary ensemble of model state samples to calculate the background error covariance matrix, which makes it computationally inexpensive. A set of sensitivity experiments on ensemble sampling and localization length scale was performed, and the results show that the EnOI scheme with ensemble samples from the previous 7 days of hourly soil moisture states and a localization length scale of 100 km performs best for in-situ soil moisture fusion over China.
In-situ soil moisture fusion experiments were then performed from May to September 2016. The spatial distributions of monthly mean soil moisture over China with and without EnOI fusion are similar and reasonable. The EnOI analysis shows more spatial detail in subregions where observations are denser, such as Inner Mongolia, the Sichuan Basin, and South China. Evaluation against the observations used in EnOI shows that the EnOI analysis is notably better than the simulation without in-situ observation fusion for all the statistical metrics employed (bias, RMSE, and correlation). Independent evaluation shows that the EnOI analysis performs considerably better in terms of bias, and marginally better in terms of RMSE and correlation.