
Figure 2 shows the overall procedure for SMDBN modeling, which consists of six parts: the input, the temperature subnetwork, NDVI extraction and reconstruction, EVI extraction and reconstruction, the SM subnetwork, and the output. During model training, the FY3D images and their corresponding observation data were used as inputs, whereas during model testing, only the FY3D images were used as the input. The temperature subnetwork extracted LSTs from the FY3D images. The extracted LSTs, together with the NDVI and EVI, were input into the SM subnetwork, which then generated the SM data. More details on the two subnetworks are provided in the following sections.

The temperature subnetwork consisted of 11 RBM layers, with the final RBM layer having only one output node. Except for the last layer, the output of each layer served as the input of the next. In the temperature subnetwork, the LST dataset described in Section 2.4 was used as the training dataset. Figure 3 shows the structure of each RBM, which was composed of a visible layer (the upper layer) and a hidden layer (the lower layer). There were no connections between elements within the same layer, whereas each element in one layer was bidirectionally connected to every element in the other layer.
In the RBM, let w be the weight, indicating the connection strength between two connected neurons (one in the visible layer and the other in the hidden layer). Let the bias coefficients of the two neurons be b (visible) and c (hidden), so that the energy function of the connection can be expressed as follows:
$$ E = -\sum_{i=1}^{n} b_i v_i - \sum_{j=1}^{m} c_j h_j - \sum_{i=1}^{n} \sum_{j=1}^{m} v_i w_{i,j} h_j, $$ (1) where E is the energy, i and j are the indices of the nodes in the visible and hidden layers (with n and m nodes, respectively), and v_{i} and h_{j} are the values of the ith visible and jth hidden nodes, respectively.
The probability that the hidden layer neuron, h_{j}, is activated is as follows:
$$ P\left( h_j \mid v \right) = \sigma\left( c_j + \sum\nolimits_{i} w_{i,j} v_i \right). $$ (2) Because of the bidirectional connections, the neurons in the visible layer can also be activated by the neurons in the hidden layer:
$$ P\left( v_i \mid h \right) = \sigma\left( b_i + \sum\nolimits_{j} w_{i,j} h_j \right), $$ (3) where σ denotes the sigmoid function.
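Equations (1)-(3) can be expressed compactly with NumPy. The following is a minimal illustrative sketch (not the authors' implementation), using b for the visible biases and c for the hidden biases, as defined above:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def energy(v, h, w, b, c):
    # Eq. (1): E = -sum_i b_i v_i - sum_j c_j h_j - sum_ij v_i w_ij h_j
    return -b @ v - c @ h - v @ w @ h

def p_h_given_v(v, w, c):
    # Eq. (2): activation probability of each hidden neuron given v
    return sigmoid(c + v @ w)

def p_v_given_h(h, w, b):
    # Eq. (3): activation probability of each visible neuron given h
    return sigmoid(b + w @ h)
```

With all weights and biases at zero, every activation probability is σ(0) = 0.5, which is a quick sanity check for the three functions.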
The RBM was trained by the contrastive divergence algorithm. First, the state of the hidden layer was obtained from the data in the visible layer. Then, the visible layer was reconstructed from the hidden layer. Subsequently, a new hidden-layer vector was generated from the state of the visible layer and used as the input for the next RBM. In this way, each RBM layer was trained sequentially to extract deep information from the input eigenvalues. Table 1 lists the training parameters of the temperature subnetwork. The main parameters of the RBM structure were the number of input nodes and the number of RBM layers. The number of nodes in the first half of the network exceeded the number of input nodes, enabling the extraction of high-level features from the eigenvalues. The number of nodes gradually decreased in the second half, which reduced redundant features and improved the fitting results.
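A single CD-1 weight update, as described above, might be sketched as follows. The function name `cd1_update` and the mini-batch interface are illustrative assumptions, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, w, b, c, lr=1e-4):
    """One CD-1 step for a mini-batch v0 of shape (batch, n_visible)."""
    # Up pass: hidden probabilities and a Gibbs sample of the hidden states
    ph0 = sigmoid(c + v0 @ w)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Down pass: reconstruct the visible layer from the hidden sample
    pv1 = sigmoid(b + h0 @ w.T)
    # Up pass again on the reconstruction
    ph1 = sigmoid(c + pv1 @ w)
    # Contrastive-divergence gradient estimates, updated in place
    batch = v0.shape[0]
    w += lr * (v0.T @ ph0 - pv1.T @ ph1) / batch
    b += lr * (v0 - pv1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return ph0  # hidden activations feed the next RBM
```

Returning the hidden activations reflects the stacking described above: the output of one trained RBM becomes the visible-layer input of the next.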
Table 1. Training parameters for the temperature subnetwork

Momentum           0.1
Structure          [n, n × 3, n × 5, n × 7, n × 9, n × 10, n × 8, n × 6, n × 4, n × 2, 1]
cd_k               1
RBM learning rate  e^{–4}
RBM epoch          600
BP learning rate   e^{–4}
Dropout            0.0005
Batch size         50

Note: structure denotes the number of RBM layers and the number of neurons in each layer; n denotes the number of input parameters; cd_k denotes the number of sampling steps; learning rate denotes the parameter of the optimization algorithm that determines the step size at each iteration while moving toward the minimum of the loss function; RBM epoch denotes the number of RBM training iterations for each layer; dropout denotes the probability of dropping a neuron; batch size denotes the number of samples per training step.

The SM subnetwork consisted of 13 RBM layers, where each of the RBM layers was similar to the corresponding component in the temperature subnetwork. The SM dataset described in Section 2.5 was used as the input to the SM subnetwork. Table 2 lists the training parameters of the subnetwork.
Table 2. Training parameters for the SM subnetwork (see the note to Table 1 for the meanings of the parameters)

Momentum           0.1
Structure          [n, n × 8, n × 14, n × 16, n × 17, n × 18, n × 12, n × 11, n × 10, n × 9, n × 6, n × 2, 1]
cd_k               1
RBM learning rate  e^{–4}
RBM epoch          200
BP learning rate   e^{–4}
Dropout            0.0005
Batch size         50

The loss function for the temperature and SM subnetworks can be defined using the mean squared error:
$$ {\rm{loss}} = \frac{1}{t_{\rm{s}}} \sum_{i=1}^{t_{\rm{s}}} \left( p_i - t_i \right)^2, $$ (4) where t_{s} represents the total number of samples, p_{i} represents the value predicted by the model for sample i, and t_{i} represents the corresponding observation value. When the temperature subnetwork was trained, t was the LST observation value, whereas when the SM subnetwork was trained, t was the SM observation value.
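The loss of Eq. (4) is the standard mean squared error; a minimal sketch:

```python
import numpy as np

def mse_loss(p, t):
    # Eq. (4): mean of the squared prediction errors over all t_s samples
    p, t = np.asarray(p, dtype=float), np.asarray(t, dtype=float)
    return ((p - t) ** 2).mean()
```

For example, predictions [1, 2] against observations [1, 4] give a loss of (0 + 4) / 2 = 2.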

As previously mentioned, there were two training processes for the SMDBN, i.e., one for the temperature subnetwork and the other for the SM subnetwork; the two procedures were identical, as follows:
(1) Determine the hyperparameters for the training process and initialize the weights and biases of the subnetwork being trained.
(2) Input the FY3D images and the corresponding ground observations to generate the LST and SM datasets. Notably, because the vegetation indices were the main inputs to the SMDBN, site data from nonvegetated areas could not be used. Furthermore, for vegetated areas, observations from outside the growing season could not be used.
(3) Use an unsupervised method to train the two layers of each RBM: input all preprocessed sample data to the visible layer of the RBM, transmit the data to the hidden layer through the activation function, and train each RBM layer using greedy layer-wise training. The sample data were drawn using Gibbs sampling, and the weights and biases were updated using the contrastive divergence algorithm to ensure that the feature-vector mapping was optimal. As previously mentioned, the output of one hidden layer was used as the input to the visible layer of the next RBM. When all RBMs had been trained following the above steps, the pretraining process was complete.
(4) Use the supervised learning method to train the final BP neural network layer of the SMDBN. Compare the model results with the measured data and use the gradient descent method to back-propagate the error, based on the preset learning rate for each layer, to fine-tune the weights of the DBN.
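Steps (1)-(3) above amount to a greedy layer-wise pretraining loop. The following is an illustrative NumPy sketch, not the authors' implementation: the `layer_sizes` argument would be set from the structure rows of Tables 1 and 2, and the supervised BP fine-tuning of step (4) is omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_dbn(x, layer_sizes, epochs=10, lr=1e-4, seed=0):
    """Greedy layer-wise pretraining: train each RBM with CD-1 on the
    hidden activations of the previous one (steps 1-3)."""
    rng = np.random.default_rng(seed)
    weights, hidden_biases = [], []
    data = x
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        w = rng.normal(0.0, 0.01, (n_in, n_out))  # step (1): initialize
        b = np.zeros(n_in)   # visible biases
        c = np.zeros(n_out)  # hidden biases
        for _ in range(epochs):
            # one CD-1 sweep over the whole sample set
            ph0 = sigmoid(c + data @ w)
            h0 = (rng.random(ph0.shape) < ph0).astype(float)
            pv1 = sigmoid(b + h0 @ w.T)
            ph1 = sigmoid(c + pv1 @ w)
            w += lr * (data.T @ ph0 - pv1.T @ ph1) / len(data)
            b += lr * (data - pv1).mean(axis=0)
            c += lr * (ph0 - ph1).mean(axis=0)
        weights.append(w)
        hidden_biases.append(c)
        data = sigmoid(c + data @ w)  # output feeds the next RBM
    return weights, hidden_biases
```

After pretraining, step (4) would fine-tune the stacked weights by back-propagating the loss of Eq. (4) against the observations.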

The LR and BP neural network models are the most widely used models for SM inversion, and we constructed an SM LR model (SMLR) and an SM BP neural network model (SMBP) for comparison with the SMDBN model. The structures of SMLR and SMBP are similar to that of SMDBN, as both consist of a temperature subnetwork and an SM subnetwork.
In this study, the comparison was conducted on a graphics workstation with a 12-GB Nvidia graphics card running the Linux Ubuntu 16.04 operating system.
We used cross-validation in the comparison experiments. All samples in the LST and SM datasets were used in the training procedure. As described in Section 2, the samples covered the period from January 2018 to December 2019.
First, we used the LST dataset to train the temperature subnetworks of the SMDBN, SMLR, and SMBP. Each subnetwork was trained for five rounds. In each round, we selected 80% of the samples in the LST dataset as training samples and used the remaining 20% as test samples. When selecting the samples, we followed two principles: first, every sample had to be tested at least once; second, because the time variation of SM is generally assessed, the test samples selected from the same site were required to be continuous in time.
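The per-round sample selection described above (an 80%/20% split with a temporally contiguous test block, rotated so that every sample is tested once over the five rounds) could be implemented for one site's time-ordered samples roughly as follows; `round_split` is a hypothetical helper, not from the paper:

```python
import numpy as np

def round_split(n_samples, n_rounds=5, test_round=0):
    """Select a temporally contiguous ~20% block of one site's
    time-ordered samples as the test set for a given round."""
    idx = np.arange(n_samples)  # samples assumed ordered by time
    fold = n_samples // n_rounds
    start = test_round * fold
    # The last round absorbs any remainder so every sample is tested once
    stop = n_samples if test_round == n_rounds - 1 else start + fold
    test = idx[start:stop]
    train = np.concatenate([idx[:start], idx[stop:]])
    return train, test
```

Rotating `test_round` from 0 to 4 covers all samples exactly once while keeping each test block continuous in time, matching the two selection principles.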
Second, the trained temperature subnetworks were used to generate LSTs, which were then incorporated into the SM datasets for the SMDBN, SMLR, and SMBP.
Third, the SM subnetworks of the SMDBN, SMLR, and SMBP were trained with the same strategy on their respective SM datasets.
Table 3 lists the number of samples used in each round of the comparison experiments.
Table 3. Number of samples used in each round of the comparison experiments

Type         Test samples  Training samples
Temperature  3460          13,840
SM           2342          9368

Figure 4 shows the experimental results for the SM distribution from the three models, tested on five days: 6 March, 3 April, 23 May, 29 October, and 31 October 2019. In the SMDBN results, the spatial variation in SM was relatively smooth, which was more consistent with reality than the results of the other two models. After five experimental rounds, each model produced 11,710 SM test samples. Figure 5 shows the correlation between the measured data and the inversion result for each SM sample; the coefficient of determination (R^{2}) and the root mean square error (RMSE) for the overall accuracy of the SM prediction from each model are listed in Table 4. R^{2} values closer to 1 indicate a better fit, while smaller RMSE values indicate a more accurate model. The SMDBN results exhibited the strongest correlation with the measured data and the best accuracy, with R^{2} and RMSE values of 0.913 and 0.032, respectively. Thus, the SMDBN clearly outperformed the two comparison models.
Figure 4. Distributions of SM over the Ningxia Hui Autonomous Region of China based on the (a) SMDBN, (b) SMLR, and (c) SMBP models for five days: 6 March, 3 April, 23 May, 29 October, and 31 October 2019 (from the left to right panels sequentially).
Table 4. Accuracy of the SM predictions from the three models

Model  R^{2}  RMSE
SMDBN  0.913  0.032
SMLR   0.638  0.101
SMBP   0.813  0.083
Figure 5. Correlation between the observed and predicted SM based on the (a) SMDBN, (b) SMLR, and (c) SMBP models. The dashed line in each panel denotes the fitting line from all testing points.
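The accuracy metrics reported in Table 4 (R^{2} and RMSE) can be computed with a short helper; this is a generic sketch of the standard definitions, not the authors' evaluation code:

```python
import numpy as np

def r2_rmse(obs, pred):
    """Coefficient of determination and root mean square error
    between observed and predicted SM values."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    ss_res = ((obs - pred) ** 2).sum()          # residual sum of squares
    ss_tot = ((obs - obs.mean()) ** 2).sum()    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(((obs - pred) ** 2).mean())
    return r2, rmse
```

A perfect prediction yields R^{2} = 1 and RMSE = 0, while predicting the mean of the observations yields R^{2} = 0, which frames the 0.913/0.032 result for the SMDBN.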
Considering that the time variation of SM is generally of interest, we selected two typical stations, Tongxin and Ligang, as experimental stations and conducted an SM time-series experiment. Tongxin station is located in the transition zone between the Ordos Platform and the northern Loess Plateau, in the core arid zone of central Ningxia. Ligang station is located in the central Ningxia Plain. Figure 6 presents the SM time series at Tongxin and Ligang. The prediction data in Fig. 6 were generated by the trained SMDBN, and all of the data were independent of the training data. The trained SMDBN model can also be used to generate time series for other locations where the required basic data are available. Figure 6 demonstrates that the time series obtained by the SMDBN are highly consistent with the observed time series.