Figure 2 shows the overall procedure for SM-DBN modelling, which consists of the following six parts: input, temperature sub-network, NDVI extraction and reconstruction, EVI extraction and reconstruction, SM sub-network, and output. In the model training process, the FY-3D images and their corresponding observation data were used as inputs, while during model testing, only the FY-3D images were used as input. The temperature sub-network was used to extract LSTs from the FY-3D images. The extracted LST, together with the NDVI and EVI, was then input into the SM sub-network, which generated the SM data. More details on the two sub-networks are provided in the following sections.
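Conceptually, inference with the trained model is a two-stage pipeline. The sketch below illustrates this composition in Python; the names `retrieve_sm`, `temperature_subnet`, and `sm_subnet` are hypothetical stand-ins for the trained sub-networks, not identifiers from the paper.

```python
import numpy as np

def retrieve_sm(fy3d_features, ndvi, evi, temperature_subnet, sm_subnet):
    """Illustrative SM-DBN inference pipeline (hypothetical names).

    fy3d_features: per-pixel FY-3D inputs, shape (n_pixels, n_features)
    ndvi, evi:     per-pixel vegetation indices, shape (n_pixels,)
    """
    # Stage 1: the temperature sub-network maps FY-3D features to LST.
    lst = temperature_subnet(fy3d_features)        # shape (n_pixels,)
    # Stage 2: LST, NDVI, and EVI together feed the SM sub-network.
    sm_inputs = np.column_stack([lst, ndvi, evi])
    return sm_subnet(sm_inputs)                    # shape (n_pixels,)
```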
The temperature sub-network consisted of 11 RBM layers, with the final RBM layer having only one output node. Except for the last layer, the output of each layer served as the input to the next. The LST dataset described in Section 2.4 was used as the training dataset for the temperature sub-network. Figure 3 shows the structure of each RBM, which was composed of a visible layer (the upper layer) and a hidden layer (the lower layer). There were no connections between elements within the same layer, whereas each element in one layer was bidirectionally connected to all elements in the other layer.
In the RBM, let $w_{ij}$ be the weight indicating the connection strength between two connected neurons (one in the visible layer and the other in the hidden layer). Let the bias coefficients of the two neurons be $b_i$ (visible) and $c_j$ (hidden), so that the energy function of the connection can be expressed as follows:

$$E(v, h) = -\sum_{i} b_i v_i - \sum_{j} c_j h_j - \sum_{i,j} v_i w_{ij} h_j$$
where E is the energy function; i and j are the indices of the nodes in the visible and hidden layers, respectively; and $v_i$ and $h_j$ are the values of the i-th visible node and the j-th hidden node, respectively.
The probability that the hidden-layer neuron $h_j$ is activated is as follows:

$$P(h_j = 1 \mid v) = \sigma\Big(c_j + \sum_{i} v_i w_{ij}\Big)$$
Because the connections are bidirectional, the neurons in the visible layer can likewise be activated by the neurons in the hidden layer:

$$P(v_i = 1 \mid h) = \sigma\Big(b_i + \sum_{j} w_{ij} h_j\Big)$$

where $\sigma$ denotes the sigmoid function.
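These quantities translate directly into code. The following is a minimal NumPy sketch assuming binary units and single-sample vectors; the shapes and function names are illustrative, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def energy(v, h, w, b, c):
    """E(v, h) = -sum_i b_i v_i - sum_j c_j h_j - sum_ij v_i w_ij h_j."""
    return -(b @ v) - (c @ h) - (v @ w @ h)

def p_h_given_v(v, w, c):
    """P(h_j = 1 | v): activation probability of each hidden neuron."""
    return sigmoid(c + v @ w)

def p_v_given_h(h, w, b):
    """P(v_i = 1 | h): activation probability of each visible neuron."""
    return sigmoid(b + w @ h)
```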
The RBM was trained with the contrastive divergence algorithm. First, the state of the hidden layer was obtained from the data in the visible layer. Then, the visible layer was reconstructed from the hidden layer. Subsequently, a new hidden-layer vector was generated from the state of the visible layer and used as the input for training the next RBM. In this way, each RBM layer was trained sequentially to extract deep information from the input features. Table 1 lists the training parameters of the temperature sub-network. The main structural parameters were the number of input nodes and the number of RBM layers. The number of nodes in the first half of the network was larger than the number of input nodes, which allowed high-level features to be extracted from the inputs; the number of nodes then gradually decreased in the second half, reducing redundant features and improving the fitting results.
Table 1. Training parameters for the temperature sub-network

| Parameter | Value |
| --- | --- |
| momentum | 0.1 |
| struct | `[n, n*3, n*5, n*7, n*9, n*10, n*8, n*6, n*4, n*2, 1]` |
| cd_k | 1 |
| RBM learning rate | e^-4 |
| RBM_epochs | 600 |
| BP learning rate | e^-4 |
| dropout | 0.0005 |
| batch_size | 50 |
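The CD-1 update described above can be sketched as follows. This is a schematic single-sample version under the usual CD-1 formulation, not the authors' implementation; the learning rate and shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, w, b, c, lr=1e-4):
    """One contrastive divergence (CD-1) update for a single sample v0."""
    # Up pass: Gibbs-sample the hidden state from the data.
    ph0 = sigmoid(c + v0 @ w)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Down pass: reconstruct the visible layer from the hidden state.
    pv1 = sigmoid(b + w @ h0)
    # Up pass on the reconstruction gives the negative statistics.
    ph1 = sigmoid(c + pv1 @ w)
    # Positive minus negative statistics drive the parameter update.
    w += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
    b += lr * (v0 - pv1)
    c += lr * (ph0 - ph1)
    return pv1

# Greedy layer-wise training: after one RBM converges, its hidden
# activations become the visible input of the next RBM in the stack.
```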
The SM sub-network consisted of 13 RBM layers, where each of the RBM layers was similar to the corresponding component in the temperature sub-network. The SM dataset described in Section 2.5 was used as the input to the SM sub-network. Table 2 lists the training parameters of the sub-network.
Table 2. Training parameters for the SM sub-network

| Parameter | Value |
| --- | --- |
| momentum | 0.1 |
| struct | `[n, n*8, n*14, n*16, n*17, n*18, n*12, n*11, n*10, n*9, n*6, n*2, 1]` |
| cd_k | 1 |
| RBM learning rate | e^-4 |
| RBM_epochs | 200 |
| BP learning rate | e^-4 |
| dropout | 0.0005 |
| batch_size | 50 |
The loss function for the temperature and SM sub-networks was defined as the mean squared error:

$$Loss = \frac{1}{ts} \sum_{s=1}^{ts} (p_s - t_s)^2$$
where $ts$ is the total number of samples, $p_s$ is the value predicted by the model for sample $s$, and $t_s$ is the corresponding observation. When the temperature sub-network was trained, $t_s$ was the observed LST, whereas when the SM sub-network was trained, $t_s$ was the observed SM.
As previously mentioned, the SM-DBN involved two training processes, one for the temperature sub-network and the other for the SM sub-network; the procedure for each was identical, as follows:
(1) Determine the hyper-parameters for the training process and initialize the weights and biases of the sub-network being trained.
(2) Input the FY-3D images and corresponding ground observations to generate the LST and SM datasets. Notably, as the vegetation index was the main input to the SM-DBN, site data for non-vegetated areas could not be used. Furthermore, for areas covered by vegetation, observation data from outside the vegetation growing season could not be used.
(3) Use an unsupervised method to train the two layers of each RBM: input all pre-processed sample data to the visible layer of the RBM, transmit them to the hidden layer through the activation function, and train each RBM using greedy layer-wise training. The samples were drawn with Gibbs sampling, and the weights and biases were updated with the contrastive divergence algorithm to ensure that the feature-vector mapping was optimal. As previously mentioned, the output of one hidden layer was used as the input to the visible layer of the next RBM. Once all RBMs had been trained in this way, the pre-training process was complete.
(4) Use supervised learning to train the last BP neural network layer of the SM-DBN. Compare the model outputs with the measured data and backpropagate the error with the gradient descent method, using the preset learning rate for each layer to fine-tune the weights of the DBN (as sketched below).
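A minimal sketch of step (4), assuming sigmoid activations throughout the stack (consistent with the RBM layers) and a single training sample; this is schematic, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def finetune_step(x, t, weights, biases, lr=1e-4):
    """One supervised fine-tuning step over a stack of sigmoid layers.

    weights/biases are initialized from RBM pre-training; t is the
    observed LST or SM value for the input sample x.
    """
    # Forward pass, caching activations for backpropagation.
    activations = [x]
    for w, b in zip(weights, biases):
        activations.append(sigmoid(activations[-1] @ w + b))
    p = activations[-1]
    # Gradient of the squared error at the output layer.
    delta = (p - t) * p * (1.0 - p)
    # Backpropagate and apply gradient descent layer by layer.
    for i in reversed(range(len(weights))):
        grad_w = np.outer(activations[i], delta)
        delta_prev = (delta @ weights[i].T) * activations[i] * (1.0 - activations[i])
        weights[i] -= lr * grad_w
        biases[i] -= lr * delta
        delta = delta_prev
    return float(np.mean((p - t) ** 2))  # per-sample squared error
```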
The LR and BP neural network models are the most widely used models for SM inversion, so we constructed an SM LR model (SM-LR) and an SM BP neural network model (SM-BP) for comparison with the SM-DBN. The structures of SM-LR and SM-BP are similar to that of SM-DBN in that each consists of a temperature sub-model and an SM sub-model.
In this study, the comparison was conducted on a graphics workstation with a 12-GB NVIDIA graphics card, running the Ubuntu 16.04 Linux operating system.
We used cross-validation in the comparison experiments. All samples of the LST dataset and all samples of the SM dataset were used in the training procedure; as described in Section 2, the samples covered January 2018 to December 2019.
First, we used the LST dataset to train the temperature sub-models of the SM-DBN, SM-LR, and SM-BP, respectively. Each sub-model was trained for five rounds. In each round, 80% of the LST samples were selected for training and the remaining 20% were used for testing. Sample selection followed two principles: first, every sample had to be tested at least once; second, because the time variation of SM is generally assessed, test samples drawn from the same site had to be continuous in time (a sketch of this split strategy follows the third step below).
Second, the trained temperature sub-models were used to generate LSTs, which were then incorporated into the SM datasets for the SM-DBN, SM-LR, and SM-BP.
Third, the SM sub-models of the SM-DBN, SM-LR, and SM-BP were trained with the same strategy on their respective SM datasets.
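A sketch of the split strategy, under the assumption that each sample carries a site ID and a timestamp: each round holds out one contiguous block of samples per site, rotating the block across rounds so that every sample is tested at least once. The helper below is hypothetical, not the authors' code.

```python
import numpy as np

def round_split(site_ids, timestamps, round_idx, n_rounds=5):
    """Per-site, time-contiguous train/test masks for one CV round."""
    test_mask = np.zeros(len(site_ids), dtype=bool)
    for site in np.unique(site_ids):
        idx = np.where(site_ids == site)[0]
        idx = idx[np.argsort(timestamps[idx])]   # time order within the site
        fold = len(idx) // n_rounds
        start = round_idx * fold
        # The last round absorbs any remainder so all samples get tested.
        stop = len(idx) if round_idx == n_rounds - 1 else start + fold
        test_mask[idx[start:stop]] = True        # contiguous ~20% test block
    return ~test_mask, test_mask                 # train mask, test mask
```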
Table 3 lists the number of samples used in each round of the comparison experiments.
Table 3. Number of samples used in each round of the comparison experiments

| Type | Test samples | Training samples |
| --- | --- | --- |
| Temperature | 3,460 | 13,840 |
| SM | 2,342 | 9,368 |
Figure 4 shows the experimental SM results from the three models tested on five days: 6 March 2019, 3 April 2019, 23 May 2019, 29 October 2019, and 31 October 2019. In the SM-DBN results, the variation in SM was relatively smooth, which was more consistent with reality than the results of the other two models. Figure 5 shows the correlation between the measured data and the inversion results for each sample, as well as the root mean square error (RMSE) for the overall SM accuracy of each model. R² values closer to 1 indicate a better model fit, while smaller RMSE values indicate a more accurate model. The SM-DBN results exhibited the strongest correlation with the measured data and the best accuracy, with R² and RMSE values of 0.913 and 0.032, respectively. Thus, the SM-DBN clearly outperformed the comparison models.
Figure 4. SM results based on (a) the SM-DBN model, (b) the SM-LR model, and (c) the SM-BP model for five days: 6 March 2019, 3 April 2019, 23 May 2019, 29 October 2019, and 31 October 2019 (from left to right).
Figure 5. Correlation between measured and predicted SM based on (a) the SM-DBN model, (b) the SM-LR model, and (c) the SM-BP model. The accuracy of each model is indicated by the RMSE of the differences between the model results and measured data from all testing points.
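For reference, both accuracy metrics can be computed as in the sketch below. The standard coefficient-of-determination form of R² is assumed here, since the paper does not spell out its exact formulation.

```python
import numpy as np

def evaluate(predicted, observed):
    """Return (R^2, RMSE) of model predictions against observations."""
    residuals = observed - predicted
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((observed - observed.mean()) ** 2))
    return 1.0 - ss_res / ss_tot, rmse
```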
After five experimental rounds, each model had produced 11,710 test results. Table 4 gives the average R² and RMSE, showing that the proposed method clearly outperformed the comparison models on both metrics.
Table 4. Accuracy of the results from the three models

| Model | R² | RMSE |
| --- | --- | --- |
| SM-DBN | 0.913 | 0.032 |
| SM-LR | 0.638 | 0.101 |
| SM-BP | 0.813 | 0.083 |
Considering that the time variation of SM is generally assessed, we selected two typical stations, Tongxin station and Ligang station, as experimental stations and conducted an SM time-series experiment. Tongxin station is located in the transition zone between the Ordos platform and the northern Loess Plateau, in the core of the arid zone of central Ningxia. Ligang station is located in the central area of the Ningxia Plain. Figure 6 presents the SM time series for Tongxin and Ligang, respectively. The predictions shown in Figure 6 were generated by the trained SM-DBN, and all of these data are independent of the training data. The trained SM-DBN model can also be used to generate time-series data for other areas where the required input data are available. Figure 6 demonstrates that the time series obtained by the SM-DBN are highly consistent with the observed time series.