Comparison of spatial interpolation methods for gridded bias removal in surface temperature forecasts

All numerical weather prediction (NWP) models inherently have substantial biases, especially in the forecast of near-surface weather variables. Statistical methods can be used to remove the systematic error based on historical bias data at observation stations. However, many end users of weather forecasts need bias corrected forecasts at locations that scarcely have any historical bias data. To circumvent this limitation, the bias of surface temperature forecasts on a regular grid covering Iran is removed, by using the information available at observation stations in the vicinity of any given grid point. To this end, the running mean error method is first used to correct the forecasts at observation stations, then four interpolation methods including inverse distance squared weighting with constant lapse rate (IDSW-CLR), Kriging with constant lapse rate (Kriging-CLR), gradient inverse distance squared with linear lapse rate (GIDS-LR), and gradient inverse distance squared with lapse rate determined by classification and regression tree (GIDS-CART), are employed to interpolate the bias corrected forecasts at neighboring observation stations to any given location. The results show that all four interpolation methods used do reduce the model error significantly, but Kriging-CLR has better performance than the other methods. For Kriging-CLR, root mean square error (RMSE) and mean absolute error (MAE) were decreased by 26% and 29%, respectively, as compared to the raw forecasts. It is found also, that after applying any of the proposed methods, unlike the raw forecasts, the bias corrected forecasts do not show spatial or temporal dependency.


Introduction
Since recent decades, numerical weather prediction (NWP) models have been used routinely to forecast the future weather events (e.g., Li et al., 2015;Wang et al., 2015;Li et al., 2016). Inevitably, the output of all NWP models involves some substantial errors that impact on the quality and operational utility of the forecasts. The errors arise mainly from the difference between the land use and topography used in the model and the real world, imperfections in the model itself, and the deficiencies in generating the initial conditions, and have two components of random and systematic errors. The model improvements and the developments in data assimilation methods for generating the initial conditions may be ef-fective on the reduction of the random component. On the other hand, statistical methods can be used to remove the systematic error that is mostly the larger component of total error at the surface where often there are substantial deficiencies in specified surface parameters and model physics (Hacker and Rife, 2007). As such, total error in the near surface fields such as 2-m temperature forecasts can be significantly reduced by removing the systematic error.
The systematic error or bias in NWP model outputs could be removed by using historical bias data. For this purpose, several statistical post-processing techniques have been proposed. The most popular methods that need long data archive for training period are model output statistics (MOS) (Glahn and Lowry, 1972;Wilks and Hamill, 2007) and perfect prognosis method (PPM) (Klein et al., 1959). Other methods that use short term training period include Kalman filtering (Homleid, 1995;Galanis and Anadranistakis, 2002;Roeger et al., 2003), running mean error (Stensrud and Skindlov, 1996), the neural network-based methods (Marzban, 2003;Yuval and Hsieh, 2003), and some other methods (e.g., Lam et al., 2011;Acharya et al., 2013;Sweeney et al., 2013). Using historical observation data for bias removal makes these techniques applicable only at observation locations and so they are not directly applicable to other points of interest. However, many end users of weather forecasts require bias corrected forecasts at any location that rarely coincides with the observation sites. Consequently, it is necessary to remove bias on the model grid where there is usually no observation data.
As there is no observation or bias data on the grid points, bias can be removed by using historical information of the bias in neighboring observation stations. The issue of interpolating past biases from neighboring observation stations to a given grid point has been addressed in some earlier works. For example, Yussouf and Stensrud (2006) used Cressman scheme (Cressman, 1959) to interpolate 12-day mean bias values at observation stations to a given location. Gel (2007) compared three methods named as local observation-based (LOB), classification and regression trees and alternating conditional expectations (CART-ACE), and LOB-CART-ACE, which is a natural combination of two previous methods for gridded bias correction in mesoscale weather forecasting. Hacker and Rife (2007) proposed two error covariance models to estimate the bias of near-surface temperature on a mesoscale grid. Mass et al. (2008) interpolated biases at observing locations to model grid points that have similar elevation and/or land-use characteristics.
All of the above mentioned methods interpolate biases at observation stations to the point of interest, but Glahn et al. (2009) interpolated MOS predictions (not bias) at observation sites to the model grid using a modified version of Cressman (1959) method. Similarly, in this paper, the bias corrected surface temperature forecasts are interpolated from observation stations to any given location.
For this purpose, four interpolation techniques are used for interpolating surface temperature that take into account the topography information. They are based on weighted-averaging with different approaches to calculate the weights and lapse rates. The first method is the inverse distance squared weighting with a constant lapse rate (IDSW-CLR) for temperature. The second method called GIDS-LR is the gradient plus inverse distance squared (GIDS) with linear regression to calculate the lapse rate. GIDS has been used with success for climate data interpolation (Nalder and Wein, 1998). In GIDS, a multiple linear regression based on latitude, longitude, and altitude of points is modeled to calculate the lapse rate. Moreover, an alternative method such as classification and regression tree (CART) is examined for lapse rate calculation. Therefore, the third method is GIDS-CART which incorporates nonlinear relationships between the temperature as the predictand and longitude, latitude, and altitude of a point as the predictors to calculate the lapse rate. The fourth technique (Kriging-CLR) is an ordinary Kriging method with a constant lapse rate used for the elevation adjustment. Finally, the results of using proposed techniques in bias correction of the weather research and forecasting (WRF) model (Skamarock et al., 2008) 48-h surface temperature forecasts are compared with each other.
The reminder of this paper is organized as follows. Section 2 supplies the general framework of the applied interpolation schemes. Section 3 reports the used data and methods. The verification and comparison results are presented in Section 4, and conclusions are finally drawn in Section 5.

Methodology
For spatial interpolation of bias corrected temperature, three aspects should be considered (Stahl et al., 2006): (1) the approach of choosing the neighboring observation stations, (2) the method of calculating the lapse rate to adjust for elevation, and (3) the method used for interpolating the data. In the following, the above mentioned three aspects used here, are presented.

Neighborhood
In fact, there is no optimal instruction for choosing an appropriate neighborhood, but some theoretical rules of thumb can be useful. Here, we have used the semivariogram that can provide an insight into estimating a radius of vicinity.
In geostatistics (e.g., Cressie, 2015), a semivariogram is a function for describing the spatial correlation of the observation data and is defined as°( where γ(h) is the semivariogram of variable Z at lag distance h and N(h) is the number of pairs of points separated by lag h. Semivariogram increases when the dis-tance between locations grows up to the distance called range at which the observations are uncorrelated. Thus, the range value is taken to be the radius of circle in which the appropriate neighbors used for interpolation exist.

Lapse rate
In general, temperature decreases with altitude and hence, the rate of temperature change with height (lapse rate) is an effective factor in determining temperature (Nalder and Wein, 1998). As the point of interest and its neighboring observation stations do not necessarily have similar altitude, the temperature change with elevation has to be taken into account in the interpolation procedure. The most common constant lapse rate value found in the literature is 6 ℃ km -1 (Dodson and Marks, 1997;Liston and Elder, 2006), but Courault and Monestiez (1999) claimed that this value corresponds to the free atmosphere, which is not proximate to the surface, and also varies in different seasons. Hence, they obtained the lapse rate by linear regression performed between temperature and elevation.
Similarly, in this study, a constant lapse rate is used in IDSW-CLR and Kriging-CLR methods whose lapse rate is estimated from the relationship between temperature and elevation. For the two other methods, adjustments for elevation have been made by using temperature gradients based on the geographical coordinates of the points that will be explained in the next subsection.

Prediction algorithms
To remove the systematic errors associated with the surface temperature forecast on a regular grid, the temperature forecasts are first bias corrected by using the running mean error method (Stensrud and Skindlov, 1996) at observation sites and then, corrected temperature forecasts are interpolated to the desired grid points.
Four various weighted-average interpolation methods are applied to estimate the temperature forecasts at desired gird points. To enhance the interpolation accuracy, temperature variation with height is taken into account by considering the lapse rate. The interpolation methods differ from each other primarily in the way the weights and lapse rate are calculated by employing two approaches to calculate weights and three approaches for lapse rate calculation.
In the following, T is temperature, d is the distance between the prediction point and observation point, n is the number of data points in local neighborhood of a prediction point, h is the elevation, and subscripts g and i refer to the grid prediction and observation point, re-spectively.

IDSW-CLR method
In this method, temperatures are first converted to sea level equivalent by using a constant lapse rate as: where T sea is temperature at sea level and c is the constant lapse rate. Then, the bias corrected temperatures of neighboring observation stations are interpolated to the given grid point by using weighted averaging with the weights calculated with inverse distance squared as: Then, the interpolated temperature at grid points are converted to the elevation of the prediction point with a constant lapse rate:

Kriging-CLR method
This method uses the same approach for elevation adjustment as used in the IDSW-CLR method. Hence, temperatures are first adjusted to sea level prior to the interpolation as explained above. Similar to IDSW-CLR, the interpolation method is a weighted averaging method, but the weights are computed by using Kriging scheme.
Kriging is a geostatistical interpolation method that is mathematically related to regression analysis (Cressie, 2015). Generally, this method is a weighted average of the known data points surrounding the estimation point. The ordinary Kriging estimator can be defined as: where λ and m are the Kriging weight and expected value (mean), respectively. Rather than an arbitrary weighting function, Kriging assigns weights by minimizing the variance of the estimator. The reader is referred to Stein (1999) for full details.

GIDS-LR method
The GIDS is first proposed by Nalder and Wein (1998) for spatial interpolation of climatic data. This method is a weighted average interpolation that adjustment for elevation is made by using the multiple linear regression based on the geographical coordinates of the points. It first fits a multiple linear regression whose predictor variables are longitude (x), latitude (y), and altitude (h) of the observing points: T = a 0 + a 1 x + a 2 y + a 3 h: The fitted temperature at each observation station and prediction point are used to estimate the temperature gradient. The predicted temperature is then calculated by weighted averaging of the temperature at known points adjusted by their corresponding temperature gradient, with inverse distance squared weights. The GIDS-LR can be formulated as follows:

GIDS-CART methodT
This method has similar interpolation approach to the GIDS-LR method, except that for temperature gradient is calculated by using CART, which is a modern nonparametric regression method, first introduced by Breiman et al. (1984). The CART can reflect the nonlinearity of variables that is impossible in the classical linear regression modeling. Rather than to estimate an explicit linear model for prediction, the CART constructs a binary decision tree by optimally splitting the predictors. For this purpose, a recursive binary splitting approach is used to successively bifurcate the predictor space. Consequently, each terminal node (leaf) of the tree gives a predicted value using the mean of the training observations in the region in which the test value belongs to it.
Generally, CARTs suffer from high variance that can be decreased by applying random forest method proposed by Breiman (2001) to improve the predictive performance of trees. Random forest is an ensemble learning method that builds a number of trees on bootstrapped training samples, and then averages the resulting predictions. Moreover, in a random forest, each node is split by using a random subset of the predictors to decorrelate the trees. Here, the random forest is used with spatial predictors such as longitude, latitude, and altitude of the points.

Data
The data are 48-h WRF forecasts of the surface temperature initialized daily at 1200 UTC from 15 September through 31 December 2011 over Iran. The initial conditions of the forecasts are provided by Global Forecast System (GFS) forecasts with a one-degree horizontal resolution. WRF was run with two nested domains, with the larger domain covering the south-west Middle East from 10° to 51° north and from 20° to 80° east and the inner domain covers Iran from 23° to 41° north and from 42°t o 65° east. The spatial resolutions are 45 and 15 km for the coarser and finer domains, respectively.
The model forecasts for the inner domain are bilinearly interpolated to the observation stations. Observational data used in this study consist of observations of the surface temperature measured at 1200 UTC at 306 irregularly spaced synoptic meteorological stations scattered across Iran as shown in Fig. 1.
The training period is a sliding training window of the most recent 15 days prior to the forecast day. The test period included all the days from 19 September 2011 to 28 February 2012.

Semivariogram
The semivariogram is used to estimate the appropriate size of a neighborhood. Figure 2 shows the estimated semivariogram of post-processed forecasts at 306 observation sites. The range value is estimated 7.61 degree (approximately 800 km), which implies that the observation stations within a 800-km radius of grid point vicin-  ity could be used for bias correction while those stations located outside of this range are not correlated and thus, may not make any improvement.

Constant lapse rate
Both IDSW-CLR and Kriging-CLR methods are adjusted for elevation using a constant lapse rate. Accordingly, a linear regression model is fitted between the observed temperature and elevation as: where a is the intercept and b is the proposed constant lapse rate. The training period data are used to employ Eq. (8). Using available data sample, the slope coefficient b is estimated to be 5.6 ℃ km -1 . This constant lapse rate is used to convert the temperature to sea level equivalent prior to interpolation. However, the results of some experiments show that the interpolated temperatures are not sensitive to the various lapse rate values ranging from 4.5 to 6.5 ℃ km -1 .

Verification
Since the correct value at the grid points is not known, the methods' performance is assessed by using the cross validation approach. For every day of the test period, each observation station, in turn, is excluded from the list of observation stations, and the withheld observation station is considered as a given point which is proposed to be bias corrected in the absence of its historical information. Eventually, the prediction error is computed with mean absolute error (MAE), root mean squared error (RMSE), and mean error (ME) or the bias. Figure 3 presents the error distribution of the raw and bias corrected forecasts graphically using the box-andwhisker plots over the test period. As can be seen, the mean error of the raw forecast is -1.4℃ while it has been decreased to about zero after bias correction. Similar results could be seen for other methods, though the error variance of the Kriging-CLR and GIDS-CART methods are slightly less than the others.

Results
An important cause of the systematic error existence in the output of NWP models is the difference between the topography used in the model and in the real world. In our domain of study, central and southern parts of the country are covered with desert and have more or less flat terrain, so we expect lesser differences the model and real world topography and land use. But North, West, and Northwest Iran are covered with complex topography.
More specifically, the chance of having a difference between the model and real world topography and land use, and thus systematic error for near-surface variables, in regions with complex topography (north, west, and northwest of the country with higher elevations) is higher compared to flat terrain. Hence, some correlation is expected between the systematic error and the topography. As seen in Fig. 4, the MAEs in the higher altitudes are almost larger than those in the lower altitudes representing more flat terrain, and the correlation coefficient between MAE and terrain height is 0.4. After statistical post-processing, the systematic error is removed and we expect a lower correlation coefficient between MAE and topography, which is out to be 0.03. In other words, the spatial dependency of the errors is decreased after employing any of the four post-processing methods. Figure 5 shows the MAE of the bias corrected forecasts using the four methods and also the raw forecasts in six different months. As seen, there is a slightly increasing trend in the MAE of the raw forecasts from September to February, which are the cold months of the winter. This result is natural since the error of the NWP models for surface temperature is generally larger during winter time compared to the error during other seasons (Warner, 2011). However, after removing the bias using either of the four methods (Fig. 5), errors of the post-processed forecasts are almost similar for all months. Thus, the temporal dependency of the errors is decreased after employing the statistical post-processing methods.
The calculated values for MAE, RMSE, improvement to hurt ratio, and RMSE skill score (RMSESS) of the raw and bias corrected forecasts are presented in Table 1 over the test period. The RMSESS is calculated as 1 -RMSE/RMSE ref , with RMSE of the raw forecast as the reference value. The improvement to hurt ratio is com- Fig. 3. Box-and-whisker plots of the error for the raw and bias corrected forecasts. The boxes show the median, the first and third quartiles, and whiskers indicate 1.5 × IQR (interquartile range) below the first quartile and above the third quartile.
AUGUST 2017 puted by the fraction of cases with more than 2℃ improvement in the absolute bias, to cases with more than 2℃ deterioration in the absolute bias (Gel, 2007). In Table 1, Kriging-CLR method has better performance than the other methods, and makes 26% RMSE reduction relative to the raw forecast.
The fields in Fig. 6 show the error of the 48-h surface temperature forecasts initialized on 19 December 2011. Figure 6a shows the error of the raw forecast. The errors of the bias corrected forecasts (Figs. 6b-e) are noticeably reduced compared to the error of the raw forecasts (Fig. 6a). As seen, the absolute values of the raw fore-cast errors are more than 3℃ over most parts of the domain, but the absolute values of the bias corrected forecast errors using either of the four mentioned methods are mostly less than 1℃. Furthermore, Kriging-CLR method (Fig. 6c) has provided the most improvement than the other methods as expected.
According to Murphy (1988), the total error can be decomposed to systematic and random components as: where T and O are temperature forecast and observation, respectively, and K is the number of cases used in the verification. Here, the absolute value of the mean error and the standard deviation of the error are the systematic and random error, respectively. Figure 7 shows the decomposed temperature errors of forecasts corrected by Kriging-CLR and raw forecasts at some stations that are   considered as the prediction points where there is no historical observation data. As can be seen, the systematic errors dominate the surface temperature errors at many stations and thus, the total error is significantly reduced after bias correction. The stations in Fig. 7 are selected in such a way that they cover all parts of the country. As such, their altitude varied from -8.6 to 1659.4 m. The locations of the selected stations are indicated by stars in Fig. 1.

Conclusions
In this paper, four methods are proposed and applied for the gridded bias correction of 48-h surface temperature forecasts over Iran during 1 October and 31 December 2011. Since there is no historical bias data on the prediction grid points, the bias corrected forecasts at surrounding observation stations are interpolated to the desired grid points. Moreover, the temperature change with elevation (lapse rate) has been taken into account in the interpolation procedures. All the applied methods are based on weighed averaging, but differ from each other primarily in the way how the weight and lapse rate are calculated. The IDSW-CLR method uses the inverse distance squared weighting and a constant lapse rate; the Kriging-CLR method uses Kriging weighting and a constant lapse rate too; the GIDS-LR method uses the inverse distance squared weighting and a multiple regression with longitude, latitude, and altitude of points as its predictors to adjust for elevation; the GIDS-CART uses the inverse distance squared weighting and a random forest tree to calculate the lapse rate. The novelty of the   last method (GIDS-CART) is that it incorporates nonlinear relationships between the predictand (temperature forecast) and predictors (longitude, latitude, and altitude) while they have a linear relationship in the GIDS-LR method.
A semivariogram is constructed to obtain a range value that is considered as the radius of the neighboring circle that is approximately 800 km. To calculate the constant lapse rate, a linear regression model is fitted between the observed temperature and elevation. The slope coefficient is considered as the lapse rate that is estimated 5.6 ℃ km -1 over the training period.
The results were verified by using the cross validation method. It is found that all applied bias correcting methods are successful in generating bias corrected forecasts on the grid points where there is no observation data, and have reduced the value of the bias to near zero. The verification measures including MAE, RMSE, RMSESS, and improvement ratio calculated for the interpolated bias corrected forecasts on grid points showed more or less similar results but the Kriging-CLR method is slightly better than the other methods.
The better performance of Kriging-CLR compared to other methods is probably due to the more sophisticated approach that is used in the weight calculations. The weights of Kriging-CLR are estimated by minimizing the error variance of the estimate in contrast to other methods whose weights depend simply on the inverse of squared distance. Fitting a mathematical function to the distribution of the points on a semivariogram is the important priority of the Kriging to the other methods.