Trend analysis
If a significant trend term is detected in the TS by the MK test, the seasonal Mann–Kendall (SMK) test or the autocorrelation function (ACF) plot, trend analysis is the best way to remove or reduce its impact on the TS. A linear line is fitted to the TS and subtracted from the TS values; what remains is a detrended TS.
Differencing
Differencing is one of the most widely employed methods of stationarizing a TS, as it eliminates correlations in the TS. Non-seasonal differencing, the subtraction of each value from the previous one, removes trends and jumps. The equation is as follows:
$$ {\text{Differenced TS}}\left( t \right)\, = \,{\text{MED}}\left( t \right)\, - \,{\text{MED}}\left( {t - 1} \right) $$
(19)
where MED(t) represents a studied TS, in this case ACE, recorded at time t.
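Equation (19) can be sketched in a few lines; the helper name and the use of NumPy are illustrative only (the authors' implementation is not shown):

```python
import numpy as np

def difference(med, lag=1):
    """Non-seasonal differencing of Eq. (19): d(t) = MED(t) - MED(t - lag)."""
    med = np.asarray(med, dtype=float)
    return med[lag:] - med[:-lag]

print(difference([3.0, 5.0, 4.0, 8.0]))  # [ 2. -1.  4.]
```

Each differenced value depends only on the current and previous observation, so the output series is one sample (per lag) shorter than the input.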
Stochastic modeling
The auto-regressive moving average (ARMA) and auto-regressive integrated moving average (ARIMA) models are the two most conventional stochastic methods. The difference between them lies in the differencing step of the ARIMA model, which makes it suitable for non-stationary TS. The equation for ARIMA(p, d, q) is as follows [31]:
$$ \varphi \left( I \right)\,\,\left( {1\, - \,I} \right)^{d} MED\left( t \right)\, = \,\theta \left( I \right)\,\varepsilon \left( t \right) $$
(20)
$$ \varphi \left( I \right)\, = \,\left( {1\, - \,\varphi_{1} I\, - \,\varphi_{2} I^{2} \, - \,\,\varphi_{3} I^{3} \, - \, \cdots - \varphi_{p} I^{p} } \right) $$
(21)
$$ \theta \left( I \right)\, = \,\left( {1\, - \,\theta_{1} I\, - \,\theta_{2} I^{2} \, - \,\theta_{3} I^{3} \, - \, \cdots \, - \,\theta_{q} I^{q} } \right) $$
(22)
where φ(I) is the autoregressive (AR) polynomial, θ(I) the moving average (MA) polynomial, ε(t) the residual, d the order of non-seasonal differencing, and p and q the AR and MA orders of the model, respectively. The values of these orders are determined through the autocorrelation function (ACF) and partial autocorrelation function (PACF) diagrams [31]. I is the differencing (backshift) operator, and (1 − I)d denotes the dth non-seasonal difference. In the ARMA model, d equals 0 and the differencing operator is absent.
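To make the AR part of Eqs. (20)–(22) concrete, the following sketch (hypothetical helper; pure NumPy rather than the authors' MATLAB code) estimates the parameters of an AR(p) model by ordinary least squares:

```python
import numpy as np

def fit_ar(med, p):
    """OLS fit of AR(p): MED(t) = c + phi_1*MED(t-1) + ... + phi_p*MED(t-p) + eps(t)."""
    med = np.asarray(med, dtype=float)
    # one column per lag j: the value MED(t - j) for each target sample MED(t)
    lags = np.column_stack([med[p - j : len(med) - j] for j in range(1, p + 1)])
    X = np.column_stack([np.ones(len(lags)), lags])
    coef, *_ = np.linalg.lstsq(X, med[p:], rcond=None)
    return coef[0], coef[1:]  # constant term, AR coefficients
```

On a noise-free series generated by MED(t) = 1 + 0.5·MED(t−1), the fit recovers c = 1 and φ1 = 0.5 exactly; a full ARIMA fit would add the differencing and MA stages on top of this.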
As it is crucial to investigate the structure of the studied TS prior to modeling, certain tests and preprocessing methods were first applied to prepare the data for stochastic modeling. After separating the dataset into training and testing samples, the existence of deterministic terms in the TS should be examined. For this purpose, the MW, MK and Fisher tests are employed to check for the existence of a jump, trend and period, respectively.
If these tests show no deterministic terms, the stationarity of the TS must be checked; otherwise, any deterministic terms should first be eliminated. The KPSS test is applied to check stationarity. If its result does not confirm stationarity, trend analysis and differencing are applied and the KPSS test is repeated. Once the TS is stationary, its normality is evaluated using the JB test. After ensuring that the TS is stationary and normal, preprocessing is finished and stochastic modeling is initiated. First, depending on the type of problem, it is determined whether the problem is seasonal or not. Then, the ranges of the seasonal and non-seasonal parameters related to the autoregressive (AR) and moving average (MA) terms, as well as a constant term, are determined using the ACF and PACF diagrams. These diagrams only identify the most important lags, not the optimum ones.
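As a minimal illustration of the normality step, the JB statistic can be computed directly from its definition, JB = N/6 · (S² + (K − 3)²/4), where S is the sample skewness and K the sample kurtosis; in practice the MW, MK, Fisher and KPSS tests would come from a statistics package (helper name hypothetical):

```python
import numpy as np

def jarque_bera(med):
    """Jarque-Bera normality statistic: JB = N/6 * (S^2 + (K - 3)^2 / 4)."""
    x = np.asarray(med, dtype=float)
    n = len(x)
    d = x - x.mean()
    s2 = (d ** 2).mean()                   # sample variance (biased)
    skew = (d ** 3).mean() / s2 ** 1.5     # sample skewness S
    kurt = (d ** 4).mean() / s2 ** 2       # sample kurtosis K
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
```

A large JB value (relative to the χ² distribution with 2 degrees of freedom) indicates departure from normality, in which case a transformation of the TS would be needed before stochastic modeling.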
Obtaining the optimal model does not require the use of all the parameters indicated by these two diagrams. The first way to obtain the optimum combination is to examine all the combinations resulting from the defined ranges of the stochastic model parameters (i.e. 2p(max)+q(max) − 1 models for an ARMA model). Doing so is very time-consuming, as every candidate model must be constructed, evaluated and compared. Therefore, a stochastic model integrated with the continuous genetic algorithm (CGA) is used in the current study: the optimal values of the seasonal and non-seasonal MA and AR parameters are determined through an evolutionary process. The residual independence of the proposed model is then evaluated using the Ljung-Box test, and finally the performance of the model is appraised using the test data. Considering the maximum order of the ARMA, seasonal autoregressive (SAR) and seasonal moving average (SMA) terms as 5, an example of the optimum solution achieved by ARIMA-CGA is provided in Fig. 2.
The objective function of the CGA is defined such that all possible combinations are considered, and the corrected Akaike information criterion (AICC) (Eq. 23) is employed to find the model that is optimal in terms of accuracy and simplicity simultaneously. The first term of the AICC reflects the accuracy of the model, while the remaining terms penalize its complexity.
$$ AICC\, = \,N\, \times \,Ln\left( {MSE} \right)\, + \,2\, \times \,Comp.\, + \,\frac{2 \times Comp.(Comp.\, + \,1)}{{N\, - \,Comp.\, - \,1}} $$
(23)
where N is the number of samples, MSE is the mean square error and Comp. is the complexity of the model, defined as the sum of the stochastic model orders (p, q, P, Q) plus the constant term if it exists. The MSE is calculated as:
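Equation (23) translates directly into code; the function name is illustrative:

```python
import math

def aicc(mse, n, comp):
    """Corrected AIC of Eq. (23):
    AICC = N*ln(MSE) + 2*Comp + 2*Comp*(Comp + 1)/(N - Comp - 1)."""
    return n * math.log(mse) + 2 * comp + 2 * comp * (comp + 1) / (n - comp - 1)

print(aicc(mse=1.0, n=100, comp=3))  # 6.25
```

For a fixed sample size, adding parameters lowers the MSE term but raises the complexity terms, so minimizing the AICC trades accuracy against simplicity as described above.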
$$ MSE\, = \,\,\,\,\frac{{\sum\nolimits_{i = 1}^{N} {(MED_{obs,i} \, - \,MED_{P,i} )^{2} } }}{N} $$
(24)
where MEDobs,i and MEDP,i are the ith observed and predicted values, respectively. The flowchart of the preprocessing-based stochastic model is presented in Fig. 3.
Generalized structure of group method of data handling (GMDH)
GMDH is a self-organized approach that gradually produces more complex models while evaluating the performance of the input and output datasets [32]. In this approach, the relationship between the input and output variables is expressed by the Volterra series, which is similar to the Kolmogorov–Gabor polynomial:
$$ y\, = \,a_{0} \, + \,\sum\limits_{i = 1}^{N} {a_{i} x_{i} } \,\, + \,\,\sum\limits_{i = 1}^{N} {\sum\limits_{j = 1}^{N} {a_{ij} x_{i} x_{j} } } \, + \,\sum\limits_{i = 1}^{N} {\sum\limits_{j = 1}^{N} {\sum\limits_{k = 1}^{N} {a_{ijk} x_{i} x_{j} x_{k} } \, + \, \cdots } } $$
(25)
where y is the output variable, A = (a0, a1, …, am) is the weight vector and X = (x1, …, xN) is the input variable vector. The GMDH model was developed on the basis of heuristic self-organization to overcome the complexities of multidimensional problems. The method first considers different neurons with two input variables each and then applies a threshold to discard the neurons that cannot reach the required performance level. This procedure is a self-organizing algorithm.
The main purpose of the GMDH network is to construct a function in a feed-forward network on the basis of a second-degree transfer function. The number of layers and neurons within the hidden layers, the effective input variables and the optimal model structure are automatically determined with this algorithm.
In order to model using the GMDH algorithm, the entire dataset should first be divided into training and testing categories. After segmenting the data, the algorithm creates neurons with two inputs. Given that each neuron has only two inputs, the number of all possible combinations for a model with n input variables is:
$$ NAPC\, = \,\binom{n}{2}\, = \,\frac{n(n\, - \,1)}{2} $$
(26)
where NAPC is the number of all possible combinations and n is the number of input vectors.
According to the quadratic regression polynomial function, all neurons have two inputs and one output with the same structure, and each neuron with five weights (a1, a2, a3, a4, a5) and one bias (a0) executes the processing between the inputs (xi, xj) and output data as follows:
$$ \hat{y}\, = \,\hat{f}(x_{i} ,x_{j} )\, = \,a_{0} \, + \,a_{1} x_{i} \, + \,a_{2} x_{j} \, + \,a_{3} x_{i}^{2} \, + \,a_{4} x_{j}^{2} \, + \,a_{5} x_{i} x_{j} $$
(27)
The unknown coefficients (a0, a1, a2, a3, a4, a5) of each neuron are obtained by ordinary least squares; as with all neural-network-type methods, performance is heavily influenced by these fitted parameters. The least squares solution is as follows:
$$ A_{i} \, = \,(x_{i}^{T} x_{i} )^{ - 1} x_{i}^{T} Y $$
(28)
where A = {a0, a1, a2, a3, a4, a5} is the vector of unknown coefficients, Y = {y1, …, yN}T is the output vector and xi is the matrix of the neuron's input terms.
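A single GMDH neuron (Eqs. 27–28) can be fitted as follows; the least squares step uses `numpy.linalg.lstsq`, which is numerically equivalent to the normal-equation form A = (xᵀx)⁻¹xᵀY (helper name hypothetical):

```python
import numpy as np

def fit_neuron(xi, xj, y):
    """Fit the quadratic two-input GMDH neuron of Eq. (27):
    y ~ a0 + a1*xi + a2*xj + a3*xi^2 + a4*xj^2 + a5*xi*xj."""
    xi, xj = np.asarray(xi, float), np.asarray(xj, float)
    # design matrix: one column per polynomial term of Eq. (27)
    X = np.column_stack([np.ones_like(xi), xi, xj, xi**2, xj**2, xi * xj])
    A, *_ = np.linalg.lstsq(X, np.asarray(y, float), rcond=None)
    return A  # [a0, a1, a2, a3, a4, a5]
```

When the targets are generated exactly by a quadratic surface, the fitted coefficients recover the generating values, since the target lies in the column space of the design matrix.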
The AICC criterion (Eq. 23) is applied to determine the optimal network structure and select the neurons describing the target parameter. The Comp. in this equation for the GMDH model is defined as follows:
$$ Comp.\, = \,NL\, + \,NTN $$
(29)
where Comp. is the complexity, NL is the number of layers and NTN is the number of total neurons.
The performance of classical GMDH in the modeling of nonlinear problems has been demonstrated in various studies [33,34,35,36]. However, along with its advantages, it possesses the following limitations: (i) only second-order polynomials, (ii) only two inputs per neuron, (iii) the inputs of each neuron can only be selected from the adjacent layer [37, 38]. In complex nonlinear problems, the restriction to second-order polynomials may prevent an acceptable result. In addition, allowing only two inputs per neuron and using only adjacent-layer neurons can result in a significant increase in the number of neurons (NN) [39].
In the current study, a new scheme of GMDH, the GS-GMDH, is employed and implemented in the MATLAB environment. The developed model removes all the mentioned disadvantages: each neuron can connect to two or three neurons at a time, taken from adjacent or non-adjacent layers, and the order of the polynomials can be two or three. As in classical GMDH, the best structure is chosen based on the AICC index. Accordingly, the developed GS-GMDH offers four modes: (1) second-order polynomial with two inputs, (2) second-order polynomial with three inputs, (3) third-order polynomial with two inputs, and (4) third-order polynomial with three inputs. The first mode is classical GMDH.
Figure 4 indicates an example of the developed GS-GMDH for a model with five inputs and one output. In this figure, three different neurons (x11, x12, x21) are presented to provide an equation estimating the target parameter (y). The two neurons x11 and x12 have three inputs each, which are inputs of the studied problem. The x21 neuron, which produces the output of the problem, also has three inputs, except that it uses a non-adjacent layer neuron in addition to the adjacent layer neurons (x11 and x12).
The GS-GMDH was used in this study to achieve the most precise results in forecasting the studied TS, abbreviated as the MED data. GS-GMDH is superior to classical GMDH because the structure of the neurons is encoded in a genotype string, which allows neurons from any previous layer to be used in subsequent layers. In addition, GS-GMDH tracks the training and prediction errors separately, which helps prevent overtraining of the model. The flowchart of the developed GS-GMDH model is presented in Fig. 5.
Before modeling with the GS-GMDH method, several parameters must first be determined. The first is the Maximum Number of Inputs (MNI), the maximum number of inputs of an individual neuron; it can be two or three, and if set to three, both two- and three-input neurons are tried. The second is Inputs More (IM), which can be zero or one: if zero, the inputs of each neuron are taken only from the previous layer, while if one, inputs may also be taken from non-adjacent layers. The Maximum Number of Neurons (MNN) is set equal to the number of input variables, though it can be doubled for complex problems. Finally, the polynomial degree (PD) can be two or three; if set to three, both second- and third-order polynomials are allowed.
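Under the MNI/IM semantics described above, the candidate input sets of a new GS-GMDH neuron could be enumerated as follows; this is a sketch of the selection space only, with hypothetical names, not the authors' MATLAB implementation:

```python
from itertools import combinations

def candidate_inputs(layer_outputs, mni, im, current_layer):
    """Enumerate candidate input tuples for a neuron in layer `current_layer`.
    layer_outputs: list of lists of neuron names, one list per earlier layer;
    mni: max number of inputs (2 or 3; 3 also allows 2);
    im: 0 = previous layer only, 1 = non-adjacent layers also allowed."""
    if im == 1:
        pool = [n for layer in layer_outputs[:current_layer] for n in layer]
    else:
        pool = list(layer_outputs[current_layer - 1])
    return [c for r in range(2, mni + 1) for c in combinations(pool, r)]
```

With MNI = 3 and IM = 1 the candidate pool grows quickly, which is why the AICC-guided search, rather than exhaustive enumeration, selects the surviving neurons.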
Combining linear and nonlinear models (data-driven method)
A TS consists of stochastic and deterministic components. Thus, by using appropriate data preprocessing methods, it is possible to reduce the problematic effects of the deterministic components on the modeling process. The proposed methodology is based on a continuous modeling process: preprocessing to identify the linear and nonlinear components of the TS, verification of the validity of the decomposed data, and modeling of the decomposed components. In the studied case, the ACE TS fluctuates greatly, and the outcomes of the single stochastic and neural network modeling approaches are relatively weak. Hence, as a third approach, the ACE TS is modeled with a combined stochastic-neural network model. Stochastic models perform efficiently when the TS is linear and does not contain the deterministic terms responsible for nonlinearity; AI methods, on the other hand, allow the modeling of TS with nonlinear components. A real TS, however, is rarely purely linear or nonlinear; both components are present simultaneously, and their integration sometimes produces complex structures in the TS. In such cases, single stochastic or nonlinear methods may be improved upon by a combined model. Combining stochastic models with AI methods is one of the most effective ways of modeling TS with complex structures. As shown in Fig. 6, the residuals of the stochastic models are used as a new TS in GS-GMDH modeling, such that the features of both modeling approaches are utilized.
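The combined scheme of Fig. 6 can be sketched end-to-end; here a plain AR(p) fitted by OLS stands in for the stochastic stage and a simple quadratic map of the lagged residual stands in for GS-GMDH, so all names and model choices are illustrative:

```python
import numpy as np

def hybrid_fit_predict(med, p=1):
    """Sketch of the combined scheme: (1) fit a linear AR(p) by OLS,
    (2) model its residuals with a quadratic map of the lagged residual
    (a stand-in for GS-GMDH), (3) add the two predictions."""
    med = np.asarray(med, float)
    # --- linear stage: AR(p) via least squares ---
    X = np.column_stack([np.ones(len(med) - p)] +
                        [med[p - j : len(med) - j] for j in range(1, p + 1)])
    y = med[p:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    linear = X @ beta
    resid = y - linear                      # residual series = new TS
    # --- nonlinear stage: quadratic model of the lagged residual ---
    R = np.column_stack([np.ones(len(resid) - 1), resid[:-1], resid[:-1] ** 2])
    gamma, *_ = np.linalg.lstsq(R, resid[1:], rcond=None)
    nonlinear = R @ gamma
    # --- combined in-sample prediction, aligned from t = p + 1 ---
    return linear[1:] + nonlinear
```

Because the nonlinear stage is fitted on the residuals of the linear stage, the combined in-sample error can never exceed that of the linear model alone on the same samples.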
Verification indices to evaluate models
To verify the accuracy of the MED TS forecasting, the correlation coefficient (R), scatter index (SI), mean absolute percentage error (MAPE), root mean square error (RMSE) and performance index (ρ) are used. In addition, the corrected AIC (AICC), which balances the model's simplicity against its goodness of fit, and the Nash–Sutcliffe model efficiency (EN-S), which measures the deviation from the mean value [40], are used. The AICC index is used to find the best model in each TS modeling; the lower its value, the better the trade-off between accuracy and simplicity. The EN-S index ranges from −∞ to 1, and the closer it is to one, the more accurate the model.
$$ R\, = \,\frac{{\sum\limits_{i = 1}^{N} {\left( {MED_{obs,i} \, - \,\overline{MED}_{obs,i} } \right)\left( {MED_{pred,i} \, - \overline{\,MED}_{pred,i} } \right)} }}{{\sqrt {\sum\limits_{i = 1}^{N} {\left( {MED_{obs,i} \, - \,\overline{MED}_{obs,i} } \right)^{2} } \sum\limits_{i = 1}^{N} {\left( {MED_{pred,i} \, - \,\overline{MED}_{pred,i} } \right)^{2} } } }} $$
(30)
$$ SI\, = \,\frac{{\sqrt {\frac{1}{N}\sum\limits_{i = 1}^{N} {\left( {MED_{obs,i} \, - \,MED_{pred,i} } \right)^{2} } } }}{{\overline{MED}_{obs} }} $$
(31)
$$ MAPE\, = \,\frac{100}{N}\sum\limits_{i = 1}^{N} {\left( {\frac{{\left| {MED_{obs,i} \, - \,MED_{pred,i} } \right|}}{{MED_{obs,i} }}} \right)} $$
(32)
$$ RMSE\, = \,\sqrt {\frac{1}{N}\sum\limits_{i = 1}^{N} {\left( {MED_{obs,i} \, - \,MED_{pred,i} } \right)^{2} } } $$
(33)
$$ \rho \, = \,\frac{{\left( {\sqrt {\frac{1}{N}\sum\limits_{i = 1}^{N} {\left( {MED_{obs,i} \, - \,MED_{pred,i} } \right)^{2} } } /\overline{MED}_{obs} } \right)}}{1\, + \,R} $$
(34)
$$ E_{N - S} \, = \,\left[ {1 - \frac{{\sum\limits_{i = 1}^{N} {\left( {MED_{obs,i} - MED_{pred,i} } \right)^{2} } }}{{\sum\limits_{i = 1}^{N} {\left( {MED_{obs,i} - \overline{MED}_{obs} } \right)^{2} } }}} \right] $$
(35)
$$ AICC\, = \,\frac{{2kN\, + \,\left( {N\ln \left( {\sigma_{\varepsilon }^{2} } \right)\left( {N\, - \,k\, - \,1} \right)} \right)}}{N\, - \,k\, - \,1} $$
(36)
where k is the number of parameters, N is the number of samples, σε2 is the variance of the residuals, EN-S is the Nash–Sutcliffe efficiency, and MEDobs,i and MEDpred,i are the ith observed and forecasted MED values, respectively.
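The error indices of Eqs. (30)–(35) can be computed together; ρ is taken here as SI/(1 + R), and all helper names are illustrative:

```python
import numpy as np

def scores(obs, pred):
    """Verification indices of Eqs. (30)-(35): R, SI, MAPE, RMSE, rho, E_NS."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    err = obs - pred
    rmse = np.sqrt(np.mean(err ** 2))
    r = np.corrcoef(obs, pred)[0, 1]
    si = rmse / obs.mean()                      # scatter index
    mape = 100.0 * np.mean(np.abs(err) / obs)   # mean absolute percentage error
    rho = si / (1.0 + r)                        # performance index
    e_ns = 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)
    return {"R": r, "SI": si, "MAPE": mape, "RMSE": rmse,
            "rho": rho, "E_NS": e_ns}
```

A perfect forecast gives R = 1, zero error indices and E_NS = 1, matching the index ranges described above.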
The Ljung-Box test is used to check the independence of the residuals of the modeled TS [41]. The test statistic is calculated as follows:
$$ Q_{Ljung - Box} \, = \,N\left( {N\, + \,2} \right)\sum\limits_{h = 1}^{m} {\frac{{r_{h}^{2} }}{N\, - \,h}} $$
(37)
where N is the number of samples, rh is the autocorrelation of the residuals ε(t) at lag h, and m is taken as ln(N). If the probability corresponding to the Ljung-Box statistic in the χ2 distribution is higher than the significance level (here PQ > α = 0.05), the residual series is white noise and the model is adequate.
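The Ljung-Box statistic in its standard form, Q = N(N + 2) Σ r_h²/(N − h), can be sketched as follows (the p-value would then be read from a χ² distribution with m degrees of freedom; helper name hypothetical):

```python
import numpy as np

def ljung_box(resid, m):
    """Ljung-Box statistic: Q = N(N+2) * sum_{h=1..m} r_h^2 / (N - h),
    where r_h is the lag-h autocorrelation of the residuals."""
    e = np.asarray(resid, float) - np.mean(resid)
    n = len(e)
    denom = np.sum(e ** 2)
    q = 0.0
    for h in range(1, m + 1):
        r_h = np.sum(e[h:] * e[:-h]) / denom   # lag-h autocorrelation
        q += r_h ** 2 / (n - h)
    return n * (n + 2) * q
```

A strongly alternating residual series has a large negative lag-1 autocorrelation and hence a large Q, correctly flagging the residuals as non-white.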