### Trend analysis

If a significant trend term is detected in the TS by the *MK* test, the seasonal Mann–Kendall (*SMK*) test or the autocorrelation function (ACF) plot, trend analysis is the most direct way to remove or reduce its impact on the TS. A straight line is fitted to the TS and subtracted from its values; what remains is the detrended TS.
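As a minimal sketch of this step (assuming the TS is held in a numpy array; this is illustrative, not the authors' code), a straight line can be fitted by least squares and subtracted:

```python
import numpy as np

def detrend(ts):
    """Fit a straight line to the series by least squares and subtract it,
    leaving a detrended series (illustrative sketch, not the paper's code)."""
    t = np.arange(len(ts))
    slope, intercept = np.polyfit(t, ts, deg=1)
    return ts - (slope * t + intercept)

# A purely linear series detrends to (numerically) zero:
ts = 2.0 * np.arange(10) + 5.0
residual = detrend(ts)
```

On real data the detrended series keeps the fluctuations around the fitted line rather than vanishing.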

### Differencing

Differencing is one of the most widely employed methods of stationarizing a TS, as it eliminates correlations in the series. Non-seasonal differencing, the subtraction of each value from the previous one, removes the trend in the variance as well as jumps. The equation is as follows:

$$ {\text{Differenced TS }}\left( t \right)\,\, = \,\,{\text{MED}}\left( t \right)\, - \, {\text{MED}}\left( {t \, - \, {1}} \right) $$

(19)

where MED(*t*) represents a studied TS, in this case ACE, recorded at time *t*.
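Eq. (19) amounts to a one-line operation; the sketch below (with hypothetical MED values) also shows that the original series can be recovered by cumulative summation, which is how forecasts of a differenced TS are mapped back to the original scale:

```python
import numpy as np

# First-order non-seasonal differencing of Eq. (19):
# diff(t) = MED(t) - MED(t-1). Hypothetical MED values for illustration.
MED = np.array([3.0, 5.0, 4.0, 8.0])
diff = np.diff(MED)                                              # successive differences
restored = np.concatenate(([MED[0]], MED[0] + np.cumsum(diff)))  # invert the differencing
```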

### Stochastic modeling

The auto-regressive moving average (ARMA) and auto-regressive integrated moving average (ARIMA) models are the two most conventional methods of the stochastic approach. The difference between these models is in the data differencing method of the ARIMA model, which makes it suitable for non-stationary TS. The equation for ARIMA(*p*, *d*, *q*) is as follows [31]:

$$ \varphi \left( I \right)\,\,\left( {1\, - \,I} \right)^{d} MED\left( t \right)\, = \,\theta \left( I \right)\,\varepsilon \left( t \right) $$

(20)

$$ \varphi \left( I \right)\, = \,\left( {1\, - \,\varphi_{1} I\, - \,\varphi_{2} I^{2} \, - \,\,\varphi_{3} I^{3} \, - \, \cdots - \varphi_{p} I^{p} } \right) $$

(21)

$$ \theta \left( I \right)\, = \,\left( {1\, - \,\theta_{1} I\, - \,\theta_{2} I^{2} \, - \,\theta_{3} I^{3} \, - \, \cdots \, - \,\theta_{q} I^{q} } \right) $$

(22)

where *φ* is the autoregressive (AR) parameter, *θ* the moving average (MA) parameter, *ε*(*t*) the residual, *d* the order of non-seasonal differencing, *I* the differencing operator, (1 *− I*)^{d} the *d*th non-seasonal difference, and *p* and *q* the orders of the AR and MA parts of the model, respectively. The values of these orders are determined from the autocorrelation function (ACF) and partial autocorrelation function (PACF) diagrams [31]. In the ARMA model, *d* equals 0 and the differencing operator is absent.
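To make the roles of *φ*, *θ* and *ε*(*t*) concrete, the following pure-numpy sketch runs the one-step-ahead recursion of an ARMA(1,1) model with assumed (not estimated) coefficients; it is illustrative only, not the stochastic package used in the study:

```python
import numpy as np

def arma11_onestep(x, phi1, theta1):
    """One-step-ahead forecasts of an ARMA(1,1) model,
    x(t) = phi1*x(t-1) + eps(t) - theta1*eps(t-1),
    via the usual innovations recursion. The coefficients are assumed
    known here; in practice they are estimated from the training TS."""
    preds = np.zeros_like(x)
    eps_prev = 0.0
    for t in range(1, len(x)):
        preds[t] = phi1 * x[t - 1] - theta1 * eps_prev
        eps_prev = x[t] - preds[t]   # residual becomes next step's MA input
    return preds

x = np.array([0.5, 0.3, 0.4, 0.1])           # hypothetical stationary TS values
p = arma11_onestep(x, phi1=0.6, theta1=0.2)  # assumed coefficients
```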

As it is crucial to investigate the structure of the studied TS prior to modeling, certain tests and preprocessing methods were first applied to prepare the data for stochastic modeling. After separating the dataset into training and testing samples, the TS should be examined for deterministic terms. For this purpose, the *MW*, *MK* and Fisher tests are employed to check for the existence of a jump, a trend and a period, respectively.

If these tests show no deterministic terms, the stationarity of the TS must be checked; otherwise, any deterministic terms should first be eliminated. The *KPSS* test is applied to check stationarity. If the test does not confirm that the TS is stationary, trend analysis and differencing are applied and the *KPSS* test is repeated. Once the TS is stationary, its normality is evaluated using the *JB* test. With stationarity and normality ensured, preprocessing is complete and stochastic modeling begins. First, depending on the type of problem, it is determined whether the problem is seasonal. Then, the ranges of the seasonal and non-seasonal parameters of the auto-regressive (AR) and moving average (MA) terms, as well as a constant term, are determined from the ACF and PACF diagrams. These diagrams identify only the most important lags, not the optimal ones.
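Of the deterministic-term checks above, the Mann–Kendall trend test is the simplest to sketch. The version below uses the no-ties variance formula and a normal approximation; it is an illustration, not the authors' implementation:

```python
import math
import numpy as np

def mann_kendall(ts, alpha=0.05):
    """Mann-Kendall trend test (no-ties variance formula); a minimal
    sketch of one of the preprocessing checks, not the paper's code."""
    n = len(ts)
    # S statistic: sign of every later-minus-earlier pair
    s = sum(np.sign(ts[j] - ts[i]) for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # two-sided p-value from the standard normal approximation
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p, p < alpha  # trend flagged when p < alpha

z, p, has_trend = mann_kendall(np.arange(20, dtype=float))  # strictly increasing series
```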

Obtaining the optimal model does not necessarily require all the parameters identified by these two diagrams. The brute-force way to find the optimum combination is to examine every combination arising from the defined ranges of the stochastic model parameters (i.e. 2^{p(max)+q(max)} − 1 models for an ARMA model). This is very time-consuming, since every combination must be built and compared, and the results of many models must be examined as well. Therefore, the stochastic model is integrated with a continuous genetic algorithm (CGA) in the current study: the optimal values of the seasonal and non-seasonal MA and AR parameters are determined through an evolutionary process. The residual independence of the proposed model is then evaluated using the Ljung–Box test, and finally the performance of the model is appraised on the test data. With the maximum order of the ARMA, seasonal auto-regressive (SAR) and seasonal moving average (SMA) terms set to 5, an example of the optimum solution achieved by ARIMA-CGA is provided in Fig. 2.

The objective function of the CGA considers all possible combinations and employs the corrected Akaike information criterion (*AICC*) (Eq. 23) to find the model that is optimal in terms of accuracy and simplicity simultaneously. The first term of the *AICC* reflects the accuracy of the model, while the remaining terms penalize its complexity.

$$ AICC\, = \,N\, \times \,Ln\left( {MSE} \right)\, + \,2\, \times \,Comp.\, + \,\frac{2 \times Comp.(Comp.\, + \,1)}{{N\, - \,Comp.\, - \,1}} $$

(23)

where *N* is the number of samples, *MSE* the mean square error and *Comp.* the complexity of the model. *Comp.* is the sum of the stochastic model orders (*p*, *q*, *P*, *Q*) plus the constant term, if present. The *MSE* is calculated as:

$$ MSE\, = \,\frac{{\sum\nolimits_{i = 1}^{N} {(MED_{obs,i} \, - \,MED_{P,i} )^{2} } }}{N} $$

(24)

where *MED*_{obs,i} and *MED*_{P,i} are the *i*th observed and predicted values, respectively. The flowchart of the preprocessing-based stochastic model is presented in Fig. 3.
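Eqs. (23) and (24) combine into a few lines of code. The sketch below compares two hypothetical candidate models to show how the *AICC* trades accuracy against complexity:

```python
import math

def aicc(mse, n, comp):
    """Corrected Akaike information criterion of Eq. (23):
    the first term rewards accuracy, the remaining terms penalize complexity."""
    return n * math.log(mse) + 2 * comp + 2 * comp * (comp + 1) / (n - comp - 1)

# Between two hypothetical candidate models, the lower AICC wins:
simple = aicc(mse=1.10, n=100, comp=3)    # slightly less accurate, simpler
complex_ = aicc(mse=1.05, n=100, comp=9)  # slightly more accurate, more complex
```

The simpler model wins here despite its slightly larger MSE, which is exactly the accuracy-simplicity trade-off the criterion encodes.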

### Generalized structure of group method of data handling (GMDH)

GMDH is a self-organized approach that gradually produces more complex models while evaluating their performance on the input and output datasets [32]. In this approach, the relationship between the input and output variables is expressed by the Volterra series, which is similar to the Kolmogorov–Gabor polynomial:

$$ y\, = \,a_{0} \, + \,\sum\limits_{i = 1}^{N} {a_{i} x_{i} } \,\, + \,\,\sum\limits_{i = 1}^{N} {\sum\limits_{j = 1}^{N} {a_{ij} x_{i} x_{j} } } \, + \,\sum\limits_{i = 1}^{N} {\sum\limits_{j = 1}^{N} {\sum\limits_{k = 1}^{N} {a_{ijk} x_{i} x_{j} x_{k} } \, + \, \cdots } } $$

(25)

where *y* is the output variable, *A* = (*a*_{0}, *a*_{1}, *…*, *a*_{m}) the weight vector and *X* = (*x*_{1}, …, *x*_{N}) the input variable vector. The GMDH model was developed on the basis of heuristic self-organization to overcome the complexity of multidimensional problems. The method first considers different neurons with two input variables each and then specifies a threshold value to discard the variables that cannot reach the required performance level. This procedure makes it a self-organizing algorithm.

The main purpose of the GMDH network is to construct a function in a feed-forward network on the basis of a second-degree transfer function. The number of layers and neurons within the hidden layers, the effective input variables and the optimal model structure are automatically determined with this algorithm.

In order to model using the GMDH algorithm, the entire dataset should first be divided into training and testing subsets. After segmenting the data, the algorithm creates neurons with two inputs each. Given that each neuron has only two inputs, the number of all possible combinations for a model with *n* input vectors is:

$$ NAPC\, = \,\binom{n}{2}\, = \,\frac{n(n\, - \,1)}{2} $$

(26)

where *NAPC* is the number of all possible combinations and *n* is the number of input vectors.
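Eq. (26) is just the number of unordered pairs; enumerating them directly confirms the closed form (*n* = 6 is a hypothetical input count):

```python
from itertools import combinations

# Eq. (26): number of two-input neurons for n candidate inputs.
n = 6                                    # hypothetical number of input vectors
napc = n * (n - 1) // 2                  # closed form
pairs = list(combinations(range(n), 2))  # enumerate the actual input pairs
```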

According to the quadratic regression polynomial function, all neurons have two inputs and one output with the same structure, and each neuron with five weights (*a*_{1}, *a*_{2}, *a*_{3}, *a*_{4}, *a*_{5}) and one bias (*a*_{0}) executes the processing between the inputs (*x*_{i}, *x*_{j}) and output data as follows:

$$ \hat{y} \, = \,\hat{f} (x_{i} ,x_{j} )\, = \,a_{0} \, + \,a_{1} x_{i} \, + \,a_{2} x_{j} \, + \,a_{3} x_{i}^{2} \, + \,a_{4} x_{j}^{2} \, + \,a_{5} x_{i} x_{j} $$

(27)

The unknown coefficients (*a*_{0}, *a*_{1}, *a*_{2}, *a*_{3}, *a*_{4}, *a*_{5}) are obtained by ordinary least squares; the performance of all neural-network methods is heavily influenced by these fitted parameters. The least squares solution is:

$$ A_{i} \, = \,(x_{i}^{T} x_{i} )^{ - 1} x_{i}^{T} Y $$

(28)

where *A* = {*a*_{0}, *a*_{1}, *a*_{2}, *a*_{3}, *a*_{4}, *a*_{5}} is the vector of unknown coefficients, *Y* = {*y*_{1}, …, *y*_{N}}^{T} the output vector and *x*_{i} the matrix of regressors built from the neuron inputs.
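Eqs. (27) and (28) can be checked with a short least squares fit. The sketch below builds the six-column regressor matrix for one neuron and recovers assumed true coefficients from noise-free data; `np.linalg.lstsq` is used instead of explicitly forming the normal-equations inverse, which is numerically safer:

```python
import numpy as np

def fit_neuron(xi, xj, y):
    """Fit the six coefficients of the quadratic neuron of Eq. (27)
    by ordinary least squares (a sketch, not the paper's code)."""
    X = np.column_stack([np.ones_like(xi), xi, xj, xi**2, xj**2, xi * xj])
    A, *_ = np.linalg.lstsq(X, y, rcond=None)
    return A

rng = np.random.default_rng(0)
xi, xj = rng.normal(size=50), rng.normal(size=50)   # hypothetical neuron inputs
true_A = np.array([1.0, 2.0, -1.0, 0.5, 0.0, 3.0])  # assumed coefficients
y = np.column_stack([np.ones_like(xi), xi, xj, xi**2, xj**2, xi * xj]) @ true_A
A = fit_neuron(xi, xj, y)  # recovers true_A on noise-free data
```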

The *AICC* criterion (Eq. 23) is applied to determine the optimal network structure and select the neurons describing the target parameter. The *Comp*. in this equation for the GMDH model is defined as follows:

$$ Comp.\, = \,NL\, + \,NTN $$

(29)

where *Comp*. is the complexity, *NL* is the number of layers and *NTN* is the number of total neurons.

The performance of classical GMDH in the modeling of nonlinear problems has been demonstrated in various studies [33,34,35,36]. However, along with its advantages, it has the following limitations: (i) only second-order polynomials, (ii) only two inputs per neuron, (iii) the inputs of each neuron can only be selected from the adjacent layer [37, 38]. In complex nonlinear problems, the restriction to second-order polynomials may prevent an acceptable result. In addition, allowing only two inputs per neuron and using only adjacent-layer neurons leads to a significant increase in the number of neurons (NN) [39].

In the current study, a new scheme of GMDH as a GS-GMDH is employed and encoded in the MATLAB environment. The developed model removes all the mentioned disadvantages, so that each neuron can connect to two or three neurons at a time, taken from adjacent or non-adjacent layers. In addition, the order of polynomials can also be two or three. Similar to classical GMDH, the best structure is chosen based on the *AICC* index. According to the provided description, the developed GS-GMDH can offer four modes: (1) second-order polynomial with two inputs, (2) second-order polynomial with three inputs, (3) third-order polynomial with two inputs, and (4) third-order polynomial with three inputs. The first mode is classical GMDH.

Figure 4 shows an example of the developed GS-GMDH for a model with five inputs and one output. In this figure, three different neurons (*x*_{11}, *x*_{12}, *x*_{21}) are presented to provide an equation estimating the target parameter (*y*). The two neurons *x*_{11} and *x*_{12} have three inputs each, taken from the inputs of the problem. The *x*_{21} neuron, which produces the output of the problem, likewise has three inputs, except that it uses the non-adjacent layer neurons (*x*_{13}) in addition to the adjacent layer neurons (*x*_{11} and *x*_{12}).

The GS-GMDH was used in this study to achieve the most precise results in forecasting the studied TS, which we abbreviate as the MED data. GS-GMDH is superior to classical GMDH because the structure of the neurons is encoded in a genotype string, which allows neurons from any previous layer to be used in subsequent layers. In addition, GS-GMDH minimizes the training and prediction errors separately, preventing overtraining of the model. The flow chart of the developed GS-GMDH model is presented in Fig. 5.

Before modeling with the GS-GMDH method, several parameters must be set. The first is the Maximum Number of Inputs (MNI), the maximum number of inputs per neuron; it can be two or three, and if set to three, both two- and three-input neurons are tried. The second is Inputs More (IM), which can be zero or one: zero restricts each neuron's inputs to the previous layer, while one also allows inputs from non-adjacent layers. The Maximum Number of Neurons (MNN) equals the number of input variables, though it can be twice that number for complex problems. The polynomial degree (PD) can be two or three; if set to three, both second- and third-order polynomials are allowed.

### Combining linear and nonlinear models (data-driven method)

A TS consists of stochastic and deterministic components. Thus, by using appropriate data preprocessing methods, it is possible to reduce the problematic effects of the deterministic components on the modeling process. The proposed methodology is based on a continuous modeling process: a data-driven method built on preprocessing to identify the linear and nonlinear components of the TS, verification of the validity of the decomposed data, and the decomposed model. In the studied case (6), the ACE TS fluctuates greatly, and the outcomes of the single stochastic and neural network modeling approaches are relatively weak. Hence, as a third approach, the ACE TS is modeled with a combined stochastic-neural network model. Stochastic models perform efficiently as long as the TS is linear and free of the deterministic terms responsible for nonlinearity; AI methods, on the other hand, allow the modeling of TS with nonlinear components. A real TS, however, is rarely purely linear or nonlinear; both components are present simultaneously, and their integration sometimes produces complex structures in the TS. In such cases, single stochastic or nonlinear methods might be improved by a combined model. Combining stochastic models with AI methods is one of the most effective ways of modeling TS with complex structures. As shown in Fig. 6, the residuals of the stochastic model were used as a new TS in the GS-GMDH modeling, so that the features of both modeling approaches were exploited.
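The combination scheme reduces to three steps: fit the linear model, hand its residuals to the nonlinear model, and add the two predictions. The sketch below uses placeholder callables for the two stages (the study's actual stages are ARIMA-CGA and GS-GMDH):

```python
import numpy as np

def hybrid_forecast(ts, fit_linear, fit_nonlinear):
    """Two-stage combination sketched in the text: a linear (stochastic-style)
    model captures the linear part, a nonlinear model is fitted to its
    residuals, and the final forecast is the sum of the two. `fit_linear`
    and `fit_nonlinear` are placeholder callables returning in-sample
    predictions; they stand in for the study's ARIMA-CGA and GS-GMDH."""
    linear_pred = fit_linear(ts)
    residuals = ts - linear_pred            # new TS handed to the nonlinear stage
    nonlinear_pred = fit_nonlinear(residuals)
    return linear_pred + nonlinear_pred

# Toy check with trivial stand-in models:
ts = np.array([1.0, 2.0, 4.0, 7.0])
pred = hybrid_forecast(ts,
                       fit_linear=lambda x: 0.5 * x,  # dummy linear stage
                       fit_nonlinear=lambda r: r)     # dummy perfect residual fit
```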

### Verification indices to evaluate models

To verify the accuracy of the TS MED forecasts, the correlation coefficient (*R*), scatter index (*SI*), mean absolute percentage error (*MAPE*), root mean squared error (*RMSE*) and performance index (*ρ*) are used. In addition, the corrected Akaike information criterion (*AICC*), which weighs the model's simplicity against its goodness-of-fit, and the Nash–Sutcliffe model efficiency (*E*_{N-S}), which is based on the deviation from the mean value [40], are used. The *AICC* index is used to find the best model in each TS modeling; the lower its value, the better the model balances accuracy and simplicity. The *E*_{N-S} index ranges from −∞ to 1, and the closer it is to one, the more accurate the model.

$$ R\, = \,\frac{{\sum\limits_{i = 1}^{N} {\left( {MED_{obs,i} \, - \,\overline{MED}_{obs,i} } \right)\left( {MED_{pred,i} \, - \overline{\,MED}_{pred,i} } \right)} }}{{\sqrt {\sum\limits_{i = 1}^{N} {\left( {MED_{obs,i} \, - \,\overline{MED}_{obs,i} } \right)^{2} } \sum\limits_{i = 1}^{N} {\left( {MED_{pred,i} \, - \,\overline{MED}_{pred,i} } \right)^{2} } } }} $$

(30)

$$ SI\, = \,\frac{{\sqrt {\frac{1}{N}\sum\limits_{i = 1}^{N} {\left( {MED_{obs,i} \, - \,MED_{pred,i} } \right)^{2} } } }}{{\overline{MED}_{obs} }} $$

(31)

$$ MAPE\, = \,\frac{100}{N}\sum\limits_{i = 1}^{N} {\left( {\frac{{\left| {MED_{obs,i} \, - \,MED_{pred,i} } \right|}}{{MED_{obs,i} }}} \right)} $$

(32)

$$ RMSE\, = \,\sqrt {\frac{1}{N}\sum\limits_{i = 1}^{N} {\left( {MED_{obs,i} \, - \,MED_{pred,i} } \right)^{2} } } $$

(33)

$$ \rho \, = \,\frac{{\left( {\sqrt {\frac{1}{N}\sum\limits_{i = 1}^{N} {\left( {MED_{obs,i} \, - \,MED_{pred,i} } \right)^{2} } } /\sum\limits_{i = 1}^{N} {\left( {MED_{obs,i} } \right)} } \right)}}{1\, + \,R} $$

(34)

$$ E_{N - S} \, = \,\left[ {1 - \frac{{\sum\limits_{i = 1}^{N} {\left( {MED_{obs,i} - MED_{pred,i} } \right)^{2} } }}{{\sum\limits_{i = 1}^{N} {\left( {MED_{obs,i} - \overline{MED}_{obs} } \right)^{2} } }}} \right] $$

(35)

$$ AICC\, = \,\frac{{2kN\, + \,\left( {N\ln \left( {\sigma_{\varepsilon }^{2} } \right)\left( {N\, - \,k\, - \,1} \right)} \right)}}{N\, - \,k\, - \,1} $$

(36)

where *k* is the number of parameters, *N* the number of samples, *σ*_{ε}^{2} the variance of the residuals, *E*_{N-S} the Nash–Sutcliffe test statistic, and *MED*_{obs,i} and *MED*_{pred,i} the *i*th observed and forecasted MED values, respectively.
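Several of the indices in Eqs. (30)–(35) are one-liners in numpy; the sketch below evaluates *R*, *SI*, *MAPE* and *E*_{N-S} for a perfect forecast as a sanity check (hypothetical observed values):

```python
import numpy as np

def indices(obs, pred):
    """A minimal numpy sketch of some verification indices of Eqs. (30)-(35)."""
    err = obs - pred
    r = np.corrcoef(obs, pred)[0, 1]                         # correlation coefficient
    si = np.sqrt(np.mean(err**2)) / obs.mean()               # scatter index
    mape = 100.0 * np.mean(np.abs(err) / obs)                # mean abs. percentage error
    e_ns = 1.0 - np.sum(err**2) / np.sum((obs - obs.mean())**2)  # Nash-Sutcliffe
    return r, si, mape, e_ns

obs = np.array([2.0, 3.0, 5.0, 4.0])   # hypothetical observed MED values
r, si, mape, e_ns = indices(obs, obs)  # perfect forecast: R = 1, SI = MAPE = 0, E = 1
```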

The Ljung-Box test is used to check the independence of the residuals of the modeled TS [41]. The test statistic is calculated as follows:

$$ Q_{Ljung - Box} \, = \,N\left( {N\, + \,2} \right)\sum\limits_{h = 1}^{m} {\frac{{r_{h}^{2} }}{N\, - \,h}} $$

(37)

where *N* is the number of samples, *r*_{h} the autocorrelation of the residuals (*ε*_{t}) at lag *h*, and *m* is taken as *ln*(*N*). If the probability corresponding to the Ljung–Box test statistic in the χ² distribution is higher than the α-level (in this case *P*_{Q} > α = 0.05), the residual series is white noise and the model is adequate.
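The Ljung–Box statistic in its standard textbook form, Q = N(N+2) Σ_{h=1..m} r_h²/(N−h), can be computed from the residual autocorrelations with stdlib Python only; a minimal sketch (with m defaulting to ln N as in the text, and hypothetical residual values):

```python
import math

def ljung_box_q(resid, m=None):
    """Ljung-Box Q statistic in its standard form, where r_h is the
    lag-h autocorrelation of the residuals (an illustrative sketch)."""
    n = len(resid)
    m = m or max(1, round(math.log(n)))        # m ~ ln(N) as in the text
    mean = sum(resid) / n
    c0 = sum((x - mean) ** 2 for x in resid)   # lag-0 autocovariance (unnormalized)
    q = 0.0
    for h in range(1, m + 1):
        ch = sum((resid[t] - mean) * (resid[t - h] - mean) for t in range(h, n))
        q += (ch / c0) ** 2 / (n - h)          # r_h^2 / (N - h)
    return n * (n + 2) * q

q = ljung_box_q([0.3, -0.1, 0.2, -0.4, 0.1, 0.0, -0.2, 0.3])
```

The resulting Q is compared against the χ² distribution with m degrees of freedom to decide whether the residuals are white noise.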