1 Introduction

In this report, a best autoregressive integrated moving average (ARIMA) model was chosen from a set of possible models for a provided dataset on global land temperature anomalies through applying parameter estimation, including testing for overfitting, computing akaike information criterion (AIC) and bayesian information criterion (BIC) scores, assessing the model’s accuracy in terms of mean error (ME), root mean squared error (RMSE), mean absolute error (MAE), mean absolute scaled error (MASE) and the first-order autocorrelation coefficient (ACF1). In which, data preparation involving transformation, differencing and several tests for stationary was first performed on the dataset, before a set of models could be logically extracted from using suitable model specification tools such as autocorrelation function (ACF), partial autocorrelation function (PACF), extended autocorrelation function (EACF), and BIC tables.

2 Objectives

The aim of this report was to propose the best model for the given dataset from a possible set of models through integration of various data manipulation techniques, appropriate model specification tools, performing tests to determine normality and stationarity, as well as finding parameter estimates and evaluating the goodness-of-fit of the models.

3 Data Analysis

The dataset analysed in this report was provided by Royal Melbourne Institute of Technology (RMIT) and is available here: https://rmit.instructure.com/courses/124176/files/36223019. In addition, as the dataset was taken from here, more information can be found on the dataset through the link: https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/global/time-series/globe/land/ytd/12/1850-2023?.

This dataset represents the annual Global Land Temperature Anomalies in Degrees Celsius (°C) against the base period 1901 to 2000, containing 174 observations and 2 variables. When inspecting the data, it was found that the first variable, ‘Year’ covered the years from 1850 to 2023, and served as the unique index, whilst the second column, ‘Anomaly’ consists of all the actual observations in degrees Celsius. The dataset contained no missing data values and the data summary below shows the ‘Anomaly’ variable having a mean of 0.06218 and a median of zero, since both are unequal, it is expected that the data distribution is not normal. Furthermore, the median being zero suggests both negative and positive values exist within the data, this was proven by the minimum value being -0.44 and the maximum value being 0.91.

##       Year         Anomaly        
##  Min.   :1850   Min.   :-0.44000  
##  1st Qu.:1893   1st Qu.:-0.12750  
##  Median :1936   Median : 0.00000  
##  Mean   :1936   Mean   : 0.06218  
##  3rd Qu.:1980   3rd Qu.: 0.23000  
##  Max.   :2023   Max.   : 0.91000

The dataset was converted into a time series object before the data was plotted against the time of observation to help gain a better understanding of the data’s characteristics. Referring to Figure 1, there is no seasonality observed as there is a lack of repeated patterns. Furthermore, the series first appears to slowly wander downwards before gradually changing direction to an increasing trend that remains consistent, and seems to have some changing variance as the length of the data points fluctuate back and forth from small to large over time. Additionally, the data possesses autoregressive (AR) and moving average (MA) behaviour as suggested by the up, down movements of the data points and successive points that stedily shift in an upwards trend, respectively.

Figure 1

Figure 1

When looking at the ACF plot in Figure 2, there is obvious decaying pattern with significant autocorrelation in all the lags, indicating this series is non-stationary. Whilst the PACF plot depicts strong correlation in the first lag, which implies the presence of a trend.

Figure 2

Figure 2

ADF Test Assumptions:
H0 : The process is difference non-stationary
HA : The process is stationary

Shapiro-Wilk Test Assumptions:
H0 : The given series is normally distributed
HA : The given series is not normally distributed

From scrutinising the augmented Dickey-Fuller (ADF) test, the p-value of 0.9044 indicates that it is greater than the chosen significant value, 0.05, therefore the the null hypothesis cannot be rejected. Whereas, the p-value from the Shapiro-Wilk test is smaller than 0.05, thus the null hypothesis is rejected. Hence, both tests suggests that the data neither exhibits normality nor demonstrates stationarity. The qq plot in Figure 3 supports this conclusion, as the data points depart from a straight line especially at the tails and bends downwards towards the center of the plot. Hence, the series is clearly non-stationary due to the changing variances and trends, in which transformation will be applied to stabilise variance across time to make the series stationary and more homogeneously distributed.

## 
##  Augmented Dickey-Fuller Test
## 
## data:  data
## Dickey-Fuller = -1.1974, Lag order = 5, p-value = 0.9044
## alternative hypothesis: stationary
## 
##  Shapiro-Wilk normality test
## 
## data:  data
## W = 0.94279, p-value = 1.897e-06
Figure 3

Figure 3

4 Data Preparation

4.1 Transformation

There are various data transformation methods such as log, square, square-root, reciprocal, cube, cube-root and several others that can be applied to stabilise the data and transform the residuals to normality. To determine which data transformation technique would be most effective to use with this dataset, a power transformation, Box-Cox was implemented to search for an optimal lambda (λ) value [1].

4.1.1 Box-Cox Transformation

With a 95% confidence interval, Figure 4 depicts that the λ value is located between 0.9 and 1.2 (signified by the two outer vertical, dotted lines), whereby the optimal λ value is 1 and is denoted by the middle, dotted line. Thus, a log transformation was chosen to stabilise the variance of this time series.

Figure 4

Figure 4

## [1] 0.9 1.2
## [1] 1

4.1.2 Log Transformation

When carrying out log transformation, a small constant (+0.5) was added to the raw data to make all values in the dataset positive. Since the selected log transformation can only be applied to positive values, and as previously discovered, the minimum value was -0.44, hence the need for a small constant to make all data values positive. Unlike log transformation, reciprocal transformation is capable of taking on negative values, however, due to its inability to deal with values of zero, it was not chosen. Moreover, the obtained optimal λ value supports the application of log transformation on this dataset.

Analysing Figure 5, after transformation, the data appears to have a more stable variance as the fluctuations have decreased, except for some data points in the early 1900s which could be possible outliers. Although, the data still exhibits a trend that needs to be eliminated or reduced to achieve stationary, this can be done by performing differencing on the log transformed data.

Figure 5

Figure 5

4.2 Differencing

Differencing can be explained as calculating the differences between consecutive observations and can help to stablise the mean by removing changes in the level of a time series. After first difference was implemented, Figure 6 clearly shows that a flatter mean level was attained compared to the data distribution in Figure 5.

Figure 6

Figure 6

Now looking at Figure 7 to check the normality of the data distribution and while the data points are not aligned completely with the straight line as indicated by the heavy tails, there is no longer a downward bend in the middle of the plot compared to Figure 3, when no transformation nor differencing had been applied.

Figure 7

Figure 7

ADF Test Assumptions:
H0 : The process is difference non-stationary
HA : The process is stationary

The p-value from the adf test is 0.01, therefore less than the significant value, 0.05 and so the null hypothesis is rejected. This means that the trend in the data has been removed and the data has been altered to a stationary series through log transformation and first differencing. Referring to the Phillips Perron (PP) test also supports this assumption, as the p-value of 0.01 is less than 0.05, it provides strong evidence to reject the null hypothesis and it can be said that the data has reached stationary.

## 
##  Augmented Dickey-Fuller Test
## 
## data:  data
## Dickey-Fuller = -7.1324, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
## 
##  Phillips-Perron Unit Root Test
## 
## data:  data
## Dickey-Fuller Z(alpha) = -137.56, Truncation lag parameter = 4, p-value
## = 0.01
## alternative hypothesis: stationary

5 Model Specification

Initially, the raw data was non-stationary, however, through log transformation and first differencing, was converted to a stationary series. Once a stationary series is achieved, suitable model specification tools are incorporated to come up with a set of possible ARIMA models, to find the values of ‘p’ and ‘q’. Whereby, ‘p’ represents the order of an autoregressive (AR) model model and is identified from the PACF plot, whilst ‘q’ represents the order of a moving-average (MA) model and comes from the ACF plot, with ‘d’ signifying the number of differences needed to make the series stationary [2]. Hence, (p,d,q) can also be understood as (AR,I,MA).

5.1 ACF and PACF Plots

Looking at ACF plot in Figure 8, there is one obvious significant lag and a second that is less distinct that may also be considered, hence, MA(2) and MA(1) models were taken for potential ‘q’ values. Looking at PACF plot, there are two very clear significant lags and a third less noticeable lag, thus, AR(3), AR(2) models were considered for possible ‘p’ values. As the first difference was used, ‘d’ will be 1. From which, there are four sets of ARIMA models derived from examining the ACF and PACF plots, that is; {ARIMA(3,1,1), ARIMA(3,1,2), ARIMA(2,1,1) and ARIMA(2,1,2)}.

Figure 8

Figure 8

5.2 EACF (Extended Autocorrelation Function) Test

Now using another approach to draw forth likely values for ‘p’ and ‘q’, the set of potential models obtained from eacf are; {ARIMA(1,1,2), ARIMA(1,1,3), ARIMA(2,1,1), ARIMA(2,1,2), ARIMA(2,1,3) and ARIMA(3,1,3)}.

## AR/MA
##   0 1 2 3 4 5 6 7 8
## 0 o x x o o o o o o
## 1 x x o o o o o o o
## 2 x o o o o o o o o
## 3 x x x o o o o o o
## 4 x o x o o o o o o
## 5 x o o o o o o o o
## 6 x o o x o o o o o
## 7 x o x o o o o o o
## 8 o x x x o o x o o

5.3 BIC Table

A BIC table was also tried to identify for any more probable models, observe that the shaded columns correspond to AR(1) and more strongly to AR(3), MA(2) and MA(3). Thus, the set of possible models based on those orders were; {ARIMA(3,1,1), ARIMA(3,1,2), ARIMA(3,1,3), ARIMA(1,1,2) and ARIMA(1,1,3)}.

## Reordering variables and trying again:

As a result, the final set of possible candidate models chosen for parameter estimation were; {ARIMA(3,1,1), ARIMA(3,1,2), ARIMA(3,1,3), ARIMA(2,1,1), ARIMA(2,1,2), ARIMA(1,1,2), ARIMA(1,1,3)}.

6 Parameter Estimation

To estimate the parameters and test the significance of the models, maximum likelihood (ML) and conditional sum of squares (CSS) methods will be used to fit the models from the candidate set [3]. Here, the raw data is given to the forecast package’s ‘ARIMA()’ function to find the coefficients and test their significance at a level of 5%. The altered series, whereby transformation and differencing was applied was not used, it was solely done to make the series stationary to be able to find the values of ‘p’ and ‘q’. Since the ‘ARIMA()’ function itself provides arguments more specifically, ‘order=’ and ‘lambda=’, to perform transformation and differencing on the inputted data.

6.1 Model Fitting

6.1.1 ARIMA(3,1,1)

In CSS, the coefficients are statistically significant at a 5% significance level for ar1, ar2 and ma1 but not ar3 as indicated by the p-values. Likewise, the outcome in ML is similar to the CSS method, as the coefficients with ar1, ar2 and ma1 are once again statistically significant, with ar3 regarded as slightly statistically significant. Hence, this model might be a good fit for the data.

## 
## z test of coefficients:
## 
##      Estimate Std. Error z value  Pr(>|z|)    
## ar1  0.609122   0.217984  2.7943 0.0052005 ** 
## ar2 -0.364134   0.086709 -4.1995 2.675e-05 ***
## ar3  0.207826   0.129207  1.6085 0.1077304    
## ma1 -0.735979   0.197573 -3.7251 0.0001952 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## z test of coefficients:
## 
##      Estimate Std. Error z value  Pr(>|z|)    
## ar1  0.615279   0.210744  2.9196  0.003505 ** 
## ar2 -0.361925   0.086262 -4.1956 2.721e-05 ***
## ar3  0.211823   0.127848  1.6568  0.097553 .  
## ma1 -0.741697   0.190517 -3.8931 9.898e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

6.1.2 ARIMA(3,1,2)

According to the p-values, both CSS and ML methods agree that ar2 and ma1 are statistically significant at a significance level of 5%, whilst ar1, ar3 and ma2 are considered statistically insignificant.

## 
## z test of coefficients:
## 
##       Estimate Std. Error z value Pr(>|z|)  
## ar1  0.6209971  0.4536291  1.3690  0.17101  
## ar2 -0.3690868  0.1866314 -1.9776  0.04797 *
## ar3  0.2112201  0.1753588  1.2045  0.22840  
## ma1 -0.7478902  0.4543540 -1.6461  0.09975 .
## ma2  0.0070346  0.2403759  0.0293  0.97665  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## z test of coefficients:
## 
##        Estimate  Std. Error z value Pr(>|z|)  
## ar1  6.1572e-01  4.3134e-01  1.4275  0.15345  
## ar2 -3.6190e-01  1.8332e-01 -1.9742  0.04836 *
## ar3  2.1205e-01  1.7091e-01  1.2407  0.21472  
## ma1 -7.4211e-01  4.3152e-01 -1.7198  0.08548 .
## ma2  1.9872e-05  2.3092e-01  0.0001  0.99993  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

6.1.3 ARIMA(3,1,3)

CSS and ML’s p-values depict all coefficients to be statistically insignificant except for ar2.

## 
## z test of coefficients:
## 
##      Estimate Std. Error z value Pr(>|z|)  
## ar1  0.087316   0.480306  0.1818  0.85575  
## ar2 -0.437826   0.213029 -2.0552  0.03986 *
## ar3  0.197384   0.268971  0.7338  0.46304  
## ma1 -0.213441   0.479760 -0.4449  0.65640  
## ma2  0.020172   0.278349  0.0725  0.94223  
## ma3 -0.249243   0.195462 -1.2751  0.20226  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## z test of coefficients:
## 
##      Estimate Std. Error z value Pr(>|z|)  
## ar1  0.102264   0.479575  0.2132  0.83114  
## ar2 -0.432323   0.210972 -2.0492  0.04044 *
## ar3  0.207744   0.260171  0.7985  0.42459  
## ma1 -0.227198   0.478829 -0.4745  0.63515  
## ma2  0.018584   0.273986  0.0678  0.94592  
## ma3 -0.252181   0.192536 -1.3098  0.19027  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

6.1.4 ARIMA(2,1,1)

Both approaches, CSS and ML, show ar2 to be the only statistically significant coefficient when checking their p-values, which matches the pattern of the previous models also depicting ar2 as statistically significant.

## 
## z test of coefficients:
## 
##      Estimate Std. Error z value Pr(>|z|)    
## ar1  0.147460   0.192159  0.7674   0.4429    
## ar2 -0.399352   0.077451 -5.1562 2.52e-07 ***
## ma1 -0.273855   0.214609 -1.2761   0.2019    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## z test of coefficients:
## 
##      Estimate Std. Error z value  Pr(>|z|)    
## ar1  0.148792   0.194411  0.7653    0.4441    
## ar2 -0.395596   0.077386 -5.1120 3.188e-07 ***
## ma1 -0.273818   0.217564 -1.2586    0.2082    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

6.1.5 ARIMA(2,1,2)

When looking at the p-values from CSS and ML, the coefficients ar1, ma1 and ma2 are statistically insignificant, whereas ar2 is statistically significant. Thus, this model does not seem like a good fit for the data.

## 
## z test of coefficients:
## 
##     Estimate Std. Error z value Pr(>|z|)  
## ar1  0.12712    0.17380  0.7314   0.4645  
## ar2 -0.30861    0.16136 -1.9126   0.0558 .
## ma1 -0.25607    0.17918 -1.4291   0.1530  
## ma2 -0.11150    0.16641 -0.6700   0.5028  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## z test of coefficients:
## 
##     Estimate Std. Error z value Pr(>|z|)  
## ar1  0.12926    0.17412  0.7424  0.45786  
## ar2 -0.29585    0.16665 -1.7753  0.07585 .
## ma1 -0.25768    0.17896 -1.4398  0.14991  
## ma2 -0.12067    0.17016 -0.7092  0.47823  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

6.1.6 ARIMA(1,1,2)

The p-values obtained from CSS and ML imply the only coefficient here to be statistically significant is ma2, while ar1 and ma1 are statistically insignificant, therefore it does not seem to be a suitable model for the data.

## 
## z test of coefficients:
## 
##      Estimate Std. Error z value Pr(>|z|)    
## ar1  0.063647   0.153936  0.4135   0.6793    
## ma1 -0.202697   0.134390 -1.5083   0.1315    
## ma2 -0.370220   0.067526 -5.4826 4.19e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## z test of coefficients:
## 
##      Estimate Std. Error z value  Pr(>|z|)    
## ar1  0.070015   0.155948  0.4490    0.6535    
## ma1 -0.207702   0.136249 -1.5244    0.1274    
## ma2 -0.366988   0.067420 -5.4433 5.231e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

6.1.7 ARIMA(1,1,3)

All coefficients, ar1, ma1, ma2 and ma3, are statistically significant according to both CSS and ML techniques when inspecting the p-values, indicating this model has potential to be the best fit for the data.

## 
## z test of coefficients:
## 
##      Estimate Std. Error z value  Pr(>|z|)    
## ar1 -0.942774   0.047472 -19.860 < 2.2e-16 ***
## ma1  0.847097   0.079913  10.600 < 2.2e-16 ***
## ma2 -0.514501   0.081849  -6.286 3.258e-10 ***
## ma3 -0.440719   0.067183  -6.560 5.382e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## z test of coefficients:
## 
##      Estimate Std. Error  z value  Pr(>|z|)    
## ar1 -0.958601   0.039211 -24.4470 < 2.2e-16 ***
## ma1  0.869863   0.077783  11.1832 < 2.2e-16 ***
## ma2 -0.510865   0.082772  -6.1720 6.745e-10 ***
## ma3 -0.449872   0.067227  -6.6918 2.204e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Overall, coefficients for ar2, ma2 and at times ma1 seem to fit better to the data and are considered statistically significant more often than the other coefficients.
Now, utilising AIC and BIC scores that are sorted in ascending order (smallest to largest) to help decide which model is the best fitted ARIMA model out of the seven candidate models.

6.2 AIC Scores

Referring to the AIC and BIC scores, AIC suggests model_113_ml before model_211_ml, whereas, BIC indicates the opposite, model_211_ml over model_113_ml, as the model that is best fitted to the dataset. In which, further exploration into each model’s measures of accuracy can help come to a decisive choice on which ARIMA model to select.

##              df       AIC
## model_113_ml  5 -354.6224
## model_211_ml  4 -352.8980
## model_311_ml  5 -352.1474
## model_212_ml  5 -351.3795
## model_112_ml  4 -350.5250
## model_312_ml  6 -350.1474
## model_313_ml  7 -349.2984

6.3 BIC Scores

##              df       BIC
## model_211_ml  4 -340.2848
## model_113_ml  5 -338.8560
## model_112_ml  4 -337.9119
## model_311_ml  5 -336.3810
## model_212_ml  5 -335.6131
## model_312_ml  6 -331.2277
## model_313_ml  7 -327.2254

7 Results

7.1 Model Evaluation

When producing the measures of accuracy on all seven possible ARIMA models, the metric columns, mean percentage error (MPE) and mean absolute percentage error (MAPE) returned ‘NaN’ and ‘Inf’, respectively as a consequence of zero values within the dataset. To solve this issue, it was initially decided to drop any observations containing the value of zero. Albeit, this way of handling zero values is not the best solution as such values may still contain meaningful value depending on the domain knowledge the data emerged from. Furthermore, the data containing zero values made up six observations out of the 174, thus making up approximately only 3% of the dataset size. Therefore, omitting these zero values may not make such a big change or impact, however, it is still possible that it can distort the accuracy of the results.

It is also worth highlighting that the evaluation metric, ‘MAPE’, as a percentage would only make sense for values requiring divisions and ratios, thus it is not sensible to compute percentages of temperatures, which is relevant here, as the given dataset represents global land temperature anomalies (°C) [4]. Therefore, the choice to not exclude data values containing zero was made and only the other five remaining accuracy measures will be relied on to assess the goodness-of-fit of the models, ‘MPE’ and ‘MAPE’ will be disregarded.

However, if the decision to remove zero values within the dataset was chosen then, the accuracy metrics obtained for ‘MPE’ and ‘MAPE’ are shown below. Moreover, when comparing the accuracy results of the other five metrics, ‘ME’, ‘RMSE’, ‘MAE’, ‘MASE’ and ‘ACF1’ for data containing observations with zero values against data without zero values, the outcome was mostly similar, and differ only by an increase of 0.001 ranging up to 0.022 for some models when inspecting all five accuracy measures.

Recall in section 6.2 and 6.3, when calculating the AIC and BIC scores, the tables suggested between ARIMA(1,1,3) and ARIMA(2,1,1) to be the most fitting. Now interpreting the measures of accuracy for all seven models, focusing especially on ARIMA(1,1,3) and ARIMA(2,1,1). Both models are equally good, however, it was decided that ARIMA(2,1,1) will be selected as the best model as it has a tiny edge over ARIMA(1,1,3) with slightly smaller values in ‘ME’, ‘MAE’, ‘MASE’ and ‘ACF1’, whereby the lower the better and values closer to zero is ideal. Although, a greater value would indicate a good model for ‘RMSE’ as it refers to the model’s accuracy in forecasting, from which ARIMA(2,1,1) also achieved a larger score than ARIMA(1,1,3).

Moreover, in section 6.1, when model fitting was carried out, coefficients from model orders, AR(2), MA(1) and MA(2) were statistically significant. Additionally, adhering to the principle of parsimony, ARIMA(2,1,1) has 2 AR terms, 1 difference, and 1 MA term, this is less complex compared to ARIMA(1,1,3) which has 1 AR term, 1 difference, and 3 MA terms, hence ARIMA(2,1,1) was selected.

##                 ME  RMSE   MAE MPE MAPE  MASE   ACF1
## ARIMA(3,1,1) 0.009 0.084 0.063 NaN  Inf 0.873 -0.011
## ARIMA(3,1,2) 0.009 0.084 0.063 NaN  Inf 0.873 -0.011
## ARIMA(3,1,3) 0.009 0.084 0.062 NaN  Inf 0.866 -0.008
## ARIMA(2,1,1) 0.008 0.085 0.063 NaN  Inf 0.873 -0.001
## ARIMA(2,1,2) 0.008 0.085 0.063 NaN  Inf 0.877 -0.005
## ARIMA(1,1,2) 0.010 0.085 0.064 NaN  Inf 0.895 -0.008
## ARIMA(1,1,3) 0.010 0.084 0.064 NaN  Inf 0.894 -0.022
##                 ME  RMSE   MAE    MPE   MAPE  MASE   ACF1
## ARIMA(3,1,1) 0.010 0.088 0.066 16.182 83.610 0.886 -0.013
## ARIMA(3,1,2) 0.009 0.088 0.065 15.782 84.771 0.885 -0.004
## ARIMA(3,1,3) 0.010 0.087 0.066  8.590 84.185 0.888  0.002
## ARIMA(2,1,1) 0.009 0.088 0.066 15.967 84.606 0.887 -0.008
## ARIMA(2,1,2) 0.009 0.088 0.066 16.081 84.324 0.887 -0.009
## ARIMA(1,1,2) 0.010 0.088 0.067 16.857 85.849 0.905 -0.005
## ARIMA(1,1,3) 0.010 0.088 0.067 19.949 85.763 0.904 -0.030

7.2 Overfitting Test

After a best model was chosen, ARIMA(2,1,1) now needs to be tested for over-fitting. This is done by increasing only the model’s AR order or ‘p’ by 1 whilst keeping ‘q’ as is, and checking if the parameters are significant or not, then repeating the same for MA order or ‘q’ whilst keeping ‘p’ the same.

7.2.1 ARIMA(3,1,1)

Analysing the p-values when AR order is increased by 1 from the chosen model, both CSS and ML exhibits all coefficients to be statistically insignificant except for ar2 at a significance level of 5%.

## 
## z test of coefficients:
## 
##      Estimate Std. Error z value Pr(>|z|)   
## ar1  0.415655   0.539818  0.7700 0.441305   
## ar2 -0.300310   0.112018 -2.6809 0.007342 **
## ar3  0.080556   0.254058  0.3171 0.751184   
## ma1 -0.599991   0.528452 -1.1354 0.256218   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## z test of coefficients:
## 
##      Estimate Std. Error z value Pr(>|z|)   
## ar1  0.418823   0.526005  0.7962 0.425897   
## ar2 -0.299566   0.107349 -2.7906 0.005261 **
## ar3  0.083465   0.250046  0.3338 0.738531   
## ma1 -0.602391   0.513549 -1.1730 0.240797   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

7.2.2 ARIMA(2,1,2)

Assessing the p-values when MA order is increased by 1 from the chosen model, CSS suggests that ar2 and ma1 are a bit significant, whereas, ML only indicates ma1 to be significant at a 5% significance level. Both methods agree that ar1 and ma2 are not statistically significant.

Hence, the best model is appropriate as there are no new or major changes when either ‘p’ or ‘q’ is taken one step higher than ARIMA(2,1,1), and it can be said that ARIMA(2,1,2) and ARIMA(3,1,1) are over-parametrised models for ARIMA(2,1,1).

## 
## z test of coefficients:
## 
##      Estimate Std. Error z value Pr(>|z|)  
## ar1  0.222163   0.220652  1.0068  0.31401  
## ar2 -0.303122   0.182222 -1.6635  0.09622 .
## ma1 -0.407711   0.227226 -1.7943  0.07277 .
## ma2 -0.027355   0.207637 -0.1317  0.89519  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## z test of coefficients:
## 
##      Estimate Std. Error z value Pr(>|z|)  
## ar1  0.216442   0.224684  0.9633  0.33539  
## ar2 -0.300198   0.195700 -1.5340  0.12504  
## ma1 -0.401729   0.230330 -1.7441  0.08113 .
## ma2 -0.028886   0.223440 -0.1293  0.89714  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

8 Conclusion

To find the best model for the given global land temperature anomalies (°C) dataset, log transformation and the first difference was implemented on the dataset to ensure its a stationarity, in order to specify likely ‘p’ and ‘q’ values for the model’s AR and MA orders. Through various specification tools such as ACF, PACF, EACF and BIC table, potential combinations of ‘p’ and ‘q’ parameters were selected for parameter estimation, using ML and CSS methods. The best model was then chosen with consideration towards all candidate models’ AIC and BIC score rankings as well as evaluating the goodness-of-fit of the models based on ‘ME’, ‘RMSE’, ‘MAE’, ‘MASE’ and ‘ACF1’ measures. As a result, ARIMA(2,1,1), that is, an order modeled by 2 lagged values, 1 difference and 1 error term, was found to be the more parsimonious and smaller model, therefore was chosen as the best fitted model for this dataset. Over-fitting was also carried out to ensure there are no new changes to the model when either ‘p’ or ‘q’ value is increased by 1.

9 Appendices

Below is the code, loaded packages and functions used to produce the findings in this report.

####################################################################
#
#    Title: solutionModule6Tasks_2024_updated.R source code
#    Author: Demirhan, H
#    Date: Apr. 22 2024
#    Availability: https://rmit.instructure.com/courses/124176/files/37748470?module_item_id=6181993
#
####################################################################

# Load the required packages
library(TSA)
library(tseries)
library(lmtest)
library(forecast)

# Read in the data csv file
Global_Temperature_data <- read.csv("assignment2Data2024.csv", header=TRUE)

# Display the data to check its correctness to the original csv file
Global_Temperature_data

# Check the data type
class(Global_Temperature_data)

# Check that there are no missing values
sum(is.na(Global_Temperature_data$Anomaly))

# Check the summary of the data set
summary(Global_Temperature_data)

# Convert to a time series data object
Global_Temperature_data_TS = ts(Global_Temperature_data$Anomaly, start = 1850, end = 2023, frequency = 1) 
class(Global_Temperature_data_TS) 

# Create function to plot the data in a time series plot
timeseries_plot <- function(data, title, ylabel, xlabel, size) {
  plot(data, type='o', main = title, ylab=ylabel, xlab=xlabel, cex.main=size)
}

# Call the function
timeseries_plot(Global_Temperature_data_TS, title="Time series of global land temperature anomalies data (°C)", ylabel="Global land temperature anomalies (°C)", xlabel="Time (in Years)", size=0.9)

# Create function to generate acf and pacf plots
acf_pacf_plot <- function(data, acf_title, pacf_title,title_size) {
  par(mfrow=c(1,2))
  acf(data, main = acf_title, cex.main = title_size) # Adjust title size to fit
  pacf(data, main = "", cex.main = title_size)
  title(main = pacf_title, cex.main = title_size) 
  par(mfrow=c(1,1))
}

# Call the function
acf_pacf_plot(Global_Temperature_data_TS, 
              acf_title = "ACF of global temperature anomalies series", 
              pacf_title = "PACF of global temperature anomalies series", title_size = 0.8)

# Create function to perform adf test 
adf_test <- function(data) {
  adf.test(data)
}

# Call the function
adf_test(Global_Temperature_data_TS)

# Create function to perform Shapiro-Wilk test 
shapiroWilk_test <- function(data) {
  shapiro.test(data)
}

# Call the function
shapiroWilk_test(Global_Temperature_data_TS)

# Create function to generate QQ plot
qqPlot <- function(data, title, title_size) {
  qqnorm(data, main = title, cex.main = title_size)
  qqline(data, col = 2)
}

# Call the function
qqPlot(Global_Temperature_data_TS, title = "QQ plot of global land temperature anomalies data", title_size = 1)

# Performing Box-Cox transformation and plotting it
options(warn=-1)
BC = BoxCox.ar(Global_Temperature_data_TS+0.5)
title(main = "Log-likelihood versus the values of lambda for global land temperature anomalies data", cex.main = 0.9)

# Values of the first and third vertical lines
BC$ci

# To find lambda value of middle vertical line which is the optimal value
lambda <- BC$lambda[which(max(BC$loglike) == BC$loglike)] 
lambda

# Apply Box-Cox transformation
BC.Global_Temperature_data_TS = (Global_Temperature_data_TS^lambda-1)/lambda

# Implement log transformation on the raw data
log_data <- log(Global_Temperature_data_TS+0.5)

# Plot time series of log transformed data using timeseries_plot function
timeseries_plot(log_data, title = "Time series of log transformed global land temperature anomalies (°C)", ylabel='Global land temperature anomalies (°C)', xlabel="Time (in Years)", size=1)

# Apply first differencing to log transformed data to try and achieve a flat mean level
diff_log_data = diff(log_data, differences=1)

# Plot time series of first difference, log transformed data
timeseries_plot(diff_log_data, title = "Time series of first difference global land temperature anomalies data (°C)", ylabel='First difference global land temperature anomalies (°C)', xlabel="Time (in Years)", size=1)

# Generate QQ plot to check normality on first difference, log transformed data using the function
qqPlot(diff_log_data, title = "QQ plot of first difference global land temperature anomalies data", title_size = 1)

# Perform adf test on first difference, log transformed data
adf_test(diff_log_data) # Dickey-Fuller = -7.1324, Lag order = 5, p-value = 0.01

# Create function to perform Phillips Perron (pp) test
pp_test <- function(data){
  pp.test(data)
}

# Call the function to perform pp test on first difference log transformed data
pp_test(diff_log_data)

# Produce ACF and PACF plots for first differenced, log transformed data
acf_pacf_plot(diff_log_data, acf_title = "ACF of first difference global land temperature anomalies data", pacf_title = "PACF of first difference global land temperature anomalies data", title_size= 0.61)

# Create a function to perform eacf to search for potential models with max limit of 8
eacf(diff_log_data, ar.max = 8, ma.max = 8)

# Create a BIC table to help with specifying the final set of possible models.
res = armasubsets(y=diff_log_data,nar=5,nma=5,y.name='p',ar.method='ols')
plot(res)

# Fit the possible models into both CSS and ML methods, using raw data and lambda value (1) attained from box-cox transformation
# ARIMA(3,1,1)
model_311_css <- Arima(Global_Temperature_data_TS, order=c(3,1,1), method='CSS', lambda=lambda)
lmtest::coeftest(model_311_css)

model_311_ml <- Arima(Global_Temperature_data_TS, order=c(3,1,1), method='ML', lambda=lambda)
coeftest(model_311_ml)

# ARIMA(3,1,2)
model_312_css <- Arima(Global_Temperature_data_TS, order=c(3,1,2), method='CSS', lambda=lambda)
lmtest::coeftest(model_312_css)

model_312_ml <- Arima(Global_Temperature_data_TS, order=c(3,1,2), method='ML', lambda=lambda)
coeftest(model_312_ml)

# ARIMA(3,1,3)
model_313_css <- Arima(Global_Temperature_data_TS, order=c(3,1,3), method='CSS', lambda=lambda)
lmtest::coeftest(model_313_css)

model_313_ml <- Arima(Global_Temperature_data_TS, order=c(3,1,3), method='ML', lambda=lambda)
coeftest(model_313_ml)

# ARIMA(2,1,1)
model_211_css <- Arima(Global_Temperature_data_TS, order=c(2,1,1), method='CSS', lambda=lambda)
lmtest::coeftest(model_211_css)

model_211_ml <- Arima(Global_Temperature_data_TS, order=c(2,1,1), method='ML', lambda=lambda)
coeftest(model_211_ml)

# ARIMA(2,1,2)
model_212_css <- Arima(Global_Temperature_data_TS, order=c(2,1,2), method='CSS', lambda=lambda)
lmtest::coeftest(model_212_css)

model_212_ml <- Arima(Global_Temperature_data_TS, order=c(2,1,2), method='ML', lambda=lambda)
coeftest(model_212_ml)

# ARIMA(1,1,2)
model_112_css <- Arima(Global_Temperature_data_TS, order=c(1,1,2), method='CSS', lambda=lambda)
lmtest::coeftest(model_112_css)

model_112_ml <- Arima(Global_Temperature_data_TS, order=c(1,1,2), method='ML', lambda=lambda)
coeftest(model_112_ml)

# ARIMA(1,1,3)
model_113_css <- Arima(Global_Temperature_data_TS, order=c(1,1,3), method='CSS', lambda=lambda)
lmtest::coeftest(model_113_css)

model_113_ml <- Arima(Global_Temperature_data_TS, order=c(1,1,3), method='ML', lambda=lambda)
coeftest(model_113_ml)

# Function to sort the models from best to worse/ in ascending order
sort.score <- function(x, score = c("bic", "aic")){
  if (score == "aic"){
    x[with(x, order(AIC)),]
  } else if (score == "bic") {
    x[with(x, order(BIC)),]
  } else {
    warning('score = "x" only accepts valid arguments ("aic","bic")')
  }
}

# Producing sorted AIC scores using sort.score function to sort the models from best to worse
sort.score(AIC(model_311_ml, model_312_ml, model_313_ml, model_211_ml, model_212_ml, model_112_ml, model_113_ml), score = "aic")

# Producing BIC scores and sorting the models in ascending order
sort.score(BIC(model_311_ml, model_312_ml, model_313_ml, model_211_ml, model_212_ml, model_112_ml, model_113_ml), score = "bic")

# Getting the calculated measures from each model to put into a table
Smodel_311_css <- accuracy(model_311_css)[1:7] #[c(1:3, 6:7)] to exclude MPE and MAPE
Smodel_312_css <- accuracy(model_312_css)[1:7]
Smodel_313_css <- accuracy(model_313_css)[1:7]
Smodel_211_css <- accuracy(model_211_css)[1:7]
Smodel_212_css <- accuracy(model_212_css)[1:7]
Smodel_112_css <- accuracy(model_112_css)[1:7]
Smodel_113_css <- accuracy(model_113_css)[1:7]

# Joining multiple rows (model accuracy results) into one singular data frame
df.Smodels <- data.frame(rbind(Smodel_311_css, Smodel_312_css, Smodel_313_css, Smodel_211_css, Smodel_212_css, Smodel_112_css, Smodel_113_css))

# Creating the data frame's column and row names
colnames(df.Smodels) <- c("ME", "RMSE", "MAE", "MPE", "MAPE", "MASE", "ACF1")
rownames(df.Smodels) <- c("ARIMA(3,1,1)","ARIMA(3,1,2)", "ARIMA(3,1,3)", "ARIMA(2,1,1)", "ARIMA(2,1,2)", "ARIMA(1,1,2)", "ARIMA(1,1,3)")

# Round up the results to the nearest 3 decimal place
round(df.Smodels,  digits = 3)

# Check the number of observations that are valued 0
num_of_zeros <- sum(Global_Temperature_data_TS == 0)
print(num_of_zeros)

# Identifying the location of the zero value(s)
which(Global_Temperature_data_TS == 0)

# Exclude all observation(s) that are valued 0
Global_Temperature_data_TS <- Global_Temperature_data_TS[Global_Temperature_data_TS != 0]

# Fitting on data without observations containing zero values
# ARIMA(3,1,1)
model_311_css <- Arima(Global_Temperature_data_TS, order=c(3,1,1), method='CSS', lambda=lambda)
lmtest::coeftest(model_311_css)

model_311_ml <- Arima(Global_Temperature_data_TS, order=c(3,1,1), method='ML', lambda=lambda)
coeftest(model_311_ml)

# ARIMA(3,1,2)
model_312_css <- Arima(Global_Temperature_data_TS, order=c(3,1,2), method='CSS', lambda=lambda)
lmtest::coeftest(model_312_css)

model_312_ml <- Arima(Global_Temperature_data_TS, order=c(3,1,2), method='ML', lambda=lambda)
coeftest(model_312_ml)

# ARIMA(3,1,3)
model_313_css <- Arima(Global_Temperature_data_TS, order=c(3,1,3), method='CSS', lambda=lambda)
lmtest::coeftest(model_313_css)

model_313_ml <- Arima(Global_Temperature_data_TS, order=c(3,1,3), method='ML', lambda=lambda)
coeftest(model_313_ml)

# ARIMA(2,1,1)
model_211_css <- Arima(Global_Temperature_data_TS, order=c(2,1,1), method='CSS', lambda=lambda)
lmtest::coeftest(model_211_css)

model_211_ml <- Arima(Global_Temperature_data_TS, order=c(2,1,1), method='ML', lambda=lambda)
coeftest(model_211_ml)

# ARIMA(2,1,2)
model_212_css <- Arima(Global_Temperature_data_TS, order=c(2,1,2), method='CSS', lambda=lambda)
lmtest::coeftest(model_212_css)

model_212_ml <- Arima(Global_Temperature_data_TS, order=c(2,1,2), method='ML', lambda=lambda)
coeftest(model_212_ml)

# ARIMA(1,1,2)
model_112_css <- Arima(Global_Temperature_data_TS, order=c(1,1,2), method='CSS', lambda=lambda)
lmtest::coeftest(model_112_css)

model_112_ml <- Arima(Global_Temperature_data_TS, order=c(1,1,2), method='ML', lambda=lambda)
coeftest(model_112_ml)

# ARIMA(1,1,3)
model_113_css <- Arima(Global_Temperature_data_TS, order=c(1,1,3), method='CSS', lambda=lambda)
lmtest::coeftest(model_113_css)

model_113_ml <- Arima(Global_Temperature_data_TS, order=c(1,1,3), method='ML', lambda=lambda)
coeftest(model_113_ml)

# Computing accuracy scores for data without zero values
# Getting the calculated metrics from each model to put into a table
Smodel_311_css <- accuracy(model_311_css)[1:7] #[c(1:3, 6:7)] to exclude MPE and MAPE
Smodel_312_css <- accuracy(model_312_css)[1:7]
Smodel_313_css <- accuracy(model_313_css)[1:7]
Smodel_211_css <- accuracy(model_211_css)[1:7]
Smodel_212_css <- accuracy(model_212_css)[1:7]
Smodel_112_css <- accuracy(model_112_css)[1:7]
Smodel_113_css <- accuracy(model_113_css)[1:7]

# Combine all seven rows (model accuracy results) into a singular data frame
df.Smodels <- data.frame(rbind(Smodel_311_css, Smodel_312_css, Smodel_313_css, Smodel_211_css, Smodel_212_css, Smodel_112_css, Smodel_113_css))

# Creating the data frame's column and row names
colnames(df.Smodels) <- c("ME", "RMSE", "MAE", "MPE", "MAPE", "MASE", "ACF1")
rownames(df.Smodels) <- c("ARIMA(3,1,1)","ARIMA(3,1,2)", "ARIMA(3,1,3)", "ARIMA(2,1,1)", "ARIMA(2,1,2)", "ARIMA(1,1,2)", "ARIMA(1,1,3)")

# Round up the results to the nearest 3 decimal place
round(df.Smodels,  digits = 3)

# Test for overfitting
# Increase p by 1, ARIMA(3,1,1)
model_311_css <- Arima(Global_Temperature_data_TS, order=c(3,1,1), method='CSS', lambda=lambda)
lmtest::coeftest(model_311_css)

model_311_ml <- Arima(Global_Temperature_data_TS, order=c(3,1,1), method='ML', lambda=lambda)
coeftest(model_311_ml)

# Increase q by 1, ARIMA(2,1,2)
model_212_css <- Arima(Global_Temperature_data_TS, order=c(2,1,2), method='CSS', lambda=lambda)
lmtest::coeftest(model_212_css)

model_212_ml <- Arima(Global_Temperature_data_TS, order=c(2,1,2), method='ML', lambda=lambda)
coeftest(model_212_ml)

10 References

[1] H. Demirhan, “Module 4 - Models for Nonstationary Time Series”, RMIT. [Online]. Available: https://rmit.instructure.com/courses/124176/files/37260017?module_item_id=6132785

[2] H. Demirhan, “Module 3 - Models for Stationary Time Series”, RMIT. [Online]. Available: https://rmit.instructure.com/courses/124176/files/37162689?module_item_id=6125715

[3] H. Demirhan, “Module 6 - Parameter Estimation”, RMIT. [Online]. Available: https://rmit.instructure.com/courses/124176/files/37562403?module_item_id=6150718

[4] S. Kolassa, “Shortcomings of the MAPE”, StackExchange, (Apr. 18, 2024). [Online]. Available: https://stats.stackexchange.com/questions/299712/what-are-the-shortcomings-of-the-mean-absolute-percentage-error-mape

[5] H. Demirhan, “solutionModule6Tasks_2024_updated.R”, RMIT, (Apr. 22, 2024). [Online]. Available: https://rmit.instructure.com/courses/124176/files/37748470?module_item_id=6181993