Assessment and Transformations

Assessing the time series at hand and applying necessary transformations.

Initial Assessment

Let’s have a look at the time series

The time series seems to be variance stationary. However we will try out the transformations to confirm this.

nlog and Boxcox transformations

We cannot apply natural log or box-cox transformation directly because our data has negative values. Hence we will be adding a constant value of 0.5 to make all the Temperature Anomalies positive. This will not affect the time series, because the very nature of the temperature anomaly value is that it’s a difference from a fixed point which was chosen from the middle values. Adding a constant value will just shift the time series up above negative values, but the nature of the time series will be intact.

## 
##  Augmented Dickey-Fuller Test
## 
## data:  ann_temp_log$TA_log
## Dickey-Fuller = -1.7972, Lag order = 5, p-value = 0.6608
## alternative hypothesis: stationary

## 
##  Augmented Dickey-Fuller Test
## 
## data:  ann_temp_boxcox$TA_boxcox
## Dickey-Fuller = -1.7366, Lag order = 5, p-value = 0.6861
## alternative hypothesis: stationary

Doing these transformations does not change our time series. This can confirm our initial hypothesis that time series is variance stationary

Testing for mean stationarity

ADF test on undifferenced data

## 
##  Augmented Dickey-Fuller Test
## 
## data:  annual_temp$Temperature_Anomaly
## Dickey-Fuller = -1.6814, Lag order = 5, p-value = 0.709
## alternative hypothesis: stationary

ADF test on differenced data

## 
##  Augmented Dickey-Fuller Test
## 
## data:  annual_temp_diff$TA_diff
## Dickey-Fuller = -6.7862, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

The time series has been made mean stationary

Seasonality

Since I have taken yearly data there is no reason to expect any seasonality in the time series. Had I taken the monthly time series, we could have seen seasonality.

Plotting ACF/PACF

The time series appears to be an AR3 process. It could also be a MA2 process.

Arima Modelling and Forecasting

Best Model

Checking BIC for different models

##                                                            df        BIC
## arima(annual_temp$Temperature_Anomaly, order = c(0, 0, 0))  2   91.44968
## arima(annual_temp$Temperature_Anomaly, order = c(0, 0, 1))  3  -20.17431
## arima(annual_temp$Temperature_Anomaly, order = c(0, 1, 0))  1 -213.43017
## arima(annual_temp$Temperature_Anomaly, order = c(0, 1, 1))  2 -220.63216
## arima(annual_temp$Temperature_Anomaly, order = c(1, 0, 0))  3 -204.16828
## arima(annual_temp$Temperature_Anomaly, order = c(1, 0, 1))  4 -209.81140
## arima(annual_temp$Temperature_Anomaly, order = c(1, 1, 0))  2 -214.61432
## arima(annual_temp$Temperature_Anomaly, order = c(1, 1, 1))  3 -220.60552
## arima(annual_temp$Temperature_Anomaly, order = c(2, 1, 1))  4 -217.51630
## arima(annual_temp$Temperature_Anomaly, order = c(3, 1, 1))  5 -216.96070

According to BIC values, the best model has an order = c(0, 1, 1))

Now checking the AIC values

##                                                            df        AIC
## arima(annual_temp$Temperature_Anomaly, order = c(0, 0, 0))  2   85.60972
## arima(annual_temp$Temperature_Anomaly, order = c(0, 0, 1))  3  -28.93425
## arima(annual_temp$Temperature_Anomaly, order = c(0, 1, 0))  1 -216.34282
## arima(annual_temp$Temperature_Anomaly, order = c(0, 1, 1))  2 -226.45747
## arima(annual_temp$Temperature_Anomaly, order = c(1, 0, 0))  3 -212.92823
## arima(annual_temp$Temperature_Anomaly, order = c(1, 0, 1))  4 -221.49132
## arima(annual_temp$Temperature_Anomaly, order = c(1, 1, 0))  2 -220.43963
## arima(annual_temp$Temperature_Anomaly, order = c(1, 1, 1))  3 -229.34348
## arima(annual_temp$Temperature_Anomaly, order = c(2, 1, 1))  4 -229.16692
## arima(annual_temp$Temperature_Anomaly, order = c(3, 1, 1))  5 -231.52397

But according to AIC ,the best model has an order of (3,1,1)

BIC penalizes complex models. That is why (3,1,1) has worse BIC value but a good AIC value.

To decide which one to choose,we can do an auto.arima test.

## Series: annual_temp$Temperature_Anomaly 
## ARIMA(3,1,1) 
## 
## Coefficients:
##           ar1      ar2      ar3     ma1
##       -0.9142  -0.4568  -0.3452  0.6427
## s.e.   0.3064   0.1238   0.0813  0.3347
## 
## sigma^2 = 0.01018:  log likelihood = 120.76
## AIC=-231.52   AICc=-231.06   BIC=-216.96

auto.arima suggests (3,1,1) as the best model, so from here this model will be used.

Plotting the actual and predicted values

annual_temp_reorder <- annual_temp %>% 
  arrange(Year)
best_model <- arima(annual_temp_reorder$Temperature_Anomaly, order = c(3,1,1))

resi <- best_model$residuals
predi <-  annual_temp$Temperature_Anomaly - resi

Box-Ljung test

forecast::checkresiduals(best_model)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(3,1,1)
## Q* = 4.9832, df = 6, p-value = 0.546
## 
## Model df: 4.   Total lags used: 10

## 
##  Box-Ljung test
## 
## data:  resi
## X-squared = 14.282, df = 20, p-value = 0.8159

In-sample root mean squared error of the model based on the residuals.

## [1] 0.0990563

Forecast

5 time-period forecast for the model

prediction = predict(best_model, n.ahead = 5)
prediction
## $pred
## Time Series:
## Start = 138 
## End = 142 
## Frequency = 1 
## [1] 0.9034629 0.8828516 0.8997931 0.9235973 0.9012220
## 
## $se
## Time Series:
## Start = 138 
## End = 142 
## Frequency = 1 
## [1] 0.09941981 0.12298671 0.13340248 0.14199221 0.15840061
pred_data = data.frame(
  pred=prediction,
  Year = (max(annual_temp$Year)+1):(max(annual_temp$Year)+1+4)
)
pred_data
##   pred.pred    pred.se Year
## 1 0.9034629 0.09941981 2017
## 2 0.8828516 0.12298671 2018
## 3 0.8997931 0.13340248 2019
## 4 0.9235973 0.14199221 2020
## 5 0.9012220 0.15840061 2021
pred_merge = annual_temp %>% 
  full_join(pred_data) %>%
  mutate(
    pred_high = (pred.pred+2*pred.se),
    pred_low = (pred.pred - 2*pred.se),
    pred.pred = pred.pred,
    error = pred.pred - Temperature_Anomaly
    )
pred_merge %>% 
  ggplot()+
  geom_line(aes(Year,Temperature_Anomaly))+
  geom_line(aes(Year,pred.pred),color='blue')+
  geom_ribbon(aes(x=Year,ymin=pred_low,ymax = pred_high),fill='blue',alpha=0.2) +
  ylab("Temperature Anomaly") +
  xlab("Year")

The forecast is a little off. Requires additional debugging!

Note: I have included the whole code in this section in the hope that it may help in figuring out what went wrong!