Assessing the time series at hand and applying necessary transformations.
Let’s have a look at the time series
The time series seems to be variance stationary. However we will try out the transformations to confirm this.
We cannot apply natural log or box-cox transformation directly because our data has negative values. Hence we will be adding a constant value of 0.5 to make all the Temperature Anomalies positive. This will not affect the time series, because the very nature of the temperature anomaly value is that it’s a difference from a fixed point which was chosen from the middle values. Adding a constant value will just shift the time series up above negative values, but the nature of the time series will be intact.
##
## Augmented Dickey-Fuller Test
##
## data: ann_temp_log$TA_log
## Dickey-Fuller = -1.7972, Lag order = 5, p-value = 0.6608
## alternative hypothesis: stationary
##
## Augmented Dickey-Fuller Test
##
## data: ann_temp_boxcox$TA_boxcox
## Dickey-Fuller = -1.7366, Lag order = 5, p-value = 0.6861
## alternative hypothesis: stationary
Doing these transformations does not change our time series. This can confirm our initial hypothesis that time series is variance stationary
ADF test on undifferenced data
##
## Augmented Dickey-Fuller Test
##
## data: annual_temp$Temperature_Anomaly
## Dickey-Fuller = -1.6814, Lag order = 5, p-value = 0.709
## alternative hypothesis: stationary
ADF test on differenced data
##
## Augmented Dickey-Fuller Test
##
## data: annual_temp_diff$TA_diff
## Dickey-Fuller = -6.7862, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
The time series has been made mean stationary
Since I have taken yearly data there is no reason to expect any seasonality in the time series. Had I taken the monthly time series, we could have seen seasonality.
The time series appears to be an AR3 process. It could also be a MA2 process.
Checking BIC for different models
## df BIC
## arima(annual_temp$Temperature_Anomaly, order = c(0, 0, 0)) 2 91.44968
## arima(annual_temp$Temperature_Anomaly, order = c(0, 0, 1)) 3 -20.17431
## arima(annual_temp$Temperature_Anomaly, order = c(0, 1, 0)) 1 -213.43017
## arima(annual_temp$Temperature_Anomaly, order = c(0, 1, 1)) 2 -220.63216
## arima(annual_temp$Temperature_Anomaly, order = c(1, 0, 0)) 3 -204.16828
## arima(annual_temp$Temperature_Anomaly, order = c(1, 0, 1)) 4 -209.81140
## arima(annual_temp$Temperature_Anomaly, order = c(1, 1, 0)) 2 -214.61432
## arima(annual_temp$Temperature_Anomaly, order = c(1, 1, 1)) 3 -220.60552
## arima(annual_temp$Temperature_Anomaly, order = c(2, 1, 1)) 4 -217.51630
## arima(annual_temp$Temperature_Anomaly, order = c(3, 1, 1)) 5 -216.96070
According to BIC values, the best model has an order = c(0, 1, 1))
Now checking the AIC values
## df AIC
## arima(annual_temp$Temperature_Anomaly, order = c(0, 0, 0)) 2 85.60972
## arima(annual_temp$Temperature_Anomaly, order = c(0, 0, 1)) 3 -28.93425
## arima(annual_temp$Temperature_Anomaly, order = c(0, 1, 0)) 1 -216.34282
## arima(annual_temp$Temperature_Anomaly, order = c(0, 1, 1)) 2 -226.45747
## arima(annual_temp$Temperature_Anomaly, order = c(1, 0, 0)) 3 -212.92823
## arima(annual_temp$Temperature_Anomaly, order = c(1, 0, 1)) 4 -221.49132
## arima(annual_temp$Temperature_Anomaly, order = c(1, 1, 0)) 2 -220.43963
## arima(annual_temp$Temperature_Anomaly, order = c(1, 1, 1)) 3 -229.34348
## arima(annual_temp$Temperature_Anomaly, order = c(2, 1, 1)) 4 -229.16692
## arima(annual_temp$Temperature_Anomaly, order = c(3, 1, 1)) 5 -231.52397
But according to AIC ,the best model has an order of (3,1,1)
BIC penalizes complex models. That is why (3,1,1) has worse BIC value but a good AIC value.
To decide which one to choose,we can do an auto.arima test.
## Series: annual_temp$Temperature_Anomaly
## ARIMA(3,1,1)
##
## Coefficients:
## ar1 ar2 ar3 ma1
## -0.9142 -0.4568 -0.3452 0.6427
## s.e. 0.3064 0.1238 0.0813 0.3347
##
## sigma^2 = 0.01018: log likelihood = 120.76
## AIC=-231.52 AICc=-231.06 BIC=-216.96
auto.arima suggests (3,1,1) as the best model, so from here this model will be used.
annual_temp_reorder <- annual_temp %>%
arrange(Year)
best_model <- arima(annual_temp_reorder$Temperature_Anomaly, order = c(3,1,1))
resi <- best_model$residuals
predi <- annual_temp$Temperature_Anomaly - resi
forecast::checkresiduals(best_model)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(3,1,1)
## Q* = 4.9832, df = 6, p-value = 0.546
##
## Model df: 4. Total lags used: 10
##
## Box-Ljung test
##
## data: resi
## X-squared = 14.282, df = 20, p-value = 0.8159
In-sample root mean squared error of the model based on the residuals.
## [1] 0.0990563
5 time-period forecast for the model
prediction = predict(best_model, n.ahead = 5)
prediction
## $pred
## Time Series:
## Start = 138
## End = 142
## Frequency = 1
## [1] 0.9034629 0.8828516 0.8997931 0.9235973 0.9012220
##
## $se
## Time Series:
## Start = 138
## End = 142
## Frequency = 1
## [1] 0.09941981 0.12298671 0.13340248 0.14199221 0.15840061
pred_data = data.frame(
pred=prediction,
Year = (max(annual_temp$Year)+1):(max(annual_temp$Year)+1+4)
)
pred_data
## pred.pred pred.se Year
## 1 0.9034629 0.09941981 2017
## 2 0.8828516 0.12298671 2018
## 3 0.8997931 0.13340248 2019
## 4 0.9235973 0.14199221 2020
## 5 0.9012220 0.15840061 2021
pred_merge = annual_temp %>%
full_join(pred_data) %>%
mutate(
pred_high = (pred.pred+2*pred.se),
pred_low = (pred.pred - 2*pred.se),
pred.pred = pred.pred,
error = pred.pred - Temperature_Anomaly
)
pred_merge %>%
ggplot()+
geom_line(aes(Year,Temperature_Anomaly))+
geom_line(aes(Year,pred.pred),color='blue')+
geom_ribbon(aes(x=Year,ymin=pred_low,ymax = pred_high),fill='blue',alpha=0.2) +
ylab("Temperature Anomaly") +
xlab("Year")
The forecast is a little off. Requires additional debugging!
Note: I have included the whole code in this section in the hope that it may help in figuring out what went wrong!