INTRODUCTION

The dataset used in this project describes the minimum daily temperature over 10 years (19981 and 1990) in Melbourne, Australia. The units are in degree celsius and there are 3650 observations. The source of the data was created by Australian Bureau of meteorology and taken from kaggle. We will analyze this dataset to capture the trend, applied transformation on the data where required, plot the ACF & PACF’s and coefficient tests on suitable models, and forecast for the next ten years. Data is available in https://www.kaggle.com/paulbrabban/daily-minimum-temperatures-in-melbourne

Preprocessing

Load required packages

# loading relevant packages into rstudio 
library(readr)
library(dplyr)
library(tidyr)
library(TSA)
library(MTS)
library(tseries)
library(lmtest)
library(forecast)
library(fUnitRoots)
library(FitAR)
library(astsa)
library(urca)

Import Dataset and Convert To Time Series

# reading the dataset  into R

Mel_Min_temp <- read.csv("Minimum Daily Temperature.csv", header = TRUE)


head(Mel_Min_temp,5)
##   ï..Year Month Day Minimum.Daily.Temperature.Degree..Degree.C..
## 1    1981     1   1                                         20.7
## 2    1981     1   2                                         17.9
## 3    1981     1   3                                         18.8
## 4    1981     1   4                                         14.6
## 5    1981     1   5                                         15.8
# converting the dataframe to time series object

min_TempTS <- ts(Mel_Min_temp$Minimum.Daily.Temperature.Degree..Degree.C..,
start=c(1981, 1), end=c(1990, 365), frequency = 365)

#checking the class of the dataset 
class(min_TempTS)
## [1] "ts"

Descriptive Analysis

#plotting time Series data 

plot(min_TempTS , col=c("blue"), ylab='Min temperature', xlab='Time', type='o',
main = "Plot1: TS plot of min temp in Melbourne.")

Plot 1: shows the changes in daily minimum temperature in melbourne from 1981 to 1990. We can conclude from the plot that there is a strong cyclic behaviour and there is no apparent trend in the data over this period.

converting daily minimum temperature to a monthly mean teamperature to better see seasonality

#converting daily minimum temperature to a monthly mean teamperature to better see seasonality 

monthlyMinTemp <- aggregate(Mel_Min_temp$Minimum.Daily.Temperature.Degree..Degree.C..~Month+Mel_Min_temp$ï..Year,
Mel_Min_temp, mean )

head(monthlyMinTemp,5)
##   Month Mel_Min_temp$ï..Year
## 1     1                 1981
## 2     2                 1981
## 3     3                 1981
## 4     4                 1981
## 5     5                 1981
##   Mel_Min_temp$Minimum.Daily.Temperature.Degree..Degree.C..
## 1                                                 17.712903
## 2                                                 17.678571
## 3                                                 13.500000
## 4                                                 12.356667
## 5                                                  9.490323

converting monthly temperature data into times series data

# converting monthly temperature data  into times series  data

monthlyMinTempTS <- ts(monthlyMinTemp$`Mel_Min_temp$Minimum.Daily.Temperature.Degree..Degree.C..`,
start=c(1981, 1), end=c(1990, 12), frequency = 12)

monthlyMinTempTS
##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 1981 17.712903 17.678571 13.500000 12.356667  9.490323  7.306667  7.577419
## 1982 16.567742 15.921429 14.935484 11.470000  9.583871  5.606667  4.641935
## 1983 13.180645 16.807143 15.777419 10.596667 10.116129  6.600000  6.890323
## 1984 14.309677 14.944828 12.867742 10.750000  8.112903  7.730000  5.987097
## 1985 14.219355 14.032143 15.877419 12.976667  9.419355  7.073333  6.135484
## 1986 13.825806 14.196429 14.690323 11.653333 10.274194  7.526667  6.961290
## 1987 13.235484 13.889286 12.619355 12.250000  9.806452  8.273333  5.983871
## 1988 16.493548 14.524138 14.748387 12.833333 11.387097  8.386667  8.232258
## 1989 15.180645 16.371429 15.803226 12.563333 10.725806  6.560000  6.332258
## 1990 15.577419 15.417857 14.835484 13.433333  9.748387  7.720000  8.183871
##            Aug       Sep       Oct       Nov       Dec
## 1981  7.238710 10.143333 10.087097 11.890000 13.680645
## 1982  7.903226  7.280000  9.545161 12.486667 13.754839
## 1983  8.706452  9.210000 10.312903 11.993333 14.396774
## 1984  8.696774  8.046667 10.632258 12.623333 12.643333
## 1985  7.635484  8.803333 10.490323 13.073333 14.109677
## 1986  7.387097  8.933333  9.683871 11.793333 12.935484
## 1987  8.022581  9.810000 10.238710 13.150000 13.254839
## 1988  8.725806  9.883333 10.890323 12.253333 15.436667
## 1989  6.770968  8.486667  9.867742 12.876667 13.951613
## 1990  7.825806  9.166667 11.345161 12.656667 14.367742

Time Series plot of monthly min temp in Melbourne

plot(monthlyMinTempTS, col=c("blue"), ylab='Min temperature', xlab='Time', type='o',
main = "Plot 2:TS plot of monthly min temp in Melbourne.")

After converting the daily minimum temperature to a monthly mean max temperature, we can clearly see a strong seasonality pattern. Further Analysis will need to be performed.

Check correlations of changes in minimum daily temperature

# Check correlations of changes in minimum daily temperature 

y = min_TempTS # read the daily min temp series into y 

x = zlag(min_TempTS) # generate first lag of the min temp series
index = 2:length(x) # Create an index to get rid of the first NA value in x

cor(y[index],x[index]) # Calculate correlation between numerical values in x and y
## [1] 0.7748702

Scatter plot of min temperature series and its first lag

plot(y[index],x[index],ylab='Min temperature series', xlab='The first lag of min temperature series', main = 'Plot 3: Scatter plot of min temperature series and its first lag')

As Scatter plot shows, the correlation value 0.774 proved that there is strong positive linear relationship between previous day’s min temperature on the next day’s min temperature

Investigating correlation of MonthlyMinTempTS

we observe a high correlation between temperatures of succeeding months and we couldnt be able to locate seasonality from this plot

bb = monthlyMinTempTS
aa = zlag(monthlyMinTempTS)
index = 2: length(aa)
cor(bb[index], aa[index])
## [1] 0.8020674

Scatter plot of average temperature in consecutive months

As expected, we could see a good amount of correlation between succeeding months

plot(y = monthlyMinTempTS, x = zlag(monthlyMinTempTS), ylab = 'Min temp', xlab = 'Previous Month temp',main = 'Plot 4:scatter plot of average temperature in consecutive months')

Trend analysis using regression approach

Least squares regression to check if there is a trend with daily data

day.= season(min_TempTS) 
model1_Daily = lm(min_TempTS~day.) 

#output hidden due to large output 

Estimate seasonal trend Model2-> Removing intercept term for monthly temperature data

month.=season(monthlyMinTempTS) # period added to improve table display and this line sets up indicators
model2=lm(monthlyMinTempTS~month.-1) # -1 removes the intercept term
summary(model2)
## 
## Call:
## lm(formula = monthlyMinTempTS ~ month. - 1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.05065 -0.62012  0.03393  0.55556  2.68258 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## month.January    15.0303     0.3092   48.60   <2e-16 ***
## month.February   15.3783     0.3092   49.73   <2e-16 ***
## month.March      14.5655     0.3092   47.10   <2e-16 ***
## month.April      12.0883     0.3092   39.09   <2e-16 ***
## month.May         9.8665     0.3092   31.91   <2e-16 ***
## month.June        7.2783     0.3092   23.54   <2e-16 ***
## month.July        6.6926     0.3092   21.64   <2e-16 ***
## month.August      7.8913     0.3092   25.52   <2e-16 ***
## month.September   8.9763     0.3092   29.03   <2e-16 ***
## month.October    10.3094     0.3092   33.34   <2e-16 ***
## month.November   12.4797     0.3092   40.36   <2e-16 ***
## month.December   13.8532     0.3092   44.80   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9779 on 108 degrees of freedom
## Multiple R-squared:  0.9936, Adjusted R-squared:  0.9929 
## F-statistic:  1405 on 12 and 108 DF,  p-value: < 2.2e-16

All of the parameters corresponding to months are statistically significant at 5% level

Including intercept parameter to model 2

model3=lm(monthlyMinTempTS~month.) # remove -1 to include the intercept term in the model
summary(model3)
## 
## Call:
## lm(formula = monthlyMinTempTS ~ month.)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.05065 -0.62012  0.03393  0.55556  2.68258 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      15.0303     0.3092  48.605  < 2e-16 ***
## month.February    0.3480     0.4373   0.796  0.42792    
## month.March      -0.4648     0.4373  -1.063  0.29019    
## month.April      -2.9420     0.4373  -6.727 8.56e-10 ***
## month.May        -5.1639     0.4373 -11.808  < 2e-16 ***
## month.June       -7.7520     0.4373 -17.726  < 2e-16 ***
## month.July       -8.3377     0.4373 -19.065  < 2e-16 ***
## month.August     -7.1390     0.4373 -16.324  < 2e-16 ***
## month.September  -6.0540     0.4373 -13.843  < 2e-16 ***
## month.October    -4.7210     0.4373 -10.795  < 2e-16 ***
## month.November   -2.5507     0.4373  -5.832 5.77e-08 ***
## month.December   -1.1772     0.4373  -2.692  0.00824 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9779 on 108 degrees of freedom
## Multiple R-squared:  0.912,  Adjusted R-squared:  0.9031 
## F-statistic: 101.8 on 11 and 108 DF,  p-value: < 2.2e-16

All of the differences between January and the other months are statistically significant at 5% level in both models model2 and model3

Model specification

Plot the Time series data of Monthly minimum temperatures first

plot(monthlyMinTempTS, col=c("blue"), ylab='Min temperature', xlab='Time', type='o',
main = "Plot 5: Time series plot of monthly minimum temperature in Melbourne.")

We only see a seasonal trend and there is no other trend associated with the data

Plotting ACF with max 36 lags

Because we have strong correlation at lags 12,24,36 and so on we consider the existence of seasonal auto correlation relationship.There is also a substantial correlation that needs to be considered.

par(mfrow=c(1,2))
acf(monthlyMinTempTS,lag.max=36) 

PACF

Pacf(monthlyMinTempTS,lag.max=36) 

Seasonal differencing

Plot after taking seasonal differencing

Since we have a seasonal effect pattern in time series plot, we take a seasonal difference

plot(diff(monthlyMinTempTS,lag=12),xlab='Time',ylab='Seasonal Difference of CO2 data',main= "Plot 6. TS plot of seasonal differences of Monthly temp levels.")

ACF of the seasonal differences of monthly Temperature levels

acf(as.vector(diff(monthlyMinTempTS,lag=12)),lag.max=36,ci.type='ma',main="Plot 7-ACF of seasonal differences of monthly temp levels")

PACF of the seasonal differences of monthly Temperature levels

pacf(as.vector(diff(monthlyMinTempTS,lag=12)),lag.max=36,ci.type='ma',main="Plot 8-PACF of seasonal differences of monthly temp levels")

Residuals approach

Specification of seasonal component When D=1

Fit m1 = SARIMA(0,0,0)x(0,1,0)12 model and display time series plots of the residuals

m1.monthlyMinTempTS = arima(monthlyMinTempTS,order=c(0,0,0),seasonal=list(order=c(0,1,0), period=12))
res.m1 = residuals(m1.monthlyMinTempTS); 
plot(res.m1,xlab='Time',ylab='Residuals',main="Plot 9-m1-Time series plot of the residuals")

ACF and PACF of residuals-m1

par(mfrow=c(1,2))
acf(res.m1, lag.max = 36, main = "Plot 10-m1-ACF of residuals")

PACF

pacf(res.m1, lag.max = 36, main = "Plot 11-m1-PACF")

From the above plots, there is no pattern implying the existence of a seasonal trend .But We still have one significant correlation at the first seasonal lag in both ACF and PACF.The significant spike at lag 12 in the ACF suggests a seasonal MA(1) component. So we start with SARMA(0,1) and see if we get rid of the effect of the seasonal component on the residuals

m2-(0,0,0) * (0,1,1)s=12

Now, we will fit SARIMA(0,0,0)x(0,1,1)12 model and try to see if we get rid of the effect of the seasonal component on the residuals

m2.monthlyMinTempTS = arima(monthlyMinTempTS,order=c(0,0,0),seasonal=list(order=c(0,1,1), period=12))
res.m2 = residuals(m2.monthlyMinTempTS); 
plot(res.m2,xlab='Time',ylab='Residuals',main="Plot 12-m2-Time series plot of the residuals")

ACF and PACF of residuals-m3

par(mfrow=c(1,2))
acf(res.m2, lag.max = 36, main = "Plot13-m2-ACF of residuals")
pacf(res.m2, lag.max = 36, main = "Plot14-m2-PACF of residuals")

At the seasonal lags, especially at the first seasonal lag in ACF, the autocorrelations became insignificant or slightly significant after adding the seasonal component. We will use the resulting ACF and PACF plots to specify the orders of ARMA component as there is no highly significant correlation at the lags s,2s,3s,….

The last ACF-PACF pair has significant pikes at the first lags implying an ARMA(1,1) model or due to the (insignificant) decrease of correlations in PACF, we would specify ARMA(0,1) for the residuals.

So we are settled with (0,1,1) for seasonal component as (P,D,Q).

The data may follow an ARIMA(p ,d ,0) model if the ACF and PACF plots of the differenced data show the following patterns:

  • the ACF is exponentially decaying or sinusoidal;
  • there is a significant spike at lag p in the PACF, but none beyond lag p .

(For the non seasoning component)Also ,there is a significant spike at lag 2 in the PACF, but none beyond lag 2 which suggests a (2,d,0) model.The pattern in the first two spikes is what we would expect from an ARIMA(2,0,0), as the PACF tends to decrease. So in this case, the ACF and PACF lead us to think an ARIMA(2,0,0) model might be appropriate

By getting p as 2 , we can add MA component upto order 1 maximum

Check stationarity by doing ADF test to identify if we need ordinary differencing or not

adf.test(res.m2, alternative = c("stationary"))
## Warning in adf.test(res.m2, alternative = c("stationary")): p-value smaller than
## printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  res.m2
## Dickey-Fuller = -4.3317, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary

The ADF unit root test shows a p-value of less than 0.05 which means the null hypothesis of non stationarity is now rejected and the data is proved to be stationary .Therefore we do not need differencing and fix d as 0 d=0 IS FIXED

Now we set out to specify models for non seasonal ARIMA component We begin with (2,0,1) as mentioned earlier

m3-(2,0,1) * (0,1,1)

m3.monthlyMinTempTS = arima(monthlyMinTempTS,order=c(2,0,1),seasonal=list(order=c(0,1,1), period=12))
res.m3 = residuals(m3.monthlyMinTempTS); 
plot(res.m3,xlab='Time',ylab='Residuals',main="m3-Time series plot of the residuals")

ACF and PACF of residuals-m3

par(mfrow=c(1,2))
acf(res.m3, lag.max = 36, main = "Plot15-m3-ACF of residuals")
pacf(res.m3, lag.max = 36, main = "Plot16-m3-PACF of residuals")

Looks almost like a white noise series from the above plots, but we cannot confirm it until we check for (2,0,0) as well since MA component can be upto order 1. It can be 0 as well

m4-(2,0,0) * (0,1,1)

m4.monthlyMinTempTS = arima(monthlyMinTempTS,order=c(2,0,0),seasonal=list(order=c(0,1,1), period=12))
res.m4 = residuals(m4.monthlyMinTempTS); 
plot(res.m4,xlab='Time',ylab='Residuals',main="Plot 17-m4-Time series plot of the residuals")

ACF and PACF of residuals-m4

par(mfrow=c(1,2))
acf(res.m4, lag.max = 36, main = "Plot18-m4-ACF of residuals")
pacf(res.m4, lag.max = 36, main = "Plot19-m4-PACF of residuals")

**We can clearly see that even though m3 and m4 looks closer, the residuals of m4-> SARIMA(2,0,0)*(0,1,1)s=12 are closer to the white noise .So we can pretty much conclude with the orders p=2,d=0,q=0,P=0,D=1,Q=1,and s=12**

EACF of the previous model

The EACF suggests (0,1) and (1,1) as well for the non seasonal component

eacf(res.m4)
## AR/MA
##   0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 o o o o o o o o o o o  o  o  o 
## 1 x o o o o o o o o o o  o  o  o 
## 2 x x o o o o o o o o o  o  o  o 
## 3 o o x o o o o o o o o  o  o  o 
## 4 x o o o o o o o o o o  o  o  o 
## 5 o o o x o o o o o o o  o  o  o 
## 6 x o o x o o o o o o o  o  o  o 
## 7 x x x x x o o o o o o  o  o  o
# **BIC**
res_m4_BIC=armasubsets(y=res.m4,nar=7,nma=7,y.name='test', ar.method='ols')
## Warning in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in =
## force.in, : 7 linear dependencies found
## Warning in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in =
## force.in, : nvmax reduced to 7
plot(res_m4_BIC)
title(main = "Plot20-BIC table for monthly temp series.", line = 6)

  • So our estimated models are
  • (0,0,1)*(0,1,1)s=12
  • (1,0,1)*(0,1,1)s=12
  • (2,0,1)*(0,1,1)s=12
  • (2,0,0)*(0,1,1)s=12

Next we will fit these models

MODEL FITTING

PARAMETER ESTIMATION

  • As seen above ,our final candidates for parameter estimation are
  • (0,0,1)*(0,1,1)s=12
  • (1,0,1)*(0,1,1)s=12
  • (2,0,1)*(0,1,1)s=12
  • (2,0,0)*(0,1,1)s=12

**Fitting the SARIMA(0,0,1)*(0,1,1)S=12 model using ML method**

mf_001_ml = arima(monthlyMinTempTS,order=c(0,0,1),seasonal=list(order=c(0,1,1),
                                      period=12),method = "ML")
coeftest(mf_001_ml)
## 
## z test of coefficients:
## 
##       Estimate Std. Error z value  Pr(>|z|)    
## ma1   0.169651   0.085367  1.9873   0.04689 *  
## sma1 -0.802720   0.150343 -5.3393 9.332e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

By ML method, MA(1) and SMA(1) looks to be significant

Fitting the SARIMA(1,0,1) * (0,1,1)S=12 model

mf_101_ml = arima(monthlyMinTempTS,order=c(1,0,1),seasonal=list(order=c(0,1,1),
                                      period=12),method = "ML")
coeftest(mf_101_ml)
## 
## z test of coefficients:
## 
##      Estimate Std. Error z value  Pr(>|z|)    
## ar1   0.79671    0.16339  4.8760 1.082e-06 ***
## ma1  -0.58510    0.21173 -2.7634  0.005721 ** 
## sma1 -0.80772    0.15120 -5.3420 9.192e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

By ML method, AR(1),MA(1) and SMA(1) looks to be significant

Fitting the SARIMA(2,0,1) * (0,1,1)S=12 model

mf_201_ml = arima(monthlyMinTempTS,order=c(2,0,1),seasonal=list(order=c(0,1,1),
                                      period=12),method = "ML")
coeftest(mf_201_ml)
## 
## z test of coefficients:
## 
##      Estimate Std. Error z value  Pr(>|z|)    
## ar1   0.41314    0.38629  1.0695    0.2848    
## ar2   0.17411    0.14834  1.1737    0.2405    
## ma1  -0.23140    0.39513 -0.5856    0.5581    
## sma1 -0.78151    0.14788 -5.2848 1.258e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

By ML method, MA(1),AR(1),AR(2) is insignificant and SMA(1) looks to be significant

Fitting the SARIMA(2,0,0) * (0,1,1)S=12 model

mf_200_ml = arima(monthlyMinTempTS,order=c(2,0,0),seasonal=list(order=c(0,1,1),
                                      period=12),method = "ML")
coeftest(mf_200_ml)
## 
## z test of coefficients:
## 
##       Estimate Std. Error z value  Pr(>|z|)    
## ar1   0.197848   0.099944  1.9796   0.04775 *  
## ar2   0.225413   0.098794  2.2817   0.02251 *  
## sma1 -0.791129   0.150271 -5.2647 1.404e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

By ML method, AR(1),AR(2) and SMA(1) looks to be significant

Model Selection

For model selection we will compute AIC and BIC values of our estimated moddels and select the best models with the best AIC score

sort.score <- function(x, score = c("bic", "aic")){
  if (score == "aic"){
    x[with(x, order(AIC)),]
  } else if (score == "bic") {
    x[with(x, order(BIC)),]
  } else {
    warning('score = "x" only accepts valid arguments ("aic","bic")')
  }
}

Sorting the AIC and BIC scores

sc.AIC=AIC(mf_001_ml, mf_101_ml, mf_201_ml, mf_200_ml)
sc.BIC=AIC(mf_001_ml, mf_101_ml, mf_201_ml, mf_200_ml, k = length(monthlyMinTempTS))

AIC values comparison

The SARIMA model (2,0,0)*(0,1,1) has the least AIC value and hence is considered to be the best model fit

sort.score(sc.AIC, score = "aic")
##           df      AIC
## mf_200_ml  4 324.7389
## mf_101_ml  4 325.3043
## mf_201_ml  5 326.3998
## mf_001_ml  3 329.6884
  • Because the standard error estimates for the SARIMA (2,0,0) * (0,1,1)s=12 model are relatively smaller than the corresponding parameter estimates, we think that the coefficient estimates are all highly significant

  • Because the SARIMA (2,0,0) * (0,1,1)s=12 model has the least AIC values, we declare this model as best fit model and to be used for forecasting

Over-fitting model: SARIMA(3,0,0) * (0,1,1)S=12

mf_300_ml = arima(monthlyMinTempTS,order=c(3,0,0),seasonal=list(order=c(0,1,1),
                                      period=12),method = "ML")
coeftest(mf_300_ml)
## 
## z test of coefficients:
## 
##       Estimate Std. Error z value  Pr(>|z|)    
## ar1   0.177987   0.104243  1.7074   0.08774 .  
## ar2   0.215020   0.100001  2.1502   0.03154 *  
## ar3   0.069456   0.104095  0.6672   0.50462    
## sma1 -0.780039   0.147206 -5.2990 1.165e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

AR3 does not show significance and so overfitting model SARIMA(3,0,0)*(0,1,1)_12 will not be taken into consideration

DIAGNOSTIC CHECKING

**Plot to check residuals from the best estimated SARIMA(2,0,0) x (0,1,1)_12 model**

plot(window(rstandard(mf_200_ml),start=c(1981,1)), ylab='Standardized Residuals',type='o', main="Plot 21-Residuals from SARIMA(2,0,0)x(0,1,1)_12 model.")
abline(h=0)

Since we have some strange behaviour near 1983 , we will have to investigate this best model further for outliers

plot the ACF of standardised residuals of our best model(0,1,1) * (0,1,1)s=12

acf(as.vector(window(rstandard(mf_200_ml),start=c(1981,1))),lag.max=36,main="Plot-22 ACF of residuals from SARIMA(2,0,0)x(0,1,1)_12")

Except for the slightly significant autocorrelation at lag 25, there is not any sign of violation of the independence of residuals.

Function for residual analysis

The function below includes all the residual analysis methods included in one single function

residual.analysis <- function(model, std = TRUE){
  
  if (std == TRUE){
    res.model = rstandard(model)
  }else{
    res.model = residuals(model)
  }
  par(mfrow=c(3,2))
  plot(res.model,type='o',ylab='Standardized residuals', main="Time series plot of standardized residuals")
  abline(h=0)
  hist(res.model,main="Histogram of standardized residuals")
  qqnorm(res.model,main="QQ plot of standardized residuals")
  qqline(res.model, col = 2)
  acf(res.model,main="ACF of standardized residuals")
  print(shapiro.test(res.model))
  k=0
  LBQPlot(res.model, lag.max = length(model$residuals)-1 , StartLag = k + 1, k = 0, SquaredQ = FALSE)
}

Residual Diagnostics - SARIMA (0,0,1) * (0,1,1)s=12

residual.analysis(model=mf_001_ml)
## 
##  Shapiro-Wilk normality test
## 
## data:  res.model
## W = 0.97776, p-value = 0.04413
## Warning in (ra^2)/(n - (1:lag.max)): longer object length is not a multiple of
## shorter object length

SARIMA(0,0,1)x(0,1,1)S=12

  1. Time Series plots does not have any trend except for some intervention.
  2. Histogram shows residuals slightly skewed to the right.
  3. QQplot is normally distributed with 1 or 2 outliers.
  4. ACF plot highlights the white noise presence.
  5. Ljung Box test shows the presence of some residuals very few residuals and also not all points are above the threshold line ..

Residual Diagnostics - SARIMA (1,0,1) * (0,1,1)S=12

residual.analysis(model=mf_101_ml)
## 
##  Shapiro-Wilk normality test
## 
## data:  res.model
## W = 0.98339, p-value = 0.1467
## Warning in (ra^2)/(n - (1:lag.max)): longer object length is not a multiple of
## shorter object length

SARIMA(1,0,1)x(0,1,1)S=12

  1. Time Series plots does not have any trend except for some intervention.
  2. Histogram is slightly skewed to the right.
  3. QQplot is normally distributed except for some points not lying on the line.
  4. ACF plot highlights the white noise presence.
  5. Ljung Box test shows the presence of good amount of some residuals and also not all points are above the threshold line .

Residual Diagnostics - SARIMA(2,0,1) * (0,1,1)S=12

residual.analysis(model=mf_201_ml)
## 
##  Shapiro-Wilk normality test
## 
## data:  res.model
## W = 0.98111, p-value = 0.09036
## Warning in (ra^2)/(n - (1:lag.max)): longer object length is not a multiple of
## shorter object length

SARIMA(2,0,1)x(0,1,1)S=12

  1. Time Series plots does not have any trend except for some intervention.
  2. Histogram looks to be very slightly skewed to the right.
  3. QQplot is normally distributed except for one or two outliers.
  4. ACF plot highlights the white noise presence.
  5. Ljung Box test shows the presence of very few residuals and all points are above the threshold line .

Residual Diagnostics - SARIMA(2,0,0) * (0,1,1)S=12

residual.analysis(model=mf_200_ml)
## 
##  Shapiro-Wilk normality test
## 
## data:  res.model
## W = 0.98135, p-value = 0.09516
## Warning in (ra^2)/(n - (1:lag.max)): longer object length is not a multiple of
## shorter object length

SARIMA(2,0,0)x(0,1,1)S=12

  1. Time Series plots does not have any trend except for some intervention.
  2. Histogram looks to be very slightly skewed to the right.
  3. QQplot is normally distributed except for one or two outliers.
  4. ACF plot highlights the white noise presence.
  5. Ljung Box test shows the presence of very few residuals and all points are above the threshold line .

FORECAST

With SARIMA(2,0,0) * (0,1,1) being the best model with the S is equal to 12, below is the forcast for next 10 years.We could still see the same seasonal trend would be repeating in the next ten years too

sarima.for(monthlyMinTempTS,120,2,0,0,0,1,1,12)

## $pred
##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 1991 15.248472 15.486091 14.845551 12.556087 10.210343  7.579259  7.102069
## 1992 15.258478 15.474305 14.868050 12.580553 10.242439  7.613236  7.138020
## 1993 15.296362 15.512218 14.905981 12.618493 10.280385  7.651184  7.175971
## 1994 15.334314 15.550171 14.943933 12.656446 10.318337  7.689137  7.213923
## 1995 15.372267 15.588124 14.981886 12.694399 10.356290  7.727090  7.251876
## 1996 15.410220 15.626076 15.019839 12.732351 10.394243  7.765042  7.289829
## 1997 15.448173 15.664029 15.057791 12.770304 10.432196  7.802995  7.327781
## 1998 15.486125 15.701982 15.095744 12.808257 10.470148  7.840948  7.365734
## 1999 15.524078 15.739935 15.133697 12.846209 10.508101  7.878900  7.403687
## 2000 15.562031 15.777887 15.171649 12.884162 10.546054  7.916853  7.441639
##            Aug       Sep       Oct       Nov       Dec
## 1991  7.992644  9.220982 10.615139 12.752758 14.195904
## 1992  8.029371  9.258276 10.652706 12.790498 14.233734
## 1993  8.067323  9.296228 10.690659 12.828450 14.271687
## 1994  8.105275  9.334181 10.728611 12.866403 14.309640
## 1995  8.143228  9.372134 10.766564 12.904356 14.347592
## 1996  8.181181  9.410086 10.804517 12.942308 14.385545
## 1997  8.219133  9.448039 10.842469 12.980261 14.423498
## 1998  8.257086  9.485992 10.880422 13.018214 14.461450
## 1999  8.295039  9.523944 10.918375 13.056166 14.499403
## 2000  8.332991  9.561897 10.956327 13.094119 14.537356
## 
## $se
##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 1991 0.9893561 1.0073505 1.0368022 1.0404600 1.0427207 1.0431994 1.0434020
## 1992 1.0604507 1.0611095 1.0622007 1.0623390 1.0624243 1.0624424 1.0624497
## 1993 1.0790274 1.0796649 1.0807299 1.0808644 1.0809477 1.0809652 1.0809724
## 1994 1.0972701 1.0978970 1.0989444 1.0990766 1.0991585 1.0991758 1.0991828
## 1995 1.1152144 1.1158312 1.1168618 1.1169919 1.1170725 1.1170895 1.1170964
## 1996 1.1328745 1.1334817 1.1344963 1.1346244 1.1347037 1.1347204 1.1347272
## 1997 1.1502635 1.1508616 1.1518608 1.1519870 1.1520651 1.1520816 1.1520883
## 1998 1.1673936 1.1679828 1.1689674 1.1690917 1.1691687 1.1691849 1.1691916
## 1999 1.1842758 1.1848567 1.1858273 1.1859498 1.1860257 1.1860417 1.1860482
## 2000 1.2009208 1.2014936 1.2024508 1.2025716 1.2026464 1.2026622 1.2026687
##            Aug       Sep       Oct       Nov       Dec
## 1991 1.0434553 1.0434710 1.0434704 1.0434219 1.0433923
## 1992 1.0624510 1.0624478 1.0624418 1.0623923 1.0623627
## 1993 1.0809736 1.0809705 1.0809645 1.0809159 1.0808868
## 1994 1.0991840 1.0991809 1.0991751 1.0991273 1.0990987
## 1995 1.1170975 1.1170945 1.1170888 1.1170418 1.1170136
## 1996 1.1347283 1.1347254 1.1347197 1.1346734 1.1346457
## 1997 1.1520894 1.1520865 1.1520809 1.1520353 1.1520080
## 1998 1.1691927 1.1691898 1.1691843 1.1691394 1.1691125
## 1999 1.1860493 1.1860465 1.1860411 1.1859968 1.1859703
## 2000 1.2026697 1.2026670 1.2026616 1.2026179 1.2025918

CONCLUSION

We can go on to say that the data followed a seasonal trend and we had to perform seasonal differencing to it. Since the non seasonal component was already stationary, we did not need to perform ordinary differencing. So we went on to predict probable models through the likes of Residual approach,EACF,ACF,PACF,BIC. On arriving at the estimated set of models, we performed model fitting through parameter estimation using Maximum Likelihood approach to test how significant the model is. Now based on the AIC score and our residual analysis, we selected the best model as SARIMA(2,0,0)*(0,1,1)s=12 and with this model we predicted the temperature forecast for the next 10 years

REFERENCE

SUMMARY

In this assignment, we worked on SARIMA models that helps in modelling time series with not so regular seasonal tendencies than a deterministic model. We followed the below procedure to accomplish the task at hand