The dataset use in this project describes the minimum daily temperature over 10 years (19981 and 1990) in Melbourne, Australia. The units are in degree celsius and there are 3650 observations. The source of the data was created by Australian Bureau of meteorology. We will analyze this dataset to capture the trend, applied transformation on the data where required, plot the ACF & PACF’s and coefficient tests on suitable models, and forecast for the next five years.
# loading relevant packages into rstudio
library(readr)
library(dplyr)
library(tidyr)
library(TSA)
library(tseries)
library(lmtest)
library(forecast)
library(fUnitRoots)
# reading the dataset into R
Mel_Min_temp <- read.csv("Minimun Daily Temperature.csv", header = TRUE)
# converting the dataframe to time series object
min_TempTS <- ts(Mel_Min_temp$Minimum.Daily.Temperature.Degree..Degree.C..,
start=c(1981, 1), end=c(1990, 365), frequency = 365)
#checking the class of the dataset
class(min_TempTS)
## [1] "ts"
#plotting time Series data
plot(min_TempTS , col=c("blue"), ylab='Min temperature', xlab='Time', type='o',
main = "Time series plot of minimum temperature in Melbourne.")
Figure 1: shows the changes in daily minimum temperature in melbourne from 1981 to 1990. We can conclude the plot as well as some strong cyclic behaviour. there is no apparent trend in the data over this period.
#converting daily minimum temperature to a monthly mean teamperature to better see seasonality
monthlyMinTemp <- aggregate(Mel_Min_temp$Minimum.Daily.Temperature.Degree..Degree.C..~Month+Mel_Min_temp$ï..Year,
Mel_Min_temp, mean )
monthlyMinTemp
## Month Mel_Min_temp$ï..Year
## 1 1 1981
## 2 2 1981
## 3 3 1981
## 4 4 1981
## 5 5 1981
## 6 6 1981
## 7 7 1981
## 8 8 1981
## 9 9 1981
## 10 10 1981
## 11 11 1981
## 12 12 1981
## 13 1 1982
## 14 2 1982
## 15 3 1982
## 16 4 1982
## 17 5 1982
## 18 6 1982
## 19 7 1982
## 20 8 1982
## 21 9 1982
## 22 10 1982
## 23 11 1982
## 24 12 1982
## 25 1 1983
## 26 2 1983
## 27 3 1983
## 28 4 1983
## 29 5 1983
## 30 6 1983
## 31 7 1983
## 32 8 1983
## 33 9 1983
## 34 10 1983
## 35 11 1983
## 36 12 1983
## 37 1 1984
## 38 2 1984
## 39 3 1984
## 40 4 1984
## 41 5 1984
## 42 6 1984
## 43 7 1984
## 44 8 1984
## 45 9 1984
## 46 10 1984
## 47 11 1984
## 48 12 1984
## 49 1 1985
## 50 2 1985
## 51 3 1985
## 52 4 1985
## 53 5 1985
## 54 6 1985
## 55 7 1985
## 56 8 1985
## 57 9 1985
## 58 10 1985
## 59 11 1985
## 60 12 1985
## 61 1 1986
## 62 2 1986
## 63 3 1986
## 64 4 1986
## 65 5 1986
## 66 6 1986
## 67 7 1986
## 68 8 1986
## 69 9 1986
## 70 10 1986
## 71 11 1986
## 72 12 1986
## 73 1 1987
## 74 2 1987
## 75 3 1987
## 76 4 1987
## 77 5 1987
## 78 6 1987
## 79 7 1987
## 80 8 1987
## 81 9 1987
## 82 10 1987
## 83 11 1987
## 84 12 1987
## 85 1 1988
## 86 2 1988
## 87 3 1988
## 88 4 1988
## 89 5 1988
## 90 6 1988
## 91 7 1988
## 92 8 1988
## 93 9 1988
## 94 10 1988
## 95 11 1988
## 96 12 1988
## 97 1 1989
## 98 2 1989
## 99 3 1989
## 100 4 1989
## 101 5 1989
## 102 6 1989
## 103 7 1989
## 104 8 1989
## 105 9 1989
## 106 10 1989
## 107 11 1989
## 108 12 1989
## 109 1 1990
## 110 2 1990
## 111 3 1990
## 112 4 1990
## 113 5 1990
## 114 6 1990
## 115 7 1990
## 116 8 1990
## 117 9 1990
## 118 10 1990
## 119 11 1990
## 120 12 1990
## Mel_Min_temp$Minimum.Daily.Temperature.Degree..Degree.C..
## 1 17.712903
## 2 17.678571
## 3 13.500000
## 4 12.356667
## 5 9.490323
## 6 7.306667
## 7 7.577419
## 8 7.238710
## 9 10.143333
## 10 10.087097
## 11 11.890000
## 12 13.680645
## 13 16.567742
## 14 15.921429
## 15 14.935484
## 16 11.470000
## 17 9.583871
## 18 5.606667
## 19 4.641935
## 20 7.903226
## 21 7.280000
## 22 9.545161
## 23 12.486667
## 24 13.754839
## 25 13.180645
## 26 16.807143
## 27 15.777419
## 28 10.596667
## 29 10.116129
## 30 6.600000
## 31 6.890323
## 32 8.706452
## 33 9.210000
## 34 10.312903
## 35 11.993333
## 36 14.396774
## 37 14.309677
## 38 14.944828
## 39 12.867742
## 40 10.750000
## 41 8.112903
## 42 7.730000
## 43 5.987097
## 44 8.696774
## 45 8.046667
## 46 10.632258
## 47 12.623333
## 48 12.643333
## 49 14.219355
## 50 14.032143
## 51 15.877419
## 52 12.976667
## 53 9.419355
## 54 7.073333
## 55 6.135484
## 56 7.635484
## 57 8.803333
## 58 10.490323
## 59 13.073333
## 60 14.109677
## 61 13.825806
## 62 14.196429
## 63 14.690323
## 64 11.653333
## 65 10.274194
## 66 7.526667
## 67 6.961290
## 68 7.387097
## 69 8.933333
## 70 9.683871
## 71 11.793333
## 72 12.935484
## 73 13.235484
## 74 13.889286
## 75 12.619355
## 76 12.250000
## 77 9.806452
## 78 8.273333
## 79 5.983871
## 80 8.022581
## 81 9.810000
## 82 10.238710
## 83 13.150000
## 84 13.254839
## 85 16.493548
## 86 14.524138
## 87 14.748387
## 88 12.833333
## 89 11.387097
## 90 8.386667
## 91 8.232258
## 92 8.725806
## 93 9.883333
## 94 10.890323
## 95 12.253333
## 96 15.436667
## 97 15.180645
## 98 16.371429
## 99 15.803226
## 100 12.563333
## 101 10.725806
## 102 6.560000
## 103 6.332258
## 104 6.770968
## 105 8.486667
## 106 9.867742
## 107 12.876667
## 108 13.951613
## 109 15.577419
## 110 15.417857
## 111 14.835484
## 112 13.433333
## 113 9.748387
## 114 7.720000
## 115 8.183871
## 116 7.825806
## 117 9.166667
## 118 11.345161
## 119 12.656667
## 120 14.367742
# converting monthly temperature data into times series data
monthlyMinTempTS <- ts(monthlyMinTemp$`Mel_Min_temp$Minimum.Daily.Temperature.Degree..Degree.C..`,
start=c(1981, 1), end=c(1990, 12), frequency = 12)
monthlyMinTempTS
## Jan Feb Mar Apr May Jun Jul
## 1981 17.712903 17.678571 13.500000 12.356667 9.490323 7.306667 7.577419
## 1982 16.567742 15.921429 14.935484 11.470000 9.583871 5.606667 4.641935
## 1983 13.180645 16.807143 15.777419 10.596667 10.116129 6.600000 6.890323
## 1984 14.309677 14.944828 12.867742 10.750000 8.112903 7.730000 5.987097
## 1985 14.219355 14.032143 15.877419 12.976667 9.419355 7.073333 6.135484
## 1986 13.825806 14.196429 14.690323 11.653333 10.274194 7.526667 6.961290
## 1987 13.235484 13.889286 12.619355 12.250000 9.806452 8.273333 5.983871
## 1988 16.493548 14.524138 14.748387 12.833333 11.387097 8.386667 8.232258
## 1989 15.180645 16.371429 15.803226 12.563333 10.725806 6.560000 6.332258
## 1990 15.577419 15.417857 14.835484 13.433333 9.748387 7.720000 8.183871
## Aug Sep Oct Nov Dec
## 1981 7.238710 10.143333 10.087097 11.890000 13.680645
## 1982 7.903226 7.280000 9.545161 12.486667 13.754839
## 1983 8.706452 9.210000 10.312903 11.993333 14.396774
## 1984 8.696774 8.046667 10.632258 12.623333 12.643333
## 1985 7.635484 8.803333 10.490323 13.073333 14.109677
## 1986 7.387097 8.933333 9.683871 11.793333 12.935484
## 1987 8.022581 9.810000 10.238710 13.150000 13.254839
## 1988 8.725806 9.883333 10.890323 12.253333 15.436667
## 1989 6.770968 8.486667 9.867742 12.876667 13.951613
## 1990 7.825806 9.166667 11.345161 12.656667 14.367742
plot(monthlyMinTempTS, col=c("blue"), ylab='Min temperature', xlab='Time', type='o',
main = "Time series plot of minimum temperature in Melbourne.")
After converting the daily minimum temperature to a monthly mean max temperature, we can clearly see a strong seasonality pattern. Further Analysis will need to be performed.
bb = monthlyMinTempTS
aa = zlag(monthlyMinTempTS)
index = 2: length(aa)
cor(bb[index], aa[index])
## [1] 0.8020674
We can see an upward trend and high correlation
plot(y = monthlyMinTempTS, x = zlag(monthlyMinTempTS), ylab = 'Min temp', xlab = 'Previous Year temp')
#Plotting ACF with max 36 lags Because we have strong correlation at lags 12,24,36 and so on we consider the existence of seasonal auto correlation relationship.There is also a substantial correlation that needs to be considered.
par(mfrow=c(1,2))
acf(monthlyMinTempTS,lag.max=36)
pacf(monthlyMinTempTS, lag.max=36)
##Time series plot,ACF AND PACF of Minimum Monthly temperature after taking first difference
Although the general upward trend is resolved, we still have a seasonal effect pattern in time series plot.
plot(diff(monthlyMinTempTS),ylab='First Difference of Temperature',xlab='Time',main = "Figure-Time series plot of the first differences of Minimum Monthly temperature levels")
## ACF of first difference of Min Monthly temp levels
acf(as.vector(diff(monthlyMinTempTS)),lag.max=36,main="Figure-ACF of the first differences of Min monthly temp levels.")
pacf(as.vector(diff(monthlyMinTempTS)),lag.max=36,main="Figure-PACF of the first differences of Min monthly temp levels.")
Although the general upward trend is resolved, we still have a seasonal effect pattern in time series plot. So, we consider taking a seasonal difference along with the ordinary difference
plot(diff(diff(monthlyMinTempTS),lag=12),xlab='Time',ylab='First and Seasonal Difference of monthly min temp levels',main= "Figure-Time series plot of the first and seasonal differences of monthly min temp levels.")
###ACF of the first and seasonal differences of Monthly min temp levels
acf(as.vector(diff(diff(monthlyMinTempTS),lag=12)),lag.max=36,ci.type='ma',main="Figure 12. ACF of the first and seasonal differences of")
###PACF of the first and seasonal differences of Monthly min temp levels
pacf(as.vector(diff(diff(monthlyMinTempTS),lag=12)),lag.max=36,ci.type='ma',main="Figure 12. PACF of the first and seasonal differences of monthly min temp levels")
After taking a seasonal difference, we nearly get rid of the seasonality in the series and very little autocorrelation remains in the series. This plot also suggests that a simple model which incorporates the lag 1 and lag 12 autocorrelations might be adequate. We will consider specifying an ARIMA(0,1,1)×(0,1,1)12 model to this series
##Residuals approach
###When D=1 ###Fit SARIMA(0,0,0)x(0,1,0)12 model and display time series plots of the residuals
m1.monthlyMinTempTS = arima(monthlyMinTempTS,order=c(0,0,0),seasonal=list(order=c(0,1,0), period=12))
res.m1 = residuals(m1.monthlyMinTempTS);
plot(res.m1,xlab='Time',ylab='Residuals',main="m1-Time series plot of the residuals")
###ACF and PACF of residuals-m1
par(mfrow=c(1,2))
acf(res.m1, lag.max = 36, main = "m1-ACF of residuals")
pacf(res.m1, lag.max = 36, main = "m1-PACF of residuals")
When we focus on the autocorrelations at seasonal lags, there is no pattern implying the existence of a seasonal trend. However, the existence of an ordinary trend is obvious from the time series, ACF and PACF plots.
To get rid of the ordinary trend seen in the residuals, we will fit the SARIMA(0,1,0)x(0,1,0)12 model
m2.monthlyMinTempTS = arima(monthlyMinTempTS,order=c(0,1,0),seasonal=list(order=c(0,1,0), period=12))
res.m2 = residuals(m2.monthlyMinTempTS);
plot(res.m2,xlab='Time',ylab='Residuals',main="m2-Time series plot of the residuals")
###ACF and PACF of residuals-m2
par(mfrow=c(1,2))
acf(res.m2, lag.max = 36, main = "m2-ACF of residuals")
pacf(res.m2, lag.max = 36, main = "m2-PACF of residuals")
There is no evidence of an ordinary trend left in the latest residuals. The residuals of the model SARIMA(0,1,0)x(0,1,0)12 now includes SARMA and ARMA components.For the orders of SARMA component, we will consider the lags s,2s,3,…, i.e. 12, 24, 36, … These are shown as 1, 2, 3, … in both the ACF and PACF plots.
When we consider the lags 1, 2, 3, … of ACF plot, the correlation at lag 1 is significant, and in PACF there is a decreasing pattern at lags 1, 2, 3, … This implies the existence of an SMA(1) component.
m3.monthlyMinTempTS = arima(monthlyMinTempTS,order=c(0,1,0),seasonal=list(order=c(0,1,1), period=12))
res.m3 = residuals(m3.monthlyMinTempTS);
plot(res.m3,xlab='Time',ylab='Residuals',main="m3-Time series plot of the residuals")
###ACF and PACF of residuals-m3
par(mfrow=c(1,2))
acf(res.m3, lag.max = 36, main = "m3-ACF of residuals")
pacf(res.m3, lag.max = 36, main = "m3-PACF of residuals")
At the seasonal lags, especially at the first seasonal lag in ACF, the autocorrelations became insignificant or slightly significant after adding the seasonal component. We will use the resulting ACF and PACF plots to specify the orders of ARMA component as there is no highly significant correlation at the lags s,2s,3s,….
The last ACF-PACF pair has significant pikes at the first lags implying an ARMA(1,1) model or due to the (insignificant) decrease of correlations in PACF, we would specify ARMA(0,1) for the residuals.
m4.monthlyMinTempTS = arima(monthlyMinTempTS,order=c(0,1,1),seasonal=list(order=c(0,1,1), period=12))
res.m4 = residuals(m4.monthlyMinTempTS);
plot(res.m4,xlab='Time',ylab='Residuals',main="m4-Time series plot of the residuals")
###ACF and PACF of residuals-m4
par(mfrow=c(1,2))
acf(res.m4, lag.max = 36, main = "m4-ACF of residuals")
pacf(res.m4, lag.max = 36, main = "m4-PACF of residuals")
m5.monthlyMinTempTS = arima(monthlyMinTempTS,order=c(1,1,1),seasonal=list(order=c(0,1,1), period=12))
res.m5 = residuals(m5.monthlyMinTempTS);
plot(res.m5,xlab='Time',ylab='Residuals',main="m5-Time series plot of the residuals")
###ACF and PACF of residuals-m5
par(mfrow=c(1,2))
acf(res.m5, lag.max = 36, main = "m5-ACF of residuals")
pacf(res.m5, lag.max = 36, main = "m5-PACF of residuals")
We got nearly white noise residual series for both of the SARIMA(0,1,1)x(0,1,1)12 and SARIMA(1,1,1)x(0,1,1)12 models. The residuals of SARIMA(0,1,1)x(0,1,1)12 model are closer to the white noise. So, we can conclude with the orders p=0, d=1, q=1, P=0, D=1, Q=1, and s=12 of SARIMA(p,d,q)x(P,D,Q)s model
we use EACF on the residuals of the non seasonary differencing model (i.e. res.m3), to see information about AR and MA parts lefts in residuals. The EACF table also clearly suggests a (0,1,1) model
eacf(res.m3)
## AR/MA
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 x o o o o o o x x x o o o o
## 1 x o o o o o o o o o o o o o
## 2 x o x o o o o o o o o o o o
## 3 x x o o o o o o o o o o o o
## 4 x x o o o o o o o o o o o o
## 5 x x o x o o o o o o o o o o
## 6 o x x x o o o o o o o o o o
## 7 o x x o x o o o o o o o o o
So, we can conclude from residual approach and EACF that the suitable models for this series are SARIMA(0,1,1)*(0,1,1)s=12 of SARIMA(p,d,q)x(P,D,Q)s model
Along with this, other candidate models are SARIMA(0,1,0) * (0,1,1)s=12 and (1,1,1)*(0,1,1)s=12
###**Fitting the SARIMA(0,1,0)*(0,1,1)S=12 model using ML method**
mf_010_ml = arima(monthlyMinTempTS,order=c(0,1,1),seasonal=list(order=c(0,1,1),
period=12),method = "ML")
coeftest(mf_010_ml)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 -0.785576 0.088072 -8.9197 < 2.2e-16 ***
## sma1 -0.801445 0.137944 -5.8099 6.25e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
**Fitting the same model above with lambda parameter to find AICc,BIC and AIC value
mf_010_lambda <- Arima(monthlyMinTempTS, order=c(0,1,0), seasonal=c(0,1,1),
lambda=0)
mf_010_lambda
## Series: monthlyMinTempTS
## ARIMA(0,1,0)(0,1,1)[12]
## Box Cox transformation: lambda= 0
##
## Coefficients:
## sma1
## -0.9998
## s.e. 0.2219
##
## sigma^2 estimated as 0.01344: log likelihood=65.5
## AIC=-127 AICc=-126.88 BIC=-121.65
**Fitting the same SARIMA(0,1,0)*(0,1,1)S=12 model using CSS method**
mf_010_css = arima(monthlyMinTempTS,order=c(0,1,1),seasonal=list(order=c(0,1,1),
period=12),method = "CSS")
coeftest(mf_010_css)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 -0.787922 0.065622 -12.0070 < 2.2e-16 ***
## sma1 -0.586463 0.081213 -7.2213 5.151e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
###**Fitting the SARIMA(1,1,1)*(0,1,1)S=12 model**
mf_111_ml = arima(monthlyMinTempTS,order=c(1,1,1),seasonal=list(order=c(0,1,1),
period=12),method = "ML")
coeftest(mf_111_ml)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.079905 0.158437 0.5043 0.614
## ma1 -0.836734 0.122602 -6.8248 8.804e-12 ***
## sma1 -0.824361 0.154570 -5.3333 9.646e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
**Fitting the same model above with lambda parameter to find AICc,BIC and AIC value
mf_111_lambda <- Arima(monthlyMinTempTS, order=c(1,1,1), seasonal=c(0,1,1),
lambda=0)
mf_111_lambda
## Series: monthlyMinTempTS
## ARIMA(1,1,1)(0,1,1)[12]
## Box Cox transformation: lambda= 0
##
## Coefficients:
## ar1 ma1 sma1
## -0.0083 -0.7410 -0.8670
## s.e. 0.2168 0.1982 0.1664
##
## sigma^2 estimated as 0.009828: log likelihood=88.59
## AIC=-169.17 AICc=-168.78 BIC=-158.48
**Fitting the same SARIMA(1,1,1)*(0,1,1)S=12 model using CSS method**
mf_111_css = arima(monthlyMinTempTS,order=c(1,1,1),seasonal=list(order=c(0,1,1),
period=12),method = "CSS")
coeftest(mf_111_css)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.135634 0.140075 -0.9683 0.3329
## ma1 -0.675582 0.117202 -5.7643 8.202e-09 ***
## sma1 -0.551512 0.084746 -6.5078 7.626e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
###**Fitting the SARIMA(0,1,1)*(0,1,1)S=12 model**
mf_011_ml = arima(monthlyMinTempTS,order=c(0,1,1),seasonal=list(order=c(0,1,1),
period=12),method = "ML")
coeftest(mf_011_ml)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 -0.785576 0.088072 -8.9197 < 2.2e-16 ***
## sma1 -0.801445 0.137944 -5.8099 6.25e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
**Fitting the same model above with lambda parameter to find AICc,BIC and AIC value
mf_011_lambda <- Arima(monthlyMinTempTS, order=c(0,1,1), seasonal=c(0,1,1),
lambda=0)
mf_011_lambda
## Series: monthlyMinTempTS
## ARIMA(0,1,1)(0,1,1)[12]
## Box Cox transformation: lambda= 0
##
## Coefficients:
## ma1 sma1
## -0.7477 -0.8687
## s.e. 0.0894 0.1613
##
## sigma^2 estimated as 0.009722: log likelihood=88.59
## AIC=-171.17 AICc=-170.94 BIC=-163.15
**Fitting the same SARIMA(0,1,1)*(0,1,1)S=12 model using CSS method**
mf_011_css = arima(monthlyMinTempTS,order=c(0,1,1),seasonal=list(order=c(0,1,1),
period=12),method = "CSS")
coeftest(mf_011_css)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 -0.787922 0.065622 -12.0070 < 2.2e-16 ***
## sma1 -0.586463 0.081213 -7.2213 5.151e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
###AIC values comparison The SARIMA model (0,1,1)*(0,1,1) has the lease AIC value and hence is considered to be the best fit
AIC(mf_010_lambda,mf_111_lambda,mf_011_lambda)
## df AIC
## mf_010_lambda 2 -126.9971
## mf_111_lambda 4 -169.1723
## mf_011_lambda 3 -171.1708
###BIC values comparison The SARIMA model (0,1,1)*(0,1,1) has the lease BIC value and hence is considered to be the best fit
BIC(mf_010_lambda,mf_111_lambda,mf_011_lambda)
## df BIC
## mf_010_lambda 2 -121.6515
## mf_111_lambda 4 -158.4809
## mf_011_lambda 3 -163.1523
Please perform Box test,hist,qqnorm,qqline,shapiro-wilk test for all 3 models given below
Please forecast using the model SARIMA(0,1,1) * (0,1,1)S=12 only
###Using auto.arima function To automatically fit the data we have and check the best model that could be fitted for our data using auto.arima function This result also comes as SARIMA(0,1,1)*(0,1,1)S=12
autoArimaModel = auto.arima(monthlyMinTempTS, d = 1)
autoArimaModel
## Series: monthlyMinTempTS
## ARIMA(0,1,1)(0,1,1)[12]
##
## Coefficients:
## ma1 sma1
## -0.7856 -0.8014
## s.e. 0.0881 0.1379
##
## sigma^2 estimated as 1.065: log likelihood=-160.79
## AIC=327.58 AICc=327.81 BIC=335.6