The dataset used in this project describes the minimum daily temperature over 10 years (19981 and 1990) in Melbourne, Australia. The units are in degree celsius and there are 3650 observations. The source of the data was created by Australian Bureau of meteorology and taken from kaggle. We will analyze this dataset to capture the trend, applied transformation on the data where required, plot the ACF & PACF’s and coefficient tests on suitable models, and forecast for the next ten years. Data is available in https://www.kaggle.com/paulbrabban/daily-minimum-temperatures-in-melbourne
# loading relevant packages into rstudio
library(readr)
library(dplyr)
library(tidyr)
library(TSA)
library(MTS)
library(tseries)
library(lmtest)
library(forecast)
library(fUnitRoots)
library(FitAR)
library(astsa)
library(urca)
# reading the dataset into R
Mel_Min_temp <- read.csv("Minimum Daily Temperature.csv", header = TRUE)
head(Mel_Min_temp,5)
## ï..Year Month Day Minimum.Daily.Temperature.Degree..Degree.C..
## 1 1981 1 1 20.7
## 2 1981 1 2 17.9
## 3 1981 1 3 18.8
## 4 1981 1 4 14.6
## 5 1981 1 5 15.8
# converting the dataframe to time series object
min_TempTS <- ts(Mel_Min_temp$Minimum.Daily.Temperature.Degree..Degree.C..,
start=c(1981, 1), end=c(1990, 365), frequency = 365)
#checking the class of the dataset
class(min_TempTS)
## [1] "ts"
#plotting time Series data
plot(min_TempTS , col=c("blue"), ylab='Min temperature', xlab='Time', type='o',
main = "Plot1: TS plot of min temp in Melbourne.")
Plot 1: shows the changes in daily minimum temperature in melbourne from 1981 to 1990. We can conclude from the plot that there is a strong cyclic behaviour and there is no apparent trend in the data over this period.
converting daily minimum temperature to a monthly mean teamperature to better see seasonality
#converting daily minimum temperature to a monthly mean teamperature to better see seasonality
monthlyMinTemp <- aggregate(Mel_Min_temp$Minimum.Daily.Temperature.Degree..Degree.C..~Month+Mel_Min_temp$ï..Year,
Mel_Min_temp, mean )
head(monthlyMinTemp,5)
## Month Mel_Min_temp$ï..Year
## 1 1 1981
## 2 2 1981
## 3 3 1981
## 4 4 1981
## 5 5 1981
## Mel_Min_temp$Minimum.Daily.Temperature.Degree..Degree.C..
## 1 17.712903
## 2 17.678571
## 3 13.500000
## 4 12.356667
## 5 9.490323
converting monthly temperature data into times series data
# converting monthly temperature data into times series data
monthlyMinTempTS <- ts(monthlyMinTemp$`Mel_Min_temp$Minimum.Daily.Temperature.Degree..Degree.C..`,
start=c(1981, 1), end=c(1990, 12), frequency = 12)
monthlyMinTempTS
## Jan Feb Mar Apr May Jun Jul
## 1981 17.712903 17.678571 13.500000 12.356667 9.490323 7.306667 7.577419
## 1982 16.567742 15.921429 14.935484 11.470000 9.583871 5.606667 4.641935
## 1983 13.180645 16.807143 15.777419 10.596667 10.116129 6.600000 6.890323
## 1984 14.309677 14.944828 12.867742 10.750000 8.112903 7.730000 5.987097
## 1985 14.219355 14.032143 15.877419 12.976667 9.419355 7.073333 6.135484
## 1986 13.825806 14.196429 14.690323 11.653333 10.274194 7.526667 6.961290
## 1987 13.235484 13.889286 12.619355 12.250000 9.806452 8.273333 5.983871
## 1988 16.493548 14.524138 14.748387 12.833333 11.387097 8.386667 8.232258
## 1989 15.180645 16.371429 15.803226 12.563333 10.725806 6.560000 6.332258
## 1990 15.577419 15.417857 14.835484 13.433333 9.748387 7.720000 8.183871
## Aug Sep Oct Nov Dec
## 1981 7.238710 10.143333 10.087097 11.890000 13.680645
## 1982 7.903226 7.280000 9.545161 12.486667 13.754839
## 1983 8.706452 9.210000 10.312903 11.993333 14.396774
## 1984 8.696774 8.046667 10.632258 12.623333 12.643333
## 1985 7.635484 8.803333 10.490323 13.073333 14.109677
## 1986 7.387097 8.933333 9.683871 11.793333 12.935484
## 1987 8.022581 9.810000 10.238710 13.150000 13.254839
## 1988 8.725806 9.883333 10.890323 12.253333 15.436667
## 1989 6.770968 8.486667 9.867742 12.876667 13.951613
## 1990 7.825806 9.166667 11.345161 12.656667 14.367742
Time Series plot of monthly min temp in Melbourne
plot(monthlyMinTempTS, col=c("blue"), ylab='Min temperature', xlab='Time', type='o',
main = "Plot 2:TS plot of monthly min temp in Melbourne.")
After converting the daily minimum temperature to a monthly mean max temperature, we can clearly see a strong seasonality pattern. Further Analysis will need to be performed.
Check correlations of changes in minimum daily temperature
# Check correlations of changes in minimum daily temperature
y = min_TempTS # read the daily min temp series into y
x = zlag(min_TempTS) # generate first lag of the min temp series
index = 2:length(x) # Create an index to get rid of the first NA value in x
cor(y[index],x[index]) # Calculate correlation between numerical values in x and y
## [1] 0.7748702
Scatter plot of min temperature series and its first lag
plot(y[index],x[index],ylab='Min temperature series', xlab='The first lag of min temperature series', main = 'Plot 3: Scatter plot of min temperature series and its first lag')
As Scatter plot shows, the correlation value 0.774 proved that there is strong positive linear relationship between previous day’s min temperature on the next day’s min temperature
Investigating correlation of MonthlyMinTempTS
we observe a high correlation between temperatures of succeeding months and we couldnt be able to locate seasonality from this plot
bb = monthlyMinTempTS
aa = zlag(monthlyMinTempTS)
index = 2: length(aa)
cor(bb[index], aa[index])
## [1] 0.8020674
Scatter plot of average temperature in consecutive months
As expected, we could see a good amount of correlation between succeeding months
plot(y = monthlyMinTempTS, x = zlag(monthlyMinTempTS), ylab = 'Min temp', xlab = 'Previous Month temp',main = 'Plot 4:scatter plot of average temperature in consecutive months')
Least squares regression to check if there is a trend with daily data
day.= season(min_TempTS)
model1_Daily = lm(min_TempTS~day.)
#output hidden due to large output
Estimate seasonal trend Model2-> Removing intercept term for monthly temperature data
month.=season(monthlyMinTempTS) # period added to improve table display and this line sets up indicators
model2=lm(monthlyMinTempTS~month.-1) # -1 removes the intercept term
summary(model2)
##
## Call:
## lm(formula = monthlyMinTempTS ~ month. - 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.05065 -0.62012 0.03393 0.55556 2.68258
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## month.January 15.0303 0.3092 48.60 <2e-16 ***
## month.February 15.3783 0.3092 49.73 <2e-16 ***
## month.March 14.5655 0.3092 47.10 <2e-16 ***
## month.April 12.0883 0.3092 39.09 <2e-16 ***
## month.May 9.8665 0.3092 31.91 <2e-16 ***
## month.June 7.2783 0.3092 23.54 <2e-16 ***
## month.July 6.6926 0.3092 21.64 <2e-16 ***
## month.August 7.8913 0.3092 25.52 <2e-16 ***
## month.September 8.9763 0.3092 29.03 <2e-16 ***
## month.October 10.3094 0.3092 33.34 <2e-16 ***
## month.November 12.4797 0.3092 40.36 <2e-16 ***
## month.December 13.8532 0.3092 44.80 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9779 on 108 degrees of freedom
## Multiple R-squared: 0.9936, Adjusted R-squared: 0.9929
## F-statistic: 1405 on 12 and 108 DF, p-value: < 2.2e-16
All of the parameters corresponding to months are statistically significant at 5% level
Including intercept parameter to model 2
model3=lm(monthlyMinTempTS~month.) # remove -1 to include the intercept term in the model
summary(model3)
##
## Call:
## lm(formula = monthlyMinTempTS ~ month.)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.05065 -0.62012 0.03393 0.55556 2.68258
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 15.0303 0.3092 48.605 < 2e-16 ***
## month.February 0.3480 0.4373 0.796 0.42792
## month.March -0.4648 0.4373 -1.063 0.29019
## month.April -2.9420 0.4373 -6.727 8.56e-10 ***
## month.May -5.1639 0.4373 -11.808 < 2e-16 ***
## month.June -7.7520 0.4373 -17.726 < 2e-16 ***
## month.July -8.3377 0.4373 -19.065 < 2e-16 ***
## month.August -7.1390 0.4373 -16.324 < 2e-16 ***
## month.September -6.0540 0.4373 -13.843 < 2e-16 ***
## month.October -4.7210 0.4373 -10.795 < 2e-16 ***
## month.November -2.5507 0.4373 -5.832 5.77e-08 ***
## month.December -1.1772 0.4373 -2.692 0.00824 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9779 on 108 degrees of freedom
## Multiple R-squared: 0.912, Adjusted R-squared: 0.9031
## F-statistic: 101.8 on 11 and 108 DF, p-value: < 2.2e-16
All of the differences between January and the other months are statistically significant at 5% level in both models model2 and model3
Plot the Time series data of Monthly minimum temperatures first
plot(monthlyMinTempTS, col=c("blue"), ylab='Min temperature', xlab='Time', type='o',
main = "Plot 5: Time series plot of monthly minimum temperature in Melbourne.")
We only see a seasonal trend and there is no other trend associated with the data
Plotting ACF with max 36 lags
Because we have strong correlation at lags 12,24,36 and so on we consider the existence of seasonal auto correlation relationship.There is also a substantial correlation that needs to be considered.
par(mfrow=c(1,2))
acf(monthlyMinTempTS,lag.max=36)
PACF
Pacf(monthlyMinTempTS,lag.max=36)
Seasonal differencing
Plot after taking seasonal differencing
Since we have a seasonal effect pattern in time series plot, we take a seasonal difference
plot(diff(monthlyMinTempTS,lag=12),xlab='Time',ylab='Seasonal Difference of CO2 data',main= "Plot 6. TS plot of seasonal differences of Monthly temp levels.")
ACF of the seasonal differences of monthly Temperature levels
acf(as.vector(diff(monthlyMinTempTS,lag=12)),lag.max=36,ci.type='ma',main="Plot 7-ACF of seasonal differences of monthly temp levels")
PACF of the seasonal differences of monthly Temperature levels
pacf(as.vector(diff(monthlyMinTempTS,lag=12)),lag.max=36,ci.type='ma',main="Plot 8-PACF of seasonal differences of monthly temp levels")
Specification of seasonal component When D=1
Fit m1 = SARIMA(0,0,0)x(0,1,0)12 model and display time series plots of the residuals
m1.monthlyMinTempTS = arima(monthlyMinTempTS,order=c(0,0,0),seasonal=list(order=c(0,1,0), period=12))
res.m1 = residuals(m1.monthlyMinTempTS);
plot(res.m1,xlab='Time',ylab='Residuals',main="Plot 9-m1-Time series plot of the residuals")
ACF and PACF of residuals-m1
par(mfrow=c(1,2))
acf(res.m1, lag.max = 36, main = "Plot 10-m1-ACF of residuals")
PACF
pacf(res.m1, lag.max = 36, main = "Plot 11-m1-PACF")
From the above plots, there is no pattern implying the existence of a seasonal trend .But We still have one significant correlation at the first seasonal lag in both ACF and PACF.The significant spike at lag 12 in the ACF suggests a seasonal MA(1) component. So we start with SARMA(0,1) and see if we get rid of the effect of the seasonal component on the residuals
m2-(0,0,0) * (0,1,1)s=12
Now, we will fit SARIMA(0,0,0)x(0,1,1)12 model and try to see if we get rid of the effect of the seasonal component on the residuals
m2.monthlyMinTempTS = arima(monthlyMinTempTS,order=c(0,0,0),seasonal=list(order=c(0,1,1), period=12))
res.m2 = residuals(m2.monthlyMinTempTS);
plot(res.m2,xlab='Time',ylab='Residuals',main="Plot 12-m2-Time series plot of the residuals")
par(mfrow=c(1,2))
acf(res.m2, lag.max = 36, main = "Plot13-m2-ACF of residuals")
pacf(res.m2, lag.max = 36, main = "Plot14-m2-PACF of residuals")
At the seasonal lags, especially at the first seasonal lag in ACF, the autocorrelations became insignificant or slightly significant after adding the seasonal component. We will use the resulting ACF and PACF plots to specify the orders of ARMA component as there is no highly significant correlation at the lags s,2s,3s,….
The last ACF-PACF pair has significant pikes at the first lags implying an ARMA(1,1) model or due to the (insignificant) decrease of correlations in PACF, we would specify ARMA(0,1) for the residuals.
So we are settled with (0,1,1) for seasonal component as (P,D,Q).
The data may follow an ARIMA(p ,d ,0) model if the ACF and PACF plots of the differenced data show the following patterns:
(For the non seasoning component)Also ,there is a significant spike at lag 2 in the PACF, but none beyond lag 2 which suggests a (2,d,0) model.The pattern in the first two spikes is what we would expect from an ARIMA(2,0,0), as the PACF tends to decrease. So in this case, the ACF and PACF lead us to think an ARIMA(2,0,0) model might be appropriate
By getting p as 2 , we can add MA component upto order 1 maximum
Check stationarity by doing ADF test to identify if we need ordinary differencing or not
adf.test(res.m2, alternative = c("stationary"))
## Warning in adf.test(res.m2, alternative = c("stationary")): p-value smaller than
## printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: res.m2
## Dickey-Fuller = -4.3317, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
The ADF unit root test shows a p-value of less than 0.05 which means the null hypothesis of non stationarity is now rejected and the data is proved to be stationary .Therefore we do not need differencing and fix d as 0 d=0 IS FIXED
Now we set out to specify models for non seasonal ARIMA component We begin with (2,0,1) as mentioned earlier
m3-(2,0,1) * (0,1,1)
m3.monthlyMinTempTS = arima(monthlyMinTempTS,order=c(2,0,1),seasonal=list(order=c(0,1,1), period=12))
res.m3 = residuals(m3.monthlyMinTempTS);
plot(res.m3,xlab='Time',ylab='Residuals',main="m3-Time series plot of the residuals")
ACF and PACF of residuals-m3
par(mfrow=c(1,2))
acf(res.m3, lag.max = 36, main = "Plot15-m3-ACF of residuals")
pacf(res.m3, lag.max = 36, main = "Plot16-m3-PACF of residuals")
Looks almost like a white noise series from the above plots, but we cannot confirm it until we check for (2,0,0) as well since MA component can be upto order 1. It can be 0 as well
m4-(2,0,0) * (0,1,1)
m4.monthlyMinTempTS = arima(monthlyMinTempTS,order=c(2,0,0),seasonal=list(order=c(0,1,1), period=12))
res.m4 = residuals(m4.monthlyMinTempTS);
plot(res.m4,xlab='Time',ylab='Residuals',main="Plot 17-m4-Time series plot of the residuals")
ACF and PACF of residuals-m4
par(mfrow=c(1,2))
acf(res.m4, lag.max = 36, main = "Plot18-m4-ACF of residuals")
pacf(res.m4, lag.max = 36, main = "Plot19-m4-PACF of residuals")
**We can clearly see that even though m3 and m4 looks closer, the residuals of m4-> SARIMA(2,0,0)*(0,1,1)s=12 are closer to the white noise .So we can pretty much conclude with the orders p=2,d=0,q=0,P=0,D=1,Q=1,and s=12**
EACF of the previous model
The EACF suggests (0,1) and (1,1) as well for the non seasonal component
eacf(res.m4)
## AR/MA
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 o o o o o o o o o o o o o o
## 1 x o o o o o o o o o o o o o
## 2 x x o o o o o o o o o o o o
## 3 o o x o o o o o o o o o o o
## 4 x o o o o o o o o o o o o o
## 5 o o o x o o o o o o o o o o
## 6 x o o x o o o o o o o o o o
## 7 x x x x x o o o o o o o o o
# **BIC**
res_m4_BIC=armasubsets(y=res.m4,nar=7,nma=7,y.name='test', ar.method='ols')
## Warning in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in =
## force.in, : 7 linear dependencies found
## Warning in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in =
## force.in, : nvmax reduced to 7
plot(res_m4_BIC)
title(main = "Plot20-BIC table for monthly temp series.", line = 6)
Next we will fit these models
**Fitting the SARIMA(0,0,1)*(0,1,1)S=12 model using ML method**
mf_001_ml = arima(monthlyMinTempTS,order=c(0,0,1),seasonal=list(order=c(0,1,1),
period=12),method = "ML")
coeftest(mf_001_ml)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 0.169651 0.085367 1.9873 0.04689 *
## sma1 -0.802720 0.150343 -5.3393 9.332e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
By ML method, MA(1) and SMA(1) looks to be significant
Fitting the SARIMA(1,0,1) * (0,1,1)S=12 model
mf_101_ml = arima(monthlyMinTempTS,order=c(1,0,1),seasonal=list(order=c(0,1,1),
period=12),method = "ML")
coeftest(mf_101_ml)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.79671 0.16339 4.8760 1.082e-06 ***
## ma1 -0.58510 0.21173 -2.7634 0.005721 **
## sma1 -0.80772 0.15120 -5.3420 9.192e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
By ML method, AR(1),MA(1) and SMA(1) looks to be significant
Fitting the SARIMA(2,0,1) * (0,1,1)S=12 model
mf_201_ml = arima(monthlyMinTempTS,order=c(2,0,1),seasonal=list(order=c(0,1,1),
period=12),method = "ML")
coeftest(mf_201_ml)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.41314 0.38629 1.0695 0.2848
## ar2 0.17411 0.14834 1.1737 0.2405
## ma1 -0.23140 0.39513 -0.5856 0.5581
## sma1 -0.78151 0.14788 -5.2848 1.258e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
By ML method, MA(1),AR(1),AR(2) is insignificant and SMA(1) looks to be significant
Fitting the SARIMA(2,0,0) * (0,1,1)S=12 model
mf_200_ml = arima(monthlyMinTempTS,order=c(2,0,0),seasonal=list(order=c(0,1,1),
period=12),method = "ML")
coeftest(mf_200_ml)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.197848 0.099944 1.9796 0.04775 *
## ar2 0.225413 0.098794 2.2817 0.02251 *
## sma1 -0.791129 0.150271 -5.2647 1.404e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
By ML method, AR(1),AR(2) and SMA(1) looks to be significant
Model Selection
For model selection we will compute AIC and BIC values of our estimated moddels and select the best models with the best AIC score
sort.score <- function(x, score = c("bic", "aic")){
if (score == "aic"){
x[with(x, order(AIC)),]
} else if (score == "bic") {
x[with(x, order(BIC)),]
} else {
warning('score = "x" only accepts valid arguments ("aic","bic")')
}
}
Sorting the AIC and BIC scores
sc.AIC=AIC(mf_001_ml, mf_101_ml, mf_201_ml, mf_200_ml)
sc.BIC=AIC(mf_001_ml, mf_101_ml, mf_201_ml, mf_200_ml, k = length(monthlyMinTempTS))
AIC values comparison
The SARIMA model (2,0,0)*(0,1,1) has the least AIC value and hence is considered to be the best model fit
sort.score(sc.AIC, score = "aic")
## df AIC
## mf_200_ml 4 324.7389
## mf_101_ml 4 325.3043
## mf_201_ml 5 326.3998
## mf_001_ml 3 329.6884
Because the standard error estimates for the SARIMA (2,0,0) * (0,1,1)s=12 model are relatively smaller than the corresponding parameter estimates, we think that the coefficient estimates are all highly significant
Because the SARIMA (2,0,0) * (0,1,1)s=12 model has the least AIC values, we declare this model as best fit model and to be used for forecasting
Over-fitting model: SARIMA(3,0,0) * (0,1,1)S=12
mf_300_ml = arima(monthlyMinTempTS,order=c(3,0,0),seasonal=list(order=c(0,1,1),
period=12),method = "ML")
coeftest(mf_300_ml)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.177987 0.104243 1.7074 0.08774 .
## ar2 0.215020 0.100001 2.1502 0.03154 *
## ar3 0.069456 0.104095 0.6672 0.50462
## sma1 -0.780039 0.147206 -5.2990 1.165e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
AR3 does not show significance and so overfitting model SARIMA(3,0,0)*(0,1,1)_12 will not be taken into consideration
**Plot to check residuals from the best estimated SARIMA(2,0,0) x (0,1,1)_12 model**
plot(window(rstandard(mf_200_ml),start=c(1981,1)), ylab='Standardized Residuals',type='o', main="Plot 21-Residuals from SARIMA(2,0,0)x(0,1,1)_12 model.")
abline(h=0)
Since we have some strange behaviour near 1983 , we will have to investigate this best model further for outliers
plot the ACF of standardised residuals of our best model(0,1,1) * (0,1,1)s=12
acf(as.vector(window(rstandard(mf_200_ml),start=c(1981,1))),lag.max=36,main="Plot-22 ACF of residuals from SARIMA(2,0,0)x(0,1,1)_12")
Except for the slightly significant autocorrelation at lag 25, there is not any sign of violation of the independence of residuals.
Function for residual analysis
The function below includes all the residual analysis methods included in one single function
residual.analysis <- function(model, std = TRUE){
if (std == TRUE){
res.model = rstandard(model)
}else{
res.model = residuals(model)
}
par(mfrow=c(3,2))
plot(res.model,type='o',ylab='Standardized residuals', main="Time series plot of standardized residuals")
abline(h=0)
hist(res.model,main="Histogram of standardized residuals")
qqnorm(res.model,main="QQ plot of standardized residuals")
qqline(res.model, col = 2)
acf(res.model,main="ACF of standardized residuals")
print(shapiro.test(res.model))
k=0
LBQPlot(res.model, lag.max = length(model$residuals)-1 , StartLag = k + 1, k = 0, SquaredQ = FALSE)
}
Residual Diagnostics - SARIMA (0,0,1) * (0,1,1)s=12
residual.analysis(model=mf_001_ml)
##
## Shapiro-Wilk normality test
##
## data: res.model
## W = 0.97776, p-value = 0.04413
## Warning in (ra^2)/(n - (1:lag.max)): longer object length is not a multiple of
## shorter object length
SARIMA(0,0,1)x(0,1,1)S=12
Residual Diagnostics - SARIMA (1,0,1) * (0,1,1)S=12
residual.analysis(model=mf_101_ml)
##
## Shapiro-Wilk normality test
##
## data: res.model
## W = 0.98339, p-value = 0.1467
## Warning in (ra^2)/(n - (1:lag.max)): longer object length is not a multiple of
## shorter object length
SARIMA(1,0,1)x(0,1,1)S=12
Residual Diagnostics - SARIMA(2,0,1) * (0,1,1)S=12
residual.analysis(model=mf_201_ml)
##
## Shapiro-Wilk normality test
##
## data: res.model
## W = 0.98111, p-value = 0.09036
## Warning in (ra^2)/(n - (1:lag.max)): longer object length is not a multiple of
## shorter object length
SARIMA(2,0,1)x(0,1,1)S=12
Residual Diagnostics - SARIMA(2,0,0) * (0,1,1)S=12
residual.analysis(model=mf_200_ml)
##
## Shapiro-Wilk normality test
##
## data: res.model
## W = 0.98135, p-value = 0.09516
## Warning in (ra^2)/(n - (1:lag.max)): longer object length is not a multiple of
## shorter object length
SARIMA(2,0,0)x(0,1,1)S=12
With SARIMA(2,0,0) * (0,1,1) being the best model with the S is equal to 12, below is the forcast for next 10 years.We could still see the same seasonal trend would be repeating in the next ten years too
sarima.for(monthlyMinTempTS,120,2,0,0,0,1,1,12)
## $pred
## Jan Feb Mar Apr May Jun Jul
## 1991 15.248472 15.486091 14.845551 12.556087 10.210343 7.579259 7.102069
## 1992 15.258478 15.474305 14.868050 12.580553 10.242439 7.613236 7.138020
## 1993 15.296362 15.512218 14.905981 12.618493 10.280385 7.651184 7.175971
## 1994 15.334314 15.550171 14.943933 12.656446 10.318337 7.689137 7.213923
## 1995 15.372267 15.588124 14.981886 12.694399 10.356290 7.727090 7.251876
## 1996 15.410220 15.626076 15.019839 12.732351 10.394243 7.765042 7.289829
## 1997 15.448173 15.664029 15.057791 12.770304 10.432196 7.802995 7.327781
## 1998 15.486125 15.701982 15.095744 12.808257 10.470148 7.840948 7.365734
## 1999 15.524078 15.739935 15.133697 12.846209 10.508101 7.878900 7.403687
## 2000 15.562031 15.777887 15.171649 12.884162 10.546054 7.916853 7.441639
## Aug Sep Oct Nov Dec
## 1991 7.992644 9.220982 10.615139 12.752758 14.195904
## 1992 8.029371 9.258276 10.652706 12.790498 14.233734
## 1993 8.067323 9.296228 10.690659 12.828450 14.271687
## 1994 8.105275 9.334181 10.728611 12.866403 14.309640
## 1995 8.143228 9.372134 10.766564 12.904356 14.347592
## 1996 8.181181 9.410086 10.804517 12.942308 14.385545
## 1997 8.219133 9.448039 10.842469 12.980261 14.423498
## 1998 8.257086 9.485992 10.880422 13.018214 14.461450
## 1999 8.295039 9.523944 10.918375 13.056166 14.499403
## 2000 8.332991 9.561897 10.956327 13.094119 14.537356
##
## $se
## Jan Feb Mar Apr May Jun Jul
## 1991 0.9893561 1.0073505 1.0368022 1.0404600 1.0427207 1.0431994 1.0434020
## 1992 1.0604507 1.0611095 1.0622007 1.0623390 1.0624243 1.0624424 1.0624497
## 1993 1.0790274 1.0796649 1.0807299 1.0808644 1.0809477 1.0809652 1.0809724
## 1994 1.0972701 1.0978970 1.0989444 1.0990766 1.0991585 1.0991758 1.0991828
## 1995 1.1152144 1.1158312 1.1168618 1.1169919 1.1170725 1.1170895 1.1170964
## 1996 1.1328745 1.1334817 1.1344963 1.1346244 1.1347037 1.1347204 1.1347272
## 1997 1.1502635 1.1508616 1.1518608 1.1519870 1.1520651 1.1520816 1.1520883
## 1998 1.1673936 1.1679828 1.1689674 1.1690917 1.1691687 1.1691849 1.1691916
## 1999 1.1842758 1.1848567 1.1858273 1.1859498 1.1860257 1.1860417 1.1860482
## 2000 1.2009208 1.2014936 1.2024508 1.2025716 1.2026464 1.2026622 1.2026687
## Aug Sep Oct Nov Dec
## 1991 1.0434553 1.0434710 1.0434704 1.0434219 1.0433923
## 1992 1.0624510 1.0624478 1.0624418 1.0623923 1.0623627
## 1993 1.0809736 1.0809705 1.0809645 1.0809159 1.0808868
## 1994 1.0991840 1.0991809 1.0991751 1.0991273 1.0990987
## 1995 1.1170975 1.1170945 1.1170888 1.1170418 1.1170136
## 1996 1.1347283 1.1347254 1.1347197 1.1346734 1.1346457
## 1997 1.1520894 1.1520865 1.1520809 1.1520353 1.1520080
## 1998 1.1691927 1.1691898 1.1691843 1.1691394 1.1691125
## 1999 1.1860493 1.1860465 1.1860411 1.1859968 1.1859703
## 2000 1.2026697 1.2026670 1.2026616 1.2026179 1.2025918
We can go on to say that the data followed a seasonal trend and we had to perform seasonal differencing to it. Since the non seasonal component was already stationary, we did not need to perform ordinary differencing. So we went on to predict probable models through the likes of Residual approach,EACF,ACF,PACF,BIC. On arriving at the estimated set of models, we performed model fitting through parameter estimation using Maximum Likelihood approach to test how significant the model is. Now based on the AIC score and our residual analysis, we selected the best model as SARIMA(2,0,0)*(0,1,1)s=12 and with this model we predicted the temperature forecast for the next 10 years
Dataset reference: Daily Minimum Temperatures in Melbourne. (2021). Retrieved 12 June 2021, from https://www.kaggle.com/paulbrabban/daily-minimum-temperatures-in-melbourne
Cryer, J. D., & Chan, K.-S. (n.d.). Time Series Analysis with Applications in R. Www.Stat.Ipb. Retrieved June 12, 2021, from https://www.stat.ipb.ac.id/en/uploads/KS/S2%20-%20ADW/2%20Cryer%20-%20Time%20Series%20Analysis%20With%20Applications%20in%20R.pdf
In this assignment, we worked on SARIMA models that helps in modelling time series with not so regular seasonal tendencies than a deterministic model. We followed the below procedure to accomplish the task at hand