CONTENTS
1.Introduction
2.Import library
3.Import dataset
4.Convert and plot time series ,lag
5.Modeling
a)Linear
b)Quadratic
c)Cosine
6.Comparing ACF plots
7.Comparing Q-Q plot to check normality
8.Comparing histogram
9.Prediction for next 5 year
10.Propose a set of possible ARIMA(p, d, q) models
11.Conclusion
12.References
The given yearly changes in the thickness of Ozone layer 1927 to 2016 in Dobson units. A negative value in the dataset represents a decrease in the thickness and a positive value represents an increase in the thickness.We need to ind the best fitting model among the linear, quadratic, cosine, cyclical or seasonal trend models by implementing the model-building strategy and give the predictions of yearly changes for the next 5 years using the best model . Also we need to propose a set of possible ARIMA(p, d, q) models using all suitable model specification tools.
Find the best fitting model and find the predictions of yearly changes for the next 5 years .
#Import library
library(readr)
library(TSA)
## Warning: package 'TSA' was built under R version 4.0.4
##
## Attaching package: 'TSA'
## The following object is masked from 'package:readr':
##
## spec
## The following objects are masked from 'package:stats':
##
## acf, arima
## The following object is masked from 'package:utils':
##
## tar
library(tseries)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
#Import dataset
ozone<-read_csv("data1.csv")
## Parsed with column specification:
## cols(
## `1.351184356` = col_double()
## )
#convert into time series
Ozonelayer<- ts(ozone, start=1927, end=2016, frequency=1)
# Plot the time series for given ozone layer
plot(Ozonelayer,col=c("purple"),type='o',ylab='Thickness Of Ozone Layerin Dobson Units',xlab='Time (Years)' ,lwd =2
,main ="Measure thickness of ozone layer")
# build a scatter plot between time series and lag
plot(y =Ozonelayer,x =zlag(Ozonelayer) ,col=c("navy") ,ylab = "Ozonelayer" ,xlab="lag of Ozonelayer" , main ="scatter plot between timeseries and its lag")
#observation: Scatter plots shows positive correlation between time series and its lag
#Trend : There is a clear evidence of negative trend, signifying that the thickness of ozone layer has reduced over given period of time.
#Seasonality: No repeating pattern can be observed, hence no seasonality present.
#Change in Variance: Almost no change in variance can be observed.
#Intervention: No apparent Intervention can be observed.
#Autocorrelation: Some local trends and oscillating behaviour above mean level can be observed which indicate Autoregressive and Moving Average like behaviour.
# 1. LINEAR MODEL
linearmodel =lm(formula = Ozonelayer ~ time(Ozonelayer))
linearmodel
##
## Call:
## lm(formula = Ozonelayer ~ time(Ozonelayer))
##
## Coefficients:
## (Intercept) time(Ozonelayer)
## 201.2656 -0.1037
summary(lm(formula = Ozonelayer ~ time(Ozonelayer)))
##
## Call:
## lm(formula = Ozonelayer ~ time(Ozonelayer))
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.9431 -1.6534 -0.1078 1.4644 8.5847
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 201.265599 17.895612 11.25 <2e-16 ***
## time(Ozonelayer) -0.103715 0.009076 -11.43 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.237 on 88 degrees of freedom
## Multiple R-squared: 0.5974, Adjusted R-squared: 0.5928
## F-statistic: 130.6 on 1 and 88 DF, p-value: < 2.2e-16
plot(Ozonelayer, type='o', ylab='y', main='Time Series Plot for Yearly Changes for Linear model' ,col=c("blue"))
abline(linearmodel)
#Residual Analysis
res.model1 = rstudent(linearmodel)
plot(y = rstudent(linearmodel), x = as.vector(time(Ozonelayer)),xlab = 'Time', ylab='Standardized Residuals of model1',type='o',col=c("mediumblue")
,main ="Standardized residuals of linear model")
Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.
#QUADRATIC MODEL
t = time(Ozonelayer)
t2 = t^2
quadraticmodel = lm(Ozonelayer~t+t2)
summary(quadraticmodel)
##
## Call:
## lm(formula = Ozonelayer ~ t + t2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.2845 -1.3749 -0.1366 1.3661 10.0970
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.301e+03 1.449e+03 -2.969 0.00386 **
## t 4.465e+00 1.470e+00 3.037 0.00315 **
## t2 -1.159e-03 3.728e-04 -3.108 0.00254 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.134 on 87 degrees of freedom
## Multiple R-squared: 0.6376, Adjusted R-squared: 0.6293
## F-statistic: 76.54 on 2 and 87 DF, p-value: < 2.2e-16
plot(ts(fitted(quadraticmodel)), ylim=c(min(c(fitted(quadraticmodel), as.vector(Ozonelayer))), max(c(fitted(quadraticmodel), as.vector(Ozonelayer)))), ylab='y', main = "Fitted Quadratic Curve to Ozone Thickness data")
lines(as.vector(Ozonelayer),type="o")
#residual analysis
res.model2 = rstudent(quadraticmodel)
plot(y=rstudent(quadraticmodel),x=as.vector(time(Ozonelayer)),xlab='Time',ylab ='Standardized Residuals',type='o',main = 'Time Series Plot Of Residuals for quadratic model',col=c('blue'))
#COSINE MODEL
harmonic.=harmonic(Ozonelayer,0.45) # calculate cos(2*pi*t) and sin(2*pi*t)
harmonicModel=lm(Ozonelayer~harmonic.)
harmonicModel
##
## Call:
## lm(formula = Ozonelayer ~ harmonic.)
##
## Coefficients:
## (Intercept) harmonic.cos(2*pi*t) harmonic.sin(2*pi*t)
## -2.855e+00 NA 8.314e+11
summary(harmonicModel)
##
## Call:
## lm(formula = Ozonelayer ~ harmonic.)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.0277 -1.8342 0.4986 2.2245 6.8841
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.855e+00 4.758e-01 -6.000 4.3e-08 ***
## harmonic.cos(2*pi*t) NA NA NA NA
## harmonic.sin(2*pi*t) 8.314e+11 7.057e+11 1.178 0.242
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.498 on 88 degrees of freedom
## Multiple R-squared: 0.01553, Adjusted R-squared: 0.004339
## F-statistic: 1.388 on 1 and 88 DF, p-value: 0.2419
harmonicModel=lm(Ozonelayer~harmonic.+t)
summary(harmonicModel)
##
## Call:
## lm(formula = Ozonelayer ~ harmonic. + t)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.2117 -1.5766 -0.0827 1.4826 8.2228
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.003e+02 1.787e+01 11.210 <2e-16 ***
## harmonic.cos(2*pi*t) NA NA NA NA
## harmonic.sin(2*pi*t) 5.446e+11 4.508e+11 1.208 0.23
## t -1.031e-01 9.067e-03 -11.371 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.231 on 87 degrees of freedom
## Multiple R-squared: 0.604, Adjusted R-squared: 0.5949
## F-statistic: 66.36 on 2 and 87 DF, p-value: < 2.2e-16
#residual analysis
plot(y=rstudent(harmonicModel),x=as.vector(time(Ozonelayer)), xlab='Time',
ylab='Standardized Residuals ',type='o',
main = "Time series plot of residuals of Cosine Model", col=c("blue"))
We can observed that all residuals plots for linear ,quadratic and cosine model hovers above mean level 0.
#COMPARING OF ACF PLOTS (for linear ,quadratic and cosine model)
par(mar=c(2,4,2,1),mfrow=c(2,2),cex.main=0.75, cex.lab=0.75, cex.axis=0.75)
acf(res.model1 ,main="ACF of Standardized Residuals of linear model")
acf(rstudent(quadraticmodel),main="ACF of Standardized Residuals of quardartic model",col=c('blue'))
acf(harmonicModel$residuals, main = "ACF of standardized residuals of Harmonic model")
#COMPARING OF Q-Q PLOTS TO CHECK NORMALITY (for linear ,quadratic and cosine model)
par(mar=c(2,4,2,1),mfrow=c(2,2),cex.main=0.75, cex.lab=0.75, cex.axis=0.75)
qqnorm(rstudent(linearmodel), main = "Normal Q-Q plot of the linear model.")
qqline(rstudent(linearmodel), col = 2, lwd = 1, lty = 2)
qqnorm(rstudent(quadraticmodel), main = "Normal Q-Q plot of quadratic model.")
qqline(rstudent(quadraticmodel), col = 2, lwd = 1, lty = 2)
qqnorm(rstudent(harmonicModel), main = "Normal Q-Q plot of harmonic model.")
qqline(rstudent(harmonicModel), col = 2, lwd = 1, lty = 2)
#COMPARING HISTOGRAMS
par(mar=c(2,4,2,1),mfrow=c(2,2),cex.main=0.75, cex.lab=0.75, cex.axis=0.75)
hist(rstudent(linearmodel), xlab = 'Standardized Residuals',main ="Histogram for linear model" ,freq = FALSE, ylim = c(0,0.6))
curve(dnorm(x, mean=mean(rstudent(linearmodel)), sd=sd(rstudent(linearmodel))), col="red", lwd=2, add=TRUE, yaxt="n")
hist(rstudent(quadraticmodel) ,main="Histogram of quadratic model" ,freq = FALSE, ylim = c(0,0.6))
curve(dnorm(x, mean=mean(rstudent(quadraticmodel)), sd=sd(rstudent(quadraticmodel))), col="red", lwd=2, add=TRUE, yaxt="n")
hist(rstudent(harmonicModel), xlab = 'Standardized Residuals' ,main="Histogram of harmonic model",freq = FALSE, ylim = c(0,0.6))
curve(dnorm(x, mean=mean(rstudent(harmonicModel)), sd=sd(rstudent(harmonicModel))), col="red", lwd=2, add=TRUE, yaxt="n")
Looking at all three models we found quadratic model as best model since it has higher autocorrelation in campare to other models, it approximately fits normal curve (both q-q plot and histogram) compared to other models.
#PREDICTION FOR NEXT 5 YEARS (2017 -2021)
# We found overall quadratic as better model ,so we will use it to forecast
t = time(Ozonelayer)
t2 = t^2
# Predict the data for the next five years
forecasts <- predict.lm(quadraticmodel, data.frame(t=c(2017,2018,2019,2020,2021), t2=c(2017^2,2018^2,2019^2,2020^2,2021^2)), interval = "prediction" )
print(forecasts)
## fit lwr upr
## 1 -9.544501 -14.00317 -5.085833
## 2 -9.754807 -14.23282 -5.276799
## 3 -9.967431 -14.46618 -5.468680
## 4 -10.182372 -14.70332 -5.661427
## 5 -10.399630 -14.94427 -5.854993
#Plot of forecasting for next 5 year
plot(Ozonelayer, ylab = "Ozone Layer Thickness in Dobson Units",ylim=c(-20,5),type="o",xlab ="years",
main = " Time Series Plot of Ozone Layer Thickness for next 5 year")
lines(ts(as.vector(forecasts[,1]), start = 2017), col="red", type="l", lwd=2)
lines(ts(as.vector(forecasts[,2]), start = 2017), col="blue", type="l", lwd=2)
lines(ts(as.vector(forecasts[,3]), start = 2017), col="green", type="l", lwd=2)
We found quadratic model as best model out of all model and was used to forcast thickness of ozone layer for next 5 year.
#Propose a set of possible ARIMA(p, d, q) models
# Both ACF and PACF are used to observe the main characteristics of ARMA models.
# We will first check acf and pacf patterns using time series data Ozonelayer.
par(mfrow=c(1,2))
acf(Ozonelayer)
pacf(Ozonelayer)
# from both acf and pacf pattern there is a likelihood of AR1 model
#slight decaying pattern can be observed in acf graph which indicates presence of some trend
#The ACF plots shows decaying pattern ,approximately, and remains well above the significance range (dotted blue lines).
# we will perform adf test to check if the series is stationary or not.
adf.test(Ozonelayer)
##
## Augmented Dickey-Fuller Test
##
## data: Ozonelayer
## Dickey-Fuller = -3.7419, Lag order = 4, p-value = 0.02531
## alternative hypothesis: stationary
# since value can be observed as 0.023531 <0.05 so it is less than 0.05, so we reject null hypothesis of being non stationery.
# We will further test by taking differenced of time series.
# we will perform difference to check again if the given series is stationary or non stationery
data.diff = diff(Ozonelayer)
par(mfrow=c(1,2))
acf(data.diff)
pacf(data.diff)
# There are significant autocorrelation at lag3 on both ACF and PACF
#Perform adf test on difference
adf.test(data.diff)
## Warning in adf.test(data.diff): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: data.diff
## Dickey-Fuller = -5.197, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
# Since again pvalue =0.01 is <0.05 ,so we reject the null hypothesis that it is non stationery
#(EACF) method is used to identify the order of autoregressive and moving average components of ARMA models.
eacf(data.diff)
## AR/MA
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 o o x o o o o o o o o o o o
## 1 x o x o o o o o o o o o o o
## 2 x x o o o o o o o o o o o o
## 3 x o x o o o o o o o o o o o
## 4 x o x o o o o o o o o o o o
## 5 x x x o o o o o o o o o o o
## 6 o x x x o o o o o o o o o o
## 7 x x o x o o o o o o o o o o
# Here from the above eacf test ,we found p=0,q=0 , so its Random walk model
#p = 0; q = 1. ARIMA(0,1,1)
#p = 1; q = 1. ARIMA(1,1,1)
#p = 2; q = 2. ARIMA(2,1,2)
#p = 3; q = 1. ARIMA(3,1,1)
#p = 3; q = 3. ARIMA(3,1,3)
# Display the BIC table
res = armasubsets(y=data.diff,nar=14,nma=14,y.name='test',ar.method='ols')
## Warning in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in =
## force.in, : 8 linear dependencies found
## Reordering variables and trying again:
plot(res)
Proposed sets of ARIMA(p,d,q) models could be either ARIMA(0,1,1) ,ARIMA(1,1,1), ARIMA(2,1,2), ARIMA(3,1,1), ARIMA(3,1,3)
In BOC table we could see best model (from top to bottom) with set of lag3 and error lag10
#—————————————————————————————————————————
From BIC table best model contains lags 3 of the observed time series and lag 10 of the error process.
Also from adf test we rejected the null hypothesis that it is non stationery.We also cross checked it by using difference of series and again we got same result.So the series is stationery.
After modelling tie series data using different modelling technique,we found quadratic model gives the best fit.
https://rmit.instructure.com/courses/79716/files https://rmit.instructure.com/courses/79716/files/16758172?fd_cookie_set=1 https://rmit.instructure.com/courses/79716/files/16758173?fd_cookie_set=1 https://rmit.instructure.com/courses/79716/files/16758176?fd_cookie_set=1 https://rmit.instructure.com/courses/79716/files/17368486?fd_cookie_set=1 https://rmit.instructure.com/courses/79716/files/16758179?fd_cookie_set=1