R Markdown

CONTENTS

1.Introduction

2.Import library

3.Import dataset

4.Convert and plot time series ,lag

5.Modeling

a)Linear

b)Quadratic

c)Cosine

6.Comparing ACF plots

7.Comparing Q-Q plot to check normality

8.Comparing histogram

9.Prediction for next 5 year

10.Propose a set of possible ARIMA(p, d, q) models

11.Conclusion

12.References

INTRODUCTION

The given yearly changes in the thickness of Ozone layer 1927 to 2016 in Dobson units. A negative value in the dataset represents a decrease in the thickness and a positive value represents an increase in the thickness.We need to ind the best fitting model among the linear, quadratic, cosine, cyclical or seasonal trend models by implementing the model-building strategy and give the predictions of yearly changes for the next 5 years using the best model . Also we need to propose a set of possible ARIMA(p, d, q) models using all suitable model specification tools.

TASK 1

Find the best fitting model and find the predictions of yearly changes for the next 5 years .

#Import library

library(readr)
library(TSA)
## Warning: package 'TSA' was built under R version 4.0.4
## 
## Attaching package: 'TSA'
## The following object is masked from 'package:readr':
## 
##     spec
## The following objects are masked from 'package:stats':
## 
##     acf, arima
## The following object is masked from 'package:utils':
## 
##     tar
library(tseries)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
#Import dataset

ozone<-read_csv("data1.csv")
## Parsed with column specification:
## cols(
##   `1.351184356` = col_double()
## )
#convert into time series

Ozonelayer<- ts(ozone, start=1927, end=2016, frequency=1)

# Plot the time series for given ozone layer

plot(Ozonelayer,col=c("purple"),type='o',ylab='Thickness Of Ozone Layerin Dobson Units',xlab='Time (Years)' ,lwd =2 
     ,main ="Measure thickness of ozone layer")

# build a scatter plot between time series and lag

plot(y =Ozonelayer,x =zlag(Ozonelayer) ,col=c("navy") ,ylab = "Ozonelayer" ,xlab="lag of Ozonelayer" , main ="scatter plot between timeseries and its lag")

#observation: Scatter plots shows positive correlation between time series and its lag

Observations from plots:

#Trend : There is a clear evidence of negative trend, signifying that the thickness of ozone layer has reduced over given period of time.

#Seasonality: No repeating pattern can be observed, hence no seasonality present.

#Change in Variance: Almost no change in variance can be observed.

#Intervention: No apparent Intervention can be observed.

#Autocorrelation: Some local trends and oscillating behaviour above mean level can be observed which indicate Autoregressive and Moving Average like behaviour.

Modelling

# 1. LINEAR MODEL

linearmodel =lm(formula = Ozonelayer ~ time(Ozonelayer))
linearmodel
## 
## Call:
## lm(formula = Ozonelayer ~ time(Ozonelayer))
## 
## Coefficients:
##      (Intercept)  time(Ozonelayer)  
##         201.2656           -0.1037
summary(lm(formula = Ozonelayer ~ time(Ozonelayer)))
## 
## Call:
## lm(formula = Ozonelayer ~ time(Ozonelayer))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.9431 -1.6534 -0.1078  1.4644  8.5847 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      201.265599  17.895612   11.25   <2e-16 ***
## time(Ozonelayer)  -0.103715   0.009076  -11.43   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.237 on 88 degrees of freedom
## Multiple R-squared:  0.5974, Adjusted R-squared:  0.5928 
## F-statistic: 130.6 on 1 and 88 DF,  p-value: < 2.2e-16
plot(Ozonelayer, type='o', ylab='y', main='Time Series Plot for Yearly Changes for Linear model' ,col=c("blue"))
abline(linearmodel)

#Residual Analysis 
 

res.model1 = rstudent(linearmodel)

plot(y = rstudent(linearmodel), x = as.vector(time(Ozonelayer)),xlab = 'Time', ylab='Standardized Residuals of model1',type='o',col=c("mediumblue")
     ,main ="Standardized residuals of linear model")

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.

#QUADRATIC MODEL

t = time(Ozonelayer)
t2 = t^2
quadraticmodel = lm(Ozonelayer~t+t2)
summary(quadraticmodel)
## 
## Call:
## lm(formula = Ozonelayer ~ t + t2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.2845 -1.3749 -0.1366  1.3661 10.0970 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -4.301e+03  1.449e+03  -2.969  0.00386 **
## t            4.465e+00  1.470e+00   3.037  0.00315 **
## t2          -1.159e-03  3.728e-04  -3.108  0.00254 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.134 on 87 degrees of freedom
## Multiple R-squared:  0.6376, Adjusted R-squared:  0.6293 
## F-statistic: 76.54 on 2 and 87 DF,  p-value: < 2.2e-16
plot(ts(fitted(quadraticmodel)), ylim=c(min(c(fitted(quadraticmodel), as.vector(Ozonelayer))), max(c(fitted(quadraticmodel), as.vector(Ozonelayer)))), ylab='y', main = "Fitted Quadratic Curve to Ozone Thickness data")
lines(as.vector(Ozonelayer),type="o")

#residual analysis

res.model2 = rstudent(quadraticmodel)

plot(y=rstudent(quadraticmodel),x=as.vector(time(Ozonelayer)),xlab='Time',ylab ='Standardized Residuals',type='o',main = 'Time Series Plot Of Residuals for quadratic model',col=c('blue'))

#COSINE MODEL

harmonic.=harmonic(Ozonelayer,0.45) # calculate cos(2*pi*t) and sin(2*pi*t)
harmonicModel=lm(Ozonelayer~harmonic.)
harmonicModel
## 
## Call:
## lm(formula = Ozonelayer ~ harmonic.)
## 
## Coefficients:
##          (Intercept)  harmonic.cos(2*pi*t)  harmonic.sin(2*pi*t)  
##           -2.855e+00                    NA             8.314e+11
summary(harmonicModel)
## 
## Call:
## lm(formula = Ozonelayer ~ harmonic.)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.0277 -1.8342  0.4986  2.2245  6.8841 
## 
## Coefficients: (1 not defined because of singularities)
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          -2.855e+00  4.758e-01  -6.000  4.3e-08 ***
## harmonic.cos(2*pi*t)         NA         NA      NA       NA    
## harmonic.sin(2*pi*t)  8.314e+11  7.057e+11   1.178    0.242    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.498 on 88 degrees of freedom
## Multiple R-squared:  0.01553,    Adjusted R-squared:  0.004339 
## F-statistic: 1.388 on 1 and 88 DF,  p-value: 0.2419
harmonicModel=lm(Ozonelayer~harmonic.+t)
summary(harmonicModel)
## 
## Call:
## lm(formula = Ozonelayer ~ harmonic. + t)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.2117 -1.5766 -0.0827  1.4826  8.2228 
## 
## Coefficients: (1 not defined because of singularities)
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           2.003e+02  1.787e+01  11.210   <2e-16 ***
## harmonic.cos(2*pi*t)         NA         NA      NA       NA    
## harmonic.sin(2*pi*t)  5.446e+11  4.508e+11   1.208     0.23    
## t                    -1.031e-01  9.067e-03 -11.371   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.231 on 87 degrees of freedom
## Multiple R-squared:  0.604,  Adjusted R-squared:  0.5949 
## F-statistic: 66.36 on 2 and 87 DF,  p-value: < 2.2e-16
#residual analysis

plot(y=rstudent(harmonicModel),x=as.vector(time(Ozonelayer)), xlab='Time',
     ylab='Standardized Residuals ',type='o', 
     main = "Time series plot of residuals of Cosine Model", col=c("blue"))

Observation:

We can observed that all residuals plots for linear ,quadratic and cosine model hovers above mean level 0.

#COMPARING OF ACF PLOTS (for linear ,quadratic and cosine model)

par(mar=c(2,4,2,1),mfrow=c(2,2),cex.main=0.75, cex.lab=0.75, cex.axis=0.75)

acf(res.model1 ,main="ACF of Standardized Residuals of linear model")  
acf(rstudent(quadraticmodel),main="ACF of Standardized Residuals of quardartic model",col=c('blue'))
acf(harmonicModel$residuals, main = "ACF of standardized residuals of Harmonic model")

Observation: we see that there is a significant correlation in certain lags of residuals for linear and quadratic models.Also autocorrelation is higher in quadratic plots than linear or harmonic plots.

#COMPARING OF Q-Q PLOTS TO CHECK NORMALITY (for linear ,quadratic and cosine model)

par(mar=c(2,4,2,1),mfrow=c(2,2),cex.main=0.75, cex.lab=0.75, cex.axis=0.75)

qqnorm(rstudent(linearmodel), main = "Normal Q-Q plot of the linear model.")
qqline(rstudent(linearmodel), col = 2, lwd = 1, lty = 2)


qqnorm(rstudent(quadraticmodel), main = "Normal Q-Q plot of quadratic model.")
qqline(rstudent(quadraticmodel), col = 2, lwd = 1, lty = 2)


qqnorm(rstudent(harmonicModel), main = "Normal Q-Q plot of harmonic model.")
qqline(rstudent(harmonicModel), col = 2, lwd = 1, lty = 2)

Observation: We can see that the Quadratic model have a approximately normal fits.For other model like linear and cosine model ,slight deviations from normal fit can be observed.

#COMPARING HISTOGRAMS

par(mar=c(2,4,2,1),mfrow=c(2,2),cex.main=0.75, cex.lab=0.75, cex.axis=0.75)

hist(rstudent(linearmodel), xlab = 'Standardized Residuals',main ="Histogram for linear model" ,freq = FALSE, ylim = c(0,0.6))
curve(dnorm(x, mean=mean(rstudent(linearmodel)), sd=sd(rstudent(linearmodel))), col="red", lwd=2, add=TRUE, yaxt="n")


hist(rstudent(quadraticmodel) ,main="Histogram of quadratic model" ,freq = FALSE, ylim = c(0,0.6))
curve(dnorm(x, mean=mean(rstudent(quadraticmodel)), sd=sd(rstudent(quadraticmodel))), col="red", lwd=2, add=TRUE, yaxt="n")


hist(rstudent(harmonicModel), xlab = 'Standardized Residuals' ,main="Histogram of harmonic model",freq = FALSE, ylim = c(0,0.6))
curve(dnorm(x, mean=mean(rstudent(harmonicModel)), sd=sd(rstudent(harmonicModel))), col="red", lwd=2, add=TRUE, yaxt="n")

Observation: Both Linear and the Quadratic model seems to be approximately normal in comparison to harmonic model by looking at histograms.

Observation from modelling:

Looking at all three models we found quadratic model as best model since it has higher autocorrelation in campare to other models, it approximately fits normal curve (both q-q plot and histogram) compared to other models.

#PREDICTION FOR NEXT 5 YEARS (2017 -2021)

# We found overall quadratic as better model ,so we will use it to forecast


t = time(Ozonelayer)
t2 = t^2

# Predict the data for the next five years
forecasts <- predict.lm(quadraticmodel, data.frame(t=c(2017,2018,2019,2020,2021), t2=c(2017^2,2018^2,2019^2,2020^2,2021^2)), interval = "prediction" )
print(forecasts)
##          fit       lwr       upr
## 1  -9.544501 -14.00317 -5.085833
## 2  -9.754807 -14.23282 -5.276799
## 3  -9.967431 -14.46618 -5.468680
## 4 -10.182372 -14.70332 -5.661427
## 5 -10.399630 -14.94427 -5.854993
#Plot of forecasting for next 5 year
plot(Ozonelayer, ylab = "Ozone Layer Thickness in Dobson Units",ylim=c(-20,5),type="o",xlab ="years",
     main = " Time Series Plot of Ozone Layer Thickness for next 5 year")

lines(ts(as.vector(forecasts[,1]), start = 2017), col="red", type="l", lwd=2)
lines(ts(as.vector(forecasts[,2]), start = 2017), col="blue", type="l", lwd=2)
lines(ts(as.vector(forecasts[,3]), start = 2017), col="green", type="l", lwd=2)

SOLUTION OF TASK 1

We found quadratic model as best model out of all model and was used to forcast thickness of ozone layer for next 5 year.

TASK 2

#Propose a set of possible ARIMA(p, d, q) models

# Both ACF  and PACF  are used to observe the main characteristics of ARMA models.
# We will first check acf and pacf patterns using time series data Ozonelayer.

par(mfrow=c(1,2))

acf(Ozonelayer)
pacf(Ozonelayer)

# from both acf and pacf pattern there is a likelihood of AR1 model

#slight decaying pattern can be observed in acf graph which indicates presence of some trend
#The  ACF plots shows decaying pattern ,approximately, and remains well above the significance range (dotted blue lines).

# we will perform adf test to check if the series is stationary or not.

adf.test(Ozonelayer)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  Ozonelayer
## Dickey-Fuller = -3.7419, Lag order = 4, p-value = 0.02531
## alternative hypothesis: stationary
# since value can be observed as 0.023531 <0.05 so it is less than 0.05, so we reject null hypothesis of being non stationery.

# We will further test by taking differenced of time series.
# we will perform difference to check again if the given series is stationary or non stationery

data.diff = diff(Ozonelayer)

par(mfrow=c(1,2))
acf(data.diff)
pacf(data.diff)

# There are significant autocorrelation at lag3 on both ACF and PACF

#Perform adf test on difference 

adf.test(data.diff)
## Warning in adf.test(data.diff): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  data.diff
## Dickey-Fuller = -5.197, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
# Since again pvalue =0.01 is <0.05 ,so we reject the null hypothesis that it is non stationery


#(EACF) method is used to identify the order of autoregressive and moving average components of ARMA models.

eacf(data.diff)
## AR/MA
##   0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 o o x o o o o o o o o  o  o  o 
## 1 x o x o o o o o o o o  o  o  o 
## 2 x x o o o o o o o o o  o  o  o 
## 3 x o x o o o o o o o o  o  o  o 
## 4 x o x o o o o o o o o  o  o  o 
## 5 x x x o o o o o o o o  o  o  o 
## 6 o x x x o o o o o o o  o  o  o 
## 7 x x o x o o o o o o o  o  o  o
# Here from the above eacf test ,we found  p=0,q=0 , so its Random walk model

#p = 0; q = 1. ARIMA(0,1,1)
#p = 1; q = 1. ARIMA(1,1,1)
#p = 2; q = 2. ARIMA(2,1,2)
#p = 3; q = 1. ARIMA(3,1,1)
#p = 3; q = 3. ARIMA(3,1,3)



# Display the BIC table

res = armasubsets(y=data.diff,nar=14,nma=14,y.name='test',ar.method='ols')
## Warning in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in =
## force.in, : 8 linear dependencies found
## Reordering variables and trying again:
plot(res)

SOLUTION OF TASK2

Proposed sets of ARIMA(p,d,q) models could be either ARIMA(0,1,1) ,ARIMA(1,1,1), ARIMA(2,1,2), ARIMA(3,1,1), ARIMA(3,1,3)

In BOC table we could see best model (from top to bottom) with set of lag3 and error lag10

#—————————————————————————————————————————

CONCLUSION

  1. From BIC table best model contains lags 3 of the observed time series and lag 10 of the error process.

  2. Also from adf test we rejected the null hypothesis that it is non stationery.We also cross checked it by using difference of series and again we got same result.So the series is stationery.

  3. After modelling tie series data using different modelling technique,we found quadratic model gives the best fit.

REFERENCES

https://rmit.instructure.com/courses/79716/files https://rmit.instructure.com/courses/79716/files/16758172?fd_cookie_set=1 https://rmit.instructure.com/courses/79716/files/16758173?fd_cookie_set=1 https://rmit.instructure.com/courses/79716/files/16758176?fd_cookie_set=1 https://rmit.instructure.com/courses/79716/files/17368486?fd_cookie_set=1 https://rmit.instructure.com/courses/79716/files/16758179?fd_cookie_set=1