Time Series Model Specification of Ozone layer

1 Part A

1.1 Introduction

1.2 Data Analysis & Interpretation

1.3 Conclusion

2 Part B

2.1 Introduction

2.2 Data Analysis & Interpretation

2.3 Conclusion

Part A

1.1 Introduction

A data set which contains the information of ozone layer thickness from 1927 to 2016 is used to analyse the trend. Objective of the project is to identify a best fitting model for the data and to give predictions for the next five years using the identified model. For identifying best models, linear, quadratic, cosine, cyclical or seasonal trend models are considered and the model which best fits the data set will be selected.

1.2 Data Analysis & Interpretation

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

## 
## Attaching package: 'TSA'

## The following object is masked from 'package:readr':
## 
##     spec

## The following objects are masked from 'package:stats':
## 
##     acf, arima

## The following object is masked from 'package:utils':
## 
##     tar

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## Registered S3 methods overwritten by 'forecast':
##   method       from
##   fitted.Arima TSA 
##   plot.Arima   TSA

## [1] "ts"

1.2.1 Analysis Of Time Series Plot

Trend : Trend is vissible in the plot. There is a downward trend which implies as time increases Ozone Layer Thickness is getting reduced.
Changing Variance : Eventhough there is slight change of variance at some points, overall we can say there is no changing variance.
Intervention : We cannot observe any drastic change at a single point and hence can say there is no intervention point
Behavior/Autocorrelation : Plot has successive points as well as fluctuation. So it can be an ARIMA model which will be checked at later patter of this report

1.2.2 Finding Best Model

1.2.2.1 Linear Model

Expression for linear trend model : Y = aX+c Estimating slope (a) and intercept function (c)

model1=lm(Ozone.data1~time(Ozone.data1))
summary(model1)

## 
## Call:
## lm(formula = Ozone.data1 ~ time(Ozone.data1))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.7165 -1.6687  0.0275  1.4726  4.7940 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       213.720155  16.257158   13.15   <2e-16 ***
## time(Ozone.data1)  -0.110029   0.008245  -13.34   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.032 on 88 degrees of freedom
## Multiple R-squared:  0.6693, Adjusted R-squared:  0.6655 
## F-statistic: 178.1 on 1 and 88 DF,  p-value: < 2.2e-16

Slope & Intercept are satistically signifant at 5% significance level. But the R-squared value is 0.67, and hence this is not a good fitting model.Linear trend line is as follows

plot(Ozone.data1,type="o",ylab="Ozone Layer Thickness", main = 
       "Fitted Linear Trend")
legend("bottomleft",lty=1, bty = "n" ,text.width = 14, col=c("black","blue"),
c("Ozone Layer Thickness Changes", "Linear Model"))
abline(model1, col="blue")

Trend line is not able capture most of the values and hence linear trend is not a good fitting model.

Ressidual Analysis of Linear Trende Model

res.model1=rstudent(model1)
plot(y = res.model1, x = as.vector(time(Ozone.data1)),
     xlab = 'Time', ylab='Standardized Residuals',
     type='p', main = "Standardized Residuals vs Time")
abline(h=0)

# REsiduals vs Fitted
plot(y=res.model1,x=fitted(model1),
     xlab='Fitted Trend Values', ylab='Standardized Residuals',
     type='p',
     main = "Standardized Residuals vs Fitted values")
abline(h=0, col="red")

From the plots we can say residuals are correlated because all the points are randomly distributed.

Checking Normality Of Residuals Using Histogram

hist(rstudent(model1),xlab="Standardized Residuals", main="Histogram of Standardized Residuals")

Looking at the histogram, we can say its normally distributed.

Checking Normality Of Residuals Using QQ Plot

qqplot=rstudent(model1)
qqnorm(qqplot)
qqline(qqplot,col=2,lwd=1, lty=2)

At the start and at the end data deviates from the line and hence we can say the data is not normally distributed

Checking Normality Of Residuals Using Shapiro-Wilk Hypothesis Test

st=rstudent(model1)
shapiro.test(st)

## 
##  Shapiro-Wilk normality test
## 
## data:  st
## W = 0.98733, p-value = 0.5372

P>0.5 and hence we fail to reject null hypothesis which means stochaistic component of data doesn’t have a normal distribution.

Plotting ACF of Standardized Residuals

acf(rstudent(model1), main="ACF of Standardized Residuals")

From the ACF plots, we can observe significant lags. Hence correlation exists and we can infer that stochaistic component of series is not white noise.

1.2.4 Quadratic Model

Y = a1X + a2X^2 + c

# Obtaining summary of quadratic model
t=time(Ozone.data1)
t2=t^2
model2=lm(Ozone.data1~t+t2)
summary(model2)

## 
## Call:
## lm(formula = Ozone.data1 ~ t + t2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.1062 -1.2846 -0.0055  1.3379  4.2325 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -5.733e+03  1.232e+03  -4.654 1.16e-05 ***
## t            5.924e+00  1.250e+00   4.739 8.30e-06 ***
## t2          -1.530e-03  3.170e-04  -4.827 5.87e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.815 on 87 degrees of freedom
## Multiple R-squared:  0.7391, Adjusted R-squared:  0.7331 
## F-statistic: 123.3 on 2 and 87 DF,  p-value: < 2.2e-16

Slope & Intercept are satistically signifant at 5% significance level. R-squared value is 0.72, which means 73% of the variation can be explained by quadratic model.Hnce this seems to be a good fitting model when compared to Linear model

Fitted Quadratic Plot

plot(ts(fitted(model2)),ylim=c(min(c(fitted(model2),
as.vector(Ozone.data1))),max(c(fitted(model2),
as.vector(Ozone.data1)))),ylab="Ozone Layer Thickness",
main="Fitted Quadratic Curve",col="blue")
legend("bottomleft",lty=1, bty = "n" ,text.width = 14, col=c("black","blue"),
c("Ozone Layer Thickness Changes", "Linear Model"))
lines(as.vector(Ozone.data1),type="o")

Plot shows that quadratic model has a better fit when compared to Linear Model

Ressidual Analysis of Quadratic Trende Model

res.model2=rstudent(model2)
plot(y = res.model2, x = as.vector(time(Ozone.data1)),
     xlab = 'Time', ylab='Standardized Residuals',
     type='p', main = "Standardized Residuals vs Time")
abline(h=0)

# Residuals vs Fitted
plot(y=res.model2,x=fitted(model2),
     xlab='Fitted Trend Values',
     ylab='Standardized Residuals',
     type='p', 
     main = "Standardized Residuals vs Fitted values")
abline(h=0, col="red")

From the plots we can say residuals are correlated because all the points are randomly distributed.

Checking Normality Of Residuals Using Histogram

hist(rstudent(model2),xlab="Standardized Residuals", main="Histogram of Standardized Residuals")

Looking at the histogram, we can say residuals are not normally distributed.

Checking Normality Of Residuals Using QQ Plot

qqplot=rstudent(model2)
qqnorm(qqplot)
qqline(qqplot,col=2,lwd=1, lty=2)

Only few points deviate from the straight line and hence we can say residuals are normally distributed

Checking Normality Of Residuals Using Shapiro-Wilk Hypothesis Test

st=rstudent(model2)
shapiro.test(st)

## 
##  Shapiro-Wilk normality test
## 
## data:  st
## W = 0.98889, p-value = 0.6493

P>0.5 and hence we fail to reject null hypothesis which means stochaistic component of data doesn’t have a normal distribution.

Plotting ACF of Standardized Residuals

acf(rstudent(model2), main="ACF of Standardized Residuals")

From the ACF plots, we can observe significant lags. Hence correlation exists and we can infer that stochaistic copmonent of series is not white noise.

1.2.5 Cosine & Cyclical Trend

There is no seasonality in the data and hence these two models are not considered for fitting.

1.2.6 Choosing Best Model

From all the plots and hypothesis testing we can say that quadratic model is a better fitting model when compared with linear model. But we can’t say this as the best model as they do not satisy the the assumption of residuals and normality. For this we need to differentiate the data and then fit the model.

Hence the quadratic model is a better fit, this model is considered to predict the ozone layer thickness for next five years.

1.2.7 Forecast for next five years

t= c(2017,2018,2019,2020,2021)
t2=t^2
x= data.frame(t,t2)
forecasts = predict(model2, x, interval = "prediction")
print(forecasts)

##         fit       lwr       upr
## 1 -10.34387 -14.13556 -6.552180
## 2 -10.59469 -14.40282 -6.786548
## 3 -10.84856 -14.67434 -7.022786
## 4 -11.10550 -14.95015 -7.260851
## 5 -11.36550 -15.23030 -7.500701

plot(Ozone.data1,type='o',
     ylab='Changes in the thickness of ozone layer',
     xlim=c(1927,2022),
     ylim=c(-15,10))
lines(ts(as.vector(forecasts[,1]), start = 2017), col="red", type="l")
lines(ts(as.vector(forecasts[,2]), start = 2017), col="blue", type="l")
lines(ts(as.vector(forecasts[,3]), start = 2017), col="blue", type="l")
legend("topright", lty=1, pch=1, col=c("black","blue","red"), text.width = 21,
c("Data","5% forecast limits", "Forecasts"))

1.2.8 Conclusion

While testing different models, quadratic model was found as a better fit and the same was used to forecast ozone layer thickness for the next five years with 5% upper and lower limit.From the forecast plot we can clearly identify a downward trend which indicates that there will be a reduction in the thickness of ozone layer in the forcasted years.

Part B

2.1 Introduction

Objective of this part of the report is to predict ARIMA model for the ozone layer thickness dataset using various model specification tools.

2.2 Data Analysis & Interpretation

From the time series plot from Part A from the plot that the,

1)There is a downward trend,

2)No seasonality in the plot,

3)No changing variance

4)No intervention in the plot

5)Plot has successive points as well as fluctuation so it has Moving Aveage and Auto Regressive behaviour.

# Checking the stationarity of data using QQ Plot

qqnorm(Ozone.data1)
qqline(Ozone.data1, col =2, lwd =1, lty =2)

# From QQ plot we can say that distribution is not normal and shapiro test is used proove the same

#shapiro wilk test
shapiro.test(Ozone.data1)

## 
##  Shapiro-Wilk normality test
## 
## data:  Ozone.data1
## W = 0.95605, p-value = 0.004031

#Since the p value is less than 0.05, we cannot say that data is normal

par(mfrow=c(1,2))
acf(Ozone.data1)
pacf(Ozone.data1)

## We can observe a Slowly decaying pattern in ACF and very high first correlation in PACF which shows the existence of trend and nonstationarity which needs to be removed before predicting the model.

#Applyin ADF test for checking nonstationarity
adf.test(Ozone.data1)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  Ozone.data1
## Dickey-Fuller = -3.2376, Lag order = 4, p-value = 0.0867
## alternative hypothesis: stationary

#p>0.05  we fail to reject the null hypothesis which is that the series is non-stationary.

#Applying BocCox transformation
data2.trans = BoxCox.ar(Ozone.data1+ abs(min(Ozone.data1))+1)

data2.trans$ci

## [1] 0.9 1.5

lambda = 1.2 
# mid point of th interval obtained from plot
Ozone.data1 = Ozone.data1 + abs(min(Ozone.data1))
# For Box Cox transformation all values need to be positive
BC.data = (Ozone.data1^lambda-1)/lambda
qqnorm(BC.data) # To check the normality
qqline(BC.data, col = 2)

shapiro.test(BC.data)

## 
##  Shapiro-Wilk normality test
## 
## data:  BC.data
## W = 0.96696, p-value = 0.02169

# From the QQ plot we can interpret that the distribution is not normal.
# Shapiro test also gives enough evidence to reject the normality hypothesis
# We can say that distribution is not normal  even after box-cox transformation.
# Hence we need to do diffrencing

# Applying first difference and plotting graph to check whether it helped to reduce non stationary.
diff.BC.d1 = diff(Ozone.data1)
par(mfrow=c(1,1))
plot(diff.BC.d1,type='o',ylab='Quarterly earnings ')

adf.test(diff.BC.d1)

## Warning in adf.test(diff.BC.d1): p-value smaller than printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff.BC.d1
## Dickey-Fuller = -7.1568, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary

# Based on the p-value we can reject the null hypothesis
 # of and can infer that  first differencing makes the series stationary.

par(mfrow=c(1,2)) # To plot both ACF and PACF in the same panel of plots
acf(diff.BC.d1)
pacf(diff.BC.d1)

From the ACF and PACF plot we can interpret p and q values. In ACF plot there are three significant lags, so q=3 and in PACf also number of significant lags is three, so q=3. We have done one differencing, so d=1. So the ARIMA model from PACF is (3,1,1)

eacf(diff.BC.d1)

## AR/MA
##   0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 o o x o o o x o o x o  o  o  o 
## 1 x o x o o o o o o x o  o  o  o 
## 2 x o x o o o x o o x o  o  o  o 
## 3 x o x o o x o o o o o  o  o  o 
## 4 x o o x o x o o o o o  o  o  o 
## 5 x x x x o x o o o o o  o  o  o 
## 6 o o o x x o o o o o o  o  o  o 
## 7 o o o x o o o o o o o  o  o  o

# We have limited the order of pa & q to a maximum of five to reduce errors.
# From the output of eacf model we can consider following set of ARIMA models
# ARIMA(0,1,0)
# ARIMA(0,1,1) 
# ARIMA(1,1,1)

par(mfrow=c(1,1))
res2 = armasubsets(y=diff.BC.d1,nar=5,nma=5,y.name='test',ar.method='ols')
plot(res2)

# In the BIC table,test.lag represents AR values and error-lag represents MA values.
# So values will be AR(3) & MA(2)
# ARIMA models
#ARIMA (3,1,2)

2.3 Conclusion

By using different model specification tools proposed ARIMA models are as follows:

ARIMA(0,1,0) ARIMA(0,1,1) ARIMA(1,1,1) ARIMA(3,1,1) ARIMA(3,1,2)

Time Series Model Specification of Ozone layer

Tarun.S.Sarode

Table Of Contents

1 Part A

1.1 Introduction

1.2 Data Analysis & Interpretation

1.3 Conclusion

2 Part B

2.1 Introduction

2.2 Data Analysis & Interpretation

2.3 Conclusion

Part A

1.1 Introduction

1.2 Data Analysis & Interpretation

1.2.1 Analysis Of Time Series Plot

1.2.2 Finding Best Model

1.2.2.1 Linear Model

Ressidual Analysis of Linear Trende Model

Checking Normality Of Residuals Using Histogram

Checking Normality Of Residuals Using QQ Plot

Checking Normality Of Residuals Using Shapiro-Wilk Hypothesis Test

Plotting ACF of Standardized Residuals

1.2.4 Quadratic Model

Ressidual Analysis of Quadratic Trende Model

Checking Normality Of Residuals Using Histogram

Checking Normality Of Residuals Using QQ Plot

Checking Normality Of Residuals Using Shapiro-Wilk Hypothesis Test

Plotting ACF of Standardized Residuals

1.2.5 Cosine & Cyclical Trend

1.2.6 Choosing Best Model

1.2.7 Forecast for next five years

1.2.8 Conclusion

Part B

2.1 Introduction

2.2 Data Analysis & Interpretation

2.3 Conclusion