Introduction

The main objective of the study is to analyze thickness of ozone layer.
The dataset represents the yearly changes in the thickness of Ozone layer 1927 to 2016 in Dobson unit.A negative value in the dataset represents a decrease in the thickness and a positive value represents an increase in the thickness.
The first task is to determine the best fitting model and forecast the thickness of ozone layer for next 5 years.
The second task is to suggest suitable ARMA/ARIMA(p, d, q) model using model specification tools like ACF-PACF, EACF,BIC table

Task 1

Importing the Ozone Thickness time series data

#Importing the files 
data <- read_csv("data1.csv", col_names = FALSE)

#Provide column name 
colnames(data) <- ("ozone thickness")

#creating year sequence
year <- seq(1927,2016)

#Adding the year column to dataframe 
ozone <-  cbind(year,data)

Converting the Dataframe to the Time Series Object

#Convert the dataframe to Time Series object 
ozonets <- ts(ozone$`ozone thickness`,start = c(1927),end = c(2016),frequency = 1)

Plotting the Ozone thickness time series data

#plot the time series of ozone thickness 
plot(ozonets,type="o",ylab="Ozone thickness(in Dobson units)",xlab="Year",
     main="Time Series of changes in Ozone thickness from 1927 to 2016")

Trend : Trend looks like downward trend.
Seasonality : There might be seasonality but we need to perform PACF test is required to confirm seasonality in time series.
Change in Variance : Changes in variance are not significant by looking at the plot.
Behaviour : Moving Average like behaviour is observed therefore the data is non-stationery.
Change Point : There were no observations of Change point in the Time Series.

Visualize the autocorrelation of Ozone layer thickness in consecutive years

plot(y=ozonets,x=zlag(ozonets),xlab = "Previous Year thickness of Ozone layer",ylab= "Dobson Units",main = "Scatter plot of Ozone layer thickness in consecutive years")

Building right model

Linear Trend Model

#----Linear trend model----
#To fit a trend model, we need to extract time from the time series object:
t <- time(ozonets)

#fitting model 
model1 <- lm(ozonets ~ t) 
summary(model1)

## 
## Call:
## lm(formula = ozonets ~ t)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.7165 -1.6687  0.0275  1.4726  4.7940 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 213.720155  16.257158   13.15   <2e-16 ***
## t            -0.110029   0.008245  -13.34   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.032 on 88 degrees of freedom
## Multiple R-squared:  0.6693, Adjusted R-squared:  0.6655 
## F-statistic: 178.1 on 1 and 88 DF,  p-value: < 2.2e-16

Observation:

Intercept and the Slope of the linear trend model is 213.72 and -0.110029 respectively.
Intercept,Slope,and p-value of the linear trend model is statistically significant at 5% level.
Adjusted R-squared value shows that 66.55% of the variance in the ozone thickness time series is explained by linear model.

Plotting the Time Series fitted to Linear Trend Model

#plotting the time series fitted to Linear model 
plot(ozonets,type="o",ylab="Ozone thickness(in Dobson units)",xlab="Year",
     main="Time Series of changes in Ozone thickness from 1927 to 2016")
abline(model1, col="red", lty = 2) # add a fitted least squares line from model1
legend("topright", legend = c("Time series plot", "Fitted Linear model"),
 col = c("black", "red"), lwd = 1, lty = c(1, 2), pch = c(1, NA))

According to the plot ,we can suggest that the Regression line does not cover all the points, some of the points are left out.

Residual Analysis of Linear Trend Model

The leftover of linear trend model which behaves like stochastic model is residuals.We can analyse these residuals if they behave like independant,normally distributed random variables(white noise).

par(mfrow=c(2,2))  #Divides plotting page into 4 subplots
#Visualizing  standaradized Residuals 
plot(y=rstudent(model1),x=as.vector(time(ozonets)), xlab='Time',
     ylab='Standardized Residuals',type='o', main = "Time series plot of standardized residuals 
     fitted to Ozone-Thickness series.")

#hist view 
hist(rstudent(model1),xlab='Standardized Residuals'
     , main = "Histogram of standardised residuals
     fitted to the Ozone-Thickness series.")

#---QQ plot
y = rstudent(model1)
qqnorm(y, main = "QQ plot of standardised residuals for
       linear model fitted to the Ozone-Thickness series.")
qqline(y, col = 2, lwd = 1, lty = 2)

#AutoCorrelation Function of Standaradized Residuals of model 
acf(rstudent(model1), main = "ACF of standardized residuals.")

#Shapiro-Wilk Normality for residuals
y=rstudent(model1)
shapiro.test(y)

## 
##  Shapiro-Wilk normality test
## 
## data:  y
## W = 0.98733, p-value = 0.5372

Histogram : The plot concludes the standardized residuals are normally distributed.
QQ plot : We can see more number of the points of the time series falling on the straight line pattern ,there is slight deviation from the straight line pattern at the low and high ends and a slight shift is obesrved in the points around the mid section. We can say that the QQ plot supports of normally distributed stochastic component in linear trend model and very less white noise behaviour observed.
Standardized series plot : The plot suggests that the trend has been improved after standardizing the residuals.
Sample Autocorrelation Function(ACF) : According to the plot we can say that lag 1, 7 and 8 auto-correlations surpasses conidence limit.We can conclude that white noise is not the stochastic part of the series.
Shapiro test : The p-value of Shapiro test is 0.5372 which is greater than 0.05 hence, we can conclude that residual data are normally distributed.

Quadratic Model

#----Quadratic model------
# since we have time points of the series ,we can find the t^2 points 
t2 =t^2
model2 = lm(ozonets ~ t+t2) 
summary(model2)

## 
## Call:
## lm(formula = ozonets ~ t + t2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.1062 -1.2846 -0.0055  1.3379  4.2325 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -5.733e+03  1.232e+03  -4.654 1.16e-05 ***
## t            5.924e+00  1.250e+00   4.739 8.30e-06 ***
## t2          -1.530e-03  3.170e-04  -4.827 5.87e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.815 on 87 degrees of freedom
## Multiple R-squared:  0.7391, Adjusted R-squared:  0.7331 
## F-statistic: 123.3 on 2 and 87 DF,  p-value: < 2.2e-16

Observations:

Intercept,Slope,and p-value of the quadratic trend model is statistically significant at 5% level.
R-squared value shows that 73.31% of the variance in the ozone thickness time series is explained by quadratic model better than linear model.

Plotting the Time Series fitted to Quadratic Model

#Visualizing the time series fitted to model
plot(ts(fitted(model2)), ylab='y', main = "Fitted quadratic curve to ozone-thickness series.",col = "blue",lty=2,
    ylim = c(min(c(fitted(model2), as.vector(ozonets))) ,
             max(c(fitted(model2), as.vector(ozonets)))) )
lines(as.vector(ozonets),type="o")
legend("topright", legend = c("Time series plot", "Fitted Quadratic model"),
 col = c("black", "blue"), lwd = 1, lty = c(1, 2), pch = c(1, NA))

The plot suggests that the quadratic curve covers more points as compared to regression line.

Residual analyis of Quadratic Trend Model

par(mfrow=c(2,2)) 
#Plot of standaradized Residuals 
plot(y=rstudent(model2),x=as.vector(time(ozonets)), xlab='Time',
     ylab='Standardized Residuals',type='o', main = "Time series plot of standardized residuals fitted
     to Ozone-Thickness series")

#---hist view 
hist(rstudent(model2),xlab='Standardized Residuals'
     , main = "Histogram of standardised residual
     fitted to the Ozone-Thickness series")

#QQ plot  
y = rstudent(model2)
qqnorm(y, main = "QQ plot of standardised residuals 
fitted to the Ozone-Thickness series")
qqline(y, col = 2, lwd = 1, lty = 2)

#----AutoCorrelation Function----
acf(rstudent(model2), main = "ACF of standardized residuals 
fitted to the Ozone-Thickness series")

#Shapiro-Wilk Normality for residuals
y=rstudent(model2)
shapiro.test(y)

## 
##  Shapiro-Wilk normality test
## 
## data:  y
## W = 0.98889, p-value = 0.6493

Histogram : The plot concludes the standardized residuals are normally distributed.
QQ plot : We can see most of the points of the time series falling on the straight line pattern,the deviation from the straight line pattern is relatively less at the low and high ends compared to the linear model.
Standardized series plot : The plot suggests that the trend seems to be bette after standardizing the residuals, but change in variances can be observed.
Sample Autocorrelation Function(ACF) : We can see many correlations which are higher than the confidence intervals at many lags,slight white noise like behaviour is observed.
Shapiro test : The p-value of Shapiro test is 0.6493 which is greater than 0.05 hence, we can conclude that residual data are normally distributed. hence, we conclude that the standardizing residual data make the series staionary.

Forecast yearly changes of ozone layer thickness for the next 5 years

Since Quadratic Model is better than Linear model,we are using quadratic model to forecast the changes in the thickness of ozone layer for next five years.

h <- 5 # 5 steps ahead forecasts
t <- time(ozonets)
t2 <- t^2 
aheadTimes <- data.frame(t = seq(2017, 2017+h, 1),
                         t2 = seq(2017, 2017+h, 1)^2)

#creating a forecast model 
frcModel2 <- predict(model2, newdata = aheadTimes, interval = "prediction")
print(frcModel2)

##         fit       lwr       upr
## 1 -10.34387 -14.13556 -6.552180
## 2 -10.59469 -14.40282 -6.786548
## 3 -10.84856 -14.67434 -7.022786
## 4 -11.10550 -14.95015 -7.260851
## 5 -11.36550 -15.23030 -7.500701
## 6 -11.62856 -15.51482 -7.742297

#Visualizing the Forecast model
plot(ozonets, xlim= c(1927,2021+h+1), ylim = c(-15,5), ylab = "Ozone-thickness series",
     main = "Forecasts from the quadratic model fitted to the Ozone-thickness series.")
lines(ts(as.vector(frcModel2[,3]), start = 2017), col="blue" , type="l")
lines(ts(as.vector(frcModel2[,1]), start = 2017), col="red", type="l")
lines(ts(as.vector(frcModel2[,2]), start = 2017), col="blue", type="l")
legend("topright", lty=1, pch=1, col=c("black","blue","red"), text.width = 24,
       c("Data" ,"5% forecast limits","Forecasts"))

Summary

Quadratic trend model would be better fit than the linear trend for the Ozone thickness time series because the R-squared value of quadratic model(73.31%) is higher than that of linear model(66.55%) as it describes better variation in the series.
Quadratic model is used for forecasting the thickness of Ozone layer for 5 years over linear trend model.

Task 2

Time Series plot of the Ozone thickness data

#Time series plot of ozone thickness 
plot(ozonets,type="o",ylab="Ozone thickness(in Dobson units)",xlab="Year",
     main="Time Series of changes in Ozone thickness from 1927 to 2016")

ACF and PACF plots for Ozone thickness data

#ACF and PACF tests for ozone thickness time series
par(mfrow=c(1,2))
acf(ozonets, main ="ACF plot ")
pacf(ozonets, main ="PACF plot")

ACF - There is some seaosonal behaviour with downward trend.
PACF - The first correlation in PACF is above the confidence limit which suggests the existence of trend and nonstationarity.

Dickey-Fuller Test to check the stationarity of ozone thickness time series

#Test to check the stationarity of the ozone thickness time series
adf.test(ozonets, alternative = c("stationary")) # Use default value of k

## 
##  Augmented Dickey-Fuller Test
## 
## data:  ozonets
## Dickey-Fuller = -3.2376, Lag order = 4, p-value = 0.0867
## alternative hypothesis: stationary

According to Augmented Dickey-Fuller Test the p-value is 0.0867 which is greater than 0.05 which suggest not to reject the null hypothesis implying that the time series is Non-Stationary

Box-Cox Transformation for the Ozone thickness time series

#Box cox transformation 


#Since we have negative values in our data, we include them by adding the minimum value in our data and add 1.
ozonets = ozonets + abs(min(ozonets))+1

ozonets.BC.transform = BoxCox.ar(ozonets + abs(min(ozonets))+1)
title(main="Log-Likelihood vs Value of Lamdba")

#Confidence levels of Box-Cox transformed series
ozonets.BC.transform$ci

## [1] 0.9 1.8

# To find the optimal lambda value
lambda <- ozonets.BC.transform$lambda[which(max(ozonets.BC.transform$loglike) == ozonets.BC.transform$loglike)]
lambda

## [1] 1.3

Applying the Box-Cox transformation to the Ozone thickness time series

# Apply Box-Cox transformation
ozonetsBC <- (ozonets^lambda - 1) / lambda

Plotting the Box-Cox transformed Ozone thickness Series

#plot of Box -cox transformation
plot(ozonetsBC, ylab='Box-Cox transformed ozone thickness series', xlab='Time', type='o',main="Box Cox transformed Ozone thickness Series")

We observe in the above plot that there is still existence of trend which can be improved by using first lag difference analysis.

First Lag Difference time series analysis

#Calculating the first difference lag tinme series
ozonetsdiff <- diff(ozonets, differences = 1)

#Visualizing the first difference series
plot(ozonetsdiff, ylab='First difference of Ozone thickness series', xlab='Time', type='o',main="First difference of Ozone thickness series")

From the plot we can suggest that there is stationarity like behaviour ,also wave like pattern can be observed .

ACF & PACF for the first difference lag time series

#ACF and PACF tests for first difference lag time series
par(mfrow=c(1,2))
acf(ozonetsdiff, main ="ACF of first difference time series.")
pacf(ozonetsdiff, main ="PACF of first difference time series.")

ACF - There is wave like pattern observed which suggests that there is existence of trend
PACF - We can see three aucorrelation lag beyond the confidence limits.

In order to have better sense of stationarity of the time series we will analyze second lag difference of time series

Second Lag Difference time series analysis

#Calculating the second difference lag tinme series
ozonetsdiff_ <- diff(ozonets, differences = 2)

#Visualizing the second difference series
plot(ozonetsdiff_, ylab='Second lag difference of Ozone thickness series', xlab='Time', type='o',main="Second lag difference of Ozone thickness series")

From the plot we can suggest that series looks more stationary than the first lag difference.

ACF & PACF for the second difference lag time series

#ACF and PACF tests for second difference lag time series
par(mfrow=c(1,2))
acf(ozonetsdiff_, main ="ACF of second difference time series.")
pacf(ozonetsdiff_, main ="PACF of second difference time series.")

* ACF - There is no seasonal pattern in the plot and we ca proceed for further fitting the ARIMA model

PACF - We can observe there are four lags below the confidence limit

The possible ARIMA models is (3,2,2)

# Stationarity test for the first lag difference time series
adf.test(ozonetsdiff_, alternative = c("stationary")) # Use default value of k

## Warning in adf.test(ozonetsdiff_, alternative = c("stationary")): p-value
## smaller than printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  ozonetsdiff_
## Dickey-Fuller = -7.6128, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary

The above test confirms that the first lag differnece time series is stationary because the p-value is lesser than 0.05.

Extended Auto-Correlation Function Table

#Extended Auto Correlation Function
eacf(ozonetsdiff_)

## AR/MA
##   0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 x o x o o o o o o x o  o  o  o 
## 1 o o x o o o o o o x o  o  o  o 
## 2 x x x o o o o o o x o  o  o  o 
## 3 x o x o o o o o o o o  o  o  o 
## 4 x x o x o o o o o o o  o  o  o 
## 5 x x x x o o o o o o o  o  o  o 
## 6 x o x x o o o o o o o  o  o  o 
## 7 x x o x o o o o o o o  o  o  o

From the eacf table, ARIMA(0,2,3),ARIMA(1,2,4),ARIMA(2,2,4),ARIMA(3,2,3) are possible potential ARIMA models.

Bayesian Information Criterion

#BIC table 
res_ = armasubsets(y=ozonetsdiff_ , nar=5 , nma=5, y.name='Ozone thickness Index', ar.method='ols')
plot(res_)

From BIC table,we can observe that p value can be 3 and the q value is 1 and lag is 2 so the ARIMA(3,2,1) is the possible ARIMA model.

Summary

We can still observe trend in the Box-Cox transformed time series which does not succeed in making the time series as stationary series.
Second lag difference of time series analysis is chosen over the first lag difference of time series for yeilding better stationary time series with no existence of trend.

Conclusion

Quadratic trend model is better than Linear trend model in modelling the ozone thickness time series data.
Set of the Potential possible models are {ARIMA(3,2,2) from ACF&PACF plots,ARIMA(0,2,3),ARIMA(1,2,4),ARIMA(2,2,4),ARIMA(3,2,3) from eacf table ARIMA(3,2,1) from BIC table }.

Appendix

library(TSA) library(readr) library(tseries)

data <- read_csv(“D:/Time Series Analysis/Assignment 1/data1.csv”, col_names = FALSE)

colnames(data) <- (“ozone thickness”) View(data)

year <- seq(1927,2016) year

ozone <- cbind(year,data) View(ozone)

ozonets <- ts(ozone$ozone thickness,start = c(1927),end = c(2016),frequency = 1)

plot(ozonets,type=“o”,ylab=“Ozone thickness(in Dobson units)”,xlab=“Year”, main=“Time Series of changes in Ozone thickness from 1927 to 2016”)

plot(y=ozonets,x=zlag(ozonets),xlab = “Previous Year thickness of Ozone layer”,ylab= “Dobson Units” ,main = “Scatter plot of Ozone layer thickness in consecutive years”)

t <- time(ozonets)

model1 <- lm(ozonets ~ t) summary(model1)

plot(ozonets,type=“o”,ylab=“Ozone thickness(in Dobson units)”,xlab=“Year”, main=“Time Series of changes in Ozone thickness from 1927 to 2016”) abline(model1) # add a fitted least squares line from model1

plot(y=rstudent(model1),x=as.vector(time(ozonets)), xlab=‘Time’, ylab=‘Standardized Residuals’,type=‘o’, main = “Time series plot of standardized resid fitted to Ozone-Thickness series.”)

hist(rstudent(model1),xlab=‘Standardized Residuals’ , main = “Histogram of standardised residual fitted to the Ozone-Thickness series.”)

y = rstudent(model1) qqnorm(y, main = “QQ plot of standardised residuals for the linear model fitted to the Ozone-Thickness series.”) qqline(y, col = 2, lwd = 1, lty = 2)

acf(rstudent(model1), main = “ACF of standardized residuals the linear model fitted to the Ozone-Thickness series.”)

y=rstudent(model1) shapiro.test(y)

t2 =t^2 model2 = lm(ozonets ~ t+t2) summary(model2)

plot(ts(fitted(model2)), ylab=‘y’, main = “Fitted quadratic curve to ozone-thickness series.”, ylim = c(min(c(fitted(model2), as.vector(ozonets))) , max(c(fitted(model2), as.vector(ozonets)))) ) lines(as.vector(ozonets),type=“o”)

plot(y=rstudent(model2),x=as.vector(time(ozonets)), xlab=‘Time’, ylab=‘Standardized Residuals’,type=‘o’, main = “Time series plot of standardized resid fitted to Ozone-Thickness series.”)

hist(rstudent(model2),xlab=‘Standardized Residuals’ , main = “Histogram of standardised residual fitted to the Ozone-Thickness series.”)

y = rstudent(model2) qqnorm(y, main = “QQ plot of standardised residuals for the linear model fitted to the Ozone-Thickness series.”) qqline(y, col = 2, lwd = 1, lty = 2)

acf(rstudent(model2), main = “ACF of standardized residuals the quadratic model fitted to the land-use series.”)

y=rstudent(model1) shapiro.test(y)

h <- 5 # 5 steps ahead forecasts t <- time(ozonets) aheadTimes <- data.frame(t = seq(2017, 2021, 1)) frcModel1 <- predict(model1, newdata = aheadTimes, interval = “prediction”) print(frcModel1)

plot(ozonets, xlim= c(1927,2021), ylim = c(-12,10), ylab = “Ozone-thickness series”, main = “Forecasts from the linear model fitted to the Ozone-thickness series.”) lines(ts(as.vector(frcModel1[,3]), start = 2017), col=“blue” , type=“l”) lines(ts(as.vector(frcModel1[,1]), start = 2017), col=“red”, type=“l”) lines(ts(as.vector(frcModel1[,2]), start = 2017), col=“blue”, type=“l”) legend(“topright”, lty=1, pch=1, col=c(“black”,“blue”,“red”), text.width = 10, c(“Data” ,“5% forecast limits”,“Forecasts”))

h <- 5 # 5 steps ahead forecasts t <- time(ozonets) t2 <- t^2 aheadTimes <- data.frame(t = seq(2017, 2017+h, 1), t2 = seq(2017, 2017+h, 1)^2) frcModel2 <- predict(model2, newdata = aheadTimes, interval = “prediction”) print(frcModel2)

plot(ozonets, xlim= c(1927,2021+h+1), ylim = c(-12,10), ylab = “Ozone-thickness series”, main = “Forecasts from the quadratic model fitted to the Ozone-thickness series.”) lines(ts(as.vector(frcModel2[,3]), start = 2017), col=“blue” , type=“l”) lines(ts(as.vector(frcModel2[,1]), start = 2017), col=“red”, type=“l”) lines(ts(as.vector(frcModel2[,2]), start = 2017), col=“blue”, type=“l”) legend(“topright”, lty=1, pch=1, col=c(“black”,“blue”,“red”), text.width = 10, c(“Data” ,“5% forecast limits”,“Forecasts”))

plot(ozonets,type=“o”,ylab=“Ozone thickness(in Dobson units)”,xlab=“Year”, main=“Time Series of changes in Ozone thickness from 1927 to 2016”)

par(mfrow=c(1,2)) acf(ozonets, main =“ACF plot”) pacf(ozonets, main =“PACF plot”)

adf.test(ozonets, alternative = c(“stationary”)) # Use default value of k

ozonets = ozonets + abs(min(ozonets))+1

ozonets.BC.transform = BoxCox.ar(ozonets + abs(min(ozonets))+1) title(main=“Log-Likelihood vs Value of Lamdba”)

ozonets.BC.transform$ci

lambda <- ozonets.BC.transform$lambda[which(max(ozonets.BC.transform$loglike) == ozonets.BC.transform$loglike)] lambda

ozonetsBC <- (ozonets^lambda - 1) / lambda

plot(ozonetsBC, ylab=‘Box-Cox transformed ozone thickness series’, xlab=‘Time’, type=‘o’ ,main=“Box Cox transformed Ozone thickness Series”)

ozonetsdiff <- diff(ozonets, differences = 1)

plot(ozonetsdiff, ylab=‘First difference of ozone thickness series’, xlab=‘Time’, type=‘o’ ,main=“First difference of Ozone thickness series”)

par(mfrow=c(1,2)) acf(ozonetsdiff, main =“ACF of first difference time series.”) pacf(ozonetsdiff, main =“PACF of first difference time series.”)

adf.test(ozonetsdiff, alternative = c(“stationary”)) # Use default value of k

eacf(ozonetsdiff)

res_ = armasubsets(y=ozonetsdiff , nar=5 , nma=5, y.name=‘Ozone thickness Index’, ar.method=‘ols’) plot(res_)

References

Time Series Analysis(MATH1318) notes prepared by Dr. Haydar Demirhan.

Assignment 1 - Analysis of Ozone thickness Time Series (MATH1318)

Kaushik Sunil Anagarkar(s3827495)

Introduction

Task 1

Building right model

Linear Trend Model

Observation:

Residual Analysis of Linear Trend Model

Quadratic Model

Observations:

Residual analyis of Quadratic Trend Model

Forecast yearly changes of ozone layer thickness for the next 5 years

Summary

Task 2

Summary

Conclusion

Appendix

References