Introduction

Ozone is a gas that has the ability to reflects the harmful ultraviolet (UV) rays. Ozone gas is present in the Earth’s atmosphere abundantly and it has a thick layer surrounding the Earth.

This layer protects every living organism on the Earth from the harmful UV rays which comes from the Sun. It would have been impossible to survive on the Earth’s surface, without this protective layer.

We will be observing the data, performing several fitting operation, running some diagnostic on the data of changes in the thickness of the ozone layer of the Earth over the years.

Further, we will be predicting the values future values and propose a set of ARIMA(p, d, q), which could be possibly fit the given data.

Problem Statement

The purpose of this assignment is to determine the best fit model among the available trend models, such as linear, quadratic, cyclical/seasonal and cosine. Also, we have to provide the predictions of annual ozone thickness changes for the upcoming 5 years, by using the chosen best model. Further, we have to propose a set of possible ARIMA(p, d, q) models, using each and every model specification tool such as ACF, PACF, EACF, BIC table and other tests.

Data

The chosen dataset consists of a single column with shows the annual changes in the thickness of the ozone layer of the Earth over the years from 1927 to 2016 in Dobson units.

If there was a decrease in the thickness of the Ozone layer compared to the previous year’s thickness, then it is represented by a negative value in the dataset. On the other hand, if there was a increase in the thickness of the Ozone layer compared to the previous year’s thickness, then it is represented by a positive value in the dataset.

Our data set is provided by our course coordinator and contains a total of 90 observations.

Task 1

In task 1, we will check all the trend model that would best fit with the given data set, such as linear, quadratic, cyclical/seasonal and cosine. Further, we will give the predictions of the annual ozone thickness changes for the upcoming 5 years, by using the chosen best model.

Load the necessary libraries

# Load all necessary libraries
library(TSA)
## 
## Attaching package: 'TSA'
## The following objects are masked from 'package:stats':
## 
##     acf, arima
## The following object is masked from 'package:utils':
## 
##     tar
library(tseries)
## Registered S3 method overwritten by 'xts':
##   method     from
##   as.zoo.xts zoo
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

Set the working directory

# Set working directory
setwd("C:\\Users\\abhis\\OneDrive\\Desktop\\Master of Data Science\\Sem 3\\Time Series Analysis MATH1318\\Assignments\\Assignment 1")

Load the data set

# Load the provided data set
OzoneThickness <- read.csv("data1.csv", header=FALSE, col.names="Ozone.Thickness")
head(OzoneThickness)
class(OzoneThickness)
## [1] "data.frame"

Convert the data set to time series

# Convert the data set to time series data
OzoneThickness.ts = matrix(OzoneThickness$Ozone.Thickness, nrow = 90, ncol = 1)
OzoneThickness.ts = as.vector(t(OzoneThickness.ts))
OzoneThickness.ts <- ts(OzoneThickness.ts, start=1927, end=2016, frequency=1)
class(OzoneThickness.ts)
## [1] "ts"

Analysis of Data

Time Series Plot

In the plot below, following can be figured out: -
* There is a negative trend that can be clearly seen.
* There no evidence of seasonality
* Intervention are not visible
* Minor change in variance can be seen after the years 1960s and 2000s

# Plot the time series plot
plot(OzoneThickness.ts, ylab='Ozone Thickness Change (in Dobson units)', xlab='Years', type='o', main = 'Figure 1. Annual Ozone Thickness Change (1927-2016)')

Scatter Plot

In the plot below, a high correlation between ozone thickness change of consecutive years can be seen

# Plot the scatter plot
plot(y=OzoneThickness.ts,x=zlag(OzoneThickness.ts),ylab='Ozone Thickness Change (in Dobson units)', xlab='Previous Year Ozone Thickness Change (in Dobson units)' , main = 'Figure 2. Scatter plot of Ozone Thickness Change in Consecutive years')

Correlation

There is a high correlation factor of 0.87 can be observed

# Calculate correlation between the Ozone Thickness change in the consecutive years
y = OzoneThickness.ts
x = zlag(OzoneThickness.ts)
index = 2:length(x)
cor(y[index],x[index]) # strong correlation 0.87
## [1] 0.8700381

Analysis of Models

Linear Model

Fit the linear model

From the summary, it can seen that: - * p-value is less than 0.05 significance level * R-squared value is 0.6693 * Adjusted R-squared is 0.6655

# Fit Linear Trends Model
model.OzoneThickness.ln = lm(OzoneThickness.ts~time(OzoneThickness.ts))
summary(model.OzoneThickness.ln)
## 
## Call:
## lm(formula = OzoneThickness.ts ~ time(OzoneThickness.ts))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.7165 -1.6687  0.0275  1.4726  4.7940 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             213.720155  16.257158   13.15   <2e-16 ***
## time(OzoneThickness.ts)  -0.110029   0.008245  -13.34   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.032 on 88 degrees of freedom
## Multiple R-squared:  0.6693, Adjusted R-squared:  0.6655 
## F-statistic: 178.1 on 1 and 88 DF,  p-value: < 2.2e-16

Time series with the linear model trend line

Plot is shown below

# Plot the time series plot with the linear model trend line
plot(OzoneThickness.ts,type='o',ylab='Ozone Thickness Change (in Dobson units)', xlab="Years", main = 'Figure 3. Fitted linear model to the Ozone Thickness Change')
abline(model.OzoneThickness.ln)

Standardized residuals of the fitted model

# Create the standardized residuals of the fitted model
res.model.OzoneThickness.ln = rstudent(model.OzoneThickness.ln)

Time series plot for standardized residuals

The plot seems to be evenly fitted around the zero line.

# Plot the time series plot for standardized residuals of the Ozone Thickness Change
plot(y = res.model.OzoneThickness.ln, x = as.vector(time(OzoneThickness.ts)),xlab = 'Years', ylab='Standardized Residuals',type='o', main = 'Figure 4. Time series plot of standardized residuals')
abline(h=0)

Histogram for standardized residuals

The histogram looks almost normalized, with some exception along the edges

# Plot the histogram for standardized residuals of the Ozone Thickness Change
hist(res.model.OzoneThickness.ln,xlab='Standardized Residuals', main = 'Figure 5. Histogram of the standardized residuals')

Q-Q plot for standardized residuals

Tails off at higher and lower values, not seems to be white noise.

# Plot the Q-Q plot of the standardized residuals of the Ozone Thickness Change
qqnorm(res.model.OzoneThickness.ln, main = 'Figure 6. Normal Q-Q plot of the standardized residuals')
qqline(res.model.OzoneThickness.ln, col = 2, lwd = 1, lty = 2)

ACF for standardized residuals

It can be seen that 3 lags are more than the horizontal dashed lines, hence the Stochastic component is not a white noise

# Plot the ACF of the standardized residuals of the Ozone Thickness Change
acf(res.model.OzoneThickness.ln, main = 'Figure 7. ACF of the standardized residuals')

Shapiro-Wilk test for standardized residuals

p-value turn out to be 0.5372, which is greater than 0.05. So we fail to reject the null hypothesis of normal distribution

# Apply the Shapiro-Wilk test to the standardized residuals of the Ozone Thickness Change
shapiro.test(res.model.OzoneThickness.ln)
## 
##  Shapiro-Wilk normality test
## 
## data:  res.model.OzoneThickness.ln
## W = 0.98733, p-value = 0.5372

Quadratic Model

Fit the quadratic model

From the summary, it can be seen that: - * p-value is less than 0.05 significance level * R-squared value is 0.7391 * Adjusted R-squared is 0.7331

# Fit Quadratic Trends Model
t = time(OzoneThickness.ts)
t2 = t^2
model.OzoneThickness.qa = lm(OzoneThickness.ts ~ t + t2)
summary(model.OzoneThickness.qa)
## 
## Call:
## lm(formula = OzoneThickness.ts ~ t + t2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.1062 -1.2846 -0.0055  1.3379  4.2325 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -5.733e+03  1.232e+03  -4.654 1.16e-05 ***
## t            5.924e+00  1.250e+00   4.739 8.30e-06 ***
## t2          -1.530e-03  3.170e-04  -4.827 5.87e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.815 on 87 degrees of freedom
## Multiple R-squared:  0.7391, Adjusted R-squared:  0.7331 
## F-statistic: 123.3 on 2 and 87 DF,  p-value: < 2.2e-16

Time series with the linear model trend line

Plot is shown below

# Plot the time series plot with the Quadratic model trend line
plot(ts(fitted(model.OzoneThickness.qa)), ylim = c(min(c(fitted(model.OzoneThickness.qa), as.vector(OzoneThickness.ts))), max(c(fitted(model.OzoneThickness.qa), as.vector(OzoneThickness.ts)))), ylab='Ozone Thickness Change (in Dobson units)' , main = "Figure 8. Fitted quadratic curve to Ozone Thickness data", type="l", lty=2, col="red")
lines(as.vector(OzoneThickness.ts), type="o")

Standardized residuals of the fitted model

# Create the standardized residuals of the fitted model
res.model.OzoneThickness.qa = rstudent(model.OzoneThickness.qa)

Time series plot for standardized residuals

The plot seems to be evenly fitted around the zero line.

# Plot the time series plot for standardized residuals of the Ozone Thickness Change
plot(y = res.model.OzoneThickness.qa, x = as.vector(time(OzoneThickness.ts)), xlab = 'Years', ylab='Standardized Residuals', type='o', main = 'Figure 9. Time series plot of standardized residuals')
abline(h=0)

Histogram for standardized residuals

The histogram looks almost normalized, with some exception along the edges

# Plot the histogram for standardized residuals of the Ozone Thickness Change
hist(res.model.OzoneThickness.qa, xlab='Standardized Residuals', main = 'Figure 10. Histogram of the standardized residuals')

Q-Q plot for standardized residuals

Tails off at higher and lower values, not seems to be white noise.

# Plot the Q-Q plot of the standardized residuals of the Ozone Thickness Change
qqnorm(res.model.OzoneThickness.qa, main = 'Figure 11. Normal Q-Q plot of the standardized residuals')
qqline(res.model.OzoneThickness.qa, col = 2, lwd = 1, lty = 2)

ACF for standardized residuals

It can be seen that 4 lags are more than the horizontal dashed lines, hence the Stochastic component is not a white noise

# Plot the ACF of the standardized residuals of the Ozone Thickness Change
acf(res.model.OzoneThickness.qa, main = 'Figure 12. ACF of the standardized residuals')

Shapiro-Wilk test for standardized residuals

p-value turn out to be 0.6493, which is greater than 0.05. So we fail to reject the null hypothesis of normal distribution

# Apply the Shapiro-Wilk test to the standardized residuals of the Ozone Thickness Change
shapiro.test(res.model.OzoneThickness.qa)
## 
##  Shapiro-Wilk normality test
## 
## data:  res.model.OzoneThickness.qa
## W = 0.98889, p-value = 0.6493

Best Model

R2 and Adjusted R2 value for Quadratic model were better than Linear model. Hence, our best fit model is QUADRATIC MODEL.

However, after doing the analysis it can be seen that both the models are not the best fit model. Hence, we have to further perform analysis of ARIMA(p,d,q) models, as both AR(p) and MA(q) can be observed in the model.

Forecasting

We will be performing the prediction of ozone thcikness change, using the best fit model, i.e. Quadratic model

Make the predictions

# Set the time points
t = time(OzoneThickness.ts)

# Set the forecast for 5 years ahead
h = 5

# Set the time points of years to be forcasted
tp = seq((t[90]+1), (t[90]+h), 1)
tp_squared = tp^2

# Create new data frame containing the t and t2 values for the fitted model
newdata = data.frame(t = tp, t2 = tp_squared)

# Make the predictions
forecasts = predict(model.OzoneThickness.qa, newdata = newdata, interval = "prediction")
print(forecasts)
##         fit       lwr       upr
## 1 -10.34387 -14.13556 -6.552180
## 2 -10.59469 -14.40282 -6.786548
## 3 -10.84856 -14.67434 -7.022786
## 4 -11.10550 -14.95015 -7.260851
## 5 -11.36550 -15.23030 -7.500701

Time series plot with the predictions

# Plot the time series plot of the Ozone Thickness Change with the forecast for next 5 years
plot(OzoneThickness.ts, xlim = c(1927,2021), ylim = c(-15, 5), ylab = 'Ozone Thickness (in Dobson units)', main = 'Figure 13. Annual Ozone Thickness series with forecasts')
lines(ts(as.vector(forecasts[,1]), start = 2017), col="red", type="l")
lines(ts(as.vector(forecasts[,2]), start = 2017), col="blue", type="l")
lines(ts(as.vector(forecasts[,3]), start = 2017), col="blue", type="l")
legend("topleft", lty=1, pch=1, col=c("black","blue","red"), text.width = 18,
       c("Data","5% forecast limits", "Forecasts"))

Result

As per the results and the plotted time series plot, we can see that there is a decrease in the thickness of ozone layer predicted in the next 5 years.

Note: The forecast limit is set to 5%

Task 2

In task 2, we will be proposing a set of possible ARIMA(p, d, q) models, using each and every model specification tool such as ACF, PACF, EACF, BIC table and other specification tests.

Since, all the imports are already done and the data set is loaded and converted to time series data, we proceed analysis of data.

Analysis of Data

Time Series Plot

In the plot below, following can be figured out: -
* There is a negative trend that can be clearly seen.
* Minor change in variance can be seen after the years 1960s and 2000s
* There no evidence of seasonality
* Intervention are not visible
* Minor change in variance can be seen after the years 1960s and 2000s
* Succeeding observations can imply the existence of autoregressive behavior
* Bouncing observations around the mean level imply the existence of moving average behavior

# Plot the time series plot
plot(OzoneThickness.ts, ylab='Ozone Thickness Change (in Dobson units)', xlab='Years', type='o', main = 'Figure 14. Annual Ozone Thickness Change (1927-2016)')

ACF and PACF plot

In the ACF and PACF plots of our time series data, we can see slowly decaying pattern in ACF and very high first correlation in PACF implies the existence of trend and non-stationarity.

# Plot the ACF and PACF plots of time series data
par(mfrow=c(1,2))
acf(OzoneThickness.ts, main='Figure 15. ACF vs Lag')
pacf(OzoneThickness.ts, main='Figure 16. PACF vs Lag')

Dickey-Fuller Unit-Root Test (ADF)

p-value turn out to be 0.0867, which is greater than 0.05. So we have to we conclude that our time series data is non-stationary.

Hence, we move forward to apply Box-Cox and Natural Logarithm transformation to remove the variance in data.

# Perform Dickey-Fuller Unit-Root Test (ADF) Test on time series data
adf.test(OzoneThickness.ts)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  OzoneThickness.ts
## Dickey-Fuller = -3.2376, Lag order = 4, p-value = 0.0867
## alternative hypothesis: stationary

Positive Value

Since we have negative values in our data, we make our data positive by adding the minimum value in our data and add 1.

# Add minimum value in our data and add 1
OzoneThickness.ts.positive <- OzoneThickness.ts + abs(min(OzoneThickness.ts))+1

Box-Cox transformation

We perform the Box-Cox Transformation on our time series data below and check the confidence interval

# Perform Box-Cox transformation
OzoneThickness.ts.bc.transform = BoxCox.ar(OzoneThickness.ts.positive)
title(main = 'Figure 17. Log likelihood vs the values of lambda')

# Check confidence interval
OzoneThickness.ts.bc.transform$ci
## [1] 0.9 1.5

Create Box-Cox transformed data

To create the Box-Cox transformed data, we used the mid point of the confidence interval as lambda value

# Set lambda value to 1.2
lambda = 1.2

# Create Box-Cox transformed data
OzoneThickness.ts.bc = (OzoneThickness.ts.positive^lambda-1)/lambda

Plot Box-Cox transformed data

In the Box-Cox transformed time series plot, we can see that the variation is not decreased by this transfromation.

# Plot the Box-Cox transformed data
plot(OzoneThickness.ts.bc,type='o', ylab='Ozone Thickness Change', xlab='Years', main='Figure 18. Box-Cox transformed Ozone Thickness Change')

QQ plot of Box-Cox transformed data

In the plotted Q-Q plot, it can be observed that the tails of distribution is far from the normality.

# Plot the QQ plot of Box-Cox transformed data
qqnorm(OzoneThickness.ts.bc, main = 'Figure 19. QQ plot of Box-Cox transformed data')
qqline(OzoneThickness.ts.bc, col = 2)

Shapiro Test of Box-Cox transformed data

p-value turn out to be 0.01995, which is less than 0.05. So we have to reject the null hypothesis of normally distribution

# Perform Shapiro Test of Box-Cox transformed data
shapiro.test(OzoneThickness.ts.bc)
## 
##  Shapiro-Wilk normality test
## 
## data:  OzoneThickness.ts.bc
## W = 0.96644, p-value = 0.01995

Conclusion of Q-Q Plot and Shapiro Test

By checking the Q-Q Plot and Shapiro Test, we conclude that the Box-Cox transformation was unsuccessful to improve the normality of the observations.

Hence, we will try natural logarithm transformation.

Logarithm transformation

We perform the Natural Logarithm Transformation below to our positive data

# Create Logarithm transformed data
OzoneThickness.ts.log <- log(OzoneThickness.ts.positive)

Plot the Logarithm transformed data

In the Natural Logarithm transformed time series plot, we can see that the variation is not decreased by this transformation.

# Plot the Logarithm transformed data
plot(OzoneThickness.ts.log, ylab='Natural Logarithm transformed data', xlab="Years", type='o', main = "Figure 20. Annual Ozone Thickness (1927-2016)")

QQ plot of Logarithm transformed data

In the plotted Q-Q plot, it can be observed that the tails of distribution is far from the normality.

# Plot the QQ plot of Logarithm transformated data
qqnorm(OzoneThickness.ts.log, main = 'Figure 21. QQ plot of Natural Logarithm transformed data')
qqline(OzoneThickness.ts.log, col = 2)

Shapiro Test of Logarithm transformed data

p-value turn out to be 3.949e-09, which is less than 0.05. So we have to reject the null hypothesis of normally distribution

# Perform the Shapiro Test of Logarithm transformated data
shapiro.test(OzoneThickness.ts.log)
## 
##  Shapiro-Wilk normality test
## 
## data:  OzoneThickness.ts.log
## W = 0.81905, p-value = 3.949e-09

Conclusion of Q-Q Plot and Shapiro Test

By checking the Q-Q Plot and Shapiro Test, we conclude that the Natural Logarithm transformation was unsuccessful to improve the normality of the observations.

We can, therefore, move forward to apply first difference to normal time series data, using the principle of parsimony.

First differencing of Time Series data

We perform the first differencing below to our time series data

# Perform first differencing of Time Series data
OzoneThickness.ts.diff = diff(OzoneThickness.ts)

Plot the first differencing of Time Series data

In the time series plot, we can see that the after applying first differencing to our time series data, the data is now detrended and stationary We can, therefore, move forward to apply first difference.

# Plot the first differencing of Time Series data
plot(OzoneThickness.ts.diff,type='o', ylab='First differencing Data', xlab='Years', main='Figure 22. First difference transformed data')

Dickey-Fuller Unit-Root Test (ADF)

p-value turn out to be less than 0.01, which is less than 0.05. So we have to conclude that first differencing of our time series data

# Perform the Dickey-Fuller Unit-Root Test (ADF) on first differencing of time series data
adf.test(OzoneThickness.ts.diff)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  OzoneThickness.ts.diff
## Dickey-Fuller = -7.1568, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary

ADF and PADF plot

In the following ADF and PADF plots, we can observe that there is a tailing off pattern in ACF after lag 3 and a tailing off pattern after lags 3 and 4 in the PACF.

So we can consider ARIMA(3,1,3) and ARIMA(3,1,4) models.

# Plot the ADF and PADF plots for first differencing of time series data
par(mfrow=c(1,2))
acf(OzoneThickness.ts.diff, ci.type = 'ma', main='Figure 23. ACF vs Lag')
pacf(OzoneThickness.ts.diff, main='Figure 24. ACF vs Lag')

par(mfrow=c(1,1))

Extended Autocorrelation Function (EACF)

Since we cannot find any clear vertex in EACF, we can take the row corresponding to p = 0 our chosen vertex and include IMA(1,3), IMA(1,4) and ARIMA(1,1,3) models into the set of possible AIRMA(p,d,q) models.

# Create the Extended Autocorrelation Function (EACF) model
eacf(OzoneThickness.ts.diff)
## AR/MA
##   0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 o o x o o o x o o x o  o  o  o 
## 1 x o x o o o o o o x o  o  o  o 
## 2 x o x o o o x o o x o  o  o  o 
## 3 x o x o o x o o o o o  o  o  o 
## 4 x o o x o x o o o o o  o  o  o 
## 5 x x x x o x o o o o o  o  o  o 
## 6 o o o x x o o o o o o  o  o  o 
## 7 o o o x o o o o o o o  o  o  o

Bayesian Information Criterion (BIC)

In the BIC table, we check the shaded columns and conclude that we can include ARIMA(3,1,2), ARIMA(3,1,2), ARIMA(4,1,2) and ARI(3,1) models in the set of our candidate models.

# Plot the Bayesian Information Criterion table
bic = armasubsets(y=OzoneThickness.ts.diff,nar=4,nma=4,y.name='test',ar.method='ols')
plot(bic)
title(main = 'Figure 25. Bayesian Information Criterion table', line= 6)

Result

So to conclude we have the final set of candidate models as ARIMA(1,1,3), ARIMA(3,1,3) and ARIMA(3,1,4), ARIMA(3,1,2) and ARIMA(4,1,2).

Conclusion

References