INTRODUCTION

The ozone layer – a layer that absorbs majority of the Sun’s ultraviolet radiation – has been affected by the emission of ozone-depleting gases. Ozone (O3) is a naturally occurring gas in the Earth’s atmosphere. By interacting with man-made chemicals in the upper atmosphere, the ozone layer can be depleted and broken down. This layer of ozone is critical for maintaining the optimal environment for the survival of life on Earth.This research examines the shifts in the ozone layer thickness in Dobson Units from 1927 to 2016. Our aim is to forecast changes in the thickness of the ozone layer for the next five years, from 2017 to 2023, using R studio and related model fitting and necessary functions. A negative value in the Ozone layer data set indicates a decrease in thickness, while a positive value indicates a rise.

Making sure there are no environment variables

rm(list=ls())

Installing and loading TSA library into the R session install.packages(“TSA”)

library(TSA)

## Warning: package 'TSA' was built under R version 4.1.2

## 
## Attaching package: 'TSA'

## The following objects are masked from 'package:stats':
## 
##     acf, arima

## The following object is masked from 'package:utils':
## 
##     tar

require(tseries)

## Loading required package: tseries

## Warning: package 'tseries' was built under R version 4.1.2

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

Changing the current working directory to the location where I have the data set

setwd("/Volumes/DataScienceMaterial/Semester 3/Time Series/Assisgnment1")

Loading the data set into a variable and checking the details of data frame

OzoneData <- read.csv("data1.csv", col.names = 'Change in Ozone Thickness', header = FALSE)
class(OzoneData)

## [1] "data.frame"

Converting data frame into a Time Series object by telling starting, ending and frequency of the series.

OzoneTS <- ts(as.vector(OzoneData), start = 1927, end = 2016, frequency = 1)

Confirming the data type and viewing the Time Series that is created

class(OzoneTS)

## [1] "ts"

head(OzoneTS)

##      Change.in.Ozone.Thickness
## [1,]                 1.3511844
## [2,]                 0.7605324
## [3,]                -1.2685573
## [4,]                -1.4636872
## [5,]                -0.9792030
## [6,]                 1.5085675

Plotting the Time Series:

plot(OzoneTS, ylab = 'Ozone Thickness Change', xlab = 'Year', type = 'o', 
     main = 'Time Series plot for the Ozone Thickness change.')

From the above plot, we can comment on the five components of Time Series:
1. Trend: A continuous downward trend is observed till 2015 and then slightly increment in the year 2016.
2. Change in Variance: No change in variance is observed.
3. Seasonality: No Seasonality is observed.
4. Intervention: No Intervention point in the series is observed.
5. Behavior: Oscillating behavior with some successive points is observed which shows Auto-regressive and Moving Average Behavior.

Let’s find if there is any correlation between consecutive years’ data.
Creating variable for correlation

x = OzoneTS   
head(x)

##      Change.in.Ozone.Thickness
## [1,]                 1.3511844
## [2,]                 0.7605324
## [3,]                -1.2685573
## [4,]                -1.4636872
## [5,]                -0.9792030
## [6,]                 1.5085675

Creating second variable(with first lag of actual values) for correlation

y = zlag(OzoneTS)    
head(y)

## [1]         NA  1.3511844  0.7605324 -1.2685573 -1.4636872 -0.9792030

Creating index for removing first NA value in y

index = 2:length(x)

Finding correlation between current and past year’s data

cor(y[index], x[index])

## [1] 0.8700381

This shows very high correlation between successive years’ data
Let’s create a Scatter plot to see the correlation

plot(y = OzoneTS, x = zlag(OzoneTS), ylab = 'Ozone thickness change in current year',
     xlab = 'Ozone thickness change in previous year', 
     main = "Scatter plot of consecutive years' change in Ozone thickness.")

Scatter plot shows there is a very high correlation between consecutive years of ozone layer thickness with decreasing trend.

Let’s find if there is any correlation between current year’s data to couple of years back data.
Creating one LAG to see if there is any correlation between current year’s data to couple of years back data

z = zlag(zlag(OzoneTS))
head(z)

## [1]         NA         NA  1.3511844  0.7605324 -1.2685573 -1.4636872

Creating index for removing first two NA values in z

index = 3:length(x)

Finding correlation between current year’s and couple of years back data

cor(z[index], x[index])

## [1] 0.7198518

This shows high correlation of value 0.72 between current year’s and couple of years back data
Let’s create another Scatter plot to see the correlation

plot(y = OzoneTS, x = zlag(zlag(OzoneTS)), ylab = 'Ozone thickness change in current year',
     xlab = 'Ozone thickness change with 2 years of lag', 
     main = "Scatter plot of current year's data to couple of years back data.")

As expected we can observe high correlation between thickness of ozone layer with two years lag.

Fitting a Linear Trend model to our data

Fitting a Linear Trend Model and checking the summary of this model

linearModel <- lm(OzoneTS ~ time(OzoneTS))
summary(linearModel)

## 
## Call:
## lm(formula = OzoneTS ~ time(OzoneTS))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.7165 -1.6687  0.0275  1.4726  4.7940 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   213.720155  16.257158   13.15   <2e-16 ***
## time(OzoneTS)  -0.110029   0.008245  -13.34   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.032 on 88 degrees of freedom
## Multiple R-squared:  0.6693, Adjusted R-squared:  0.6655 
## F-statistic: 178.1 on 1 and 88 DF,  p-value: < 2.2e-16

From the summary, we conclude: The estimates of slope is -0.110029 which defines that there will be decrement in thickness per year. Pr(>|t|) is less than 0.05 hence our estimate of slope is significant which further explains that the our time has significant contribution to explain time series. The p-value is also less than 0.05, so finally we can conclude that our model is significantly explaining the series. But through MAx residuals value and Adjusted R-squared we can say that there is still residuals left and trend is not properly captured by the Linear Trend Model.
Fitting the model on our graph to see if it is a good fit

plot(OzoneTS, ylab = 'Ozone Thickness Change', xlab = 'Year', type = 'o', 
     main = 'Time Series plot for Ozone Thickness change With Fitted Linear Model.')
abline(linearModel, col = '#0066FF')

Even though the model does good job in predicting the values, we can have better models that can capture more data.

Fitting a Quadratic Trend model to our data

Extracting time from the Time Series object

t <- time(OzoneTS)
t2 <- t^2
t

## Time Series:
## Start = 1927 
## End = 2016 
## Frequency = 1 
##  [1] 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941
## [16] 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956
## [31] 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971
## [46] 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986
## [61] 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001
## [76] 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016

t2

## Time Series:
## Start = 1927 
## End = 2016 
## Frequency = 1 
##  [1] 3713329 3717184 3721041 3724900 3728761 3732624 3736489 3740356 3744225
## [10] 3748096 3751969 3755844 3759721 3763600 3767481 3771364 3775249 3779136
## [19] 3783025 3786916 3790809 3794704 3798601 3802500 3806401 3810304 3814209
## [28] 3818116 3822025 3825936 3829849 3833764 3837681 3841600 3845521 3849444
## [37] 3853369 3857296 3861225 3865156 3869089 3873024 3876961 3880900 3884841
## [46] 3888784 3892729 3896676 3900625 3904576 3908529 3912484 3916441 3920400
## [55] 3924361 3928324 3932289 3936256 3940225 3944196 3948169 3952144 3956121
## [64] 3960100 3964081 3968064 3972049 3976036 3980025 3984016 3988009 3992004
## [73] 3996001 4000000 4004001 4008004 4012009 4016016 4020025 4024036 4028049
## [82] 4032064 4036081 4040100 4044121 4048144 4052169 4056196 4060225 4064256

Fitting a Quadratic Trend Model and checking the summary of this model

quadraticModel <- lm(OzoneTS ~ t + t2)
summary(quadraticModel)

## 
## Call:
## lm(formula = OzoneTS ~ t + t2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.1062 -1.2846 -0.0055  1.3379  4.2325 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -5.733e+03  1.232e+03  -4.654 1.16e-05 ***
## t            5.924e+00  1.250e+00   4.739 8.30e-06 ***
## t2          -1.530e-03  3.170e-04  -4.827 5.87e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.815 on 87 degrees of freedom
## Multiple R-squared:  0.7391, Adjusted R-squared:  0.7331 
## F-statistic: 123.3 on 2 and 87 DF,  p-value: < 2.2e-16

From the summary, we conclude:
The estimates of intercept, quadratic term and Pr(>|t|) is less than 0.05 which defines that they are significant, which explains that the quadratic term(t^2) time has significant contribution to explain time series. The p-value is also less than 0.05 and we are getting improved Adjusted R-squared value, so finally we can conclude that our model is significantly explaining the series and we are getting a better fit. But through MAx residuals value and Adjusted R-squared we can say that there are still residuals left and trend is not properly captured by the Quadratic Trend Model.
Fitting the model on our graph to see if it is a good fit

plot(ts(fitted(quadraticModel)), ylab = 'Ozone Thickness Change', 
     ylim = c(min(c(fitted(quadraticModel), as.vector(OzoneTS))),
              max(c(fitted(quadraticModel), as.vector(OzoneTS)))), 
              main = 'Fitted Quadratic Curve for the Ozone Thickness series.', col = '#0066FF')
lines(as.vector(OzoneTS), type = 'o')

As we can see, quadratic trend model does better job than linear trend model but residuals are still high. We will try to improve our predictions using a different model.

Fitting a Cosine Trend model to our data

Creating the cosine-sine harmonics for our data

har. = harmonic(OzoneTS, 0.45)
head(har.)

##      cos(2*pi*t)   sin(2*pi*t)
## [1,]           1 -3.938192e-13
## [2,]           1  2.667406e-13
## [3,]           1 -8.916889e-13
## [4,]           1 -2.311291e-13
## [5,]           1  4.294307e-13
## [6,]           1 -7.289989e-13

Creating data & fitting a Cosine model and checking the summary of this model

cosineData <- data.frame(OzoneTS, har.)
cosineModel <- lm(OzoneTS ~ cos.2.pi.t. + sin.2.pi.t. , data = cosineData)
summary(cosineModel)

## 
## Call:
## lm(formula = OzoneTS ~ cos.2.pi.t. + sin.2.pi.t., data = cosineData)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.3520 -1.8906  0.4837  2.3643  6.4248 
## 
## Coefficients: (1 not defined because of singularities)
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.970e+00  4.790e-01  -6.199 1.79e-08 ***
## cos.2.pi.t.         NA         NA      NA       NA    
## sin.2.pi.t.  5.462e+11  7.105e+11   0.769    0.444    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.522 on 88 degrees of freedom
## Multiple R-squared:  0.006672,   Adjusted R-squared:  -0.004616 
## F-statistic: 0.5911 on 1 and 88 DF,  p-value: 0.4441

From the summary we understand cos & sin parameters has no significance at all on change in values of our data. We also see great decrease in R-squared value which indicates that our data does not follow cosine trend. P-value of the harmonic model suggests that the model is insignificant as the p vale is greater than 0.05.
Fitting the model on our graph to see if it is a good fit

plot(ts(fitted(cosineModel)), ylab = 'Ozone Thickness Change', 
     ylim = c(min(c(fitted(cosineModel), as.vector(OzoneTS))),
              max(c(fitted(cosineModel), as.vector(OzoneTS)))), 
     main = 'Fitted Cosine Curve for the Ozone Thickness series.', col = '#0066FF')
lines(as.vector(OzoneTS), type = 'o')

From the above plot we can observe that cosine model does very bad job than previous two models in predicting. Therefore, we will make our forecast using Quadratic Model. But before that, let’s do residual analysis to confirm that Quadratic Trend Model is best out of all.

Validating the modules using various residual analysis techniques:

Standard Residual Analysis for Linear Trend Model

Plotting Standardised residuals for Linear Trend Model

res.linearModel = rstudent(linearModel)
plot(y = res.linearModel, x = as.vector(time(OzoneTS)),
     xlab = 'Time', ylab='Standardised Residuals',type='o',
     main = "Standardised residuals from Linear Trend Model.", col = '#0066FF')

Plotting Histogram Plot of Standardised residuals for Linear Trend Model

hist(res.linearModel,xlab='Standardised Residuals', main = "Histogram of standardised residuals of Linear Trend Model.",
     col = '#0066FF')

Quantile-Quantile(QQ) Plot of Standardised residuals for Linear Trend Model

qqnorm(y=res.linearModel, main = "QQ plot of standardised residuals of Linear Trend Model.")
qqline(y=res.linearModel, col = 2, lwd = 2, lty = 2)

Shapiro-Wilk test for Linear Trend Model

shapiro.test(res.linearModel)

## 
##  Shapiro-Wilk normality test
## 
## data:  res.linearModel
## W = 0.98733, p-value = 0.5372

Autocorrelation Function plots for Linear Model

acf(res.linearModel, main = "ACF of standardized residuals of Linear Trend Model.", col = 'red')

pacf(res.linearModel, main = "PACF of standardized residuals of Linear Model.", col = 'red')

## Result Interpretation of the Residual Analysis of Linear Trend Model:
1. Histogram: Although this plot is somewhat symmetric, it has tails at both the high and low ends.
2. Quantile-quantile (QQ-plot): At the low and high ends, there is a slight change away from the reference line. As a result, there is no noticeable white noise behavior.
3. Shapiro-Wilk Test: P-value is 0.5372 which doesn’t allow us to reject null hypothesis, so we can conclude that stochastic component of the linear trend model is normally distributed.
4. Sample Autocorrelation Function (ACF): The linear trend model has three significant lags, demonstrating that the sequence is not white noise. In the ACF, we also see a consistent pattern.
5. Partial Sample Autocorrelation Function (PACF): The linear trend model has two significant lags and one minor lag, demonstrating that the sequence is not white noise.

Standard Residual Analysis for Quadratic Trend Model

Plotting Standardised residuals for Quadratic Trend Model

res.quadraticModel = rstudent(quadraticModel)
plot(y = res.quadraticModel, x = as.vector(time(OzoneTS)),
     xlab = 'Time', ylab='Standardised Residuals',type='o',
     main = "Standardised residuals from Quadratic Trend Model.", col = '#0066FF')

Plotting Histogram Plot of Standardised residuals for Quadratic Trend Model

hist(res.quadraticModel,xlab='Standardised Residuals', main = "Histogram of standardised residuals of Quadratic Trend Model.",
     col = '#0066FF')

Quantile-Quantile(QQ) Plot of Standard residuals for Quadratic Trend Model

qqnorm(y=res.quadraticModel, main = "QQ plot of standardised residuals of Quadratic Trend Model.")
qqline(y=res.quadraticModel, col = 2, lwd = 2, lty = 2)

Shapiro-Wilk test for Quadratic Trend Model

shapiro.test(res.quadraticModel)

## 
##  Shapiro-Wilk normality test
## 
## data:  res.quadraticModel
## W = 0.98889, p-value = 0.6493

Autocorrelation Function plots for Quadratic Trend Model

acf(res.quadraticModel, main = "ACF of standardized residuals of Quadratic Trend Model.", col = 'red')

pacf(res.quadraticModel, main = "PACF of standardized residuals of Quadratic Model.", col = 'red')

## Result Interpretation of the Residual Analysis of Quadratic Trend Model:
1. Histogram: This plot is asymmetric, representing non-normal distribution.
2. Quantile-quantile (QQ-plot): At the low and high ends, points are slightly away from the reference line. As a result, there is noticeable trend and hence it cannot be considered as a white noise.
3. Shapiro-Wilk Test: P-value is 0.6493 which doesn’t allow us to reject null hypothesis, so we can conclude that stochastic component of the quadratic trend model is normally distributed.
4. Sample Autocorrelation Function (ACF) & Partial Sample Autocorrelation Function (PACF): From the ACF and PACF it can be considered that the there are some significant auto correlations not captured by the quadratic trend model which implies it’s a stochastic trend rather than a deterministic trend.

Standard Residual Analysis for Cosine Trend Model

Plotting Standardised residuals for Cosine Trend Model

res.cosineModel = rstudent(cosineModel)
plot(y = res.cosineModel, x = as.vector(time(OzoneTS)),
     xlab = 'Time', ylab='Standardised Residuals',type='o',
     main = "Standardised residuals from Cosine Trend Model.", col = '#0066FF')

Plotting Histogram Plot of Standardised residuals for Cosine Trend Model

hist(res.cosineModel,xlab='Standardised Residuals', main = "Histogram of standardised residuals of Cosine Trend Model.",
     col = '#0066FF')

Quantile-Quantile(QQ) Plot of Standard residuals for Cosine Trend Model

qqnorm(y=res.cosineModel, main = "QQ plot of standardised residuals of Cosine Trend Model.")
qqline(y=res.cosineModel, col = 2, lwd = 2, lty = 2)

Shapiro-Wilk test for Cosine Trend Model

shapiro.test(res.cosineModel)

## 
##  Shapiro-Wilk normality test
## 
## data:  res.cosineModel
## W = 0.95856, p-value = 0.005875

Autocorrelation Function plots for Cosine Trend Model

acf(res.cosineModel, main = "ACF of standardized residuals of Cosine Trend Model.", col = 'red')

pacf(res.cosineModel, main = "PACF of standardized residuals of Cosine Trend Model.", col = 'red')

## Result Interpretation of the Residual Analysis of Cosine Trend Model:
1. Histogram: This plot is asymmetric, representing non-normal distribution showing left-skewness.
2. Quantile-quantile (QQ-plot): At the low and high ends, points are really away from the reference line. As a result, there is noticeable trend and hence it cannot be considered as a white noise.
3. Shapiro-Wilk Test: P-value is 0.005875 which allow us to fail to reject null hypothesis, so we can conclude that stochastic component of the cosine trend model is not normally distributed.
4. Sample Autocorrelation Function (ACF) & Partial Sample Autocorrelation Function (PACF): From the ACF a slowly decaying pattern can be observed which indicates a trend in the series. Therefore, it can be considered as a non-stationary. In PACF, there is one significant lag and one mild lag, considering all the points we can say that cosine trend model is not as significant as quadratic trend model.

Best Model:

When comparing the three models discussed above, we can infer that the quadratic trend model is a better fit since it better explains the variance in the series and takes into account the coefficient of determination, namely Adjusted-R2.
However, the residual analysis and the model both fail to cover all of the patterns in the sequence, implying that none of the series is the best match. Further analysis can be carried out, especially when using the ARIMA model, as both AR and MA signs can be seen in the time series presented above.

Predicting the next 5 year’s change in Ozone layer thickness using Quadratic Model

Creating a data frame of the time series to be predicted

n <- 4
futureTimes <- data.frame(t = seq(2017, 2017+n, 1), t2 = (seq(2017, 2017+n, 1))^2)

quadraticForecast <- predict(quadraticModel, futureTimes, interval = 'prediction')
quadraticForecast

##         fit       lwr       upr
## 1 -10.34387 -14.13556 -6.552180
## 2 -10.59469 -14.40282 -6.786548
## 3 -10.84856 -14.67434 -7.022786
## 4 -11.10550 -14.95015 -7.260851
## 5 -11.36550 -15.23030 -7.500701

Plotting the predictions to see how it goes with the actual data

plot(OzoneTS, xlim = c(1927, 2023), ylim = c(-15, 5), ylab = 'Ozone thickness series',
     main = 'Quadratic Model forecast for next 5 years.')
lines(ts(as.vector(quadraticForecast[,3]), start = c(2017,1), frequency = 1), col = 'blue', type = 'l', lwd = 2)
lines(ts(as.vector(quadraticForecast[,1]), start = c(2017,1), frequency = 1), col = 'red', type = 'l', lwd = 2)
lines(ts(as.vector(quadraticForecast[,2]), start = c(2017,1), frequency = 1), col = 'blue', type = 'l', lwd = 2)
legend('bottomleft', lwd = 1, pch = 1, col = c('black', 'blue', 'red'), text.width = 11,
       c('Data', '5% of Forecast limits', 'Forecasts'))

TASK 2

set.seed(92397)

Plotting time series object plot to check stationarity of time series

plot(OzoneTS, ylab = 'Ozone Thickness Change', xlab = 'Year', type = 'o', 
     main = 'Time Series plot for the Ozone Thickness change.')

Plotting ACF and PACF

acf(OzoneTS, main = "ACF of the Time Series plot for the Ozone Thickness change.", col = 'red')

pacf(OzoneTS, main = "PACF of the Time Series plot for the Ozone Thickness change.", col = 'red')

From ACF we can see slowly decaying pattern which confirms the trend and PACF has one very significant lag which proves that our time series is Non-Stationary.

The Dickey-Fuller Unit-Root Test:

require(tseries)
adf.test(OzoneTS, alternative = c("stationary", "explosive"),
         k = trunc((length(x)-1)^(1/3)))

## 
##  Augmented Dickey-Fuller Test
## 
## data:  OzoneTS
## Dickey-Fuller = -3.2376, Lag order = 4, p-value = 0.0867
## alternative hypothesis: stationary

In ADF test we are getting p-value > 0.05 so we can’t reject H0, hence we can conclude timeseries is non-stationary.
To deal with non-stationary we will do Transformation(Box-Cox) and Differencing to check which one is making series Stationary.

First, try differencing transformations with difference = 1 to see if it helps.
Plotting time series to see the first difference in time series

OzoneTSDiff <- diff((OzoneTS), differences = 1)
plot(OzoneTSDiff, ylab='Differencing of Ozone thickness change', xlab='Time', type='o', main='First Differncing-Transformation of Ozone Thickness Change.')

From plot we can say series has been normalised up to a good extent.
## The Dickey-Fuller Unit-Root Test for first differecing series:

adf.test(OzoneTSDiff, alternative = c("stationary", "explosive"),
         k = trunc((length(x)-1)^(1/3)))

## Warning in adf.test(OzoneTSDiff, alternative = c("stationary", "explosive"), :
## p-value smaller than printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  OzoneTSDiff
## Dickey-Fuller = -7.1568, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary

p-value is smaller than 0.05 so we can accept the H0, hence we can conclude series is stationary
Checking normality using QQ-Plot

qqnorm(y=OzoneTSDiff, main = "QQ plot of Differencing Ozone Thickness Change.")
qqline(y=OzoneTSDiff, col = 2, lwd = 1, lty = 2)

QQ-Plot also shows series has become stationary.
Trying differencing transformations with difference = 2 to see if it helps.
Plotting time series to see the second difference in time series

OzoneTSDiff2 <- diff((OzoneTS), differences = 2)
plot(OzoneTSDiff2, ylab='Differencing of Ozone thickness change', xlab='Time', type='o', main='First Differncing-Transformation of Ozone Thickness Change')

From plot we can say series has not been changed much.
## The Dickey-Fuller Unit-Root Test for second differencing series:

adf.test(OzoneTSDiff2, alternative = c("stationary", "explosive"),
         k = trunc((length(x)-1)^(1/3)))

## Warning in adf.test(OzoneTSDiff2, alternative = c("stationary", "explosive"), :
## p-value smaller than printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  OzoneTSDiff2
## Dickey-Fuller = -7.6128, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary

The p-value is same as first differencing.
Second differencing is not making any improvement so we will proceed with Box-Cox to see if makes any improvement
Let’s apply Box-Cox transformation to see if there is any improvement in Differencing transformation.
Since values in OzoneTS are negative we will change the range of lambda to get rid of this case using the lambda argument.

options(warn=-1)
BC = BoxCox.ar(OzoneTS + abs(min(OzoneTS))+1, lambda = seq(-1, 1, 0.01))

Finding values of the first and third vertical lines
To find the lambda value of the middle vertical line

BC$ci

## [1] 0.84 1.00

lambda <- BC$lambda[which(max(BC$loglike) == BC$loglike)]
lambda

## [1] 1

## Applying Box-Cox transformation:
Plotting Box-Cox transformation

OzoneTS = OzoneTS + abs(min(OzoneTS))+1
BC.OzoneTS = (OzoneTS^lambda-1)/lambda
plot(BC.OzoneTS,type='o',xlab='Time',ylab='Box-Cox transformed of Ozone Thickness', main ="Time series plot of BC-transformed Ozone Thickness.")

## The Dickey-Fuller Unit-Root Test to check Stationarity:

adf.test(BC.OzoneTS, alternative = c("stationary", "explosive"),
         k = trunc((length(x)-1)^(1/3)))

## 
##  Augmented Dickey-Fuller Test
## 
## data:  BC.OzoneTS
## Dickey-Fuller = -3.2376, Lag order = 4, p-value = 0.0867
## alternative hypothesis: stationary

p-value is larger than 0.05 so we will reject the H0, and will conclude BOx-Cox is making series non-stationary.
Therefore, we will proceed with first differencing
Plotting ACF and PACF plots to find the optimal values of p and q with first differencing time series

acf(OzoneTSDiff, main = "ACF of the first difference of Ozone Thickness Change.", col = 'red')

pacf(OzoneTSDiff, main = "PACF of the first difference of Ozone Thickness Change.", col = 'red')

Possible values of Q can be 1 and 0 as we can see a little trend in the plot.
Possible values of P can be 1 and 2 as two bars are significant.
Possible models can be: ARIMA(0,1,1);ARIMA(0,1,2);ARIMA(1,1,2);ARIMA(1,1,1).

EACF Plot

eacf(OzoneTSDiff)

## AR/MA
##   0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 o o x o o o x o o x o  o  o  o 
## 1 x o x o o o o o o x o  o  o  o 
## 2 x o x o o o x o o x o  o  o  o 
## 3 x o x o o x o o o o o  o  o  o 
## 4 x o o x o x o o o o o  o  o  o 
## 5 x x x x o x o o o o o  o  o  o 
## 6 o o o x x o o o o o o  o  o  o 
## 7 o o o x o o o o o o o  o  o  o

Possible values of p can be 0 and q will be 0,1 and diiferencing will be 1
ARIMA models can be with first diiferencing will be: ARIMA(0,1,0); ARIMA(0,1,1)

Plotting BIC Table:

OzoneBIC=armasubsets(y=OzoneTSDiff,nar=10,nma=10,y.name='test', ar.method='ols')

## Reordering variables and trying again:

plot(OzoneBIC)

ARIMA(10,1,6); ARIMA(10,1,7);ARIMA(3,1,5);ARIMA(3,1,6) but these models are very big so we will reduce nar and mar

OzoneBIC5=armasubsets(y=OzoneTSDiff,nar=5,nma=5,y.name='test', ar.method='ols')

plot(OzoneBIC5)

Possible ARIMA Models from BIC Table are ARIMA(3,1,0); ARIMA(3,1,2)

Conclusion

After fitting all the models, we can confirm that Quadratic Trend Model successfully follows the decreasing pattern of the original series. The data used to forecast changes in the thickness of the ozone layer shows that the sequence is not stationary. The trend component is not nullified because mean value in the time series is not constant. The seasonality effect is minimal, indicating that there is no seasonal pattern or trend. The set of possible ARIMA Models after plotting ACF, PACF, EACF and BIC Table are:
ARIMA(3,1,0), ARIMA(3,1,2), ARIMA(0,1,1), ARIMA(0,1,0), ARIMA(1,1,1), ARIMA(1,1,2) and ARIMA(0,1,2).

REFERENCES

MATH1318 Time Series Analysis notes prepared by Dr. Haydar Demirhan.
Time Series Analysis with application in R by Cryer and Chan.

Time Series Analysis for Thickness of Ozone data

Shristi Chavhan

S3822713