Introduction

Ozone (O3) is a gas which is present naturally within Earth’s atmosphere. The earth’s ozone layer is responsible for absorbing most of the Sun’s ultraviolet radiation. Ritchie and Roser (2020)

In this report, we analysed and modeled the thickness of Ozone layer in Dobson units. The data represent yearly changes in Ozone layer thickness from 1927 to 2016. A negative value in the dataset represents a decrease in the thickness and a positive value represents an increase in the thickness.

We will apply time series analysis technique to the data and fit the best model to predict how is the changes in the ozone layer will be in the future. This will be important to inform the public of the state of the Ozone layer. It can also influence the environmental governments and large organisations around the world.

Scope

We model the data using two approaches: regression and time series approach.

For the linear regression approach we will be:

  1. Find the best fitting model among the linear, quadratic, cosine, cyclical or seasonal trend models.

  2. Give the predictions of yearly changes for the next 5 years using the best model.

For the time series approach, we will be propose a set of possible ARIMA(p, d, q) models.

Method

Using R version 3.6.1 and the TSA packages, we will visualise the time series data with graphs. We will also use model specification tool such as ACF-PACF, EACF, BIC table to justify model selections.

Data

The data represent yearly changes in Ozone layer thickness from 1927 to 2016. A negative value in the dataset represents a decrease in the thickness and a positive value represents an increase in the thickness.

Fig 1: Time series plot of ozone layer thickess change

Fig 1: Time series plot of ozone layer thickess change

Looking at figure 1, we note that the mean is not constant. There is a downward trend to the data. In the following sections, we will look to determine which model that can best fit this data.

Task 1: Linear regression

Simple linear regression

Applying the deterministic simple linear trend model

\[ \mu_t = \beta_0 +\beta_1 t \]

## 
## Call:
## lm(formula = ts ~ time(ts))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.7165 -1.6687  0.0275  1.4726  4.7940 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 213.720155  16.257158   13.15   <2e-16 ***
## time(ts)     -0.110029   0.008245  -13.34   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.032 on 88 degrees of freedom
## Multiple R-squared:  0.6693, Adjusted R-squared:  0.6655 
## F-statistic: 178.1 on 1 and 88 DF,  p-value: < 2.2e-16

We see here \(\beta_1 = -0.11\) with a t value of -13.34. This means that the slope is significant and there is a trend to the data. This also means that the mean is not constant and the time series data is not stationary.

The simple linear model explains 66.93% of the variation in the data.

Plotting the linear regression line against the data.

Fig 2: Simple Linear model of ozone thickness change

Fig 2: Simple Linear model of ozone thickness change

We see what while the linear model capture the direction of trend, at the beginning and at the end the data falls consistently under the line.

Analyzing the residuals of the data by plotting a scatter plot:

Fig 3: Scatter plot of 
 linear model residual

Fig 3: Scatter plot of linear model residual

There is still a trend in the residuals. Seems to be below the mean line at the beginning and end.

Assessing the distribution of the residuals:

Fig 4: Distribution of standardized residuals from linear model

Fig 4: Distribution of standardized residuals from linear model

The figure indicates that it is a pretty good fit. There are some data that falls outside the normal curve. Indicating some kurtosis. Overall symmetry is good.

We can look further by analyzing it with a qq-plot

Fig 5: QQ Plot of linear model residuals

Fig 5: QQ Plot of linear model residuals

We see the drift away from normal at the beginning and at the end is more pronounced in the qqplot. However, the data fit the normal line quite well in the middle.

Conduction a Shapiro test for normality:

## 
##  Shapiro-Wilk normality test
## 
## data:  y
## W = 0.98733, p-value = 0.5372

Result have a p-value greater than 0.05, we fail to reject the data is normal.

Checking the residuals’ sample auto-correlation function(ACF)

Fig 6: ACF of linear model residuals

Fig 6: ACF of linear model residuals

However, looking at the auto-correlation function, we see there is correlation values higher than the confidence bound at the first lag. This is not what we expect from a white noise process.

Quadratic regression

The deterministic quadratic trend model is expressed as follows:

\[ \mu_t = \beta_0 + \beta_1 t + \beta_2 t^2 \] Fitting the ozone data with a quadratic regression model:

## 
## Call:
## lm(formula = ts ~ t + t2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.1062 -1.2846 -0.0055  1.3379  4.2325 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -5.733e+03  1.232e+03  -4.654 1.16e-05 ***
## t            5.924e+00  1.250e+00   4.739 8.30e-06 ***
## t2          -1.530e-03  3.170e-04  -4.827 5.87e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.815 on 87 degrees of freedom
## Multiple R-squared:  0.7391, Adjusted R-squared:  0.7331 
## F-statistic: 123.3 on 2 and 87 DF,  p-value: < 2.2e-16

Note all the coefficients are are significant. Also the model have a R-squared value of 0.7391 indicating explaining 73.91% of the variance in the ozone change data.

Fitted quadratic trend is shown in figure 7:
Fig 7: Quadratic model to change in ozone thickeness

Fig 7: Quadratic model to change in ozone thickeness

It looks lie a better fit. As indicated by the higher R-squared value.

Analyzing residuals of the quadratic model

Fig 8: Residuals of fitted quadratic model

Fig 8: Residuals of fitted quadratic model

We see the mean sits very close to 0. The variance looks to be constant also.

Analysis of quadratic model residuals:

Fig 9: Distribution of standardized residuals from quadratic model

Fig 9: Distribution of standardized residuals from quadratic model

There is good symmetry. Although still some data falls outside of the normal curves. We further analyse it via the qqplot.

Fig 10: Q-Q Plot for residuals of quadratic model

Fig 10: Q-Q Plot for residuals of quadratic model

The qq plot fits quite closely. There is some drift away at the beginning and end also. However this drift is much less than the simple linear model.

Checking the normality with the Shapiro test:

## 
##  Shapiro-Wilk normality test
## 
## data:  y
## W = 0.98889, p-value = 0.6493

The p value is larger than 0.05. This means we fail to reject the \(H_0\) that the residuals are normally distributed.

Final check on the sample auto-correlation function(ACF)
Fig 11: Autocorrelation plot of residuals for quadratic model

Fig 11: Autocorrelation plot of residuals for quadratic model

There is still significant correlation at lag 1, 3 and 4. This indicate that there is still some trend in the residuals. Further evaluation needed.

Seasonality / Cyclic model

Data is in years. Hence no season.

We can check if there is a cycle. Plotting the data and checking to see if there is any repeated patters. Initially the data looks to be drop and fall around a period of 7 years.

Fig 12: Seasonality analysis

Fig 12: Seasonality analysis

It does not look like there is any patters. Initially looks to be a peak and trough of around 7 years with trough at year 6. Then the trough happens at year 3 and 2 more.

The trend also does not have a constant mean. Therefore does not follow a cosine wave pattern.

When applying a cyclic harmonic model to the data:

## 
## Call:
## lm(formula = ts ~ har.)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.8294 -1.8422  0.7481  2.4701  5.8635 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -3.1949     0.3700  -8.636  2.5e-13 ***
## har.cos(2*pi*t)   0.7386     0.5226   1.413    0.161    
## har.sin(2*pi*t)  -0.2544     0.5239  -0.486    0.628    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.509 on 87 degrees of freedom
## Multiple R-squared:  0.02487,    Adjusted R-squared:  0.002453 
## F-statistic: 1.109 on 2 and 87 DF,  p-value: 0.3344

We get a model with very small R square at 0.02487.

Also the p values for both the cos and sin terms are not significant. Therefore we can reject the simple cyclic model as one of the models to fit to this data.

We try to fit a cyclic model with a linear trend.

## 
## Call:
## lm(formula = ts ~ t + har.)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.2187 -1.6011 -0.0105  1.1970  4.2311 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     213.141767  15.988761  13.331   <2e-16 ***
## t                -0.109732   0.008109 -13.532   <2e-16 ***
## har.cos(2*pi*t)   0.560458   0.297409   1.884   0.0629 .  
## har.sin(2*pi*t)  -0.396451   0.298047  -1.330   0.1870    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.995 on 86 degrees of freedom
## Multiple R-squared:  0.6884, Adjusted R-squared:  0.6775 
## F-statistic: 63.32 on 3 and 86 DF,  p-value: < 2.2e-16

Again, while,the intercept and t value is significant, the coefficients to the cyclic model is not. Hence the model is not significant enough to be considered.

Looking further into the model’s residuals:

# Shapiro test for normality
shapiro.test(y)
## 
##  Shapiro-Wilk normality test
## 
## data:  y
## W = 0.98889, p-value = 0.6493

The shapiro test is finding significant evidence that the data is not normally distributed. This finding confirms this cyclic model is not a good model to use here.

Predictions of yearly changes for the next 5 years using the best model

Since the quadratic model have a largest R-squared value and normally distributed residuals, it is the better deterministic model to use for prediction.

Fitting the data and forecasting 5 years ahead, we get:

Fig 14: 5 Year forecast at 95% Confidence Interval with Quadratic Model

Fig 14: 5 Year forecast at 95% Confidence Interval with Quadratic Model

Task 1 Summary

  1. Best model we found was the quadratic model which explains 73.91% of the variance.
  2. Both the simple linear and quadratic model had normally distributed residuals.
  3. Both the simple linear and and quadratic model still have residuals that had significant auto-correlation in it.
  4. The cyclical model did not have significant coefficients. Also the R-squared value is 2.49%. This suggest the cyclic model is not good to use for predicting the ozone thickness change.

Next step we will try to model the data with stochastic models.

Task 2: Integrated auto-regressive moving average model

The aim of this section is to propose a set of possible ARIMA(p, d, q) models using suitable model specification tools.

Testing for stationary

From linear regression section, figure 2, we have already determined there is the presence of a downward trend in the data. Hence implies non-stationary. The succeeding behavior of the data assistance of auto-regressive behavior

Figure 2 is outputted below again for easy of reading:

Fig 2: Simple Linear model of ozone thickness change

Fig 2: Simple Linear model of ozone thickness change

Plotting the ACF and PACF to analyse the trends

Fig 15: ACF of ozone series with sophisticated standard error bounds

Fig 15: ACF of ozone series with sophisticated standard error bounds

Fig 16: PACF of ozone series

Fig 16: PACF of ozone series

Slowly decreasing ACF and high first PACF with sharp drop confirmed finding from the previous analysis on the existence of trend. The wave pattern in the ACF suggest some seasonality in the data also.

Conducting the ADF test on the ozone data:

## 
## Call:
## ar(x = ts)
## 
## Coefficients:
##       1        2        3        4  
##  0.9807  -0.1417  -0.2581   0.2985  
## 
## Order selected 4  sigma^2 estimated as  3.242

Using AIC for the ozone data, we find \(k=4\) to be used in the ADF test

## Loading required package: timeDate
## 
## Attaching package: 'timeDate'
## The following objects are masked from 'package:TSA':
## 
##     kurtosis, skewness
## Loading required package: timeSeries
## Loading required package: fBasics
## 
## Title:
##  ADF Test with No constant nor time trend
## 
## Test Results:
##   PARAMETER:
##     Lag Order: 4
##   STATISTIC:
##     Dickey-Fuller: 0.5699
##   P VALUE:
##     0.7942 
## 
## Description:
##  Mon May 04 10:32:56 2020 by user: chris

ADF test with no constant nor time trend have a p-value 0.7942. Fail to reject the null of nonstationarity at a 10% significance level.

## 
## Title:
##  ADF Test with constant no time trend
## 
## Test Results:
##   PARAMETER:
##     Lag Order: 4
##   STATISTIC:
##     Dickey-Fuller: -0.4361
##   P VALUE:
##     0.8924 
## 
## Description:
##  Mon May 04 10:32:56 2020 by user: chris

The ADF test with an constant but no time trend have a p-value of 0.8924. Also fail to reject the null of nonstationarity at a 10% significance level.

## 
## Title:
##  ADF Test with constant and with time trend
## 
## Test Results:
##   PARAMETER:
##     Lag Order: 4
##   STATISTIC:
##     Dickey-Fuller: -3.2376
##   P VALUE:
##     0.0867 
## 
## Description:
##  Mon May 04 10:32:56 2020 by user: chris

The ADF test with an constant and a time trend have a p-value of 0.0867. At 10% significance we can reject the null of nonstationarity. This test suggest that the ozone difference data is trend stationary.

Two factors to consider here are the lag is low at 4. Additionally the p-value with a time trend is still high at .0867. So at 5% significance level we will still fail to reject the null of non-stationary.

Changing variance

Initial view of the data, the variance does not look to be changing very much. To make sure, we need to make sure by checking if it is worth to transform the data to reduce variance.

Fig 17: Finding the best lambda

Fig 17: Finding the best lambda

The confidence interval of lambda includes 1. So the Box Cox transformation does not suggest strongly we should do some transformation on the data Therefore we can just use 1 and do not have to do a power or log transformation on the data. This also is supported by the visual of the ozone data where variance does not appear to significantly change over time.

Applying difference

We found that there is a trend in the data and the variance is not changing significant enough to warrant a log or power transformation. So now we take the first difference of the ozone data.

Fig 18: Plot of the first difference in the ozone layer series

Fig 18: Plot of the first difference in the ozone layer series

We see the data appears to have stabilized. To test this observation, apply the ADF test.

## 
## Call:
## ar(x = ts_diff)
## 
## Coefficients:
##       1        2        3        4        5        6  
## -0.1976  -0.2628  -0.6019  -0.3064  -0.2253  -0.3045  
## 
## Order selected 6  sigma^2 estimated as  2.232

Using k = 6 for the ADF test on the differenced ozone data.

## Warning in adfTest(ts_diff, lags = 6, type = "nc", title = "ADF Test with No
## constant nor time trend"): p-value smaller than printed p-value
## 
## Title:
##  ADF Test with No constant nor time trend
## 
## Test Results:
##   PARAMETER:
##     Lag Order: 6
##   STATISTIC:
##     Dickey-Fuller: -5.0757
##   P VALUE:
##     0.01 
## 
## Description:
##  Mon May 04 10:32:58 2020 by user: chris
## Warning in adfTest(ts_diff, lags = 6, type = "c", title = "ADF Test with
## constant no time trend"): p-value smaller than printed p-value
## 
## Title:
##  ADF Test with constant no time trend
## 
## Test Results:
##   PARAMETER:
##     Lag Order: 6
##   STATISTIC:
##     Dickey-Fuller: -5.5901
##   P VALUE:
##     0.01 
## 
## Description:
##  Mon May 04 10:32:58 2020 by user: chris
## Warning in adfTest(ts_diff, lags = 6, type = "ct", title = "ADF Test with
## constant and with time trend"): p-value smaller than printed p-value
## 
## Title:
##  ADF Test with constant and with time trend
## 
## Test Results:
##   PARAMETER:
##     Lag Order: 6
##   STATISTIC:
##     Dickey-Fuller: -5.7327
##   P VALUE:
##     0.01 
## 
## Description:
##  Mon May 04 10:32:58 2020 by user: chris

All 3 test concerning with with - no constant nor time trend, - with constant no time trend,
- with constant and with time trend
all resulted in a significant p value of less than 0.01. Therefore we can reject the null hypothesis and confirm we find statistically significant evidence that the time series is stationary after first differencing.

Specifying ARMA orders using differenced ozone data.

Using ACF and PACF

Fig 20: ACF plot of differenced ozone series

Fig 20: ACF plot of differenced ozone series

Fig 21, PACF of differenced ozone series

Fig 21, PACF of differenced ozone series

There is a vague wave pattern of oscillating pattern in the ACF. We have 3 significant values at lag 3, 7 and 10. Also have 3 significant value in the PACF at 4, 5, and 6.

Possible models

Since the presence of pattern of dampened sine wave. We can possibly say there is a AR presence. ARI(3,1) then can be picked because of the significant lags in pacf.

On the other hand, if we are picking from just the significant lags then ARIMA(3,1,3) is a potential model.

ARIMA(3,1,2), ARIMA(3,1,1), ARIMA(2,1,3), ARIMA(2,1,2), ARIMA(2,1,1), ARIMA(1,1,1), ARIMA(1,1,2), ARIMA(1,1,3), ARI(3,1), IMA(1,3) can all be considered as they are smaller models of ARIMA(3,1,3) for the ozone data.

Using EACF

## AR/MA
##   0 1 2 3 4 5 6 7 8 9
## 0 o o x o o o x o o x
## 1 x o x o o o o o o x
## 2 x o x o o o x o o x
## 3 x o x o o x o o o o
## 4 x o o x o x o o o o
## 5 x x x x o x o o o o
## 6 o o o x x o o o o o

The top left vertex in the EACF model starts from p=0 and q=3. So from there we can get MA(3) as a possible model. Further we can get the surrounding models of MA(4), ARMA(1,3) and ARMA(1,4) as possible models according to the EACF.

So from the EACF to model the ozone data, we add IMA(1,3), IMA(1,4), ARIMA(1,1,3), ARIMA(1,1,4) as potential models.

Using BIC

## Reordering variables and trying again:
Fig 22, BIC table of differenced ozone series

Fig 22, BIC table of differenced ozone series

The BIC table shaded columns correspond to AR(7) and MA(6).

Hence to model the ozone data, we can include ARIMA(7,1,6) as a potential model.

Conclusion

The best deterministic model to use to model the thickness changes in the ozone layer is the quadratic model.

For the stochastic models, the possible models are: - from ACF and PACF ARIMA(3,1,3) and all smaller models ARIMA(3,1,2), ARIMA(3,1,1), ARIMA(2,1,3), ARIMA(2,1,2), ARIMA(2,1,1), ARIMA(1,1,1), ARIMA(1,1,2), ARIMA(1,1,3), ARI(3,1), IMA(1,3) can be considered - from EACF IMA(1,3), IMA(1,4), ARIMA(1,1,3), ARIMA(1,1,4) - from BIC table ARIMA(7,1,6)

Here, IMA(1,3), ARIMA(1,1,3) are the models selected by two or more model specification tools.

Going further, we can perform tests to justify which is the best model out of the proposed ARIMA models. However, that is beyond the scope of this report.

Appendix

R codes to produce the graphs are attached:

library(TSA)
library(tseries)

#loading the data
ozone = read.table("D:/OneDrive - RMIT University/Current Studies/Timeseries Analysis/Assignment 1/data1.csv", sep = ",")

#converting data into time serise
ts = ts(ozone, start = 1927, end = 2016)

#Task 1

#ploting the time series plot for the data
plot(ts, type = 'o',
     main ="Time series plot of ozone layer thickess change",
     ylab = "Change in Ozone thickness",
     xlab = "Year")

#Linear Modelling the trend
model1 = lm(ts~time(ts)) #creating linear model as model1
summary(model1) #look at the stats of the linear model

#plot the linear model
plot(ts, type = 'o',
     main ="Simple Linear model of ozone thickness change",
     ylab = "Change in Ozone thickness",
     xlab = "Year")
abline(model1, col = 'red') #putting the linearline on the trend.

#residual of model1
res.model1 = rstudent(model1)
y = res.model1
plot(y = y, x = as.vector(time(ts)),
     xlab = 'Year', ylab='Standardized Residuals',type='p',
     main = "Residual of linear regression model of ozone thickness")
abline(h=mean(y), col='red')

#Analyse the distribution of the residuals
hist(y, xlab='Standardized Residuals', 
     main = "Distribution of standardized residuals from linear model
     with normal approximation overlay",
     freq=FALSE,
     ylim = c(0, 0.45))
m = round(mean(y), 4)
sd = round(sqrt(var(y)), 4)
abline(v = m, lwd = 2, col = 'darkred')
curve(dnorm(x, mean=m, sd=sd), 
      col="darkred", lwd=2, add=TRUE, yaxt="n")
text(1,0.4, paste("mean =", m, "
std =", sd), col = "darkred", adj = c(0, 0.4))


#qq plot to test for normality of residuals
qqnorm(y, main = "Q-Q Plot for residuals of linear model") 
qqline(y, col = 2, lwd = 2, lty = 3)

#Shapiro test for normality
shapiro.test(y)

#Autocorrelation plot of the residuals
acf(y, main = "Autocorrelation plot of the residuals \n for linear model of Ozone Data")

#Quadratic model
t = time(ts)
t2 = t^2
model2 = lm(ts~ t + t2) #creating the quadratic model as model2
summary(model2) #looking at the significance of model2

#plotting the quadratic curve
plot(
  ts(fitted(model2), start = 1927),
  ylab = 'Change in Ozone thickness',
  main = "Fitted quadratic curve to change in ozone thickeness",
  ylim = c(-12, 3),
  col = 'red'
)
#overlay the ozone layer data
lines(ts,type="o")

#residual analysis for model2
res.model2 = rstudent(model2)
plot(y = res.model2, x = as.vector(time(ts)),
     xlab = 'Year', ylab='Standardized Residuals',type='p',
     main = "Residual of quadratic regression model of ozone thickness")
abline(h=mean(y), col='red')

#Plotting histogram of residuals from the quadratic model
y = res.model2
hist(y, xlab='Standardized Residuals'
     , main = "Distribution of standardized residuals from quadratic model",
     freq=FALSE,
     ylim = c(0, 0.45))

#overlay the normal curve to check for normality
m = round(mean(y), 5)
sd = round(sqrt(var(y)), 4)
abline(v = m, lwd = 2, col = 'darkred')
curve(dnorm(x, mean=m, sd=sd), 
      col="darkred", lwd=2, add=TRUE, yaxt="n")
text(1,0.4, paste("mean =", m, "
std =", sd), col = "darkred", adj = c(0, 0.4))

#qq plot to test for normality of residuals
qqnorm(y, main = "Q-Q Plot for residuals of quadratic model") 
qqline(y, col = 2, lwd = 2, lty = 3)

#Shapiro test for normality
shapiro.test(y)

#Autocorrelation plot of the residuals
acf(y, main = "Autocorrelation plot of residuals for quadratic model of Ozone Data")

#set the cycles frequencies
f = 7
ts =ts(ts,frequency =f)

#Cyclical trend
ts_plot1 = plot(ts, type = 'l',
                main ="Time series plot of the yearly changes \n in thickness of Ozone layer",
                ylab = "Change in Ozone thickness",
                xlab = "cycle = 7 years")
cycle = factor(rep(1:f, length.out = length(ts)), ordered = TRUE)
points(y=ts,x=time(ts), pch=as.character(cycle), col =2, cex = 1.15)

#Checking for harmonics model validity
har.<- harmonic(ts,1)
model3 =lm(ts~har.)
summary(model3)
#model R squared value is very small.  Hence not a good model.

#modelling with additional trend component in the cyclic model
model4 =lm(ts~t+har.)
summary(model4)

# Shapiro test for normality
shapiro.test(y)


#Resetting the time series
ts = ts(ozone, start = 1927, end = 2016)
h=5 # 5 step ahead forecasting
new = data.frame(t = seq((min(t)), (max(t)+h), 1))
new$t2 = new$t^2
#making the predictions
pred1 = predict(model2, new, interval = "prediction") 

#getting just the predictions for year 2017 - 2021
pred1_table = data.frame(Year = seq(2017, 2021, 1),pred1[91:95,])
colnames(pred1_table) = c("Year", "Prediction", "LowerCI", "UpperCI")
pred1_table

#graphing it out
plot(ts, type = 'o',
     main ="Forecast 5 years ahead with quadratic model",
     ylab = "Change in Ozone thickness",
     xlab = "Year",
     xlim=c(1927,2021),
     ylim = c(-15,4))
lines(ts(as.vector(pred1[,1]), start = 1927), col="red", type="l") 
lines(ts(as.vector(pred1[,2]), start = 1927), col="blue", type="l") 
lines(ts(as.vector(pred1[,3]), start = 1927), col="blue", type="l")  
legend("topright", lty=1, pch=1, col=c("black","blue","red"), text.width
       = 18, 
       c("Data","5% forecast limits", "Forecasts"))


#Task 2

plot(ts, type = 'o',
     main ="Simple Linear model of ozone thickness change",
     ylab = "Change in Ozone thickness",
     xlab = "Year")
abline(model1, col = 'red') #putting the linearline on the trend.

#plotting the the acf
acf(ts, ci.type = 'ma', main = "ACF of Ozone change")

#plotting the the pacf
pacf(ts, main = "PACF of Ozone change")

#getting the lag to use in the ADF test
ar(ts)

#conducting the ADF tests
library(fUnitRoots)
adfTest(ts, lags = 4, type = "nc", title = 'ADF Test with No constant nor time trend')
adfTest(ts, lags = 4, type = "c", title = 'ADF Test with constant no time trend')
adfTest(ts, lags = 4, type = "ct", title = 'ADF Test with constant and with time trend')

#checking for Box_Cox transformation best lambda
ts_transf <-  BoxCox.ar(ts+13)
title(main = "Log-likelihood versus the values of lambda")

#Plotting the first difference
ts_diff = diff(ts)
plot(ts_diff, type = 'o',
     main ="First Diff of Yearly changes in Ozone Layer",
     ylab = "Change in Ozone thickness",
     xlab = "Year")

#getting the lag to use in the ADF test
ar(ts_diff)

#conducting the adf test
adfTest(ts_diff, lags = 6, type = "nc", title = 'ADF Test with No constant nor time trend')
adfTest(ts_diff, lags = 6, type = "c", title = 'ADF Test with constant no time trend')
adfTest(ts_diff, lags = 6, type = "ct", title = 'ADF Test with constant and with time trend')

#acf and pacf of differenced data
acf(ts_diff, main = 'ACF of differenced ozone series')
pacf(ts_diff, main = 'PACF of differenced ozone series')

#calculating the eacf
eacf(ts_diff, ar.max = 6, ma.max = 9)

#creating the BIC plot
res = armasubsets(ts_diff, nar = 9, nma = 9, y.name='ar', ar.method = 'ols')
plot(res)
title(main = "BIC table of differenced ozone series.", line = 6)


       

Bibliography

Ritchie, Hannah, and Max Roser. 2020. “Ozone Layer.” Our World in Data.