Introduction

Task 1

Importing the Ozone Thickness time series data

#Importing the files 
data <- read_csv("data1.csv", col_names = FALSE)

#Provide column name 
colnames(data) <- ("ozone thickness")

#creating year sequence
year <- seq(1927,2016)

#Adding the year column to dataframe 
ozone <-  cbind(year,data)

Converting the Dataframe to the Time Series Object

#Convert the dataframe to Time Series object 
ozonets <- ts(ozone$`ozone thickness`,start = c(1927),end = c(2016),frequency = 1)

Plotting the Ozone thickness time series data

#plot the time series of ozone thickness 
plot(ozonets,type="o",ylab="Ozone thickness(in Dobson units)",xlab="Year",
     main="Time Series of changes in Ozone thickness from 1927 to 2016")

Visualize the autocorrelation of Ozone layer thickness in consecutive years

plot(y=ozonets,x=zlag(ozonets),xlab = "Previous Year thickness of Ozone layer",ylab= "Dobson Units",main = "Scatter plot of Ozone layer thickness in consecutive years")

Building right model

Linear Trend Model

#----Linear trend model----
#To fit a trend model, we need to extract time from the time series object:
t <- time(ozonets)

#fitting model 
model1 <- lm(ozonets ~ t) 
summary(model1)
## 
## Call:
## lm(formula = ozonets ~ t)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.7165 -1.6687  0.0275  1.4726  4.7940 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 213.720155  16.257158   13.15   <2e-16 ***
## t            -0.110029   0.008245  -13.34   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.032 on 88 degrees of freedom
## Multiple R-squared:  0.6693, Adjusted R-squared:  0.6655 
## F-statistic: 178.1 on 1 and 88 DF,  p-value: < 2.2e-16

Observation:

  • Intercept and the Slope of the linear trend model is 213.72 and -0.110029 respectively.

  • Intercept,Slope,and p-value of the linear trend model is statistically significant at 5% level.

  • Adjusted R-squared value shows that 66.55% of the variance in the ozone thickness time series is explained by linear model.

Plotting the Time Series fitted to Linear Trend Model

#plotting the time series fitted to Linear model 
plot(ozonets,type="o",ylab="Ozone thickness(in Dobson units)",xlab="Year",
     main="Time Series of changes in Ozone thickness from 1927 to 2016")
abline(model1, col="red", lty = 2) # add a fitted least squares line from model1
legend("topright", legend = c("Time series plot", "Fitted Linear model"),
 col = c("black", "red"), lwd = 1, lty = c(1, 2), pch = c(1, NA))

  • According to the plot ,we can suggest that the Regression line does not cover all the points, some of the points are left out.

Residual Analysis of Linear Trend Model

The leftover of linear trend model which behaves like stochastic model is residuals.We can analyse these residuals if they behave like independant,normally distributed random variables(white noise).

par(mfrow=c(2,2))  #Divides plotting page into 4 subplots
#Visualizing  standaradized Residuals 
plot(y=rstudent(model1),x=as.vector(time(ozonets)), xlab='Time',
     ylab='Standardized Residuals',type='o', main = "Time series plot of standardized residuals 
     fitted to Ozone-Thickness series.")

#hist view 
hist(rstudent(model1),xlab='Standardized Residuals'
     , main = "Histogram of standardised residuals
     fitted to the Ozone-Thickness series.")

#---QQ plot
y = rstudent(model1)
qqnorm(y, main = "QQ plot of standardised residuals for
       linear model fitted to the Ozone-Thickness series.")
qqline(y, col = 2, lwd = 1, lty = 2)

#AutoCorrelation Function of Standaradized Residuals of model 
acf(rstudent(model1), main = "ACF of standardized residuals.")

#Shapiro-Wilk Normality for residuals
y=rstudent(model1)
shapiro.test(y)
## 
##  Shapiro-Wilk normality test
## 
## data:  y
## W = 0.98733, p-value = 0.5372
  • Histogram : The plot concludes the standardized residuals are normally distributed.

  • QQ plot : We can see more number of the points of the time series falling on the straight line pattern ,there is slight deviation from the straight line pattern at the low and high ends and a slight shift is obesrved in the points around the mid section. We can say that the QQ plot supports of normally distributed stochastic component in linear trend model and very less white noise behaviour observed.

  • Standardized series plot : The plot suggests that the trend has been improved after standardizing the residuals.

  • Sample Autocorrelation Function(ACF) : According to the plot we can say that lag 1, 7 and 8 auto-correlations surpasses conidence limit.We can conclude that white noise is not the stochastic part of the series.

  • Shapiro test : The p-value of Shapiro test is 0.5372 which is greater than 0.05 hence, we can conclude that residual data are normally distributed.

Quadratic Model

#----Quadratic model------
# since we have time points of the series ,we can find the t^2 points 
t2 =t^2
model2 = lm(ozonets ~ t+t2) 
summary(model2)
## 
## Call:
## lm(formula = ozonets ~ t + t2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.1062 -1.2846 -0.0055  1.3379  4.2325 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -5.733e+03  1.232e+03  -4.654 1.16e-05 ***
## t            5.924e+00  1.250e+00   4.739 8.30e-06 ***
## t2          -1.530e-03  3.170e-04  -4.827 5.87e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.815 on 87 degrees of freedom
## Multiple R-squared:  0.7391, Adjusted R-squared:  0.7331 
## F-statistic: 123.3 on 2 and 87 DF,  p-value: < 2.2e-16

Observations:

  • Intercept,Slope,and p-value of the quadratic trend model is statistically significant at 5% level.

  • R-squared value shows that 73.31% of the variance in the ozone thickness time series is explained by quadratic model better than linear model.

Plotting the Time Series fitted to Quadratic Model

#Visualizing the time series fitted to model
plot(ts(fitted(model2)), ylab='y', main = "Fitted quadratic curve to ozone-thickness series.",col = "blue",lty=2,
    ylim = c(min(c(fitted(model2), as.vector(ozonets))) ,
             max(c(fitted(model2), as.vector(ozonets)))) )
lines(as.vector(ozonets),type="o")
legend("topright", legend = c("Time series plot", "Fitted Quadratic model"),
 col = c("black", "blue"), lwd = 1, lty = c(1, 2), pch = c(1, NA))

  • The plot suggests that the quadratic curve covers more points as compared to regression line.

Residual analyis of Quadratic Trend Model

par(mfrow=c(2,2)) 
#Plot of standaradized Residuals 
plot(y=rstudent(model2),x=as.vector(time(ozonets)), xlab='Time',
     ylab='Standardized Residuals',type='o', main = "Time series plot of standardized residuals fitted
     to Ozone-Thickness series")

#---hist view 
hist(rstudent(model2),xlab='Standardized Residuals'
     , main = "Histogram of standardised residual
     fitted to the Ozone-Thickness series")

#QQ plot  
y = rstudent(model2)
qqnorm(y, main = "QQ plot of standardised residuals 
fitted to the Ozone-Thickness series")
qqline(y, col = 2, lwd = 1, lty = 2)

#----AutoCorrelation Function----
acf(rstudent(model2), main = "ACF of standardized residuals 
fitted to the Ozone-Thickness series")

#Shapiro-Wilk Normality for residuals
y=rstudent(model2)
shapiro.test(y)
## 
##  Shapiro-Wilk normality test
## 
## data:  y
## W = 0.98889, p-value = 0.6493
  • Histogram : The plot concludes the standardized residuals are normally distributed.

  • QQ plot : We can see most of the points of the time series falling on the straight line pattern,the deviation from the straight line pattern is relatively less at the low and high ends compared to the linear model.

  • Standardized series plot : The plot suggests that the trend seems to be bette after standardizing the residuals, but change in variances can be observed.

  • Sample Autocorrelation Function(ACF) : We can see many correlations which are higher than the confidence intervals at many lags,slight white noise like behaviour is observed.

  • Shapiro test : The p-value of Shapiro test is 0.6493 which is greater than 0.05 hence, we can conclude that residual data are normally distributed. hence, we conclude that the standardizing residual data make the series staionary.

Forecast yearly changes of ozone layer thickness for the next 5 years

Since Quadratic Model is better than Linear model,we are using quadratic model to forecast the changes in the thickness of ozone layer for next five years.

h <- 5 # 5 steps ahead forecasts
t <- time(ozonets)
t2 <- t^2 
aheadTimes <- data.frame(t = seq(2017, 2017+h, 1),
                         t2 = seq(2017, 2017+h, 1)^2)

#creating a forecast model 
frcModel2 <- predict(model2, newdata = aheadTimes, interval = "prediction")
print(frcModel2)
##         fit       lwr       upr
## 1 -10.34387 -14.13556 -6.552180
## 2 -10.59469 -14.40282 -6.786548
## 3 -10.84856 -14.67434 -7.022786
## 4 -11.10550 -14.95015 -7.260851
## 5 -11.36550 -15.23030 -7.500701
## 6 -11.62856 -15.51482 -7.742297
#Visualizing the Forecast model
plot(ozonets, xlim= c(1927,2021+h+1), ylim = c(-15,5), ylab = "Ozone-thickness series",
     main = "Forecasts from the quadratic model fitted to the Ozone-thickness series.")
lines(ts(as.vector(frcModel2[,3]), start = 2017), col="blue" , type="l")
lines(ts(as.vector(frcModel2[,1]), start = 2017), col="red", type="l")
lines(ts(as.vector(frcModel2[,2]), start = 2017), col="blue", type="l")
legend("topright", lty=1, pch=1, col=c("black","blue","red"), text.width = 24,
       c("Data" ,"5% forecast limits","Forecasts"))

Summary

  • Quadratic trend model would be better fit than the linear trend for the Ozone thickness time series because the R-squared value of quadratic model(73.31%) is higher than that of linear model(66.55%) as it describes better variation in the series.

  • Quadratic model is used for forecasting the thickness of Ozone layer for 5 years over linear trend model.

Task 2

Time Series plot of the Ozone thickness data

#Time series plot of ozone thickness 
plot(ozonets,type="o",ylab="Ozone thickness(in Dobson units)",xlab="Year",
     main="Time Series of changes in Ozone thickness from 1927 to 2016")

ACF and PACF plots for Ozone thickness data

#ACF and PACF tests for ozone thickness time series
par(mfrow=c(1,2))
acf(ozonets, main ="ACF plot ")
pacf(ozonets, main ="PACF plot")

Dickey-Fuller Test to check the stationarity of ozone thickness time series

#Test to check the stationarity of the ozone thickness time series
adf.test(ozonets, alternative = c("stationary")) # Use default value of k
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ozonets
## Dickey-Fuller = -3.2376, Lag order = 4, p-value = 0.0867
## alternative hypothesis: stationary

Box-Cox Transformation for the Ozone thickness time series

#Box cox transformation 


#Since we have negative values in our data, we include them by adding the minimum value in our data and add 1.
ozonets = ozonets + abs(min(ozonets))+1

ozonets.BC.transform = BoxCox.ar(ozonets + abs(min(ozonets))+1)
title(main="Log-Likelihood vs Value of Lamdba")

#Confidence levels of Box-Cox transformed series
ozonets.BC.transform$ci
## [1] 0.9 1.8
# To find the optimal lambda value
lambda <- ozonets.BC.transform$lambda[which(max(ozonets.BC.transform$loglike) == ozonets.BC.transform$loglike)]
lambda
## [1] 1.3

Applying the Box-Cox transformation to the Ozone thickness time series

# Apply Box-Cox transformation
ozonetsBC <- (ozonets^lambda - 1) / lambda

Plotting the Box-Cox transformed Ozone thickness Series

#plot of Box -cox transformation
plot(ozonetsBC, ylab='Box-Cox transformed ozone thickness series', xlab='Time', type='o',main="Box Cox transformed Ozone thickness Series")

First Lag Difference time series analysis

#Calculating the first difference lag tinme series
ozonetsdiff <- diff(ozonets, differences = 1)

#Visualizing the first difference series
plot(ozonetsdiff, ylab='First difference of Ozone thickness series', xlab='Time', type='o',main="First difference of Ozone thickness series")

ACF & PACF for the first difference lag time series

#ACF and PACF tests for first difference lag time series
par(mfrow=c(1,2))
acf(ozonetsdiff, main ="ACF of first difference time series.")
pacf(ozonetsdiff, main ="PACF of first difference time series.")

In order to have better sense of stationarity of the time series we will analyze second lag difference of time series

Second Lag Difference time series analysis

#Calculating the second difference lag tinme series
ozonetsdiff_ <- diff(ozonets, differences = 2)

#Visualizing the second difference series
plot(ozonetsdiff_, ylab='Second lag difference of Ozone thickness series', xlab='Time', type='o',main="Second lag difference of Ozone thickness series")

ACF & PACF for the second difference lag time series

#ACF and PACF tests for second difference lag time series
par(mfrow=c(1,2))
acf(ozonetsdiff_, main ="ACF of second difference time series.")
pacf(ozonetsdiff_, main ="PACF of second difference time series.")

* ACF - There is no seasonal pattern in the plot and we ca proceed for further fitting the ARIMA model

The possible ARIMA models is (3,2,2)

# Stationarity test for the first lag difference time series
adf.test(ozonetsdiff_, alternative = c("stationary")) # Use default value of k
## Warning in adf.test(ozonetsdiff_, alternative = c("stationary")): p-value
## smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ozonetsdiff_
## Dickey-Fuller = -7.6128, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary

The above test confirms that the first lag differnece time series is stationary because the p-value is lesser than 0.05.

Extended Auto-Correlation Function Table

#Extended Auto Correlation Function
eacf(ozonetsdiff_)
## AR/MA
##   0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 x o x o o o o o o x o  o  o  o 
## 1 o o x o o o o o o x o  o  o  o 
## 2 x x x o o o o o o x o  o  o  o 
## 3 x o x o o o o o o o o  o  o  o 
## 4 x x o x o o o o o o o  o  o  o 
## 5 x x x x o o o o o o o  o  o  o 
## 6 x o x x o o o o o o o  o  o  o 
## 7 x x o x o o o o o o o  o  o  o

Bayesian Information Criterion

#BIC table 
res_ = armasubsets(y=ozonetsdiff_ , nar=5 , nma=5, y.name='Ozone thickness Index', ar.method='ols')
plot(res_)

Summary

  • We can still observe trend in the Box-Cox transformed time series which does not succeed in making the time series as stationary series.
  • Second lag difference of time series analysis is chosen over the first lag difference of time series for yeilding better stationary time series with no existence of trend.

Conclusion

Appendix

library(TSA) library(readr) library(tseries)

data <- read_csv(“D:/Time Series Analysis/Assignment 1/data1.csv”, col_names = FALSE)

colnames(data) <- (“ozone thickness”) View(data)

year <- seq(1927,2016) year

ozone <- cbind(year,data) View(ozone)

ozonets <- ts(ozone$ozone thickness,start = c(1927),end = c(2016),frequency = 1)

plot(ozonets,type=“o”,ylab=“Ozone thickness(in Dobson units)”,xlab=“Year”, main=“Time Series of changes in Ozone thickness from 1927 to 2016”)

plot(y=ozonets,x=zlag(ozonets),xlab = “Previous Year thickness of Ozone layer”,ylab= “Dobson Units” ,main = “Scatter plot of Ozone layer thickness in consecutive years”)

t <- time(ozonets)

model1 <- lm(ozonets ~ t) summary(model1)

plot(ozonets,type=“o”,ylab=“Ozone thickness(in Dobson units)”,xlab=“Year”, main=“Time Series of changes in Ozone thickness from 1927 to 2016”) abline(model1) # add a fitted least squares line from model1

plot(y=rstudent(model1),x=as.vector(time(ozonets)), xlab=‘Time’, ylab=‘Standardized Residuals’,type=‘o’, main = “Time series plot of standardized resid fitted to Ozone-Thickness series.”)

hist(rstudent(model1),xlab=‘Standardized Residuals’ , main = “Histogram of standardised residual fitted to the Ozone-Thickness series.”)

y = rstudent(model1) qqnorm(y, main = “QQ plot of standardised residuals for the linear model fitted to the Ozone-Thickness series.”) qqline(y, col = 2, lwd = 1, lty = 2)

acf(rstudent(model1), main = “ACF of standardized residuals the linear model fitted to the Ozone-Thickness series.”)

y=rstudent(model1) shapiro.test(y)

t2 =t^2 model2 = lm(ozonets ~ t+t2) summary(model2)

plot(ts(fitted(model2)), ylab=‘y’, main = “Fitted quadratic curve to ozone-thickness series.”, ylim = c(min(c(fitted(model2), as.vector(ozonets))) , max(c(fitted(model2), as.vector(ozonets)))) ) lines(as.vector(ozonets),type=“o”)

plot(y=rstudent(model2),x=as.vector(time(ozonets)), xlab=‘Time’, ylab=‘Standardized Residuals’,type=‘o’, main = “Time series plot of standardized resid fitted to Ozone-Thickness series.”)

hist(rstudent(model2),xlab=‘Standardized Residuals’ , main = “Histogram of standardised residual fitted to the Ozone-Thickness series.”)

y = rstudent(model2) qqnorm(y, main = “QQ plot of standardised residuals for the linear model fitted to the Ozone-Thickness series.”) qqline(y, col = 2, lwd = 1, lty = 2)

acf(rstudent(model2), main = “ACF of standardized residuals the quadratic model fitted to the land-use series.”)

y=rstudent(model1) shapiro.test(y)

h <- 5 # 5 steps ahead forecasts t <- time(ozonets) aheadTimes <- data.frame(t = seq(2017, 2021, 1)) frcModel1 <- predict(model1, newdata = aheadTimes, interval = “prediction”) print(frcModel1)

plot(ozonets, xlim= c(1927,2021), ylim = c(-12,10), ylab = “Ozone-thickness series”, main = “Forecasts from the linear model fitted to the Ozone-thickness series.”) lines(ts(as.vector(frcModel1[,3]), start = 2017), col=“blue” , type=“l”) lines(ts(as.vector(frcModel1[,1]), start = 2017), col=“red”, type=“l”) lines(ts(as.vector(frcModel1[,2]), start = 2017), col=“blue”, type=“l”) legend(“topright”, lty=1, pch=1, col=c(“black”,“blue”,“red”), text.width = 10, c(“Data” ,“5% forecast limits”,“Forecasts”))

h <- 5 # 5 steps ahead forecasts t <- time(ozonets) t2 <- t^2 aheadTimes <- data.frame(t = seq(2017, 2017+h, 1), t2 = seq(2017, 2017+h, 1)^2) frcModel2 <- predict(model2, newdata = aheadTimes, interval = “prediction”) print(frcModel2)

plot(ozonets, xlim= c(1927,2021+h+1), ylim = c(-12,10), ylab = “Ozone-thickness series”, main = “Forecasts from the quadratic model fitted to the Ozone-thickness series.”) lines(ts(as.vector(frcModel2[,3]), start = 2017), col=“blue” , type=“l”) lines(ts(as.vector(frcModel2[,1]), start = 2017), col=“red”, type=“l”) lines(ts(as.vector(frcModel2[,2]), start = 2017), col=“blue”, type=“l”) legend(“topright”, lty=1, pch=1, col=c(“black”,“blue”,“red”), text.width = 10, c(“Data” ,“5% forecast limits”,“Forecasts”))

plot(ozonets,type=“o”,ylab=“Ozone thickness(in Dobson units)”,xlab=“Year”, main=“Time Series of changes in Ozone thickness from 1927 to 2016”)

par(mfrow=c(1,2)) acf(ozonets, main =“ACF plot”) pacf(ozonets, main =“PACF plot”)

adf.test(ozonets, alternative = c(“stationary”)) # Use default value of k

ozonets = ozonets + abs(min(ozonets))+1

ozonets.BC.transform = BoxCox.ar(ozonets + abs(min(ozonets))+1) title(main=“Log-Likelihood vs Value of Lamdba”)

ozonets.BC.transform$ci

lambda <- ozonets.BC.transform\(lambda[which(max(ozonets.BC.transform\)loglike) == ozonets.BC.transform$loglike)] lambda

ozonetsBC <- (ozonets^lambda - 1) / lambda

plot(ozonetsBC, ylab=‘Box-Cox transformed ozone thickness series’, xlab=‘Time’, type=‘o’ ,main=“Box Cox transformed Ozone thickness Series”)

ozonetsdiff <- diff(ozonets, differences = 1)

plot(ozonetsdiff, ylab=‘First difference of ozone thickness series’, xlab=‘Time’, type=‘o’ ,main=“First difference of Ozone thickness series”)

par(mfrow=c(1,2)) acf(ozonetsdiff, main =“ACF of first difference time series.”) pacf(ozonetsdiff, main =“PACF of first difference time series.”)

adf.test(ozonetsdiff, alternative = c(“stationary”)) # Use default value of k

eacf(ozonetsdiff)

res_ = armasubsets(y=ozonetsdiff , nar=5 , nma=5, y.name=‘Ozone thickness Index’, ar.method=‘ols’) plot(res_)

References