The main objective of the study is to analyze thickness of ozone layer.
The dataset represents the yearly changes in the thickness of Ozone layer 1927 to 2016 in Dobson unit.A negative value in the dataset represents a decrease in the thickness and a positive value represents an increase in the thickness.
The first task is to determine the best fitting model and forecast the thickness of ozone layer for next 5 years.
The second task is to suggest suitable ARMA/ARIMA(p, d, q) model using model specification tools like ACF-PACF, EACF,BIC table
Importing the Ozone Thickness time series data
#Importing the files
data <- read_csv("data1.csv", col_names = FALSE)
#Provide column name
colnames(data) <- ("ozone thickness")
#creating year sequence
year <- seq(1927,2016)
#Adding the year column to dataframe
ozone <- cbind(year,data)
Converting the Dataframe to the Time Series Object
#Convert the dataframe to Time Series object
ozonets <- ts(ozone$`ozone thickness`,start = c(1927),end = c(2016),frequency = 1)
Plotting the Ozone thickness time series data
#plot the time series of ozone thickness
plot(ozonets,type="o",ylab="Ozone thickness(in Dobson units)",xlab="Year",
main="Time Series of changes in Ozone thickness from 1927 to 2016")
Trend : Trend looks like downward trend.
Seasonality : There might be seasonality but we need to perform PACF test is required to confirm seasonality in time series.
Change in Variance : Changes in variance are not significant by looking at the plot.
Behaviour : Moving Average like behaviour is observed therefore the data is non-stationery.
Change Point : There were no observations of Change point in the Time Series.
Visualize the autocorrelation of Ozone layer thickness in consecutive years
plot(y=ozonets,x=zlag(ozonets),xlab = "Previous Year thickness of Ozone layer",ylab= "Dobson Units",main = "Scatter plot of Ozone layer thickness in consecutive years")
#----Linear trend model----
#To fit a trend model, we need to extract time from the time series object:
t <- time(ozonets)
#fitting model
model1 <- lm(ozonets ~ t)
summary(model1)
##
## Call:
## lm(formula = ozonets ~ t)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.7165 -1.6687 0.0275 1.4726 4.7940
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 213.720155 16.257158 13.15 <2e-16 ***
## t -0.110029 0.008245 -13.34 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.032 on 88 degrees of freedom
## Multiple R-squared: 0.6693, Adjusted R-squared: 0.6655
## F-statistic: 178.1 on 1 and 88 DF, p-value: < 2.2e-16
Intercept and the Slope of the linear trend model is 213.72 and -0.110029 respectively.
Intercept,Slope,and p-value of the linear trend model is statistically significant at 5% level.
Adjusted R-squared value shows that 66.55% of the variance in the ozone thickness time series is explained by linear model.
Plotting the Time Series fitted to Linear Trend Model
#plotting the time series fitted to Linear model
plot(ozonets,type="o",ylab="Ozone thickness(in Dobson units)",xlab="Year",
main="Time Series of changes in Ozone thickness from 1927 to 2016")
abline(model1, col="red", lty = 2) # add a fitted least squares line from model1
legend("topright", legend = c("Time series plot", "Fitted Linear model"),
col = c("black", "red"), lwd = 1, lty = c(1, 2), pch = c(1, NA))
The leftover of linear trend model which behaves like stochastic model is residuals.We can analyse these residuals if they behave like independant,normally distributed random variables(white noise).
par(mfrow=c(2,2)) #Divides plotting page into 4 subplots
#Visualizing standaradized Residuals
plot(y=rstudent(model1),x=as.vector(time(ozonets)), xlab='Time',
ylab='Standardized Residuals',type='o', main = "Time series plot of standardized residuals
fitted to Ozone-Thickness series.")
#hist view
hist(rstudent(model1),xlab='Standardized Residuals'
, main = "Histogram of standardised residuals
fitted to the Ozone-Thickness series.")
#---QQ plot
y = rstudent(model1)
qqnorm(y, main = "QQ plot of standardised residuals for
linear model fitted to the Ozone-Thickness series.")
qqline(y, col = 2, lwd = 1, lty = 2)
#AutoCorrelation Function of Standaradized Residuals of model
acf(rstudent(model1), main = "ACF of standardized residuals.")
#Shapiro-Wilk Normality for residuals
y=rstudent(model1)
shapiro.test(y)
##
## Shapiro-Wilk normality test
##
## data: y
## W = 0.98733, p-value = 0.5372
Histogram : The plot concludes the standardized residuals are normally distributed.
QQ plot : We can see more number of the points of the time series falling on the straight line pattern ,there is slight deviation from the straight line pattern at the low and high ends and a slight shift is obesrved in the points around the mid section. We can say that the QQ plot supports of normally distributed stochastic component in linear trend model and very less white noise behaviour observed.
Standardized series plot : The plot suggests that the trend has been improved after standardizing the residuals.
Sample Autocorrelation Function(ACF) : According to the plot we can say that lag 1, 7 and 8 auto-correlations surpasses conidence limit.We can conclude that white noise is not the stochastic part of the series.
Shapiro test : The p-value of Shapiro test is 0.5372 which is greater than 0.05 hence, we can conclude that residual data are normally distributed.
#----Quadratic model------
# since we have time points of the series ,we can find the t^2 points
t2 =t^2
model2 = lm(ozonets ~ t+t2)
summary(model2)
##
## Call:
## lm(formula = ozonets ~ t + t2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.1062 -1.2846 -0.0055 1.3379 4.2325
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.733e+03 1.232e+03 -4.654 1.16e-05 ***
## t 5.924e+00 1.250e+00 4.739 8.30e-06 ***
## t2 -1.530e-03 3.170e-04 -4.827 5.87e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.815 on 87 degrees of freedom
## Multiple R-squared: 0.7391, Adjusted R-squared: 0.7331
## F-statistic: 123.3 on 2 and 87 DF, p-value: < 2.2e-16
Intercept,Slope,and p-value of the quadratic trend model is statistically significant at 5% level.
R-squared value shows that 73.31% of the variance in the ozone thickness time series is explained by quadratic model better than linear model.
Plotting the Time Series fitted to Quadratic Model
#Visualizing the time series fitted to model
plot(ts(fitted(model2)), ylab='y', main = "Fitted quadratic curve to ozone-thickness series.",col = "blue",lty=2,
ylim = c(min(c(fitted(model2), as.vector(ozonets))) ,
max(c(fitted(model2), as.vector(ozonets)))) )
lines(as.vector(ozonets),type="o")
legend("topright", legend = c("Time series plot", "Fitted Quadratic model"),
col = c("black", "blue"), lwd = 1, lty = c(1, 2), pch = c(1, NA))
par(mfrow=c(2,2))
#Plot of standaradized Residuals
plot(y=rstudent(model2),x=as.vector(time(ozonets)), xlab='Time',
ylab='Standardized Residuals',type='o', main = "Time series plot of standardized residuals fitted
to Ozone-Thickness series")
#---hist view
hist(rstudent(model2),xlab='Standardized Residuals'
, main = "Histogram of standardised residual
fitted to the Ozone-Thickness series")
#QQ plot
y = rstudent(model2)
qqnorm(y, main = "QQ plot of standardised residuals
fitted to the Ozone-Thickness series")
qqline(y, col = 2, lwd = 1, lty = 2)
#----AutoCorrelation Function----
acf(rstudent(model2), main = "ACF of standardized residuals
fitted to the Ozone-Thickness series")
#Shapiro-Wilk Normality for residuals
y=rstudent(model2)
shapiro.test(y)
##
## Shapiro-Wilk normality test
##
## data: y
## W = 0.98889, p-value = 0.6493
Histogram : The plot concludes the standardized residuals are normally distributed.
QQ plot : We can see most of the points of the time series falling on the straight line pattern,the deviation from the straight line pattern is relatively less at the low and high ends compared to the linear model.
Standardized series plot : The plot suggests that the trend seems to be bette after standardizing the residuals, but change in variances can be observed.
Sample Autocorrelation Function(ACF) : We can see many correlations which are higher than the confidence intervals at many lags,slight white noise like behaviour is observed.
Shapiro test : The p-value of Shapiro test is 0.6493 which is greater than 0.05 hence, we can conclude that residual data are normally distributed. hence, we conclude that the standardizing residual data make the series staionary.
Since Quadratic Model is better than Linear model,we are using quadratic model to forecast the changes in the thickness of ozone layer for next five years.
h <- 5 # 5 steps ahead forecasts
t <- time(ozonets)
t2 <- t^2
aheadTimes <- data.frame(t = seq(2017, 2017+h, 1),
t2 = seq(2017, 2017+h, 1)^2)
#creating a forecast model
frcModel2 <- predict(model2, newdata = aheadTimes, interval = "prediction")
print(frcModel2)
## fit lwr upr
## 1 -10.34387 -14.13556 -6.552180
## 2 -10.59469 -14.40282 -6.786548
## 3 -10.84856 -14.67434 -7.022786
## 4 -11.10550 -14.95015 -7.260851
## 5 -11.36550 -15.23030 -7.500701
## 6 -11.62856 -15.51482 -7.742297
#Visualizing the Forecast model
plot(ozonets, xlim= c(1927,2021+h+1), ylim = c(-15,5), ylab = "Ozone-thickness series",
main = "Forecasts from the quadratic model fitted to the Ozone-thickness series.")
lines(ts(as.vector(frcModel2[,3]), start = 2017), col="blue" , type="l")
lines(ts(as.vector(frcModel2[,1]), start = 2017), col="red", type="l")
lines(ts(as.vector(frcModel2[,2]), start = 2017), col="blue", type="l")
legend("topright", lty=1, pch=1, col=c("black","blue","red"), text.width = 24,
c("Data" ,"5% forecast limits","Forecasts"))
Quadratic trend model would be better fit than the linear trend for the Ozone thickness time series because the R-squared value of quadratic model(73.31%) is higher than that of linear model(66.55%) as it describes better variation in the series.
Quadratic model is used for forecasting the thickness of Ozone layer for 5 years over linear trend model.
Time Series plot of the Ozone thickness data
#Time series plot of ozone thickness
plot(ozonets,type="o",ylab="Ozone thickness(in Dobson units)",xlab="Year",
main="Time Series of changes in Ozone thickness from 1927 to 2016")
ACF and PACF plots for Ozone thickness data
#ACF and PACF tests for ozone thickness time series
par(mfrow=c(1,2))
acf(ozonets, main ="ACF plot ")
pacf(ozonets, main ="PACF plot")
Dickey-Fuller Test to check the stationarity of ozone thickness time series
#Test to check the stationarity of the ozone thickness time series
adf.test(ozonets, alternative = c("stationary")) # Use default value of k
##
## Augmented Dickey-Fuller Test
##
## data: ozonets
## Dickey-Fuller = -3.2376, Lag order = 4, p-value = 0.0867
## alternative hypothesis: stationary
Box-Cox Transformation for the Ozone thickness time series
#Box cox transformation
#Since we have negative values in our data, we include them by adding the minimum value in our data and add 1.
ozonets = ozonets + abs(min(ozonets))+1
ozonets.BC.transform = BoxCox.ar(ozonets + abs(min(ozonets))+1)
title(main="Log-Likelihood vs Value of Lamdba")
#Confidence levels of Box-Cox transformed series
ozonets.BC.transform$ci
## [1] 0.9 1.8
# To find the optimal lambda value
lambda <- ozonets.BC.transform$lambda[which(max(ozonets.BC.transform$loglike) == ozonets.BC.transform$loglike)]
lambda
## [1] 1.3
Applying the Box-Cox transformation to the Ozone thickness time series
# Apply Box-Cox transformation
ozonetsBC <- (ozonets^lambda - 1) / lambda
Plotting the Box-Cox transformed Ozone thickness Series
#plot of Box -cox transformation
plot(ozonetsBC, ylab='Box-Cox transformed ozone thickness series', xlab='Time', type='o',main="Box Cox transformed Ozone thickness Series")
First Lag Difference time series analysis
#Calculating the first difference lag tinme series
ozonetsdiff <- diff(ozonets, differences = 1)
#Visualizing the first difference series
plot(ozonetsdiff, ylab='First difference of Ozone thickness series', xlab='Time', type='o',main="First difference of Ozone thickness series")
ACF & PACF for the first difference lag time series
#ACF and PACF tests for first difference lag time series
par(mfrow=c(1,2))
acf(ozonetsdiff, main ="ACF of first difference time series.")
pacf(ozonetsdiff, main ="PACF of first difference time series.")
In order to have better sense of stationarity of the time series we will analyze second lag difference of time series
Second Lag Difference time series analysis
#Calculating the second difference lag tinme series
ozonetsdiff_ <- diff(ozonets, differences = 2)
#Visualizing the second difference series
plot(ozonetsdiff_, ylab='Second lag difference of Ozone thickness series', xlab='Time', type='o',main="Second lag difference of Ozone thickness series")
ACF & PACF for the second difference lag time series
#ACF and PACF tests for second difference lag time series
par(mfrow=c(1,2))
acf(ozonetsdiff_, main ="ACF of second difference time series.")
pacf(ozonetsdiff_, main ="PACF of second difference time series.")
* ACF - There is no seasonal pattern in the plot and we ca proceed for further fitting the ARIMA model
The possible ARIMA models is (3,2,2)
# Stationarity test for the first lag difference time series
adf.test(ozonetsdiff_, alternative = c("stationary")) # Use default value of k
## Warning in adf.test(ozonetsdiff_, alternative = c("stationary")): p-value
## smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: ozonetsdiff_
## Dickey-Fuller = -7.6128, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
The above test confirms that the first lag differnece time series is stationary because the p-value is lesser than 0.05.
Extended Auto-Correlation Function Table
#Extended Auto Correlation Function
eacf(ozonetsdiff_)
## AR/MA
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 x o x o o o o o o x o o o o
## 1 o o x o o o o o o x o o o o
## 2 x x x o o o o o o x o o o o
## 3 x o x o o o o o o o o o o o
## 4 x x o x o o o o o o o o o o
## 5 x x x x o o o o o o o o o o
## 6 x o x x o o o o o o o o o o
## 7 x x o x o o o o o o o o o o
Bayesian Information Criterion
#BIC table
res_ = armasubsets(y=ozonetsdiff_ , nar=5 , nma=5, y.name='Ozone thickness Index', ar.method='ols')
plot(res_)
library(TSA) library(readr) library(tseries)
data <- read_csv(“D:/Time Series Analysis/Assignment 1/data1.csv”, col_names = FALSE)
colnames(data) <- (“ozone thickness”) View(data)
year <- seq(1927,2016) year
ozone <- cbind(year,data) View(ozone)
ozonets <- ts(ozone$ozone thickness,start = c(1927),end = c(2016),frequency = 1)
plot(ozonets,type=“o”,ylab=“Ozone thickness(in Dobson units)”,xlab=“Year”, main=“Time Series of changes in Ozone thickness from 1927 to 2016”)
plot(y=ozonets,x=zlag(ozonets),xlab = “Previous Year thickness of Ozone layer”,ylab= “Dobson Units” ,main = “Scatter plot of Ozone layer thickness in consecutive years”)
t <- time(ozonets)
model1 <- lm(ozonets ~ t) summary(model1)
plot(ozonets,type=“o”,ylab=“Ozone thickness(in Dobson units)”,xlab=“Year”, main=“Time Series of changes in Ozone thickness from 1927 to 2016”) abline(model1) # add a fitted least squares line from model1
plot(y=rstudent(model1),x=as.vector(time(ozonets)), xlab=‘Time’, ylab=‘Standardized Residuals’,type=‘o’, main = “Time series plot of standardized resid fitted to Ozone-Thickness series.”)
hist(rstudent(model1),xlab=‘Standardized Residuals’ , main = “Histogram of standardised residual fitted to the Ozone-Thickness series.”)
y = rstudent(model1) qqnorm(y, main = “QQ plot of standardised residuals for the linear model fitted to the Ozone-Thickness series.”) qqline(y, col = 2, lwd = 1, lty = 2)
acf(rstudent(model1), main = “ACF of standardized residuals the linear model fitted to the Ozone-Thickness series.”)
y=rstudent(model1) shapiro.test(y)
t2 =t^2 model2 = lm(ozonets ~ t+t2) summary(model2)
plot(ts(fitted(model2)), ylab=‘y’, main = “Fitted quadratic curve to ozone-thickness series.”, ylim = c(min(c(fitted(model2), as.vector(ozonets))) , max(c(fitted(model2), as.vector(ozonets)))) ) lines(as.vector(ozonets),type=“o”)
plot(y=rstudent(model2),x=as.vector(time(ozonets)), xlab=‘Time’, ylab=‘Standardized Residuals’,type=‘o’, main = “Time series plot of standardized resid fitted to Ozone-Thickness series.”)
hist(rstudent(model2),xlab=‘Standardized Residuals’ , main = “Histogram of standardised residual fitted to the Ozone-Thickness series.”)
y = rstudent(model2) qqnorm(y, main = “QQ plot of standardised residuals for the linear model fitted to the Ozone-Thickness series.”) qqline(y, col = 2, lwd = 1, lty = 2)
acf(rstudent(model2), main = “ACF of standardized residuals the quadratic model fitted to the land-use series.”)
y=rstudent(model1) shapiro.test(y)
h <- 5 # 5 steps ahead forecasts t <- time(ozonets) aheadTimes <- data.frame(t = seq(2017, 2021, 1)) frcModel1 <- predict(model1, newdata = aheadTimes, interval = “prediction”) print(frcModel1)
plot(ozonets, xlim= c(1927,2021), ylim = c(-12,10), ylab = “Ozone-thickness series”, main = “Forecasts from the linear model fitted to the Ozone-thickness series.”) lines(ts(as.vector(frcModel1[,3]), start = 2017), col=“blue” , type=“l”) lines(ts(as.vector(frcModel1[,1]), start = 2017), col=“red”, type=“l”) lines(ts(as.vector(frcModel1[,2]), start = 2017), col=“blue”, type=“l”) legend(“topright”, lty=1, pch=1, col=c(“black”,“blue”,“red”), text.width = 10, c(“Data” ,“5% forecast limits”,“Forecasts”))
h <- 5 # 5 steps ahead forecasts t <- time(ozonets) t2 <- t^2 aheadTimes <- data.frame(t = seq(2017, 2017+h, 1), t2 = seq(2017, 2017+h, 1)^2) frcModel2 <- predict(model2, newdata = aheadTimes, interval = “prediction”) print(frcModel2)
plot(ozonets, xlim= c(1927,2021+h+1), ylim = c(-12,10), ylab = “Ozone-thickness series”, main = “Forecasts from the quadratic model fitted to the Ozone-thickness series.”) lines(ts(as.vector(frcModel2[,3]), start = 2017), col=“blue” , type=“l”) lines(ts(as.vector(frcModel2[,1]), start = 2017), col=“red”, type=“l”) lines(ts(as.vector(frcModel2[,2]), start = 2017), col=“blue”, type=“l”) legend(“topright”, lty=1, pch=1, col=c(“black”,“blue”,“red”), text.width = 10, c(“Data” ,“5% forecast limits”,“Forecasts”))
plot(ozonets,type=“o”,ylab=“Ozone thickness(in Dobson units)”,xlab=“Year”, main=“Time Series of changes in Ozone thickness from 1927 to 2016”)
par(mfrow=c(1,2)) acf(ozonets, main =“ACF plot”) pacf(ozonets, main =“PACF plot”)
adf.test(ozonets, alternative = c(“stationary”)) # Use default value of k
ozonets = ozonets + abs(min(ozonets))+1
ozonets.BC.transform = BoxCox.ar(ozonets + abs(min(ozonets))+1) title(main=“Log-Likelihood vs Value of Lamdba”)
ozonets.BC.transform$ci
lambda <- ozonets.BC.transform\(lambda[which(max(ozonets.BC.transform\)loglike) == ozonets.BC.transform$loglike)] lambda
ozonetsBC <- (ozonets^lambda - 1) / lambda
plot(ozonetsBC, ylab=‘Box-Cox transformed ozone thickness series’, xlab=‘Time’, type=‘o’ ,main=“Box Cox transformed Ozone thickness Series”)
ozonetsdiff <- diff(ozonets, differences = 1)
plot(ozonetsdiff, ylab=‘First difference of ozone thickness series’, xlab=‘Time’, type=‘o’ ,main=“First difference of Ozone thickness series”)
par(mfrow=c(1,2)) acf(ozonetsdiff, main =“ACF of first difference time series.”) pacf(ozonetsdiff, main =“PACF of first difference time series.”)
adf.test(ozonetsdiff, alternative = c(“stationary”)) # Use default value of k
eacf(ozonetsdiff)
res_ = armasubsets(y=ozonetsdiff , nar=5 , nma=5, y.name=‘Ozone thickness Index’, ar.method=‘ols’) plot(res_)