This dataset contains information about the ozone layer, which is found in the stratosphere, roughly 15-30 km above the Earth's surface. The ozone layer protects our planet from the harmful ultraviolet rays emitted by the Sun. In this report, we have data consisting of the thickness of the ozone layer from 1927 to 2016. We will analyze the data, and our main aim is to model the time series using deterministic and ARIMA models. We will go through the following steps in this project.
The ozone layer dataset has 90 observations (rows) with a single variable (column). Each observation contains the data for one year. The data was provided to us for the purpose of this individual assignment.
# importing the essential packages
library(readr)
library(TSA)
# Importing the data
Ozone <- read.csv("data1.csv", header = FALSE)
# Renaming the rows and column present in the dataset
colnames(Ozone) <- "Thickness"
rownames(Ozone) <- seq(from=1927, to=2016)
# Displaying the head and tail of dataset
head(Ozone)
tail(Ozone)
In the next step, we will convert the dataset into a time series object with frequency = 1, because our observations are recorded annually. We will also explain our findings by displaying the time series plot and examining the autocorrelation with the first lag.
# Converting the Ozone dataset into Time Series
tsOzone <- ts(Ozone$Thickness, start=1927, end=2016, frequency = 1)
# frequency = 1 because the observations are annual
tsOzone
Time Series:
Start = 1927
End = 2016
Frequency = 1
[1] 1.35118436 0.76053242 -1.26855735 -1.46368717 -0.97920302 1.50856746 1.62991681 1.83242333
[9] -0.83968364 -1.09611566 -2.67457473 -2.78716606 -2.97317944 -0.23495548 0.97067000 1.23652307
[17] 2.23062331 0.35671637 -2.12028099 -2.66812477 -0.64702795 0.99608564 1.83817742 1.89922697
[25] -0.55488121 -1.40387419 -3.39178762 0.05777194 3.40717183 1.31488379 -0.12882457 -2.51580137
[33] -3.06205664 -3.33637179 -2.66332198 -0.67958655 -2.11660422 -2.36318997 -5.36156537 -3.03103458
[41] -2.28838624 -1.06438684 -1.68813570 -3.16974819 -3.65647649 -1.25151090 -1.08431732 -0.44863234
[49] -0.17636387 -2.64954530 -1.28317654 -4.29289634 -3.24282341 -3.60135297 -2.57288652 -5.00586059
[57] -6.68548244 -4.58870210 -4.32654629 -2.29370761 -2.26456266 -2.27184846 -2.66440082 -3.79556478
[65] -7.65843185 -10.17433972 -4.21230497 -2.82287161 -1.36776491 -4.43997062 -3.78323838 -5.85304107
[73] -8.55125744 -8.06501289 -7.75975806 -6.65633206 -6.62708203 -7.83548356 -8.84424264 -7.67352209
[81] -7.05582939 -4.69497353 -7.12712128 -9.58954985 -10.19222042 -9.33224686 -10.80567444 -10.86096923
[89] -11.57941376 -9.30284452
# Plotting Time Series Data
plot(tsOzone,ylab='Thickness(Ozone layer)',xlab='Year',type='o',
main = "Time Series plot displaying thickness of ozone layer between 1927-2016")
From the time series plot, we observe a clear downward trend in the ozone layer thickness over time, with successive observations appearing strongly correlated.
# Creating the first lag
lag1 <- zlag(tsOzone)
head(lag1)
[1] NA 1.3511844 0.7605324 -1.2685573 -1.4636872 -0.9792030
tail(lag1)
[1] -9.589550 -10.192220 -9.332247 -10.805674 -10.860969 -11.579414
# Calculating the correlation between data and the first lag
index1 <- 2:length(lag1)
cor(tsOzone[index1],lag1[index1])
[1] 0.8700381
# Plotting the time series data containing first lag
plot(y=tsOzone, x = lag1, ylab='Thickness',xlab='Thickness in last year', main='Scatter plot displaying the thickness of Ozone Layer with first lag')
Here we can see that the correlation coefficient is 0.87, which shows that the thickness of the ozone layer is highly correlated with its first lag. The scatter plot confirms this as well.
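As a quick extension (a sketch, not part of the original output), the same calculation could be repeated for the second lag to see whether the strong serial dependence persists:
# Correlation of the series with its second lag (sketch; uses TSA::zlag)
lag2 <- zlag(tsOzone, 2)
index2 <- 3:length(lag2)  # drop the two leading NA values
cor(tsOzone[index2], lag2[index2])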
In this part of the project, we will fit all four deterministic models (linear, quadratic, seasonal and cosine/harmonic) and compare them to find which model best fits the data.
Linear Regression Model
This model describes a non-constant mean trend over time using an intercept and a slope, i.e. a straight line fitted to the series against time.
# Fitting the linear trend model and plotting the regression line on the dataset and naming it as "lmodel"
lmodel = lm(tsOzone~time(tsOzone))
summary(lmodel)
Call:
lm(formula = tsOzone ~ time(tsOzone))
Residuals:
Min 1Q Median 3Q Max
-4.7165 -1.6687 0.0275 1.4726 4.7940
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 213.720155 16.257158 13.15 <2e-16 ***
time(tsOzone) -0.110029 0.008245 -13.34 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.032 on 88 degrees of freedom
Multiple R-squared: 0.6693, Adjusted R-squared: 0.6655
F-statistic: 178.1 on 1 and 88 DF, p-value: < 2.2e-16
plot(tsOzone,type='o',ylab='Thickness')
abline(lmodel)
From the summary and the plot, we can say that the slope and intercept of the linear model are statistically significant. The model explains 66.93% of the total variation. We can say that this model is a good fit, but not an ideal one, for the data. We will further analyze the stochastic component (the residuals) of this model; if it behaves like white noise, then we can say that the model is a good fit.
Analysis of Residuals for Linear Trend Model
If the residuals of this model behave like white noise, then we can say that the model is a good fit: the residuals should be normally distributed, and the autocorrelation function (ACF) plot should show all lags within the confidence limits.
# Residual Analysis through visualization
residual_model = rstudent(lmodel)
par(mfrow=c(2,2))
#Linear model showing standard residuals
plot(y = residual_model, x = as.vector(time(tsOzone)),xlab = 'Time', ylab='Standardized Residuals',type='l',main = "Linear model containing Standardized residuals")
# histogram of standardized residuals.
hist(residual_model,xlab='Standardized Residuals', main = "Histogram showing Standardized residuals")
#qqplot of standardized residuals
qqnorm(y=residual_model, main = "QQ plot displaying the Standardized residuals")
qqline(y=residual_model, col=2, lwd=1, lty=2)
#shapiro test of residual model
shapiro.test(residual_model)
Shapiro-Wilk normality test
data: residual_model
W = 0.98733, p-value = 0.5372
#ACF plot of residual model.
acf(residual_model, main = "ACF of standardized residuals.")
par(mfrow=c(1,1))
We find the following observations:
The residuals are scattered fairly evenly over time and no fixed pattern is visible.
The histogram looks symmetric in shape, as it should be for a normal distribution.
Looking at the QQ plot, most of the residual values fall on the normal line, while the residuals at both ends diverge from it.
In the ACF plot, we can see that some autocorrelations of the residuals exceed the confidence limits. That is not what we expect to see in a white noise process (see the Ljung-Box check sketched below).
The Shapiro-Wilk test confirms normality, as the p-value is greater than 0.05 [0.5372 > 0.05].
From all the above points, we can say that the residuals are normally distributed, although the ACF suggests they are not white noise.
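To complement the visual ACF check, a Ljung-Box test could be applied to the residuals (a sketch, not part of the original analysis; the choice of 10 lags is arbitrary). A small p-value would indicate remaining autocorrelation, i.e. that the residuals are not white noise.
# Ljung-Box test for autocorrelation in the standardized residuals (sketch)
Box.test(residual_model, lag = 10, type = "Ljung-Box")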
Quadratic Regression Model
The quadratic model contains one more parameter than the linear model: the squared time term (t squared) in the case of a time series.
# Fitting the quadratic model
t = time(tsOzone) #creating a "t" variable containing the timeseries of dataset
t2 = t^2 #creating"t2" which contains the square of t variable
Qmodel = lm(tsOzone~ t + t2) #fitting the quadratic model using lm function
summary(Qmodel) #displaying the summary of Qmodel.
Call:
lm(formula = tsOzone ~ t + t2)
Residuals:
Min 1Q Median 3Q Max
-5.1062 -1.2846 -0.0055 1.3379 4.2325
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.733e+03 1.232e+03 -4.654 1.16e-05 ***
t 5.924e+00 1.250e+00 4.739 8.30e-06 ***
t2 -1.530e-03 3.170e-04 -4.827 5.87e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.815 on 87 degrees of freedom
Multiple R-squared: 0.7391, Adjusted R-squared: 0.7331
F-statistic: 123.3 on 2 and 87 DF, p-value: < 2.2e-16
# plotting the q model on the graph
plot(ts(fitted(Qmodel)), ylim = c(min(c(fitted(Qmodel), as.vector(tsOzone))), max(c(fitted(Qmodel),as.vector(tsOzone)))),
ylab='Thickness' , main = "Fitted quadratic curve for the ozone thickness series", type="l",lty=2,col="blue")
lines(as.vector(tsOzone),type="o")
From the above output and plot, we can say that the quadratic model is also statistically significant. This model explains 73.91% of the variation, compared with 66.93% for the linear model. This model looks like a better fit; we will analyze the residuals further.
Quadratic Model Residual Analysis
# Residual Analysis of the above quadratic model through Visualization;
res_Qmodel = rstudent(Qmodel)
par(mfrow=c(2,2))
# Quadratic model showing the standard residuals
plot(y = res_Qmodel, x = as.vector(time(tsOzone)), ylab='Standardized Residuals',xlab = 'Time',type='l',main = "Quadratic model containing Standardized residuals")
# histogram showing Standard Residuals
hist(res_Qmodel,xlab='Standardized Residuals', main = "Histogram displaying Standardized residuals")
# QQ Plot of Standardized Residuals
qqnorm(y=res_Qmodel, main = "QQ plot displaying Standardized residuals.")
qqline(y=res_Qmodel, col = 2, lwd = 1, lty = 2)
#Shapiro test
shapiro.test(res_Qmodel)
Shapiro-Wilk normality test
data: res_Qmodel
W = 0.98889, p-value = 0.6493
#ACF Plot of Standardized Residuals
acf(res_Qmodel, main = "ACF plot displaying standardized residuals.")
par(mfrow=c(1,1))
From the above visualizations, we can say that:
The residuals are scattered fairly evenly over time and no fixed pattern is visible.
The histogram shows a roughly symmetric distribution, despite a slight irregularity.
The QQ plot shows that most of the residuals fall on the normal line.
The ACF plot shows that some lags exceed the confidence limits, which indicates that the stochastic component is not white noise (a runs test is sketched below as an additional check).
The Shapiro-Wilk test gives a p-value of 0.6493 > 0.05, which confirms normality.
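As a further check on the independence of the quadratic-model residuals (a sketch, not part of the original analysis), a runs test from the TSA package could be applied; a small p-value would suggest the residuals are not independent.
# Runs test for independence of the standardized residuals (sketch)
runs(res_Qmodel)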
Cyclical and Seasonal Trends
Even though seasonal and cyclical trends are usually analyzed for series with frequency > 1, we will explore them here by setting frequency = 7, which is based on the ACF and PACF plots.
# Creating a new ts object with frequency = 7, based on the ACF and PACF plots
tsOzonef <- ts(Ozone$Thickness, start=1927, end=2016, frequency = 7)
# Fitting the cyclical and seasonal model and creating a new model
month.=season(tsOzonef)
Seas_model=lm(tsOzonef~month.-1) # -1 removes the intercept term in the model
summary(Seas_model) #displaying the summary of new model
Call:
lm(formula = tsOzonef ~ month. - 1)
Residuals:
Min 1Q Median 3Q Max
-8.4632 -1.5808 0.4692 2.2938 6.6094
Coefficients:
Estimate Std. Error t value Pr(>|t|)
month.Monday -3.2023 0.3645 -8.785 <2e-16 ***
month.Tuesday -3.1237 0.3665 -8.522 <2e-16 ***
month.Wednesday -3.1334 0.3665 -8.549 <2e-16 ***
month.Thursday -3.1168 0.3665 -8.503 <2e-16 ***
month.Friday -3.1162 0.3665 -8.502 <2e-16 ***
month.Saturday -3.1081 0.3665 -8.480 <2e-16 ***
month.Sunday -3.1337 0.3665 -8.549 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.458 on 617 degrees of freedom
Multiple R-squared: 0.4537, Adjusted R-squared: 0.4475
F-statistic: 73.21 on 7 and 617 DF, p-value: < 2.2e-16
We can see that the estimates are statistically significant in this model. However, only 45.37% of the variation is explained, which is lower even than for the linear regression. So, we can say that this model is not a good fit.
Cosine Trends/Harmonic Model
In this section, we will model the shape of the seasonal trend by using a cosine curve as the mean function.
# Fitting a cosine curve using TS object
har.=harmonic(tsOzonef,1)
Cosmodel=lm(tsOzonef~har.)
summary(Cosmodel)
Call:
lm(formula = tsOzonef ~ har.)
Residuals:
Min 1Q Median 3Q Max
-8.4736 -1.5860 0.4836 2.3097 6.5691
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.133533 0.137985 -22.709 <2e-16 ***
har.cos(2*pi*t) -0.028434 0.194984 -0.146 0.884
har.sin(2*pi*t) -0.004878 0.195296 -0.025 0.980
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.447 on 621 degrees of freedom
Multiple R-squared: 3.525e-05, Adjusted R-squared: -0.003185
F-statistic: 0.01095 on 2 and 621 DF, p-value: 0.9891
# Plotting the Cosine curve
plot(ts(fitted(Cosmodel),freq=7,start=c(1927,1)),ylab='Thickness',type='l',
ylim=range(c(fitted(Cosmodel),tsOzonef)),main="Fitted model to Ozone Layer Thickness series")
# ylim ensures that the y axis range fits the raw data and the fitted values
points(tsOzonef)
This model is a poor fit: the adjusted R-squared is negative and the harmonic slope estimates are not statistically significant, which shows that the cosine terms explain essentially none of the variation in the data.
From the above comparisons, the quadratic model explains the highest proportion of variation (73.91%) among the four deterministic models, and its residuals follow a normal distribution. Thus, we can say that it is the best fit for our ozone layer dataset among the deterministic models.
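As an additional cross-check (a sketch, not part of the original output), the linear and quadratic trend models, which are fitted to the same response, could also be compared by their information criteria; lower values indicate a better trade-off between fit and complexity.
# Comparing the linear and quadratic trend fits by information criteria (sketch)
AIC(lmodel, Qmodel)
BIC(lmodel, Qmodel)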
Now, we will use the quadratic model to predict the trend in the thickness of the ozone layer for the next five years after 2016.
# Prediction for the next 5 years using Quadratic model
t = c(2017,2018,2019,2020,2021) # next 5 year time points
t2= t^2
# Building a data frame of the new time points to pass to predict()
predict_data = data.frame(t,t2)
# Forecasting of the data
forecast_data = predict(Qmodel, predict_data, interval = "prediction")
# The interval argument requests prediction intervals, which are stored in "forecast_data"
print(forecast_data)
fit lwr upr
1 -10.34387 -14.13556 -6.552180
2 -10.59469 -14.40282 -6.786548
3 -10.84856 -14.67434 -7.022786
4 -11.10550 -14.95015 -7.260851
5 -11.36550 -15.23030 -7.500701
# Plotting the prediction data
plot(tsOzone, xlim = c(1927,2025), ylim = c(-15, 10), ylab = "Thickness",
main = "5 years forcasted data of ozone layer thickness")
# To able to use the plot function,we need to convert the forecast into a time series object starting from the first time steps-ahead.We have the apply the same method for all columns of forecasts
lines(ts(as.vector(forecast_data[,1]), start = 2017), col="blue", type="l")
lines(ts(as.vector(forecast_data[,2]), start = 2017), col="green", type="l")
lines(ts(as.vector(forecast_data[,3]), start = 2017), col="green", type="l")
legend("topright", lty=1, pch=1, col=c("black","green","blue"), text.width = 24,
c("Data","forecast limits(5%)", "Actual Forecasts"))
It is clearly visible in the forecast that the thickness of the ozone layer continues to drop, reaching approximately -11.37 in 2021 with a prediction interval of [-15.23, -7.50].
Now, we will look at ARIMA(p,d,q) models that may be suitable for the ozone layer thickness data. We will use all the available model specification tools: ACF/PACF plots, the BIC table and the EACF. The series shows a downward trend and signs of both moving average and autoregressive behaviour.
Sample Autocorrelation and Partial Autocorrelation Functions
In this step we will further analyze the trends by using ACF and PACF plots.
# ACF plot
acf(tsOzone, main="ACF plot of Ozone Layer thickness")
# PACF plot
pacf(tsOzone,main="")
title(main="PACF plot of Ozone Layer thickness")
In the ACF plot, we can see a slowly decaying pattern as the lag increases, which indicates nonstationarity of the series. The PACF shows a high correlation at the first lag and a decline at the second lag, followed by some oscillations. From these observations, we can say that AR(1) is a potential model.
ADF Test
To test the null hypothesis that the process is difference nonstationary (i.e. contains a unit root), we use the Augmented Dickey-Fuller (ADF) unit root test. The alternative hypothesis is that the process is stationary.
# Finding the number of significant lags present in the dataset
ar(tsOzone)
Call:
ar(x = tsOzone)
Coefficients:
1 2 3 4
0.9807 -0.1417 -0.2581 0.2985
Order selected 4 sigma^2 estimated as 3.242
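The order selected by AIC can also be read directly from the fitted ar() object (a sketch, not part of the original output; the object name is illustrative):
# Extracting the AIC-selected order from ar() (sketch)
ozone_ar <- ar(tsOzone)
ozone_ar$order  # order chosen by AIC (4 here)
ozone_ar$aic    # AIC differences relative to the best-fitting order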
We can see that the number of significant lags is k = 4, selected by AIC. Now, we will first implement the ADF test with neither an intercept nor a time trend.
library(fUnitRoots)
# ADF test with no intercept (constant) and no time trend
adfTest(tsOzone, lags = 4, type = "nc", title = NULL,description = NULL)
Title:
Augmented Dickey-Fuller Test
Test Results:
PARAMETER:
Lag Order: 4
STATISTIC:
Dickey-Fuller: 0.5699
P VALUE:
0.7942
Description:
Sun Apr 25 17:40:57 2021 by user: DELL
Next, we will implement the ADF test with an intercept but no time trend.
#below showing ADF with intercept(constant), no time trend
adfTest(tsOzone, lags = 4, type = "c", title = NULL,description = NULL)
Title:
Augmented Dickey-Fuller Test
Test Results:
PARAMETER:
Lag Order: 4
STATISTIC:
Dickey-Fuller: -0.4361
P VALUE:
0.8924
Description:
Sun Apr 25 17:41:10 2021 by user: DELL
Finally, the ADF test with both an intercept and a time trend.
# Below showing ADF with both Intercept(constant) and time trend.
adfTest(tsOzone, lags = 4, type = "ct", title = NULL,description = NULL)
Title:
Augmented Dickey-Fuller Test
Test Results:
PARAMETER:
Lag Order: 4
STATISTIC:
Dickey-Fuller: -3.2376
P VALUE:
0.0867
Description:
Sun Apr 25 17:41:16 2021 by user: DELL
For all three ADF tests, the p-value is greater than 0.05, so we fail to reject the null hypothesis. Conclusion: the ozone layer data is nonstationary.
Data Transformation
From the above, we saw that the series is nonstationary. Before differencing, we check whether a Box-Cox transformation (also known as a power transformation) is needed. It applies only to positive values, so we will add a positive constant to shift every value of the ozone layer data above zero.
# Shifting the series by a constant so that all values are positive
Cons.tsOzone = tsOzone + abs(min(tsOzone))+1
# Box-Cox Transformation
tsOzoneBC = BoxCox.ar(Cons.tsOzone)
possible convergence problem: optim gave code = 1 (this warning was repeated for each candidate lambda value)
title(main = 'Log-likelihood vs the values of lambda for Ozone Data')
# Confidence intervals of BOX-COX
tsOzoneBC$ci
[1] 0.9 1.5
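For reference, the lambda value with the highest log-likelihood could also be pulled out of the BoxCox.ar object (a sketch, assuming the TSA object stores the lambda grid and log-likelihoods as $lambda and $loglike):
# Lambda maximizing the log-likelihood in the Box-Cox search (sketch)
tsOzoneBC$lambda[which.max(tsOzoneBC$loglike)]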
From the above, we can see that the lambda value of 1 is included in the confidence interval of the Box-Cox transformation. Hence, a transformation of the data is not required and we can proceed straight to differencing the original data.
Differencing
Differencing is used to transform a nonstationary series into a stationary one. We will apply the first difference, inspect the time series plot, and confirm stationarity using the ADF test.
# Differencing of data
DifftsOzone = diff(tsOzone)
# Plot of data after the first differencing
plot(DifftsOzone,ylab='Thickness',xlab='Year',type='o',
main = "Thickness after the first differencing")
The first differencing seems sufficient, as the downward trend is no longer visible in the ozone layer thickness series; this can be further confirmed by performing the ADF test.
# Finding the number of lags in the data
ar(DifftsOzone)
Call:
ar(x = DifftsOzone)
Coefficients:
1 2 3 4 5 6
-0.1976 -0.2628 -0.6019 -0.3064 -0.2253 -0.3045
Order selected 6 sigma^2 estimated as 2.232
From AIC, we find k=6.
# Below shows ADF with no intercept(constant) and no time trend
adfTest(DifftsOzone, lags = 6, type = "nc", title = NULL,description = NULL)
p-value smaller than printed p-value
Title:
Augmented Dickey-Fuller Test
Test Results:
PARAMETER:
Lag Order: 6
STATISTIC:
Dickey-Fuller: -5.0757
P VALUE:
0.01
Description:
Sun Apr 25 17:41:41 2021 by user: DELL
# Below shows ADF with intercept(constant) but no time trend
adfTest(DifftsOzone, lags = 6, type = "c", title = NULL,description = NULL)
p-value smaller than printed p-value
Title:
Augmented Dickey-Fuller Test
Test Results:
PARAMETER:
Lag Order: 6
STATISTIC:
Dickey-Fuller: -5.5901
P VALUE:
0.01
Description:
Sun Apr 25 17:41:44 2021 by user: DELL
# Below shows ADF with intercept(constant) and time trend
adfTest(tsOzone, lags = 6, type = "ct", title = NULL,description = NULL)
Title:
Augmented Dickey-Fuller Test
Test Results:
PARAMETER:
Lag Order: 6
STATISTIC:
Dickey-Fuller: -1.5891
P VALUE:
0.7453
Description:
Sun Apr 25 17:41:47 2021 by user: DELL
The ADF tests on the differenced series (with no trend term and with an intercept) both give p-values of 0.01, so we reject the null hypothesis of nonstationarity. (The third test above was run on the original, undifferenced series, which is why its p-value remains large.) Therefore, we can say that the ozone data is stationary after just one differencing.
ACF and PACF plots after differencing
# ACF and PACF plots after differencing
acf(DifftsOzone, ci.type='ma', main="ACF plot of Ozone Layer thickness after differencing")
pacf(DifftsOzone,main="")
title(main="PACF plot of Ozone Layer thickness after differencing")
The ACF plot shows significant lags at 3, 7 and 10, while the PACF plot shows significant lags at 3, 4 and 6. The order of differencing is d = 1, because the data becomes stationary after just one differencing.
Through the above visualizations, we can identify the potential ARIMA models as: ARIMA(3,1,3), ARIMA(3,1,2), ARIMA(3,1,1), ARIMA(2,1,3), ARIMA(2,1,2), ARIMA(2,1,1), ARIMA(1,1,3), ARIMA(1,1,2), ARIMA(1,1,1), ARI(3,1) and IMA(1,3).
Extended Autocorrelation Function (EACF)
# EACF
eacf(DifftsOzone, ar.max = 7, ma.max = 11)
AR/MA
0 1 2 3 4 5 6 7 8 9 10 11
0 o o x o o o x o o x o o
1 x o x o o o o o o x o o
2 x o x o o o x o o x o o
3 x o x o o x o o o o o o
4 x o o x o x o o o o o o
5 x x x x o x o o o o o o
6 o o o x x o o o o o o o
7 o o o x o o o o o o o o
The EACF output shows that the top-left vertex is at p = 0 and q = 3. Therefore, the potential models are MA(3), MA(4), ARMA(1,3) and ARMA(1,4) for the differenced series, i.e. ARIMA(0,1,3), ARIMA(0,1,4), ARIMA(1,1,3) and ARIMA(1,1,4) for the original series.
BIC Table
# BIC Table
resbic = armasubsets(y=DifftsOzone,nar=5,nma=5,y.name='ar',ar.method='ols')
plot(resbic)
title(main = 'BIC table of Ozone Layer Dataset', line= 5)
From the BIC table, the potential model is ARIMA(3,1,2)
From the above analysis, we can say that the ozone layer dataset does not require a transformation, because it becomes stationary after the first differencing. After careful observation of all the specification tools, the candidate ARIMA models for the ozone layer data are:
ARIMA(3,1,3), ARIMA(3,1,2), ARIMA(3,1,1), ARIMA(2,1,3), ARIMA(2,1,2), ARIMA(2,1,1), ARIMA(1,1,3), ARIMA(1,1,2) and ARIMA(1,1,1).
The other possible models are ARI(3,1), IMA(1,3), MA(3), MA(4), ARMA(1,3), ARMA(1,4).
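As a natural next step (a sketch only; the object names are illustrative and the actual model fitting is beyond this section), each candidate could be fitted with arima() and the fits compared by AIC, for example:
# Fitting two of the candidate models on the original series and comparing them (sketch)
fit312 <- arima(tsOzone, order = c(3, 1, 2), method = "ML")
fit113 <- arima(tsOzone, order = c(1, 1, 3), method = "ML")
AIC(fit312, fit113)  # the lower AIC indicates the preferable candidate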