Introduction

The dataset we will analyse in this assignment represents yearly changes in the thickness of Ozone layer 1927 to 2016 in Dobson units. A negative value in the dataset represents a decrease in the thickness and a positive value represents an increase in the thickness.

Data Set

The ozone thickness series is given by “dataset1.csv” contains of 1 variable and 90 obsevations.

Data Pre-processing

Preliminaries

In this project, we used the following R packages.

library(TSA)

Data Cleaning and Transformation

We read the thickness of Ozone datasets by naming the column values start from 1927 to 2016. We would later convert the data frame to time series data, we purposely to preparing the data for visulisation and looking for the pattern or trend.

Ozone <- read.csv("C:/RMIT/MATH1318 TimeSeriesAnalysis/Assignment1/data1.csv",header=FALSE)
rownames(Ozone) <- seq(from=1927, to=2016)
head(Ozone)
##              V1
## 1927  1.3511844
## 1928  0.7605324
## 1929 -1.2685573
## 1930 -1.4636872
## 1931 -0.9792030
## 1932  1.5085675
class(Ozone)
## [1] "data.frame"
Ozone.ts <- ts(as.vector(Ozone), start=1927, end=2016)
class(Ozone.ts)
## [1] "ts"
plot(Ozone.ts,type='o',ylab='The ozone thickness series', main= "Figure 1: Time series plot for simulated the thickness of Ozone layer series")

Figure 1 shown depicts a series in which there is an obvious downward trend over time: This time series display trend, the relation between plots, auto regressive behaviour and changing variances. It seems that succeeding measurements related to one anothers as the values that are neighbors in time tend to be similar in size. This is more clearly seen in the following scatter plot of neighboring pair:

plot(y=Ozone.ts,x=zlag(Ozone.ts),ylab='The ozone thickness', xlab='Previous Year the ozone thickness', main= "Figure 2: Scatter plot of neighboring the thickness of Ozone layer")

In the scatter plot we see an upward trend in the plot-low values tend to be followed by low values in the next year, middle-sized values by middle-sized values, and high values by high values, which apparent strong relation. The amount of correlation between neighboring the ozone thickness values is 0.87:

y = Ozone.ts             # Read the ozone thickness data into y
x = zlag(Ozone.ts)       # Generate first lag of The ozone thickness series
index = 2:length(x)      # Create an index to get rid of the first NA value in x
cor(y[index],x[index])   # Calculate correlation between numerical values in x and y
## [1] 0.8700381

Analysis of trend

Now, we try to fit the regression line to the ozone thickness data.

Linaer Trend in Time

model.Ozone.ln = lm(Ozone.ts~time(Ozone.ts)) # label the linear trend model as model.Ozone.ln
summary(model.Ozone.ln)
## 
## Call:
## lm(formula = Ozone.ts ~ time(Ozone.ts))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.7165 -1.6687  0.0275  1.4726  4.7940 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    213.720155  16.257158   13.15   <2e-16 ***
## time(Ozone.ts)  -0.110029   0.008245  -13.34   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.032 on 88 degrees of freedom
## Multiple R-squared:  0.6693, Adjusted R-squared:  0.6655 
## F-statistic: 178.1 on 1 and 88 DF,  p-value: < 2.2e-16

Estimates of slope and intercept are Beta 1 = -0.110029 and Beta 0 = 213.720155, respectively. The slope is statistically significant at 5% significance level, R-squared = 0.6693. The trend line is plotted over the time series in the following plot:

plot(ts(fitted(model.Ozone.ln)), ylim = c(min(c(fitted(model.Ozone.ln),as.vector(Ozone.ts))),max(c(fitted(model.Ozone.ln),as.vector(Ozone.ts)))),ylab='y' ,main = "Figure 3: Fitted linear to the ozone thickness data", type="l",lty=2,col="red")
lines(as.vector(Ozone.ts),type="o")

Linear model is seem to be good, it is capture the trend but it does not not capture all the point and auto correlation.

Quadratic Trend in Time

t = time(Ozone.ts)
t2 = t^2
model.Ozone.qa = lm(Ozone.ts~ t + t2) # label the quadratic trend model as model1
summary(model.Ozone.qa)
## 
## Call:
## lm(formula = Ozone.ts ~ t + t2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.1062 -1.2846 -0.0055  1.3379  4.2325 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -5.733e+03  1.232e+03  -4.654 1.16e-05 ***
## t            5.924e+00  1.250e+00   4.739 8.30e-06 ***
## t2          -1.530e-03  3.170e-04  -4.827 5.87e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.815 on 87 degrees of freedom
## Multiple R-squared:  0.7391, Adjusted R-squared:  0.7331 
## F-statistic: 123.3 on 2 and 87 DF,  p-value: < 2.2e-16

Estimates of slope and intercept are Beta 2 = -1.530e-03,Beta 1 = 5.924e+00 and Beta 0 = -5.733e+03, respectively. The slope is statistically significant at 5% significance level, R-squared = 0.7391. The fitted quadratic trend is shown below:

plot(ts(fitted(model.Ozone.qa)), ylim = c(min(c(fitted(model.Ozone.qa),as.vector(Ozone.ts))),max(c(fitted(model.Ozone.qa),as.vector(Ozone.ts)))),ylab='y' ,main = "Figure 4: Fitted quadratic curve to the ozone thickness data", type="l",lty=2,col="red")
lines(as.vector(Ozone.ts),type="o")

Quadratic model is seem to be good, it is capture the trend and auto correlation all point.

Diagnostic check

We will do diagnostic check by observed the residual standard deviation and coefficeient of determination. To compare between linaer and quadratic model, the result shown the goodness of fit of the trend R-squared of quadratic model about 74% higher than linear model about 70% and the values of R-squared close to 1 that implies a satisfactory fit. The residual standard error as 1.815 lower than 2.032.

Residual Analysis

  1. Linear Model
res.model.Ozone.ln = rstudent(model.Ozone.ln)
plot(y = res.model.Ozone.ln, x = as.vector(time(Ozone.ts)),xlab = 'Time', ylab='Standardized Residuals',main = "Figure 5: Time series plot of standardized residuals",type='o',col="blue")
abline(h=0)

  1. Quadratic Model
res.model.Ozone.qa = rstudent(model.Ozone.qa)
plot(y = res.model.Ozone.qa, x = as.vector(time(Ozone.ts)),xlab = 'Time', ylab='Standardized Residuals',main = "Figure 6: Time series plot of standardized residuals",type='o',col="blue")
abline(h=0)

We can check the normality of residuals with a histrogram. The following display a frequency histrogram of the standardised residuals from the both model.

Histogram of residuals

hist(rstudent(model.Ozone.ln),xlab='Standardized Residuals',main = "Figure 7: Histrogram of residuals(Linear model)")

hist(rstudent(model.Ozone.qa),xlab='Standardized Residuals',main = "Figure 8: Histrogram of residuals(Quadratic model)")

Figure 7-8 shown: The plot is somewhat symmetric and tail off at both the high and low ends as normal dirtrbution does.

Normal Q-Q Plot

The follwing plot shows the QQ normal scores plot for standardised residuals from the both model for the thickness of Ozone layer series. The straight-line pattern supports the assumption of a normal distributed stochastic component in this model.

y = rstudent(model.Ozone.ln)
qqnorm(y,main = "Figure 9: Normal Q-Q Plot (Linear model)")
qqline(y, col = 2, lwd = 1, lty = 2)

y = rstudent(model.Ozone.qa)
qqnorm(y,main ="Figure 10: Normal Q-Q Plot (Quadratic model)")
qqline(y, col = 2, lwd = 1, lty = 2)

Figure 9-10 shown: The Q-Q plot for both model can considered as a normal distribution, In addition we can check normality assumption of the stocahstic component by various hypothesis test and one of these test is Shapiro-Wilk test that calculates the correlation between the residuals and the corresponding normal quantiles.

Shapiro-Wilk normality test

Linear Model

y = rstudent(model.Ozone.ln)
shapiro.test(y)
## 
##  Shapiro-Wilk normality test
## 
## data:  y
## W = 0.98733, p-value = 0.5372

Qudratic Model

y = rstudent(model.Ozone.qa)
shapiro.test(y)
## 
##  Shapiro-Wilk normality test
## 
## data:  y
## W = 0.98889, p-value = 0.6493

The result we get the p-value of 0.5372 and 0.6493 respective for both model. So we conclude not to reject the null hypothesis that the stochastic component of both model are normally distributed.

ACF of standardized residuals

acf(rstudent(model.Ozone.ln), main = "Figure 11: ACF of standardized residuals for linear model")

acf(rstudent(model.Ozone.qa), main = "Figure 12: ACF of standardized residuals for quadratic model")

Both ACF plot confirm the smoothness of time series plot as we have correlation values higher than the confidence bound for few lags. The models has not captured everthing from time series.

Forecasting

As the result of all test above we will considered the quadratic model for forecasting.

h = 5 
x = seq((as.vector(tail(t,1))+1), (as.vector(tail(t,1))+h))
predict = data.frame(t = x, t2 = x^2)
forecasts.qa = predict(model.Ozone.qa, predict, interval = "prediction")
plot(Ozone.ts, xlim = c(1927,2021), ylim = c(-15, 10), ylab = "The thickness of Ozone layer data",main = "Figure 13: Forecast of the thickness of Ozone layer for the next 5 years")
lines(ts(as.vector(forecasts.qa[,1]), start = 2017), col="red", type="l")
lines(ts(as.vector(forecasts.qa[,2]), start = 2017), col="blue", type="l")
lines(ts(as.vector(forecasts.qa[,3]), start = 2017), col="blue", type="l")
legend("topleft", lty=1, pch=1, col=c("black","blue","red"), text.width = 18,
c("Data","5% forecast limits", "Forecasts"))

Summary

In conclusion, statistically significant evidence was found to the result of analysis shown the prediction of thickness Ozone layer continue decresing in next five years.

However, there are many limitations to this investigation such as normal distribution, Q-Q plot and ACF test is does not captured everthing from time series for both model.