Predicting changes in the thickness of Ozone layer based on its time-series data from 1926 - 2016


Student Details

Dilip Chandra (S3574580)

Introduction

The ozone layer in Earth’s atmosphere contains high concentration of ozone which absorbs sun’s ultraviolet light. Ozone layer thickness is measured in Dobson units. In this assignment, we see how the thickness of Ozone layer has changed from 1927 to 2016. We also aim to predict the change in thickness of ozone layer in the next 5 year using relevant model fitting and functions on R studio. The ozone thickness series is given by “dataset1.csv”. The dataset consists of one variable - Dobson units - most basic measure used in ozone research.

Methodology

To undertake this research, time series methods on R Studio are being used to infer from the dataset.

Research and Inferences

Use of simple visualisations for modelling, describing, and predicting trends in time series. Examine efficiency and quality of all the methods used investigation and study the reliability of the fitted model.

library(TSA)
library(readr)

We Read in the dataset and convert it into a time series. We plot the series to check for visual inferences in the dataset.

data1 <- read_csv ("data1.csv",col_names = FALSE)
## Parsed with column specification:
## cols(
##   X1 = col_double()
## )
class(data1)
## [1] "tbl_df"     "tbl"        "data.frame"
data1 <- ts(as.vector(data1), start=1927, end=2016, frequency = 1)
plot(data1,ylab='Dobson Units',xlab='Year',type='o', main = "Time series plot of Thickness of Ozone layer from 1927 to 2016")

Above plot shows a decreasing moving average in thickness of ozone layer over the years from this time series plot. Negative Dobson value represents decrease in the thickness and a positive value represents an increase in the thickness. We can see a continuous decreasing trend up till 2015 and finally positive value in the year 2016.

y = data1           
x = zlag(data1)        
index = 2:length(x)           
cor(y[index],x[index]) 
## [1] 0.8700381

Correlation value implies that the series is there is strong relationship between yearly thickness of ozone layer.

Linear Model

Now we progress to construct and interpret the time series by use least squares to fit a linear time trend to this time series.

model1 = lm(data1~time(data1))
summary(model1)
## 
## Call:
## lm(formula = data1 ~ time(data1))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.7165 -1.6687  0.0275  1.4726  4.7940 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 213.720155  16.257158   13.15   <2e-16 ***
## time(data1)  -0.110029   0.008245  -13.34   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.032 on 88 degrees of freedom
## Multiple R-squared:  0.6693, Adjusted R-squared:  0.6655 
## F-statistic: 178.1 on 1 and 88 DF,  p-value: < 2.2e-16

P value of the linear model suggest that the model is significant as the p vale is less than 0.05, however being very close to zero also means that we need to reject the null hypotheses. Hence, we look at the r squared value 0.669, its suggest that the model is partial significant.

plot(data1, ylab='Dobson Units',xlab='Year',type='o', main = "Fitted linear trend model - Thickness of Ozone layer from 1927 to 2016")
abline(model1)

Standardized residuals are saved and further fitted to the time series plot.

res.model1 = rstudent(model1)
plot(y = res.model1, x = as.vector(time(data1)),xlab = 'Year',type='o', ylab='Standardized Residuals',main = "Residuals of linear trend model - Thickness of Ozone layer from 1927 to 2016")

Quadratic Model

Now we progress to construct and interpret the time series by least squares approach to fit a quadratic time trend to this time series.

t = time(data1)
t2 = t^2
model.data1.qa = lm(data1~ t + t2) 
summary(model.data1.qa)
## 
## Call:
## lm(formula = data1 ~ t + t2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.1062 -1.2846 -0.0055  1.3379  4.2325 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -5.733e+03  1.232e+03  -4.654 1.16e-05 ***
## t            5.924e+00  1.250e+00   4.739 8.30e-06 ***
## t2          -1.530e-03  3.170e-04  -4.827 5.87e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.815 on 87 degrees of freedom
## Multiple R-squared:  0.7391, Adjusted R-squared:  0.7331 
## F-statistic: 123.3 on 2 and 87 DF,  p-value: < 2.2e-16

P value of the quadratic model suggest that the model is significant as the p vale is less than 0.05, however being very close to zero also means that we need to reject the null hypotheses. Hence, we look at the r squared value 0.73, its suggest that the model is more significant than linear model

plot(ts(fitted(model.data1.qa)), ylim = c(min(c(fitted(model.data1.qa),
                                                as.vector(data1))), max(c(fitted(model.data1.qa),as.vector(data1)))),
     ylab='Dobson Units' , main = "Fitted quadratic trend model - Thickness of Ozone layer from 1927 to 2016", type="l",lty=2,col="Green")
lines(as.vector(data1),type="o")

Harmonic Model

Now we progress to construct and interpret the time series by least squares approach to fit a harmonic trend to this time series.

har.=harmonic(data1,0.45)
model.data1.har=lm(data1~har.)
summary(model.data1.har)
## 
## Call:
## lm(formula = data1 ~ har.)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.3520 -1.8906  0.4837  2.3643  6.4248 
## 
## Coefficients: (1 not defined because of singularities)
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -2.970e+00  4.790e-01  -6.199 1.79e-08 ***
## har.cos(2*pi*t)         NA         NA      NA       NA    
## har.sin(2*pi*t)  5.462e+11  7.105e+11   0.769    0.444    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.522 on 88 degrees of freedom
## Multiple R-squared:  0.006672,   Adjusted R-squared:  -0.004616 
## F-statistic: 0.5911 on 1 and 88 DF,  p-value: 0.4441

As we see that there is no seasonally, harmonica model cannot be used for the predication. Hence P value of the harmonic model suggest that the model is insignificant as the p vale is more than 0.05. There is no seasonality and also missing values for har.cos which confirms that the model is insignificant.

plot(ts(fitted(model.data1.har)), ylim = c(min(c(fitted(model.data1.har),
                                                 as.vector(data1))), max(c(fitted(model.data1.har),as.vector(data1)))),
     ylab='Dobson Units' , main = "Fitted harmonice trend model - Thickness of Ozone layer from 1927 to 2016", type="l",lty=2,col="red")
lines(as.vector(data1),type="o")
Forecasting

After the looking at the output of each individual model to the time series, the above graphs suggest that quadratic model is the closest to the actual time series. To predict the next 5 years, we have manually updated the t value if time as years, post which this has been fitted to the new data frame.

t = c(2017, 2018, 2019, 2020, 2021)
t2 = t^2
new = data.frame(t,t2)
forecast = predict(model.data1.qa, new, interval = "prediction")
print(forecast)
##         fit       lwr       upr
## 1 -10.34387 -14.13556 -6.552180
## 2 -10.59469 -14.40282 -6.786548
## 3 -10.84856 -14.67434 -7.022786
## 4 -11.10550 -14.95015 -7.260851
## 5 -11.36550 -15.23030 -7.500701
plot(data1, xlim = c(1924,2022), ylim = c(-16, 5), type="o", ylab='Dobson Units' , main = " Forceasting quadratic trend model - Thickness of Ozone layer from 2017 to 2021",)
lines(ts(as.vector(forecast [,1]), start = c(2017,1), frequency = 1), col="red", type="l")
lines(ts(as.vector(forecast [,2]), start = c(2017,1), frequency = 1), col="blue", type="l")
lines(ts(as.vector(forecast [,3]), start = c(2017,1), frequency = 1), col="blue", type="l")
Conculsion:

Based on fitting each of the model, we can confirm that, the quadratic trend model successfully follows the declining pattern of the original series. Quadratic trend model provides the most accurate prediction when compare to the harmonic and linear progression model. Data used for predicting changes in the thickness of Ozone layer suggests that there is not stationary in the series. The mean value in the time series is not constant, which implies, the trend component is not nullified. The seasonality effect is minimal which implies that there is not trend or seasonal patterns.