Use of simple visualisations for modelling, describing, and predicting trends in time series. Examine efficiency and quality of all the methods used investigation and study the reliability of the fitted model.
library(TSA)
library(readr)
We Read in the dataset and convert it into a time series. We plot the series to check for visual inferences in the dataset.
data1 <- read_csv ("data1.csv",col_names = FALSE)
## Parsed with column specification:
## cols(
## X1 = col_double()
## )
class(data1)
## [1] "tbl_df" "tbl" "data.frame"
data1 <- ts(as.vector(data1), start=1927, end=2016, frequency = 1)
plot(data1,ylab='Dobson Units',xlab='Year',type='o', main = "Time series plot of Thickness of Ozone layer from 1927 to 2016")
Above plot shows a decreasing moving average in thickness of ozone layer over the years from this time series plot. Negative Dobson value represents decrease in the thickness and a positive value represents an increase in the thickness. We can see a continuous decreasing trend up till 2015 and finally positive value in the year 2016.
y = data1
x = zlag(data1)
index = 2:length(x)
cor(y[index],x[index])
## [1] 0.8700381
Correlation value implies that the series is there is strong relationship between yearly thickness of ozone layer.
Now we progress to construct and interpret the time series by use least squares to fit a linear time trend to this time series.
model1 = lm(data1~time(data1))
summary(model1)
##
## Call:
## lm(formula = data1 ~ time(data1))
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.7165 -1.6687 0.0275 1.4726 4.7940
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 213.720155 16.257158 13.15 <2e-16 ***
## time(data1) -0.110029 0.008245 -13.34 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.032 on 88 degrees of freedom
## Multiple R-squared: 0.6693, Adjusted R-squared: 0.6655
## F-statistic: 178.1 on 1 and 88 DF, p-value: < 2.2e-16
P value of the linear model suggest that the model is significant as the p vale is less than 0.05, however being very close to zero also means that we need to reject the null hypotheses. Hence, we look at the r squared value 0.669, its suggest that the model is partial significant.
plot(data1, ylab='Dobson Units',xlab='Year',type='o', main = "Fitted linear trend model - Thickness of Ozone layer from 1927 to 2016")
abline(model1)
Standardized residuals are saved and further fitted to the time series plot.
res.model1 = rstudent(model1)
plot(y = res.model1, x = as.vector(time(data1)),xlab = 'Year',type='o', ylab='Standardized Residuals',main = "Residuals of linear trend model - Thickness of Ozone layer from 1927 to 2016")
Now we progress to construct and interpret the time series by least squares approach to fit a quadratic time trend to this time series.
t = time(data1)
t2 = t^2
model.data1.qa = lm(data1~ t + t2)
summary(model.data1.qa)
##
## Call:
## lm(formula = data1 ~ t + t2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.1062 -1.2846 -0.0055 1.3379 4.2325
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.733e+03 1.232e+03 -4.654 1.16e-05 ***
## t 5.924e+00 1.250e+00 4.739 8.30e-06 ***
## t2 -1.530e-03 3.170e-04 -4.827 5.87e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.815 on 87 degrees of freedom
## Multiple R-squared: 0.7391, Adjusted R-squared: 0.7331
## F-statistic: 123.3 on 2 and 87 DF, p-value: < 2.2e-16
P value of the quadratic model suggest that the model is significant as the p vale is less than 0.05, however being very close to zero also means that we need to reject the null hypotheses. Hence, we look at the r squared value 0.73, its suggest that the model is more significant than linear model
plot(ts(fitted(model.data1.qa)), ylim = c(min(c(fitted(model.data1.qa),
as.vector(data1))), max(c(fitted(model.data1.qa),as.vector(data1)))),
ylab='Dobson Units' , main = "Fitted quadratic trend model - Thickness of Ozone layer from 1927 to 2016", type="l",lty=2,col="Green")
lines(as.vector(data1),type="o")
Now we progress to construct and interpret the time series by least squares approach to fit a harmonic trend to this time series.
har.=harmonic(data1,0.45)
model.data1.har=lm(data1~har.)
summary(model.data1.har)
##
## Call:
## lm(formula = data1 ~ har.)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.3520 -1.8906 0.4837 2.3643 6.4248
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.970e+00 4.790e-01 -6.199 1.79e-08 ***
## har.cos(2*pi*t) NA NA NA NA
## har.sin(2*pi*t) 5.462e+11 7.105e+11 0.769 0.444
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.522 on 88 degrees of freedom
## Multiple R-squared: 0.006672, Adjusted R-squared: -0.004616
## F-statistic: 0.5911 on 1 and 88 DF, p-value: 0.4441
As we see that there is no seasonally, harmonica model cannot be used for the predication. Hence P value of the harmonic model suggest that the model is insignificant as the p vale is more than 0.05. There is no seasonality and also missing values for har.cos which confirms that the model is insignificant.
plot(ts(fitted(model.data1.har)), ylim = c(min(c(fitted(model.data1.har),
as.vector(data1))), max(c(fitted(model.data1.har),as.vector(data1)))),
ylab='Dobson Units' , main = "Fitted harmonice trend model - Thickness of Ozone layer from 1927 to 2016", type="l",lty=2,col="red")
lines(as.vector(data1),type="o")
After the looking at the output of each individual model to the time series, the above graphs suggest that quadratic model is the closest to the actual time series. To predict the next 5 years, we have manually updated the t value if time as years, post which this has been fitted to the new data frame.
t = c(2017, 2018, 2019, 2020, 2021)
t2 = t^2
new = data.frame(t,t2)
forecast = predict(model.data1.qa, new, interval = "prediction")
print(forecast)
## fit lwr upr
## 1 -10.34387 -14.13556 -6.552180
## 2 -10.59469 -14.40282 -6.786548
## 3 -10.84856 -14.67434 -7.022786
## 4 -11.10550 -14.95015 -7.260851
## 5 -11.36550 -15.23030 -7.500701
plot(data1, xlim = c(1924,2022), ylim = c(-16, 5), type="o", ylab='Dobson Units' , main = " Forceasting quadratic trend model - Thickness of Ozone layer from 2017 to 2021",)
lines(ts(as.vector(forecast [,1]), start = c(2017,1), frequency = 1), col="red", type="l")
lines(ts(as.vector(forecast [,2]), start = c(2017,1), frequency = 1), col="blue", type="l")
lines(ts(as.vector(forecast [,3]), start = c(2017,1), frequency = 1), col="blue", type="l")
Based on fitting each of the model, we can confirm that, the quadratic trend model successfully follows the declining pattern of the original series. Quadratic trend model provides the most accurate prediction when compare to the harmonic and linear progression model. Data used for predicting changes in the thickness of Ozone layer suggests that there is not stationary in the series. The mean value in the time series is not constant, which implies, the trend component is not nullified. The seasonality effect is minimal which implies that there is not trend or seasonal patterns.