Introduction

The objective of this assignment is to find the best fitting model among linear, quadratic, cyclic,seasonal and cosine and then predict the values of ozone layer thickness for the next five years using the best fitting model.

Data

The ozone dataset is a csv file which contains 90 observations of the ozone layer thickness from 1927 to 2016. The dataset includes both negative and positive values. Negative value means decrease in the ozone layer thickness while postive value means increase in the ozone layer thickness.

Task 1

Required Packages

library(TSA)
library(readr)
library(tseries)
library(knitr)

Reading Ozone Dataset

#reading the dataset
ozone <- read.csv("C:/Users/mafza/Downloads/TIMESERIES/assignmnt_1/ozone.csv", header= FALSE)
rownames(ozone) <- seq(from=1927, to=2016)
#checking the class of object
class(ozone)
## [1] "data.frame"

Descriptive Analysis

After reading the ozone dataset, i converted the datatype of our ozone dataset from dataframe to timeseries.

# Converting it  to the Time series object
ozone_ts <- ts(as.vector(ozone), start=1927, end=2016, frequency = 1) 
#checking the class of converted object
class(ozone_ts)
## [1] "ts"
plot(ozone_ts,type='o',ylab='ozone', main='Time series plot of Ozone Layer Series')

Five Charateristics

ScatterPlot of Ozone series and its first lag

To find the impact of previous year’s ozone layer thickness on the next year’s, we need to calculate the value of pearson correlaton between ozone series and its first lag.

y = ozone_ts           
x = zlag(ozone_ts)        
index = 2:length(x)           
cor(y[index],x[index]) 
## [1] 0.8700381
plot(y[index],x[index],ylab='Ozone series', xlab='First lag of ozone series', main = 'Scatter plot of ozone series and its first lag')

Interpretation of ScatterPlot

The pearson correlation value = 0.87 and the scatterplot indicates that there is strong correlation between ozone series and its first lag.

Fiting Regression Models

1.Fitting Linear Model

# Fitting the linear model

t <- time(ozone_ts)
t
## Time Series:
## Start = 1927 
## End = 2016 
## Frequency = 1 
##  [1] 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941
## [16] 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956
## [31] 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971
## [46] 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986
## [61] 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001
## [76] 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016
linear_model<- lm(ozone_ts ~ t) # label the model as linear_model
summary(linear_model)
## 
## Call:
## lm(formula = ozone_ts ~ t)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.7165 -1.6687  0.0275  1.4726  4.7940 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 213.720155  16.257158   13.15   <2e-16 ***
## t            -0.110029   0.008245  -13.34   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.032 on 88 degrees of freedom
## Multiple R-squared:  0.6693, Adjusted R-squared:  0.6655 
## F-statistic: 178.1 on 1 and 88 DF,  p-value: < 2.2e-16
plot(ozone_ts,ylab='ozone',xlab='Year',type='o', main = "Fitted linear model to the ozone series")
abline(linear_model)

Summary Statistics of Linear Model

Residual Analysis of Linear Model

# Residual analysis of linear model
plot(y=rstudent(linear_model),x=as.vector(time(ozone_ts)), xlab='Time',
     ylab='Standardized Residuals',type='o', main = "Residual plot")

plot(y=rstudent(linear_model),x=as.vector(time(ozone_ts)), xlab='Time',
     ylab='Standardized Residuals',type='p', main = "standardized residuals vs fitted.")
abline(h=0)

hist(rstudent(linear_model),xlab='Standardized Residuals', main = "Histogram of standardised residuals.")

y = rstudent(linear_model)
qqnorm(y, main = "QQ plot of standardised residuals.")
qqline(y, col = 2, lwd = 1, lty = 2)

acf(rstudent(linear_model), main = "ACF of standardized residuals.")

y = rstudent(linear_model)
shapiro.test(y)
## 
##  Shapiro-Wilk normality test
## 
## data:  y
## W = 0.98733, p-value = 0.5372

Interpretation of Residual Analysis of Linear Model

2. Fitting Quadratic Model

# Fit the quadratic model

t= time(ozone_ts)
t2 = t^2 
quadratic_model = lm(ozone_ts ~ t + t2) # label the model as quadratic_model
summary(quadratic_model)
## 
## Call:
## lm(formula = ozone_ts ~ t + t2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.1062 -1.2846 -0.0055  1.3379  4.2325 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -5.733e+03  1.232e+03  -4.654 1.16e-05 ***
## t            5.924e+00  1.250e+00   4.739 8.30e-06 ***
## t2          -1.530e-03  3.170e-04  -4.827 5.87e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.815 on 87 degrees of freedom
## Multiple R-squared:  0.7391, Adjusted R-squared:  0.7331 
## F-statistic: 123.3 on 2 and 87 DF,  p-value: < 2.2e-16
plot(ts(fitted(quadratic_model)), ylim = c(min(c(fitted(quadratic_model), as.vector(ozone_ts))), max(c(fitted(quadratic_model),as.vector(ozone_ts)))),
     ylab='y' , main = "Fitted quadratic curve to ozone series", type="l",lty=2,col="red")
lines(as.vector(ozone_ts),type="o")

Summary Statistics of Quadratic Model
  • The significance of T and T^2 are significant, in other words, their p-vavlue is less than alpha level 0.05.
  • R- squared value = 0.7391 , which indicates there is 73.91% variance.
  • The overall general p-value is also less than 0.05 which means my quadratic model is significant.

Residual Analysis of Quadratic Model

# Residual analysis of Quadratic model
plot(y=rstudent(quadratic_model),x=as.vector(time(ozone_ts)), xlab='Time',
     ylab='Standardized Residuals',type='o', main = "Residual plot.")

plot(y=rstudent(quadratic_model),x=as.vector(time(ozone_ts)), xlab='Time',
     ylab='Standardized Residuals',type='p', main = "standardized residuals vs fitted")
abline(h=0)

hist(rstudent(quadratic_model),xlab='Standardized Residuals', main = "Histogram of standardised residuals.")

y = rstudent(quadratic_model)
qqnorm(y, main = "QQ plot of standardised residuals.")
qqline(y, col = 2, lwd = 1, lty = 2)

acf(rstudent(quadratic_model), main = "ACF of standardized residuals.")

y = rstudent(quadratic_model)
shapiro.test(y)
## 
##  Shapiro-Wilk normality test
## 
## data:  y
## W = 0.98889, p-value = 0.6493

Interpretation of Residual Analysis of Quadratic Model

3. Fitting Harmonic Model

# Fit the Harmonic Model


har. <- harmonic(ozone_ts, 0.45) 
harmonic_model <- lm(ozone_ts ~ har.)
summary(harmonic_model)
## 
## Call:
## lm(formula = ozone_ts ~ har.)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.3520 -1.8905  0.4837  2.3643  6.4248 
## 
## Coefficients: (1 not defined because of singularities)
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -2.970e+00  4.790e-01  -6.199 1.79e-08 ***
## har.cos(2*pi*t)         NA         NA      NA       NA    
## har.sin(2*pi*t)  5.462e+11  7.105e+11   0.769    0.444    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.522 on 88 degrees of freedom
## Multiple R-squared:  0.006672,   Adjusted R-squared:  -0.004616 
## F-statistic: 0.5911 on 1 and 88 DF,  p-value: 0.4441
plot(ts(fitted(harmonic_model)), ylab='y', main = "Fitted cosine wave to ozone series.",
     ylim = c(min(c(fitted(harmonic_model), as.vector(ozone_ts))) ,
              max(c(fitted(harmonic_model), as.vector(ozone_ts)))
     ), col = "green" )
lines(as.vector(ozone_ts),type="o")

Summary Statistics of Harmonic Model

Residual Analysis of Harmonic Model

# Residual analysis of Harmonic model
plot(y=rstudent(harmonic_model),x=as.vector(time(ozone_ts)), xlab='Time',
     ylab='Standardized Residuals',type='o', main = "Time series plot of standardized residuals.")

plot(y=rstudent(harmonic_model),x=as.vector(time(ozone_ts)), xlab='Time',
     ylab='Standardized Residuals',type='p', main = "standardized residuals vs fitted.")
abline(h=0)

hist(rstudent(harmonic_model),xlab='Standardized Residuals', main = "Histogram of standardised residuals.")

y = rstudent(harmonic_model)
qqnorm(y, main = "QQ plot of standardised residuals.")
qqline(y, col = 2, lwd = 1, lty = 2)

acf(rstudent(harmonic_model), main = "ACF of standardized residuals.")

y = rstudent(harmonic_model)
shapiro.test(y)
## 
##  Shapiro-Wilk normality test
## 
## data:  y
## W = 0.95856, p-value = 0.005875

Interpretation of Residual Analysis of Harmonic Model

Best Fitting Model

From the summary statistics of all the models we can conclude that quadratic model is the best best model, as its R-squared value = 0.7391 is greater as compared to linear model which has R-sqaured value = 0.6693 which means quadratic model has higher variation than linear model. Also standard error = 1.815 of quadratic model is less than as compared to linear model which has higher standard error value = 2.032. So quadratic model is the best model as it has higher variation and less residual standard error.

Forecasting Using Quadratic Model

#forecasting quadratic model

h <- 4

t <- time(ozone_ts)
t2 <- t^2

aheadTimes <- data.frame(t = seq(2017, 2017+h, 1),
                         t2 = seq(2017, 2017+h, 1)^2)

forecasting_quadratic_model <- predict(quadratic_model, newdata = aheadTimes, interval = "prediction")

print(forecasting_quadratic_model)
##         fit       lwr       upr
## 1 -10.34387 -14.13556 -6.552180
## 2 -10.59469 -14.40282 -6.786548
## 3 -10.84856 -14.67434 -7.022786
## 4 -11.10550 -14.95015 -7.260851
## 5 -11.36550 -15.23030 -7.500701
plot(ozone_ts, xlim= c(1927,2021) ,ylim = c(-20,10),
     ylab = "Ozone-Layer series",
     main = "Forecasts from the quadratic model fitted to the Ozone series.")
lines(ts(as.vector(forecasting_quadratic_model[,3]), start = 2017), col="blue", type="l")
lines(ts(as.vector(forecasting_quadratic_model[,1]), start = 2017), col="red", type="l")
lines(ts(as.vector(forecasting_quadratic_model[,2]), start = 2017), col="blue", type="l")
legend("topleft", lty=1, pch=1, col=c("black","blue","red"),
       text.width = 18,
       c("Data","5% forecast limits", "Forecasts"))

The graph indicates that there is a decrease in ozone layer thickness predicted in the next five years.

Conclusion of Task-1

Task 2

Propose a set of possible ARIMA(p,d,q) models

plot(ozone_ts,type='o',ylab='ozone', main='Time series plot of ozone layer series')

ACF & PACF Plots

par(mfrow=c(1,2))
acf(ozone_ts,main="ACF plot for Ozone Layer series.")
pacf(ozone_ts,main="PACF plot for Ozone Layer series.")

There is a slowly decaying pattern in acf plot and very high first correlation in pacf plot which indicates that there is a trend in our series and our series is non-stationary.

Dickey-Fuller Unit-Root Test (ADF)

adf.test(ozone_ts, alternative = c("stationary"))
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ozone_ts
## Dickey-Fuller = -3.2376, Lag order = 4, p-value = 0.0867
## alternative hypothesis: stationary

Ho: Non-stationary Ha: Stationary Our p-value = 0.0867 which is greater than 0.05, so we fail to reject our null hypothesis and hence the series is non-stationary.

Box-Cox Transformation

ozone_ts_posi = ozone_ts + abs(min(ozone_ts))+1
BC = BoxCox.ar(ozone_ts_posi )

BC$ci
## [1] 0.9 1.5
lambda <- BC$lambda[which(max(BC$loglike) == BC$loglike)]
lambda
## [1] 1.2
ozone_bc = (ozone_ts_posi^lambda-1)/lambda
plot(ozone_bc,type='o',ylab='Simulated series',  main = "Box Cox transformed series.")

qqnorm(ozone_bc)
qqline(ozone_bc, col = 4)

shapiro.test(ozone_bc)
## 
##  Shapiro-Wilk normality test
## 
## data:  ozone_bc
## W = 0.96644, p-value = 0.01995

Interpretation of Box-Cox Transformation

Differencing on the original series

dif = diff(ozone_ts)
par(mfrow=c(1,1))
plot(dif,type='o',ylab='Average weekly earnings', main = "Time series plot of the first differenced ozone series.")

adf.test(dif)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  dif
## Dickey-Fuller = -7.1568, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary

Ho: Non-stationary Ha: Stationary Our p-value = 0.01 which is less than 0.05, so we reject our null hypothesis and hence after the first differencing on the original series, the series has become stationary.

par(mfrow=c(1,2))
acf(dif,main="ACF of first difference.")
pacf(dif,main="PACF of first difference.")

EACF PLOT

eacf(dif)
## AR/MA
##   0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 o o x o o o x o o x o  o  o  o 
## 1 x o x o o o o o o x o  o  o  o 
## 2 x o x o o o x o o x o  o  o  o 
## 3 x o x o o x o o o o o  o  o  o 
## 4 x o o x o x o o o o o  o  o  o 
## 5 x x x x o x o o o o o  o  o  o 
## 6 o o o x x o o o o o o  o  o  o 
## 7 o o o x o o o o o o o  o  o  o

The top-left “o” can be found at p=0 and q=0 which shows a clear vertex. The neibouring ARMA models are (p=0,q=1) and (p=1, q=1). So the possible sets of arima models from eacf are ARIMA(0,1,0), ARIMA(0,1,1), ARIMA(1,1,1)

BIC Table

I am setting the values of nar and nma parameters as 6 because in acf, pacf and eacf plots all the ARMA values are under 6.

res2 = armasubsets(dif , nar=6 , nma=6, y.name='ozone', ar.method='ols')
plot(res2)

From the bic table AR(3) and AR(4) and MA(2). So the possible arima models from bic table are ARIMA(3,1,2), ARIMA(4,1,2)

Conclusion of Task 2 .

So the possible set of arima models from acf, pacf, eacf and bic are: