This report is about analyzing the egg deposition of Lake Huron Bloasters by using various analysis method.we analysed the data,applied transformation on the data wherever required,plotted the ACF & PACF’s along with EACF & BIC table and coefficient tests on all the possible models.The result showed a best possible model among set of possible models which was further used to forecasts for next 5 years.
Introduction
Required Library
Forecasting
Conclusion
Lake Huron Bloasters scientifically known as Coregonus Hoyi is a small silvery colored whitefish which is found in most Great Lakes (North America). The amount of eggs produced by female varies depending upon her size.
we impute all the required libraries for the analysis.
library(TSA)
##
## Attaching package: 'TSA'
## The following objects are masked from 'package:stats':
##
## acf, arima
## The following object is masked from 'package:utils':
##
## tar
library(knitr)
library(tseries)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(fUnitRoots)
## Loading required package: timeDate
##
## Attaching package: 'timeDate'
## The following objects are masked from 'package:TSA':
##
## kurtosis, skewness
## Loading required package: timeSeries
## Loading required package: fBasics
library(FitAR)
## Loading required package: lattice
## Loading required package: leaps
## Loading required package: ltsa
## Loading required package: bestglm
library(lmtest)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following object is masked from 'package:timeSeries':
##
## time<-
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(dLagM)
## Loading required package: nardl
## Loading required package: dynlm
library(forecast)
## Registered S3 methods overwritten by 'forecast':
## method from
## fitted.Arima TSA
## plot.Arima TSA
##
## Attaching package: 'forecast'
## The following object is masked from 'package:dLagM':
##
## forecast
## The following object is masked from 'package:FitAR':
##
## BoxCox
The data is displayed using head() which shows top 6 rows of the data set.it has 2 columns namely year and eggs.
The class of below data is of data frame and needs to be converted to time series data for further analysis
egg<- read.csv("eggs.csv",header = TRUE)
head(egg)
class(egg)
## [1] "data.frame"
After converting the data to time series format , we generated a time series plot and analyzed the following :
#coverting dataframe to ts data
data_egg <- ts(as.vector(egg$eggs),start=1981,end=1996)
plot(data_egg,type='o',ylab="Egg Deposited ",xlab='Year')
Fig.3.1^[Time series plot of the egg deposition of Bolasters between 1981-1996]
ACF plot are designed to show whether the elements of time series are positively correlated,negatively correlated or independent on each other.We calculate the correlation for time series observations with previous time stamps called lags.
The below ACF shows 1 significant lag.
acf(data_egg)
Fig.3.2^[ACF Plot]
PACF A partial auto correlation is the amount of correlation between a variable and a lag of itself that is not explained by correlations at all lower-order-lags.
The below PACF also shows 1 significant lag.
pacf(data_egg)
Fig.3.3^[PACF Plot]
We then perform Augmented Dickey Fuller (ADF) test to check whether the given time series is stationary or not.It is one of the most commonly used test when it comes to analyzed the stationarity of data.
After performing the ADF test we get \(p-value=0.54\) which is greater than\(0.05\). Hence we cannot reject the null hypothesis and we say that series in not stationary.
adf.test(data_egg)
##
## Augmented Dickey-Fuller Test
##
## data: data_egg
## Dickey-Fuller = -2.0669, Lag order = 2, p-value = 0.5469
## alternative hypothesis: stationary
After plotting the scatter plot, we observed a positive upward trend when we go from left to right.This indicates that there is strong correlation between egg deposition in an year as compared to previous year.
Also using correlation function we get the value around \(74\)% which justifies the upward trend in the plot.
plot(y=data_egg,x=zlag(data_egg),ylab="Egg Deposition",xlab="Previous year Egg Deposition",col='blue')
Fig.3.4^[ScatterPlot of neighbouring egg deposition]
# computing the correlation
y=data_egg
x=zlag(data_egg)
index=2:length(x)
cor(x[index],y[index])
## [1] 0.7445657
Data transformation are important tool for proper statistical analysis. As in this case our series is not normally distributed, we transform the data using Box-Cox to make the transformed values normally distributed.i.e to stabilize the variance
The lambda value appears to be \(0.48\).
we plot the Transformed egg deposition data to see if any changes.
data.egg.transform=BoxCox.ar(data_egg,method = "yule-walker")
## Warning in arima0(x, order = c(i, 0L, 0L), include.mean = demean): possible
## convergence problem: optim gave code = 1
## Warning in arima0(x, order = c(i, 0L, 0L), include.mean = demean): possible
## convergence problem: optim gave code = 1
#data.egg.transform$ci
lambda=0.48
egg.bc=(data_egg^lambda-1)/lambda
plot(egg.bc, type = "o", ylab = "Egg Depositions (in millions)")
Fig.3.5^[Transformed - Egg depositions between 1981 and 1996]
The plot appears to be quite similar as before.
Also from the ADF test ,we get \(p-value=0.6877\) which is greater than \(0.05\) and therefore cannot reject null hypothesis and the series is still not stationary.
adf.test(egg.bc)
##
## Augmented Dickey-Fuller Test
##
## data: egg.bc
## Dickey-Fuller = -1.6973, Lag order = 2, p-value = 0.6877
## alternative hypothesis: stationary
As the Transformed values did not showed any normal distribution, we now calculate the first difference of the transformed series and check whether the series is now stationary or not.
we now plot the difference of the transformed egg deposition graph.
diff1=diff(egg.bc,differences=1)
plot(egg.bc, type = "o", ylab = "Egg Depositions (in millions)")
Fig.3.6^[Difference of the Transformed Egg depositions between 1981 and 1996]
As we see change of variance in the plot , we apply ADF test to check the stationarity.
From ADF test, the \(p-value=0.047\) which is smaller than \(0.05\) states that we can reject the null hypothesis and the series is stationary now.
adf.test(diff1)
##
## Augmented Dickey-Fuller Test
##
## data: diff1
## Dickey-Fuller = -3.6374, Lag order = 2, p-value = 0.04733
## alternative hypothesis: stationary
The ACF & PACF do not show any significant lags which implies existence of white noise behavior.
From ACF & PACF we can draw ARIMA{0,1,0}.
par(mfrow=c(1,2)) # To plot both ACF and PACF in the same panel of plots
acf(diff1,main="ACF Plot")
pacf(diff1,main="PACF Plot")
par(mfrow=c(1,1))
Fig.3.7^[ACF & PACF Plots after differencing]
Because there is no significant lags seen in ACF and PACF it is not possible to proceed over the ACF and PACF for this series.
We then draw models from EACF tables which gives ARIMA{1,1,0},ARIMA{0,1,1},ARIMA{1,1,2}.
Here 1 in the middle is due to 1st differencing
Because of the size of the series we restrict the maximum number of AR & MA parameters.
eacf(diff1,ar.max = 3,ma.max = 3)
## AR/MA
## 0 1 2 3
## 0 o o o o
## 1 o o o o
## 2 o o o o
## 3 o o o o
We plot BIC table to get different ARIMA models.
In BIC table below shaded columns correspond to AR(2),AR(5) & MA(3) coefficients.
ARIMA{2,1,3} and ARIMA{5,1,3}are the set of possible models we get through BIC table
res=armasubsets(y=diff1,nar=5,nma=7,y.name='test',ar.method = 'yw')
## Warning in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in =
## force.in, : 7 linear dependencies found
## Warning in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in =
## force.in, : nvmax reduced to 5
plot(res)
Fig.4.1^[BIC Table]
The set of all possible models are - ARIMA{0,1,0},ARIMA{1,1,0},ARIMA{0,1,1},ARIMA{1,1,2},ARIMA{2,1,3} & ARIMA{5,1,3}.
We then apply coefficient test on all models to get the best possible outcomes.
#ARIMA{0,1,1} ,Method= 'CSS'
model011css = arima(egg.bc, order = c(0,1,1), method = 'CSS')
coeftest(model011css)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 0.11384 0.24898 0.4572 0.6475
#ARIMA{0,1,1} ,Method= 'ML'
model011ml = arima(egg.bc, order = c(0,1,1), method = 'ML')
coeftest(model011ml)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 0.10721 0.24292 0.4413 0.659
#ARIMA{1,1,0} ,Method= 'CSS'
model110css = arima(egg.bc, order = c(1,1,0), method = 'CSS')
coeftest(model110css)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.10865 0.25625 0.424 0.6716
#ARIMA{1,1,0} ,Method= 'ML'
model110ml = arima(egg.bc, order = c(1,1,0), method = 'ML')
coeftest(model110ml)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.10217 0.24874 0.4108 0.6812
#ARIMA{1,1,2} ,Method= 'CSS'
model112css = arima(egg.bc, order = c(1,1,2), method = 'CSS')
## Warning in stats::arima(x = x, order = order, seasonal = seasonal, xreg =
## xreg, : possible convergence problem: optim gave code = 1
coeftest(model112css)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 1.333368 0.012082 110.3586 < 2.2e-16 ***
## ma1 -1.670042 0.209601 -7.9677 1.616e-15 ***
## ma2 -0.053185 0.308400 -0.1725 0.8631
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#ARIMA{1,1,2} ,Method= 'ML'
model112ml = arima(egg.bc, order = c(1,1,2), method = 'ML')
coeftest(model112ml)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.48783 0.98856 0.4935 0.6217
## ma1 -0.41308 0.96848 -0.4265 0.6697
## ma2 -0.11356 0.20994 -0.5409 0.5885
#ARIMA{2,1,3} ,Method= 'CSS'
model213css = arima(egg.bc, order = c(2,1,3), method = 'CSS')
coeftest(model213css)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -1.52039 0.21232 -7.1608 8.020e-13 ***
## ar2 -0.79428 0.19302 -4.1151 3.870e-05 ***
## ma1 2.07973 0.11280 18.4376 < 2.2e-16 ***
## ma2 2.45580 0.21094 11.6424 < 2.2e-16 ***
## ma3 1.30070 0.21207 6.1334 8.603e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#ARIMA{2,1,3} ,Method= 'ML'
model213ml = arima(egg.bc, order = c(2,1,3), method = 'ML')
coeftest(model213ml)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.47703 0.57052 -0.8361 0.4031
## ar2 -0.26931 0.42750 -0.6300 0.5287
## ma1 0.73806 0.58267 1.2667 0.2053
## ma2 0.58442 0.63052 0.9269 0.3540
## ma3 -0.37372 0.50790 -0.7358 0.4618
#ARIMA{5,1,3} ,Method= 'CSS'
model513css = arima(egg.bc, order = c(5,1,3), method = 'CSS')
## Warning in stats::arima(x = x, order = order, seasonal = seasonal, xreg =
## xreg, : possible convergence problem: optim gave code = 1
coeftest(model513css)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.3343669 0.1369784 -2.4410 0.014646 *
## ar2 -0.1452824 0.0061417 -23.6549 < 2.2e-16 ***
## ar3 -0.0834647 0.0295939 -2.8203 0.004797 **
## ar4 0.1127213 0.0202048 5.5789 2.420e-08 ***
## ar5 0.5563530 0.0789690 7.0452 1.852e-12 ***
## ma1 0.9236458 0.1698461 5.4381 5.384e-08 ***
## ma2 -3.4243590 0.3598196 -9.5169 < 2.2e-16 ***
## ma3 -4.6209339 0.7281348 -6.3463 2.206e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#ARIMA{5,1,3} ,Method= 'ML'
model513ml = arima(egg.bc, order = c(5,1,3), method = 'ML')
## Warning in stats::arima(x = x, order = order, seasonal = seasonal, xreg =
## xreg, : possible convergence problem: optim gave code = 1
coeftest(model513ml)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.34775 0.50766 -0.6850 0.49334
## ar2 0.30836 0.54345 0.5674 0.57043
## ar3 -0.12989 0.24071 -0.5396 0.58946
## ar4 -0.16913 0.23632 -0.7157 0.47419
## ar5 0.58694 0.29554 1.9860 0.04703 *
## ma1 1.30795 1.50067 0.8716 0.38344
## ma2 0.11548 1.64244 0.0703 0.94395
## ma3 -0.49137 0.62925 -0.7809 0.43487
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The Residuals in time series model are what is left over after fitting a model. For many time series models ,the residuals are equal to difference between the observation and corresponding fitted values:
\[e_t=y_t-y`_t\]
Residuals are important in verifying whether a model have captured the information in data.
we now check the five different graphs to make predictions.
residual.analysis <- function(model, std = TRUE,start = 2, class = c("ARIMA","GARCH","ARMA-GARCH")[1]){
if (class == "ARIMA"){
if (std == TRUE){
res.model = rstandard(model)
}else{
res.model = residuals(model)
}
}else if (class == "GARCH"){
res.model = model$residuals[start:model$n.used]
}else if (class == "ARMA-GARCH"){
res.model = model@fit$residuals
}else {
stop("The argument 'class' must be either 'ARIMA' or 'GARCH' ")
}
par(mfrow=c(3,2))
plot(res.model,type='o',ylab='Standardised residuals', main="Time series plot of standardised residuals")
abline(h=0)
hist(res.model,main="Histogram of standardised residuals")
acf(res.model,main="ACF of standardised residuals")
pacf(res.model,main="PACF of standardised residuals")
qqnorm(res.model,main="QQ plot of standardised residuals")
qqline(res.model, col = 2)
print(shapiro.test(res.model))
k=0
LBQPlot(res.model, lag.max = length(model$residuals)-1, StartLag = k + 1, k = 0, SquaredQ = FALSE)
}
residual.analysis(model = model011css)
##
## Shapiro-Wilk normality test
##
## data: res.model
## W = 0.93906, p-value = 0.3376
## Warning in (ra^2)/(n - (1:lag.max)): longer object length is not a multiple of
## shorter object length
Fig.5.1^[Residual Analysis for model011css]
residual.analysis(model = model011ml)
##
## Shapiro-Wilk normality test
##
## data: res.model
## W = 0.93867, p-value = 0.333
## Warning in (ra^2)/(n - (1:lag.max)): longer object length is not a multiple of
## shorter object length
Fig.5.2^[Residual Analysis for model011ml]
residual.analysis(model=model110css)
##
## Shapiro-Wilk normality test
##
## data: res.model
## W = 0.94101, p-value = 0.3616
## Warning in (ra^2)/(n - (1:lag.max)): longer object length is not a multiple of
## shorter object length
Fig.5.3^[Residual Analysis for model110css]
residual.analysis(model=model110ml)
##
## Shapiro-Wilk normality test
##
## data: res.model
## W = 0.93965, p-value = 0.3448
## Warning in (ra^2)/(n - (1:lag.max)): longer object length is not a multiple of
## shorter object length
Fig.5.4^[Residual Analysis for model110ml]
residual.analysis(model=model112ml)
##
## Shapiro-Wilk normality test
##
## data: res.model
## W = 0.92284, p-value = 0.1874
## Warning in (ra^2)/(n - (1:lag.max)): longer object length is not a multiple of
## shorter object length
Fig.5.5^[Residual Analysis for model112ml]
residual.analysis(model=model112css)
##
## Shapiro-Wilk normality test
##
## data: res.model
## W = 0.91349, p-value = 0.1325
## Warning in (ra^2)/(n - (1:lag.max)): longer object length is not a multiple of
## shorter object length
Fig.5.6^[Residual Analysis for model112css]
residual.analysis(model=model213css)
##
## Shapiro-Wilk normality test
##
## data: res.model
## W = 0.95947, p-value = 0.6522
## Warning in (ra^2)/(n - (1:lag.max)): longer object length is not a multiple of
## shorter object length
Fig.5.7^[Residual Analysis for model213css]
residual.analysis(model=model213ml)
##
## Shapiro-Wilk normality test
##
## data: res.model
## W = 0.98427, p-value = 0.9884
## Warning in (ra^2)/(n - (1:lag.max)): longer object length is not a multiple of
## shorter object length
Fig.5.8^[Residual Analysis for model213ml]
residual.analysis(model=model513css)
##
## Shapiro-Wilk normality test
##
## data: res.model
## W = 0.9104, p-value = 0.1181
## Warning in (ra^2)/(n - (1:lag.max)): longer object length is not a multiple of
## shorter object length
Fig.5.9^[Residual Analysis for model513css]
residual.analysis(model=model513ml)
##
## Shapiro-Wilk normality test
##
## data: res.model
## W = 0.93767, p-value = 0.3215
## Warning in (ra^2)/(n - (1:lag.max)): longer object length is not a multiple of
## shorter object length
Fig.5.10^[Residual Analysis for model513ml]
sort.score <- function(x, score = c("bic", "aic")){
if (score == "aic"){
x[with(x, order(AIC)),]
} else if (score == "bic") {
x[with(x, order(BIC)),]
} else {
warning('score = "x" only accepts valid arguments ("aic","bic")')
}
}
sort.score(AIC(model011ml,model110ml,model112ml,model213ml,model513ml),score="aic")
Model with all coefficient significant was ARIMA(0,1,1)
From AIC & BIC functions we get ARIMA(0,1,1) Model for both the cases.
sort.score(BIC(model011ml,model110ml,model112ml,model213ml,model513ml),score="bic")
After specifying the best model for time series data, we now use the model to forecast. The results shows that there is no trend, no seasonality, and insufficient temporal dynamics to allow the future observations to have different conditional means.
predict(model011ml,n.ahead = 5,newxreg = NULL,se.fit=TRUE)
## $pred
## Time Series:
## Start = 1997
## End = 2001
## Frequency = 1
## [1] 0.02318134 0.02318134 0.02318134 0.02318134 0.02318134
##
## $se
## Time Series:
## Start = 1997
## End = 2001
## Frequency = 1
## [1] 0.4293660 0.6405924 0.7977230 0.9286363 1.0432492
fit=Arima(data_egg,c(0,1,1))
plot(forecast(fit,h=5))
Fig.6^[Forecasting of Egg Deposition for next 5 Years]
We analysed the egg deposition of Lake Huron Bloasters by using various analysis.We found the best fitting model as ARIMA(0,1,1).We first converted the time series ,computing the correlation ,then transforming it using Box-Cox transformation and after that applying differencing .After every step we calculated ADF test to make sure series comes to be stationary in order to fit the model.In model fitting ,We identified some possible set of models using EACF & BIC and estimated parameters using coefficient test.At last we conducted model diagnostics by conducting residual analysis and conclude that ARIMA (0,1,1) is the best model. Further ,we forecast using the model for next 5 years which shows no trend, no seasonality to make future predictions.