1 Introduction

The data below shows the yearly changes in the Eggs Depositions (in millions). In this assignment, we are observing how the egg deposition has changed from 1981 to 1996. We also aim to forecast the changes in Egg Deposition in the next 5 years using the best model that we find fit for the study.

1.1 Data Import

In this section we read/import the data into R, then saved it as a data frame but using the read.csv() funtion. The class() function is being used to make sure the class of the dataset is in Time-Series(TS). The Time-Series Plot of Eggs Depostion from 1981 to 1996 is shown as follows:

## 'data.frame':    16 obs. of  2 variables:
##  $ year: int  1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 ...
##  $ eggs: num  0.0402 0.0602 0.1205 0.1807 0.7229 ...

## [1] "ts"

Initial Observations we can obtain in this time-series plot is:

The overall series seems to have an upward trend although there seemed to have some small changing local trends.
The data does not possess any seasonality.
The data seems to have some changing variance.
Clearly, the data observed is non stationary and has a constant mean with respect to time. This series has Autoregressive (AR) and Moving Average (MA) characteristics.

Although ARIMA model can be assumed as first observed. However, we must perform model fitting for in order to prove our hypothesis.

2 Data Modelling

Now, we have to proceed with our analysis with the most basic models which are the Linear and Quadratic Trend and perform residul analysis on both such models if necessary. If any of these model were to be concluded as the best fit, we will able to stop further analysis and make predictions accordingly.

2.1 Linear Model

We can start of by fiiting the obtain Time-Series (TS) plot of the Eggs Deposition with a Linear Model.

## 
## Call:
## lm(formula = eggs.ts ~ time(eggs.ts))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4048 -0.2768 -0.1933  0.2536  1.1857 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   -165.98275   49.58836  -3.347  0.00479 **
## time(eggs.ts)    0.08387    0.02494   3.363  0.00464 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4598 on 14 degrees of freedom
## Multiple R-squared:  0.4469, Adjusted R-squared:  0.4074 
## F-statistic: 11.31 on 1 and 14 DF,  p-value: 0.004642

The model summary here suggests that both the line and the intercepts are not statistically significant enough to be the appropriate fitting model. Furthermore, the R-squared = 0.4074 which indicates that it is at 40% variance, which is not statistically strong to be a fitting model. Therefore, we will not worry about any diagnostic checking for this model. Hence, we have to further fit it with the quadratic model.

2.2 Quadratic Trend

## 
## Call:
## lm(formula = eggs.ts ~ t + t2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.50896 -0.25523 -0.02701  0.16615  0.96322 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -4.647e+04  2.141e+04  -2.170   0.0491 *
## t            4.665e+01  2.153e+01   2.166   0.0494 *
## t2          -1.171e-02  5.415e-03  -2.163   0.0498 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4092 on 13 degrees of freedom
## Multiple R-squared:  0.5932, Adjusted R-squared:  0.5306 
## F-statistic: 9.479 on 2 and 13 DF,  p-value: 0.00289

The p-value = 0.00289 indicates that the model is significant since it is less than 0.05.
R-squared = 0.5306 which indicates that it is at 53% variance. It is slightly more significant than the linear model. However, 53% variance is too low to be a good fit.
Now, we shall further test the model by doing a diagnostic check.

2.2.1 Diagnostic Check

Based the aboves tests, the quadratic plot turns out to be the better fit. The Q-Q Plot has a better fit with the data points. Hence, we clearly see that quadratic fit is the closest to the actual time-series plot. However, we can see the the p-value in the quadratic model is less than 0.05, we can reject the Null Hypothesis \(H_0\) and say that the model fits the series. Furthermore, \(R^2\) value is 0.5306. That means that on 53% of the variation in the egg deposition series can be explained by the Quadratic Model. Therefore, the quadratic model will also not be the best fit for the prediction even though it has a much better results compared to the Linear Model.

Since there is no seasonal or cyclical trend in the original dataset, The Harmonic Model was not taken under consideration here.

3 Data Preparation for ARIMA Model

We can move on with the ARIMA model fitting as the Linear and Quadratic Models has failed to fit the data.

3.1 Stationarity and Normality Check

We now verify the hypothesis based on the ACF and PACF.

The slow decaying pattern in the ACF and the strong corelation in the first lag of the PACF shows the existence of a trend which indicates that a transformation and differencing is required on the data to make a stationary series. We can further check our hypothesis by using the adftest and the Shapiro-Wilk Normality Test.

ADF Test Assumtions:

\(H_0\): The given series is non-stationary
\(H_A\): Stationary

adf.test(eggs.ts)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  eggs.ts
## Dickey-Fuller = -2.0669, Lag order = 2, p-value = 0.5469
## alternative hypothesis: stationary

shapiro.test(eggs.ts)

## 
##  Shapiro-Wilk normality test
## 
## data:  eggs.ts
## W = 0.94201, p-value = 0.3744

The p-value = 0.5469 for the Dickey-Fuller Test shows that it is not small enough to reject the null hypothesis. This clearly implies non-stationarity. Furthermore, the p-value = 0.3744 for the Shapiro test is clearly greater than 0.05 which means that we cannot reject the null hypothesis. Hence, the data exhibits normality.

3.2 Transformation

We further required to do a Box-Cox transformation before doing any transformations.

## [1] 0.1 0.8

With a 95% confidence interval of the obove plot, the λ value can be found between 0.1 and 0.8. Hence, λ is approximately 0.45 which is the midpoint of the values with the CI.

3.2.1 Normality Test with λ-value

## 
##  Shapiro-Wilk normality test
## 
## data:  eggs.ts.BC
## W = 0.96269, p-value = 0.7107

Based on the p-value after the Box Cox transform, we are unable to reject the null hypothesis of it’s normality as it is greater than 0.05. Now, based on the previous p-value, we have slighly improved the results from 0.374 to 0.711. Hence, we are on the right track.

We can now proceed with differencing to achieve a stationary data and further improvements for modeling.

3.3 Differencing

3.3.1 First Differencing

Now, using the unit root test, we can check for its stationarity. ADF Test Assumtions:

\(H_0\): The given series is non-stationary
\(H_A\): Stationary

## 
##  Augmented Dickey-Fuller Test
## 
## data:  eggs.D1
## Dickey-Fuller = -3.6798, Lag order = 2, p-value = 0.0443
## alternative hypothesis: stationary

Given that the p-value = 0.0443 which is less than 0.05, this implies that we have removed the trend in the data through BoxCox Transformation and Differencing. Thus we are able to reject the null-hypothesis which says that the series is non-stationary. However, the plot above still tends to show a trend and therefore required further testing it with the second differencing.

3.3.2 Second Differencing

ADF Test Assumtions:

\(H_0\): The given series is non-stationary
\(H_A\): Stationary

## 
##  Augmented Dickey-Fuller Test
## 
## data:  eggs.D2
## Dickey-Fuller = -3.1733, Lag order = 2, p-value = 0.1254
## alternative hypothesis: stationary

Here, we did a unit root test to test for stationarity again and found that the p-value = 0.1254 which is greater than 0.05. Now, based on this value, we clearly see that the p-value is not small enough to reject the null hypothesis for non-stationarity which is uncommon since we no longer see any trend on the plot. Therefore, by considering the trend, we can use the second differecing even though the p-value is not small enough. In this case, the second differencing would be a better choice as it leads to non-stationarity as approved by both test and visual inspection.

4 Modeling ARIMA

4.1 ACF and PACF Test

We preoceed to do modelling since we have obtained a stationary data after removing the changing variance and the trend in the plot.

Since we see no significant lag in both the ACF and PACF, we were not able to determine the values of p and q from here. This might indicate that the data has the possibility of exhibiting a white noice behaviour. We have to turn our attention the EACF to determine the values of p and q.

4.2 EACF Test

## AR/MA
##   0 1 2 3
## 0 o o o o
## 1 o o o o
## 2 o o o o
## 3 o o o o

The max limit for the eacf() function is set as the function returns an error if no limit is set. Based of the EACF, we can include the following models ; {ARIMA(0,2,1), ARIMA(1,2,0), ARIMA(1,2,1)}. Furthermore, the table indicates the existence of white noice behaviour.

Now, we can chart the BIC table to determine a more probable models for the data.

4.3 BIC Table

From the BIC table above, additional probable models that can be taken into consideration is {ARIMA(2,2,1), ARIMA(2,2,2), ARIMA(2,2,4)} as the shaded columns correspond to AR(2), MA(1), MA(2) and MA(4) coefficients.

In total, we have 5 probable models which are ready for testing. These models include ARIMA(0,2,1), ARIMA(1,2,0), ARIMA(1,2,1), ARIMA(2,2,1), ARIMA(2,2,2) and ARIMA(2,2,4). Now, we are required to do a parameter estimation on these probable models.

5 Parameter Estimation

Here, we are using the Conditional Sum of Squares (CSS) and Maximum Likelihood (ML) method to test for the significance of the model.

5.1 Probable Models Testing

5.1.1 ARIMA(0,2,1)

## 
## z test of coefficients:
## 
##      Estimate Std. Error z value  Pr(>|z|)    
## ma1 -1.093338   0.056532  -19.34 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## z test of coefficients:
## 
##     Estimate Std. Error z value Pr(>|z|)
## ma1 -0.99994    0.66759 -1.4978   0.1342

5.1.2 ARIMA(1,2,0)

## 
## z test of coefficients:
## 
##     Estimate Std. Error z value Pr(>|z|)  
## ar1 -0.40609    0.24460 -1.6602  0.09687 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## z test of coefficients:
## 
##     Estimate Std. Error z value Pr(>|z|)
## ar1 -0.38056    0.23493 -1.6199   0.1053

5.1.3 ARIMA(1,2,1)

## 
## z test of coefficients:
## 
##      Estimate Std. Error z value  Pr(>|z|)    
## ar1  0.030947   0.290460  0.1065    0.9152    
## ma1 -1.021947   0.201926 -5.0610 4.171e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## z test of coefficients:
## 
##     Estimate Std. Error z value Pr(>|z|)   
## ar1  0.11474    0.26992  0.4251 0.670766   
## ma1 -1.00000    0.33204 -3.0116 0.002598 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

5.1.4 ARIMA(2,2,1)

## 
## z test of coefficients:
## 
##      Estimate Std. Error  z value Pr(>|z|)    
## ar1 -0.041470   0.266217  -0.1558   0.8762    
## ar2 -0.086226   0.277056  -0.3112   0.7556    
## ma1 -1.201138   0.055539 -21.6271   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## z test of coefficients:
## 
##      Estimate Std. Error z value Pr(>|z|)  
## ar1  0.113698   0.267936  0.4243  0.67131  
## ar2 -0.063806   0.257731 -0.2476  0.80447  
## ma1 -0.999999   0.431112 -2.3196  0.02036 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

5.1.5 ARIMA(2,2,2)

## 
## z test of coefficients:
## 
##      Estimate Std. Error z value  Pr(>|z|)    
## ar1 -0.927927   0.373736 -2.4828 0.0130339 *  
## ar2  0.171940   0.393676  0.4368 0.6622880    
## ma1 -0.090924   0.258816 -0.3513 0.7253587    
## ma2 -1.554764   0.457355 -3.3995 0.0006752 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## z test of coefficients:
## 
##      Estimate Std. Error z value Pr(>|z|)
## ar1 -0.351338   1.343222 -0.2616   0.7937
## ar2  0.078788   0.296910  0.2654   0.7907
## ma1 -0.534045   1.354136 -0.3944   0.6933
## ma2 -0.465954   1.324257 -0.3519   0.7249

5.1.6 ARIMA(2,2,4)

## 
## z test of coefficients:
## 
##      Estimate Std. Error z value  Pr(>|z|)    
## ar1 -0.484683   0.401469 -1.2073 0.2273262    
## ar2 -0.090775   0.468876 -0.1936 0.8464875    
## ma1 -0.730724   0.354175 -2.0632 0.0390963 *  
## ma2 -0.435606   0.178956 -2.4341 0.0149270 *  
## ma3 -0.921816   0.274533 -3.3578 0.0007858 ***
## ma4  0.881503   0.411248  2.1435 0.0320744 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

model.224.ml = arima(eggs.ts.BC, order=c(2,2,4), method='ML')
coeftest(model.224.ml)

## 
## z test of coefficients:
## 
##      Estimate Std. Error z value Pr(>|z|)  
## ar1 -0.379637         NA      NA       NA  
## ar2 -0.242943   0.286517 -0.8479  0.39648  
## ma1 -0.273239         NA      NA       NA  
## ma2 -0.031795   0.338678 -0.0939  0.92520  
## ma3 -0.855545   0.466246 -1.8350  0.06651 .
## ma4  0.468620         NA      NA       NA  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As a summary, we can describe the test carried out above as follows

ARIMA(0,2,1):

CSS: Coefficients are significant at 5% significance as seen from the p-value.
ML: Coefficients are significant at 5% significance as seen from the p-value. Both of the tests above shows that the model is significant.

ARIMA(1,2,0):

CSS: AR1 coefficients are insignificant at 5% significance as seen from the p-value.
ML: AR1 coefficients are insignificant at 5% significance as seen from the p-value. Hence, this model is not a good fit for the data.

ARIMA(1,2,1):

CSS: AR1 coefficients are insignificant at 5% significance as seen from the p-value.
ML: AR1 coefficients are insignificant at 5% significance as seen from the p-value. Hence, this model is not a good fit for the data and the AR1 should be eliminated for a fitting model.

ARIMA(2,2,1):

CSS: Both AR1 and AR2 coefficients are insignificant at 5% significance as seen from the p-value.
ML: Both AR1 and AR2 coefficients are insignificant at 5% significance as seen from the p-value. Hence, this model is not a good fit for the data , both AR1 and AR2 should be eliminated for a fitting model.

ARIMA(2,2,2):

CSS: Both AR2 and MA1 coefficients are insignificant at 5% significance as seen from the p-value.
ML: No sign of significance in the p-value at 5% signifance. Hence, this model is not a good fit for the data as there are too many uncertainties present in the model.

ARIMA(2,2,4):

CSS: Both AR1 and AR2 coefficients are insignificant at 5% significance as seen from the p-value.
ML: No sign of significance in the p-value at 5% signifance. Hence, this model is not a good fit for the data as there are too many uncertainties present in the model.

Now, me move on to sort the AIC and BIC scores in order to select the best model amongst the candidate model that was predicted based on the tests above.

5.2 AIC Scores

##              df      AIC
## model.021.ml  2 23.12446
## model.121.ml  3 24.94140
## model.221.ml  4 26.88097
## model.120.ml  2 27.05798
## model.222.ml  5 28.92955
## model.224.ml  7 30.14653

5.3 BIC Scores

##              df      BIC
## model.021.ml  2 24.40257
## model.121.ml  3 26.85857
## model.120.ml  2 28.33609
## model.221.ml  4 29.43719
## model.222.ml  5 32.12483
## model.224.ml  7 34.61993

Now, Based on the above AIC and BIC tables, it is clear that we can conclue ARIMA(0,2,1) is the best fit.

However, a further testing is required to check whether the model ARIMA(0,2,2) is significant or not.

5.4 Testing for Overfitting and Redundancy

After testing the above model, we look at overfitting as another tool to detect anomalies in terms of goodness of fit. The closes fit to ARIMA(0,2,1) is ARIMA(0,2,2) which we have to test to decide whether it is at significant at 5% level of significance.

5.4.1 ARIMA(0,2,2)

## 
## z test of coefficients:
## 
##      Estimate Std. Error z value  Pr(>|z|)    
## ma1 -1.017026   0.274754 -3.7016 0.0002143 ***
## ma2 -0.087178   0.308221 -0.2828 0.7772974    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## z test of coefficients:
## 
##     Estimate Std. Error z value Pr(>|z|)  
## ma1 -0.88464    0.42637 -2.0748   0.0380 *
## ma2 -0.11535    0.25484 -0.4526   0.6508  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Hence, we can clearly see the in both CSS and ML Pr(>|z|) > 0.05 which suggests that the model is insignificant at 5% level of significance and we can proceed with the residual testing for the model ARIMA(0,2,1).

5.5 Diagnostic Check for Best Fitting Model

## 
##  Shapiro-Wilk normality test
## 
## data:  model.021.ml$residuals
## W = 0.94991, p-value = 0.4883

residual.analysis(model=model.021.ml)

The residual tests above suggests that the following:

The scattering on the Time series plot of standardised residuals seems to show it is randomly distributed over time.
The are no longer any changing and no trend that are present in the data. Hence, this support the the model ARIMA(0,2,1)
The Q-Q Plot does not seemed very alligned as there are some point which do not lie in the line.
The histogram for the stardarized residuals looks normally distributed.
ACF plot shows that it has no significant lags to begin with.
From the Ljung-Box Test above, the points shown are all above the alpha which suggests that the null-hyperpothesis is not rejected and the data are independently distributed. The argument SquaredQ = FALSE, to test for regular autocorrelations.

6 Forecasting

## $pred
## Time Series:
## Start = 1997 
## End = 2001 
## Frequency = 1 
## [1] 0.1388044 0.2536666 0.3685288 0.4833910 0.5982531
## 
## $se
## Time Series:
## Start = 1997 
## End = 2001 
## Frequency = 1 
## [1] 0.4491633 0.6547624 0.8251656 0.9789287 1.1229082

7 Conclusion

Based on the investigation that has been carried out above, we have transformed and differentiate the data in order to make it stationary. With this stationary we were able to come out with probable models through which we looks at the ACF, PACF, EACH and the BIC table. Then, we carried out the parameter estimation using the Conditional Sum of Squares (CSS) and Maximum Likelihood (ML) method to test for the significance of the model. From here, we are able to do selection based on the AIC and BIC scores. Hence, we selected the best fit model which is ARIMA(0,2,1) and forecast the Eggs Depositions in the next 5 years using this model.

8 Reference

The LBQPlot function is obtained from the following link: {https://www.rdocumentation.org/packages/FitAR/versions/1.94/topics/LBQPlot}

Sort Score function: {https://rmit.instructure.com/courses/67182/files/10178194?module_item_id=2056189}

Residual Analysis Function: {https://rmit.instructure.com/courses/67182/files/10178256?module_item_id=2056188}

9 Appendix

The following are R-codes that have been ultilised in the report above which includes functions of sort score and diagnostic check that is obtained from Canvas.

The relevant packages that is required in for this investigation.

10 Appendix

10.1 Packages

The necessary packages has been installed and load below. The required codes and funtions for the projects are as follows;

library(TSA)
library(tseries)
library(forecast)
library(FitAR)
library(lmtest)
library(fUnitRoots)
library(readr)


#*Box Cox Search Function

BoxCoxSearch = function(y, lambda=seq(-3,3,0.01), 
                        m= c("sf", "sw","ad" ,"cvm", "pt", "lt", "jb"), plotit = T, verbose = T){
  N = length(m)
  BC.y = array(NA,N)
  BC.lam = array(NA,N)
  for (i in 1:N){
    if (m[i] == "sf"){
      wrt = "Shapiro-Francia Test"
    } else if (m[i] == "sw"){
      wrt = "Shapiro-Wilk  Test"
    } else if (m[i] == "ad"){
      wrt = "Anderson-Darling Test"
    } else if (m[i] == "cvm"){
      wrt = "Cramer-von Mises Test"
    } else if (m[i] == "pt"){
      wrt = "Pearson Chi-square Test"
    } else if (m[i] == "lt"){
      wrt = "Lilliefors Test"
    } else if (m[i] == "jb"){
      wrt = "Jarque-Bera Test"
    } 
    
    print(paste0("------------- ",wrt," -------------"))
    out = tryCatch({boxcoxnc(y, method = m[i], lam = lambda, lambda2 = NULL, plot = plotit, alpha = 0.05, verbose = verbose)
      BC.lam[i] = as.numeric(out$lambda.hat)}, 
      error = function(e) print("No results for this test!"))
    
  }
  return(list(lambda = BC.lam,p.value = BC.y))
}


# Residual Analysis Function

residual.analysis <- function(model, std = TRUE,start = 2, class = c("ARIMA","GARCH","ARMA-GARCH")[1]){
  # If you have an output from arima() function use class = "ARIMA"
  # If you have an output from garch() function use class = "GARCH"
  # If you have an output from ugarchfit() function use class = "ARMA-GARCH"
  
  if (class == "ARIMA"){
    if (std == TRUE){
      res.model = rstandard(model)
    }else{
      res.model = residuals(model)
    }
  }else if (class == "GARCH"){
    res.model = model$residuals[start:model$n.used]
  }else if (class == "ARMA-GARCH"){
    res.model = model@fit$residuals
  }else {
    stop("The argument 'class' must be either 'ARIMA' or 'GARCH' ")
  }
  par(mfrow=c(3,2))
  plot(res.model,type='o',ylab='Standardised residuals', main="Time series plot of standardised residuals")
  abline(h=0)
  hist(res.model,main="Histogram of standardised residuals")
  acf(res.model,main="ACF of standardised residuals")
  pacf(res.model,main="PACF of standardised residuals")
  qqnorm(res.model,main="QQ plot of standardised residuals")
  qqline(res.model, col = 2)
  k=0
  LBQPlot(res.model, lag.max = length(model$residuals)-1, StartLag = k + 1, k = 0, SquaredQ = FALSE)
}

# Sort AIC and BIC Values Function
sort.score <- function(x, score = c("bic", "aic")){
  if (score == "aic"){
    x[with(x, order(AIC)),]
  } else if (score == "bic") {
    x[with(x, order(BIC)),]
  } else {
    warning('score = "x" only accepts valid arguments ("aic","bic")')
  }
}

eggs <- read.csv("eggs.csv")
#Checking the structure of the dataset
str(eggs)
# Convert to the TS object!
eggs.ts <- ts(eggs$eggs, start=1981)
#Checking the class of the data set again to make sure it has been changed to a Time-Series
class(eggs.ts)
plot(eggs.ts, type='o', xlab = "Years", ylab='Eggs Deposited (in Millions)', main="Eggs Depostion from 1981 to 1996")
LinearEggs = lm(eggs.ts ~ time(eggs.ts)) 
summary(LinearEggs)
plot(eggs.ts, type='o', ylab = "Egg Depositions (in millions)", 
     main = "Egg depositions of bloaters between 1981 and 1996")
abline(LinearEggs, col='red', lty= 2)
t<- time(eggs.ts)
t2<- t^2
QuadEggs<-lm(eggs.ts ~ t + t2)
summary(QuadEggs)
plot(ts(fitted(QuadEggs)), ylim = c(min(c(fitted(QuadEggs), as.vector(eggs.ts))), max(c(fitted(QuadEggs),as.vector(eggs.ts)))), ylab='y' , main = "Fitted quadratic curve ", type="l",lty=2,col="red")
lines(as.vector(eggs.ts),type="o")
par(mfrow=c(2,2))
plot(QuadEggs$fitted, QuadEggs$residuals,main="Residuals vs Fitted", xlab = "Fitted", ylab = "Residuals")
abline(h=0, lty=2, col='red')
qqnorm(rstudent(QuadEggs))  #Now we look at the Q-Q plot
qqline(rstudent(QuadEggs), col = 2, lwd = 1, lty = 2) 
plot(QuadEggs$residuals, main = "Residuals vs Time", ylab = "Residuals") #Residuals vs Time
abline(h=0,lty=2,col=2) 
acf(QuadEggs$residuals, main = "ACF of standardized residuals") #Sampling the Autocorrelation
par(mfrow=c(1,1))
par(mfrow=c(1,2))
acf(eggs.ts, main=expression(paste("Series Eggs Deposition")))
pacf(eggs.ts, main=expression(paste("Series Eggs Deposition")))
par(mfrow=c(1,2))
adf.test(eggs.ts)
shapiro.test(eggs.ts)
egg.Box=BoxCox.ar(eggs.ts, method='yule-walker')
egg.Box$ci
lambda=0.45
eggs.ts.BC = ((eggs.ts^lambda)-1)/lambda
qqnorm(eggs.ts.BC)
qqline(eggs.ts.BC,col='red')
shapiro.test(eggs.ts.BC)
eggs.D1 <- diff(eggs.ts.BC, differences = 1)
plot(eggs.D1, type='o',ylab='Eggs Deposited (in Millions)', main="Eggs Depostion from 1981 to 1996")
abline(h=0, col='red', lty=2)
adf.test(eggs.D1)
eggs.D2 <- diff(eggs.ts.BC, differences = 2)
plot(eggs.D2, type='o',ylab='Eggs Deposited (in Millions)', main="Eggs Depostion from 1981 to 1996")
abline(h=0, col='red', lty=2)
adf.test(eggs.D2, alternative = 'stationary')
par(mfrow=c(1,2))
acf(eggs.D2, main=expression(paste("Series Eggs Deposition")))
pacf(eggs.D2, main=expression(paste("Series Eggs Deposition")))
par(mfrow=c(1,1))
eacf(eggs.D1, ar.max = 3, ma.max = 3)
regEggs = armasubsets(y=eggs.D2, nar=4,nma=4,y.name='test', ar.method='yule-walker')
plot(regEggs)
model.021.css = arima(eggs.ts.BC, order=c(0,2,1), method='CSS')
coeftest(model.021.css)
model.021.ml = arima(eggs.ts.BC, order=c(0,2,1), method='ML')
coeftest(model.021.ml)
model.120.css = arima(eggs.ts.BC, order=c(1,2,0), method='CSS')
coeftest(model.120.css)
model.120.ml = arima(eggs.ts.BC, order=c(1,2,0), method='ML')
coeftest(model.120.ml)
model.121.css = arima(eggs.ts.BC, order=c(1,2,1), method='CSS')
coeftest(model.121.css)
model.121.ml = arima(eggs.ts.BC, order=c(1,2,1), method='ML')
coeftest(model.121.ml)
model.221.css = arima(eggs.ts.BC, order=c(2,2,1), method='CSS')
coeftest(model.221.css)
model.221.ml = arima(eggs.ts.BC, order=c(2,2,1), method='ML')
coeftest(model.221.ml)
model.222.css = arima(eggs.ts.BC, order=c(2,2,2), method='CSS')
coeftest(model.222.css)
model.222.ml = arima(eggs.ts.BC, order=c(2,2,2), method='ML')
coeftest(model.222.ml)
model.224.css = arima(eggs.ts.BC, order=c(2,2,4), method='CSS')
coeftest(model.224.css)
model.224.ml = arima(eggs.ts.BC, order=c(2,2,4), method='ML')
coeftest(model.224.ml)
aic=AIC(model.021.ml,model.120.ml,model.121.ml,model.221.ml,model.222.ml,model.224.ml)
sort.score(aic, score="aic")
bic=BIC(model.021.ml,model.120.ml,model.121.ml,model.221.ml,model.222.ml,model.224.ml)
sort.score(bic, score="bic")
model.022.css = arima(eggs.ts.BC, order=c(0,2,2), method='CSS')
coeftest(model.022.css)
model.022.ml = arima(eggs.ts.BC, order=c(0,2,2), method='ML')
coeftest(model.022.ml)
shapiro.test(model.021.ml$residuals)
residual.analysis(model=model.021.ml)
predict(model.021.ml, n.ahead = 5, newxreg = NULL, se.fit = TRUE)
fit = Arima(eggs.ts, c(0,2,1),lambda=0.45)
plot(forecast(fit, h=5))

MATH1318 Semester 1, 2020

Assignment 2 Time Series Analysis

Roy Wong Kher Yung, s3835352