ARIMA

library(fpp2)
library(urca)

8.1

Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

Explain the differences among these figures. Do they all indicate that the data are white noise?

Answer: Yes, these figures do represent somewhat the white noises. They are not exactly the same Because these figure have different bounds, which influence the narrowness / wideness of the figures.

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

Answer:Because each time point, the critical values are actually a series of data points, hence, the time series, rather than one data points. In each figure, the bounds are not the same, that is why the autorrelation showing on the ACF figures are different. They are just a reflection of the wideness of the timeframe, rather than the nature of the data themselves (the data are the same in their nature).

8.2

IMB Stock

A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

autoplot(ibmclose)  +  
  labs (title ="IBM Stock Daily -Closing Price", subtitle ='In Dollars',x="Day")

AnswER: From autoplot, we can see that there is an upward trend from day 50- day 100,with corresponding ibmclose price increase from 450 to peak of 600, then stablize, but there is a big drop from day 200 (from price of 500) to day 270 (at price of 300).

ggAcf(ibmclose)  +  
  ggtitle("Correlogram of Daily Closing Price For IBM Stock")

ACF shows us us that the value of r1 is positive, and it is gradualy decreaseing from 1 to around 0.8, as the lag goes by. THe trend is visiable, therefore the data is non stationary.Diffentiration of the data is needed.

ggPacf(ibmclose)  +  ggtitle("PACF For IBM Stock Price")

PACF (partial autocorrelation figure) figures shows that there is an initial spike at 1.0. Therefore, the data is non stationary by this figure.

8-3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

US Net Electricity Generation

autoplot(usnetelec) + 
  ylab("Annual US Electricity Generation (billion kWh)") +  
  labs( title="Annual US Net Electricity Generation",
  subtitle ="in billion Kwh", x='year')

lambda_usnetelec <- BoxCox.lambda(usnetelec)
autoplot(BoxCox(usnetelec,lambda_usnetelec))   +
  labs(  title="Box Cox Transformation",
         subtitle ="Annual US Net Electricity Generation",
         x='Year')

BoxCox(usnetelec,lambda_usnetelec) %>%
    ur.kpss() %>%
    summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 1.4583 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

ggAcf(BoxCox(usnetelec,lambda_usnetelec))   +
labs( title ="Correlogram of Box Cox Transformation ", 
      subtitle = "US Net Electricity Generation")

ndiffs(BoxCox(usnetelec,lambda_usnetelec))

## [1] 2

The autoplot shows that the US net electricity generation goes up each year from 1950 to 2000, with a clear upward trends, almost liner. The trend persist after data is Box Cox transformed. Lamda for the Box Cox transformation is 0.52. To test if differencing is needed, the unit root test is perfromed, at mu with 3 lags. The t statistics is not significent (1.4583), much bigger compared with 5pct critical value of 0.465, and even much bigger than the 1pct value (at 0.739). Therefore, the data is non stationary and differencing is needed. The ndiff function with the Box Cox transformation (to test the difference between the original data and the lamda) indicates that 2 differencing is recommended for the data to be stationary.

ggAcf(diff(BoxCox(usnetelec,lambda_usnetelec))) +
  labs(title = 'Difference - After Box Cox Transformation', subtitle ='US Electricity Generation')

autoplot(diff(BoxCox(usnetelec,lambda_usnetelec)))+
  labs(title = 'Difference - After Box Cox Transformation', subtitle ='US Electricity Generation')

diff(BoxCox(usnetelec,lambda_usnetelec))    %>% 
  ur.kpss() %>% summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 0.4315 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

The ACF (auto correlation figure) after the suggested differencing number (at 2) above shows that all the ACF value is within the 95% confidence interval limits, suggesting that the difference works to make the data stationary. After I plot the Box Cox transformed data, the diff function clearly shows white noise, which is what I need.
The KPSS Unit Root Test provide a test statistics value of 0.4315, which is arbout 5% significance level, meaning it is borderline successful in making the data non stationary.

Quarterly US GDP

autoplot(usgdp) + ylab("Quarterly US GDP") +  ggtitle("US GDP")

lambda_usgdp <- BoxCox.lambda(usgdp)
autoplot(BoxCox(usnetelec,lambda_usgdp)) +  ggtitle("Box Cox Transformation of US GDP")

lambda_usgdp

## [1] 0.366352

BoxCox(usgdp,lambda_usgdp) %>% 
    ur.kpss() %>% summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 4.8114 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

ndiffs(BoxCox(usgdp,lambda_usgdp))

## [1] 1

ggAcf(BoxCox(usgdp,lambda_usgdp))  +  ggtitle("Correlogram of Box Cox Transformation of US GDP")

The original plot shows that there is an Upward trend for US GD P, The GDP starts at 2000 in 1950, goes up each and every year gradually, reaches more than 10,000 after year 2000 .Box Cox transformation shows similar trend. The Lambda for the box Cox transformation is 0.366. after I perform all cap KPSS unit root test, which suggested mu with four lags. the test statistics value is 4.8, far greater than 1% significance level, which indicates successful stationary data achievements. The ndiff function tells me that the differencing number of 1 is recommended.

ggAcf(diff(BoxCox(usgdp,lambda_usgdp))) +
  labs(title = 'Difference - After Box Cox Transformation', subtitle ='US GDP')

autoplot((diff(BoxCox(usgdp,lambda_usgdp)))) +
  labs(title = 'Difference - After Box Cox Transformation', subtitle ='US GDP')

diff(BoxCox(usgdp,lambda_usgdp)) %>% 
    ur.kpss() %>% summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.2013 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

The ACF plot ACF text plot shows that except a few fear spikes here and there, most of the data lie fall inside the 95 percent levels , indicating its close to stationary data. The box Cox transformation tells the similar story. box Cox tells similar stories. All cap KPSS test, KPSS test provides test statistics at 0.20, which is less than 10% critical value, which tells us that the data is stationary, and no different thing is needed.

Monthly Copper Prices

autoplot(mcopper) + ylab("Monthly Copper Prices") +  ggtitle("Copper Prices")

lambda_mcopper <- BoxCox.lambda(mcopper)

autoplot(BoxCox(mcopper,lambda_mcopper)) +  ggtitle("Box Cox Transformation of Copper Prices")

lambda_mcopper

## [1] 0.1919047

BoxCox(mcopper,lambda_mcopper) %>% 
  ur.kpss() %>% summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 6 lags. 
## 
## Value of test-statistic is: 6.2659 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

ndiffs(BoxCox(mcopper,lambda_mcopper))

## [1] 1

ggAcf(BoxCox(mcopper,lambda_mcopper))  +  
  labs(title = 'Difference - After Box Cox Transformation', subtitle = 'Copper Prices')

The autoplot shows a upward trend overall with the high tail in the end. this trend persists after box Cox transformation. The Lambda is 0.192. With six legs suggested. The T statistics is 6.27, far greater than 1% level at 0.739, the suggested number of lag is 1.

ggAcf(diff(BoxCox(mcopper,lambda_mcopper)))

autoplot((diff(BoxCox(mcopper,lambda_mcopper))))

diff(BoxCox(mcopper,lambda_mcopper)) %>% 
    ur.kpss() %>% summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 6 lags. 
## 
## Value of test-statistic is: 0.0573 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

The ACF plot shows that except an initial value of 0.3, most of the data pho within fall within falls within 95% confidence interval, with the box Cox transformation Showing mostly white noise. Both figures suggest that the data is close Stationary. KPSS unit root test has a test statistics of 0.057, which is far less then 0.739 (1% critical value), confirming that the data is stationary and no diferencing is needed.

Montly US Domestic Enplanements

autoplot(enplanements) + ylab(" US Enplanements") +  ggtitle("US Enplanements")

lambda_enplanements <- BoxCox.lambda(enplanements)
autoplot(BoxCox(enplanements,lambda_enplanements)) +  
  # ggtitle("Box Cox Transformation of US Domestic Enplanements")
  labs (Title ='Box Cox Transformation', subtitle ='US Domestic enplanements')

lambda_enplanements

## [1] -0.2269461

BoxCox(enplanements,lambda_enplanements) %>% 
  ur.kpss() %>% summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 4.3785 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

ndiffs(BoxCox(enplanements,lambda_enplanements))

## [1] 1

ggAcf(BoxCox(enplanements,lambda_enplanements))  +  ggtitle("Correlogram of Box Cox Transformation of US Domestic Enplanements")

The monthly domestic enplanements within USA shows a trend that goes up as time passes goes by . The box Cox transformation provide the Lambda of -0.23. the KPSS route test shows the test statistics of 4.37, which is far greater than the 1% critical value at 0.74, suggesting the differencing is needed.The test was at 5 lags mu. a unit root test suggest us to difference the data once.

ggAcf(diff(BoxCox(enplanements,lambda_enplanements))) +
  labs (title = 'Difference after Box Cox Tranformation', subtitle ='US Enplanments')

autoplot((diff(BoxCox(enplanements,lambda_enplanements))))

diff(BoxCox(enplanements,lambda_enplanements)) %>% 
  ur.kpss() %>% summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.0151 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

ggAcf(diff(diff(BoxCox(enplanements,lambda_enplanements)),12)) +
  labs (title = 'Difference of Differnce, 12 lag and Box Cox Transformation', subtitle = 'Enplanement' )

autoplot(diff(diff(BoxCox(enplanements,lambda_enplanements)),12)) +
  labs (title = 'Difference of Differnce, 12 lag and Box Cox Transformation', subtitle = 'Enplanement' )

diff(diff(BoxCox(enplanements,lambda_enplanements),12)) %>% 
  ur.kpss() %>% summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.0424 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

For the employment data, they are equal amounts of data that lie within the 95% confidence interval and those outside, suggesting some different thing is needed to make the data stationary. To formally test if different thing is needed, KPSS unit test is performed , resulting 0.015 as the value of test statistics, which is far greater than 0.347, at 10% level. Then the second different thing is performed and the Test statistics is 0.04, which is far greater than 10% level so that means further differencing is still needed .

Monthly Australian Overseas Visitors

autoplot(visitors) + ylab("Monthly Australian Overseas Visitors") +  ggtitle("Australian Overseas Visitors")

lambda_visitors <- BoxCox.lambda(visitors)
autoplot(BoxCox(visitors,lambda_visitors)) +  ggtitle("Box Cox Transformation of Australian Overseas Visitors")

lambda_visitors

## [1] 0.2775249

BoxCox(visitors,lambda_visitors) %>% 
  ur.kpss() %>% summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 4.5233 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

ndiffs(BoxCox(visitors,lambda_visitors))

## [1] 1

ggAcf(BoxCox(visitors,lambda_visitors))  +
  labs (title ='Box Cox Transformation', subtitle = 'Australian Overseas Visitors')

The visitors data show that there is An upward trend in Australia From 1985 til 2005 continuously. Box Cox transformation confirms this trend, with the Lambda of 0.277. So some data efferencing is needed. KPSS test shows the test statistics at 4.52, far greater than 1% critical value. So there is a signicant seasonality trend for the annual or monthly for visitors to travel to Australia.

ggAcf(diff(BoxCox(visitors,lambda_visitors)))+
labs (title ='Difference - After Box Cox Transformation', subtitle = 'Australian Overseas Visitors')

autoplot((diff(BoxCox(visitors,lambda_visitors)))) +
labs (title ='Difference - After Box Cox Transformation', subtitle = 'Australian Overseas Visitors', x='Year')

diff(BoxCox(visitors,lambda_visitors)) %>% 
    ur.kpss() %>% summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.0519 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

ggAcf(diff(diff(BoxCox(visitors,lambda_visitors)),12))+
labs (title ='Difference - with lag12-After Box Cox Transformation', subtitle = 'Australian Overseas Visitors')

autoplot(diff(diff(BoxCox(visitors,lambda_visitors)),12))

labs (title ='Difference - with lag12-After Box Cox Transformation', subtitle = 'Australian Overseas Visitors', x='Year')

## $x
## [1] "Year"
## 
## $title
## [1] "Difference - with lag12-After Box Cox Transformation"
## 
## $subtitle
## [1] "Australian Overseas Visitors"
## 
## attr(,"class")
## [1] "labels"

diff(diff(BoxCox(visitors,lambda_visitors),12)) %>% ur.kpss() %>% summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.0158 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

ggAcf(diff((diff(diff(BoxCox(visitors,lambda_visitors)),12))))+
  labs (title ='Difference on Difference- with lag12-After Box Cox Transformation', subtitle = 'Australian Overseas Visitors')

autoplot(diff(diff(diff(BoxCox(visitors,lambda_visitors)),12)))+
  labs (title ='Difference - with lag12-After Box Cox Transformation', subtitle = 'Australian Overseas Visitors')

diff(diff(diff(BoxCox(visitors,lambda_visitors),12))) %>% ur.kpss() %>% summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.0254 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

The ACF shows that the correlation varies from 0.82 zero point 6 from no lag to 24 month lag (2 year). Lambda for the Box Cox transformation is 0.277. The suggested differencing is 1. After different thing one time, about 80% of the observations are captured within 95% confidence interval range. However the 20% which are outside of their range have large spikes both upwards and downwards, suggesting a real seasonal effect. The KPSS unit root test shows T statistics at 0.0158, which is far less than 10% critical value of 0.347, meaning there is no significant evidence for seasonality effect. Further, After box Cox transformation and lag of 12 months ( one year) , the KPSS test on the difference over the differenced Data shows that The T statistics is 0.025, which is far less than 10% significance level ( 0.347), suggesting there is no effect of seasonality . Therefore we have achieved the white noise and make the data successfully transformed into stationary data.

8-5

Clothing Sales in New South Wales

For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349399C"],frequency=12, start=c(1982,4))
autoplot(myts) + ylab("Retail Clothing Sales") + 
  ggtitle("New South Wales - Clothing Sales")

lambda_retail <- BoxCox.lambda(myts)
lambda_retail

## [1] 0.02074707

autoplot(BoxCox(myts,lambda_retail)) + 
  labs (title  = 'Box Cox Transformation', subtitle  ='Retail Clothing Sales in New South Wales')

BoxCox(myts,lambda_retail) %>% ur.kpss() %>% summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 6.0241 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

ndiffs(BoxCox(myts,lambda_retail))

## [1] 1

ggAcf(BoxCox(myts,lambda_retail))  +  
   labs (title  = 'Box Cox Transformation', subtitle  ='Retail Clothing Sales in New South Wales')

The retail data shows that there is a strong positive trend as time Goes by from 1980 At 100 Units in sales and grows to 600 after 2010 . The formal KPSS test tells us that the T statistics is 6.0241, which is highly significant because the value is much bigger than 1% level. The Lambda for box Cox transformation is 0.02. Autocorrelation figure acf shows that a the correlation fluctuate between .06 and .05 for all lag periods.

ggAcf(diff(BoxCox(myts,lambda_retail)))+
 labs (title  = 'Difference after Box Cox Transformation', subtitle  ='Retail Clothing Sales in New South Wales')

autoplot((diff(BoxCox(myts,lambda_retail))))+
 labs (title  = 'Difference after Box Cox Transformation', subtitle  ='Retail Clothing Sales in New South Wales')

diff(BoxCox(myts,lambda_retail)) %>% ur.kpss() %>% summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.0343 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

ggAcf(diff(diff(BoxCox(myts,lambda_retail)),12))+
 labs (title  = 'Difference after Box Cox Transformation, lag12', subtitle  ='Retail Clothing Sales in New South Wales')

autoplot(diff(diff(BoxCox(myts,lambda_retail)),12))+
   labs (title  = 'Difference after Box Cox Transformation, lag12', subtitle  ='Retail Clothing Sales in New South Wales')

diff(diff(BoxCox(myts,lambda_retail),12)) %>% ur.kpss() %>% summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.0316 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

ggAcf(diff((diff(diff(BoxCox(myts,lambda_retail)),12)))) +
 labs (title  = 'Difference on Difference, after Box Cox Transformation', subtitle  ='Retail Clothing Sales in New South Wales')

autoplot(diff(diff(diff(BoxCox(myts,lambda_retail)),12))) +
 labs (title  = 'Difference on Difference after Box Cox Transformation', subtitle  ='Retail Clothing Sales in New South Wales')

diff(diff(diff(BoxCox(myts,lambda_retail),12))) %>% ur.kpss() %>% summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.0207 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

After taking the box Cox translation formation, the ACF for the difference shows that around 60% of the observations fall within the 95% confidence interval, the rest do not. There are noticeable big spike at lag 12 and lag 24 months, indicating holiday season sale at each year. KPSS test shows that the difference after box Cox transformation is 0.0316 which is not significant. Then I did the lag one year and the test the difference of already differenced one by KPSS which shows T statistics at 0.02 , which is not significant. Finally two more difference were made and the difference again our tested via KPSS unit root test the T statistics shows a value of 0.02 , which is not significant. So after all these trials I am not able to bring the data into the stationary data.

8-6

ARIMA Simulation - AutoRegressive

Use R to simulate and plot some data from simple ARIMA models. a) Use the following R code to generate data from an AR(1) model with \(\phi\)=0.6 and \(\sigma^2\)=1/ The process starts with y1=0.

y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]

Produce a time plot for the series. How does the plot change as you change \(\phi_1\)?

autoplot(y) + labs (title ="AR(1)", subtitle ="phi=0.6")

ggAcf(y) + labs (title ="AR(1)", subtitle ="phi=0.6")

When pHI = 0.6, the ACF shows high autocorrelation in the initial lag 1 through leg 3 (decreasing from 0.5 to 0.3), then remains low correlation farm leg for two lag 20. So this initial high correlation supports the AR model. This auto correlation in the shorter lag phase did not happen in the initial plot . So Phi = 0.6 will produce an AR model

y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.9*y[i-1] + e[i]
autoplot(y) + labs (title ="AR(1)", subtitle ="phi=0.9")

ggAcf(y) + labs (title ="AR(1)", subtitle ="phi=0.9")

When Phi is changed to 0.9 the autoplot shows that there are strong correlations from time 40 to 60 ,while the time preceding that and after that are less correlated. ` The ACF figure is clearly a tails off version (suggesting AR). It shows that there are extremely strong correlation for both shorter and longer lags. The correlation starts as high as 0.9 in lag 1 and slowly decrease to about 0.5 at 10 lags , but continue that trend to 0.20 in lag 14. Only the time frame between lag 15 and lag 20 have small autocorrelation that lies within 95% confidence interval. So when Phi = 0.9, The model can be almost exclusively explained by autocorrelation as well as integration. Not so much of the moving average which is in the long girl lag can explain the model well.

y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0*y[i-1] + e[i]
autoplot(y) + labs (title ="AR(1)", subtitle ="phi=0")

ggAcf(y) + labs (title ="AR(1)", subtitle ="phi=0")

Whenn Phi = 0, The ACF shows that most of the correlations are within 95% confidence interval. There does not seem to have hi autocorrelation , nor moving average. The ACF figure does not demonstrate in a tail off . So the data is already stationary and ready to be used for modeling.

y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- -0.6*y[i-1] + e[i]
autoplot(y) + labs (title ="AR(1)", subtitle ="phi=-0.6")

ggAcf(y) + labs (title ="AR(1)", subtitle ="phi=-0.6")

When Phi=-0.6, -the ACF clearly shows tail off pattern, with initial shorter lag demonstrate correlation in both positive and negative direction. The highest correlation is around 0.3 in positive and - .50 in the negative direction. These autocorrelation merges to almost 0 at lack of 6 , and remains within nonsignificant values throughout or longer legs. So this model fit into AR( 2) model.

ARIMA Simulation - Moving Average

c)Write your own code to generate data from an MA(1) model with \(\theta_1\)=0.6 and \(\sigma^2\)=1.

y <- ts(numeric(100))
e <- rnorm(10000)
for(i in 2:100)
  y[i] <- 0.6*e[i-1] + e[i]

Produce a time plot for the series. How does the plot change as you change \(\theta_1\).

autoplot(y) + labs(title="MA(1)", subtitle=" theta=0.6")

ggAcf(y) + labs(title = "MA(1)", subtitle = "theta=0.6")

When Theta= 0.6, on the ACF we see a spike of 0.5 correlation at like 1, but this spike disappears in subsequent longer lags starting with lag 2. So this means the model can be explained Solely by MA( 1), because the correlation happens only among their nearest neighbors, and not so much when observations have two time difference or time lags. So this is not an autocorrelation model, but an MA model

y <- ts(numeric(100))
e <- rnorm(10000)
for(i in 2:100)
  y[i] <- 0.9*e[i-1] + e[i]
autoplot(y) + labs(title="MA(1)", subtitle=" theta=0.9")

ggAcf(y) + labs (title ="MA(1)", subtitle=" theta=0.9")

When Theta = 0.9, the ACF show a very similar pattern to that of Theta = 0.6. There is an initial spike of correlation at lag 1, just like The pattern on Theta = 0.6. But the difference is that with Theta = 0.9, the remaining residuals for longer lag time frame our less pronounced than that of Theta =0.6.

y <- ts(numeric(100))
e <- rnorm(10000)
for(i in 2:100)
  y[i] <- 0*e[i-1] + e[i]
autoplot(y) + labs (title ="MA(1)", subtitle="theta=0")

ggAcf(y) + labs (title ="MA(1)", subtitle="theta=0")

When Theta= 0, all love the autocorrelations are within 95% confidence interval. There is no visible spikes. So that means the data is already stationary , and is ready to be used for modeling or forecasting.

ARIMA Simulation - ARMA Model

Generate data from an ARMA(1,1) model with \(\phi_1\)=0.6, \(\theta_1\)=0.6 and \(\sigma^2\)=1

y <- ts(numeric(100))
e <- rnorm(10000)
for(i in 2:100)
  y[i] <- 0.6*e[i-1] +0.6*y[i-1] +e[i]
autoplot(y) + labs (title ="ARMA(1)", subtitle =" phi=0.6 theta=0.6")

ggAcf(y) + labs (title ="ARMA(1)", subtitle = "phi=0.6 theta=0.6")

When Phi = 0.6 and Theta = 0.6, the ACF shows a strong initial spike, the correlation is 0.75 at Lag 1, 0.4 at lag 2, then starting from lag of three it tells off and the remain very little for longer lags. So this is a typical MA model , with very strong relationship between observations for their nearest one neighbor and the modest association for their nearest 2 neighbors.

ARIMA Simulation - AR(2) AutoRegressive Model

Generate data from an AR(2) model with \(\phi_1\)=-0.8, \(\phi_2\)=0.3, and \(\sigma^2\)=1. (Note that these parameters will give a non-stationary series.)
Graph the latter two series and compare them.

y <- ts(numeric(100))
e <- rnorm(10000)
for(i in 3:100)
  y[i] <- -0.8*y[i-1] + 0.3*y[i-2] + e[i]
autoplot(y) + 
  labs (title ="AR(2)", subtitle ="phi1=-0.8, phi2=0.3")

ggAcf(y) + 
  labs (title ='AR(2)', subtitle=' phi1=-0.8, phi2=0.3')

First figure shows that as time increases, Y initially remained 0 from time 0 to time 40, gradually increase to ± 500 at time 80 . Then from that point on , accelerates I’m both directions and the reaches ± 2000 at time 100. The ACF figure shows that there our tell off graduate pad pattern. It happened from both directions , with initial correlation at 0.6 level and the crisis for longer lag time. Starting from lag 17, the correlation is diminished 2 be within 95% confidence interval.

8-7

Number of Women Murdered Each Year in US

Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States. a) By studying appropriate graphs of the series in R, find an appropriate ARIMA(p,d,q) model for these data.

autoplot(wmurders) +ggtitle("Number of Women Murdered each Year in the US Per 100,000 in the Population")

We can see that the murder cases clearly have some patterns it was very high at around 3.5 to 4.5 per 100,000 cases between year 1975 to 1995 . The years before 1970 and after 1996 are relatively low at the level of around the 3.0 per 100,000 population.

ggAcf(wmurders) + labs (title =" N of women Murdered in US", subtitle ='ACF')

ggPacf(wmurders) + labs (title =" N of women Murdered in US", subtitle ='PACF')

The ACF figure shows positive correlation, starting at 0.8 and gradually decrease 2 laca 10, at which point the correlation is 0.25 , from then on the correlation remains with a 95% confidence interval for longer lag time. The PACF shows there is a spike at lag one, at 0.85. But it cuts off immediately after leg two. Combining above 2 figures, it looks more like AR ( 1 ) or AR ( 2) Model to me . Other tests will be performed further below.

ndiffs(wmurders)

## [1] 2

diff_murders <- diff(diff(wmurders))
diff_murders %>% ur.kpss() %>% summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 0.0458 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

ggAcf(diff_murders)+
 labs (title =" N of women Murdered in US", subtitle='Differenced')

ggPacf(diff_murders) +
  labs (title =" N of women Murdered in US", subtitle='Differenced')

The ndiff function Suggest a difference of two. The KPSS unit root test at the three lags shows highly significant, with T statistics = 0.045, at << 1% level. The ACF shows a spike of - 0.60 at lag one, 0.3 at lag 2, but that diminishes at lag 3. The PACF shows a downward spike at the- 0.6 at lag1 but a cutoff add to lag 2.

ARIMA (1, 2,2) looks like a good fit here.There is auto regression level here which I they note as AR ( 1 ), then with distance or integration level at 2, then combine them with MA (2) together.

Should you include a constant in the model? Explain. Since d=2, a constant does not need to be included in the model. The mean is zero so therefore c=0.
Write this model in terms of the backshift operator. (1-\(\phi_1\)B)\((1-B)^2y_t\) = (1+\(\theta_1B + \theta_2B^2\epsilon_t\))
Fit the model using R and examine the residuals. Is the model satisfactory?

(fit <- Arima(wmurders, order=c(1,2,2)))

## Series: wmurders 
## ARIMA(1,2,2) 
## 
## Coefficients:
##           ar1      ma1      ma2
##       -0.7677  -0.2812  -0.4977
## s.e.   0.2350   0.2879   0.2762
## 
## sigma^2 estimated as 0.04552:  log likelihood=7.38
## AIC=-6.75   AICc=-5.92   BIC=1.13

checkresiduals(fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,2,2)
## Q* = 9.6215, df = 7, p-value = 0.2111
## 
## Model df: 3.   Total lags used: 10

Using that model, the AIC is -0.675, pretty small. In the Ljung-Box test, the P value is not significant. Plots of the residuals forllow a normal distribution with equal tails on both sides. Although the raw residual plot seems a little flutuaing in th earlier years (beofre 1970), the ACF figure shows that all the correlations for short and long lags all fall within the 95% confidence intervals. So, I am happy with this model choice.

Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.

fit %>% forecast(h=3)

##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005       2.534015 2.260584 2.807446 2.115838 2.952192
## 2006       2.404157 2.026793 2.781521 1.827029 2.981286
## 2007       2.331482 1.829669 2.833296 1.564025 3.098940

-.7677*wmurders[55] -.2812*residuals(fit)[55] - .4977*residuals(fit)[54]

## [1] -1.922329

fit %>% forecast(h=3) %>% 
  autoplot(include=80) + xlab("Year") +ylab ("women murdered per 100,000 people in the US")

The one year prediction follows a downward trend, predicting 2.2 murders per 100,000 population . With 95% confidence interval between 2.0 to 2 3.0.

Does auto.arima() give the same model you have chosen? If not, which model do you think is better?

auto.arima(wmurders)

## Series: wmurders 
## ARIMA(1,2,1) 
## 
## Coefficients:
##           ar1      ma1
##       -0.2434  -0.8261
## s.e.   0.1553   0.1143
## 
## sigma^2 estimated as 0.04632:  log likelihood=6.44
## AIC=-6.88   AICc=-6.39   BIC=-0.97

The autom.arima() gives me a model of ARIMA (1, 2, 1), with AIC= -6.88. This AIC is similar to my choice. I think both my model and the auto.arima() model perform similary. Combining the information above, the MA(1) and MA(2) component in this case are a choice of user preference, as they perform similarly. To make it simple,maybe ARIMA (1,2,1) is preferrred, which is the auto.arima() choice.