This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
1 Check your working directory 2 Set your working directory to “ANLY 580/RScript”. 3 Download “Inflation.xls” data file and set the “observation_date” variable to the date format and the “CPI” variable to the numeric format. The “CPI” variable represents the consumer price index, which indicates the relative prices of a consumer basket.
setwd("C:/Users/aziz/Documents/Harrisburg/CS Master/ANLY 565/RScript")
getwd()
## [1] "C:/Users/aziz/Documents/Harrisburg/CS Master/ANLY 565/RScript"
library(readxl)
Inf_data <- read_xls("Inflation.xls", col_types = c ("date","numeric"))
str(Inf_data)
## tibble [1,273 x 2] (S3: tbl_df/tbl/data.frame)
## $ observation_date: POSIXct[1:1273], format: "1913-01-01" "1913-02-01" ...
## $ CPI : num [1:1273] 9.8 9.8 9.8 9.8 9.7 9.8 9.9 9.9 10 10 ...
4 Create two stand alone variables: “date” and “cpi”. “date” variable should represent the values of the “observation_date” variable from the “Inflation” data set, while, “cpi” variable should represent values of the “cpi” variable from the “Inflation” data set.
date <- Inf_data$observation_date
cpi <- Inf_data$CPI
5 Transform “cpi” variable from numeric format to the time series format by using ts() function. Label the new variable as “cpits”.
range(date)
## [1] "1913-01-01 UTC" "2019-01-01 UTC"
cpits <- ts(cpi, start = c(1913,1), end = c(2019,1), frequency = 12)
head(cpits,5)
## [1] 9.8 9.8 9.8 9.8 9.7
6 Please construct the following three graphs: 1)time series plot, 2) autocorrelation and 3) partial autocorrelation functions for the “cpits” variable. Based on the signature of these graphs, does the variable appear stationary? Explain
plot(cpits, main = "CPI 1913-2019", xlab = "Months", ylab = "CPI")
acf(cpits)
pacf(cpits)
the data appears to have a strong trend moving upward throughout the years therefore not stationnary
7 Use “cpits” variable and window() function to create 2 new variables called “cpi.pre”, “cpi.post”. The “cpi.pre” should include all observations for the period starting from January of 1990 and up until October 2018. The “cpi.post” should include all observations starting from November 2018. and up until the last month in the dataset.
cpi.pre <- window(cpits, start = c(1990,1), end = c(2018,1), frequency = 12)
cpi.post <- window(cpits, start = c(2018,11), end = c(2019,1), frequency = 12)
8 Use time() function and “cpi.pre” variable to create a variable called “Time”. Moreover, use “cpi.pre” variable and cycle() function to create a factor variable titled “Seas”.
Time <- time(cpi.pre)
Seas <- factor(cycle(cpi.pre))
9 Use lm() function to estimate parameter values of a linear regression model by regressing “Time”, and “Seas” on “cpi.pre”. Save these estimates as “cpi.lm”. Set the value of the intercept to 0, in order to interpret the coefficients of the seasonal dummy variables as seasonal intercepts. (Setting intercept to 0 ensures that for each season there is a unique intercept) Save these estimates as cpi.lm
cpi.lm <- lm(cpi.pre ~ 0 + Time + Seas)
coef(cpi.lm)
## Time Seas1 Seas2 Seas3 Seas4 Seas5
## 4.367881 -8564.638995 -8564.210600 -8563.627590 -8563.365188 -8563.269535
## Seas6 Seas7 Seas8 Seas9 Seas10 Seas11
## -8563.198632 -8563.413265 -8563.444362 -8563.334674 -8563.648842 -8564.287118
## Seas12
## -8564.963858
10 Create the following new items: “new.Time”- sequence of 12 values starting from 2018.75+1/12 and each number going up by 1/12 “new.Seas”- a vector with the following values c(11,12,1,2,3,4,5,6,7,8,9,10) “new.data”- a data frame that combines the “new.Time” and “new.Seas” variables.
new.Time <- seq(2018.75+1/12, length =12, by=1/12)
new.Time
## [1] 2018.833 2018.917 2019.000 2019.083 2019.167 2019.250 2019.333 2019.417
## [9] 2019.500 2019.583 2019.667 2019.750
new.Seas <- factor(c(11,12,1,2,3,4,5,6,7,8,9,10))
new.data<- data.frame(Time = new.Time, Seas = new.Seas)
new.data
## Time Seas
## 1 2018.833 11
## 2 2018.917 12
## 3 2019.000 1
## 4 2019.083 2
## 5 2019.167 3
## 6 2019.250 4
## 7 2019.333 5
## 8 2019.417 6
## 9 2019.500 7
## 10 2019.583 8
## 11 2019.667 9
## 12 2019.750 10
11 Use predict() function and cpi.lm model to create a 12 month ahead forecast of the consumer price index. Save this forecast as “predict.lm”
predict.lm <- predict(cpi.lm, new.data)
predict.lm
## 1 2 3 4 5 6 7 8
## 253.7366 253.4238 254.1127 254.9050 255.8520 256.4784 256.9381 257.3730
## 9 10 11 12
## 257.5223 257.8552 258.3289 258.3787
12 Collect residuals from the “cpi.lm” model and save them as “cpi.lm.resid”. Moreover, construct acf and pacf for the “cpi.lm.resid” series. Is the series stationary? Is there autocorrelation in the residual series?
cpi.lm.resid <- cpi.lm$residuals
acf(cpi.lm.resid)
pacf(cpi.lm.resid)
it shows that there is a strong correlation
13 Based on the AIC, identify the best order of ARMA model (without the seasonal component) for the cpi.lm.resid time series and estimate the value of the parameter coefficients. Please, consider any ARMA model with up to 3 AR and/or MA terms. Save these estimates as resid.best.arma. What is the order of resid.best.arma?
best.order <- c(0,0,0)
best.aic <- Inf
for (i in 0:3) for (j in 0:3) {
fit.aic <- AIC(arima(cpi.lm.resid, order = c(i, 0,
j)))
if (fit.aic < best.aic) {
best.order <- c(i, 0, j)
resid.best.arma <- arima(cpi.lm.resid, order = best.order)
best.aic <- fit.aic
}}
resid.best.arma
##
## Call:
## arima(x = cpi.lm.resid, order = best.order)
##
## Coefficients:
## ar1 ma1 ma2 intercept
## 0.9441 0.5793 0.1627 -0.0728
## s.e. 0.0182 0.0578 0.0549 0.7743
##
## sigma^2 estimated as 0.2292: log likelihood = -231.73, aic = 473.46
best.order
## [1] 1 0 2
looks like best order is 1,0,2
14 Use predict() function and resid.best.arma to create a 12 period ahead forecast of cpi.lm.resid series. Save the forecasted values as resid.best.arma.pred
resid.best.arma.pred <- predict(resid.best.arma, n.ahead = 12)
resid.best.arma.pred$pred
## Time Series:
## Start = 338
## End = 349
## Frequency = 1
## [1] -1.4978454 -1.3342014 -1.2637141 -1.1971656 -1.1343358 -1.0750168
## [7] -1.0190125 -0.9661377 -0.9162175 -0.8690867 -0.8245896 -0.7825789
15 Use ts() function to combine the cpi values forecaseted by cpi.lm model and the residual values forecasted by resid.best.arma. Lable this time series as cpi.pred
cpi.pred <- ts((predict.lm + resid.best.arma.pred$pred), start = c(2018,11), frequency = 12)
cpi.pred
## Jan Feb Mar Apr May Jun Jul Aug
## 2018
## 2019 252.8489 253.7079 254.7177 255.4034 255.9191 256.4068 256.6061 256.9861
## Sep Oct Nov Dec
## 2018 252.2387 252.0896
## 2019 257.5043 257.5961
16 Use ts.plot() function to plot cpi.pre and cpi.pred together on one graph. What do you expect will happen to the CPI during the next 12 month?
ts.plot(cbind(cpi.pre, cpi.pred), lty = 1:2)
the cpi will keep going up in the next 12 months
17 Please calculate mean absolute percentage error for the cpi.pred forecast for the first three month (November 2018, December 2018, January 2019) How accurate is the model?
mean(abs((cpi.post[1:3] - cpi.pred[1:3])/ cpi.post[1:3]))*100
## [1] 0.2907615
model is accurate with mape of 0.29%
18 What is the forecasted rate of inflation between December 2018 and January 2019? Hint: Inflation = % change in CPI
diff(log(cpi.pred[1:3]))*100
## 2 3
## -0.05913051 0.30076564
diff(log(cpi.post[1:3]))*100
## [1] -0.3199074 0.1904781
forecasted inflation is -.06 vs -0.32
19 Policy makers often care more about inflation rather than cpi. Create a new stand alone variable that would represent the first log difference of the the cpits variable. Label this variable “pi”, which represents monthly inflation rate in the US. If percentage change is positive there is inflation (prices go up), and if the percentage change is negative there is deflation (prices fall). What was the lowest monthly rate of inflation(deflation) recorded in US during the time sample? What about was the highest?
pi <- diff(log(cpits))*100
summary(pi)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -3.2088 0.0000 0.2397 0.2552 0.5451 5.7158
20 Please construct the time series plot, the autocorrelation and partial autocorrelation functions for the “pi” variable. Based on the signature of these graphs, does the variable appear stationary? Explain
plot(pi, main="Inflation 1913-2019", xlab="Months", ylab="Inflation %")
acf(pi)
pacf(pi)
there are period in which there is strong inflation then it stabilizes
21 Use “pi” variable and window() function to create 2 new variables called “pi.pre”, “pi.post”. The “pi.pre” should include all observations for the period starting from January of 1990 and up until October 2018. The “pi.post” should include all observations starting from November 2018. and up until the last month in the dataset.
pi.pre <- window(pi, start = c(1990,1), end=c(2018,10), frequency=12)
pi.post <- window(pi, start = c(2018,11), end=c(2019,1), frequency=12)
22 Please create a function that takes a time series as input, and then uses AIC to identify the best SARIMA model. The function should return the following: - the order of the best SARIMA, - its AIC - and the estimates of its coefficient values Lable this formula get.best.sarima
get.best.sarima <- function(x.ts, maxord = c(2,2,2,2,2,2))
{
best.aic <- Inf
n <- length(x.ts)
for (p in 0:maxord[2]) for(d in 0:maxord[2]) for(q in 0:maxord[2])
for (P in 0:maxord[2]) for(D in 0:maxord[2]) for(Q in 0:maxord[2])
{
fit <- arima(x.ts, order = c(p,d,q),
seas = list(order = c(P,D,Q),
frequency(x.ts)), method = "CSS")
fit.aic <- -2 * fit$loglik + (log(n) + 1) * length(fit$coef)
if (fit.aic < best.aic)
{
best.aic <- fit.aic
best.fit <- fit
best.order <- c(p,d,q,P,D,Q)
}
}
list(best.order,best.aic, best.fit)
}
23 By using get.best.sarima() function please identify the best SARIMA model for pi.pre time series. Please cosider SARIMA(2,2,2,2,2,2) as the maximum order of the model. Save the results of the get.best.sarima() function as “pi.best.sarima” What is the order of the best SARIMA model?
AIC(arima((pi.pre), order=c(0,0,1), seas=list(order=c(2,1,1))))
## [1] 47.65343
AIC(arima((pi.pre), order=c(2,0,0), seas=list(order=c(2,0,1))))
## [1] 38.91891
pi.best.sarima <- get.best.sarima((pi.pre), maxord = c(2,2,2,2,2,2))
## Warning in arima(x.ts, order = c(p, d, q), seas = list(order = c(P, D, Q), :
## possible convergence problem: optim gave code = 1
## Warning in arima(x.ts, order = c(p, d, q), seas = list(order = c(P, D, Q), :
## possible convergence problem: optim gave code = 1
## Warning in arima(x.ts, order = c(p, d, q), seas = list(order = c(P, D, Q), :
## possible convergence problem: optim gave code = 1
## Warning in arima(x.ts, order = c(p, d, q), seas = list(order = c(P, D, Q), :
## possible convergence problem: optim gave code = 1
## Warning in arima(x.ts, order = c(p, d, q), seas = list(order = c(P, D, Q), :
## possible convergence problem: optim gave code = 1
## Warning in arima(x.ts, order = c(p, d, q), seas = list(order = c(P, D, Q), :
## possible convergence problem: optim gave code = 1
## Warning in arima(x.ts, order = c(p, d, q), seas = list(order = c(P, D, Q), :
## possible convergence problem: optim gave code = 1
pi.best.sarima
## [[1]]
## [1] 0 0 1 2 1 1
##
## [[2]]
## [1] 52.45196
##
## [[3]]
##
## Call:
## arima(x = x.ts, order = c(p, d, q), seasonal = list(order = c(P, D, Q), frequency(x.ts)),
## method = "CSS")
##
## Coefficients:
## ma1 sar1 sar2 sma1
## 0.4615 -0.1775 -0.0910 -0.8422
## s.e. 0.0444 0.0600 0.0577 0.0379
##
## sigma^2 estimated as 0.06311: part log likelihood = -12.53
best order model is 0,0,1,2,1,1
24 Please use predict() function and the best.sarima.pi model to forecast monthly rate of inflation in the US during November 2018, December 2018 and January 2019. Save these predictions as pi.sarima.pred
pi.sarima.pred <- predict(pi.best.sarima[[3]], n.ahead=3)$pred
pi.sarima.pred
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 2018 -0.1475132 -0.2388366
## 2019 0.2646618
25 Please calculate mean absolute percentage error of the best.sarima model. How accurate is the model?
mean(abs((pi.post[1:3] - pi.sarima.pred)/(pi.post[1:3])))*100
## [1] 40.10648
pi.post
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 2018 -0.3354970 -0.3199074
## 2019 0.1904781
pi.sarima.pred
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 2018 -0.1475132 -0.2388366
## 2019 0.2646618
looks like model can do better in accuracy since its mape is 40%
26 Extract the residual series from the pi.best.sarima model, and save them as sarima.resid.
sarima.resid <- pi.best.sarima[[3]]$residuals
27 Plot the acf of the sarima.resid series and acf of the sarima.resid^2 series What can you conclude based on these graphs?
plot(sarima.resid)
acf(sarima.resid)
pacf(sarima.resid)
trend graph shows there is no trend, no autocorrelation also
#28 Download fGarch package and upload it to the library
library(fGarch)
## NOTE: Packages 'fBasics', 'timeDate', and 'timeSeries' are no longer
## attached to the search() path when 'fGarch' is attached.
##
## If needed attach them yourself in your R script by e.g.,
## require("timeSeries")
29 Use garchFit() function from the fGarch package to estimate garch(1,1) model of the sarima.resid time series. By doing so you will be able to analyze the volatility of the inflation, or, in other words, how stable it is. Save the estimated coefficients as resid.garch
resid.garch <- garchFit(~garch(1,1), data=sarima.resid, include.mean =F, trace = F)
summary(resid.garch)
##
## Title:
## GARCH Modelling
##
## Call:
## garchFit(formula = ~garch(1, 1), data = sarima.resid, include.mean = F,
## trace = F)
##
## Mean and Variance Equation:
## data ~ garch(1, 1)
## <environment: 0x000000fc2475d8c0>
## [data = sarima.resid]
##
## Conditional Distribution:
## norm
##
## Coefficient(s):
## omega alpha1 beta1
## 0.00065719 0.33466123 0.72819655
##
## Std. Errors:
## based on Hessian
##
## Error Analysis:
## Estimate Std. Error t value Pr(>|t|)
## omega 0.0006572 0.0002065 3.183 0.00146 **
## alpha1 0.3346612 0.0737839 4.536 5.74e-06 ***
## beta1 0.7281965 0.0433202 16.810 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log Likelihood:
## 67.64266 normalized: 0.195499
##
## Description:
## Sat Dec 3 14:11:24 2022 by user: aziz
##
##
## Standardised Residuals Tests:
## Statistic p-Value
## Jarque-Bera Test R Chi^2 104.6063 0
## Shapiro-Wilk Test R W 0.9648452 2.11466e-07
## Ljung-Box Test R Q(10) 3.589069 0.9639872
## Ljung-Box Test R Q(15) 17.51113 0.2892379
## Ljung-Box Test R Q(20) 18.74426 0.5385011
## Ljung-Box Test R^2 Q(10) 5.76984 0.8342153
## Ljung-Box Test R^2 Q(15) 6.80949 0.9627221
## Ljung-Box Test R^2 Q(20) 10.02731 0.967674
## LM Arch Test R TR^2 6.544643 0.8861823
##
## Information Criterion Statistics:
## AIC BIC SIC HQIC
## -0.3736570 -0.3403064 -0.3738056 -0.3603767
coef(resid.garch)
## omega alpha1 beta1
## 0.0006571871 0.3346612263 0.7281965470
30 The main priority of the monetary authority (Federal Reserve) in the United States is to ensure stable value of currency. Simply put, Fed wants to keep inflation stable (no volatility). To maintain stability the Fed depends on a number of tools, and its effectiveness is judged based on the forecasting model of volatility. Please use resid.garch variable and predict function to forecast two period ahead inflation volatility, which is measured by a square of the forecasted standard deviation. How stable will be the currency in February and March of 2019?
resid.garch.pi<- garchFit(~garch(1,1), data = pi, include.mean = F, trace = F)
frcst<- predict(resid.garch.pi, n.ahead = 2)
frcst$standardDeviation^2
## [1] 0.0964593 0.1022910
looks like the inflation will keep increasing in Feb & Mar
Note that the echo = FALSE parameter was added to the
code chunk to prevent printing of the R code that generated the
plot.