Project 1 Description
This project consists of 3 parts - two required and one bonus and is worth 15% of your grade. The project is due at 11:59 PM on Sunday Apr 11. I will accept late submissions with a penalty until the meetup after that when we review some projects. Part A – ATM Forecast, ATM624Data.xlsx
In part A, I want you to forecast how much cash is taken out of 4 different ATM machines for May 2010. The data is given in a single file. The variable ‘Cash’ is provided in hundreds of dollars, other than that it is straight forward. I am being somewhat ambiguous on purpose to make this have a little more business feeling. Explain and demonstrate your process, techniques used and not used, and your actual forecast. I am giving you data via an excel file, please provide your written report on your findings, visuals, discussion and your R code via an RPubs link along with the actual.rmd file Also please submit the forecast which you will put in an Excel readable file.
df_atm <- read.csv("C:/Users/SeungminSong/Downloads/624R/ATM624Data.csv")
head(df_atm)
## DATE ATM Cash
## 1 5/1/2009 12:00:00 AM ATM1 96
## 2 5/1/2009 12:00:00 AM ATM2 107
## 3 5/2/2009 12:00:00 AM ATM1 82
## 4 5/2/2009 12:00:00 AM ATM2 89
## 5 5/3/2009 12:00:00 AM ATM1 85
## 6 5/3/2009 12:00:00 AM ATM2 90
If there are missing values, it is difficult to convert them to a time series, so the missing values are deleted. Moreover, given the high liquidity of ATM cash withdrawals, removing missing values is deemed more practical for accurate analysis.
sum(is.na(df_atm))
## [1] 19
df_atm <- na.omit(df_atm)
df_atm$DATE <- as.POSIXct(df_atm$DATE, format="%m/%d/%Y %I:%M:%S %p")
# Define as tsibble with DATE as index
#atm_data_ts <-df_atm %>%
# as_tsibble(index = DATE, key = ATM)
#atm_data_ts
ATM1 represents stationary time series data. The Augmented Dickey-Fuller (ADF) test can determine if a time series is stationary by checking for a unit root in the data. In a stationary time series, the mean, variance, and covariance remain constant over time.
# Filter to ATM1 data
atm1_data <- df_atm %>%
filter(ATM == 'ATM1')
nrow(atm1_data)
## [1] 362
head(atm1_data)
## DATE ATM Cash
## 1 2009-05-01 ATM1 96
## 2 2009-05-02 ATM1 82
## 3 2009-05-03 ATM1 85
## 4 2009-05-04 ATM1 90
## 5 2009-05-05 ATM1 99
## 6 2009-05-06 ATM1 88
atm1_data_ts <- atm1_data %>%
mutate(DATE = as.Date(DATE, format = "%m/%d/%Y %I:%M:%S %p")) %>%
as_tsibble(index = DATE)
plot(atm1_data_ts$DATE, atm1_data_ts$Cash, type="l",
main="Cash (in hundreds $USD)")
Dickey-Fuller value:
Lag order:
p-value:
This ATM data means that the time series maintains a constant mean and variance over time, and the data can be interpreted as having time series stationarity. Because it is a stationary data, decided that transformation is not necessary.
atm1_ts <- ts(atm1_data_ts$Cash, start=c(2009,5), frequency=7)
adf_test_result <- adf.test(atm1_ts, alternative = "stationary")
print(adf_test_result)
##
## Augmented Dickey-Fuller Test
##
## data: atm1_ts
## Dickey-Fuller = -4.6043, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
The moving average stays within a stable range, suggesting a consistent trend in the time series data.
atm1_data_ts <- atm1_data_ts %>%
mutate(Moving_Average = rollapply(Cash, 7, mean, partial = TRUE, align = 'right'))
#print(atm1_data_ts, n = Inf)
Because the data is for one year, frequency 365 is not possible. Therefore, check weekly data with frequency 7.
Overall, this graph can be interpreted as showing that the time series data is seasonal, with trends that are relatively stable but have some increases or decreases. Although there are no major outliers in the residual panel, it is recommended to apply a statistical outlier detection technique to the residual data to more accurately determine whether there are outliers.
atm1_ts <- ts(atm1_data_ts$Cash, start = c(2009, 5), frequency=7)
atm1_decomposed <- stl(atm1_ts, s.window="periodic")
plot(atm1_decomposed)
STL detected there are no major outliers in the residual panel. However, when looking at the residual data separately, outliers are identified. However, due to the nature of the cash dispenser, there may be sufficient up and down volatility, so it was decided to maintain it without deleting or replacing it.
remainder_data <- atm1_decomposed$time.series[, "remainder"]
boxplot(remainder_data, main="Boxplot for Remainder Data")
outliers <- tsoutliers(remainder_data)
print(outliers)
## $index
## [1] 47
##
## $replacements
## [1] 45.05059
outlier_values_atm1_data_ts <- boxplot.stats(atm1_data_ts$Cash)$out
#par(mfrow=c(2,2))
boxplot(atm1_data_ts$Cash, main = "ATM1", boxwex = 0.5)
mtext(paste("Outliers: ", paste(outlier_values_atm1_data_ts, collapse = ", ")), cex = 0.6)
Given that the ACF gradually decreases as the lag increases and the PACF shows no significant spikes, this might indicate that the time series data is more suited for an ARIMA model rather than a pure AR or MA model.
par(mfrow = c(2, 1))
acf(atm1_ts, main="ACF for ATM1")
pacf(atm1_ts, main="PACF for ATM1")
Arima(0,1,1) AIC and BIC is lowever than Arima(0,0,1). Used (0,1,1) for the forecast
Set seasonal = TRUE to indicate seasonality.
arima <- auto.arima(atm1_ts, seasonal = TRUE)
summary(arima)
## Series: atm1_ts
## ARIMA(0,0,1)(0,1,1)[7]
##
## Coefficients:
## ma1 sma1
## 0.1674 -0.5813
## s.e. 0.0565 0.0472
##
## sigma^2 = 666.6: log likelihood = -1658.32
## AIC=3322.63 AICc=3322.7 BIC=3334.25
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.09700089 25.49555 16.17552 -106.2792 122.4165 0.8594986
## ACF1
## Training set -0.01116544
checkresiduals(arima)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1)(0,1,1)[7]
## Q* = 11.267, df = 12, p-value = 0.5062
##
## Model df: 2. Total lags used: 14
Some trends are detected, so include.drift = TRUE
model <- Arima(atm1_ts, order = c(0,1,1), include.drift = TRUE)
model
## Series: atm1_ts
## ARIMA(0,1,1) with drift
##
## Coefficients:
## ma1 drift
## -1.0000 -0.0251
## s.e. 0.0101 0.0184
##
## sigma^2 = 1344: log likelihood = -1814.42
## AIC=3634.84 AICc=3634.91 BIC=3646.51
checkresiduals(model)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,1) with drift
## Q* = 353.17, df = 13, p-value < 2.2e-16
##
## Model df: 1. Total lags used: 14
model7 <- Arima(atm1_ts, order = c(0,0,1), include.drift = TRUE)
model7
## Series: atm1_ts
## ARIMA(0,0,1) with drift
##
## Coefficients:
## ma1 intercept drift
## 0.0993 88.4623 -0.0252
## s.e. 0.0668 4.2126 0.0201
##
## sigma^2 = 1336: log likelihood = -1814.92
## AIC=3637.85 AICc=3637.96 BIC=3653.42
checkresiduals(model7)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1) with drift
## Q* = 349.77, df = 13, p-value < 2.2e-16
##
## Model df: 1. Total lags used: 14
fore<- forecast(model, h = 30)
plot(fore)
The median lower bound of the 95% prediction interval is approximately 7.00, and the median upper bound is approximately 150.92.
fore
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2061.286 79.32234 32.27145 126.3732 7.364218 151.2805
## 2061.429 79.29719 32.24630 126.3481 7.339069 151.2553
## 2061.571 79.27204 32.22116 126.3229 7.313921 151.2302
## 2061.714 79.24689 32.19601 126.2978 7.288773 151.2050
## 2061.857 79.22175 32.17086 126.2726 7.263625 151.1799
## 2062.000 79.19660 32.14571 126.2475 7.238477 151.1547
## 2062.143 79.17145 32.12056 126.2223 7.213328 151.1296
## 2062.286 79.14630 32.09541 126.1972 7.188180 151.1044
## 2062.429 79.12115 32.07027 126.1720 7.163032 151.0793
## 2062.571 79.09600 32.04512 126.1469 7.137884 151.0541
## 2062.714 79.07086 32.01997 126.1217 7.112735 151.0290
## 2062.857 79.04571 31.99482 126.0966 7.087587 151.0038
## 2063.000 79.02056 31.96967 126.0714 7.062439 150.9787
## 2063.143 78.99541 31.94453 126.0463 7.037291 150.9535
## 2063.286 78.97026 31.91938 126.0212 7.012143 150.9284
## 2063.429 78.94512 31.89423 125.9960 6.986994 150.9032
## 2063.571 78.91997 31.86908 125.9709 6.961846 150.8781
## 2063.714 78.89482 31.84393 125.9457 6.936698 150.8529
## 2063.857 78.86967 31.81878 125.9206 6.911550 150.8278
## 2064.000 78.84452 31.79364 125.8954 6.886401 150.8026
## 2064.143 78.81937 31.76849 125.8703 6.861253 150.7775
## 2064.286 78.79423 31.74334 125.8451 6.836105 150.7523
## 2064.429 78.76908 31.71819 125.8200 6.810957 150.7272
## 2064.571 78.74393 31.69304 125.7948 6.785809 150.7021
## 2064.714 78.71878 31.66790 125.7697 6.760660 150.6769
## 2064.857 78.69363 31.64275 125.7445 6.735512 150.6518
## 2065.000 78.66849 31.61760 125.7194 6.710364 150.6266
## 2065.143 78.64334 31.59245 125.6942 6.685216 150.6015
## 2065.286 78.61819 31.56730 125.6691 6.660068 150.5763
## 2065.429 78.59304 31.54215 125.6439 6.634919 150.5512
# Filter to ATM2 data
atm2_data <- df_atm %>%
filter(ATM == 'ATM2')
nrow(atm2_data)
## [1] 363
head(atm2_data)
## DATE ATM Cash
## 1 2009-05-01 ATM2 107
## 2 2009-05-02 ATM2 89
## 3 2009-05-03 ATM2 90
## 4 2009-05-04 ATM2 55
## 5 2009-05-05 ATM2 79
## 6 2009-05-06 ATM2 19
atm2_data_ts <- atm2_data %>%
mutate(DATE = as.Date(DATE, format = "%m/%d/%Y %I:%M:%S %p")) %>%
as_tsibble(index = DATE)
plot(atm2_data_ts$DATE, atm2_data_ts$Cash, type="l",
main="Cash (in hundreds $USD)")
Dickey-Fuller value:
Lag order:
p-value:
This ATM data means that the time series maintains a constant mean and variance over time, and the data can be interpreted as having time series stationarity. Because it is a stationary data, decided that transformation is not necessary.
atm2_ts <- ts(atm2_data_ts$Cash, start=c(2009,5), frequency=7)
adf_test_result2 <- adf.test(atm2_ts, alternative = "stationary")
print(adf_test_result2)
##
## Augmented Dickey-Fuller Test
##
## data: atm2_ts
## Dickey-Fuller = -6.009, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
The moving average stays within a stable range, suggesting a consistent trend in the time series data.
atm2_data_ts <- atm2_data_ts %>%
mutate(Moving_Average = rollapply(Cash, 7, mean, partial = TRUE, align = 'right'))
#print(atm2_data_ts, n = Inf)
Because the data is for one year, frequency 365 is not possible. Therefore, check weekly data with frequency 7.
Overall, this graph can be interpreted as showing that the time series data is seasonal, with trends that are relatively stable but have some increases or decreases. Although there are no major outliers in the residual panel, it is recommended to apply a statistical outlier detection technique to the residual data to more accurately determine whether there are outliers.
atm2_ts <- ts(atm2_data_ts$Cash, start = c(2009, 5), frequency=7)
atm2_decomposed <- stl(atm2_ts, s.window="periodic")
plot(atm2_decomposed)
STL detected there are no major outliers in the residual panel. However, when looking at the residual data separately, outliers are identified. However, due to the nature of the cash dispenser, there may be sufficient up and down volatility, so it was decided to maintain it without deleting or replacing it.
remainder_data <- atm2_decomposed$time.series[, "remainder"]
boxplot(remainder_data, main="Boxplot for Remainder Data")
outliers <- tsoutliers(remainder_data)
print(outliers)
## $index
## [1] 5 12 19 26 33 36 39 40 47 49 55 57 62 69 70 132 176 212 241
## [20] 251 293
##
## $replacements
## [1] -28.4396017 -28.8897437 -22.6264346 -38.1987865 -18.3224833 30.4286118
## [7] -17.6371514 -49.8802125 -49.6961975 -52.5743098 -14.8278293 4.8023732
## [13] -4.0286966 10.3716430 0.7832172 44.5067303 7.9306150 33.2011885
## [19] 13.9146149 39.4783457 34.8244937
outlier_values_atm2_data_ts <- boxplot.stats(atm2_data_ts$Cash)$out
#par(mfrow=c(2,2))
boxplot(atm2_data_ts$Cash, main = "ATM2", boxwex = 0.5)
mtext(paste("Outliers: ", paste(outlier_values_atm2_data_ts, collapse = ", ")), cex = 0.6)
Given that the ACF gradually decreases as the lag increases and the PACF shows no significant spikes, this might indicate that the time series data is more suited for an ARIMA model rather than a pure AR or MA model.
par(mfrow = c(2, 1))
# Plotting the ACF for Amazon's closing stock prices
acf(atm2_ts, main="ACF for ATM2")
# Plotting the PACF for Amazon's closing stock prices
pacf(atm2_ts, main="PACF for ATM2")
The median lower bound of the 95% prediction interval is approximately 7.00, and the median upper bound is approximately 150.92.
# Since seasonality appears, seasonal = TRUE
arima2 <- auto.arima(atm2_ts, seasonal = TRUE)
summary(arima2)
## Series: atm2_ts
## ARIMA(2,0,2)(0,1,2)[7]
##
## Coefficients:
## ar1 ar2 ma1 ma2 sma1 sma2
## -0.4148 -0.8922 0.4008 0.7534 -0.7220 0.0801
## s.e. 0.0674 0.0510 0.1027 0.0688 0.0567 0.0550
##
## sigma^2 = 708.1: log likelihood = -1671.98
## AIC=3357.96 AICc=3358.28 BIC=3385.08
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.4583759 26.12942 18.09675 -Inf Inf 0.8421495 0.009277035
checkresiduals(arima2)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,0,2)(0,1,2)[7]
## Q* = 12.12, df = 8, p-value = 0.1459
##
## Model df: 6. Total lags used: 14
Generally, the lower the AIC and BIC and the higher the log likelihood, the better the model fit. When comparing the two models, the AIC and BIC of the ARIMA(0,1,2) model are higher and the log likelihood is lower, so the ARIMA(2,0,2) model is judged to have a better fit.
The residuals of the ARIMA(2,0,2) model are slightly more centered and show lower ACF values.
The ARIMA(2,0,2) model shows slightly lower autocorrelation.
#Some trends are detected, so include.drift = TRUE
model2 <- Arima(atm2_ts, order = c(2,0,2), include.drift = TRUE)
model2
## Series: atm2_ts
## ARIMA(2,0,2) with drift
##
## Coefficients:
## ar1 ar2 ma1 ma2 intercept drift
## 0.1435 -0.1639 -0.2071 -0.2652 74.8231 -0.0682
## s.e. 0.1134 0.0957 0.1083 0.0889 1.9165 0.0091
##
## sigma^2 = 1234: log likelihood = -1804.17
## AIC=3622.34 AICc=3622.65 BIC=3649.6
checkresiduals(model2)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,0,2) with drift
## Q* = 226.42, df = 10, p-value < 2.2e-16
##
## Model df: 4. Total lags used: 14
fore2<- forecast(model2, h = 30)
plot(fore2)
model3 <- Arima(atm2_ts, order = c(0,1,2), include.drift = TRUE)
model3
## Series: atm2_ts
## ARIMA(0,1,2) with drift
##
## Coefficients:
## ma1 ma2 drift
## -1.0782 0.0782 -0.0671
## s.e. 0.1059 0.1055 0.0177
##
## sigma^2 = 1473: log likelihood = -1835.57
## AIC=3679.15 AICc=3679.26 BIC=3694.71
checkresiduals(model3)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,2) with drift
## Q* = 493.19, df = 12, p-value < 2.2e-16
##
## Model df: 2. Total lags used: 14
Arima (2,0,2) model Forecast
The median lower bound of the 95% prediction interval is approximately -27.35777, and the median upper bound is approximately 123.75854. The negative values in this prediction result may be due to the nature of the statistical prediction model. Therefore, the range can be predicted to be around $0 - 124.
fore2
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2061.429 30.43631 -14.581220835 75.45384 -38.41206 99.28468
## 2061.571 24.55643 -20.551923674 69.66479 -44.43085 93.54372
## 2061.714 49.43449 0.201671011 98.66731 -25.86061 124.72959
## 2061.857 53.89972 4.610242351 103.18920 -21.48203 129.28147
## 2062.000 50.39397 1.019625798 99.76831 -25.11757 125.90551
## 2062.143 49.08938 -0.291486690 98.47024 -26.43214 124.61089
## 2062.286 49.40709 0.024918922 98.78926 -26.11642 124.93060
## 2062.429 49.59694 0.214427185 98.97945 -25.92709 125.12097
## 2062.571 49.50256 0.120037693 98.88508 -26.02149 125.02661
## 2062.714 49.38833 0.005801912 98.77087 -26.13573 124.91240
## 2062.857 49.31784 -0.064689703 98.70038 -26.20622 124.84191
## 2063.000 49.25688 -0.125651773 98.63941 -26.26719 124.78095
## 2063.143 49.19012 -0.192412584 98.57265 -26.33395 124.71419
## 2063.286 49.12097 -0.261567645 98.50350 -26.40310 124.64503
## 2063.429 49.05242 -0.330116002 98.43495 -26.47165 124.57648
## 2063.571 48.98435 -0.398184886 98.36688 -26.53972 124.50841
## 2063.714 48.91625 -0.466284372 98.29878 -26.60782 124.44031
## 2063.857 48.84807 -0.534466829 98.23060 -26.67600 124.37213
## 2064.000 48.77988 -0.602656182 98.16241 -26.74419 124.30394
## 2064.143 48.71170 -0.670832926 98.09423 -26.81237 124.23577
## 2064.286 48.64353 -0.739006731 98.02606 -26.88054 124.16759
## 2064.429 48.57535 -0.807182179 97.95788 -26.94872 124.09942
## 2064.571 48.50717 -0.875358346 97.88971 -27.01689 124.03124
## 2064.714 48.43900 -0.943534346 97.82153 -27.08507 123.96306
## 2064.857 48.37082 -1.011710205 97.75336 -27.15324 123.89489
## 2065.000 48.30265 -1.079886071 97.68518 -27.22142 123.82671
## 2065.143 48.23447 -1.148061960 97.61700 -27.28960 123.75854
## 2065.286 48.16629 -1.216237853 97.54883 -27.35777 123.69036
## 2065.429 48.09812 -1.284413741 97.48065 -27.42595 123.62219
## 2065.571 48.02994 -1.352589629 97.41248 -27.49412 123.55401
# Filter to ATM3 data
atm3_data <- df_atm %>%
filter(ATM == 'ATM3')
nrow(atm3_data)
## [1] 365
head(atm3_data)
## DATE ATM Cash
## 1 2009-05-01 ATM3 0
## 2 2009-05-02 ATM3 0
## 3 2009-05-03 ATM3 0
## 4 2009-05-04 ATM3 0
## 5 2009-05-05 ATM3 0
## 6 2009-05-06 ATM3 0
atm3_data_ts <- atm3_data %>%
mutate(DATE = as.Date(DATE, format = "%m/%d/%Y %I:%M:%S %p")) %>%
as_tsibble(index = DATE)
sum(is.na(atm3_data_ts))
## [1] 0
plot(atm3_data_ts$DATE, atm3_data_ts$Cash, type="l",
main="Cash (in hundreds $USD)")
In fact, there are only three pieces of useful data, so the analysis of STL, ADF, ACF, and PACF previously performed is meaningless.
outlier_values_atm3_data_ts <- boxplot.stats(atm3_data_ts$Cash)$out
boxplot(atm3_data_ts$Cash, main = "ATM3", boxwex = 0.5)
mtext(paste("Outliers (ATM3): ", paste(outlier_values_atm1_data_ts, collapse = ", ")), cex = 0.6)
atm3_data_ts$Cash <- ifelse(atm3_data_ts$Cash < 1,
NA, atm3_data_ts$Cash )
atm3_data_ts$Cash
## [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [26] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [51] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [76] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [101] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [126] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [151] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [176] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [201] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [226] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [251] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [276] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [301] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [326] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [351] NA NA NA NA NA NA NA NA NA NA NA NA 96 82 85
sum(is.na(atm3_data_ts$Cash))
## [1] 362
new_atm3_data_ts <- atm3_data_ts %>% filter(!is.na(Cash))
new_atm3_data_ts
## # A tsibble: 3 x 3 [1D]
## DATE ATM Cash
## <date> <chr> <int>
## 1 2010-04-28 ATM3 96
## 2 2010-04-29 ATM3 82
## 3 2010-04-30 ATM3 85
arima <- auto.arima(atm3_data_ts$Cash)
arima
## Series: atm3_data_ts$Cash
## ARIMA(0,0,0) with non-zero mean
##
## Coefficients:
## mean
## 87.6667
## s.e. 3.4749
##
## sigma^2 = 54.33: log likelihood = -9.64
## AIC=23.28 AICc=Inf BIC=21.48
The p-value here is very high at 0.9941. This means that there is not enough evidence to reject the null hypothesis, so it is very likely that the residuals are white noise. Put another way, it means that the model has captured the time series structure of the data well, with little pattern or structural information remaining in the residuals.
model4 <- arima(atm3_data_ts$Cash, order = c(0,0,2))
model4
##
## Call:
## arima(x = atm3_data_ts$Cash, order = c(0, 0, 2))
##
## Coefficients:
## ma1 ma2 intercept
## -1.6664 1.0000 86.810
## s.e. 1.4564 1.4541 1.154
##
## sigma^2 estimated as 7.02: log likelihood = -8.69, aic = 25.37
tsdiag(model4)
Box.test(model4$residuals, lag=1, type = "Ljung")
##
## Box-Ljung test
##
## data: model4$residuals
## X-squared = 5.4051e-05, df = 1, p-value = 0.9941
fore <- forecast(model4, h=30)
plot(fore, col = "black", lwd=2, flty=3, shadecols = c("gray", "mistyrose"))
The median lower bound of the 95% prediction is 75.46026, and the median upper bound of the 95% prediction is 98.15983.
fore
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 366 90.08179 86.02022 94.14337 83.87015 96.29344
## 367 85.65046 78.69122 92.60969 75.00723 96.29369
## 368 86.81005 79.38882 94.23128 75.46026 98.15983
## 369 86.81005 79.38882 94.23128 75.46026 98.15983
## 370 86.81005 79.38882 94.23128 75.46026 98.15983
## 371 86.81005 79.38882 94.23128 75.46026 98.15983
## 372 86.81005 79.38882 94.23128 75.46026 98.15983
## 373 86.81005 79.38882 94.23128 75.46026 98.15983
## 374 86.81005 79.38882 94.23128 75.46026 98.15983
## 375 86.81005 79.38882 94.23128 75.46026 98.15983
## 376 86.81005 79.38882 94.23128 75.46026 98.15983
## 377 86.81005 79.38882 94.23128 75.46026 98.15983
## 378 86.81005 79.38882 94.23128 75.46026 98.15983
## 379 86.81005 79.38882 94.23128 75.46026 98.15983
## 380 86.81005 79.38882 94.23128 75.46026 98.15983
## 381 86.81005 79.38882 94.23128 75.46026 98.15983
## 382 86.81005 79.38882 94.23128 75.46026 98.15983
## 383 86.81005 79.38882 94.23128 75.46026 98.15983
## 384 86.81005 79.38882 94.23128 75.46026 98.15983
## 385 86.81005 79.38882 94.23128 75.46026 98.15983
## 386 86.81005 79.38882 94.23128 75.46026 98.15983
## 387 86.81005 79.38882 94.23128 75.46026 98.15983
## 388 86.81005 79.38882 94.23128 75.46026 98.15983
## 389 86.81005 79.38882 94.23128 75.46026 98.15983
## 390 86.81005 79.38882 94.23128 75.46026 98.15983
## 391 86.81005 79.38882 94.23128 75.46026 98.15983
## 392 86.81005 79.38882 94.23128 75.46026 98.15983
## 393 86.81005 79.38882 94.23128 75.46026 98.15983
## 394 86.81005 79.38882 94.23128 75.46026 98.15983
## 395 86.81005 79.38882 94.23128 75.46026 98.15983
# Filter to ATM4 data
atm4_data <- df_atm %>%
filter(ATM == 'ATM4')
nrow(atm4_data)
## [1] 365
head(atm4_data)
## DATE ATM Cash
## 1 2009-05-01 ATM4 777
## 2 2009-05-02 ATM4 524
## 3 2009-05-03 ATM4 793
## 4 2009-05-04 ATM4 908
## 5 2009-05-05 ATM4 53
## 6 2009-05-06 ATM4 52
atm4_data_ts <- atm4_data %>%
mutate(DATE = as.Date(DATE, format = "%m/%d/%Y %I:%M:%S %p")) %>%
as_tsibble(index = DATE)
plot(atm4_data_ts$DATE, atm4_data_ts$Cash, type="l",
main="Cash (in hundreds $USD)")
Because the extremely random value of 10920 was an outlier, I decided to remove this outlier.
outlier_values_atm4_data_ts <- boxplot.stats(atm4_data_ts$Cash)$out
#par(mfrow=c(2,2))
boxplot(atm4_data_ts$Cash, main = "ATM4", boxwex = 0.5)
mtext(paste("Outliers: ", paste(outlier_values_atm4_data_ts, collapse = ", ")), cex = 0.6)
Caluculate Outlier to remove
boxplot_stats <- boxplot.stats(atm4_data_ts$Cash)
outliers <- boxplot_stats$out
atm4_data_cleaned <- atm4_data_ts[!atm4_data_ts$Cash %in% outliers, ]
boxplot(atm4_data_cleaned$Cash, main = "ATM4 Cash Transactions without Outliers", boxwex = 0.5)
plot(atm4_data_cleaned$DATE, atm4_data_cleaned$Cash, type="l",
main="Cash (in hundreds $USD)")
Dickey-Fuller value:
Lag order:
p-value:
This ATM data means that the time series maintains a constant mean and variance over time, and the data can be interpreted as having time series stationarity. Because it is a stationary data, decided that transformation is not necessary.
atm4_ts <- ts(atm4_data_cleaned$Cash, start=c(2009,5, as.numeric(format(min(atm4_data_cleaned$DATE), "%j"))), frequency=7)
adf_test_result4 <- adf.test(atm4_ts, alternative = "stationary")
print(adf_test_result4)
##
## Augmented Dickey-Fuller Test
##
## data: atm4_ts
## Dickey-Fuller = -5.6265, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
The moving average stays within a stable range, suggesting a consistent trend in the time series data.
atm4_data_ts <- atm4_data_ts %>%
filter(Cash != 10920)
atm4_data_ts <- atm4_data_ts %>%
mutate(Moving_Average = rollapply(Cash, 7, mean, partial = TRUE, align = 'right'))
#print(atm4_data_ts, n = Inf)
Because the data is for one year, frequency 365 is not possible. Therefore, check weekly data with frequency 7.
This graph illustrates a time series that exhibits a clear seasonal pattern, as evidenced by the regular fluctuations in the seasonal component. The trend component indicates that the underlying trend of the series is relatively stable, with occasional increases or decreases over time. In the residual panel, there do not appear to be any major outliers present, suggesting that most of the systematic variation has been captured by the trend and seasonal components.
atm4_decomposed <- stl(atm4_ts, s.window="periodic")
plot(atm4_decomposed)
The ACF plot shows a gradual decline and the PACF plot doesn’t show significant spikes, suggesting that the ATM4 time series may be best modeled with an ARIMA model rather than a pure AR or MA model.
par(mfrow = c(2, 1))
# Plotting the ACF for Amazon's closing stock prices
acf(atm4_ts, main="ACF for ATM4")
# Plotting the PACF for Amazon's closing stock prices
pacf(atm4_ts, main="PACF for ATM4")
The median lower bound of the 95% prediction interval is approximately 7.00, and the median upper bound is approximately 150.92.
arima4 <- auto.arima(atm4_ts, seasonal = TRUE)
summary(arima4)
## Series: atm4_ts
## ARIMA(2,0,2)(2,0,2)[7] with non-zero mean
##
## Coefficients:
## ar1 ar2 ma1 ma2 sar1 sar2 sma1 sma2
## 1.8520 -0.8824 -1.7913 0.8184 0.0280 0.8230 0.0715 -0.6947
## s.e. 0.1332 0.1288 0.1646 0.1602 0.1661 0.1553 0.1820 0.1555
## mean
## 437.6635
## s.e. 34.3522
##
## sigma^2 = 111258: log likelihood = -2620.37
## AIC=5260.74 AICc=5261.37 BIC=5299.69
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.5806187 329.3931 269.7137 -479.831 511.4238 0.7835461
## ACF1
## Training set 0.002777017
checkresiduals(arima4)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,0,2)(2,0,2)[7] with non-zero mean
## Q* = 21.675, df = 6, p-value = 0.001386
##
## Model df: 8. Total lags used: 14
Generally, the lower the AIC and BIC and the higher the log likelihood, the better the model fit. When comparing the two models, the AIC and BIC of the ARIMA(0,1,2) model are higher and the log likelihood is lower, so the ARIMA(2,0,2) model is judged to have a better fit.
The residuals of the ARIMA(2,0,2) model are slightly more centered and show lower ACF values.
The ARIMA(2,0,2) model shows slightly lower autocorrelation.
model6 <- Arima(atm4_ts, order = c(2,0,2), include.drift = TRUE)
model6
## Series: atm4_ts
## ARIMA(2,0,2) with drift
##
## Coefficients:
## ar1 ar2 ma1 ma2 intercept drift
## -0.1070 0.5131 0.2000 -0.5334 464.9270 -0.1267
## s.e. 0.5626 0.4117 0.5677 0.4596 40.3717 0.1922
##
## sigma^2 = 119165: log likelihood = -2633.49
## AIC=5280.98 AICc=5281.29 BIC=5308.24
checkresiduals(model6)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,0,2) with drift
## Q* = 31.792, df = 10, p-value = 0.0004337
##
## Model df: 4. Total lags used: 14
fore4 <- forecast(model6, h = 30)
plot(fore4, col = "black", lwd=2, flty=3, shadecols = c("gray", "mistyrose"))
Therefore, the median lower bound of the 95% prediction interval is approximately -265.6057, and the median upper bound is approximately 1097.671.
fore4
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2061.429 432.3302 -10.06509 874.7254 -244.2550 1108.915
## 2061.571 395.4158 -48.88845 839.7201 -284.0889 1074.921
## 2061.714 427.9827 -16.52326 872.4887 -251.8305 1107.796
## 2061.857 405.4834 -39.59376 850.5606 -275.2034 1086.170
## 2062.000 424.5244 -20.64950 869.6984 -256.3103 1105.359
## 2062.143 410.8682 -34.48282 856.2193 -270.2374 1091.974
## 2062.286 422.0234 -23.36946 867.4163 -259.1462 1103.193
## 2062.429 413.7481 -31.70136 859.1975 -267.5080 1095.004
## 2062.571 420.2816 -25.18482 865.7480 -261.0005 1101.564
## 2062.714 415.2615 -30.22343 860.7464 -266.0489 1096.572
## 2062.857 419.0755 -26.41604 864.5670 -262.2450 1100.396
## 2063.000 416.0165 -29.48119 861.5142 -265.3134 1097.346
## 2063.143 418.2254 -27.27481 863.7256 -263.1084 1099.559
## 2063.286 416.3443 -29.15795 861.8466 -264.9926 1097.681
## 2063.429 417.6037 -27.89956 863.1069 -263.7347 1098.942
## 2063.571 416.4286 -29.07537 861.9325 -264.9109 1097.768
## 2063.714 417.1252 -28.37912 862.6295 -264.2148 1098.465
## 2063.857 416.3725 -29.13205 861.8770 -264.9679 1097.713
## 2064.000 416.7352 -28.76949 862.2398 -264.6054 1098.076
## 2064.143 416.2349 -29.26981 861.7397 -265.1058 1097.576
## 2064.286 416.3993 -29.10549 861.9041 -264.9415 1097.740
## 2064.429 416.0498 -29.45501 861.5546 -265.2910 1097.391
## 2064.571 416.0963 -29.40856 861.6011 -265.2446 1097.437
## 2064.714 415.8367 -29.66810 861.3416 -265.5041 1097.178
## 2064.857 415.8131 -29.69174 861.3180 -265.5278 1097.154
## 2065.000 415.6072 -29.89762 861.1121 -265.7336 1096.948
## 2065.143 415.5419 -29.96297 861.0467 -265.7990 1096.883
## 2065.286 415.3680 -30.13686 860.8729 -265.9729 1096.709
## 2065.429 415.2778 -30.22703 860.7827 -266.0631 1096.619
## 2065.571 415.1230 -30.38184 860.6279 -266.2179 1096.464
Part B consists of a simple dataset of residential power usage for January 1998 until December 2013. Your assignment is to model these data and a monthly forecast for 2014. The data is given in a single file. The variable ‘KWH’ is power consumption in Kilowatt hours, the rest is straight forward. Add this to your existing files above.
df_power <- read.csv("C:/Users/SeungminSong/Downloads/624R/ResidentialCustomerForecastLoad-624.csv")
head(df_power)
## CaseSequence YYYY.MMM KWH
## 1 733 1998-Jan 6862583
## 2 734 1998-Feb 5838198
## 3 735 1998-Mar 5420658
## 4 736 1998-Apr 5010364
## 5 737 1998-May 4665377
## 6 738 1998-Jun 6467147
names(df_power)[names(df_power) == "YYYY.MMM"] <- "DATE"
head(df_power)
## CaseSequence DATE KWH
## 1 733 1998-Jan 6862583
## 2 734 1998-Feb 5838198
## 3 735 1998-Mar 5420658
## 4 736 1998-Apr 5010364
## 5 737 1998-May 4665377
## 6 738 1998-Jun 6467147
df_power$DATE <- as.Date(paste0(df_power$DATE, "-01"), format="%Y-%b-%d")
df_power_tsibble <- as_tsibble(df_power, index = DATE)
#print(df_power_tsibble, n = Inf)
plot(df_power_tsibble$DATE, df_power_tsibble$KWH, type = "l", main = "Power Usage",
xlab = "Date", ylab = "KWH")
sum(is.na(df_power_tsibble))
## [1] 1
One way to deal with missing values is to use the average of the surrounding values. Missing values, denoted as NA, were found and replaced with the average of the values of the previous and next month.
na_index <- which(is.na(df_power_tsibble$KWH))
for (i in na_index) {
if (i > 1 && i < nrow(df_power_tsibble)) {
df_power_tsibble$KWH[i] <- mean(c(df_power_tsibble$KWH[i-1], df_power_tsibble$KWH[i+1]), na.rm = TRUE)
} else if (i == 1) {
df_power_tsibble$KWH[i] <- df_power_tsibble$KWH[i+1]
} else if (i == nrow(df_power_tsibble)) {
df_power_tsibble$KWH[i] <- df_power_tsibble$KWH[i-1]
}
}
Quartiles method
The quartile method is a method that sets an approximate outlier interval using data distribution and value size. Because it is simple and intuitive, it is widely used in outlier removal. The results show that outliers have been stably removed.
outlier_values_df_power <- boxplot.stats(df_power_tsibble$KWH)$out
boxplot(df_power_tsibble$KWH, main = "KWH", boxwex = 0.5)
mtext(paste("Outliers (KWH): ", paste(outlier_values_df_power, collapse = ", ")), cex = 0.6)
# Calculate the first (Q1) and third (Q3) quartiles
Q1 <- quantile(df_power_tsibble$KWH, 0.25)
Q3 <- quantile(df_power_tsibble$KWH, 0.75)
# Calculate the Interquartile Range (IQR)
IQR <- Q3 - Q1
# Define the bounds for outliers, typically 1.5 times the IQR from the Q1 and Q3
lower_bound <- Q1 - 1.5 * IQR
upper_bound <- Q3 + 1.5 * IQR
# Find and save the outliers
outliers <- df_power_tsibble$KWH < lower_bound | df_power_tsibble$KWH > upper_bound
outliers_data <- df_power_tsibble[outliers, ]
df_power_tsibble <- df_power_tsibble[!outliers, ]
# Print or return the outliers
outliers_data
## # A tsibble: 1 x 3 [1D]
## CaseSequence DATE KWH
## <int> <date> <dbl>
## 1 883 2010-07-01 770523
plot(df_power_tsibble$DATE, df_power_tsibble$KWH, type = "l", main = "Power Usage",
xlab = "Date", ylab = "KWH")
Dickey-Fuller value:
Lag order:
p-value:
In conclusion, this ADF test result means that the time series maintains constant mean and variance over time, providing strong evidence that this data has time series stationarity.
df_power_ts <- ts(df_power_tsibble$KWH, start=c(1998,1), end=c(2013,12), frequency=12)
adf_test_result <- adf.test(df_power_ts, alternative = "stationary")
print(adf_test_result)
##
## Augmented Dickey-Fuller Test
##
## data: df_power_ts
## Dickey-Fuller = -4.7514, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
ssssssssss
start(df_power_ts)
## [1] 1998 1
end(df_power_ts)
## [1] 2013 12
ggseasonplot(df_power_ts, DATE.labels=TRUE, DATE.labels.left=F) +
ggtitle("Seasonal Plot")
Use kpass to determine the appropriate differential order for the Turnover column. Looking at the results, the number was 1.
When comparing non-transformed with differenced data:
ndiffs_diff <- ndiffs(df_power_tsibble$KWH, test = "kpss")
ndiffs_diff
## [1] 1
power_diff <- diff(df_power_tsibble$KWH, differences = ndiffs_diff)
lambda_opt <- BoxCox.lambda(df_power_tsibble$KWH)
df_power_transformed <- BoxCox(df_power_tsibble$KWH, lambda_opt)
df_power_diff <- diff(df_power_transformed)
df_power_seasonally_diff <- diff(df_power_transformed, differences = 1, lag = 12)
par(mfrow=c(2,2))
plot(df_power_tsibble$DATE, df_power_tsibble$KWH, type="l",
main="Non-transformed KWH")
plot(df_power_tsibble$DATE[-1], df_power_diff, type="l",
main=paste("Differenced KWH with λ =", round(lambda_opt, 2)))
acf(df_power_tsibble$KWH, main="ACF for Non-transformed KWH")
acf(df_power_diff, main=paste("ACF for Differenced KWH\n with λ =", round(lambda_opt, 2)))
pacf(df_power_tsibble$KWH, main="PACF for Non-transformed KWH")
pacf(df_power_diff, main=paste("PACF for Differenced KWH\n with λ =", round(lambda_opt, 2)))
The graph indeed indicates the presence of both a seasonal component and a trend. Therefore, it seems more appropriate to use df_power_seasonally_diff in ARIMA.
power_ts <- ts(df_power_tsibble$KWH, start = c(1998, 1), end=c(2013,12), frequency=12)
power_decomposed <- stl(power_ts, s.window="periodic")
plot(power_decomposed)
Dickey-Fuller value:
The test statistic is -4.7639. This value is negative, providing strong evidence that there is no unit root (implying non-stationarity). The more negative the statistic, the stronger the evidence for the absence of a unit root, which means the time series is more likely to be stationary. Lag order:**
The number of lags is set to 5. This represents the optimal number of lags taken into account when the ADF test is calculated. The number of delays is adjusted according to the data to increase the accuracy of the test.
p-value: 0.01, which is very low. This value means that there is sufficient evidence to reject the null hypothesis (that the time series data are non-stationary with a unit root). Generally, if the p-value is less than 0.05 (5%), the null hypothesis is rejected and the data are considered normal. Here we reject the null hypothesis even at the 1% level, so there is very strong evidence that the data are normal.**
This time series data can be judged to be a statistically normal time series, and time series analysis and predictive modeling based on it may be appropriate. Stationary time series data are a good basis for analysis or forecasting using models such as ARIMA.
df_power_diff_ts <- ts(df_power_seasonally_diff, start = c(1998, 1), end=c(2013,12), frequency = 12)
adf_test_result <- adf.test(df_power_diff_ts, alternative = "stationary")
print(adf_test_result)
##
## Augmented Dickey-Fuller Test
##
## data: df_power_diff_ts
## Dickey-Fuller = -4.7639, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
Given that the ACF gradually decreases as the lag increases and the PACF shows no significant spikes, this might indicate that the time series data is more suited for an ARIMA model rather than a pure AR or MA model.
par(mfrow = c(2, 1))
# Plotting the ACF for Amazon's closing stock prices
acf(df_power_diff_ts, main="ACF for KWH")
# Plotting the PACF for Amazon's closing stock prices
pacf(df_power_diff_ts, main="PACF for KWH")
#df_power_diff_ts <- ts(df_power_seasonally_diff, start = c(1998, 1), frequency = 12)
# Fit the ARIMA model
fit <- auto.arima(df_power_diff_ts, seasonal = TRUE)
# Summary and residual checks
summary(fit)
## Series: df_power_diff_ts
## ARIMA(0,0,1)(1,0,0)[12] with zero mean
##
## Coefficients:
## ma1 sar1
## 0.3301 -0.4010
## s.e. 0.0727 0.0667
##
## sigma^2 = 5.068e-11: log likelihood = 2003.19
## AIC=-4000.37 AICc=-4000.24 BIC=-3990.6
##
## Training set error measures:
## ME RMSE MAE MPE MAPE
## Training set 6.923498e-07 7.081833e-06 5.371889e-06 -3399.037 3683.803
## MASE ACF1
## Training set 0.4993653 -0.02169511
checkresiduals(fit)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1)(1,0,0)[12] with zero mean
## Q* = 40.72, df = 22, p-value = 0.008896
##
## Model df: 2. Total lags used: 24
model <- ar(df_power_diff_ts)
# Extract the residuals from the model
residuals <- model$resid
# Check if residuals are present
if (!is.null(residuals)) {
# Create Q-Q plot of residuals
qqnorm(residuals)
qqline(residuals)
} else {
stop("No residuals to plot. Please check your model fitting process.")
}
shapiro.test(residuals)
##
## Shapiro-Wilk normality test
##
## data: residuals
## W = 0.9885, p-value = 0.1522
According to the AIC value, the ARIMA(0,0,1) model has lower values, which indicates that this model fits the data better. Both RMSE and MAE are lower in this model, MASE is slightly lower, and ACF1 is also closer to 0, meaning there is less autocorrelation left in the residuals.
fit_drift <- Arima(df_power_diff_ts, order=c(0,0,1))
summary(fit_drift)
## Series: df_power_diff_ts
## ARIMA(0,0,1) with non-zero mean
##
## Coefficients:
## ma1 mean
## 0.3296 0e+00
## s.e. 0.0701 1e-04
##
## sigma^2 = 6.043e-11: log likelihood = 1987.35
## AIC=-3968.7 AICc=-3968.57 BIC=-3958.92
##
## Training set error measures:
## ME RMSE MAE MPE MAPE
## Training set -2.483219e-09 7.733025e-06 5.998467e-06 1456.816 1540.194
## MASE ACF1
## Training set 0.5576113 0.002949623
checkresiduals(fit_drift)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1) with non-zero mean
## Q* = 66.52, df = 23, p-value = 4.139e-06
##
## Model df: 1. Total lags used: 24
forecast_fit_drift <- forecast(fit_drift, h = 12)
plot(forecast_fit_drift)
fit <- Arima(df_power_diff_ts, order=c(1,0,0))
summary(fit)
## Series: df_power_diff_ts
## ARIMA(1,0,0) with non-zero mean
##
## Coefficients:
## ar1 mean
## 0.3052 0e+00
## s.e. 0.0685 1e-04
##
## sigma^2 = 6.08e-11: log likelihood = 1986.77
## AIC=-3967.54 AICc=-3967.41 BIC=-3957.77
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -4.087088e-09 7.75666e-06 5.99675e-06 1380.77 1453.868 0.5574517
## ACF1
## Training set 0.02356285
checkresiduals(fit)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,0) with non-zero mean
## Q* = 67.025, df = 23, p-value = 3.471e-06
##
## Model df: 1. Total lags used: 24
forecast_fit <- forecast(fit, h = 12)
plot(forecast_fit)
Arima(0,0,1) Forecast
forecast_fit_drift
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2014 6.666831e-07 -9.295610e-06 1.062898e-05 -1.456933e-05 1.590270e-05
## Feb 2014 5.726816e-07 -9.916689e-06 1.106205e-05 -1.546943e-05 1.661479e-05
## Mar 2014 5.726816e-07 -9.916689e-06 1.106205e-05 -1.546943e-05 1.661479e-05
## Apr 2014 5.726816e-07 -9.916689e-06 1.106205e-05 -1.546943e-05 1.661479e-05
## May 2014 5.726816e-07 -9.916689e-06 1.106205e-05 -1.546943e-05 1.661479e-05
## Jun 2014 5.726816e-07 -9.916689e-06 1.106205e-05 -1.546943e-05 1.661479e-05
## Jul 2014 5.726816e-07 -9.916689e-06 1.106205e-05 -1.546943e-05 1.661479e-05
## Aug 2014 5.726816e-07 -9.916689e-06 1.106205e-05 -1.546943e-05 1.661479e-05
## Sep 2014 5.726816e-07 -9.916689e-06 1.106205e-05 -1.546943e-05 1.661479e-05
## Oct 2014 5.726816e-07 -9.916689e-06 1.106205e-05 -1.546943e-05 1.661479e-05
## Nov 2014 5.726816e-07 -9.916689e-06 1.106205e-05 -1.546943e-05 1.661479e-05
## Dec 2014 5.726816e-07 -9.916689e-06 1.106205e-05 -1.546943e-05 1.661479e-05