In part A, I want you to forecast how much cash is taken out of 4 different ATM machines for May 2010. The data is given in a single file. The variable ‘Cash’ is provided in hundreds of dollars; other than that it is straightforward. I am being somewhat ambiguous on purpose to give this more of a business feel. Explain and demonstrate your process, techniques used and not used, and your actual forecast. I am giving you data via an Excel file; please provide your written report on your findings, visuals, discussion and your R code via an RPubs link along with the actual .rmd file. Also please submit the forecast, which you will put in an Excel-readable file.
library(readxl)    # read_excel()
library(tidyverse) # dplyr, tidyr (spread/gather/drop_na), ggplot2, readr (write_csv)
library(DT)        # datatable()
library(psych)     # describe()
library(forecast)  # snaive(), hw(), auto.arima(), Arima(), meanf(), autoplot()
library(imputeTS)  # na_seadec(), plotNA.distribution()
library(outliers)  # outlier(), rm.outlier()
library(Metrics)   # rmse()
df <- read_excel("/Users/albina/Downloads/ATM624Data.xlsx")
head(df)
## # A tibble: 6 x 3
## DATE ATM Cash
## <dbl> <chr> <dbl>
## 1 39934 ATM1 96
## 2 39934 ATM2 107
## 3 39935 ATM1 82
## 4 39935 ATM2 89
## 5 39936 ATM1 85
## 6 39936 ATM2 90
str(df)
## Classes 'tbl_df', 'tbl' and 'data.frame': 1474 obs. of 3 variables:
## $ DATE: num 39934 39934 39935 39935 39936 ...
## $ ATM : chr "ATM1" "ATM2" "ATM1" "ATM2" ...
## $ Cash: num 96 107 82 89 85 90 90 55 99 79 ...
datatable(describe(df))
A first look at the data shows that it needs cleaning: we must convert the date format, investigate missing values and outliers, and replace or impute those values as necessary.
df$DATE <- as.Date(df$DATE, origin = "1899-12-30")
df %>% ggplot(aes(x = DATE, y = Cash, col = ATM)) +
geom_line(show.legend = FALSE) +
facet_wrap(~ ATM, ncol = 1, scales = "free_y")
Missing values in ATM1 and ATM2 will be imputed with the na_seadec() function from the imputeTS package, which removes the seasonal component from the time series, performs imputation on the deseasonalized series, and afterwards adds the seasonal component back. The extreme value in ATM4 will be handled with the outliers package.
df <- df %>%
# remove last two weeks of NAs
drop_na() %>%
# create a column for each atm
spread(ATM,Cash)
plotNA.distribution(df$ATM1)
plotNA.distribution(df$ATM2)
plotNA.distribution(df$ATM3)
plotNA.distribution(df$ATM4)
# Impute missing values in ATM1 and ATM2 while preserving the weekly seasonal pattern
df$ATM1 <- na_seadec(df$ATM1)
df$ATM2 <- na_seadec(df$ATM2)
# ATM3 has only three real (non-zero) observations, all at the very end of the series; treat the zeroes as missing
df$ATM3[df$ATM3 == 0] <- NA
# Find the value with the largest difference from the sample mean (a likely outlier) and replace it with the mean of the series.
outlier(df$ATM4, opposite = FALSE, logical = FALSE)
## [1] 10919.76
df$ATM4 <- rm.outlier(df$ATM4, fill = TRUE, median = FALSE, opposite = FALSE)
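An alternative worth knowing about is forecast::tsoutliers(), which identifies outliers from a seasonal decomposition and suggests replacement values; a minimal sketch (illustrative only; applied to the raw ATM4 column it would be expected to flag the same extreme observation):
# tsoutliers() returns suspected outlier indices and suggested replacement
# values, taking the weekly seasonality into account.
tsoutliers(ts(df$ATM4, frequency = 7))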
datatable(describe(df))
Now that the data have been cleaned, let’s take a closer look at individual series.
df %>% gather(atm, Cash, -DATE) %>%
ggplot(aes(x = DATE, y = Cash, col = atm)) +
geom_line(show.legend = FALSE) +
facet_wrap(~ atm, ncol = 1, scales = "free_y")
atm1_ts <- ts(df$ATM1,frequency=7)
ggtsdisplay(atm1_ts)
ggseasonplot(atm1_ts, polar=TRUE)
autoplot(decompose(atm1_ts))
The data show a clear weekly pattern. The decomposition charts confirm that there is no apparent trend. Interestingly, the remainder starts oscillating towards the end of the series. At the same time, the seasonal plot shows changing behavior: Thursdays go from being the day with the highest withdrawals in the middle of the series to the lowest at the end. The two effects appear to be connected.
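To make the Thursday shift concrete, average withdrawals by weekday can be compared between the two halves of the series; a rough sketch (the 50/50 split point is an arbitrary illustration):
# Mean cash by weekday (cycle position 1-7) for each half of the series;
# a drop in the Thursday column of the second half would confirm the shift.
n <- length(atm1_ts)
half <- rep(c("first", "second"), c(ceiling(n / 2), floor(n / 2)))
tapply(as.numeric(atm1_ts), list(weekday = cycle(atm1_ts), half = half), mean)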
atm2_ts <- ts(df[,"ATM2"],frequency=7)
ggtsdisplay(atm2_ts)
ggseasonplot(atm2_ts, polar=TRUE)
autoplot(decompose(atm2_ts))
Similarly to ATM1, the ATM2 data show a clear weekly pattern, but with a slight downward trend. As with ATM1, the remainder starts oscillating towards the end of the series, and the seasonal plot shows a similar shift in pattern.
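The slight downward trend can be quantified with a simple linear time-trend regression; a minimal sketch using tslm() from the forecast package:
# Regress ATM2 on a linear trend; a negative trend coefficient supports
# the downward drift seen in the decomposition.
trend_fit <- tslm(atm2_ts ~ trend)
coef(trend_fit)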
atm4_ts <- ts(df[,"ATM4"],frequency=7)
ggtsdisplay(atm4_ts)
ggseasonplot(atm4_ts, polar=TRUE)
autoplot(decompose(atm4_ts))
ATM4 shows the same weekly seasonal pattern and no apparent trend, and the remainder appears to be white noise.
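The white-noise impression for the ATM4 remainder can be checked formally with a Ljung-Box test; a quick sketch (the choice of 14 lags, two weekly cycles, is a judgment call):
# Test the decomposition remainder for residual autocorrelation; a large
# p-value is consistent with white noise. na.omit() drops the NA edges
# that decompose() leaves at both ends of the random component.
Box.test(na.omit(decompose(atm4_ts)$random), lag = 14, type = "Ljung-Box")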
It is good practice to fit several models first so their performance can be compared before choosing a final candidate for forecasting.
Given seasonality in ATM1, ATM2, and ATM4, we will fit and test the following models:
- Seasonal naive, in which each forecast is set equal to the last observed value from the same season (for daily data with weekly seasonality, the same day of the previous week)
- Holt-Winters seasonal method, which captures the seasonal component. The additive method is preferred when the seasonal variations are roughly constant through the series, while the multiplicative method is preferred when the seasonal variations change in proportion to the level of the series. Since our data appear to maintain constant variance, we will use the additive method (see the comparison sketch just after this list).
- ARIMA, with the model order selected automatically by auto.arima()
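To support the additive choice, both Holt-Winters variants can be fit and their information criteria compared; a minimal sketch on ATM1 (illustrative only, and assuming all values are strictly positive, as the multiplicative form requires):
# Fit both seasonal variants and compare corrected AIC; a lower AICc for
# the additive model is consistent with roughly constant seasonal variation.
hw_add  <- hw(atm1_ts, seasonal = "additive")
hw_mult <- hw(atm1_ts, seasonal = "multiplicative")
c(additive = hw_add$model$aicc, multiplicative = hw_mult$model$aicc)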
Given the lack of data for ATM3, we will use a simple mean forecast of its three observed values.
For ATM1, ATM2, and ATM4 we will train the above models on 90% of the available data, select the best performer based on test-set RMSE, and then retrain that model on all of the data to predict the 31 daily values for May 2010.
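Since the same train/test/RMSE procedure repeats for each ATM, it can be captured in a small helper; a sketch (the eval_rmse name and the wrapper functions are hypothetical; the sections below spell the steps out in full for clarity):
# Split a series 90/10 in time order, forecast the held-out window with the
# supplied model-fitting function, and return the test-set RMSE.
eval_rmse <- function(y, fit_fn, train_perc = 0.9) {
  train <- head(y, round(length(y) * train_perc))
  test <- tail(y, length(y) - length(train))
  fc <- forecast(fit_fn(train), h = length(test))
  rmse(test, fc$mean)
}
# Example: eval_rmse(atm1_ts, function(x) auto.arima(x, seasonal = TRUE))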
train_perc = 0.9
train <- head(atm1_ts, round(length(atm1_ts) * train_perc))
h <- length(atm1_ts) - length(train)
test <- tail(atm1_ts, h)
# Seasonal Naive
fit <- snaive(train, length(test))
fc <- forecast(fit, h = length(test))
snaive_rmse <- rmse(test,fc$mean)
autoplot(train,main='Seasonal Naive') +
autolayer(fc,series='Naive') +
autolayer(test,series='Actual')
fit[['model']]
## Call: snaive(y = train, h = length(test))
##
## Residual sd: 28.995
checkresiduals(fc)
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 75.609, df = 14, p-value = 1.83e-10
##
## Model df: 0. Total lags used: 14
# Holt-Winters seasonal method
fit <- hw(train,seasonal="additive",length(test))
fc <- forecast(fit, h = length(test))
hws_rmse <- rmse(test,fc$mean)
autoplot(train,main="Holt-Winters seasonal") +
autolayer(fc,series='HW seasonal') +
autolayer(test,series='Actual')
fit[['model']]
## Holt-Winters' additive method
##
## Call:
## hw(y = train, h = length(test), seasonal = "additive")
##
## Smoothing parameters:
## alpha = 0.0227
## beta = 1e-04
## gamma = 0.3248
##
## Initial states:
## l = 78.0384
## b = 0.0058
## s = -60.074 -3.6633 18.9093 4.2836 17.4204 13.6339
## 9.4902
##
## sigma: 25.263
##
## AIC AICc BIC
## 4031.368 4032.359 4076.884
checkresiduals(fc)
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' additive method
## Q* = 27.892, df = 3, p-value = 3.826e-06
##
## Model df: 11. Total lags used: 14
# ARIMA
fit <- auto.arima(train, seasonal=T, stepwise=F, approximation=F)
fc <- forecast(fit, h = length(test))
arima_rmse <- rmse(test,fc$mean)
autoplot(train,main='ARIMA') +
autolayer(fc,series='ARIMA') +
autolayer(test,series='Actual')
summary(fit)
## Series: train
## ARIMA(0,0,1)(1,1,1)[7]
##
## Coefficients:
## ma1 sar1 sma1
## 0.2075 0.2316 -0.8030
## s.e. 0.0582 0.0893 0.0666
##
## sigma^2 estimated as 601.3: log likelihood=-1483.43
## AIC=2974.86 AICc=2974.99 BIC=2989.95
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.1966475 24.14531 15.2034 -108.9955 124.9788 0.8154204
## ACF1
## Training set -0.01123355
checkresiduals(fc)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1)(1,1,1)[7]
## Q* = 15.6, df = 11, p-value = 0.1566
##
## Model df: 3. Total lags used: 14
Based on the Ljung-Box tests, only the ARIMA residuals resemble white noise; the seasonal naive and Holt-Winters residuals still show significant autocorrelation. Comparing the RMSE results on the test set side by side, however, Holt-Winters gives the lowest error and is our pick for ATM1.
res <- data.frame(rbind(snaive_rmse,hws_rmse, arima_rmse))
colnames(res) <- "RMSE"
rownames(res) <- c('Seasonal Naive','Holt-Winters Seasonal', 'ARIMA')
datatable(res) %>% formatRound('RMSE',3)
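As an aside, the forecast package's accuracy() function produces the same comparison along with additional metrics (MAE, MAPE, MASE) in a single call; a quick sketch for any forecast object fc from above:
# Passing the held-out data as the second argument adds a "Test set" row
# to the usual training-set error measures.
accuracy(fc, test)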
train_perc = 0.9
train <- head(atm2_ts, round(length(atm2_ts) * train_perc))
h <- length(atm2_ts) - length(train)
test <- tail(atm2_ts, h)
# Seasonal Naive
fit <- snaive(train, length(test))
fc <- forecast(fit, h = length(test))
snaive_rmse <- rmse(test,fc$mean)
autoplot(train,main='Seasonal Naive') +
autolayer(fc,series='Naive') +
autolayer(test,series='Actual')
fit[['model']]
## Call: snaive(y = train, h = length(test))
##
## Residual sd: 30.9242
checkresiduals(fc)
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 95.121, df = 14, p-value = 4.052e-14
##
## Model df: 0. Total lags used: 14
# Holt-Winters seasonal method
fit <- hw(train,seasonal="additive",length(test))
fc <- forecast(fit, h = length(test))
hws_rmse <- rmse(test,fc$mean)
autoplot(train,main='Holt-Winters Seasonal') +
autolayer(fc,series='HW seasonal') +
autolayer(test,series='Actual')
fit[['model']]
## Holt-Winters' additive method
##
## Call:
## hw(y = train, h = length(test), seasonal = "additive")
##
## Smoothing parameters:
## alpha = 1e-04
## beta = 1e-04
## gamma = 0.3588
##
## Initial states:
## l = 74.186
## b = -0.0665
## s = -45.0356 -25.9466 19.9782 1.8337 4.0716 13.9378
## 31.161
##
## sigma: 26.1721
##
## AIC AICc BIC
## 4054.559 4055.550 4100.076
checkresiduals(fc)
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' additive method
## Q* = 38.198, df = 3, p-value = 2.566e-08
##
## Model df: 11. Total lags used: 14
# ARIMA
fit <- auto.arima(train, seasonal=T, stepwise=F, approximation=F)
fc <- forecast(fit, h = length(test))
arima_rmse <- rmse(test,fc$mean)
autoplot(train,main='ARIMA') +
autolayer(fc,series='ARIMA') +
autolayer(test,series='Actual')
summary(fit)
## Series: train
## ARIMA(2,0,2)(0,1,1)[7] with drift
##
## Coefficients:
## ar1 ar2 ma1 ma2 sma1 drift
## -0.4482 -0.9516 0.4972 0.8257 -0.8231 -0.0730
## s.e. 0.0372 0.0344 0.0669 0.0491 0.0438 0.0378
##
## sigma^2 estimated as 617.4: log likelihood=-1486.87
## AIC=2987.75 AICc=2988.1 BIC=3014.15
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.4557674 24.35081 17.00471 -Inf Inf 0.7981446 -0.005601147
checkresiduals(fc)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,0,2)(0,1,1)[7] with drift
## Q* = 9.0769, df = 8, p-value = 0.3359
##
## Model df: 6. Total lags used: 14
Once again, Holt-Winters gives the lowest test-set RMSE and is our pick for ATM2, although, as with ATM1, only the ARIMA residuals pass the Ljung-Box test.
res <- data.frame(rbind(snaive_rmse,hws_rmse, arima_rmse))
colnames(res) <- "RMSE"
rownames(res) <- c('Seasonal Naive','Holt-Winters Seasonal', 'ARIMA')
datatable(res) %>% formatRound('RMSE',3)
train_perc = 0.9
train <- head(atm4_ts, round(length(atm4_ts) * train_perc))
h <- length(atm4_ts) - length(train)
test <- tail(atm4_ts, h)
# Seasonal Naive
fit <- snaive(train, length(test))
fc <- forecast(fit, h = length(test))
snaive_rmse <- rmse(test,fc$mean)
autoplot(train,main='Seasonal Naive') +
autolayer(fc,series='Naive') +
autolayer(test,series='Actual')
fit[['model']]
## Call: snaive(y = train, h = length(test))
##
## Residual sd: 463.3882
checkresiduals(fc)
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 97.738, df = 14, p-value = 1.288e-14
##
## Model df: 0. Total lags used: 14
# Holt-Winters seasonal method
fit <- hw(train,seasonal="additive",length(test))
fc <- forecast(fit, h = length(test))
hws_rmse <- rmse(test,fc$mean)
autoplot(train,main='Holt-Winters seasonal') +
autolayer(fc,series='HW seasonal') +
autolayer(test,series='Actual')
fit[['model']]
## Holt-Winters' additive method
##
## Call:
## hw(y = train, h = length(test), seasonal = "additive")
##
## Smoothing parameters:
## alpha = 0.0385
## beta = 1e-04
## gamma = 1e-04
##
## Initial states:
## l = 581.6897
## b = -0.1469
## s = -311.4446 -54.6314 51.8001 41.8296 96.1871 57.9326
## 118.3266
##
## sigma: 340.5622
##
## AIC AICc BIC
## 5737.792 5738.782 5783.308
checkresiduals(fc)
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' additive method
## Q* = 14.026, df = 3, p-value = 0.00287
##
## Model df: 11. Total lags used: 14
# ARIMA
fit <- auto.arima(train, seasonal=T, stepwise=F, approximation=F)
fc <- forecast(fit, h = length(test))
arima_rmse <- rmse(test,fc$mean)
autoplot(train,main='ARIMA') +
autolayer(fc,series='ARIMA') +
autolayer(test,series='Actual')
summary(fit)
## Series: train
## ARIMA(0,0,0)(2,0,0)[7] with non-zero mean
##
## Coefficients:
## sar1 sar2 mean
## 0.1506 0.0888 447.6496
## s.e. 0.0552 0.0555 25.2657
##
## sigma^2 estimated as 124501: log likelihood=-2388.12
## AIC=4784.23 AICc=4784.36 BIC=4799.41
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.0746234 351.2298 290.8324 -508.0236 540.5876 0.8223528
## ACF1
## Training set 0.07818136
checkresiduals(fit)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,0)(2,0,0)[7] with non-zero mean
## Q* = 15.022, df = 11, p-value = 0.1815
##
## Model df: 3. Total lags used: 14
This time, auto.arima() generates the best model for ATM4, with the lowest test-set RMSE and residuals that pass the Ljung-Box test.
res <- data.frame(rbind(snaive_rmse,hws_rmse, arima_rmse))
colnames(res) <- "RMSE"
rownames(res) <- c('Seasonal Naive','Holt-Winters Seasonal', 'ARIMA')
datatable(res) %>% formatRound('RMSE',3)
# Retrain the winning models on the full series and forecast the 31 days of May 2010
atm1_pred <- hw(atm1_ts,seasonal="additive",h=31,level=95)
atm2_pred <- hw(atm2_ts,seasonal="additive",h=31,level=95)
# ATM3 has only three real observations, so forecast with their mean
atm3_pred <- meanf(ts(tail(df$ATM3,3)),h=31,level=95)
# Refit the ARIMA(0,0,0)(2,0,0)[7] specification selected above on the full series
atm4_fit <- Arima(atm4_ts, order = c(0,0,0), seasonal = list(order=c(2,0,0),period=7))
atm4_pred <- forecast(atm4_fit, h=31,level=95)
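Before assembling the output file, it is worth eyeballing the four forecasts together; a minimal sketch (assuming the gridExtra package is available; any faceting approach would do):
# Plot the four May 2010 forecasts side by side as a visual sanity check.
library(gridExtra)
grid.arrange(autoplot(atm1_pred), autoplot(atm2_pred),
             autoplot(atm3_pred), autoplot(atm4_pred), ncol = 2)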
values = seq(from = as.Date("2010-05-01"), to = as.Date("2010-05-31"), by = 'day')
part_a_pred = tibble("Date"=values,
"ATM1"=as.numeric(atm1_pred$mean),
"ATM2"=as.numeric(atm2_pred$mean),
"ATM3"=as.numeric(atm3_pred$mean),
"ATM4"=as.numeric(atm4_pred$mean))
write_csv(part_a_pred,'Part A predictions.csv')
Part B consists of a simple dataset of residential power usage from January 1998 until December 2013. Your assignment is to model these data and produce a monthly forecast for 2014. The data is given in a single file. The variable ‘KWH’ is power consumption in kilowatt hours; the rest is straightforward. Add this to your existing files above.
df<-read_excel("/Users/albina/Downloads/ResidentialCustomerForecastLoad-624.xlsx")
head(df)
## # A tibble: 6 x 3
## CaseSequence `YYYY-MMM` KWH
## <dbl> <chr> <dbl>
## 1 733 1998-Jan 6862583
## 2 734 1998-Feb 5838198
## 3 735 1998-Mar 5420658
## 4 736 1998-Apr 5010364
## 5 737 1998-May 4665377
## 6 738 1998-Jun 6467147
str(df)
## Classes 'tbl_df', 'tbl' and 'data.frame': 192 obs. of 3 variables:
## $ CaseSequence: num 733 734 735 736 737 738 739 740 741 742 ...
## $ YYYY-MMM : chr "1998-Jan" "1998-Feb" "1998-Mar" "1998-Apr" ...
## $ KWH : num 6862583 5838198 5420658 5010364 4665377 ...
datatable(describe(df))
myts <- ts(df$KWH,frequency=12, start=c(1998,1))
plotNA.distribution(myts)
ggtsdisplay(myts)
ggseasonplot(myts, polar=TRUE)
ggsubseriesplot(myts)
The data show a distinctive seasonal pattern and a slight upward trend towards the end of the series. Usage peaks twice a year, which is logical given that people tend to use more electricity in colder months to heat their homes and in hotter months to cool them.
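The twin peaks can be confirmed numerically by averaging usage within each calendar month; a quick sketch:
# Mean KWH by calendar month (1 = January); high values in both winter and
# summer months would confirm the heating/cooling interpretation.
round(tapply(as.numeric(myts), cycle(myts), mean, na.rm = TRUE))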
There is one missing value that we will impute with the imputeTS package, which takes seasonality into account. There is also what appears to be an outlier (the July 2010 drop) that is hard to explain without additional context about the data. Although it could be a genuine observation, say due to an outage, we will assume it is erroneous and replace it with the mean of the series.
myts <- na_seadec(myts)
# Find the value with the largest difference from the sample mean (a likely outlier) and replace it with the mean of the series.
outlier(myts, opposite = FALSE, logical = FALSE)
## [1] 770523
myts <- rm.outlier(myts, fill = TRUE, median = FALSE, opposite = FALSE)
autoplot(decompose(myts))
As in Part A, we will train the seasonal naive, Holt-Winters, and ARIMA models on 90% of the data and select the best performer based on test-set RMSE.
train_perc = 0.9
train <- head(myts, round(length(myts) * train_perc))
h <- length(myts) - length(train)
test <- tail(myts, h)
autoplot(myts) +
autolayer(train, series="Training") +
autolayer(test, series="Test") +
theme(legend.title=element_blank())
# Seasonal Naive
fit <- snaive(train, length(test))
fc <- forecast(fit, h = length(test))
snaive_test_rmse <- rmse(test,fc$mean)
autoplot(train,main='Seasonal Naive') +
autolayer(fc,series='Naive') +
autolayer(test,series='Test')
fit[['model']]
## Call: snaive(y = train, h = length(test))
##
## Residual sd: 810850.2641
checkresiduals(fc)
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 84.596, df = 24, p-value = 1.109e-08
##
## Model df: 0. Total lags used: 24
# Holt-Winters seasonal method
fit <- hw(train,seasonal="additive",damped=TRUE, length(test))
fc <- forecast(fit, h = length(test))
hws_test_rmse <- rmse(test,fc$mean)
autoplot(train,main='Holt-Winters seasonal') +
autolayer(fc,series='HW seasonal') +
autolayer(test,series='Test')
fit[['model']]
## Damped Holt-Winters' additive method
##
## Call:
## hw(y = train, h = length(test), seasonal = "additive", damped = TRUE)
##
## Smoothing parameters:
## alpha = 0.2779
## beta = 1e-04
## gamma = 1e-04
## phi = 0.9775
##
## Initial states:
## l = 6185694.4435
## b = 3949.0468
## s = -480375.8 -1614156 -774013 1219060 1713068 1221644
## -92928.94 -1463345 -1162787 -455988.5 511465.3 1378358
##
## sigma: 628778.8
##
## AIC AICc BIC
## 5529.256 5533.698 5586.015
checkresiduals(fc)
##
## Ljung-Box test
##
## data: Residuals from Damped Holt-Winters' additive method
## Q* = 40.038, df = 7, p-value = 1.238e-06
##
## Model df: 17. Total lags used: 24
# ARIMA
fit <- auto.arima(train, seasonal=T, stepwise=F, approximation=F)
fc <- forecast(fit, h = length(test))
arima_test_rmse <- rmse(test,fc$mean)
autoplot(train,main='ARIMA') +
autolayer(fc,series='ARIMA') +
autolayer(test,series='Test')
summary(fc)
##
## Forecast method: ARIMA(0,0,3)(2,1,0)[12] with drift
##
## Model Information:
## Series: train
## ARIMA(0,0,3)(2,1,0)[12] with drift
##
## Coefficients:
## ma1 ma2 ma3 sar1 sar2 drift
## 0.3571 0.0459 0.2284 -0.8077 -0.4973 7737.812
## s.e. 0.0793 0.0827 0.0691 0.0832 0.0828 2814.255
##
## sigma^2 estimated as 3.393e+11: log likelihood=-2368.33
## AIC=4750.66 AICc=4751.39 BIC=4772.23
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -8651.303 551363.7 415191.4 -0.7681296 6.514558 0.6712225
## ACF1
## Training set 0.009119741
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jun 2012 6825450 6078947 7571953 5683772 7967128
## Jul 2012 8009982 7217302 8802663 6797682 9222283
## Aug 2012 8923027 8129607 9716446 7709596 10136457
## Sep 2012 8132151 7320615 8943688 6891013 9373290
## Oct 2012 5883549 5072013 6695086 4642411 7124688
## Nov 2012 5543163 4731626 6354700 4302025 6784301
## Dec 2012 7240544 6429007 8052081 5999406 8481682
## Jan 2013 9222103 8410567 10033640 7980965 10463242
## Feb 2013 8677864 7866327 9489401 7436726 9919003
## Mar 2013 7063787 6252250 7875324 5822649 8304925
## Apr 2013 5922271 5110734 6733808 4681133 7163409
## May 2013 5481729 4670192 6293266 4240590 6722867
## Jun 2013 7351526 6527389 8175662 6091118 8611934
## Jul 2013 8137796 7312066 8963526 6874951 9400641
## Aug 2013 9069467 8243711 9895223 7806582 10332352
## Sep 2013 8442535 7616128 9268942 7178655 9706415
## Oct 2013 6006991 5180585 6833398 4743111 7270872
## Nov 2013 5577603 4751196 6404010 4313723 6841484
## Dec 2013 7234009 6407602 8060416 5970129 8497889
checkresiduals(fit)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,3)(2,1,0)[12] with drift
## Q* = 13.305, df = 18, p-value = 0.7731
##
## Model df: 6. Total lags used: 24
This time, auto.arima() generates the best model; it has the lowest test-set RMSE and is the only model whose residuals pass the Ljung-Box test (p = 0.7731).
res <- data.frame(rbind(snaive_test_rmse,hws_test_rmse, arima_test_rmse))
colnames(res) <- "RMSE"
rownames(res) <- c('Seasonal Naive','Holt-Winters Seasonal', 'ARIMA')
datatable(res) %>% formatRound('RMSE',3)
# Refit the ARIMA(0,0,3)(2,1,0)[12] with drift specification selected above on the full series
myts_fit <- Arima(myts, order = c(0,0,3), seasonal = list(order=c(2,1,0),period=12), include.drift = TRUE)
myts_pred <- forecast(myts_fit, h=12,level=95)
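A final plot of the 2014 forecast against the historical series serves as a sanity check before export; a minimal sketch:
# Overlay the 12-month forecast and its 95% interval on the full history.
autoplot(myts_pred) + labs(y = "KWH")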
values = seq(from = as.Date("2014-01-01"), to = as.Date("2014-12-31"), by = 'month')
part_b_pred = tibble("Date"=values,
"power-usage"=as.numeric(myts_pred$mean))
write_csv(part_b_pred,'Part B predictions.csv')