DATA 624 Project 1

ATM Withdrawal Predictions

Data was collected from 4 ATM machines from May 1, 2009 through April 30, 2010. This data will be used to build models to predict the amount of money withdrawn from each ATM machine for May 2010.

A sample of the data can be seen below. It contains the amount of cash in hundreds of dollars taken out from each ATM on each day.

##               DateTime  ATM Cash
## 1 5/1/2009 12:00:00 AM ATM1   96
## 2 5/1/2009 12:00:00 AM ATM2  107
## 3 5/2/2009 12:00:00 AM ATM1   82
## 4 5/2/2009 12:00:00 AM ATM2   89
## 5 5/3/2009 12:00:00 AM ATM1   85
## 6 5/3/2009 12:00:00 AM ATM2   90

The time stamp will be removed so that the date can be used for forecasting.

##         Date  ATM Cash
## 1 2009-05-01 ATM1   96
## 2 2009-05-01 ATM2  107
## 3 2009-05-02 ATM1   82
## 4 2009-05-02 ATM2   89
## 5 2009-05-03 ATM1   85
## 6 2009-05-03 ATM2   90
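
The date conversion can be sketched in base R as follows. This is a minimal illustration on three hand-typed strings in the same format as the sample above, not necessarily the code used in the report. It relies on `as.Date` parsing only the part of the string matched by the format and ignoring the trailing time text.

```r
# Example strings in the same format as the DateTime column
stamps <- c("5/1/2009 12:00:00 AM", "5/2/2009 12:00:00 AM", "5/3/2009 12:00:00 AM")

# Only the month/day/year part is parsed; the trailing time text is ignored
dates <- as.Date(stamps, format = "%m/%d/%Y")
print(dates)
```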

A separate data frame for each ATM machine will be created and the data will be converted into a time series object to make predictions. The frequency is chosen to be 365.25 since data was collected daily.

ATMs 1 and 2 show a large dip in the removal of cash from ATM machines once a week. This suggests a seasonality with a frequency of 1 week.

ATM3 shows no usage until the end of data collection in 2010. I would expect that is when this ATM machine went online or there is an issue with the data collection.

ATM4 shows a single large spike toward the end of 2009, indicating that more than $900,000 (a Cash value above 9,000, in hundreds of dollars) was taken from ATM4 in 1 day. That is unlikely and that outlier will be removed.

The value of the removed outlier for ATM 4 is about 10,920 in the data's units of hundreds of dollars, or roughly $1,092,000.
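
A threshold filter of this kind can be sketched in base R. The values below are hypothetical, and the cutoff of 9000 is taken from the report's code. Note that dropping a value shortens the series by one day; replacing it with `NA` and imputing it is an alternative, shown here only as an assumption about what might be preferable when dates must stay aligned.

```r
# Hypothetical withdrawals in hundreds of dollars, with one implausible value
cash <- c(403.5, 123.8, 9500, 704.2)
threshold <- 9000

# Option 1: drop the outlier (the series becomes one observation shorter)
kept <- cash[cash < threshold]

# Option 2: keep the time index intact and mark the outlier as missing
masked <- replace(cash, cash >= threshold, NA)

print(kept)
print(masked)
```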

The lag plots for ATM 1 and ATM 2 show a strong positive relationship for a lag of 7. A lag of 7 refers to 1 week. This confirms my prediction that ATMs 1 and 2 have a seasonality with a frequency of 1 week.

ATM 3 has only 3 values above 0. Due to the lack of data collected for ATM 3, it cannot be used in a meaningful way to make future predictions.

ATM 4 does not show a strong positive relationship for any lags. This suggests that it does not exhibit a seasonality.

ATM 1: The ACF shows a spike in autocorrelation every 7 lags, indicating a seasonality of 7 days. There are 4-5 negative autocorrelations between the positive spikes, likely indicating that midweek usage at ATM 1 is lower than weekend usage. The autocorrelations decrease as the lag increases, indicating that there is a decreasing trend. There is an increase in the autocorrelations around lag 230, which is likely due to the holiday season.

ATM 2: The ACF shows a spike in autocorrelation every 7 lags, indicating a seasonality of 7 days. The autocorrelations alternate between being positive and negative. They decrease initially, indicating a negative trend. However, they increase and then decrease at later lags. That is likely a yearly seasonality occurring in December, at the holiday season.

ATM 4: The ACF shows a spike in autocorrelation every 7 lags, indicating a seasonality of 7 days. The autocorrelations alternate between being positive and negative. They decrease as the lag increases, indicating a negative trend.
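
To illustrate how a weekly pattern shows up in the ACF, here is a minimal base R sketch on a simulated series. The weekly pattern, the noise level, and the seed are all assumptions chosen for illustration, not values from the ATM data.

```r
set.seed(1)
# Hypothetical weekly pattern (hundreds of dollars) with one low day per week
pattern <- c(90, 100, 85, 95, 110, 105, 20)
x <- rep(pattern, times = 52) + rnorm(7 * 52, sd = 5)

# acf() stores lag 0 in position 1, so lag k sits in position k + 1
r <- as.vector(acf(x, lag.max = 14, plot = FALSE)$acf)
lag3 <- r[4]
lag7 <- r[8]
print(round(c(lag3 = lag3, lag7 = lag7), 2))
```

In this simulated series the autocorrelation at lag 7 is close to 1 while the autocorrelation at lag 3 is much lower, which is the same signature described for the ATMs above.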

The variance in each of the ATM machines looks constant. To show this, the ideal lambda for a Box-Cox transformation is calculated for each ATM machine. Lambda for each is 1, indicating that a transformation is not necessary.

## [1] 1

## [1] 1

## [1] 1

## [1] 1
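
Why a lambda of 1 means no transformation is needed can be seen from the Box-Cox formula itself. The sketch below is a hand-written version of the transformation (the function name `box_cox` is hypothetical), applied to the six sample values shown near the top of the report.

```r
# Box-Cox transformation: log(y) when lambda is 0, (y^lambda - 1) / lambda otherwise
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

y <- c(96, 107, 82, 89, 85, 90)  # sample Cash values from the data preview
transformed <- box_cox(y, lambda = 1)

# With lambda = 1 every value is simply shifted down by 1
print(transformed - y)
```

Since the result is only a constant shift, the shape and variance of the series are unchanged.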

The data for each ATM machine is summarized below:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    1.00   73.00   91.00   83.89  108.00  180.00       3

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00   25.50   67.00   62.58   93.00  147.00       2

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.0000  0.7206  0.0000 96.0000

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     2.0   123.8   403.5   445.3   704.2  1712.0

ATM 1 has 3 missing values and ATM 2 has 2 missing values. The average of the previous day’s value and the next day’s value will be imputed for the missing value.
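
The imputation rule can be sketched as a small base R function. The function name and the example values are hypothetical, and the sketch assumes each missing value is a single interior point with observed neighbors on both sides.

```r
impute_neighbor_mean <- function(x) {
  for (i in which(is.na(x))) {
    # Assumes the gap is a single interior point with observed neighbors
    x[i] <- (x[i - 1] + x[i + 1]) / 2
  }
  x
}

cash <- c(96, NA, 85, 90, NA, 100)
filled <- impute_neighbor_mean(cash)
print(filled)
```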

Each ATM machine’s data is broken up into a training set, consisting of May 2009 through the end of March 2010. The test set consists of data for April of 2010. Models will be built based on the training set and accuracy will be measured by comparing the predictions to the values in the test set.
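
A split that holds out the final 30 days can be sketched with `window()` in base R. The series below is a placeholder of 365 consecutive integers with a weekly frequency, so the lengths and boundary values are easy to check by eye.

```r
# Placeholder daily series: 365 observations with a weekly frequency
y <- ts(seq_len(365), frequency = 7)

n_test <- 30
n <- length(y)

# Use the series' own time index to locate the split point
train <- window(y, end = time(y)[n - n_test])
test <- window(y, start = time(y)[n - n_test + 1])

print(c(train = length(train), test = length(test)))
```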

To compare the scales and patterns over a single month for the different ATM machines, the cash withdrawn from the 4 ATM machines in April 2010 is plotted on a single graph. The range of values for ATMs 1 and 2 is fairly similar, as are the times at which the amount of money taken out peaks or drops. ATMs 1 and 2 both show 1 dip in the amount of money taken out per week. The dips go to 3000 and 4000 dollars removed from the ATM machine. Perhaps this dip is due to the location of that ATM machine being less accessible one day a week due to the hours of businesses nearby. The mean amount of money taken out of ATM 4 is significantly higher than that of ATMs 1 or 2. ATM 4 does not follow a similar pattern to ATMs 1 and 2. ATM 3 follows almost the identical pattern to ATM 2 once money begins to be withdrawn from it.

ATM Forecasts

Models will not be built for ATM 3 since there is not sufficient data to do so.

Seasonal Naive Model

The seasonal naive method will be used to make predictions. This entails making a prediction equal to the amount of cash withdrawn from that ATM machine in the previous season.
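
As a minimal sketch of the idea, the function below repeats the last full season of observations across the forecast horizon. The function name, the season length of 7, and the data are assumptions for illustration.

```r
# Seasonal naive forecast: repeat the last full season of observations
seasonal_naive <- function(y, m, h) {
  last_season <- tail(as.numeric(y), m)
  rep(last_season, length.out = h)
}

# Two hypothetical weeks of withdrawals, each ending in a low day
y <- c(96, 82, 85, 90, 99, 88, 8, 104, 89, 90, 55, 94, 87, 12)
fc <- seasonal_naive(y, m = 7, h = 10)
print(fc)
```

Each forecast equals the value observed on the same day of the most recent week in the data.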

##                     ME     RMSE      MAE       MPE     MAPE MASE
## Training set       NaN      NaN      NaN       NaN      NaN  NaN
## Test set     -1.709677 11.98521 9.129032 -31.76956 39.01083  NaN
##                    ACF1  Theil's U
## Training set         NA         NA
## Test set     0.08652696 0.01387299

##                     ME     RMSE      MAE       MPE     MAPE MASE
## Training set       NaN      NaN      NaN       NaN      NaN  NaN
## Test set     -12.70968 24.70601 19.22581 -108.5663 116.6305  NaN
##                    ACF1 Theil's U
## Training set         NA        NA
## Test set     -0.3607095 0.2964983

##                 ME     RMSE MAE       MPE    MAPE MASE       ACF1
## Training set   NaN      NaN NaN       NaN     NaN  NaN         NA
## Test set     -21.4 410.7612 361 -575.4356 636.021  NaN -0.1749636
##              Theil's U
## Training set        NA
## Test set     0.8181594

The graphs comparing the seasonal naive predictions to the test set show a reasonably close approximation for ATMs 1 and 2, but less so for ATM 4.

Holt Winters Model

ATMs 1 and 2 display a seasonality, so a Holt Winters model will be built to make a prediction. The lag plot for ATM 4 does not display a seasonality, so a Holt model will be used for ATM 4. A Holt Winters model will also be used for ATM 4 to assess whether a seasonal model can be successful.
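
For readers without the forecast package, an additive seasonal smoothing model can be sketched with `HoltWinters()` from base R. The simulated data, the seed, and the fixed smoothing parameters (`alpha = 0.2`, `gamma = 0.3`, no trend) are illustrative assumptions; the report's models estimate their parameters from the ATM data.

```r
set.seed(42)
# Hypothetical weekly pattern with one low day at the end of each week
pattern <- c(90, 100, 85, 95, 110, 105, 20)
y <- ts(rep(pattern, times = 20) + rnorm(140, sd = 5), frequency = 7)

# beta = FALSE drops the trend; alpha and gamma are fixed, illustrative values
fit <- HoltWinters(y, alpha = 0.2, beta = FALSE, gamma = 0.3, seasonal = "additive")

# Forecast 30 days ahead and look at the first forecast week
fc <- as.numeric(predict(fit, n.ahead = 30))
print(round(fc[1:7]))
```

The forecasts reproduce the weekly dip, which is what a seasonal model is expected to capture for ATMs 1 and 2.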

##                       ME     RMSE       MAE        MPE     MAPE      MASE
## Training set -0.09422321 24.66875 15.714461 -114.35660 129.9076 0.8497104
## Test set     -5.96755378 11.73830  9.262757  -61.89711  65.5155 0.5008546
##                    ACF1  Theil's U
## Training set  0.1457123         NA
## Test set     -0.2500024 0.05957053

##                        ME     RMSE      MAE      MPE     MAPE      MASE
## Training set  0.009502276 25.69295 18.11367     -Inf      Inf 0.8500905
## Test set     26.973129956 36.37543 27.55717 32.46136 34.34875 1.2932827
##                     ACF1  Theil's U
## Training set  0.02475680         NA
## Test set     -0.08532886 0.09604031

##                       ME     RMSE      MAE        MPE      MAPE      MASE
## Training set   -3.413756 331.2179 263.8321  -413.6818  445.2174 0.7462139
## Test set     -339.454056 374.7645 339.4541 -2314.4672 2314.4672 0.9601005
##                    ACF1 Theil's U
## Training set  0.1015367        NA
## Test set     -0.0835703   3.09943

##                      ME     RMSE      MAE        MPE      MAPE      MASE
## Training set  -4.627694 360.0918 299.9680  -522.2904  555.7683 0.8484195
## Test set     -32.161832 286.8422 244.6787 -1034.2002 1062.8799 0.6920411
##                      ACF1 Theil's U
## Training set  0.079839950        NA
## Test set     -0.008059049 0.1614532

The Holt Winters model does the best job in making predictions for ATM 1. The Holt Winters model does a poor job making predictions for ATM 4. The Holt model takes the trend for ATM 4 into account.

Exponential Smoothing Model

The ets function is used to choose the model with the lowest AICc.

##                       ME     RMSE       MAE        MPE      MAPE      MASE
## Training set  0.08906076 24.62907 15.664438 -112.77529 128.66387 0.8470055
## Test set     -5.75237990 11.61487  9.111839  -61.03912  64.73164 0.4926942
##                    ACF1 Theil's U
## Training set  0.1476928        NA
## Test set     -0.2540539 0.0569798

##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -1.115496 25.64028 18.13900      -Inf      Inf 0.8512796
## Test set      9.525816 19.37845 14.32637 -25.40946 57.41915 0.6723492
##                     ACF1  Theil's U
## Training set  0.02437516         NA
## Test set     -0.09535631 0.05693494

##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -19.32035 348.6001 273.6036 -422.6548 455.0142 0.7738511
## Test set     -52.63719 270.2735 234.2636 -754.4345 778.3578 0.6625834
##                    ACF1 Theil's U
## Training set 0.08196991        NA
## Test set     0.04152400 0.3391968

ATM 1: The ETS model is an ETS(A,N,A) model, which is equivalent to an ARIMA(0,1,m)(0,1,0)$_m$. This model is not equivalent to the ARIMA model predicted by the auto.arima function shown below.

ATM 2: The ETS model is an ETS(A,N,A) model, which is equivalent to an ARIMA(0,1,m)(0,1,0)$_m$. This model is not equivalent to the ARIMA model predicted by the auto.arima function shown below.

ATM 4: The ETS model is an ETS(M,N,M) model.

The ETS model is able to make meaningful predictions for ATMs 1 and 2, but not for ATM 4.

ARIMA Model

## Series: atm1s7.train 
## ARIMA(0,0,1)(0,1,2)[7] 
## 
## Coefficients:
##          ma1     sma1     sma2
##       0.2003  -0.5834  -0.1090
## s.e.  0.0578   0.0532   0.0531
## 
## sigma^2 estimated as 597.7:  log likelihood=-1514.45
## AIC=3036.9   AICc=3037.02   BIC=3052.07

##                      ME     RMSE       MAE        MPE      MAPE      MASE
## Training set  0.2362395 24.08116 15.143933 -106.94275 122.78752 0.8188609
## Test set     -6.8961857 12.38991  9.706059  -83.28674  86.36082 0.5248248
##                    ACF1  Theil's U
## Training set -0.0108296         NA
## Test set     -0.2870166 0.05258493

## Series: atm2s7.train 
## ARIMA(5,0,3)(0,1,1)[7] with drift 
## 
## Coefficients:
##          ar1      ar2     ar3      ar4      ar5      ma1     ma2      ma3
##       0.3398  -0.5424  0.5637  -0.0027  -0.1529  -0.3120  0.4036  -0.5591
## s.e.  0.2124   0.1397  0.2311   0.0757   0.0730   0.2154  0.1321   0.1933
##          sma1    drift
##       -0.8145  -0.0759
## s.e.   0.0491   0.0268
## 
## sigma^2 estimated as 613.5:  log likelihood=-1516.25
## AIC=3054.51   AICc=3055.34   BIC=3096.23

##                      ME     RMSE      MAE       MPE     MAPE      MASE
## Training set  0.5154065 24.13162 17.21514      -Inf      Inf 0.8079217
## Test set     11.2043998 23.10800 18.72250 -31.69574 66.02793 0.8786636
##                     ACF1 Theil's U
## Training set 0.001950252        NA
## Test set     0.139883524 0.1728716

## Series: atm4s7.train 
## ARIMA(0,0,0)(1,0,0)[7] with non-zero mean 
## 
## Coefficients:
##         sar1      mean
##       0.1622  449.6806
## s.e.  0.0541   22.7907
## 
## sigma^2 estimated as 123867:  log likelihood=-2438.7
## AIC=4883.4   AICc=4883.47   BIC=4894.84

##                        ME     RMSE      MAE        MPE     MAPE      MASE
## Training set   0.02681923 350.8951 293.9945  -488.9515  522.364 0.8315243
## Test set     -55.16243623 285.2272 245.2728 -1013.1805 1038.463 0.6937212
##                     ACF1 Theil's U
## Training set 0.076536206        NA
## Test set     0.008810268 0.1578391

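ATM 1: The ARIMA model is ARIMA(0,0,1)(0,1,2)[7]. The non-seasonal part (0,0,1) is an MA(1) with no differencing. The seasonal part (0,1,2) is a seasonal MA(2) built upon one seasonal difference with a period of 7 days.

ATM 2: The ARIMA model is ARIMA(5,0,3)(0,1,1)[7] with drift. The non-seasonal part (5,0,3) combines an AR(5) and an MA(3) with no differencing. The seasonal part (0,1,1) is a seasonal MA(1) built upon one seasonal difference.

ATM 4: The non-seasonal ARIMA model is a (0,0,0) model, which indicates white noise. The seasonal model is a (1,0,0) model, which is a seasonal AR(1) model.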
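
The seasonal difference that appears as the middle 1 in the seasonal part of the ATM 1 and ATM 2 models printed above can be illustrated in base R. The series below is a hypothetical weekly pattern plus a drift of 1 unit per day, chosen so the result is easy to verify by hand.

```r
# Hypothetical series: repeating weekly pattern plus a drift of 1 per day
pattern <- c(90, 100, 85, 95, 110, 105, 20)
n <- 7 * 10
y <- rep(pattern, times = 10) + seq_len(n)

# One seasonal difference with period 7: subtract the value from the same weekday a week earlier
weekly_diff <- diff(y, lag = 7)
print(unique(weekly_diff))
```

The weekly pattern cancels out completely and only the drift of 7 units per week remains, which is why the remaining MA or AR terms are fitted to the differenced series.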

Model Based on the Mean of the Data

##                 ME     RMSE      MAE       MPE     MAPE       ACF1
## Test set -7.488128 31.74735 19.45635 -311.8962 323.7505 -0.1469268
##          Theil's U
## Test set 0.1807297

##                 ME     RMSE      MAE       MPE     MAPE      ACF1
## Test set -1.191781 37.11595 32.08164 -662.0508 695.0437 0.1359286
##          Theil's U
## Test set 0.3853172

##                 ME     RMSE      MAE       MPE     MAPE        ACF1
## Test set -50.48835 289.9134 248.1579 -1082.322 1108.709 -0.00533011
##          Theil's U
## Test set 0.1523946

ATM 1
	Seasonal_Naive	HW	ETS	ARIMA	mean
RMSE	11.98521	11.7383	11.61487	12.38991	31.74735

ATM 2
	Seasonal_Naive	HW	ETS	ARIMA	mean
RMSE	24.70601	36.37543	19.37845	23.108	37.11595

ATM 4
	Seasonal_Naive	HW	Holt	ETS	ARIMA	mean
RMSE	410.7612	374.7645	286.8422	270.2735	285.2272	289.9134

The root mean square error is lowest for the exponential smoothing model for ATMs 1, 2 and 4. Therefore, that model will be used to build the prediction for May 2010. For ATM 3, it is challenging to build a prediction model without more information. If ATM 3 is in a similar location to ATM 2 and just went online, the prediction for ATM 2 could be used to predict the behavior of ATM 3. However, without that information, it is not responsible to make a prediction for ATM 3 based solely on 3 data points. While the ETS model will be used to build the model for ATM 4, it could be argued that simply using the mean as the prediction would also be compelling since the data shows few patterns.
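
The selection rule can be written out in a few lines of base R. The RMSE values are copied from the three tables above; the helper `rmse` and the variable names are hypothetical.

```r
# Root mean square error between actual and predicted values
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Test-set RMSE values copied from the tables above
atm1_rmse <- c(Seasonal_Naive = 11.98521, HW = 11.7383, ETS = 11.61487,
               ARIMA = 12.38991, mean = 31.74735)
atm2_rmse <- c(Seasonal_Naive = 24.70601, HW = 36.37543, ETS = 19.37845,
               ARIMA = 23.108, mean = 37.11595)
atm4_rmse <- c(Seasonal_Naive = 410.7612, HW = 374.7645, Holt = 286.8422,
               ETS = 270.2735, ARIMA = 285.2272, mean = 289.9134)

# Pick the model with the smallest RMSE for each ATM
best <- sapply(list(ATM1 = atm1_rmse, ATM2 = atm2_rmse, ATM4 = atm4_rmse),
               function(v) names(which.min(v)))
print(best)
```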

Residential Power Usage Predictions

Data of residential power usage was collected from January 1998 until December 2013. A monthly forecast for 2014 will be built.

##       Date  Energy
## 1 1998-Jan 6862583
## 2 1998-Feb 5838198
## 3 1998-Mar 5420658
## 4 1998-Apr 5010364
## 5 1998-May 4665377
## 6 1998-Jun 6467147

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
##   770523  5429912  6283324  6502475  7620524 10655730        1

The plot of residential energy usage displays a seasonality. It looks like there are 2 peaks and 2 troughs per year, presumably because people use more energy heating their homes in winter and cooling them in summer. There is one large dip in the middle of 2010 and 1 missing value. The missing value will be imputed by calculating the average of the data point before and after the missing value.

The additive decomposition displays a seasonality of 6 months. There is a step in the trend between 2010-2011. There is a single outlier present at the dip that is visible in the data in 2010.
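
An additive decomposition of a monthly series with two peaks per year can be sketched with `decompose()` from base R. The series below is synthetic (a linear trend plus a 6-month cosine cycle); its level and amplitude are assumptions loosely scaled to the energy data, not the data itself.

```r
# Hypothetical monthly series: slow upward trend plus a 6-month cycle
months <- seq_len(96)
y <- ts(6e6 + 5000 * months + 1e6 * cos(2 * pi * months / 6),
        start = c(1998, 1), frequency = 12)

dec <- decompose(y, type = "additive")

# dec$figure holds the 12 estimated monthly effects, centered to sum to zero
print(round(dec$figure))
```

The first six monthly effects match the last six, which is how a 6-month seasonality appears inside a 12-month seasonal component.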

Prior to making predictions, the data will be broken up into a training set, January 1998 through January 2013, and a test set, February through December 2013.
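
The sizes of the two sets follow from the `window()` boundaries used in the code appendix. The sketch below applies the same boundaries to a placeholder series of 192 months covering January 1998 through December 2013.

```r
# Placeholder monthly series covering January 1998 through December 2013
y <- ts(seq_len(192), start = c(1998, 1), frequency = 12)

train <- window(y, end = c(2013, 1))    # through January 2013
test <- window(y, start = c(2013, 2))   # February to December 2013

print(c(train = length(train), test = length(test)))
```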

Seasonal Naive Model

##                     ME      RMSE      MAE       MPE     MAPE      MASE
## Training set  81457.02 1191016.1 707980.8 -4.420714 15.15885 1.0000000
## Test set     290716.36  958105.2 523527.6  3.541180  6.28014 0.7394659
##                    ACF1 Theil's U
## Training set 0.24252436        NA
## Test set     0.01451197 0.6490296

Holt Winters’ Model

##                    ME     RMSE      MAE        MPE     MAPE      MASE
## Training set 62175.63 825861.5 519575.6 -4.3612537 12.21217 0.7338838
## Test set     78057.81 880078.5 515655.7 -0.0881887  6.51678 0.7283470
##                    ACF1 Theil's U
## Training set 0.22172522        NA
## Test set     0.09648418 0.5838521

##                     ME     RMSE      MAE        MPE      MAPE      MASE
## Training set  44929.46 819420.5 510919.1 -4.6331811 12.186267 0.7216568
## Test set     100573.70 880422.5 566700.3  0.6394197  7.183157 0.8004458
##                   ACF1 Theil's U
## Training set 0.1978976        NA
## Test set     0.1864843 0.5820991

Exponential Smoothing Model

The ets function chooses the model with the lowest AICc.

## ETS(M,N,M) 
## 
## Call:
##  ets(y = powerdata.train) 
## 
##   Smoothing parameters:
##     alpha = 0.1476 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 6132871.0504 
##     s = 0.9343 0.7484 0.8763 1.1793 1.2594 1.2059
##            0.9962 0.7666 0.8099 0.9152 1.0807 1.2278
## 
##   sigma:  0.1185
## 
##      AIC     AICc      BIC 
## 5851.644 5854.553 5899.622 
## 
## Training set error measures:
##                    ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 42349.64 831710.8 506848.2 -4.963082 12.55188 0.7159067
##                   ACF1
## Training set 0.1553239

##                    ME     RMSE      MAE        MPE      MAPE      MASE
## Training set 42349.64 831710.8 506848.2 -4.9630818 12.551882 0.7159067
## Test set     55732.26 916743.3 613109.5  0.0452251  7.736063 0.8659973
##                   ACF1 Theil's U
## Training set 0.1553239        NA
## Test set     0.1539203 0.6006417

ARIMA Model

## Series: powerdata.train 
## ARIMA(1,0,1)(2,1,0)[12] with drift 
## 
## Coefficients:
##          ar1      ma1     sar1     sar2     drift
##       0.6810  -0.5078  -0.8505  -0.6501  7548.754
## s.e.  0.2324   0.2744   0.0671   0.0746  3427.584
## 
## sigma^2 estimated as 6.874e+11:  log likelihood=-2548.88
## AIC=5109.77   AICc=5110.29   BIC=5128.55

##                     ME      RMSE     MAE       MPE     MAPE      MASE
## Training set  -7791.51  789202.7  490737 -5.044089 11.58144 0.6931502
## Test set     557383.94 1618718.3 1011161  6.171292 12.60330 1.4282319
##                     ACF1 Theil's U
## Training set 0.005883929        NA
## Test set     0.084162853 0.8984846

The black line in the graph above represents the test set. Visually, all of the models except for the ARIMA model have fairly similar predictions. The Holt-Winters’ models appear to be the most accurate prediction of the test set.

Energy Usage
	Seasonal_Naive	HW_Additive	HW_Multiplicative	ETS	ARIMA
RMSE	958105.2	880078.5	880422.5	916743.3	1618718

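The two Holt Winters’ models yield the lowest root mean square errors for the test set. The additive model (880,078.5) is marginally lower than the multiplicative model (880,422.5), a difference of about 0.04%. I will use the multiplicative model to make predictions. I will build the model off of all of the data, not just the training set.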

R Code

suppressWarnings(suppressMessages(library(fpp2)))
suppressWarnings(suppressMessages(library(dplyr)))
suppressWarnings(suppressMessages(library(tidyr)))
suppressWarnings(suppressMessages(library(seasonal)))
suppressWarnings(suppressMessages(library(knitr)))
suppressWarnings(suppressMessages(library(gridExtra)))

# Load the ATM data and name the columns
atmdata <- read.csv("C:/Users/Swigo/Desktop/Sarah/DATA624_Predictive_Analytics/ATMData.csv", stringsAsFactors = FALSE)
colnames(atmdata) <- c("DateTime", "ATM", "Cash")
head(atmdata)

# Drop the time stamp and convert the date string to a Date
atmdata <- atmdata %>%
  separate(DateTime, c("Date", "time", "AM"), sep = " ") %>%
  select(Date, ATM, Cash)
atmdata$Date <- as.Date(atmdata$Date, "%m/%d/%Y")
head(atmdata)

# One daily time series per ATM
atm1 <- atmdata %>% filter(ATM == "ATM1") %>% select(Date, Cash)
atm1 <- ts(atm1[,-1], start = c(2009.417), frequency = 365.25)

atm2 <- atmdata %>% filter(ATM == "ATM2") %>% select(Date, Cash)
atm2 <- ts(atm2[,-1], start = c(2009.417), frequency = 365.25)

atm3 <- atmdata %>% filter(ATM == "ATM3") %>% select(Date, Cash)
atm3 <- ts(atm3[,-1], start = c(2009.417), frequency = 365.25)

atm4 <- atmdata %>% filter(ATM == "ATM4") %>% select(Date, Cash)
atm4 <- ts(atm4[,-1], start = c(2009.417), frequency = 365.25)

# Axis breaks and labels for the monthly tick marks
break_nums <- c(2009.417, 2009.498, 2009.582, 2009.665, 2009.748, 2009.832,
                2009.915, 2009.998, 2010.082, 2010.165, 2010.248, 2010.332)
label_vals <- c("May", "June", "July", "Aug", "Sept", "Oct",
                "Nov", "Dec", "Jan", "Feb", "Mar", "Apr")

# Helper that draws one ATM series with the shared axis settings
plot_atm <- function(series, title) {
  autoplot(series) + ggtitle(title) +
    ylab("Cash withdrawn (hundreds of dollars)") +
    scale_x_continuous(breaks = break_nums, labels = label_vals) +
    xlab("2009-2010") +
    theme(legend.position = "none", axis.text.x = element_text(angle = 45, vjust = 0.5))
}

a1 <- plot_atm(atm1, "ATM 1")
a2 <- plot_atm(atm2, "ATM 2")
a3 <- plot_atm(atm3, "ATM 3")
a4 <- plot_atm(atm4, "ATM 4")
grid.arrange(a1, a2, nrow = 1, ncol = 2)
grid.arrange(a3, a4, nrow = 1, ncol = 2)

# Remove the ATM 4 outlier and rebuild the series
atm4 <- atmdata %>% filter(ATM == "ATM4") %>% filter(Cash < 9000) %>% select(Date, Cash)
atm4 <- ts(atm4[,-1], start = c(2009.417), frequency = 365.25)
autoplot(atm4) + ggtitle("ATM 4 - Outlier Removed") +
  ylab("Cash withdrawn (hundreds of dollars)") +
  scale_x_continuous(breaks = break_nums, labels = label_vals) +
  xlab("2009-2010")

# Lag plots (ATM 3 is skipped; only its count of non-zero values is checked)
gglagplot(atm1) + ggtitle("Lag Plot for ATM 1") + theme(axis.text.x = element_text(angle = 90))
gglagplot(atm2) + ggtitle("Lag Plot for ATM 2") + theme(axis.text.x = element_text(angle = 90))
length(atm3[atm3 > 0])
gglagplot(atm4) + ggtitle("Lag Plot for ATM 4") + theme(axis.text.x = element_text(angle = 90))

# Autocorrelation plots
ggAcf(atm1)
ggAcf(atm2)
ggAcf(atm4)

# Ideal Box-Cox lambda for each ATM
BoxCox.lambda(atm1)
BoxCox.lambda(atm2)
BoxCox.lambda(atm3)
BoxCox.lambda(atm4)

summary(atm1)
summary(atm2)
summary(atm3)
summary(atm4)

# Impute each missing value with the mean of the previous and next day
for (i in 1:length(atm1)) {
  if (is.na(atm1[i])) { atm1[i] <- (atm1[i-1] + atm1[i+1]) / 2 }
  if (is.na(atm2[i])) { atm2[i] <- (atm2[i-1] + atm2[i+1]) / 2 }
}

# Training set: through March 2010; test set: April 2010
atm1.train <- window(atm1, end = c(2010.32999999))
atm1.test <- window(atm1, start = 2010.33)
atm2.train <- window(atm2, end = c(2010.32999999))
atm2.test <- window(atm2, start = 2010.33)
atm3.train <- window(atm3, end = c(2010.32999999))
atm3.test <- window(atm3, start = 2010.33)
atm4.train <- window(atm4, end = c(2010.32999999))
atm4.test <- window(atm4, start = 2010.33)

autoplot(atm1.test, series = "ATM 1") +
  autolayer(atm2.test, series = "ATM 2") +
  autolayer(atm3.test, series = "ATM 3") +
  autolayer(atm4.test, series = "ATM 4") +
  xlab("April 2010") + ylab("Cash withdrawn (hundreds of dollars)") +
  ggtitle("Comparison of 4 ATM machines for April 2010")

# Seasonal naive forecasts
snaive_atm1 <- snaive(atm1.train, h = 31)
snaive_atm2 <- snaive(atm2.train, h = 31)
snaive_atm4 <- snaive(atm4.train, h = 31)

accuracy(snaive_atm1, atm1.test)
accuracy(snaive_atm2, atm2.test)
accuracy(snaive_atm4, atm4.test)

autoplot(atm1.test, series = "Test") +
  autolayer(snaive_atm1, series = "Seasonal Naive") +
  ggtitle("ATM 1") + ylab("Cash withdrawn (hundreds of dollars)") + xlab("April")
autoplot(atm2.test, series = "Test") +
  autolayer(snaive_atm2, series = "Seasonal Naive") +
  ggtitle("ATM 2") + ylab("Cash withdrawn (hundreds of dollars)") + xlab("April")
autoplot(atm4.test, series = "Test") +
  autolayer(snaive_atm4, series = "Seasonal Naive") +
  ggtitle("ATM 4") + ylab("Cash withdrawn (hundreds of dollars)") + xlab("April")

# Rebuild the series with a weekly frequency of 7
atm1s7 <- atmdata %>% filter(ATM == "ATM1") %>% select(Date, Cash)
atm2s7 <- atmdata %>% filter(ATM == "ATM2") %>% select(Date, Cash)
atm4s7 <- atmdata %>% filter(ATM == "ATM4") %>% filter(Cash < 9000) %>% select(Date, Cash)

atm1s7 <- ts(atm1s7[,-1], start = c(2009), frequency = 7)
atm2s7 <- ts(atm2s7[,-1], start = c(2009), frequency = 7)
atm4s7 <- ts(atm4s7[,-1], start = c(2009), frequency = 7)

# Impute each missing value with the mean of the previous and next day
for (i in 1:length(atm1s7)) {
  if (is.na(atm1s7[i])) { atm1s7[i] <- (atm1s7[i-1] + atm1s7[i+1]) / 2 }
  if (is.na(atm2s7[i])) { atm2s7[i] <- (atm2s7[i-1] + atm2s7[i+1]) / 2 }
}

# Training and test sets on the weekly time index
atm1s7.train <- window(atm1s7, end = c(2056.799))
atm1s7.test <- window(atm1s7, start = 2056.8)

atm2s7.train <- window(atm2s7, end = c(2056.799))
atm2s7.test <- window(atm2s7, start = 2056.8)

atm4s7.train <- window(atm4s7, end = c(2056.799))
atm4s7.test <- window(atm4s7, start = 2056.8)

# Holt Winters models for all three ATMs, plus a Holt model for ATM 4
holt_atm1 <- hw(atm1s7.train, h = 30)
holt_atm2 <- hw(atm2s7.train, h = 30)
hw_atm4 <- hw(atm4s7.train, h = 30)
holt_atm4 <- holt(atm4s7.train, h = 30)

# Compare each forecast with the test set of its own ATM
accuracy(holt_atm1, atm1s7.test)
accuracy(holt_atm2, atm2s7.test)
accuracy(hw_atm4, atm4s7.test)
accuracy(holt_atm4, atm4s7.test)

autoplot(atm1s7.test, series = "Test") +
  autolayer(holt_atm1, series = "Holt Winters Model", PI = FALSE) +
  ggtitle("ATM 1") + ylab("Cash withdrawn (hundreds of dollars)") + xlab("April") +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())

autoplot(atm2s7.test, series = "Test") +
  autolayer(holt_atm2, series = "Holt Winters Model", PI = FALSE) +
  ggtitle("ATM 2") + ylab("Cash withdrawn (hundreds of dollars)") + xlab("April") +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())

autoplot(atm4s7.test, series = "Test") +
  autolayer(holt_atm4, series = "Holt Model", PI = FALSE) +
  autolayer(hw_atm4, series = "Holt Winters Model", PI = FALSE) +
  ggtitle("ATM 4") + ylab("Cash withdrawn (hundreds of dollars)") + xlab("April") +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())

# ETS models chosen automatically by lowest AICc
ets_atm1 <- ets(atm1s7.train)
ets_pred_atm1 <- ets_atm1 %>% forecast(h = 30)
summary(ets_pred_atm1)
accuracy(forecast(ets_pred_atm1), atm1s7.test)

ets_atm2 <- ets(atm2s7.train)
ets_pred_atm2 <- ets_atm2 %>% forecast(h = 30)
summary(ets_pred_atm2)
accuracy(ets_pred_atm2, atm2s7.test)

ets_atm4 <- ets(atm4s7.train)
ets_pred_atm4 <- ets_atm4 %>% forecast(h = 30)
summary(ets_pred_atm4)
accuracy(ets_pred_atm4, atm4s7.test)

autoplot(atm1s7.test, series = "Test") +
  autolayer(ets_pred_atm1, series = "ETS Model", PI = FALSE) +
  ggtitle("ATM 1") + ylab("Cash withdrawn (hundreds of dollars)") + xlab("April") +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())

autoplot(atm2s7.test, series = "Test") +
  autolayer(ets_pred_atm2, series = "ETS Model", PI = FALSE) +
  ggtitle("ATM 2") + ylab("Cash withdrawn (hundreds of dollars)") + xlab("April") +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())

autoplot(atm4s7.test, series = "Test") +
  autolayer(ets_pred_atm4, series = "ETS Model", PI = FALSE) +
  ggtitle("ATM 4") + ylab("Cash withdrawn (hundreds of dollars)") + xlab("April") +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())

# ARIMA models chosen by auto.arima
fit_arima_atm1 <- auto.arima(atm1s7.train)
fit_arima_atm1
arima_atm1_fc <- fit_arima_atm1 %>% forecast(h = 30)
accuracy(arima_atm1_fc, atm1s7.test)

fit_arima_atm2 <- auto.arima(atm2s7.train)
fit_arima_atm2
arima_atm2_fc <- fit_arima_atm2 %>% forecast(h = 30)
accuracy(arima_atm2_fc, atm2s7.test)

fit_arima_atm4 <- auto.arima(atm4s7.train)
fit_arima_atm4
arima_atm4_fc <- fit_arima_atm4 %>% forecast(h = 30)
accuracy(arima_atm4_fc, atm4s7.test)

# Series labels give the full model printed by auto.arima
autoplot(atm1s7.test, series = "Test") +
  autolayer(arima_atm1_fc, series = "ARIMA(0,0,1)(0,1,2)[7] Model", PI = FALSE) +
  ggtitle("ATM 1") + ylab("Cash withdrawn (hundreds of dollars)") + xlab("April") +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())

autoplot(atm2s7.test, series = "Test") +
  autolayer(arima_atm2_fc, series = "ARIMA(5,0,3)(0,1,1)[7] Model", PI = FALSE) +
  ggtitle("ATM 2") + ylab("Cash withdrawn (hundreds of dollars)") + xlab("April") +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())

autoplot(atm4s7.test, series = "Test") +
  autolayer(arima_atm4_fc, series = "ARIMA(0,0,0)(1,0,0)[7] Model", PI = FALSE) +
  ggtitle("ATM 4") + ylab("Cash withdrawn (hundreds of dollars)") + xlab("April") +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())

# Benchmark: predict the mean of the series for every day of the test set
mean_val1 <- rep(mean(atm1s7), length(atm1s7.test))
mean_val2 <- rep(mean(atm2s7), length(atm2s7.test))
mean_val4 <- rep(mean(atm4s7), length(atm4s7.test))
accuracy(mean_val1, atm1s7.test)
accuracy(mean_val2, atm2s7.test)
accuracy(mean_val4, atm4s7.test)

# Test-set RMSE summary tables
atm1_table <- data.frame(list(Seasonal_Naive = 11.98521, HW = 11.73830, ETS = 11.61487,
                              ARIMA = 12.38991, mean = 31.74735), stringsAsFactors = FALSE)
rownames(atm1_table) <- c("RMSE")
kable(atm1_table, caption = "ATM 1")

atm2_table <- data.frame(list(Seasonal_Naive = 24.70601, HW = 36.37543, ETS = 19.37845,
                              ARIMA = 23.10800, mean = 37.11595), stringsAsFactors = FALSE)
rownames(atm2_table) <- c("RMSE")
kable(atm2_table, caption = "ATM 2")

atm4_table <- data.frame(list(Seasonal_Naive = 410.7612, HW = 374.7645, Holt = 286.8422,
                              ETS = 270.2735, ARIMA = 285.2272, mean = 289.9134), stringsAsFactors = FALSE)
rownames(atm4_table) <- c("RMSE")
kable(atm4_table, caption = "ATM 4")

# Refit ETS on the full series and forecast the 31 days of May 2010
pred_ATM1 <- ets(atm1s7) %>% forecast(h = 31)
pred_ATM1 <- pred_ATM1$mean
atm1list <- rep("ATM1", 31)

# Dates for May 2010 in the same format as the input file
datelist <- paste0("5/", 1:31, "/2010 12:00:00 AM")

pred_ATM1a <- data.frame(list(DATE = datelist, ATM = atm1list, Cash = pred_ATM1))

pred_ATM2 <- ets(atm2s7) %>% forecast(h = 31)
pred_ATM2 <- pred_ATM2$mean
atm2list <- rep("ATM2", 31)
pred_ATM2a <- data.frame(list(DATE = datelist, ATM = atm2list, Cash = pred_ATM2))

pred_ATM4 <- ets(atm4s7) %>% forecast(h = 31)
pred_ATM4 <- pred_ATM4$mean
atm4list <- rep("ATM4", 31)
pred_ATM4a <- data.frame(list(DATE = datelist, ATM = atm4list, Cash = pred_ATM4))

pred_ATM_all <- bind_rows(pred_ATM1a, pred_ATM2a, pred_ATM4a)

write.csv(pred_ATM_all, "C:/Users/Swigo/Desktop/Sarah/DATA624_Predictive_Analytics/ATM_pred_all.csv")

# Load the residential power data as a monthly time series
powerdata <- read.csv("C:/Users/Swigo/Desktop/Sarah/DATA624_Predictive_Analytics/ResidentialCustomerForecastLoad.csv", stringsAsFactors = FALSE)
powerdata <- powerdata[,2:3]
colnames(powerdata) <- c("Date", "Energy")
head(powerdata)
powerdata <- ts(powerdata$Energy, start = c(1998, 1), frequency = 12)
autoplot(powerdata) + ggtitle("Residential Energy Usage") + ylab("Energy (kWh)")
summary(powerdata)

# Impute the missing value with the mean of the months before and after it
for (i in 1:length(powerdata)) {
  if (is.na(powerdata[i])) { powerdata[i] <- (powerdata[i-1] + powerdata[i+1]) / 2 }
}

powerdata %>% decompose(type = "additive") %>% autoplot() +
  ggtitle("Additive Decomposition of Residential Energy Usage")

# Training set: through January 2013; test set: February to December 2013
powerdata.train <- window(powerdata, end = c(2013, 1))
powerdata.test <- window(powerdata, start = c(2013, 2))

snaive_power <- snaive(powerdata.train, h = 11)
accuracy(snaive_power, powerdata.test)

fit_add_power <- hw(powerdata.train, seasonal = "additive", damped = TRUE, h = 11)
fit_mult_power <- hw(powerdata.train, seasonal = "multiplicative", damped = TRUE, h = 11)
accuracy(fit_add_power, powerdata.test)
accuracy(fit_mult_power, powerdata.test)

ets_power <- ets(powerdata.train)
ets_pred <- ets_power %>% forecast(h = 11)
summary(ets_power)
accuracy(ets_pred, powerdata.test)

fit_arima_power <- auto.arima(powerdata.train)
fit_arima_power
arima_power_fc <- fit_arima_power %>% forecast(h = 11)
accuracy(arima_power_fc, powerdata.test)

# Compare all forecasts with the test set; the ARIMA label gives the full fitted model
autoplot(powerdata.test, series = "Energy Usage Test Data", color = "black") +
  autolayer(snaive_power, series = "Seasonal Naive Forecasts", PI = FALSE) +
  autolayer(fit_add_power, series = "HW additive forecasts", PI = FALSE) +
  autolayer(fit_mult_power, series = "HW multiplicative forecasts", PI = FALSE) +
  autolayer(ets_pred, series = "ets forecasts", PI = FALSE) +
  autolayer(arima_power_fc, series = "ARIMA(1,0,1)(2,1,0)[12] forecasts", PI = FALSE) +
  xlab("2013") + ggtitle("Residential Energy Usage") + ylab("Energy (kWh)") +
  scale_x_continuous(breaks = c(2013.083, 2013.170, 2013.254, 2013.334, 2013.417, 2013.5,
                                2013.583, 2013.666, 2013.75, 2013.833, 2013.917),
                     labels = c("Feb", "Mar", "Apr", "May", "June", "July",
                                "Aug", "Sep", "Oct", "Nov", "Dec")) +
  guides(colour = guide_legend(title = "Forecast"))

power_table <- data.frame(list(Seasonal_Naive = 958105.2, HW_Additive = 880078.5,
                               HW_Multiplicative = 880422.5, ETS = 916743.3, ARIMA = 1618718.3),
                          stringsAsFactors = FALSE)
rownames(power_table) <- c("RMSE")
kable(power_table, caption = "Energy Usage")

# Refit the multiplicative Holt Winters model on all of the data and forecast 2014
fit_mult_power_2014 <- hw(powerdata, seasonal = "multiplicative", damped = TRUE, h = 12)
autoplot(powerdata) + autolayer(fit_mult_power_2014) +
  ggtitle("Residential Energy Usage with Prediction for 2014") + ylab("Energy (kWh)")
# write.csv always uses a comma separator, so no sep argument is passed
write.csv(fit_mult_power_2014, "power_prediction_2014.csv")

DATA 624 Project 1

Sarah Wigodsky

2019-03-31
