Part A – ATM Forecast

In part A, I want you to forecast how much cash is taken out of 4 different ATM machines for May 2010. The data is given in a single file. The variable ‘Cash’ is provided in hundreds of dollars, other than that it is straight forward. I am being somewhat ambiguous on purpose to make this have a little more business feeling. Explain and demonstrate your process, techniques used and not used, and your actual forecast. I am giving you data via an excel file, please provide your written report on your findings, visuals, discussion and your R code via an RPubs link along with the actual.rmd file Also please submit the forecast which you will put in an Excel readable file.

The first step is to read in the data and examine it to determine the next steps. We are reading in the data, converting it to a dataframe and converting the date into readable format.

setwd("/Users/elinaazrilyan/Downloads/")
atmdata <- readxl::read_excel("ATM624Data.xlsx", skip=0)
atmdf<-as.data.frame(atmdata)
atmdf$DATE <- as.Date(atmdf$DATE, origin = "1899-12-30")
summary(atmdf)
##       DATE                ATM                 Cash        
##  Min.   :2009-05-01   Length:1474        Min.   :    0.0  
##  1st Qu.:2009-08-01   Class :character   1st Qu.:    0.5  
##  Median :2009-11-01   Mode  :character   Median :   73.0  
##  Mean   :2009-10-31                      Mean   :  155.6  
##  3rd Qu.:2010-02-01                      3rd Qu.:  114.0  
##  Max.   :2010-05-14                      Max.   :10919.8  
##                                          NA's   :19

Preparing the Data

We see that there are NAs, so the next step is to drop the rows with missing ATM and Cash values.

atmdf <- atmdf %>% filter((!(is.na(.$ATM))))
summary(atmdf)
##       DATE                ATM                 Cash        
##  Min.   :2009-05-01   Length:1460        Min.   :    0.0  
##  1st Qu.:2009-07-31   Class :character   1st Qu.:    0.5  
##  Median :2009-10-30   Mode  :character   Median :   73.0  
##  Mean   :2009-10-30                      Mean   :  155.6  
##  3rd Qu.:2010-01-29                      3rd Qu.:  114.0  
##  Max.   :2010-04-30                      Max.   :10919.8  
##                                          NA's   :5

We can see that we have 1 year worth of data that ranges from 5/1/2009 to 4/30/2010. Cash withdrawals range from 0 to 1,091,980 which seems suspiciously high if we look at the mean (15,560) and median (7,300) values.

We will now separate the data for the 4 ATMs and examine those individually. We will aslo replace NAs with Median values for each ATM.

atm1 <- subset(atmdf, ATM == "ATM1")
atm2 <- subset(atmdf, ATM == "ATM2")
atm3 <- subset(atmdf, ATM == "ATM3")
atm4 <- subset(atmdf, ATM == "ATM4")

atm1$Cash <- atm1$Cash %>% replace(is.na(.), median(atm1$Cash, na.rm=T))
summary(atm1)
##       DATE                ATM                 Cash       
##  Min.   :2009-05-01   Length:365         Min.   :  1.00  
##  1st Qu.:2009-07-31   Class :character   1st Qu.: 73.00  
##  Median :2009-10-30   Mode  :character   Median : 91.00  
##  Mean   :2009-10-30                      Mean   : 83.95  
##  3rd Qu.:2010-01-29                      3rd Qu.:108.00  
##  Max.   :2010-04-30                      Max.   :180.00
atm2$Cash <- atm2$Cash %>% replace(is.na(.), median(atm2$Cash, na.rm=T))
summary(atm2)
##       DATE                ATM                 Cash      
##  Min.   :2009-05-01   Length:365         Min.   :  0.0  
##  1st Qu.:2009-07-31   Class :character   1st Qu.: 26.0  
##  Median :2009-10-30   Mode  :character   Median : 67.0  
##  Mean   :2009-10-30                      Mean   : 62.6  
##  3rd Qu.:2010-01-29                      3rd Qu.: 93.0  
##  Max.   :2010-04-30                      Max.   :147.0
atm3$Cash <- atm3$Cash %>% replace(is.na(.), median(atm3$Cash, na.rm=T))
summary(atm3)
##       DATE                ATM                 Cash        
##  Min.   :2009-05-01   Length:365         Min.   : 0.0000  
##  1st Qu.:2009-07-31   Class :character   1st Qu.: 0.0000  
##  Median :2009-10-30   Mode  :character   Median : 0.0000  
##  Mean   :2009-10-30                      Mean   : 0.7206  
##  3rd Qu.:2010-01-29                      3rd Qu.: 0.0000  
##  Max.   :2010-04-30                      Max.   :96.0000
atm4$Cash <- atm4$Cash %>% replace(is.na(.), median(atm4$Cash, na.rm=T))
summary(atm4)
##       DATE                ATM                 Cash          
##  Min.   :2009-05-01   Length:365         Min.   :    1.563  
##  1st Qu.:2009-07-31   Class :character   1st Qu.:  124.334  
##  Median :2009-10-30   Mode  :character   Median :  403.839  
##  Mean   :2009-10-30                      Mean   :  474.043  
##  3rd Qu.:2010-01-29                      3rd Qu.:  704.507  
##  Max.   :2010-04-30                      Max.   :10919.762

The next step is to convert the data into time series format and see if any additional adjustments and clean up will be needed.

atm1ts <- ts(atm1["Cash"], frequency = 7)
atm2ts <- ts(atm2["Cash"], frequency = 7)
atm3ts <- ts(atm3["Cash"], frequency = 7)
atm4ts <- ts(atm4["Cash"], frequency = 7)

autoplot(atm1ts)