library(fpp3)
## -- Attaching packages -------------------------------------------- fpp3 0.4.0 --
## v tibble 3.1.4 v tsibble 1.0.1
## v dplyr 1.0.7 v tsibbledata 0.3.0
## v tidyr 1.1.3 v feasts 0.2.2
## v lubridate 1.7.10 v fable 0.3.1
## v ggplot2 3.3.5
## -- Conflicts ------------------------------------------------- fpp3_conflicts --
## x lubridate::date() masks base::date()
## x dplyr::filter() masks stats::filter()
## x tsibble::intersect() masks base::intersect()
## x tsibble::interval() masks lubridate::interval()
## x dplyr::lag() masks stats::lag()
## x tsibble::setdiff() masks base::setdiff()
## x tsibble::union() masks base::union()
library(dplyr)
library(latex2exp)
library(lubridate)
library(ggplot2)
library(fabletools)
library(mice)
##
## Attaching package: 'mice'
## The following object is masked from 'package:stats':
##
## filter
## The following objects are masked from 'package:base':
##
## cbind, rbind
rm(list=ls())
ATM_loc<-"C:/Users/Lisa/OneDrive/CUNY/624/Project/ATM624Data.csv"
ATM<-read.table(file=ATM_loc,header=TRUE, sep=",")
In part A, I want you to forecast how much cash is taken out of 4 different ATM machines for May 2010. The data is given in a single file. The variable ‘Cash’ is provided in hundreds of dollars; other than that it is straightforward. I am being somewhat ambiguous on purpose to make this have a little more business feeling. Explain and demonstrate your process, techniques used and not used, and your actual forecast. I am giving you data via an Excel file; please provide your written report on your findings, visuals, discussion and your R code via an RPubs link along with the actual .rmd file. Also please submit the forecast, which you will put in an Excel-readable file.
str(ATM)
## 'data.frame': 1474 obs. of 3 variables:
## $ DATE: chr "05/01/09" "05/01/09" "05/02/09" "05/02/09" ...
## $ ATM : chr "ATM1" "ATM2" "ATM1" "ATM2" ...
## $ Cash: int 96 107 82 89 85 90 90 55 99 79 ...
Convert the date to date format
ATM$DATE <- as.Date(ATM$DATE,"%m/%d/%y")
str(ATM)
## 'data.frame': 1474 obs. of 3 variables:
## $ DATE: Date, format: "2009-05-01" "2009-05-01" ...
## $ ATM : chr "ATM1" "ATM2" "ATM1" "ATM2" ...
## $ Cash: int 96 107 82 89 85 90 90 55 99 79 ...
Let’s take a quick look at the four ATMs:
ATM %>%
ggplot(aes(x=DATE, y=Cash, colour=ATM))+
geom_line() +
facet_grid(ATM~., scales="free_y")
## Warning: Removed 14 row(s) containing missing values (geom_path).
A visual inspection reveals some missing values that should be dealt with prior to further analysis.
The ATMs should be analyzed as 4 separate time series, because withdrawal behavior differs from machine to machine.
The location and convenience of each ATM will differ. From the plot, there are a couple of striking issues that need to be addressed.
ATM1 and ATM2 would be good candidates to model using ARIMA or SNAIVE.
ATM3 shows activity only on the last few dates, suggesting it was just put into service. If possible, I would confirm this with the data source.
ATM4 has one huge day of withdrawals on 2/9/10, on the order of $1,000,000. This suggests an error; it is unlikely that the ATM could hold, let alone dispense, that much cash. If possible, I would confirm this with the data source; failing that, an educated guess would be to impute the erroneous value with the mean or a similar method.
Create four separate time series datasets for ATM1, ATM2, ATM3, and ATM4.
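A sketch of an alternative (not the approach taken below): keeping all four machines in a single tsibble keyed by ATM lets them be plotted and modelled together. This assumes at most one row per ATM per date.
ATM_key <- ATM %>%
  filter(ATM %in% c("ATM1", "ATM2", "ATM3", "ATM4")) %>%   #drop any rows without an ATM label
  as_tsibble(index = DATE, key = ATM)
ATM_key %>%
  autoplot(Cash) +
  facet_grid(ATM ~ ., scales = "free_y")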
Let’s begin with ATM1
ATM1<-ATM %>%
filter (ATM=="ATM1") %>%
as_tsibble(index=DATE)
ATM1 %>%
autoplot()+labs(title="ATM1", subtitle="Cash withdrawals by day", y="$ 00's")
## Plot variable not specified, automatically selected `.vars = Cash`
sum(is.na(ATM1$Cash))
## [1] 3
md.pattern(ATM1)
## DATE ATM Cash
## 362 1 1 1 0
## 3 1 1 0 1
## 0 0 3 3
gg_season(ATM1, y=Cash, period=7)
## Warning: Removed 1 row(s) containing missing values (geom_path).
There are 3 missing values. The plot above suggests 7-day seasonality with pattern changes.
Notice that the green weeks show no activity on a Tuesday, while later weeks show no activity on a Thursday. This suggests the ATM is probably taken out of service on certain days, and that the service schedule changed over time.
Because there are only 3 missing values, let’s look at each missing date and check the values one week before and one week after. If the two values are similar, that reassures us that the change in the service schedule did not occur at that time.
Then we can average the two values and impute the result into the missing date.
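As a sketch of the rule just described (a hypothetical helper, not used in the manual steps below): impute a missing Cash value with the average of the observations 7 days before and 7 days after.
#Hypothetical helper: average of the values one week before and one week after a missing date
impute_week_avg <- function(ts, missing_date) {
  before <- ts$Cash[ts$DATE == missing_date - 7]
  after  <- ts$Cash[ts$DATE == missing_date + 7]
  mean(c(before, after), na.rm = TRUE)
}
#e.g. impute_week_avg(ATM1, as.Date("2009-06-13")) should return (96 + 110)/2 = 103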
#Missing Values
M1<-ATM1 %>%
filter(is.na(Cash))
M1
## # A tsibble: 3 x 3 [1D]
## DATE ATM Cash
## <date> <chr> <int>
## 1 2009-06-13 ATM1 NA
## 2 2009-06-16 ATM1 NA
## 3 2009-06-22 ATM1 NA
#Let's deal with the 1st missing value at 6/13/09
#Go back 1 week and forward 1 week
M2<-ATM1 %>%
filter (DATE==as.Date("2009-06-06") | DATE==as.Date("2009-06-20"))
M2
## # A tsibble: 2 x 3 [1D]
## DATE ATM Cash
## <date> <chr> <int>
## 1 2009-06-06 ATM1 96
## 2 2009-06-20 ATM1 110
#Let's average these two and impute into 6/13/09
(110+96)/2
## [1] 103
#103.
#get row index
which ((ATM1$DATE==as.Date("2009-06-13")))
## [1] 44
#This says row 44 so impute 103 into Cash
ATM1[44,]
## # A tsibble: 1 x 3 [1D]
## DATE ATM Cash
## <date> <chr> <int>
## 1 2009-06-13 ATM1 NA
ATM1[44,3]<-103
ATM1[44,]
## # A tsibble: 1 x 3 [1D]
## DATE ATM Cash
## <date> <chr> <int>
## 1 2009-06-13 ATM1 103
#####################################################
###################################################
# Moving on to the second missing value
###############################################
M1<-ATM1 %>%
filter(is.na(Cash))
M1
## # A tsibble: 2 x 3 [1D]
## DATE ATM Cash
## <date> <chr> <int>
## 1 2009-06-16 ATM1 NA
## 2 2009-06-22 ATM1 NA
# Let's calculate a value for 6/16/09 looking back 1 week and forward 1 week.
M2<-ATM1 %>%
filter (DATE==as.Date("2009-06-09") | DATE==as.Date("2009-06-23"))
(145+108)/2
## [1] 126.5
#Use 127 as imputed value for 6/16/09
#get row index
which ((ATM1$DATE==as.Date("2009-06-16")))
## [1] 47
#This says row 47 so impute 127 into Cash
ATM1[47,]
## # A tsibble: 1 x 3 [1D]
## DATE ATM Cash
## <date> <chr> <int>
## 1 2009-06-16 ATM1 NA
ATM1[47,3]<-127
ATM1[47,]
## # A tsibble: 1 x 3 [1D]
## DATE ATM Cash
## <date> <chr> <int>
## 1 2009-06-16 ATM1 127
##################################
# Moving on to the third missing value
####################################
M1<-ATM1 %>%
filter(is.na(Cash))
M1
## # A tsibble: 1 x 3 [1D]
## DATE ATM Cash
## <date> <chr> <int>
## 1 2009-06-22 ATM1 NA
# Let's calculate a value for 6/22/09 looking back 1 week and forward 1 week.
M2<-ATM1 %>%
filter (DATE==as.Date("2009-06-15") | DATE==as.Date("2009-06-29"))
M2
## # A tsibble: 2 x 3 [1D]
## DATE ATM Cash
## <date> <chr> <int>
## 1 2009-06-15 ATM1 106
## 2 2009-06-29 ATM1 98
(106+98)/2
## [1] 102
#Use 102 as imputed value for 6/22/09
#get row index
which ((ATM1$DATE==as.Date("2009-06-22")))
## [1] 53
#This says row 53 so impute 102 into Cash
ATM1[53,]
## # A tsibble: 1 x 3 [1D]
## DATE ATM Cash
## <date> <chr> <int>
## 1 2009-06-22 ATM1 NA
ATM1[53,3]<-102
ATM1[53,]
## # A tsibble: 1 x 3 [1D]
## DATE ATM Cash
## <date> <chr> <int>
## 1 2009-06-22 ATM1 102
#double check all missing gone
sum(is.na(ATM1$Cash))
## [1] 0
Let’s take a closer look at ATM1:
ATM1%>%
autoplot(Cash)
Transform the data to reduce the variability
lambda1<-ATM1%>%
features(Cash,features=guerrero)%>%
pull(lambda_guerrero)
lambda1
## [1] 0.3551715
ATM1<-ATM1%>% mutate(newCash=(((Cash^lambda1)-1)/lambda1))
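#Sketch of a sanity check: for positive values and lambda != 0, fabletools' built-in
#box_cox() computes the same (x^lambda - 1)/lambda formula used manually above.
all.equal(ATM1$newCash, box_cox(ATM1$Cash, lambda1))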
ATM1%>%
autoplot(box_cox(Cash, lambda1))+
labs(y="",
title=latex2exp::TeX(paste0(
"Transformed Cash with $$\\lambda1$ =",
round(lambda1,2))))
lambda1
## [1] 0.3551715
ATM1%>%autoplot(newCash)
Let’s do a decomposition:
ATM1%>%
model(classical_decomposition(newCash, type="additive")) %>%
components () %>%
autoplot()+ labs(title="Classical additive decomposition of ATM1")
## Warning: Removed 3 row(s) containing missing values (geom_path).
We see a strong seasonality component.
Take a closer look:
#test if uncorrelated with itself.
ATM1 %>%
features(newCash,ljung_box)
## # A tibble: 1 x 2
## lb_stat lb_pvalue
## <dbl> <dbl>
## 1 0.217 0.642
#do not need differencing
ATM1 %>% features(newCash, unitroot_ndiffs)
## # A tibble: 1 x 1
## ndiffs
## <int>
## 1 0
#this confirms no differencing needed
#lets check seasonal differencing
ATM1 %>% features(newCash, unitroot_nsdiffs)
## # A tibble: 1 x 1
## nsdiffs
## <int>
## 1 1
#we need to take 1st seasonal difference
ATM1<-ATM1 %>%
mutate(diff_Cash = (difference(newCash,7)))
ATM1%>% autoplot(diff_Cash)
## Warning: Removed 7 row(s) containing missing values (geom_path).
ATM1%>% ACF((diff_Cash))%>% autoplot()
ATM1%>% PACF((diff_Cash))%>% autoplot()
Our series has seasonality. We took the first seasonal difference.
Let’s fit some models.
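A sketch of an alternative workflow (not the one used below): all three candidates can be fit in a single model() call, which makes side-by-side comparison with glance() a one-liner. Note that ARIMA(newCash) would let fable choose its own differencing, unlike the pre-differenced diff_Cash series used further down.
#Sketch: fit the three candidate models together and compare them in one table
fitATM1_all <- ATM1 %>%
  model(
    snaive = SNAIVE(newCash ~ lag(7)),
    ets    = ETS(newCash),
    arima  = ARIMA(newCash)
  )
glance(fitATM1_all)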
###########################################
#Fit SNAIVE model
##########################################
fitATM1_2<-ATM1 %>%
model(SNAIVE(newCash~lag(7)))
report(fitATM1_2)
## Series: newCash
## Model: SNAIVE
##
## sigma^2: 4.6893
##SNAIVE Residual SD=sqrt(4.6893)=2.1655
###########################################
#Fit ETS model
##########################################
fitATM1_3<-ATM1 %>%
model(ETS(newCash))
report(fitATM1_3)
## Series: newCash
## Model: ETS(A,N,A)
## Smoothing parameters:
## alpha = 0.0001000197
## gamma = 0.3586853
##
## Initial states:
## l[0] s[0] s[-1] s[-2] s[-3] s[-4] s[-5] s[-6]
## 10.06064 -6.329706 0.7406964 1.802599 0.7464719 1.001276 0.6964825 1.34218
##
## sigma^2: 3.4407
##
## AIC AICc BIC
## 2615.372 2615.993 2654.371
#ETS(A,N,A) Residual SD=sqrt(3.4407)=1.854912
#AICc=2616
###########################################
#Fit ARIMA model
##########################################
fitATM1_1<-ATM1 %>%
model(ARIMA(diff_Cash))
report(fitATM1_1)
## Series: diff_Cash
## Model: ARIMA(2,0,0)(2,0,0)[7]
##
## Coefficients:
## ar1 ar2 sar1 sar2
## 0.1062 -0.0935 -0.5162 -0.2291
## s.e. 0.0528 0.0525 0.0512 0.0509
##
## sigma^2 estimated as 3.516: log likelihood=-735.59
## AIC=1481.19 AICc=1481.35 BIC=1500.69
#ARIMA(2,0,0)(2,0,0) Residual SD=sqrt(3.516)=1.8751
# AICc=1481
glance(fitATM1_2)
## # A tibble: 1 x 2
## .model sigma2
## <chr> <dbl>
## 1 SNAIVE(newCash ~ lag(7)) 4.69
glance(fitATM1_1)
## # A tibble: 1 x 8
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 ARIMA(diff_Cash) 3.52 -736. 1481. 1481. 1501. <cpl [16]> <cpl [0]>
glance(fitATM1_3)
## # A tibble: 1 x 9
## .model sigma2 log_lik AIC AICc BIC MSE AMSE MAE
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ETS(newCash) 3.44 -1298. 2615. 2616. 2654. 3.36 3.37 1.03
Comparing the sigma^2 and AICc, the ARIMA model is superior.
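One caveat on this comparison: the ARIMA model is fit to the pre-differenced diff_Cash series while ETS is fit to newCash, so their AICc values are not strictly comparable, and the ARIMA forecasts are produced on the differenced Box-Cox scale rather than in hundreds of dollars. As a sketch only (not the approach used in this report), placing the transformation inside the model formula lets fable choose the differencing and back-transform the forecasts to the original Cash scale automatically:
#Sketch: fable inverts box_cox() in the forecasts because it appears in the model formula
fitATM1_bc <- ATM1 %>%
  model(ARIMA(box_cox(Cash, lambda1)))
fitATM1_bc %>%
  forecast(h = 31) %>%
  autoplot(ATM1)   #forecasts shown in the original $00's units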
Take a look at the residuals and ACF.
fitATM1_1 %>%
gg_tsresiduals()
## Warning: Removed 7 row(s) containing missing values (geom_path).
## Warning: Removed 7 rows containing missing values (geom_point).
## Warning: Removed 7 rows containing non-finite values (stat_bin).
The residual time plot, ACF, and histogram look satisfactory.
Now forecast for ATM1
forecast(fitATM1_1, h=31) %>%
autoplot(ATM1)
## Warning: Removed 7 row(s) containing missing values (geom_path).
fitATM1_1 %>%
forecast(h=31) %>%
autoplot(ATM1)
## Warning: Removed 7 row(s) containing missing values (geom_path).
Moving on to ATM2.
ATM2<-ATM %>%
filter (ATM=="ATM2") %>%
as_tsibble(index=DATE)
ATM2 %>%
autoplot()+labs(title="ATM2", subtitle="Cash withdrawals by day", y="$ 00's")
## Plot variable not specified, automatically selected `.vars = Cash`
sum(is.na(ATM2$Cash))
## [1] 2
md.pattern(ATM2)
## DATE ATM Cash
## 363 1 1 1 0
## 2 1 1 0 1
## 0 0 2 2
gg_season(ATM2, y=Cash, period=7)
## ATM2 Preliminary analysis
There are 2 missing values. The plot above suggests 7-day seasonality with pattern changes.
Like ATM1, the green weeks show no activity on a Tuesday, while later weeks show no activity on a Thursday. This suggests the ATM is probably taken out of service on certain days, and that the service schedule changed over time.
Because there are only 2 missing values, let’s look at each missing date and check the values one week before and one week after. If the two values are similar, that reassures us that the change in the service schedule did not occur at that time.
Then we can average the two values and impute the result into the missing date.
#Missing Values
m1<-ATM2 %>%
filter(is.na(Cash))
m1
## # A tsibble: 2 x 3 [1D]
## DATE ATM Cash
## <date> <chr> <int>
## 1 2009-06-18 ATM2 NA
## 2 2009-06-24 ATM2 NA
#Let's deal with the 1st missing value at 6/18/09
#Go back 1 week and forward 1 week
m2<-ATM2 %>%
filter (DATE==as.Date("2009-06-11") | DATE==as.Date("2009-06-25"))
m2
## # A tsibble: 2 x 3 [1D]
## DATE ATM Cash
## <date> <chr> <int>
## 1 2009-06-11 ATM2 7
## 2 2009-06-25 ATM2 3
#Let's average these two and impute into 6/18/09
(3+7)/2
## [1] 5
#5.
#get row index
which ((ATM2$DATE==as.Date("2009-06-18")))
## [1] 49
#This says row 49 so impute 5 into Cash
ATM2[49,]
## # A tsibble: 1 x 3 [1D]
## DATE ATM Cash
## <date> <chr> <int>
## 1 2009-06-18 ATM2 NA
ATM2[49,3]<-5
ATM2[49,]
## # A tsibble: 1 x 3 [1D]
## DATE ATM Cash
## <date> <chr> <int>
## 1 2009-06-18 ATM2 5
#####################################################
###################################################
# Moving on to the second missing value
m1<-ATM2 %>%
filter(is.na(Cash))
m1
## # A tsibble: 1 x 3 [1D]
## DATE ATM Cash
## <date> <chr> <int>
## 1 2009-06-24 ATM2 NA
# Let's calculate a value for 6/24/09 looking back 1 week and forward 1 week.
m2<-ATM2 %>%
filter (DATE==as.Date("2009-06-17") | DATE==as.Date("2009-07-01"))
m2
## # A tsibble: 2 x 3 [1D]
## DATE ATM Cash
## <date> <chr> <int>
## 1 2009-06-17 ATM2 24
## 2 2009-07-01 ATM2 36
#Let's average these two and impute into 6/24/09
(24+36)/2
## [1] 30
#Use 30 as imputed value for 6/24/09
#get row index
which ((ATM2$DATE==as.Date("2009-06-24")))
## [1] 55
#This says row 55 so impute 30 into Cash
ATM2[55,]
## # A tsibble: 1 x 3 [1D]
## DATE ATM Cash
## <date> <chr> <int>
## 1 2009-06-24 ATM2 NA
ATM2[55,3]<-30
ATM2[55,]
## # A tsibble: 1 x 3 [1D]
## DATE ATM Cash
## <date> <chr> <int>
## 1 2009-06-24 ATM2 30
#double check all missing gone
sum(is.na(ATM2$Cash))
## [1] 0
Let’s take a closer look at ATM2:
ATM2%>%
autoplot(Cash)
Transform the data to reduce the variability
lambda2<-ATM2%>%
features(Cash,features=guerrero)%>%
pull(lambda_guerrero)
lambda2
## [1] 0.6343478
ATM2<-ATM2%>% mutate(newCash=(((Cash^lambda2)-1)/lambda2))
ATM2%>%
autoplot(box_cox(Cash, lambda2))+
labs(y="",
title=latex2exp::TeX(paste0(
"ATM2: Transformed Cash with $$\\lambda2$ =",
round(lambda2,2))))
lambda2
## [1] 0.6343478
ATM2%>%autoplot(newCash)
Let’s do a decomposition:
ATM2%>%
model(classical_decomposition(newCash, type="additive")) %>%
components () %>%
autoplot()+ labs(title="Classical additive decomposition of ATM2")
## Warning: Removed 3 row(s) containing missing values (geom_path).
## ATM2 Analysis continued
We see a strong seasonality component.
Take a closer look:
#test if uncorrelated with itself.
ATM2 %>%
features(newCash,ljung_box)
## # A tibble: 1 x 2
## lb_stat lb_pvalue
## <dbl> <dbl>
## 1 0.919 0.338
#check whether a first difference is needed
ATM2 %>% features(newCash, unitroot_ndiffs)
## # A tibble: 1 x 1
## ndiffs
## <int>
## 1 1
#ndiffs = 1 suggests a first difference; check seasonal differencing first, since a seasonal difference may remove the need for it
#lets check seasonal differencing
ATM2 %>% features(newCash, unitroot_nsdiffs)
## # A tibble: 1 x 1
## nsdiffs
## <int>
## 1 1
#we need to take 1st seasonal difference
ATM2<-ATM2 %>%
mutate(diff_Cash = (difference(newCash,7)))
ATM2%>% autoplot(diff_Cash)
## Warning: Removed 7 row(s) containing missing values (geom_path).
#Let's look at ACF, PACF
ATM2%>% ACF((diff_Cash))%>% autoplot()
ATM2%>% PACF((diff_Cash))%>% autoplot()
The ACF and PACF show a few significant spikes, but nothing severe. Let’s continue.
Our series has seasonality. We took the first seasonal difference.
Let’s fit some models, beginning with SNAIVE, then ETS (exponential smoothing), then ARIMA.
###########################################
#Fit SNAIVE model
##########################################
fitATM2_2<-ATM2 %>%
model(SNAIVE(newCash~lag(7)))
report(fitATM2_2)
## Series: newCash
## Model: SNAIVE
##
## sigma^2: 47.905
##SNAIVE Residual SD=sqrt(47.905)=6.921344
###########################################
#Fit ETS model
##########################################
fitATM2_3<-ATM2 %>%
model(ETS(newCash))
report(fitATM2_3)
## Series: newCash
## Model: ETS(A,N,A)
## Smoothing parameters:
## alpha = 0.000100013
## gamma = 0.4058325
##
## Initial states:
## l[0] s[0] s[-1] s[-2] s[-3] s[-4] s[-5] s[-6]
## 20.61538 -12.32673 -5.992331 2.521689 -2.685606 2.841684 8.392067 7.249227
##
## sigma^2: 35.2982
##
## AIC AICc BIC
## 3465.148 3465.770 3504.147
#ETS(A,N,A) Residual SD=sqrt(35.2982)=5.941229
#AICc=3466
###########################################
#Fit ARIMA model
##########################################
fitATM2_1<-ATM2 %>%
model(ARIMA(diff_Cash))
report(fitATM2_1)
## Series: diff_Cash
## Model: ARIMA(0,0,0)(2,0,0)[7]
##
## Coefficients:
## sar1 sar2
## -0.5518 -0.1692
## s.e. 0.0520 0.0516
##
## sigma^2 estimated as 35.39: log likelihood=-1149.91
## AIC=2305.83 AICc=2305.9 BIC=2317.53
#ARIMA(0,0,0)(2,0,0) Residual SD=sqrt(35.39)=5.948949
#AICc=2306
glance(fitATM2_2)
## # A tibble: 1 x 2
## .model sigma2
## <chr> <dbl>
## 1 SNAIVE(newCash ~ lag(7)) 47.9
glance(fitATM2_1)
## # A tibble: 1 x 8
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 ARIMA(diff_Cash) 35.4 -1150. 2306. 2306. 2318. <cpl [14]> <cpl [0]>
glance(fitATM2_3)
## # A tibble: 1 x 9
## .model sigma2 log_lik AIC AICc BIC MSE AMSE MAE
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ETS(newCash) 35.3 -1723. 3465. 3466. 3504. 34.4 34.5 4.07
Comparing the sigma^2 and AICc, the ARIMA model is superior.
Take a look at the residuals and ACF.
fitATM2_1 %>%
gg_tsresiduals()
## Warning: Removed 7 row(s) containing missing values (geom_path).
## Warning: Removed 7 rows containing missing values (geom_point).
## Warning: Removed 7 rows containing non-finite values (stat_bin).
Residuals look satisfactory.
forecast(fitATM2_1, h=31) %>%
autoplot () + labs(title="ATM2 ARIMA", subtitle="Cash withdrawals by day", y="$ 00's")
fitATM2_1 %>%
forecast(h=31) %>%
autoplot(ATM2)
## Warning: Removed 7 row(s) containing missing values (geom_path).
Let’s move on to ATM3
ATM3<-ATM %>%
filter (ATM=="ATM3") %>%
as_tsibble(index=DATE)
ATM3 %>%
autoplot()+labs(title="ATM3", subtitle="Cash withdrawals by day", y="$ 00's")
## Plot variable not specified, automatically selected `.vars = Cash`
sum(is.na(ATM3$Cash))
## [1] 0
From the plot above, ATM3 dispenses no money until the last three days, with values of 96, 82, and 85.
If possible, I would confirm this with the data source.
With only three data points, I would approach this in two ways:
1) a naive model, or 2) a mean model fit only to the last three points.
Let’s begin with the naive model, followed by the mean; a quick side-by-side sketch appears below.
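A quick side-by-side sketch of the two candidates (the MEAN model here is fit only to the three dispensing days, matching the filter used in the next chunk):
#Sketch: residual variances of the two candidate models in one table
bind_rows(
  glance(ATM3 %>% model(NAIVE(Cash))),
  glance(ATM3 %>% filter(DATE >= as.Date("2010-04-28")) %>% model(MEAN(Cash)))
)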
####################################
#NAIVE
###################################
fitATM3_1<-ATM3 %>%
model(NAIVE(Cash))
report(fitATM3_1)
## Series: Cash
## Model: NAIVE
##
## sigma^2: 25.8985
##NAIVE Residual SD=sqrt(25.8985)=5.089057
###########################################
#Fit model with 3 values MEAN
#Start the series at the 1st dispensing
#Series only will contain 3 values
##########################################
ATM3a<-ATM3 %>%
filter (DATE >= as.Date("2010-04-28"))
fitATM3_2<-ATM3a%>%
model(MEAN(Cash))
report(fitATM3_2)
## Series: Cash
## Model: MEAN
##
## Mean: 87.6667
## sigma^2: 54.3333
##MEAN Residual SD=sqrt(54.3333)=7.371113
glance(fitATM3_1)
## # A tibble: 1 x 2
## .model sigma2
## <chr> <dbl>
## 1 NAIVE(Cash) 25.9
glance(fitATM3_2)
## # A tibble: 1 x 2
## .model sigma2
## <chr> <dbl>
## 1 MEAN(Cash) 54.3
Comparing the sigma^2, the NAIVE model is superior (note that its sigma^2 is computed over the long run of zero-withdrawal days, so this comparison is rough).
Take a look at the residuals and ACF.
fitATM3_1 %>%
gg_tsresiduals()
## Warning: Removed 1 row(s) containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing non-finite values (stat_bin).
## ATM3 model
The naive model uses the last observed value as the forecast. Given only three non-zero values, this is a sound approach. The residual and ACF plots are not very informative here.
#ATM3 Forecast
Now forecast for ATM3
forecast(fitATM3_1, h=31) %>%
autoplot() + labs(title="ATM3", subtitle="Cash withdrawals by day", y="$ 00's")
fitATM3_1 %>%
forecast(h=31) %>%
autoplot(ATM3)
# ATM 4 Analysis
Moving on to the last machine, ATM4.
ATM4<-ATM %>%
filter (ATM=="ATM4") %>%
as_tsibble(index=DATE)
ATM4 %>%
autoplot()+labs(title="ATM4", subtitle="Cash withdrawals by day", y="$ 00's")
## Plot variable not specified, automatically selected `.vars = Cash`
sum(is.na(ATM4$Cash))
## [1] 0
We have no missing values, but there is one outlier cash withdrawal on 2/9/2010: a value of 10,920, which in hundreds of dollars is roughly $1.09 million.
This suggests an error; it is unlikely that the ATM could hold, let alone dispense, that much cash. If possible, I would confirm this with the data source; failing that, an educated guess would be to impute the erroneous value with the mean or a similar method.
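As a sketch, the suspect observation can also be located programmatically rather than by a hard-coded row number (the manual indexing below is what is actually used):
#Sketch: find the largest withdrawal and its row index
ATM4 %>% filter(Cash == max(Cash, na.rm = TRUE))
which.max(ATM4$Cash)   #returns 285, the row used below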
#here is the erroneous withdrawal
ATM4[285,]
## # A tsibble: 1 x 3 [1D]
## DATE ATM Cash
## <date> <chr> <int>
## 1 2010-02-09 ATM4 10920
#Let's see 7 days prior and 7 days after
ATM4[278,]
## # A tsibble: 1 x 3 [1D]
## DATE ATM Cash
## <date> <chr> <int>
## 1 2010-02-02 ATM4 153
ATM4[292,]
## # A tsibble: 1 x 3 [1D]
## DATE ATM Cash
## <date> <chr> <int>
## 1 2010-02-16 ATM4 989
#These values are nowhere near the order of magnitude of the 10,920 outlier.
#Let's use the overall mean for this value (note: the mean is computed before the outlier is replaced, so it is slightly inflated by it)
mean(ATM4$Cash)
## [1] 474.0137
ATM4[285,3]<-round(mean(ATM4$Cash),0)
#Let's look at the timeseries after we imputed
ATM4 %>%
autoplot() + labs(title="ATM4")
## Plot variable not specified, automatically selected `.vars = Cash`
gg_season(ATM4, y=Cash, period=7)
#Seasonality is less pronounced here; tests follow.
Transform the data to reduce the variability
lambda4<-ATM4%>%
features(Cash,features=guerrero)%>%
pull(lambda_guerrero)
lambda4
## [1] 0.4039733
ATM4<-ATM4%>% mutate(newCash=(((Cash^lambda4)-1)/lambda4))
ATM4%>%
autoplot(box_cox(Cash, lambda4))+
labs(y="",
title=latex2exp::TeX(paste0(
"Transformed Cash with $$\\lambda4$ =",
round(lambda4,2))))
lambda4
## [1] 0.4039733
ATM4%>%autoplot(newCash)
Let’s do a decomposition:
ATM4%>%
model(classical_decomposition(newCash, type="additive")) %>%
components () %>%
autoplot()+ labs(title="Classical additive decomposition of ATM4")
## Warning: Removed 3 row(s) containing missing values (geom_path).
## ATM4 analysis continued
We see a strong seasonality component.
Take a closer look:
#test if uncorrelated with itself.
ATM4 %>%
features(newCash,ljung_box)
## # A tibble: 1 x 2
## lb_stat lb_pvalue
## <dbl> <dbl>
## 1 0.907 0.341
#do not need differencing
ATM4 %>% features(newCash, unitroot_ndiffs)
## # A tibble: 1 x 1
## ndiffs
## <int>
## 1 0
#this confirms no differencing needed
#lets check seasonal differencing
ATM4 %>% features(newCash, unitroot_nsdiffs)
## # A tibble: 1 x 1
## nsdiffs
## <int>
## 1 0
#nsdiffs = 0 says a seasonal difference is not strictly required; we take a 7-day seasonal difference anyway, for consistency with ATM1 and ATM2
ATM4<-ATM4 %>%
mutate(diff_Cash = (difference(newCash,7)))
ATM4%>% autoplot(diff_Cash)
## Warning: Removed 7 row(s) containing missing values (geom_path).
ATM4%>% ACF((diff_Cash))%>% autoplot()
ATM4%>% PACF((diff_Cash))%>% autoplot()
The ACF and PACF show a few significant spikes, but look acceptable. Let’s continue.
Our series has seasonality. We took the first seasonal difference.
Let’s fit some models.
###########################################
#Fit SNAIVE model
##########################################
fitATM4_1<-ATM4 %>%
model(SNAIVE(newCash~lag(7)))
report(fitATM4_1)
## Series: newCash
## Model: SNAIVE
##
## sigma^2: 172.4506
##SNAIVE Residual SD=sqrt(172.4506 )=13.13204
###########################################
#Fit ETS model
##########################################
fitATM4_2<-ATM4 %>%
model(ETS(newCash))
report(fitATM4_2)
## Series: newCash
## Model: ETS(M,N,A)
## Smoothing parameters:
## alpha = 0.01525782
## gamma = 0.1700593
##
## Initial states:
## l[0] s[0] s[-1] s[-2] s[-3] s[-4] s[-5] s[-6]
## 21.65918 -14.61678 -4.779729 3.479644 2.528753 4.726212 3.294735 5.367169
##
## sigma^2: 0.1946
##
## AIC AICc BIC
## 3836.200 3836.821 3875.199
#ETS(M,N,A) Residual SD=sqrt(0.1946)=0.4411349 (multiplicative errors, so this is on a relative scale)
#AICc=3837
###########################################
#Fit ARIMA model
##########################################
fitATM4_3<-ATM4 %>%
model(ARIMA(diff_Cash))
report(fitATM4_3)
## Series: diff_Cash
## Model: ARIMA(1,0,0)(2,0,0)[7]
##
## Coefficients:
## ar1 sar1 sar2
## 0.1011 -0.6366 -0.2833
## s.e. 0.0530 0.0514 0.0513
##
## sigma^2 estimated as 117.9: log likelihood=-1365.33
## AIC=2738.67 AICc=2738.78 BIC=2754.27
#ARIMA(1,0,0)(2,0,0) Residual SD=sqrt(117.9)=10.85818
#AICc=2739
glance(fitATM4_1)
## # A tibble: 1 x 2
## .model sigma2
## <chr> <dbl>
## 1 SNAIVE(newCash ~ lag(7)) 172.
glance(fitATM4_2)
## # A tibble: 1 x 9
## .model sigma2 log_lik AIC AICc BIC MSE AMSE MAE
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ETS(newCash) 0.195 -1908. 3836. 3837. 3875. 99.2 99.6 0.346
glance(fitATM4_3)
## # A tibble: 1 x 8
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 ARIMA(diff_Cash) 118. -1365. 2739. 2739. 2754. <cpl [15]> <cpl [0]>
Comparing the sigma^2 and AICc, the ARIMA model is superior.
Take a look at the residuals and ACF.
fitATM4_3 %>%
gg_tsresiduals()
## Warning: Removed 7 row(s) containing missing values (geom_path).
## Warning: Removed 7 rows containing missing values (geom_point).
## Warning: Removed 7 rows containing non-finite values (stat_bin).
The residual time plot, ACF, and histogram look satisfactory.
Now forecast for ATM4
forecast(fitATM4_3, h=31) %>%
autoplot () + labs(title="ATM4 ARIMA", subtitle="Cash withdrawals by day", y="$ 00's")
fitATM4_3 %>%
forecast(h=31) %>%
autoplot(ATM4)+ labs(title="ATM4 ARIMA", subtitle="Cash withdrawals by day", y="$ 00's")
## Warning: Removed 7 row(s) containing missing values (geom_path).
## Output to EXCEL
Write the forecasts for all four ATMs to Excel-readable CSV files.
forecastATM1_1 <- forecast(fitATM1_1, h=31)
forecastATM2_1 <- forecast(fitATM2_1, h=31)
forecastATM3_1 <- forecast(fitATM3_1, h=31)
forecastATM4_3 <- forecast(fitATM4_3, h=31)
#write.csv(forecastATM1_1,"C:/Users/Lisa/OneDrive/CUNY/624/Project/forecastATM1_1.csv", row.names = TRUE)
#write.csv(forecastATM2_1,"C:/Users/Lisa/OneDrive/CUNY/624/Project/forecastATM2_1.csv", row.names = TRUE)
#write.csv(forecastATM3_1,"C:/Users/Lisa/OneDrive/CUNY/624/Project/forecastATM3_1.csv", row.names = TRUE)
#write.csv(forecastATM4_3,"C:/Users/Lisa/OneDrive/CUNY/624/Project/forecastATM4_3.csv", row.names = TRUE)
#write.csv(ATM1,"C:/Users/Lisa/OneDrive/CUNY/624/Project/ATM1.csv", row.names = TRUE)
#write.csv(ATM2,"C:/Users/Lisa/OneDrive/CUNY/624/Project/ATM2.csv", row.names = TRUE)
#write.csv(ATM3,"C:/Users/Lisa/OneDrive/CUNY/624/Project/ATM3.csv", row.names = TRUE)
#write.csv(ATM4,"C:/Users/Lisa/OneDrive/CUNY/624/Project/ATM4.csv", row.names = TRUE)
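A note in case the commented-out write.csv() calls are re-enabled: a fable stores the forecast distribution in a column that does not export cleanly to CSV, so it is safer to keep only the date and the point forecast first. A sketch, with a hypothetical file name:
#Sketch: export only the date and point forecast (.mean) to an Excel-readable CSV
forecastATM1_1 %>%
  as_tibble() %>%
  select(DATE, .mean) %>%
  write.csv("forecastATM1_point.csv", row.names = FALSE)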
Part B – Forecasting Power, ResidentialCustomerForecastLoad-624.xlsx
Part B consists of a simple dataset of residential power usage for January 1998 until December 2013. Your assignment is to model these data and produce a monthly forecast for 2014. The data is given in a single file. The variable ‘KWH’ is power consumption in kilowatt hours; the rest is straightforward. Add this to your existing files above.
Let’s explore the file.
Power_csv<-"C:/Users/Lisa/OneDrive/CUNY/624/Project/ResidentialCustomerForecastLoad-624.csv"
Power<-read.table(file=Power_csv,header=TRUE, sep=",")
str(Power)
## 'data.frame': 192 obs. of 3 variables:
## $ CaseSequence: int 733 734 735 736 737 738 739 740 741 742 ...
## $ YYYY.MMM : chr "1998-Jan" "1998-Feb" "1998-Mar" "1998-Apr" ...
## $ KWH : int 6862583 5838198 5420658 5010364 4665377 6467147 8914755 8607428 6989888 6345620 ...
A plot of the power time series follows.
Put the data into a time series (tsibble) structure:
PowerUse<-Power %>%
mutate (Month=yearmonth(YYYY.MMM)) %>%
as_tsibble(index=Month)
PowerUse1<-PowerUse %>%
select(Month,KWH)
PowerUse1 %>%
autoplot() + labs (title="Power by month", subtitle="Residential Power usage", y="Kwh")
## Plot variable not specified, automatically selected `.vars = KWH`
Check for missing values
sum(is.na(PowerUse1$KWH))
## [1] 1
#There is a missing value
gg_season(PowerUse1, y=KWH, period=12)
There is clear seasonality, one missing value, and one outlier in July, which is most probably an error.
Impute the missing value by averaging the same month from the prior year and the subsequent year.
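As a sketch of that rule (a hypothetical helper, not used in the manual steps below): average the same month one year earlier and one year later.
#Hypothetical helper: average of the same month in the prior and following year
impute_year_avg <- function(ts, missing_month) {
  mean(c(ts$KWH[ts$Month == missing_month - 12],
         ts$KWH[ts$Month == missing_month + 12]), na.rm = TRUE)
}
#e.g. impute_year_avg(PowerUse1, yearmonth("2008-Sep")) should give (7666970 + 7583146)/2 = 7625058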
K1<-PowerUse1 %>%
filter(is.na(KWH))
K1
## # A tsibble: 1 x 2 [1M]
## Month KWH
## <mth> <int>
## 1 2008 Sep NA
#Let's deal with the 1st missing value at Sep 2008
PowerUse[129,]
## # A tsibble: 1 x 4 [1M]
## CaseSequence YYYY.MMM KWH Month
## <int> <chr> <int> <mth>
## 1 861 2008-Sep NA 2008 Sep
#Go back 1 year and forward 1 year by subtracting/adding 12 to the row index
PowerUse1[141,]
## # A tsibble: 1 x 2 [1M]
## Month KWH
## <mth> <int>
## 1 2009 Sep 7583146
PowerUse[117,]
## # A tsibble: 1 x 4 [1M]
## CaseSequence YYYY.MMM KWH Month
## <int> <chr> <int> <mth>
## 1 849 2007-Sep 7666970 2007 Sep
#Let's average these two and impute into Sep 2008
(PowerUse1[141,2]+PowerUse1[117,2])/2
## KWH
## 1 7625058
#7625058.
#get row index
#row 129
#This row 129 so impute 7625058 into KWH
PowerUse1[129,]
## # A tsibble: 1 x 2 [1M]
## Month KWH
## <mth> <int>
## 1 2008 Sep NA
PowerUse1[129,2]<-7625058
PowerUse1[129,]
## # A tsibble: 1 x 2 [1M]
## Month KWH
## <mth> <int>
## 1 2008 Sep 7625058
# confirm the imputed value
PowerUse1[129,]
## # A tsibble: 1 x 2 [1M]
## Month KWH
## <mth> <int>
## 1 2008 Sep 7625058
As seen from the following plot, there is an outlier in July.
PowerUse1 %>%
autoplot() + labs (title="Power by month", subtitle="Residential Power usage", y="Kwh")
## Plot variable not specified, automatically selected `.vars = KWH`
## Probable data error in July 2010
This is probably an error; let’s take a closer look at the July values.
ErrJuly<-PowerUse %>%
filter(PowerUse$CaseSequence==883)
ErrJuly
## # A tsibble: 1 x 4 [1M]
## CaseSequence YYYY.MMM KWH Month
## <int> <chr> <int> <mth>
## 1 883 2010-Jul 770523 2010 Jul
# Let's look at all other July Values
PowerJuly<-PowerUse %>%
  filter(grepl("Jul", YYYY.MMM) & CaseSequence != 883)  # all Julys except the suspect July 2010 value
PowerJuly
## # A tsibble: 15 x 4 [1M]
## CaseSequence YYYY.MMM KWH Month
## <int> <chr> <int> <mth>
## 1 739 1998-Jul 8914755 1998 Jul
## 2 751 1999-Jul 7444416 1999 Jul
## 3 763 2000-Jul 6862662 2000 Jul
## 4 775 2001-Jul 7855129 2001 Jul
## 5 787 2002-Jul 7039702 2002 Jul
## 6 799 2003-Jul 6900676 2003 Jul
## 7 811 2004-Jul 7307931 2004 Jul
## 8 823 2005-Jul 8337998 2005 Jul
## 9 835 2006-Jul 7945564 2006 Jul
## 10 847 2007-Jul 6426220 2007 Jul
## 11 859 2008-Jul 7643987 2008 Jul
## 12 871 2009-Jul 7713260 2009 Jul
## 13 895 2011-Jul 10093343 2011 Jul
## 14 907 2012-Jul 8886851 2012 Jul
## 15 919 2013-Jul 8415321 2013 Jul
The outlier value in July 2010 is 770,523, while every other July value lies between roughly 6.4 million and 10.1 million.
The error is most likely a dropped digit.
Let’s impute the median of all other July KWH values for July 2010.
median(PowerJuly$KWH)
## [1] 7713260
#This value of 7,713,260 will be imputed into the data set
PowerUse1[151,]
## # A tsibble: 1 x 2 [1M]
## Month KWH
## <mth> <int>
## 1 2010 Jul 770523
PowerUse1[151,2]<-7713260
PowerUse1[151,]
## # A tsibble: 1 x 2 [1M]
## Month KWH
## <mth> <int>
## 1 2010 Jul 7713260
Let’s look at the time series now that the data set has been cleaned.
PowerUse1 %>%
autoplot() + labs (title="Power by month", subtitle="Residential Power usage", y="Kwh")
## Plot variable not specified, automatically selected `.vars = KWH`
Transform the data to reduce the variability
lambdaP<-PowerUse1%>%
features(KWH,features=guerrero)%>%
pull(lambda_guerrero)
lambdaP
## [1] -0.2045992
PowerUse1<-PowerUse1%>% mutate(newKWH=(((KWH^lambdaP)-1)/lambdaP))
PowerUse1%>%
autoplot(box_cox(KWH, lambdaP))+
labs(y="",
title=latex2exp::TeX(paste0(
"Transformed KWH with $$\\lambdaP$ =",
round(lambdaP,2))))
lambdaP
## [1] -0.2045992
PowerUse1%>%autoplot(newKWH)
Let’s do a decomposition:
PowerUse1%>%
model(classical_decomposition(newKWH, type="additive")) %>%
components () %>%
autoplot()+ labs(title="Classical additive decomposition of PowerUse1")
## Warning: Removed 6 row(s) containing missing values (geom_path).
Strong seasonality and some trend are revealed.
Take a closer look:
#test if uncorrelated with itself.
PowerUse1 %>%
features(newKWH,ljung_box)
## # A tibble: 1 x 2
## lb_stat lb_pvalue
## <dbl> <dbl>
## 1 44.5 2.52e-11
#do need differencing
PowerUse1 %>% features(newKWH, unitroot_ndiffs)
## # A tibble: 1 x 1
## ndiffs
## <int>
## 1 1
#this confirms a first difference is needed
#lets check seasonal differencing
PowerUse1 %>% features(newKWH, unitroot_nsdiffs)
## # A tibble: 1 x 1
## nsdiffs
## <int>
## 1 1
#take the first seasonal difference and then a first difference
PowerUse1<-PowerUse1 %>%
mutate(diff_KWH = difference(difference(newKWH,12),1))
PowerUse1%>% autoplot(diff_KWH)
## Warning: Removed 13 row(s) containing missing values (geom_path).
PowerUse1%>% ACF((diff_KWH))%>% autoplot()
PowerUse1%>% PACF((diff_KWH))%>% autoplot()
Our series has both seasonality and trend. We took the first seasonal difference and a first difference.
Let’s fit some models.
###########################################
#Fit SNAIVE model
##########################################
fitKWH1<-PowerUse1 %>%
model(SNAIVE(KWH~lag(12)))
report(fitKWH1)
## Series: KWH
## Model: SNAIVE
##
## sigma^2: 651107743136.935
##SNAIVE Residual SD=sqrt(651107743136.935)=806912.5
#This model is fit on the raw KWH scale, so its sigma^2 is not directly comparable with the transformed-data models below; it does not look like a good fit.
###########################################
#Fit ETS model using transformed data
##########################################
fitKWH2<-PowerUse1 %>%
model(ETS(newKWH))
report(fitKWH2)
## Series: newKWH
## Model: ETS(A,A,A)
## Smoothing parameters:
## alpha = 0.1080076
## beta = 0.0001036622
## gamma = 0.0001019441
##
## Initial states:
## l[0] b[0] s[0] s[-1] s[-2] s[-3]
## 4.686777 5.207743e-05 -0.001669313 -0.0111128 -0.005659824 0.006889337
## s[-4] s[-5] s[-6] s[-7] s[-8] s[-9]
## 0.01020079 0.008017251 0.0009451794 -0.01036705 -0.007704751 -0.002665824
## s[-10] s[-11]
## 0.003441106 0.009685905
##
## sigma^2: 0
##
## AIC AICc BIC
## -1118.018 -1114.501 -1062.641
#ETS(A,A,A): the printed sigma^2 rounds to 0; glance() below reports sigma^2 = 1.41e-05, so residual SD = sqrt(1.41e-05) ≈ 0.0038
#AICc=-1114.501
###########################################
#Fit ARIMA model
##########################################
fitKWH3<-PowerUse1 %>%
model(ARIMA(diff_KWH))
report(fitKWH3)
## Series: diff_KWH
## Model: ARIMA(4,0,0)(2,0,0)[12]
##
## Coefficients:
## ar1 ar2 ar3 ar4 sar1 sar2
## -0.5768 -0.5113 -0.1397 -0.2131 -0.7424 -0.4093
## s.e. 0.0766 0.0881 0.0877 0.0740 0.0747 0.0746
##
## sigma^2 estimated as 1.369e-05: log likelihood=740.31
## AIC=-1466.61 AICc=-1466 BIC=-1443.81
#ARIMA(4,0,0)(2,0,0)[12] Residual SD=sqrt(1.369e-05)=0.0037
# AICc=-1466
glance(fitKWH1)
## # A tibble: 1 x 2
## .model sigma2
## <chr> <dbl>
## 1 SNAIVE(KWH ~ lag(12)) 651107743137.
glance(fitKWH2)
## # A tibble: 1 x 9
## .model sigma2 log_lik AIC AICc BIC MSE AMSE MAE
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ETS(newKWH) 0.0000141 576. -1118. -1115. -1063. 0.0000129 0.0000135 0.00278
glance(fitKWH3)
## # A tibble: 1 x 8
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 ARIMA(diff_KWH) 0.0000137 740. -1467. -1466. -1444. <cpl [28]> <cpl [0]>
Comparing the sigma^2 and AICc, we see negative AICc values.
"One question students often have about AIC is: How do I interpret negative AIC values? The simple answer: The lower the value for AIC, the better the fit of the model. The absolute value of the AIC value is not important. It can be positive or negative." (https://www.statology.org/negative-aic/)
The ARIMA model is superior.
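As with the ATM series, note that this ARIMA model is fit to the doubly differenced, Box-Cox-transformed diff_KWH series, so its forecasts are not in KWH and its AICc is not strictly comparable with the ETS model fit to newKWH. A sketch of an alternative that keeps everything on the KWH scale, with fable choosing the differencing and back-transforming the forecasts automatically:
#Sketch: transformation inside the formula, so the 2014 forecasts come back in KWH
fitKWH_bc <- PowerUse1 %>%
  model(ARIMA(box_cox(KWH, lambdaP)))
fitKWH_bc %>%
  forecast(h = 12) %>%
  autoplot(PowerUse1)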
Take a look at the residuals and ACF.
fitKWH3 %>%
gg_tsresiduals()
## Warning: Removed 13 row(s) containing missing values (geom_path).
## Warning: Removed 13 rows containing missing values (geom_point).
## Warning: Removed 13 rows containing non-finite values (stat_bin).
Below, we forecast the power usage with the ARIMA model.
forecast(fitKWH3, h=12) %>%
autoplot () + labs(title="Power KWH ARIMA", subtitle="KWH by month", y="KWH")
fitKWH3 %>%
forecast(h=12) %>%
autoplot(PowerUse1)
## Warning: Removed 13 row(s) containing missing values (geom_path).
## Export to Excel
Write the 2014 KWH forecast to an Excel-readable CSV file.
forecastKWH3 <- forecast(fitKWH3, h=12)
#write.csv(forecastKWH3,"C:/Users/Lisa/OneDrive/CUNY/624/Project/forecastKWH3.csv", row.names = TRUE)