Part A – ATM Forecast, ATM624Data.xlsx

This project consists of 3 parts - two required and one bonus and is worth 15% of your grade.

In part A, I want you to forecast how much cash is taken out of 4 different ATM machines for May 2010. The data is given in a single file. The variable ‘Cash’ is provided in hundreds of dollars, other than that it is straight forward. I am being somewhat ambiguous on purpose to make this have a little more business feeling. Explain and demonstrate your process, techniques used and not used, and your actual forecast. I am giving you data via an excel file, please provide your written report on your findings, visuals, discussion and your R code via an RPubs link along with the actual.rmd file Also please submit the forecast which you will put in an Excel readable file.

Import and Tidy Data

Imported the data, made corrections to dates. Dates didn’t look like dates to tidy the data I converted date’s column into dates with an origin of 12/30/1899, in the format of year-month-date, I came up with the date by manually picking a date until the start date of the date was the same at the start date on the excel sheet which was 05/01/2009. Then I looked for missing data, the data has 19 missing values. Removed rows with missing data in the ATM column as may case missing leading data. Cash column had five rows with missing data, to keep the integrity of the data I imputed the missing values by using MICE Function predictive mean matching.

library(readxl)
library(tidyr)
library(ggplot2)
library(stats)
library(mice)
## Warning: package 'mice' was built under R version 4.4.1
## 
## Attaching package: 'mice'
## The following object is masked from 'package:stats':
## 
##     filter
## The following objects are masked from 'package:base':
## 
##     cbind, rbind
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(fpp3)
## Warning: package 'fpp3' was built under R version 4.4.1
## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tibble      3.2.1     ✔ tsibbledata 0.4.1
## ✔ lubridate   1.9.4     ✔ feasts      0.4.1
## ✔ tsibble     1.1.6     ✔ fable       0.4.1
## Warning: package 'lubridate' was built under R version 4.4.1
## Warning: package 'tsibble' was built under R version 4.4.1
## Warning: package 'feasts' was built under R version 4.4.1
## Warning: package 'fabletools' was built under R version 4.4.1
## Warning: package 'fable' was built under R version 4.4.1
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks mice::filter(), stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()
library(fable)
library(feasts)
library(writexl)
## Warning: package 'writexl' was built under R version 4.4.1
ATM_data<-read_excel("~/Downloads/ATM624Data.xlsx")
head(ATM_data)
## # A tibble: 6 × 3
##    DATE ATM    Cash
##   <dbl> <chr> <dbl>
## 1 39934 ATM1     96
## 2 39934 ATM2    107
## 3 39935 ATM1     82
## 4 39935 ATM2     89
## 5 39936 ATM1     85
## 6 39936 ATM2     90
#Corrections to dates
ATM_data$DATE<-as.Date(ATM_data$DATE, format="%Y-%m-%d", origin= "1899-12-30") #Manually origin dates to get the start date of 05/01/2009

summary(ATM_data) #shows 19 missing values
##       DATE                ATM                 Cash        
##  Min.   :2009-05-01   Length:1474        Min.   :    0.0  
##  1st Qu.:2009-08-01   Class :character   1st Qu.:    0.5  
##  Median :2009-11-01   Mode  :character   Median :   73.0  
##  Mean   :2009-10-31                      Mean   :  155.6  
##  3rd Qu.:2010-02-01                      3rd Qu.:  114.0  
##  Max.   :2010-05-14                      Max.   :10919.8  
##                                          NA's   :19
str(ATM_data) #check structure
## tibble [1,474 × 3] (S3: tbl_df/tbl/data.frame)
##  $ DATE: Date[1:1474], format: "2009-05-01" "2009-05-01" ...
##  $ ATM : chr [1:1474] "ATM1" "ATM2" "ATM1" "ATM2" ...
##  $ Cash: num [1:1474] 96 107 82 89 85 90 90 55 99 79 ...
#Made a subset list with only the rows that have missing data.
NaList<-ATM_data[!complete.cases(ATM_data),]
print(NaList)
## # A tibble: 19 × 3
##    DATE       ATM    Cash
##    <date>     <chr> <dbl>
##  1 2009-06-13 ATM1     NA
##  2 2009-06-16 ATM1     NA
##  3 2009-06-18 ATM2     NA
##  4 2009-06-22 ATM1     NA
##  5 2009-06-24 ATM2     NA
##  6 2010-05-01 <NA>     NA
##  7 2010-05-02 <NA>     NA
##  8 2010-05-03 <NA>     NA
##  9 2010-05-04 <NA>     NA
## 10 2010-05-05 <NA>     NA
## 11 2010-05-06 <NA>     NA
## 12 2010-05-07 <NA>     NA
## 13 2010-05-08 <NA>     NA
## 14 2010-05-09 <NA>     NA
## 15 2010-05-10 <NA>     NA
## 16 2010-05-11 <NA>     NA
## 17 2010-05-12 <NA>     NA
## 18 2010-05-13 <NA>     NA
## 19 2010-05-14 <NA>     NA
#Removed some missing data for ATM column
ATM_Clean<-ATM_data|>
  drop_na(ATM)

#Impute Missing data for Cash column
ATM_imputed<-mice(ATM_Clean, method = "pmm",#pmm=predictive mean matching
                m=5, #number of imputed dataset
                maxit=50,#number of max iteration
                seed=10) #set a seed
## 
##  iter imp variable
##   1   1  Cash
##   1   2  Cash
##   1   3  Cash
##   1   4  Cash
##   1   5  Cash
##   2   1  Cash
##   2   2  Cash
##   2   3  Cash
##   2   4  Cash
##   2   5  Cash
##   3   1  Cash
##   3   2  Cash
##   3   3  Cash
##   3   4  Cash
##   3   5  Cash
##   4   1  Cash
##   4   2  Cash
##   4   3  Cash
##   4   4  Cash
##   4   5  Cash
##   5   1  Cash
##   5   2  Cash
##   5   3  Cash
##   5   4  Cash
##   5   5  Cash
##   6   1  Cash
##   6   2  Cash
##   6   3  Cash
##   6   4  Cash
##   6   5  Cash
##   7   1  Cash
##   7   2  Cash
##   7   3  Cash
##   7   4  Cash
##   7   5  Cash
##   8   1  Cash
##   8   2  Cash
##   8   3  Cash
##   8   4  Cash
##   8   5  Cash
##   9   1  Cash
##   9   2  Cash
##   9   3  Cash
##   9   4  Cash
##   9   5  Cash
##   10   1  Cash
##   10   2  Cash
##   10   3  Cash
##   10   4  Cash
##   10   5  Cash
##   11   1  Cash
##   11   2  Cash
##   11   3  Cash
##   11   4  Cash
##   11   5  Cash
##   12   1  Cash
##   12   2  Cash
##   12   3  Cash
##   12   4  Cash
##   12   5  Cash
##   13   1  Cash
##   13   2  Cash
##   13   3  Cash
##   13   4  Cash
##   13   5  Cash
##   14   1  Cash
##   14   2  Cash
##   14   3  Cash
##   14   4  Cash
##   14   5  Cash
##   15   1  Cash
##   15   2  Cash
##   15   3  Cash
##   15   4  Cash
##   15   5  Cash
##   16   1  Cash
##   16   2  Cash
##   16   3  Cash
##   16   4  Cash
##   16   5  Cash
##   17   1  Cash
##   17   2  Cash
##   17   3  Cash
##   17   4  Cash
##   17   5  Cash
##   18   1  Cash
##   18   2  Cash
##   18   3  Cash
##   18   4  Cash
##   18   5  Cash
##   19   1  Cash
##   19   2  Cash
##   19   3  Cash
##   19   4  Cash
##   19   5  Cash
##   20   1  Cash
##   20   2  Cash
##   20   3  Cash
##   20   4  Cash
##   20   5  Cash
##   21   1  Cash
##   21   2  Cash
##   21   3  Cash
##   21   4  Cash
##   21   5  Cash
##   22   1  Cash
##   22   2  Cash
##   22   3  Cash
##   22   4  Cash
##   22   5  Cash
##   23   1  Cash
##   23   2  Cash
##   23   3  Cash
##   23   4  Cash
##   23   5  Cash
##   24   1  Cash
##   24   2  Cash
##   24   3  Cash
##   24   4  Cash
##   24   5  Cash
##   25   1  Cash
##   25   2  Cash
##   25   3  Cash
##   25   4  Cash
##   25   5  Cash
##   26   1  Cash
##   26   2  Cash
##   26   3  Cash
##   26   4  Cash
##   26   5  Cash
##   27   1  Cash
##   27   2  Cash
##   27   3  Cash
##   27   4  Cash
##   27   5  Cash
##   28   1  Cash
##   28   2  Cash
##   28   3  Cash
##   28   4  Cash
##   28   5  Cash
##   29   1  Cash
##   29   2  Cash
##   29   3  Cash
##   29   4  Cash
##   29   5  Cash
##   30   1  Cash
##   30   2  Cash
##   30   3  Cash
##   30   4  Cash
##   30   5  Cash
##   31   1  Cash
##   31   2  Cash
##   31   3  Cash
##   31   4  Cash
##   31   5  Cash
##   32   1  Cash
##   32   2  Cash
##   32   3  Cash
##   32   4  Cash
##   32   5  Cash
##   33   1  Cash
##   33   2  Cash
##   33   3  Cash
##   33   4  Cash
##   33   5  Cash
##   34   1  Cash
##   34   2  Cash
##   34   3  Cash
##   34   4  Cash
##   34   5  Cash
##   35   1  Cash
##   35   2  Cash
##   35   3  Cash
##   35   4  Cash
##   35   5  Cash
##   36   1  Cash
##   36   2  Cash
##   36   3  Cash
##   36   4  Cash
##   36   5  Cash
##   37   1  Cash
##   37   2  Cash
##   37   3  Cash
##   37   4  Cash
##   37   5  Cash
##   38   1  Cash
##   38   2  Cash
##   38   3  Cash
##   38   4  Cash
##   38   5  Cash
##   39   1  Cash
##   39   2  Cash
##   39   3  Cash
##   39   4  Cash
##   39   5  Cash
##   40   1  Cash
##   40   2  Cash
##   40   3  Cash
##   40   4  Cash
##   40   5  Cash
##   41   1  Cash
##   41   2  Cash
##   41   3  Cash
##   41   4  Cash
##   41   5  Cash
##   42   1  Cash
##   42   2  Cash
##   42   3  Cash
##   42   4  Cash
##   42   5  Cash
##   43   1  Cash
##   43   2  Cash
##   43   3  Cash
##   43   4  Cash
##   43   5  Cash
##   44   1  Cash
##   44   2  Cash
##   44   3  Cash
##   44   4  Cash
##   44   5  Cash
##   45   1  Cash
##   45   2  Cash
##   45   3  Cash
##   45   4  Cash
##   45   5  Cash
##   46   1  Cash
##   46   2  Cash
##   46   3  Cash
##   46   4  Cash
##   46   5  Cash
##   47   1  Cash
##   47   2  Cash
##   47   3  Cash
##   47   4  Cash
##   47   5  Cash
##   48   1  Cash
##   48   2  Cash
##   48   3  Cash
##   48   4  Cash
##   48   5  Cash
##   49   1  Cash
##   49   2  Cash
##   49   3  Cash
##   49   4  Cash
##   49   5  Cash
##   50   1  Cash
##   50   2  Cash
##   50   3  Cash
##   50   4  Cash
##   50   5  Cash
## Warning: Number of logged events: 1
#Check imputation
atm<-complete(ATM_imputed,1)

Time Series creation and Plotting

#Create a time series tibble
atm<-atm|>
  rename(Date=DATE)|>
  as_tsibble(index = Date, key= ATM)

#Check for missing data
summary(atm)
##       Date                ATM                 Cash        
##  Min.   :2009-05-01   Length:1460        Min.   :    0.0  
##  1st Qu.:2009-07-31   Class :character   1st Qu.:    0.0  
##  Median :2009-10-30   Mode  :character   Median :   73.0  
##  Mean   :2009-10-30                      Mean   :  155.2  
##  3rd Qu.:2010-01-29                      3rd Qu.:  114.0  
##  Max.   :2010-04-30                      Max.   :10919.8
#Plot
atm|>
  autoplot(Cash)+
  facet_wrap(~ATM, scales="free", nrow=4)+
  labs(title = "ATM time series")

Forecasting

ATM 1 and ATM 2 seem to have seasonality no trend, I would use ARIMA +seasonal. I did a model fit for ATM 1 and ATM 2 using ARIMA, Multiplicative, Damped Multiplicative, and, Seasonal naive, based on the conditions both ATM meet. After testing the model ARIMA had the best fit as it had the lowest RSME value (less error) and the same applied with ATM 2. For ATM 1 auto ARIMA used ARIMA(0,0,1)(0,1,1), ATM2 the auto ARIMA used ARIMA(2,0,2)(0,1,1). ATM 3 seems to only have a cash withdrawal after April 2010, I would say the best model to use for predicting May 2010 values would be to use a exponential smoothing. ATM 4 seems to a high peak in Feb 2010 which seems to be a clear outlier as the other withdraws way lower for the other dates, which can be dealt with by removing the value and imputing a new value using MICE and obatining a mean value that would be more reasonable for that date with the maxed out outlier.

#ATM1
ATM1<-atm|> #Test which model works best
  filter(ATM=="ATM1")|>
  model("ARIMA"= ARIMA(Cash),
        "Multiplicative" = ETS(Cash~ error("M")+ trend("A")+season("M")),
        "Damped Multiplicative"= ETS(Cash~error("M")+ trend("Ad")+season("M")),
        "Seasonal naive"=SNAIVE(Cash))


accuracy(ATM1)|>
  select(".model", "RMSE")
## # A tibble: 4 × 2
##   .model                 RMSE
##   <chr>                 <dbl>
## 1 ARIMA                  25.1
## 2 Multiplicative         26.5
## 3 Damped Multiplicative  26.3
## 4 Seasonal naive         30.2
#which ARIMA was selected from auto ARIMA
ATM1|>
  select(.model="ARIMA")|>
  report()
## Series: Cash 
## Model: ARIMA(0,0,1)(0,1,1)[7] 
## 
## Coefficients:
##          ma1     sma1
##       0.1092  -0.6819
## s.e.  0.0557   0.0437
## 
## sigma^2 estimated as 647.3:  log likelihood=-1667.8
## AIC=3341.61   AICc=3341.68   BIC=3353.25
ATM1_forecast<- ATM1|>
  forecast(h="31 days")|>
  filter(.model=="ARIMA")

ATM1_forecast|>
  autoplot(atm)+
  labs(y="Cash", title="ATM 1 Cash Withdrawal Forecast for May 2010", x="Date")

#ATM2
ATM2<-atm|> #Test which model works best
  filter(ATM=="ATM2")|>
  model("ARIMA"= ARIMA(Cash),
        "Multiplicative" = ETS(Cash~ error("M")+ trend("A")+season("M")),
        "Damped Multiplicative"= ETS(Cash~error("M")+ trend("Ad")+season("M")),
        "Seasonal naive"=SNAIVE(Cash))


accuracy(ATM2)|>
  select(".model", "RMSE")
## # A tibble: 4 × 2
##   .model                 RMSE
##   <chr>                 <dbl>
## 1 ARIMA                  24.5
## 2 Multiplicative         34.8
## 3 Damped Multiplicative  34.3
## 4 Seasonal naive         30.6
#which ARIMA was selected from auto ARIMA
ATM2|>
  select(.model="ARIMA")|>
  report()
## Series: Cash 
## Model: ARIMA(2,0,2)(0,1,1)[7] 
## 
## Coefficients:
##           ar1      ar2     ma1     ma2     sma1
##       -0.4319  -0.9182  0.4715  0.8154  -0.7567
## s.e.   0.0548   0.0395  0.0859  0.0544   0.0384
## 
## sigma^2 estimated as 620:  log likelihood=-1658.81
## AIC=3329.62   AICc=3329.86   BIC=3352.91
ATM2_forecast<- ATM2|>
  forecast(h="31 days")|>
  filter(.model=="ARIMA")

ATM2_forecast|>
  autoplot(atm)+
  labs(y="Cash", title="ATM 1 Cash Withdrawal Forecast for May 2010", x="Date")

#ATM3
atm3<-atm|>
  filter(ATM=="ATM3")

#ETS
ATM3_Forecast<-atm3|>
  model(ETS(Cash))

ATM3_Forecast<-ATM3_Forecast|>
  forecast(h="31 days")

ATM3_Forecast|>
  autoplot(atm)+
  labs(y="Cash", title="ATM 3 Cash Withdrawal Forecast for May 2010", x="Date")

#ATM4
#Handling outlier using MICE
atm <- atm %>%
  mutate(
    Cash = ifelse(ATM == "ATM4" & Date == "2010-02-09", NA, Cash)  # Remove outlier 
  )

ATM4_imputed<-atm|>
  filter(ATM=="ATM4")|>
  mice(method = "pmm", #pmm=predictive mean matching
       m=5, 
       seed=10) 
## 
##  iter imp variable
##   1   1  Cash
##   1   2  Cash
##   1   3  Cash
##   1   4  Cash
##   1   5  Cash
##   2   1  Cash
##   2   2  Cash
##   2   3  Cash
##   2   4  Cash
##   2   5  Cash
##   3   1  Cash
##   3   2  Cash
##   3   3  Cash
##   3   4  Cash
##   3   5  Cash
##   4   1  Cash
##   4   2  Cash
##   4   3  Cash
##   4   4  Cash
##   4   5  Cash
##   5   1  Cash
##   5   2  Cash
##   5   3  Cash
##   5   4  Cash
##   5   5  Cash
## Warning: Number of logged events: 1
ATM4<-complete(ATM4_imputed,1)

ATM4_forecast<-ATM4|>
  as_tsibble(index = Date, key= ATM)|>
  model(ARIMA(Cash))

ATM4_forecast<-ATM4_forecast|>
  forecast(h="31 days")

ATM4_forecast|>
  autoplot(atm)+
  labs(title = "ATM 4 Cash Withdrawals Forecast for May 2010", y="Cash", x="Date")

Part A Conclusion

For each ATM different models were applied to forecast the withdrawals for each ATM in May 2010, due to the different patterns each ATM exhibited. The forecast predicts that ATM 3 will have a constant withdrawal for the month of May with no peaks, ATM 1 and 2 seem to a seasonal pattern but ATM 2 seems to have a higher cash withdrawal, overall ATM 4 was predicted to have the highest cash withdrawal that seems to decrease in variance and have a constant withdrawal as the data gets closer to the end of May.

Part B – Forecasting Power, ResidentialCustomerForecastLoad-624.xlsx

Part B consists of a simple dataset of residential power usage for January 1998 until December 2013. Your assignment is to model these data and a monthly forecast for 2014. The data is given in a single file. The variable ‘KWH’ is power consumption in Kilowatt hours, the rest is straight forward. Add this to your existing files above.

Imported the data and made corrects to column names. The data is monthly, therefore I changed the “YYYY-MMM” column in monthly format and renmaed the column to “Month”. Checked for missing data, KWH was missing for September 2008. I decided to use MICE to imute for the missing value to avoid losing the integrity of the data and to get a better prediction for the monthly forecast of 2014.

Residential<-read_excel("~/Downloads/ResidentialCustomerForecastLoad-624.xlsx")
head(Residential)
## # A tibble: 6 × 3
##   CaseSequence `YYYY-MMM`     KWH
##          <dbl> <chr>        <dbl>
## 1          733 1998-Jan   6862583
## 2          734 1998-Feb   5838198
## 3          735 1998-Mar   5420658
## 4          736 1998-Apr   5010364
## 5          737 1998-May   4665377
## 6          738 1998-Jun   6467147
#corrections to data
Residential_data<-Residential|>
  rename(Month="YYYY-MMM")|>
  mutate(Month=yearmonth(Month),
         KWH= as.numeric(KWH))|>
  as_tsibble(index=Month, key=KWH)

#Checked for missing data for KWH
summary(Residential_data$KWH)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
##   770523  5429912  6283324  6502475  7620524 10655730        1
NaList2<-Residential_data[!complete.cases(Residential_data),]
print(NaList2)#Only missing data for sept 2008
## # A tsibble: 1 x 3 [1M]
## # Key:       KWH [1]
##   CaseSequence    Month   KWH
##          <dbl>    <mth> <dbl>
## 1          861 2008 Sep    NA
head(Residential)
## # A tibble: 6 × 3
##   CaseSequence `YYYY-MMM`     KWH
##          <dbl> <chr>        <dbl>
## 1          733 1998-Jan   6862583
## 2          734 1998-Feb   5838198
## 3          735 1998-Mar   5420658
## 4          736 1998-Apr   5010364
## 5          737 1998-May   4665377
## 6          738 1998-Jun   6467147
RD_imputed<-Residential_data|>
  mice(method = "pmm", #pmm=predictive mean matching
       m=5, 
       seed=10) 
## 
##  iter imp variable
##   1   1  KWH
##   1   2  KWH
##   1   3  KWH
##   1   4  KWH
##   1   5  KWH
##   2   1  KWH
##   2   2  KWH
##   2   3  KWH
##   2   4  KWH
##   2   5  KWH
##   3   1  KWH
##   3   2  KWH
##   3   3  KWH
##   3   4  KWH
##   3   5  KWH
##   4   1  KWH
##   4   2  KWH
##   4   3  KWH
##   4   4  KWH
##   4   5  KWH
##   5   1  KWH
##   5   2  KWH
##   5   3  KWH
##   5   4  KWH
##   5   5  KWH
## Warning: Number of logged events: 1
Residential_data<-complete(RD_imputed,1)


Residential_data<-Residential_data|>
  as_tsibble(index = Month)

Explore time serie

The plot of the residential time series shows seasonality with a very slight upwards trend, there is also a decreasing peak a little after January 2010, which may just be an outlier. To deal with the outlier I will turn the value into a missing value and them impute a value using MICE. I believe getting the mean instead of just removing the value would allow me to keep the time series flowing and without hurting the prediction as the mean will be used to the value. Using ACF and PACF.

Residential_data|>
  autoplot(KWH)+
  labs(title = "Residential time series")

Residential_data <- Residential_data %>%
  mutate(
    KWH = replace(KWH, KWH == "770523", NA)  # Remove outlier 
  )

RD2_imputed<-Residential_data|>
  mice(method = "pmm", #pmm=predictive mean matching
       m=5, 
       seed=10) 
## 
##  iter imp variable
##   1   1  KWH
##   1   2  KWH
##   1   3  KWH
##   1   4  KWH
##   1   5  KWH
##   2   1  KWH
##   2   2  KWH
##   2   3  KWH
##   2   4  KWH
##   2   5  KWH
##   3   1  KWH
##   3   2  KWH
##   3   3  KWH
##   3   4  KWH
##   3   5  KWH
##   4   1  KWH
##   4   2  KWH
##   4   3  KWH
##   4   4  KWH
##   4   5  KWH
##   5   1  KWH
##   5   2  KWH
##   5   3  KWH
##   5   4  KWH
##   5   5  KWH
## Warning: Number of logged events: 1
Residential_data<-complete(RD2_imputed,1)

Residential_data<-Residential_data|>
    as_tsibble(index = Month)

#STL plot after imputation
Residential_data|>
  model(STL(KWH~ season(window="periodic"), robust= TRUE))|>
  components()|>
  autoplot()|>
  labs(title = "Residential STL Decomposition plot")
## [[1]]

## 
## $title
## [1] "Residential STL Decomposition plot"
## 
## attr(,"class")
## [1] "labels"
#Lambda for box-cox
lambda <- Residential_data|>
  features(KWH, features = guerrero) %>%
  pull(lambda_guerrero)

Forecasting

I fit the data into three different models ARIMA, ETS, and Seasonal Naive to see which would best for forecasting a prediction for the year of 2014. ARIMA was the best model as it had the lowest RSME. The auto ARIMA used ARIMA ARIMA(1,0,0)(2,1,0)[12] with a drift and a box-cox transformation of -0.212065. I used box-cox to take care of the residuals. The residual plot shows that the residuals were just noise.

Residential_forecast<-Residential_data|>
  as_tsibble(index = Month)|>
  model("ARIMA"=ARIMA(box_cox(KWH,lambda)),
        "ETS"=ETS(box_cox(KWH, lambda)),
        "Seasonal Naive"= SNAIVE(box_cox(KWH, lambda)))

accuracy(Residential_forecast)|>
  select(".model", "RMSE")
## # A tibble: 3 × 2
##   .model            RMSE
##   <chr>            <dbl>
## 1 ARIMA          663047.
## 2 ETS            644420.
## 3 Seasonal Naive 872607.
#which ARIMA was selected from auto ARIMA
Residential_forecast|>
  select(.model="ARIMA")|>
  report()
## Series: KWH 
## Model: ARIMA(1,0,0)(2,1,0)[12] w/ drift 
## Transformation: box_cox(KWH, lambda) 
## 
## Coefficients:
##          ar1     sar1     sar2  constant
##       0.2606  -0.7411  -0.3812     9e-04
## s.e.  0.0747   0.0730   0.0743     3e-04
## 
## sigma^2 estimated as 1.329e-05:  log likelihood=763.32
## AIC=-1516.65   AICc=-1516.3   BIC=-1500.68
#ARIMA had the lowest RSME
Residential_forecast<-Residential_forecast|>
  forecast(h="12 months")|>
  filter(.model=="ARIMA")

Residential_data |>
  model(ARIMA(KWH))|>
  gg_tsresiduals()

Residential_forecast|>
  autoplot(Residential_data)+
  labs(title = "Power Usage Forecast for 2014", y="Power Usage (KWH)", x="Month")

Part B Conclusion

The ARIMA model was the best model to use for the residential forecast of 2014. For the outlier, the a new value was imputed and for the residuals a box-cox transformation was applied. Looking at the prediction of 2014 the power usage seems to be at a constant rate with no real big trend.

#Final Forecast correction
Residential_forecast <-Residential_forecast|>
  as_tsibble(index = Month, key= KWH)|>
  select( Month, .mean) %>%
  rename(KWH = .mean)|>
  mutate(CaseSequence=925:936)|> #last case sequence was at 924
  relocate(CaseSequence)

Residential_forecast|>
  autoplot(KWH)+
  labs(title = "Power Usage 2014 Forecast")

write_xlsx(list("ATM_forecasts_May_2010"= ATMC,
                "Power_Forecast_2014"= Residential_forecast), "ATM_Residential_Forecast.xlsx") #created excel file with both part A and B forecasts