This project consists of 3 parts - two required and one bonus and is worth 15% of your grade.
In part A, I want you to forecast how much cash is taken out of 4 different ATM machines for May 2010. The data is given in a single file. The variable ‘Cash’ is provided in hundreds of dollars, other than that it is straight forward. I am being somewhat ambiguous on purpose to make this have a little more business feeling. Explain and demonstrate your process, techniques used and not used, and your actual forecast. I am giving you data via an excel file, please provide your written report on your findings, visuals, discussion and your R code via an RPubs link along with the actual.rmd file Also please submit the forecast which you will put in an Excel readable file.
Imported the data, made corrections to dates. Dates didn’t look like dates to tidy the data I converted date’s column into dates with an origin of 12/30/1899, in the format of year-month-date, I came up with the date by manually picking a date until the start date of the date was the same at the start date on the excel sheet which was 05/01/2009. Then I looked for missing data, the data has 19 missing values. Removed rows with missing data in the ATM column as may case missing leading data. Cash column had five rows with missing data, to keep the integrity of the data I imputed the missing values by using MICE Function predictive mean matching.
library(readxl)
library(tidyr)
library(ggplot2)
library(stats)
library(mice)
## Warning: package 'mice' was built under R version 4.4.1
##
## Attaching package: 'mice'
## The following object is masked from 'package:stats':
##
## filter
## The following objects are masked from 'package:base':
##
## cbind, rbind
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(fpp3)
## Warning: package 'fpp3' was built under R version 4.4.1
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tibble 3.2.1 ✔ tsibbledata 0.4.1
## ✔ lubridate 1.9.4 ✔ feasts 0.4.1
## ✔ tsibble 1.1.6 ✔ fable 0.4.1
## Warning: package 'lubridate' was built under R version 4.4.1
## Warning: package 'tsibble' was built under R version 4.4.1
## Warning: package 'feasts' was built under R version 4.4.1
## Warning: package 'fabletools' was built under R version 4.4.1
## Warning: package 'fable' was built under R version 4.4.1
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks mice::filter(), stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
library(fable)
library(feasts)
library(writexl)
## Warning: package 'writexl' was built under R version 4.4.1
ATM_data<-read_excel("~/Downloads/ATM624Data.xlsx")
head(ATM_data)
## # A tibble: 6 × 3
## DATE ATM Cash
## <dbl> <chr> <dbl>
## 1 39934 ATM1 96
## 2 39934 ATM2 107
## 3 39935 ATM1 82
## 4 39935 ATM2 89
## 5 39936 ATM1 85
## 6 39936 ATM2 90
#Corrections to dates
ATM_data$DATE<-as.Date(ATM_data$DATE, format="%Y-%m-%d", origin= "1899-12-30") #Manually origin dates to get the start date of 05/01/2009
summary(ATM_data) #shows 19 missing values
## DATE ATM Cash
## Min. :2009-05-01 Length:1474 Min. : 0.0
## 1st Qu.:2009-08-01 Class :character 1st Qu.: 0.5
## Median :2009-11-01 Mode :character Median : 73.0
## Mean :2009-10-31 Mean : 155.6
## 3rd Qu.:2010-02-01 3rd Qu.: 114.0
## Max. :2010-05-14 Max. :10919.8
## NA's :19
str(ATM_data) #check structure
## tibble [1,474 × 3] (S3: tbl_df/tbl/data.frame)
## $ DATE: Date[1:1474], format: "2009-05-01" "2009-05-01" ...
## $ ATM : chr [1:1474] "ATM1" "ATM2" "ATM1" "ATM2" ...
## $ Cash: num [1:1474] 96 107 82 89 85 90 90 55 99 79 ...
#Made a subset list with only the rows that have missing data.
NaList<-ATM_data[!complete.cases(ATM_data),]
print(NaList)
## # A tibble: 19 × 3
## DATE ATM Cash
## <date> <chr> <dbl>
## 1 2009-06-13 ATM1 NA
## 2 2009-06-16 ATM1 NA
## 3 2009-06-18 ATM2 NA
## 4 2009-06-22 ATM1 NA
## 5 2009-06-24 ATM2 NA
## 6 2010-05-01 <NA> NA
## 7 2010-05-02 <NA> NA
## 8 2010-05-03 <NA> NA
## 9 2010-05-04 <NA> NA
## 10 2010-05-05 <NA> NA
## 11 2010-05-06 <NA> NA
## 12 2010-05-07 <NA> NA
## 13 2010-05-08 <NA> NA
## 14 2010-05-09 <NA> NA
## 15 2010-05-10 <NA> NA
## 16 2010-05-11 <NA> NA
## 17 2010-05-12 <NA> NA
## 18 2010-05-13 <NA> NA
## 19 2010-05-14 <NA> NA
#Removed some missing data for ATM column
ATM_Clean<-ATM_data|>
drop_na(ATM)
#Impute Missing data for Cash column
ATM_imputed<-mice(ATM_Clean, method = "pmm",#pmm=predictive mean matching
m=5, #number of imputed dataset
maxit=50,#number of max iteration
seed=10) #set a seed
##
## iter imp variable
## 1 1 Cash
## 1 2 Cash
## 1 3 Cash
## 1 4 Cash
## 1 5 Cash
## 2 1 Cash
## 2 2 Cash
## 2 3 Cash
## 2 4 Cash
## 2 5 Cash
## 3 1 Cash
## 3 2 Cash
## 3 3 Cash
## 3 4 Cash
## 3 5 Cash
## 4 1 Cash
## 4 2 Cash
## 4 3 Cash
## 4 4 Cash
## 4 5 Cash
## 5 1 Cash
## 5 2 Cash
## 5 3 Cash
## 5 4 Cash
## 5 5 Cash
## 6 1 Cash
## 6 2 Cash
## 6 3 Cash
## 6 4 Cash
## 6 5 Cash
## 7 1 Cash
## 7 2 Cash
## 7 3 Cash
## 7 4 Cash
## 7 5 Cash
## 8 1 Cash
## 8 2 Cash
## 8 3 Cash
## 8 4 Cash
## 8 5 Cash
## 9 1 Cash
## 9 2 Cash
## 9 3 Cash
## 9 4 Cash
## 9 5 Cash
## 10 1 Cash
## 10 2 Cash
## 10 3 Cash
## 10 4 Cash
## 10 5 Cash
## 11 1 Cash
## 11 2 Cash
## 11 3 Cash
## 11 4 Cash
## 11 5 Cash
## 12 1 Cash
## 12 2 Cash
## 12 3 Cash
## 12 4 Cash
## 12 5 Cash
## 13 1 Cash
## 13 2 Cash
## 13 3 Cash
## 13 4 Cash
## 13 5 Cash
## 14 1 Cash
## 14 2 Cash
## 14 3 Cash
## 14 4 Cash
## 14 5 Cash
## 15 1 Cash
## 15 2 Cash
## 15 3 Cash
## 15 4 Cash
## 15 5 Cash
## 16 1 Cash
## 16 2 Cash
## 16 3 Cash
## 16 4 Cash
## 16 5 Cash
## 17 1 Cash
## 17 2 Cash
## 17 3 Cash
## 17 4 Cash
## 17 5 Cash
## 18 1 Cash
## 18 2 Cash
## 18 3 Cash
## 18 4 Cash
## 18 5 Cash
## 19 1 Cash
## 19 2 Cash
## 19 3 Cash
## 19 4 Cash
## 19 5 Cash
## 20 1 Cash
## 20 2 Cash
## 20 3 Cash
## 20 4 Cash
## 20 5 Cash
## 21 1 Cash
## 21 2 Cash
## 21 3 Cash
## 21 4 Cash
## 21 5 Cash
## 22 1 Cash
## 22 2 Cash
## 22 3 Cash
## 22 4 Cash
## 22 5 Cash
## 23 1 Cash
## 23 2 Cash
## 23 3 Cash
## 23 4 Cash
## 23 5 Cash
## 24 1 Cash
## 24 2 Cash
## 24 3 Cash
## 24 4 Cash
## 24 5 Cash
## 25 1 Cash
## 25 2 Cash
## 25 3 Cash
## 25 4 Cash
## 25 5 Cash
## 26 1 Cash
## 26 2 Cash
## 26 3 Cash
## 26 4 Cash
## 26 5 Cash
## 27 1 Cash
## 27 2 Cash
## 27 3 Cash
## 27 4 Cash
## 27 5 Cash
## 28 1 Cash
## 28 2 Cash
## 28 3 Cash
## 28 4 Cash
## 28 5 Cash
## 29 1 Cash
## 29 2 Cash
## 29 3 Cash
## 29 4 Cash
## 29 5 Cash
## 30 1 Cash
## 30 2 Cash
## 30 3 Cash
## 30 4 Cash
## 30 5 Cash
## 31 1 Cash
## 31 2 Cash
## 31 3 Cash
## 31 4 Cash
## 31 5 Cash
## 32 1 Cash
## 32 2 Cash
## 32 3 Cash
## 32 4 Cash
## 32 5 Cash
## 33 1 Cash
## 33 2 Cash
## 33 3 Cash
## 33 4 Cash
## 33 5 Cash
## 34 1 Cash
## 34 2 Cash
## 34 3 Cash
## 34 4 Cash
## 34 5 Cash
## 35 1 Cash
## 35 2 Cash
## 35 3 Cash
## 35 4 Cash
## 35 5 Cash
## 36 1 Cash
## 36 2 Cash
## 36 3 Cash
## 36 4 Cash
## 36 5 Cash
## 37 1 Cash
## 37 2 Cash
## 37 3 Cash
## 37 4 Cash
## 37 5 Cash
## 38 1 Cash
## 38 2 Cash
## 38 3 Cash
## 38 4 Cash
## 38 5 Cash
## 39 1 Cash
## 39 2 Cash
## 39 3 Cash
## 39 4 Cash
## 39 5 Cash
## 40 1 Cash
## 40 2 Cash
## 40 3 Cash
## 40 4 Cash
## 40 5 Cash
## 41 1 Cash
## 41 2 Cash
## 41 3 Cash
## 41 4 Cash
## 41 5 Cash
## 42 1 Cash
## 42 2 Cash
## 42 3 Cash
## 42 4 Cash
## 42 5 Cash
## 43 1 Cash
## 43 2 Cash
## 43 3 Cash
## 43 4 Cash
## 43 5 Cash
## 44 1 Cash
## 44 2 Cash
## 44 3 Cash
## 44 4 Cash
## 44 5 Cash
## 45 1 Cash
## 45 2 Cash
## 45 3 Cash
## 45 4 Cash
## 45 5 Cash
## 46 1 Cash
## 46 2 Cash
## 46 3 Cash
## 46 4 Cash
## 46 5 Cash
## 47 1 Cash
## 47 2 Cash
## 47 3 Cash
## 47 4 Cash
## 47 5 Cash
## 48 1 Cash
## 48 2 Cash
## 48 3 Cash
## 48 4 Cash
## 48 5 Cash
## 49 1 Cash
## 49 2 Cash
## 49 3 Cash
## 49 4 Cash
## 49 5 Cash
## 50 1 Cash
## 50 2 Cash
## 50 3 Cash
## 50 4 Cash
## 50 5 Cash
## Warning: Number of logged events: 1
#Check imputation
atm<-complete(ATM_imputed,1)
#Create a time series tibble
atm<-atm|>
rename(Date=DATE)|>
as_tsibble(index = Date, key= ATM)
#Check for missing data
summary(atm)
## Date ATM Cash
## Min. :2009-05-01 Length:1460 Min. : 0.0
## 1st Qu.:2009-07-31 Class :character 1st Qu.: 0.0
## Median :2009-10-30 Mode :character Median : 73.0
## Mean :2009-10-30 Mean : 155.2
## 3rd Qu.:2010-01-29 3rd Qu.: 114.0
## Max. :2010-04-30 Max. :10919.8
#Plot
atm|>
autoplot(Cash)+
facet_wrap(~ATM, scales="free", nrow=4)+
labs(title = "ATM time series")
ATM 1 and ATM 2 seem to have seasonality no trend, I would use ARIMA +seasonal. I did a model fit for ATM 1 and ATM 2 using ARIMA, Multiplicative, Damped Multiplicative, and, Seasonal naive, based on the conditions both ATM meet. After testing the model ARIMA had the best fit as it had the lowest RSME value (less error) and the same applied with ATM 2. For ATM 1 auto ARIMA used ARIMA(0,0,1)(0,1,1), ATM2 the auto ARIMA used ARIMA(2,0,2)(0,1,1). ATM 3 seems to only have a cash withdrawal after April 2010, I would say the best model to use for predicting May 2010 values would be to use a exponential smoothing. ATM 4 seems to a high peak in Feb 2010 which seems to be a clear outlier as the other withdraws way lower for the other dates, which can be dealt with by removing the value and imputing a new value using MICE and obatining a mean value that would be more reasonable for that date with the maxed out outlier.
#ATM1
ATM1<-atm|> #Test which model works best
filter(ATM=="ATM1")|>
model("ARIMA"= ARIMA(Cash),
"Multiplicative" = ETS(Cash~ error("M")+ trend("A")+season("M")),
"Damped Multiplicative"= ETS(Cash~error("M")+ trend("Ad")+season("M")),
"Seasonal naive"=SNAIVE(Cash))
accuracy(ATM1)|>
select(".model", "RMSE")
## # A tibble: 4 × 2
## .model RMSE
## <chr> <dbl>
## 1 ARIMA 25.1
## 2 Multiplicative 26.5
## 3 Damped Multiplicative 26.3
## 4 Seasonal naive 30.2
#which ARIMA was selected from auto ARIMA
ATM1|>
select(.model="ARIMA")|>
report()
## Series: Cash
## Model: ARIMA(0,0,1)(0,1,1)[7]
##
## Coefficients:
## ma1 sma1
## 0.1092 -0.6819
## s.e. 0.0557 0.0437
##
## sigma^2 estimated as 647.3: log likelihood=-1667.8
## AIC=3341.61 AICc=3341.68 BIC=3353.25
ATM1_forecast<- ATM1|>
forecast(h="31 days")|>
filter(.model=="ARIMA")
ATM1_forecast|>
autoplot(atm)+
labs(y="Cash", title="ATM 1 Cash Withdrawal Forecast for May 2010", x="Date")
#ATM2
ATM2<-atm|> #Test which model works best
filter(ATM=="ATM2")|>
model("ARIMA"= ARIMA(Cash),
"Multiplicative" = ETS(Cash~ error("M")+ trend("A")+season("M")),
"Damped Multiplicative"= ETS(Cash~error("M")+ trend("Ad")+season("M")),
"Seasonal naive"=SNAIVE(Cash))
accuracy(ATM2)|>
select(".model", "RMSE")
## # A tibble: 4 × 2
## .model RMSE
## <chr> <dbl>
## 1 ARIMA 24.5
## 2 Multiplicative 34.8
## 3 Damped Multiplicative 34.3
## 4 Seasonal naive 30.6
#which ARIMA was selected from auto ARIMA
ATM2|>
select(.model="ARIMA")|>
report()
## Series: Cash
## Model: ARIMA(2,0,2)(0,1,1)[7]
##
## Coefficients:
## ar1 ar2 ma1 ma2 sma1
## -0.4319 -0.9182 0.4715 0.8154 -0.7567
## s.e. 0.0548 0.0395 0.0859 0.0544 0.0384
##
## sigma^2 estimated as 620: log likelihood=-1658.81
## AIC=3329.62 AICc=3329.86 BIC=3352.91
ATM2_forecast<- ATM2|>
forecast(h="31 days")|>
filter(.model=="ARIMA")
ATM2_forecast|>
autoplot(atm)+
labs(y="Cash", title="ATM 1 Cash Withdrawal Forecast for May 2010", x="Date")
#ATM3
atm3<-atm|>
filter(ATM=="ATM3")
#ETS
ATM3_Forecast<-atm3|>
model(ETS(Cash))
ATM3_Forecast<-ATM3_Forecast|>
forecast(h="31 days")
ATM3_Forecast|>
autoplot(atm)+
labs(y="Cash", title="ATM 3 Cash Withdrawal Forecast for May 2010", x="Date")
#ATM4
#Handling outlier using MICE
atm <- atm %>%
mutate(
Cash = ifelse(ATM == "ATM4" & Date == "2010-02-09", NA, Cash) # Remove outlier
)
ATM4_imputed<-atm|>
filter(ATM=="ATM4")|>
mice(method = "pmm", #pmm=predictive mean matching
m=5,
seed=10)
##
## iter imp variable
## 1 1 Cash
## 1 2 Cash
## 1 3 Cash
## 1 4 Cash
## 1 5 Cash
## 2 1 Cash
## 2 2 Cash
## 2 3 Cash
## 2 4 Cash
## 2 5 Cash
## 3 1 Cash
## 3 2 Cash
## 3 3 Cash
## 3 4 Cash
## 3 5 Cash
## 4 1 Cash
## 4 2 Cash
## 4 3 Cash
## 4 4 Cash
## 4 5 Cash
## 5 1 Cash
## 5 2 Cash
## 5 3 Cash
## 5 4 Cash
## 5 5 Cash
## Warning: Number of logged events: 1
ATM4<-complete(ATM4_imputed,1)
ATM4_forecast<-ATM4|>
as_tsibble(index = Date, key= ATM)|>
model(ARIMA(Cash))
ATM4_forecast<-ATM4_forecast|>
forecast(h="31 days")
ATM4_forecast|>
autoplot(atm)+
labs(title = "ATM 4 Cash Withdrawals Forecast for May 2010", y="Cash", x="Date")
For each ATM different models were applied to forecast the withdrawals for each ATM in May 2010, due to the different patterns each ATM exhibited. The forecast predicts that ATM 3 will have a constant withdrawal for the month of May with no peaks, ATM 1 and 2 seem to a seasonal pattern but ATM 2 seems to have a higher cash withdrawal, overall ATM 4 was predicted to have the highest cash withdrawal that seems to decrease in variance and have a constant withdrawal as the data gets closer to the end of May.
Part B consists of a simple dataset of residential power usage for January 1998 until December 2013. Your assignment is to model these data and a monthly forecast for 2014. The data is given in a single file. The variable ‘KWH’ is power consumption in Kilowatt hours, the rest is straight forward. Add this to your existing files above.
Imported the data and made corrects to column names. The data is monthly, therefore I changed the “YYYY-MMM” column in monthly format and renmaed the column to “Month”. Checked for missing data, KWH was missing for September 2008. I decided to use MICE to imute for the missing value to avoid losing the integrity of the data and to get a better prediction for the monthly forecast of 2014.
Residential<-read_excel("~/Downloads/ResidentialCustomerForecastLoad-624.xlsx")
head(Residential)
## # A tibble: 6 × 3
## CaseSequence `YYYY-MMM` KWH
## <dbl> <chr> <dbl>
## 1 733 1998-Jan 6862583
## 2 734 1998-Feb 5838198
## 3 735 1998-Mar 5420658
## 4 736 1998-Apr 5010364
## 5 737 1998-May 4665377
## 6 738 1998-Jun 6467147
#corrections to data
Residential_data<-Residential|>
rename(Month="YYYY-MMM")|>
mutate(Month=yearmonth(Month),
KWH= as.numeric(KWH))|>
as_tsibble(index=Month, key=KWH)
#Checked for missing data for KWH
summary(Residential_data$KWH)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 770523 5429912 6283324 6502475 7620524 10655730 1
NaList2<-Residential_data[!complete.cases(Residential_data),]
print(NaList2)#Only missing data for sept 2008
## # A tsibble: 1 x 3 [1M]
## # Key: KWH [1]
## CaseSequence Month KWH
## <dbl> <mth> <dbl>
## 1 861 2008 Sep NA
head(Residential)
## # A tibble: 6 × 3
## CaseSequence `YYYY-MMM` KWH
## <dbl> <chr> <dbl>
## 1 733 1998-Jan 6862583
## 2 734 1998-Feb 5838198
## 3 735 1998-Mar 5420658
## 4 736 1998-Apr 5010364
## 5 737 1998-May 4665377
## 6 738 1998-Jun 6467147
RD_imputed<-Residential_data|>
mice(method = "pmm", #pmm=predictive mean matching
m=5,
seed=10)
##
## iter imp variable
## 1 1 KWH
## 1 2 KWH
## 1 3 KWH
## 1 4 KWH
## 1 5 KWH
## 2 1 KWH
## 2 2 KWH
## 2 3 KWH
## 2 4 KWH
## 2 5 KWH
## 3 1 KWH
## 3 2 KWH
## 3 3 KWH
## 3 4 KWH
## 3 5 KWH
## 4 1 KWH
## 4 2 KWH
## 4 3 KWH
## 4 4 KWH
## 4 5 KWH
## 5 1 KWH
## 5 2 KWH
## 5 3 KWH
## 5 4 KWH
## 5 5 KWH
## Warning: Number of logged events: 1
Residential_data<-complete(RD_imputed,1)
Residential_data<-Residential_data|>
as_tsibble(index = Month)
The plot of the residential time series shows seasonality with a very slight upwards trend, there is also a decreasing peak a little after January 2010, which may just be an outlier. To deal with the outlier I will turn the value into a missing value and them impute a value using MICE. I believe getting the mean instead of just removing the value would allow me to keep the time series flowing and without hurting the prediction as the mean will be used to the value. Using ACF and PACF.
Residential_data|>
autoplot(KWH)+
labs(title = "Residential time series")
Residential_data <- Residential_data %>%
mutate(
KWH = replace(KWH, KWH == "770523", NA) # Remove outlier
)
RD2_imputed<-Residential_data|>
mice(method = "pmm", #pmm=predictive mean matching
m=5,
seed=10)
##
## iter imp variable
## 1 1 KWH
## 1 2 KWH
## 1 3 KWH
## 1 4 KWH
## 1 5 KWH
## 2 1 KWH
## 2 2 KWH
## 2 3 KWH
## 2 4 KWH
## 2 5 KWH
## 3 1 KWH
## 3 2 KWH
## 3 3 KWH
## 3 4 KWH
## 3 5 KWH
## 4 1 KWH
## 4 2 KWH
## 4 3 KWH
## 4 4 KWH
## 4 5 KWH
## 5 1 KWH
## 5 2 KWH
## 5 3 KWH
## 5 4 KWH
## 5 5 KWH
## Warning: Number of logged events: 1
Residential_data<-complete(RD2_imputed,1)
Residential_data<-Residential_data|>
as_tsibble(index = Month)
#STL plot after imputation
Residential_data|>
model(STL(KWH~ season(window="periodic"), robust= TRUE))|>
components()|>
autoplot()|>
labs(title = "Residential STL Decomposition plot")
## [[1]]
##
## $title
## [1] "Residential STL Decomposition plot"
##
## attr(,"class")
## [1] "labels"
#Lambda for box-cox
lambda <- Residential_data|>
features(KWH, features = guerrero) %>%
pull(lambda_guerrero)
I fit the data into three different models ARIMA, ETS, and Seasonal Naive to see which would best for forecasting a prediction for the year of 2014. ARIMA was the best model as it had the lowest RSME. The auto ARIMA used ARIMA ARIMA(1,0,0)(2,1,0)[12] with a drift and a box-cox transformation of -0.212065. I used box-cox to take care of the residuals. The residual plot shows that the residuals were just noise.
Residential_forecast<-Residential_data|>
as_tsibble(index = Month)|>
model("ARIMA"=ARIMA(box_cox(KWH,lambda)),
"ETS"=ETS(box_cox(KWH, lambda)),
"Seasonal Naive"= SNAIVE(box_cox(KWH, lambda)))
accuracy(Residential_forecast)|>
select(".model", "RMSE")
## # A tibble: 3 × 2
## .model RMSE
## <chr> <dbl>
## 1 ARIMA 663047.
## 2 ETS 644420.
## 3 Seasonal Naive 872607.
#which ARIMA was selected from auto ARIMA
Residential_forecast|>
select(.model="ARIMA")|>
report()
## Series: KWH
## Model: ARIMA(1,0,0)(2,1,0)[12] w/ drift
## Transformation: box_cox(KWH, lambda)
##
## Coefficients:
## ar1 sar1 sar2 constant
## 0.2606 -0.7411 -0.3812 9e-04
## s.e. 0.0747 0.0730 0.0743 3e-04
##
## sigma^2 estimated as 1.329e-05: log likelihood=763.32
## AIC=-1516.65 AICc=-1516.3 BIC=-1500.68
#ARIMA had the lowest RSME
Residential_forecast<-Residential_forecast|>
forecast(h="12 months")|>
filter(.model=="ARIMA")
Residential_data |>
model(ARIMA(KWH))|>
gg_tsresiduals()
Residential_forecast|>
autoplot(Residential_data)+
labs(title = "Power Usage Forecast for 2014", y="Power Usage (KWH)", x="Month")
The ARIMA model was the best model to use for the residential forecast of 2014. For the outlier, the a new value was imputed and for the residuals a box-cox transformation was applied. Looking at the prediction of 2014 the power usage seems to be at a constant rate with no real big trend.
#Final Forecast correction
Residential_forecast <-Residential_forecast|>
as_tsibble(index = Month, key= KWH)|>
select( Month, .mean) %>%
rename(KWH = .mean)|>
mutate(CaseSequence=925:936)|> #last case sequence was at 924
relocate(CaseSequence)
Residential_forecast|>
autoplot(KWH)+
labs(title = "Power Usage 2014 Forecast")
write_xlsx(list("ATM_forecasts_May_2010"= ATMC,
"Power_Forecast_2014"= Residential_forecast), "ATM_Residential_Forecast.xlsx") #created excel file with both part A and B forecasts