library("readxl")
library("fpp3")
## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tibble      3.2.1     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.1
## ✔ lubridate   1.9.3     ✔ fable       0.4.1
## ✔ ggplot2     3.5.1
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()
library("ggplot2")
library("fable")
library("lubridate")
library("seasonal") 
## 
## Attaching package: 'seasonal'
## The following object is masked from 'package:tibble':
## 
##     view
library("forecast")
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

Part A

In part A, I want you to forecast how much cash is taken out of 4 different ATM machines for May 2010. The data is given in a single file. The variable ‘Cash’ is provided in hundreds of dollars; other than that it is straightforward. I am being somewhat ambiguous on purpose to make this have a little more business feeling. Explain and demonstrate your process, techniques used and not used, and your actual forecast. I am giving you data via an Excel file; please provide your written report on your findings, visuals, discussion and your R code via an RPubs link along with the actual .rmd file. Also please submit the forecast, which you will put in an Excel-readable file.

Load Data

atm_data <- read_excel("~/Desktop/SPS Spring 2025/Data 624/Project 1/ATM624Data.xlsx")
View(atm_data)

Pre Process

The data set records the history of cash withdrawals from four different ATMs. The cash withdrawn is the main variable, and the date serves as the time stamp for each withdrawal record.

head(atm_data)
## # A tibble: 6 × 3
##    DATE ATM    Cash
##   <dbl> <chr> <dbl>
## 1 39934 ATM1     96
## 2 39934 ATM2    107
## 3 39935 ATM1     82
## 4 39935 ATM2     89
## 5 39936 ATM1     85
## 6 39936 ATM2     90
glimpse(atm_data)
## Rows: 1,474
## Columns: 3
## $ DATE <dbl> 39934, 39934, 39935, 39935, 39936, 39936, 39937, 39937, 39938, 39…
## $ ATM  <chr> "ATM1", "ATM2", "ATM1", "ATM2", "ATM1", "ATM2", "ATM1", "ATM2", "…
## $ Cash <dbl> 96, 107, 82, 89, 85, 90, 90, 55, 99, 79, 88, 19, 8, 2, 104, 103, …

Looking at the glimpse output, the “DATE” column is stored as a double, that is, a floating-point number rather than a date. The values are Excel serial date numbers. “ATM” is stored as a character (text) variable, and “Cash” is also stored as a double. The “DATE” column must be converted into a date format for our analysis. This can be done with base R’s as.Date() by supplying Excel’s origin date.

# as.Date() (base R) converts the Excel serial numbers into Date objects
atm_data <- atm_data %>%
  mutate(DATE = as.Date(DATE, origin = "1899-12-30"))
# The origin "1899-12-30" is used because Excel counts 1900-01-01 as day 1 and wrongly treats 1900 as a leap year, so the serial numbers are offset by two days.
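
As a quick check of the origin, the first serial number in the data can be converted by hand. This is a minimal sketch using only base R; the value 39934 is the first DATE value shown in the head() output above.

```r
# Convert an Excel serial date number to an R Date.
excel_serial <- 39934
converted <- as.Date(excel_serial, origin = "1899-12-30")
print(converted)   # "2009-05-01"
```

The series therefore starts on 1 May 2009, one year before the May 2010 forecast window.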

We will now check for missing values in the data set.

# Count the missing (NA) values in the "Cash" column. NA means the value was not recorded, which is different from a withdrawal of 0.
sum(is.na(atm_data$Cash))
## [1] 19

There are 19 rows with missing values in the “Cash” column. A new data set will be made that filters out these rows.

# This removes any rows where there are missing values in the "Cash" column.
atm_data2 <- atm_data %>%
  filter(!is.na(Cash))
# We can check that the new dataset has filtered out the NA values
sum(is.na(atm_data2$Cash)) 
## [1] 0
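
Before dropping rows it can help to see where the missing values sit, for example whether they belong to one ATM or also lack an ATM label. The following is a minimal base R sketch on a toy data frame; the values are illustrative only and are not taken from the ATM file.

```r
# Toy data frame standing in for atm_data (illustrative values only).
toy <- data.frame(
  ATM  = c("ATM1", "ATM1", "ATM2", "ATM2", NA),
  Cash = c(96, NA, 107, NA, NA)
)

# Rows where Cash is missing
missing_rows <- toy[is.na(toy$Cash), ]

# Count missing Cash values per ATM; useNA = "ifany" also counts rows
# whose ATM label is itself missing
missing_by_atm <- table(missing_rows$ATM, useNA = "ifany")
print(missing_by_atm)
```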

Exploratory Data Analysis

With the ATM data cleaned, EDA lets us visualize and understand the patterns in the data. This will guide our forecasting and help reveal issues that were not apparent during cleaning.

We can check the summary statistics of the data: the minimum, maximum, median, mean, and quartiles of the “Cash” column.

summary(atm_data2$Cash)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0     0.5    73.0   155.6   114.0 10919.8

Cash is recorded in hundreds of dollars. The minimum value is 0; the first quartile is 0.5, meaning 25 percent of the records are 0.5 or less; the median is 73.0; the mean is 155.6; the third quartile is 114.0; and the maximum is 10919.8. The mean is more than twice the median, so the distribution is right-skewed (positively skewed), and the maximum lies far from the rest of the data, which indicates outliers. One option is to remove records with Cash above 1,000.

To filter the data more systematically, we calculate the interquartile range (IQR) and use it to find lower and upper bounds. Values outside these bounds are treated as outliers and removed from the data set.

Q1 <- quantile(atm_data2$Cash, 0.25)
Q3 <- quantile(atm_data2$Cash, 0.75)
IQR <- Q3 - Q1

The first quartile is 0.5 and the third quartile is 114, both in hundreds of dollars.

# lower bound: 1.5 IQRs below the first quartile
lower_bound <- Q1 - 1.5 * IQR
# upper bound: 1.5 IQRs above the third quartile
upper_bound <- Q3 + 1.5 * IQR
# Keep only rows where Cash lies between the lower and upper bounds (inclusive). This is the IQR rule for removing outliers.
atm_data3 <- atm_data2 %>%
  filter(Cash >= lower_bound & Cash <= upper_bound)
summary(atm_data3$Cash)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    0.00   50.51   55.07   94.00  283.34
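
Plugging the quartiles reported by summary() into the rule shows where the cut-offs fall. This sketch uses the rounded values printed earlier, so the exact bounds computed from the data may differ slightly.

```r
# Quartiles as printed by summary(atm_data2$Cash) (rounded values)
q1 <- 0.5
q3 <- 114

iqr_value   <- q3 - q1                # 113.5
lower_bound <- q1 - 1.5 * iqr_value   # -169.75
upper_bound <- q3 + 1.5 * iqr_value   # 284.25

c(lower = lower_bound, upper = upper_bound)
```

Because the minimum Cash value is 0, the lower bound removes nothing; only the upper bound drops rows, which is consistent with the post-filter maximum of 283.34. These bounds are computed from all four ATMs pooled together, so if one ATM operates at a higher level the rule could discard many of its ordinary observations. Computing the bounds separately per ATM is a possible alternative.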

After filtering, the minimum value is 0; the first quartile is 0.00, so at least 25 percent of the records are zero; the median is 50.51; the mean is 55.07; the third quartile is 94.00; and the maximum is 283.34 (all in hundreds of dollars). The mean is only slightly greater than the median, so the filtered data is slightly right-skewed.

# count how many times 0 appears in the Cash column
sum(atm_data3$Cash == 0)
## [1] 364

There are 364 rows where the cash withdrawn is 0.

We will visualize the spread and the frequency distribution of the cash withdrawn using a boxplot and a histogram.

ggplot(atm_data3, aes(y = Cash)) +
  geom_boxplot(fill = "orange", color = "blue", alpha = 1) +
  labs(title = "Boxplot of ATM Withdrawals", y = "Cash Withdrawn (hundreds of $)") + theme_minimal() 

ggplot(atm_data3, aes(x = Cash)) +
  geom_histogram(binwidth = 10, fill = "blue", color = "black", alpha = 0.7) +
  labs(title = "Frequency of ATM Withdrawal Amounts", x = "Cash Withdrawn (hundreds of $)", y = "Frequency") + theme_minimal()

Forecasting Models

We will use time series methods to observe how the cash withdrawn changes over time. Forecasting models will capture patterns such as seasonality and other variation that affects the amount withdrawn.

We convert our data into a tsibble in order to perform the time series analysis. A tsibble is a tidy time series data structure in R with a time index and, optionally, a key identifying each series. We will observe the weekly cash totals recorded at these ATMs over time, which helps in identifying trend, randomness, and seasonality in the data.

atm_data4 <- atm_data3 %>% 
  mutate(Weekly = yearweek(DATE)) %>%  # new column holding the year-week of each date
  group_by(ATM, Weekly) %>%  # group by ATM and week
  summarize(Cash = sum(Cash, na.rm = TRUE), .groups = "drop") %>%  # total cash per ATM per week; na.rm = TRUE ignores missing values, .groups = "drop" removes the grouping afterwards
  as_tsibble(index = Weekly, key = ATM) # build the tsibble (time series structure) with one series per ATM
atm_data4 %>% 
  autoplot() + labs(title = "Weekly ATM Cash Withdrawals by ATM")
## Plot variable not specified, automatically selected `.vars = Cash`
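
Because rows with missing values and outliers were removed before aggregating, some weekly totals may be sums over fewer than seven days. The following minimal base R sketch, on made-up dates, shows how such partial weeks could be flagged. The ISO week labels from format() are a stand-in for yearweek(), and the dropped date is an assumption for illustration.

```r
# Fourteen consecutive days starting on a Monday, i.e. two full ISO weeks
dates <- seq(as.Date("2009-05-04"), by = "day", length.out = 14)

# Pretend one day was dropped during cleaning
dates <- dates[dates != as.Date("2009-05-06")]

# Label each date with its ISO year and week, then count days per week
week_label <- format(dates, "%G-W%V")
days_per_week <- table(week_label)
print(days_per_week)

# Weeks with fewer than 7 daily records
partial_weeks <- names(days_per_week)[days_per_week < 7]
print(partial_weeks)
```

A weekly sum over a partial week understates that week's withdrawals, so flagged weeks may deserve separate treatment before modelling.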

This time series graph shows the cash withdrawn from the four ATMs each week from 2009 to 2010. ATM 1 and ATM 2 have higher withdrawals than ATM 3 and ATM 4. ATM 4 shows the most fluctuation, with sharp rises and falls in withdrawals from week to week. ATM 3 had no withdrawals until a large spike appeared late in the observed period.

atm_data4 |>
  gg_season() # seasonal plot: one panel per ATM, one line per year
## Plot variable not specified, automatically selected `y = Cash`

ATM1 to ATM4 are shown in four separate panels, where we can compare the withdrawals in 2009 and 2010. ATM 1 has stable, high cash withdrawals through 2009 and 2010, with some variation. ATM 2 shows withdrawals similar to ATM 1 but with larger fluctuations. ATM 3 shows almost no withdrawals apart from a spike in the last few weeks of 2009, and the lack of withdrawals continues in 2010. ATM 4 has the most fluctuation, with some significant increases and decreases in cash withdrawn across the two years.

Now we will decompose our data to analyze the trend, seasonality, and remainder. Time series decomposition breaks the series into these components; one common form is the additive model. Before decomposing, we can transform the data with a log or Box-Cox transformation. A transformation can stabilize the variance and reduce skewness in the data.

With the Guerrero method we look for the Box-Cox parameter lambda that makes the variance most stable across the series.

atm_data4 %>% 
  features(Cash, features = guerrero)
## Warning in optimise(lambda_coef_var, c(lower, upper), x = x, .period =
## max(.period, : NA/Inf replaced by maximum positive value
## # A tibble: 4 × 2
##   ATM   lambda_guerrero
##   <chr>           <dbl>
## 1 ATM1             2.00
## 2 ATM2             2.00
## 3 ATM3             2.00
## 4 ATM4             2.00

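Using the Guerrero method, the estimated lambda is 2 for every ATM, which is roughly equivalent to applying a square transformation to the data. Note that 2 is the upper limit of the default search range in guerrero, and the optimiser warnings above report NA/Inf values, so this estimate should be treated with caution.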
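
For reference, the Box-Cox family with parameter $\lambda$ is usually written as follows. This is the standard textbook form for positive $y_t$, stated here as background rather than taken from the report.

```latex
w_t =
\begin{cases}
\log(y_t), & \lambda = 0, \\[4pt]
\dfrac{y_t^{\lambda} - 1}{\lambda}, & \lambda \neq 0,
\end{cases}
\qquad \text{so for } \lambda = 2: \quad w_t = \frac{y_t^{2} - 1}{2}.
```

With $\lambda = 2$ the transformation is a shifted and rescaled square, which stretches large values instead of compressing them the way a log would.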

atm_data4 %>% 
  autoplot(box_cox(Cash,2))

After the transformation, we can perform the decomposition. We use STL, which stands for “Seasonal and Trend decomposition using Loess”. STL breaks the data into three components: trend, seasonality, and remainder (noise). The seasonal component captures any repeating pattern in the data, for example more cash withdrawn in December around Christmas or in June when people are on summer vacation. The trend component shows any overall increase or decrease in the cash withdrawn across the ATMs. Lastly, the remainder is what is left after the trend and seasonal components are removed, and it shows random noise not captured by either. Note that the code below applies STL to the untransformed Cash series.

atm_data4 %>%
  model(
    STL(Cash ~ trend(window = 7) +
                   season(window = "periodic"),
    robust = TRUE)) %>%
  components() %>%
  autoplot() + theme(legend.position = "none")  

After decomposing the series, we can observe that ATM1 and ATM2 have stable, high levels in the seasonal and trend components. ATM4 exhibits fluctuation in cash withdrawals in the trend and remainder components. ATM 3 shows a lack of cash withdrawn, with a sudden spike at the end.
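
To make the three components concrete, here is a minimal sketch using base R's stats::stl() on the built-in co2 dataset rather than the ATM data. stats::stl() is a different interface from the feasts STL() call used above, so this only illustrates the idea that the components add back up to the original series.

```r
# Decompose the built-in monthly co2 series (not the ATM data)
fit <- stl(co2, s.window = "periodic", robust = TRUE)

# fit$time.series has three columns: seasonal, trend, remainder
head(fit$time.series)

# In an additive decomposition the components sum to the observed series
reconstructed <- rowSums(fit$time.series)
max_abs_diff <- max(abs(reconstructed - as.numeric(co2)))
print(max_abs_diff)
```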

We now build a forecasting model using ETS, also called exponential smoothing.

We will separate the ATMs and forecast the cash withdrawn from each of the four ATMs for the upcoming four weeks.

atm1_data <- atm_data4 %>% filter(ATM == "ATM1")
atm2_data <- atm_data4 %>% filter(ATM == "ATM2")
atm3_data <- atm_data4 %>% filter(ATM == "ATM3")
atm4_data <- atm_data4 %>% filter(ATM == "ATM4")

ATM 1

atmdata_fit <- atm1_data |>
  model(SES = ETS(Cash ~ error("A") + trend("N") +  season("N")))
model_report <- atmdata_fit |>
  select(SES) |> 
  report()
## Series: Cash 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.4837766 
## 
##   Initial states:
##      l[0]
##  415.1581
## 
##   sigma^2:  6492.662
## 
##      AIC     AICc      BIC 
## 679.6434 680.1332 685.5543

Here we fit simple exponential smoothing, ETS(A,N,N), and report its estimated parameters.
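
ETS(A,N,N) is simple exponential smoothing: the level is updated as a weighted average of the newest observation and the previous level, and every future point forecast equals the last level. A minimal base R sketch of that recursion follows. ses_level() is a hypothetical helper written for illustration, and the numbers are made up rather than taken from the ATM data.

```r
# Simple exponential smoothing recursion:
#   l_t = alpha * y_t + (1 - alpha) * l_{t-1}
ses_level <- function(y, alpha, l0) {
  level <- l0
  for (obs in y) {
    level <- alpha * obs + (1 - alpha) * level
  }
  level
}

y <- c(10, 20)   # illustrative observations
last_level <- ses_level(y, alpha = 0.5, l0 = 10)

# The point forecast is flat: the same value for every horizon
point_forecast <- rep(last_level, 4)
print(point_forecast)   # 15 15 15 15
```

This is why SES point forecasts are horizontal lines. An alpha close to 0, as estimated for ATM 4, means the level barely moves from its initial value.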

atm_forecast = atmdata_fit %>% 
  forecast(h = 4) # four weeks ahead (the index is weekly)
atm1_data %>% 
  autoplot() + labs(title = "ATM 1 Weekly Cash Withdrawn")
## Plot variable not specified, automatically selected `.vars = Cash`

atm_forecast %>%
  autoplot(atm1_data) + labs(title = "ATM 1 Weekly Cash Withdrawn, 4-Week Forecast")

ATM 2

atmdata_fit2 <- atm2_data |>
  model(SES = ETS(Cash ~ error("A") + trend("N") +  season("N")))
model_report2 <- atmdata_fit2 |>
  select(SES) |>
  report()
## Series: Cash 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.1519159 
## 
##   Initial states:
##      l[0]
##  468.6222
## 
##   sigma^2:  4736.408
## 
##      AIC     AICc      BIC 
## 662.9276 663.4174 668.8384
atm_forecast2 = atmdata_fit2 %>% 
  forecast(h = 4) # four weeks ahead (the index is weekly)
atm2_data %>% 
  autoplot() + labs(title = "ATM 2 Weekly Cash Withdrawn")
## Plot variable not specified, automatically selected `.vars = Cash`

atm_forecast2 %>%
  autoplot(atm2_data) + labs(title = "ATM 2 Weekly Cash Withdrawn, 4-Week Forecast")

ATM 3

atmdata_fit3 <- atm3_data |>
  model(SES = ETS(Cash ~ error("A") + trend("N") +  season("N")))
model_report3 <- atmdata_fit3 |>
  select(SES) |>
  report()
## Series: Cash 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.20006 
## 
##   Initial states:
##  l[0]
##     0
## 
##   sigma^2:  1356.255
## 
##      AIC     AICc      BIC 
## 596.6483 597.1381 602.5592
atm_forecast3 = atmdata_fit3 %>% 
  forecast(h = 4) # four weeks ahead (the index is weekly)
atm3_data %>% 
  autoplot() + labs(title = "ATM 3 Weekly Cash Withdrawn")
## Plot variable not specified, automatically selected `.vars = Cash`

atm_forecast3 %>%
  autoplot(atm3_data) + labs(title = "ATM 3 Weekly Cash Withdrawn, 4-Week Forecast")

ATM 4

atmdata_fit4 <- atm4_data |>
  model(SES = ETS(Cash ~ error("A") + trend("N") +  season("N")))
model_report4 <- atmdata_fit4 |>
  select(SES) |>
  report()
## Series: Cash 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.0001000855 
## 
##   Initial states:
##      l[0]
##  278.9118
## 
##   sigma^2:  27721.94
## 
##      AIC     AICc      BIC 
## 741.3841 741.8841 747.2379
atm_forecast4 = atmdata_fit4 %>% 
  forecast(h = 4) # four weeks ahead (the index is weekly)
atm4_data %>% 
  autoplot() + labs(title = "ATM 4 Weekly Cash Withdrawn")
## Plot variable not specified, automatically selected `.vars = Cash`

atm_forecast4 %>%
  autoplot(atm4_data) + labs(title = "ATM 4 Weekly Cash Withdrawn, 4-Week Forecast")

In the forecast plots, the shaded blue region is the prediction interval within which future cash withdrawals are expected to fall. Because simple exponential smoothing gives a flat point forecast, each forecast is a single level for the next four weeks. For ATM 1, the forecast level suggests a decrease in cash withdrawn relative to recent weeks. The forecast for ATM 2 also indicates a decrease. For ATM 3, the forecast suggests that the spike will likely not persist and withdrawals will decrease. The forecast for ATM 4 indicates a likely increase.
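
The assignment also asks for the forecast in an Excel-readable file, and a CSV file is one simple option. The sketch below uses base R and a placeholder data frame. The column names, the file name, and the NA values are assumptions standing in for the real forecast table, which would come from the fitted models.

```r
# Placeholder forecast table: replace the NA values with the model output.
forecast_table <- data.frame(
  ATM      = rep(c("ATM1", "ATM2", "ATM3", "ATM4"), each = 4),
  Week     = rep(1:4, times = 4),   # forecast step (weeks ahead)
  Cash_100 = NA_real_               # point forecast in hundreds of dollars
)

# Write a CSV that Excel can open directly (hypothetical file name)
out_file <- file.path(tempdir(), "atm_forecast_may2010.csv")
write.csv(forecast_table, out_file, row.names = FALSE)

# Read it back to confirm the layout
check <- read.csv(out_file)
str(check)
```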

Part B

Part B consists of a simple dataset of residential power usage for January 1998 until December 2013. Your assignment is to model these data and produce a monthly forecast for 2014. The data is given in a single file. The variable ‘KWH’ is power consumption in kilowatt hours; the rest is straightforward. Add this to your existing files above.

Load the data

power_data <- read_excel("~/Desktop/SPS Spring 2025/Data 624/Project 1/ResidentialCustomerForecastLoad-624.xlsx")
View(power_data)

Pre Process Data

The `YYYY-MMM` column is stored as a character variable and must be converted to a Date object for our analysis.

glimpse(power_data)
## Rows: 192
## Columns: 3
## $ CaseSequence <dbl> 733, 734, 735, 736, 737, 738, 739, 740, 741, 742, 743, 74…
## $ `YYYY-MMM`   <chr> "1998-Jan", "1998-Feb", "1998-Mar", "1998-Apr", "1998-May…
## $ KWH          <dbl> 6862583, 5838198, 5420658, 5010364, 4665377, 6467147, 891…
# Rename the columns so that the date column has a syntactic name
colnames(power_data) <- c("CaseSequence", "YYYY_MMM", "KWH")
# The date column must be stored as a Date in order to build the time series; each month is mapped to its first day.
power_data <- power_data %>%
  mutate(Date = as.Date(paste0(YYYY_MMM, "-01"), format = "%Y-%b-%d")) %>%
  arrange(Date)
# view the first few rows in the data
head(power_data)
## # A tibble: 6 × 4
##   CaseSequence YYYY_MMM     KWH Date      
##          <dbl> <chr>      <dbl> <date>    
## 1          733 1998-Jan 6862583 1998-01-01
## 2          734 1998-Feb 5838198 1998-02-01
## 3          735 1998-Mar 5420658 1998-03-01
## 4          736 1998-Apr 5010364 1998-04-01
## 5          737 1998-May 4665377 1998-05-01
## 6          738 1998-Jun 6467147 1998-06-01
sum(is.na(power_data$KWH))
## [1] 1
sum(is.na(power_data$Date))
## [1] 0
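
The %b format used in the as.Date() call above matches month abbreviations in the current locale, so the conversion can return NA on a system that is not set to English. A locale-independent sketch using base R's month.abb constant is shown below. parse_year_month() is a hypothetical helper, not part of the original analysis.

```r
# Parse strings such as "1998-Jan" into the first day of that month,
# without relying on the system locale.
parse_year_month <- function(x) {
  parts <- strsplit(x, "-", fixed = TRUE)
  year  <- vapply(parts, function(p) p[1], character(1))
  # month.abb is c("Jan", "Feb", ..., "Dec"); match() gives the month number
  month <- match(vapply(parts, function(p) p[2], character(1)), month.abb)
  as.Date(sprintf("%s-%02d-01", year, month))
}

parsed <- parse_year_month(c("1998-Jan", "2013-Dec"))
print(parsed)   # "1998-01-01" "2013-12-01"
```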

There is one missing value in the KWH column.

# Remove the CaseSequence and YYYY_MMM columns, then drop the row with the missing KWH value
power_data = power_data %>% 
  select(-CaseSequence, -YYYY_MMM)
power_data = power_data %>% 
  drop_na()
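
Dropping the row with the missing KWH removes that month from the series entirely, so the quarter it belongs to will be summed over two months instead of three. A small base R sketch, on made-up dates, shows how the missing month could be located before deciding whether to drop or impute it. The dates below are illustrative and do not identify the actual missing month.

```r
# Illustrative monthly dates with one month removed (not the real data)
all_months <- seq(as.Date("1998-01-01"), as.Date("1998-12-01"), by = "month")
observed   <- all_months[all_months != as.Date("1998-09-01")]

# Months expected between the first and last observation
expected <- seq(min(observed), max(observed), by = "month")

# Months that are expected but not observed
missing_months <- expected[!expected %in% observed]
print(missing_months)   # "1998-09-01"
```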

Forecasting Model

We build a time series object to examine the data.

power_data_tsibble <- power_data %>%
  mutate(Quarter = yearquarter(Date)) %>%  
  group_by(Quarter) %>%                    
  summarize(KWH = sum(KWH, na.rm = TRUE), .groups = "drop") %>%  
  as_tsibble(index = Quarter)
power_data_tsibble %>% 
  autoplot() + labs(title = "Quarterly KWH Consumption Over Time")
## Plot variable not specified, automatically selected `.vars = KWH`

The graph displays quarterly KWH consumption, and we can observe a seasonal pattern with high spikes recurring across the quarters. There is a sharp decrease in the first quarter of 2010, but the cyclical pattern is otherwise seen throughout the data. An ETS model is a reasonable choice for forecasting, since the ETS family can include a seasonal component.

An ARIMA model is an alternative, but it requires the data to be stationary. The model fitted below is simple exponential smoothing, ETS(A,N,N), which has no trend or seasonal component.

power_fit =  power_data_tsibble%>% 
  model(SES = ETS(KWH ~ error("A") + trend("N") + season("N")))
power_report = power_fit %>% 
  report()
## Series: KWH 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.07560535 
## 
##   Initial states:
##      l[0]
##  18343038
## 
##   sigma^2:  1.211044e+13
## 
##      AIC     AICc      BIC 
## 2198.142 2198.542 2204.619
power_forecast = power_fit %>% 
  forecast(h = 12) # the index is quarterly, so h = 12 is 12 quarters (3 years) ahead; h = 4 would cover 2014 only
power_forecast %>% 
  autoplot(power_data_tsibble)

The historical series shows strong seasonal patterns, but the fitted ETS(A,N,N) model has no seasonal component, so its point forecast is a flat level and the seasonal swings show up only as uncertainty in the prediction intervals. The forecast suggests a slight increase in KWH consumption relative to the most recent quarter, with considerable uncertainty from the seasonal fluctuations that the model does not capture.
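
Since the fitted model ignores seasonality, a seasonal naive benchmark can be a useful comparison: each future quarter is forecast with the value from the same quarter one year earlier. This is a different technique from the ETS(A,N,N) model above, sketched in base R on made-up numbers. snaive_forecast() is a hypothetical helper.

```r
# Seasonal naive forecast: repeat the last full seasonal cycle.
snaive_forecast <- function(y, period, h) {
  stopifnot(length(y) >= period)
  last_cycle <- tail(y, period)
  # Recycle the last cycle to cover h steps ahead
  rep(last_cycle, length.out = h)
}

# Two years of illustrative quarterly values (not the KWH data)
y <- c(10, 6, 12, 7,
       11, 5, 13, 8)

next_year <- snaive_forecast(y, period = 4, h = 4)
print(next_year)   # 11 5 13 8
```

If a seasonal ETS specification cannot beat this benchmark on held-out quarters, the simpler method may be preferable.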