library(readxl)
library(fpp3)
## -- Attaching packages ------------------------------------------------------------------------------------------------------- fpp3 0.4.0 --
## v tibble 3.1.1 v tsibble 1.1.1
## v dplyr 1.0.6 v tsibbledata 0.4.0
## v tidyr 1.1.3 v feasts 0.2.2
## v lubridate 1.7.4 v fable 0.3.0
## v ggplot2 3.3.5
## -- Conflicts ------------------------------------------------------------------------------------------------------------ fpp3_conflicts --
## x lubridate::date() masks base::date()
## x dplyr::filter() masks stats::filter()
## x tsibble::intersect() masks base::intersect()
## x tsibble::interval() masks lubridate::interval()
## x dplyr::lag() masks stats::lag()
## x tsibble::new_interval() masks lubridate::new_interval()
## x tsibble::setdiff() masks base::setdiff()
## x tsibble::union() masks base::union()
library(DataExplorer)
library(tidyverse)
## -- Attaching packages -------------------------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v readr 1.3.1 v stringr 1.4.0
## v purrr 0.3.2 v forcats 0.4.0
## -- Conflicts ----------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x lubridate::as.difftime() masks base::as.difftime()
## x lubridate::date() masks base::date()
## x dplyr::filter() masks stats::filter()
## x tsibble::intersect() masks lubridate::intersect(), base::intersect()
## x tsibble::interval() masks lubridate::interval()
## x dplyr::lag() masks stats::lag()
## x tsibble::new_interval() masks lubridate::new_interval()
## x tsibble::setdiff() masks lubridate::setdiff(), base::setdiff()
## x tsibble::union() masks lubridate::union(), base::union()
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
Part A – ATM Forecast, ATM624Data.xlsx
In part A, I want you to forecast how much cash is taken out of four different ATMs for May 2010. The data is given in a single file. The variable ‘Cash’ is provided in hundreds of dollars; other than that it is straightforward. I am being somewhat ambiguous on purpose to make this have a little more business feeling. Explain and demonstrate your process, techniques used and not used, and your actual forecast. I am giving you data via an Excel file; please provide your written report on your findings, visuals, discussion and your R code via an RPubs link along with the actual .rmd file. Also please submit the forecast, which you will put in an Excel-readable file.
Part B – Forecasting Power, ResidentialCustomerForecastLoad-624.xlsx
Part B consists of a simple dataset of residential power usage from January 1998 until December 2013. Your assignment is to model these data and produce a monthly forecast for 2014. The data is given in a single file. The variable ‘KWH’ is power consumption in kilowatt hours; the rest is straightforward. Add this to your existing files above.
Starting with Part A, we will first load the ATM data, which I have saved in a local folder, and explore it.
data <- read_excel("ATM624Data.xlsx", col_names = TRUE, col_types = c('date', 'guess', 'guess'))
The data contain the variables “DATE”, “ATM” and “Cash”.
head(data)
## # A tibble: 6 x 3
## DATE ATM Cash
## <dttm> <chr> <dbl>
## 1 2009-05-01 00:00:00 ATM1 96
## 2 2009-05-01 00:00:00 ATM2 107
## 3 2009-05-02 00:00:00 ATM1 82
## 4 2009-05-02 00:00:00 ATM2 89
## 5 2009-05-03 00:00:00 ATM1 85
## 6 2009-05-03 00:00:00 ATM2 90
As we can observe in the table above, the variable “DATE” is of data type “POSIXct”. We will have to convert it to the Date class in order to work with the data.
summary(data)
## DATE ATM Cash
## Min. :2009-05-01 00:00:00 Length:1474 Min. : 0.0
## 1st Qu.:2009-08-01 00:00:00 Class :character 1st Qu.: 0.5
## Median :2009-11-01 00:00:00 Mode :character Median : 73.0
## Mean :2009-10-31 19:11:48 Mean : 155.6
## 3rd Qu.:2010-02-01 00:00:00 3rd Qu.: 114.0
## Max. :2010-05-14 00:00:00 Max. :10919.8
## NA's :19
We can also see that the maximum value of the “Cash” variable is 10919.8. Since the numbers are in hundreds of dollars, multiplying this by 100 gives $1,091,980, and it is quite unlikely that any ATM would be able to dispense that much cash. This is most likely an error, and we will address it accordingly.
plot_missing(data)
There are a few missing values for “ATM”, but I believe these were left blank on purpose so that the predictions for May 2010 can be entered later. Additionally, there are a few missing values in “Cash” that we may need to address.
We will create a separate data set in order to preserve the original data and make all the necessary transformations there. First we’ll convert the “DATE” variable to its corresponding Date type. We will also make “Cash” an integer, as we know that ATMs do not dispense cents.
atm <- data %>%
mutate(DATE = as_date(DATE), Cash = as.integer(Cash))
str(atm)
## tibble [1,474 x 3] (S3: tbl_df/tbl/data.frame)
## $ DATE: Date[1:1474], format: "2009-05-01" "2009-05-01" ...
## $ ATM : chr [1:1474] "ATM1" "ATM2" "ATM1" "ATM2" ...
## $ Cash: int [1:1474] 96 107 82 89 85 90 90 55 99 79 ...
Since we are looking at 4 separate ATMs and the data do not provide their locations, they may have varying amounts of cash withdrawn on different days. We will work with each ATM separately, and each will have its own forecast for May 2010.
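As a side note, tsibble can also hold all four series in a single object keyed by ATM; the sketch below is illustrative only, since we prefer to clean each ATM individually.
# Illustrative sketch: one tsibble keyed by ATM would let autoplot()
# draw all four series at once; rows with a missing ATM label are dropped.
atm_all <- atm %>%
  filter(!is.na(ATM)) %>%
  as_tsibble(index = DATE, key = ATM)
autoplot(atm_all, Cash) + facet_wrap(~ATM, scales = "free_y")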
atm1 <- atm %>%
filter(ATM == "ATM1") %>%
as_tsibble(index = DATE)
autoplot(atm1) +
labs(title="ATM1", subtitle="Cash withdrawals per day", y="Hundreds USD")
## Plot variable not specified, automatically selected `.vars = Cash`
atm2 <- atm %>%
filter(ATM == "ATM2") %>%
as_tsibble(index = DATE)
autoplot(atm2) +
labs(title="ATM2", subtitle="Cash withdrawals per day", y="Hundreds USD")
## Plot variable not specified, automatically selected `.vars = Cash`
atm3 <- atm %>%
filter(ATM == "ATM3") %>%
as_tsibble(index = DATE)
autoplot(atm3) +
labs(title="ATM3", subtitle="Cash withdrawals per day", y="Hundreds USD")
## Plot variable not specified, automatically selected `.vars = Cash`
atm4 <- atm %>%
filter(ATM == "ATM4") %>%
as_tsibble(index = DATE)
autoplot(atm4) +
labs(title="ATM4", subtitle="Cash withdrawals per day", y="Hundreds USD")
## Plot variable not specified, automatically selected `.vars = Cash`
Based on the plots above, we can observe that “ATM1” and “ATM2” are time series with fairly constant variability, no apparent trend, and potential seasonality, which we will explore in more detail with decompositions. “ATM3”, however, seems to have withdrawals only for the last few days in the data. Lastly, “ATM4” has a clear outlier, which is the one we identified previously with the summary() function.
which(atm4$Cash == 10919)
## [1] 285
The outlier is found on row 285 of the fourth ATM’s time series. Since it is clearly an error, we will impute it with the series average. After imputation, the time series looks more like “ATM1” and “ATM2”:
atm4[285,3] <- round(mean(atm4$Cash),0)
autoplot(atm4) +
labs(title="ATM4", subtitle="Cash withdrawals per day", y="Hundreds USD")
## Plot variable not specified, automatically selected `.vars = Cash`
We also seem to have a few missing values on the “Cash” column for “ATM1” and “ATM2”:
sum(is.na(atm1$Cash))
## [1] 3
sum(is.na(atm2$Cash))
## [1] 2
sum(is.na(atm3$Cash))
## [1] 0
sum(is.na(atm4$Cash))
## [1] 0
The missing values are found on the rows below for each respective ATM:
which(is.na(atm1$Cash))
## [1] 44 47 53
which(is.na(atm2$Cash))
## [1] 49 55
hist(atm1$Cash)
hist(atm2$Cash)
Since I don’t see evident skewness in the distribution of “Cash” for either ATM above, I will use the averages to impute these values:
atm1[44,3] <- round(mean(atm1$Cash, na.rm=TRUE),0)
atm1[47,3] <- round(mean(atm1$Cash, na.rm=TRUE),0)
atm1[53,3] <- round(mean(atm1$Cash, na.rm=TRUE),0)
atm2[49,3] <- round(mean(atm2$Cash, na.rm=TRUE),0)
atm2[55,3] <- round(mean(atm2$Cash, na.rm=TRUE),0)
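Instead of the row-by-row assignments above, an equivalent and more concise alternative (a sketch; it should yield essentially the same values) is tidyr’s replace_na():
# Sketch: impute all NAs at once; as.integer() keeps the column type intact.
atm1 <- atm1 %>%
  mutate(Cash = replace_na(Cash, as.integer(round(mean(Cash, na.rm = TRUE)))))
atm2 <- atm2 %>%
  mutate(Cash = replace_na(Cash, as.integer(round(mean(Cash, na.rm = TRUE)))))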
Additionally, we can look for a Box-Cox transformation for each series to help make them a little simpler:
lambda1 <- atm1%>%
features(Cash,features = guerrero)%>%
pull(lambda_guerrero)
plot_trans1 <- atm1 %>%
autoplot(box_cox(Cash, lambda1)) +
labs(title="ATM1 TRANSFORMED", subtitle="Cash withdrawals per day", y="USD")
lambda2 <- atm2%>%
features(Cash,features = guerrero)%>%
pull(lambda_guerrero)
plot_trans2 <- atm2 %>%
autoplot(box_cox(Cash, lambda2)) +
labs(title="ATM2 TRANSFORMED", subtitle="Cash withdrawals per day", y="USD")
lambda3 <- atm3%>%
features(Cash,features = guerrero)%>%
pull(lambda_guerrero)
## Warning in optimise(lambda_coef_var, c(lower, upper), x = x, .period =
## max(.period, : NA/Inf replaced by maximum positive value
## Warning in optimise(lambda_coef_var, c(lower, upper), x = x, .period =
## max(.period, : NA/Inf replaced by maximum positive value
plot_trans3 <- atm3 %>%
autoplot(box_cox(Cash, lambda3)) +
labs(title="ATM3 TRANSFORMED", subtitle="Cash withdrawals per day", y="USD")
lambda4 <- atm4%>%
features(Cash,features = guerrero)%>%
pull(lambda_guerrero)
plot_trans4 <- atm4 %>%
autoplot(box_cox(Cash, lambda4)) +
labs(title="ATM4 TRANSFORMED", subtitle="Cash withdrawals per day", y="USD")
grid.arrange(plot_trans1, plot_trans2, plot_trans3, plot_trans4, nrow = 2)
The transformations helped scale down the time series of all ATMs.
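Since the four blocks above repeat the same steps, a small helper function (a hypothetical refactor; plot_bc is my own name and was not used to produce the results below) could compute the lambda and the transformed plot in one place:
# Hypothetical helper: Guerrero lambda plus transformed plot for one ATM.
plot_bc <- function(ts, name) {
  lambda <- ts %>% features(Cash, features = guerrero) %>% pull(lambda_guerrero)
  ts %>%
    autoplot(box_cox(Cash, lambda)) +
    labs(title = paste(name, "TRANSFORMED"), subtitle = "Cash withdrawals per day", y = "USD")
}
# Usage would mirror the grid above, e.g.:
# grid.arrange(plot_bc(atm1, "ATM1"), plot_bc(atm2, "ATM2"),
#              plot_bc(atm3, "ATM3"), plot_bc(atm4, "ATM4"), nrow = 2)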
Let’s now look at the decomposition of each series to see whether we have strong seasonality and whether differencing may be required in the model. Since the magnitude of the seasonal component does not seem to change with time, we can say the series are additive.
atm1%>%
model(classical_decomposition(box_cox(Cash, lambda1), type="additive")) %>%
components () %>%
autoplot() +
labs(title="Classical additive decomposition of ATM1")
## Warning: Removed 3 row(s) containing missing values (geom_path).
atm2%>%
model(classical_decomposition(box_cox(Cash, lambda2), type="additive")) %>%
components () %>%
autoplot() +
labs(title="Classical additive decomposition of ATM2")
## Warning: Removed 3 row(s) containing missing values (geom_path).
atm3%>%
model(classical_decomposition(box_cox(Cash, lambda3), type="additive")) %>%
components () %>%
autoplot() +
labs(title="Classical additive decomposition of ATM3")
## Warning: Removed 3 row(s) containing missing values (geom_path).
atm4%>%
model(classical_decomposition(box_cox(Cash, lambda4), type="additive")) %>%
components () %>%
autoplot() +
labs(title="Classical additive decomposition of ATM4")
## Warning: Removed 3 row(s) containing missing values (geom_path).
As is evident in the plots above, we see a strong seasonal component for all ATMs.
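A seasonal plot offers another view of this weekly pattern; the sketch below (shown for ATM1 only) uses feasts’ gg_season() with a weekly period:
atm1 %>%
  gg_season(Cash, period = "week") +
  labs(title = "ATM1 seasonal plot (weekly)", y = "Hundreds USD")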
plot_acf1 <- atm1 %>%
ACF(box_cox(Cash, lambda1)) %>%
autoplot() + labs(title="Autocorrelation of Cash ATM1")
plot_acf2 <- atm2 %>%
ACF(box_cox(Cash, lambda2)) %>%
autoplot() + labs(title="Autocorrelation of Cash ATM2")
plot_acf3 <- atm3 %>%
ACF(box_cox(Cash, lambda3)) %>%
autoplot() + labs(title="Autocorrelation of Cash ATM3")
plot_acf4 <- atm4 %>%
ACF(box_cox(Cash, lambda4)) %>%
autoplot() + labs(title="Autocorrelation of Cash ATM4")
grid.arrange(plot_acf1, plot_acf2, plot_acf3, plot_acf4, nrow = 2)
The ACF plots above show noticeable autocorrelation at the weekly lags, so we will apply unitroot_nsdiffs() to the daily cash withdrawals for each ATM to determine whether weekly seasonal differencing is needed.
atm1 %>%
features(box_cox(Cash, lambda1), unitroot_nsdiffs)
## # A tibble: 1 x 1
## nsdiffs
## <int>
## 1 1
atm2 %>%
features(box_cox(Cash, lambda2), unitroot_nsdiffs)
## # A tibble: 1 x 1
## nsdiffs
## <int>
## 1 1
atm3 %>%
features(box_cox(Cash, lambda3), unitroot_nsdiffs)
## # A tibble: 1 x 1
## nsdiffs
## <int>
## 1 0
atm4 %>%
features(box_cox(Cash, lambda4), unitroot_nsdiffs)
## # A tibble: 1 x 1
## nsdiffs
## <int>
## 1 0
As determined by the function above, we need to apply seasonal differencing to “ATM1” and “ATM2”. Let’s explore further to see whether any additional, non-seasonal differencing is needed:
atm1 %>%
features(difference(box_cox(Cash, lambda1), 7), unitroot_ndiffs)
## # A tibble: 1 x 1
## ndiffs
## <int>
## 1 0
atm2 %>%
features(difference(box_cox(Cash, lambda2), 7), unitroot_ndiffs)
## # A tibble: 1 x 1
## ndiffs
## <int>
## 1 0
atm3 %>%
features(box_cox(Cash, lambda3), unitroot_ndiffs)
## # A tibble: 1 x 1
## ndiffs
## <int>
## 1 0
atm4 %>%
features(box_cox(Cash, lambda4), unitroot_ndiffs)
## # A tibble: 1 x 1
## ndiffs
## <int>
## 1 0
No additional differencing seems to be needed. Let’s take a look at the ACF plots after differencing “ATM1” and “ATM2”.
atm1 %>%
ACF(difference(box_cox(Cash, lambda1), 7)) %>%
autoplot() + labs(title="Autocorrelation of Cash ATM1")
atm2 %>%
ACF(difference(box_cox(Cash, lambda2), 7)) %>%
autoplot() + labs(title="Autocorrelation of Cash ATM2")
Differencing seems to have made the data look closer to white noise.
Considering that these data need differencing, we will use the ARIMA() model, which applies differencing within the algorithm, making it simpler to build.
For “ATM3” we will use the naive model, which uses the last observation as the forecast. Given that there are only three non-zero observations, this is a sound approach.
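Before settling on the automatic selection, a few candidates could also be fit side by side and compared on in-sample accuracy; the sketch below is illustrative (the model names are mine) and is not how the final models were chosen:
# Sketch: compare the automatic ARIMA against simple benchmarks for ATM1.
atm1_candidates <- atm1 %>%
  model(
    arima  = ARIMA(box_cox(Cash, lambda1)),
    snaive = SNAIVE(box_cox(Cash, lambda1)),
    ets    = ETS(box_cox(Cash, lambda1))
  )
accuracy(atm1_candidates)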
atm1_fit <- atm1 %>%
model(ARIMA(box_cox(Cash, lambda1)))
report(atm1_fit)
## Series: Cash
## Model: ARIMA(0,0,2)(0,1,1)[7]
## Transformation: box_cox(Cash, lambda1)
##
## Coefficients:
## ma1 ma2 sma1
## 0.1090 -0.1121 -0.6410
## s.e. 0.0524 0.0525 0.0433
##
## sigma^2 estimated as 3.052: log likelihood=-708.06
## AIC=1424.13 AICc=1424.24 BIC=1439.65
atm1_fit %>%
gg_tsresiduals()
atm2_fit <- atm2 %>%
model(ARIMA(box_cox(Cash, lambda2)))
report(atm2_fit)
## Series: Cash
## Model: ARIMA(2,0,2)(0,1,1)[7]
## Transformation: box_cox(Cash, lambda2)
##
## Coefficients:
## ar1 ar2 ma1 ma2 sma1
## -0.4234 -0.9054 0.4669 0.7984 -0.7203
## s.e. 0.0594 0.0448 0.0895 0.0592 0.0414
##
## sigma^2 estimated as 48.4: log likelihood=-1201.99
## AIC=2415.98 AICc=2416.22 BIC=2439.26
atm2_fit %>%
gg_tsresiduals()
atm3_fit <- atm3 %>%
model(NAIVE(box_cox(Cash, lambda3)))
report(atm3_fit)
## Series: Cash
## Model: NAIVE
## Transformation: box_cox(Cash, lambda3)
##
## sigma^2: 1381.2177
atm3_fit %>%
gg_tsresiduals()
## Warning: Removed 1 row(s) containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing non-finite values (stat_bin).
atm4_fit <- atm4 %>%
model(ARIMA(box_cox(Cash, lambda4)))
report(atm4_fit)
## Series: Cash
## Model: ARIMA(0,0,1)(2,0,0)[7] w/ mean
## Transformation: box_cox(Cash, lambda4)
##
## Coefficients:
## ma1 sar1 sar2 constant
## 0.0753 0.2146 0.2044 13.7047
## s.e. 0.0529 0.0516 0.0525 0.5685
##
## sigma^2 estimated as 107.4: log likelihood=-1369.93
## AIC=2749.86 AICc=2750.03 BIC=2769.36
atm4_fit %>%
gg_tsresiduals()
Except for “ATM3”, all residuals appear to have constant variability, resemble white noise, and have approximately normal distributions.
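To back the visual check with a formal test, a Ljung-Box test can be run on the innovation residuals; in this sketch for ATM1, lag = 14 covers two weekly cycles and dof = 3 matches the three estimated coefficients:
augment(atm1_fit) %>%
  features(.innov, ljung_box, lag = 14, dof = 3)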
We will take a look at the forecasts for each ATM:
plot_fc1 <- atm1_fit %>% forecast(h=31) %>%
autoplot(atm1)
plot_fc2 <- atm2_fit %>% forecast(h=31) %>%
autoplot(atm2)
plot_fc3 <- atm3_fit %>% forecast(h=31) %>%
autoplot(atm3)
plot_fc4 <- atm4_fit %>% forecast(h=31) %>%
autoplot(atm4)
grid.arrange(plot_fc1, plot_fc2, plot_fc3, plot_fc4, nrow = 2)
All forecasts seem reasonable from a visual perspective.
Finally, we save the forecast of each ATM in its own .csv document:
forecast_atm1 <- atm1_fit %>% forecast(h=31)
forecast_atm2 <- atm2_fit %>% forecast(h=31)
forecast_atm3 <- atm3_fit %>% forecast(h=31)
forecast_atm4 <- atm4_fit %>% forecast(h=31)
write.csv(forecast_atm1,"forecast_atm1.csv")
write.csv(forecast_atm2,"forecast_atm2.csv")
write.csv(forecast_atm3,"forecast_atm3.csv")
write.csv(forecast_atm4,"forecast_atm4.csv")
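Optionally, the four forecasts could also be combined into a single Excel-readable file; in this sketch the distribution column is dropped and .mean provides the point forecasts:
# Sketch: bind the four fables into one flat table for Excel.
all_fc <- bind_rows(
  as_tibble(forecast_atm1) %>% mutate(ATM = "ATM1"),
  as_tibble(forecast_atm2) %>% mutate(ATM = "ATM2"),
  as_tibble(forecast_atm3) %>% mutate(ATM = "ATM3"),
  as_tibble(forecast_atm4) %>% mutate(ATM = "ATM4")
) %>%
  select(ATM, DATE, Cash = .mean)
write.csv(all_fc, "forecast_all_atms.csv", row.names = FALSE)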
Moving on to Part B, we will again load the data, which I have saved in a local folder, and explore it.
data2 <- read_excel("ResidentialCustomerForecastLoad-624.xlsx", col_names = TRUE)
The data contain the variables “CaseSequence”, “YYYY-MMM” and “KWH”.
head(data2)
## # A tibble: 6 x 3
## CaseSequence `YYYY-MMM` KWH
## <dbl> <chr> <dbl>
## 1 733 1998-Jan 6862583
## 2 734 1998-Feb 5838198
## 3 735 1998-Mar 5420658
## 4 736 1998-Apr 5010364
## 5 737 1998-May 4665377
## 6 738 1998-Jun 6467147
As we can observe in the table above, the variable “YYYY-MMM” is of data type “chr” (character). We will have to convert it to a date in order to work with the data.
summary(data2)
## CaseSequence YYYY-MMM KWH
## Min. :733.0 Length:192 Min. : 770523
## 1st Qu.:780.8 Class :character 1st Qu.: 5429912
## Median :828.5 Mode :character Median : 6283324
## Mean :828.5 Mean : 6502475
## 3rd Qu.:876.2 3rd Qu.: 7620524
## Max. :924.0 Max. :10655730
## NA's :1
We can also see that the maximum value of the “KWH” variable is 10,655,730 and the minimum is 770,523. We’ll have to explore the data some more to determine whether either of these values is an outlier.
plot_missing(data2)
There is one missing value for “KWH” that we will need to address.
We will create a separate data set in order to preserve the original data and make all the necessary transformations there. First we’ll convert the data into a time series, making the variable “YYYY-MMM” the index.
power <- data2 %>%
mutate (Month = yearmonth(`YYYY-MMM`)) %>%
select(-`YYYY-MMM`, -CaseSequence) %>%
as_tsibble(index = Month)
str(power)
## tbl_ts [192 x 2] (S3: tbl_ts/tbl_df/tbl/data.frame)
## $ KWH : num [1:192] 6862583 5838198 5420658 5010364 4665377 ...
## $ Month: mth [1:192] 1998 Jan, 1998 Feb, 1998 Mar, 1998 Apr, 1998 May, 1998 Jun...
## - attr(*, "key")= tibble [1 x 1] (S3: tbl_df/tbl/data.frame)
## ..$ .rows: list<int> [1:1]
## .. ..$ : int [1:192] 1 2 3 4 5 6 7 8 9 10 ...
## .. ..@ ptype: int(0)
## - attr(*, "index")= chr "Month"
## ..- attr(*, "ordered")= logi TRUE
## - attr(*, "index2")= chr "Month"
## - attr(*, "interval")= interval [1:1] 1M
## ..@ .regular: logi TRUE
autoplot(power) +
labs(title="Power Consumption", subtitle="Residential", y="KWH")
## Plot variable not specified, automatically selected `.vars = KWH`
In the graph above, we can clearly see there is an outlier, which could potentially be an error.
which(power$KWH == 770523)
## [1] 151
This value belongs to observation 151.
Let’s also locate the missing value noted previously:
which(is.na(power$KWH))
## [1] 129
There is one value missing from observation 129.
hist(power$KWH)
Since the distribution of “KWH” is close to normal, I will use the average to impute both of these values.
power[129,1] <- round(mean(power$KWH, na.rm=TRUE),0)
power[151,1] <- round(mean(power$KWH, na.rm=TRUE),0)
Additionally, we can look for a Box-Cox transformation for the series to help simplify it:
lambdap <- power%>%
features(KWH,features = guerrero)%>%
pull(lambda_guerrero)
lambdap
## [1] -0.2108159
The guerrero() feature suggests a Box-Cox transformation with lambda ≈ -0.21.
power %>%
autoplot(box_cox(KWH, lambdap)) +
labs(title="Power Consumption Transformed", subtitle="Residential", y="KWH")
The transformation helped scale down the series but did not do much to normalize it.
Let’s now look at the decomposition of the series to see whether we have strong seasonality and whether differencing may be required in the model. Since the magnitude of the seasonal component does not seem to change with time, we can say the series is additive.
power%>%
model(classical_decomposition(box_cox(KWH, lambdap), type="additive")) %>%
components () %>%
autoplot() +
labs(title="Classical additive decomposition of Power Consumption")
## Warning: Removed 6 row(s) containing missing values (geom_path).
As is evident in the plot above, we see a strong seasonal component and a slight trend in these data.
power %>%
ACF(box_cox(KWH, lambdap)) %>%
autoplot() + labs(title="Autocorrelation of Power Consumption")
We can observe in the ACF plot that there is a lot of autocorrelation. We will apply unitroot_nsdiffs() to the power consumption data to determine whether seasonal differencing is needed.
power %>%
features(box_cox(KWH, lambdap), unitroot_nsdiffs)
## # A tibble: 1 x 1
## nsdiffs
## <int>
## 1 1
As determined by the function above, we need to apply one order of seasonal differencing to the power consumption data. Let’s explore further to see if we need any additional differencing:
power %>%
features(difference(box_cox(KWH, lambdap), 12), unitroot_ndiffs)
## # A tibble: 1 x 1
## ndiffs
## <int>
## 1 0
No additional differencing seems to be needed. Let’s take a look at the ACF plot after differencing the data.
power %>%
ACF(difference(box_cox(KWH, lambdap), 12)) %>%
autoplot() + labs(title="Autocorrelation of Power Consumption Differenced")
The data look closer to white noise after differencing.
Considering that these data need seasonal differencing, we will again use the ARIMA() model, which applies differencing within the algorithm, making it simpler to build and a straightforward choice.
power_fit <- power %>%
model(ARIMA(box_cox(KWH, lambdap)))
report(power_fit)
## Series: KWH
## Model: ARIMA(1,0,0)(2,1,0)[12] w/ drift
## Transformation: box_cox(KWH, lambdap)
##
## Coefficients:
## ar1 sar1 sar2 constant
## 0.2552 -0.7405 -0.3798 9e-04
## s.e. 0.0749 0.0736 0.0752 3e-04
##
## sigma^2 estimated as 1.309e-05: log likelihood=764.92
## AIC=-1519.85 AICc=-1519.5 BIC=-1503.88
power_fit %>%
gg_tsresiduals()
The ARIMA() function picks an ARIMA(1,0,0)(2,1,0)[12] with drift. The residuals’ distribution looks close to normal, they have constant variability, and we can say they resemble white noise.
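As with the ATMs, a Ljung-Box test can supplement the visual residual check; in this sketch, lag = 24 covers two seasonal cycles and dof = 4 matches the four estimated coefficients:
augment(power_fit) %>%
  features(.innov, ljung_box, lag = 24, dof = 4)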
We will take a look at the forecasts for power consumption for the year 2014:
power_fit %>% forecast(h=12) %>%
autoplot(power)
The forecast seems reasonable from a visual perspective.
Finally, we save the forecast in a .csv document:
forecast_power <- power_fit %>% forecast(h=12)
write.csv(forecast_power,"forecast_power.csv")
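If a true .xlsx file is preferred, the writexl package (assuming it is installed) can export the same table; this sketch converts the yearmonth index to text so Excel displays it cleanly:
library(writexl)
forecast_power %>%
  as_tibble() %>%
  mutate(Month = as.character(Month)) %>%
  select(Month, KWH = .mean) %>%
  write_xlsx("forecast_power.xlsx")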