Project 1 - This Project has 3 Parts - 2 required and 1 bonus

#Load Required Libraries
suppressMessages(suppressWarnings(library(fpp2)))
suppressMessages(suppressWarnings(library(ggplot2)))
suppressMessages(suppressWarnings(library(dplyr)))
suppressMessages(suppressWarnings(library(tidyverse)))
suppressMessages(suppressWarnings(library(scales)))
suppressMessages(suppressWarnings(library(forecast)))
suppressMessages(suppressWarnings(library(lubridate)))
suppressMessages(suppressWarnings(library(readxl)))
suppressMessages(suppressWarnings(library(tidyr)))
suppressMessages(suppressWarnings(library(plotly)))

Part A – ATM Forecast :

Part B – Forecasting Power :

Part B consists of a simple dataset of residential power usage for January 1998 until December 2013. Your assignment is to model these data and a monthly forecast for 2014. The data is given in a single file. The variable ‘KWH’ is power consumption in Kilowatt hours, the rest is straight forward. Add this to your existing files above.

Loading Data and verifying

power_df <- read_excel("ResidentialCustomerForecastLoad-624.xlsx")
power_df <- ts(power_df[, "KWH"], start = c(1998, 1), frequency = 12)
autoplot(power_df)

power_lambda <- BoxCox.lambda(power_df)
power_trans <- BoxCox(power_df, power_lambda)
ggtsdisplay(diff(power_trans, 12))

Ue ARIMA

Now use ARIMA arima auto function to select the most appropriate model

fc_arima_power <- auto.arima(power_trans)
summary(fc_arima_power)
## Series: power_trans 
## ARIMA(2,1,1)(0,0,2)[12] 
## 
## Coefficients:
##          ar1      ar2      ma1    sma1    sma2
##       0.2593  -0.1585  -0.9752  0.1980  0.2061
## s.e.  0.0743   0.0769   0.0171  0.0816  0.0674
## 
## sigma^2 estimated as 1.7:  log likelihood=-319.45
## AIC=650.9   AICc=651.36   BIC=670.41
## 
## Training set error measures:
##                     ME     RMSE      MAE      MPE     MAPE     MASE        ACF1
## Training set 0.1024691 1.283068 0.796867 0.134273 1.878771 1.160183 -0.03283777

With Ljung box we will test the residuals

Box.test(resid(fc_arima_power), type = "L", fitdf = 3, lag = 12)
## 
##  Box-Ljung test
## 
## data:  resid(fc_arima_power)
## X-squared = 20.847, df = 9, p-value = 0.01335

The p-value 0.01335 is > 0.05 and we used lag 12 for high patterns. Based on the results fir suggests that residuals may be white noise.

power_fit <- Arima(power_df, order = c(2, 1, 1), seasonal = c(0, 0, 2), lambda = power_lambda)
ggtsdisplay(resid(power_fit), plot.type = "histogram")

power_forecast <- forecast(power_fit, 12, level = 95)
autoplot(power_forecast)

Output

Write the forecasted data to a seperate file

data_frame(`YYYY-MMM` = paste0(2014, "-", month.abb), KWH = power_forecast$mean) %>% 
  write_csv("DATA624_POWER_FORECASTING.csv")

Part C – Waterflow_Pipe :

Part C consists of two data sets. These are simple 2 columns sets, however they have different time stamps. Your optional assignment is to time-base sequence the data and aggregate based on hour (example of what this looks like, follows). Note for multiple recordings within an hour, take the mean. Then to determine if the data is stationary and can it be forecast. If so, provide a week forward forecast and present results via Rpubs and .rmd and the forecast in an Excel readable file.

Loading data and verifying

water_1 = read_excel("Waterflow_Pipe1.xlsx",col_types =c("date", "numeric"))
water_2 = read_excel("Waterflow_Pipe2.xlsx",col_types =c("date", "numeric"))
colnames(water_1)= c("Date_time","WaterFlow")
colnames(water_2)= c("Date_Time","WaterFlow")

As specified in the question, lets try to seperate date and hour reading and combine them to make a single dataset. Additionally group by data and aggregate.

Water_df= water_1 %>% mutate(Date_Time = lubridate::round_date(Date_time,"hour") ) %>% select(Date_Time,WaterFlow) %>% bind_rows(water_2) %>% group_by(Date_Time) %>% summarize(WaterFlowF = mean(WaterFlow, na.rm = T))
## `summarise()` ungrouping output (override with `.groups` argument)

Specify the frequency as 24

Water_ts = ts(Water_df$WaterFlowF,frequency = 24)

Use ETS

ggtsdisplay(Water_ts)

Water_Model1 = ets(Water_ts,lambda=BoxCox.lambda(Water_ts))

Check summary of ets method

summary(Water_Model1)
## ETS(A,N,N) 
## 
## Call:
##  ets(y = Water_ts, lambda = BoxCox.lambda(Water_ts)) 
## 
##   Box-Cox transformation: lambda= -0.8244 
## 
##   Smoothing parameters:
##     alpha = 0.0119 
## 
##   Initial states:
##     l = 1.1247 
## 
##   sigma:  0.047
## 
##      AIC     AICc      BIC 
## 798.1980 798.2221 812.9243 
## 
## Training set error measures:
##                    ME     RMSE      MAE       MPE     MAPE     MASE       ACF1
## Training set 7.738182 16.76823 12.74726 -1.625442 42.41892 0.849401 0.05613023

Check for residuals

checkresiduals(Water_Model1)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,N,N)
## Q* = 41.298, df = 46, p-value = 0.6692
## 
## Model df: 2.   Total lags used: 48
autoplot(forecast(Water_Model1, 168, level = 95))

Water_Model1_Forecast = forecast(Water_Model1, 168, level=95)

Use ARIMA

Now use ARIMA arima auto function to select the most appropriate model.

Water_Model2 =auto.arima(Water_ts,lambda=BoxCox.lambda(Water_ts))

Check summary of ARIMA

summary(Water_Model2)
## Series: Water_ts 
## ARIMA(3,1,3) 
## Box Cox transformation: lambda= -0.8243695 
## 
## Coefficients:
##           ar1      ar2      ar3      ma1     ma2      ma3
##       -0.4049  -0.9688  -0.0339  -0.5872  0.5673  -0.9475
## s.e.   0.0439   0.0255   0.0329   0.0309  0.0345   0.0236
## 
## sigma^2 estimated as 0.002199:  log likelihood=1642.1
## AIC=-3270.2   AICc=-3270.09   BIC=-3235.85
## 
## Training set error measures:
##                    ME     RMSE      MAE       MPE     MAPE      MASE       ACF1
## Training set 7.529514 16.72952 12.64518 -1.560004 41.71755 0.8425988 0.05232296

Check for residuals

checkresiduals(Water_Model2)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(3,1,3)
## Q* = 31.449, df = 42, p-value = 0.883
## 
## Model df: 6.   Total lags used: 48
Water_Model2_Forecast = forecast(Water_Model2, 168, level=95)

autoplot(forecast(Water_Model2, 168, level = 95))

Wtire forecasted data to file

Water_csv= data_frame(Date_Time = max(Water_df$Date_Time) + lubridate::hours(1:168),
           WaterFlowF = as.numeric( Water_Model2_Forecast$mean) )

write.csv(Water_csv,"DATA624_WATERPIPE_FORECAST.csv")