Part B consists of a simple dataset of residential power usage for January 1998 until December 2013. Your assignment is to model these data and a monthly forecast for 2014. The data is given in a single file. The variable ‘KWH’ is power consumption in Kilowatt hours, the rest is straight forward. Add this to your existing files above.

Power<- read.csv("/Users/christinakasman/Desktop/ResidentialCustomerForecastLoad-624.csv")
head(Power)
##   CaseSequence YYYY.MMM     KWH  X X.1
## 1          733 1998-Jan 6862583 NA  NA
## 2          734 1998-Feb 5838198 NA  NA
## 3          735 1998-Mar 5420658 NA  NA
## 4          736 1998-Apr 5010364 NA  NA
## 5          737 1998-May 4665377 NA  NA
## 6          738 1998-Jun 6467147 NA  NA
summary(Power)
##   CaseSequence     YYYY.MMM              KWH              X          
##  Min.   :733.0   Length:192         Min.   :  770523   Mode:logical  
##  1st Qu.:780.8   Class :character   1st Qu.: 5429912   NA's:192      
##  Median :828.5   Mode  :character   Median : 6283324                 
##  Mean   :828.5                      Mean   : 6502475                 
##  3rd Qu.:876.2                      3rd Qu.: 7620524                 
##  Max.   :924.0                      Max.   :10655730                 
##                                     NA's   :1                        
##    X.1         
##  Mode:logical  
##  NA's:192      
##                
##                
##                
##                
## 
# This is monthly data so we will explore seasonality
Power_TS<-ts(Power[,"KWH"], frequency = 12, start=c(1998,1))
autoplot(Power_TS)+ xlab("Time") + ylab("KWH") + ggtitle("Monthly Power Usage (Watts)")

The data appears to be seasonal with an upward drift. There is one outlier apparent in the data after 2010

ggseasonplot(Power_TS)

In general, there appears to be seasonality with increases in December/January and then again in the summer months (June-August). My guess is that this is due to temperature fluctuatuions. The outlier is July 2010. In order to forecast approproately, we will impute the outlier.

# Replace outlier with median value
Power_TS[which.min(Power_TS)]<-median(Power_TS, na.rm = TRUE)
ggtsdisplay(Power_TS)
## Warning: Removed 1 rows containing missing values (geom_point).

We will now perform a box cox transformation

lambda <- BoxCox.lambda(Power_TS)
print(lambda)
## [1] -0.2023101
Power_TS_BC <- BoxCox(Power_TS, lambda)
autoplot(BoxCox(Power_TS, lambda)) +
  ggtitle("BoxCox Transformation")

Power_TS_BC%>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 1.5857 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

The KPSS test shows that the test statistic is greater than the 1pct and it is not stationary. We will use a seasonal ARIMA to model and forecast.

fit_power<- auto.arima(Power_TS_BC, seasonal = TRUE)
summary(fit_power)
## Series: Power_TS_BC 
## ARIMA(0,0,1)(2,1,0)[12] with drift 
## 
## Coefficients:
##          ma1     sar1     sar2  drift
##       0.2377  -0.7467  -0.3759  1e-04
## s.e.  0.0802   0.0738   0.0749  1e-04
## 
## sigma^2 estimated as 1.683e-05:  log likelihood=736.27
## AIC=-1462.54   AICc=-1462.19   BIC=-1446.57
## 
## Training set error measures:
##                        ME        RMSE         MAE         MPE       MAPE
## Training set 0.0002399358 0.003927156 0.003111997 0.005021188 0.06571964
##                   MASE       ACF1
## Training set 0.7794575 0.08523054

Forexast with ARIMA model for 12 months

autoplot((forecast(fit_power, h=12)))