Project 1 Description

This project consisx of 3 parx - two required and one bonus and is worth 15% of your grade. The project is due at 11:59 PM on Sunday Apr 11. I will accept late submissions with a penalty until the meetup after that when we review some projecx.

Part A – ATM Forecast, ATM624Data.xlsx

In part A, I want you to forecast how much cash is taken out of 4 different ATM machines for May 2010. The data is given in a single file. The variable ‘Cash’ is provided in hundreds of dollars, other than that it is straight forward. I am being somewhat ambiguous on purpose to make this have a little more business feeling. Explain and demonstrate your process, techniques used and not used, and your actual forecast. I am giving you data via an excel file, please provide your written report on your findings, visuals, discussion and your R code via an RPubs link along with the actual.rmd file Also please submit the forecast which you will put in an Excel readable file.

Reading in the DATA

ATMDATA <- read.csv(file = 'https://raw.githubusercontent.com/johnm1990/DATA624/main/ATM624Data(2)%20-%20ATM%20Data.csv')
head(ATMDATA)

##    DATE  ATM Cash
## 1 39934 ATM1   96
## 2 39934 ATM2  107
## 3 39935 ATM1   82
## 4 39935 ATM2   89
## 5 39936 ATM1   85
## 6 39936 ATM2   90

RCFL <- read.csv(file = 'https://raw.githubusercontent.com/johnm1990/DATA624/main/ResidentialCustomerForecastLoad-624(2).xlsx%20-%20ResidentialCustomerForecastLoad.csv')
head(RCFL)

##   CaseSequence YYYY.MMM     KWH
## 1          733 1998-Jan 6862583
## 2          734 1998-Feb 5838198
## 3          735 1998-Mar 5420658
## 4          736 1998-Apr 5010364
## 5          737 1998-May 4665377
## 6          738 1998-Jun 6467147

ANALYZE AND INSPECT THE DATA

First we start of by inspecting the data and checking for any missing/incomplete values. When values should have been reported but were not available, we end up with missing values. In real-life data, missing values occur almost automatically. We see nonresponse in surveys, technical issues during data collection or joining data from different sources. data for which we have only complete cases are rather scarce.

ATMDATA[!complete.cases(ATMDATA),]

##      DATE  ATM Cash
## 87  39977 ATM1   NA
## 93  39980 ATM1   NA
## 98  39982 ATM2   NA
## 105 39986 ATM1   NA
## 110 39988 ATM2   NA
## 731 40299        NA
## 732 40300        NA
## 733 40301        NA
## 734 40302        NA
## 735 40303        NA
## 736 40304        NA
## 737 40305        NA
## 738 40306        NA
## 739 40307        NA
## 740 40308        NA
## 741 40309        NA
## 742 40310        NA
## 743 40311        NA
## 744 40312        NA

head(ATMDATA)

##    DATE  ATM Cash
## 1 39934 ATM1   96
## 2 39934 ATM2  107
## 3 39935 ATM1   82
## 4 39935 ATM2   89
## 5 39936 ATM1   85
## 6 39936 ATM2   90

#fix the date - as you saw previously we need to make some adjustment to fix data

ATMDATA$DATE<-as.Date(ATMDATA$DATE, origin = "1899-12-30")

Note from above that of 19 entries(rows) we see that 14 are missing. For simplicity sake these are supposed to be removed.

While still exploring the data we make a matrix of plox with a given data set

ATM_DF <- ATMDATA %>%
      drop_na() %>%
      spread(ATM, Cash) %>% 
      mutate(DATE = as.Date(DATE, origin='1899-12-30'))
head(ATM_DF)

##         DATE ATM1 ATM2 ATM3 ATM4
## 1 2009-05-01   96  107    0  777
## 2 2009-05-02   82   89    0  524
## 3 2009-05-03   85   90    0  793
## 4 2009-05-04   90   55    0  908
## 5 2009-05-05   99   79    0   53
## 6 2009-05-06   88   19    0   52

ggpairs(ATMDATA)

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

FROM BELOW NOTE THAT ATM 1 AND ATM2 APPEAR SIMILAR IN CHARACTERSTICS. BOTH SEASONALITY AND RANGE OF CASH WITHDRAWL. NOTE THAT ATM3 DID NOT HAVE MUCH ACTIVITY IT APPEARS UNTIL WAY LATER DATES NOTE THAT ATM4 HAS A BIT SIMILAR IN CHARACTERISTICS WITH ATM1 ATM2. THERE WAS NOTEABLE WITHDRAWAL IN CASH >9000. THIS IN THEORY IS OUTLIER

ATM_DF <- ATM_DF[1:(dim(ATM_DF)[1] - 14),]
atm_x <- ts(ATM_DF %>% select(ATM1:ATM4), frequency=7, end = nrow(ATM_DF) - 14)
autoplot(atm_x, facet = TRUE)

AFTER SEEING ABOVE STATISTICS I WILL GO AHEAD AND ONE BY ONE (INDIVIDUAL) APPROACH

ATM1_x <- ts(ATM_DF[, "ATM1"], frequency = 7)
ATM2_x <- ts(ATM_DF[, "ATM2"] , frequency = 7)
ATM3_x <- ts(ATM_DF[, "ATM3"], frequency = 7)
ATM4_x <- ts(ATM_DF[, "ATM4"], frequency = 7)


ggtsdisplay(ATM1_x , poinx = FALSE, main = "ATM WITHDRAWLS", xlab = "Day", ylab = "Amount of Cash")

ggtsdisplay(ATM2_x, poinx = FALSE, main = "ATM WITHDRAWLS", xlab = "Day", ylab = "Amount of Cash")

ggtsdisplay(ATM3_x, poinx = FALSE, main = "ATM WITHDRAWLS", xlab = "Day", ylab = "Amount of Cash")

ggtsdisplay(ATM4_x, poinx = FALSE, main = "ATM WITHDRAWLS", xlab = "Day", ylab = "Amount of Cash")

A box plot is a highly visually effective way of viewing a clear summary of one or more sets of data. It is particularly useful for quickly summarizing and comparing different sets of results from different experiments. At a glance, a box plot allows a graphical display of the distribution of results and provides indications of symmetry within the data.

par(mfrow=c(4,1))
for (i in 2:5) {
  print(summary(ATM_DF[i]))
  boxplot(ATM_DF[i], horizontal = TRUE)
}

##       ATM1       
##  Min.   :  1.00  
##  1st Qu.: 73.00  
##  Median : 91.00  
##  Mean   : 84.26  
##  3rd Qu.:108.00  
##  Max.   :180.00  
##  NA's   :3

##       ATM2       
##  Min.   :  0.00  
##  1st Qu.: 26.00  
##  Median : 67.00  
##  Mean   : 62.65  
##  3rd Qu.: 93.00  
##  Max.   :147.00  
##  NA's   :2

##       ATM3  
##  Min.   :0  
##  1st Qu.:0  
##  Median :0  
##  Mean   :0  
##  3rd Qu.:0  
##  Max.   :0

##       ATM4        
##  Min.   :    2.0  
##  1st Qu.:  124.5  
##  Median :  412.0  
##  Mean   :  480.8  
##  3rd Qu.:  710.5  
##  Max.   :10920.0

TRANSFORMATION

ATM 1

below note to include diff and transformation(boxcox)

ATM1_LMB <- BoxCox.lambda(ATM1_x)
ATM1_BXCX <- ATM1_x %>% BoxCox(ATM1_LMB)
ATM1_BXCX_DIFF <- ATM1_BXCX %>% diff(lag=7)
ggtsdisplay(ATM1_BXCX_DIFF, points = FALSE, main = "including differencing and transf. ", xlab = "DAY", ylab = "Amounts of Cash")

ATM 2

ggtsdisplay(ATM2_x, points = FALSE, main = "ATM #2 WITHDRAW", xlab = "Day", ylab = "Amounts of Cash")

ATM2_LMB <- BoxCox.lambda(ATM2_x)
ATM2_BXCX <- ATM2_x %>% BoxCox(ATM2_LMB)
ATM2_BXCX_DIFF <- ATM2_BXCX %>% diff(lag=7)
ggtsdisplay(ATM2_BXCX_DIFF, points = FALSE, main = "with transformation and differencing", xlab = "Day", ylab = "Cash Amounts")

ATM 3

ggtsdisplay(ATM3_x, points = FALSE, main = "ATM #3 WITHDRAWLS", xlab = "Day", ylab = "Amounts OF CASH")

ATM 4

Given that all cash withdrawals from ATM4 was large, we will transform below

ggtsdisplay(ATM4_x, points = FALSE, main = "ATM#4 Withdrawls", xlab = "Day", ylab = "Amounts of Cash")

MODEL THE DATA ATM 1 and ATM2

ATM1_LMB <- BoxCox.lambda(ATM1_x)
ATM2_LMB <- BoxCox.lambda(ATM2_x)
ATM1_ARM <- auto.arima(ATM1_x)
ATM2_ARM <- auto.arima(ATM2_x)


# FOR ATM3 apparently no trend or seasonality. Lets use the mean and naive model.
ATM3_MN <- meanf(ATM3_x, h = 14)
ATM3_NV <- naive(ATM3_x, h = 14)



# FOR ATM4 i will be using for first time auto arima

ATM4_LMB <- BoxCox.lambda(ATM4_x)
ATM4_ARM <- auto.arima(ATM4_x)

LETS CHECK RESIDUALS

checkresiduals(ATM1_ARM)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,0,0)(2,1,0)[7]
## Q* = 18.177, df = 11, p-value = 0.07756
## 
## Model df: 3.   Total lags used: 14

checkresiduals(ATM2_ARM)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,0,0)(2,1,0)[7]
## Q* = 28.161, df = 12, p-value = 0.005239
## 
## Model df: 2.   Total lags used: 14

checkresiduals(ATM3_MN)

## 
##  Ljung-Box test
## 
## data:  Residuals from Mean
## Q* = NaN, df = 13, p-value = NA
## 
## Model df: 1.   Total lags used: 14

checkresiduals(ATM4_ARM)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,0,0) with non-zero mean
## Q* = 4.5349, df = 13, p-value = 0.9841
## 
## Model df: 1.   Total lags used: 14

LETS CHECK SUMMARIES

summary(ATM1_ARM)

## Series: ATM1_x 
## ARIMA(1,0,0)(2,1,0)[7] 
## 
## Coefficients:
##          ar1     sar1     sar2
##       0.1539  -0.4693  -0.2054
## s.e.  0.0546   0.0527   0.0523
## 
## sigma^2 = 626.2:  log likelihood = -1581.73
## AIC=3171.46   AICc=3171.57   BIC=3186.82
## 
## Training set error measures:
##                      ME     RMSE      MAE       MPE     MAPE     MASE
## Training set -0.0563155 24.66123 15.62939 -102.0602 118.9276 0.865596
##                     ACF1
## Training set 0.009681303

summary(ATM2_ARM)

## Series: ATM2_x 
## ARIMA(0,0,0)(2,1,0)[7] 
## 
## Coefficients:
##          sar1     sar2
##       -0.5702  -0.1700
## s.e.   0.0531   0.0533
## 
## sigma^2 = 669.4:  log likelihood = -1598.34
## AIC=3202.68   AICc=3202.75   BIC=3214.2
## 
## Training set error measures:
##                      ME     RMSE      MAE  MPE MAPE      MASE       ACF1
## Training set -0.1915631 25.53727 17.48062 -Inf  Inf 0.8411279 0.01916964

autoplot(ATM3_x) +
  autolayer(ATM3_NV, series = "Naive", PI = FALSE) +
  autolayer(ATM3_MN, series = "Average", PI = FALSE)

summary(ATM4_ARM)

## Series: ATM4_x 
## ARIMA(0,0,0) with non-zero mean 
## 
## Coefficients:
##           mean
##       480.7635
## s.e.   35.2553
## 
## sigma^2 = 437517:  log likelihood = -2777.09
## AIC=5558.19   AICc=5558.22   BIC=5565.91
## 
## Training set error measures:
##                                   ME     RMSE      MAE       MPE     MAPE
## Training set -0.00000000000008229916 660.5081 328.3757 -578.5926 608.9941
##                   MASE         ACF1
## Training set 0.8101007 -0.009857496

LETS FIT THE MODEL ATM1 FORECASTING

ATM1_FT<-Arima(ATM1_x, order = c(1, 0, 0), seasonal = c(2, 1, 0), lambda = ATM1_LMB)
ATM1_FCST<-forecast(ATM1_FT, 31)  

autoplot(ATM1_x) +
  autolayer(ATM1_FCST, series = "ATM 1 ARIMA FORECAST", PI = FALSE)

ATM2 FORECASTING

ATM2_FT<-Arima(ATM2_x, order = c(1, 0, 0), seasonal = c(2, 1, 0), lambda = ATM2_LMB)
ATM2_FCST<-forecast(ATM2_FT, 31) 

autoplot(ATM2_x) +
  autolayer(ATM2_FCST, series = "ATM 2 FORECAST ARIMA ", PI = FALSE)

ATM3 FORECASTING

ATM3_NV <- naive(ATM3_x, h = 31)
autoplot(ATM3_x) +
  autolayer(ATM3_NV, series = "ATM3 NAIVE FRCSTING ", PI = FALSE)

ATM4 FORECASTING

ATM4_FT<-Arima(ATM4_x, order = c(0, 0, 0), lambda = ATM4_LMB)
ATM4_FCST<-forecast(ATM4_FT, 31)  

autoplot(ATM4_x) +
  autolayer(ATM4_FCST, series = "ATM4 AUT ARM FORECAST ", PI = FALSE)

CONCLUSION we will export using below the forecast rslts for all ATMS

names(ATM_DF)[-1]

## [1] "ATM1" "ATM2" "ATM3" "ATM4"

max(ATM_DF$DATE)

## [1] "2010-04-16"

RSLTS <- data_frame(DATE = rep(max(ATM_DF$DATE) + 1:31, 4), ATM = rep(names(ATM_DF)[-1], each = 31), Cash = c(ATM1_FCST$mean, ATM2_FCST$mean,ATM3_NV$mean, ATM4_FCST$mean)) 

head(RSLTS)

## # A tibble: 6 x 3
##   DATE       ATM     Cash
##   <date>     <chr>  <dbl>
## 1 2010-04-17 ATM1   91.9 
## 2 2010-04-18 ATM1  102.  
## 3 2010-04-19 ATM1   65.6 
## 4 2010-04-20 ATM1    5.43
## 5 2010-04-21 ATM1  109.  
## 6 2010-04-22 ATM1   82.7

write.csv(RSLTS,"D:/CUNY SPS/Spring 2022/DATA 624/rslts.csv", row.names = FALSE)

Part B – Forecasting Power, ResidentialCustomerForecastLoad-624.xlsx

Part B consisx of a simple dataset of residential power usage for January 1998 until December 2013. Your assignment is to model these data and a monthly forecast for 2014. The data is given in a single file. The variable ‘KWH’ is power consumption in Kilowatt hours, the rest is straight forward. Add this to your existing files above.

head(RCFL)

##   CaseSequence YYYY.MMM     KWH
## 1          733 1998-Jan 6862583
## 2          734 1998-Feb 5838198
## 3          735 1998-Mar 5420658
## 4          736 1998-Apr 5010364
## 5          737 1998-May 4665377
## 6          738 1998-Jun 6467147

summary(RCFL)

##   CaseSequence     YYYY.MMM              KWH          
##  Min.   :733.0   Length:192         Min.   :  770523  
##  1st Qu.:780.8   Class :character   1st Qu.: 5429912  
##  Median :828.5   Mode  :character   Median : 6283324  
##  Mean   :828.5                      Mean   : 6502475  
##  3rd Qu.:876.2                      3rd Qu.: 7620524  
##  Max.   :924.0                      Max.   :10655730  
##                                     NA's   :1

LOCATING THE MISSING VALUES

which(is.na(RCFL), arr.ind=TRUE)

##      row col
## [1,] 129   3

# aka 861 - 2008-Sep

slice(RCFL,c(127:132))

##   CaseSequence YYYY.MMM     KWH
## 1          859 2008-Jul 7643987
## 2          860 2008-Aug 8037137
## 3          861 2008-Sep      NA
## 4          862 2008-Oct 5101803
## 5          863 2008-Nov 4555602
## 6          864 2008-Dec 6442746

#HERE YOU CAN WITNESS THE MISSING (NA) DATA

ONCE AGAIN AS YOU CAN SEE IT APPEARS MISSING DATA FOR 2008 SEPT

EXPLORING FURTHER THE DATA

RCFL <- RCFL %>% rename(Date = 'YYYY.MMM')
RCFL_data <- RCFL %>% select(-CaseSequence) 
RCFL_data <- RCFL_data %>%  mutate(Date = as.Date(paste0('01-', Date), '%d-%Y-%b'))
 
ggplot(RCFL_data, aes(Date, KWH)) +geom_line() + ggtitle('POWER USAGE OF RESIDENTIAL')

AS YOU CAN SEE FROM ABOVE CHART AND BELOW STAT PULL, VERY LOW DECLINE OF USAGE NEAR 2010

min(RCFL_data$KWH,na.rm = TRUE)

## [1] 770523

RCFL_2 <-ts(RCFL[, "KWH"], start = c(1998, 1), frequency = 12)
ggseasonplot(RCFL_2)+ggtitle('USAGE BY YEAR FOR RESIDENTIAL POWER')

BEING THAT THE DATE APPEARS SEASONAL, I THNINK WE COULD USE MEAN VALUE OF THE MONTHS JUNE / NOV IN ORDER TO HANDLE MISSING

RCFL_data<- RCFL_data[-c(129,151),]


#Get average by month
RCFL_data$Month <- months(RCFL_data$Date)
aggregate(KWH ~ Month, RCFL_data, mean)

##        Month     KWH
## 1      April 5299734
## 2     August 8298211
## 3   December 6283175
## 4   February 6946556
## 5    January 8007422
## 6       July 7852521
## 7       June 6536092
## 8      March 5971450
## 9        May 5039034
## 10  November 4953619
## 11   October 5657164
## 12 September 7702333

RCFL$KWH[is.na(RCFL$KWH)] = median(RCFL$KWH, na.rm=TRUE)

summary(RCFL)

##   CaseSequence       Date                KWH          
##  Min.   :733.0   Length:192         Min.   :  770523  
##  1st Qu.:780.8   Class :character   1st Qu.: 5434539  
##  Median :828.5   Mode  :character   Median : 6283324  
##  Mean   :828.5                      Mean   : 6501333  
##  3rd Qu.:876.2                      3rd Qu.: 7608792  
##  Max.   :924.0                      Max.   :10655730

RCFL_ts <- ts(RCFL$KWH, start=c(1998,1), frequency = 12)
RCFL_ts

##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 1998  6862583  5838198  5420658  5010364  4665377  6467147  8914755  8607428
## 1999  7183759  5759262  4847656  5306592  4426794  5500901  7444416  7564391
## 2000  7068296  5876083  4807961  4873080  5050891  7092865  6862662  7517830
## 2001  7538529  6602448  5779180  4835210  4787904  6283324  7855129  8450717
## 2002  7099063  6413429  5839514  5371604  5439166  5850383  7039702  8058748
## 2003  7256079  6190517  6120626  4885643  5296096  6051571  6900676  8476499
## 2004  7584596  6560742  6526586  4831688  4878262  6421614  7307931  7309774
## 2005  8225477  6564338  5581725  5563071  4453983  5900212  8337998  7786659
## 2006  7793358  5914945  5819734  5255988  4740588  7052275  7945564  8241110
## 2007  8031295  7928337  6443170  4841979  4862847  5022647  6426220  7447146
## 2008  7964293  7597060  6085644  5352359  4608528  6548439  7643987  8037137
## 2009  8072330  6976800  5691452  5531616  5264439  5804433  7713260  8350517
## 2010  9397357  8390677  7347915  5776131  4919289  6696292   770523  7922701
## 2011  8394747  8898062  6356903  5685227  5506308  8037779 10093343 10308076
## 2012  8991267  7952204  6356961  5569828  5783598  7926956  8886851  9612423
## 2013 10655730  7681798  6517514  6105359  5940475  7920627  8415321  9080226
##           Sep      Oct      Nov      Dec
## 1998  6989888  6345620  4640410  4693479
## 1999  7899368  5358314  4436269  4419229
## 2000  8912169  5844352  5041769  6220334
## 2001  7112069  5242535  4461979  5240995
## 2002  8245227  5865014  4908979  5779958
## 2003  7791791  5344613  4913707  5756193
## 2004  6690366  5444948  4824940  5791208
## 2005  7057213  6694523  4313019  6181548
## 2006  7296355  5104799  4458429  6226214
## 2007  7666970  5785964  4907057  6047292
## 2008  6283324  5101803  4555602  6442746
## 2009  7583146  5566075  5339890  7089880
## 2010  7819472  5875917  4800733  6152583
## 2011  8943599  5603920  6154138  8273142
## 2012  7559148  5576852  5731899  6609694
## 2013  7968220  5759367  5769083  9606304

# Before Transformation
ggtsdisplay(RCFL_ts, main="Monthly Power Consumption before transform")

BOXCOX TRANSFORM

RCFLS_BXCX <- RCFL_ts %>% BoxCox(lambda= 'auto')
ggtsdisplay(RCFLS_BXCX, main='MONTHLY POWER CONSUMER BXCX')

INVESTIGATE THE DATA CLOSELY

ggseasonplot(RCFLS_BXCX)

summary(RCFLS_BXCX)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   29.80   38.69   39.42   39.47   40.41   42.20

HELPFUL STATS TO EXPLORE SEASONALITY OR PATTERNS

ggsubseriesplot(RCFLS_BXCX)

ggAcf(RCFLS_BXCX)

LETS UTILIZE A BOX TEST TO TAKE A CLOSER LOOK

Box.test(RCFLS_BXCX, type = c("Ljung-Box"))

## 
##  Box-Ljung test
## 
## data:  RCFLS_BXCX
## X-squared = 16.556, df = 1, p-value = 0.00004723

summary(RCFLS_BXCX)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   29.80   38.69   39.42   39.47   40.41   42.20

boxplot(RCFLS_BXCX~cycle(RCFLS_BXCX))

DIFFERENCING

print(paste0("Suggested # of diff: ", ndiffs(RCFLS_BXCX)))

## [1] "Suggested # of diff: 1"

print(paste0("DIFF REQUIRED (SEASIONAL): ", ndiffs(diff(RCFLS_BXCX, lag=12))))

## [1] "DIFF REQUIRED (SEASIONAL): 0"

RCFL_PWR_DIFF <- RCFLS_BXCX %>% diff(lag = 12)
ggtsdisplay(RCFL_PWR_DIFF, main= "Monthly power consumption BXCX AND DIFF")

LETS SEE A GRAPHIC FOR RES POWER USAGE BY YEAR

ggseasonplot(RCFL_PWR_DIFF,polar = TRUE)+ggtitle('Residential Power Usage by Year')

plot(RCFL_PWR_DIFF)

#LET SEE A MOVING AVG

autoplot(RCFL_PWR_DIFF, series="Data")+
  autolayer(ma(RCFL_PWR_DIFF, 12), series = "12 MTH Moving Avg")+ ggtitle("2014 MVING AVG")

FORECAST MODELING

1 STL - ANN NO DP

#stlf - etsmodel
RCFLS_STL <- stlf(RCFL_PWR_DIFF, damped=FALSE, s.window = "periodic", robust=TRUE, h = 12)

# forecast plot
autoplot(RCFLS_STL) + autolayer(fitted(RCFLS_STL))

#2 STL - DP AADN

#stlf - etsmodel estimation --- M, Ad, N is chosen.
RCFL_STL_DP <- stlf(RCFL_PWR_DIFF, damped=TRUE, s.window = "periodic", robust=TRUE, h = 12)

# forecast plot
autoplot(RCFL_STL_DP) + autolayer(fitted(RCFL_STL_DP))

#3 - ARIMA

# auto.arima
arima_model <- auto.arima(RCFL_PWR_DIFF)

# forecast values
arima_model <- forecast(arima_model, h=20)

# forecast plot
autoplot(arima_model) + autolayer(fitted(arima_model))

#4 - ETS MNM

RCFL_ETS<- ets(RCFL_PWR_DIFF)

# forecast plot
autoplot(forecast(RCFL_ETS
                , h=12)) + autolayer(fitted(RCFL_ETS
                                                    ))

5 EXP SMOOTH

RCFL_FCST_PWR_S <- ses(RCFL_PWR_DIFF, h=12)
autoplot(RCFL_FCST_PWR_S)+
  autolayer(fitted(RCFL_FCST_PWR_S), series="Fitted")

#RCFL_FCST_PWR_S

COMPARISON OF THE MODELS

accuracy(RCFLS_STL)

##                        ME     RMSE       MAE       MPE     MAPE      MASE
## Training set 0.0005455413 1.348221 0.5931651 -1360.432 1605.179 0.5404898
##                   ACF1
## Training set 0.1194029

checkresiduals(RCFLS_STL)

## 
##  Ljung-Box test
## 
## data:  Residuals from STL +  ETS(A,N,N)
## Q* = 65.794, df = 22, p-value = 0.000002984
## 
## Model df: 2.   Total lags used: 24

summary(RCFLS_STL)

## 
## Forecast method: STL +  ETS(A,N,N)
## 
## Model Information:
## ETS(A,N,N) 
## 
## Call:
##  ets(y = na.interp(x), model = etsmodel, damped = FALSE, allow.multiplicative.trend = allow.multiplicative.trend) 
## 
##   Smoothing parameters:
##     alpha = 0.0001 
## 
##   Initial states:
##     l = 0.0713 
## 
##   sigma:  1.3558
## 
##      AIC     AICc      BIC 
## 1048.295 1048.432 1057.874 
## 
## Error measures:
##                        ME     RMSE       MAE       MPE     MAPE      MASE
## Training set 0.0005455413 1.348221 0.5931651 -1360.432 1605.179 0.5404898
##                   ACF1
## Training set 0.1194029
## 
## Forecasts:
##          Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
## Jan 2014     0.17305758 -1.564437 1.910552 -2.484211 2.830326
## Feb 2014     0.06130592 -1.676189 1.798801 -2.595963 2.718575
## Mar 2014     0.12236178 -1.615133 1.859857 -2.534907 2.779631
## Apr 2014     0.08210098 -1.655394 1.819596 -2.575168 2.739370
## May 2014     0.12746350 -1.610031 1.864958 -2.529805 2.784732
## Jun 2014     0.14679543 -1.590699 1.884290 -2.510474 2.804064
## Jul 2014    -0.19208644 -1.929581 1.545408 -2.849355 2.465183
## Aug 2014    -0.02245609 -1.759951 1.715039 -2.679725 2.634813
## Sep 2014     0.16034813 -1.577147 1.897843 -2.496921 2.817617
## Oct 2014     0.05745299 -1.680042 1.794948 -2.599816 2.714722
## Nov 2014     0.08070004 -1.656795 1.818195 -2.576569 2.737969
## Dec 2014     0.05898262 -1.678512 1.796477 -2.598286 2.716252

accuracy(RCFL_STL_DP)

##                        ME     RMSE       MAE       MPE     MAPE      MASE
## Training set -0.007629573 1.349456 0.5947813 -1328.026 1577.668 0.5419625
##                  ACF1
## Training set 0.120852

checkresiduals(RCFL_STL_DP)

## 
##  Ljung-Box test
## 
## data:  Residuals from STL +  ETS(A,Ad,N)
## Q* = 65.735, df = 19, p-value = 0.0000004636
## 
## Model df: 5.   Total lags used: 24

summary(RCFL_STL_DP)

## 
## Forecast method: STL +  ETS(A,Ad,N)
## 
## Model Information:
## ETS(A,Ad,N) 
## 
## Call:
##  ets(y = na.interp(x), model = etsmodel, damped = TRUE, allow.multiplicative.trend = allow.multiplicative.trend) 
## 
##   Smoothing parameters:
##     alpha = 0.0001 
##     beta  = 0.0001 
##     phi   = 0.9622 
## 
##   Initial states:
##     l = 0.1056 
##     b = -0.0006 
## 
##   sigma:  1.3686
## 
##      AIC     AICc      BIC 
## 1054.625 1055.110 1073.783 
## 
## Error measures:
##                        ME     RMSE       MAE       MPE     MAPE      MASE
## Training set -0.007629573 1.349456 0.5947813 -1328.026 1577.668 0.5419625
##                  ACF1
## Training set 0.120852
## 
## Forecasts:
##          Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
## Jan 2014     0.17900301 -1.574926 1.932932 -2.503400 2.861406
## Feb 2014     0.06761419 -1.686315 1.821544 -2.614789 2.750018
## Mar 2014     0.12901915 -1.624910 1.882949 -2.553385 2.811423
## Apr 2014     0.08909425 -1.664835 1.843024 -2.593310 2.771498
## May 2014     0.13477996 -1.619150 1.888710 -2.547624 2.817184
## Jun 2014     0.15442285 -1.599507 1.908353 -2.527982 2.836828
## Jul 2014    -0.18415983 -1.938091 1.569771 -2.866565 2.498246
## Aug 2014    -0.01424159 -1.768173 1.739690 -2.696648 2.668165
## Sep 2014     0.16883961 -1.585092 1.922772 -2.513568 2.851247
## Oct 2014     0.06621097 -1.687722 1.820144 -2.616198 2.748620
## Nov 2014     0.08971444 -1.664219 1.843648 -2.592696 2.772125
## Dec 2014     0.06824374 -1.685691 1.822179 -2.614168 2.750656

accuracy(arima_model)

##                       ME      RMSE       MAE       MPE     MAPE      MASE
## Training set -0.02554754 0.9666583 0.4541852 -1920.696 2160.894 0.4138519
##                     ACF1
## Training set 0.006469743

checkresiduals(arima_model)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,0,1)(0,0,2)[12] with non-zero mean
## Q* = 6.8185, df = 20, p-value = 0.9972
## 
## Model df: 4.   Total lags used: 24

summary(arima_model)

## 
## Forecast method: ARIMA(0,0,1)(0,0,2)[12] with non-zero mean
## 
## Model Information:
## Series: RCFL_PWR_DIFF 
## ARIMA(0,0,1)(0,0,2)[12] with non-zero mean 
## 
## Coefficients:
##          ma1     sma1    sma2    mean
##       0.1195  -0.9264  0.0890  0.0616
## s.e.  0.0712   0.0810  0.0811  0.0198
## 
## sigma^2 = 0.9557:  log likelihood = -257.11
## AIC=524.23   AICc=524.57   BIC=540.19
## 
## Error measures:
##                       ME      RMSE       MAE       MPE     MAPE      MASE
## Training set -0.02554754 0.9666583 0.4541852 -1920.696 2160.894 0.4138519
##                     ACF1
## Training set 0.006469743
## 
## Forecasts:
##          Point Forecast     Lo 80      Hi 80     Lo 95     Hi 95
## Jan 2014    -0.71170867 -1.964897  0.5414796 -2.628295 1.2048774
## Feb 2014     0.20176196 -1.060338  1.4638615 -1.728453 2.1319766
## Mar 2014     0.10559912 -1.156500  1.3676986 -1.824616 2.0358137
## Apr 2014    -0.21986269 -1.481962  1.0422368 -2.150077 1.7103519
## May 2014    -0.33880539 -1.600905  0.9232941 -2.269020 1.5914092
## Jun 2014    -0.43704168 -1.699141  0.8250578 -2.367256 1.4931729
## Jul 2014    -1.07257955 -2.334679  0.1895200 -3.002794 0.8576351
## Aug 2014     0.06430436 -1.197795  1.3264039 -1.865910 1.9945190
## Sep 2014     0.14938779 -1.112712  1.4114873 -1.780827 2.0796024
## Oct 2014     0.22911025 -1.032989  1.4912097 -1.701104 2.1593249
## Nov 2014    -0.19797910 -1.460079  1.0641204 -2.128194 1.7322355
## Dec 2014    -1.54532103 -2.807419 -0.2832232 -3.475533 0.3848910
## Jan 2015    -0.02932622 -1.743669  1.6850167 -2.651187 2.5925349
## Feb 2015     0.05006537 -1.669873  1.7700036 -2.580353 2.6804839
## Mar 2015     0.05625769 -1.663681  1.7761959 -2.574161 2.6866762
## Apr 2015     0.08773329 -1.632205  1.8076715 -2.542685 2.7181518
## May 2015     0.10404245 -1.615896  1.8239807 -2.526376 2.7344610
## Jun 2015     0.11673378 -1.603204  1.8366720 -2.513685 2.7471523
## Jul 2015     0.18863975 -1.531298  1.9085780 -2.441779 2.8190583
## Aug 2015     0.06605819 -1.653880  1.7859964 -2.564360 2.6964767

accuracy(RCFL_ETS)

##                          ME     RMSE       MAE       MPE     MAPE     MASE
## Training set -0.00001229798 1.346981 0.6002461 -768.1263 994.3076 0.546942
##                   ACF1
## Training set 0.1202904

checkresiduals(RCFL_ETS)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,N,N)
## Q* = 66.674, df = 22, p-value = 0.000002184
## 
## Model df: 2.   Total lags used: 24

summary(RCFL_ETS)

## ETS(A,N,N) 
## 
## Call:
##  ets(y = RCFL_PWR_DIFF) 
## 
##   Smoothing parameters:
##     alpha = 0.0001 
## 
##   Initial states:
##     l = 0.0719 
## 
##   sigma:  1.3545
## 
##      AIC     AICc      BIC 
## 1047.964 1048.100 1057.543 
## 
## Training set error measures:
##                          ME     RMSE       MAE       MPE     MAPE     MASE
## Training set -0.00001229798 1.346981 0.6002461 -768.1263 994.3076 0.546942
##                   ACF1
## Training set 0.1202904

accuracy(RCFL_FCST_PWR_S)

##                        ME     RMSE       MAE       MPE     MAPE      MASE
## Training set 0.0002402688 1.346981 0.6002376 -765.1024 991.1566 0.5469343
##                   ACF1
## Training set 0.1202903

checkresiduals(RCFL_FCST_PWR_S)

## 
##  Ljung-Box test
## 
## data:  Residuals from Simple exponential smoothing
## Q* = 66.674, df = 22, p-value = 0.000002184
## 
## Model df: 2.   Total lags used: 24

summary(RCFL_FCST_PWR_S)

## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = RCFL_PWR_DIFF, h = 12) 
## 
##   Smoothing parameters:
##     alpha = 0.0001 
## 
##   Initial states:
##     l = 0.0716 
## 
##   sigma:  1.3545
## 
##      AIC     AICc      BIC 
## 1047.964 1048.100 1057.543 
## 
## Error measures:
##                        ME     RMSE       MAE       MPE     MAPE      MASE
## Training set 0.0002402688 1.346981 0.6002376 -765.1024 991.1566 0.5469343
##                   ACF1
## Training set 0.1202903
## 
## Forecasts:
##          Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
## Jan 2014     0.07162989 -1.664267 1.807526 -2.583195 2.726454
## Feb 2014     0.07162989 -1.664267 1.807526 -2.583195 2.726454
## Mar 2014     0.07162989 -1.664267 1.807526 -2.583195 2.726454
## Apr 2014     0.07162989 -1.664267 1.807526 -2.583195 2.726454
## May 2014     0.07162989 -1.664267 1.807526 -2.583195 2.726454
## Jun 2014     0.07162989 -1.664267 1.807526 -2.583195 2.726454
## Jul 2014     0.07162989 -1.664267 1.807526 -2.583195 2.726454
## Aug 2014     0.07162989 -1.664267 1.807526 -2.583195 2.726454
## Sep 2014     0.07162989 -1.664267 1.807526 -2.583195 2.726454
## Oct 2014     0.07162989 -1.664267 1.807526 -2.583195 2.726455
## Nov 2014     0.07162989 -1.664267 1.807526 -2.583195 2.726455
## Dec 2014     0.07162989 -1.664267 1.807526 -2.583195 2.726455

If you look at ARIMA it based AIC it appears with best result. BIC dropped to 540. AIC it dropped to 524. RMSE has also dropped from 1.347 to 0.966. i think i’ll take ARIMA model on this one. I’ll go ahead and predict the values in csv as I am comfortable with the results of ARIMA.

rslts_2 <- forecast(arima_model, h=12)
rslts_fin <- data.frame(rslts_2)

write.csv(rslts_fin,"D:/CUNY SPS/Spring 2022/DATA 624/rslts_rcfl.csv", row.names = FALSE)

DATA624_Project1

John Mazon

4/3/2022