This project consists of 3 parts (two required and one bonus) and is worth 15% of your grade. The project is due at 11:59 PM on Sunday Apr 11. I will accept late submissions with a penalty until the meetup after that when we review some projects.
In part A, I want you to forecast how much cash is taken out of 4 different ATM machines for May 2010. The data is given in a single file. The variable ‘Cash’ is provided in hundreds of dollars; other than that it is straightforward. I am being somewhat ambiguous on purpose to make this have a little more business feeling. Explain and demonstrate your process, techniques used and not used, and your actual forecast. I am giving you data via an Excel file; please provide your written report on your findings, visuals, discussion and your R code via an RPubs link along with the actual .rmd file. Also please submit the forecast, which you will put in an Excel readable file.
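Reading in the DATA
# Packages used throughout: tidyverse (%>%, drop_na, spread, ggplot), forecast (time series models and plots), GGally (ggpairs)
library(tidyverse)
library(forecast)
library(GGally)
ATMDATA <- read.csv(file = 'https://raw.githubusercontent.com/johnm1990/DATA624/main/ATM624Data(2)%20-%20ATM%20Data.csv')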
head(ATMDATA)
## DATE ATM Cash
## 1 39934 ATM1 96
## 2 39934 ATM2 107
## 3 39935 ATM1 82
## 4 39935 ATM2 89
## 5 39936 ATM1 85
## 6 39936 ATM2 90
RCFL <- read.csv(file = 'https://raw.githubusercontent.com/johnm1990/DATA624/main/ResidentialCustomerForecastLoad-624(2).xlsx%20-%20ResidentialCustomerForecastLoad.csv')
head(RCFL)
## CaseSequence YYYY.MMM KWH
## 1 733 1998-Jan 6862583
## 2 734 1998-Feb 5838198
## 3 735 1998-Mar 5420658
## 4 736 1998-Apr 5010364
## 5 737 1998-May 4665377
## 6 738 1998-Jun 6467147
ANALYZE AND INSPECT THE DATA
First we start off by inspecting the data and checking for missing or incomplete values. Missing values arise when a value should have been reported but was not available. In real-life data they are almost unavoidable: nonresponse in surveys, technical issues during data collection, and joins across different sources all produce them. Data sets with only complete cases are rather scarce.
ATMDATA[!complete.cases(ATMDATA),]
## DATE ATM Cash
## 87 39977 ATM1 NA
## 93 39980 ATM1 NA
## 98 39982 ATM2 NA
## 105 39986 ATM1 NA
## 110 39988 ATM2 NA
## 731 40299 NA
## 732 40300 NA
## 733 40301 NA
## 734 40302 NA
## 735 40303 NA
## 736 40304 NA
## 737 40305 NA
## 738 40306 NA
## 739 40307 NA
## 740 40308 NA
## 741 40309 NA
## 742 40310 NA
## 743 40311 NA
## 744 40312 NA
head(ATMDATA)
## DATE ATM Cash
## 1 39934 ATM1 96
## 2 39934 ATM2 107
## 3 39935 ATM1 82
## 4 39935 ATM2 89
## 5 39936 ATM1 85
## 6 39936 ATM2 90
# Fix the date: as seen above, DATE is stored as an Excel serial number, so convert it using Excel's origin
ATMDATA$DATE<-as.Date(ATMDATA$DATE, origin = "1899-12-30")
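Note from the output above that of the 19 incomplete rows, 14 (rows 731 to 744) have neither an ATM label nor a Cash value. Their serial dates, 40299 to 40312, fall in the first half of May 2010, so they look like placeholders for the forecast period rather than observations. For simplicity's sake these rows are removed.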
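As a small sanity check on the date handling, the sketch below converts a few of the serial numbers seen in the output with the same origin the report uses. The `serials` vector is a hand-picked illustration, not part of the original analysis.

```r
# Serial numbers copied from the outputs above: the first data row, and the
# first and last of the 14 rows that have no ATM label.
serials <- c(39934, 40299, 40312)

# Same origin as the as.Date() call in the report.
dates <- as.Date(serials, origin = "1899-12-30")
print(dates)
```

The first value should agree with the 2009-05-01 shown later in `head(ATM_DF)`, and the other two show which calendar days the unlabeled rows stand for.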
While still exploring the data, we make a matrix of plots for the data set.
ATM_DF <- ATMDATA %>%
drop_na() %>%
spread(ATM, Cash) %>%
mutate(DATE = as.Date(DATE, origin='1899-12-30'))
head(ATM_DF)
## DATE ATM1 ATM2 ATM3 ATM4
## 1 2009-05-01 96 107 0 777
## 2 2009-05-02 82 89 0 524
## 3 2009-05-03 85 90 0 793
## 4 2009-05-04 90 55 0 908
## 5 2009-05-05 99 79 0 53
## 6 2009-05-06 88 19 0 52
ggpairs(ATMDATA)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
FROM THE PLOTS BELOW, NOTE THAT ATM1 AND ATM2 APPEAR SIMILAR IN CHARACTERISTICS, BOTH IN SEASONALITY AND IN RANGE OF CASH WITHDRAWALS. ATM3 SHOWS NO ACTIVITY UNTIL THE VERY LAST DATES. ATM4 IS SOMEWHAT SIMILAR TO ATM1 AND ATM2, BUT IT HAS ONE NOTABLE WITHDRAWAL ABOVE 9000, WHICH IS LIKELY AN OUTLIER.
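# NOTE: drop_na() above has already removed the 14 unlabeled rows, so this trim also drops the last 14 observed days (max(ATM_DF$DATE) is 2010-04-16 further down)
ATM_DF <- ATM_DF[1:(dim(ATM_DF)[1] - 14),]
atm_x <- ts(ATM_DF %>% select(ATM1:ATM4), frequency=7, end = nrow(ATM_DF) - 14)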
autoplot(atm_x, facet = TRUE)
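To see why trimming rows after `drop_na()` can remove real observations, here is a toy sketch in base R (`complete.cases()` and `reshape()` stand in for `drop_na()` and `spread()`; the data frame `toy` is invented for illustration). Rows with no ATM label and no Cash never reach the wide table, so nothing is left to trim.

```r
# Toy long-format data: dates 3 and 4 are unlabeled placeholder rows,
# and ATM1 has a genuinely missing Cash value on date 2.
toy <- data.frame(
  DATE = c(1, 1, 2, 2, 3, 4),
  ATM  = c("ATM1", "ATM2", "ATM1", "ATM2", "", ""),
  Cash = c(96, 107, NA, 89, NA, NA)
)

# Keep complete rows only (base-R analogue of drop_na()).
complete <- toy[complete.cases(toy), ]

# Long to wide, one column per ATM (base-R analogue of spread(ATM, Cash)).
wide <- reshape(complete, idvar = "DATE", timevar = "ATM", direction = "wide")
print(wide)
```

In this toy case the wide table holds only dates 1 and 2, with an NA for ATM1 on date 2. The same pattern would explain why `summary()` of the real data reports a few NA values for ATM1 and ATM2 but none for the placeholder dates.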
AFTER SEEING THE PLOTS ABOVE I WILL TAKE AN INDIVIDUAL APPROACH AND MODEL EACH ATM ONE BY ONE
ATM1_x <- ts(ATM_DF[, "ATM1"], frequency = 7)
ATM2_x <- ts(ATM_DF[, "ATM2"] , frequency = 7)
ATM3_x <- ts(ATM_DF[, "ATM3"], frequency = 7)
ATM4_x <- ts(ATM_DF[, "ATM4"], frequency = 7)
ggtsdisplay(ATM1_x, points = FALSE, main = "ATM WITHDRAWALS", xlab = "Day", ylab = "Amount of Cash")
ggtsdisplay(ATM2_x, points = FALSE, main = "ATM WITHDRAWALS", xlab = "Day", ylab = "Amount of Cash")
ggtsdisplay(ATM3_x, points = FALSE, main = "ATM WITHDRAWALS", xlab = "Day", ylab = "Amount of Cash")
ggtsdisplay(ATM4_x, points = FALSE, main = "ATM WITHDRAWALS", xlab = "Day", ylab = "Amount of Cash")
A box plot is a highly visually effective way of viewing a clear summary of one or more sets of data. It is particularly useful for quickly summarizing and comparing different sets of results from different experiments. At a glance, a box plot allows a graphical display of the distribution of results and provides indications of symmetry within the data.
par(mfrow=c(4,1))
for (i in 2:5) {
print(summary(ATM_DF[i]))
boxplot(ATM_DF[i], horizontal = TRUE)
}
## ATM1
## Min. : 1.00
## 1st Qu.: 73.00
## Median : 91.00
## Mean : 84.26
## 3rd Qu.:108.00
## Max. :180.00
## NA's :3
## ATM2
## Min. : 0.00
## 1st Qu.: 26.00
## Median : 67.00
## Mean : 62.65
## 3rd Qu.: 93.00
## Max. :147.00
## NA's :2
## ATM3
## Min. :0
## 1st Qu.:0
## Median :0
## Mean :0
## 3rd Qu.:0
## Max. :0
## ATM4
## Min. : 2.0
## 1st Qu.: 124.5
## Median : 412.0
## Mean : 480.8
## 3rd Qu.: 710.5
## Max. :10920.0
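The ATM4 maximum of 10920 stands out in the summary above. As a rough illustration of how the box plot's 1.5 x IQR whisker rule flags such a point, the sketch below applies base R's `boxplot.stats()` to a tiny sample made of the six ATM4 values from `head(ATM_DF)` plus that maximum. The sample is an assumption for illustration and is far smaller than the real series.

```r
# Six ATM4 values from head(ATM_DF) plus the maximum reported by summary().
atm4_sample <- c(777, 524, 793, 908, 53, 52, 10920)

# boxplot.stats() returns the whisker/hinge statistics and the points
# lying beyond 1.5 * IQR from the hinges in $out.
stats <- boxplot.stats(atm4_sample)
outliers <- stats$out
print(outliers)
```

Whether to drop, cap, or impute such a value is a modeling decision; the rule only identifies candidates.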
ATM 1
Below we apply a Box-Cox transformation followed by seasonal (lag-7) differencing.
ATM1_LMB <- BoxCox.lambda(ATM1_x)
ATM1_BXCX <- ATM1_x %>% BoxCox(ATM1_LMB)
ATM1_BXCX_DIFF <- ATM1_BXCX %>% diff(lag=7)
ggtsdisplay(ATM1_BXCX_DIFF, points = FALSE, main = "including differencing and transf. ", xlab = "DAY", ylab = "Amounts of Cash")
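For readers who want to see what these two steps do numerically, here is a minimal base-R sketch. The helper `box_cox()` is hypothetical and written out by hand for positive data; the weekly series is simulated, and `lambda = 0.5` is an arbitrary choice rather than the value `BoxCox.lambda()` would pick.

```r
# Hand-written Box-Cox transform for positive values (hypothetical helper).
box_cox <- function(x, lambda) {
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

# Simulated daily series with a strong weekly pattern (8 weeks).
set.seed(7)
weekly_pattern <- rep(c(90, 80, 85, 95, 100, 60, 20), times = 8)
cash <- ts(weekly_pattern + rnorm(56, sd = 3), frequency = 7)

# Step 1: stabilise the variance. Step 2: subtract the value 7 days earlier.
transformed <- box_cox(cash, lambda = 0.5)
seasonal_diff <- diff(transformed, lag = 7)

# Differencing at lag 7 costs one week of observations.
c(original = length(cash), differenced = length(seasonal_diff))
```

Because each day is compared with the same weekday one week earlier, the repeating weekly shape cancels and mostly noise remains, which is what the ACF/PACF panels of the differenced series are meant to reveal.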
ATM 2
ggtsdisplay(ATM2_x, points = FALSE, main = "ATM #2 WITHDRAWALS", xlab = "Day", ylab = "Amounts of Cash")
ATM2_LMB <- BoxCox.lambda(ATM2_x)
ATM2_BXCX <- ATM2_x %>% BoxCox(ATM2_LMB)
ATM2_BXCX_DIFF <- ATM2_BXCX %>% diff(lag=7)
ggtsdisplay(ATM2_BXCX_DIFF, points = FALSE, main = "with transformation and differencing", xlab = "Day", ylab = "Cash Amounts")
ATM 3
ggtsdisplay(ATM3_x, points = FALSE, main = "ATM #3 WITHDRAWALS", xlab = "Day", ylab = "Amounts of Cash")
ATM 4
Given that cash withdrawals from ATM4 were large, we will apply a transformation when modeling. The raw series is plotted below.
ggtsdisplay(ATM4_x, points = FALSE, main = "ATM #4 WITHDRAWALS", xlab = "Day", ylab = "Amounts of Cash")
MODEL THE DATA: ATM 1 AND ATM 2
ATM1_LMB <- BoxCox.lambda(ATM1_x)
ATM2_LMB <- BoxCox.lambda(ATM2_x)
# NOTE: auto.arima is run on the untransformed series here; the lambdas are only applied in the final Arima() fits further down
ATM1_ARM <- auto.arima(ATM1_x)
ATM2_ARM <- auto.arima(ATM2_x)
# For ATM3 there is apparently no trend or seasonality. Let's use the mean and naive models.
ATM3_MN <- meanf(ATM3_x, h = 14)
ATM3_NV <- naive(ATM3_x, h = 14)
# For ATM4 I will use auto.arima as well
ATM4_LMB <- BoxCox.lambda(ATM4_x)
ATM4_ARM <- auto.arima(ATM4_x)
LET'S CHECK RESIDUALS
checkresiduals(ATM1_ARM)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,0)(2,1,0)[7]
## Q* = 18.177, df = 11, p-value = 0.07756
##
## Model df: 3. Total lags used: 14
checkresiduals(ATM2_ARM)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,0)(2,1,0)[7]
## Q* = 28.161, df = 12, p-value = 0.005239
##
## Model df: 2. Total lags used: 14
checkresiduals(ATM3_MN)
##
## Ljung-Box test
##
## data: Residuals from Mean
## Q* = NaN, df = 13, p-value = NA
##
## Model df: 1. Total lags used: 14
checkresiduals(ATM4_ARM)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,0) with non-zero mean
## Q* = 4.5349, df = 13, p-value = 0.9841
##
## Model df: 1. Total lags used: 14
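A brief reading aid for the four tests above. The Ljung-Box null hypothesis is that the residuals are uncorrelated, so a small p-value (ATM2, 0.005239) suggests the model has left some structure behind, while ATM1 (0.07756) and ATM4 (0.9841) do not reject at the 5% level. The NaN for ATM3 is what one would expect if the series is constant, since the residuals then have zero variance. The sketch below uses base R's `Box.test()` on simulated series; the data and seed are assumptions for illustration only.

```r
set.seed(1)

# White noise: nothing left to model.
noise <- rnorm(200)

# AR(1) with coefficient 0.9: strong leftover autocorrelation.
ar_series <- arima.sim(model = list(ar = 0.9), n = 200)

# Same test type and lag count as the checkresiduals() output above.
lb_noise <- Box.test(noise, lag = 14, type = "Ljung-Box")
lb_ar <- Box.test(ar_series, lag = 14, type = "Ljung-Box")

c(noise = lb_noise$p.value, autocorrelated = lb_ar$p.value)
```

The autocorrelated series should produce a p-value far below 0.05, which is the pattern to look for when judging whether a fitted model's residuals still carry signal.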
LET'S CHECK SUMMARIES
summary(ATM1_ARM)
## Series: ATM1_x
## ARIMA(1,0,0)(2,1,0)[7]
##
## Coefficients:
## ar1 sar1 sar2
## 0.1539 -0.4693 -0.2054
## s.e. 0.0546 0.0527 0.0523
##
## sigma^2 = 626.2: log likelihood = -1581.73
## AIC=3171.46 AICc=3171.57 BIC=3186.82
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.0563155 24.66123 15.62939 -102.0602 118.9276 0.865596
## ACF1
## Training set 0.009681303
summary(ATM2_ARM)
## Series: ATM2_x
## ARIMA(0,0,0)(2,1,0)[7]
##
## Coefficients:
## sar1 sar2
## -0.5702 -0.1700
## s.e. 0.0531 0.0533
##
## sigma^2 = 669.4: log likelihood = -1598.34
## AIC=3202.68 AICc=3202.75 BIC=3214.2
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.1915631 25.53727 17.48062 -Inf Inf 0.8411279 0.01916964
autoplot(ATM3_x) +
autolayer(ATM3_NV, series = "Naive", PI = FALSE) +
autolayer(ATM3_MN, series = "Average", PI = FALSE)
summary(ATM4_ARM)
## Series: ATM4_x
## ARIMA(0,0,0) with non-zero mean
##
## Coefficients:
## mean
## 480.7635
## s.e. 35.2553
##
## sigma^2 = 437517: log likelihood = -2777.09
## AIC=5558.19 AICc=5558.22 BIC=5565.91
##
## Training set error measures:
## ME RMSE MAE MPE MAPE
## Training set -0.00000000000008229916 660.5081 328.3757 -578.5926 608.9941
## MASE ACF1
## Training set 0.8101007 -0.009857496
LET'S FIT THE MODELS
ATM1 FORECASTING
ATM1_FT<-Arima(ATM1_x, order = c(1, 0, 0), seasonal = c(2, 1, 0), lambda = ATM1_LMB)
ATM1_FCST<-forecast(ATM1_FT, 31)
autoplot(ATM1_x) +
autolayer(ATM1_FCST, series = "ATM 1 ARIMA FORECAST", PI = FALSE)
ATM2 FORECASTING
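# NOTE: auto.arima above selected ARIMA(0,0,0)(2,1,0)[7] for ATM2; order = c(1, 0, 0) here adds an AR(1) term to that specification
ATM2_FT<-Arima(ATM2_x, order = c(1, 0, 0), seasonal = c(2, 1, 0), lambda = ATM2_LMB)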
ATM2_FCST<-forecast(ATM2_FT, 31)
autoplot(ATM2_x) +
autolayer(ATM2_FCST, series = "ATM 2 FORECAST ARIMA ", PI = FALSE)
ATM3 FORECASTING
ATM3_NV <- naive(ATM3_x, h = 31)
autoplot(ATM3_x) +
autolayer(ATM3_NV, series = "ATM3 NAIVE FRCSTING ", PI = FALSE)
ATM4 FORECASTING
ATM4_FT<-Arima(ATM4_x, order = c(0, 0, 0), lambda = ATM4_LMB)
ATM4_FCST<-forecast(ATM4_FT, 31)
autoplot(ATM4_x) +
autolayer(ATM4_FCST, series = "ATM4 AUT ARM FORECAST ", PI = FALSE)
CONCLUSION
Below we export the forecast results for all ATMs.
names(ATM_DF)[-1]
## [1] "ATM1" "ATM2" "ATM3" "ATM4"
max(ATM_DF$DATE)
## [1] "2010-04-16"
# tibble() replaces the deprecated data_frame()
RSLTS <- tibble(
  DATE = rep(max(ATM_DF$DATE) + 1:31, 4),
  ATM = rep(names(ATM_DF)[-1], each = 31),
  Cash = c(ATM1_FCST$mean, ATM2_FCST$mean, ATM3_NV$mean, ATM4_FCST$mean)
)
head(RSLTS)
## # A tibble: 6 x 3
## DATE ATM Cash
## <date> <chr> <dbl>
## 1 2010-04-17 ATM1 91.9
## 2 2010-04-18 ATM1 102.
## 3 2010-04-19 ATM1 65.6
## 4 2010-04-20 ATM1 5.43
## 5 2010-04-21 ATM1 109.
## 6 2010-04-22 ATM1 82.7
write.csv(RSLTS,"D:/CUNY SPS/Spring 2022/DATA 624/rslts.csv", row.names = FALSE)
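The assignment asks for May 2010, but the 31 forecast rows above start on 2010-04-17 because the series ends on 2010-04-16. One way to line the export up with the calendar month, sketched here in base R, is to compute how many steps are needed to reach May 31 and then keep only the May dates. The variable names are illustrative and this is not what the report's code does.

```r
# Last observed date, as reported by max(ATM_DF$DATE) above.
last_obs <- as.Date("2010-04-16")
may_start <- as.Date("2010-05-01")
may_end <- as.Date("2010-05-31")

# Number of daily steps needed so the forecast reaches the end of May.
h_needed <- as.integer(may_end - last_obs)

# Calendar dates of each forecast step, and the subset that falls in May.
forecast_dates <- last_obs + seq_len(h_needed)
is_may <- forecast_dates >= may_start & forecast_dates <= may_end
may_dates <- forecast_dates[is_may]

c(h_needed = h_needed, may_rows = length(may_dates))
```

With `h_needed` passed as the horizon to `forecast()`, the logical vector `is_may` could then be used to pick the matching elements of each forecast's `$mean`.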
Part B consists of a simple dataset of residential power usage for January 1998 until December 2013. Your assignment is to model these data and produce a monthly forecast for 2014. The data is given in a single file. The variable ‘KWH’ is power consumption in Kilowatt hours, the rest is straightforward. Add this to your existing files above.
head(RCFL)
## CaseSequence YYYY.MMM KWH
## 1 733 1998-Jan 6862583
## 2 734 1998-Feb 5838198
## 3 735 1998-Mar 5420658
## 4 736 1998-Apr 5010364
## 5 737 1998-May 4665377
## 6 738 1998-Jun 6467147
summary(RCFL)
## CaseSequence YYYY.MMM KWH
## Min. :733.0 Length:192 Min. : 770523
## 1st Qu.:780.8 Class :character 1st Qu.: 5429912
## Median :828.5 Mode :character Median : 6283324
## Mean :828.5 Mean : 6502475
## 3rd Qu.:876.2 3rd Qu.: 7620524
## Max. :924.0 Max. :10655730
## NA's :1
LOCATING THE MISSING VALUES
which(is.na(RCFL), arr.ind=TRUE)
## row col
## [1,] 129 3
# aka 861 - 2008-Sep
slice(RCFL,c(127:132))
## CaseSequence YYYY.MMM KWH
## 1 859 2008-Jul 7643987
## 2 860 2008-Aug 8037137
## 3 861 2008-Sep NA
## 4 862 2008-Oct 5101803
## 5 863 2008-Nov 4555602
## 6 864 2008-Dec 6442746
#HERE YOU CAN SEE THE MISSING (NA) VALUE
AS SHOWN ABOVE, THE KWH VALUE FOR SEPTEMBER 2008 (ROW 129, CASESEQUENCE 861) IS MISSING
EXPLORING FURTHER THE DATA
RCFL <- RCFL %>% rename(Date = 'YYYY.MMM')
RCFL_data <- RCFL %>% select(-CaseSequence)
RCFL_data <- RCFL_data %>% mutate(Date = as.Date(paste0('01-', Date), '%d-%Y-%b'))
ggplot(RCFL_data, aes(Date, KWH)) +geom_line() + ggtitle('POWER USAGE OF RESIDENTIAL')
THE CHART ABOVE AND THE MINIMUM BELOW SHOW ONE UNUSUALLY LOW USAGE VALUE IN 2010
min(RCFL_data$KWH,na.rm = TRUE)
## [1] 770523
RCFL_2 <-ts(RCFL[, "KWH"], start = c(1998, 1), frequency = 12)
ggseasonplot(RCFL_2)+ggtitle('USAGE BY YEAR FOR RESIDENTIAL POWER')
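BECAUSE THE DATA APPEAR SEASONAL, I THINK THE MEAN OF THE SAME CALENDAR MONTH WOULD BE A NATURAL WAY TO HANDLE THE MISSING VALUE. THE MONTHLY MEANS ARE COMPUTED BELOW FOR REFERENCE; NOTE THAT THE CODE THEN FILLS THE GAP WITH THE OVERALL MEDIAN INSTEAD.
# Drop row 129 (2008-Sep, NA) and row 151 (2010-Jul, the 770523 low value) before averaging
RCFL_data<- RCFL_data[-c(129,151),]
# Get average by month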
RCFL_data$Month <- months(RCFL_data$Date)
aggregate(KWH ~ Month, RCFL_data, mean)
## Month KWH
## 1 April 5299734
## 2 August 8298211
## 3 December 6283175
## 4 February 6946556
## 5 January 8007422
## 6 July 7852521
## 7 June 6536092
## 8 March 5971450
## 9 May 5039034
## 10 November 4953619
## 11 October 5657164
## 12 September 7702333
# Fill the single missing KWH value (2008-Sep) with the overall median
RCFL$KWH[is.na(RCFL$KWH)] <- median(RCFL$KWH, na.rm=TRUE)
summary(RCFL)
## CaseSequence Date KWH
## Min. :733.0 Length:192 Min. : 770523
## 1st Qu.:780.8 Class :character 1st Qu.: 5434539
## Median :828.5 Mode :character Median : 6283324
## Mean :828.5 Mean : 6501333
## 3rd Qu.:876.2 3rd Qu.: 7608792
## Max. :924.0 Max. :10655730
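The text above proposes a monthly mean for the gap, while the code fills it with the overall median (6283324, compared with a September mean of 7702333 in the table above). For comparison, here is a minimal base-R sketch of same-season mean imputation on a toy series. The series, its frequency of 3, and the variable names are all made up for illustration.

```r
# Toy seasonal series with 3 "months" per cycle and one missing value.
kwh <- ts(c(10, 20, 30, 12, NA, 33, 11, 22, 31), frequency = 3)

values <- as.numeric(kwh)
season_id <- as.integer(cycle(kwh))   # position within the cycle: 1, 2, 3, ...

# Mean of each season, ignoring NAs, repeated to the length of the series.
season_mean <- ave(values, season_id, FUN = function(v) mean(v, na.rm = TRUE))

# Replace only the missing entries with their own season's mean.
missing <- is.na(values)
values[missing] <- season_mean[missing]
filled <- ts(values, start = start(kwh), frequency = frequency(kwh))
print(filled)
```

The missing point sits in season 2, so it is filled from the other season-2 values (20 and 22) rather than from the series as a whole, which keeps the seasonal shape intact.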
RCFL_ts <- ts(RCFL$KWH, start=c(1998,1), frequency = 12)
RCFL_ts
## Jan Feb Mar Apr May Jun Jul Aug
## 1998 6862583 5838198 5420658 5010364 4665377 6467147 8914755 8607428
## 1999 7183759 5759262 4847656 5306592 4426794 5500901 7444416 7564391
## 2000 7068296 5876083 4807961 4873080 5050891 7092865 6862662 7517830
## 2001 7538529 6602448 5779180 4835210 4787904 6283324 7855129 8450717
## 2002 7099063 6413429 5839514 5371604 5439166 5850383 7039702 8058748
## 2003 7256079 6190517 6120626 4885643 5296096 6051571 6900676 8476499
## 2004 7584596 6560742 6526586 4831688 4878262 6421614 7307931 7309774
## 2005 8225477 6564338 5581725 5563071 4453983 5900212 8337998 7786659
## 2006 7793358 5914945 5819734 5255988 4740588 7052275 7945564 8241110
## 2007 8031295 7928337 6443170 4841979 4862847 5022647 6426220 7447146
## 2008 7964293 7597060 6085644 5352359 4608528 6548439 7643987 8037137
## 2009 8072330 6976800 5691452 5531616 5264439 5804433 7713260 8350517
## 2010 9397357 8390677 7347915 5776131 4919289 6696292 770523 7922701
## 2011 8394747 8898062 6356903 5685227 5506308 8037779 10093343 10308076
## 2012 8991267 7952204 6356961 5569828 5783598 7926956 8886851 9612423
## 2013 10655730 7681798 6517514 6105359 5940475 7920627 8415321 9080226
## Sep Oct Nov Dec
## 1998 6989888 6345620 4640410 4693479
## 1999 7899368 5358314 4436269 4419229
## 2000 8912169 5844352 5041769 6220334
## 2001 7112069 5242535 4461979 5240995
## 2002 8245227 5865014 4908979 5779958
## 2003 7791791 5344613 4913707 5756193
## 2004 6690366 5444948 4824940 5791208
## 2005 7057213 6694523 4313019 6181548
## 2006 7296355 5104799 4458429 6226214
## 2007 7666970 5785964 4907057 6047292
## 2008 6283324 5101803 4555602 6442746
## 2009 7583146 5566075 5339890 7089880
## 2010 7819472 5875917 4800733 6152583
## 2011 8943599 5603920 6154138 8273142
## 2012 7559148 5576852 5731899 6609694
## 2013 7968220 5759367 5769083 9606304
# Before Transformation
ggtsdisplay(RCFL_ts, main="Monthly Power Consumption before transform")
BOXCOX TRANSFORM
RCFLS_BXCX <- RCFL_ts %>% BoxCox(lambda= 'auto')
ggtsdisplay(RCFLS_BXCX, main='MONTHLY POWER CONSUMPTION AFTER BOX-COX')
INVESTIGATE THE DATA CLOSELY
ggseasonplot(RCFLS_BXCX)
summary(RCFLS_BXCX)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 29.80 38.69 39.42 39.47 40.41 42.20
HELPFUL STATS TO EXPLORE SEASONALITY OR PATTERNS
ggsubseriesplot(RCFLS_BXCX)
ggAcf(RCFLS_BXCX)
LET'S USE A LJUNG-BOX TEST TO TAKE A CLOSER LOOK AT AUTOCORRELATION
Box.test(RCFLS_BXCX, type = c("Ljung-Box"))
##
## Box-Ljung test
##
## data: RCFLS_BXCX
## X-squared = 16.556, df = 1, p-value = 0.00004723
boxplot(RCFLS_BXCX~cycle(RCFLS_BXCX))
DIFFERENCING
print(paste0("Suggested # of diff: ", ndiffs(RCFLS_BXCX)))
## [1] "Suggested # of diff: 1"
print(paste0("DIFF REQUIRED (SEASONAL): ", ndiffs(diff(RCFLS_BXCX, lag=12))))
## [1] "DIFF REQUIRED (SEASONAL): 0"
RCFL_PWR_DIFF <- RCFLS_BXCX %>% diff(lag = 12)
ggtsdisplay(RCFL_PWR_DIFF, main= "Monthly power consumption BXCX AND DIFF")
LETS SEE A GRAPHIC FOR RES POWER USAGE BY YEAR
ggseasonplot(RCFL_PWR_DIFF,polar = TRUE)+ggtitle('Residential Power Usage by Year')
plot(RCFL_PWR_DIFF)
#LET'S SEE A MOVING AVG
autoplot(RCFL_PWR_DIFF, series="Data")+
autolayer(ma(RCFL_PWR_DIFF, 12), series = "12 MTH Moving Avg")+ ggtitle("12-MONTH MOVING AVG")
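With an even order such as 12, `ma()` from the forecast package computes a centered moving average by default (a 2x12-MA), which gives half weight to the two end months. The sketch below reproduces that weighting with base R's `stats::filter()` on a toy linear series. The series is an assumption chosen so the result is easy to check by eye.

```r
# Toy monthly series: a straight line, 3 years long.
x <- ts(1:36, frequency = 12)

# 2x12-MA weights: 13 terms, half weight on the first and last.
weights <- c(0.5, rep(1, 11), 0.5) / 12

# sides = 2 centers the window on each observation.
ma12 <- stats::filter(x, filter = weights, sides = 2)
print(ma12)
```

The first and last six values are NA because the window does not fit there, which is why the moving-average line in the plot is shorter than the data. On a straight line the interior values equal the series itself, since the weights are symmetric and sum to one.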
#1 STL + ETS (stlf, no damping)
RCFLS_STL <- stlf(RCFL_PWR_DIFF, damped=FALSE, s.window = "periodic", robust=TRUE, h = 12)
# forecast plot
autoplot(RCFLS_STL) + autolayer(fitted(RCFLS_STL))
#2 STL + DAMPED ETS
#stlf - etsmodel estimation --- A, Ad, N is chosen.
RCFL_STL_DP <- stlf(RCFL_PWR_DIFF, damped=TRUE, s.window = "periodic", robust=TRUE, h = 12)
# forecast plot
autoplot(RCFL_STL_DP) + autolayer(fitted(RCFL_STL_DP))
#3 - ARIMA
# auto.arima
arima_model <- auto.arima(RCFL_PWR_DIFF)
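# forecast values (NOTE: this overwrites arima_model with the forecast object; the fitted model stays available as arima_model$model)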
arima_model <- forecast(arima_model, h=20)
# forecast plot
autoplot(arima_model) + autolayer(fitted(arima_model))
#4 - ETS (A,N,N is selected automatically)
RCFL_ETS<- ets(RCFL_PWR_DIFF)
# forecast plot
autoplot(forecast(RCFL_ETS, h=12)) + autolayer(fitted(RCFL_ETS))
#5 - SES
RCFL_FCST_PWR_S <- ses(RCFL_PWR_DIFF, h=12)
autoplot(RCFL_FCST_PWR_S)+
autolayer(fitted(RCFL_FCST_PWR_S), series="Fitted")
#RCFL_FCST_PWR_S
accuracy(RCFLS_STL)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0005455413 1.348221 0.5931651 -1360.432 1605.179 0.5404898
## ACF1
## Training set 0.1194029
checkresiduals(RCFLS_STL)
##
## Ljung-Box test
##
## data: Residuals from STL + ETS(A,N,N)
## Q* = 65.794, df = 22, p-value = 0.000002984
##
## Model df: 2. Total lags used: 24
summary(RCFLS_STL)
##
## Forecast method: STL + ETS(A,N,N)
##
## Model Information:
## ETS(A,N,N)
##
## Call:
## ets(y = na.interp(x), model = etsmodel, damped = FALSE, allow.multiplicative.trend = allow.multiplicative.trend)
##
## Smoothing parameters:
## alpha = 0.0001
##
## Initial states:
## l = 0.0713
##
## sigma: 1.3558
##
## AIC AICc BIC
## 1048.295 1048.432 1057.874
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0005455413 1.348221 0.5931651 -1360.432 1605.179 0.5404898
## ACF1
## Training set 0.1194029
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2014 0.17305758 -1.564437 1.910552 -2.484211 2.830326
## Feb 2014 0.06130592 -1.676189 1.798801 -2.595963 2.718575
## Mar 2014 0.12236178 -1.615133 1.859857 -2.534907 2.779631
## Apr 2014 0.08210098 -1.655394 1.819596 -2.575168 2.739370
## May 2014 0.12746350 -1.610031 1.864958 -2.529805 2.784732
## Jun 2014 0.14679543 -1.590699 1.884290 -2.510474 2.804064
## Jul 2014 -0.19208644 -1.929581 1.545408 -2.849355 2.465183
## Aug 2014 -0.02245609 -1.759951 1.715039 -2.679725 2.634813
## Sep 2014 0.16034813 -1.577147 1.897843 -2.496921 2.817617
## Oct 2014 0.05745299 -1.680042 1.794948 -2.599816 2.714722
## Nov 2014 0.08070004 -1.656795 1.818195 -2.576569 2.737969
## Dec 2014 0.05898262 -1.678512 1.796477 -2.598286 2.716252
accuracy(RCFL_STL_DP)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.007629573 1.349456 0.5947813 -1328.026 1577.668 0.5419625
## ACF1
## Training set 0.120852
checkresiduals(RCFL_STL_DP)
##
## Ljung-Box test
##
## data: Residuals from STL + ETS(A,Ad,N)
## Q* = 65.735, df = 19, p-value = 0.0000004636
##
## Model df: 5. Total lags used: 24
summary(RCFL_STL_DP)
##
## Forecast method: STL + ETS(A,Ad,N)
##
## Model Information:
## ETS(A,Ad,N)
##
## Call:
## ets(y = na.interp(x), model = etsmodel, damped = TRUE, allow.multiplicative.trend = allow.multiplicative.trend)
##
## Smoothing parameters:
## alpha = 0.0001
## beta = 0.0001
## phi = 0.9622
##
## Initial states:
## l = 0.1056
## b = -0.0006
##
## sigma: 1.3686
##
## AIC AICc BIC
## 1054.625 1055.110 1073.783
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.007629573 1.349456 0.5947813 -1328.026 1577.668 0.5419625
## ACF1
## Training set 0.120852
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2014 0.17900301 -1.574926 1.932932 -2.503400 2.861406
## Feb 2014 0.06761419 -1.686315 1.821544 -2.614789 2.750018
## Mar 2014 0.12901915 -1.624910 1.882949 -2.553385 2.811423
## Apr 2014 0.08909425 -1.664835 1.843024 -2.593310 2.771498
## May 2014 0.13477996 -1.619150 1.888710 -2.547624 2.817184
## Jun 2014 0.15442285 -1.599507 1.908353 -2.527982 2.836828
## Jul 2014 -0.18415983 -1.938091 1.569771 -2.866565 2.498246
## Aug 2014 -0.01424159 -1.768173 1.739690 -2.696648 2.668165
## Sep 2014 0.16883961 -1.585092 1.922772 -2.513568 2.851247
## Oct 2014 0.06621097 -1.687722 1.820144 -2.616198 2.748620
## Nov 2014 0.08971444 -1.664219 1.843648 -2.592696 2.772125
## Dec 2014 0.06824374 -1.685691 1.822179 -2.614168 2.750656
accuracy(arima_model)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.02554754 0.9666583 0.4541852 -1920.696 2160.894 0.4138519
## ACF1
## Training set 0.006469743
checkresiduals(arima_model)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1)(0,0,2)[12] with non-zero mean
## Q* = 6.8185, df = 20, p-value = 0.9972
##
## Model df: 4. Total lags used: 24
summary(arima_model)
##
## Forecast method: ARIMA(0,0,1)(0,0,2)[12] with non-zero mean
##
## Model Information:
## Series: RCFL_PWR_DIFF
## ARIMA(0,0,1)(0,0,2)[12] with non-zero mean
##
## Coefficients:
## ma1 sma1 sma2 mean
## 0.1195 -0.9264 0.0890 0.0616
## s.e. 0.0712 0.0810 0.0811 0.0198
##
## sigma^2 = 0.9557: log likelihood = -257.11
## AIC=524.23 AICc=524.57 BIC=540.19
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.02554754 0.9666583 0.4541852 -1920.696 2160.894 0.4138519
## ACF1
## Training set 0.006469743
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2014 -0.71170867 -1.964897 0.5414796 -2.628295 1.2048774
## Feb 2014 0.20176196 -1.060338 1.4638615 -1.728453 2.1319766
## Mar 2014 0.10559912 -1.156500 1.3676986 -1.824616 2.0358137
## Apr 2014 -0.21986269 -1.481962 1.0422368 -2.150077 1.7103519
## May 2014 -0.33880539 -1.600905 0.9232941 -2.269020 1.5914092
## Jun 2014 -0.43704168 -1.699141 0.8250578 -2.367256 1.4931729
## Jul 2014 -1.07257955 -2.334679 0.1895200 -3.002794 0.8576351
## Aug 2014 0.06430436 -1.197795 1.3264039 -1.865910 1.9945190
## Sep 2014 0.14938779 -1.112712 1.4114873 -1.780827 2.0796024
## Oct 2014 0.22911025 -1.032989 1.4912097 -1.701104 2.1593249
## Nov 2014 -0.19797910 -1.460079 1.0641204 -2.128194 1.7322355
## Dec 2014 -1.54532103 -2.807419 -0.2832232 -3.475533 0.3848910
## Jan 2015 -0.02932622 -1.743669 1.6850167 -2.651187 2.5925349
## Feb 2015 0.05006537 -1.669873 1.7700036 -2.580353 2.6804839
## Mar 2015 0.05625769 -1.663681 1.7761959 -2.574161 2.6866762
## Apr 2015 0.08773329 -1.632205 1.8076715 -2.542685 2.7181518
## May 2015 0.10404245 -1.615896 1.8239807 -2.526376 2.7344610
## Jun 2015 0.11673378 -1.603204 1.8366720 -2.513685 2.7471523
## Jul 2015 0.18863975 -1.531298 1.9085780 -2.441779 2.8190583
## Aug 2015 0.06605819 -1.653880 1.7859964 -2.564360 2.6964767
accuracy(RCFL_ETS)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.00001229798 1.346981 0.6002461 -768.1263 994.3076 0.546942
## ACF1
## Training set 0.1202904
checkresiduals(RCFL_ETS)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,N,N)
## Q* = 66.674, df = 22, p-value = 0.000002184
##
## Model df: 2. Total lags used: 24
summary(RCFL_ETS)
## ETS(A,N,N)
##
## Call:
## ets(y = RCFL_PWR_DIFF)
##
## Smoothing parameters:
## alpha = 0.0001
##
## Initial states:
## l = 0.0719
##
## sigma: 1.3545
##
## AIC AICc BIC
## 1047.964 1048.100 1057.543
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.00001229798 1.346981 0.6002461 -768.1263 994.3076 0.546942
## ACF1
## Training set 0.1202904
accuracy(RCFL_FCST_PWR_S)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0002402688 1.346981 0.6002376 -765.1024 991.1566 0.5469343
## ACF1
## Training set 0.1202903
checkresiduals(RCFL_FCST_PWR_S)
##
## Ljung-Box test
##
## data: Residuals from Simple exponential smoothing
## Q* = 66.674, df = 22, p-value = 0.000002184
##
## Model df: 2. Total lags used: 24
summary(RCFL_FCST_PWR_S)
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = RCFL_PWR_DIFF, h = 12)
##
## Smoothing parameters:
## alpha = 0.0001
##
## Initial states:
## l = 0.0716
##
## sigma: 1.3545
##
## AIC AICc BIC
## 1047.964 1048.100 1057.543
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0002402688 1.346981 0.6002376 -765.1024 991.1566 0.5469343
## ACF1
## Training set 0.1202903
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2014 0.07162989 -1.664267 1.807526 -2.583195 2.726454
## Feb 2014 0.07162989 -1.664267 1.807526 -2.583195 2.726454
## Mar 2014 0.07162989 -1.664267 1.807526 -2.583195 2.726454
## Apr 2014 0.07162989 -1.664267 1.807526 -2.583195 2.726454
## May 2014 0.07162989 -1.664267 1.807526 -2.583195 2.726454
## Jun 2014 0.07162989 -1.664267 1.807526 -2.583195 2.726454
## Jul 2014 0.07162989 -1.664267 1.807526 -2.583195 2.726454
## Aug 2014 0.07162989 -1.664267 1.807526 -2.583195 2.726454
## Sep 2014 0.07162989 -1.664267 1.807526 -2.583195 2.726454
## Oct 2014 0.07162989 -1.664267 1.807526 -2.583195 2.726455
## Nov 2014 0.07162989 -1.664267 1.807526 -2.583195 2.726455
## Dec 2014 0.07162989 -1.664267 1.807526 -2.583195 2.726455
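All twelve SES forecasts above are identical (0.07162989). That follows from how simple exponential smoothing works: the forecast for every horizon is the last smoothed level, and with alpha = 0.0001 the level barely moves away from its initial value l = 0.0716. A hand-rolled sketch of the level recursion, with a made-up input vector and a hypothetical helper `ses_level()`, shows the effect.

```r
# Simple exponential smoothing: update the level once per observation and
# return the final level, which is the forecast for every future step.
ses_level <- function(y, alpha, initial_level) {
  level <- initial_level
  for (obs in y) {
    level <- alpha * obs + (1 - alpha) * level
  }
  level
}

# Made-up observations on roughly the same scale as the differenced series.
y <- c(0.4, -1.2, 0.9, 0.1, -0.3)

tiny_alpha <- ses_level(y, alpha = 0.0001, initial_level = 0.0716)
full_alpha <- ses_level(y, alpha = 1, initial_level = 0.0716)

c(tiny_alpha = tiny_alpha, full_alpha = full_alpha)
```

With a tiny alpha the result stays close to the initial level, and with alpha = 1 it is simply the last observation. An alpha at the lower bound is usually a sign that the method sees no level changes worth tracking in the series.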
Comparing the models, ARIMA gives the best results. Its AIC drops to 524.23 and its BIC to 540.19, against roughly 1048 and 1058 for the ETS-based models, and RMSE drops from 1.347 to 0.967. It is also the only model whose residuals pass the Ljung-Box test (p-value = 0.9972). I think I'll take the ARIMA model on this one, and since I am comfortable with its results I'll go ahead and export the predicted values to CSV.
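# arima_model now holds a forecast object, so forecast from the fitted model stored inside it
rslts_2 <- forecast(arima_model$model, h=12)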
rslts_fin <- data.frame(rslts_2)
write.csv(rslts_fin,"D:/CUNY SPS/Spring 2022/DATA 624/rslts_rcfl.csv", row.names = FALSE)
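One caveat on the exported values: the models above were fit to RCFL_PWR_DIFF, the Box-Cox transformed and seasonally differenced series, so the forecasts (values near 0) appear to be on that scale rather than in KWH. A possible alternative, sketched below in base R rather than with the report's forecast package, lets the model do the seasonal differencing and inverts the transformation afterwards. The data are simulated, the log is a stand-in for a Box-Cox transform with lambda = 0, and the model order is an arbitrary illustration rather than a recommendation.

```r
set.seed(42)

# Simulate 10 years of monthly usage as a seasonal random walk on the log scale.
n_years <- 10
log_kwh <- numeric(12 * n_years)
log_kwh[1:12] <- log(6e6) + 0.25 * cos(2 * pi * (1:12) / 12)
for (t in 13:length(log_kwh)) {
  log_kwh[t] <- log_kwh[t - 12] + rnorm(1, sd = 0.05)
}
kwh <- ts(exp(log_kwh), start = c(1998, 1), frequency = 12)

# Fit on the transformed series; D = 1 inside the model handles the
# seasonal differencing, so predictions come back undifferenced.
fit <- arima(log(kwh), order = c(1, 0, 0),
             seasonal = list(order = c(0, 1, 0), period = 12))

# Predict 12 months ahead on the log scale, then back-transform.
pred <- predict(fit, n.ahead = 12)
kwh_forecast <- exp(pred$pred)
print(round(kwh_forecast))
```

In the forecast package the same idea is available by passing `lambda` to `Arima()`, as Part A already does, in which case `forecast()` returns values on the original scale.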