In Part A, I want you to forecast how much cash is taken out of 4 different ATM machines for May 2010. The data is given in a single file. The variable ‘Cash’ is provided in hundreds of dollars; other than that it is straightforward. I am being somewhat ambiguous on purpose to give this more of a business feel. Explain and demonstrate your process, the techniques used and not used, and your actual forecast. I am giving you the data via an Excel file. Please provide your written report (findings, visuals, discussion) and your R code via an RPubs link, along with the actual .rmd file. Also please submit the forecast in an Excel-readable file.
Part B – Forecasting Power, ResidentialCustomerForecastLoad-624.xlsx
Part B consists of a simple dataset of residential power usage from January 1998 until December 2013. Your assignment is to model these data and provide a monthly forecast for 2014. The data is given in a single file. The variable ‘KWH’ is power consumption in kilowatt hours; the rest is straightforward. Add this to your existing files above.
Start by reading the ATM dataset and splitting it into four datasets based on the ATM number.
# Packages used throughout this report
library(dplyr)      # filter(), %>%
library(forecast)   # ses(), ets(), auto.arima(), ggseasonplot(), ggtsdisplay()
library(ggplot2)    # autoplot()/autolayer()
library(plotly)     # plot_ly() boxplot for ATM3
library(knitr)      # kable()
library(kableExtra) # kable_styling()

atm_total <- readxl::read_excel('ATM624Data.xlsx')
# Check for missing values
colSums(is.na(atm_total))
## DATE ATM Cash
## 0 14 19
atm_total$DATE <- as.Date(atm_total$DATE, origin = "1899-12-30") # Excel serial dates use the 1899-12-30 origin
atm_total <- atm_total[!is.na(atm_total$ATM),]
atm_total <- atm_total[!is.na(atm_total$Cash),]
# Confirm no missing values remain
colSums(is.na(atm_total))
## DATE ATM Cash
## 0 0 0
# Splitting the data
atm1 <- atm_total %>% filter(ATM == 'ATM1')
atm2 <- atm_total %>% filter(ATM == 'ATM2')
atm3 <- atm_total %>% filter(ATM == 'ATM3')
atm4 <- atm_total %>% filter(ATM == 'ATM4')
atm1_ts <- ts(atm1$Cash, frequency = 7)
ggseasonplot(atm1_ts)
ggtsdisplay(atm1_ts, main="ATM 1 Transactions - Weekly")
The series is converted to a weekly frequency to give a better picture of the ATM transactions. There is some seasonality in the data, with dips mostly on Mondays and Wednesdays. The ACF plot suggests the data are stationary, and the time plot resembles white noise, so no transformation is needed.
Simple exponential smoothing is mostly used where there is no seasonality. The seasonality in this dataset is not very strong, so we apply SES first and see what the outcome is.
forecast_atm1_ses <- ses(atm1_ts)
autoplot(forecast_atm1_ses)+
autolayer(fitted(forecast_atm1_ses), series='Fitted')
summary(forecast_atm1_ses)
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = atm1_ts)
##
## Smoothing parameters:
## alpha = 1e-04
##
## Initial states:
## l = 83.7797
##
## sigma: 36.7094
##
## AIC AICc BIC
## 4745.365 4745.432 4757.040
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.07802025 36.60783 27.31876 -175.78 199.2779 1.451603 0.06365945
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 52.71429 83.78256 36.7376 130.8275 11.83350 155.7316
## 52.85714 83.78256 36.7376 130.8275 11.83350 155.7316
## 53.00000 83.78256 36.7376 130.8275 11.83350 155.7316
## 53.14286 83.78256 36.7376 130.8275 11.83350 155.7316
## 53.28571 83.78256 36.7376 130.8275 11.83350 155.7316
## 53.42857 83.78256 36.7376 130.8275 11.83349 155.7316
## 53.57143 83.78256 36.7376 130.8275 11.83349 155.7316
## 53.71429 83.78256 36.7376 130.8275 11.83349 155.7316
## 53.85714 83.78256 36.7376 130.8275 11.83349 155.7316
## 54.00000 83.78256 36.7376 130.8275 11.83349 155.7316
forecast_atm1_ets <- ets(atm1_ts)
autoplot(forecast_atm1_ets)+
autolayer(forecast(forecast_atm1_ets), series="Forecasted")
summary(forecast_atm1_ets)
## ETS(A,N,A)
##
## Call:
## ets(y = atm1_ts)
##
## Smoothing parameters:
## alpha = 1e-04
## gamma = 0.415
##
## Initial states:
## l = 87.7888
## s = -54.0598 16.2223 21.048 -26.068 8.3991 35.2058
## -0.7474
##
## sigma: 26.384
##
## AIC AICc BIC
## 4513.138 4513.765 4552.055
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.5473075 26.05397 16.9637 -110.6377 126.3744 0.9013793 0.1374786
The ETS model’s accuracy is slightly better than simple exponential smoothing when comparing RMSE, AIC, and BIC.
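For a side-by-side view, the training accuracy and information criteria of the two fits can be tabulated. A minimal sketch using the objects defined above (the accuracy() column names assume the forecast package):

# Sketch: tabulate RMSE/MAE and information criteria for the SES and ETS fits
comparison <- rbind(
  SES = c(accuracy(forecast_atm1_ses)[, c("RMSE", "MAE")],
          AIC = forecast_atm1_ses$model$aic, BIC = forecast_atm1_ses$model$bic),
  ETS = c(accuracy(forecast_atm1_ets)[, c("RMSE", "MAE")],
          AIC = forecast_atm1_ets$aic, BIC = forecast_atm1_ets$bic)
)
round(comparison, 2)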
forecast_atm1_arima <- auto.arima(atm1_ts)
autoplot(forecast(forecast_atm1_arima))
checkresiduals(forecast_atm1_arima)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1)(0,1,1)[7]
## Q* = 11.267, df = 12, p-value = 0.5062
##
## Model df: 2. Total lags used: 14
summary(forecast_atm1_arima)
## Series: atm1_ts
## ARIMA(0,0,1)(0,1,1)[7]
##
## Coefficients:
## ma1 sma1
## 0.1674 -0.5813
## s.e. 0.0565 0.0472
##
## sigma^2 = 666.6: log likelihood = -1658.32
## AIC=3322.63 AICc=3322.7 BIC=3334.25
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.09700089 25.49555 16.17552 -106.2792 122.4165 0.8594986
## ACF1
## Training set -0.01116544
The ARIMA model gave the best result of the three, with AIC and BIC dropping to roughly 3323 and 3334 from around 4513 and 4552 for ETS. RMSE also dropped with auto.arima(). The data need no transformation and are stationary as per the ACF plot.
atm2_ts <- ts(atm2$Cash, frequency = 7)
ggseasonplot(atm2_ts)
ggtsdisplay(atm2_ts, main="ATM 2 Transactions - Weekly")
The ATM2 data are also stationary and no transformation is required, since the series already resembles white noise; the ACF plot confirms the stationarity.
forecast_atm2_ses <- ses(atm2_ts)
autoplot(forecast_atm2_ses)+
autolayer(forecast(forecast_atm2_ses), series='Forecasted')
summary(forecast_atm2_ses)
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = atm2_ts)
##
## Smoothing parameters:
## alpha = 0.016
##
## Initial states:
## l = 72.14
##
## sigma: 38.681
##
## AIC AICc BIC
## 4797.445 4797.512 4809.128
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -2.563596 38.57427 33.09136 -Inf Inf 1.539938 -0.02208177
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 52.85714 57.26713 7.695463 106.8388 -18.54619 133.0804
## 53.00000 57.26713 7.689132 106.8451 -18.55587 133.0901
## 53.14286 57.26713 7.682802 106.8514 -18.56555 133.0998
## 53.28571 57.26713 7.676473 106.8578 -18.57523 133.1095
## 53.42857 57.26713 7.670145 106.8641 -18.58491 133.1192
## 53.57143 57.26713 7.663817 106.8704 -18.59459 133.1288
## 53.71429 57.26713 7.657491 106.8768 -18.60426 133.1385
## 53.85714 57.26713 7.651165 106.8831 -18.61394 133.1482
## 54.00000 57.26713 7.644840 106.8894 -18.62361 133.1579
## 54.14286 57.26713 7.638515 106.8957 -18.63328 133.1675
AIC and BIC are 4797 and 4809 respectively, which is consistent with the SES results for the ATM1 series.
forecast_atm2_ets <- ets(atm2_ts)
autoplot(forecast(forecast_atm2_ets))
checkresiduals(forecast_atm2_ets)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,N,A)
## Q* = 38.485, df = 14, p-value = 0.0004379
##
## Model df: 0. Total lags used: 14
summary(forecast_atm2_ets)
## ETS(A,N,A)
##
## Call:
## ets(y = atm2_ts)
##
## Smoothing parameters:
## alpha = 1e-04
## gamma = 0.4647
##
## Initial states:
## l = 76.1082
## s = 11.3136 -47.8461 -13.3718 -1.8486 18.6895 4.6056
## 28.4579
##
## sigma: 27.8902
##
## AIC AICc BIC
## 4566.882 4567.507 4605.826
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.6349959 27.54227 19.09383 -Inf Inf 0.8885496 -0.02291393
The data are already stationary, and AIC and BIC have dropped to 4566 and 4605, a slight improvement.
forecast_atm2_arima <- auto.arima(atm2_ts)
autoplot(forecast(forecast_atm2_arima))
checkresiduals(forecast_atm2_arima)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,0,2)(0,1,2)[7]
## Q* = 12.12, df = 8, p-value = 0.1459
##
## Model df: 6. Total lags used: 14
summary(forecast_atm2_arima)
## Series: atm2_ts
## ARIMA(2,0,2)(0,1,2)[7]
##
## Coefficients:
## ar1 ar2 ma1 ma2 sma1 sma2
## -0.4148 -0.8922 0.4008 0.7534 -0.7220 0.0801
## s.e. 0.0674 0.0510 0.1027 0.0688 0.0567 0.0550
##
## sigma^2 = 708.1: log likelihood = -1671.98
## AIC=3357.96 AICc=3358.28 BIC=3385.08
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.4583759 26.12942 18.09675 -Inf Inf 0.8421495 0.009277035
The data are roughly normal and stationary, and the residual autocorrelations dropped, so no transformation is needed. According to the summary results, AIC, BIC, and RMSE all dropped significantly compared with the ETS and SES models, so the ARIMA model performs best.
atm3_ts <- ts(atm3$Cash, frequency = 7)
ggseasonplot(atm3_ts)
ggtsdisplay(atm3_ts, main="ATM 3 Transactions - Weekly")
Most of the values in ATM3 are 0, so there is not enough data to justify fitting simple exponential smoothing, ETS, or ARIMA models.
plot_ly(y = atm3$Cash, type='box', name="Cash Transactions - ATM3")
summary(atm3$Cash)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.7206 0.0000 96.0000
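No model is fitted for ATM3. If a placeholder forecast were still needed for the deliverable, one option, sketched here but not part of the selected approach, is the mean of the few non-zero observations:

# Sketch (not used below): naive placeholder for ATM3 based on the
# non-zero cash values only, since most observations are 0
nonzero_atm3 <- atm3$Cash[atm3$Cash > 0]
forecast_atm3_naive <- meanf(ts(nonzero_atm3), h = 5)
forecast_atm3_naive$mean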
atm4_ts <- ts(atm4$Cash, frequency = 7)
ggseasonplot(atm4_ts)
ggtsdisplay(atm4_ts, main="ATM 4 Transactions - Weekly")
#### Simple Exponential Smoothing
forecast_atm4_ses <- ses(atm4_ts, lambda='auto')
autoplot(forecast_atm4_ses)+
autolayer(forecast(forecast_atm4_ses), series='Forecasted')
checkresiduals(forecast_atm4_ses)
##
## Ljung-Box test
##
## data: Residuals from Simple exponential smoothing
## Q* = 68.994, df = 14, p-value = 2.935e-09
##
## Model df: 0. Total lags used: 14
summary(forecast_atm4_ses)
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = atm4_ts, lambda = "auto")
##
## Box-Cox transformation: lambda= -0.0737
##
## Smoothing parameters:
## alpha = 1e-04
##
## Initial states:
## l = 4.4957
##
## sigma: 0.9807
##
## AIC AICc BIC
## 2143.231 2143.298 2154.931
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 238.6335 692.468 345.1077 -255.9182 328.5578 0.8587028
## ACF1
## Training set -0.009363934
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 53.14286 235.4013 40.47784 1780.964 17.35251 5957.286
## 53.28571 235.4013 40.47784 1780.964 17.35251 5957.286
## 53.42857 235.4013 40.47784 1780.964 17.35251 5957.286
## 53.57143 235.4013 40.47784 1780.964 17.35251 5957.286
## 53.71429 235.4013 40.47783 1780.964 17.35251 5957.287
## 53.85714 235.4013 40.47783 1780.965 17.35251 5957.287
## 54.00000 235.4013 40.47783 1780.965 17.35251 5957.287
## 54.14286 235.4013 40.47783 1780.965 17.35251 5957.287
## 54.28571 235.4013 40.47783 1780.965 17.35251 5957.287
## 54.42857 235.4013 40.47783 1780.965 17.35251 5957.287
The ATM4 data were slightly abnormal, so a Box-Cox transformation was used to normalize them. The data are stationary, as the ACF plot above shows no meaningful autocorrelation. Before the transformation, AIC and BIC were around 6500; after the Box-Cox transformation they dropped significantly to around 2143 and 2155. RMSE also improved.
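For reference, the lambda reported in the summary above can be reproduced directly with BoxCox.lambda(); BoxCox() applies the transformation and InvBoxCox() reverses it. A short sketch using functions from the forecast package:

# Sketch: reproduce the automatically chosen Box-Cox lambda for ATM4
lambda_atm4 <- BoxCox.lambda(atm4_ts)       # ~ -0.0737, as in the summary above
transformed <- BoxCox(atm4_ts, lambda_atm4)
head(InvBoxCox(transformed, lambda_atm4))   # round-trip recovers the original values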
forecast_atm4_ets <- ets(atm4_ts, lambda = 'auto')
autoplot(forecast(forecast_atm4_ets))
checkresiduals(forecast_atm4_ets)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,N,A)
## Q* = 18.809, df = 14, p-value = 0.1724
##
## Model df: 0. Total lags used: 14
summary(forecast_atm4_ets)
## ETS(A,N,A)
##
## Call:
## ets(y = atm4_ts, lambda = "auto")
##
## Box-Cox transformation: lambda= -0.0737
##
## Smoothing parameters:
## alpha = 0.0138
## gamma = 0.1424
##
## Initial states:
## l = 4.4335
## s = -1.3504 -0.0467 0.0246 0.3089 0.3808 0.2859
## 0.397
##
## sigma: 0.8982
##
## AIC AICc BIC
## 2085.957 2086.578 2124.956
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 177.0985 664.9255 308.4432 -271.5266 333.9661 0.7674735
## ACF1
## Training set -0.004544246
The model’s accuracy improved slightly, with AIC and BIC coming down to around 2086 and 2125, and RMSE dropping from about 692 to 665.
forecast_atm4_arima <- auto.arima(atm4_ts, lambda = 'auto')
autoplot(forecast(forecast_atm4_arima))
checkresiduals(forecast_atm4_arima)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,0)(2,0,0)[7] with non-zero mean
## Q* = 17.625, df = 12, p-value = 0.1275
##
## Model df: 2. Total lags used: 14
summary(forecast_atm4_arima)
## Series: atm4_ts
## ARIMA(0,0,0)(2,0,0)[7] with non-zero mean
## Box Cox transformation: lambda= -0.0737252
##
## Coefficients:
## sar1 sar2 mean
## 0.2487 0.1947 4.4864
## s.e. 0.0521 0.0525 0.0841
##
## sigma^2 = 0.8418: log likelihood = -485.59
## AIC=979.18 AICc=979.29 BIC=994.78
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 209.7821 678.9957 317.1943 -239.7645 304.0411 0.7892481
## ACF1
## Training set -0.005100991
This model provides better accuracy: AIC and BIC dropped from around 2086 and 2125 down to around 979 and 995, although RMSE did not improve. We still go with the ARIMA model, since AIC and BIC are the stronger criteria for model selection (and all three ATM4 models share the same Box-Cox lambda, so their information criteria are comparable).
# ATM 1
data1 <- forecast(forecast_atm1_arima, h=5)
testdata1 <- data.frame(data1)
testdata1
## Point.Forecast Lo.80 Hi.80 Lo.95 Hi.95
## 52.71429 86.805607 53.71784 119.89337 36.20224 137.40898
## 52.85714 100.640560 67.09247 134.18865 49.33319 151.94793
## 53.00000 74.714560 41.16647 108.26265 23.40719 126.02193
## 53.14286 4.762635 -28.78545 38.31072 -46.54474 56.07001
## 53.28571 100.063192 66.51510 133.61128 48.75582 151.37057
writexl::write_xlsx(testdata1, "testdata1.xlsx")
# ATM 2
data2 <- forecast(forecast_atm2_arima, h=5)
testdata2 <- data.frame(data2)
testdata2
## Point.Forecast Lo.80 Hi.80 Lo.95 Hi.95
## 52.85714 69.432448 35.33003 103.53487 17.27730 121.58759
## 53.00000 69.881366 35.77560 103.98713 17.72111 122.04163
## 53.14286 12.308924 -22.09671 46.71456 -40.30996 64.92780
## 53.28571 2.758906 -31.72395 37.24176 -49.97807 55.49588
## 53.42857 99.519624 64.89886 134.14038 46.57174 152.46751
writexl::write_xlsx(testdata2, "testdata2.xlsx")
# ATM 4
data4 <- forecast(forecast_atm4_arima, h=5)
testdata4 <- data.frame(data4)
testdata4
## Point.Forecast Lo.80 Hi.80 Lo.95 Hi.95
## 53.14286 90.23023 19.20879 517.3860 9.049301 1441.116
## 53.28571 275.70713 51.80185 1856.9627 23.087713 5748.043
## 53.42857 328.95851 60.54706 2275.9529 26.741696 7171.648
## 53.57143 72.60334 15.82001 404.1714 7.530804 1104.380
## 53.71429 303.15712 56.33393 2071.4372 24.985772 6473.369
writexl::write_xlsx(testdata4, "testdata4.xlsx")
The data had some missing values, which were removed. The data were then split into four datasets as required: ATM1, ATM2, ATM3, and ATM4. Each series was converted into a ts object with weekly seasonality, and simple exponential smoothing, ETS, and ARIMA were tested. ARIMA performed best for ATM1, ATM2, and ATM4; ATM3 did not have enough data for analysis. The criterion for selecting the best model was based on AIC, BIC, and RMSE. Point forecasts were produced for the next five periods, along with the lower and upper bounds of the 80% and 95% prediction intervals, and exported to Excel; the forecast horizon begins in May 2010, as required.
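Since the assignment asks for all of May 2010, the same fitted models can also be run with a 31-day horizon and calendar dates attached. A sketch, assuming the last observation in the data is 2010-04-30:

# Sketch: extend the chosen ARIMA forecasts to cover all 31 days of May 2010
may_dates <- seq(as.Date("2010-05-01"), as.Date("2010-05-31"), by = "day")
may_forecasts <- data.frame(
  DATE = may_dates,
  ATM1 = as.numeric(forecast(forecast_atm1_arima, h = 31)$mean),
  ATM2 = as.numeric(forecast(forecast_atm2_arima, h = 31)$mean),
  ATM4 = as.numeric(forecast(forecast_atm4_arima, h = 31)$mean)
)
writexl::write_xlsx(may_forecasts, "atm_may2010_forecasts.xlsx")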
## Part B - Power

Part B consists of a simple dataset of residential power usage from January 1998 until December 2013. Your assignment is to model these data and provide a monthly forecast for 2014. The data is given in a single file. The variable ‘KWH’ is power consumption in kilowatt hours; the rest is straightforward.
power <- readxl::read_excel("ResidentialCustomerForecastLoad-624.xlsx")
power %>% kable() %>% kable_styling(full_width = FALSE)
| CaseSequence | YYYY-MMM | KWH |
|---|---|---|
| 733 | 1998-Jan | 6862583 |
| 734 | 1998-Feb | 5838198 |
| 735 | 1998-Mar | 5420658 |
| 736 | 1998-Apr | 5010364 |
| 737 | 1998-May | 4665377 |
| 738 | 1998-Jun | 6467147 |
| 739 | 1998-Jul | 8914755 |
| 740 | 1998-Aug | 8607428 |
| 741 | 1998-Sep | 6989888 |
| 742 | 1998-Oct | 6345620 |
| 743 | 1998-Nov | 4640410 |
| 744 | 1998-Dec | 4693479 |
| 745 | 1999-Jan | 7183759 |
| 746 | 1999-Feb | 5759262 |
| 747 | 1999-Mar | 4847656 |
| 748 | 1999-Apr | 5306592 |
| 749 | 1999-May | 4426794 |
| 750 | 1999-Jun | 5500901 |
| 751 | 1999-Jul | 7444416 |
| 752 | 1999-Aug | 7564391 |
| 753 | 1999-Sep | 7899368 |
| 754 | 1999-Oct | 5358314 |
| 755 | 1999-Nov | 4436269 |
| 756 | 1999-Dec | 4419229 |
| 757 | 2000-Jan | 7068296 |
| 758 | 2000-Feb | 5876083 |
| 759 | 2000-Mar | 4807961 |
| 760 | 2000-Apr | 4873080 |
| 761 | 2000-May | 5050891 |
| 762 | 2000-Jun | 7092865 |
| 763 | 2000-Jul | 6862662 |
| 764 | 2000-Aug | 7517830 |
| 765 | 2000-Sep | 8912169 |
| 766 | 2000-Oct | 5844352 |
| 767 | 2000-Nov | 5041769 |
| 768 | 2000-Dec | 6220334 |
| 769 | 2001-Jan | 7538529 |
| 770 | 2001-Feb | 6602448 |
| 771 | 2001-Mar | 5779180 |
| 772 | 2001-Apr | 4835210 |
| 773 | 2001-May | 4787904 |
| 774 | 2001-Jun | 6283324 |
| 775 | 2001-Jul | 7855129 |
| 776 | 2001-Aug | 8450717 |
| 777 | 2001-Sep | 7112069 |
| 778 | 2001-Oct | 5242535 |
| 779 | 2001-Nov | 4461979 |
| 780 | 2001-Dec | 5240995 |
| 781 | 2002-Jan | 7099063 |
| 782 | 2002-Feb | 6413429 |
| 783 | 2002-Mar | 5839514 |
| 784 | 2002-Apr | 5371604 |
| 785 | 2002-May | 5439166 |
| 786 | 2002-Jun | 5850383 |
| 787 | 2002-Jul | 7039702 |
| 788 | 2002-Aug | 8058748 |
| 789 | 2002-Sep | 8245227 |
| 790 | 2002-Oct | 5865014 |
| 791 | 2002-Nov | 4908979 |
| 792 | 2002-Dec | 5779958 |
| 793 | 2003-Jan | 7256079 |
| 794 | 2003-Feb | 6190517 |
| 795 | 2003-Mar | 6120626 |
| 796 | 2003-Apr | 4885643 |
| 797 | 2003-May | 5296096 |
| 798 | 2003-Jun | 6051571 |
| 799 | 2003-Jul | 6900676 |
| 800 | 2003-Aug | 8476499 |
| 801 | 2003-Sep | 7791791 |
| 802 | 2003-Oct | 5344613 |
| 803 | 2003-Nov | 4913707 |
| 804 | 2003-Dec | 5756193 |
| 805 | 2004-Jan | 7584596 |
| 806 | 2004-Feb | 6560742 |
| 807 | 2004-Mar | 6526586 |
| 808 | 2004-Apr | 4831688 |
| 809 | 2004-May | 4878262 |
| 810 | 2004-Jun | 6421614 |
| 811 | 2004-Jul | 7307931 |
| 812 | 2004-Aug | 7309774 |
| 813 | 2004-Sep | 6690366 |
| 814 | 2004-Oct | 5444948 |
| 815 | 2004-Nov | 4824940 |
| 816 | 2004-Dec | 5791208 |
| 817 | 2005-Jan | 8225477 |
| 818 | 2005-Feb | 6564338 |
| 819 | 2005-Mar | 5581725 |
| 820 | 2005-Apr | 5563071 |
| 821 | 2005-May | 4453983 |
| 822 | 2005-Jun | 5900212 |
| 823 | 2005-Jul | 8337998 |
| 824 | 2005-Aug | 7786659 |
| 825 | 2005-Sep | 7057213 |
| 826 | 2005-Oct | 6694523 |
| 827 | 2005-Nov | 4313019 |
| 828 | 2005-Dec | 6181548 |
| 829 | 2006-Jan | 7793358 |
| 830 | 2006-Feb | 5914945 |
| 831 | 2006-Mar | 5819734 |
| 832 | 2006-Apr | 5255988 |
| 833 | 2006-May | 4740588 |
| 834 | 2006-Jun | 7052275 |
| 835 | 2006-Jul | 7945564 |
| 836 | 2006-Aug | 8241110 |
| 837 | 2006-Sep | 7296355 |
| 838 | 2006-Oct | 5104799 |
| 839 | 2006-Nov | 4458429 |
| 840 | 2006-Dec | 6226214 |
| 841 | 2007-Jan | 8031295 |
| 842 | 2007-Feb | 7928337 |
| 843 | 2007-Mar | 6443170 |
| 844 | 2007-Apr | 4841979 |
| 845 | 2007-May | 4862847 |
| 846 | 2007-Jun | 5022647 |
| 847 | 2007-Jul | 6426220 |
| 848 | 2007-Aug | 7447146 |
| 849 | 2007-Sep | 7666970 |
| 850 | 2007-Oct | 5785964 |
| 851 | 2007-Nov | 4907057 |
| 852 | 2007-Dec | 6047292 |
| 853 | 2008-Jan | 7964293 |
| 854 | 2008-Feb | 7597060 |
| 855 | 2008-Mar | 6085644 |
| 856 | 2008-Apr | 5352359 |
| 857 | 2008-May | 4608528 |
| 858 | 2008-Jun | 6548439 |
| 859 | 2008-Jul | 7643987 |
| 860 | 2008-Aug | 8037137 |
| 861 | 2008-Sep | NA |
| 862 | 2008-Oct | 5101803 |
| 863 | 2008-Nov | 4555602 |
| 864 | 2008-Dec | 6442746 |
| 865 | 2009-Jan | 8072330 |
| 866 | 2009-Feb | 6976800 |
| 867 | 2009-Mar | 5691452 |
| 868 | 2009-Apr | 5531616 |
| 869 | 2009-May | 5264439 |
| 870 | 2009-Jun | 5804433 |
| 871 | 2009-Jul | 7713260 |
| 872 | 2009-Aug | 8350517 |
| 873 | 2009-Sep | 7583146 |
| 874 | 2009-Oct | 5566075 |
| 875 | 2009-Nov | 5339890 |
| 876 | 2009-Dec | 7089880 |
| 877 | 2010-Jan | 9397357 |
| 878 | 2010-Feb | 8390677 |
| 879 | 2010-Mar | 7347915 |
| 880 | 2010-Apr | 5776131 |
| 881 | 2010-May | 4919289 |
| 882 | 2010-Jun | 6696292 |
| 883 | 2010-Jul | 770523 |
| 884 | 2010-Aug | 7922701 |
| 885 | 2010-Sep | 7819472 |
| 886 | 2010-Oct | 5875917 |
| 887 | 2010-Nov | 4800733 |
| 888 | 2010-Dec | 6152583 |
| 889 | 2011-Jan | 8394747 |
| 890 | 2011-Feb | 8898062 |
| 891 | 2011-Mar | 6356903 |
| 892 | 2011-Apr | 5685227 |
| 893 | 2011-May | 5506308 |
| 894 | 2011-Jun | 8037779 |
| 895 | 2011-Jul | 10093343 |
| 896 | 2011-Aug | 10308076 |
| 897 | 2011-Sep | 8943599 |
| 898 | 2011-Oct | 5603920 |
| 899 | 2011-Nov | 6154138 |
| 900 | 2011-Dec | 8273142 |
| 901 | 2012-Jan | 8991267 |
| 902 | 2012-Feb | 7952204 |
| 903 | 2012-Mar | 6356961 |
| 904 | 2012-Apr | 5569828 |
| 905 | 2012-May | 5783598 |
| 906 | 2012-Jun | 7926956 |
| 907 | 2012-Jul | 8886851 |
| 908 | 2012-Aug | 9612423 |
| 909 | 2012-Sep | 7559148 |
| 910 | 2012-Oct | 5576852 |
| 911 | 2012-Nov | 5731899 |
| 912 | 2012-Dec | 6609694 |
| 913 | 2013-Jan | 10655730 |
| 914 | 2013-Feb | 7681798 |
| 915 | 2013-Mar | 6517514 |
| 916 | 2013-Apr | 6105359 |
| 917 | 2013-May | 5940475 |
| 918 | 2013-Jun | 7920627 |
| 919 | 2013-Jul | 8415321 |
| 920 | 2013-Aug | 9080226 |
| 921 | 2013-Sep | 7968220 |
| 922 | 2013-Oct | 5759367 |
| 923 | 2013-Nov | 5769083 |
| 924 | 2013-Dec | 9606304 |
summary(power)
## CaseSequence YYYY-MMM KWH
## Min. :733.0 Length:192 Min. : 770523
## 1st Qu.:780.8 Class :character 1st Qu.: 5429912
## Median :828.5 Mode :character Median : 6283324
## Mean :828.5 Mean : 6502475
## 3rd Qu.:876.2 3rd Qu.: 7620524
## Max. :924.0 Max. :10655730
## NA's :1
colSums(is.na(power))
## CaseSequence YYYY-MMM KWH
## 0 0 1
power$KWH[is.na(power$KWH)] <- median(power$KWH, na.rm = TRUE) # Impute the single missing value with the series median
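Median imputation is simple, but it ignores the strong monthly seasonality (the missing value is September 2008, typically a high-usage month). A seasonally aware alternative is forecast::na.interp(); a sketch that would replace the median line above, where power_raw is a hypothetical copy of the data saved before imputation:

# Sketch (alternative to the median replacement): seasonal interpolation
# `power_raw` is a hypothetical copy of `power` taken before imputation
raw_ts <- ts(power_raw$KWH, start = c(1998, 1), frequency = 12)
raw_ts <- na.interp(raw_ts)   # interpolates using a seasonal decomposition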
# Converting the dataframe into timeseries object
power_ts <- ts(power$KWH, start=c(1998,1), frequency = 12)
power_ts
## Jan Feb Mar Apr May Jun Jul Aug
## 1998 6862583 5838198 5420658 5010364 4665377 6467147 8914755 8607428
## 1999 7183759 5759262 4847656 5306592 4426794 5500901 7444416 7564391
## 2000 7068296 5876083 4807961 4873080 5050891 7092865 6862662 7517830
## 2001 7538529 6602448 5779180 4835210 4787904 6283324 7855129 8450717
## 2002 7099063 6413429 5839514 5371604 5439166 5850383 7039702 8058748
## 2003 7256079 6190517 6120626 4885643 5296096 6051571 6900676 8476499
## 2004 7584596 6560742 6526586 4831688 4878262 6421614 7307931 7309774
## 2005 8225477 6564338 5581725 5563071 4453983 5900212 8337998 7786659
## 2006 7793358 5914945 5819734 5255988 4740588 7052275 7945564 8241110
## 2007 8031295 7928337 6443170 4841979 4862847 5022647 6426220 7447146
## 2008 7964293 7597060 6085644 5352359 4608528 6548439 7643987 8037137
## 2009 8072330 6976800 5691452 5531616 5264439 5804433 7713260 8350517
## 2010 9397357 8390677 7347915 5776131 4919289 6696292 770523 7922701
## 2011 8394747 8898062 6356903 5685227 5506308 8037779 10093343 10308076
## 2012 8991267 7952204 6356961 5569828 5783598 7926956 8886851 9612423
## 2013 10655730 7681798 6517514 6105359 5940475 7920627 8415321 9080226
## Sep Oct Nov Dec
## 1998 6989888 6345620 4640410 4693479
## 1999 7899368 5358314 4436269 4419229
## 2000 8912169 5844352 5041769 6220334
## 2001 7112069 5242535 4461979 5240995
## 2002 8245227 5865014 4908979 5779958
## 2003 7791791 5344613 4913707 5756193
## 2004 6690366 5444948 4824940 5791208
## 2005 7057213 6694523 4313019 6181548
## 2006 7296355 5104799 4458429 6226214
## 2007 7666970 5785964 4907057 6047292
## 2008 6283324 5101803 4555602 6442746
## 2009 7583146 5566075 5339890 7089880
## 2010 7819472 5875917 4800733 6152583
## 2011 8943599 5603920 6154138 8273142
## 2012 7559148 5576852 5731899 6609694
## 2013 7968220 5759367 5769083 9606304
ggtsdisplay(power_ts, main="Before Transformation, Monthly Power Consumption")
# Box Cox Transformation
power_boxcox <- power_ts %>% BoxCox(lambda= 'auto')
ggtsdisplay(power_boxcox, main='BoxCox Transformation, Monthly Power Consumption')
# Seasonality
ggseasonplot(power_boxcox)
The Box-Cox transformation did not make much improvement; in particular, the data are still non-stationary, so we have to make them stationary before building any model or making predictions. There is also seasonality in the data: usage dips in April and May and then rises again from June through the beginning of September. We apply differencing in the steps below to make the data stationary.
print(paste0("Suggested number of differences: ", ndiffs(power_boxcox)))
## [1] "Suggested number of differences: 1"
print(paste0("After seasonal differencing, additional differences required: ", ndiffs(diff(power_boxcox, lag=12))))
## [1] "After seasonal differencing, additional differences required: 0"
ndiffs() suggests one difference overall, and after taking a seasonal (lag-12) difference, no further differencing is required.
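The same question can be asked with nsdiffs(), which estimates the required number of seasonal differences directly:

nsdiffs(power_boxcox)   # recommended number of seasonal differences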
power_diff <- power_boxcox %>% diff(lag = 12)
ggtsdisplay(power_diff, main= "BoxCox Transformation and seasonal differencing - Monthly power consumption")
As the ACF plot shows, the data have become stationary after seasonal differencing. The seasonality in the series is what required the seasonal difference to achieve stationarity.
print(paste0("Suggested differences: ", ndiffs(power_diff)))
## [1] "Suggested differences: 0"
Since the data are stationary, we can move forward with different forecasting techniques, see which one performs well, and then predict each month of 2014.
autoplot(power_diff, series="Data")+
  autolayer(ma(power_diff, 12), series = "12 Months Moving Average")+
  ggtitle("12-Month Moving Average of the Differenced Series")
## Warning: Removed 12 rows containing missing values (`geom_line()`).
forecast_power_ses <- ses(power_diff, h=12)
autoplot(forecast_power_ses)+
autolayer(fitted(forecast_power_ses), series="Fitted")
summary(forecast_power_ses)
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = power_diff, h = 12)
##
## Smoothing parameters:
## alpha = 1e-04
##
## Initial states:
## l = 0.0716
##
## sigma: 1.3545
##
## AIC AICc BIC
## 1047.964 1048.100 1057.543
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0002402688 1.346981 0.6002376 -765.1024 991.1566 0.5469343
## ACF1
## Training set 0.1202903
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2014 0.07162989 -1.664267 1.807526 -2.583195 2.726454
## Feb 2014 0.07162989 -1.664267 1.807526 -2.583195 2.726454
## Mar 2014 0.07162989 -1.664267 1.807526 -2.583195 2.726454
## Apr 2014 0.07162989 -1.664267 1.807526 -2.583195 2.726454
## May 2014 0.07162989 -1.664267 1.807526 -2.583195 2.726454
## Jun 2014 0.07162989 -1.664267 1.807526 -2.583195 2.726454
## Jul 2014 0.07162989 -1.664267 1.807526 -2.583195 2.726454
## Aug 2014 0.07162989 -1.664267 1.807526 -2.583195 2.726454
## Sep 2014 0.07162989 -1.664267 1.807526 -2.583195 2.726454
## Oct 2014 0.07162989 -1.664267 1.807526 -2.583195 2.726455
## Nov 2014 0.07162989 -1.664267 1.807526 -2.583195 2.726455
## Dec 2014 0.07162989 -1.664267 1.807526 -2.583195 2.726455
The moving average still carries some signal, but simple exponential smoothing does not seem to be working: with alpha near zero the level barely updates, and SES point forecasts are flat at the final estimated level for every horizon, giving a constant line at 0.0716 (on the transformed, differenced scale). Here AIC is 1048, BIC is 1057, and RMSE is 1.35. Let’s try ARIMA.
forecast_power_arima <- auto.arima(power_diff)
autoplot(forecast(forecast_power_arima,12), main="Prediction for 2014 with ARIMA")
summary(forecast_power_arima)
## Series: power_diff
## ARIMA(0,0,1)(0,0,2)[12] with non-zero mean
##
## Coefficients:
## ma1 sma1 sma2 mean
## 0.1195 -0.9264 0.0890 0.0616
## s.e. 0.0712 0.0810 0.0811 0.0198
##
## sigma^2 = 0.9557: log likelihood = -257.11
## AIC=524.23 AICc=524.57 BIC=540.19
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.02554754 0.9666583 0.4541852 -1920.696 2160.894 0.4138519
## ACF1
## Training set 0.006469743
ARIMA provides better results: AIC dropped to 524 and BIC to 540, and RMSE dropped from 1.35 to 0.97. I’ll select the ARIMA model and export the 2014 predictions to Excel.
pdata <- forecast(forecast_power_arima, h=12)
ptestdata1 <- data.frame(pdata)
writexl::write_xlsx(ptestdata1, "powerdata.xlsx")
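One caveat: forecast_power_arima was fitted to the Box-Cox-transformed, seasonally differenced series, so the values exported above are on that transformed scale rather than in KWH. A sketch of one way to obtain forecasts directly in KWH is to fit on the original series and pass lambda to auto.arima(), letting forecast() invert the transformation:

# Sketch: fit on the original series so forecast() returns values in KWH
lambda_kwh <- BoxCox.lambda(power_ts)
fit_kwh <- auto.arima(power_ts, lambda = lambda_kwh)
fc_kwh <- forecast(fit_kwh, h = 12)   # Jan through Dec 2014, in KWH
writexl::write_xlsx(data.frame(fc_kwh), "powerdata_kwh.xlsx")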