Dive Deeper CM2
1 Library Setup
library(tidyverse)
library(MLmetrics)
library(lubridate)
library(TSstudio)
library(forecast)
library(tseries)
theme_set(theme_minimal())Gunakanlah data nybirth.csv. Buatlah model time series yang tepat untuk data tersebut, analisislah pola time series yang ada pada data tersebut, apakah multiplicative atau additive? Lakukanlah forecasting dan lihatlah model evaluationnya.
2 Data Preparation
2.1 Read Data
birth <- read.csv("data_input/nybirth.csv")birth <- birth %>%
mutate(date = ymd(date)) %>%
arrange(date)glimpse(birth)## Rows: 168
## Columns: 2
## $ date <date> 1946-01-01, 1946-02-01, 1946-03-01, 1946-04-01, 1946-05-01, 19~
## $ births <dbl> 26.663, 23.598, 26.931, 24.740, 25.806, 24.364, 24.477, 23.901,~
#check range from birthdate
range(birth$date)## [1] "1946-01-01" "1959-12-01"
2.2 Buatlah object ts
birth_ts <- ts(data = birth$births,
start = c(1946,1),
frequency = 12)2.3 Visualisasikan object ts
birth_ts %>%
autoplot()2.4 Splitting train test, untuk data train 12 tahun, data test 2 tahun
# test
birth_test <- tail(birth_ts, 24)
# train
birth_train <- head(birth_ts, -24)2.5 Lakukan decompose untuk data train dan visualisasikan untuk menentukan additive atau multiplicative
birth_train %>%
decompose() %>%
autoplot()3 Modelling Holt’s Winter Explonential
- model dengan nilai smoothing nya auto
- model dengan nilai smoothing dengan mendefinisikan nilai alpha beta dan gamma secara manual
# #Smoothing Auto
birth_hw_auto <- HoltWinters(x = birth_train)
# #Smoothing Manual
birth_hw_manual <- HoltWinters(x = birth_train,
alpha = 0.8, #recent observation
beta = 0.3, #trend
gamma = 0.05) #seasonal3.1 Plot HW
3.1.1 HW Auto
plot(birth_hw_auto)3.1.2 HW Manual
plot(birth_hw_manual)3.2 Lakukanlah forecasting untuk 2 tahun kedepan (24 bulan)
#Smoothing Auto
birth_forecast_auto <- forecast(birth_hw_auto, h=24)
#Smoothing Manual
birth_forecast_manual <- forecast(birth_hw_manual, h=24)3.3 Visualisasikan data train, test dan hasil forecastnya
3.3.1 Train
birth_train %>%
autoplot(lwd = 1)+
autolayer(birth_hw_auto$fitted[,1], lwd = 1, series = "Auto (a=0.8, b=0.01, y=1)")+
autolayer(birth_hw_manual$fitted[,1], lwd = 1, series = "Manual(a=0.8, b=0.2, y=0.05)")3.3.2 Test
birth_test %>%
autoplot(lwd = 1)+
autolayer(birth_forecast_auto$mean, lwd = 1, series = "Auto (a=0.8, b=0.01, y=1)")+
autolayer(birth_forecast_manual$mean, lwd = 1, series = "Manual(a=0.8, b=0.2, y=0.05)")3.3.3 Confidence Interval
birth_ts %>%
autoplot (lwd = 1)+
autolayer(birth_forecast_auto, lwd = 1, series = "Auto (a=0.8, b=0.01, y=1)")+
autolayer(birth_forecast_manual, lwd = 1, series = "Manual (a=0.8, b=0.2, y=0.05)")3.4 Model Evaluation
#Smoothing Auto
MAPE(y_pred = birth_forecast_auto$mean, y_true = birth_test) *100## [1] 4.674416
#Smoothing Manual
MAPE(y_pred = birth_forecast_manual$mean, y_true = birth_test) *100## [1] 3.511854
Model birth_hw_manual memiliki persentasi error sebesar 3.5%, lebih rendah dibandingkan dengan model birth_hw_auto. Jika mengacu pada besaran error antar 2 model tersebut, model birth_hw_manual merupakan model terbaik untuk melakukan forecast pada datasets Birth.
4 ARIMA (Autoregressive Integrated Moving Average)
4.1 Differencing
Untuk menghasilkan data yang stasioner, maka dilakukan differencing menggunakan function (diff())
birth_train %>%
diff(lag = 12) %>%
diff(lag = 1) %>%
adf.test()##
## Augmented Dickey-Fuller Test
##
## data: .
## Dickey-Fuller = -6.5978, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
Berdasarkan hasil Differencing sebanyak 1 kali, dihasilkan p-value < 0.05 (alpha).
d = 1
Berikut perbandingan visual data sebelum dan sesudah dilakukan differencing sebanyak 1 kali.
4.1.1 Before Differencing
birth_train %>%
autoplot()4.1.2 After Differencing
birth_train %>%
diff() %>%
autoplot()4.2 Model Fitting
Membuat plot menggunakan tsdisplay() untuk menentukan nilai p (PACF) dan q (ACF).
birth_train %>%
diff %>%
tsdisplay()Berdasarkan visualisasi PACF dan ACF di atas, maka dapat ditentukan nilai p dan q sebagai berikut:
- PACF : 1, 4
- ACF : 1, 2 (lag 5 melewati garis putus-putus biru, namun tidak di masukan untuk menimalkan jumlah model yang di buat.)
Kombinasi ARIMA (p,d,q):
- (1,1,1)
- (1,1,2)
- (4,1,1)
- (4,1,2)
#create model
birth_arima111 <- arima(x = birth_train, order = c(1,1,1))
birth_arima112 <- arima(x = birth_train, order = c(1,1,2))
birth_arima411 <- arima(x = birth_train, order = c(4,1,1))
birth_arima412 <- arima(x = birth_train, order = c(4,1,2))Membuat model ARIMA menggunakan fungsi auto.arima()
birth_auto_arima <- auto.arima(birth_train, seasonal = F)
birth_auto_arima## Series: birth_train
## ARIMA(1,1,0)
##
## Coefficients:
## ar1
## -0.5559
## s.e. 0.0706
##
## sigma^2 estimated as 1.544: log likelihood=-233.67
## AIC=471.33 AICc=471.42 BIC=477.26
Nilai p,d,q yang dihasilkan oleh model birth_auto_arima adalah (1,1,0).
4.3 Goodness of Fit
birth_arima111$aic## [1] 473.0518
birth_arima112$aic## [1] 474.777
birth_arima411$aic## [1] 460.5939
birth_arima412$aic## [1] 461.6367
birth_auto_arima$aic## [1] 471.3316
Berdasarkan nilai AIC dari masing-masing model, model birth_arima411 memiliki AIC terkecil sebesar 460.59.
4.4 Model Evaluation
accuracy(birth_arima111)## ME RMSE MAE MPE MAPE MASE
## Training set 0.01222229 1.232849 0.9623417 -0.1312407 3.934987 0.8157662
## ACF1
## Training set 0.006544136
accuracy(birth_arima112)## ME RMSE MAE MPE MAPE MASE
## Training set 0.01011292 1.231642 0.9650349 -0.133472 3.942579 0.8180492
## ACF1
## Training set 0.002596069
accuracy(birth_arima411)## ME RMSE MAE MPE MAPE MASE
## Training set 0.05953305 1.152643 0.9238914 0.02838998 3.783571 0.7831723
## ACF1
## Training set 0.009414609
accuracy(birth_arima412)## ME RMSE MAE MPE MAPE MASE
## Training set 0.05753951 1.148794 0.9033971 0.02281373 3.702157 0.7657996
## ACF1
## Training set -0.001865113
accuracy(birth_auto_arima)## ME RMSE MAE MPE MAPE MASE
## Training set 0.01111 1.234064 0.9630305 -0.1324311 3.938786 1.012046
## ACF1
## Training set -0.01788515
Berdasarkan hasil evaluasi 5 model dengan menggunakan fungsi accuracy(), MAPE terendah sebesar 3.7% dihasilkan oleh model birth_arima412.
5 SARIMA (Seasonal ARIMA)
5.1 Differencing
Pembuatan object ts dan splitting data sudah dilakukan pada saat membuat model Holt-Winter, sehingga kita bisa langsung masuk ke dalam tahap Differencing.
birth_train %>%
diff(lag = 12) %>%
diff(lag =1) %>%
adf.test##
## Augmented Dickey-Fuller Test
##
## data: .
## Dickey-Fuller = -6.5978, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
Berdasarkan hasil Differencing sebanyak 1 kali pada index utama dan seasonal, dihasilkan p-value < 0.05 (alpha).
- d = 1
- D = 1
5.1.1 Before Differencing
birth_train %>%
autoplot()5.1.2 After Differencing
birth_train %>%
diff(lag = 12) %>%
diff(lag =1) %>%
autoplot()5.2 Model Fitting
Membuat plot menggunakan tsdisplay() untuk menentukan nilai p (PACF) dan q (ACF).
birth_train %>%
diff(lag = 12) %>%
diff(lag =1) %>%
tsdisplay(lag.max = 36)index utama :
- p : 2,3
- d : 1
- q : 2,3
seasonal :
- P : 1,2
- D : 1
- Q : 1
Kombinasi model yang terbentuk adalah :
ARIMA (2,1,2)(1,1,1)[12] sarima1
ARIMA (3,1,2)(1,1,1)[12] sarima2
ARIMA (2,1,3)(1,1,1)[12] sarima3
ARIMA (3,1,3)(1,1,1)[12] sarima4
ARIMA (2,1,2)(2,1,1)[12] sarima5
ARIMA (3,1,2)(2,1,1)[12] sarima6
ARIMA (2,1,3)(2,1,1)[12] sarima7
ARIMA (3,1,3)(2,1,1)[12] sarima8
# create model
sarima1 <- arima(x = birth_train, order = c(2,1,2), seasonal = c(1,1,1))
sarima2 <- arima(x = birth_train, order = c(3,1,2), seasonal = c(1,1,1))
sarima3 <- arima(x = birth_train, order = c(2,1,3), seasonal = c(1,1,1))
sarima4 <- arima(x = birth_train, order = c(3,1,3), seasonal = c(1,1,1))
sarima5 <- arima(x = birth_train, order = c(2,1,2), seasonal = c(2,1,1))
sarima6 <- arima(x = birth_train, order = c(3,1,2), seasonal = c(2,1,1))
sarima7 <- arima(x = birth_train, order = c(2,1,3), seasonal = c(2,1,1))
sarima8 <- arima(x = birth_train, order = c(3,1,3), seasonal = c(2,1,1))Membuat model ARIMA menggunakan fungsi auto.arima()
# create model
sarima_auto <- auto.arima(birth_train)
sarima_auto## Series: birth_train
## ARIMA(0,1,2)(0,1,2)[12]
##
## Coefficients:
## ma1 ma2 sma1 sma2
## -0.1054 -0.2263 -1.1221 0.3499
## s.e. 0.0883 0.0822 0.1095 0.1127
##
## sigma^2 estimated as 0.4161: log likelihood=-135.08
## AIC=280.15 AICc=280.63 BIC=294.53
Nilai (p,d,q)(P,D,Q)[F] yang dihasilkan oleh model birth_auto_arima adalah (0,1,2)(0,1,2)[12].
5.3 Goodness of Fit
sarima1$aic## [1] 283.9502
sarima2$aic## [1] 285.3937
sarima3$aic## [1] 285.4812
sarima4$aic## [1] 287.3461
sarima5$aic## [1] 281.1986
sarima6$aic## [1] 277.5694
sarima7$aic## [1] 281.5742
sarima8$aic## [1] 277.8386
sarima_auto$aic## [1] 280.1547
Berdasarkan nilai AIC dari masing-masing model, model sarima6 memiliki AIC terkecil sebesar 277.56.
5.4 Model Evaluation
accuracy(sarima1)## ME RMSE MAE MPE MAPE MASE
## Training set 0.1445184 0.604505 0.4600347 0.5880691 1.860877 0.3899662
## ACF1
## Training set -0.060737
accuracy(sarima2)## ME RMSE MAE MPE MAPE MASE
## Training set 0.1460272 0.6038102 0.457588 0.5932925 1.850959 0.3878922
## ACF1
## Training set -0.05345237
accuracy(sarima3)## ME RMSE MAE MPE MAPE MASE
## Training set 0.1492736 0.6036553 0.4552473 0.6074197 1.842082 0.385908
## ACF1
## Training set -0.05678732
accuracy(sarima4)## ME RMSE MAE MPE MAPE MASE
## Training set 0.1475945 0.6037679 0.456615 0.5999198 1.847341 0.3870674
## ACF1
## Training set -0.05458337
accuracy(sarima5)## ME RMSE MAE MPE MAPE MASE
## Training set 0.09770979 0.5786945 0.4387213 0.4022654 1.781933 0.3718991
## ACF1
## Training set -0.0948942
accuracy(sarima6)## ME RMSE MAE MPE MAPE MASE
## Training set 0.09476621 0.5693816 0.4233738 0.3806519 1.722433 0.3588892
## ACF1
## Training set -0.02140837
accuracy(sarima7)## ME RMSE MAE MPE MAPE MASE
## Training set 0.1168173 0.5847007 0.437083 0.4778213 1.775556 0.3705104
## ACF1
## Training set -0.01359231
accuracy(sarima8)## ME RMSE MAE MPE MAPE MASE
## Training set 0.123292 0.5692868 0.4284582 0.5079665 1.738566 0.3631992
## ACF1
## Training set -0.006509432
accuracy(sarima_auto)## ME RMSE MAE MPE MAPE MASE
## Training set 0.1329977 0.6057793 0.450476 0.5482743 1.825522 0.4734038
## ACF1
## Training set -0.01184833
Berdasarkan hasil evaluasi 9 model dengan menggunakan fungsi accuracy(), MAPE terendah sebesar 1.72% dihasilkan oleh model sarima6.
6 Forecasting
Berdasarkan pembuatan model ARIMA dan SARIMA, didapatkan 2 model dengan MAPE terendah yaitu:
- ARIMA :
birth_arima412 - SARIMA :
sarima6
Kita akan melakukan forecasting untuk 24 bulan kedepan menggunakan 2 model tersebut dan membandingkan hasilnya.
6.1 Forecasting model
forecast_arima <- forecast(birth_arima412, h=24)
forecast_sarima <- forecast(sarima6, h=24)6.2 Visualization
6.2.1 Whole Data
birth_ts %>%
autoplot(lwd = 1)+
autolayer(forecast_arima$mean, lwd = 1, series = "ARIMA")+
autolayer(forecast_sarima$mean, lwd = 1, series = "SARIMA")6.2.2 Data Test
birth_test %>%
autoplot(lwd = 1)+
autolayer(forecast_arima$mean, lwd = 1, series = "ARIMA")+
autolayer(forecast_sarima$mean, lwd = 1, series = "SARIMA")Berdasarkan hasil visualisasi, model SARIMA terlihat memiliki pola yang mirip dengan data birth_test, namun tingkat error yang dihasilkan cenderung besar.
6.3 Model Evaluation
MAPE(y_pred=forecast_arima$mean, birth_test)*100## [1] 4.430088
MAPE(y_pred=forecast_sarima$mean, birth_test)*100## [1] 8.472663
Berdasarkan pengecekan error menggunakan MAPE, model ARIMA memiliki error yang lebih rendah dibandingkan dengan model SARIMA.