Tugas Individu 2 - Metode Peramalan Deret Waktu
Dataset deret waktu yang digunakan pada penanganan regresi dengan peubah lag kali ini merupakan data kinerja suatu perusahaan dalam periode 2015-2020 yang dihimpun dari situs kaggle.com. Dataset ini memiliki 4 (empat) peubah dengan rincian sebagai berikut,
- Revenues (peubah respon): nominal pendapatan perusahaan
- Sales_quantity (peubah penjelas): kuantitas penjualan perusahaan
- Average_cost (peubah penjelas): rata-rata biaya yang dikeluarkan untuk proses produksi
- The_average_annual_payroll_of_the_region (peubah penjelas): rata-rata jumlah pegawai per tahunnya
Pada penugasan kali ini, peubah penjelas yang digunakan adalah peubah Average_cost karena hubungannya dengan peubah respon Revenues memiliki kesesuaian dengan konsep lag yang diangkat.
Pra-Processing Data
Untuk melakukan proses analisis regresi dengan peubah lag, diperlukan beberapa library untuk menunjang proses analisis data sebagai berikut.
library(dLagM)
library(dynlm)
library(MLmetrics)
library(lmtest)
library(car)
library(readxl)Pemanggilan data yang berformat .xlsx, proses
import data menuju R dapat dilakukan menggunakan fungsi
read_excel() dengan rangkaian sintaks sebagai berikut.
setwd("D:/DATA AKMAL/IPB/Semester 5/Metode Peramalan Deret Waktu/Pertemuan 4")
data <- read.csv("Month_Value_1.csv")
data <- data[1:64,-1]
knitr::kable(head(data, align = "c", n = 6))| Revenue | Sales_quantity | Average_cost | The_average_annual_payroll_of_the_region |
|---|---|---|---|
| 16010072 | 12729 | 1257.764 | 30024676 |
| 15807587 | 11636 | 1358.507 | 30024676 |
| 22047146 | 15922 | 1384.697 | 30024676 |
| 18814583 | 15227 | 1235.607 | 30024676 |
| 14021480 | 8620 | 1626.622 | 30024676 |
| 16783929 | 13160 | 1275.375 | 30024676 |
Langkah selanjutnya dalam proses pre-processing data adalah
membuat kedua peubah yang menjadi peubah respon dan penjelas menjadi
data deret waktu dengan fungsi ts()dengan rangkaian sintaks
sebagai berikut.
revenue.ts <- ts(data$Revenue, frequency = 12, start = 2015)
avg_cost.ts <- ts(data$Average_cost, frequency = 12, start = 2015)Subsetting Data Training dan Testing
Kemudian, untuk keperluan peramalan data deret waktu, dataset perlu
dibagi menjadi data training dan testing dengan
proporsi optimum 80%-20%. Setelah dilakukan subsetting data,
tak lupa kedua data tersebut diubah menjadi data berformat deret waktu
dengan fungsi ts().
train <- data[1:50,]
test <- data[51:64,]
train.ts <- ts(train)
test.ts <- ts(test)
data.ts <- ts(data)Eksplorasi Data
plot(revenue.ts, main = "Pendapatan Perusahaan Periode 2015-2020",
xlab = "Tahun", ylab="Pendapatan")
points(revenue.ts)Berdasarkan plot di atas, dapat diamati bahwa pendapatan perusahaan selama periode 2015-2020 relatif meningkat dalam tren yang naik-turun setiap tahunnya.
plot(avg_cost.ts, main = "Rata-Rata Pengeluaran Perusahaan Periode 2015-2020",
xlab = "Tahun", ylab="Pengeluaran")
points(avg_cost.ts)Berdasarkan plot di atas, dapat diamati bahwa rata-rata pengeluaran perusahaan selama periode 2015-2020 cenderung stasioner pada nilai tengahnya yang diperkirakan berada pada interval 1750-2000. Hal ini menunjukkan bahwa selama lima tahun, perusahaan mampu bertahan dengan anggaran yang relatif sama.
plot(data$Average_cost, data$Revenue)cor(data$Average_cost, data$Revenue)## [1] 0.3582778
Berdasarkan plot tebaran di atas, dapat diamati bahwa pendapatan perusahaan dan biaya proses produksi memiliki hubungan linear yang cenderung positif di antara keduanya. Hal ini divalidasi dengan nilai korelasi antara keduanya, yaitu sebesar 0.358.
Analisis Regresi dengan Peubah Lag
Metode Koyck
model.koyck = koyckDlm(x=train$Average_cost, y=train$Revenue)
summary(model.koyck)##
## Call:
## "Y ~ (Intercept) + Y.1 + X.t"
##
## Residuals:
## Min 1Q Median 3Q Max
## -25744527 -6994260 818653 6587933 26217991
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.662e+07 7.910e+07 -0.337 0.738
## Y.1 8.118e-03 7.967e-01 0.010 0.992
## X.t 3.330e+04 6.087e+04 0.547 0.587
##
## Residual standard error: 11280000 on 46 degrees of freedom
## Multiple R-Squared: -0.3554, Adjusted R-squared: -0.4144
## Wald test: 3.421 on 2 and 46 DF, p-value: 0.04122
##
## Diagnostic tests:
## NULL
##
## alpha beta phi
## Geometric coefficients: -26839858 33302.71 0.008117793
AIC(model.koyck)## [1] 1735.374
BIC(model.koyck)## [1] 1742.941
Langkah selanjutnya adalah membuat ramalan menggunakan fungsi
forecast() dengan jumlah ramalan sejumlah 14 amatan, sesuai
dengan jumlah amatan pada data testing.
(fore.koyck <- forecast(model = model.koyck, x=test$Average_cost, h=14))## $forecasts
## [1] 27749534 22579872 37414237 37343193 43565870 58964250 43355502 29011769
## [9] 28709031 25011163 42539013 30706190 24776350 39046781
##
## $call
## forecast.koyckDlm(model = model.koyck, x = test$Average_cost,
## h = 14)
##
## attr(,"class")
## [1] "forecast.koyckDlm" "dLagM"
#mape data testing
mape.koyck <- MAPE(fore.koyck$forecasts, test$Revenue)
#akurasi data training
mape_train <- dLagM::GoF(model.koyck)["MAPE"]
c("MAPE_testing" = mape.koyck, "MAPE_training" = mape_train)## $MAPE_testing
## [1] 0.3063571
##
## $MAPE_training.MAPE
## [1] 0.3435776
Berdasarkan perhitungan MAPE di atas, dapat diamati bahwa nilai galat yang dihasilkan pada data forecasting sudah lebih kecil daripada pada data training.
Metode Distributed Regression dengan Lag = 2
model.dlm = dLagM::dlm(x = train$Average_cost,y = train$Revenue , q = 2)
summary(model.dlm)##
## Call:
## lm(formula = model.formula, data = design)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15317705 -5396734 -496885 5101191 18224535
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3632502 10403854 -0.349 0.7286
## x.t 5715 4727 1.209 0.2331
## x.1 2858 4588 0.623 0.5365
## x.2 11217 4615 2.431 0.0192 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8607000 on 44 degrees of freedom
## Multiple R-squared: 0.2136, Adjusted R-squared: 0.16
## F-statistic: 3.985 on 3 and 44 DF, p-value: 0.01351
##
## AIC and BIC values for the model:
## AIC BIC
## 1 1674.978 1684.334
(fore.dlm <- forecast(model = model.dlm, x=test$Average_cost, h=14))## $forecasts
## [1] 27450086 28370694 29741644 29268432 35330509 38433872 39147319 40526658
## [9] 33975075 28536422 31169459 29376808 33261688 31184657
##
## $call
## forecast.dlm(model = model.dlm, x = test$Average_cost, h = 14)
##
## attr(,"class")
## [1] "forecast.dlm" "dLagM"
mape.dlm <- MAPE(fore.dlm$forecasts, test$Revenue)
mape_train <- GoF(model.dlm)["MAPE"]
c("MAPE_testing" = mape.dlm, "MAPE_training" = mape_train)## $MAPE_testing
## [1] 0.263047
##
## $MAPE_training.MAPE
## [1] 0.2592919
Berdasarkan perhitungan MAPE di atas, dapat diamati bahwa nilai galat yang dihasilkan pada data forecasting justru lebih besar daripada pada data training.
Metode Distributed Regression dengan Lag Optimum
Salah satu nilai yang dapat digunakan untuk menilai kebaikan suatu model statistik adalah AIC dan BIC, dengan semakin kecil nilai keduanya, semakin baik model yang digunakan. Untuk mencari nilai AIC dan BIC terkecil, terlebih dahulu perlu dilakukan pemodelan distributed regression hingga nilai lag = 4 dengan rangkaian sintaks sebagai berikut.
finiteDLMauto(formula = Revenue ~ Average_cost,
data = data.frame(train), q.min = 1, q.max = 4 ,
model.type = "dlm", error.type = "AIC", trace = TRUE)## q - k MASE AIC BIC GMRAE MBRAE R.Adj.Sq Ljung-Box
## 4 4 0.79181 1609.888 1622.689 0.84911 0.99368 0.11129 0.03009846
## 3 3 0.80827 1642.680 1653.780 0.86089 0.56441 0.13878 0.01302913
## 2 2 0.80760 1674.978 1684.334 0.91418 3.74233 0.16003 0.01588799
## 1 1 0.86372 1713.753 1721.321 0.92906 -13.58053 0.09021 0.10440318
Berdasarkan tabel di atas, dapat diamati bahwa nilai lag optimum adalah 4, dibuktikan dengan besar nilai AIC dan BIC yang terkecil di antara nilai lag lainnya.
model.dlm2 = dLagM::dlm(x = train$Average_cost,y = train$Revenue , q = 4)
summary(model.dlm2)##
## Call:
## lm(formula = model.formula, data = design)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16182230 -5693274 -771282 4822319 17168018
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1572628 12411880 -0.127 0.8998
## x.t 5726 5096 1.124 0.2678
## x.1 2638 5024 0.525 0.6024
## x.2 10549 4978 2.119 0.0403 *
## x.3 -4304 4973 -0.865 0.3920
## x.4 4022 4936 0.815 0.4200
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8864000 on 40 degrees of freedom
## Multiple R-squared: 0.21, Adjusted R-squared: 0.1113
## F-statistic: 2.127 on 5 and 40 DF, p-value: 0.08195
##
## AIC and BIC values for the model:
## AIC BIC
## 1 1609.888 1622.689
(fore.dlm2 <- forecast(model = model.dlm2, x=test$Average_cost, h=14))## $forecasts
## [1] 28740087 28167880 29226392 29754291 35913446 36446169 38749564 39093249
## [9] 31719259 30452757 33056854 29525058 33590362 29081280
##
## $call
## forecast.dlm(model = model.dlm2, x = test$Average_cost, h = 14)
##
## attr(,"class")
## [1] "forecast.dlm" "dLagM"
mape.dlm2 <- MAPE(fore.dlm2$forecasts, test$Revenue)
mape_train <- GoF(model.dlm2)["MAPE"]
c("MAPE_testing" = mape.dlm2, "MAPE_training" = mape_train)## $MAPE_testing
## [1] 0.2640773
##
## $MAPE_training.MAPE
## [1] 0.258167
Serupa dengan metode distributed regression dengan lag sebesar 2, nilai galat yang dihasilkan pada data forecasting model dengan lag optimum pun justru lebih besar daripada pada data training.
Metode Autoregressive
model.ardl = dLagM::ardlDlm(x = train$Average_cost, y = train$Revenue, p = 1 , q = 1)
summary(model.ardl)##
## Time series regression with "ts" data:
## Start = 2, End = 50
##
## Call:
## dynlm(formula = as.formula(model.text), data = data, start = 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15491395 -6251524 -159722 4459865 19633114
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.430e+06 9.374e+06 0.686 0.4963
## X.t 4.867e+03 4.947e+03 0.984 0.3305
## X.1 2.811e+03 4.646e+03 0.605 0.5481
## Y.1 3.429e-01 1.504e-01 2.280 0.0274 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8664000 on 45 degrees of freedom
## Multiple R-squared: 0.2184, Adjusted R-squared: 0.1663
## F-statistic: 4.191 on 3 and 45 DF, p-value: 0.01065
AIC(model.ardl)## [1] 1710.397
BIC(model.ardl)## [1] 1719.856
(fore.ardl <- forecast(model = model.ardl, x=test$Average_cost, h=14))## $forecasts
## [1] 31615556 28993211 29837724 31355189 32768816 36021873 36133601 32765934
## [9] 30383780 29011202 30794551 31138213 29392529 30394024
##
## $call
## forecast.ardlDlm(model = model.ardl, x = test$Average_cost, h = 14)
##
## attr(,"class")
## [1] "forecast.ardlDlm" "dLagM"
mape.ardl <- MAPE(fore.ardl$forecasts, test$Revenue)
mape_train <- GoF(model.ardl)["MAPE"]
c("MAPE_testing" = mape.ardl, "MAPE_training" = mape_train)## $MAPE_testing
## [1] 0.2831208
##
## $MAPE_training.MAPE
## [1] 0.2570004
Serupa dengan metode sebelumnya, nilai galat yang dihasilkan pada data forecasting model dengan metode autoregressive pun justru lebih besar daripada pada data training.
#mencari parameter p dan q yang optimum
ardlBoundOrders(data = data.frame(train) , formula = Revenue ~ Average_cost )## $p
## Average_cost
## 1 13
##
## $q
## [1] 15
##
## $Stat.table
## q = 1 q = 2 q = 3 q = 4 q = 5 q = 6 q = 7 q = 8
## p = 1 1672.188 1618.870 1576.193 1541.730 1500.592 1465.635 1433.375 1382.430
## p = 2 1637.923 1608.958 1569.842 1534.850 1496.187 1459.523 1428.320 1371.196
## p = 3 1577.111 1577.111 1564.097 1530.088 1494.853 1457.027 1426.332 1373.117
## p = 4 1559.267 1527.503 1527.503 1528.626 1493.355 1456.793 1426.346 1375.113
## p = 5 1532.125 1500.919 1489.610 1489.610 1491.305 1453.871 1423.556 1364.804
## p = 6 1490.735 1468.258 1459.081 1457.490 1457.490 1455.571 1425.344 1366.483
## p = 7 1460.436 1432.407 1426.002 1423.473 1425.441 1425.441 1427.306 1368.008
## p = 8 1429.595 1401.224 1394.115 1395.286 1396.909 1396.613 1396.613 1354.625
## p = 9 1369.784 1353.898 1342.464 1340.461 1315.851 1317.851 1315.280 1315.280
## p = 10 1347.748 1328.039 1321.624 1317.350 1319.166 1314.841 1316.834 1285.091
## p = 11 1316.550 1291.513 1287.965 1287.658 1289.190 1279.700 1270.518 1238.876
## p = 12 1220.545 1209.930 1211.645 1213.462 1215.441 1217.283 1218.012 1218.733
## p = 13 1249.960 1223.482 1224.121 1225.073 1225.767 1213.829 1215.673 1189.093
## p = 14 1211.687 1182.445 1183.805 1185.804 1185.491 1169.792 1167.067 1153.241
## p = 15 1153.324 1152.971 1154.969 1154.546 1156.500 1140.202 1141.984 1122.553
## q = 9 q = 10 q = 11 q = 12 q = 13 q = 14 q = 15
## p = 1 1347.495 1315.712 1270.523 1238.350 1206.759 1176.743 1144.211
## p = 2 1339.046 1306.544 1268.802 1236.955 1205.729 1176.026 1142.419
## p = 3 1340.991 1308.524 1269.660 1235.464 1205.402 1175.740 1142.232
## p = 4 1342.890 1309.735 1271.445 1237.430 1206.985 1177.565 1144.208
## p = 5 1334.275 1303.064 1267.934 1233.001 1203.339 1172.238 1139.120
## p = 6 1335.113 1304.444 1269.624 1233.938 1204.597 1173.820 1137.576
## p = 7 1336.486 1304.213 1269.762 1234.705 1205.347 1175.194 1139.449
## p = 8 1321.251 1291.148 1256.958 1223.100 1194.041 1161.611 1114.097
## p = 9 1314.781 1282.649 1249.640 1218.220 1189.145 1157.252 1111.597
## p = 10 1285.091 1281.280 1240.161 1207.680 1174.247 1143.409 1100.290
## p = 11 1236.724 1236.724 1236.257 1204.814 1171.491 1142.906 1102.156
## p = 12 1220.715 1204.923 1204.923 1206.660 1168.818 1140.768 1103.960
## p = 13 1190.962 1168.237 1165.778 1165.778 1157.029 1127.388 1080.000
## p = 14 1134.149 1134.193 1131.170 1127.817 1127.817 1125.172 1081.044
## p = 15 1110.276 1104.572 1093.614 1086.555 1080.227 1080.227 NA
##
## $min.Stat
## [1] 1080
cons_lm1 <- dynlm(Revenue ~ Average_cost+L(Average_cost),data = train.ts)
cons_lm2 <- dynlm(Revenue ~ Average_cost+L(Revenue),data = train.ts)
cons_lm3 <- dynlm(Revenue ~ Average_cost+L(Average_cost)+L(Revenue),data = train.ts)
cons_lm4 <- dynlm(Revenue ~ Average_cost+L(Average_cost)+L(Average_cost,2),data = train.ts)summary(cons_lm1)##
## Time series regression with "ts" data:
## Start = 2, End = 50
##
## Call:
## dynlm(formula = Revenue ~ Average_cost + L(Average_cost), data = train.ts)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17016579 -5390827 -1246903 4447212 20393398
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4171490 9737567 0.428 0.6704
## Average_cost 9235 4765 1.938 0.0588 .
## L(Average_cost) 5705 4669 1.222 0.2280
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9050000 on 46 degrees of freedom
## Multiple R-squared: 0.1281, Adjusted R-squared: 0.09021
## F-statistic: 3.38 on 2 and 46 DF, p-value: 0.04271
deviance(cons_lm1)## [1] 3.767674e+15
summary(cons_lm2)##
## Time series regression with "ts" data:
## Start = 2, End = 50
##
## Call:
## dynlm(formula = Revenue ~ Average_cost + L(Revenue), data = train.ts)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14935159 -6580939 -335213 4536432 19310165
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.844e+06 7.434e+06 1.324 0.1920
## Average_cost 5.182e+03 4.886e+03 1.061 0.2944
## L(Revenue) 3.677e-01 1.437e-01 2.560 0.0138 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8604000 on 46 degrees of freedom
## Multiple R-squared: 0.212, Adjusted R-squared: 0.1778
## F-statistic: 6.189 on 2 and 46 DF, p-value: 0.004165
deviance(cons_lm2)## [1] 3.405029e+15
summary(cons_lm3)##
## Time series regression with "ts" data:
## Start = 2, End = 50
##
## Call:
## dynlm(formula = Revenue ~ Average_cost + L(Average_cost) + L(Revenue),
## data = train.ts)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15491395 -6251524 -159722 4459865 19633114
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.430e+06 9.374e+06 0.686 0.4963
## Average_cost 4.867e+03 4.947e+03 0.984 0.3305
## L(Average_cost) 2.811e+03 4.646e+03 0.605 0.5481
## L(Revenue) 3.429e-01 1.504e-01 2.280 0.0274 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8664000 on 45 degrees of freedom
## Multiple R-squared: 0.2184, Adjusted R-squared: 0.1663
## F-statistic: 4.191 on 3 and 45 DF, p-value: 0.01065
deviance(cons_lm3)## [1] 3.377547e+15
summary(cons_lm4)##
## Time series regression with "ts" data:
## Start = 3, End = 50
##
## Call:
## dynlm(formula = Revenue ~ Average_cost + L(Average_cost) + L(Average_cost,
## 2), data = train.ts)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15317705 -5396734 -496885 5101191 18224535
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3632502 10403854 -0.349 0.7286
## Average_cost 5715 4727 1.209 0.2331
## L(Average_cost) 2858 4588 0.623 0.5365
## L(Average_cost, 2) 11217 4615 2.431 0.0192 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8607000 on 44 degrees of freedom
## Multiple R-squared: 0.2136, Adjusted R-squared: 0.16
## F-statistic: 3.985 on 3 and 44 DF, p-value: 0.01351
deviance(cons_lm4)## [1] 3.259543e+15
Uji Diagnostik Model
Uji Autokorelasi
if(require("lmtest")) encomptest(cons_lm1, cons_lm2)## Encompassing test
##
## Model 1: Revenue ~ Average_cost + L(Average_cost)
## Model 2: Revenue ~ Average_cost + L(Revenue)
## Model E: Revenue ~ Average_cost + L(Average_cost) + L(Revenue)
## Res.Df Df F Pr(>F)
## M1 vs. ME 45 -1 5.1978 0.02741 *
## M2 vs. ME 45 -1 0.3662 0.54815
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Proses pengujian statistik untuk mendeteksi adanya autokorelasi
menggunakan statistik uji Durbin-Watson dengan memanfaatkan fungsi
dwtest().
dwtest(cons_lm1)##
## Durbin-Watson test
##
## data: cons_lm1
## DW = 1.515, p-value = 0.03963
## alternative hypothesis: true autocorrelation is greater than 0
dwtest(cons_lm2)##
## Durbin-Watson test
##
## data: cons_lm2
## DW = 2.246, p-value = 0.791
## alternative hypothesis: true autocorrelation is greater than 0
dwtest(cons_lm3)##
## Durbin-Watson test
##
## data: cons_lm3
## DW = 2.2392, p-value = 0.7827
## alternative hypothesis: true autocorrelation is greater than 0
dwtest(cons_lm4)##
## Durbin-Watson test
##
## data: cons_lm4
## DW = 1.2929, p-value = 0.005699
## alternative hypothesis: true autocorrelation is greater than 0
Uji Heterogenitas
Proses pengujian statistik untuk mendeteksi heterogenitas menggunakan
statistik uji Breusch-Pagan dengan memanfaatkan fungsi
bptest().
bptest(cons_lm1)##
## studentized Breusch-Pagan test
##
## data: cons_lm1
## BP = 0.049936, df = 2, p-value = 0.9753
bptest(cons_lm2)##
## studentized Breusch-Pagan test
##
## data: cons_lm2
## BP = 0.29828, df = 2, p-value = 0.8614
bptest(cons_lm3)##
## studentized Breusch-Pagan test
##
## data: cons_lm3
## BP = 1.0605, df = 3, p-value = 0.7866
bptest(cons_lm4)##
## studentized Breusch-Pagan test
##
## data: cons_lm4
## BP = 0.49279, df = 3, p-value = 0.9205
Uji Normalitas
Proses pengujian statistik untuk mendeteksi normalitas sisaan
menggunakan statistik uji Shapiro-Wilk dengan memanfaatkan fungsi
shapiro.test().
shapiro.test(residuals(cons_lm1))##
## Shapiro-Wilk normality test
##
## data: residuals(cons_lm1)
## W = 0.98332, p-value = 0.7091
shapiro.test(residuals(cons_lm2))##
## Shapiro-Wilk normality test
##
## data: residuals(cons_lm2)
## W = 0.96969, p-value = 0.2354
shapiro.test(residuals(cons_lm3))##
## Shapiro-Wilk normality test
##
## data: residuals(cons_lm3)
## W = 0.97402, p-value = 0.3473
shapiro.test(residuals(cons_lm4))##
## Shapiro-Wilk normality test
##
## data: residuals(cons_lm4)
## W = 0.98075, p-value = 0.6102
Perbandingan Keakuratan Model
Mean Absolute Percentage Error (MAPE)
akurasi <- matrix(c(mape.koyck, mape.dlm, mape.dlm2, mape.ardl))
row.names(akurasi)<- c("Koyck","DLM 1","DLM Optimum","Autoregressive")
colnames(akurasi) <- c("MAPE")
akurasi## MAPE
## Koyck 0.3063571
## DLM 1 0.2630470
## DLM Optimum 0.2640773
## Autoregressive 0.2831208
Secara sederhana, model statistik terbaik adalah model yang menghasilkan nilai error terkecil. Dengan demikian, berdasarkan perhitungan MAPE, model terbaik yang dihasilkan adalah model regresi lag dengan metode distributed lag regression.
Plot Forecasting
par(mfrow=c(1,1))
plot(1:nrow(test), test$Revenue, type="b", col="black", xlab="Periode waktu",
ylab = "Pendapatan", ylim = c(15000000,85000000),
main="Perbandingan Metode Ramalan vs Data Aktual")
points(1:nrow(test), test$Revenue,col="black",pch=19)
lines(1:nrow(test), test$Revenue,col="black")
points(1:nrow(test), fore.koyck$forecasts,col="blue",pch=19)
lines(1:nrow(test), fore.koyck$forecasts,col="blue")
points(1:nrow(test), fore.dlm$forecasts,col="red",pch=19)
lines(1:nrow(test), fore.dlm$forecasts,col="red")
points(1:nrow(test), fore.dlm2$forecasts,col="orange",pch=19)
lines(1:nrow(test), fore.dlm2$forecasts,col="orange")
points(1:nrow(test), fore.ardl$forecasts,col="green",pch=19)
lines(1:nrow(test), fore.ardl$forecasts,col="green")
legend("topleft",c("Data Aktual", "Koyck", "DLM = 2", "DLM Optimum", "Autoregressive"), lty=1, col=c("black", "blue", "red", "orange", "green"))par(mfrow=c(1,1))
plot(1:nrow(train), train$Revenue, type="b", col="black", xlim=c(1,nrow(train)+14),
ylab="Pendapatan",xlab="Periode waktu", ylim=c(10000000,60000000),
main="Perbandingan Metode Ramalan vs Data Aktual")
points((nrow(train)+1):(nrow(train)+14), test$Revenue,col="black")
lines((nrow(train)+1):(nrow(train)+14), test$Revenue,col="black")
points((nrow(train)+1):(nrow(train)+14), fore.koyck$forecasts,col="blue")
lines((nrow(train)+1):(nrow(train)+14), fore.koyck$forecasts,col="blue")
points((nrow(train)+1):(nrow(train)+14), fore.dlm$forecasts,col="red")
lines((nrow(train)+1):(nrow(train)+14), fore.dlm$forecasts,col="red")
points((nrow(train)+1):(nrow(train)+14), fore.dlm2$forecasts,col="orange")
lines((nrow(train)+1):(nrow(train)+14), fore.dlm2$forecasts,col="orange")
points((nrow(train)+1):(nrow(train)+14), fore.ardl$forecasts,col="green")
lines((nrow(train)+1):(nrow(train)+14), fore.ardl$forecasts,col="green")
legend("topleft",c("Data Aktual", "Koyck", "DLM", "DLM Optimum", "Autoregressive"), lty=1, col=c("black", "blue", "red", "orange", "green"))Berdasarkan plot di atas, dapat diamati bahwa model forecasting dengan metode distributed lag regression dan autoregressive menghasilkan ramalan yang tidak terlalu tajam. Sementara itu, hasil ramalan dengan metode Koyck menghasilkan data yang naik-turun dengan tajam, serupa dengan data aktual yang terjadi.
Daftar Pustaka
Aqibah M, Suciptawati NLP, Sumarjaya IW. 2020. Model dinamis autoregressive distributed lag (studi kasus: pengaruh kurs dolar amerika dan inflasi terhadap harga saham tahun 2014-2018). E-Jurnal Matematika. 9(4): 240-250.
Gujarati DN. 2004. Basic Econometrics. Fourth Edition. New York: The McGraw-Hill.
Gujarati DN. 2006. Essentials of Econometrics. Third Edition. New York: The McGraw-Hill.
Podsyp. 2015. Time series starter dataset. https://www.kaggle.com/datasets/podsyp/time-series-starter-dataset [diakses 17 September 2022]