Maitreya Zahwa - 5003231043
library(forecast)
library(tseries)
library(gridExtra)
library(tidyverse)
library(lmtest)
library(BETS)
library(Metrics)
library(readxl)
sim_data <- read_excel("Data_ARIMA.xlsx")
View(sim_data)
sim_train <- sim_data[1:47, ]
sim_test <- sim_data[48:93, ]
p1 <- ggAcf(sim_train$`pencarian "Indihome"`) + ggtitle(label = "")
p2 <- ggPacf(sim_train$`pencarian "Indihome"`) + ggtitle(label = "")
grid.arrange(p1, p2, nrow = 1, top = "ACF & PACF of original series")
Based on the ACF plot, lags 1, 2, and 3 are significantly positive, after which the autocorrelation decays gradually toward zero. A gradual decay of this kind usually points to an AR process rather than an MA process. The PACF plot supports this: only lag 1 is clearly significant, while the subsequent lags drop off immediately and are insignificant. A PACF that cuts off at lag 1 combined with a slowly decaying ACF is characteristic of an AR(1) model. Therefore, based on the ACF and PACF plots, together with the ADF test below, the series is suspected to follow an ARIMA(1,0,0) model.
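To illustrate the identification rule used above, the following minimal sketch computes the theoretical ACF and PACF of an AR(1) process with base R's ARMAacf. The coefficient phi = 0.6 is an arbitrary illustrative value, not an estimate from the Indihome data.

```r
# Theoretical ACF/PACF of an AR(1) process.
# phi is an illustrative assumption, not a value fitted to the Indihome series.
phi <- 0.6
max_lag <- 5

# ACF at lags 0..max_lag: decays geometrically as phi^k
acf_theory <- ARMAacf(ar = phi, lag.max = max_lag)

# PACF at lags 1..max_lag: equals phi at lag 1 and is zero afterwards
pacf_theory <- ARMAacf(ar = phi, lag.max = max_lag, pacf = TRUE)

print(round(acf_theory, 4))
print(round(pacf_theory, 4))
```

The ACF values shrink gradually while the PACF is nonzero only at lag 1, which is the same qualitative pattern read off the sample plots.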
adf.test(sim_train$`pencarian "Indihome"`)
Augmented Dickey-Fuller Test
data: sim_train$`pencarian "Indihome"`
Dickey-Fuller = -4.1318, Lag order = 4, p-value = 0.01
alternative hypothesis: stationary
pp.test(sim_train$`pencarian "Indihome"`)
Phillips-Perron Unit Root Test
data: sim_train$`pencarian "Indihome"`
Dickey-Fuller Z(alpha) = -50.247, Truncation lag parameter = 3, p-value = 0.01
alternative hypothesis: stationary
Both the ADF and PP tests reject the unit-root null hypothesis (reported p-value of 0.01 in each case), so the series is treated as stationary and no differencing is needed.
m1 <- Arima(y = sim_train$`pencarian "Indihome"`, order = c(1, 0, 0), include.drift = TRUE)
summary(m1)
Series: sim_train$`pencarian "Indihome"`
ARIMA(1,0,0) with drift
Coefficients:
         ar1  intercept    drift
      0.3760    61.7512  -0.2010
s.e.  0.1014     1.9291   0.0398
sigma^2 = 31.71: log likelihood = -259.77
AIC=527.54 AICc=528.05 BIC=537.22
Training set error measures:
                      ME     RMSE      MAE        MPE     MAPE      MASE        ACF1
Training set -0.02943456 5.528426 3.783465 -0.9132091 6.810149 0.8317536 -0.02823317
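The AR(1) coefficient of 0.3760 is significant (p < 0.05): about 37.6% of the previous day's deviation from the trend line carries over into today's search value. Because the model includes a drift term, the intercept of 61.75 is the level of the fitted linear trend at time zero rather than a constant mean, and the drift of -0.2010 indicates that the Indihome search index falls by about 0.2 points per period on average. The error variance of 31.71 (a standard deviation of about 5.6) measures the noise that the model cannot explain.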
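Written out, the fitted model can be read as a linear trend with AR(1) errors. The following is a sketch of that form using the reported estimates. It assumes the parametrization used by forecast::Arima, in which the drift is the coefficient on the time index t = 1, 2, and so on, and WN denotes white noise.

```latex
\begin{aligned}
y_t &= 61.7512 - 0.2010\,t + \eta_t, \\
\eta_t &= 0.3760\,\eta_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim \mathrm{WN}(0,\ 31.71).
\end{aligned}
```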
coeftest(m1)
z test of coefficients:
           Estimate Std. Error z value  Pr(>|z|)
ar1         0.37601    0.10136  3.7097 0.0002075 ***
intercept  61.75118    1.92907 32.0109 < 2.2e-16 ***
drift      -0.20101    0.03975 -5.0568 4.264e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
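The z values and p-values in this table follow from dividing each estimate by its standard error and comparing the result with a standard normal distribution. The sketch below redoes that arithmetic from the printed (rounded) numbers, so small differences in the last digits are expected.

```r
# Estimates and standard errors copied from the coeftest() output above
est <- c(ar1 = 0.37601, intercept = 61.75118, drift = -0.20101)
se  <- c(ar1 = 0.10136, intercept = 1.92907,  drift = 0.03975)

z_value <- est / se                   # Wald z statistic
p_value <- 2 * pnorm(-abs(z_value))   # two-sided p-value under N(0, 1)

print(round(z_value, 4))
print(signif(p_value, 4))
```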
t_test(m1)
checkresiduals(m1)
Ljung-Box test
data: Residuals from ARIMA(1,0,0) with drift
Q* = 6.2352, df = 9, p-value = 0.7162
Model df: 1. Total lags used: 10
The residual plot shows errors scattered randomly around zero with no visible pattern. The Ljung–Box test supports this: with a p-value of 0.7162, there is no evidence that the residuals still exhibit serial dependence.
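The reported Ljung-Box p-value can be checked by hand: the statistic is compared with a chi-squared distribution whose degrees of freedom equal the number of lags minus the model degrees of freedom. A minimal sketch using only the numbers printed above:

```r
# Values copied from the checkresiduals() output above
q_stat    <- 6.2352  # Ljung-Box Q* statistic
lags_used <- 10      # "Total lags used"
model_df  <- 1       # "Model df"

# Degrees of freedom of the reference chi-squared distribution
df <- lags_used - model_df

# Upper-tail probability: a large p-value means no evidence of autocorrelation
p_value <- pchisq(q_stat, df = df, lower.tail = FALSE)
print(round(p_value, 4))
```

Up to rounding, this should agree with the p-value of 0.7162 in the test output.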
shapiro.test(m1$residuals)
Shapiro-Wilk normality test
data: m1$residuals
W = 0.81953, p-value = 1.121e-08
The Shapiro-Wilk test rejects normality of the residuals (p-value = 1.121e-08 < 0.05). The residuals are therefore uncorrelated but not normally distributed, so forecast prediction intervals that assume normal errors should be interpreted with caution.
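# Match the forecast horizon to the test set so forecasts and actuals have equal length
m1_forecast <- m1 %>% forecast(h = nrow(sim_test))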
m1_forecast
forecast_ts <- ts(data.frame(forecast = m1_forecast$mean, actual = sim_test$`pencarian "Indihome"`))
autoplot(forecast_ts, ylab = "", main = "Testing Data Forecast")
rmse(actual = sim_test$`pencarian "Indihome"`, predicted = m1_forecast$mean)
[1] 5.837716
The RMSE on the testing data is 5.84, meaning the forecasts deviate from the actual search index by about 5.8 points in a root-mean-square sense. This is close to the training RMSE of 5.53, which suggests that the model has fairly good predictive ability on unseen data.
mape(actual = sim_test$`pencarian "Indihome"`, predicted = m1_forecast$mean)
[1] 0.09475074
smape(actual = sim_test$`pencarian "Indihome"`, predicted = m1_forecast$mean)
[1] 0.1015342
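The mape and smape values above are fractions, so 0.0948 corresponds to an error of about 9.5% and 0.1015 to about 10.2%. As a reference for how these measures are computed, the sketch below writes them out in base R. The formulas are the usual textbook definitions and are assumed to match those used by the Metrics package; the function names and the toy numbers are hypothetical and are not the Indihome data.

```r
# Base-R versions of the accuracy measures (assumed to mirror the Metrics package)
rmse_manual  <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
mape_manual  <- function(actual, predicted) mean(abs((actual - predicted) / actual))
smape_manual <- function(actual, predicted) {
  mean(2 * abs(actual - predicted) / (abs(actual) + abs(predicted)))
}

# Hypothetical toy values, for illustration only
actual    <- c(50, 60, 40)
predicted <- c(55, 54, 44)

print(rmse_manual(actual, predicted))
print(100 * mape_manual(actual, predicted))   # expressed in percent
print(100 * smape_manual(actual, predicted))  # expressed in percent
```

RMSE is in the units of the series, while MAPE and sMAPE are scale-free, which is why they are convenient for judging the error relative to a search index of around 50 to 60.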