Econometric

~ Final Exam ~


Contact:
Email
Instagram https://www.instagram.com/diasary_nm/
RPubs https://rpubs.com/diyasarya/

library(plotly)
library(tidyverse)
library(lubridate)
library(aTSA)
library(stats)
library(lmtest)
library(forecast)
library(car)
library(nortest)
library(caret)
library(tibble)

Sales Data

The dataset below represents the monthly sales data of a company, including various factors that might influence product sales. The data is presented in a tabular format with each row representing a month and each column representing a variable.

sales_data <- tibble::tibble(
  Month = seq.Date(from = as.Date("2019-01-01"), to = as.Date("2024-05-01"), by = "1 month"),
  Advertising_Expense = c(50, 75, 60, 65, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350, 355, 360, 365, 370, 375, 380),
  Product_Quality = c(20, 25, 22, 23, 28, 30, 32, 35, 38, 40, 42, 45, 48, 50, 55, 60, 65, 68, 70, 72, 75, 78, 80, 82, 85, 88, 90, 92, 95, 98, 100, 102, 105, 108, 110, 112, 115, 118, 120, 122, 125, 128, 130, 132, 135, 138, 140, 142, 145, 148, 150, 152, 155, 158, 160, 162, 165, 168, 170, 172, 175, 178, 180, 182, 185),
  Product_Price = c(95, 112, 107, 100, 110, 105, 108, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350, 355, 360, 365, 370, 375, 380, 385, 390, 395, 400),
  Sales_Promotion = c(15, 20, 18, 19, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142),
  Online_Marketing = c(76, 77, 78, 79, 80, 85, 82, 84, 88, 90, 92, 94, 96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142, 144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 182, 184, 186, 188, 190, 192, 194, 196, 198, 200),
  Offline_Marketing = c(84, 85, 87, 89, 89, 90, 95, 92, 94, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142, 144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 182, 184, 186, 188, 190, 192, 194, 196, 198, 200, 202, 204, 206, 208),
  Product_Sales = c(200, 220, 210, 215, 230, 240, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350, 355, 360, 365, 370, 375, 380, 385, 390, 395, 400, 405, 410, 415, 420, 425, 430, 435, 440, 445, 450, 455, 460, 465, 470, 475, 480, 485, 490, 495, 500, 505, 510, 515, 520, 525, 530, 535, 540)
)

head(sales_data)

This dataset includes the following variables:


- Month: The month of observation.
- Advertising_Expense: The amount spent on advertising (in dollars).
- Product_Quality: The quality score of the product (on a scale from 1 to 100).
- Product_Price: The price of the product (in dollars).
- Sales_Promotion: The expenditure on sales promotion (in dollars).
- Online_Marketing: The expenditure on online marketing (in dollars).
- Offline_Marketing: The expenditure on offline marketing (in dollars).
- Product_Sales: The number of products sold.

Regression Analysis

Regression is a statistical method for estimating the relationship between a dependent variable and one or more independent variables. It is also often used for simple prediction, by estimating how the dependent variable changes when an independent variable changes. Regression analysis relies on classical assumptions that must be checked: normality of the residuals, absence of multicollinearity, and homogeneity of the residual variance.

Correlation Analysis

Correlation analysis is a statistical technique used to measure the strength and direction of the relationship between two or more variables. It’s commonly employed to understand how changes in one variable are associated with changes in another variable.

m <- cor(sales_data[, c(-1)])
corplot <- plot_ly(
    x = colnames(m), y = colnames(m),
    z = m, type = "heatmap") %>%
  layout(title = "Correlation Analysis for Sales Data")

corplot

Based on the results of the correlation analysis, every pair of variables is very highly correlated.
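A heatmap shows the overall pattern, but it can help to list the strongest pairs numerically as well. The sketch below does this in base R on a small simulated data frame; the `toy` data and the name `pair_tab` are hypothetical and only stand in for a correlation matrix such as `m`.

```r
# Hypothetical toy data: three strongly related columns and one noise column
set.seed(1)
x <- 1:30
toy <- data.frame(
  a = x,
  b = 2 * x + rnorm(30, sd = 1),
  c = 100 - x + rnorm(30, sd = 1),
  d = rnorm(30)
)

m_toy <- cor(toy)

# Read only the upper triangle so that each pair appears once
idx <- which(upper.tri(m_toy), arr.ind = TRUE)
pair_tab <- data.frame(
  var1 = rownames(m_toy)[idx[, "row"]],
  var2 = colnames(m_toy)[idx[, "col"]],
  r    = m_toy[idx]
)

# Sort by absolute correlation, strongest first
pair_tab <- pair_tab[order(-abs(pair_tab$r)), ]
print(pair_tab, row.names = FALSE)
```

Applied to the sales data, the same steps would use `m` in place of `m_toy`.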

Build the Model

First, create a regression model with Product Sales as the dependent variable and Advertising Expense, Product Quality, Product Price, Sales Promotion, Online Marketing, and Offline Marketing as independent variables.

model1 <- lm(Product_Sales~Advertising_Expense+Product_Quality+Product_Price+Sales_Promotion+Online_Marketing+Offline_Marketing, data = sales_data)
summary(model1)
## 
## Call:
## lm(formula = Product_Sales ~ Advertising_Expense + Product_Quality + 
##     Product_Price + Sales_Promotion + Online_Marketing + Offline_Marketing, 
##     data = sales_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.6031 -0.0697 -0.0051  0.0594  3.3179 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         297.58588   17.69853  16.814  < 2e-16 ***
## Advertising_Expense   0.46830    0.09830   4.764 1.31e-05 ***
## Product_Quality       0.12252    0.04597   2.665 0.009955 ** 
## Product_Price        -0.82782    0.03965 -20.880  < 2e-16 ***
## Sales_Promotion       4.68575    0.51594   9.082 9.71e-13 ***
## Online_Marketing     -0.85343    0.17905  -4.767 1.30e-05 ***
## Offline_Marketing    -0.58501    0.15959  -3.666 0.000537 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7478 on 58 degrees of freedom
## Multiple R-squared:  0.9999, Adjusted R-squared:  0.9999 
## F-statistic: 1.78e+05 on 6 and 58 DF,  p-value: < 2.2e-16

The fitted regression equation is Product_Sales = 297.59 + 0.47(Advertising_Expense) + 0.12(Product_Quality) - 0.83(Product_Price) + 4.69(Sales_Promotion) - 0.85(Online_Marketing) - 0.59(Offline_Marketing).


Before deciding whether this is the best model, test the regression assumptions described in the following sections.

Normality Test

Hypotheses for the normality test:
H0 : The residuals are normally distributed
H1 : The residuals are not normally distributed
Reject the null hypothesis if the p-value is less than the chosen significance level (0.05 or 0.01).

lillie.test(model1$residuals)
## 
##  Lilliefors (Kolmogorov-Smirnov) normality test
## 
## data:  model1$residuals
## D = 0.30707, p-value < 2.2e-16

The p-value is smaller than 0.05, so the null hypothesis is rejected: the residuals are not normally distributed. One common remedy is to transform the dependent variable, here with a log transformation.

model1trans <- lm(log(Product_Sales)~Advertising_Expense+Product_Quality+Product_Price+Sales_Promotion+Online_Marketing+Offline_Marketing, data = sales_data)
lillie.test(model1trans$residuals)
## 
##  Lilliefors (Kolmogorov-Smirnov) normality test
## 
## data:  model1trans$residuals
## D = 0.099847, p-value = 0.1112

After the transformation, the p-value is greater than 0.05, so the null hypothesis is not rejected: the residuals are consistent with a normal distribution.
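The effect of a log transformation on residual normality can be illustrated on simulated data. The sketch below uses the Shapiro-Wilk test from base R instead of the Lilliefors test, so that it runs without `nortest`. The data are hypothetical, with multiplicative errors chosen so that the log model is the correct one.

```r
# Hypothetical data with multiplicative (log-normal) errors
set.seed(10)
n <- 200
x <- runif(n, 0, 4)
y <- exp(1 + 0.5 * x + rnorm(n, sd = 0.5))

fit_raw <- lm(y ~ x)        # model on the original scale
fit_log <- lm(log(y) ~ x)   # model after log-transforming the response

# Shapiro-Wilk normality test on each set of residuals
sw_raw <- shapiro.test(residuals(fit_raw))
sw_log <- shapiro.test(residuals(fit_log))

c(raw_p = sw_raw$p.value, log_p = sw_log$p.value)
```

A W statistic closer to 1 means the residuals look more normal, so comparing `sw_raw$statistic` with `sw_log$statistic` gives a quick check alongside the p-values.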

Homogeneity

Hypotheses for the homogeneity test:
H0 : The residuals have homogeneous (equal) variance
H1 : The residuals do not have equal variance
Reject the null hypothesis if the p-value is less than the chosen significance level (0.05 or 0.01).

bptest(model1trans, studentize = FALSE)
## 
##  Breusch-Pagan test
## 
## data:  model1trans
## BP = 8.2783, df = 6, p-value = 0.2184

The Breusch-Pagan test produces a p-value greater than 0.05, so the null hypothesis is not rejected: there is no evidence that the residual variance is unequal.
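To make the test statistic less of a black box, the non-studentized Breusch-Pagan statistic can be computed by hand. The sketch below uses simulated data whose error variance grows with `x1`. All variable names are hypothetical, and the steps follow the classical form of the test: regress the scaled squared residuals on the predictors and take half of the explained sum of squares.

```r
# Hypothetical heteroscedastic data: the error spread grows with x1
set.seed(2)
n  <- 100
x1 <- runif(n, 1, 10)
x2 <- runif(n, 1, 10)
y  <- 3 + 2 * x1 - x2 + rnorm(n, sd = 0.5 * x1)
fit <- lm(y ~ x1 + x2)

e      <- residuals(fit)
sigma2 <- sum(e^2) / n           # ML estimate of the error variance
f      <- e^2 / sigma2 - 1       # scaled squared residuals, centered at 0

# Auxiliary regression of the scaled squared residuals on the predictors
aux <- lm(f ~ x1 + x2)

bp    <- 0.5 * sum(fitted(aux)^2)   # half of the explained sum of squares
df    <- 2                          # number of predictors in the auxiliary regression
p_val <- pchisq(bp, df = df, lower.tail = FALSE)

c(BP = bp, df = df, p_value = p_val)
```

A small p-value here points to unequal variance, the opposite of the outcome reported for the transformed sales model.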

Multicollinearity

The variance inflation factor (VIF) quantifies how much multicollinearity inflates the variance of each coefficient estimate. As a rule of thumb, a regression model without a multicollinearity problem has VIF values smaller than 10.

vif(model1trans)
## Advertising_Expense     Product_Quality       Product_Price     Sales_Promotion 
##           9994.0793            592.2988           1549.4610          43328.2612 
##    Online_Marketing   Offline_Marketing 
##           5182.5024           4096.6433

Every VIF value for the regression model is far greater than 10, which indicates that the model has a serious multicollinearity problem.
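For reference, the VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors. The sketch below computes it by hand in base R on simulated predictors (hypothetical names `x1`, `x2`, `x3`) and cross-checks the result against the diagonal of the inverse correlation matrix.

```r
# Hypothetical predictors: x2 is almost a multiple of x1, x3 is unrelated
set.seed(3)
n  <- 60
x1 <- 1:n + rnorm(n)
x2 <- 2 * x1 + rnorm(n)
x3 <- rnorm(n)
X  <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing x_j on the other predictors
vif_manual <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})

# Equivalent shortcut: diagonal of the inverse of the correlation matrix
vif_inverse <- diag(solve(cor(X)))

round(rbind(vif_manual, vif_inverse), 2)
```

The near-duplicate pair `x1` and `x2` gets very large values while `x3` stays close to 1, which mirrors the pattern of strongly trending predictors in the sales data.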


Because the regression model has high multicollinearity, its coefficient estimates are unstable and hard to interpret individually. To address this I use LASSO Regression, which shrinks the coefficients and sets some of them to zero, so that the model is more reliable for making predictions.

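set.seed(88)
# Hold out 20% of the rows as test_data
index <- createDataPartition(sales_data$Product_Sales, p=.8, list = FALSE)
train_data <- sales_data[index,]
test_data <- sales_data[-index,]
# 10-fold cross-validation for tuning lambda
ctrlspecs <- trainControl(method = "cv", number = 10,
                          savePredictions = "all")
# Candidate penalty values from 1e5 down to 1e-5
lamda <- 10^seq(5, -5, length.out = 100)
set.seed(88)
# alpha = 1 selects the LASSO penalty in glmnet.
# Note: the model is fitted on all of sales_data (not train_data),
# so test_data is not an independent hold-out set.
model1 <- train(Product_Sales~Advertising_Expense+Product_Quality+Product_Price+Sales_Promotion+Online_Marketing+Offline_Marketing,
                data = sales_data,
                preProcess = c("center", "scale"),
                method = "glmnet",
                tuneGrid = expand.grid(alpha = 1, lambda = lamda), 
                trControl = ctrlspecs,
                na.action = na.omit)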
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.
round(coef(model1$finalModel, model1$bestTune$lambda), 3)
## 7 x 1 sparse Matrix of class "dgCMatrix"
##                          s1
## (Intercept)         378.769
## Advertising_Expense  94.893
## Product_Quality       0.203
## Product_Price         .    
## Sales_Promotion       .    
## Online_Marketing      .    
## Offline_Marketing     .

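LASSO Regression with 10-fold cross-validation produces a new, sparser regression equation for forecasting. Because the predictors were centered and scaled (preProcess), the coefficients apply to the standardized predictors.
The equation becomes Product_Sales = 378.769 + 94.893(Advertising_Expense) + 0.203(Product_Quality), where each predictor is measured in standard deviations from its mean.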
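Coefficients estimated on centered and scaled predictors can be converted back to the original units: divide each slope by the predictor's standard deviation, then adjust the intercept for the means. The sketch below shows only this arithmetic. It uses ordinary least squares on simulated data rather than glmnet, and every name in it is hypothetical.

```r
# Hypothetical data on the original scale
set.seed(4)
n  <- 50
x1 <- rnorm(n, mean = 200, sd = 90)
x2 <- rnorm(n, mean = 100, sd = 45)
y  <- 150 + 1.0 * x1 + 0.5 * x2 + rnorm(n)

# Center and scale the predictors, then fit on the standardized versions
mu <- c(x1 = mean(x1), x2 = mean(x2))
s  <- c(x1 = sd(x1),   x2 = sd(x2))
z1 <- (x1 - mu["x1"]) / s["x1"]
z2 <- (x2 - mu["x2"]) / s["x2"]
fit_std <- lm(y ~ z1 + z2)
b_std   <- coef(fit_std)

# Back-transform: slope_raw = slope_std / sd,
# intercept_raw = intercept_std - sum(slope_std * mean / sd)
slope_raw     <- b_std[c("z1", "z2")] / s
intercept_raw <- b_std["(Intercept)"] - sum(b_std[c("z1", "z2")] * mu / s)

# Direct fit on the original scale for comparison
fit_raw <- lm(y ~ x1 + x2)
rbind(back_transformed = c(intercept_raw, slope_raw),
      direct_fit       = coef(fit_raw))
```

With standardized predictors the OLS intercept equals the mean of the response, which is why an intercept on that scale sits near the average of the dependent variable.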

ggplot(varImp(model1))

According to the variable importance plot, only two independent variables remain in the model: Advertising Expense and Product Quality.

Regpredict <- predict(model1, newdata = test_data)
accuracy <- data.frame(RMSE = RMSE(Regpredict, test_data$Product_Sales),
                       Rsquared = R2(Regpredict, test_data$Product_Sales))
accuracy

The LASSO Regression model produces an RMSE of 0.918 and an Rsquared of 1 on the test rows. The smaller the RMSE, the closer the predictions are to the observed values. Because the model was fitted on all of sales_data, which includes the test rows, these figures measure in-sample fit rather than true out-of-sample accuracy.
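Both accuracy measures are simple to compute by hand. The sketch below uses hypothetical observed and predicted values. RMSE is the square root of the mean squared error, and R-squared is taken as the squared correlation between predictions and observations, which, as far as I know, is the default definition used by caret's `R2()`.

```r
# Hypothetical observed values and predictions
obs  <- c(200, 220, 210, 215, 230)
pred <- c(201, 219, 211, 214, 231)

# Root mean squared error: typical size of a prediction error, in sales units
rmse <- sqrt(mean((pred - obs)^2))

# R-squared as the squared correlation between predictions and observations
r2_corr <- cor(pred, obs)^2

c(RMSE = rmse, Rsquared = r2_corr)
```

Every prediction here is off by exactly one unit, so the RMSE is 1, while the squared correlation stays just below 1.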

Time Series Analysis

The ARIMA model combines the AR (Autoregressive) and MA (Moving Average) models, with differencing used to make the series stationary. It is generally denoted ARIMA(p, d, q), where p is the order of the AR process, d is the degree of differencing, and q is the order of the MA process. Forecasting with ARIMA models has been popular since Box and Jenkins introduced the approach in the early 1970s. This time, I will forecast Product Sales with the ARIMA method.
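To make the (p, d, q) notation concrete, the sketch below simulates a series from a known ARIMA(1,1,0) process and fits the same order with base R's `arima()`. The AR coefficient of 0.6 and the series length are arbitrary choices for illustration.

```r
# Simulate an ARIMA(1,1,0) series: one AR term (p = 1), one difference (d = 1), no MA term (q = 0)
set.seed(5)
y <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.6), n = 300)

# Fit the same order; arima() differences the series internally because d = 1
fit <- arima(y, order = c(1, 1, 0))

# The estimated AR coefficient should land near the true value of 0.6
coef(fit)
```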

p <- sales_data[,c(8)]
p <- ts(p, start = c(2019, 1), frequency = 12)
fig <- plot_ly(sales_data, x = ~Month, y = ~Product_Sales, type = 'scatter', mode = 'lines')

fig

The Product_Sales plot shows a steady upward trend, so the series does not appear to be stationary in its mean.


To check this formally, we can run the Augmented Dickey-Fuller (ADF) test and inspect the acf and pacf plots.

tseries::adf.test(p)
## Warning in tseries::adf.test(p): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  p
## Dickey-Fuller = -44.438, Lag order = 3, p-value = 0.01
## alternative hypothesis: stationary

The ADF test evaluates stationarity with the following hypotheses:
H0 : The data are non-stationary
H1 : The data are stationary
Reject the null hypothesis if the p-value is less than the chosen significance level (0.05 or 0.01).


The ADF test returns a p-value smaller than 0.05, so the null hypothesis is rejected and the test classifies the series as stationary. Note that tseries::adf.test() includes a linear trend in its test regression, so this result indicates stationarity around a trend rather than around a constant mean, which is why it appears to conflict with the plot. To double-check, look at the acf and pacf plots.

acf(p, lag.max = 36)

pacf(p, lag.max = 36)

Based on the acf plot, the autocorrelations decay slowly instead of dropping off quickly, which indicates that the data are not stationary. To make a non-stationary series stationary, differencing is needed.

acf(diff(p), lag.max = 36)

pacf(diff(p), lag.max = 36)

After differencing, the acf and pacf plots decay quickly, which indicates that the differenced data are stationary.
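The effect of differencing on a trending series can be checked numerically as well as visually. The sketch below builds a hypothetical series with a slope of 5 per month plus noise, then compares the lag-1 autocorrelation before and after `diff()`.

```r
# Hypothetical trending series: level 200, slope 5 per period, plus noise
set.seed(6)
t_idx <- 1:120
y  <- 200 + 5 * t_idx + rnorm(120, sd = 2)
dy <- diff(y)   # first difference: y[t] - y[t-1]

# Lag-1 autocorrelation (element 1 of $acf is lag 0, element 2 is lag 1)
acf_level <- acf(y,  lag.max = 12, plot = FALSE)$acf[2]
acf_diff  <- acf(dy, lag.max = 12, plot = FALSE)$acf[2]

c(lag1_level = acf_level, lag1_diff = acf_diff, mean_diff = mean(dy))
```

The level series has a lag-1 autocorrelation close to 1, the signature of a trend. After differencing, the mean of the differences recovers the slope, and the autocorrelation is no longer dominated by the trend.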


Modeling uses auto.arima() to get the best ARIMA model.

auto.arima(p)
## Series: p 
## ARIMA(2,1,2) with drift 
## 
## Coefficients:
##           ar1      ar2     ma1     ma2   drift
##       -1.3317  -0.9433  1.8308  0.8406  5.2913
## s.e.   0.1659   0.0541  0.2004  0.2023  0.2992
## 
## sigma^2 = 4.932:  log likelihood = -144.16
## AIC=300.32   AICc=301.79   BIC=313.27

auto.arima() returns ARIMA(2,1,2) with drift. By default it searches stepwise over candidate orders and selects the model with the smallest AICc, so this is the best model among those it considered rather than among every possible combination.
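The idea of choosing an order by information criterion can be sketched with base R alone. The code below fits a small grid of ARIMA(p,1,q) models to a simulated series and ranks them by AICc. It is a simplified illustration, not a reimplementation of auto.arima(), which also handles drift, seasonality and unit-root tests. The helper `aicc()` is hypothetical.

```r
# Simulated series with a known ARIMA(1,1,0) structure
set.seed(7)
y <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.6), n = 200)

# Small-sample corrected AIC; k counts the coefficients plus the innovation variance
aicc <- function(fit) {
  k <- length(coef(fit)) + 1
  n <- fit$nobs
  AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
}

# Fit every combination of p and q in 0..2 with d fixed at 1
grid <- expand.grid(p = 0:2, q = 0:2)
grid$AICc <- mapply(function(p, q) {
  fit <- tryCatch(arima(y, order = c(p, 1, q), method = "ML"),
                  error = function(e) NULL)
  if (is.null(fit)) NA_real_ else aicc(fit)
}, grid$p, grid$q)

# Smallest AICc first
head(grid[order(grid$AICc), ], 3)
```

Models whose AICc values differ only slightly are practically equivalent, so the top row is a reasonable candidate rather than a uniquely correct answer.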


After getting the best ARIMA model, check the coefficients of the ARIMA model, namely for AR(2), I(1), and MA(2).

coeftest(auto.arima(p))
## 
## z test of coefficients:
## 
##        Estimate Std. Error  z value  Pr(>|z|)    
## ar1   -1.331743   0.165910  -8.0269 9.998e-16 ***
## ar2   -0.943266   0.054055 -17.4502 < 2.2e-16 ***
## ma1    1.830797   0.200449   9.1335 < 2.2e-16 ***
## ma2    0.840646   0.202332   4.1548 3.256e-05 ***
## drift  5.291330   0.299197  17.6851 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

All coefficients of the ARIMA(2,1,2) model are significant, so the model can be used for forecasting.
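The z test reported for each coefficient is the estimate divided by its standard error, compared with a standard normal distribution. The sketch below reproduces that calculation in base R on a simulated ARMA(1,1) series; the data and coefficient values are hypothetical.

```r
# Simulated ARMA(1,1) series with known coefficients
set.seed(8)
y   <- arima.sim(model = list(ar = 0.7, ma = 0.4), n = 300)
fit <- arima(y, order = c(1, 0, 1))

est <- coef(fit)                  # coefficient estimates
se  <- sqrt(diag(fit$var.coef))   # standard errors from the estimated covariance matrix
z   <- est / se                   # z statistics
p   <- 2 * pnorm(-abs(z))         # two-sided p-values

data.frame(estimate = est, std_error = se, z_value = z, p_value = p)
```

A coefficient with a p-value above the chosen significance level is a candidate for removal, although dropping terms changes the remaining estimates and the model should be refitted.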

predict <- forecast(p, model = auto.arima(p), h=12, level = c(95))
accuracy(auto.arima(p))
##                       ME     RMSE     MAE        MPE      MAPE       MASE
## Training set -0.01087948 2.115745 1.02334 0.02892078 0.3669924 0.01663712
##                    ACF1
## Training set -0.3067911
plot(predict, main = "Forecast for Product Sales")

Forecasting with ARIMA(2,1,2) gives a training-set RMSE of 2.116.


Below are the point forecasts and their 95% prediction intervals.

predict
##          Point Forecast    Lo 95    Hi 95
## Jun 2024       545.2756 540.9231 549.6281
## Jul 2024       550.6609 542.8177 558.5041
## Aug 2024       555.8419 547.3766 564.3073
## Sep 2024       561.1915 551.0506 571.3324
## Oct 2024       566.5093 554.9540 578.0646
## Nov 2024       571.7104 559.6832 583.7377
## Dec 2024       577.0969 563.5139 590.6799
## Jan 2025       582.3466 568.0603 596.6329
## Feb 2025       587.6036 572.7323 602.4749
## Mar 2025       592.9799 576.8277 609.1321
## Apr 2025       598.1905 581.6223 614.7586
## May 2025       603.5093 586.1678 620.8507
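The lower and upper bounds in a table like this are the point forecast plus or minus a normal quantile times the forecast standard error. The sketch below shows the calculation with base R's `arima()` and `predict()` on a hypothetical random-walk-like series. Note that `stats::arima()` does not add a drift term when d = 1, so its numbers would not match a "with drift" model.

```r
# Hypothetical monthly series that drifts upward
set.seed(9)
y <- ts(200 + cumsum(rnorm(65, mean = 5, sd = 2)),
        start = c(2019, 1), frequency = 12)

fit <- arima(y, order = c(0, 1, 1))
fc  <- predict(fit, n.ahead = 12)   # list with $pred (point forecasts) and $se

# 95% prediction interval: point forecast +/- 1.96 standard errors
z95   <- qnorm(0.975)
lower <- fc$pred - z95 * fc$se
upper <- fc$pred + z95 * fc$se
width <- upper - lower

cbind(point = fc$pred, lo95 = lower, hi95 = upper)
```

The interval width grows with the horizon because forecast errors accumulate in a differenced model, the same widening seen from the first to the last forecast month.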

Forecasting Comparison

The following code compares the Regression model and the ARIMA model for forecasting Product Sales.

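# RMSE of the LASSO regression on the test rows
RegRMSE <- round(RMSE(Regpredict, test_data$Product_Sales), 3)
# Training-set RMSE of ARIMA(2,1,2), copied from the accuracy() output above
ARIMARMSE <- round(2.115745, 3)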

data.frame(Model = c('Regression', 'ARIMA(2,1,2)'),
           RMSE = c(RegRMSE, ARIMARMSE))

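The comparison shows that the regression model has the lower RMSE (0.918 versus 2.116 for ARIMA), so by this measure it is the more accurate of the two for forecasting Product Sales. Two caveats apply: the regression RMSE is computed on test rows that were also part of the fitting data, while the ARIMA RMSE is a training-set error, so neither is a true out-of-sample figure. With the predictors centered and scaled, the regression equation is Product_Sales = 378.769 + 94.893(Advertising_Expense) + 0.203(Product_Quality).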

Economic Indicator Data

Collect a dataset from BPS to analyze the relationship between monthly economic indicators, including inflation rate, GDP growth rate, unemployment rate, interest rate, consumer confidence, stock market index, and exchange rate, over a specified period. In this case study you need to apply econometric techniques to analyze the relationships between these indicators and gain insight into the dynamics of the economy. By conducting correlation analysis, time series analysis, and regression modeling, you can identify key factors driving economic trends and inform decision-making processes.

Data Collection

Data collection is the process of gathering and measuring information on variables of interest in a systematic way. It’s a crucial step in research, decision-making, and problem-solving across various fields such as science, business, healthcare, and social sciences.

library(readxl)
## Warning: package 'readxl' was built under R version 4.1.3
dataecoind <- read_excel("Dataeconomicindicator.xlsx")
head(dataecoind)

Data Exploration

Data exploration is the initial phase of data analysis where you examine and understand the structure, contents, and patterns within a dataset. It involves summarizing the main characteristics of the data, often through visualizations and summary statistics, to gain insights and inform further analysis.

Problem Identification

str(dataecoind)
## tibble [144 x 8] (S3: tbl_df/tbl/data.frame)
##  $ Month: POSIXct[1:144], format: "2010-01-01" "2010-02-01" ...
##  $ InR  : num [1:144] 0.84 1.14 0.99 1.15 1.44 2.42 4.02 4.82 5.28 5.35 ...
##  $ GDP  : num [1:144] 0.0053 0.0053 0.0053 0.0053 0.0053 0.0053 0.0053 0.0053 0.0053 0.0053 ...
##  $ Up   : num [1:144] 7.36 7.41 7.06 7.22 6.77 7.21 7.51 7.14 7.18 7.42 ...
##  $ IR   : num [1:144] 6.5 6.5 6.5 6.5 6.5 6.5 6.5 6.5 6.5 6.5 ...
##  $ CCI  : num [1:144] 110 105 107 111 110 ...
##  $ SMI  : num [1:144] 2611 2549 2777 2971 2797 ...
##  $ ER   : num [1:144] 8991 8991 8991 8991 8991 ...

The Economic Indicators case data contains the following variables:
- Month: time in months.
- InR: inflation rate taken from the BPS website.
- GDP: GDP growth rate taken from the economic trading website.
- Up: unemployment rate taken from the BPS website. Monthly values are generated randomly from the lowest value in that year, within a range equal to the difference between the two known months in that year.
- IR: interest rate taken from the Bank Indonesia website.
- CCI: consumer confidence index taken from the Bank Indonesia website.
- SMI: stock market index taken from the BPS website.
- ER: exchange rate (USD to IDR) taken from the BPS website.

Summary of the Data

summary(dataecoind)
##      Month                          InR              GDP          
##  Min.   :2010-01-01 00:00:00   Min.   :-0.610   Min.   :0.001700  
##  1st Qu.:2012-12-24 06:00:00   1st Qu.: 0.890   1st Qu.:0.004075  
##  Median :2015-12-16 12:00:00   Median : 1.710   Median :0.004150  
##  Mean   :2015-12-16 11:00:00   Mean   : 2.112   Mean   :0.004125  
##  3rd Qu.:2018-12-08 18:00:00   3rd Qu.: 2.663   3rd Qu.:0.004700  
##  Max.   :2021-12-01 00:00:00   Max.   : 8.380   Max.   :0.005300  
##        Up              IR             CCI             SMI             ER       
##  Min.   :4.450   Min.   :3.500   Min.   : 77.3   Min.   :2549   Min.   : 8991  
##  1st Qu.:5.290   1st Qu.:5.188   1st Qu.:107.8   1st Qu.:4272   1st Qu.:11559  
##  Median :5.815   Median :6.000   Median :114.3   Median :5078   Median :13492  
##  Mean   :5.911   Mean   :5.906   Mean   :112.5   Mean   :5008   Mean   :12491  
##  3rd Qu.:6.310   3rd Qu.:6.750   3rd Qu.:120.2   3rd Qu.:5949   3rd Qu.:13952  
##  Max.   :9.930   Max.   :7.750   Max.   :145.5   Max.   :6606   Max.   :14481

Inflation Rate

a <- dataecoind[,c(2)]
a <- ts(a, start = c(2010, 1), frequency = 12)
fig <- plot_ly(dataecoind, x = ~Month, y = ~InR, type = 'scatter', mode = 'lines') %>%
  layout(title = "Inflation Rate")

fig

GDP Growth Rate

b <- dataecoind[,c(3)]
b <- ts(b, start = c(2010, 1), frequency = 12)
fig <- plot_ly(dataecoind, x = ~Month, y = ~GDP, type = 'scatter', mode = 'lines') %>%
  layout(title = "GDP Growth Rate")

fig

Unemployment Rate

c <- dataecoind[,c(4)]
c <- ts(c, start = c(2010, 1), frequency = 12)
fig <- plot_ly(dataecoind, x = ~Month, y = ~Up, type = 'scatter', mode = 'lines') %>%
  layout(title = "Unemployment Rate")

fig

Interest Rate

d <- dataecoind[,c(5)]
d <- ts(d, start = c(2010, 1), frequency = 12)
fig <- plot_ly(dataecoind, x = ~Month, y = ~IR, type = 'scatter', mode = 'lines') %>%
  layout(title = "Interest Rate")

fig

Consumer Confidence

e <- dataecoind[,c(6)]
e <- ts(e, start = c(2010, 1), frequency = 12)
fig <- plot_ly(dataecoind, x = ~Month, y = ~CCI, type = 'scatter', mode = 'lines') %>%
  layout(title = "Consumer Confidence Index")

fig

Stock Market Index

f <- dataecoind[,c(7)]
f <- ts(f, start = c(2010, 1), frequency = 12)
fig <- plot_ly(dataecoind, x = ~Month, y = ~SMI, type = 'scatter', mode = 'lines') %>%
  layout(title = "Stock Market Index")

fig

Exchange Rate (USD to IDR)

g <- dataecoind[,c(8)]
g <- ts(g, start = c(2010, 1), frequency = 12)
fig <- plot_ly(dataecoind, x = ~Month, y = ~ER, type = 'scatter', mode = 'lines') %>%
  layout(title = "Exchange Rate (USD to IDR)")

fig

Correlation Analysis

Correlation analysis is a statistical technique used to measure the strength and direction of the relationship between two or more variables. It’s commonly employed to understand how changes in one variable are associated with changes in another variable.

m <- cor(dataecoind[, c(-1)])
corplot <- plot_ly(
    x = colnames(m), y = colnames(m),
    z = m, type = "heatmap") %>%
  layout(title = "Correlation Analysis for Economic Indicator Data")

corplot

I will use SMI as the dependent variable and choose the independent variables based on the correlation analysis. The correlation results suggest three regression models:
1. The first model, SMI as the dependent variable and InR, GDP, Up, IR, CCI, ER as independent variables.
2. The second model, SMI as the dependent variable and GDP, Up, ER as independent variables. This second model uses independent variables that have a high correlation.
3. The third model, SMI as the dependent variable and InR, IR, CCI as independent variables. This third model uses independent variables that have a low correlation.

Time Series Analysis

The ARIMA model combines the AR (Autoregressive) and MA (Moving Average) models, with differencing used to make the series stationary. It is generally denoted ARIMA(p, d, q), where p is the order of the AR process, d is the degree of differencing, and q is the order of the MA process. Forecasting with ARIMA models has been popular since Box and Jenkins introduced the approach in the early 1970s. This time, I will forecast each economic indicator with the ARIMA method.

Inflation Rate

plot(stl(ts(dataecoind$InR, frequency = 12), s.window = "periodic"))

tseries::adf.test(a)
## Warning in tseries::adf.test(a): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  a
## Dickey-Fuller = -6.2056, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
acf(a, lag.max = 36)

pacf(a, lag.max = 36)

auto.arima(a)
## Series: a 
## ARIMA(1,0,1)(2,1,2)[12] 
## 
## Coefficients:
##          ar1     ma1    sar1     sar2     sma1    sma2
##       0.7736  0.1765  0.3375  -0.6983  -0.6605  0.3861
## s.e.  0.0651  0.1033  0.1831   0.1224   0.2222  0.2175
## 
## sigma^2 = 0.445:  log likelihood = -136.98
## AIC=287.96   AICc=288.86   BIC=308.14
coeftest(auto.arima(a))
## 
## z test of coefficients:
## 
##       Estimate Std. Error z value  Pr(>|z|)    
## ar1   0.773615   0.065093 11.8847 < 2.2e-16 ***
## ma1   0.176488   0.103315  1.7082  0.087592 .  
## sar1  0.337524   0.183137  1.8430  0.065327 .  
## sar2 -0.698254   0.122401 -5.7046 1.166e-08 ***
## sma1 -0.660451   0.222245 -2.9717  0.002961 ** 
## sma2  0.386121   0.217469  1.7755  0.075812 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
predict <- forecast(a, model = auto.arima(a), h=12, level = c(95))
accuracy(auto.arima(a))
##                       ME      RMSE      MAE      MPE     MAPE      MASE
## Training set -0.05899188 0.6240334 0.389205 5.721138 41.01166 0.3765948
##                      ACF1
## Training set -0.001428381
plot(predict, main = "Forecast for Inflation Rate")

For the Inflation Rate variable, auto.arima() selects ARIMA(1,0,1)(2,1,2)[12], with a MAPE of 41.01% (the 5.72 in the output is the MPE, not the MAPE).
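The MPE and MAPE columns of the accuracy() output are easy to mix up. Both average the percentage errors, but MAPE takes absolute values first, so positive and negative errors cannot cancel. The sketch below computes both by hand on hypothetical actual and fitted values, using the percentage error definition 100 * (actual - fitted) / actual.

```r
# Hypothetical actual and fitted values (note the small and negative actuals)
obs    <- c(2.0, -0.5, 1.0, 4.0)
fitted <- c(1.5, -0.2, 1.5, 3.0)

# Percentage errors relative to the actual values
pe <- 100 * (obs - fitted) / obs

mpe  <- mean(pe)        # signed errors can cancel each other out
mape <- mean(abs(pe))   # absolute errors cannot cancel

c(MPE = mpe, MAPE = mape)
```

Here the MPE is 15 while the MAPE is 40. Percentage errors also become very large when actual values are close to zero, which is common for an inflation series and helps explain a high MAPE.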


Below are the point forecasts and their 95% prediction intervals.

predict
##          Point Forecast       Lo 95    Hi 95
## Jan 2022      0.4013526 -0.90620174 1.708907
## Feb 2022      0.4915134 -1.31209506 2.295122
## Mar 2022      0.5971463 -1.44653678 2.640829
## Apr 2022      0.8524542 -1.32226852 3.027177
## May 2022      1.2972387 -0.95226019 3.546738
## Jun 2022      1.4727085 -0.82037623 3.765793
## Jul 2022      1.6493120 -0.66946573 3.968090
## Aug 2022      1.6939816 -0.64003714 4.028000
## Sep 2022      1.5253069 -0.81778498 3.868399
## Oct 2022      1.6754646 -0.67303908 4.023968
## Nov 2022      1.9320691 -0.41966498 4.283803
## Dec 2022      2.4260296  0.07236842 4.779691

GDP Growth Rate

plot(stl(ts(dataecoind$GDP, frequency = 12), s.window = "periodic"))

tseries::adf.test(b)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  b
## Dickey-Fuller = -2.803, Lag order = 5, p-value = 0.2421
## alternative hypothesis: stationary
acf(b, lag.max = 36)

pacf(b, lag.max = 36)

tseries::adf.test(diff(b))
## Warning in tseries::adf.test(diff(b)): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(b)
## Dickey-Fuller = -4.7178, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
acf(diff(b), lag.max = 36)

pacf(diff(b), lag.max = 36)

auto.arima(b)
## Series: b 
## ARIMA(0,1,0)(0,0,1)[12] 
## 
## Coefficients:
##          sma1
##       -0.3400
## s.e.   0.0776
## 
## sigma^2 = 4.988e-08:  log likelihood = 999.03
## AIC=-1994.06   AICc=-1993.98   BIC=-1988.14
coeftest(auto.arima(b))
## 
## z test of coefficients:
## 
##       Estimate Std. Error z value  Pr(>|z|)    
## sma1 -0.339978   0.077609 -4.3806 1.183e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
predict <- forecast(b, model = auto.arima(b), h=12, level = c(95))
accuracy(auto.arima(b))
##                         ME        RMSE          MAE       MPE     MAPE
## Training set -2.566357e-05 0.000221782 3.540085e-05 -1.166949 1.459064
##                    MASE        ACF1
## Training set 0.07080169 -0.01368546
plot(predict, main = "Forecast for GDP Growth Rate")

For the GDP Growth Rate variable, auto.arima() selects ARIMA(0,1,0)(0,0,1)[12], with a MAPE of 1.46%.


Below are the point forecasts and their 95% prediction intervals.

predict
##          Point Forecast       Lo 95       Hi 95
## Jan 2022    0.002841517 0.002403781 0.003279252
## Feb 2022    0.002841517 0.002222466 0.003460568
## Mar 2022    0.002841517 0.002083337 0.003599696
## Apr 2022    0.002841517 0.001966046 0.003716987
## May 2022    0.002841517 0.001862711 0.003820322
## Jun 2022    0.002841517 0.001769289 0.003913745
## Jul 2022    0.002841517 0.001683378 0.003999655
## Aug 2022    0.002841517 0.001603414 0.004079619
## Sep 2022    0.002841517 0.001528311 0.004154722
## Oct 2022    0.002841517 0.001457276 0.004225757
## Nov 2022    0.002841517 0.001389713 0.004293320
## Dec 2022    0.002841517 0.001325157 0.004357876

Unemployment Rate

plot(stl(ts(dataecoind$Up, frequency = 12), s.window = "periodic"))

tseries::adf.test(c)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  c
## Dickey-Fuller = -1.9564, Lag order = 5, p-value = 0.5946
## alternative hypothesis: stationary
acf(c, lag.max = 36)

pacf(c, lag.max = 36)

tseries::adf.test(diff(c))
## Warning in tseries::adf.test(diff(c)): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(c)
## Dickey-Fuller = -7.2569, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
acf(diff(c), lag.max = 36)

pacf(diff(c), lag.max = 36)

auto.arima(c)
## Series: c 
## ARIMA(1,1,1)(0,0,2)[12] 
## 
## Coefficients:
##          ar1      ma1    sma1    sma2
##       0.1826  -0.8197  0.2218  0.5366
## s.e.  0.1053   0.0573  0.0784  0.1375
## 
## sigma^2 = 0.2613:  log likelihood = -109.52
## AIC=229.03   AICc=229.47   BIC=243.84
coeftest(auto.arima(c))
## 
## z test of coefficients:
## 
##       Estimate Std. Error  z value  Pr(>|z|)    
## ar1   0.182631   0.105279   1.7347  0.082790 .  
## ma1  -0.819661   0.057286 -14.3082 < 2.2e-16 ***
## sma1  0.221813   0.078396   2.8294  0.004663 ** 
## sma2  0.536601   0.137520   3.9020 9.541e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
predict <- forecast(c, model = auto.arima(c), h=12, level = c(95))
accuracy(auto.arima(c))
##                        ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -0.009000985 0.5022516 0.2709603 -0.6154412 4.427444 0.5941323
##                      ACF1
## Training set -0.001938749
plot(predict, main = "Forecast for Unemployment Rate")

For the Unemployment Rate variable, auto.arima() selects ARIMA(1,1,1)(0,0,2)[12], with a MAPE of 4.43%.


Below are the point forecasts and their 95% prediction intervals.

predict
##          Point Forecast    Lo 95    Hi 95
## Jan 2022       6.038302 5.036255 7.040349
## Feb 2022       6.252249 5.186238 7.318260
## Mar 2022       6.882316 5.788034 7.976598
## Apr 2022       8.560853 7.443510 9.678197
## May 2022       7.006320 5.867145 8.145495
## Jun 2022       6.697178 5.536717 7.857638
## Jul 2022       6.371531 5.190193 7.552868
## Aug 2022       7.236602 6.034754 8.438450
## Sep 2022       6.380299 5.158285 7.602312
## Oct 2022       7.048062 5.806211 8.289913
## Nov 2022       6.791764 5.530386 8.053141
## Dec 2022       6.902040 5.621433 8.182646

Interest Rate

plot(stl(ts(dataecoind$IR, frequency = 12), s.window = "periodic"))

tseries::adf.test(d)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  d
## Dickey-Fuller = -2.4005, Lag order = 5, p-value = 0.4097
## alternative hypothesis: stationary
acf(d, lag.max = 36)

pacf(d, lag.max = 36)

tseries::adf.test(diff(d))
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(d)
## Dickey-Fuller = -3.6848, Lag order = 5, p-value = 0.0281
## alternative hypothesis: stationary
acf(diff(d), lag.max = 36)

pacf(diff(d), lag.max = 36)

auto.arima(d)
## Series: d 
## ARIMA(1,1,1) 
## 
## Coefficients:
##          ar1      ma1
##       0.7766  -0.5488
## s.e.  0.1130   0.1467
## 
## sigma^2 = 0.03435:  log likelihood = 39.04
## AIC=-72.09   AICc=-71.92   BIC=-63.2
coeftest(auto.arima(d))
## 
## z test of coefficients:
## 
##     Estimate Std. Error z value  Pr(>|z|)    
## ar1  0.77661    0.11300  6.8725 6.308e-12 ***
## ma1 -0.54878    0.14672 -3.7403 0.0001838 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
predict <- forecast(d, model = auto.arima(d), h=12, level = c(95))
accuracy(auto.arima(d))
##                       ME      RMSE        MAE       MPE     MAPE      MASE
## Training set -0.01027169 0.1834047 0.09403041 -0.211623 1.747241 0.1110695
##                      ACF1
## Training set -0.001434022
plot(predict, main = "Forecast for Interest Rate")

For the Interest Rate variable, the model selected by auto.arima is ARIMA(1,1,1), with a training-set MAPE of 1.75%.


Below are the point forecasts and their 95% prediction intervals.

predict
##          Point Forecast    Lo 95    Hi 95
## Jan 2022       3.499832 3.136561 3.863103
## Feb 2022       3.499702 2.924452 4.074951
## Mar 2022       3.499600 2.730622 4.268578
## Apr 2022       3.499521 2.548112 4.450931
## May 2022       3.499460 2.375196 4.623725
## Jun 2022       3.499413 2.211107 4.787719
## Jul 2022       3.499376 2.055264 4.943488
## Aug 2022       3.499347 1.907113 5.091582
## Sep 2022       3.499325 1.766099 5.232552
## Oct 2022       3.499308 1.631675 5.366941
## Nov 2022       3.499295 1.503314 5.495275
## Dec 2022       3.499284 1.380518 5.618050
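
The table above shows a typical pattern for an ARIMA(1,1,1) model without drift: the point forecast levels off while the 95% interval keeps widening, because forecast-error variance accumulates with the horizon. The following is a minimal sketch of that behaviour, not the analysis itself. It assumes a simulated series as a stand-in for `d` (the parameters `ar = 0.6`, `ma = 0.3`, and `sd = 0.2` are illustrative only) and uses base R `arima()` and `predict()` instead of `forecast::auto.arima()`.

```r
# Simulated monthly series standing in for the Interest Rate data (assumption:
# the real series `d` is not reproduced here; parameters are illustrative).
set.seed(123)
sim <- arima.sim(model = list(order = c(1, 1, 1), ar = 0.6, ma = 0.3),
                 n = 144, sd = 0.2)
y <- ts(as.numeric(sim)[-1], start = c(2010, 1), frequency = 12)

# Fit ARIMA(1,1,1) by maximum likelihood with base R.
fit <- arima(y, order = c(1, 1, 1), method = "ML")

# 12-step-ahead forecasts and their standard errors.
fc <- predict(fit, n.ahead = 12)

# 95% bounds are point forecast +/- z * standard error, with z = qnorm(0.975).
z <- qnorm(0.975)
intervals <- data.frame(
  point = as.numeric(fc$pred),
  lo95  = as.numeric(fc$pred - z * fc$se),
  hi95  = as.numeric(fc$pred + z * fc$se)
)
print(round(intervals, 3))

# Interval width at each horizon; it grows as the horizon increases.
width <- intervals$hi95 - intervals$lo95
print(round(width, 3))
```

The growing `width` is the reason long-horizon forecasts from a differenced model should be read with more caution than the first few months.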

Customer Confidence

plot(stl(ts(dataecoind$CCI, frequency = 12), s.window = "periodic"))

tseries::adf.test(e)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  e
## Dickey-Fuller = -2.203, Lag order = 5, p-value = 0.4919
## alternative hypothesis: stationary
acf(e, lag.max = 36)

pacf(e, lag.max = 36)

tseries::adf.test(diff(e))
## Warning in tseries::adf.test(diff(e)): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(e)
## Dickey-Fuller = -4.2641, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
acf(diff(e), lag.max = 36)

pacf(diff(e), lag.max = 36)

auto.arima(e)
## Series: e 
## ARIMA(4,0,0)(1,0,0)[12] with non-zero mean 
## 
## Coefficients:
##          ar1      ar2      ar3     ar4    sar1      mean
##       1.0324  -0.2965  -0.0081  0.1524  0.2266  112.5587
## s.e.  0.0848   0.1204   0.1259  0.0899  0.1017    4.5523
## 
## sigma^2 = 31.18:  log likelihood = -450.09
## AIC=914.19   AICc=915.01   BIC=934.98
coeftest(auto.arima(e))
## 
## z test of coefficients:
## 
##              Estimate  Std. Error z value Pr(>|z|)    
## ar1         1.0323581   0.0848340 12.1692  < 2e-16 ***
## ar2        -0.2964557   0.1204440 -2.4614  0.01384 *  
## ar3        -0.0081428   0.1258653 -0.0647  0.94842    
## ar4         0.1524304   0.0899082  1.6954  0.09000 .  
## sar1        0.2265503   0.1017109  2.2274  0.02592 *  
## intercept 112.5587288   4.5523379 24.7255  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
predict <- forecast(e, model = auto.arima(e), h=12, level = c(95))
accuracy(auto.arima(e))
##                     ME     RMSE      MAE        MPE    MAPE      MASE
## Training set 0.0204164 5.465904 3.560188 -0.2464935 3.35133 0.3666288
##                       ACF1
## Training set -0.0005158767
plot(predict, main = "Forecast for Customer Confidence Index")

For the Customer Confidence Index variable, the model selected by auto.arima is ARIMA(4,0,0)(1,0,0)[12] with non-zero mean, with a training-set MAPE of 3.35%.


Below are the point forecasts and their 95% prediction intervals.

predict
##          Point Forecast     Lo 95    Hi 95
## Jan 2022       111.1713 100.22795 122.1147
## Feb 2022       109.9533  94.22468 125.6820
## Mar 2022       111.8799  94.03982 129.7199
## Apr 2022       114.1986  95.60126 132.7959
## May 2022       114.6198  95.48497 133.7545
## Jun 2022       114.6715  94.95710 134.3859
## Jul 2022       108.0043  87.71298 128.2956
## Aug 2022       107.0576  86.31219 127.8031
## Sep 2022       111.0072  89.93787 132.0766
## Oct 2022       114.8812  93.56961 136.1927
## Nov 2022       115.8228  94.31298 137.3325
## Dec 2022       115.5677  93.89090 137.2446

Stock Market Index

plot(stl(ts(dataecoind$SMI, frequency = 12), s.window = "periodic"))

tseries::adf.test(f)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  f
## Dickey-Fuller = -1.9564, Lag order = 5, p-value = 0.5946
## alternative hypothesis: stationary
acf(f, lag.max = 36)

pacf(f, lag.max = 36)

tseries::adf.test(diff(f))
## Warning in tseries::adf.test(diff(f)): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(f)
## Dickey-Fuller = -7.2569, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
acf(diff(f), lag.max = 36)

pacf(diff(f), lag.max = 36)

auto.arima(f)
## Series: f 
## ARIMA(1,1,1)(0,0,2)[12] 
## 
## Coefficients:
##          ar1      ma1    sma1    sma2
##       0.1826  -0.8197  0.2218  0.5366
## s.e.  0.1053   0.0573  0.0784  0.1375
## 
## sigma^2 = 0.2613:  log likelihood = -109.52
## AIC=229.03   AICc=229.47   BIC=243.84
coeftest(auto.arima(f))
## 
## z test of coefficients:
## 
##       Estimate Std. Error  z value  Pr(>|z|)    
## ar1   0.182631   0.105279   1.7347  0.082790 .  
## ma1  -0.819661   0.057286 -14.3082 < 2.2e-16 ***
## sma1  0.221813   0.078396   2.8294  0.004663 ** 
## sma2  0.536601   0.137520   3.9020 9.541e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
predict <- forecast(f, model = auto.arima(f), h=12, level = c(95))
accuracy(auto.arima(f))
##                        ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -0.009000985 0.5022516 0.2709603 -0.6154412 4.427444 0.5941323
##                      ACF1
## Training set -0.001938749
plot(predict, main = "Forecast for Stock Market Index")

For the Stock Market Index variable, the model selected by auto.arima is ARIMA(1,1,1)(0,0,2)[12], with a training-set MAPE of 4.43%.


Below are the point forecasts and their 95% prediction intervals.

predict
##          Point Forecast    Lo 95    Hi 95
## Jan 2022       6.038302 5.036255 7.040349
## Feb 2022       6.252249 5.186238 7.318260
## Mar 2022       6.882316 5.788034 7.976598
## Apr 2022       8.560853 7.443510 9.678197
## May 2022       7.006320 5.867145 8.145495
## Jun 2022       6.697178 5.536717 7.857638
## Jul 2022       6.371531 5.190193 7.552868
## Aug 2022       7.236602 6.034754 8.438450
## Sep 2022       6.380299 5.158285 7.602312
## Oct 2022       7.048062 5.806211 8.289913
## Nov 2022       6.791764 5.530386 8.053141
## Dec 2022       6.902040 5.621433 8.182646

Exchange Rate

plot(stl(ts(dataecoind$ER, frequency = 12), s.window = "periodic"))

tseries::adf.test(g)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  g
## Dickey-Fuller = -1.3247, Lag order = 5, p-value = 0.8576
## alternative hypothesis: stationary
acf(g, lag.max = 36)

pacf(g, lag.max = 36)

tseries::adf.test(diff(g))
## Warning in tseries::adf.test(diff(g)): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(g)
## Dickey-Fuller = -5.1911, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
acf(diff(g), lag.max = 36)

pacf(diff(g), lag.max = 36)

auto.arima(g)
## Series: g 
## ARIMA(0,1,0)(2,0,1)[12] 
## 
## Coefficients:
##         sar1    sar2     sma1
##       0.4234  0.2800  -0.3850
## s.e.  0.2323  0.0877   0.2548
## 
## sigma^2 = 60586:  log likelihood = -990.48
## AIC=1988.96   AICc=1989.25   BIC=2000.81
coeftest(auto.arima(g))
## 
## z test of coefficients:
## 
##       Estimate Std. Error z value Pr(>|z|)   
## sar1  0.423357   0.232311  1.8224 0.068399 . 
## sar2  0.280024   0.087737  3.1916 0.001415 **
## sma1 -0.384997   0.254762 -1.5112 0.130737   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
predict <- forecast(g, model = auto.arima(g), h=12, level = c(95))
accuracy(auto.arima(g))
##                  ME     RMSE      MAE       MPE      MAPE       MASE
## Training set 17.768 242.6994 46.28849 0.1551672 0.3648037 0.07115336
##                      ACF1
## Training set -0.005407055
plot(predict, main = "Forecast for Exchange Rate (USD to IDR)")

For the Exchange Rate variable, the model selected by auto.arima is ARIMA(0,1,0)(2,0,1)[12], with a training-set MAPE of 0.36%.


Below are the point forecasts and their 95% prediction intervals.

predict
##          Point Forecast    Lo 95    Hi 95
## Jan 2022       14314.58 13832.15 14797.01
## Feb 2022       14314.58 13632.32 14996.84
## Mar 2022       14314.58 13478.99 15150.18
## Apr 2022       14314.58 13349.72 15279.44
## May 2022       14314.58 13235.84 15393.33
## Jun 2022       14314.58 13132.88 15496.29
## Jul 2022       14314.58 13038.19 15590.97
## Aug 2022       14314.58 12950.06 15679.10
## Sep 2022       14314.58 12867.29 15761.87
## Oct 2022       14314.58 12789.01 15840.16
## Nov 2022       14314.58 12714.54 15914.62
## Dec 2022       14314.58 12643.40 15985.77

Regression Analysis

Regression is a statistical method for estimating the relationship between a dependent variable and one or more independent variables. It is also often used for simple prediction, by estimating how the dependent variable changes when an independent variable changes. A regression analysis must satisfy the classical assumptions, which are checked here with a normality test, a multicollinearity test, and a homogeneity-of-variance (homoscedasticity) test.

SMI vs All Variables

The first model of regression will use SMI as the dependent variable and Inflation Rate, GDP Growth Rate, Unemployment Rate, Interest Rate, Customer Confidence Index, and Exchange Rate as independent variables.

mreg1 <- lm(SMI ~ InR+GDP+Up+IR+CCI+ER, data = dataecoind)
summary(mreg1)
## 
## Call:
## lm(formula = SMI ~ InR + GDP + Up + IR + CCI + ER, data = dataecoind)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1330.67  -236.07    35.39   239.68   788.09 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.175e+03  8.933e+02  -1.316    0.191    
## InR          4.598e+00  1.863e+01   0.247    0.805    
## GDP          6.446e+04  6.911e+04   0.933    0.353    
## Up          -3.365e+01  5.946e+01  -0.566    0.572    
## IR          -2.688e+02  3.206e+01  -8.383 5.53e-14 ***
## CCI          2.515e+01  4.055e+00   6.203 6.15e-09 ***
## ER           3.893e-01  3.168e-02  12.287  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 377.6 on 137 degrees of freedom
## Multiple R-squared:  0.8676, Adjusted R-squared:  0.8618 
## F-statistic: 149.6 on 6 and 137 DF,  p-value: < 2.2e-16

The first model produces the regression equation SMI = - 1.175e+03 + 4.598e+00(InR) + 6.446e+04(GDP) - 3.365e+01(Up) - 2.688e+02(IR) + 2.515e+01(CCI) + 3.893e-01(ER).
Judging from the p-values, the variables with a p-value smaller than 0.05, and therefore a significant effect on SMI, are IR (Interest Rate), CCI (Customer Confidence Index), and ER (Exchange Rate); InR, GDP, and Up are not significant. Overall, the model is significant because the F-test has a p-value < 2.2e-16.
Next, test the assumptions for the first model.

lillie.test(mreg1$residuals)
## 
##  Lilliefors (Kolmogorov-Smirnov) normality test
## 
## data:  mreg1$residuals
## D = 0.044878, p-value = 0.6801

The normality test of the first model produces a p-value greater than 0.05, so the null hypothesis is not rejected and the residuals can be treated as normally distributed.

bptest(mreg1, studentize = FALSE)
## 
##  Breusch-Pagan test
## 
## data:  mreg1
## BP = 12.684, df = 6, p-value = 0.04835

The homogeneity test of the first model produces a p-value smaller than 0.05 (0.04835), which means the null hypothesis is rejected: the residuals do not have homogeneous (equal) variance.
Because the first model does not meet the homogeneity assumption, it is not used to forecast the Stock Market Index.
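
To make the homogeneity test less of a black box, here is a minimal sketch of the non-studentized Breusch-Pagan statistic computed with base R only. It assumes simulated data (the variables `x1`, `x2`, and `y` are illustrative and not taken from `dataecoind`), and the helper `bp_by_hand` is a hypothetical name written to follow the same calculation as `bptest(..., studentize = FALSE)`.

```r
# Simulated regression data with deliberately non-constant error variance
# (assumption: illustrative only, not the economic indicator data).
set.seed(42)
n  <- 144
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + rnorm(n, sd = exp(0.6 * x1))

fit <- lm(y ~ x1 + x2)

# Hypothetical helper: non-studentized Breusch-Pagan test.
bp_by_hand <- function(model) {
  e      <- residuals(model)
  n_obs  <- length(e)
  sigma2 <- sum(e^2) / n_obs            # ML estimate of the error variance
  f      <- e^2 / sigma2 - 1            # scaled squared residuals, mean zero
  X      <- model.matrix(model)         # intercept plus regressors
  aux    <- lm.fit(X, f)                # auxiliary regression of f on X
  stat   <- 0.5 * sum(aux$fitted.values^2)  # half the explained sum of squares
  df     <- ncol(X) - 1                 # one degree of freedom per regressor
  p      <- pchisq(stat, df = df, lower.tail = FALSE)
  c(BP = stat, df = df, p.value = p)
}

res <- bp_by_hand(fit)
print(res)
```

Because the simulated error spread increases with `x1`, a small p-value is expected here. A p-value below 0.05 leads to rejecting the null hypothesis of equal variance, as happened for the first model.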

SMI vs High Correlation Variable

The second model of regression will use SMI as the dependent variable and GDP Growth Rate, Unemployment Rate, and Exchange Rate as independent variables. In the second model, I use independent variables with high correlation.

mreg2 <- lm(SMI ~ GDP+Up+ER, data = dataecoind)
summary(mreg2)
## 
## Call:
## lm(formula = SMI ~ GDP + Up + ER, data = dataecoind)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1336.88  -310.79    64.48   372.58   989.14 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.102e+02  1.066e+03   0.385   0.7009    
## GDP          7.459e+04  6.978e+04   1.069   0.2869    
## Up          -1.687e+02  7.075e+01  -2.385   0.0184 *  
## ER           4.233e-01  4.043e-02  10.470   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 496.6 on 140 degrees of freedom
## Multiple R-squared:  0.766,  Adjusted R-squared:  0.761 
## F-statistic: 152.7 on 3 and 140 DF,  p-value: < 2.2e-16

The second model produces the regression equation SMI = 4.102e+02 + 7.459e+04(GDP) - 1.687e+02(Up) + 4.233e-01(ER).
Judging from the p-values, the variables with a p-value smaller than 0.05, and therefore a significant effect on SMI, are Up (Unemployment Rate) and ER (Exchange Rate); GDP is not significant. Overall, the model is significant because the F-test has a p-value < 2.2e-16.
Next, test the assumptions for the second model.

lillie.test(mreg2$residuals)
## 
##  Lilliefors (Kolmogorov-Smirnov) normality test
## 
## data:  mreg2$residuals
## D = 0.068864, p-value = 0.09134

The normality test of the second model produces a p-value greater than 0.05, so the null hypothesis is not rejected and the residuals can be treated as normally distributed.

bptest(mreg2, studentize = FALSE)
## 
##  Breusch-Pagan test
## 
## data:  mreg2
## BP = 6.1162, df = 3, p-value = 0.1061

The homogeneity test of the second model produces a p-value greater than 0.05, so the null hypothesis is not rejected and the residuals can be treated as having homogeneous (equal) variance.

vif(mreg2)
##      GDP       Up       ER 
## 2.493209 1.948399 3.772856

Based on the VIF values for the second model, each independent variable has a VIF smaller than 10, so the second regression model shows no multicollinearity problem.
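
The VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors. The sketch below computes it with base R on simulated predictors. The names `x1`, `x2`, `x3`, and `vif_by_hand` are hypothetical and used only for illustration. It is meant to show the calculation behind `vif()` for purely numeric predictors, not to replace it.

```r
# Simulated predictors (assumption: illustrative only); x2 is built to be
# correlated with x1, while x3 is independent of both.
set.seed(7)
n  <- 144
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)
x3 <- rnorm(n)
X  <- data.frame(x1, x2, x3)

# Hypothetical helper: VIF_j = 1 / (1 - R^2_j), where R^2_j is from the
# regression of predictor j on the remaining predictors.
vif_by_hand <- function(predictors) {
  sapply(names(predictors), function(v) {
    others <- setdiff(names(predictors), v)
    aux    <- lm(reformulate(others, response = v), data = predictors)
    1 / (1 - summary(aux)$r.squared)
  })
}

vifs <- vif_by_hand(X)
print(round(vifs, 3))

# Equivalent shortcut: the diagonal of the inverse correlation matrix.
vifs_alt <- diag(solve(cor(X)))
print(round(vifs_alt, 3))
```

A VIF of 1 means a predictor is uncorrelated with the others. The cutoff of 10 used in the text is a common rule of thumb rather than a formal test.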

SMI vs Low Correlation Variable

The third model of regression will use SMI as the dependent variable and Inflation Rate, Interest Rate, and Customer Confidence Index as independent variables. In the third model, I use independent variables with low correlation.

mreg3 <- lm(SMI ~ InR+IR+CCI, data = dataecoind)
summary(mreg3)
## 
## Call:
## lm(formula = SMI ~ InR + IR + CCI, data = dataecoind)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2121.8  -644.3   190.9   596.3  1390.1 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3475.456    647.191   5.370 3.19e-07 ***
## InR          -71.538     37.571  -1.904   0.0589 .  
## IR          -478.571     58.952  -8.118 2.20e-13 ***
## CCI           40.072      5.971   6.711 4.44e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 782.8 on 140 degrees of freedom
## Multiple R-squared:  0.4186, Adjusted R-squared:  0.4062 
## F-statistic:  33.6 on 3 and 140 DF,  p-value: < 2.2e-16

The third model produces the regression equation SMI = 3475.456 - 71.538(InR) - 478.571(IR) + 40.072(CCI).
Judging from the p-values, the variables with a p-value smaller than 0.05, and therefore a significant effect on SMI, are IR (Interest Rate) and CCI (Customer Confidence Index); InR is not significant at the 0.05 level (p = 0.0589). Overall, the model is significant because the F-test has a p-value < 2.2e-16.
Next, test the assumptions for the third model.

lillie.test(mreg3$residuals)
## 
##  Lilliefors (Kolmogorov-Smirnov) normality test
## 
## data:  mreg3$residuals
## D = 0.12881, p-value = 4.016e-06

The normality test of the third model produces a p-value lower than 0.05, which means the null hypothesis is rejected: the residuals are not normally distributed.
Because the third model does not meet the normality assumption, it is not used to forecast the Stock Market Index.

Result

Because only the second model meets all three assumptions, namely normality, homogeneity of variance, and the absence of multicollinearity, the second regression model is taken as the best model for forecasting the Stock Market Index.

bestmodel <- mreg2
summary(bestmodel)
## 
## Call:
## lm(formula = SMI ~ GDP + Up + ER, data = dataecoind)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1336.88  -310.79    64.48   372.58   989.14 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.102e+02  1.066e+03   0.385   0.7009    
## GDP          7.459e+04  6.978e+04   1.069   0.2869    
## Up          -1.687e+02  7.075e+01  -2.385   0.0184 *  
## ER           4.233e-01  4.043e-02  10.470   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 496.6 on 140 degrees of freedom
## Multiple R-squared:  0.766,  Adjusted R-squared:  0.761 
## F-statistic: 152.7 on 3 and 140 DF,  p-value: < 2.2e-16
accuracy(bestmodel)
##                        ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 9.406827e-15 489.6804 399.1624 -1.008576 8.415506 0.4724877

The second regression model yields the following:
1. Regression equation: SMI = 4.102e+02 + 7.459e+04(GDP) - 1.687e+02(Up) + 4.233e-01(ER).
2. F-statistic = 152.7 with a p-value smaller than 0.05 (< 2.2e-16), which means that the null hypothesis is rejected: jointly, the independent variables have a significant effect on the dependent variable.
3. Adjusted R-squared = 0.761 (multiple R-squared = 0.766), which means that the independent variables explain about 76.1% of the variance of the dependent variable; the remaining 23.9% is explained by other factors that are not included in the regression model.
4. The MAPE, or mean absolute percentage error, is 8.42%. This value is below 10%, so the second regression model can be considered quite effective for forecasting the Stock Market Index.
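
For reference, the MAPE reported by `accuracy()` is the mean of the absolute percentage errors. The sketch below shows the calculation on a small made-up example. The function name `mape` and the three data points are assumptions for illustration and are not taken from the model.

```r
# Hypothetical helper: mean absolute percentage error, in percent.
mape <- function(actual, fitted_vals) {
  stopifnot(length(actual) == length(fitted_vals), all(actual != 0))
  mean(abs((actual - fitted_vals) / actual)) * 100
}

# Made-up values: percentage errors are 10%, 10%, and 0%.
actual      <- c(100, 200, 400)
fitted_vals <- c(110, 180, 400)

result <- mape(actual, fitted_vals)
print(result)  # (10 + 10 + 0) / 3 = 6.666667
```

Because each error is divided by the actual value, MAPE is undefined when an actual value is zero and becomes unstable when actual values are close to zero.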

Policy Implications

Policy implications refer to the potential consequences or recommendations derived from the findings of an analysis or research study, particularly in relation to public policy decisions. When analyzing data or conducting research, identifying policy implications involves considering how the results can inform or guide policymakers in making decisions to address specific issues or achieve certain goals.


Policy Implications in this case will be divided into Policy Implications for Regression Analysis and Policy Implications for Timeseries Analysis.

Policy Implication for Timeseries Analysis

Based on the results of the time series analysis using the Box-Jenkins (ARIMA) method, it was found that:
1. The Inflation Rate, Unemployment Rate, Stock Market Index, and Exchange Rate are predicted to increase over the next year.
2. The GDP Growth Rate and Customer Confidence Index are predicted to decline over the next year.
3. The Interest Rate is predicted to neither decrease nor increase; in other words, it remains stable over the next year.


The implications of the analysis results for policy makers and stakeholders will depend on the specific economic context and other factors influencing the economy. However, in general, here are some insights into how changes in these economic indicators can impact each other and the economy as a whole:
1. Predictions for next year show an increase in the inflation and unemployment rates, which could indicate inflationary pressure and instability in the labor market. Policymakers may need to adjust monetary and fiscal policies to control inflation and stimulate job growth, for example, by raising interest rates or launching economic stimulus programs.
2. A decline in GDP can indicate a slowdown in economic growth. This could be caused by factors such as a decline in investment, weak consumption, or a decline in exports. Policymakers can try to stimulate economic growth through fiscal and monetary policies that encourage investment and consumption, such as interest rate cuts, tax incentives, or infrastructure programs.
3. A decline in consumer confidence levels can affect consumer behavior, which in turn can affect aggregate demand and economic growth. A decline in consumer confidence could be a sign of economic uncertainty, which may result in consumers holding back their spending. This can reduce a company’s sales and earnings, as well as cause a slowdown in economic growth.


In facing these changes, policy makers and stakeholders can take the following steps:
1. Policy Reaction: Policymakers can adjust their economic policies to address the challenges they face. This can involve a combination of monetary (e.g., setting interest rates) and fiscal (e.g., changing spending or tax budgets) policies to achieve macroeconomic goals such as price stability, economic growth, and social welfare.
2. Collaboration: Stakeholders from the public, private, and civil society sectors may need to work together to address these complex economic issues. This collaboration can involve dialogue and coordination between government, business, labor unions, and other community organizations to develop holistic and sustainable solutions.
3. Strengthening Resilience: At the micro level, companies and individuals may need to increase their resilience to economic uncertainty by diversifying portfolios, improving skills, and adopting appropriate risk management strategies.
In this context, further analysis of the potential causes behind changes in such economic indicators and their impact on various sectors and groups of society will be key to formulating appropriate and effective responses.

Policy Implication for Regression Analysis

The regression equation is a statistical model that connects the Stock Market Index (SMI) with three other variables, namely GDP Growth Rate (GDP), Unemployment Rate (Up), and Exchange Rate (ER). The implications of the analysis results for policy makers and stakeholders will depend on the values of the regression coefficients and the economic interpretation of each relationship. Here are some insights into how changes in these economic indicators can impact each other and the economy as a whole:
1. GDP Growth Rate: GDP is an important measure of a country’s economic activity. The positive regression coefficient (7.459e+04) between SMI and GDP indicates a positive relationship between economic growth and stock market performance as measured by SMI. This means that when GDP growth rises, SMI tends to increase too. However, this coefficient is not statistically significant in the model (p-value = 0.2869), so the relationship should be read with caution. The implication is that policymakers may pay more attention to fiscal and monetary policies that encourage economic growth to support stock market performance.
2. Unemployment Rate: The unemployment rate is an important indicator of the health of the labor market and economic well-being. The negative regression coefficient (-1.687e+02) between SMI and Up indicates that there is a negative relationship between the unemployment rate and stock market performance. This means that when the unemployment rate rises, SMI tends to decrease. The implication is that policies aimed at reducing unemployment can have a positive impact on stock market performance.
3. Exchange Rate: Currency exchange rates can affect the competitiveness of a country’s exports and imports as well as foreign capital flows. The positive regression coefficient (4.233e-01) between SMI and ER indicates a positive relationship between the exchange rate and stock market performance. Because ER is the USD to IDR rate, this means that when the number of rupiah per US dollar rises, SMI tends to increase. The implication is that policies that influence currency exchange rates, such as monetary policy and foreign exchange intervention, can affect stock market performance.
Policy makers and stakeholders need to pay attention to the relationship between economic indicators given in the regression equation to plan effective policies in managing the economy. In addition, it is also important to consider external factors and global market dynamics that can influence overall economic performance.
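
The coefficient interpretations above can be turned into simple arithmetic. In a linear model, the predicted change in SMI equals the coefficient times the change in the regressor, holding the other regressors fixed. The sketch below uses the coefficients reported by `summary(mreg2)`. The sizes of the changes in `delta` are hypothetical and chosen only for illustration, and the units of each variable are assumed to be those of `dataecoind`.

```r
# Coefficients copied from the summary(mreg2) output in the text.
coefs <- c(intercept = 410.2, GDP = 74590, Up = -168.7, ER = 0.4233)

# Hypothetical changes in each regressor (illustrative values, not from the data).
delta <- c(GDP = 0.001, Up = 1, ER = 100)

# Predicted change in SMI for each change, other regressors held constant.
effect <- coefs[names(delta)] * delta
print(effect)
```

For example, an increase of 100 in ER is associated with a predicted increase of about 42.33 in SMI, while a one-unit rise in Up is associated with a predicted decrease of about 168.7. These are associations estimated by the model, not causal effects.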