1 Tugas 1 (Konsep dan Teori Analisis Regresi Tingkat Lanjut)

1.1 Diagnostik lengkap OLS pada data mtcars

Heteroskedastisitas (Breusch–Pagan), multikolinearitas (VIF), pengaruh (Cook’s D), normalitas (QQ-plot). Laporkan temuan & rekomendasi.

1.1.1 Input Data & Pemodelan Regresi

#Data
library(car)

## Loading required package: carData

data(mtcars)
head(mtcars)

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

str(mtcars)

## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

#Model regresi
fit <- lm(mpg ~ ., data = mtcars)

1.1.2 Multikolinieritas (VIF)

vif(fit)

##       cyl      disp        hp      drat        wt      qsec        vs        am 
## 15.373833 21.620241  9.832037  3.374620 15.164887  7.527958  4.965873  4.648487 
##      gear      carb 
##  5.357452  7.908747

Variabel disp memiliki nilai vifnya paling besar dan lebih dari 10 maka variabel disp akan dihapus.

fit1 <- lm(mpg ~ . -disp, data = mtcars)
vif(fit1)

##       cyl        hp      drat        wt      qsec        vs        am      gear 
## 14.284737  7.123361  3.329298  6.189050  6.914423  4.916053  4.645108  5.324402 
##      carb 
##  4.310597

Variabel cyl memiliki nilai vifnya paling besar dan lebih dari 10 maka variabel cyl akan dihapus.

fit2 <- lm(mpg ~ . -disp -cyl, data = mtcars)
vif(fit2)

##       hp     drat       wt     qsec       vs       am     gear     carb 
## 6.015788 3.111501 6.051127 5.918682 4.270956 4.285815 4.690187 4.290468

Seluruh nilai VIF < 10, sehingga dapat disimpulkan bahwa model terbebas dari multikolinearitas (asumsi terpenuhi).

1.1.3 Normalitas (QQ-plot)

#Shapiro Wilk Test
res <- resid(fit2)
shapiro.test(res)

## 
##  Shapiro-Wilk normality test
## 
## data:  res
## W = 0.96495, p-value = 0.3727

#QQPlot
library(tibble)
diag_df <- tibble(fitted = fitted(fit2), resid = resid(fit2),
                  stdres = rstandard(fit2), hat = hatvalues(fit2), 
                  cook = cooks.distance(fit2))
library(ggplot2)
qqplot <- ggplot(diag_df, aes(sample = stdres)) +
  stat_qq() + stat_qq_line() + labs(x = "Theoretical", y = "Standardized Residuals")
qqplot

Karena p-value = 0.3727 > 0.05 & titik-titik residual cenderung mengikuti garis lurus diagonal maka residual berdistribusi normal (asumsi terpenuhi).

1.1.4 Heteroskedastisitas (Breusch–Pagan)

library(lmtest)

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

bptest(fit2)

## 
##  studentized Breusch-Pagan test
## 
## data:  fit2
## BP = 15.303, df = 8, p-value = 0.05352

ggplot(diag_df, aes(fitted, resid)) +
  geom_point(alpha = 0.7) +
  geom_hline(yintercept = 0, linetype = 2) +
  labs(x = "Fitted", y = "Residuals")

Karena p-value = 0.05352 > 0.05 & titik-titik residual menyebar diatas dan bawah garis residual = 0 maka terima H₀. Artinya model terbebas dari heteroskedastisitas (asumsi terpenuhi).

1.1.5 Pengaruh (Cook’s D)

cd <- cooks.distance(fit2)
cd

##           Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive 
##        0.0181201869        0.0106924900        0.0732925719        0.0015811375 
##   Hornet Sportabout             Valiant          Duster 360           Merc 240D 
##        0.0068566007        0.0468088736        0.0002614461        0.0348502452 
##            Merc 230            Merc 280           Merc 280C          Merc 450SE 
##        0.3303035090        0.0011746336        0.0241765241        0.0026879608 
##          Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental 
##        0.0033443994        0.0031841709        0.0060875746        0.0009283207 
##   Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla 
##        0.1443473904        0.0704888209        0.0231030848        0.1093415658 
##       Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28 
##        0.0374106436        0.0108090861        0.0187710244        0.0011774350 
##    Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa 
##        0.0260170270        0.0034756825        0.0058809372        0.1632711396 
##      Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E 
##        0.3509199153        0.0001190531        0.1078346806        0.0685191182

#Titik yang patut ditinjau (rule of thumb)
which(cd > 4/length(cd))

##          Merc 230 Chrysler Imperial      Lotus Europa    Ford Pantera L 
##                 9                17                28                29

plot(cd, ylab="Cook's D"); abline(h=4/length(cd), lty=2)

Terdapat 4 observasi yang memiliki nilai Di > ambangnya (4/n). Selanjutnya, akan dilakukan pengecekan apakah 4 observasi ini perlu dihapus atau tidak.

#dataset baru tanpa 4 mobil itu
influential_obs <- which(cd > (4/length(cd)))
mtcarss <- subset(mtcars, select = -c(disp, cyl))
mtcars_new <- mtcarss[-influential_obs, ]
fit3 <- lm(mpg ~ ., data = mtcars_new)

summary(fit2)

## 
## Call:
## lm(formula = mpg ~ . - disp - cyl, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8187 -1.3903 -0.3045  1.2269  4.5183 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 13.80810   12.88582   1.072   0.2950  
## hp          -0.01225    0.01649  -0.743   0.4650  
## drat         0.88894    1.52061   0.585   0.5645  
## wt          -2.60968    1.15878  -2.252   0.0342 *
## qsec         0.63983    0.62752   1.020   0.3185  
## vs           0.08786    1.88992   0.046   0.9633  
## am           2.42418    1.91227   1.268   0.2176  
## gear         0.69390    1.35294   0.513   0.6129  
## carb        -0.61286    0.59109  -1.037   0.3106  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.566 on 23 degrees of freedom
## Multiple R-squared:  0.8655, Adjusted R-squared:  0.8187 
## F-statistic:  18.5 on 8 and 23 DF,  p-value: 2.627e-08

summary(fit3)

## 
## Call:
## lm(formula = mpg ~ ., data = mtcars_new)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4398 -1.5273 -0.0027  1.1378  4.2436 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -18.25532   21.70599  -0.841   0.4108  
## hp            0.01297    0.01969   0.659   0.5179  
## drat          2.40851    1.65495   1.455   0.1619  
## wt           -3.60201    1.28552  -2.802   0.0114 *
## qsec          2.03098    1.03338   1.965   0.0642 .
## vs           -3.14638    2.40718  -1.307   0.2068  
## am            1.10451    1.85774   0.595   0.5592  
## gear          1.74121    1.57344   1.107   0.2823  
## carb         -0.81296    0.69539  -1.169   0.2568  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.299 on 19 degrees of freedom
## Multiple R-squared:  0.8959, Adjusted R-squared:  0.852 
## F-statistic: 20.43 on 8 and 19 DF,  p-value: 8.687e-08

#Plot untuk fit2
plot_fit2 <- ggplot(data.frame(mpg = mtcarss$mpg, fitted = fitted(fit2)), 
                    aes(x = fitted, y = mpg)) + geom_point() +
  geom_line(aes(y = fitted), color = "blue", linewidth = 1) +
  ggtitle("Regression with Outliers")
#Plot untuk fit3
plot_fit3 <- ggplot(data.frame(mpg = mtcars_new$mpg, fitted = fitted(fit3)), 
                    aes(x = fitted, y = mpg)) + geom_point() +
  geom_line(aes(y = fitted), color = "red", linewidth = 1) +
  ggtitle("Regression without Outliers")
#plot the two scatterplots side by side
library(gridExtra)
gridExtra::grid.arrange(plot_fit2, plot_fit3, ncol=2)

Terlihat bahwa setelah dilakukan penghapusan 4 observasi tidak memberikan pengaruh yang besar terhadap hasil maka data akhir yang terpilih ialah data awal dengan model kedua.

summary(fit2)

## 
## Call:
## lm(formula = mpg ~ . - disp - cyl, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8187 -1.3903 -0.3045  1.2269  4.5183 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 13.80810   12.88582   1.072   0.2950  
## hp          -0.01225    0.01649  -0.743   0.4650  
## drat         0.88894    1.52061   0.585   0.5645  
## wt          -2.60968    1.15878  -2.252   0.0342 *
## qsec         0.63983    0.62752   1.020   0.3185  
## vs           0.08786    1.88992   0.046   0.9633  
## am           2.42418    1.91227   1.268   0.2176  
## gear         0.69390    1.35294   0.513   0.6129  
## carb        -0.61286    0.59109  -1.037   0.3106  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.566 on 23 degrees of freedom
## Multiple R-squared:  0.8655, Adjusted R-squared:  0.8187 
## F-statistic:  18.5 on 8 and 23 DF,  p-value: 2.627e-08

Terlihat bahwa hanya wt yang memiliki pengaruh secara signifikan terhadap mpg dengan besar pengaruh sebesar -2.60968. Artinya setiap penambahan 1 unit berat mobil, rata-rata konsumsi bahan bakar (mpg) menurun sebesar 2.61 mil per galon, dengan asumsi variabel lain tetap konstan.

Terlihat juga nilai R-squared = 0.8655, artinya sekitar 86,55% variasi mpg dapat dijelaskan oleh variabel-variabel yang ada dalam model. Adjusted R-squared = 0.8187, memperhitungkan jumlah variabel dalam model, menunjukkan model cukup baik menjelaskan variasi data.

Berat mobil (wt) adalah faktor utama yang mempengaruhi konsumsi bahan bakar, sedangkan variabel lain tidak menunjukkan pengaruh yang signifikan pada data ini. Persamaan akhir regresi yang di dapat dalam penelitian ini ialah:

\[ \hat{mpg} = 13.8081 - 2.60968 \cdot wt \]

1.2 Robust SE (HC3) vs WLS pada data simulasi heteroskedastik

Bandingkan standar error dan p-value.

1.2.1 Input Data & Pemodelan Regresi

#Data
n <- 400
x1 <- runif(n,0,10)
x2 <- rnorm(n,5,2)

#Model regresi
sigma_i <- 0.6 + 0.2*x1
err <- rnorm(n, 0, sigma_i)
y <- 2 + 1.5*x1 - 1.2*x2 + err
sim <- tibble(y, x1, x2, w = 1/sigma_i^2)

fit_ols <- lm(y ~ x1 + x2, data = sim)
fit_wls <- lm(y ~ x1 + x2, data = sim, weights = w)

1.2.2 Uji Heteroskedastisitas

#Uji Breusch-Pagan untuk heteroskedastisitas
lmtest::bptest(fit_ols)

## 
##  studentized Breusch-Pagan test
## 
## data:  fit_ols
## BP = 54.953, df = 2, p-value = 1.167e-12

library(ggplot2)
ggplot(data.frame(fitted=fitted(fit_ols), resid=resid(fit_ols)),
       aes(fitted, resid)) +
  geom_point(alpha=.6) +
  geom_smooth(se=FALSE, method="loess") +
  geom_hline(yintercept = 0, linetype=2) +
  labs(x="Fitted", y="Residuals")

## `geom_smooth()` using formula = 'y ~ x'

Terlihat bahwa terdapat heteroskedastisitas maka metode Robust (HC3) dan WLS perlu dilakukan.

1.2.3 Perbandingan Robust SE (HC3) vs WLS

library(sandwich)  # buat vcovHC
library(lmtest)    # buat coeftest
library(tibble)    # buat tibble
comp_se <- tibble(
  term = names(coef(fit_ols)),
  OLS_SE = coef(summary(fit_ols))[, "Std. Error"],
  Robust_SE = sqrt(diag(vcovHC(fit_ols, type = "HC3"))),
  WLS_SE = coef(summary(fit_wls))[,"Std. Error"],
  OLS_pval = coef(summary(fit_ols))[,"Pr(>|t|)"],
  Robust_pval = coeftest(fit_ols, vcov. = vcovHC(fit_ols, type="HC3"))[,4],
  WLS_pval = coef(summary(fit_wls))[,"Pr(>|t|)"])
comp_se

## # A tibble: 3 × 7
##   term        OLS_SE Robust_SE WLS_SE  OLS_pval Robust_pval  WLS_pval
##   <chr>        <dbl>     <dbl>  <dbl>     <dbl>       <dbl>     <dbl>
## 1 (Intercept) 0.270     0.230  0.174  3.77e- 11   1.69e- 14 3.98e- 22
## 2 x1          0.0301    0.0319 0.0246 7.73e-172   3.96e-163 3.36e-201
## 3 x2          0.0427    0.0417 0.0313 1.87e- 96   3.35e- 99 1.62e-132

library(tidyr)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following object is masked from 'package:gridExtra':
## 
##     combine

## The following object is masked from 'package:car':
## 
##     recode

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

#Plot Standard Error
df_SE <- comp_se %>%
  select(term, OLS_SE, Robust_SE, WLS_SE) %>%
  gather(Method, SE, -term)
ggplot(df_SE, aes(x = term, y = SE, color = Method, group = Method)) +
  geom_point(size = 3) +
  geom_line() +
  theme_minimal() +
  labs(title = "Perbandingan Standard Error",
       y = "Standard Error", x = "Term")

#Plot p-value (log scale)
df_pval <- comp_se %>%
  select(term, OLS_pval, Robust_pval, WLS_pval) %>%
  gather(Method, pval, -term)
ggplot(df_pval, aes(x = term, y = pval, color = Method, group = Method)) +
  geom_point(size = 3) +
  geom_line() +
  scale_y_log10() +
  theme_minimal() +
  labs(title = "Perbandingan p-value (log scale)",
       y = "p-value (log scale)", x = "Term")

Standard Error

Pada Intercept = OLS > Robust > WLS
Pada X1 = Robust > OLS > WLS
Pada X2 = Robust > OLS > WLS

P-Value

Pada Intercept = OLS > Robust > WLS
Pada X1 = Robust > OLS > WLS
Pada X2 = Robust > OLS > WLS

Semua variabel signifikan dengan terdapat perbedaan urutan p-value. SE WLS paling kecil dibanding metode lain maka WLS paling “efisien” dibandingkan OLS & Robust (HC3) jika bobot tetap.

1.3 Spline vs Polinomial (non linear)

Modelkan mpg ~ wt dan bandingkan AIC/BIC + 5-fold CV

1.3.1 Model

library(splines)
fit_poly <- lm(mpg ~ poly(wt, 2, raw = TRUE), data = mtcars)
fit_spl  <- lm(mpg ~ bs(wt, df = 5), data = mtcars)

1.3.2 AIC vs BIC

library(dplyr)
AIC(fit_poly, fit_spl) %>% arrange(AIC)

##          df      AIC
## fit_poly  4 158.0484
## fit_spl   7 163.0402

BIC(fit_poly, fit_spl) %>% arrange(BIC)

##          df      BIC
## fit_poly  4 163.9113
## fit_spl   7 173.3003

Model polynomial merupakan model terbaik dengan nilai aic dan bic lebih rendah dari model spline.

1.3.3 5-Fold Cross Validation (RMSE)

set.seed(123)
library(modelr)
library(purrr)

## 
## Attaching package: 'purrr'

## The following object is masked from 'package:car':
## 
##     some

cv_res <- crossv_kfold(mtcars, k = 5)
rmse_fun <- function(train, test, formula) {
  m <- lm(formula, data = as.data.frame(train))
  sqrt(mean((as.data.frame(test)$mpg - predict(m, newdata = 
                                                 as.data.frame(test)))^2))}
forms <- list(
  Poly   = mpg ~ poly(wt, 2, raw = TRUE),
  Spline = mpg ~ bs(wt, df = 5, Boundary.knots = range(mtcars$wt)))
cv_tbl <- map_dfr(names(forms), function(nm){
  fml <- forms[[nm]]
  tibble(model = nm, fold  = seq_len(5),
         rmse  = map2_dbl(cv_res$train, cv_res$test, ~ rmse_fun(.x, .y, fml)))
  })
cv_summary <- cv_tbl %>% 
  group_by(model) %>% 
  summarise(mean_rmse = mean(rmse), sd_rmse = sd(rmse))
cv_summary

## # A tibble: 2 × 3
##   model  mean_rmse sd_rmse
##   <chr>      <dbl>   <dbl>
## 1 Poly        2.53    1.32
## 2 Spline      2.88    1.27

Model polynomial merupakan model terbaik jika dilihat dari rmse dengan nilai rmse lebih rendah dari model spline.

1.3.4 Visualisasi kurva prediksi

library(tidyr)
grid_wt <- tibble(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 200))
pred <- grid_wt %>% mutate(
  Poly   = predict(fit_poly, newdata = grid_wt),
  Spline = predict(fit_spl,  newdata = grid_wt)
  ) %>% pivot_longer(Poly:Spline, names_to = "model", values_to = "yhat")
gg = ggplot() +
  geom_point(data = mtcars, aes(wt, mpg), alpha = 0.4) +
  geom_line(data = pred, aes(wt, yhat, color = model), linewidth = 1) +
  labs(x = "Vehicle Weight", y = "Fuel Efficiency (mpg)",
       title = "Polynomial vs Spline Fit") + theme_minimal()
gg

Model Polynomial (merah) mengikuti pola data dengan baik. Sedangkan, model Spline (biru) cenderung berbelok-belok dan kurang pola dengan baik.

Secara keseluruhan kedua model mirip, tapi metode polynomial sedikit lebih konsisten dan “efisien” untuk dataset kecil seperti mtcars.

1.4 AR(1) GLS

Gunakan kode GLS di latihan, ganti x dengan sinyal musiman (mis. sin(2pit/12)), interpretasikan ̂𝜌 ; bandingkan SE OLS vs GLS.

1.4.1 Data dengan galat AR(1)

library(tibble)

set.seed(42)
T <- 400
t <- 1:T
x <- sin(2*pi*t/12)

rho <- 0.6
u <- rnorm(T, 0, 1)
err <- numeric(T)
for (tt in 2:T) err[tt] <- rho*err[tt-1] + u[tt]
y <- 1 + 2*x + err

dar1 <- tibble(t = t, x = x, y = y)
head(dar1)

## # A tibble: 6 × 3
##       t        x     y
##   <int>    <dbl> <dbl>
## 1     1 5   e- 1  2   
## 2     2 8.66e- 1  2.17
## 3     3 1   e+ 0  3.02
## 4     4 8.66e- 1  3.38
## 5     5 5   e- 1  2.79
## 6     6 1.22e-16  1.37

Persamaan Regresi yang digunakan: \[ y = 1 + 2x + e \]

1.4.2 Perbandingan SE (OLS vs GLS dengan AR(1))

#Model OLS
fit_ols <- lm(y ~ x, data = dar1)
summary(fit_ols)

## 
## Call:
## lm(formula = y ~ x, data = dar1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.0510 -0.7985  0.0059  0.7792  3.6370 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.96844    0.05664    17.1   <2e-16 ***
## x            1.83909    0.07995    23.0   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.133 on 398 degrees of freedom
## Multiple R-squared:  0.5707, Adjusted R-squared:  0.5697 
## F-statistic: 529.2 on 1 and 398 DF,  p-value: < 2.2e-16

#Model GLS dengan AR(1)
library(nlme)

## 
## Attaching package: 'nlme'

## The following object is masked from 'package:dplyr':
## 
##     collapse

fit_gls <- gls(y ~ x, data = dar1, correlation = corAR1(form = ~ t))
summary(fit_gls)

## Generalized least squares fit by REML
##   Model: y ~ x 
##   Data: dar1 
##        AIC      BIC    logLik
##   1109.051 1124.997 -550.5254
## 
## Correlation Structure: AR(1)
##  Formula: ~t 
##  Parameter estimate(s):
##       Phi 
## 0.5440471 
## 
## Coefficients:
##                 Value Std.Error   t-value p-value
## (Intercept) 0.9724752 0.1043172  9.322293       0
## x           1.8390741 0.1132805 16.234685       0
## 
##  Correlation: 
##   (Intr)
## x -0.013
## 
## Standardized residuals:
##          Min           Q1          Med           Q3          Max 
## -3.566387570 -0.705789893  0.001620549  0.681785092  3.195184639 
## 
## Residual standard error: 1.137024 
## Degrees of freedom: 400 total; 398 residual

#Perbandingan SE
se_ols <- coef(summary(fit_ols))[, "Std. Error"]
se_gls <- summary(fit_gls)$tTable[, "Std.Error"]
cbind(SE_OLS = se_ols, SE_GLS = se_gls)

##                 SE_OLS    SE_GLS
## (Intercept) 0.05663725 0.1043172
## x           0.07994741 0.1132805

Nilai SE pada model GLS lebih besar dibandingkan dengan model OLS. Artinya model GLS lebih valid karena model sesuai struktur error.

1.4.3 Diagnostik serial correlation

par(mfrow = c(1,2))
acf(resid(fit_ols), main = "ACF Residuals (OLS)")
acf(resid(fit_gls, type = "normalized"), main = "ACF Residuals (GLS)")

Terlihat bahwa:

Pada ACF Residual OLS memiliki 6 lag yang melewati batas signifikansi (garis putus-putus biru) maka residual model OLS masih menyisakan autokorelasi → pelanggaran asumsi OLS (independensi error).
Pada ACF Residual GLS memiliki 2 lag yang melewati batas signifikansi maka setelah memperhitungkan AR(1) dengan GLS, autokorelasi pada residual hilang.

par(mfrow = c(1,1))
lmtest::dwtest(fit_ols)

## 
##  Durbin-Watson test
## 
## data:  fit_ols
## DW = 0.92047, p-value < 2.2e-16
## alternative hypothesis: true autocorrelation is greater than 0

Terlihat bawa model OLS memiliki nilai DW < 2 maka model OLS terdapat indikasi autokorelasi positif. selain itu, terlihat pula p-value < 2.2e-16 maka H₀ ditolak, artinya ada autokorelasi positif secara signifikan pada residual OLS.

Dengan demikian, GLS dengan error AR(1) lebih tepat digunakan dibanding OLS dalam kasus ini karena mampu mengatasi adanya autokorelasi serial pada error.

1.4.4 rho_hat

rho_hat <- coef(fit_gls$modelStruct$corStruct, unconstrained = FALSE)
rho_hat

##       Phi 
## 0.5440471

#atau bisa pake summary(fit_gls). ambil yang nilai phi

Estimasi 𝜌 = 0.6072367 artinya error memiliki autokorelasi positif moderat (residual di waktu t cenderung masih berhubungan dengan residual di waktu sebelumnya (t−1)).

1.4.5 Quasi-differencing (Cochrane–Orcutt) manual

Jika tetap ingin menggunakan OLS Mekanisme:

estimasi 𝜌^ dari OLS residual
transformasi variabel
fit ulang model dengan data yang sudah di-adjust

Hasil akhirnya mirip GLS, hanya pendekatan yang berbeda.

#Estimasi rho dari regresi resid_t ~ resid_{t-1}
res <- resid(fit_ols)
rho_hat <- coef(lm(res[-1] ~ 0 + res[-length(res)]))[1]

#Quasi-differencing (transformasi data)
y_star <- dar1$y[-1] - rho_hat*dar1$y[-T]
x_star <- dar1$x[-1] - rho_hat*dar1$x[-T]

fit_qd <- lm(y_star ~ x_star)
summary(fit_qd)

## 
## Call:
## lm(formula = y_star ~ x_star)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.93677 -0.61574 -0.02748  0.63409  2.57213 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.44702    0.04782   9.348   <2e-16 ***
## x_star       1.83858    0.11312  16.253   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9552 on 397 degrees of freedom
## Multiple R-squared:  0.3995, Adjusted R-squared:  0.398 
## F-statistic: 264.2 on 1 and 397 DF,  p-value: < 2.2e-16

Setelah koreksi autokorelasi, konstanta model berubah dari 0.86112 menjadi 0.34066 (menjadi lebih kecil)
Slope 𝑥= 2.10 → koefisien slope mendekati “true slope” 2
Terdapat pengaruh antara variabel x terhadap y

2 R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

summary(cars)

##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

2.1 Including Plots

You can also embed plots, for example:

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.

Advanced Regression Analysis Assignment

Salwa Azzah Imtiyaz

2025-09-05