Linear Regression Analysis

1️⃣ Collecting the Data

This step uses the built-in mtcars dataset as the example data for the regression analysis.

data <- mtcars
head(data)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

2️⃣ Data Exploration

Descriptive Statistics

summary(data)
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000

Initial Visualization

library(ggplot2)  # provides ggplot(), aes(), geom_point()
ggplot(data, aes(x = wt, y = mpg)) +
  geom_point(color = "#ff66b2", size = 3) +
  theme_minimal() +
  labs(
    title = "Scatter Plot MPG vs Weight",
    x = "Weight (1000 lbs)",
    y = "Miles Per Gallon"
  )


3️⃣ Assumption Tests

An initial regression model is fitted so that the classical assumptions can be tested.

model <- lm(mpg ~ wt + hp, data = data)

Residual Normality Test

shapiro.test(residuals(model))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(model)
## W = 0.92792, p-value = 0.03427
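
As a minimal sketch of how this result might be read, the snippet below compares the p-value with a significance level. The 5% level (`alpha`) is an assumption made for illustration, and `model_sw` simply refits the same model so the snippet runs on its own.

```r
# Refit the same model so this snippet is self-contained
model_sw <- lm(mpg ~ wt + hp, data = mtcars)

# Shapiro-Wilk test: H0 = the residuals are normally distributed
sw <- shapiro.test(residuals(model_sw))

alpha <- 0.05  # assumed significance level
if (sw$p.value < alpha) {
  message("Reject H0: residuals deviate from normality at the 5% level")
} else {
  message("Fail to reject H0: no evidence against normality")
}
```

With the p-value of 0.03427 reported above, normality is rejected at the 5% level but not at the 1% level, so the assumption is borderline rather than clearly satisfied.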

Heteroskedasticity Test (Breusch-Pagan)

library(lmtest)  # provides bptest()
bptest(model)
## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 0.88072, df = 2, p-value = 0.6438
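
To make the statistic less of a black box, here is a sketch of the studentized (Koenker) Breusch-Pagan statistic computed by hand in base R: regress the squared residuals on the original predictors and multiply the auxiliary R-squared by the sample size. The names `model_bp`, `e2`, and `aux` are illustrative, and the result should agree with the `bptest()` output above.

```r
# Refit the same model so this snippet is self-contained
model_bp <- lm(mpg ~ wt + hp, data = mtcars)

# Auxiliary regression: squared residuals on the original predictors
e2  <- residuals(model_bp)^2
aux <- lm(e2 ~ wt + hp, data = mtcars)

n  <- nrow(mtcars)
bp <- n * summary(aux)$r.squared     # studentized BP statistic = n * R^2
df <- length(coef(aux)) - 1          # number of predictors in the auxiliary model
p_value <- pchisq(bp, df = df, lower.tail = FALSE)

c(BP = bp, df = df, p_value = p_value)
```

A large p-value such as the 0.6438 reported above gives no evidence against constant error variance.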

Multicollinearity Test (VIF)

library(car)  # provides vif()
vif(model)
##       wt       hp 
## 1.766625 1.766625
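
The two VIF values are identical because the model has only two predictors. A minimal base R sketch of the definition, VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing predictor j on the remaining predictors, shows why (the names `r2_wt`, `r2_hp`, and `vif_manual` are illustrative):

```r
# R-squared from regressing each predictor on the other one
r2_wt <- summary(lm(wt ~ hp, data = mtcars))$r.squared
r2_hp <- summary(lm(hp ~ wt, data = mtcars))$r.squared

# VIF_j = 1 / (1 - R^2_j)
vif_manual <- c(wt = 1 / (1 - r2_wt), hp = 1 / (1 - r2_hp))
vif_manual
```

With two predictors both auxiliary R-squared values equal the squared correlation between wt and hp, so both VIFs coincide. Values this close to 1 are well below the thresholds of 5 or 10 that are commonly used as rules of thumb.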

Diagnostic Plots

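par(mfrow = c(1,2))     # two plots side by side
plot(model, which = 1)  # residuals vs fitted values
plot(model, which = 2)  # normal Q-Q plot of residuals

par(mfrow = c(1,1))     # restore the single-plot layout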

4️⃣ Model Estimation

The parameters are estimated with the Ordinary Least Squares (OLS) method.

summary(model)
## 
## Call:
## lm(formula = mpg ~ wt + hp, data = data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.941 -1.600 -0.182  1.050  5.854 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 37.22727    1.59879  23.285  < 2e-16 ***
## wt          -3.87783    0.63273  -6.129 1.12e-06 ***
## hp          -0.03177    0.00903  -3.519  0.00145 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.593 on 29 degrees of freedom
## Multiple R-squared:  0.8268, Adjusted R-squared:  0.8148 
## F-statistic: 69.21 on 2 and 29 DF,  p-value: 9.109e-12
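
For reference, the OLS estimates can also be obtained from the normal equations, (X'X) b = X'y. The sketch below is only meant to illustrate that closed form; `lm()` itself uses a QR decomposition, which is numerically more stable. The names `X`, `y_obs`, and `beta_hat` are illustrative.

```r
# Design matrix with an intercept column, and the response vector
X     <- model.matrix(mpg ~ wt + hp, data = mtcars)
y_obs <- mtcars$mpg

# Solve the normal equations (X'X) b = X'y for b
beta_hat <- solve(crossprod(X), crossprod(X, y_obs))
drop(beta_hat)
```

The resulting values should match the Estimate column of the summary above.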

5️⃣ Hypothesis Testing

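F Test (Simultaneous)

Note that anova(model) reports sequential (Type I) F tests, one per term in the order the terms were entered. The simultaneous test of both slopes is the F-statistic printed by summary(model): 69.21 on 2 and 29 DF, p-value 9.109e-12.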

anova(model)
## Analysis of Variance Table
## 
## Response: mpg
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## wt         1 847.73  847.73 126.041 4.488e-12 ***
## hp         1  83.27   83.27  12.381  0.001451 ** 
## Residuals 29 195.05    6.73                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
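
As a sketch of how the simultaneous F statistic can be obtained directly, the fitted model can be compared against an intercept-only model, or the statistic can be computed from R-squared. The names `full`, `null`, `f_tab`, and `f_stat` are illustrative.

```r
full <- lm(mpg ~ wt + hp, data = mtcars)
null <- lm(mpg ~ 1, data = mtcars)   # intercept-only model

# Nested-model comparison: tests H0 that both slopes are zero
f_tab <- anova(null, full)
f_tab

# The same statistic computed from R-squared
r2 <- summary(full)$r.squared
k  <- 2                              # number of predictors
n  <- nrow(mtcars)
f_stat <- (r2 / k) / ((1 - r2) / (n - k - 1))
f_stat
```

Both routes should give the F value reported by summary(model), on 2 and 29 degrees of freedom.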

t Test (Partial)

coef(summary(model))
##                Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 37.22727012 1.59878754 23.284689 2.565459e-20
## wt          -3.87783074 0.63273349 -6.128695 1.119647e-06
## hp          -0.03177295 0.00902971 -3.518712 1.451229e-03
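
Each t value in this table is the estimate divided by its standard error, and the p-value is the two-sided tail probability of a t distribution with the residual degrees of freedom. A minimal sketch that recomputes both columns (the names `fit`, `tab`, `t_manual`, and `p_manual` are illustrative):

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
tab <- coef(summary(fit))

# t = estimate / standard error
t_manual <- tab[, "Estimate"] / tab[, "Std. Error"]

# Two-sided p-value with n - p = 29 residual degrees of freedom
p_manual <- 2 * pt(abs(t_manual), df = df.residual(fit), lower.tail = FALSE)

cbind(t_manual, p_manual)
```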

6️⃣ Model Evaluation

The model is evaluated using RMSE and R-squared, both computed on the same data that was used to fit it.

y <- data$mpg
y_hat <- predict(model)

RMSE <- sqrt(mean((y - y_hat)^2))
R2 <- summary(model)$r.squared

list(
  RMSE = RMSE,
  R_Squared = R2
)
## $RMSE
## [1] 2.468854
## 
## $R_Squared
## [1] 0.8267855
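
The RMSE above (2.469) is smaller than the residual standard error in the model summary (2.593) because the two use different divisors. The sketch below computes both from the same residual sum of squares; the names `fit_eval`, `rss`, `rmse`, and `rse` are illustrative.

```r
fit_eval <- lm(mpg ~ wt + hp, data = mtcars)

rss <- sum(residuals(fit_eval)^2)   # residual sum of squares
n   <- nrow(mtcars)                 # 32 observations
p   <- length(coef(fit_eval))       # 3 estimated coefficients

rmse <- sqrt(rss / n)               # in-sample RMSE, divisor n
rse  <- sqrt(rss / (n - p))         # residual standard error, divisor n - p

c(RMSE = rmse, RSE = rse)
```

Because both are computed on the training data, neither measures out-of-sample prediction error.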

Actual vs Predicted Values Plot

ggplot(data, aes(x = y_hat, y = y)) +
  geom_point(color = "#ff66b2", size = 3) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
  theme_minimal() +
  labs(
    title = "Actual vs Predicted",
    x = "Predicted Value",
    y = "Actual Value"
  )


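✨ Conclusion

Based on the linear regression analysis, the model satisfies most, but not all, of the classical assumptions. The Breusch-Pagan test shows no evidence of heteroskedasticity (p = 0.6438) and the VIF values (about 1.77) indicate no serious multicollinearity, but the Shapiro-Wilk test rejects normality of the residuals at the 5% level (p = 0.03427). Weight and horsepower both have a statistically significant negative effect on fuel efficiency: holding the other variable fixed, each additional 1000 lbs of weight is associated with about 3.88 fewer miles per gallon, and each additional unit of horsepower with about 0.032 fewer. The model explains about 82.7% of the variation in mpg (R-squared = 0.8268), with an in-sample RMSE of about 2.47 mpg.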