Practicum Report 2: Statistical Computing B
Undergraduate Program in Statistics | Department of Statistics
Faculty of Science, Technology, and Mathematics, Universitas Brawijaya
Assistants: Muhammad Ad'hiya Hartono · M Adika Rosyad Bhakti Putra · Joyce Abigail Gracia Zebua · Ghina Azziqra
The dataset comes from an open source (Kaggle) and contains car specifications and prices. The first 50 observations were taken from it, with 4 variables:
- price: dependent variable (Y)
- horsepower: X₁, engine power (hp)
- carheight: X₂, car height (inch)
- citympg: X₃, city fuel efficiency (mpg)
Multiple linear regression is a statistical method for modelling the relationship between one dependent variable (Y) and two or more independent variables (X). The general model is:
Y = β₀ + β₁X₁ + β₂X₂ + … + βₖXₖ + ε
The coefficients are estimated by Ordinary Least Squares (OLS), which minimizes the sum of squared residuals (Gujarati & Porter, 2009).
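To make the OLS step concrete, the sketch below solves the normal equations (X'X)b = X'y directly and compares the result with `lm()`. It is only an illustration: to keep the block short it repeats just the first 10 rows of the report's data in a data frame named `cars10` (a name introduced here), so the coefficients will not match the 50-observation model.

```r
# First 10 rows of the report's data, repeated so the block runs on its own.
cars10 <- data.frame(
  price      = c(13495, 16500, 16500, 13950, 17450, 15250, 17710, 18920, 23875, 17859.17),
  horsepower = c(111, 111, 154, 102, 115, 110, 110, 110, 140, 160),
  carheight  = c(48.8, 48.8, 52.4, 54.3, 54.3, 53.1, 55.7, 55.7, 55.9, 52.0),
  citympg    = c(21, 21, 19, 24, 18, 19, 19, 19, 17, 16)
)

# Design matrix with an intercept column, and the response vector.
X <- cbind(Intercept = 1, as.matrix(cars10[, c("horsepower", "carheight", "citympg")]))
y <- cars10$price

# Normal equations: solve (X'X) b = X'y for b.
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Residual sum of squares at the OLS solution (the quantity OLS minimizes).
rss <- sum((y - X %*% beta_hat)^2)

# Cross-check against R's built-in fit.
fit <- lm(price ~ horsepower + carheight + citympg, data = cars10)
print(cbind(normal_eq = drop(beta_hat), lm = coef(fit)))
```

In practice `lm()` uses a QR decomposition rather than the normal equations, which is numerically more stable; the direct solve is shown only because it mirrors the textbook definition.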
| Assumption | Test | Decision rule |
|---|---|---|
| Normality | Shapiro-Wilk | p-value > 0.05 → normal |
| Homoskedasticity | Breusch-Pagan | p-value > 0.05 → homoskedastic |
| No autocorrelation | Durbin-Watson | p-value > 0.05 → no autocorrelation |
| No multicollinearity | VIF | VIF < 10 → no problem |
# Import data: replace the path with the location of your CSV file
# car_price <- read.csv(file.choose())
# Data entered inline to match the report (50 observations)
car_price <- data.frame(
price = c(13495,16500,16500,13950,17450,15250,17710,18920,23875,17859.17,
16430,16925,20970,21105,24565,30760,41315,36880,5151,6295,
6575,5572,6377,7957,6229,6692,7609,8558,8921,12964,
6479,6855,5399,6529,7129,7295,7295,7895,9095,8845,
10295,12945,10345,6785,8916.5,8916.5,11048,32250,35550,36000),
horsepower = c(111,111,154,102,115,110,110,110,140,160,
101,101,121,121,121,182,182,182,48,70,
70,68,68,102,68,68,68,102,88,145,
58,76,60,76,76,76,76,86,86,86,
86,101,100,78,70,70,90,176,176,262),
carheight = c(48.8,48.8,52.4,54.3,54.3,53.1,55.7,55.7,55.9,52.0,
54.3,54.3,54.3,54.3,55.7,55.7,53.7,56.3,53.2,52.0,
52.0,50.8,50.8,50.8,50.6,50.6,50.6,50.6,59.8,50.2,
50.8,50.8,52.6,52.6,52.6,54.5,58.3,53.3,53.3,54.1,
54.1,54.1,51.0,53.5,52.0,52.0,51.4,52.8,52.8,47.8),
citympg = c(21,21,19,24,18,19,19,19,17,16,
23,23,21,21,20,16,16,15,47,38,
38,37,31,24,31,31,31,24,24,19,
49,31,38,30,30,30,30,27,27,27,
27,24,25,24,38,38,24,15,15,13)
)
data_mobil <- car_price[1:50, c("price","horsepower","carheight","citympg")]

| price | horsepower | carheight | citympg |
|---|---|---|---|
| 13495.00 | 111 | 48.8 | 21 |
| 16500.00 | 111 | 48.8 | 21 |
| 16500.00 | 154 | 52.4 | 19 |
| 13950.00 | 102 | 54.3 | 24 |
| 17450.00 | 115 | 54.3 | 18 |
| 15250.00 | 110 | 53.1 | 19 |
| 17710.00 | 110 | 55.7 | 19 |
| 18920.00 | 110 | 55.7 | 19 |
| 23875.00 | 140 | 55.9 | 17 |
| 17859.17 | 160 | 52.0 | 16 |
| 16430.00 | 101 | 54.3 | 23 |
| 16925.00 | 101 | 54.3 | 23 |
| 20970.00 | 121 | 54.3 | 21 |
| 21105.00 | 121 | 54.3 | 21 |
| 24565.00 | 121 | 55.7 | 20 |
| 30760.00 | 182 | 55.7 | 16 |
| 41315.00 | 182 | 53.7 | 16 |
| 36880.00 | 182 | 56.3 | 15 |
| 5151.00 | 48 | 53.2 | 47 |
| 6295.00 | 70 | 52.0 | 38 |
| 6575.00 | 70 | 52.0 | 38 |
| 5572.00 | 68 | 50.8 | 37 |
| 6377.00 | 68 | 50.8 | 31 |
| 7957.00 | 102 | 50.8 | 24 |
| 6229.00 | 68 | 50.6 | 31 |
| 6692.00 | 68 | 50.6 | 31 |
| 7609.00 | 68 | 50.6 | 31 |
| 8558.00 | 102 | 50.6 | 24 |
| 8921.00 | 88 | 59.8 | 24 |
| 12964.00 | 145 | 50.2 | 19 |
| 6479.00 | 58 | 50.8 | 49 |
| 6855.00 | 76 | 50.8 | 31 |
| 5399.00 | 60 | 52.6 | 38 |
| 6529.00 | 76 | 52.6 | 30 |
| 7129.00 | 76 | 52.6 | 30 |
| 7295.00 | 76 | 54.5 | 30 |
| 7295.00 | 76 | 58.3 | 30 |
| 7895.00 | 86 | 53.3 | 27 |
| 9095.00 | 86 | 53.3 | 27 |
| 8845.00 | 86 | 54.1 | 27 |
| 10295.00 | 86 | 54.1 | 27 |
| 12945.00 | 101 | 54.1 | 24 |
| 10345.00 | 100 | 51.0 | 25 |
| 6785.00 | 78 | 53.5 | 24 |
| 8916.50 | 70 | 52.0 | 38 |
| 8916.50 | 70 | 52.0 | 38 |
| 11048.00 | 90 | 51.4 | 24 |
| 32250.00 | 176 | 52.8 | 15 |
| 35550.00 | 176 | 52.8 | 15 |
| 36000.00 | 262 | 47.8 | 13 |
## price horsepower carheight citympg
## Min. : 5151 Min. : 48.0 Min. :47.80 Min. :13.00
## 1st Qu.: 7170 1st Qu.: 76.0 1st Qu.:50.85 1st Qu.:19.00
## Median :10320 Median :100.5 Median :52.80 Median :24.00
## Mean :14305 Mean :105.3 Mean :52.92 Mean :25.70
## 3rd Qu.:17645 3rd Qu.:119.5 3rd Qu.:54.30 3rd Qu.:30.75
## Max. :41315 Max. :262.0 Max. :59.80 Max. :49.00
| Statistic | price (USD) | horsepower (hp) | carheight (inch) | citympg (mpg) |
|---|---|---|---|---|
| Min | 5,151 | 48 | 47.80 | 13 |
| Median | 10,320 | 100.5 | 52.80 | 24 |
| Mean | 14,305 | 105.3 | 52.92 | 25.7 |
| Max | 41,315 | 262 | 59.80 | 49 |
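The plotting code in this report calls `theme_pink()`, whose definition is not shown. The sketch below is a hypothetical stand-in so that the plots can be run; the colours are borrowed from the hex codes used in the plotting calls, and the actual theme used by the author may look different.

```r
library(ggplot2)

# Hypothetical stand-in for the report's theme_pink(); the real definition is not shown.
theme_pink <- function(base_size = 12) {
  theme_minimal(base_size = base_size) +
    theme(
      plot.title       = element_text(face = "bold", colour = "#8b3a52"),
      plot.subtitle    = element_text(colour = "#b5657a"),
      axis.title       = element_text(colour = "#8b3a52"),
      panel.grid.minor = element_blank(),
      panel.grid.major = element_line(colour = "#ffd6e7")
    )
}

# Quick check on a built-in data set (mtcars is only a placeholder here).
p <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(color = "#e8658a") +
  theme_pink()
```

Defining the theme once as a function keeps every figure visually consistent and lets the styling be changed in a single place.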
library(ggplot2)
ggplot(data_mobil, aes(x = horsepower, y = price)) +
  geom_point(color = "#e8658a", size = 3, alpha = 0.8, shape = 19) +
  geom_smooth(method = "lm", se = TRUE,
              color = "#8b3a52", fill = "#ffd6e7", alpha = 0.35, linewidth = 1.2) +
  labs(title = "Horsepower vs Price",
       subtitle = "Strong positive relationship between engine power and price",
       x = "Horsepower (hp)", y = "Price (USD)") +
  scale_y_continuous(labels = scales::comma) +
  theme_pink()
Figure 1. Scatter plot of horsepower vs price
ggplot(data_mobil, aes(x = carheight, y = price)) +
  geom_point(color = "#c94c70", size = 3, alpha = 0.8, shape = 19) +
  geom_smooth(method = "lm", se = TRUE,
              color = "#8b3a52", fill = "#ffd6e7", alpha = 0.35, linewidth = 1.2) +
  labs(title = "Car Height vs Price",
       subtitle = "Weak, randomly scattered relationship",
       x = "Car Height (inch)", y = "Price (USD)") +
  scale_y_continuous(labels = scales::comma) +
  theme_pink()
Figure 2. Scatter plot of car height vs price
ggplot(data_mobil, aes(x = citympg, y = price)) +
  geom_point(color = "#b5657a", size = 3, alpha = 0.8, shape = 19) +
  geom_smooth(method = "lm", se = TRUE,
              color = "#8b3a52", fill = "#ffd6e7", alpha = 0.35, linewidth = 1.2) +
  labs(title = "City MPG vs Price",
       subtitle = "Negative relationship: more fuel-efficient cars tend to be cheaper",
       x = "City MPG (mpg)", y = "Price (USD)") +
  scale_y_continuous(labels = scales::comma) +
  theme_pink()
Figure 3. Scatter plot of city MPG vs price
Interpretation of the visual exploration:
- Horsepower: strong positive linear relationship with car price
- City MPG: fairly clear negative linear relationship
- Car Height: points scattered at random, weak relationship with price
Ŷ = β₀ + β₁X₁ + β₂X₂ + β₃X₃
where Y = price (Ŷ is its fitted value), X₁ = horsepower, X₂ = carheight, X₃ = citympg
# Fit the multiple linear regression (formula taken from the Call in the output;
# the object name model_regresi is the one used by the diagnostic code later on)
model_regresi <- lm(price ~ horsepower + carheight + citympg, data = data_mobil)
model_regresi
##
## Call:
## lm(formula = price ~ horsepower + carheight + citympg, data = data_mobil)
##
## Coefficients:
## (Intercept) horsepower carheight citympg
## -51922.13 214.87 785.07 80.14
Fitted model equation:
Ŷ = −51922.13 + 214.87 X₁ + 785.07 X₂ + 80.14 X₃
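Interpretation of the coefficients:
- Intercept (−51922.13): the predicted price when all predictors equal zero; it has no practical meaning for real cars.
- Horsepower (214.87): each additional hp is associated with a predicted price that is 214.87 USD higher, holding the other predictors constant.
- Car height (785.07): each additional inch of height is associated with a predicted price that is 785.07 USD higher, holding the other predictors constant.
- City MPG (80.14): each additional mpg is associated with a predicted price that is 80.14 USD higher, holding the other predictors constant.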
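As a worked example of reading the fitted equation, the sketch below plugs the first observation in the data (111 hp, 48.8 inch, 21 mpg, observed price 13495) into the reported coefficients. The helper name `predict_price` is introduced here for illustration. Because the coefficients are rounded to two decimals, the result (about 11,922.80) is close to, but not exactly, what `predict()` on the fitted model would return.

```r
# Coefficients as reported in the fitted equation (rounded to two decimals).
b <- c(intercept = -51922.13, horsepower = 214.87, carheight = 785.07, citympg = 80.14)

# Predicted price for given predictor values (hypothetical helper).
predict_price <- function(horsepower, carheight, citympg) {
  unname(b["intercept"] + b["horsepower"] * horsepower +
           b["carheight"] * carheight + b["citympg"] * citympg)
}

# First observation in the data: 111 hp, 48.8 inch, 21 mpg, observed price 13495.
y_hat <- predict_price(111, 48.8, 21)
residual_1 <- 13495 - y_hat
print(c(predicted = y_hat, residual = residual_1))

# Effect of one extra horsepower with the other predictors held fixed:
# the difference equals the horsepower coefficient.
hp_effect <- predict_price(112, 48.8, 21) - y_hat
print(hp_effect)
```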
##
## Call:
## lm(formula = price ~ horsepower + carheight + citympg, data = data_mobil)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7327.8 -1550.5 208.1 1949.1 10690.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -51922.13 15219.81 -3.411 0.00136 **
## horsepower 214.87 22.60 9.506 1.99e-12 ***
## carheight 785.07 240.98 3.258 0.00211 **
## citympg 80.14 117.37 0.683 0.49820
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3728 on 46 degrees of freedom
## Multiple R-squared: 0.8548, Adjusted R-squared: 0.8453
## F-statistic: 90.23 on 3 and 46 DF, p-value: < 2.2e-16
| Statistic | Value |
|---|---|
| F-statistic | 90.23 |
| p-value | < 2.2 × 10⁻¹⁶ |
| Decision | Reject H₀ |

Because the p-value < α (0.05), the three variables jointly have a significant effect on car price.
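The overall F-statistic can be recovered from the reported R², the sample size, and the number of predictors, which is a useful consistency check on the summary output. The sketch below uses only values printed in the report; the small gap from 90.23 comes from R² being rounded to four decimals.

```r
r2 <- 0.8548   # Multiple R-squared from the summary output
n  <- 50       # number of observations
k  <- 3        # number of predictors

# F = (R^2 / k) / ((1 - R^2) / (n - k - 1))
f_stat <- (r2 / k) / ((1 - r2) / (n - k - 1))

# Upper-tail probability of the F distribution with (k, n - k - 1) degrees of freedom.
p_val <- pf(f_stat, df1 = k, df2 = n - k - 1, lower.tail = FALSE)
print(c(F = f_stat, p_value = p_val))
```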
| Variable | Coefficient | p-value | Decision |
|---|---|---|---|
| Horsepower (X₁) | 214.87 | 1.99 × 10⁻¹² | Significant |
| Car Height (X₂) | 785.07 | 0.00211 | Significant |
| City MPG (X₃) | 80.14 | 0.49820 | Not significant |
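Each partial test divides a coefficient by its standard error and compares the result with a t distribution on the residual degrees of freedom (46 here). The sketch below redoes that arithmetic from the estimates and standard errors printed in the summary output; since those inputs are rounded, the p-values agree with the report only approximately.

```r
# Estimates and standard errors copied from the summary output.
est <- c(horsepower = 214.87, carheight = 785.07, citympg = 80.14)
se  <- c(horsepower = 22.60,  carheight = 240.98, citympg = 117.37)
df_resid <- 46  # residual degrees of freedom: 50 - 3 - 1

# t statistic and two-sided p-value for H0: coefficient = 0.
t_val <- est / se
p_val <- 2 * pt(abs(t_val), df = df_resid, lower.tail = FALSE)

print(data.frame(estimate = est, std_error = se, t = round(t_val, 3),
                 p = signif(p_val, 3), significant = p_val < 0.05))
```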
Goodness of fit (R-squared):
- Multiple R² = 0.8548
- Adjusted R² = 0.8453
Based on the adjusted R², the model explains about 84.53% of the variation in car price.
The remaining 15.47% is due to factors outside the model.
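The adjusted R² penalizes the multiple R² for the number of predictors. A minimal sketch of the adjustment, using only the values reported above:

```r
r2 <- 0.8548  # Multiple R-squared from the summary output
n  <- 50      # number of observations
k  <- 3       # number of predictors

# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(adj_r2, 4))
```

The gap between the two values is small here because three predictors are few relative to 50 observations.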
##
## Shapiro-Wilk normality test
##
## data: residuals(model_regresi)
## W = 0.95939, p-value = 0.08383
# Q-Q plot of the residuals
residuals_df <- data.frame(residuals = residuals(model_regresi))
ggplot(residuals_df, aes(sample = residuals)) +
  stat_qq(color = "#e8658a", size = 2.5, alpha = 0.8) +
  stat_qq_line(color = "#8b3a52", linewidth = 1.2, linetype = "dashed") +
  labs(title = "Normal Q-Q Plot Residual",
       subtitle = "Points follow the line: residuals are approximately normal",
       x = "Theoretical Quantiles", y = "Sample Quantiles") +
  theme_pink()
Figure 4. Q-Q plot of the residuals
| Statistic | Value |
|---|---|
| W | 0.95939 |
| p-value | 0.08383 |
| Decision | Fail to reject H₀ |

MET: because the p-value (0.084) > 0.05, H₀ is not rejected and the residuals can be treated as normally distributed.
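The normality check amounts to running `shapiro.test()` on the model residuals and comparing the p-value with 0.05. The sketch below shows that pattern on a self-contained subset: only the first 10 rows of the report's data are repeated (in a data frame named `cars10`, a name introduced here), so W and the p-value will differ from the values reported for the full model.

```r
# First 10 rows of the report's data, repeated so the block runs on its own.
cars10 <- data.frame(
  price      = c(13495, 16500, 16500, 13950, 17450, 15250, 17710, 18920, 23875, 17859.17),
  horsepower = c(111, 111, 154, 102, 115, 110, 110, 110, 140, 160),
  carheight  = c(48.8, 48.8, 52.4, 54.3, 54.3, 53.1, 55.7, 55.7, 55.9, 52.0),
  citympg    = c(21, 21, 19, 24, 18, 19, 19, 19, 17, 16)
)
fit <- lm(price ~ horsepower + carheight + citympg, data = cars10)

# H0: the residuals come from a normal distribution.
sw <- shapiro.test(residuals(fit))

# Decision rule used in the report: p-value > 0.05 means H0 is not rejected.
decision <- if (sw$p.value > 0.05) "fail to reject H0" else "reject H0"
print(sw)
print(decision)
```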
##
## studentized Breusch-Pagan test
##
## data: model_regresi
## BP = 18.505, df = 3, p-value = 0.0003459
fitted_df <- data.frame(
  fitted = fitted(model_regresi),
  residuals = residuals(model_regresi)
)
ggplot(fitted_df, aes(x = fitted, y = residuals)) +
  geom_point(color = "#e8658a", size = 2.5, alpha = 0.8) +
  geom_hline(yintercept = 0, color = "#8b3a52",
             linewidth = 1.1, linetype = "dashed") +
  geom_smooth(se = FALSE, color = "#c94c70",
              linewidth = 0.9, method = "loess") +
  labs(title = "Residuals vs Fitted Values",
       subtitle = "Fanning-out pattern: indication of heteroskedasticity",
       x = "Fitted Values", y = "Residuals") +
  scale_x_continuous(labels = scales::comma) +
  theme_pink()
Figure 5. Residuals vs fitted values
| Statistic | Value |
|---|---|
| BP | 18.505 |
| p-value | 0.0003459 |
| Decision | Reject H₀ |

NOT MET: because the p-value (0.0003) < 0.05, heteroskedasticity is present.
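The studentized Breusch-Pagan statistic can be computed by hand: regress the squared residuals on the same predictors, then take n times the R² of that auxiliary regression and compare it with a chi-squared distribution whose degrees of freedom equal the number of predictors. The sketch below does this on the first 10 rows of the report's data only (`cars10` is a name introduced here), so its BP value differs from the report; the last lines separately check the reported p-value from the reported statistic.

```r
# First 10 rows of the report's data, repeated so the block runs on its own.
cars10 <- data.frame(
  price      = c(13495, 16500, 16500, 13950, 17450, 15250, 17710, 18920, 23875, 17859.17),
  horsepower = c(111, 111, 154, 102, 115, 110, 110, 110, 140, 160),
  carheight  = c(48.8, 48.8, 52.4, 54.3, 54.3, 53.1, 55.7, 55.7, 55.9, 52.0),
  citympg    = c(21, 21, 19, 24, 18, 19, 19, 19, 17, 16)
)
fit <- lm(price ~ horsepower + carheight + citympg, data = cars10)

# Auxiliary regression: squared residuals on the same predictors.
cars10$e2 <- residuals(fit)^2
aux <- lm(e2 ~ horsepower + carheight + citympg, data = cars10)

# Studentized BP statistic: n * R^2 of the auxiliary regression, df = number of predictors.
bp    <- nrow(cars10) * summary(aux)$r.squared
bp_df <- 3
bp_p  <- pchisq(bp, df = bp_df, lower.tail = FALSE)
print(c(BP = bp, df = bp_df, p_value = bp_p))

# Consistency check on the report's own numbers: BP = 18.505 with df = 3.
p_report <- pchisq(18.505, df = 3, lower.tail = FALSE)
print(p_report)
```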
> Solution: transform the variables or use Weighted Least Squares (WLS).
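One common way to apply WLS when the error variances are unknown is to estimate them first: regress the log of the squared OLS residuals on the predictors and use the inverse of the fitted variances as weights. The sketch below follows that recipe, which is an assumption of this example rather than a method stated in the report. It runs on the first 10 rows of the report's data only (`cars10` is a name introduced here), so the estimates are purely illustrative.

```r
# First 10 rows of the report's data, repeated so the block runs on its own.
cars10 <- data.frame(
  price      = c(13495, 16500, 16500, 13950, 17450, 15250, 17710, 18920, 23875, 17859.17),
  horsepower = c(111, 111, 154, 102, 115, 110, 110, 110, 140, 160),
  carheight  = c(48.8, 48.8, 52.4, 54.3, 54.3, 53.1, 55.7, 55.7, 55.9, 52.0),
  citympg    = c(21, 21, 19, 24, 18, 19, 19, 19, 17, 16)
)

# Step 1: ordinary least squares fit.
ols <- lm(price ~ horsepower + carheight + citympg, data = cars10)

# Step 2: model the error variance; the log keeps fitted variances positive.
cars10$log_e2 <- log(residuals(ols)^2)
var_fit <- lm(log_e2 ~ horsepower + carheight + citympg, data = cars10)

# Step 3: weights are the inverse of the estimated variances.
cars10$w <- 1 / exp(fitted(var_fit))

# Step 4: refit with weights, down-weighting high-variance observations.
wls <- lm(price ~ horsepower + carheight + citympg, data = cars10, weights = w)
print(cbind(OLS = coef(ols), WLS = coef(wls)))
```

After refitting, the Breusch-Pagan test would need to be rerun on the weighted model to see whether the remedy worked.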
##
## Durbin-Watson test
##
## data: model_regresi
## DW = 1.612, p-value = 0.052
## alternative hypothesis: true autocorrelation is greater than 0
| Statistic | Value |
|---|---|
| DW | 1.612 |
| p-value | 0.052 |
| Decision | Fail to reject H₀ |

MET: because the p-value (0.052) > 0.05, there is no significant autocorrelation, although the margin is narrow.
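The Durbin-Watson statistic itself is simple to compute: the sum of squared differences between consecutive residuals divided by the sum of squared residuals. Values near 2 suggest no first-order autocorrelation. The sketch below computes it on the first 10 rows of the report's data only (`cars10` is a name introduced here), and then uses the approximation DW ≈ 2(1 − ρ) to read the reported DW = 1.612 as an implied first-order correlation.

```r
# First 10 rows of the report's data, repeated so the block runs on its own.
cars10 <- data.frame(
  price      = c(13495, 16500, 16500, 13950, 17450, 15250, 17710, 18920, 23875, 17859.17),
  horsepower = c(111, 111, 154, 102, 115, 110, 110, 110, 140, 160),
  carheight  = c(48.8, 48.8, 52.4, 54.3, 54.3, 53.1, 55.7, 55.7, 55.9, 52.0),
  citympg    = c(21, 21, 19, 24, 18, 19, 19, 19, 17, 16)
)
fit <- lm(price ~ horsepower + carheight + citympg, data = cars10)

# DW = sum of squared successive differences / sum of squared residuals.
e  <- residuals(fit)
dw <- sum(diff(e)^2) / sum(e^2)
print(dw)

# Approximate first-order autocorrelation implied by the reported DW = 1.612.
rho_report <- 1 - 1.612 / 2
print(rho_report)
```

The statistic depends on the order of the rows, so it is most meaningful when the observations have a natural ordering.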
## horsepower carheight citympg
## 3.234877 1.125351 3.374615
# VIF values copied from the output above
vif_values <- c(horsepower = 3.234877, carheight = 1.125351, citympg = 3.374615)
vif_df <- data.frame(
  variabel = names(vif_values),
  vif = vif_values
)
ggplot(vif_df, aes(x = reorder(variabel, vif), y = vif, fill = variabel)) +
  geom_col(width = 0.55, show.legend = FALSE,
           color = "#c94c70", linewidth = 0.6) +
  geom_hline(yintercept = 10, color = "#721c24",
             linetype = "dashed", linewidth = 1) +
  geom_text(aes(label = round(vif, 2)), hjust = -0.2,
            color = "#8b3a52", fontface = "bold", size = 4.5) +
  scale_fill_manual(values = c("#ffb3d1","#e8658a","#c94c70")) +
  scale_y_continuous(limits = c(0, 12)) +
  coord_flip() +
  labs(title = "Variance Inflation Factor (VIF)",
       subtitle = "All VIF values are well below the threshold of 10",
       x = NULL, y = "VIF") +
  annotate("text", x = 0.6, y = 10.3, label = "VIF threshold = 10",
           color = "#721c24", size = 3.5, fontface = "italic") +
  theme_pink()
Figure 6. Visualization of the VIF values
| Variable | VIF | Status |
|---|---|---|
| Horsepower | 3.235 | OK |
| Car Height | 1.125 | OK |
| City MPG | 3.375 | OK |

MET: all VIF values are < 10, so there is no multicollinearity problem.
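For reference, the VIF of a predictor is 1 / (1 − R²), where R² comes from regressing that predictor on the remaining predictors. The sketch below computes it this way and cross-checks the result against the diagonal of the inverse correlation matrix, which gives the same quantity. It uses the first 10 rows of the report's data only (`cars10` is a name introduced here), so the values differ from those in the table.

```r
# First 10 rows of the report's predictors, repeated so the block runs on its own.
cars10 <- data.frame(
  horsepower = c(111, 111, 154, 102, 115, 110, 110, 110, 140, 160),
  carheight  = c(48.8, 48.8, 52.4, 54.3, 54.3, 53.1, 55.7, 55.7, 55.9, 52.0),
  citympg    = c(21, 21, 19, 24, 18, 19, 19, 19, 17, 16)
)
predictors <- names(cars10)

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing predictor j on the others.
vif_manual <- sapply(predictors, function(v) {
  others <- setdiff(predictors, v)
  aux <- lm(reformulate(others, response = v), data = cars10)
  1 / (1 - summary(aux)$r.squared)
})

# Equivalent shortcut: diagonal of the inverse of the predictor correlation matrix.
vif_cor <- diag(solve(cor(cars10)))
print(rbind(manual = vif_manual, inverse_cor = vif_cor))
```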
1. Data Exploration
Horsepower has a strong positive linear relationship with price; City MPG has a negative relationship; Car Height shows a weak relationship.
2. Regression Model
Ŷ = −51922.13 + 214.87X₁ + 785.07X₂ + 80.14X₃
Horsepower and carheight have a positive effect on price.
3. Significance Tests
- F-test: the three predictors are jointly significant (p < 2.2 × 10⁻¹⁶)
- t-test: only horsepower and carheight are individually significant; citympg is not significant (p = 0.498)
- Adjusted R² = 0.8453: the model explains 84.53% of the variation in price
4. Assumption Tests
| Assumption | Result | Status |
|---|---|---|
| Normality | p = 0.084 > 0.05 | Met |
| Homoskedasticity | p = 0.0003 < 0.05 | Not met |
| No autocorrelation | p = 0.052 > 0.05 | Met |
| No multicollinearity | VIF < 10 | Met |

The model needs a remedy for the violated homoskedasticity assumption, through variable transformation or Weighted Least Squares (WLS).