Analysis Task
Using the same models and syntax, analyze the relationship between Ozone and Temp.
Repeat the analysis for the relationship between Ozone and Wind.
Compare the models using MSE, AIC, and adjusted R-squared.
Give your insights and opinion.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(splines)
df_airquality <- datasets::airquality
head(df_airquality)
## Ozone Solar.R Wind Temp Month Day
## 1 41 190 7.4 67 5 1
## 2 36 118 8.0 72 5 2
## 3 12 149 12.6 74 5 3
## 4 18 313 11.5 62 5 4
## 5 NA NA 14.3 56 5 5
## 6 28 NA 14.9 66 5 6
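# Replace NA in the Ozone column with the median
# Note: these assignments create and modify a separate copy named `airquality`;
# `df_airquality`, which the models below use, is left unchanged
airquality$Ozone[is.na(airquality$Ozone)] <- median(airquality$Ozone, na.rm = TRUE)
# Replace NA in the Solar.R column with the median
airquality$Solar.R[is.na(airquality$Solar.R)] <- median(airquality$Solar.R, na.rm = TRUE)
# Check again whether any missing values remain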
colSums(is.na(airquality))
## Ozone Solar.R Wind Temp Month Day
## 0 0 0 0 0 0
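The imputation above acts on `airquality`, while the models are fit on `df_airquality`, which is why the summaries below still report 37 observations deleted due to missingness. The following is a minimal sketch of the difference, assuming one wanted the imputed values to reach the model. The names `df_raw` and `df_imputed` are hypothetical and are not used elsewhere in this report.

```r
# Keep the raw data and an explicitly imputed copy side by side.
df_raw <- datasets::airquality
df_imputed <- df_raw

# Median imputation for the two columns that contain missing values.
for (col in c("Ozone", "Solar.R")) {
  med <- median(df_imputed[[col]], na.rm = TRUE)
  df_imputed[[col]][is.na(df_imputed[[col]])] <- med
}

colSums(is.na(df_raw))      # Ozone and Solar.R still contain NAs
colSums(is.na(df_imputed))  # no NAs left

# lm() drops incomplete rows, so the two fits use different sample sizes.
n_raw <- nobs(lm(Ozone ~ Temp, data = df_raw))
n_imputed <- nobs(lm(Ozone ~ Temp, data = df_imputed))
c(raw = n_raw, imputed = n_imputed)
```

Whether imputing the response variable is appropriate is a separate question. Filling Ozone with a single median value flattens its relationship with the predictors, so dropping the incomplete rows, as `lm()` does by default, is a defensible choice.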
mod_linear = lm(Ozone~Temp,data=df_airquality)
summary(mod_linear)
##
## Call:
## lm(formula = Ozone ~ Temp, data = df_airquality)
##
## Residuals:
## Min 1Q Median 3Q Max
## -40.729 -17.409 -0.587 11.306 118.271
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -146.9955 18.2872 -8.038 9.37e-13 ***
## Temp 2.4287 0.2331 10.418 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 23.71 on 114 degrees of freedom
## (37 observations deleted due to missingness)
## Multiple R-squared: 0.4877, Adjusted R-squared: 0.4832
## F-statistic: 108.5 on 1 and 114 DF, p-value: < 2.2e-16
Interpretation.
The regression coefficient for Temp is 2.4287 with p < 0.001, so temperature has a significant effect on the Ozone level: each 1 °F increase in Temp is associated with an increase of about 2.43 units in Ozone. R² = 0.4877 means that 48.8% of the variation in Ozone is explained by Temp. Overall, there is a positive and significant relationship between temperature and Ozone, of moderate strength (R² ≈ 0.49).
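To make the slope concrete, the fitted line can be evaluated at two temperatures 10 °F apart. The difference between the two predictions equals ten times the slope. This sketch refits the same model under the hypothetical name `fit_temp`, and the temperatures 70 and 80 are arbitrary illustration values.

```r
# Refit the simple linear model (rows with missing Ozone are dropped by lm()).
fit_temp <- lm(Ozone ~ Temp, data = datasets::airquality)

# Predicted Ozone at two temperatures that are 10 degrees F apart.
new_temps <- data.frame(Temp = c(70, 80))
pred <- predict(fit_temp, newdata = new_temps)
pred

# The change in predicted Ozone for +10 degrees F ...
change_10 <- unname(diff(pred))
# ... is exactly ten times the estimated slope.
slope_10 <- 10 * coef(fit_temp)[["Temp"]]
c(change_10 = change_10, slope_10 = slope_10)
```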
ggplot(df_airquality,aes(x=Temp, y=Ozone)) +
geom_point(alpha=0.55, color="black") +
stat_smooth(method = "lm",
formula = y~x,lty = 1, col = "blue",se = F)+
theme_bw()
## Warning: Removed 37 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 37 rows containing missing values or values outside the scale range
## (`geom_point()`).
Interpretation.
The points show an upward pattern, so the higher the temperature, the higher the Ozone level.
The blue line is the linear regression line and shows the overall positive trend. However, the residual spread is fairly large, as seen from several points far from the line, which means that factors other than temperature also affect Ozone.
mod_tangga = lm(Ozone ~ cut(Temp,5),data=df_airquality)
summary(mod_tangga)
##
## Call:
## lm(formula = Ozone ~ cut(Temp, 5), data = df_airquality)
##
## Residuals:
## Min 1Q Median 3Q Max
## -44.641 -13.488 -4.552 7.810 114.359
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.167 6.649 1.980 0.0501 .
## cut(Temp, 5)(64.2,72.4] 9.024 8.335 1.083 0.2813
## cut(Temp, 5)(72.4,80.6] 14.385 7.906 1.820 0.0715 .
## cut(Temp, 5)(80.6,88.8] 40.474 7.603 5.323 5.37e-07 ***
## cut(Temp, 5)(88.8,97] 78.300 8.920 8.778 2.29e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 23.03 on 111 degrees of freedom
## (37 observations deleted due to missingness)
## Multiple R-squared: 0.5295, Adjusted R-squared: 0.5125
## F-statistic: 31.22 on 4 and 111 DF, p-value: < 2.2e-16
Interpretation:
In this regression, temperature is divided into 5 intervals and each interval is compared with the lowest one. At low temperatures (up to 64.2 °F) the mean Ozone level is about 13 units (intercept 13.167). The 64.2–72.4 °F and 72.4–80.6 °F intervals do not differ significantly from this baseline at the 5% level, whereas the 80.6–88.8 °F and 88.8–97 °F intervals are significantly higher, by about 40.5 and 78.3 units. In other words, compared with the low-temperature category, Ozone rises significantly only at high temperatures. R² = 0.5295, so the model explains about 53% of the variation in Ozone.
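The coefficients of a step-function model are simply differences between bin means. The sketch below checks this directly: the intercept should equal the mean Ozone in the lowest temperature bin, and intercept plus each coefficient should equal the mean in the corresponding bin. The names `df_step`, `temp_bin`, and `fit_step` are hypothetical. The bins are cut on all Temp values before rows with missing Ozone are removed, which is assumed to mirror how `cut(Temp, 5)` is evaluated inside the model formula.

```r
df_step <- datasets::airquality
# Cut Temp into 5 equal-width bins using every row, as the formula does.
df_step$temp_bin <- cut(df_step$Temp, 5)
# Keep only the rows that lm() would actually use.
df_step <- df_step[!is.na(df_step$Ozone), ]

# Mean Ozone within each temperature bin.
bin_means <- tapply(df_step$Ozone, df_step$temp_bin, mean)

# Step-function regression on the precomputed bins.
fit_step <- lm(Ozone ~ temp_bin, data = df_step)
b <- coef(fit_step)

# Intercept = mean of the lowest bin; intercept + coefficient = mean of that bin.
from_coefs <- b[1] + c(0, b[-1])
cbind(bin_mean = bin_means, from_coefs = unname(from_coefs))
```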
ggplot(df_airquality,aes(x=Temp, y=Ozone)) +
geom_point(alpha=0.55, color="black") +
stat_smooth(method = "lm",
formula = y~cut(x,5),
lty = 1, col = "blue",se = F)+
theme_bw()
## Warning: Removed 37 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 37 rows containing missing values or values outside the scale range
## (`geom_point()`).
Interpretation:
The fit rises in steps across the temperature categories. This captures the variation at high temperatures better than the straight line, because the increase in Ozone is steeper there. However, the step line is also coarser, which reflects the loss of detail within each category.
mod_spline3 = lm(Ozone ~ bs(Temp, knots = c(5, 10, 20, 30, 40)),data=df_airquality)
summary(mod_spline3)
##
## Call:
## lm(formula = Ozone ~ bs(Temp, knots = c(5, 10, 20, 30, 40)),
## data = df_airquality)
##
## Residuals:
## Min 1Q Median 3Q Max
## -36.364 -12.467 -2.192 10.117 122.636
##
## Coefficients: (5 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 104.88 11.45 9.160 2.86e-15
## bs(Temp, knots = c(5, 10, 20, 30, 40))1 NA NA NA NA
## bs(Temp, knots = c(5, 10, 20, 30, 40))2 NA NA NA NA
## bs(Temp, knots = c(5, 10, 20, 30, 40))3 NA NA NA NA
## bs(Temp, knots = c(5, 10, 20, 30, 40))4 NA NA NA NA
## bs(Temp, knots = c(5, 10, 20, 30, 40))5 -78.81 19.92 -3.956 0.000134
## bs(Temp, knots = c(5, 10, 20, 30, 40))6 -123.11 18.18 -6.772 6.15e-10
## bs(Temp, knots = c(5, 10, 20, 30, 40))7 -47.18 29.40 -1.605 0.111350
## bs(Temp, knots = c(5, 10, 20, 30, 40))8 NA NA NA NA
##
## (Intercept) ***
## bs(Temp, knots = c(5, 10, 20, 30, 40))1
## bs(Temp, knots = c(5, 10, 20, 30, 40))2
## bs(Temp, knots = c(5, 10, 20, 30, 40))3
## bs(Temp, knots = c(5, 10, 20, 30, 40))4
## bs(Temp, knots = c(5, 10, 20, 30, 40))5 ***
## bs(Temp, knots = c(5, 10, 20, 30, 40))6 ***
## bs(Temp, knots = c(5, 10, 20, 30, 40))7
## bs(Temp, knots = c(5, 10, 20, 30, 40))8
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.45 on 112 degrees of freedom
## (37 observations deleted due to missingness)
## Multiple R-squared: 0.549, Adjusted R-squared: 0.5369
## F-statistic: 45.45 on 3 and 112 DF, p-value: < 2.2e-16
mod_spline3ns = lm(Ozone ~ ns(Temp, knots = c(5, 10, 20, 30, 40)),data=df_airquality)
summary(mod_spline3ns)
##
## Call:
## lm(formula = Ozone ~ ns(Temp, knots = c(5, 10, 20, 30, 40)),
## data = df_airquality)
##
## Residuals:
## Min 1Q Median 3Q Max
## -40.729 -17.409 -0.587 11.306 118.271
##
## Coefficients: (5 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 66.46 3.21 20.70 <2e-16
## ns(Temp, knots = c(5, 10, 20, 30, 40))1 NA NA NA NA
## ns(Temp, knots = c(5, 10, 20, 30, 40))2 NA NA NA NA
## ns(Temp, knots = c(5, 10, 20, 30, 40))3 -110.64 10.62 -10.42 <2e-16
## ns(Temp, knots = c(5, 10, 20, 30, 40))4 NA NA NA NA
## ns(Temp, knots = c(5, 10, 20, 30, 40))5 NA NA NA NA
## ns(Temp, knots = c(5, 10, 20, 30, 40))6 NA NA NA NA
##
## (Intercept) ***
## ns(Temp, knots = c(5, 10, 20, 30, 40))1
## ns(Temp, knots = c(5, 10, 20, 30, 40))2
## ns(Temp, knots = c(5, 10, 20, 30, 40))3 ***
## ns(Temp, knots = c(5, 10, 20, 30, 40))4
## ns(Temp, knots = c(5, 10, 20, 30, 40))5
## ns(Temp, knots = c(5, 10, 20, 30, 40))6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 23.71 on 114 degrees of freedom
## (37 observations deleted due to missingness)
## Multiple R-squared: 0.4877, Adjusted R-squared: 0.4832
## F-statistic: 108.5 on 1 and 114 DF, p-value: < 2.2e-16
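Interpretation:
Only three of the eight B-spline basis coefficients are estimated. The other five are NA because the knots (5, 10, 20, 30, 40) all lie below the observed Temp range (56 to 97 °F), which leaves the basis rank-deficient. The fit therefore uses 3 degrees of freedom, the same as a single cubic polynomial with no effective interior knots, so the output does not describe a change in the relationship at any knot. Two of the three estimated coefficients are significant (p < 0.001), and R² = 0.549 is higher than the 0.4877 of the simple linear model, which indicates that allowing curvature in Temp improves the fit. The natural cubic spline with the same knots keeps a single coefficient, and its residuals, R², and F-statistic are identical to those of the linear model, so here it reduces to a straight line. With knots placed inside the data range, a natural cubic spline would generally give a smoother fit with fewer degrees of freedom than a cubic spline, because it is constrained to be linear beyond the boundary knots, which stabilizes the fit at the ends of the data.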
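The NA coefficients in the two summaries above arise because the knots 5, 10, 20, 30, and 40 fall outside the range of Temp. The following sketch shows one common alternative, placing interior knots at the quartiles of the predictor. This is an assumption made for illustration, not the knot choice used in this report, and the names `df_knots`, `knots_q`, `fit_bs_q`, and `fit_ns_q` are hypothetical.

```r
library(splines)

# Complete cases for the two variables involved.
df_knots <- na.omit(datasets::airquality[, c("Ozone", "Temp")])
range(df_knots$Temp)  # knots should fall inside this interval

# Hypothetical alternative: interior knots at the quartiles of Temp.
knots_q <- quantile(df_knots$Temp, probs = c(0.25, 0.5, 0.75))
knots_q

fit_bs_q <- lm(Ozone ~ bs(Temp, knots = knots_q), data = df_knots)
fit_ns_q <- lm(Ozone ~ ns(Temp, knots = knots_q), data = df_knots)

# With knots inside the data range, every basis coefficient is estimable.
anyNA(coef(fit_bs_q))
anyNA(coef(fit_ns_q))

# The natural spline uses fewer coefficients than the cubic B-spline.
c(bs = length(coef(fit_bs_q)), ns = length(coef(fit_ns_q)))
```

Specifying `df` instead of `knots`, as done later with `ns(Temp, df = 3)`, lets the function place the knots at quantiles automatically.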
ggplot(df_airquality,aes(x=Temp, y=Ozone)) +
geom_point(alpha=0.55, color="black") +
stat_smooth(method = "lm",
formula = y~bs(x, knots = c(5, 10, 20, 30, 40)),
lty = 1, aes(col = "Cubic Spline"),se = F)+
stat_smooth(method = "lm",
formula = y~ns(x, knots = c(5, 10, 20, 30, 40)),
lty = 1, aes(col = "Natural Cubic Spline"),se = F)+labs(color="Tipe Spline")+
scale_color_manual(values = c("Natural Cubic Spline"="red","Cubic Spline"="blue"))+theme_bw()
## Warning: Removed 37 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Removed 37 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 37 rows containing missing values or values outside the scale range
## (`geom_point()`).
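Interpretation:
The cubic spline (blue line) is the more flexible curve and bends to follow the steeper rise in Ozone at high temperatures, while the natural cubic spline (red line) is smoother and keeps the trend stable in the tails. With the knots used here, which fall outside the Temp range, the red curve coincides with the straight line of the linear model.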
library(splines)
mod_spline3ns <- lm(Ozone ~ ns(Temp, df = 3), data = df_airquality)
# MSE function
MSE <- function(pred, actual) {
mean((pred - actual)^2, na.rm = TRUE)
}
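# Generate predictions from each model
# Caution: predict() returns 116 fitted values (rows with missing Ozone are dropped),
# while df_airquality$Ozone has 153 entries, so the MSE below recycles values (see warnings)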
pred_linear <- predict(mod_linear)
pred_tangga <- predict(mod_tangga)
pred_spline <- predict(mod_spline3)
pred_nspline <- predict(mod_spline3ns)
compare_stats <- data.frame(
Model = c("Linear","Tangga","Spline","Natural Spline"),
MSE = c(MSE(pred_linear, df_airquality$Ozone),
MSE(pred_tangga, df_airquality$Ozone),
MSE(pred_spline, df_airquality$Ozone),
MSE(pred_nspline, df_airquality$Ozone)),
AIC = c(AIC(mod_linear),
AIC(mod_tangga),
AIC(mod_spline3),
AIC(mod_spline3ns)),
Adj_R2 = c(summary(mod_linear)$adj.r.squared,
summary(mod_tangga)$adj.r.squared,
summary(mod_spline3)$adj.r.squared,
summary(mod_spline3ns)$adj.r.squared)
)
## Warning in pred - actual: longer object length is not a multiple of shorter
## object length
## Warning in pred - actual: longer object length is not a multiple of shorter
## object length
## Warning in pred - actual: longer object length is not a multiple of shorter
## object length
## Warning in pred - actual: longer object length is not a multiple of shorter
## object length
compare_stats
## Model MSE AIC Adj_R2
## 1 Linear 1476.416 1067.706 0.4832134
## 2 Tangga 1365.421 1063.844 0.5125039
## 3 Spline 1452.173 1056.925 0.5369181
## 4 Natural Spline 1445.367 1055.294 0.5433818
Interpretation:
The MSE column is not reliable. predict() returns 116 fitted values, because rows with missing Ozone are dropped, while df_airquality$Ozone has 153 entries. R therefore recycles the shorter vector (hence the warnings) and pairs predictions with the wrong observations. The comparison below rests on AIC and adjusted R², which are computed on the 116 rows actually used in the fits.
A. Linear
The linear model has the highest AIC (1067.706) and the lowest adjusted R² (0.4832), so it captures the least of the variation in Ozone among the four models.
B. Step function and spline
The step-function model (Tangga) and the cubic spline improve on the linear model, with adjusted R² of 0.5125 and 0.5369 and AIC of 1063.844 and 1056.925, but both remain slightly behind the natural spline.
C. Natural spline
The natural spline (refit with df = 3) gives the best result, with the highest adjusted R² (0.5434) and the lowest AIC (1055.294). It offers the best balance between model complexity and the ability to explain the data.
Overall, AIC and adjusted R² give the same ranking: natural spline, cubic spline, step function, then linear. The gain over the linear model suggests that the relationship between Temp and Ozone is nonlinear, with Ozone rising faster at high temperatures.
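The warnings printed with the comparison table ("longer object length is not a multiple of shorter object length") indicate that the predictions and the observed Ozone values have different lengths. A minimal sketch of an in-sample MSE that avoids the mismatch is given below. It uses only the rows each model was fit on. The function names `mse_fit` and `mse_frame` and the model names `fit_lin` and `fit_ns3` are hypothetical.

```r
library(splines)

df <- datasets::airquality
fit_lin <- lm(Ozone ~ Temp, data = df)
fit_ns3 <- lm(Ozone ~ ns(Temp, df = 3), data = df)

# In-sample MSE from the residuals of the rows actually used in the fit.
mse_fit <- function(model) mean(residuals(model)^2)

# Equivalent form: compare fitted values with the response stored in the
# model frame, which contains only the complete rows.
mse_frame <- function(model) {
  actual <- model.response(model.frame(model))
  mean((fitted(model) - actual)^2)
}

length(residuals(fit_lin))  # number of rows used, not nrow(df)
mse_values <- sapply(list(Linear = fit_lin, NaturalSpline = fit_ns3), mse_fit)
mse_values
```

Because the four models are fit to the same rows, an MSE computed this way orders the models the same way as their residual sums of squares, unlike the recycled version.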
library(tidyverse)
library(splines)
df_airquality <- datasets::airquality
head(df_airquality)
## Ozone Solar.R Wind Temp Month Day
## 1 41 190 7.4 67 5 1
## 2 36 118 8.0 72 5 2
## 3 12 149 12.6 74 5 3
## 4 18 313 11.5 62 5 4
## 5 NA NA 14.3 56 5 5
## 6 28 NA 14.9 66 5 6
# Replace NA in the Ozone column with the median
# Note: as before, this modifies `airquality`, not `df_airquality`
airquality$Ozone[is.na(airquality$Ozone)] <- median(airquality$Ozone, na.rm = TRUE)
# Replace NA in the Solar.R column with the median
airquality$Solar.R[is.na(airquality$Solar.R)] <- median(airquality$Solar.R, na.rm = TRUE)
# Check again whether any missing values remain
colSums(is.na(airquality))
## Ozone Solar.R Wind Temp Month Day
## 0 0 0 0 0 0
mod_linear = lm(Ozone~Wind,data=df_airquality)
summary(mod_linear)
##
## Call:
## lm(formula = Ozone ~ Wind, data = df_airquality)
##
## Residuals:
## Min 1Q Median 3Q Max
## -51.572 -18.854 -4.868 15.234 90.000
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 96.8729 7.2387 13.38 < 2e-16 ***
## Wind -5.5509 0.6904 -8.04 9.27e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 26.47 on 114 degrees of freedom
## (37 observations deleted due to missingness)
## Multiple R-squared: 0.3619, Adjusted R-squared: 0.3563
## F-statistic: 64.64 on 1 and 114 DF, p-value: 9.272e-13
Interpretation:
The regression gives a slope β₁ of -5.55 with p-value < 0.001. Because β₁ is significant, there is a significant linear relationship between Wind and Ozone. The relationship is negative: each one-unit increase in wind speed is associated with a decrease of about 5.55 units in ozone concentration. R² = 0.3619, so Wind alone explains about 36% of the variation in Ozone.
ggplot(df_airquality,aes(x=Wind, y=Ozone)) +
geom_point(alpha=0.55, color="black") +
stat_smooth(method = "lm",
formula = y~x,lty = 1, col = "pink",se = F)+
theme_bw()
## Warning: Removed 37 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 37 rows containing missing values or values outside the scale range
## (`geom_point()`).
Interpretation:
The scatter plot shows Ozone tending to decrease as Wind increases. The regression line (pink) slopes downward, consistent with the negative coefficient. The points are fairly scattered around the line, which means there is variability that Wind alone cannot explain.
mod_tangga = lm(Ozone ~ cut(Wind,5),data=df_airquality)
summary(mod_tangga)
##
## Call:
## lm(formula = Ozone ~ cut(Wind, 5), data = df_airquality)
##
## Residuals:
## Min 1Q Median 3Q Max
## -44.595 -16.146 -5.187 13.236 70.636
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 97.364 7.635 12.753 < 2e-16 ***
## cut(Wind, 5)(5.5,9.3] -45.768 8.577 -5.336 5.07e-07 ***
## cut(Wind, 5)(9.3,13.1] -69.095 8.598 -8.036 1.09e-12 ***
## cut(Wind, 5)(13.1,16.9] -74.258 9.594 -7.740 4.96e-12 ***
## cut(Wind, 5)(16.9,20.7] -80.364 16.493 -4.873 3.69e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25.32 on 111 degrees of freedom
## (37 observations deleted due to missingness)
## Multiple R-squared: 0.4313, Adjusted R-squared: 0.4108
## F-statistic: 21.04 on 4 and 111 DF, p-value: 6.224e-13
Interpretation:
All four step coefficients are significant (p < 0.001) and negative relative to the lowest-wind interval, so wind speed has a significant effect on Ozone concentration. The higher the wind speed, the lower the Ozone concentration, with the largest drop between the first and second intervals (-45.8) and smaller further decreases after that (-69.1, -74.3, and -80.4 relative to the baseline). The model explains about 43% of the variation in Ozone (R² = 0.4313). This is consistent with wind playing an important role in reducing the accumulation of Ozone.
ggplot(df_airquality,aes(x=Wind, y=Ozone)) +
geom_point(alpha=0.55, color="black") +
stat_smooth(method = "lm",
formula = y~cut(x,5),
lty = 1, col = "pink",se = F)+
theme_bw()
## Warning: Removed 37 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 37 rows containing missing values or values outside the scale range
## (`geom_point()`).
Interpretation:
The plot shows the pink line (the step-function fit) descending as Wind increases. The black points show the same pattern: the higher the Wind, the lower the Ozone. The model captures the downward trend in Ozone concentration with increasing wind speed.
mod_spline3 = lm(Ozone ~ bs(Wind, knots = c(5, 10, 20, 30, 40)),data=df_airquality)
summary(mod_spline3)
##
## Call:
## lm(formula = Ozone ~ bs(Wind, knots = c(5, 10, 20, 30, 40)),
## data = df_airquality)
##
## Residuals:
## Min 1Q Median 3Q Max
## -48.30 -14.28 -6.04 12.64 66.66
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 73.14 44.94 1.628 0.107
## bs(Wind, knots = c(5, 10, 20, 30, 40))1 78.06 59.84 1.304 0.195
## bs(Wind, knots = c(5, 10, 20, 30, 40))2 -13.96 42.29 -0.330 0.742
## bs(Wind, knots = c(5, 10, 20, 30, 40))3 -61.22 51.03 -1.200 0.233
## bs(Wind, knots = c(5, 10, 20, 30, 40))4 -41.85 45.86 -0.912 0.364
## bs(Wind, knots = c(5, 10, 20, 30, 40))5 -73.14 52.38 -1.396 0.165
## bs(Wind, knots = c(5, 10, 20, 30, 40))6 -36.12 50.71 -0.712 0.478
## bs(Wind, knots = c(5, 10, 20, 30, 40))7 NA NA NA NA
## bs(Wind, knots = c(5, 10, 20, 30, 40))8 NA NA NA NA
##
## Residual standard error: 23.5 on 109 degrees of freedom
## (37 observations deleted due to missingness)
## Multiple R-squared: 0.519, Adjusted R-squared: 0.4925
## F-statistic: 19.6 on 6 and 109 DF, p-value: 2.078e-15
range(df_airquality$Wind, na.rm = TRUE)
## [1] 1.7 20.7
mod_spline3ns = lm(Ozone ~ ns(Wind, knots = c(5, 10, 20)),data=df_airquality)
summary(mod_spline3ns)
##
## Call:
## lm(formula = Ozone ~ ns(Wind, knots = c(5, 10, 20)), data = df_airquality)
##
## Residuals:
## Min 1Q Median 3Q Max
## -50.443 -13.768 -4.871 13.205 64.185
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 142.77 18.21 7.839 2.99e-12 ***
## ns(Wind, knots = c(5, 10, 20))1 -137.64 17.44 -7.891 2.29e-12 ***
## ns(Wind, knots = c(5, 10, 20))2 -108.26 26.95 -4.017 0.000108 ***
## ns(Wind, knots = c(5, 10, 20))3 -180.37 39.13 -4.610 1.09e-05 ***
## ns(Wind, knots = c(5, 10, 20))4 -72.64 19.28 -3.768 0.000265 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 23.74 on 111 degrees of freedom
## (37 observations deleted due to missingness)
## Multiple R-squared: 0.5001, Adjusted R-squared: 0.4821
## F-statistic: 27.76 on 4 and 111 DF, p-value: 5.567e-16
ggplot(df_airquality,aes(x=Wind, y=Ozone)) +
geom_point(alpha=0.55, color="black") +
stat_smooth(method = "lm",
formula = y~bs(x, knots = c(5, 10, 20, 30, 40)),
lty = 1, aes(col = "Cubic Spline"),se = F)+
stat_smooth(method = "lm",
formula = y~ns(x, knots = c(5, 10, 20)),
lty = 1, aes(col = "Natural Cubic Spline"),se = F)+labs(color="Tipe Spline")+
scale_color_manual(values = c("Natural Cubic Spline"="red","Cubic Spline"="blue"))+theme_bw()
## Warning: Removed 37 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Removed 37 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 37 rows containing missing values or values outside the scale range
## (`geom_point()`).
# Model comparison
# MSE function
MSE <- function(pred, actual) {
mean((pred - actual)^2, na.rm = TRUE)
}
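# Generate predictions from each model
# Caution: predict() returns 116 fitted values, while df_airquality$Ozone has 153 entries,
# so the MSE below recycles values (see warnings)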
pred_linear <- predict(mod_linear)
pred_tangga <- predict(mod_tangga)
pred_spline <- predict(mod_spline3)
pred_nspline <- predict(mod_spline3ns)
compare_stats <- data.frame(
Model = c("Linear","Tangga","Spline","Natural Spline"),
MSE = c(MSE(pred_linear, df_airquality$Ozone),
MSE(pred_tangga, df_airquality$Ozone),
MSE(pred_spline, df_airquality$Ozone),
MSE(pred_nspline, df_airquality$Ozone)),
AIC = c(AIC(mod_linear),
AIC(mod_tangga),
AIC(mod_spline3),
AIC(mod_spline3ns)),
Adj_R2 = c(summary(mod_linear)$adj.r.squared,
summary(mod_tangga)$adj.r.squared,
summary(mod_spline3)$adj.r.squared,
summary(mod_spline3ns)$adj.r.squared)
)
## Warning in pred - actual: longer object length is not a multiple of shorter
## object length
## Warning in pred - actual: longer object length is not a multiple of shorter
## object length
## Warning in pred - actual: longer object length is not a multiple of shorter
## object length
## Warning in pred - actual: longer object length is not a multiple of shorter
## object length
compare_stats
## Model MSE AIC Adj_R2
## 1 Linear 1330.958 1093.187 0.3562605
## 2 Tangga 1324.261 1085.829 0.4107761
## 3 Spline 1450.912 1070.398 0.4925109
## 4 Natural Spline 1378.702 1070.860 0.4821095
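Interpretation:
The MSE column in the table is affected by the length mismatch flagged in the warnings (116 fitted values against 153 Ozone entries). The MSE values quoted below are in-sample values on the 116 fitted rows, consistent with the residual standard errors in the model summaries (MSE = RSE² × df / 116). AIC and adjusted R² are taken from the table.
A. Linear model
The linear model has the highest MSE (688.44), the highest AIC (1093.19), and the lowest adjusted R² (0.356). It is the least able to capture the variation in the data and gives less accurate predictions than the other models.
B. Step-function model
The step-function model performs better than the linear model, with a lower MSE (about 613, from RSE = 25.32 on 111 degrees of freedom), a lower AIC (1085.83), and a higher adjusted R² (0.411). It begins to follow the changing pattern in the data, although not as well as the splines.
C. Spline model
The spline model performs best among all models, with the lowest MSE (518.92), the lowest AIC (1070.40), and the highest adjusted R² (0.493). It captures the pattern in the data more flexibly while remaining efficient. Two of its eight basis coefficients are NA because the knots at 30 and 40 lie above the Wind range (1.7 to 20.7).
D. Natural spline model
The natural spline model is very close to the spline, with MSE = 539.07, AIC = 1070.86, and adjusted R² = 0.482. Although slightly behind the spline, it is still better than the linear and step-function models, and it uses two fewer coefficients, which gives a good balance between flexibility and complexity.
Overall, the spline model comes out ahead of the other models for Wind, although its AIC advantage over the natural spline is less than 0.5. Comparing the two predictors, every Temp model has a lower AIC and a higher adjusted R² than the corresponding Wind model (for example 0.483 against 0.356 for the linear models), so Temp explains more of the variation in Ozone than Wind.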
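Since the same three criteria are computed for both predictors, the comparison can be wrapped in a small helper that takes a named list of fitted models. This is a sketch under assumptions: `compare_fits` is a hypothetical helper, and the natural spline for Wind uses `df = 3` purely for symmetry with the Temp model, which is not the knot specification used in the Wind analysis above.

```r
library(splines)

# Summarise a named list of lm fits by in-sample MSE, AIC and adjusted R-squared.
compare_fits <- function(models) {
  data.frame(
    Model = names(models),
    # MSE on the rows each model was actually fit to
    MSE = unname(sapply(models, function(m) mean(residuals(m)^2))),
    AIC = unname(sapply(models, AIC)),
    Adj_R2 = unname(sapply(models, function(m) summary(m)$adj.r.squared))
  )
}

df <- datasets::airquality
fits <- list(
  "Temp: linear" = lm(Ozone ~ Temp, data = df),
  "Temp: natural spline" = lm(Ozone ~ ns(Temp, df = 3), data = df),
  "Wind: linear" = lm(Ozone ~ Wind, data = df),
  "Wind: natural spline" = lm(Ozone ~ ns(Wind, df = 3), data = df)
)

res <- compare_fits(fits)
res
```

Putting both predictors in one table makes the Temp and Wind results directly comparable, because all models share the same response and the same 116 complete rows.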