Cycling serves both as exercise and as a means of transportation. Because bicycles are lightweight and accessible to people from all walks of life, they are popular almost everywhere in the world. In the USA there is even an integrated bicycle rental system known as Bike Sharing. This report presents an analysis and a prediction of the number of bicycles rented, based on historical data from 2011-2012 obtained from the UCI Machine Learning Repository. The prediction uses a linear regression approach.
library(tidyverse)
library(GGally)
library(MLmetrics)
library(car)
library(lmtest)
main_bike <- read.csv("Bike-Sharing-Dataset/day.csv")
str(main_bike)
## 'data.frame': 731 obs. of 16 variables:
## $ instant : int 1 2 3 4 5 6 7 8 9 10 ...
## $ dteday : chr "2011-01-01" "2011-01-02" "2011-01-03" "2011-01-04" ...
## $ season : int 1 1 1 1 1 1 1 1 1 1 ...
## $ yr : int 0 0 0 0 0 0 0 0 0 0 ...
## $ mnth : int 1 1 1 1 1 1 1 1 1 1 ...
## $ holiday : int 0 0 0 0 0 0 0 0 0 0 ...
## $ weekday : int 6 0 1 2 3 4 5 6 0 1 ...
## $ workingday: int 0 0 1 1 1 1 1 0 0 1 ...
## $ weathersit: int 2 2 1 1 1 1 2 2 1 1 ...
## $ temp : num 0.344 0.363 0.196 0.2 0.227 ...
## $ atemp : num 0.364 0.354 0.189 0.212 0.229 ...
## $ hum : num 0.806 0.696 0.437 0.59 0.437 ...
## $ windspeed : num 0.16 0.249 0.248 0.16 0.187 ...
## $ casual : int 331 131 120 108 82 88 148 68 54 41 ...
## $ registered: int 654 670 1229 1454 1518 1518 1362 891 768 1280 ...
## $ cnt : int 985 801 1349 1562 1600 1606 1510 959 822 1321 ...
The Bike Sharing data contains 731 observations (rows) and 16 variables (columns). Each column is described below:
- instant : unique value used as an index
- dteday : date the record was taken
- season : season (1: Spring, 2: Summer, 3: Fall, 4: Winter)
- yr : year (0: 2011, 1: 2012)
- mnth : month (1 - 12)
- holiday : whether the day is a holiday or not (based on the holiday schedule)
- weekday : day of the week
- workingday : whether the day is a working day or not (1: Yes, 0: No)
- weathersit : weather condition
  - 1: Clear, Few clouds, Partly cloudy
  - 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
  - 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
  - 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
- temp : temperature in Celsius, normalized (divided by the maximum, 41)
- atemp : "feels-like" temperature in Celsius, normalized (divided by the maximum, 50)
- hum : humidity, normalized (divided by the maximum, 100)
- windspeed : wind speed, normalized (divided by the maximum, 67)
- casual : number of casual (unregistered) users
- registered : number of registered users
- cnt : total number of users (casual + registered)
Variable/Column Selection
Columns not used:
1. instant
2. dteday
3. casual
4. registered
Changing Column Types
main_bike <- main_bike %>%
  select(-c(instant, dteday, casual, registered)) %>%
  mutate(cnt = as.numeric(cnt)) %>%
  mutate_if(is.integer, as_factor)
Check Missing Value
anyNA(main_bike)
## [1] FALSE
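Converting the integer-coded columns to factors matters because lm() would otherwise treat a code such as season = 4 as a quantity four times as large as season = 1. The sketch below uses a small made-up vector (toy_season is hypothetical and is not the report's data) to show how a factor is expanded into dummy columns. This is why the model summaries in this report list coefficients such as season2, season3 and season4.

```r
# Toy illustration with made-up values; not the bike-sharing data.
toy_season <- c(1L, 2L, 3L, 4L, 1L, 2L)

# As an integer, the column enters a model as one numeric slope.
X_int <- model.matrix(~ toy_season)

# As a factor, it becomes one dummy column per level except the baseline.
toy_season_f <- factor(toy_season)
X_fac <- model.matrix(~ toy_season_f)

colnames(X_int)  # intercept plus a single numeric column
colnames(X_fac)  # intercept plus toy_season_f2, toy_season_f3, toy_season_f4
```

The first level (here 1) acts as the baseline, so each dummy coefficient is read as a difference from that level.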
Boxplot
main_bike %>%
select(temp, atemp, hum, windspeed) %>%
boxplot()
Insight
- hum and windspeed contain outliers, shown as circles in the boxplot.
- Almost all numeric columns appear roughly normally distributed.
Correlation
ggcorr(main_bike, label = T)
## Warning in ggcorr(main_bike, label = T): data in column(s) 'season', 'yr',
## 'mnth', 'holiday', 'weekday', 'workingday', 'weathersit' are not numeric and
## were ignored
Insight
- The predictors with the strongest correlation with cnt are atemp and temp.
temp is chosen as the single predictor because it has a fairly strong positive correlation with cnt. atemp correlates just as strongly, but since it is only the "feels-like" version of temp, temp alone is used.
model_temp <- lm(cnt ~ temp, data = main_bike)
summary(model_temp)
##
## Call:
## lm(formula = cnt ~ temp, data = main_bike)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4615.3 -1134.9 -104.4 1044.3 3737.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1214.6 161.2 7.537 1.43e-13 ***
## temp 6640.7 305.2 21.759 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1509 on 729 degrees of freedom
## Multiple R-squared: 0.3937, Adjusted R-squared: 0.3929
## F-statistic: 473.5 on 1 and 729 DF, p-value: < 2.2e-16
plot(main_bike$temp, main_bike$cnt)
abline(model_temp, col = "red")
Insight
- temp is a significant predictor.
- Each increase in temp increases the target variable, cnt.
- R-squared is fairly low: Multiple R-squared is 0.3937.
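To make the coefficients of the single-predictor model concrete, the sketch below plugs them into the fitted line by hand. The helper predict_cnt and the example temperature of 20.5 degrees Celsius are illustrative assumptions. The intercept and slope are copied from the summary(model_temp) output, and the factor 41 comes from the column description of temp.

```r
# Coefficients copied from summary(model_temp).
intercept <- 1214.6
slope_temp <- 6640.7

# Hypothetical helper: fitted cnt for a normalized temperature.
predict_cnt <- function(temp_norm) intercept + slope_temp * temp_norm

# A day at 20.5 degrees Celsius corresponds to temp = 20.5 / 41 = 0.5.
temp_norm <- 20.5 / 41
predict_cnt(temp_norm)               # 1214.6 + 6640.7 * 0.5 = 4534.95

# Effect of a 0.1 increase in normalized temp (4.1 degrees Celsius).
predict_cnt(0.6) - predict_cnt(0.5)  # 664.07
```

Because temp is normalized, the slope of 6640.7 describes a move across the whole 0 to 1 scale, not a change of one degree.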
model_all <- lm(cnt ~ . , main_bike)
summary(model_all)
##
## Call:
## lm(formula = cnt ~ ., data = main_bike)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3944.7 -348.2 63.8 457.4 2912.7
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1485.84 239.75 6.198 9.77e-10 ***
## season2 884.71 179.49 4.929 1.03e-06 ***
## season3 832.70 213.13 3.907 0.000102 ***
## season4 1575.35 181.00 8.704 < 2e-16 ***
## yr1 2019.74 58.22 34.691 < 2e-16 ***
## mnth2 131.03 143.78 0.911 0.362443
## mnth3 542.83 165.43 3.281 0.001085 **
## mnth4 451.17 247.57 1.822 0.068820 .
## mnth5 735.51 267.63 2.748 0.006145 **
## mnth6 515.40 282.41 1.825 0.068423 .
## mnth7 30.80 313.82 0.098 0.921854
## mnth8 444.95 303.17 1.468 0.142639
## mnth9 1004.17 265.12 3.788 0.000165 ***
## mnth10 519.67 241.55 2.151 0.031787 *
## mnth11 -116.69 230.78 -0.506 0.613257
## mnth12 -89.59 182.21 -0.492 0.623098
## holiday1 -589.70 180.36 -3.270 0.001130 **
## weekday1 212.05 109.49 1.937 0.053187 .
## weekday2 309.53 107.13 2.889 0.003982 **
## weekday3 381.36 107.48 3.548 0.000414 ***
## weekday4 386.34 107.53 3.593 0.000350 ***
## weekday5 436.98 107.44 4.067 5.30e-05 ***
## weekday6 440.46 106.56 4.133 4.01e-05 ***
## workingday1 NA NA NA NA
## weathersit2 -462.54 77.09 -6.000 3.16e-09 ***
## weathersit3 -1965.09 197.05 -9.972 < 2e-16 ***
## temp 2855.01 1398.16 2.042 0.041526 *
## atemp 1786.16 1462.12 1.222 0.222261
## hum -1535.47 292.45 -5.250 2.01e-07 ***
## windspeed -2823.30 414.55 -6.810 2.09e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 769.2 on 702 degrees of freedom
## Multiple R-squared: 0.8484, Adjusted R-squared: 0.8423
## F-statistic: 140.3 on 28 and 702 DF, p-value: < 2.2e-16
Insight
- Almost all predictors are highly significant for the target, cnt.
- R-squared is fairly high: Adjusted R-squared is 0.8423.
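The NA row for workingday1 and the note "1 not defined because of singularities" mean that this column is an exact linear combination of other columns. A plausible reading, stated here as an assumption about the data, is that a working day is simply a Monday-to-Friday that is not a holiday. The sketch below reproduces the effect on made-up data (toy and its columns are hypothetical) in which holidays fall only on weekdays.

```r
set.seed(42)
n <- 210
weekday_num <- rep(0:6, length.out = n)                   # 0 = Sunday ... 6 = Saturday
holiday_num <- as.integer(seq_len(n) %in% c(2, 17, 33))   # rows that fall on weekdays
workingday_num <- as.integer(weekday_num %in% 1:5 & holiday_num == 0)

toy <- data.frame(
  y = 100 + 10 * weekday_num - 50 * holiday_num + rnorm(n),
  weekday = factor(weekday_num),
  holiday = factor(holiday_num),
  workingday = factor(workingday_num)
)

fit <- lm(y ~ weekday + holiday + workingday, data = toy)
coef(fit)["workingday1"]   # NA: the column adds no new information
alias(fit)$Complete        # expresses workingday1 through the other columns
```

In this toy setup workingday1 equals weekday1 + ... + weekday5 minus holiday1, so lm() cannot estimate a separate coefficient for it.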
Backward
model_backward <- step(object = model_all,
                       direction = "backward")
## Start: AIC=9744
## cnt ~ season + yr + mnth + holiday + weekday + workingday + weathersit +
## temp + atemp + hum + windspeed
##
##
## Step: AIC=9744
## cnt ~ season + yr + mnth + holiday + weekday + weathersit + temp +
## atemp + hum + windspeed
##
## Df Sum of Sq RSS AIC
## - atemp 1 883090 416284778 9743.6
## <none> 415401688 9744.0
## - temp 1 2467375 417869064 9746.3
## - holiday 1 6325545 421727233 9753.0
## - weekday 6 15346257 430747945 9758.5
## - hum 1 16312287 431713975 9770.2
## - windspeed 1 27446438 442848126 9788.8
## - mnth 11 57051189 472452877 9816.1
## - season 3 51711125 467112813 9823.8
## - weathersit 2 62999538 478401226 9843.2
## - yr 1 712147968 1127549656 10471.9
##
## Step: AIC=9743.55
## cnt ~ season + yr + mnth + holiday + weekday + weathersit + temp +
## hum + windspeed
##
## Df Sum of Sq RSS AIC
## <none> 416284778 9743.6
## - holiday 1 6653951 422938729 9753.1
## - weekday 6 15002785 431287563 9757.4
## - hum 1 15984413 432269190 9769.1
## - windspeed 1 30717842 447002619 9793.6
## - mnth 11 56630559 472915337 9814.8
## - season 3 52094569 468379347 9823.7
## - weathersit 2 64280982 480565760 9844.5
## - temp 1 70299831 486584608 9855.6
## - yr 1 711362396 1127647174 10470.0
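The AIC values in the step() trace can be reproduced from the RSS column. For lm fits, step() relies on extractAIC(), which computes n * log(RSS / n) + 2 * edf, where edf is the number of estimated coefficients. The sketch below uses a hypothetical helper, step_aic, with the RSS values from the trace and the coefficient counts implied by the model summaries (29 for the full model, 28 after dropping atemp).

```r
# AIC as reported by step() for lm fits: n * log(RSS / n) + 2 * edf.
step_aic <- function(rss, n, edf) n * log(rss / n) + 2 * edf

n_obs <- 731

# Full model: RSS 415401688, 29 estimated coefficients.
aic_full <- step_aic(415401688, n_obs, 29)

# After dropping atemp: RSS 416284778, 28 estimated coefficients.
aic_backward <- step_aic(416284778, n_obs, 28)

# Approximately 9744.00 and 9743.55, matching the trace.
round(c(full = aic_full, backward = aic_backward), 2)
```

Dropping atemp raises the RSS slightly, but the saved parameter lowers the penalty by 2, which is enough to give the smaller AIC.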
Forward
model_none <- lm(cnt ~ 1 , main_bike)
model_forward <- step(object = model_none,
                      scope = list(upper = model_all),
                      direction = "forward")
## Start: AIC=11066.88
## cnt ~ 1
##
## Df Sum of Sq RSS AIC
## + atemp 1 1091003307 1648532085 10698
## + temp 1 1078688585 1660846807 10703
## + mnth 11 1070192271 1669343121 10727
## + season 3 950595868 1788939524 10761
## + yr 1 879828893 1859706499 10786
## + weathersit 2 271644573 2467890819 10995
## + windspeed 1 150705556 2588829836 11028
## + hum 1 27757373 2711778019 11061
## + holiday 1 12797494 2726737898 11066
## + workingday 1 10246038 2729289354 11066
## <none> 2739535392 11067
## + weekday 6 17659017 2721876375 11074
##
## Step: AIC=10697.61
## cnt ~ atemp
##
## Df Sum of Sq RSS AIC
## + yr 1 793490560 855041526 10220
## + weathersit 2 163531664 1485000421 10625
## + season 3 154159967 1494372119 10632
## + hum 1 99815222 1548716864 10654
## + mnth 11 135165642 1513366443 10657
## + windspeed 1 39915579 1608616506 10682
## + holiday 1 6274901 1642257184 10697
## <none> 1648532085 10698
## + workingday 1 2188508 1646343578 10699
## + temp 1 459595 1648072490 10699
## + weekday 6 15529667 1633002418 10703
##
## Step: AIC=10219.71
## cnt ~ atemp + yr
##
## Df Sum of Sq RSS AIC
## + season 3 168162811 686878714 10066
## + mnth 11 155698736 699342790 10095
## + weathersit 2 122502080 732539446 10111
## + hum 1 44927700 810113826 10182
## + windspeed 1 38710961 816330564 10188
## + holiday 1 7682915 847358611 10215
## + weekday 6 16613406 838428120 10217
## + workingday 1 2573280 852468246 10220
## <none> 855041526 10220
## + temp 1 70539 854970986 10222
##
## Step: AIC=10065.63
## cnt ~ atemp + yr + season
##
## Df Sum of Sq RSS AIC
## + weathersit 2 152432468 534446246 9886.2
## + hum 1 80425891 606452824 9976.6
## + mnth 11 44938268 641940446 10038.2
## + windspeed 1 23229150 663649564 10042.5
## + weekday 6 16893247 669985467 10059.4
## + holiday 1 7013428 679865287 10060.1
## + workingday 1 2690721 684187994 10064.8
## <none> 686878714 10065.6
## + temp 1 1870793 685007921 10065.6
##
## Step: AIC=9886.2
## cnt ~ atemp + yr + season + weathersit
##
## Df Sum of Sq RSS AIC
## + mnth 11 56800118 477646128 9826.1
## + windspeed 1 15760980 518685266 9866.3
## + weekday 6 21137709 513308537 9868.7
## + holiday 1 9917911 524528336 9874.5
## + hum 1 6812250 527633996 9878.8
## + workingday 1 5894317 528551929 9880.1
## + temp 1 2968145 531478101 9884.1
## <none> 534446246 9886.2
##
## Step: AIC=9826.07
## cnt ~ atemp + yr + season + weathersit + mnth
##
## Df Sum of Sq RSS AIC
## + windspeed 1 16851463 460794665 9801.8
## + weekday 6 21356715 456289413 9804.6
## + hum 1 9760667 467885460 9813.0
## + holiday 1 7498087 470148041 9816.5
## + workingday 1 6087557 471558571 9818.7
## <none> 477646128 9826.1
## + temp 1 483138 477162990 9827.3
##
## Step: AIC=9801.81
## cnt ~ atemp + yr + season + weathersit + mnth + windspeed
##
## Df Sum of Sq RSS AIC
## + hum 1 19120666 441673999 9772.8
## + weekday 6 21487907 439306758 9778.9
## + holiday 1 7334929 453459736 9792.1
## + workingday 1 5829171 454965494 9794.5
## + temp 1 2461824 458332841 9799.9
## <none> 460794665 9801.8
##
## Step: AIC=9772.83
## cnt ~ atemp + yr + season + weathersit + mnth + windspeed + hum
##
## Df Sum of Sq RSS AIC
## + weekday 6 18042031 423631968 9754.3
## + holiday 1 7482642 434191356 9762.3
## + workingday 1 4995892 436678107 9766.5
## + temp 1 2784516 438889483 9770.2
## <none> 441673999 9772.8
##
## Step: AIC=9754.34
## cnt ~ atemp + yr + season + weathersit + mnth + windspeed + hum +
## weekday
##
## Df Sum of Sq RSS AIC
## + holiday 1 5762905 417869064 9746.3
## + workingday 1 5762905 417869064 9746.3
## + temp 1 1904735 421727233 9753.0
## <none> 423631968 9754.3
##
## Step: AIC=9746.33
## cnt ~ atemp + yr + season + weathersit + mnth + windspeed + hum +
## weekday + holiday
##
## Df Sum of Sq RSS AIC
## + temp 1 2467375 415401688 9744.0
## <none> 417869064 9746.3
##
## Step: AIC=9744
## cnt ~ atemp + yr + season + weathersit + mnth + windspeed + hum +
## weekday + holiday + temp
##
## Df Sum of Sq RSS AIC
## <none> 415401688 9744
Both
model_both <- step(model_none,
                   scope = list(upper = model_all),
                   direction = "both")
## Start: AIC=11066.88
## cnt ~ 1
##
## Df Sum of Sq RSS AIC
## + atemp 1 1091003307 1648532085 10698
## + temp 1 1078688585 1660846807 10703
## + mnth 11 1070192271 1669343121 10727
## + season 3 950595868 1788939524 10761
## + yr 1 879828893 1859706499 10786
## + weathersit 2 271644573 2467890819 10995
## + windspeed 1 150705556 2588829836 11028
## + hum 1 27757373 2711778019 11061
## + holiday 1 12797494 2726737898 11066
## + workingday 1 10246038 2729289354 11066
## <none> 2739535392 11067
## + weekday 6 17659017 2721876375 11074
##
## Step: AIC=10697.61
## cnt ~ atemp
##
## Df Sum of Sq RSS AIC
## + yr 1 793490560 855041526 10220
## + weathersit 2 163531664 1485000421 10625
## + season 3 154159967 1494372119 10632
## + hum 1 99815222 1548716864 10654
## + mnth 11 135165642 1513366443 10657
## + windspeed 1 39915579 1608616506 10682
## + holiday 1 6274901 1642257184 10697
## <none> 1648532085 10698
## + workingday 1 2188508 1646343578 10699
## + temp 1 459595 1648072490 10699
## + weekday 6 15529667 1633002418 10703
## - atemp 1 1091003307 2739535392 11067
##
## Step: AIC=10219.71
## cnt ~ atemp + yr
##
## Df Sum of Sq RSS AIC
## + season 3 168162811 686878714 10066
## + mnth 11 155698736 699342790 10095
## + weathersit 2 122502080 732539446 10111
## + hum 1 44927700 810113826 10182
## + windspeed 1 38710961 816330564 10188
## + holiday 1 7682915 847358611 10215
## + weekday 6 16613406 838428120 10217
## + workingday 1 2573280 852468246 10220
## <none> 855041526 10220
## + temp 1 70539 854970986 10222
## - yr 1 793490560 1648532085 10698
## - atemp 1 1004664973 1859706499 10786
##
## Step: AIC=10065.63
## cnt ~ atemp + yr + season
##
## Df Sum of Sq RSS AIC
## + weathersit 2 152432468 534446246 9886.2
## + hum 1 80425891 606452824 9976.6
## + mnth 11 44938268 641940446 10038.2
## + windspeed 1 23229150 663649564 10042.5
## + weekday 6 16893247 669985467 10059.4
## + holiday 1 7013428 679865287 10060.1
## + workingday 1 2690721 684187994 10064.8
## <none> 686878714 10065.6
## + temp 1 1870793 685007921 10065.6
## - season 3 168162811 855041526 10219.7
## - atemp 1 218052547 904931261 10265.2
## - yr 1 807493404 1494372119 10631.8
##
## Step: AIC=9886.2
## cnt ~ atemp + yr + season + weathersit
##
## Df Sum of Sq RSS AIC
## + mnth 11 56800118 477646128 9826.1
## + windspeed 1 15760980 518685266 9866.3
## + weekday 6 21137709 513308537 9868.7
## + holiday 1 9917911 524528336 9874.5
## + hum 1 6812250 527633996 9878.8
## + workingday 1 5894317 528551929 9880.1
## + temp 1 2968145 531478101 9884.1
## <none> 534446246 9886.2
## - weathersit 2 152432468 686878714 10065.6
## - atemp 1 184648589 719094835 10101.1
## - season 3 198093200 732539446 10110.7
## - yr 1 762974811 1297421057 10532.5
##
## Step: AIC=9826.07
## cnt ~ atemp + yr + season + weathersit + mnth
##
## Df Sum of Sq RSS AIC
## + windspeed 1 16851463 460794665 9801.8
## + weekday 6 21356715 456289413 9804.6
## + hum 1 9760667 467885460 9813.0
## + holiday 1 7498087 470148041 9816.5
## + workingday 1 6087557 471558571 9818.7
## <none> 477646128 9826.1
## + temp 1 483138 477162990 9827.3
## - mnth 11 56800118 534446246 9886.2
## - season 3 59886815 537532943 9906.4
## - atemp 1 62937248 540583376 9914.5
## - weathersit 2 164294318 641940446 10038.2
## - yr 1 770192543 1247838671 10526.0
##
## Step: AIC=9801.81
## cnt ~ atemp + yr + season + weathersit + mnth + windspeed
##
## Df Sum of Sq RSS AIC
## + hum 1 19120666 441673999 9772.8
## + weekday 6 21487907 439306758 9778.9
## + holiday 1 7334929 453459736 9792.1
## + workingday 1 5829171 454965494 9794.5
## + temp 1 2461824 458332841 9799.9
## <none> 460794665 9801.8
## - windspeed 1 16851463 477646128 9826.1
## - mnth 11 57890601 518685266 9866.3
## - season 3 53118949 513913614 9875.6
## - atemp 1 56942731 517737396 9885.0
## - weathersit 2 156262119 617056784 10011.3
## - yr 1 771437691 1232232356 10518.8
##
## Step: AIC=9772.83
## cnt ~ atemp + yr + season + weathersit + mnth + windspeed + hum
##
## Df Sum of Sq RSS AIC
## + weekday 6 18042031 423631968 9754.3
## + holiday 1 7482642 434191356 9762.3
## + workingday 1 4995892 436678107 9766.5
## + temp 1 2784516 438889483 9770.2
## <none> 441673999 9772.8
## - hum 1 19120666 460794665 9801.8
## - windspeed 1 26211462 467885460 9813.0
## - mnth 11 62026229 503700227 9846.9
## - season 3 53935519 495609518 9851.1
## - weathersit 2 56326584 498000582 9856.6
## - atemp 1 71073384 512747383 9879.9
## - yr 1 718021306 1159695305 10476.5
##
## Step: AIC=9754.34
## cnt ~ atemp + yr + season + weathersit + mnth + windspeed + hum +
## weekday
##
## Df Sum of Sq RSS AIC
## + holiday 1 5762905 417869064 9746.3
## + workingday 1 5762905 417869064 9746.3
## + temp 1 1904735 421727233 9753.0
## <none> 423631968 9754.3
## - weekday 6 18042031 441673999 9772.8
## - hum 1 15674790 439306758 9778.9
## - windspeed 1 25276961 448908929 9794.7
## - mnth 11 61660211 485292179 9831.7
## - season 3 54557310 478189278 9836.9
## - weathersit 2 61770592 485402560 9849.8
## - atemp 1 67338333 490970301 9860.2
## - yr 1 721451918 1145083887 10479.2
##
## Step: AIC=9746.33
## cnt ~ atemp + yr + season + weathersit + mnth + windspeed + hum +
## weekday + holiday
##
## Df Sum of Sq RSS AIC
## + temp 1 2467375 415401688 9744.0
## <none> 417869064 9746.3
## - holiday 1 5762905 423631968 9754.3
## - weekday 6 16322293 434191356 9762.3
## - hum 1 15934429 433803493 9771.7
## - windspeed 1 25220175 443089238 9787.2
## - mnth 11 59986096 477855159 9822.4
## - season 3 51423350 469292413 9825.2
## - weathersit 2 62709484 480578547 9844.5
## - atemp 1 68715545 486584608 9855.6
## - yr 1 721273907 1139142971 10477.4
##
## Step: AIC=9744
## cnt ~ atemp + yr + season + weathersit + mnth + windspeed + hum +
## weekday + holiday + temp
##
## Df Sum of Sq RSS AIC
## - atemp 1 883090 416284778 9743.6
## <none> 415401688 9744.0
## - temp 1 2467375 417869064 9746.3
## - holiday 1 6325545 421727233 9753.0
## - weekday 6 15346257 430747945 9758.5
## - hum 1 16312287 431713975 9770.2
## - windspeed 1 27446438 442848126 9788.8
## - mnth 11 57051189 472452877 9816.1
## - season 3 51711125 467112813 9823.8
## - weathersit 2 62999538 478401226 9843.2
## - yr 1 712147968 1127549656 10471.9
##
## Step: AIC=9743.55
## cnt ~ yr + season + weathersit + mnth + windspeed + hum + weekday +
## holiday + temp
##
## Df Sum of Sq RSS AIC
## <none> 416284778 9743.6
## + atemp 1 883090 415401688 9744.0
## - holiday 1 6653951 422938729 9753.1
## - weekday 6 15002785 431287563 9757.4
## - hum 1 15984413 432269190 9769.1
## - windspeed 1 30717842 447002619 9793.6
## - mnth 11 56630559 472915337 9814.8
## - season 3 52094569 468379347 9823.7
## - weathersit 2 64280982 480565760 9844.5
## - temp 1 70299831 486584608 9855.6
## - yr 1 711362396 1127647174 10470.0
summary(model_backward)$adj.r.squared
## [1] 0.8422094
summary(model_forward)$adj.r.squared
## [1] 0.8423198
summary(model_both)$adj.r.squared
## [1] 0.8422094
Insight
- The backward and both models have the same adjusted R-squared, 0.8422094, and the same set of predictors; both exclude atemp.
- The forward model keeps every predictor, including atemp, and its adjusted R-squared is only marginally higher at 0.8423198.
# temp as the only predictor
pred_temp <- predict(model_temp,
                     newdata = main_bike)
# All predictors
pred_all <- predict(model_all,
                    newdata = main_bike)
## Warning in predict.lm(model_all, newdata = main_bike): prediction from a rank-
## deficient fit may be misleading
# Backward
pred_backward <- predict(model_backward,
                         newdata = main_bike)
RMSE(y_pred = pred_temp, y_true = main_bike$cnt)
## [1] 1507.322
RMSE(y_pred = pred_all, y_true = main_bike$cnt)
## [1] 753.8335
RMSE(y_pred = pred_backward, y_true = main_bike$cnt)
## [1] 754.6344
Range cnt
range(main_bike$cnt)
## [1] 22 8714
Insight
- The smallest error belongs to the model with all predictors. However, atemp and temp carry redundant information, so the backward model, which leaves out atemp, is chosen.
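For reference, RMSE is the square root of the mean squared difference between observed and predicted values. The sketch below defines a hand-rolled rmse helper (an illustrative stand-in, not the MLmetrics source) and then expresses the reported RMSE of model_backward relative to the observed range of cnt, which is one rough way to judge whether an error of about 755 rentals is large.

```r
# Hand-rolled RMSE for illustration.
rmse <- function(y_true, y_pred) sqrt(mean((y_true - y_pred)^2))

# Tiny made-up example: errors are 0, 0 and 2, so RMSE = sqrt(4 / 3).
rmse(c(1, 2, 3), c(1, 2, 5))

# RMSE of model_backward relative to the observed range of cnt.
rmse_backward <- 754.6344
cnt_range <- 8714 - 22
round(100 * rmse_backward / cnt_range, 1)   # about 8.7 percent of the range
```

These RMSE values are computed on the same rows used for fitting, so they describe in-sample error only.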
Residuals of the backward model
The errors/residuals are expected to be normally distributed, which shows up as values centered around 0.
hist(model_backward$residuals)
hist(model_temp$residuals)
shapiro.test()
The p-value is expected to be greater than or equal to alpha, so that H0 (normally distributed residuals) is not rejected.
shapiro.test(model_temp$residuals)
##
## Shapiro-Wilk normality test
##
## data: model_temp$residuals
## W = 0.98671, p-value = 3.392e-06
shapiro.test(model_backward$residuals)
##
## Shapiro-Wilk normality test
##
## data: model_backward$residuals
## W = 0.95242, p-value = 1.293e-14
Note: compare the p-value with alpha (0.05)
p-value >= 0.05: fail to reject H0
p-value < 0.05: reject H0 in favor of H1
Shapiro-Wilk hypothesis test:
H0: the errors/residuals are normally distributed
H1: the errors/residuals are not normally distributed
Normality Conclusion
- model_temp:
  - in the histogram, the errors mostly range from -2000 to 2000
  - in shapiro.test, the p-value (3.392e-06) is < 0.05, so H0 is rejected and the normality assumption is Not Satisfied
- model_backward:
  - in the histogram, the errors are centered around 0, so the visual check is Satisfied
  - in shapiro.test, the p-value (1.293e-14) is < 0.05, so H0 is rejected and the normality assumption is Not Satisfied
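The decision rule for the Shapiro-Wilk test can be written as a small helper. The sketch below is illustrative only: normality_decision is a hypothetical function and skewed_resid is simulated data, chosen to be clearly non-normal so that the outcome is unambiguous. The last line applies the same comparison to the two p-values reported for this report's models.

```r
set.seed(123)

# Hypothetical helper: turn a Shapiro-Wilk p-value into a decision.
normality_decision <- function(x, alpha = 0.05) {
  p <- shapiro.test(x)$p.value
  if (p >= alpha) "fail to reject H0 (normality plausible)" else "reject H0 (not normal)"
}

skewed_resid <- rexp(500) - 1     # strongly right-skewed, mean near 0
normality_decision(skewed_resid)  # reject H0 (not normal)

# The p-values reported for model_temp and model_backward give the same decision.
c(model_temp = 3.392e-06, model_backward = 1.293e-14) < 0.05
```

A small p-value therefore counts against the normality assumption, even when the histogram looks centered around 0.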
The errors are expected not to form a pattern
Homoscedasticity
plot(x = model_temp$fitted.values,
     y = model_temp$residuals)
abline(h = 0, col = "red")
plot(x = model_backward$fitted.values,
     y = model_backward$residuals)
abline(h = 0, col = "red")
bptest()
The error variance is expected to be constant (fail to reject H0).
bptest(model_temp)
##
## studentized Breusch-Pagan test
##
## data: model_temp
## BP = 9.5279, df = 1, p-value = 0.002024
bptest(model_backward)
##
## studentized Breusch-Pagan test
##
## data: model_backward
## BP = 75.358, df = 27, p-value = 1.869e-06
Compare the p-value with alpha (0.05)
p-value >= 0.05: fail to reject H0
p-value < 0.05: reject H0 in favor of H1
Breusch-Pagan hypothesis test:
H0: the error variance is constant (homoscedasticity)
H1: the error variance is not constant (heteroscedasticity)
Homoscedasticity Conclusion
1. model_temp:
  - in the scatterplot, the errors are more evenly spread, so the visual check is Satisfied
  - in bptest, the p-value (0.002024) is closer to 0.05 but still below it, so the assumption is Not Satisfied
2. model_backward:
  - in the scatterplot, the errors form a pattern (heteroscedasticity), so the visual check is Not Satisfied
  - in bptest, the p-value (1.869e-06) is far below 0.05, so the assumption is Not Satisfied
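To show what the studentized Breusch-Pagan statistic measures, the sketch below computes it by hand on simulated data: regress the squared residuals on the predictors and take n times the R-squared of that auxiliary regression, compared against a chi-squared distribution. The data and all variable names are made up, and this hand-rolled version is meant as an illustration of the idea, not a replacement for bptest().

```r
set.seed(7)
n <- 500
x <- runif(n, 1, 10)
y <- 2 * x + rnorm(n, sd = x)     # error spread grows with x (heteroscedastic)

fit <- lm(y ~ x)

# Auxiliary regression of squared residuals on the predictors.
aux <- lm(resid(fit)^2 ~ x)
bp_stat <- n * summary(aux)$r.squared
bp_df <- length(coef(aux)) - 1
bp_p <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

c(BP = bp_stat, df = bp_df, p.value = bp_p)
```

If the predictors explain the size of the squared residuals, the statistic is large and the p-value small, which is the heteroscedastic case.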
No predictor is expected to be very strongly correlated with the other predictors (VIF > 10).
vif(model_backward)
## GVIF Df GVIF^(1/(2*Df))
## season 169.713093 3 2.352985
## yr 1.046249 1 1.022863
## mnth 391.715048 11 1.311784
## holiday 1.116829 1 1.056801
## weekday 1.153293 6 1.011956
## weathersit 1.886133 2 1.171907
## temp 7.006223 1 2.646927
## hum 2.135348 1 1.461283
## windspeed 1.221500 1 1.105215
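Decision rule: a value above 10 indicates multicollinearity.
VIF Conclusion
1. model_temp:
  - cannot be tested with vif because it has only 1 predictor
2. model_backward:
  - the raw GVIF of season (169.7) and mnth (391.7) is above 10, but these are multi-level factors, for which a common rule of thumb is to use GVIF^(1/(2*Df)) and compare its square with the threshold
  - on that scale no predictor exceeds 10 (the largest is temp at about 7.0), so the assumption is Satisfied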
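The third column of the vif() output can be reproduced from the first two, which helps when reading GVIF for factors with several levels. The sketch below copies three rows from the output; squaring the adjusted value to compare it with the usual threshold of 10 is a rule of thumb assumed here, not something the report itself states.

```r
# GVIF and Df copied from the vif(model_backward) output.
gvif <- c(season = 169.713093, mnth = 391.715048, temp = 7.006223)
df   <- c(season = 3,          mnth = 11,         temp = 1)

# Adjusted value reported in the third column of the output.
adj <- gvif^(1 / (2 * df))
round(adj, 6)

# Squaring puts it on the usual VIF scale for comparison with 10.
adj^2 < 10
```

For a predictor with a single degree of freedom, such as temp, the squared adjusted value is simply the GVIF itself.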
summary(model_backward)
##
## Call:
## lm(formula = cnt ~ season + yr + mnth + holiday + weekday + weathersit +
## temp + hum + windspeed, data = main_bike)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3960.9 -350.9 74.1 456.0 2919.9
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1543.552 235.129 6.565 1.01e-10 ***
## season2 889.302 179.516 4.954 9.12e-07 ***
## season3 832.236 213.204 3.903 0.000104 ***
## season4 1578.947 181.040 8.722 < 2e-16 ***
## yr1 2018.063 58.225 34.660 < 2e-16 ***
## mnth2 136.855 143.747 0.952 0.341396
## mnth3 545.132 165.480 3.294 0.001036 **
## mnth4 456.494 247.615 1.844 0.065667 .
## mnth5 723.520 267.541 2.704 0.007010 **
## mnth6 490.552 281.776 1.741 0.082133 .
## mnth7 8.404 313.395 0.027 0.978613
## mnth8 404.912 301.494 1.343 0.179700
## mnth9 983.948 264.698 3.717 0.000217 ***
## mnth10 520.937 241.636 2.156 0.031432 *
## mnth11 -111.362 230.816 -0.482 0.629621
## mnth12 -84.389 182.229 -0.463 0.643439
## holiday1 -603.605 180.066 -3.352 0.000845 ***
## weekday1 214.877 109.508 1.962 0.050133 .
## weekday2 309.132 107.171 2.884 0.004041 **
## weekday3 377.407 107.467 3.512 0.000473 ***
## weekday4 385.206 107.562 3.581 0.000366 ***
## weekday5 428.604 107.258 3.996 7.12e-05 ***
## weekday6 438.699 106.590 4.116 4.32e-05 ***
## weathersit2 -465.202 77.083 -6.035 2.57e-09 ***
## weathersit3 -1981.357 196.670 -10.075 < 2e-16 ***
## temp 4487.305 411.838 10.896 < 2e-16 ***
## hum -1518.178 292.208 -5.196 2.68e-07 ***
## windspeed -2925.438 406.175 -7.202 1.53e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 769.5 on 703 degrees of freedom
## Multiple R-squared: 0.848, Adjusted R-squared: 0.8422
## F-statistic: 145.3 on 27 and 703 DF, p-value: < 2.2e-16
From the model evaluations above, model_backward is the better model compared with model_temp, which uses temp as its only predictor and has a higher error (RMSE 1507.3 versus 754.6). model_backward uses all predictors except atemp, which is left out because it repeats the information in temp. Its summary also shows that most predictors are statistically significant for the target, cnt. Because the Shapiro-Wilk and Breusch-Pagan tests reject normality and constant variance of the residuals, the significance levels should be read with some caution.