Background

Cycling is both a form of exercise and a means of transportation. Bicycles are lightweight and accessible to people from all walks of life, which makes them popular almost everywhere in the world. In the USA there is even an integrated system for renting bicycles, known as Bike Sharing. This report analyzes and predicts the number of bikes rented, based on historical data from 2011-2012 obtained from the UCI Machine Learning Repository. Predictions are made using a Linear Regression approach.

Import Library

library(tidyverse)
library(GGally)
library(MLmetrics)
library(car)
library(lmtest)

Data Preparation

main_bike <- read.csv("Bike-Sharing-Dataset/day.csv")

Data Wrangling

str(main_bike)
## 'data.frame':    731 obs. of  16 variables:
##  $ instant   : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ dteday    : chr  "2011-01-01" "2011-01-02" "2011-01-03" "2011-01-04" ...
##  $ season    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ yr        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ mnth      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ holiday   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ weekday   : int  6 0 1 2 3 4 5 6 0 1 ...
##  $ workingday: int  0 0 1 1 1 1 1 0 0 1 ...
##  $ weathersit: int  2 2 1 1 1 1 2 2 1 1 ...
##  $ temp      : num  0.344 0.363 0.196 0.2 0.227 ...
##  $ atemp     : num  0.364 0.354 0.189 0.212 0.229 ...
##  $ hum       : num  0.806 0.696 0.437 0.59 0.437 ...
##  $ windspeed : num  0.16 0.249 0.248 0.16 0.187 ...
##  $ casual    : int  331 131 120 108 82 88 148 68 54 41 ...
##  $ registered: int  654 670 1229 1454 1518 1518 1362 891 768 1280 ...
##  $ cnt       : int  985 801 1349 1562 1600 1606 1510 959 822 1321 ...

The Bike Sharing data contains 731 observations (rows) and 16 variables (columns). Each column is described below:

  • instant : unique value used as an index
  • dteday : date the record was taken
  • season : season (1: Spring, 2: Summer, 3: Fall, 4: Winter)
  • yr : year (0: 2011, 1: 2012)
  • mnth : month (1 to 12)
  • holiday : whether the day is a holiday (based on the holiday schedule)
  • weekday : day of the week
  • workingday : whether the day is a working day (1: yes, 0: no)
  • weathersit : weather condition
    • 1: Clear, Few clouds, Partly cloudy
    • 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
    • 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
    • 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
  • temp : normalized temperature in Celsius (divided by the maximum, 41)
  • atemp : normalized "feels like" temperature in Celsius (divided by the maximum, 50)
  • hum : normalized humidity (divided by the maximum, 100)
  • windspeed : normalized wind speed (divided by the maximum, 67)
  • casual : number of casual (unregistered) users
  • registered : number of registered users
  • cnt : total number of users (casual + registered)
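Because temp and windspeed are stored as fractions of a maximum, they can be mapped back to approximate physical units by multiplying by the divisor given in the column list. The following is a minimal sketch in R using the first three (rounded) values shown by str(); the helper name denormalize is hypothetical and is not part of the original analysis.

```r
# Hypothetical helper: undo "divide by the maximum" scaling.
denormalize <- function(x, max_value) x * max_value

temp_norm <- c(0.344, 0.363, 0.196)   # first three temp values (rounded, from str())
wind_norm <- c(0.16, 0.249, 0.248)    # first three windspeed values (rounded)

temp_celsius <- denormalize(temp_norm, 41)   # temp was divided by 41
wind_raw     <- denormalize(wind_norm, 67)   # windspeed was divided by 67

round(temp_celsius, 1)
round(wind_raw, 1)
```

Keeping the normalized values for modeling and converting only for interpretation avoids changing any of the fitted coefficients.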

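Variable/Column Selection

Columns not used:

  1. instant
  2. dteday
  3. casual
  4. registered

instant and dteday only identify a row, while casual and registered add up to cnt and would therefore leak the target into the predictors.

Converting Column Types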

main_bike <- main_bike %>% 
  select(-c(instant, dteday, casual, registered)) %>% 
  mutate(cnt = as.numeric(cnt)) %>% 
  mutate_if(is.integer, as_factor)
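
The order of the two mutate steps matters: cnt is itself an integer column, so it is converted to numeric first and is then skipped by the integer-to-factor conversion. The sketch below shows the same idea in base R on a made-up data frame (the object toy and its values are illustrative assumptions, not rows of the real data).

```r
# Toy stand-in for a few columns of main_bike (values are made up).
toy <- data.frame(
  season = c(1L, 1L, 2L, 3L),
  yr     = c(0L, 0L, 1L, 1L),
  temp   = c(0.34, 0.36, 0.52, 0.71),
  cnt    = c(985L, 801L, 4500L, 6200L)
)

# Protect the target first: cnt is an integer column too.
toy$cnt <- as.numeric(toy$cnt)

# Convert every remaining integer column to a factor.
int_cols <- vapply(toy, is.integer, logical(1))
toy[int_cols] <- lapply(toy[int_cols], factor)

str(toy)
```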

Checking for Missing Values

anyNA(main_bike)
## [1] FALSE

Exploratory Data Analysis

Boxplot

main_bike %>% 
  select(temp, atemp, hum, windspeed) %>% 
  boxplot()

Insight

  • There are outliers in hum and windspeed, drawn as circles.
  • Almost all numeric columns appear to be roughly normally distributed.
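
The circles in a boxplot are points that fall more than 1.5 times the interquartile range beyond the hinges. A minimal sketch of how those points can be listed with boxplot.stats(), using a made-up vector x in place of a real column:

```r
set.seed(42)
# Toy stand-in for a normalized column such as windspeed (made-up values),
# with one deliberately extreme observation appended.
x <- c(runif(50, 0.1, 0.3), 0.95)

stats <- boxplot.stats(x)   # same 1.5 * IQR rule that boxplot() uses by default
stats$out                   # the values that would be drawn as individual points
```

On the real data the same call could be applied to main_bike$hum or main_bike$windspeed to see which observations are flagged.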

Correlation

ggcorr(main_bike, label = T)
## Warning in ggcorr(main_bike, label = T): data in column(s) 'season', 'yr',
## 'mnth', 'holiday', 'weekday', 'workingday', 'weathersit' are not numeric and
## were ignored

Insight

  • The predictors most strongly correlated with cnt are atemp and temp.
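
ggcorr() skips the factor columns, so the correlation view only covers the numeric ones. The sketch below reproduces that kind of table with base cor() on simulated data; the data frame toy and the numbers used to generate it are assumptions chosen to mimic a near-duplicate pair such as temp and atemp.

```r
set.seed(1)
n <- 200
temp  <- runif(n)
atemp <- temp + rnorm(n, sd = 0.02)                 # nearly a copy of temp
cnt   <- 1200 + 6600 * temp + rnorm(n, sd = 1500)   # made-up target
toy   <- data.frame(temp, atemp, cnt,
                    season = factor(sample(1:4, n, replace = TRUE)))

# Keep numeric columns only, as ggcorr() does, then compute Pearson correlations.
num_cols <- vapply(toy, is.numeric, logical(1))
round(cor(toy[num_cols]), 2)
```

A correlation close to 1 between two predictors is a hint that they carry the same information, which matters later when choosing between temp and atemp.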

Modeling

Modeling with One Predictor

temp is chosen as the single predictor because it has a fairly strong positive correlation with cnt. atemp is similarly correlated, but since atemp is the "feels like" version of temp, only temp is used.

model_temp <- lm(cnt ~ temp, data = main_bike)
summary(model_temp)
## 
## Call:
## lm(formula = cnt ~ temp, data = main_bike)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4615.3 -1134.9  -104.4  1044.3  3737.8 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1214.6      161.2   7.537 1.43e-13 ***
## temp          6640.7      305.2  21.759  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1509 on 729 degrees of freedom
## Multiple R-squared:  0.3937, Adjusted R-squared:  0.3929 
## F-statistic: 473.5 on 1 and 729 DF,  p-value: < 2.2e-16
plot(main_bike$temp, main_bike$cnt)
abline(model_temp, col = "red")

Insight

  • temp is a significant predictor (p-value < 2e-16).
  • The coefficient of temp is positive, so an increase in temp raises the target variable cnt.
  • R-squared is fairly low: Multiple R-squared is 0.3937.
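
Since temp is normalized by 41, the slope of 6640.7 is the change in cnt over the whole 0 to 1 scale rather than per degree. A short worked sketch, using only the coefficients printed by summary(model_temp):

```r
intercept <- 1214.6   # from summary(model_temp)
slope     <- 6640.7   # change in cnt per unit of normalized temp

# temp was divided by 41, so 1 degree Celsius is 1/41 of a unit.
per_degree <- slope / 41

# Fitted value for a day with normalized temp = 0.5 (about 20.5 degrees Celsius).
fitted_half <- intercept + slope * 0.5

round(per_degree)
fitted_half
```

Under this model, one extra degree Celsius corresponds to roughly 162 more rentals per day.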

Modeling with All Predictors

model_all <- lm(cnt ~ . , main_bike)
summary(model_all)
## 
## Call:
## lm(formula = cnt ~ ., data = main_bike)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3944.7  -348.2    63.8   457.4  2912.7 
## 
## Coefficients: (1 not defined because of singularities)
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1485.84     239.75   6.198 9.77e-10 ***
## season2       884.71     179.49   4.929 1.03e-06 ***
## season3       832.70     213.13   3.907 0.000102 ***
## season4      1575.35     181.00   8.704  < 2e-16 ***
## yr1          2019.74      58.22  34.691  < 2e-16 ***
## mnth2         131.03     143.78   0.911 0.362443    
## mnth3         542.83     165.43   3.281 0.001085 ** 
## mnth4         451.17     247.57   1.822 0.068820 .  
## mnth5         735.51     267.63   2.748 0.006145 ** 
## mnth6         515.40     282.41   1.825 0.068423 .  
## mnth7          30.80     313.82   0.098 0.921854    
## mnth8         444.95     303.17   1.468 0.142639    
## mnth9        1004.17     265.12   3.788 0.000165 ***
## mnth10        519.67     241.55   2.151 0.031787 *  
## mnth11       -116.69     230.78  -0.506 0.613257    
## mnth12        -89.59     182.21  -0.492 0.623098    
## holiday1     -589.70     180.36  -3.270 0.001130 ** 
## weekday1      212.05     109.49   1.937 0.053187 .  
## weekday2      309.53     107.13   2.889 0.003982 ** 
## weekday3      381.36     107.48   3.548 0.000414 ***
## weekday4      386.34     107.53   3.593 0.000350 ***
## weekday5      436.98     107.44   4.067 5.30e-05 ***
## weekday6      440.46     106.56   4.133 4.01e-05 ***
## workingday1       NA         NA      NA       NA    
## weathersit2  -462.54      77.09  -6.000 3.16e-09 ***
## weathersit3 -1965.09     197.05  -9.972  < 2e-16 ***
## temp         2855.01    1398.16   2.042 0.041526 *  
## atemp        1786.16    1462.12   1.222 0.222261    
## hum         -1535.47     292.45  -5.250 2.01e-07 ***
## windspeed   -2823.30     414.55  -6.810 2.09e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 769.2 on 702 degrees of freedom
## Multiple R-squared:  0.8484, Adjusted R-squared:  0.8423 
## F-statistic: 140.3 on 28 and 702 DF,  p-value: < 2.2e-16

Insight

  • Almost all predictors are highly significant for the target cnt.
  • R-squared is fairly high: Adjusted R-squared is 0.8423.
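
The summary reports one coefficient as "not defined because of singularities": workingday1 is NA. In this dataset a working day is a day that is neither a weekend nor a holiday, so workingday can be written exactly in terms of weekday and holiday. The sketch below reproduces the effect on made-up data (four invented weeks with two invented holidays); it is an illustration, not a re-run of the original model.

```r
set.seed(1)
toy <- data.frame(weekday = rep(0:6, times = 4))     # four made-up weeks
toy$holiday <- 0L
toy$holiday[c(9, 19)] <- 1L                          # two holidays, both on weekdays
toy$workingday <- as.integer(toy$weekday %in% 1:5 & toy$holiday == 0L)
toy$cnt <- rnorm(nrow(toy), mean = 4500, sd = 800)   # made-up target

cols <- c("weekday", "holiday", "workingday")
toy[cols] <- lapply(toy[cols], factor)

fit <- lm(cnt ~ weekday + holiday + workingday, data = toy)
coef(fit)["workingday1"]   # NA: nothing is left for workingday to explain
alias(fit)$Complete        # the linear dependence that lm() detected
```

This is also why the stepwise output treats adding holiday and adding workingday as interchangeable once weekday is in the model.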

Stepwise Modeling

Backward

model_backward <- step(object = model_all, 
                       direction = "backward"
                       )
## Start:  AIC=9744
## cnt ~ season + yr + mnth + holiday + weekday + workingday + weathersit + 
##     temp + atemp + hum + windspeed
## 
## 
## Step:  AIC=9744
## cnt ~ season + yr + mnth + holiday + weekday + weathersit + temp + 
##     atemp + hum + windspeed
## 
##              Df Sum of Sq        RSS     AIC
## - atemp       1    883090  416284778  9743.6
## <none>                     415401688  9744.0
## - temp        1   2467375  417869064  9746.3
## - holiday     1   6325545  421727233  9753.0
## - weekday     6  15346257  430747945  9758.5
## - hum         1  16312287  431713975  9770.2
## - windspeed   1  27446438  442848126  9788.8
## - mnth       11  57051189  472452877  9816.1
## - season      3  51711125  467112813  9823.8
## - weathersit  2  62999538  478401226  9843.2
## - yr          1 712147968 1127549656 10471.9
## 
## Step:  AIC=9743.55
## cnt ~ season + yr + mnth + holiday + weekday + weathersit + temp + 
##     hum + windspeed
## 
##              Df Sum of Sq        RSS     AIC
## <none>                     416284778  9743.6
## - holiday     1   6653951  422938729  9753.1
## - weekday     6  15002785  431287563  9757.4
## - hum         1  15984413  432269190  9769.1
## - windspeed   1  30717842  447002619  9793.6
## - mnth       11  56630559  472915337  9814.8
## - season      3  52094569  468379347  9823.7
## - weathersit  2  64280982  480565760  9844.5
## - temp        1  70299831  486584608  9855.6
## - yr          1 711362396 1127647174 10470.0
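
The AIC that step() prints comes from extractAIC(), which for a linear model is n * log(RSS / n) + 2 * edf, where edf is the number of estimated coefficients. A small sketch that reproduces the final backward value from the numbers in the table (731 observations and 703 residual degrees of freedom, as reported for this model in its summary):

```r
n   <- 731          # observations
rss <- 416284778    # RSS of the final backward model (row <none>)
edf <- 731 - 703    # estimated coefficients: n minus residual df = 28

aic_step <- n * log(rss / n) + 2 * edf
round(aic_step, 2)
```

This scale differs from AIC() by an additive constant, so the two should not be mixed when comparing models.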

Forward

model_none <- lm(cnt ~ 1 , main_bike)
model_forward <- step(object = model_none,
                      scope = list(upper=model_all),
                      direction = "forward")
## Start:  AIC=11066.88
## cnt ~ 1
## 
##              Df  Sum of Sq        RSS   AIC
## + atemp       1 1091003307 1648532085 10698
## + temp        1 1078688585 1660846807 10703
## + mnth       11 1070192271 1669343121 10727
## + season      3  950595868 1788939524 10761
## + yr          1  879828893 1859706499 10786
## + weathersit  2  271644573 2467890819 10995
## + windspeed   1  150705556 2588829836 11028
## + hum         1   27757373 2711778019 11061
## + holiday     1   12797494 2726737898 11066
## + workingday  1   10246038 2729289354 11066
## <none>                     2739535392 11067
## + weekday     6   17659017 2721876375 11074
## 
## Step:  AIC=10697.61
## cnt ~ atemp
## 
##              Df Sum of Sq        RSS   AIC
## + yr          1 793490560  855041526 10220
## + weathersit  2 163531664 1485000421 10625
## + season      3 154159967 1494372119 10632
## + hum         1  99815222 1548716864 10654
## + mnth       11 135165642 1513366443 10657
## + windspeed   1  39915579 1608616506 10682
## + holiday     1   6274901 1642257184 10697
## <none>                    1648532085 10698
## + workingday  1   2188508 1646343578 10699
## + temp        1    459595 1648072490 10699
## + weekday     6  15529667 1633002418 10703
## 
## Step:  AIC=10219.71
## cnt ~ atemp + yr
## 
##              Df Sum of Sq       RSS   AIC
## + season      3 168162811 686878714 10066
## + mnth       11 155698736 699342790 10095
## + weathersit  2 122502080 732539446 10111
## + hum         1  44927700 810113826 10182
## + windspeed   1  38710961 816330564 10188
## + holiday     1   7682915 847358611 10215
## + weekday     6  16613406 838428120 10217
## + workingday  1   2573280 852468246 10220
## <none>                    855041526 10220
## + temp        1     70539 854970986 10222
## 
## Step:  AIC=10065.63
## cnt ~ atemp + yr + season
## 
##              Df Sum of Sq       RSS     AIC
## + weathersit  2 152432468 534446246  9886.2
## + hum         1  80425891 606452824  9976.6
## + mnth       11  44938268 641940446 10038.2
## + windspeed   1  23229150 663649564 10042.5
## + weekday     6  16893247 669985467 10059.4
## + holiday     1   7013428 679865287 10060.1
## + workingday  1   2690721 684187994 10064.8
## <none>                    686878714 10065.6
## + temp        1   1870793 685007921 10065.6
## 
## Step:  AIC=9886.2
## cnt ~ atemp + yr + season + weathersit
## 
##              Df Sum of Sq       RSS    AIC
## + mnth       11  56800118 477646128 9826.1
## + windspeed   1  15760980 518685266 9866.3
## + weekday     6  21137709 513308537 9868.7
## + holiday     1   9917911 524528336 9874.5
## + hum         1   6812250 527633996 9878.8
## + workingday  1   5894317 528551929 9880.1
## + temp        1   2968145 531478101 9884.1
## <none>                    534446246 9886.2
## 
## Step:  AIC=9826.07
## cnt ~ atemp + yr + season + weathersit + mnth
## 
##              Df Sum of Sq       RSS    AIC
## + windspeed   1  16851463 460794665 9801.8
## + weekday     6  21356715 456289413 9804.6
## + hum         1   9760667 467885460 9813.0
## + holiday     1   7498087 470148041 9816.5
## + workingday  1   6087557 471558571 9818.7
## <none>                    477646128 9826.1
## + temp        1    483138 477162990 9827.3
## 
## Step:  AIC=9801.81
## cnt ~ atemp + yr + season + weathersit + mnth + windspeed
## 
##              Df Sum of Sq       RSS    AIC
## + hum         1  19120666 441673999 9772.8
## + weekday     6  21487907 439306758 9778.9
## + holiday     1   7334929 453459736 9792.1
## + workingday  1   5829171 454965494 9794.5
## + temp        1   2461824 458332841 9799.9
## <none>                    460794665 9801.8
## 
## Step:  AIC=9772.83
## cnt ~ atemp + yr + season + weathersit + mnth + windspeed + hum
## 
##              Df Sum of Sq       RSS    AIC
## + weekday     6  18042031 423631968 9754.3
## + holiday     1   7482642 434191356 9762.3
## + workingday  1   4995892 436678107 9766.5
## + temp        1   2784516 438889483 9770.2
## <none>                    441673999 9772.8
## 
## Step:  AIC=9754.34
## cnt ~ atemp + yr + season + weathersit + mnth + windspeed + hum + 
##     weekday
## 
##              Df Sum of Sq       RSS    AIC
## + holiday     1   5762905 417869064 9746.3
## + workingday  1   5762905 417869064 9746.3
## + temp        1   1904735 421727233 9753.0
## <none>                    423631968 9754.3
## 
## Step:  AIC=9746.33
## cnt ~ atemp + yr + season + weathersit + mnth + windspeed + hum + 
##     weekday + holiday
## 
##        Df Sum of Sq       RSS    AIC
## + temp  1   2467375 415401688 9744.0
## <none>              417869064 9746.3
## 
## Step:  AIC=9744
## cnt ~ atemp + yr + season + weathersit + mnth + windspeed + hum + 
##     weekday + holiday + temp
## 
##        Df Sum of Sq       RSS  AIC
## <none>              415401688 9744

Both

model_both <- step(model_none,
                   scope = list(upper=model_all),
                      direction = "both")
## Start:  AIC=11066.88
## cnt ~ 1
## 
##              Df  Sum of Sq        RSS   AIC
## + atemp       1 1091003307 1648532085 10698
## + temp        1 1078688585 1660846807 10703
## + mnth       11 1070192271 1669343121 10727
## + season      3  950595868 1788939524 10761
## + yr          1  879828893 1859706499 10786
## + weathersit  2  271644573 2467890819 10995
## + windspeed   1  150705556 2588829836 11028
## + hum         1   27757373 2711778019 11061
## + holiday     1   12797494 2726737898 11066
## + workingday  1   10246038 2729289354 11066
## <none>                     2739535392 11067
## + weekday     6   17659017 2721876375 11074
## 
## Step:  AIC=10697.61
## cnt ~ atemp
## 
##              Df  Sum of Sq        RSS   AIC
## + yr          1  793490560  855041526 10220
## + weathersit  2  163531664 1485000421 10625
## + season      3  154159967 1494372119 10632
## + hum         1   99815222 1548716864 10654
## + mnth       11  135165642 1513366443 10657
## + windspeed   1   39915579 1608616506 10682
## + holiday     1    6274901 1642257184 10697
## <none>                     1648532085 10698
## + workingday  1    2188508 1646343578 10699
## + temp        1     459595 1648072490 10699
## + weekday     6   15529667 1633002418 10703
## - atemp       1 1091003307 2739535392 11067
## 
## Step:  AIC=10219.71
## cnt ~ atemp + yr
## 
##              Df  Sum of Sq        RSS   AIC
## + season      3  168162811  686878714 10066
## + mnth       11  155698736  699342790 10095
## + weathersit  2  122502080  732539446 10111
## + hum         1   44927700  810113826 10182
## + windspeed   1   38710961  816330564 10188
## + holiday     1    7682915  847358611 10215
## + weekday     6   16613406  838428120 10217
## + workingday  1    2573280  852468246 10220
## <none>                      855041526 10220
## + temp        1      70539  854970986 10222
## - yr          1  793490560 1648532085 10698
## - atemp       1 1004664973 1859706499 10786
## 
## Step:  AIC=10065.63
## cnt ~ atemp + yr + season
## 
##              Df Sum of Sq        RSS     AIC
## + weathersit  2 152432468  534446246  9886.2
## + hum         1  80425891  606452824  9976.6
## + mnth       11  44938268  641940446 10038.2
## + windspeed   1  23229150  663649564 10042.5
## + weekday     6  16893247  669985467 10059.4
## + holiday     1   7013428  679865287 10060.1
## + workingday  1   2690721  684187994 10064.8
## <none>                     686878714 10065.6
## + temp        1   1870793  685007921 10065.6
## - season      3 168162811  855041526 10219.7
## - atemp       1 218052547  904931261 10265.2
## - yr          1 807493404 1494372119 10631.8
## 
## Step:  AIC=9886.2
## cnt ~ atemp + yr + season + weathersit
## 
##              Df Sum of Sq        RSS     AIC
## + mnth       11  56800118  477646128  9826.1
## + windspeed   1  15760980  518685266  9866.3
## + weekday     6  21137709  513308537  9868.7
## + holiday     1   9917911  524528336  9874.5
## + hum         1   6812250  527633996  9878.8
## + workingday  1   5894317  528551929  9880.1
## + temp        1   2968145  531478101  9884.1
## <none>                     534446246  9886.2
## - weathersit  2 152432468  686878714 10065.6
## - atemp       1 184648589  719094835 10101.1
## - season      3 198093200  732539446 10110.7
## - yr          1 762974811 1297421057 10532.5
## 
## Step:  AIC=9826.07
## cnt ~ atemp + yr + season + weathersit + mnth
## 
##              Df Sum of Sq        RSS     AIC
## + windspeed   1  16851463  460794665  9801.8
## + weekday     6  21356715  456289413  9804.6
## + hum         1   9760667  467885460  9813.0
## + holiday     1   7498087  470148041  9816.5
## + workingday  1   6087557  471558571  9818.7
## <none>                     477646128  9826.1
## + temp        1    483138  477162990  9827.3
## - mnth       11  56800118  534446246  9886.2
## - season      3  59886815  537532943  9906.4
## - atemp       1  62937248  540583376  9914.5
## - weathersit  2 164294318  641940446 10038.2
## - yr          1 770192543 1247838671 10526.0
## 
## Step:  AIC=9801.81
## cnt ~ atemp + yr + season + weathersit + mnth + windspeed
## 
##              Df Sum of Sq        RSS     AIC
## + hum         1  19120666  441673999  9772.8
## + weekday     6  21487907  439306758  9778.9
## + holiday     1   7334929  453459736  9792.1
## + workingday  1   5829171  454965494  9794.5
## + temp        1   2461824  458332841  9799.9
## <none>                     460794665  9801.8
## - windspeed   1  16851463  477646128  9826.1
## - mnth       11  57890601  518685266  9866.3
## - season      3  53118949  513913614  9875.6
## - atemp       1  56942731  517737396  9885.0
## - weathersit  2 156262119  617056784 10011.3
## - yr          1 771437691 1232232356 10518.8
## 
## Step:  AIC=9772.83
## cnt ~ atemp + yr + season + weathersit + mnth + windspeed + hum
## 
##              Df Sum of Sq        RSS     AIC
## + weekday     6  18042031  423631968  9754.3
## + holiday     1   7482642  434191356  9762.3
## + workingday  1   4995892  436678107  9766.5
## + temp        1   2784516  438889483  9770.2
## <none>                     441673999  9772.8
## - hum         1  19120666  460794665  9801.8
## - windspeed   1  26211462  467885460  9813.0
## - mnth       11  62026229  503700227  9846.9
## - season      3  53935519  495609518  9851.1
## - weathersit  2  56326584  498000582  9856.6
## - atemp       1  71073384  512747383  9879.9
## - yr          1 718021306 1159695305 10476.5
## 
## Step:  AIC=9754.34
## cnt ~ atemp + yr + season + weathersit + mnth + windspeed + hum + 
##     weekday
## 
##              Df Sum of Sq        RSS     AIC
## + holiday     1   5762905  417869064  9746.3
## + workingday  1   5762905  417869064  9746.3
## + temp        1   1904735  421727233  9753.0
## <none>                     423631968  9754.3
## - weekday     6  18042031  441673999  9772.8
## - hum         1  15674790  439306758  9778.9
## - windspeed   1  25276961  448908929  9794.7
## - mnth       11  61660211  485292179  9831.7
## - season      3  54557310  478189278  9836.9
## - weathersit  2  61770592  485402560  9849.8
## - atemp       1  67338333  490970301  9860.2
## - yr          1 721451918 1145083887 10479.2
## 
## Step:  AIC=9746.33
## cnt ~ atemp + yr + season + weathersit + mnth + windspeed + hum + 
##     weekday + holiday
## 
##              Df Sum of Sq        RSS     AIC
## + temp        1   2467375  415401688  9744.0
## <none>                     417869064  9746.3
## - holiday     1   5762905  423631968  9754.3
## - weekday     6  16322293  434191356  9762.3
## - hum         1  15934429  433803493  9771.7
## - windspeed   1  25220175  443089238  9787.2
## - mnth       11  59986096  477855159  9822.4
## - season      3  51423350  469292413  9825.2
## - weathersit  2  62709484  480578547  9844.5
## - atemp       1  68715545  486584608  9855.6
## - yr          1 721273907 1139142971 10477.4
## 
## Step:  AIC=9744
## cnt ~ atemp + yr + season + weathersit + mnth + windspeed + hum + 
##     weekday + holiday + temp
## 
##              Df Sum of Sq        RSS     AIC
## - atemp       1    883090  416284778  9743.6
## <none>                     415401688  9744.0
## - temp        1   2467375  417869064  9746.3
## - holiday     1   6325545  421727233  9753.0
## - weekday     6  15346257  430747945  9758.5
## - hum         1  16312287  431713975  9770.2
## - windspeed   1  27446438  442848126  9788.8
## - mnth       11  57051189  472452877  9816.1
## - season      3  51711125  467112813  9823.8
## - weathersit  2  62999538  478401226  9843.2
## - yr          1 712147968 1127549656 10471.9
## 
## Step:  AIC=9743.55
## cnt ~ yr + season + weathersit + mnth + windspeed + hum + weekday + 
##     holiday + temp
## 
##              Df Sum of Sq        RSS     AIC
## <none>                     416284778  9743.6
## + atemp       1    883090  415401688  9744.0
## - holiday     1   6653951  422938729  9753.1
## - weekday     6  15002785  431287563  9757.4
## - hum         1  15984413  432269190  9769.1
## - windspeed   1  30717842  447002619  9793.6
## - mnth       11  56630559  472915337  9814.8
## - season      3  52094569  468379347  9823.7
## - weathersit  2  64280982  480565760  9844.5
## - temp        1  70299831  486584608  9855.6
## - yr          1 711362396 1127647174 10470.0

Comparing Adjusted R-squared Values

summary(model_backward)$adj.r.squared
## [1] 0.8422094
summary(model_forward)$adj.r.squared
## [1] 0.8423198
summary(model_both)$adj.r.squared
## [1] 0.8422094

Insight

  • The backward and both models have the same Adjusted R-squared, 0.8422094, and the same set of predictors, which excludes atemp.
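
Adjusted R-squared can be recomputed from the sums of squares that step() already printed, which makes it clear how the penalty for extra coefficients enters. A sketch using the RSS of the intercept-only model as the total sum of squares and the RSS of the backward model:

```r
n   <- 731
tss <- 2739535392   # RSS of cnt ~ 1, i.e. the total sum of squares
rss <- 416284778    # RSS of the backward model
p   <- 27           # estimated slopes (F-statistic on 27 and 703 DF)

r2     <- 1 - rss / tss
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

round(c(r2 = r2, adj_r2 = adj_r2), 4)
```

The forward model keeps one more coefficient (atemp), which is why its adjusted value differs only in the fourth decimal place.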

Model Prediction & Error Values

Prediction

# temp as the only predictor
pred_temp <- predict(model_temp,
                     newdata = main_bike)

# All predictors
pred_all <- predict(model_all,
                    newdata = main_bike)
## Warning in predict.lm(model_all, newdata = main_bike): prediction from a rank-
## deficient fit may be misleading
# Backward
pred_backward <- predict(model_backward, 
                         newdata = main_bike)

Error Values

RMSE(y_pred = pred_temp, y_true = main_bike$cnt)
## [1] 1507.322
RMSE(y_pred = pred_all, y_true = main_bike$cnt)
## [1] 753.8335
RMSE(y_pred = pred_backward, y_true = main_bike$cnt)
## [1] 754.6344

Range of cnt

range(main_bike$cnt)
## [1]   22 8714

Insight

  • The smallest error belongs to the model with all predictors (RMSE 753.8). However, atemp and temp carry overlapping information, so the backward model (RMSE 754.6), which excludes atemp, is chosen.
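
RMSE is the square root of the mean squared difference between actual and predicted values. The sketch below defines it by hand (the function name rmse and the predictions in y_pred are illustrative assumptions) and then relates the backward model's error to the range of cnt, using the RSS from the stepwise output.

```r
# Hand-written RMSE; MLmetrics::RMSE() is expected to return the same quantity.
rmse <- function(y_true, y_pred) sqrt(mean((y_true - y_pred)^2))

y_true <- c(985, 801, 1349, 1562)    # first four cnt values
y_pred <- c(1000, 900, 1300, 1500)   # made-up predictions
rmse(y_true, y_pred)

# On the training data, RMSE equals sqrt(RSS / n).
rmse_backward <- sqrt(416284778 / 731)
share_of_range <- rmse_backward / (8714 - 22)   # relative to range(cnt)
round(c(rmse_backward, share_of_range), 4)
```

An error of roughly 9% of the observed range gives a sense of scale that the raw RMSE alone does not.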

Model Evaluation

Normality

  1. Visualize the residuals of each model. The errors/residuals are expected to be normally distributed, which shows as values centered around 0.
hist(model_backward$residuals)

hist(model_temp$residuals)

  2. Test with shapiro.test(). The p-value is expected to be greater than alpha, so that H0 (normally distributed residuals) is not rejected.
shapiro.test(model_temp$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  model_temp$residuals
## W = 0.98671, p-value = 3.392e-06
shapiro.test(model_backward$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  model_backward$residuals
## W = 0.95242, p-value = 1.293e-14

Note: the p-value is compared with alpha (0.05).

  • p-value >= 0.05: fail to reject H0
  • p-value < 0.05: reject H0 (accept H1)

Shapiro-Wilk hypothesis test:

  • H0: the errors/residuals are normally distributed
  • H1: the errors/residuals are not normally distributed

Normality Conclusion

  1. model_temp:
    • from the histogram, the errors range from about -2000 to 2000
    • from shapiro.test, the p-value (3.392e-06) is < 0.05, so H0 is rejected and the normality assumption is Not Satisfied
  2. model_backward:
    • from the histogram, the errors are centered around 0, so the visual check is Satisfied
    • from shapiro.test, the p-value (1.293e-14) is < 0.05, so H0 is rejected and the normality assumption is Not Satisfied
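
The direction of the decision is easy to get backwards, so here is a small sketch of the rule applied to simulated residuals. The helper decide and both samples are assumptions made for illustration; only shapiro.test() itself comes from the analysis.

```r
set.seed(123)
normal_res <- rnorm(300)      # residuals drawn from a normal distribution
skewed_res <- rexp(300) - 1   # strongly right-skewed residuals

# Hypothetical helper: turn a Shapiro-Wilk p-value into a decision.
decide <- function(res, alpha = 0.05) {
  p <- shapiro.test(res)$p.value
  if (p >= alpha) {
    "fail to reject H0: consistent with normality"
  } else {
    "reject H0: residuals are not normal"
  }
}

decide(normal_res)
decide(skewed_res)
```

A small p-value is evidence against normality, so the assumption counts as satisfied only when the test fails to reject H0.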

Homoscedasticity

The errors are expected to have constant variance and to form no pattern (homoscedasticity).

  1. Visualize a scatterplot of fitted values against errors
plot(x = model_temp$fitted.values, 
     y = model_temp$residuals)
abline(h = 0, col = "red")

plot(x = model_backward$fitted.values, 
     y = model_backward$residuals)
abline(h = 0, col = "red")

  2. Statistical test using bptest(). The error variance is expected to be constant (fail to reject H0).
bptest(model_temp)
## 
##  studentized Breusch-Pagan test
## 
## data:  model_temp
## BP = 9.5279, df = 1, p-value = 0.002024
bptest(model_backward)
## 
##  studentized Breusch-Pagan test
## 
## data:  model_backward
## BP = 75.358, df = 27, p-value = 1.869e-06

The p-value is compared with alpha (0.05).

  • p-value >= 0.05: fail to reject H0
  • p-value < 0.05: reject H0 (accept H1)

Breusch-Pagan hypothesis test:

  • H0: the error variance is constant (Homoscedasticity)
  • H1: the error variance is not constant/forms a pattern (Heteroscedasticity)

Homoscedasticity Conclusion

  1. model_temp:
    • from the scatterplot, the errors are more evenly spread, so the visual check is Satisfied
    • from bptest, the p-value (0.002024) is closer to 0.05 but still below it, so the assumption is Not Satisfied
  2. model_backward:
    • from the scatterplot, the errors form a pattern (heteroscedasticity), so the visual check is Not Satisfied
    • from bptest, the p-value (1.869e-06) is far below 0.05, so the assumption is Not Satisfied
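
To make the test less of a black box, the studentized Breusch-Pagan statistic can be sketched by hand: regress the squared residuals on the predictors and multiply the R-squared of that auxiliary regression by n. The code below does this in base R on simulated data whose error spread grows with x; the data and object names are assumptions, and bptest() remains the tool used in the analysis.

```r
set.seed(7)
n <- 500
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.2 + 2 * x)   # error spread grows with x

fit <- lm(y ~ x)

# Auxiliary regression: squared residuals on the same predictor.
aux  <- lm(residuals(fit)^2 ~ x)
bp   <- n * summary(aux)$r.squared              # studentized (Koenker) statistic
pval <- pchisq(bp, df = 1, lower.tail = FALSE)  # df = number of predictors

c(BP = bp, p.value = pval)
```

A large statistic means the squared residuals are predictable from the regressors, which is exactly what a funnel shape in the residual plot looks like numerically.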

Multicollinearity

No predictor is expected to be very strongly correlated with the other predictors, which would show up as a VIF above 10.

vif(model_backward)
##                  GVIF Df GVIF^(1/(2*Df))
## season     169.713093  3        2.352985
## yr           1.046249  1        1.022863
## mnth       391.715048 11        1.311784
## holiday      1.116829  1        1.056801
## weekday      1.153293  6        1.011956
## weathersit   1.886133  2        1.171907
## temp         7.006223  1        2.646927
## hum          2.135348  1        1.461283
## windspeed    1.221500  1        1.105215

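Decision rule:

  • VIF > 10: multicollinearity is present
  • VIF < 10: no multicollinearity

VIF Conclusion

  1. model_temp:
    • cannot be tested with vif because it has only 1 predictor
  2. model_backward:
    • the raw GVIF of season (169.7) and mnth (391.7) exceeds 10, but these are factors with several degrees of freedom, so the df-adjusted GVIF^(1/(2*Df)) is the comparable measure. Its square stays below 10 for every predictor (the largest is temp, 2.646927^2 = 7.006), so the assumption is Satisfied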
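
For factors with more than one degree of freedom, the GVIF column is not on the same scale as an ordinary VIF. The adjusted column GVIF^(1/(2*Df)) can be squared and then compared with the usual threshold. A sketch that recomputes it from the values printed by vif(model_backward):

```r
gvif <- c(season = 169.713093, yr = 1.046249, mnth = 391.715048,
          holiday = 1.116829, weekday = 1.153293, weathersit = 1.886133,
          temp = 7.006223, hum = 2.135348, windspeed = 1.221500)
df   <- c(season = 3, yr = 1, mnth = 11, holiday = 1, weekday = 6,
          weathersit = 2, temp = 1, hum = 1, windspeed = 1)

adj <- gvif^(1 / (2 * df))   # third column of the vif() output
round(adj, 6)

adj^2 > 10                   # comparison on the ordinary VIF scale
```

For single-df predictors such as temp the squared adjusted value is just the GVIF itself, so nothing changes for them.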

Conclusion

summary(model_backward)
## 
## Call:
## lm(formula = cnt ~ season + yr + mnth + holiday + weekday + weathersit + 
##     temp + hum + windspeed, data = main_bike)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3960.9  -350.9    74.1   456.0  2919.9 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1543.552    235.129   6.565 1.01e-10 ***
## season2       889.302    179.516   4.954 9.12e-07 ***
## season3       832.236    213.204   3.903 0.000104 ***
## season4      1578.947    181.040   8.722  < 2e-16 ***
## yr1          2018.063     58.225  34.660  < 2e-16 ***
## mnth2         136.855    143.747   0.952 0.341396    
## mnth3         545.132    165.480   3.294 0.001036 ** 
## mnth4         456.494    247.615   1.844 0.065667 .  
## mnth5         723.520    267.541   2.704 0.007010 ** 
## mnth6         490.552    281.776   1.741 0.082133 .  
## mnth7           8.404    313.395   0.027 0.978613    
## mnth8         404.912    301.494   1.343 0.179700    
## mnth9         983.948    264.698   3.717 0.000217 ***
## mnth10        520.937    241.636   2.156 0.031432 *  
## mnth11       -111.362    230.816  -0.482 0.629621    
## mnth12        -84.389    182.229  -0.463 0.643439    
## holiday1     -603.605    180.066  -3.352 0.000845 ***
## weekday1      214.877    109.508   1.962 0.050133 .  
## weekday2      309.132    107.171   2.884 0.004041 ** 
## weekday3      377.407    107.467   3.512 0.000473 ***
## weekday4      385.206    107.562   3.581 0.000366 ***
## weekday5      428.604    107.258   3.996 7.12e-05 ***
## weekday6      438.699    106.590   4.116 4.32e-05 ***
## weathersit2  -465.202     77.083  -6.035 2.57e-09 ***
## weathersit3 -1981.357    196.670 -10.075  < 2e-16 ***
## temp         4487.305    411.838  10.896  < 2e-16 ***
## hum         -1518.178    292.208  -5.196 2.68e-07 ***
## windspeed   -2925.438    406.175  -7.202 1.53e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 769.5 on 703 degrees of freedom
## Multiple R-squared:  0.848,  Adjusted R-squared:  0.8422 
## F-statistic: 145.3 on 27 and 703 DF,  p-value: < 2.2e-16

From the evaluations above, model_backward is a better model than model_temp, which uses only temp as a predictor, because it has a lower error (RMSE 754.6 versus 1507.3) and a higher Adjusted R-squared (0.8422 versus 0.3929).

model_backward uses all predictors except atemp, which is considered to repeat the information in temp; workingday is also absent from the final formula. The summary shows that most predictors are statistically significant for the target cnt. The Shapiro-Wilk and Breusch-Pagan p-values for this model are both below 0.05, so the reported significance levels should be read with some caution.
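
As an illustration of how the fitted coefficients combine, the sketch below computes a prediction by hand for one hypothetical day: fall (season 3), 2012 (yr 1), July (mnth 7), a Wednesday (weekday 3), not a holiday, clear weather (weathersit 1, the baseline), temp = 0.7, hum = 0.6 and windspeed = 0.15. The day itself is invented; the coefficients are copied from summary(model_backward).

```r
coefs <- c(intercept = 1543.552, season3 = 832.236, yr1 = 2018.063,
           mnth7 = 8.404, weekday3 = 377.407,
           temp = 4487.305, hum = -1518.178, windspeed = -2925.438)

# Dummy terms contribute their coefficient once; numeric terms are scaled.
# Baseline levels (holiday 0, weathersit 1) contribute nothing.
pred <- coefs[["intercept"]] + coefs[["season3"]] + coefs[["yr1"]] +
  coefs[["mnth7"]] + coefs[["weekday3"]] +
  coefs[["temp"]] * 0.7 + coefs[["hum"]] * 0.6 + coefs[["windspeed"]] * 0.15

round(pred)
```

With the fitted object available, predict(model_backward, newdata = ...) would do the same calculation, provided the factor columns in newdata use the same levels as the training data.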