1 Objective

This article applies Linear Regression to predict the crime rate (crime_rate). The dataset contains state-level data from the United States, collected in 1960. Linear regression is used both to predict the crime rate and to identify which variables have a significant effect on it.

2 Data Preparation and Exploratory Data Analysis

Read the dataset used in the analysis:

crime <- read.csv("data_input/crime.csv")
head(crime)

After the data are read, the column names are hard to understand because they are abbreviations, so they are renamed as follows:

names(crime) <- c("X" ,"percent_m", "is_south", "mean_education", "police_exp60", "police_exp59", "labour_participation", "m_per1000f", "state_pop", "nonwhites_per1000", "unemploy_m24", "unemploy_m39", "gdp", "inequality", "prob_prison", "time_prison", "crime_rate")
crime

The data are state-level, from the United States in 1960. Each column is described below:

  • percent_m: percentage of males aged 14-24
  • is_south: whether the state is in the South. 1 for Yes, 0 for No.
  • mean_education: mean years of schooling
  • police_exp60: police expenditure in 1960
  • police_exp59: police expenditure in 1959
  • labour_participation: labour force participation rate
  • m_per1000f: number of males per 1000 females
  • state_pop: state population
  • nonwhites_per1000: number of non-whites per 1000 people
  • unemploy_m24: unemployment rate of urban males aged 14-24
  • unemploy_m39: unemployment rate of urban males aged 35-39
  • gdp: gross domestic product per capita
  • inequality: income inequality
  • prob_prison: probability of imprisonment
  • time_prison: average time served in prison
  • crime_rate: crime rate in an unspecified category

Check the structure of the data:

library(dplyr)
glimpse(crime)
#> Rows: 47
#> Columns: 17
#> $ X                    <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,...
#> $ percent_m            <int> 151, 143, 142, 136, 141, 121, 127, 131, 157, 1...
#> $ is_south             <int> 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1...
#> $ mean_education       <int> 91, 113, 89, 121, 121, 110, 111, 109, 90, 118,...
#> $ police_exp60         <int> 58, 103, 45, 149, 109, 118, 82, 115, 65, 71, 1...
#> $ police_exp59         <int> 56, 95, 44, 141, 101, 115, 79, 109, 62, 68, 11...
#> $ labour_participation <int> 510, 583, 533, 577, 591, 547, 519, 542, 553, 6...
#> $ m_per1000f           <int> 950, 1012, 969, 994, 985, 964, 982, 969, 955, ...
#> $ state_pop            <int> 33, 13, 18, 157, 18, 25, 4, 50, 39, 7, 101, 47...
#> $ nonwhites_per1000    <int> 301, 102, 219, 80, 30, 44, 139, 179, 286, 15, ...
#> $ unemploy_m24         <int> 108, 96, 94, 102, 91, 84, 97, 79, 81, 100, 77,...
#> $ unemploy_m39         <int> 41, 36, 33, 39, 20, 29, 38, 35, 28, 24, 35, 31...
#> $ gdp                  <int> 394, 557, 318, 673, 578, 689, 620, 472, 421, 5...
#> $ inequality           <int> 261, 194, 250, 167, 174, 126, 168, 206, 239, 1...
#> $ prob_prison          <dbl> 0.084602, 0.029599, 0.083401, 0.015801, 0.0413...
#> $ time_prison          <dbl> 26.2011, 25.2999, 24.3006, 29.9012, 21.2998, 2...
#> $ crime_rate           <int> 791, 1635, 578, 1969, 1234, 682, 963, 1555, 85...

The is_south column is stored as an integer, but it is a categorical indicator, so it is converted to a factor. The X column only holds row numbers and is not needed for the analysis, so it is dropped.

crime <- crime %>% 
  select(-X) %>% 
  mutate(is_south = as.factor(is_south))
crime
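
The factor conversion matters for how lm() encodes the column: a factor with levels 0 and 1 enters the model as a single dummy column, which is why the coefficient tables later list a term named is_south1. A minimal sketch on a toy data frame (the four values are made up for illustration and are not taken from crime.csv):

```r
# Toy data: hypothetical values, not taken from crime.csv
toy <- data.frame(is_south = as.factor(c(1, 0, 1, 0)))

# The factor keeps "0" as the reference level
levels(toy$is_south)

# model.matrix() shows the design matrix lm() would build from the factor:
# an intercept plus one dummy column for level "1"
design <- model.matrix(~ is_south, data = toy)
colnames(design)
```

The coefficient of is_south1 is therefore read as the difference between Southern states and the reference group (level 0), with the other predictors held fixed.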

Check whether the dataset contains missing values:

colSums(is.na(crime))
#>            percent_m             is_south       mean_education 
#>                    0                    0                    0 
#>         police_exp60         police_exp59 labour_participation 
#>                    0                    0                    0 
#>           m_per1000f            state_pop    nonwhites_per1000 
#>                    0                    0                    0 
#>         unemploy_m24         unemploy_m39                  gdp 
#>                    0                    0                    0 
#>           inequality          prob_prison          time_prison 
#>                    0                    0                    0 
#>           crime_rate 
#>                    0

Next, check the correlation between variables:

library(GGally)
ggcorr(crime, label = TRUE, layout.exp = 2, hjust = 1)

In the correlation plot above, the correlations between crime_rate and the predictors vary in both sign and strength. police_exp59 and police_exp60 are strongly correlated with crime_rate, and gdp is moderately correlated with it. nonwhites_per1000, time_prison, unemploy_m24, and percent_m are very weakly correlated with crime_rate, and the remaining predictors are weakly correlated with it.
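
The plot can be complemented with the numeric correlations, sorted by strength. The sketch below is self-contained under one loud assumption: it uses the UScrime data shipped with the MASS package as a stand-in for data_input/crime.csv. Its first rows match the values printed by glimpse(), but that the two are the same file is an assumption, not something the article states. The object names crime_demo, predictors, and cors are illustrative.

```r
# Assumption: MASS::UScrime stands in for data_input/crime.csv.
# It is accessed with MASS:: so that MASS is not attached and
# dplyr::select() is not masked.
crime_demo <- MASS::UScrime
names(crime_demo) <- c("percent_m", "is_south", "mean_education", "police_exp60",
                       "police_exp59", "labour_participation", "m_per1000f",
                       "state_pop", "nonwhites_per1000", "unemploy_m24",
                       "unemploy_m39", "gdp", "inequality", "prob_prison",
                       "time_prison", "crime_rate")

# Pearson correlation of each numeric predictor with crime_rate
# (is_south is categorical, so it is left out)
predictors <- setdiff(names(crime_demo), c("crime_rate", "is_south"))
cors <- sapply(crime_demo[predictors],
               function(x) cor(x, crime_demo$crime_rate))

# Strongest correlations first, regardless of sign
round(cors[order(abs(cors), decreasing = TRUE)], 2)
```

Sorting by absolute value keeps strong negative correlations near the top instead of pushing them to the bottom of the list.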

The next step is to check the distribution of the numeric variables:

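library(tidyr)
library(ggplot2)  # attached explicitly; ggplot() and facet_wrap() come from here
# Reshape the numeric columns to long format (key, value), then draw one histogram per variable
ggplot(gather(crime %>% select_if(is.numeric)), aes(value)) + 
    geom_histogram(bins = 10) + 
    facet_wrap(~key, scales = 'free_x')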

3 Model Building

Before modeling, the data are split into a training set and a test set in a 70:30 ratio. With 47 rows, this gives 33 training rows and 14 test rows.

set.seed(123)
samplesize <- round(0.7 * nrow(crime), 0)
index <- sample(seq_len(nrow(crime)), size = samplesize)

data_train <- crime[index, ]
data_test <- crime[-index, ]
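
As a quick sanity check on the split, the sizes can be reproduced from the row count alone. This sketch assumes only the 47 rows reported by glimpse(crime); n_rows and test_index are illustrative names that do not appear in the original code.

```r
n_rows <- 47                       # number of rows reported by glimpse(crime)

set.seed(123)
samplesize <- round(0.7 * n_rows, 0)
index <- sample(seq_len(n_rows), size = samplesize)

# Rows not drawn for training form the test set (same effect as crime[-index, ])
test_index <- setdiff(seq_len(n_rows), index)

samplesize                            # 33 training rows
length(test_index)                    # 14 test rows
length(intersect(index, test_index))  # 0: no row appears in both sets
```

Because sample() draws without replacement by default, the training and test sets cannot overlap.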

As a baseline, the first model uses all predictor variables:

model_all <- lm(formula = crime_rate ~ ., data = data_train)
summary(model_all)
#> 
#> Call:
#> lm(formula = crime_rate ~ ., data = data_train)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -304.71  -91.82  -31.14  101.84  426.16 
#> 
#> Coefficients:
#>                         Estimate  Std. Error t value Pr(>|t|)    
#> (Intercept)          -10332.3968   2232.6587  -4.628  0.00024 ***
#> percent_m                 9.6677      4.4748   2.160  0.04530 *  
#> is_south1                 8.1011    223.3111   0.036  0.97148    
#> mean_education           26.2959      7.6978   3.416  0.00329 ** 
#> police_exp60             20.0908     11.8628   1.694  0.10858    
#> police_exp59            -14.8569     13.5366  -1.098  0.28771    
#> labour_participation     -4.6592      2.1721  -2.145  0.04669 *  
#> m_per1000f                8.2657      2.8606   2.889  0.01019 *  
#> state_pop                 1.3699      1.7925   0.764  0.45521    
#> nonwhites_per1000         1.3334      0.9991   1.335  0.19960    
#> unemploy_m24            -11.9795      5.1323  -2.334  0.03212 *  
#> unemploy_m39             14.2389      9.8688   1.443  0.16724    
#> gdp                       0.6436      1.2624   0.510  0.61671    
#> inequality                4.2472      2.8552   1.488  0.15518    
#> prob_prison             940.1481   3823.0185   0.246  0.80869    
#> time_prison              14.7756      9.4304   1.567  0.13558    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 199.9 on 17 degrees of freedom
#> Multiple R-squared:  0.8813, Adjusted R-squared:  0.7766 
#> F-statistic: 8.414 on 15 and 17 DF,  p-value: 0.00003846

From the model summary:

  • at the 0.05 significance level, percent_m, mean_education, labour_participation, m_per1000f, and unemploy_m24 have a significant effect on crime_rate
  • the overall F-test gives a p-value below 0.05 (0.00003846), so the model as a whole is significant: the predictors jointly explain the target variable crime_rate
  • by the adjusted R-squared, the model explains 77.66% of the variation in crime_rate; the remaining 22.34% is due to factors outside the model
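
The 77.66% figure is the adjusted R-squared, which penalizes the multiple R-squared for the number of predictors. The arithmetic can be reproduced from the numbers printed by summary(model_all). This is a sketch using the reported, rounded values, so small rounding differences are expected; the variable names are illustrative.

```r
r2 <- 0.8813   # Multiple R-squared from summary(model_all)
n  <- 33       # training rows: 17 residual df + 16 estimated coefficients
p  <- 15       # predictors, excluding the intercept

# Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 4)   # 0.7766, matching the summary output

# p-value of the overall F-test, from the reported F-statistic and its df
f_p <- pf(8.414, df1 = 15, df2 = 17, lower.tail = FALSE)
f_p < 0.05         # TRUE: the model as a whole is significant
```

With only 33 rows and 15 predictors the penalty is large, which is why the adjusted value sits well below the multiple R-squared of 0.8813.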

3.1 Model Tuning

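The baseline model still contains several predictors that are not significant. Stepwise regression is used to select a better subset of variables. The intercept-only model below (model_none) is the starting point for forward selection and the lower bound of the search scope.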

model_none <- lm(crime_rate ~ 1, data = data_train)
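
step() ranks candidate models by AIC. For lm fits, the value it prints is the one computed by extractAIC(), that is n * log(RSS / n) + 2 * edf, where edf is the number of estimated coefficients. The sketch below checks this against two values from the stepwise traces in this section; step_aic is a hypothetical helper written for illustration, not part of any package.

```r
# Hypothetical helper mirroring what extractAIC() computes for an lm fit
step_aic <- function(rss, n, edf) n * log(rss / n) + 2 * edf

n <- 33  # training rows

# Full model: RSS 679033 with 16 estimated coefficients
round(step_aic(679033, n, edf = 16), 2)   # 359.75

# Intercept-only model: RSS 5720339 with 1 coefficient
round(step_aic(5720339, n, edf = 1), 2)   # 400.08
```

Dropping a variable always raises the RSS but lowers the 2 * edf penalty, so a variable is removed only when the fit it buys is worth less than the penalty it costs.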

3.1.1 Backward Method

model_backward <- step(object = model_all, direction = "backward")
#> Start:  AIC=359.75
#> crime_rate ~ percent_m + is_south + mean_education + police_exp60 + 
#>     police_exp59 + labour_participation + m_per1000f + state_pop + 
#>     nonwhites_per1000 + unemploy_m24 + unemploy_m39 + gdp + inequality + 
#>     prob_prison + time_prison
#> 
#>                        Df Sum of Sq     RSS    AIC
#> - is_south              1        53  679085 357.76
#> - prob_prison           1      2416  681448 357.87
#> - gdp                   1     10383  689416 358.25
#> - state_pop             1     23329  702361 358.87
#> <none>                               679033 359.75
#> - police_exp59          1     48115  727148 360.01
#> - nonwhites_per1000     1     71147  750179 361.04
#> - unemploy_m39          1     83151  762184 361.57
#> - inequality            1     88386  767419 361.79
#> - time_prison           1     98056  777088 362.20
#> - police_exp60          1    114569  793601 362.90
#> - labour_participation  1    183780  862813 365.66
#> - percent_m             1    186444  865477 365.76
#> - unemploy_m24          1    217618  896650 366.93
#> - m_per1000f            1    333491 1012523 370.94
#> - mean_education        1    466102 1145135 375.00
#> 
#> Step:  AIC=357.76
#> crime_rate ~ percent_m + mean_education + police_exp60 + police_exp59 + 
#>     labour_participation + m_per1000f + state_pop + nonwhites_per1000 + 
#>     unemploy_m24 + unemploy_m39 + gdp + inequality + prob_prison + 
#>     time_prison
#> 
#>                        Df Sum of Sq     RSS    AIC
#> - prob_prison           1      2364  681450 355.87
#> - gdp                   1     11061  690146 356.29
#> - state_pop             1     24951  704037 356.95
#> <none>                               679085 357.76
#> - police_exp59          1     48985  728070 358.05
#> - unemploy_m39          1     89682  768768 359.85
#> - time_prison           1    100182  779268 360.30
#> - police_exp60          1    115574  794659 360.94
#> - nonwhites_per1000     1    136646  815731 361.81
#> - inequality            1    138689  817775 361.89
#> - percent_m             1    187112  866197 363.79
#> - labour_participation  1    244731  923817 365.91
#> - unemploy_m24          1    245363  924448 365.93
#> - m_per1000f            1    333731 1012816 368.95
#> - mean_education        1    490535 1169620 373.70
#> 
#> Step:  AIC=355.87
#> crime_rate ~ percent_m + mean_education + police_exp60 + police_exp59 + 
#>     labour_participation + m_per1000f + state_pop + nonwhites_per1000 + 
#>     unemploy_m24 + unemploy_m39 + gdp + inequality + time_prison
#> 
#>                        Df Sum of Sq     RSS    AIC
#> - gdp                   1      9229  690678 354.31
#> - state_pop             1     24282  705731 355.03
#> <none>                               681450 355.87
#> - police_exp59          1     55730  737180 356.46
#> - unemploy_m39          1     96511  777961 358.24
#> - police_exp60          1    125056  806506 359.43
#> - inequality            1    141439  822888 360.09
#> - nonwhites_per1000     1    143129  824578 360.16
#> - time_prison           1    150025  831475 360.44
#> - percent_m             1    184813  866263 361.79
#> - labour_participation  1    242368  923817 363.91
#> - unemploy_m24          1    244548  925998 363.99
#> - m_per1000f            1    339823 1021273 367.22
#> - mean_education        1    503886 1185335 372.14
#> 
#> Step:  AIC=354.31
#> crime_rate ~ percent_m + mean_education + police_exp60 + police_exp59 + 
#>     labour_participation + m_per1000f + state_pop + nonwhites_per1000 + 
#>     unemploy_m24 + unemploy_m39 + inequality + time_prison
#> 
#>                        Df Sum of Sq     RSS    AIC
#> - state_pop             1     28240  718918 353.64
#> <none>                               690678 354.31
#> - police_exp59          1     51483  742161 354.69
#> - unemploy_m39          1    116128  806806 357.44
#> - police_exp60          1    122245  812923 357.69
#> - nonwhites_per1000     1    137395  828073 358.30
#> - inequality            1    144212  834890 358.57
#> - percent_m             1    177929  868607 359.88
#> - time_prison           1    213364  904042 361.20
#> - labour_participation  1    245231  935909 362.34
#> - unemploy_m24          1    266982  957660 363.10
#> - m_per1000f            1    388189 1078867 367.03
#> - mean_education        1    639260 1329938 373.94
#> 
#> Step:  AIC=353.64
#> crime_rate ~ percent_m + mean_education + police_exp60 + police_exp59 + 
#>     labour_participation + m_per1000f + nonwhites_per1000 + unemploy_m24 + 
#>     unemploy_m39 + inequality + time_prison
#> 
#>                        Df Sum of Sq     RSS    AIC
#> - police_exp59          1     35183  754102 353.21
#> <none>                               718918 353.64
#> - unemploy_m39          1    106513  825431 356.20
#> - police_exp60          1    108581  827499 356.28
#> - nonwhites_per1000     1    139272  858191 357.48
#> - percent_m             1    151792  870711 357.96
#> - labour_participation  1    217241  936159 360.35
#> - inequality            1    217382  936301 360.36
#> - unemploy_m24          1    243808  962726 361.27
#> - time_prison           1    248908  967826 361.45
#> - m_per1000f            1    407117 1126035 366.44
#> - mean_education        1    614356 1333275 372.02
#> 
#> Step:  AIC=353.21
#> crime_rate ~ percent_m + mean_education + police_exp60 + labour_participation + 
#>     m_per1000f + nonwhites_per1000 + unemploy_m24 + unemploy_m39 + 
#>     inequality + time_prison
#> 
#>                        Df Sum of Sq     RSS    AIC
#> <none>                               754102 353.21
#> - nonwhites_per1000     1    112421  866523 355.80
#> - unemploy_m39          1    123900  878001 356.23
#> - percent_m             1    156886  910987 357.45
#> - labour_participation  1    182728  936829 358.37
#> - unemploy_m24          1    231604  985706 360.05
#> - inequality            1    249087 1003189 360.63
#> - time_prison           1    262614 1016716 361.07
#> - m_per1000f            1    405612 1159714 365.42
#> - mean_education        1    579494 1333596 370.03
#> - police_exp60          1    765063 1519165 374.33
summary(model_backward)
#> 
#> Call:
#> lm(formula = crime_rate ~ percent_m + mean_education + police_exp60 + 
#>     labour_participation + m_per1000f + nonwhites_per1000 + unemploy_m24 + 
#>     unemploy_m39 + inequality + time_prison, data = data_train)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -283.69  -80.50  -16.92   82.84  429.87 
#> 
#> Coefficients:
#>                        Estimate Std. Error t value   Pr(>|t|)    
#> (Intercept)          -9643.5294  1570.9487  -6.139 0.00000353 ***
#> percent_m                8.3543     3.9050   2.139   0.043752 *  
#> mean_education          25.3573     6.1671   4.112   0.000459 ***
#> police_exp60             8.3365     1.7646   4.724   0.000103 ***
#> labour_participation    -3.5144     1.5221  -2.309   0.030727 *  
#> m_per1000f               7.1285     2.0723   3.440   0.002337 ** 
#> nonwhites_per1000        1.1725     0.6474   1.811   0.083817 .  
#> unemploy_m24           -10.2895     3.9585  -2.599   0.016368 *  
#> unemploy_m39            16.0240     8.4283   1.901   0.070453 .  
#> inequality               4.6137     1.7115   2.696   0.013205 *  
#> time_prison             15.7576     5.6929   2.768   0.011222 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 185.1 on 22 degrees of freedom
#> Multiple R-squared:  0.8682, Adjusted R-squared:  0.8082 
#> F-statistic: 14.49 on 10 and 22 DF,  p-value: 0.0000001705

In the summary of the backward model, nonwhites_per1000 has the largest t-test p-value, 0.083817, which is above alpha = 0.05, so it does not have a significant effect on crime_rate. (unemploy_m39, at 0.070453, is also above 0.05 but is kept at this step.) The model is therefore refitted without nonwhites_per1000.

model_backward_sig <- lm(formula = crime_rate ~ percent_m + mean_education + 
                           police_exp60 + labour_participation + m_per1000f + unemploy_m24 + 
                           unemploy_m39 + inequality + time_prison, data = data_train)
summary(model_backward_sig)
#> 
#> Call:
#> lm(formula = crime_rate ~ percent_m + mean_education + police_exp60 + 
#>     labour_participation + m_per1000f + unemploy_m24 + unemploy_m39 + 
#>     inequality + time_prison, data = data_train)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -311.71  -84.46   -6.46  114.52  454.64 
#> 
#> Coefficients:
#>                       Estimate Std. Error t value   Pr(>|t|)    
#> (Intercept)          -9471.069   1643.937  -5.761 0.00000721 ***
#> percent_m               11.252      3.735   3.013    0.00620 ** 
#> mean_education          22.617      6.268   3.608    0.00148 ** 
#> police_exp60             9.667      1.682   5.748 0.00000745 ***
#> labour_participation    -3.034      1.571  -1.931    0.06589 .  
#> m_per1000f               6.405      2.132   3.005    0.00632 ** 
#> unemploy_m24           -11.069      4.125  -2.683    0.01328 *  
#> unemploy_m39            17.352      8.803   1.971    0.06085 .  
#> inequality               5.650      1.691   3.341    0.00284 ** 
#> time_prison             14.612      5.931   2.463    0.02167 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 194.1 on 23 degrees of freedom
#> Multiple R-squared:  0.8485, Adjusted R-squared:  0.7892 
#> F-statistic: 14.31 on 9 and 23 DF,  p-value: 0.0000001824
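
Retyping the whole formula to drop one term is error-prone. A shorter alternative is update(), shown here on the formula object alone so that the sketch runs without the data; f_backward and f_sig are illustrative names.

```r
# Formula selected by the backward method
f_backward <- crime_rate ~ percent_m + mean_education + police_exp60 +
  labour_participation + m_per1000f + nonwhites_per1000 + unemploy_m24 +
  unemploy_m39 + inequality + time_prison

# Drop one term; "." means "keep what was already on this side"
f_sig <- update(f_backward, . ~ . - nonwhites_per1000)

# Response plus the nine remaining predictors
all.vars(f_sig)
```

The same call works on a fitted model: update(model_backward, . ~ . - nonwhites_per1000) refits the model on the same data without the dropped term.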

3.1.2 Forward Method

model_forward <- step(object = model_none, direction = "forward", 
                      scope = list(lower=model_none, upper=model_all))
#> Start:  AIC=400.08
#> crime_rate ~ 1
#> 
#>                        Df Sum of Sq     RSS    AIC
#> + police_exp60          1   3067509 2652830 376.72
#> + police_exp59          1   2936030 2784309 378.32
#> + gdp                   1   1438230 4282109 392.52
#> + prob_prison           1   1208195 4512144 394.25
#> + state_pop             1    863317 4857022 396.68
#> + mean_education        1    644363 5075976 398.14
#> + time_prison           1    342134 5378205 400.04
#> <none>                              5720339 400.08
#> + nonwhites_per1000     1    250813 5469526 400.60
#> + inequality            1    237199 5483139 400.68
#> + m_per1000f            1    232198 5488141 400.71
#> + labour_participation  1    130122 5590217 401.32
#> + unemploy_m39          1    128267 5592072 401.33
#> + unemploy_m24          1    106316 5614023 401.46
#> + percent_m             1     11241 5709098 402.02
#> + is_south              1      1342 5718997 402.07
#> 
#> Step:  AIC=376.72
#> crime_rate ~ police_exp60
#> 
#>                        Df Sum of Sq     RSS    AIC
#> + percent_m             1    584944 2067886 370.50
#> + inequality            1    503782 2149048 371.77
#> + nonwhites_per1000     1    419666 2233164 373.04
#> + is_south              1    405185 2247645 373.25
#> + m_per1000f            1    239626 2413204 375.60
#> <none>                              2652830 376.72
#> + labour_participation  1     72927 2579903 377.80
#> + gdp                   1     68380 2584450 377.86
#> + police_exp59          1     59345 2593485 377.98
#> + state_pop             1     40469 2612361 378.22
#> + unemploy_m24          1     26535 2626295 378.39
#> + time_prison           1     20992 2631838 378.46
#> + unemploy_m39          1      2025 2650805 378.70
#> + mean_education        1       533 2652297 378.72
#> + prob_prison           1       196 2652634 378.72
#> 
#> Step:  AIC=370.5
#> crime_rate ~ police_exp60 + percent_m
#> 
#>                        Df Sum of Sq     RSS    AIC
#> + m_per1000f            1    208254 1859632 369.00
#> + inequality            1    206298 1861588 369.03
#> + is_south              1    145600 1922287 370.09
#> <none>                              2067886 370.50
#> + nonwhites_per1000     1     77702 1990185 371.24
#> + mean_education        1     52341 2015546 371.66
#> + labour_participation  1     43643 2024244 371.80
#> + police_exp59          1     41682 2026204 371.83
#> + unemploy_m39          1     14239 2053648 372.27
#> + state_pop             1     10511 2057375 372.33
#> + time_prison           1      7176 2060710 372.39
#> + gdp                   1      4096 2063790 372.44
#> + unemploy_m24          1      1825 2066062 372.47
#> + prob_prison           1      1097 2066790 372.48
#> 
#> Step:  AIC=369
#> crime_rate ~ police_exp60 + percent_m + m_per1000f
#> 
#>                        Df Sum of Sq     RSS    AIC
#> + is_south              1    329154 1530478 364.57
#> + inequality            1    238278 1621354 366.47
#> + nonwhites_per1000     1    199051 1660581 367.26
#> <none>                              1859632 369.00
#> + time_prison           1     91701 1767931 369.33
#> + unemploy_m24          1     47868 1811764 370.14
#> + state_pop             1     25337 1834295 370.55
#> + unemploy_m39          1     22867 1836765 370.59
#> + police_exp59          1     12801 1846831 370.77
#> + prob_prison           1      3360 1856272 370.94
#> + mean_education        1      3157 1856475 370.94
#> + gdp                   1      1436 1858196 370.97
#> + labour_participation  1        39 1859593 371.00
#> 
#> Step:  AIC=364.57
#> crime_rate ~ police_exp60 + percent_m + m_per1000f + is_south
#> 
#>                        Df Sum of Sq     RSS    AIC
#> + mean_education        1    150791 1379687 363.15
#> + time_prison           1    144235 1386243 363.30
#> <none>                              1530478 364.57
#> + gdp                   1     75647 1454831 364.90
#> + unemploy_m24          1     52556 1477922 365.42
#> + prob_prison           1     49823 1480655 365.48
#> + labour_participation  1     37770 1492707 365.75
#> + state_pop             1     19849 1510629 366.14
#> + inequality            1     17586 1512891 366.19
#> + police_exp59          1      8087 1522390 366.40
#> + nonwhites_per1000     1      5799 1524678 366.45
#> + unemploy_m39          1      2597 1527881 366.52
#> 
#> Step:  AIC=363.15
#> crime_rate ~ police_exp60 + percent_m + m_per1000f + is_south + 
#>     mean_education
#> 
#>                        Df Sum of Sq     RSS    AIC
#> + time_prison           1    276560 1103127 357.77
#> + inequality            1    114707 1264980 362.28
#> + state_pop             1     83064 1296622 363.10
#> <none>                              1379687 363.15
#> + prob_prison           1     62098 1317588 363.63
#> + police_exp59          1     51031 1328656 363.90
#> + unemploy_m39          1     26743 1352944 364.50
#> + nonwhites_per1000     1     22643 1357044 364.60
#> + gdp                   1     12574 1367113 364.85
#> + unemploy_m24          1      9499 1370188 364.92
#> + labour_participation  1      6348 1373339 365.00
#> 
#> Step:  AIC=357.77
#> crime_rate ~ police_exp60 + percent_m + m_per1000f + is_south + 
#>     mean_education + time_prison
#> 
#>                        Df Sum of Sq     RSS    AIC
#> + inequality            1     97087 1006039 356.73
#> <none>                              1103127 357.77
#> + state_pop             1     35495 1067632 358.69
#> + prob_prison           1     29800 1073327 358.86
#> + police_exp59          1     14494 1088632 359.33
#> + unemploy_m39          1     11090 1092037 359.43
#> + unemploy_m24          1      9711 1093415 359.47
#> + nonwhites_per1000     1      9460 1093666 359.48
#> + gdp                   1      7889 1095238 359.53
#> + labour_participation  1      1928 1101198 359.71
#> 
#> Step:  AIC=356.73
#> crime_rate ~ police_exp60 + percent_m + m_per1000f + is_south + 
#>     mean_education + time_prison + inequality
#> 
#>                        Df Sum of Sq     RSS    AIC
#> <none>                              1006039 356.73
#> + labour_participation  1     35364  970675 357.54
#> + unemploy_m39          1     33057  972983 357.62
#> + gdp                   1     10518  995521 358.38
#> + prob_prison           1      7291  998748 358.49
#> + state_pop             1      4712 1001328 358.57
#> + police_exp59          1      3856 1002184 358.60
#> + nonwhites_per1000     1      2703 1003336 358.64
#> + unemploy_m24          1       873 1005167 358.70
summary(model_forward)
#> 
#> Call:
#> lm(formula = crime_rate ~ police_exp60 + percent_m + m_per1000f + 
#>     is_south + mean_education + time_prison + inequality, data = data_train)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -363.98  -88.84   11.32   78.90  531.21 
#> 
#> Coefficients:
#>                 Estimate Std. Error t value    Pr(>|t|)    
#> (Intercept)    -7901.441   1534.858  -5.148 0.000025425 ***
#> police_exp60      11.153      1.514   7.366 0.000000102 ***
#> percent_m          8.039      3.735   2.153     0.04120 *  
#> m_per1000f         3.982      1.529   2.605     0.01526 *  
#> is_south1        344.961    137.104   2.516     0.01866 *  
#> mean_education    16.534      5.409   3.057     0.00527 ** 
#> time_prison       14.378      5.668   2.537     0.01781 *  
#> inequality         3.039      1.957   1.553     0.13293    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 200.6 on 25 degrees of freedom
#> Multiple R-squared:  0.8241, Adjusted R-squared:  0.7749 
#> F-statistic: 16.74 on 7 and 25 DF,  p-value: 0.00000005419

In the summary of the forward model, inequality does not have a significant effect on crime_rate: its t-test p-value is 0.13293, which is above alpha = 0.05. The model is therefore refitted without inequality.

model_forward_sig <- lm(formula = crime_rate ~ police_exp60 + percent_m + m_per1000f + 
                          is_south + mean_education + time_prison, data = data_train)
summary(model_forward_sig)
#> 
#> Call:
#> lm(formula = crime_rate ~ police_exp60 + percent_m + m_per1000f + 
#>     is_south + mean_education + time_prison, data = data_train)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -326.64 -156.72    7.77  101.09  541.88 
#> 
#> Coefficients:
#>                 Estimate Std. Error t value    Pr(>|t|)    
#> (Intercept)    -7771.882   1573.673  -4.939 0.000039506 ***
#> police_exp60      10.232      1.430   7.153 0.000000135 ***
#> percent_m          8.436      3.826   2.205    0.036491 *  
#> m_per1000f         4.793      1.476   3.248    0.003199 ** 
#> is_south1        461.421    117.863   3.915    0.000583 ***
#> mean_education    13.074      5.061   2.583    0.015767 *  
#> time_prison       14.838      5.812   2.553    0.016890 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 206 on 26 degrees of freedom
#> Multiple R-squared:  0.8072, Adjusted R-squared:  0.7627 
#> F-statistic: 18.14 on 6 and 26 DF,  p-value: 0.0000000361
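
To keep track of the candidates so far, the adjusted R-squared values can be collected in one table. The numbers below are copied from the summary() outputs above and describe fit on the training data only. The data frame fits and its column names are illustrative.

```r
# Adjusted R-squared values copied from the summary() outputs above
fits <- data.frame(
  model  = c("model_all", "model_backward", "model_backward_sig",
             "model_forward", "model_forward_sig"),
  n_pred = c(15, 10, 9, 7, 6),          # number of predictors in each model
  adj_r2 = c(0.7766, 0.8082, 0.7892, 0.7749, 0.7627)
)

# Best adjusted R-squared first
fits[order(fits$adj_r2, decreasing = TRUE), ]
```

In both cases, removing the non-significant term lowered the adjusted R-squared slightly (0.8082 to 0.7892 for backward, 0.7749 to 0.7627 for forward), so the simpler models trade a little training fit for fewer predictors.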

3.1.3 Both Method

model_both1 <- step(object = model_all, direction = "both",
                    scope = list(lower=model_none, upper=model_all))
#> Start:  AIC=359.75
#> crime_rate ~ percent_m + is_south + mean_education + police_exp60 + 
#>     police_exp59 + labour_participation + m_per1000f + state_pop + 
#>     nonwhites_per1000 + unemploy_m24 + unemploy_m39 + gdp + inequality + 
#>     prob_prison + time_prison
#> 
#>                        Df Sum of Sq     RSS    AIC
#> - is_south              1        53  679085 357.76
#> - prob_prison           1      2416  681448 357.87
#> - gdp                   1     10383  689416 358.25
#> - state_pop             1     23329  702361 358.87
#> <none>                               679033 359.75
#> - police_exp59          1     48115  727148 360.01
#> - nonwhites_per1000     1     71147  750179 361.04
#> - unemploy_m39          1     83151  762184 361.57
#> - inequality            1     88386  767419 361.79
#> - time_prison           1     98056  777088 362.20
#> - police_exp60          1    114569  793601 362.90
#> - labour_participation  1    183780  862813 365.66
#> - percent_m             1    186444  865477 365.76
#> - unemploy_m24          1    217618  896650 366.93
#> - m_per1000f            1    333491 1012523 370.94
#> - mean_education        1    466102 1145135 375.00
#> 
#> Step:  AIC=357.76
#> crime_rate ~ percent_m + mean_education + police_exp60 + police_exp59 + 
#>     labour_participation + m_per1000f + state_pop + nonwhites_per1000 + 
#>     unemploy_m24 + unemploy_m39 + gdp + inequality + prob_prison + 
#>     time_prison
#> 
#>                        Df Sum of Sq     RSS    AIC
#> - prob_prison           1      2364  681450 355.87
#> - gdp                   1     11061  690146 356.29
#> - state_pop             1     24951  704037 356.95
#> <none>                               679085 357.76
#> - police_exp59          1     48985  728070 358.05
#> + is_south              1        53  679033 359.75
#> - unemploy_m39          1     89682  768768 359.85
#> - time_prison           1    100182  779268 360.30
#> - police_exp60          1    115574  794659 360.94
#> - nonwhites_per1000     1    136646  815731 361.81
#> - inequality            1    138689  817775 361.89
#> - percent_m             1    187112  866197 363.79
#> - labour_participation  1    244731  923817 365.91
#> - unemploy_m24          1    245363  924448 365.93
#> - m_per1000f            1    333731 1012816 368.95
#> - mean_education        1    490535 1169620 373.70
#> 
#> Step:  AIC=355.87
#> crime_rate ~ percent_m + mean_education + police_exp60 + police_exp59 + 
#>     labour_participation + m_per1000f + state_pop + nonwhites_per1000 + 
#>     unemploy_m24 + unemploy_m39 + gdp + inequality + time_prison
#> 
#>                        Df Sum of Sq     RSS    AIC
#> - gdp                   1      9229  690678 354.31
#> - state_pop             1     24282  705731 355.03
#> <none>                               681450 355.87
#> - police_exp59          1     55730  737180 356.46
#> + prob_prison           1      2364  679085 357.76
#> + is_south              1         1  681448 357.87
#> - unemploy_m39          1     96511  777961 358.24
#> - police_exp60          1    125056  806506 359.43
#> - inequality            1    141439  822888 360.09
#> - nonwhites_per1000     1    143129  824578 360.16
#> - time_prison           1    150025  831475 360.44
#> - percent_m             1    184813  866263 361.79
#> - labour_participation  1    242368  923817 363.91
#> - unemploy_m24          1    244548  925998 363.99
#> - m_per1000f            1    339823 1021273 367.22
#> - mean_education        1    503886 1185335 372.14
#> 
#> Step:  AIC=354.31
#> crime_rate ~ percent_m + mean_education + police_exp60 + police_exp59 + 
#>     labour_participation + m_per1000f + state_pop + nonwhites_per1000 + 
#>     unemploy_m24 + unemploy_m39 + inequality + time_prison
#> 
#>                        Df Sum of Sq     RSS    AIC
#> - state_pop             1     28240  718918 353.64
#> <none>                               690678 354.31
#> - police_exp59          1     51483  742161 354.69
#> + gdp                   1      9229  681450 355.87
#> + prob_prison           1       532  690146 356.29
#> + is_south              1       510  690168 356.29
#> - unemploy_m39          1    116128  806806 357.44
#> - police_exp60          1    122245  812923 357.69
#> - nonwhites_per1000     1    137395  828073 358.30
#> - inequality            1    144212  834890 358.57
#> - percent_m             1    177929  868607 359.88
#> - time_prison           1    213364  904042 361.20
#> - labour_participation  1    245231  935909 362.34
#> - unemploy_m24          1    266982  957660 363.10
#> - m_per1000f            1    388189 1078867 367.03
#> - mean_education        1    639260 1329938 373.94
#> 
#> Step:  AIC=353.64
#> crime_rate ~ percent_m + mean_education + police_exp60 + police_exp59 + 
#>     labour_participation + m_per1000f + nonwhites_per1000 + unemploy_m24 + 
#>     unemploy_m39 + inequality + time_prison
#> 
#>                        Df Sum of Sq     RSS    AIC
#> - police_exp59          1     35183  754102 353.21
#> <none>                               718918 353.64
#> + state_pop             1     28240  690678 354.31
#> + gdp                   1     13187  705731 355.03
#> + is_south              1       539  718380 355.61
#> + prob_prison           1       107  718811 355.63
#> - unemploy_m39          1    106513  825431 356.20
#> - police_exp60          1    108581  827499 356.28
#> - nonwhites_per1000     1    139272  858191 357.48
#> - percent_m             1    151792  870711 357.96
#> - labour_participation  1    217241  936159 360.35
#> - inequality            1    217382  936301 360.36
#> - unemploy_m24          1    243808  962726 361.27
#> - time_prison           1    248908  967826 361.45
#> - m_per1000f            1    407117 1126035 366.44
#> - mean_education        1    614356 1333275 372.02
#> 
#> Step:  AIC=353.21
#> crime_rate ~ percent_m + mean_education + police_exp60 + labour_participation + 
#>     m_per1000f + nonwhites_per1000 + unemploy_m24 + unemploy_m39 + 
#>     inequality + time_prison
#> 
#>                        Df Sum of Sq     RSS    AIC
#> <none>                               754102 353.21
#> + police_exp59          1     35183  718918 353.64
#> + state_pop             1     11941  742161 354.69
#> + gdp                   1      7432  746670 354.89
#> + prob_prison           1      3268  750834 355.07
#> + is_south              1         2  754099 355.21
#> - nonwhites_per1000     1    112421  866523 355.80
#> - unemploy_m39          1    123900  878001 356.23
#> - percent_m             1    156886  910987 357.45
#> - labour_participation  1    182728  936829 358.37
#> - unemploy_m24          1    231604  985706 360.05
#> - inequality            1    249087 1003189 360.63
#> - time_prison           1    262614 1016716 361.07
#> - m_per1000f            1    405612 1159714 365.42
#> - mean_education        1    579494 1333596 370.03
#> - police_exp60          1    765063 1519165 374.33
model_both2 <- step(object = model_none, direction = "both",
                    scope = list(lower=model_none, upper=model_all))
#> Start:  AIC=400.08
#> crime_rate ~ 1
#> 
#>                        Df Sum of Sq     RSS    AIC
#> + police_exp60          1   3067509 2652830 376.72
#> + police_exp59          1   2936030 2784309 378.32
#> + gdp                   1   1438230 4282109 392.52
#> + prob_prison           1   1208195 4512144 394.25
#> + state_pop             1    863317 4857022 396.68
#> + mean_education        1    644363 5075976 398.14
#> + time_prison           1    342134 5378205 400.04
#> <none>                              5720339 400.08
#> + nonwhites_per1000     1    250813 5469526 400.60
#> + inequality            1    237199 5483139 400.68
#> + m_per1000f            1    232198 5488141 400.71
#> + labour_participation  1    130122 5590217 401.32
#> + unemploy_m39          1    128267 5592072 401.33
#> + unemploy_m24          1    106316 5614023 401.46
#> + percent_m             1     11241 5709098 402.02
#> + is_south              1      1342 5718997 402.07
#> 
#> Step:  AIC=376.72
#> crime_rate ~ police_exp60
#> 
#>                        Df Sum of Sq     RSS    AIC
#> + percent_m             1    584944 2067886 370.50
#> + inequality            1    503782 2149048 371.77
#> + nonwhites_per1000     1    419666 2233164 373.04
#> + is_south              1    405185 2247645 373.25
#> + m_per1000f            1    239626 2413204 375.60
#> <none>                              2652830 376.72
#> + labour_participation  1     72927 2579903 377.80
#> + gdp                   1     68380 2584450 377.86
#> + police_exp59          1     59345 2593485 377.98
#> + state_pop             1     40469 2612361 378.22
#> + unemploy_m24          1     26535 2626295 378.39
#> + time_prison           1     20992 2631838 378.46
#> + unemploy_m39          1      2025 2650805 378.70
#> + mean_education        1       533 2652297 378.72
#> + prob_prison           1       196 2652634 378.72
#> - police_exp60          1   3067509 5720339 400.08
#> 
#> Step:  AIC=370.5
#> crime_rate ~ police_exp60 + percent_m
#> 
#>                        Df Sum of Sq     RSS    AIC
#> + m_per1000f            1    208254 1859632 369.00
#> + inequality            1    206298 1861588 369.03
#> + is_south              1    145600 1922287 370.09
#> <none>                              2067886 370.50
#> + nonwhites_per1000     1     77702 1990185 371.24
#> + mean_education        1     52341 2015546 371.66
#> + labour_participation  1     43643 2024244 371.80
#> + police_exp59          1     41682 2026204 371.83
#> + unemploy_m39          1     14239 2053648 372.27
#> + state_pop             1     10511 2057375 372.33
#> + time_prison           1      7176 2060710 372.39
#> + gdp                   1      4096 2063790 372.44
#> + unemploy_m24          1      1825 2066062 372.47
#> + prob_prison           1      1097 2066790 372.48
#> - percent_m             1    584944 2652830 376.72
#> - police_exp60          1   3641211 5709098 402.02
#> 
#> Step:  AIC=369
#> crime_rate ~ police_exp60 + percent_m + m_per1000f
#> 
#>                        Df Sum of Sq     RSS    AIC
#> + is_south              1    329154 1530478 364.57
#> + inequality            1    238278 1621354 366.47
#> + nonwhites_per1000     1    199051 1660581 367.26
#> <none>                              1859632 369.00
#> + time_prison           1     91701 1767931 369.33
#> + unemploy_m24          1     47868 1811764 370.14
#> - m_per1000f            1    208254 2067886 370.50
#> + state_pop             1     25337 1834295 370.55
#> + unemploy_m39          1     22867 1836765 370.59
#> + police_exp59          1     12801 1846831 370.77
#> + prob_prison           1      3360 1856272 370.94
#> + mean_education        1      3157 1856475 370.94
#> + gdp                   1      1436 1858196 370.97
#> + labour_participation  1        39 1859593 371.00
#> - percent_m             1    553572 2413204 375.60
#> - police_exp60          1   3612637 5472269 402.62
#> 
#> Step:  AIC=364.57
#> crime_rate ~ police_exp60 + percent_m + m_per1000f + is_south
#> 
#>                        Df Sum of Sq     RSS    AIC
#> + mean_education        1    150791 1379687 363.15
#> + time_prison           1    144235 1386243 363.30
#> <none>                              1530478 364.57
#> + gdp                   1     75647 1454831 364.90
#> + unemploy_m24          1     52556 1477922 365.42
#> + prob_prison           1     49823 1480655 365.48
#> + labour_participation  1     37770 1492707 365.75
#> + state_pop             1     19849 1510629 366.14
#> + inequality            1     17586 1512891 366.19
#> + police_exp59          1      8087 1522390 366.40
#> + nonwhites_per1000     1      5799 1524678 366.45
#> + unemploy_m39          1      2597 1527881 366.52
#> - percent_m             1    203794 1734271 366.70
#> - is_south              1    329154 1859632 369.00
#> - m_per1000f            1    391809 1922287 370.09
#> - police_exp60          1   3904810 5435287 404.39
#> 
#> Step:  AIC=363.15
#> crime_rate ~ police_exp60 + percent_m + m_per1000f + is_south + 
#>     mean_education
#> 
#>                        Df Sum of Sq     RSS    AIC
#> + time_prison           1    276560 1103127 357.77
#> + inequality            1    114707 1264980 362.28
#> + state_pop             1     83064 1296622 363.10
#> <none>                              1379687 363.15
#> + prob_prison           1     62098 1317588 363.63
#> + police_exp59          1     51031 1328656 363.90
#> + unemploy_m39          1     26743 1352944 364.50
#> - mean_education        1    150791 1530478 364.57
#> + nonwhites_per1000     1     22643 1357044 364.60
#> + gdp                   1     12574 1367113 364.85
#> + unemploy_m24          1      9499 1370188 364.92
#> + labour_participation  1      6348 1373339 365.00
#> - m_per1000f            1    258475 1638162 366.82
#> - percent_m             1    260338 1640025 366.85
#> - is_south              1    476788 1856475 370.94
#> - police_exp60          1   3135153 4514840 400.27
#> 
#> Step:  AIC=357.77
#> crime_rate ~ police_exp60 + percent_m + m_per1000f + is_south + 
#>     mean_education + time_prison
#> 
#>                        Df Sum of Sq     RSS    AIC
#> + inequality            1     97087 1006039 356.73
#> <none>                              1103127 357.77
#> + state_pop             1     35495 1067632 358.69
#> + prob_prison           1     29800 1073327 358.86
#> + police_exp59          1     14494 1088632 359.33
#> + unemploy_m39          1     11090 1092037 359.43
#> + unemploy_m24          1      9711 1093415 359.47
#> + nonwhites_per1000     1      9460 1093666 359.48
#> + gdp                   1      7889 1095238 359.53
#> + labour_participation  1      1928 1101198 359.71
#> - percent_m             1    206309 1309436 361.42
#> - time_prison           1    276560 1379687 363.15
#> - mean_education        1    283116 1386243 363.30
#> - m_per1000f            1    447536 1550663 367.00
#> - is_south              1    650267 1753394 371.06
#> - police_exp60          1   2170936 3274063 391.67
#> 
#> Step:  AIC=356.73
#> crime_rate ~ police_exp60 + percent_m + m_per1000f + is_south + 
#>     mean_education + time_prison + inequality
#> 
#>                        Df Sum of Sq     RSS    AIC
#> <none>                              1006039 356.73
#> + labour_participation  1     35364  970675 357.54
#> + unemploy_m39          1     33057  972983 357.62
#> - inequality            1     97087 1103127 357.77
#> + gdp                   1     10518  995521 358.38
#> + prob_prison           1      7291  998748 358.49
#> + state_pop             1      4712 1001328 358.57
#> + police_exp59          1      3856 1002184 358.60
#> + nonwhites_per1000     1      2703 1003336 358.64
#> + unemploy_m24          1       873 1005167 358.70
#> - percent_m             1    186467 1192507 360.34
#> - is_south              1    254750 1260789 362.17
#> - time_prison           1    258941 1264980 362.28
#> - m_per1000f            1    273030 1279069 362.65
#> - mean_education        1    375999 1382039 365.20
#> - police_exp60          1   2183658 3189697 392.80
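
The AIC column in the step() traces above can be reproduced by hand. For lm fits, step() relies on extractAIC(), which computes n * log(RSS / n) + 2 * edf, where edf counts the estimated coefficients including the intercept (this differs from AIC() only by an additive constant). The sketch below is a minimal check, not part of the original analysis: the helper name step_aic is hypothetical, and n = 33 training rows is an assumption inferred from the trace values rather than stated in the text.

```r
# Hypothetical helper: AIC as step() reports it for lm fits
step_aic <- function(rss, n, edf) {
  n * log(rss / n) + 2 * edf
}

n_train <- 33  # assumption: inferred from the trace, not stated in the text

# Intercept-only model: RSS = 5720339, 1 coefficient
aic_null <- step_aic(5720339, n_train, edf = 1)
# crime_rate ~ police_exp60: RSS = 2652830, 2 coefficients
aic_one <- step_aic(2652830, n_train, edf = 2)
# Final model_both1: RSS = 754102, 10 predictors + intercept
aic_both1 <- step_aic(754102, n_train, edf = 11)

round(c(aic_null, aic_one, aic_both1), 2)  # 400.08 376.72 353.21
```

Because every candidate is scored on the same data, only the differences in AIC matter: a step is accepted when removing or adding a term lowers this value.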

3.1.4 Goodness of Fit

summary(model_all)$adj.r.squared
#> [1] 0.7765553
summary(model_backward)$adj.r.squared
#> [1] 0.8082499
summary(model_backward_sig)$adj.r.squared
#> [1] 0.7892437
summary(model_forward)$adj.r.squared
#> [1] 0.7748856
summary(model_forward_sig)$adj.r.squared
#> [1] 0.7626549
summary(model_both1)$adj.r.squared
#> [1] 0.8082499
summary(model_both2)$adj.r.squared
#> [1] 0.7748856

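Stepwise regression raises the adjusted R-squared for only some of the models. model_backward and model_both1 have the highest adjusted R-squared at 80.82%, followed by model_backward_sig (78.92%) and model_all (77.66%). model_forward and model_both2 (77.49%) and model_forward_sig (76.27%) fall slightly below the full model.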
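
Adjusted R-squared penalizes R-squared for the number of predictors, which is why it is used here to compare models of different sizes. The following sketch recomputes it by hand and checks it against summary(). It uses the built-in mtcars data purely for illustration (an assumption, since the crime data is not reproduced here), and the variable names are hypothetical.

```r
# Illustrative only: built-in mtcars data, not the crime dataset
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

n  <- nobs(fit)                # number of observations
p  <- length(coef(fit)) - 1    # number of predictors (intercept excluded)
r2 <- summary(fit)$r.squared

# Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)
adj_r2_manual <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Should agree with the value reported by summary()
all.equal(adj_r2_manual, summary(fit)$adj.r.squared)
```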

3.2 Model Evaluation

library(performance)
compare_performance(model_all, model_backward, model_backward_sig, model_forward,
                    model_forward_sig, model_both1, model_both2)

3.2.1 Evaluation on the Test Data

library(MLmetrics)
all_pred <- predict(model_all,data_test)
backward_pred <- predict(model_backward,data_test)
backward_sig_pred <- predict(model_backward_sig,data_test)
forward_pred <- predict(model_forward,data_test)
forward_sig_pred <- predict(model_forward_sig,data_test)
both_pred1 <- predict(model_both1,data_test)
both_pred2 <- predict(model_both2,data_test)

# MLmetrics::RMSE(y_pred, y_true): one test-set RMSE per model
data.frame(RMSE = c(RMSE(all_pred,data_test$crime_rate),
                    RMSE(backward_pred,data_test$crime_rate),
                    RMSE(backward_sig_pred,data_test$crime_rate),
                    RMSE(forward_pred,data_test$crime_rate),
                    RMSE(forward_sig_pred,data_test$crime_rate),
                    RMSE(both_pred1,data_test$crime_rate),
                    RMSE(both_pred2,data_test$crime_rate)),
           model = c("All","Backward","Backward_sig","Forward","Forward_sig","Both 1","Both 2"))

On the test data, the Forward and Both 2 models have the smallest RMSE, followed by Backward_sig and Backward. The All model has the largest RMSE.
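
For reference, RMSE is the square root of the mean squared difference between predictions and actual values, expressed in the same units as crime_rate. A minimal base R sketch of the calculation follows; the function name rmse and the toy numbers are hypothetical and are not taken from the crime data.

```r
# Hypothetical helper: root mean squared error in base R
rmse <- function(pred, actual) {
  sqrt(mean((actual - pred)^2))
}

# Toy values for illustration only
pred   <- c(100, 200, 300)
actual <- c(110, 190, 320)

# Errors are 10, -10, 20; squared: 100, 100, 400; mean: 200
rmse_toy <- rmse(pred, actual)
rmse_toy  # sqrt(200), about 14.14
```

A lower RMSE on data_test means predictions that are closer, on average, to the observed crime rates for states the model has not seen.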

4 Assumption Tests

The model chosen is Backward_sig, because it has a fairly high adjusted R-squared and a fairly low RMSE.

4.1 Normality

Shapiro-Wilk hypotheses:

  • H0: the errors/residuals are normally distributed
  • H1: the errors/residuals are not normally distributed
shapiro.test(model_backward_sig$residuals)
#> 
#>  Shapiro-Wilk normality test
#> 
#> data:  model_backward_sig$residuals
#> W = 0.97332, p-value = 0.5771

The Shapiro-Wilk test gives p-value = 0.5771, which is greater than alpha = 0.05, so we fail to reject H0 and conclude that the residuals are consistent with a normal distribution.
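
To see how this decision rule behaves, the sketch below applies shapiro.test() to residuals from two simulated regressions: one with normal errors and one with strongly skewed (exponential) errors. The data, seed, and object names are all hypothetical and serve only as an illustration of the test, not as a reproduction of the crime model.

```r
set.seed(123)
n <- 200
x <- rnorm(n)

# Case 1: errors are normal by construction
y_norm <- 2 + 3 * x + rnorm(n)
sw_norm <- shapiro.test(residuals(lm(y_norm ~ x)))

# Case 2: errors are right-skewed (centered exponential)
y_skew <- 2 + 3 * x + (rexp(n) - 1)
sw_skew <- shapiro.test(residuals(lm(y_skew ~ x)))

# Compare each p-value against alpha = 0.05
c(normal_errors = sw_norm$p.value, skewed_errors = sw_skew$p.value)
```

A p-value above 0.05 means H0 is not rejected. With skewed errors and this sample size the p-value should be far below 0.05.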

4.2 Homoscedasticity

resact <- data.frame(residual = model_backward_sig$residuals, 
                     fitted = model_backward_sig$fitted.values)

resact %>% ggplot(aes(fitted, residual)) + geom_point() + geom_hline(aes(yintercept = 0)) + 
    theme(panel.grid = element_blank(), panel.background = element_blank())

Visually, the residuals scatter randomly around zero with no discernible pattern, so they appear to satisfy the homoscedasticity assumption.

Formally, heteroscedasticity can be detected with the Breusch-Pagan test.

Breusch-Pagan hypotheses:

  • H0: the residual variance is constant (homoscedasticity)
  • H1: the residual variance is not constant (heteroscedasticity)
library(lmtest)
bptest(model_backward_sig)
#> 
#>  studentized Breusch-Pagan test
#> 
#> data:  model_backward_sig
#> BP = 11.579, df = 9, p-value = 0.2381

The p-value = 0.2381 is greater than alpha = 0.05, so we fail to reject H0 and conclude that the residuals are homoscedastic.
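
The studentized Breusch-Pagan statistic can also be computed by hand, which shows where df = 9 comes from: the squared residuals are regressed on the same predictors, and the statistic is n times the R-squared of that auxiliary regression, compared against a chi-squared distribution with one degree of freedom per predictor. The sketch below illustrates this on the built-in mtcars data (an assumption made for self-containment); object names are hypothetical.

```r
# Illustrative only: built-in mtcars data, not the crime dataset
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Auxiliary regression: squared residuals on the same predictors
aux <- lm(residuals(fit)^2 ~ wt + hp, data = mtcars)

bp      <- nobs(fit) * summary(aux)$r.squared   # studentized BP statistic
df      <- length(coef(aux)) - 1                # one df per predictor
p_value <- pchisq(bp, df = df, lower.tail = FALSE)

c(BP = bp, df = df, p_value = p_value)

# Optional cross-check if lmtest is installed
if (requireNamespace("lmtest", quietly = TRUE)) {
  print(lmtest::bptest(fit))
}
```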

4.3 No Autocorrelation

library(car)
# durbinWatsonTest() bootstraps its p-value, so fix the seed for reproducibility
set.seed(1)
durbinWatsonTest(model_backward_sig)
#>  lag Autocorrelation D-W Statistic p-value
#>    1      0.07469894      1.841523   0.622
#>  Alternative hypothesis: rho != 0

The Durbin-Watson test gives p-value = 0.622, which is greater than alpha = 0.05, so we fail to reject H0 (rho = 0) and conclude that the residuals of the regression model are not autocorrelated.
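
The D-W statistic itself is simple to compute: it is the sum of squared successive differences of the residuals divided by their sum of squares, and it is approximately 2 * (1 - r), where r is the lag-1 autocorrelation. In the output above, 2 * (1 - 0.0747) is about 1.85, close to the reported 1.84. The sketch below shows the calculation on the built-in mtcars data (an assumption for illustration); object names are hypothetical.

```r
# Illustrative only: built-in mtcars data, not the crime dataset
fit <- lm(mpg ~ wt + hp, data = mtcars)
e <- unname(residuals(fit))
n <- length(e)

# Durbin-Watson statistic: values near 2 indicate no lag-1 autocorrelation
dw <- sum(diff(e)^2) / sum(e^2)

# Lag-1 autocorrelation of the residuals
r1 <- sum(e[-1] * e[-n]) / sum(e^2)

# dw is close to 2 * (1 - r1); the gap comes from the first and last residual
c(dw = dw, approx = 2 * (1 - r1))
```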

4.4 No Multicollinearity

vif(model_backward_sig)
#>            percent_m       mean_education         police_exp60 
#>             1.671434             4.221867             2.541909 
#> labour_participation           m_per1000f         unemploy_m24 
#>             3.682841             3.321822             4.096037 
#>         unemploy_m39           inequality          time_prison 
#>             4.297793             3.413640             1.718370

All VIF values for the predictors above are below 10, so we conclude that there is no serious multicollinearity.
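
Each VIF measures how well one predictor is explained by the others: regress predictor j on the remaining predictors and take 1 / (1 - R-squared). A VIF of 10 therefore corresponds to an R-squared of 0.9 in that auxiliary regression. The sketch below reproduces the calculation on the built-in mtcars data (an assumption for illustration, with hypothetical object names).

```r
# Illustrative only: built-in mtcars data, not the crime dataset
predictors <- c("wt", "hp", "qsec")
fit <- lm(reformulate(predictors, response = "mpg"), data = mtcars)

# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing predictor j on the rest
vif_manual <- sapply(predictors, function(v) {
  others <- setdiff(predictors, v)
  aux <- lm(reformulate(others, response = v), data = mtcars)
  1 / (1 - summary(aux)$r.squared)
})

vif_manual
```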

5 Conclusion

The variables used in the final model to describe the variance of crime_rate are percent_m, mean_education, police_exp60, labour_participation, m_per1000f, unemploy_m24, unemploy_m39, inequality, and time_prison. The model satisfies the classical assumption tests. The final model can be written as the following regression equation:

\[
\begin{aligned}
crime\_rate = {} & -9471.069 + 11.252 \times percent\_m + 22.617 \times mean\_education + 9.667 \times police\_exp60 \\
& - 3.034 \times labour\_participation + 6.405 \times m\_per1000f - 11.069 \times unemploy\_m24 \\
& + 17.352 \times unemploy\_m39 + 5.650 \times inequality + 14.612 \times time\_prison
\end{aligned}
\]

The final model has an adjusted R-squared of 78.92%, meaning the predictors explain 78.92% of the variance in the target variable (crime_rate); the remaining 21.08% is attributable to factors outside the model. The RMSE is 162.04 on the training data and 316.3251 on the test data.