This section loads the libraries used in this assignment.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(GGally)
## Loading required package: ggplot2
##
## Attaching package: 'GGally'
## The following object is masked from 'package:dplyr':
##
## nasa
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
This section reads the CSV file and assigns a name to each column of the crime dataset.
crime <- read.csv("crime.csv") %>%
dplyr::select(-X)
names(crime) <- c("percent_m", "is_south", "mean_education", "police_exp60", "police_exp59", "labour_participation", "m_per1000f", "state_pop", "nonwhites_per1000", "unemploy_m24", "unemploy_m39", "gdp", "inequality", "prob_prison", "time_prison", "crime_rate")
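A column named `X` usually appears when a CSV was written with row names and then read back with `read.csv`; whether crime.csv was produced that way is an assumption. The sketch below reproduces the pattern in base R on a hypothetical toy file (`toy`, `path`, and `new_names` are illustrative names, not part of the assignment) and checks that the number of new names matches the number of columns before renaming.

```r
# Hypothetical toy data standing in for the raw file (not the real crime.csv).
toy <- data.frame(M = c(151, 143, 142), So = c(1, 0, 1))

# write.csv() stores row names by default, in a first column with an empty header.
path <- tempfile(fileext = ".csv")
write.csv(toy, path)

# read.csv() gives that unnamed first column the name "X".
raw <- read.csv(path)
print(names(raw))

# Drop the row-name column, then rename only if the lengths agree.
clean <- raw[, names(raw) != "X", drop = FALSE]
new_names <- c("percent_m", "is_south")
stopifnot(length(new_names) == ncol(clean))
names(clean) <- new_names
print(clean)
```

The `stopifnot` guard is cheap protection against mislabelled columns when a long vector of names is typed by hand.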
This section describes the variables in the crime dataset.
percent_m: percentage of males aged 14-24
is_south: whether it is in a Southern state. 1 for Yes, 0 for No.
mean_education: mean years of schooling
police_exp60: police expenditure in 1960
police_exp59: police expenditure in 1959
labour_participation: labour force participation rate
m_per1000f: number of males per 1000 females
state_pop: state population
nonwhites_per1000: number of non-whites resident per 1000 people
unemploy_m24: unemployment rate of urban males aged 14-24
unemploy_m39: unemployment rate of urban males aged 35-39
gdp: gross domestic product per head
inequality: income inequality
prob_prison: probability of imprisonment
time_prison: avg time served in prisons
crime_rate: crime rate in an unspecified category
This section inspects the contents and structure of the crime dataset.
summary(crime)
## percent_m is_south mean_education police_exp60
## Min. :119.0 Min. :0.0000 Min. : 87.0 Min. : 45.0
## 1st Qu.:130.0 1st Qu.:0.0000 1st Qu.: 97.5 1st Qu.: 62.5
## Median :136.0 Median :0.0000 Median :108.0 Median : 78.0
## Mean :138.6 Mean :0.3404 Mean :105.6 Mean : 85.0
## 3rd Qu.:146.0 3rd Qu.:1.0000 3rd Qu.:114.5 3rd Qu.:104.5
## Max. :177.0 Max. :1.0000 Max. :122.0 Max. :166.0
## police_exp59 labour_participation m_per1000f state_pop
## Min. : 41.00 Min. :480.0 Min. : 934.0 Min. : 3.00
## 1st Qu.: 58.50 1st Qu.:530.5 1st Qu.: 964.5 1st Qu.: 10.00
## Median : 73.00 Median :560.0 Median : 977.0 Median : 25.00
## Mean : 80.23 Mean :561.2 Mean : 983.0 Mean : 36.62
## 3rd Qu.: 97.00 3rd Qu.:593.0 3rd Qu.: 992.0 3rd Qu.: 41.50
## Max. :157.00 Max. :641.0 Max. :1071.0 Max. :168.00
## nonwhites_per1000 unemploy_m24 unemploy_m39 gdp
## Min. : 2.0 Min. : 70.00 Min. :20.00 Min. :288.0
## 1st Qu.: 24.0 1st Qu.: 80.50 1st Qu.:27.50 1st Qu.:459.5
## Median : 76.0 Median : 92.00 Median :34.00 Median :537.0
## Mean :101.1 Mean : 95.47 Mean :33.98 Mean :525.4
## 3rd Qu.:132.5 3rd Qu.:104.00 3rd Qu.:38.50 3rd Qu.:591.5
## Max. :423.0 Max. :142.00 Max. :58.00 Max. :689.0
## inequality prob_prison time_prison crime_rate
## Min. :126.0 Min. :0.00690 Min. :12.20 Min. : 342.0
## 1st Qu.:165.5 1st Qu.:0.03270 1st Qu.:21.60 1st Qu.: 658.5
## Median :176.0 Median :0.04210 Median :25.80 Median : 831.0
## Mean :194.0 Mean :0.04709 Mean :26.60 Mean : 905.1
## 3rd Qu.:227.5 3rd Qu.:0.05445 3rd Qu.:30.45 3rd Qu.:1057.5
## Max. :276.0 Max. :0.11980 Max. :44.00 Max. :1993.0
str(crime)
## 'data.frame': 47 obs. of 16 variables:
## $ percent_m : int 151 143 142 136 141 121 127 131 157 140 ...
## $ is_south : int 1 0 1 0 0 0 1 1 1 0 ...
## $ mean_education : int 91 113 89 121 121 110 111 109 90 118 ...
## $ police_exp60 : int 58 103 45 149 109 118 82 115 65 71 ...
## $ police_exp59 : int 56 95 44 141 101 115 79 109 62 68 ...
## $ labour_participation: int 510 583 533 577 591 547 519 542 553 632 ...
## $ m_per1000f : int 950 1012 969 994 985 964 982 969 955 1029 ...
## $ state_pop : int 33 13 18 157 18 25 4 50 39 7 ...
## $ nonwhites_per1000 : int 301 102 219 80 30 44 139 179 286 15 ...
## $ unemploy_m24 : int 108 96 94 102 91 84 97 79 81 100 ...
## $ unemploy_m39 : int 41 36 33 39 20 29 38 35 28 24 ...
## $ gdp : int 394 557 318 673 578 689 620 472 421 526 ...
## $ inequality : int 261 194 250 167 174 126 168 206 239 174 ...
## $ prob_prison : num 0.0846 0.0296 0.0834 0.0158 0.0414 ...
## $ time_prison : num 26.2 25.3 24.3 29.9 21.3 ...
## $ crime_rate : int 791 1635 578 1969 1234 682 963 1555 856 705 ...
head(crime)
## percent_m is_south mean_education police_exp60 police_exp59
## 1 151 1 91 58 56
## 2 143 0 113 103 95
## 3 142 1 89 45 44
## 4 136 0 121 149 141
## 5 141 0 121 109 101
## 6 121 0 110 118 115
## labour_participation m_per1000f state_pop nonwhites_per1000 unemploy_m24
## 1 510 950 33 301 108
## 2 583 1012 13 102 96
## 3 533 969 18 219 94
## 4 577 994 157 80 102
## 5 591 985 18 30 91
## 6 547 964 25 44 84
## unemploy_m39 gdp inequality prob_prison time_prison crime_rate
## 1 41 394 261 0.084602 26.2011 791
## 2 36 557 194 0.029599 25.2999 1635
## 3 33 318 250 0.083401 24.3006 578
## 4 39 673 167 0.015801 29.9012 1969
## 5 20 578 174 0.041399 21.2998 1234
## 6 29 689 126 0.034201 20.9995 682
This section looks at the correlations between the columns of the crime dataset.
ggcorr(crime, label = TRUE, label_size = 3)
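As a numeric complement to the correlation plot, strongly correlated pairs can be listed directly from `cor()`. The following is a minimal sketch on made-up data, not on the crime dataset; the data frame `toy`, its columns, and the 0.8 cutoff are assumptions chosen only for illustration.

```r
# Made-up data: b is built from a, so the two are strongly correlated; c is independent.
set.seed(42)
n <- 47
a <- rnorm(n)
toy <- data.frame(a = a, b = a + rnorm(n, sd = 0.1), c = rnorm(n))

# Pearson correlation matrix of all columns.
r <- cor(toy)

# Keep each pair once (upper triangle) and only where |r| exceeds the cutoff.
cutoff <- 0.8
high <- which(abs(r) > cutoff & upper.tri(r), arr.ind = TRUE)
strong <- data.frame(
  var1 = rownames(r)[high[, "row"]],
  var2 = colnames(r)[high[, "col"]],
  r    = r[high]
)
print(strong)
```

Pairs flagged this way are candidates for multicollinearity, which is worth keeping in mind when choosing model variables.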
Backward stepwise selection is used to help decide which variables to include in the model.
crime.all <- lm(crime_rate ~., crime)
step(crime.all, direction="backward")
## Start: AIC=514.65
## crime_rate ~ percent_m + is_south + mean_education + police_exp60 +
## police_exp59 + labour_participation + m_per1000f + state_pop +
## nonwhites_per1000 + unemploy_m24 + unemploy_m39 + gdp + inequality +
## prob_prison + time_prison
##
## Df Sum of Sq RSS AIC
## - is_south 1 29 1354974 512.65
## - labour_participation 1 8917 1363862 512.96
## - time_prison 1 10304 1365250 513.00
## - state_pop 1 14122 1369068 513.14
## - nonwhites_per1000 1 18395 1373341 513.28
## - m_per1000f 1 31967 1386913 513.74
## - gdp 1 37613 1392558 513.94
## - police_exp59 1 37919 1392865 513.95
## <none> 1354946 514.65
## - unemploy_m24 1 83722 1438668 515.47
## - police_exp60 1 144306 1499252 517.41
## - unemploy_m39 1 181536 1536482 518.56
## - percent_m 1 193770 1548716 518.93
## - prob_prison 1 199538 1554484 519.11
## - mean_education 1 402117 1757063 524.86
## - inequality 1 423031 1777977 525.42
##
## Step: AIC=512.65
## crime_rate ~ percent_m + mean_education + police_exp60 + police_exp59 +
## labour_participation + m_per1000f + state_pop + nonwhites_per1000 +
## unemploy_m24 + unemploy_m39 + gdp + inequality + prob_prison +
## time_prison
##
## Df Sum of Sq RSS AIC
## - time_prison 1 10341 1365315 511.01
## - labour_participation 1 10878 1365852 511.03
## - state_pop 1 14127 1369101 511.14
## - nonwhites_per1000 1 21626 1376600 511.39
## - m_per1000f 1 32449 1387423 511.76
## - police_exp59 1 37954 1392929 511.95
## - gdp 1 39223 1394197 511.99
## <none> 1354974 512.65
## - unemploy_m24 1 96420 1451395 513.88
## - police_exp60 1 144302 1499277 515.41
## - unemploy_m39 1 189859 1544834 516.81
## - percent_m 1 195084 1550059 516.97
## - prob_prison 1 204463 1559437 517.26
## - mean_education 1 403140 1758114 522.89
## - inequality 1 488834 1843808 525.13
##
## Step: AIC=511.01
## crime_rate ~ percent_m + mean_education + police_exp60 + police_exp59 +
## labour_participation + m_per1000f + state_pop + nonwhites_per1000 +
## unemploy_m24 + unemploy_m39 + gdp + inequality + prob_prison
##
## Df Sum of Sq RSS AIC
## - labour_participation 1 10533 1375848 509.37
## - nonwhites_per1000 1 15482 1380797 509.54
## - state_pop 1 21846 1387161 509.75
## - police_exp59 1 28932 1394247 509.99
## - gdp 1 36070 1401385 510.23
## - m_per1000f 1 41784 1407099 510.42
## <none> 1365315 511.01
## - unemploy_m24 1 91420 1456735 512.05
## - police_exp60 1 134137 1499452 513.41
## - unemploy_m39 1 184143 1549458 514.95
## - percent_m 1 186110 1551425 515.01
## - prob_prison 1 237493 1602808 516.54
## - mean_education 1 409448 1774763 521.33
## - inequality 1 502909 1868224 523.75
##
## Step: AIC=509.37
## crime_rate ~ percent_m + mean_education + police_exp60 + police_exp59 +
## m_per1000f + state_pop + nonwhites_per1000 + unemploy_m24 +
## unemploy_m39 + gdp + inequality + prob_prison
##
## Df Sum of Sq RSS AIC
## - nonwhites_per1000 1 11675 1387523 507.77
## - police_exp59 1 21418 1397266 508.09
## - state_pop 1 27803 1403651 508.31
## - m_per1000f 1 31252 1407100 508.42
## - gdp 1 35035 1410883 508.55
## <none> 1375848 509.37
## - unemploy_m24 1 80954 1456802 510.06
## - police_exp60 1 123896 1499744 511.42
## - unemploy_m39 1 190746 1566594 513.47
## - percent_m 1 217716 1593564 514.27
## - prob_prison 1 226971 1602819 514.54
## - mean_education 1 413254 1789103 519.71
## - inequality 1 500944 1876792 521.96
##
## Step: AIC=507.77
## crime_rate ~ percent_m + mean_education + police_exp60 + police_exp59 +
## m_per1000f + state_pop + unemploy_m24 + unemploy_m39 + gdp +
## inequality + prob_prison
##
## Df Sum of Sq RSS AIC
## - police_exp59 1 16706 1404229 506.33
## - state_pop 1 25793 1413315 506.63
## - m_per1000f 1 26785 1414308 506.66
## - gdp 1 31551 1419073 506.82
## <none> 1387523 507.77
## - unemploy_m24 1 83881 1471404 508.52
## - police_exp60 1 118348 1505871 509.61
## - unemploy_m39 1 201453 1588976 512.14
## - prob_prison 1 216760 1604282 512.59
## - percent_m 1 309214 1696737 515.22
## - mean_education 1 402754 1790276 517.74
## - inequality 1 589736 1977259 522.41
##
## Step: AIC=506.33
## crime_rate ~ percent_m + mean_education + police_exp60 + m_per1000f +
## state_pop + unemploy_m24 + unemploy_m39 + gdp + inequality +
## prob_prison
##
## Df Sum of Sq RSS AIC
## - state_pop 1 22345 1426575 505.07
## - gdp 1 32142 1436371 505.39
## - m_per1000f 1 36808 1441037 505.54
## <none> 1404229 506.33
## - unemploy_m24 1 86373 1490602 507.13
## - unemploy_m39 1 205814 1610043 510.76
## - prob_prison 1 218607 1622836 511.13
## - percent_m 1 307001 1711230 513.62
## - mean_education 1 389502 1793731 515.83
## - inequality 1 608627 2012856 521.25
## - police_exp60 1 1050202 2454432 530.57
##
## Step: AIC=505.07
## crime_rate ~ percent_m + mean_education + police_exp60 + m_per1000f +
## unemploy_m24 + unemploy_m39 + gdp + inequality + prob_prison
##
## Df Sum of Sq RSS AIC
## - gdp 1 26493 1453068 503.93
## <none> 1426575 505.07
## - m_per1000f 1 84491 1511065 505.77
## - unemploy_m24 1 99463 1526037 506.24
## - prob_prison 1 198571 1625145 509.20
## - unemploy_m39 1 208880 1635455 509.49
## - percent_m 1 320926 1747501 512.61
## - mean_education 1 386773 1813348 514.35
## - inequality 1 594779 2021354 519.45
## - police_exp60 1 1127277 2553852 530.44
##
## Step: AIC=503.93
## crime_rate ~ percent_m + mean_education + police_exp60 + m_per1000f +
## unemploy_m24 + unemploy_m39 + inequality + prob_prison
##
## Df Sum of Sq RSS AIC
## <none> 1453068 503.93
## - m_per1000f 1 103159 1556227 505.16
## - unemploy_m24 1 127044 1580112 505.87
## - prob_prison 1 247978 1701046 509.34
## - unemploy_m39 1 255443 1708511 509.55
## - percent_m 1 296790 1749858 510.67
## - mean_education 1 445788 1898855 514.51
## - inequality 1 738244 2191312 521.24
## - police_exp60 1 1672038 3125105 537.93
##
## Call:
## lm(formula = crime_rate ~ percent_m + mean_education + police_exp60 +
## m_per1000f + unemploy_m24 + unemploy_m39 + inequality + prob_prison,
## data = crime)
##
## Coefficients:
## (Intercept) percent_m mean_education police_exp60
## -6426.101 9.332 18.012 10.265
## m_per1000f unemploy_m24 unemploy_m39 inequality
## 2.234 -6.087 18.735 6.133
## prob_prison
## -3796.032
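For a linear model, the AIC that `step` reports is computed (via `extractAIC`) as n * log(RSS / n) + 2 * k, where k is the number of estimated coefficients including the intercept. The sketch below recomputes the first and last AIC values from the RSS figures printed above; the helper name `aic_from_rss` is hypothetical.

```r
# AIC as used by step() for lm fits: n * log(RSS / n) + 2 * (number of coefficients).
aic_from_rss <- function(rss, n_coef, n_obs) {
  n_obs * log(rss / n_obs) + 2 * n_coef
}

n <- 47  # observations in the crime dataset

# Full model: 15 predictors + intercept, RSS taken from the "<none>" row of the first step.
aic_full <- aic_from_rss(rss = 1354946, n_coef = 16, n_obs = n)

# Final model: 8 predictors + intercept, RSS taken from the "<none>" row of the last step.
aic_final <- aic_from_rss(rss = 1453068, n_coef = 9, n_obs = n)

# These should agree with the printed 514.65 and 503.93 up to rounding of the RSS.
print(c(full = aic_full, final = aic_final))
```

The RSS rises as variables are dropped, but the penalty term 2 * k falls faster, which is why the smaller model ends up with the lower AIC.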
This section fits a linear regression of crime_rate on the eight predictors retained by the backward selection above, then looks at a summary of the fitted model.
model.crime <- lm(formula = crime_rate ~ percent_m + mean_education + m_per1000f +
unemploy_m24 + unemploy_m39 + inequality + prob_prison +
police_exp60, data = crime)
summary(model.crime)
##
## Call:
## lm(formula = crime_rate ~ percent_m + mean_education + m_per1000f +
## unemploy_m24 + unemploy_m39 + inequality + prob_prison +
## police_exp60, data = crime)
##
## Residuals:
## Min 1Q Median 3Q Max
## -444.70 -111.07 3.03 122.15 483.30
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6426.101 1194.611 -5.379 4.04e-06 ***
## percent_m 9.332 3.350 2.786 0.00828 **
## mean_education 18.012 5.275 3.414 0.00153 **
## m_per1000f 2.234 1.360 1.642 0.10874
## unemploy_m24 -6.087 3.339 -1.823 0.07622 .
## unemploy_m39 18.735 7.248 2.585 0.01371 *
## inequality 6.133 1.396 4.394 8.63e-05 ***
## prob_prison -3796.032 1490.646 -2.547 0.01505 *
## police_exp60 10.265 1.552 6.613 8.26e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 195.5 on 38 degrees of freedom
## Multiple R-squared: 0.7888, Adjusted R-squared: 0.7444
## F-statistic: 17.74 on 8 and 38 DF, p-value: 1.159e-10
Both the Multiple R-squared (0.7888) and the Adjusted R-squared (0.7444) are reasonably good.
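The summary statistics are linked by standard formulas, so they can be cross-checked by hand. This sketch uses only numbers printed in the output above (n = 47, 8 predictors, R-squared 0.7888, and the final RSS of 1453068 from the stepwise output); the variable names are illustrative.

```r
n  <- 47        # observations
p  <- 8         # predictors in the final model (intercept not counted)
r2 <- 0.7888    # Multiple R-squared as printed
rss <- 1453068  # residual sum of squares of the final model

df_resid <- n - p - 1  # 38 residual degrees of freedom

# Adjusted R-squared penalises R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

# Residual standard error is the square root of RSS per residual degree of freedom.
rse <- sqrt(rss / df_resid)

# Overall F-statistic expressed through R-squared.
f_stat <- (r2 / p) / ((1 - r2) / df_resid)

print(c(adj_r2 = adj_r2, rse = rse, f_stat = f_stat))
```

Small differences from the printed values come from using the rounded R-squared and RSS as inputs.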
plot(model.crime)
Mean education (across all genders) and the unemployment rate of males aged 35-39 have a fairly strong effect on the crime rate. The coefficient on "prob_prison" looks very large because that variable only takes small values (roughly 0.007 to 0.12 in the summary above), so its coefficient is not on a scale that can be compared directly with the others. The model also shows that the number of males per 1000 females (m_per1000f) is the weakest of the retained predictors (p = 0.109).
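One common way to put coefficients on a comparable scale is to standardise every variable before fitting. The sketch below shows this on made-up data, where `x_small` mimics a predictor on a 0.01 to 0.12 scale; the data, names, and true coefficients are assumptions for illustration and this step is not part of the original analysis.

```r
# Made-up data: one predictor on a tiny scale, one on a large scale.
set.seed(1)
toy <- data.frame(
  x_small = runif(30, 0.01, 0.12),
  x_large = runif(30, 100, 300)
)
toy$y <- 900 - 4000 * toy$x_small + 6 * toy$x_large + rnorm(30, sd = 50)

# Raw fit: the x_small coefficient is huge simply because x_small is tiny.
raw_fit <- lm(y ~ x_small + x_large, data = toy)

# Standardised fit: every column is centred and divided by its standard deviation.
std_fit <- lm(y ~ x_small + x_large, data = as.data.frame(scale(toy)))

print(coef(raw_fit))
print(coef(std_fit))
```

Each standardised slope equals the raw slope multiplied by sd(x) / sd(y), so it reads as "standard deviations of y per standard deviation of x" and can be compared across predictors.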
This section checks for multicollinearity among the predictors by looking at their VIF values.
vif(model.crime)
## percent_m mean_education m_per1000f unemploy_m24 unemploy_m39
## 2.131963 4.189684 1.932367 4.360038 4.508106
## inequality prob_prison police_exp60
## 3.731074 1.381879 2.560496
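All VIF values printed above are below 5, a commonly used rule-of-thumb threshold. For numeric predictors, the VIF of a variable is 1 / (1 - R²), where R² comes from regressing that variable on the other predictors. The sketch below computes this by hand in base R on made-up data (`toy`, `x1`, `x2`, `x3` are assumptions), so it does not depend on the car package.

```r
# Made-up predictors: x2 is partly built from x1, x3 is independent.
set.seed(7)
n <- 47
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.5)
x3 <- rnorm(n)
toy <- data.frame(x1, x2, x3)

# For each predictor, regress it on the remaining ones and convert R^2 to a VIF.
manual_vif <- sapply(names(toy), function(v) {
  others <- setdiff(names(toy), v)
  aux <- lm(reformulate(others, response = v), data = toy)
  1 / (1 - summary(aux)$r.squared)
})
print(manual_vif)

# Equivalent shortcut: the diagonal of the inverse correlation matrix.
print(diag(solve(cor(toy))))
```

A VIF of 1 means a predictor is uncorrelated with the others; larger values mean its coefficient's variance is inflated by that factor.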