Crime Rate - Linear Regression

1. Introduction

The objective of this analysis is to determine, using linear regression models, which variables affect the crime rate in the United States. The data describe the year 1960, so the conclusions apply to that period.

2. Data Preparation and Library & Setup

2.1. Library & Setup

library(dplyr)
library(lmtest)
library(car)
library(readr)
library(GGally)
library(MLmetrics)
library(tidyverse)
library(ggplot2)
library(hrbrthemes)
library(tidyr)
library(viridis)

2.2. Data Input

crime <- read.csv("crime.csv")
glimpse(crime)

## Rows: 47
## Columns: 16
## $ percent_m            <int> 151, 143, 142, 136, 141, 121, 127, 131, 157, 140,~
## $ is_south             <int> 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0~
## $ mean_education       <int> 91, 113, 89, 121, 121, 110, 111, 109, 90, 118, 10~
## $ police_exp60         <int> 58, 103, 45, 149, 109, 118, 82, 115, 65, 71, 121,~
## $ police_exp59         <int> 56, 95, 44, 141, 101, 115, 79, 109, 62, 68, 116, ~
## $ labour_participation <int> 510, 583, 533, 577, 591, 547, 519, 542, 553, 632,~
## $ m_per1000f           <int> 950, 1012, 969, 994, 985, 964, 982, 969, 955, 102~
## $ state_pop            <int> 33, 13, 18, 157, 18, 25, 4, 50, 39, 7, 101, 47, 2~
## $ nonwhites_per1000    <int> 301, 102, 219, 80, 30, 44, 139, 179, 286, 15, 106~
## $ unemploy_m24         <int> 108, 96, 94, 102, 91, 84, 97, 79, 81, 100, 77, 83~
## $ unemploy_m39         <int> 41, 36, 33, 39, 20, 29, 38, 35, 28, 24, 35, 31, 2~
## $ gdp                  <int> 394, 557, 318, 673, 578, 689, 620, 472, 421, 526,~
## $ inequality           <int> 261, 194, 250, 167, 174, 126, 168, 206, 239, 174,~
## $ prob_prison          <dbl> 0.084602, 0.029599, 0.083401, 0.015801, 0.041399,~
## $ time_prison          <dbl> 26.2011, 25.2999, 24.3006, 29.9012, 21.2998, 20.9~
## $ crime_rate           <int> 791, 1635, 578, 1969, 1234, 682, 963, 1555, 856, ~

Column description:

percent_m : percentage of male population
is_south : whether the state is in the South (1 = yes, 0 = no)
mean_education : average years of education
police_exp60 : police department expenditure in 1960
police_exp59 : police department expenditure in 1959
labour_participation : labour participation rate
m_per1000f : number of males per 1000 females
state_pop : state population
nonwhites_per1000 : number of non-white residents per 1000 people
unemploy_m24 : unemployment rate of males aged 14-24
unemploy_m39 : unemployment rate of males aged 35-39
gdp : gross domestic product per capita
inequality : income inequality
prob_prison : probability of imprisonment
time_prison : average time served in prison
crime_rate : rate of crime occurrence

3. EDA

3.1. Missing Value Check

Before exploring the data, check whether any column contains missing values.

colSums(is.na(crime))

##            percent_m             is_south       mean_education 
##                    0                    0                    0 
##         police_exp60         police_exp59 labour_participation 
##                    0                    0                    0 
##           m_per1000f            state_pop    nonwhites_per1000 
##                    0                    0                    0 
##         unemploy_m24         unemploy_m39                  gdp 
##                    0                    0                    0 
##           inequality          prob_prison          time_prison 
##                    0                    0                    0 
##           crime_rate 
##                    0

There are no missing values in the data.

3.2. Data Distribution Check

The distribution of the target variable crime_rate is examined with a histogram, a boxplot and a scatter plot of the observations against their mean (red line).

hist(crime$crime_rate)

boxplot(crime$crime_rate)

plot(crime$crime_rate, ylim = c(0,2000))
abline(h = mean(crime$crime_rate), col="red")

Based on the plots above, crime_rate contains some outliers, but they are few compared with the bulk of the observations, so the analysis continues with the full data set.
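As a rough numeric complement to the boxplot, the points drawn beyond the whiskers can be listed with base R's boxplot.stats(), which by default applies the same 1.5 x IQR rule as boxplot(). The vector below is made-up illustration data, not values from crime.csv.

```r
# Made-up values for illustration only; with the real data this
# would be crime$crime_rate.
x <- c(10, 12, 11, 13, 12, 11, 50)

# boxplot.stats() returns the five-number summary in $stats and
# the points lying beyond the whiskers in $out.
stats <- boxplot.stats(x)
outliers <- stats$out

print(outliers)
```

Counting length(outliers) against length(x) gives a simple way to judge whether the outliers are few relative to the whole sample.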

3.3. Data Correlation Check

Before building the linear regression models, the correlation among the variables needs to be checked. The plot below shows the pairwise Pearson correlations.

ggcorr(crime, label = TRUE, label_size = 2.5, hjust = 1, layout.exp = 5)

According to the correlation plot above, the variables most strongly correlated with crime_rate are police_exp59 and police_exp60, each with a positive correlation of 0.7.
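The correlations with the target can also be ranked numerically with base R's cor() instead of being read off the plot. This is a minimal sketch on simulated data; the data frame toy and its columns a, b and y are hypothetical stand-ins for crime and its variables.

```r
set.seed(1)
n <- 100

# Hypothetical stand-in data: y depends strongly on a and weakly on b.
a <- rnorm(n)
b <- rnorm(n)
toy <- data.frame(a = a, b = b, y = 3 * a + 0.2 * b + rnorm(n))

# Pearson correlation of every column with the target column y.
cors <- cor(toy)[, "y"]
cors <- cors[names(cors) != "y"]

# Rank predictors by absolute correlation, strongest first.
ranked <- cors[order(abs(cors), decreasing = TRUE)]
print(round(ranked, 2))
```

With the real data, the same idea would start from cor(crime)[, "crime_rate"], since every column shown by glimpse() is numeric.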

4. Linear Regression Modeling

To answer the objective of this analysis, the target variable is crime_rate and 3 models will be built from different sets of predictors:

Model 1 uses 1 predictor, police_exp60, chosen from the correlation check above
Model 2 uses all predictors
Model 3 uses backward step-wise regression to select suitable predictors

R-Squared Check Terms

the higher the R-squared value, the better the fit
a model with a single predictor is judged by multiple R-squared
a model with more than 1 predictor is judged by adjusted R-squared, which penalises extra predictors
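The difference between multiple and adjusted R-squared can be made concrete with the usual adjustment formula, 1 - (1 - R2)(n - 1)/(n - p - 1). The sketch below applies it to the multiple R-squared values that summary() reports for the one-predictor model and the all-predictor model in this section; the helper name adj_r2 is ours, not part of any package.

```r
# Adjusted R-squared from multiple R-squared, sample size n and
# number of predictors p (the intercept is not counted in p).
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

n <- 47  # rows reported by glimpse(crime)

# Multiple R-squared values copied from the model summaries.
adj_1  <- adj_r2(0.4728, n, p = 1)   # one predictor
adj_15 <- adj_r2(0.8031, n, p = 15)  # all 15 predictors

print(round(c(adj_1, adj_15), 4))
```

The penalty is small for one predictor but large for 15 predictors on 47 rows, which is why adjusted R-squared is the fairer measure for the larger models.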

4.1 Model 1

model.1 <- lm(crime_rate ~ police_exp60, crime)
summary(model.1)

## 
## Call:
## lm(formula = crime_rate ~ police_exp60, data = crime)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -586.91 -155.63   32.52  139.58  568.84 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   144.464    126.693   1.140     0.26    
## police_exp60    8.948      1.409   6.353 9.34e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 283.9 on 45 degrees of freedom
## Multiple R-squared:  0.4728, Adjusted R-squared:  0.4611 
## F-statistic: 40.36 on 1 and 45 DF,  p-value: 9.338e-08

Model.1:

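Based on the summary above, the multiple R-squared is 0.4728. The coefficient of the predictor police_exp60 is 8.948 with a p-value of 9.34e-08, so police_exp60 has a statistically significant positive association with crime rate: each additional unit of police_exp60 corresponds to an increase of about 8.948 in the fitted crime_rate.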
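To show how the fitted line is read, the sketch below plugs the intercept and slope printed by summary(model.1) into the regression equation. The input value police_exp60 = 100 is a hypothetical example, and the helper fitted_crime_rate is ours.

```r
# Coefficients copied from summary(model.1).
intercept <- 144.464
slope     <- 8.948

# Fitted crime_rate for a given police_exp60 value.
fitted_crime_rate <- function(police_exp60) {
  intercept + slope * police_exp60
}

# Hypothetical input: police_exp60 = 100.
print(fitted_crime_rate(100))

# Raising police_exp60 by one unit changes the fitted value by the slope.
print(fitted_crime_rate(101) - fitted_crime_rate(100))
```

With the model object available, predict(model.1, newdata = data.frame(police_exp60 = 100)) should agree up to the rounding of the printed coefficients.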

4.2 Model 2

model.2 <- lm(crime_rate ~ ., crime)
summary(model.2)

## 
## Call:
## lm(formula = crime_rate ~ ., data = crime)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -395.74  -98.09   -6.69  112.99  512.67 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          -5984.2876  1628.3184  -3.675 0.000893 ***
## percent_m                8.7830     4.1714   2.106 0.043443 *  
## is_south                -3.8035   148.7551  -0.026 0.979765    
## mean_education          18.8324     6.2088   3.033 0.004861 ** 
## police_exp60            19.2804    10.6110   1.817 0.078892 .  
## police_exp59           -10.9422    11.7478  -0.931 0.358830    
## labour_participation    -0.6638     1.4697  -0.452 0.654654    
## m_per1000f               1.7407     2.0354   0.855 0.398995    
## state_pop               -0.7330     1.2896  -0.568 0.573845    
## nonwhites_per1000        0.4204     0.6481   0.649 0.521279    
## unemploy_m24            -5.8271     4.2103  -1.384 0.176238    
## unemploy_m39            16.7800     8.2336   2.038 0.050161 .  
## gdp                      0.9617     1.0367   0.928 0.360754    
## inequality               7.0672     2.2717   3.111 0.003983 ** 
## prob_prison          -4855.2658  2272.3746  -2.137 0.040627 *  
## time_prison             -3.4790     7.1653  -0.486 0.630708    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 209.1 on 31 degrees of freedom
## Multiple R-squared:  0.8031, Adjusted R-squared:  0.7078 
## F-statistic: 8.429 on 15 and 31 DF,  p-value: 3.539e-07

Model.2:

Based on the summary above, the adjusted R-squared is 0.7078. The coefficient of prob_prison is -4855.2658, so a higher probability of imprisonment is associated with a lower crime rate when the other predictors are held fixed. The most significant predictors are inequality (p-value 0.003983) and mean_education (p-value 0.004861).

4.3 Model 3

model.3 <- step(lm(crime_rate ~., crime ), direction = "backward")

## Start:  AIC=514.65
## crime_rate ~ percent_m + is_south + mean_education + police_exp60 + 
##     police_exp59 + labour_participation + m_per1000f + state_pop + 
##     nonwhites_per1000 + unemploy_m24 + unemploy_m39 + gdp + inequality + 
##     prob_prison + time_prison
## 
##                        Df Sum of Sq     RSS    AIC
## - is_south              1        29 1354974 512.65
## - labour_participation  1      8917 1363862 512.96
## - time_prison           1     10304 1365250 513.00
## - state_pop             1     14122 1369068 513.14
## - nonwhites_per1000     1     18395 1373341 513.28
## - m_per1000f            1     31967 1386913 513.74
## - gdp                   1     37613 1392558 513.94
## - police_exp59          1     37919 1392865 513.95
## <none>                              1354946 514.65
## - unemploy_m24          1     83722 1438668 515.47
## - police_exp60          1    144306 1499252 517.41
## - unemploy_m39          1    181536 1536482 518.56
## - percent_m             1    193770 1548716 518.93
## - prob_prison           1    199538 1554484 519.11
## - mean_education        1    402117 1757063 524.86
## - inequality            1    423031 1777977 525.42
## 
## Step:  AIC=512.65
## crime_rate ~ percent_m + mean_education + police_exp60 + police_exp59 + 
##     labour_participation + m_per1000f + state_pop + nonwhites_per1000 + 
##     unemploy_m24 + unemploy_m39 + gdp + inequality + prob_prison + 
##     time_prison
## 
##                        Df Sum of Sq     RSS    AIC
## - time_prison           1     10341 1365315 511.01
## - labour_participation  1     10878 1365852 511.03
## - state_pop             1     14127 1369101 511.14
## - nonwhites_per1000     1     21626 1376600 511.39
## - m_per1000f            1     32449 1387423 511.76
## - police_exp59          1     37954 1392929 511.95
## - gdp                   1     39223 1394197 511.99
## <none>                              1354974 512.65
## - unemploy_m24          1     96420 1451395 513.88
## - police_exp60          1    144302 1499277 515.41
## - unemploy_m39          1    189859 1544834 516.81
## - percent_m             1    195084 1550059 516.97
## - prob_prison           1    204463 1559437 517.26
## - mean_education        1    403140 1758114 522.89
## - inequality            1    488834 1843808 525.13
## 
## Step:  AIC=511.01
## crime_rate ~ percent_m + mean_education + police_exp60 + police_exp59 + 
##     labour_participation + m_per1000f + state_pop + nonwhites_per1000 + 
##     unemploy_m24 + unemploy_m39 + gdp + inequality + prob_prison
## 
##                        Df Sum of Sq     RSS    AIC
## - labour_participation  1     10533 1375848 509.37
## - nonwhites_per1000     1     15482 1380797 509.54
## - state_pop             1     21846 1387161 509.75
## - police_exp59          1     28932 1394247 509.99
## - gdp                   1     36070 1401385 510.23
## - m_per1000f            1     41784 1407099 510.42
## <none>                              1365315 511.01
## - unemploy_m24          1     91420 1456735 512.05
## - police_exp60          1    134137 1499452 513.41
## - unemploy_m39          1    184143 1549458 514.95
## - percent_m             1    186110 1551425 515.01
## - prob_prison           1    237493 1602808 516.54
## - mean_education        1    409448 1774763 521.33
## - inequality            1    502909 1868224 523.75
## 
## Step:  AIC=509.37
## crime_rate ~ percent_m + mean_education + police_exp60 + police_exp59 + 
##     m_per1000f + state_pop + nonwhites_per1000 + unemploy_m24 + 
##     unemploy_m39 + gdp + inequality + prob_prison
## 
##                     Df Sum of Sq     RSS    AIC
## - nonwhites_per1000  1     11675 1387523 507.77
## - police_exp59       1     21418 1397266 508.09
## - state_pop          1     27803 1403651 508.31
## - m_per1000f         1     31252 1407100 508.42
## - gdp                1     35035 1410883 508.55
## <none>                           1375848 509.37
## - unemploy_m24       1     80954 1456802 510.06
## - police_exp60       1    123896 1499744 511.42
## - unemploy_m39       1    190746 1566594 513.47
## - percent_m          1    217716 1593564 514.27
## - prob_prison        1    226971 1602819 514.54
## - mean_education     1    413254 1789103 519.71
## - inequality         1    500944 1876792 521.96
## 
## Step:  AIC=507.77
## crime_rate ~ percent_m + mean_education + police_exp60 + police_exp59 + 
##     m_per1000f + state_pop + unemploy_m24 + unemploy_m39 + gdp + 
##     inequality + prob_prison
## 
##                  Df Sum of Sq     RSS    AIC
## - police_exp59    1     16706 1404229 506.33
## - state_pop       1     25793 1413315 506.63
## - m_per1000f      1     26785 1414308 506.66
## - gdp             1     31551 1419073 506.82
## <none>                        1387523 507.77
## - unemploy_m24    1     83881 1471404 508.52
## - police_exp60    1    118348 1505871 509.61
## - unemploy_m39    1    201453 1588976 512.14
## - prob_prison     1    216760 1604282 512.59
## - percent_m       1    309214 1696737 515.22
## - mean_education  1    402754 1790276 517.74
## - inequality      1    589736 1977259 522.41
## 
## Step:  AIC=506.33
## crime_rate ~ percent_m + mean_education + police_exp60 + m_per1000f + 
##     state_pop + unemploy_m24 + unemploy_m39 + gdp + inequality + 
##     prob_prison
## 
##                  Df Sum of Sq     RSS    AIC
## - state_pop       1     22345 1426575 505.07
## - gdp             1     32142 1436371 505.39
## - m_per1000f      1     36808 1441037 505.54
## <none>                        1404229 506.33
## - unemploy_m24    1     86373 1490602 507.13
## - unemploy_m39    1    205814 1610043 510.76
## - prob_prison     1    218607 1622836 511.13
## - percent_m       1    307001 1711230 513.62
## - mean_education  1    389502 1793731 515.83
## - inequality      1    608627 2012856 521.25
## - police_exp60    1   1050202 2454432 530.57
## 
## Step:  AIC=505.07
## crime_rate ~ percent_m + mean_education + police_exp60 + m_per1000f + 
##     unemploy_m24 + unemploy_m39 + gdp + inequality + prob_prison
## 
##                  Df Sum of Sq     RSS    AIC
## - gdp             1     26493 1453068 503.93
## <none>                        1426575 505.07
## - m_per1000f      1     84491 1511065 505.77
## - unemploy_m24    1     99463 1526037 506.24
## - prob_prison     1    198571 1625145 509.20
## - unemploy_m39    1    208880 1635455 509.49
## - percent_m       1    320926 1747501 512.61
## - mean_education  1    386773 1813348 514.35
## - inequality      1    594779 2021354 519.45
## - police_exp60    1   1127277 2553852 530.44
## 
## Step:  AIC=503.93
## crime_rate ~ percent_m + mean_education + police_exp60 + m_per1000f + 
##     unemploy_m24 + unemploy_m39 + inequality + prob_prison
## 
##                  Df Sum of Sq     RSS    AIC
## <none>                        1453068 503.93
## - m_per1000f      1    103159 1556227 505.16
## - unemploy_m24    1    127044 1580112 505.87
## - prob_prison     1    247978 1701046 509.34
## - unemploy_m39    1    255443 1708511 509.55
## - percent_m       1    296790 1749858 510.67
## - mean_education  1    445788 1898855 514.51
## - inequality      1    738244 2191312 521.24
## - police_exp60    1   1672038 3125105 537.93

summary(model.3)

## 
## Call:
## lm(formula = crime_rate ~ percent_m + mean_education + police_exp60 + 
##     m_per1000f + unemploy_m24 + unemploy_m39 + inequality + prob_prison, 
##     data = crime)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -444.70 -111.07    3.03  122.15  483.30 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -6426.101   1194.611  -5.379 4.04e-06 ***
## percent_m          9.332      3.350   2.786  0.00828 ** 
## mean_education    18.012      5.275   3.414  0.00153 ** 
## police_exp60      10.265      1.552   6.613 8.26e-08 ***
## m_per1000f         2.234      1.360   1.642  0.10874    
## unemploy_m24      -6.087      3.339  -1.823  0.07622 .  
## unemploy_m39      18.735      7.248   2.585  0.01371 *  
## inequality         6.133      1.396   4.394 8.63e-05 ***
## prob_prison    -3796.032   1490.646  -2.547  0.01505 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 195.5 on 38 degrees of freedom
## Multiple R-squared:  0.7888, Adjusted R-squared:  0.7444 
## F-statistic: 17.74 on 8 and 38 DF,  p-value: 1.159e-10

Comparing the 3 models, model.3 has the highest R-squared: its adjusted R-squared is 0.7444, against 0.7078 (adjusted) for model.2 and 0.4728 (multiple) for model.1.
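The AIC values in the step() trace of Model 3 can be reproduced by hand. For lm fits, step() relies on extractAIC(), which computes n * log(RSS / n) + 2 * edf, where edf counts the intercept plus all slopes. The sketch below uses the RSS values from the "<none>" rows of the first and last steps; the helper name step_aic is ours.

```r
# AIC as computed by extractAIC() for lm fits:
# n * log(RSS / n) + 2 * edf, with edf = intercept + number of slopes.
step_aic <- function(rss, n, edf) {
  n * log(rss / n) + 2 * edf
}

n <- 47  # rows in the data

# Full model: 15 predictors + intercept, RSS from the first step.
aic_full  <- step_aic(1354946, n, edf = 16)
# Final model: 8 predictors + intercept, RSS from the last step.
aic_final <- step_aic(1453068, n, edf = 9)

print(round(c(aic_full, aic_final), 2))
```

Dropping seven predictors raises the RSS only slightly while removing 14 points of penalty, which is why backward elimination prefers the smaller model.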

5. Linear Regression Model Predictors and Error Performance Check

RMSE (Root Mean Squared Error) analysis will be used to determine performance and error of each model.

5.1 Model 1 Performance Check

pred_model.1 <- predict(model.1, newdata = crime)

RMSE(pred_model.1, y_true = crime$crime_rate)

## [1] 277.8192

5.2 Model 2 Performance Check

pred_model.2 <- predict(model.2, newdata = crime)

RMSE(pred_model.2, y_true = crime$crime_rate)

## [1] 169.79

5.3 Model 3 Performance Check

pred_model.3 <- predict(model.3, newdata = crime)

RMSE(pred_model.3, y_true = crime$crime_rate)

## [1] 175.8304

RMSE Check Term

the lower the RMSE value, the better the model

RMSE Performance Result:

model.1 : 277.8192
model.2 : 169.79
model.3 : 175.8304

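Based on the RMSE results, the model with the lowest error is model.2 with 169.79, although model.3 is close behind with 175.8304. Because the RMSE is computed on the same data used to fit the models, and model.3 uses a subset of the predictors of model.2, model.2 cannot have a higher RMSE here.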
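The RMSE values can be cross-checked without MLmetrics. On the training data the mean squared error equals RSS / n, so the RMSE of model.2 and model.3 follows from the RSS values printed in the step() trace. The helper rmse below is a hand-written sketch, not the MLmetrics function.

```r
# RMSE from predictions and observed values.
rmse <- function(y_pred, y_true) {
  sqrt(mean((y_true - y_pred)^2))
}

# On the training data, RMSE = sqrt(RSS / n).
n <- 47
rmse_model2 <- sqrt(1354946 / n)  # RSS of the full model in the step() trace
rmse_model3 <- sqrt(1453068 / n)  # RSS of the final step-wise model

print(round(c(rmse_model2, rmse_model3), 2))

# Tiny made-up check of the helper: the two errors are +1 and -1.
toy_rmse <- rmse(y_pred = c(2, 4), y_true = c(3, 3))
print(toy_rmse)
```

A comparison on held-out data, for example with a train/test split, would give a fairer picture of predictive error than this in-sample figure.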

6. Linear Regression Model Evaluation

Based on the error performance check above, model.2 has the lowest error. To compare the models fairly, however, all 3 models are evaluated in this part. The evaluations carried out are:

Normality Test
Autocorrelation Test
Heteroscedasticity Test
Multicollinearity Test

6.1. Normality Test

6.1.1. Normality Test - Model.1

hist(model.1$residuals)

shapiro.test(model.1$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  model.1$residuals
## W = 0.97601, p-value = 0.439

6.1.2. Normality Test - Model.2

hist(model.2$residuals)

shapiro.test(model.2$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  model.2$residuals
## W = 0.9846, p-value = 0.7849

6.1.3. Normality Test - Model.3

hist(model.3$residuals)

shapiro.test(model.3$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  model.3$residuals
## W = 0.98511, p-value = 0.8051

Normality Test Term

H0 : residuals are distributed normally (p > 0.05)
H1 : residuals are not distributed normally (p < 0.05)

Result:

p-value from model.1 : 0.439
p-value from model.2 : 0.7849
p-value from model.3 : 0.8051

Based on the normality test results, all 3 models have p-value > 0.05, so H0 is not rejected: there is no evidence that the residuals deviate from a normal distribution.
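To illustrate how the Shapiro-Wilk decision rule behaves, the sketch below applies shapiro.test() to two simulated residual vectors, one drawn from a normal distribution and one strongly right-skewed. Both vectors and the helper decision are made up for illustration and are unrelated to the crime models.

```r
set.seed(42)

# Simulated residuals: one roughly normal, one right-skewed.
normal_resid <- rnorm(200)
skewed_resid <- rexp(200) - 1

sw_normal <- shapiro.test(normal_resid)
sw_skewed <- shapiro.test(skewed_resid)

# Decision at the 5% level: reject normality only when p < 0.05.
decision <- function(p, alpha = 0.05) {
  if (p < alpha) "reject H0 (not normal)" else "fail to reject H0"
}

print(c(normal = decision(sw_normal$p.value),
        skewed = decision(sw_skewed$p.value)))
```

A large p-value only means the test found no evidence against normality, so it is usually read together with the histogram of the residuals.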

6.2. Autocorrelation Test

6.2.1. Autocorrelation Test - Model.1

durbinWatsonTest(model.1)

##  lag Autocorrelation D-W Statistic p-value
##    1     -0.06465098      2.122017   0.642
##  Alternative hypothesis: rho != 0

6.2.2. Autocorrelation Test - Model.2

durbinWatsonTest(model.2)

##  lag Autocorrelation D-W Statistic p-value
##    1       0.1303644      1.723274   0.342
##  Alternative hypothesis: rho != 0

6.2.3. Autocorrelation Test - Model.3

durbinWatsonTest(model.3)

##  lag Autocorrelation D-W Statistic p-value
##    1       0.1049091      1.752067   0.388
##  Alternative hypothesis: rho != 0

Autocorrelation Test Term

H0 : no autocorrelation (p > 0.05)
H1 : there is an autocorrelation (p < 0.05)

Result:

p-value from model.1 : 0.642
p-value from model.2 : 0.342
p-value from model.3 : 0.388

Based on the autocorrelation test results, all 3 models have p-value > 0.05, so there is no evidence of autocorrelation in any of the 3 models.
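The D-W statistic reported by durbinWatsonTest() is the sum of squared successive differences of the residuals divided by the sum of squared residuals, and it is roughly 2 * (1 - r), where r is the lag-1 autocorrelation. For model.1 this gives 2 * (1 + 0.0647), about 2.13, close to the reported 2.122. A minimal base-R sketch on simulated data (the regression below is made up for illustration):

```r
set.seed(7)

# Simulated regression with independent errors (illustration only).
x <- rnorm(60)
y <- 1 + 2 * x + rnorm(60)
e <- unname(residuals(lm(y ~ x)))
n <- length(e)

# Durbin-Watson statistic: squared successive differences over
# squared residuals.
dw <- sum(diff(e)^2) / sum(e^2)

# Lag-1 autocorrelation of the residuals, using the same denominator.
r1 <- sum(e[-1] * e[-n]) / sum(e^2)

# dw is close to 2 * (1 - r1); values near 2 suggest little autocorrelation.
print(c(dw = dw, approx = 2 * (1 - r1)))
```

The statistic always lies between 0 and 4, with values well below 2 pointing to positive autocorrelation and values well above 2 to negative autocorrelation.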

6.3. Heteroscedasticity Test

6.3.1. Heteroscedasticity Test - Model.1

bptest(model.1)

## 
##  studentized Breusch-Pagan test
## 
## data:  model.1
## BP = 21.098, df = 1, p-value = 4.364e-06

plot(crime$crime_rate, model.1$residuals)
abline(h = 0, col = "red")

6.3.2. Heteroscedasticity Test - Model.2

bptest(model.2)

## 
##  studentized Breusch-Pagan test
## 
## data:  model.2
## BP = 18.469, df = 15, p-value = 0.2388

plot(crime$crime_rate, model.2$residuals)
abline(h = 0, col = "red")

6.3.3. Heteroscedasticity Test - Model.3

bptest(model.3)

## 
##  studentized Breusch-Pagan test
## 
## data:  model.3
## BP = 13.51, df = 8, p-value = 0.09546

plot(crime$crime_rate, model.3$residuals)
abline(h = 0, col = "red")

Heteroscedasticity Test Term

H0 : residuals have constant variance, no pattern (homoscedasticity) (p > 0.05)
H1 : residuals have non-constant variance, a pattern (heteroscedasticity) (p < 0.05)

Result:

p-value from model.1 : 0.000004364
p-value from model.2 : 0.2388
p-value from model.3 : 0.09546

Based on the heteroscedasticity test results, model.2 and model.3 have p-value > 0.05, so their residuals show no evidence of heteroscedasticity. Model.1 has p-value < 0.05, so its residuals have non-constant variance.
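The studentized Breusch-Pagan statistic can be sketched in base R: regress the squared residuals on the predictors, multiply the R-squared of that auxiliary regression by n, and compare the result with a chi-squared distribution whose degrees of freedom equal the number of predictors. The data below are simulated so that the error spread grows with x; the last lines recompute the p-values from the BP statistics and df printed by bptest() above.

```r
set.seed(3)
n <- 200

# Simulated data whose error spread grows with x
# (heteroscedastic by construction, illustration only).
x <- runif(n, 1, 10)
y <- 2 + 3 * x + rnorm(n, sd = x)
fit <- lm(y ~ x)

# Auxiliary regression of squared residuals on the predictor.
aux <- lm(residuals(fit)^2 ~ x)
bp_stat <- n * summary(aux)$r.squared
bp_df <- 1  # one predictor in the auxiliary regression
bp_p <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

print(c(BP = bp_stat, p = bp_p))

# p-values recomputed from the reported BP statistics and df
# of model.1, model.2 and model.3.
p_reported <- pchisq(c(21.098, 18.469, 13.51),
                     df = c(1, 15, 8), lower.tail = FALSE)
print(signif(p_reported, 4))
```

A small p-value indicates that the residual variance changes with the predictors, which is what happens in the simulated example and in model.1.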

6.4. Multicollinearity Test

6.4.1. Multicollinearity Test - Model.1

Because model.1 has only one predictor, “vif()” cannot be applied to it: multicollinearity requires at least two predictors.

6.4.2. Multicollinearity Test - Model.2

vif(model.2)

##            percent_m             is_south       mean_education 
##             2.892448             5.342783             5.077447 
##         police_exp60         police_exp59 labour_participation 
##           104.658667           113.559262             3.712690 
##           m_per1000f            state_pop    nonwhites_per1000 
##             3.785934             2.536708             4.674088 
##         unemploy_m24         unemploy_m39                  gdp 
##             6.063931             5.088880            10.530375 
##           inequality          prob_prison          time_prison 
##             8.644528             2.809459             2.713785

6.4.3. Multicollinearity Test - Model.3

vif(model.3)

##      percent_m mean_education   police_exp60     m_per1000f   unemploy_m24 
##       2.131963       4.189684       2.560496       1.932367       4.360038 
##   unemploy_m39     inequality    prob_prison 
##       4.508106       3.731074       1.381879

Multicollinearity Test Term

VIF < 10 : no multicollinearity in the model
VIF > 10 : multicollinearity occurs in the model

Result:

model.1 : vif() cannot be applied because the model has only 1 predictor
model.2 : multicollinearity occurs in gdp (VIF = 10.53), police_exp60 (VIF = 104.66) and police_exp59 (VIF = 113.56)
model.3 : no multicollinearity occurs

Based on the multicollinearity test results, only model.3 has no multicollinearity among its predictors.
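The variance inflation factor of a predictor is 1 / (1 - R2), where R2 comes from regressing that predictor on all the other predictors. The sketch below computes it by hand on simulated data in which x2 is nearly a copy of x1, similar in spirit to police spending in two consecutive years; the variables x1, x2, x3 and the name vif_manual are hypothetical.

```r
set.seed(11)
n <- 200

# Simulated predictors: x2 is almost identical to x1, x3 is unrelated.
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
x3 <- rnorm(n)
X <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R2_j), with R2_j from regressing predictor j
# on all the other predictors.
vif_manual <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})

print(round(vif_manual, 2))
```

The two near-duplicate predictors get very large values while the unrelated one stays close to 1, which mirrors how dropping police_exp59 in the step-wise model brings the VIF of police_exp60 down.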

7 Conclusion

To make the results easier to compare, they are collected into a data frame.

# One entry per model, in the order model.1, model.2, model.3
Model.Name <- c("model.1", "model.2", "model.3")
# Multiple R-squared for model.1, adjusted R-squared for model.2 and model.3
R.Squared <- c("0.4728", "0.7078", "0.7444")
RMSE <- c("277.8192", "169.79", "175.8304")
Normality.Test <- c("normal - 0.439", "normal - 0.7849", "normal - 0.8051")
Autocorrelation.Test <- c("No Autocorrelation - 0.642", "No Autocorrelation - 0.342", "No Autocorrelation - 0.388")
# Breusch-Pagan: p < 0.05 means heteroscedastic residuals
Heteroscedasticity.Test <- c("Hetero - 0.000004364", "Homo - 0.2388", "Homo - 0.09546")
Multicollinearity.Test <- c("Can not apply", "Multicollinearity Occurs", "No Multicollinearity")

model.comparison <- data.frame(Model.Name, R.Squared, RMSE, Normality.Test, Autocorrelation.Test, Heteroscedasticity.Test, Multicollinearity.Test)

The data frame is then displayed as a table to show which model performs best across all the tests.

rmarkdown::paged_table(model.comparison)

Based on the comparison table above, only model.3, built with backward step-wise regression, passed all the tests. Model.1 failed the heteroscedasticity test (p-value 0.000004364, below 0.05) and model.2 failed the multicollinearity test (3 variables with VIF > 10).

Dwi Puji Laksono Gumilang

2022-09-01