Task

As this is a graded task for our Academy students, completion of the task is not optional and counts towards your final score.

Write a regression analysis report applying what you’ve learned in the workshop. Using the dataset provided to you, write up your findings on the socioeconomic variables most highly correlated with crime rates (crime_rate). Explain your recommendations where appropriate. To help you through the exercise, you should ask the following questions of your candidate model:

Students should be awarded the full points if:
1. The model achieves an adjusted R-squared value above the grading threshold of 0.701
2. The residual plot resembles a random scatterplot

Solution

First of all, my objective is to write up my findings on the socioeconomic variables most highly correlated with crime rates. Socioeconomics itself is the “social science that studies how economic activity affects and is shaped by social processes.” In general, it analyzes how societies progress, stagnate, or regress because of their local or regional economy, or the global economy. Societies are divided into 3 groups: 1. Social, 2. Cultural and 3. Economic.

This dataset was collected in 1960, and a full description of the dataset wasn’t conveniently available. Sammuel used the description he gathered from the authors of the MASS package. After the columns were renamed to be easier to read, the variables are:
- percent_m: percentage of males aged 14-24
- is_south: whether it is in a Southern state. 1 for Yes, 0 for No.
- mean_education: mean years of schooling
- police_exp: police expenditure in 1960 and 1959
- labour_participation: labour force participation rate
- m_per1000f: number of males per 1000 females
- state_pop: state population
- nonwhites_per1000: number of non-whites resident per 1000 people
- unemploy24_39: unemployment rate of urban males aged 14-24 and aged 35-39
- gdp: gross domestic product per head
- inequality: income inequality
- prob_prison: probability of imprisonment
- time_prison: avg time served in prisons
- crime_rate: crime rate in an unspecified category

Data Preparation

To prepare the data, I take the crime data provided by Algorit.ma, drop the X column, and rename the remaining columns so that they are easier to read.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# Read the data and drop the row-index column X
crime.dat <- read.csv("crime.csv") %>%
  select(-X)
# Give the columns readable names
names(crime.dat) <- c("percent_m", "is_south", "mean_education", "police_exp60", "police_exp59", "labour_participation", "m_per1000f", "state_pop", "nonwhites_per1000", "unemploy_m24", "unemploy_m39", "gdp", "inequality", "prob_prison", "time_prison", "crime_rate")
# Combine the two police expenditure years and the two unemployment groups
crime.dat$police_exp <- crime.dat$police_exp59 + crime.dat$police_exp60
crime.dat$unemploy24_39 <- crime.dat$unemploy_m24 + crime.dat$unemploy_m39
crime.dat <- subset(crime.dat, select = -c(police_exp59, police_exp60, unemploy_m24, unemploy_m39))
# Treat the Southern-state indicator as a categorical variable
crime.dat$is_south <- as.factor(crime.dat$is_south)
str(crime.dat)
## 'data.frame':    47 obs. of  14 variables:
##  $ percent_m           : int  151 143 142 136 141 121 127 131 157 140 ...
##  $ is_south            : Factor w/ 2 levels "0","1": 2 1 2 1 1 1 2 2 2 1 ...
##  $ mean_education      : int  91 113 89 121 121 110 111 109 90 118 ...
##  $ labour_participation: int  510 583 533 577 591 547 519 542 553 632 ...
##  $ m_per1000f          : int  950 1012 969 994 985 964 982 969 955 1029 ...
##  $ state_pop           : int  33 13 18 157 18 25 4 50 39 7 ...
##  $ nonwhites_per1000   : int  301 102 219 80 30 44 139 179 286 15 ...
##  $ gdp                 : int  394 557 318 673 578 689 620 472 421 526 ...
##  $ inequality          : int  261 194 250 167 174 126 168 206 239 174 ...
##  $ prob_prison         : num  0.0846 0.0296 0.0834 0.0158 0.0414 ...
##  $ time_prison         : num  26.2 25.3 24.3 29.9 21.3 ...
##  $ crime_rate          : int  791 1635 578 1969 1234 682 963 1555 856 705 ...
##  $ police_exp          : int  114 198 89 290 210 233 161 224 127 139 ...
##  $ unemploy24_39       : int  149 132 127 141 111 113 135 114 109 124 ...
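
Since the task asks which variables are most highly correlated with crime_rate, a quick ranking of the pairwise correlations can complement the regression. The sketch below is not part of the original analysis. It assumes that crime.csv matches the UScrime data shipped with the MASS package, where the response is called y and the columns still carry their original short names, so results on the actual file may differ.

```r
# Assumption: MASS::UScrime is the same data as crime.csv, before renaming.
data("UScrime", package = "MASS")

# Pearson correlation of every column with the crime rate (column y)
cors <- cor(UScrime)[, "y"]
cors <- cors[names(cors) != "y"]   # drop the trivial self-correlation

# Rank by absolute strength, keeping the sign for interpretation
ranked <- cors[order(abs(cors), decreasing = TRUE)]
print(round(ranked, 3))
```

A pairwise correlation ignores the other predictors, so this ranking need not agree with the significance ordering in a multiple regression.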

Regression Model

I built the model formula using backward stepwise selection (step() with direction = "backward"), with the aim of predicting the crime rate from a reasonable set of predictor variables.

library(car)
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
crmodel.base<- lm(crime_rate~.,crime.dat)
summary(crmodel.base)
## 
## Call:
## lm(formula = crime_rate ~ ., data = crime.dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -400.63 -121.25    1.87  108.51  489.48 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          -5.826e+03  1.590e+03  -3.664 0.000864 ***
## percent_m             8.275e+00  4.370e+00   1.894 0.067077 .  
## is_south1             8.993e+01  1.497e+02   0.601 0.552082    
## mean_education        1.379e+01  6.139e+00   2.246 0.031524 *  
## labour_participation  3.886e-01  1.468e+00   0.265 0.792893    
## m_per1000f            7.295e-01  2.031e+00   0.359 0.721723    
## state_pop            -9.503e-01  1.353e+00  -0.703 0.487250    
## nonwhites_per1000     9.513e-02  6.593e-01   0.144 0.886142    
## gdp                   1.227e+00  1.081e+00   1.135 0.264602    
## inequality            7.741e+00  2.373e+00   3.262 0.002574 ** 
## prob_prison          -4.040e+03  2.316e+03  -1.744 0.090454 .  
## time_prison           1.485e+00  7.059e+00   0.210 0.834669    
## police_exp            5.776e+00  1.257e+00   4.597 6.01e-05 ***
## unemploy24_39         1.835e+00  1.943e+00   0.945 0.351754    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 220.2 on 33 degrees of freedom
## Multiple R-squared:  0.7675, Adjusted R-squared:  0.6759 
## F-statistic: 8.381 on 13 and 33 DF,  p-value: 4.116e-07
step(crmodel.base, direction="backward")
## Start:  AIC=518.45
## crime_rate ~ percent_m + is_south + mean_education + labour_participation + 
##     m_per1000f + state_pop + nonwhites_per1000 + gdp + inequality + 
##     prob_prison + time_prison + police_exp + unemploy24_39
## 
##                        Df Sum of Sq     RSS    AIC
## - nonwhites_per1000     1      1009 1600646 516.48
## - time_prison           1      2145 1601782 516.51
## - labour_participation  1      3396 1603033 516.55
## - m_per1000f            1      6255 1605891 516.63
## - is_south              1     17497 1617133 516.96
## - state_pop             1     23927 1623563 517.15
## - unemploy24_39         1     43247 1642883 517.71
## - gdp                   1     62432 1662068 518.25
## <none>                              1599636 518.45
## - prob_prison           1    147449 1747085 520.60
## - percent_m             1    173809 1773446 521.30
## - mean_education        1    244509 1844145 523.14
## - inequality            1    515785 2115422 529.59
## - police_exp            1   1024471 2624107 539.71
## 
## Step:  AIC=516.48
## crime_rate ~ percent_m + is_south + mean_education + labour_participation + 
##     m_per1000f + state_pop + gdp + inequality + prob_prison + 
##     time_prison + police_exp + unemploy24_39
## 
##                        Df Sum of Sq     RSS    AIC
## - time_prison           1      3014 1603660 514.57
## - labour_participation  1      4926 1605572 514.63
## - m_per1000f            1      5514 1606160 514.64
## - state_pop             1     24238 1624883 515.19
## - is_south              1     25581 1626227 515.23
## - unemploy24_39         1     48928 1649574 515.90
## - gdp                   1     61999 1662645 516.27
## <none>                              1600646 516.48
## - prob_prison           1    151807 1752453 518.74
## - percent_m             1    199479 1800124 520.00
## - mean_education        1    243802 1844448 521.14
## - inequality            1    519500 2120145 527.69
## - police_exp            1   1356048 2956693 543.32
## 
## Step:  AIC=514.57
## crime_rate ~ percent_m + is_south + mean_education + labour_participation + 
##     m_per1000f + state_pop + gdp + inequality + prob_prison + 
##     police_exp + unemploy24_39
## 
##                        Df Sum of Sq     RSS    AIC
## - m_per1000f            1      3815 1607476 512.68
## - labour_participation  1      5837 1609497 512.74
## - state_pop             1     21514 1625174 513.20
## - is_south              1     25696 1629356 513.32
## - unemploy24_39         1     50242 1653902 514.02
## - gdp                   1     65128 1668788 514.44
## <none>                              1603660 514.57
## - percent_m             1    227419 1831079 518.80
## - prob_prison           1    232450 1836111 518.93
## - mean_education        1    241857 1845517 519.17
## - inequality            1    522622 2126282 525.83
## - police_exp            1   1358738 2962398 541.41
## 
## Step:  AIC=512.68
## crime_rate ~ percent_m + is_south + mean_education + labour_participation + 
##     state_pop + gdp + inequality + prob_prison + police_exp + 
##     unemploy24_39
## 
##                        Df Sum of Sq     RSS    AIC
## - labour_participation  1     14595 1622070 511.11
## - is_south              1     25496 1632971 511.42
## - state_pop             1     40425 1647901 511.85
## - gdp                   1     68865 1676340 512.65
## <none>                              1607476 512.68
## - unemploy24_39         1    106387 1713862 513.69
## - prob_prison           1    230168 1837643 516.97
## - percent_m             1    265485 1872961 517.87
## - mean_education        1    280743 1888218 518.25
## - inequality            1    558268 2165744 524.69
## - police_exp            1   1443206 3050682 540.79
## 
## Step:  AIC=511.11
## crime_rate ~ percent_m + is_south + mean_education + state_pop + 
##     gdp + inequality + prob_prison + police_exp + unemploy24_39
## 
##                  Df Sum of Sq     RSS    AIC
## - is_south        1     14140 1636210 509.51
## - state_pop       1     45007 1667078 510.39
## <none>                        1622070 511.11
## - gdp             1     85006 1707076 511.51
## - unemploy24_39   1     91852 1713922 511.69
## - prob_prison     1    227914 1849985 515.29
## - percent_m       1    278219 1900289 516.55
## - mean_education  1    406829 2028899 519.62
## - inequality      1    771466 2393536 527.39
## - police_exp      1   1430499 3052570 538.82
## 
## Step:  AIC=509.51
## crime_rate ~ percent_m + mean_education + state_pop + gdp + inequality + 
##     prob_prison + police_exp + unemploy24_39
## 
##                  Df Sum of Sq     RSS    AIC
## - state_pop       1     45088 1681298 508.79
## <none>                        1636210 509.51
## - unemploy24_39   1     85168 1721378 509.90
## - gdp             1     98370 1734580 510.26
## - prob_prison     1    219956 1856166 513.44
## - percent_m       1    325254 1961464 516.04
## - mean_education  1    403684 2039894 517.88
## - inequality      1   1008407 2644617 530.08
## - police_exp      1   1529176 3165386 538.53
## 
## Step:  AIC=508.79
## crime_rate ~ percent_m + mean_education + gdp + inequality + 
##     prob_prison + police_exp + unemploy24_39
## 
##                  Df Sum of Sq     RSS    AIC
## <none>                        1681298 508.79
## - unemploy24_39   1     89362 1770660 509.23
## - gdp             1     90985 1772283 509.27
## - prob_prison     1    187011 1868308 511.75
## - percent_m       1    393761 2075059 516.68
## - mean_education  1    516691 2197989 519.39
## - inequality      1    963607 2644905 528.09
## - police_exp      1   1603714 3285012 538.27
## 
## Call:
## lm(formula = crime_rate ~ percent_m + mean_education + gdp + 
##     inequality + prob_prison + police_exp + unemploy24_39, data = crime.dat)
## 
## Coefficients:
##    (Intercept)       percent_m  mean_education             gdp  
##      -5559.360          10.455          15.530           1.385  
##     inequality     prob_prison      police_exp   unemploy24_39  
##          8.514       -3384.687           5.522           1.876
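
The AIC column in the trace above can be checked by hand. For a linear model, step() relies on extractAIC(), which computes n * log(RSS / n) + 2 * edf, where edf counts the estimated coefficients including the intercept. The sketch below plugs in RSS values copied from the trace; the helper name aic_from_rss is my own and is not part of any package.

```r
# Hypothetical helper: AIC as computed by extractAIC() for an lm fit
aic_from_rss <- function(rss, n, edf) {
  n * log(rss / n) + 2 * edf
}

n <- 47  # number of observations in crime.dat

# Full model: 13 predictors + intercept, RSS from the first "<none>" row
aic_full <- aic_from_rss(rss = 1599636, n = n, edf = 14)

# Final model: 7 predictors + intercept, RSS from the last "<none>" row
aic_final <- aic_from_rss(rss = 1681298, n = n, edf = 8)

print(round(c(full = aic_full, final = aic_final), 2))
```

This reproduces the 518.45 and 508.79 reported by step(). The final model has a larger RSS than the full model, but its 2 * edf penalty is smaller by 12, which is why its AIC is lower.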
crmodel.backward<- lm(formula = crime_rate ~ percent_m + mean_education + gdp +
                        inequality +
                      prob_prison + police_exp + unemploy24_39, data = crime.dat) 
summary(crmodel.backward)
## 
## Call:
## lm(formula = crime_rate ~ percent_m + mean_education + gdp + 
##     inequality + prob_prison + police_exp + unemploy24_39, data = crime.dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -441.22 -103.95   -9.48   88.75  485.20 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -5559.3597  1093.5825  -5.084 9.62e-06 ***
## percent_m         10.4554     3.4595   3.022  0.00442 ** 
## mean_education    15.5299     4.4858   3.462  0.00132 ** 
## gdp                1.3845     0.9530   1.453  0.15429    
## inequality         8.5140     1.8008   4.728 2.94e-05 ***
## prob_prison    -3384.6875  1625.0821  -2.083  0.04388 *  
## police_exp         5.5221     0.9054   6.099 3.77e-07 ***
## unemploy24_39      1.8763     1.3032   1.440  0.15792    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 207.6 on 39 degrees of freedom
## Multiple R-squared:  0.7557, Adjusted R-squared:  0.7118 
## F-statistic: 17.23 on 7 and 39 DF,  p-value: 3.76e-10
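
The first grading criterion can be checked directly from these figures. Adjusted R-squared is 1 - (1 - R-squared) * (n - 1) / (n - p - 1), with p the number of predictors. The sketch below uses the rounded R-squared values printed in the two summaries, so the last digit can differ slightly from R's own output; adj_r2 is a name I made up for illustration.

```r
# Hypothetical helper: adjusted R-squared from R-squared, sample size n
# and number of predictors p (intercept not counted)
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

threshold <- 0.701  # grading threshold from the task

adj_base     <- adj_r2(r2 = 0.7675, n = 47, p = 13)  # full model
adj_backward <- adj_r2(r2 = 0.7557, n = 47, p = 7)   # backward model

print(round(c(base = adj_base, backward = adj_backward), 3))
print(c(base = adj_base > threshold, backward = adj_backward > threshold))
```

Only the backward model clears the threshold. On a fitted model, the same value is available as summary(crmodel.backward)$adj.r.squared.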
plot(crime.dat$crime_rate, residuals(crmodel.backward), main = "Crime rate Scatterplot", sub = "Using Backward step", cex = 0.5)
abline(h = 0, col = "darksalmon", lwd = 2)
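
The plot above puts the observed crime rate on the x-axis. A more common way to judge the second criterion is to plot residuals against fitted values, because OLS residuals are uncorrelated with the fitted values by construction but are generally correlated with the observed response. The following sketch refits the backward model on MASS::UScrime, which I assume matches crime.csv; the names on the left of each = are the renamed columns used in this report.

```r
# Assumption: MASS::UScrime is the same data as crime.csv
data("UScrime", package = "MASS")

crime <- with(UScrime, data.frame(
  percent_m      = M,
  mean_education = Ed,
  gdp            = GDP,
  inequality     = Ineq,
  prob_prison    = Prob,
  police_exp     = Po1 + Po2,  # 1960 + 1959 expenditure
  unemploy24_39  = U1 + U2,    # both unemployment groups
  crime_rate     = y
))

fit <- lm(crime_rate ~ ., data = crime)

# Residuals vs fitted values; a patternless cloud around 0 is the goal
plot(fitted(fit), residuals(fit),
     xlab = "Fitted crime rate", ylab = "Residual",
     main = "Residuals vs fitted", cex = 0.5)
abline(h = 0, col = "darksalmon", lwd = 2)

# Numerical sanity checks that hold for any OLS fit with an intercept
print(mean(residuals(fit)))
print(cor(fitted(fit), residuals(fit)))
```

Both printed values should be zero up to floating-point error, so the visual check is about structure such as funnels or curves rather than an overall trend.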