Objective of the Study

Correlation between Total Crime and Population size

Population size and crime rate have historically been seen as closely related. We seek to answer whether a large population translates into a high crime rate. A common argument is that as the population grows, so does the crime rate in most countries, because governments are unable to create an environment that can sustain the usually rapid population growth.

The main objective of this project is to study the relationship and correlation between population size and crime rate in the United States from 1994 to 2018.

Data source and nature

Data Collection

The data is collected from the FBI website, which provides free historical data on population size and crime: https://ucr.fbi.gov/crime-in-the-u.s

Type of study

This is an observational study.

Data Source

The dataset for this project was downloaded from the FBI website on crime rates in the US for the period from 1994 to 2018:

https://ucr.fbi.gov/crime-in-the-u.s/2013/crime-in-the-u.s.-2013/tables/1tabledatadecoverviewpdf/table_1_crime_in_the_united_states_by_volume_and_rate_per_100000_inhabitants_1994-2013.xls

Response

Crime rate is the response variable. It is a continuous numerical variable.

Explanatory

The explanatory variable is the population size, which is also numerical.

Cases

Each case represents the population size and crime rate in a given year within our period of study. The full dataset covers 25 years (1994-2018) with one case per year, for 25 cases in total.


Data Load

The crime data is loaded directly from a CSV file uploaded to my GitHub repository.

## # A tibble: 6 x 5
##   Year  Population Violent_crime Violent_crime_ra~ Murder_and_nonnegligent~
##   <chr>      <dbl>         <dbl>             <dbl>                    <dbl>
## 1 1994   260327021       1857670               714                    23326
## 2 1995   262803276       1798792               685                    21606
## 3 1996   265228572       1688540               637                    19645
## 4 1997   267783607       1636096               611                    18208
## 5 1998   270248003       1533887               568                    16974
## 6 1999   272690813       1426044               523                    15522
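
As a rough illustration of the loading step, the sketch below rebuilds the six rows printed above with base R. The report does not give the CSV location, so the rows are embedded as text; `csv_text` and `crime_head` are hypothetical names, and the values are copied as printed (the tibble display may have rounded them). The original output suggests a tibble was used; base R is used here only to avoid extra dependencies.

```r
# First six rows as printed in the report; the full file has more years and columns.
csv_text <- "Year,Population,Violent_crime,Violent_crime_rate,Murder_and_nonnegligent_manslaughter
1994,260327021,1857670,714,23326
1995,262803276,1798792,685,21606
1996,265228572,1688540,637,19645
1997,267783607,1636096,611,18208
1998,270248003,1533887,568,16974
1999,272690813,1426044,523,15522"

# Read Year as character to match the <chr> column type shown in the output.
crime_head <- read.csv(text = csv_text, colClasses = c(Year = "character"))

str(crime_head)
```

To read a real file instead, `text = csv_text` would be replaced by the path or URL of the CSV.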

Data Transformation, Cleanup and Preparation:

Tidy the data and add the Total_crime and Crime_rate columns.

##  [1] "Year"                                     
##  [2] "Population"                               
##  [3] "Violent_crime"                            
##  [4] "Violent_crime_rate"                       
##  [5] "Murder_and_nonnegligent_manslaughter"     
##  [6] "Murder_and_nonnegligent_manslaughter_rate"
##  [7] "Rape"                                     
##  [8] "Rape_rate"                                
##  [9] "Robbery"                                  
## [10] "Robbery_rate"                             
## [11] "Aggravated_assault"                       
## [12] "Aggravated_assault_rate"                  
## [13] "Property_crime"                           
## [14] "Property_crime_rate"                      
## [15] "Burglary"                                 
## [16] "Burglary_rate"                            
## [17] "Larceny_theft"                            
## [18] "Larceny_theft_rate"                       
## [19] "Motor_vehicle_theft"                      
## [20] "Motor_vehicle_theft_rate"                 
## [21] "Total_crime"                              
## [22] "Crime_rate"

Select only columns that are relevant for our purpose.

## # A tibble: 6 x 4
##   Year  Population Total_crime Crime_rate
##   <chr>      <dbl>       <dbl>      <dbl>
## 1 1994   260327021    13989543       5.37
## 2 1995   262803276    13862727       5.27
## 3 1996   265228572    13493863       5.09
## 4 1997   267783607    13194571       4.93
## 5 1998   270248003    12485714       4.62
## 6 1999   272690813    11634378       4.27
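
The report does not show how Crime_rate was derived. The printed values are consistent with total crimes per 100 inhabitants, so the sketch below assumes `Crime_rate = 100 * Total_crime / Population` and checks that assumption against the six rows shown above. The name `crime_small` is hypothetical.

```r
# Six rows copied from the table above.
crime_small <- data.frame(
  Year = as.character(1994:1999),
  Population = c(260327021, 262803276, 265228572,
                 267783607, 270248003, 272690813),
  Total_crime = c(13989543, 13862727, 13493863,
                  13194571, 12485714, 11634378)
)

# Assumed definition: crimes per 100 inhabitants, rounded to two decimals.
crime_small$Crime_rate <- round(
  100 * crime_small$Total_crime / crime_small$Population, 2
)

print(crime_small)
```

Under this assumption the computed column reproduces the Crime_rate values printed in the report for these years.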

Data Visualization:

Population Data plot


Total crime Data plot


Crime Rate Data plot
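
The plotting code is not included in the report. A minimal base R sketch of the three time-series plots, using only the six rows printed earlier (1994-1999), might look like the following; `crime_small` is a hypothetical name and the original plots may have been drawn with a different package.

```r
# Six rows copied from the prepared table (1994-1999 only).
crime_small <- data.frame(
  Year = 1994:1999,
  Population = c(260327021, 262803276, 265228572,
                 267783607, 270248003, 272690813),
  Total_crime = c(13989543, 13862727, 13493863,
                  13194571, 12485714, 11634378),
  Crime_rate = c(5.37, 5.27, 5.09, 4.93, 4.62, 4.27)
)

# Draw the three series side by side, then restore the previous layout.
op <- par(mfrow = c(1, 3))
plot(crime_small$Year, crime_small$Population, type = "b",
     xlab = "Year", ylab = "Population", main = "Population")
plot(crime_small$Year, crime_small$Total_crime, type = "b",
     xlab = "Year", ylab = "Total crime", main = "Total crime")
plot(crime_small$Year, crime_small$Crime_rate, type = "b",
     xlab = "Year", ylab = "Crime rate", main = "Crime rate")
par(op)
```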


Statistical Analysis:

TEST: Population vs Crime_rate

In this section we will create a linear regression model to see if there exists a strong relationship between population and Crime_rate.

We create a function that fits the linear model.

## 
## Call:
## lm(formula = Crime_rate ~ Population, data = crime_data1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.44329 -0.04349  0.02860  0.07695  0.20442 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.485e+01  4.096e-01   36.25   <2e-16 ***
## Population  -3.718e-08  1.378e-09  -26.98   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1434 on 23 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.9694, Adjusted R-squared:  0.968 
## F-statistic: 727.8 on 1 and 23 DF,  p-value: < 2.2e-16
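
The function that produced this summary is not shown in the report. One possible shape for it is sketched below; `fit_linear_model` and `crime_small` are hypothetical names, and the fit uses only the six rows printed earlier, so its coefficients will differ from the full 25-year model above.

```r
# Six rows copied from the prepared table (1994-1999 only).
crime_small <- data.frame(
  Year = as.character(1994:1999),
  Population = c(260327021, 262803276, 265228572,
                 267783607, 270248003, 272690813),
  Crime_rate = c(5.37, 5.27, 5.09, 4.93, 4.62, 4.27)
)

# Hypothetical helper: fit response ~ predictor by ordinary least squares.
fit_linear_model <- function(data, response, predictor) {
  lm(reformulate(predictor, response = response), data = data)
}

fit_small <- fit_linear_model(crime_small, "Crime_rate", "Population")

# Intercept and slope for the six-row subset.
coef(fit_small)
```

Calling `summary(fit_small)` prints the same kind of table as the output above, including residuals, standard errors, and R-squared.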

\[ Crimerate = 14.85 + (-3.718e-8)(Population) \]


Display the Linear Model

There is a negative relationship between population increase and the rate of crime.

Regression Statistics

Linear Regression Equation:

\[ Crimerate = 14.85 + (-3.718e-8)(Population) \]

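Note: The intercept corresponds to a population of zero, which is far outside the range of the data, so it should not be interpreted on its own. Within the observed range the model fits the data closely: the residual standard error is 0.1434 and the largest absolute residual is 0.443.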
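
To make the equation concrete, the sketch below plugs two observed population values into it, using the rounded coefficients from the model summary. `predict_crime_rate` is a hypothetical helper; because the coefficients are rounded, the results are approximate.

```r
# Rounded coefficients taken from the model summary.
intercept <- 14.85
slope <- -3.718e-08

# Hypothetical helper applying the fitted equation.
predict_crime_rate <- function(population) {
  intercept + slope * population
}

# Observed values for 1994 and 1999 from the prepared table.
population <- c("1994" = 260327021, "1999" = 272690813)
observed <- c("1994" = 5.37, "1999" = 4.27)

predicted <- predict_crime_rate(population)
residual <- observed - predicted

round(cbind(observed, predicted, residual), 3)
```

The predictions come out at roughly 5.17 for 1994 and 4.71 for 1999. At a population of zero the function returns the intercept, 14.85, which is an extrapolation well beyond the data.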

Multiple R-squared: 0.9694

Adjusted R-squared: 0.968

Description: The model fits the data well with a strong negative correlation.

Hypothesis Testing

H_0 (null hypothesis): There is no linear relationship between Crime_rate and Population.

H_A (alternative hypothesis): There is a linear relationship between Crime_rate and Population.

Here the multiple R-squared is 0.9694 (adjusted R-squared 0.968), which means the model explains about 97% of the variation in Crime_rate over the period. The slope for Population is significantly different from zero (t = -26.98 on 23 degrees of freedom, p < 2e-16). Therefore, we reject the null hypothesis H_0 in favor of the alternative hypothesis H_A.
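
The test statistic behind this decision can be recomputed from the numbers in the model summary. The sketch below uses only the reported slope estimate, its standard error, the residual degrees of freedom, and the multiple R-squared; the variable names are hypothetical.

```r
# Values copied from the model summary.
estimate <- -3.718e-08   # slope for Population
std_error <- 1.378e-09   # standard error of the slope
df_resid <- 23           # residual degrees of freedom
r_squared <- 0.9694      # multiple R-squared

# t statistic for H_0: slope = 0, and its two-sided p-value.
t_stat <- estimate / std_error
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

# In simple linear regression the correlation is the signed square root of
# R-squared, and the F statistic equals the squared t statistic.
r <- sign(estimate) * sqrt(r_squared)
f_stat <- t_stat^2

c(t = t_stat, p = p_value, r = r, F = f_stat)
```

The recomputed t statistic agrees with the reported -26.98, and the correlation coefficient is about -0.985, which is the quantity usually called "multiple R".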


Conclusion:

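We notice that the two variables, Crime_rate and Population, move in opposite directions: as the population increased over the period, the crime rate decreased. Therefore, there is a negative correlation between the two variables. Because this is an observational study, the correlation by itself does not show that a change in one variable causes the change in the other.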

Also, from the linear regression model, we reject the null hypothesis in favor of the alternative hypothesis. We conclude that there is a strong negative linear relationship between Crime_rate and Population for the 25-year period of study (1994-2018).