DATA 607 Final Project

Samuel I Kigamba

December 10, 2019

SECTION ONE - INTRODUCTION

Objective of this analysis.

Goal:

My objective in this project is seeking to understand the relationship between population growth and the crime rates in the United States between the year 1994 to 2018 and to comment and conclude on the overal pattern.

Data sources.

To analyze the crime rates in the US for the period from 1994 to 2018, we will use the data set provided by the FBI. See link: https://ucr.fbi.gov/crime-in-the-u.s/2018/crime-in-the-u.s.-2018/tables/table-1

https://raw.githubusercontent.com/igukusamuel/DATA-607-Final-Project/master/CrimeData.csv

SECTION TWO - Data Acquisition

I downloaded data for the period 1994 - 2018 and saved it as a .csv file in my local drive and later uploaded it into my github.

DATA: https://raw.githubusercontent.com/igukusamuel/DATA-607-Final-Project/master/CrimeData.cs

load libraries

library(ggplot2)
library(XML)
library(RCurl)
library(knitr)
library(dplyr)
library(plyr)
library(tidyr)
library(plotly)
library(tidyverse)
library(tidyselect)
library(data.table)
library(readxl)
library(fBasics)

load data

#Reading the dataset from my Github repository
crime_data <- read_csv("https://raw.githubusercontent.com/igukusamuel/DATA-607-Final-Project/master/CrimeData.csv")

#To analyze the data frame in detail uncomment the following lines of code.
#glimpse(crime_data)
#str(crime_data)
#summary(crime_data)

head(crime_data, 2) #Confirm data was loaded correctly
## # A tibble: 2 x 24
##    Year Population Violent_crime Violent_crime_r~ Murder_and_nonn~
##   <dbl>      <dbl>         <dbl>            <dbl>            <dbl>
## 1  1994  260327021       1857670             714.            23326
## 2  1995  262803276       1798792             684.            21606
## # ... with 19 more variables:
## #   Murder_and_nonnegligent_manslaughter_rate <dbl>, Rape <dbl>,
## #   Rape_rate <dbl>, Robbery <dbl>, Robbery_rate <dbl>,
## #   Aggravated_assault <dbl>, Aggravated_assault_rate <dbl>,
## #   Property_crime <dbl>, Property_crime_rate <dbl>, Burglary <dbl>,
## #   Burglary_rate <dbl>, Larceny_theft <dbl>, Larceny_theft_rate <dbl>,
## #   Motor_vehicle_theft <dbl>, Motor_vehicle_theft_rate <dbl>, X21 <lgl>,
## #   `_` <lgl>, X23 <lgl>, X24 <lgl>
dim(crime_data)
## [1] 27 24

Data transformation and clean-up

#Select columns relevant to our analysis [1:20 only]. Note Column 21:24 are empty
crime_data <- crime_data[, 1:20]
# Add a column on % Crime Rate (Total crimes / Total Population)
crime_data <- crime_data %>% mutate(PercentageCrimeRate = (Violent_crime + Property_crime) / Population * 100)
# Add a column on Average Crime Rate (Total crimes / N (no of crime classifications))
crime_data %>% mutate(AverageCrimeRate = (Violent_crime_rate + Property_crime_rate) / 2) -> crime_data
names(crime_data)# Print out all column names to confirm the two just added
##  [1] "Year"                                     
##  [2] "Population"                               
##  [3] "Violent_crime"                            
##  [4] "Violent_crime_rate"                       
##  [5] "Murder_and_nonnegligent_manslaughter"     
##  [6] "Murder_and_nonnegligent_manslaughter_rate"
##  [7] "Rape"                                     
##  [8] "Rape_rate"                                
##  [9] "Robbery"                                  
## [10] "Robbery_rate"                             
## [11] "Aggravated_assault"                       
## [12] "Aggravated_assault_rate"                  
## [13] "Property_crime"                           
## [14] "Property_crime_rate"                      
## [15] "Burglary"                                 
## [16] "Burglary_rate"                            
## [17] "Larceny_theft"                            
## [18] "Larceny_theft_rate"                       
## [19] "Motor_vehicle_theft"                      
## [20] "Motor_vehicle_theft_rate"                 
## [21] "PercentageCrimeRate"                      
## [22] "AverageCrimeRate"

SECTION THREE - Data analysis

Population size between 1994 to 2018

populationGrowth <-
        ggplot(crime_data, aes(Year, Population, group = 1)) + 
        geom_line(linetype = "dashed", color = "red") + geom_point()+
        ggtitle("Population Size Between 1994 - 2018") +
        xlab("Years") + ylab("Population") +
        theme(
                plot.title = element_text(color="blue", size=15, face="bold.italic"),
                axis.text.x = element_text(angle=60, hjust=1),
                axis.title.x = element_text(color="blue", size=15, face="bold"),
                axis.title.y = element_text(color="blue", size=15, face="bold"))
ggplotly(populationGrowth)

The total population has been growing over the years from 1994 to 2018. We note a sharp increase between the year 1999 to 2001.

Crime Types

1. Violent Crimes

ViolentCrimeRate <-
   ggplot(crime_data, aes(Year, Violent_crime_rate, group = 1)) + 
   geom_line(linetype = "dashed", color = "red") + geom_point()+
   ggtitle("Violent crime rate per 100,000 between 1994 - 2018") + xlab("Years") + ylab("Violent_crime") +
   theme(
     plot.title = element_text(color="blue", size=15, face="bold.italic"),
     axis.text.x = element_text(angle=60, hjust=1),
     axis.title.x = element_text(color="blue", size=15, face="bold"),
     axis.title.y = element_text(color="blue", size=15, face="bold"))
ggplotly(ViolentCrimeRate)

The Violent crime rate per 100,000 between 1994 - 2018 has been decreasing over the years.

2. Murder and manslaughter

Murder_and_nonnegligent_manslaughter_rate <-
   ggplot(crime_data, aes(Year, Murder_and_nonnegligent_manslaughter_rate, group = 1)) + 
   geom_line(linetype = "dashed", color = "red") + geom_point()+
   ggtitle("Rape Rate per 100,000 between 1994 - 2018") + xlab("Years") + ylab("Rate_rate") +
   theme(
     plot.title = element_text(color="blue", size=15, face="bold.italic"),
     axis.text.x = element_text(angle=60, hjust=1),
     axis.title.x = element_text(color="blue", size=15, face="bold"),
     axis.title.y = element_text(color="blue", size=15, face="bold"))
ggplotly(Murder_and_nonnegligent_manslaughter_rate)

The Murder_and_nonnegligent_manslaughter_rate per 100,000 has been decreasing over the years between 1994 - 2018 with a small increment between 2014 to 2016. There after the rate decrease.

3. Rape

Rape_rate <-
   ggplot(crime_data, aes(Year, Rape_rate, group = 1)) + 
   geom_line(linetype = "dashed", color = "red") + geom_point()+
   ggtitle("Rape Rate per 100,000 between 1994 - 2018") + xlab("Years") + ylab("Rate_rate") +
   theme(
     plot.title = element_text(color="blue", size=15, face="bold.italic"),
     axis.text.x = element_text(angle=60, hjust=1),
     axis.title.x = element_text(color="blue", size=15, face="bold"),
     axis.title.y = element_text(color="blue", size=15, face="bold"))
ggplotly(Rape_rate)

The Rape Rate per 100,000 had been decreasing over the years from 1994 - 2013 before steadily increasing to the year 2018.

4. Robberly

Robbery_rate <-
   ggplot(crime_data, aes(Year, Robbery_rate, group = 1)) + 
   geom_line(linetype = "dashed", color = "red") + geom_point()+
   ggtitle("Robberly Rate per 100,000 between 1994 - 2018") + xlab("Years") + ylab("Robberly_Rate") +
   theme(
     plot.title = element_text(color="blue", size=15, face="bold.italic"),
     axis.text.x = element_text(angle=60, hjust=1),
     axis.title.x = element_text(color="blue", size=15, face="bold"),
     axis.title.y = element_text(color="blue", size=15, face="bold"))
ggplotly(Robbery_rate)

The Robberly Rate per 100,000 has been decreasing over the years between 1994 - 2018.

5. Aggravated Assault

Aggravated_assault_rate <-
   ggplot(crime_data, aes(Year, Aggravated_assault_rate, group = 1)) + 
   geom_line(linetype = "dashed", color = "red") + geom_point()+
   ggtitle("Aggrevated_Assualt per 100,000 growth between 1994 - 2018") + xlab("Years") + ylab("Aggrevated_Assualt") +
   theme(
     plot.title = element_text(color="blue", size=15, face="bold.italic"),
     axis.text.x = element_text(angle=60, hjust=1),
     axis.title.x = element_text(color="blue", size=15, face="bold"),
     axis.title.y = element_text(color="blue", size=15, face="bold"))
ggplotly(Aggravated_assault_rate)

The Aggrevated_Assualt per 100,000 has been decreasing over the years between 1994 - 2014 with a small increment between 2014 to 2016. There after the rate decrease between 2016-2018.

6. Property Crime

Property_crime_rate <-
   ggplot(crime_data, aes(Year, Property_crime_rate, group = 1)) + 
   geom_line(linetype = "dashed", color = "red") +  geom_point()+
   ggtitle("Property Crime per 100,000 between 1994 - 2018") + xlab("Years") + ylab("Property_crime") +
   theme(
     plot.title = element_text(color="blue", size=15, face="bold.italic"),
     axis.text.x = element_text(angle=60, hjust=1),
     axis.title.x = element_text(color="blue", size=15, face="bold"),
     axis.title.y = element_text(color="blue", size=15, face="bold"))
ggplotly(Property_crime_rate)

The Property Crime per 100,000 has been decreasing over the years between 1994 - 2018.

7. Burglary

Burglary_rate <-
   ggplot(crime_data, aes(Year, Burglary_rate, group = 1)) + 
   geom_line(linetype = "dashed", color = "red") +  geom_point()+
   ggtitle("Burglary Rate per 100,000 between 1994 - 2018") + xlab("Years") + ylab("Burglary_Rate") +
   theme(
     plot.title = element_text(color="blue", size=15, face="bold.italic"),
     axis.text.x = element_text(angle=60, hjust=1),
     axis.title.x = element_text(color="blue", size=15, face="bold"),
     axis.title.y = element_text(color="blue", size=15, face="bold"))
ggplotly(Burglary_rate)

The Burglary Rate per 100,000 has been decreasing over the years between 1994 - 2018.

8. Larcency theft

Larceny_theft_rate <-
   ggplot(crime_data, aes(Year, Larceny_theft_rate, group = 1)) + 
   geom_line(linetype = "dashed", color = "red") +  geom_point()+
   ggtitle("Larcency theft per 100,000 between 1994 - 2018") + xlab("Years") + ylab("Larcency_theft") +
   theme(
     plot.title = element_text(color="blue", size=15, face="bold.italic"),
     axis.text.x = element_text(angle=60, hjust=1),
     axis.title.x = element_text(color="blue", size=15, face="bold"),
     axis.title.y = element_text(color="blue", size=15, face="bold"))
ggplotly(Larceny_theft_rate)

The Larcency theft per 100,000 has been decreasing over the years between 1994 - 2018.

9. Motor_vehicle_theft

Motor_vehicle_theft_rate <-
   ggplot(crime_data, aes(Year, Motor_vehicle_theft_rate, group = 1)) + 
   geom_line(linetype = "dashed", color = "red") +  geom_point()+
   ggtitle("Motor_vehicle_theft per 100,000 between 1994 - 2018") + xlab("Years") + ylab("Violent_crime") +
   theme(
     plot.title = element_text(color="blue", size=15, face="bold.italic"),
     axis.text.x = element_text(angle=60, hjust=1),
     axis.title.x = element_text(color="blue", size=15, face="bold"),
     axis.title.y = element_text(color="blue", size=15, face="bold"))
ggplotly(Motor_vehicle_theft_rate)

The Motor_vehicle_theft per 100,000 has been decreasing over the years between 1994 - 2018.

Overall Crime Rate

1. Average Crime

AverageCrimeRate <-
   ggplot(crime_data, aes(Year, AverageCrimeRate, group = 1)) + 
   geom_line(linetype = "dashed", color = "red") +  geom_point()+
   ggtitle("Average Crime Rate per 100,000 between 1994 - 2018") + xlab("Years") + ylab("Violent_crime") +
   theme(
     plot.title = element_text(color="blue", size=15, face="bold.italic"),
     axis.text.x = element_text(angle=60, hjust=1),
     axis.title.x = element_text(color="blue", size=15, face="bold"),
     axis.title.y = element_text(color="blue", size=15, face="bold"))
ggplotly(AverageCrimeRate)

2. Percentage Crime Rate

PercentageCrimeRate <-
   ggplot(crime_data, aes(Year, PercentageCrimeRate, group = 1)) + 
   geom_line(linetype = "dashed", color = "red") +  geom_point()+
   ggtitle("% Crime Rate between 1994 - 2018") + xlab("Year") + ylab("Overall % Crime Rate") +
   theme(
     plot.title = element_text(color="red", size=15, face="bold.italic"),
     axis.text.x = element_text(angle=60, hjust=1),
     axis.title.x = element_text(color="blue", size=15, face="bold"),
     axis.title.y = element_text(color="orange", size=15, face="bold"))
ggplotly(PercentageCrimeRate)

SECTION FOUR - Statistical Analysis:

TEST: Population vs PercentageCrimeRate

In this section we will create a linear regression model to see if there exists a strong relationship between population and PercentageCrimeRate

We create a function to calculate the correlation between population and PercentageCrimeRate

findCorrelation <- function() {
  x = crime_data$PercentageCrimeRate
  y = crime_data$Population
  corr = round(cor(x, y),4)
  print (paste0("Correlation = ",corr))
  return (corr)}
c = findCorrelation()
## [1] "Correlation = NA"

We create a function for Linear Model

findStatsFunction <- function() {
  m = lm (PercentageCrimeRate ~ Population, data = crime_data)
  s = summary(m)
  print(s)
  
  slp = round(m$coefficients[2], 4)
  int = round(m$coefficients[1], 4)

  return (m)
}
m = findStatsFunction()
## 
## Call:
## lm(formula = PercentageCrimeRate ~ Population, data = crime_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.44329 -0.04349  0.02860  0.07695  0.20442 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.485e+01  4.096e-01   36.25   <2e-16 ***
## Population  -3.718e-08  1.378e-09  -26.98   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1434 on 23 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.9694, Adjusted R-squared:  0.968 
## F-statistic: 727.8 on 1 and 23 DF,  p-value: < 2.2e-16

\[ PercentageCrimeRate = 14.85 + (-3.718e-8)(Population) \]

Display the Linear Model

plot = ggplot(crime_data, aes(Population, PercentageCrimeRate, group = 1)) +
  geom_point(colour="blue") +
  xlab("Population Size") +
  ylab("PercentageCrimeRate") +
  labs(title = "Population Size vs PercentageCrimeRate") +
  geom_abline(aes(slope=round(m$coefficients[2], 4), intercept=round(m$coefficients[1], 4), color="red")
              )

ggplotly(plot)

There is a negative relationship between population increase and the rate of crime.

Regression Statistics

Linear Regression Equation:

\[ PercentageCrimeRate = 14.85 + (-3.718-8)(Population) \]

Note: The intercept is outside the data range, however it fits the data well within the residual standard error for all points within our dataset.

Multiple R-Square:0.9694

R-Square:0.968

Description: The model fits the data well with a strong negative correlation.

Hypothesis Testing

H_0 : Null Hypothesis There is no relationship between PercentageCrimeRate and population.

H_A : Alternative Hypothesis There is a relationship between PercentageCrimeRate and population.

Here the multiple R value is 0.9694 which shows that there is significant correlation between PercentageCrimeRate and Population. Also the value of R square is 0.968 which shows the extent to which the PercentageCrimeRate affects the Population. Therefore, we reject the null hypothesis H_0 and accept the Alternative hypothesis H_A.

SECTION FIVE - Conclusion:

Conclusion

The analysis above leads us to the following conclusions that the average rate of crime has decreased steadily over the period between 1994 and 2018.

References

https://ucr.fbi.gov/crime-in-the-u.s/2018/crime-in-the-u.s.-2018/tables/table-1

https://raw.githubusercontent.com/igukusamuel/DATA-607-Final-Project/master/CrimeData.cs

http://rpubs.com/igukusamuel/557830

https://bookdown.org/yihui/rmarkdown/slidy-presentation.html

https://github.com/igukusamuel/DATA-607-Final-Project/blob/master/607-Final%20Project.pdf

https://github.com/igukusamuel/DATA-607-Final-Project/blob/master/DATA%20607%20Final%20Project.Rmd

https://github.com/igukusamuel/DATA-607-Final-Project/blob/master/DATA-607-Final-Project.html

https://ucr.fbi.gov/crime-in-the-u.s

https://bookdown.org/yihui/rmarkdown/slidy-presentation.html