# Load plyr before dplyr so that dplyr's verbs are not masked by plyr
library(plyr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(plotly)
library(tidyverse)
library(tidyselect)
library(data.table)
library(readxl)
library(XML)
library(RCurl)
library(knitr)
library(sqldf)
library(fBasics)
library(lattice)
# MASS masks dplyr::select, so select() is called as dplyr::select() below
library(MASS)
Population size and crime rate have historically been seen as closely related. We ask whether a larger population translates into a higher crime rate. A common argument is that as a population grows, crime rates rise because governments are unable to sustain an environment that supports rapid population growth.
The main objective of this project is to study the relationship and correlation between population size and crime rate in the United States from 1994 to 2018.
The data is collected from the FBI website, which provides free historical data on population size and crime: https://ucr.fbi.gov/crime-in-the-u.s
This is an observational study.
The dataset for this project was downloaded from the FBI website and covers crime in the US for the period from 1994 to 2018.
Crime rate is the response variable. It is a continuous numerical variable.
The explanatory variable is population size, which is also numerical.
Each case represents the population size and crime rate in a given year within our period of study. The full dataset covers 25 years, giving 25 complete cases (the regression summary below reports 4 additional rows dropped for missing values).
The crime data is loaded directly from a CSV file uploaded to my GitHub repository.
crime_data = read_csv(file="https://raw.githubusercontent.com/igukusamuel/DATA-606-Final-Project/master/CrimeData.csv")
head(crime_data[ ,1:5])
## # A tibble: 6 x 5
## Year Population Violent_crime Violent_crime_ra~ Murder_and_nonnegligent~
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1994 260327021 1857670 714 23326
## 2 1995 262803276 1798792 685 21606
## 3 1996 265228572 1688540 637 19645
## 4 1997 267783607 1636096 611 18208
## 5 1998 270248003 1533887 568 16974
## 6 1999 272690813 1426044 523 15522
# Select the columns relevant to our analysis (1:20 only).
# Note: columns 21:24 are empty.
crime_data <- crime_data[, 1:20]
# Add a column on Total_crime (Violent_crime + Property_crime)
crime_data <- crime_data %>% mutate(Total_crime = (Violent_crime + Property_crime))
# Add a column Crime_rate (Total_crime / Population, expressed as a percentage)
crime_data <- crime_data %>% mutate(Crime_rate = (Total_crime / Population)*100)
# Print out all column names to confirm the two just added
names(crime_data)
## [1] "Year"
## [2] "Population"
## [3] "Violent_crime"
## [4] "Violent_crime_rate"
## [5] "Murder_and_nonnegligent_manslaughter"
## [6] "Murder_and_nonnegligent_manslaughter_rate"
## [7] "Rape"
## [8] "Rape_rate"
## [9] "Robbery"
## [10] "Robbery_rate"
## [11] "Aggravated_assault"
## [12] "Aggravated_assault_rate"
## [13] "Property_crime"
## [14] "Property_crime_rate"
## [15] "Burglary"
## [16] "Burglary_rate"
## [17] "Larceny_theft"
## [18] "Larceny_theft_rate"
## [19] "Motor_vehicle_theft"
## [20] "Motor_vehicle_theft_rate"
## [21] "Total_crime"
## [22] "Crime_rate"
crime_data1 <- dplyr::select(crime_data, Year, Population, Total_crime, Crime_rate)
head(crime_data1)
## # A tibble: 6 x 4
## Year Population Total_crime Crime_rate
## <chr> <dbl> <dbl> <dbl>
## 1 1994 260327021 13989543 5.37
## 2 1995 262803276 13862727 5.27
## 3 1996 265228572 13493863 5.09
## 4 1997 267783607 13194571 4.93
## 5 1998 270248003 12485714 4.62
## 6 1999 272690813 11634378 4.27
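As a check on the derived column, the first row of the table can be reproduced by hand, using the 1994 values shown above:

\[ \text{Crime rate}_{1994} = \frac{13989543}{260327021} \times 100 \approx 5.37 \]

Note that this is a percentage, i.e. crimes per 100 residents, whereas the FBI rate columns appear to be expressed per 100,000 inhabitants. For example, 1857670 / 260327021 × 100000 ≈ 714, which matches Violent_crime_rate for 1994.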
Population <-
ggplot(crime_data1, aes(Year, Population, group = 1)) +
geom_line(linetype = "dashed", color = "red") +
geom_point()+
ggtitle("Population between 1994 - 2018") +
xlab("Years") + ylab("Population") +
theme(
plot.title = element_text(color="blue", size=15, face="bold.italic"),
axis.text.x = element_text(angle=60, hjust=1),
axis.title.x = element_text(color="blue", size=15, face="bold"),
axis.title.y = element_text(color="blue", size=15, face="bold")
)
ggplotly(Population)
Total_Crime <-
ggplot(crime_data1, aes(Year, Total_crime, group = 1)) +
geom_line(linetype = "dashed", color = "red") +
geom_point()+
ggtitle("Total_crime between 1994 - 2018") +
xlab("Years") + ylab("Total_crime") +
theme(
plot.title = element_text(color="blue", size=15, face="bold.italic"),
axis.text.x = element_text(angle=60, hjust=1),
axis.title.x = element_text(color="blue", size=15, face="bold"),
axis.title.y = element_text(color="blue", size=15, face="bold")
)
ggplotly(Total_Crime)
Crime_Rate <-
ggplot(crime_data1, aes(Year, Crime_rate, group = 1)) +
geom_line(linetype = "dashed", color = "red") +
geom_point()+
ggtitle("Crime_Rate between 1994 - 2018") +
xlab("Years") + ylab("Crime_Rate") +
theme(
plot.title = element_text(color="blue", size=15, face="bold.italic"),
axis.text.x = element_text(angle=60, hjust=1),
axis.title.x = element_text(color="blue", size=15, face="bold"),
axis.title.y = element_text(color="blue", size=15, face="bold")
)
ggplotly(Crime_Rate)
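findCorrelation <- function() {
# Pearson correlation between crime rate and population size
x = crime_data1$Crime_rate
y = crime_data1$Population
# Note: cor() returns NA if either vector contains missing values;
# passing use = "complete.obs" would drop the incomplete rows first
corr = round(cor(x, y),4)
print (paste0("Correlation = ",corr))
return (corr)
}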
c = findCorrelation()
## [1] "Correlation = NA"
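The NA above is what cor() returns by default when either input contains a missing value, and the regression summary further down reports 4 observations dropped for missingness. A minimal sketch of that behavior, using made-up toy vectors rather than the FBI data, and assuming that dropping incomplete pairs is acceptable for this analysis:

```r
# Toy vectors (hypothetical values, not the FBI data); each contains one NA
x <- c(1, 2, 3, NA, 5)
y <- c(10, 8, 6, 4, NA)

# Default behavior: any NA in the inputs propagates to the result
r_default <- cor(x, y)

# use = "complete.obs" keeps only the pairs where both values are present,
# here (1, 10), (2, 8) and (3, 6), which lie exactly on a falling line
r_complete <- cor(x, y, use = "complete.obs")

print(r_default)   # NA
print(r_complete)  # -1
```

Applied to this project, the same argument could be passed inside findCorrelation(), i.e. cor(x, y, use = "complete.obs").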
findStatsFunction <- function() {
m = lm (Crime_rate ~ Population, data = crime_data1)
s = summary(m)
print(s)
slp = round(m$coefficients[2], 4)
int = round(m$coefficients[1], 4)
return (m)
}
m = findStatsFunction()
##
## Call:
## lm(formula = Crime_rate ~ Population, data = crime_data1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.44329 -0.04349 0.02860 0.07695 0.20442
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.485e+01 4.096e-01 36.25 <2e-16 ***
## Population -3.718e-08 1.378e-09 -26.98 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1434 on 23 degrees of freedom
## (4 observations deleted due to missingness)
## Multiple R-squared: 0.9694, Adjusted R-squared: 0.968
## F-statistic: 727.8 on 1 and 23 DF, p-value: < 2.2e-16
\[ \text{Crimerate} = 14.85 + (-3.718 \times 10^{-8})(\text{Population}) \]
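As an illustration of how the fitted line is read, the sketch below plugs the 1994 population into the equation. The helper name predict_crime_rate is hypothetical, and the coefficients are the rounded values from the summary output, so the result is only approximate.

```r
# Hypothetical helper: evaluates the fitted line using the rounded
# coefficients from the summary output (intercept 14.85, slope -3.718e-08)
predict_crime_rate <- function(population) {
  14.85 + (-3.718e-08) * population
}

# 1994 population taken from the data preview
pred_1994 <- predict_crime_rate(260327021)
print(round(pred_1994, 2))  # about 5.17 (percent)
```

The observed 1994 value is 5.37, so the residual is roughly 0.20, in line with the largest residual reported in the summary (0.20442). With the fitted model object itself, predict(m, newdata = data.frame(Population = 260327021)) would give the same prediction without rounding the coefficients.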
lmodel = ggplot(crime_data1, aes(Population, Crime_rate)) +
geom_point(colour="blue") +
# x is Population and y is Crime_rate, matching aes() above
xlab("Population Size") +
ylab("Crime_rate") +
labs(title = "Population Size vs Crime_rate") +
# Use the unrounded coefficients: rounding the slope (-3.718e-08) to 4 decimals gives 0
geom_abline(slope = coef(m)[2], intercept = coef(m)[1], color = "red")
ggplotly(lmodel)
Linear Regression Equation:
\[ \text{Crimerate} = 14.85 + (-3.718 \times 10^{-8})(\text{Population}) \]
Note: The intercept (the predicted crime rate at a population of zero) lies far outside the range of the data and has no practical interpretation. Within the observed range the model fits the data well, with a residual standard error of 0.1434.
Multiple R-squared: 0.9694
Adjusted R-squared: 0.968
Description: The model fits the data well, with a strong negative correlation.
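Although the cor() call earlier returned NA, the correlation can be recovered from the regression output. In a simple linear regression with one predictor, the multiple R-squared equals the square of the Pearson correlation, and the correlation takes the sign of the slope, which is negative here:

\[ r = -\sqrt{R^2} = -\sqrt{0.9694} \approx -0.985 \]

This value is based on the 25 complete observations used by lm().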
H_0 (null hypothesis): There is no linear relationship between Crime_rate and Population (the slope is zero).
H_A (alternative hypothesis): There is a linear relationship between Crime_rate and Population (the slope is not zero).
The multiple R-squared is 0.9694, so Population explains about 97% of the variation in Crime_rate, and the adjusted R-squared is 0.968. The p-value for the slope is below 2e-16, so the relationship is statistically significant. Therefore, we reject the null hypothesis H_0 in favor of the alternative hypothesis H_A.
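The test behind this decision can be traced in the coefficient table. The t-statistic for the slope is the estimate divided by its standard error:

\[ t = \frac{-3.718 \times 10^{-8}}{1.378 \times 10^{-9}} \approx -26.98 \]

With a single predictor, the F-statistic is the square of this value, \( t^2 \approx 728 \), which agrees with the reported 727.8 up to rounding. Both are evaluated on \( n - 2 = 23 \) degrees of freedom, where \( n = 25 \) is the number of complete observations.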
We notice that the two variables, Crime_rate and Population, move in opposite directions: over the study period, population increased while the crime rate decreased. Therefore, there is a negative correlation between the two variables. Because this is an observational study, the correlation alone does not show that one causes the other.
Also, from the linear regression model, we reject the null hypothesis in favor of the alternative hypothesis. We conclude that there is a strong negative linear relationship between Crime_rate and Population for the 25-year period of study (1994-2018).
https://ucr.fbi.gov/crime-in-the-u.s
https://ucr.fbi.gov/crime-in-the-u.s/2013/crime-in-the-u.s.-2013/tables/1tabledatadecoverviewpdf/table_1_crime_in_the_united_states_by_volume_and_rate_per_100000_inhabitants_1994-2013.xls
http://rpubs.com/igukusamuel/558552
https://raw.githubusercontent.com/igukusamuel/DATA-606-Final-Project/master/CrimeData.csv
https://github.com/igukusamuel/DATA-606-Final-Project/blob/master/DATA%20606%20Final%20Project.Rmd
https://github.com/igukusamuel/DATA-606-Final-Project/blob/master/DATA-606-Final-Project.html
https://github.com/igukusamuel/DATA-606-Final-Project/blob/master/Project%20Proposal.Rmd
https://bookdown.org/yihui/rmarkdown/slidy-presentation.html
https://ucr.fbi.gov/crime-in-the-u.s/2018/crime-in-the-u.s.-2018/tables/table-1