Samuel I Kigamba
December 10, 2019
Goal:
My objective in this project is seeking to understand the relationship between population growth and the crime rates in the United States between the year 1994 to 2018 and to comment and conclude on the overal pattern.
To analyze the crime rates in the US for the period from 1994 to 2018, we will use the data set provided by the FBI. See link: https://ucr.fbi.gov/crime-in-the-u.s/2018/crime-in-the-u.s.-2018/tables/table-1
https://raw.githubusercontent.com/igukusamuel/DATA-607-Final-Project/master/CrimeData.csv
I downloaded data for the period 1994 - 2018 and saved it as a .csv file in my local drive and later uploaded it into my github.
DATA: https://raw.githubusercontent.com/igukusamuel/DATA-607-Final-Project/master/CrimeData.cs
#Reading the dataset from my Github repository
crime_data <- read_csv("https://raw.githubusercontent.com/igukusamuel/DATA-607-Final-Project/master/CrimeData.csv")
#To analyze the data frame in detail uncomment the following lines of code.
#glimpse(crime_data)
#str(crime_data)
#summary(crime_data)
head(crime_data, 2) #Confirm data was loaded correctly
## # A tibble: 2 x 24
## Year Population Violent_crime Violent_crime_r~ Murder_and_nonn~
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1994 260327021 1857670 714. 23326
## 2 1995 262803276 1798792 684. 21606
## # ... with 19 more variables:
## # Murder_and_nonnegligent_manslaughter_rate <dbl>, Rape <dbl>,
## # Rape_rate <dbl>, Robbery <dbl>, Robbery_rate <dbl>,
## # Aggravated_assault <dbl>, Aggravated_assault_rate <dbl>,
## # Property_crime <dbl>, Property_crime_rate <dbl>, Burglary <dbl>,
## # Burglary_rate <dbl>, Larceny_theft <dbl>, Larceny_theft_rate <dbl>,
## # Motor_vehicle_theft <dbl>, Motor_vehicle_theft_rate <dbl>, X21 <lgl>,
## # `_` <lgl>, X23 <lgl>, X24 <lgl>
## [1] 27 24
#Select columns relevant to our analysis [1:20 only]. Note Column 21:24 are empty
crime_data <- crime_data[, 1:20]
# Add a column on % Crime Rate (Total crimes / Total Population)
crime_data <- crime_data %>% mutate(PercentageCrimeRate = (Violent_crime + Property_crime) / Population * 100)
# Add a column on Average Crime Rate (Total crimes / N (no of crime classifications))
crime_data %>% mutate(AverageCrimeRate = (Violent_crime_rate + Property_crime_rate) / 2) -> crime_data
names(crime_data)# Print out all column names to confirm the two just added
## [1] "Year"
## [2] "Population"
## [3] "Violent_crime"
## [4] "Violent_crime_rate"
## [5] "Murder_and_nonnegligent_manslaughter"
## [6] "Murder_and_nonnegligent_manslaughter_rate"
## [7] "Rape"
## [8] "Rape_rate"
## [9] "Robbery"
## [10] "Robbery_rate"
## [11] "Aggravated_assault"
## [12] "Aggravated_assault_rate"
## [13] "Property_crime"
## [14] "Property_crime_rate"
## [15] "Burglary"
## [16] "Burglary_rate"
## [17] "Larceny_theft"
## [18] "Larceny_theft_rate"
## [19] "Motor_vehicle_theft"
## [20] "Motor_vehicle_theft_rate"
## [21] "PercentageCrimeRate"
## [22] "AverageCrimeRate"
populationGrowth <-
ggplot(crime_data, aes(Year, Population, group = 1)) +
geom_line(linetype = "dashed", color = "red") + geom_point()+
ggtitle("Population Size Between 1994 - 2018") +
xlab("Years") + ylab("Population") +
theme(
plot.title = element_text(color="blue", size=15, face="bold.italic"),
axis.text.x = element_text(angle=60, hjust=1),
axis.title.x = element_text(color="blue", size=15, face="bold"),
axis.title.y = element_text(color="blue", size=15, face="bold"))
ggplotly(populationGrowth)
ViolentCrimeRate <-
ggplot(crime_data, aes(Year, Violent_crime_rate, group = 1)) +
geom_line(linetype = "dashed", color = "red") + geom_point()+
ggtitle("Violent crime rate per 100,000 between 1994 - 2018") + xlab("Years") + ylab("Violent_crime") +
theme(
plot.title = element_text(color="blue", size=15, face="bold.italic"),
axis.text.x = element_text(angle=60, hjust=1),
axis.title.x = element_text(color="blue", size=15, face="bold"),
axis.title.y = element_text(color="blue", size=15, face="bold"))
ggplotly(ViolentCrimeRate)
Murder_and_nonnegligent_manslaughter_rate <-
ggplot(crime_data, aes(Year, Murder_and_nonnegligent_manslaughter_rate, group = 1)) +
geom_line(linetype = "dashed", color = "red") + geom_point()+
ggtitle("Rape Rate per 100,000 between 1994 - 2018") + xlab("Years") + ylab("Rate_rate") +
theme(
plot.title = element_text(color="blue", size=15, face="bold.italic"),
axis.text.x = element_text(angle=60, hjust=1),
axis.title.x = element_text(color="blue", size=15, face="bold"),
axis.title.y = element_text(color="blue", size=15, face="bold"))
ggplotly(Murder_and_nonnegligent_manslaughter_rate)
Rape_rate <-
ggplot(crime_data, aes(Year, Rape_rate, group = 1)) +
geom_line(linetype = "dashed", color = "red") + geom_point()+
ggtitle("Rape Rate per 100,000 between 1994 - 2018") + xlab("Years") + ylab("Rate_rate") +
theme(
plot.title = element_text(color="blue", size=15, face="bold.italic"),
axis.text.x = element_text(angle=60, hjust=1),
axis.title.x = element_text(color="blue", size=15, face="bold"),
axis.title.y = element_text(color="blue", size=15, face="bold"))
ggplotly(Rape_rate)
Robbery_rate <-
ggplot(crime_data, aes(Year, Robbery_rate, group = 1)) +
geom_line(linetype = "dashed", color = "red") + geom_point()+
ggtitle("Robberly Rate per 100,000 between 1994 - 2018") + xlab("Years") + ylab("Robberly_Rate") +
theme(
plot.title = element_text(color="blue", size=15, face="bold.italic"),
axis.text.x = element_text(angle=60, hjust=1),
axis.title.x = element_text(color="blue", size=15, face="bold"),
axis.title.y = element_text(color="blue", size=15, face="bold"))
ggplotly(Robbery_rate)
Aggravated_assault_rate <-
ggplot(crime_data, aes(Year, Aggravated_assault_rate, group = 1)) +
geom_line(linetype = "dashed", color = "red") + geom_point()+
ggtitle("Aggrevated_Assualt per 100,000 growth between 1994 - 2018") + xlab("Years") + ylab("Aggrevated_Assualt") +
theme(
plot.title = element_text(color="blue", size=15, face="bold.italic"),
axis.text.x = element_text(angle=60, hjust=1),
axis.title.x = element_text(color="blue", size=15, face="bold"),
axis.title.y = element_text(color="blue", size=15, face="bold"))
ggplotly(Aggravated_assault_rate)
Property_crime_rate <-
ggplot(crime_data, aes(Year, Property_crime_rate, group = 1)) +
geom_line(linetype = "dashed", color = "red") + geom_point()+
ggtitle("Property Crime per 100,000 between 1994 - 2018") + xlab("Years") + ylab("Property_crime") +
theme(
plot.title = element_text(color="blue", size=15, face="bold.italic"),
axis.text.x = element_text(angle=60, hjust=1),
axis.title.x = element_text(color="blue", size=15, face="bold"),
axis.title.y = element_text(color="blue", size=15, face="bold"))
ggplotly(Property_crime_rate)
Burglary_rate <-
ggplot(crime_data, aes(Year, Burglary_rate, group = 1)) +
geom_line(linetype = "dashed", color = "red") + geom_point()+
ggtitle("Burglary Rate per 100,000 between 1994 - 2018") + xlab("Years") + ylab("Burglary_Rate") +
theme(
plot.title = element_text(color="blue", size=15, face="bold.italic"),
axis.text.x = element_text(angle=60, hjust=1),
axis.title.x = element_text(color="blue", size=15, face="bold"),
axis.title.y = element_text(color="blue", size=15, face="bold"))
ggplotly(Burglary_rate)
Larceny_theft_rate <-
ggplot(crime_data, aes(Year, Larceny_theft_rate, group = 1)) +
geom_line(linetype = "dashed", color = "red") + geom_point()+
ggtitle("Larcency theft per 100,000 between 1994 - 2018") + xlab("Years") + ylab("Larcency_theft") +
theme(
plot.title = element_text(color="blue", size=15, face="bold.italic"),
axis.text.x = element_text(angle=60, hjust=1),
axis.title.x = element_text(color="blue", size=15, face="bold"),
axis.title.y = element_text(color="blue", size=15, face="bold"))
ggplotly(Larceny_theft_rate)
Motor_vehicle_theft_rate <-
ggplot(crime_data, aes(Year, Motor_vehicle_theft_rate, group = 1)) +
geom_line(linetype = "dashed", color = "red") + geom_point()+
ggtitle("Motor_vehicle_theft per 100,000 between 1994 - 2018") + xlab("Years") + ylab("Violent_crime") +
theme(
plot.title = element_text(color="blue", size=15, face="bold.italic"),
axis.text.x = element_text(angle=60, hjust=1),
axis.title.x = element_text(color="blue", size=15, face="bold"),
axis.title.y = element_text(color="blue", size=15, face="bold"))
ggplotly(Motor_vehicle_theft_rate)
AverageCrimeRate <-
ggplot(crime_data, aes(Year, AverageCrimeRate, group = 1)) +
geom_line(linetype = "dashed", color = "red") + geom_point()+
ggtitle("Average Crime Rate per 100,000 between 1994 - 2018") + xlab("Years") + ylab("Violent_crime") +
theme(
plot.title = element_text(color="blue", size=15, face="bold.italic"),
axis.text.x = element_text(angle=60, hjust=1),
axis.title.x = element_text(color="blue", size=15, face="bold"),
axis.title.y = element_text(color="blue", size=15, face="bold"))
ggplotly(AverageCrimeRate)
PercentageCrimeRate <-
ggplot(crime_data, aes(Year, PercentageCrimeRate, group = 1)) +
geom_line(linetype = "dashed", color = "red") + geom_point()+
ggtitle("% Crime Rate between 1994 - 2018") + xlab("Year") + ylab("Overall % Crime Rate") +
theme(
plot.title = element_text(color="red", size=15, face="bold.italic"),
axis.text.x = element_text(angle=60, hjust=1),
axis.title.x = element_text(color="blue", size=15, face="bold"),
axis.title.y = element_text(color="orange", size=15, face="bold"))
ggplotly(PercentageCrimeRate)
In this section we will create a linear regression model to see if there exists a strong relationship between population and PercentageCrimeRate
We create a function to calculate the correlation between population and PercentageCrimeRate
findCorrelation <- function() {
x = crime_data$PercentageCrimeRate
y = crime_data$Population
corr = round(cor(x, y),4)
print (paste0("Correlation = ",corr))
return (corr)}
c = findCorrelation()
## [1] "Correlation = NA"
findStatsFunction <- function() {
m = lm (PercentageCrimeRate ~ Population, data = crime_data)
s = summary(m)
print(s)
slp = round(m$coefficients[2], 4)
int = round(m$coefficients[1], 4)
return (m)
}
m = findStatsFunction()
##
## Call:
## lm(formula = PercentageCrimeRate ~ Population, data = crime_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.44329 -0.04349 0.02860 0.07695 0.20442
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.485e+01 4.096e-01 36.25 <2e-16 ***
## Population -3.718e-08 1.378e-09 -26.98 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1434 on 23 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.9694, Adjusted R-squared: 0.968
## F-statistic: 727.8 on 1 and 23 DF, p-value: < 2.2e-16
\[ PercentageCrimeRate = 14.85 + (-3.718e-8)(Population) \]
plot = ggplot(crime_data, aes(Population, PercentageCrimeRate, group = 1)) +
geom_point(colour="blue") +
xlab("Population Size") +
ylab("PercentageCrimeRate") +
labs(title = "Population Size vs PercentageCrimeRate") +
geom_abline(aes(slope=round(m$coefficients[2], 4), intercept=round(m$coefficients[1], 4), color="red")
)
ggplotly(plot)
Linear Regression Equation:
\[ PercentageCrimeRate = 14.85 + (-3.718-8)(Population) \]
Note: The intercept is outside the data range, however it fits the data well within the residual standard error for all points within our dataset.
Multiple R-Square:0.9694
R-Square:0.968
Description: The model fits the data well with a strong negative correlation.
H_0 : Null Hypothesis There is no relationship between PercentageCrimeRate and population.
H_A : Alternative Hypothesis There is a relationship between PercentageCrimeRate and population.
Here the multiple R value is 0.9694 which shows that there is significant correlation between PercentageCrimeRate and Population. Also the value of R square is 0.968 which shows the extent to which the PercentageCrimeRate affects the Population. Therefore, we reject the null hypothesis H_0 and accept the Alternative hypothesis H_A.
The analysis above leads us to the following conclusions that the average rate of crime has decreased steadily over the period between 1994 and 2018.
https://ucr.fbi.gov/crime-in-the-u.s/2018/crime-in-the-u.s.-2018/tables/table-1
https://raw.githubusercontent.com/igukusamuel/DATA-607-Final-Project/master/CrimeData.cs
http://rpubs.com/igukusamuel/557830
https://bookdown.org/yihui/rmarkdown/slidy-presentation.html
https://github.com/igukusamuel/DATA-607-Final-Project/blob/master/607-Final%20Project.pdf
https://github.com/igukusamuel/DATA-607-Final-Project/blob/master/DATA%20607%20Final%20Project.Rmd
https://github.com/igukusamuel/DATA-607-Final-Project/blob/master/DATA-607-Final-Project.html
https://ucr.fbi.gov/crime-in-the-u.s
https://bookdown.org/yihui/rmarkdown/slidy-presentation.html