---
title: "Violent Crimes, Blacks, and Poverty"
author: "David Sung"
output:
  flexdashboard::flex_dashboard:
    theme: journal
    storyboard: true
    social: menu
    source: embed
---
```{r setup, include=FALSE}
library(plotly)
library(sqldf)
library(knitr)
library(DT)
setwd("C:\\Users\\sungdavid\\Documents\\Personal\\pov")
povcrime <- read.csv('./data/povcrime.csv')
povcrime$prop_pov <- as.numeric(as.character(sub("," , ".", povcrime$prop_pov)))
povcrime$prop_pov <- povcrime$prop_pov / 100
q = "SELECT * FROM povcrime WHERE class = 'county'"
povcrime <- sqldf(q)
library(Hmisc)
library(corrplot)
# Table view
tableview <- data.frame(county = povcrime$name, state = povcrime$state.1, vc_pertho = povcrime$vc_pertho,
prop_pov = povcrime$prop_pov, white_prop = povcrime$white_prop, black_prop = povcrime$blackmix_prop,
asian_prop = povcrime$asianmix_prop, hispanic_prop = povcrime$hispanic_prop,
native_prop = povcrime$nativemix_prop, pacific_prop = povcrime$pimix_prop)
# Create plots and accessible dataframe
met <- data.frame(vc_pertho = povcrime$vc_pertho, prop_pov = povcrime$prop_pov,
white_prop = povcrime$white_prop, black_prop = povcrime$blackmix_prop,
asian_prop = povcrime$asianmix_prop, hispanic_prop = povcrime$hispanic_prop,
native_prop = povcrime$nativemix_prop, pacific_prop = povcrime$pimix_prop)
cor_df <- data.frame(vc_pertho = povcrime$vc_pertho,
white_prop = povcrime$white_prop, black_prop = povcrime$blackmix_prop,
asian_prop = povcrime$asianmix_prop, hispanic_prop = povcrime$hispanic_prop,
native_prop = povcrime$nativemix_prop, pacific_prop = povcrime$pimix_prop)
# Create correlation plots
cor <- rcorr(as.matrix(cor_df), type = 'pearson')
col <- colorRampPalette(c("blue", "white", "red"))
cor_plot_r <- corrplot(cor$r, p.mat = cor$r, method = 'color', insig = "p-value", sig.level=-1, addrect=2, col=col(20))
cor_plot_p <- corrplot(cor$r, p.mat = cor$P, method = 'color', sig.level=.05, addrect=2, col=col(20))
# Create percentiles data frame from 5% to 100% in 5% increments
percentiles <- data.frame(quantile(met$prop_pov, seq(from = .05, to = 1, by = 0.05), na.rm = TRUE))
percentilerange <- c('0 to 05%', '05 to 10%', '10 to 15%', '15 to 20%', '20 to 25%',
'25 to 30%', '30 to 35%', '35 to 40%', '40 to 45%', '45 to 50%',
'50 to 55%', '55 to 60%', '60 to 65%', '65 to 70%', '70 to 75%',
'75 to 80%', '80 to 85%', '85 to 90%', '90 to 95%', '95 to 100%')
# Initialize the running total data frame
run_df <- data.frame(percentile = seq(from = 5, to = 100, by = 5),
vc_pertho = 0, white_prop = 0, black_prop = 0,
asian_prop = 0, hispanic_prop = 0,
native_prop = 0, pacific_prop = 0, obs = 0)
# Fill the running total data frame with the correlation values
for (i in 1:(nrow(percentiles))) {
  # Cumulative subset: every county at or below the i-th poverty percentile
  q1 <- "SELECT * FROM met WHERE prop_pov <= "
  q4 <- percentiles[i, 1]
  q <- paste0(q1, q4)
  temp_df <- sqldf(q)
  run_df[i, 'obs'] <- nrow(temp_df)
  run_df[i, 'vc_pertho'] <- mean(temp_df$vc_pertho, na.rm = TRUE)
  temp_cor <- rcorr(as.matrix(temp_df), type = 'pearson')
  # Columns 3 to 8 (the race proportions) line up in met and run_df;
  # column 1 of the correlation matrix is vc_pertho
  for (t in 2:(ncol(run_df)-2)) {
    run_df[i, t+1] <- temp_cor$r[t+1,1]
  }
}
# Create running total plot
ay <- list(title = 'Corr b/t VC and Race',
showgrid = FALSE)
ay2 <- list(
overlaying = "y",
side = "right",
title = "VC/tho",
showgrid = FALSE)
ax <- list(title = "Poverty Percentile",
showgrid = FALSE)
run_plt <- plot_ly(data = run_df, x = ~percentile) %>%
add_lines(y = ~vc_pertho, name = "RunTot Avg",
line = list(shape = "spline", width = 10), yaxis = 'y2') %>%
add_lines(y = ~white_prop, name = "White", line = list(shape = "spline")) %>%
add_lines(y = ~black_prop, name = "Black", line = list(shape = "spline")) %>%
add_lines(y = ~asian_prop, name = "Asian", line = list(shape = "spline")) %>%
add_lines(y = ~hispanic_prop, name = "Hispanic", line = list(shape = "spline")) %>%
add_lines(y = ~native_prop, name = "Nat Amer", line = list(shape = "spline")) %>%
add_lines(y = ~pacific_prop, name = "Pac Isl", line = list(shape = "spline")) %>%
layout(
legend = list(orientation = 'h'), title = "Running Total of Correlation and Total Average of VC vs. Poverty Proportion Percentile",
yaxis = ay,
yaxis2 = ay2,
xaxis = ax
)
# Initialize the range data frame
range_df <- data.frame('percentilerange' = percentilerange,
vc_pertho = 0, white_prop = 0, black_prop = 0,
asian_prop = 0, hispanic_prop = 0,
native_prop = 0, pacific_prop = 0, obs = 0)
range_df$percentilerange <- as.character(range_df$percentilerange)
# Fill the range data frame with correlation values
for (i in 1:(nrow(percentiles))) {
  if (i == 1) {
    # First bin: every county below the 5th percentile
    q1 <- "SELECT * FROM met WHERE prop_pov < "
    q2 <- percentiles[i, 1]
    q <- paste0(q1, q2)
    temp_df <- sqldf(q)
  } else {
    # Later bins are half-open: previous percentile <= prop_pov < current percentile
    q1 <- "SELECT * FROM met WHERE prop_pov >= "
    q2 <- percentiles[i-1, 1]
    q3 <- " AND prop_pov < "
    q4 <- percentiles[i, 1]
    q <- paste0(q1, q2, q3, q4)
    temp_df <- sqldf(q)
  }
  range_df[i, 'obs'] <- nrow(temp_df)
  range_df[i, 'vc_pertho'] <- mean(temp_df$vc_pertho, na.rm = TRUE)
  temp_cor <- rcorr(as.matrix(temp_df), type = 'pearson')
  # Correlation of each race proportion (columns 3 to 8) with vc_pertho (column 1)
  for (t in 2:(ncol(range_df)-2)) {
    range_df[i, t+1] <- temp_cor$r[t+1,1]
  }
}
# Initialize the all poverty range data frame
allrange_df <- data.frame('percentilerange' = percentilerange,
white_prop = 0, black_prop = 0, asian_prop = 0,
hispanic_prop = 0, native_prop=0, pacific_prop = 0)
# Fill the all-range data frame with the mean race proportions in each bin
for (i in 1:(nrow(percentiles))) {
  if (i == 1) {
    # First bin: every county below the 5th percentile
    q1 <- "SELECT * FROM met WHERE prop_pov < "
    q2 <- percentiles[i, 1]
    q <- paste0(q1, q2)
    temp_df <- sqldf(q)
  } else {
    # Later bins are half-open: previous percentile <= prop_pov < current percentile
    q1 <- "SELECT * FROM met WHERE prop_pov >= "
    q2 <- percentiles[i-1, 1]
    q3 <- " AND prop_pov < "
    q4 <- percentiles[i, 1]
    q <- paste0(q1, q2, q3, q4)
    temp_df <- sqldf(q)
  }
  # Column t of allrange_df matches column t+1 of met (white_prop through pacific_prop)
  for (t in 2:(ncol(allrange_df))) {
    allrange_df[i, t] <- mean(temp_df[ , t+1], na.rm = TRUE)
  }
}
m <- list(
l = 100,
r = 100,
b = 100,
t = 50,
pad = 4
)
# Create all-race range plot
allrange_plt <- plot_ly(data = allrange_df, x = ~percentilerange) %>%
add_lines(y = ~white_prop, name = "White",
line = list(shape = "spline")) %>%
add_lines(y = ~black_prop, name = "Black", line = list(shape = "spline")) %>%
add_lines(y = ~asian_prop, name = "Asian", line = list(shape = "spline")) %>%
add_lines(y = ~hispanic_prop, name = "Hispanic", line = list(shape = "spline")) %>%
add_lines(y = ~native_prop, name = "Native", line = list(shape = "spline")) %>%
add_lines(y = ~pacific_prop, name = "Pac Isl", line = list(shape = "spline")) %>%
layout(
yaxis = list(title = 'Avg Proportion'),
xaxis = list(title="Poverty Percentile Range", tickangle = 45),
margin = m,
legend = list(orientation = "h")
)
# Create range plot
ay <- list(title = 'Corr b/t VC and Race',
showgrid = FALSE)
ay2 <- list(
overlaying = "y",
side = "right",
title = "VC/tho",
showgrid = FALSE)
ax <- list(tickangle = 45,
categoryorder = "category ascending",
showgrid = FALSE)
range_plt <- plot_ly(data = range_df, x = ~percentilerange) %>%
add_lines(y = ~vc_pertho, name = "VC/tho Avg",
line = list(shape = "spline", width = 10), yaxis = 'y2') %>%
add_lines(y = ~white_prop, name = "White", line = list(shape = "spline")) %>%
add_lines(y = ~black_prop, name = "Black", line = list(shape = "spline")) %>%
add_lines(y = ~asian_prop, name = "Asian", line = list(shape = "spline")) %>%
add_lines(y = ~hispanic_prop, name = "Hispanic", line = list(shape = "spline")) %>%
add_lines(y = ~native_prop, name = "Nat Amer", line = list(shape = "spline")) %>%
add_lines(y = ~pacific_prop, name = "Pac Isl", line = list(shape = "spline")) %>%
layout(
legend = list(orientation = 'h'),
title = "Correlation and Average of VC vs. Poverty Proportion Percentile",
yaxis = ay,
yaxis2 = ay2,
xaxis = ax,
margin = m
)
# Race proportion vs. poverty proportion
black_plt <- plot_ly(data = met, x = ~prop_pov, y = ~black_prop, type = 'scatter') %>%
layout(
xaxis = list(title = 'Proportion of Poverty'),
yaxis = list(title = 'Proportion of Black')
)
white_plt<- plot_ly(data = met, x = ~prop_pov, y = ~white_prop, type = 'scatter') %>%
layout(
xaxis = list(title = 'Proportion of Poverty'),
yaxis = list(title = 'Proportion of White')
)
blackwhite_plt <- plot_ly(data = met, x = ~prop_pov, y = ~white_prop, name = "White", type = 'scatter') %>%
add_markers(y = ~black_prop, name = "Black", type = 'scatter') %>%
layout(
xaxis = list(title = 'Proportion of Poverty'),
yaxis = list(title = 'Proportion'),
legend = list(orientation = 'h')
)
# Correlation plot with R value
cor_plot_r <- corrplot(cor$r, p.mat = cor$r, method = 'color', insig = "p-value", sig.level=-1, addrect=2, col=col(20))
# Correlation plot using significance level of 0.05
cor_plot_p <- corrplot(cor$r, p.mat = cor$P, method = 'color', sig.level=.05, addrect=2, col=col(20))
```
### Correlation and significance matrix of race and the violent crime rate
```{r}
corrplot(cor$r, p.mat = cor$r, method = 'color', insig = "p-value", sig.level=-1, addrect=2, col=col(20))
corrplot(cor$r, p.mat = cor$P, method = 'color', sig.level=.05, addrect=2, col=col(20))
```
***
After preparing the data, the first thing I did was run a correlation analysis of the violent crime rate against the population proportions of the different races. The correlation matrix and the significance test are shown. The significance test used a level of 0.05. Correlations that did not meet the significance level are crossed out.
Looking at the correlation matrix, three numbers stood out.
- The correlation between violent crimes per thousand and white proportion is -0.49, the only negative correlation with violent crimes.
- The correlation between violent crimes per thousand and black proportion is 0.49, the highest correlation with violent crimes.
- The correlation between black proportion and white proportion is -0.84.
Or in other words:
- As the proportion of whites increases, violent crime decreases
- As the proportion of blacks increases, violent crime increases
- As the proportion of whites increases, the proportion of blacks decreases
However, a concept you learn in any statistics class is that correlation does not imply causation. An age-old example is ice cream sales and shark bites. As ice cream sales increase, shark bites increase. Although the two are correlated, shark bites are not caused by ice cream sales. The hotter the temperature, the greater the ice cream sales. The hotter the temperature, the more people go to the beach. The more people at the beach, the greater the likelihood of a shark bite. The actual cause in this scenario is the number of people at the beach.
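The shark-bite example can be simulated directly. The following is a minimal sketch on invented numbers, not on the county data: every coefficient and variable name (`temperature`, `ice_cream`, `beach_goers`, `shark_bites`) is an assumption chosen for illustration. It compares the raw correlation with a partial correlation, computed by correlating the residuals left after a linear model removes the effect of beach attendance.

```r
set.seed(42)  # fixed seed so the sketch is reproducible

n <- 1000
temperature <- rnorm(n, mean = 25, sd = 5)                  # hypothetical daily temperature
ice_cream   <- 10 + 2.0 * temperature + rnorm(n, sd = 5)    # sales driven by temperature
beach_goers <- 50 + 8.0 * temperature + rnorm(n, sd = 20)   # attendance driven by temperature
shark_bites <- 0.01 * beach_goers + rnorm(n, sd = 0.5)      # bites driven by attendance only

# Raw correlation: positive even though neither variable causes the other
raw_cor <- cor(ice_cream, shark_bites)

# Partial correlation: correlate what remains of each variable after a
# linear model on beach attendance has been subtracted out
res_ice     <- resid(lm(ice_cream ~ beach_goers))
res_shark   <- resid(lm(shark_bites ~ beach_goers))
partial_cor <- cor(res_ice, res_shark)

round(c(raw = raw_cor, partial = partial_cor), 2)
```

In this simulated setup the raw correlation is clearly positive, while the partial correlation should sit close to zero, because bites were built to depend on attendance alone.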
I believe the relationship between violent crimes and black proportion is analogous, with poverty as the underlying cause. To test this, I gathered the Small Area Income and Poverty Estimates (SAIPE) from the United States Census Bureau and calculated the proportion of the population in poverty within each county. I then split the counties into percentile ranges of the poverty proportion and, within each range, plotted the average violent crimes per thousand and the correlation of each race proportion with violent crime, as shown on the next panel.
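As a rough sketch of the binning step, base R's `quantile()` and `cut()` can assign each county to a 5% poverty range without building SQL strings. The data frame `toy` and its values are simulated stand-ins, and this version closes each bin on the right, so it is an alternative under stated assumptions rather than a copy of the queries used in the dashboard.

```r
set.seed(1)
# Hypothetical stand-in for the county table; the column name mirrors `met`
toy <- data.frame(prop_pov = runif(200, min = 0.03, max = 0.40))

# Break points at every 5th percentile, including the minimum (0%) and maximum (100%)
breaks <- quantile(toy$prop_pov, probs = seq(0, 1, by = 0.05), na.rm = TRUE)

# Assign each row to one of 20 bins (integer codes 1 to 20);
# include.lowest keeps the minimum value inside the first bin
toy$pov_bin <- cut(toy$prop_pov, breaks = breaks,
                   include.lowest = TRUE, labels = FALSE)

table(toy$pov_bin)  # number of simulated counties per bin
```

One design note: with `cut()` the county with the highest poverty proportion always lands in the top bin, which is worth checking when the upper bound of a range is tested with a strict `<`.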
### Correlation and the average violent crime rate plotted against percentile ranges
```{r}
range_plt
```
***
You can toggle the individual line plots using the legend. The thick blue line represents the average number of violent crimes per thousand within each percentile range. This supports the earlier suspicion that poverty is also positively correlated with violent crimes. The green line represents the correlation between black population proportion and the violent crime rate within each poverty percentile range. At the lower percentile ranges, the correlation for the black population is not clearly different from that of the other minority groups. As the poverty percentile range increases, the correlation for black population proportion separates from the rest and keeps increasing.
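To make the per-range calculation concrete, here is a minimal sketch on simulated counties in which the crime rate is built to depend on poverty only. The data frame `toy`, its coefficients, and the helper table `by_bin` are all hypothetical; the point is only to show how a within-bin correlation can differ from the overall one.

```r
set.seed(7)
n <- 2000
# Simulated counties: column names mirror `met`, values are invented
toy <- data.frame(prop_pov = rbeta(n, 2, 10))
toy$black_prop <- pmin(pmax(0.05 + 0.8 * toy$prop_pov + rnorm(n, sd = 0.08), 0), 1)
toy$vc_pertho  <- 1 + 20 * toy$prop_pov + rnorm(n, sd = 1)  # crime depends on poverty only

breaks <- quantile(toy$prop_pov, probs = seq(0, 1, by = 0.05))
toy$pov_bin <- cut(toy$prop_pov, breaks = breaks,
                   include.lowest = TRUE, labels = FALSE)

# One row per poverty bin: mean crime rate and the crime/race correlation inside the bin
by_bin <- do.call(rbind, lapply(split(toy, toy$pov_bin), function(d) {
  data.frame(pov_bin   = d$pov_bin[1],
             n_obs     = nrow(d),
             vc_mean   = mean(d$vc_pertho),
             cor_black = cor(d$vc_pertho, d$black_prop))
}))

overall_cor <- cor(toy$vc_pertho, toy$black_prop)
c(overall = overall_cor, mean_within_bin = mean(by_bin$cor_black))
```

Because poverty is held nearly constant inside a narrow bin, the within-bin correlations in this simulation should come out weaker on average than the overall correlation.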
### Running total of the correlation and the average violent crime rate plotted against percentile
```{r}
run_plt
```
***
Taking the running total, where each point includes every county at or below that poverty percentile, shows the strong positive correlation violent crimes have with poverty, as well as the correlation for black population proportion growing to its overall value of 0.49.
But that still leaves a question: even as poverty increases, why does the black population have such a strong correlation with violent crimes? This is because as the proportion of the black population increases, so do the poverty rates. Click the next panel to view this relationship.
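The running calculation can be sketched without SQL as well. The example below uses simulated data and hypothetical names (`toy`, `cutoffs`, `running`); it recomputes the correlation on a growing subset of counties, so the last row uses all of them.

```r
set.seed(11)
n <- 1000
# Simulated counties: column names mirror `met`, values are invented
toy <- data.frame(prop_pov = rbeta(n, 2, 10))
toy$black_prop <- 0.05 + 0.8 * toy$prop_pov + rnorm(n, sd = 0.08)
toy$vc_pertho  <- 1 + 20 * toy$prop_pov + rnorm(n, sd = 1)

# Poverty cutoffs at the 5th, 10th, ..., 100th percentile
cutoffs <- quantile(toy$prop_pov, probs = (1:20) / 20)

running <- data.frame(
  percentile = seq(5, 100, by = 5),
  # Number of counties at or below each cutoff
  obs        = unname(sapply(cutoffs, function(q) sum(toy$prop_pov <= q))),
  # Correlation computed on that cumulative subset
  cor_black  = unname(sapply(cutoffs, function(q) {
    d <- toy[toy$prop_pov <= q, ]
    cor(d$vc_pertho, d$black_prop)
  }))
)

# The final row covers every county, so it matches the overall correlation
all.equal(running$cor_black[20], cor(toy$vc_pertho, toy$black_prop))
```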
### Black population proportion plotted against the poverty rate
```{r}
black_plt
```
***
As the black population proportion increases, so does the poverty rate. In other words, the higher the poverty rate in a county, the larger its black population proportion tends to be.
The opposite is in fact true for white people.
### White population proportion plotted against the poverty rate
```{r}
white_plt
```
***
As the poverty rate increases, the proportion of white people decreases. This can also explain the earlier finding that the correlation between violent crimes and the white population is -0.49. That statistic may have suggested that white people are less violent than all minority groups, but it is possible that the white proportion is negatively correlated with poverty rates, and that this in turn produces the negative correlation with violent crimes.
### White and black population proportions plotted against the poverty rate
```{r}
blackwhite_plt
```
***
And here, the two are plotted together so you can see the stark contrast.
### All race population proportions plotted against the poverty rate
```{r}
allrange_plt
```
***
And in case you were curious, here are the population proportions of all races plotted against poverty level. White population proportion is the only one that decreases with increasing poverty, and black population proportion is the only one that increases. The remaining minority groups stay relatively constant.
### Closing statements
```{r}
datatable(tableview)
```
***
**So what does this all mean?**
Yes, as black proportion goes up, so does the rate of violent crimes. But violent crimes go up at higher poverty rates regardless of race. And there is the unfortunate fact that as black proportion goes up, so does the rate of poverty, which explains the strong correlation that black population proportion has with the violent crime rate.
To summarize everything, violence has more to do with poverty than race.
So if you want to disparage an already disenfranchised minority group, level the playing field and address poverty first.
As with any other analysis, I understand that what I have provided may have many limitations.
**Limitations**
The biggest limitation of this study is that I compared population proportions with crime and poverty rates. To get the most accurate picture, it would have been best to look at the demographic committing each violent crime along with its associated poverty rate. This data was not consistently available, as not all counties and states release it.
Another limitation is that I looked at the data at the county level. Because of very distinct segregation within counties, this can distort some of the results. For example, Cook County contains all of Chicago. But within Chicago, it only takes a mile to see both ends of the living-condition spectrum. This means that a single poverty proportion for Cook County does not accurately represent the neighborhoods within it.
Lastly, correlation does not imply causation, and statistics alone cannot prove causation. Poverty may be the cause or it may not be, but it is clearly correlated, and both sides of the debate must be willing to look at everything objectively.
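A small worked example shows how a county-level figure can hide this spread. The neighborhoods, populations, and poverty proportions below are invented for illustration and do not describe Cook County or any real place.

```r
# Hypothetical neighborhoods inside one county (all figures invented)
hoods <- data.frame(
  neighborhood = c("A", "B", "C", "D"),
  population   = c(50000, 30000, 15000, 5000),
  prop_pov     = c(0.05, 0.10, 0.35, 0.60)
)

# County-level figure: population-weighted mean of the neighborhood rates
county_pov <- weighted.mean(hoods$prop_pov, w = hoods$population)

county_pov             # the single number a county-level analysis would use
range(hoods$prop_pov)  # the spread of neighborhood rates behind that number
```

Here the county figure works out to 0.1375, even though the individual neighborhoods range from 0.05 to 0.60.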
**Data Sources**
- Violent Crimes Data: County Health Rankings
- Demographic Data: US Census Bureau
- Small Area Income and Poverty Estimates: US Census Bureau