Dataset Citation: Trump, Kris-Stella; Williamson, Vanessa; Einstein, Katherine Levine, 2018, “Vol 16(2): Replication Data for: Black Lives Matter: Evidence that Police-Caused Deaths Predict Protest Activity”, https://doi.org/10.7910/DVN/L2GSK6, Harvard Dataverse, V1, UNF:6:RmmW7To7Mtq99VYCC+GXag== [fileUNF]
As part of my first exploration, I have elected to look into a dataset that has special importance for me. As a Black African woman in America, I have become more and more curious to see the facts beneath the surface. I strongly believe that data tell stories, often with undeniable evidence. This was a great chance for me to explore some of my questions about deaths caused by police, poverty rates and living prospects (in particular in the Black community), and BLM protests. In addition, it is only fair that I learn about and explore this topic as much as I can, considering that I hope to stay in this country for a long time and that I share a large part of my heritage with the Black community in the States.
To proceed with the investigation, I will explore a Black Lives Matter dataset that contains BLM protest frequency by locale in the United States, poverty rates, numbers of deaths caused by the police, and several other measures. The dataset was compiled from different websites by a group of researchers (Kris-Stella Trump, Vanessa Williamson, and Katherine Levine Einstein). It covers the year 2014 and was collected with the aim of finding evidence that police-caused deaths predict protest activity. The dataset contains a total of 51 variables. It is also important to mention that 2014 was an important year for BLM as a movement, due to the rising number of police brutality cases in which young Black men were murdered by the police. For this study, I will only use the following variables:
Geography.x, city and state of the incident
Area.Name, city of the incident
Region, region of the United States (South, West, Northeastern, or Midwest)
PovertyRate, poverty rate of a particular area
tot.protests, total number of protests that took place in a particular area
deaths, number of deaths caused by the police
dem, Democratic Party political affiliation
rep, Republican Party political affiliation
deaths_black, number of Black people killed by the police
BlackPovertyRate, poverty rate in the Black community of a particular city or state
I will use some of these variables to address the following questions:
What is the distribution of police-caused deaths across states?
How does poverty in the Black community compare with poverty in other racial communities?
Can the poverty rate help us predict the number of protests?
Is there a correlation between poverty and police-caused deaths?
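```r
library(tidyverse)  # dplyr, tidyr, forcats, ggplot2
# blm_df: the replication data, assumed to be already read into R
blm_dt <- blm_df %>%
  select(Geography.x, PovertyRate, Area.Name, tot.protests, tot.attend,
         deaths_unarmed, deaths, deathduring, dem, rep, BlackPovertyRate,
         deaths_black, Region)  # Region (assumed to be in blm_df) colours the plots
# Geography.x holds "City, State": keep the state, use Area.Name as the city
blm_dt <- blm_dt %>%
  separate(col = Geography.x, into = c("City", "State"), sep = ", ") %>%
  select(-City) %>%
  rename(City = Area.Name) %>%
  filter(!is.na(City))
```

1. What is the distribution of deaths across states?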
Deaths caused by the police are highly relevant, especially at a time when police brutality has finally been exposed to broader society and recognized as a major concern to be addressed. Hence, it is worth looking at the numbers behind these facts. Map 1 and the tables below show the deaths caused by the police in different parts of the US.
```r
# One row per locale *per party*: this doubles the rows of blm_dt, so any
# later sum() over blm_dt counts every locale twice
blm_dt <- blm_dt %>%
  pivot_longer(cols = c(dem, rep), names_to = "politics", values_to = "Count") %>%
  mutate(Politics = fct_recode(politics, Democrats = "dem", Republicans = "rep")) %>%
  select(Politics, everything())

# Two-letter state codes (usmap expects a column named `state`),
# looked up from base R's built-in state.name / state.abb vectors
blm_plot <- blm_dt %>%
  filter(State != "District of Columbia") %>%
  mutate(state = state.abb[match(State, state.name)]) %>%
  select(State, everything())
```
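```r
# Total police-caused deaths per state, in the shape usmap::plot_usmap() expects
blm_map <- blm_plot %>%
  # deaths >= 0 holds for every non-missing count, so death is always "Deaths"
  mutate(death = if_else(deaths >= 0, true = "Deaths", false = "No")) %>%
  group_by(state, death) %>%
  summarise(values = sum(deaths), .groups = "drop") %>%
  mutate(values.int = as.integer(values)) %>%
  select(state, death, values.int)
```

Map 1: Deaths caused by the police in the United States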
| Region | Police_deaths |
|---|---|
| South | 828 |
| West | 930 |
| Northeastern | 188 |
| Midwest | 360 |

| State | Police_deaths |
|---|---|
| California | 522 |
| Florida | 234 |
| Texas | 210 |
| Arizona | 134 |
| Washington | 80 |
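As the first table above shows, the West registered the most deaths caused by the police (930), followed by the South (828). This intrigued me, because I was sure it would be the South. What astounds me most is how large the numbers are, far larger than I anticipated. Part of that size comes from the code, however: if these totals were computed from the pivoted `blm_dt`, where `pivot_longer()` on `dem` and `rep` leaves two rows for every locale, each locale's deaths are counted twice, so the actual totals are half of those shown. The ranking of regions and states is not affected.
Looking at individual states, deaths are heavily concentrated in California, which explains the large number in the West.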
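The effect of the reshaping step on these sums can be checked on a small example. This is a minimal sketch on made-up toy data (the three locales and their counts are invented for illustration): `pivot_longer()` over `dem` and `rep` returns two rows per locale, so summing `deaths` after the pivot doubles every total, while summing before the pivot, or after keeping one party's rows, does not.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one row per locale, as in blm_dt before the pivot
toy <- tibble(
  State  = c("California", "California", "Texas"),
  deaths = c(5, 3, 4),
  dem    = c(60, 55, 40),
  rep    = c(40, 45, 60)
)

# Same reshaping as the analysis: one row per locale *per party*
toy_long <- toy %>%
  pivot_longer(cols = c(dem, rep), names_to = "politics", values_to = "Count")

# Summing after the pivot counts every locale twice
doubled <- toy_long %>%
  group_by(State) %>%
  summarise(Police_deaths = sum(deaths), .groups = "drop")

# Summing before the pivot gives the per-state totals in the data
correct <- toy %>%
  group_by(State) %>%
  summarise(Police_deaths = sum(deaths), .groups = "drop")

# Alternative on the long data: keep one party's rows before summing
fixed_long <- toy_long %>%
  filter(politics == "dem") %>%
  group_by(State) %>%
  summarise(Police_deaths = sum(deaths), .groups = "drop")

print(doubled)  # Police_deaths: California 16, Texas 8
print(correct)  # Police_deaths: California 8, Texas 4
```

Either approach (summing before the pivot, or filtering to one party first) should produce totals that are not doubled; which one fits better depends on whether the party columns are needed later in the same pipeline.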
2. Does the poverty rate help us predict the number of protests in a particular area or state?
It may sound obvious to say that, whether you are rich or poor, you should for moral reasons strive for human rights to be upheld and thus be vocal about the BLM mission. However, I was curious to see whether poverty rates affect the number of protests held in a state or city. I would not be surprised if places with higher poverty rates had fewer protests, because of fewer resources or greater vulnerability to retaliation by the police or the government. I would also expect people in areas with high poverty rates to be more focused on survival and more worried about daily necessities than about protesting, even though the cause may be important to the community. This part therefore asks whether poverty rates affect the occurrence and number of BLM protests.
```r
ggplot(blm_dt, aes(x = PovertyRate, y = tot.protests, colour = Region)) +
  geom_point() +
  scale_colour_brewer(palette = "Dark2")
```

As Graph 1 shows, poverty rates do not appear to affect the occurrence or number of BLM protests: the graph shows no visible relationship between the two variables, and the points are scattered with no association whatsoever. My predictions were therefore not as accurate as I expected! It is important, however, to note the outlier at the top of the graph: a city in the Northeastern region organized the most BLM protests in the whole country in 2014. It would make sense for this to be New York City, because of its diversity (in particular its large Black population), its resources, its strong liberal presence, and its role as a major stage for police brutality against people of color.
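Rather than guessing which city the outlier is, it can be looked up directly. A minimal sketch, assuming a data frame with the `City`, `State`, and `tot.protests` columns created in the data-preparation step; `distinct()` collapses the duplicate rows left by the party pivot, and the helper name `top_protest_locales()` and the toy rows are invented for illustration:

```r
library(dplyr)

# Return the n locales with the most protests, one row per locale
top_protest_locales <- function(df, n = 1) {
  df %>%
    distinct(City, State, tot.protests) %>%          # drop per-party duplicates
    slice_max(tot.protests, n = n, with_ties = FALSE)
}

# Hypothetical toy data with each locale repeated twice (dem and rep rows)
toy <- tibble(
  City         = c("Springfield", "Springfield", "Riverton", "Riverton"),
  State        = c("Illinois", "Illinois", "Wyoming", "Wyoming"),
  tot.protests = c(3, 3, 12, 12)
)

top_protest_locales(toy)
```

With the real data, `top_protest_locales(blm_dt)` would name the locale at the top of Graph 1, and `n = 5` would list the five locales with the most protests.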
3.1. Is there a correlation between poverty and police-caused deaths?
Similarly to the question of poverty and protests, one would expect areas severely affected by poverty (especially areas where most residents are people of color) to be more prone to police brutality. I was therefore also curious to see whether there is a relationship between the general poverty rate and deaths caused by the police.
```r
ggplot(blm_dt, aes(x = PovertyRate, y = deaths, colour = Region)) +
  geom_point() +
  scale_colour_brewer(palette = "Dark2")
```

Just as in Graph 1, the observations in Graph 2 show no visible relationship between PovertyRate and deaths, meaning that the level of poverty and the number of deaths caused by the police do not appear to be correlated. Focusing now on the number of deaths, the tables below help make more sense of the graph.
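Both scatterplots are judged by eye. A correlation coefficient gives a quick numeric check; the sketch below is one possible way to do it, not the method used in this analysis. Spearman's rank correlation is included alongside Pearson's because protest and death counts are skewed and contain outliers. Duplicated rows do not change either coefficient, but data with one row per locale (for example, before the `pivot_longer()` step) would be needed for a valid significance test such as `cor.test()`. The helper name `describe_association()` and the toy values are hypothetical.

```r
# Pearson and Spearman correlations, ignoring rows with a missing value
describe_association <- function(x, y) {
  c(
    pearson  = cor(x, y, use = "complete.obs", method = "pearson"),
    spearman = cor(x, y, use = "complete.obs", method = "spearman")
  )
}

# Hypothetical toy values: a perfectly increasing relationship
describe_association(c(10, 20, 30, 40), c(1, 3, 5, 7))

# With the real data (column names from the analysis above):
# describe_association(blm_dt$PovertyRate, blm_dt$tot.protests)
# describe_association(blm_dt$PovertyRate, blm_dt$deaths)
```

Coefficients close to 0 would support the visual reading of the graphs; values near 1 or -1 would indicate a strong positive or negative association.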
4. What are the 5 most dangerous cities for a Black person to live in, according to the poverty rate and the number of police-caused deaths?
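```r
library(pander)  # pander() prints the result as a Markdown table
# note: after the pivot_longer() step blm_dt has two rows per locale,
# so each city's deaths_black is summed twice
blm_dt %>%
  group_by(City) %>%
  summarise(Police_deaths = sum(deaths_black)) %>%
  arrange(desc(Police_deaths)) %>%
  slice(1:5) %>%
  pander()
```

| City | Police_deaths |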
|---|---|
| Chicago | 32 |
| New York | 32 |
| Houston | 24 |
| Baltimore | 22 |
| Los Angeles | 22 |
```r
blm_dt %>%
  group_by(City) %>%
  summarise(Poverty = mean(BlackPovertyRate)) %>%  # duplicated per-party rows leave the mean unchanged
  arrange(desc(Poverty)) %>%
  slice(1:5) %>%
  pander()
```

| City | Poverty |
|---|---|
| San Juan Capistrano | 88.1 |
| Socorro | 78 |
| Missoula | 75.8 |
| Bell Gardens | 71.3 |
| Blacksburg | 69.3 |
Looking at the tables, Chicago and New York have the most police killings of Black people among the cities in the data: the table shows 32 for each in 2014 alone (because `blm_dt` holds two rows per locale after the `pivot_longer()` step, the underlying count is half of that, 16 each). Regardless of the circumstances of the killings, these numbers are frightening, especially considering that all of these cities have large Black populations. One may argue that these cities, being large and prone to a lot of crime, require the police to use lethal force to protect people and maintain order. However, it is hard to look at the numbers and not think of the racial bias, discrimination, and oppression that may lie behind them.
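The city ranking above uses raw counts, which tend to favor the largest cities. One way to weigh the argument that big cities simply have more incidents is to express deaths as a rate per 100,000 residents. None of the variables selected for this study is a population count, so the `population` input below is hypothetical and would have to come from another source, such as census data; the helper name `rate_per_100k()` is also illustrative.

```r
# Deaths per 100,000 residents; `population` is a hypothetical input,
# not one of the variables used in this analysis
rate_per_100k <- function(deaths, population) {
  stopifnot(length(deaths) == length(population), all(population > 0))
  deaths / population * 1e5
}

# Hypothetical toy numbers: a large city and a small one
rate_per_100k(deaths = c(10, 5), population = c(1e6, 1e5))
```

With these toy numbers the smaller city has the higher rate (5 versus 1 per 100,000) even though it has half as many deaths, which is why raw counts and rates can rank cities differently.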
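In addition, I was shocked to see the poverty rates Black residents face in San Juan Capistrano and Socorro. I am curious to read more and understand the reasons behind these high rates, keeping in mind that both are small cities, so a poverty rate computed for a small Black population may rest on few households and be less reliable than the rates for large cities.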
Overall, one of the biggest limitations I had was the amount of data: I intended to analyze trends over the years, but because I only had data for one year (2014), I was very limited in what I could do and answer. In addition, I wish this dataset had information about police-caused deaths of people of races other than Black. With information for other races over time, I would be able to compare the occurrence of police brutality among races. I know Black people are more prone to it, but I wanted to be able to see it through data as well.
A curious point I noticed as I proceeded with the analysis is that, in the poverty data in the main database, Native Americans were never mentioned or described. This intrigues me, because I have read a lot about the severe poverty many Native Americans live in. They are PEOPLE too, who live in the United States. I wonder whether the information was not available or the data collectors did not find it relevant to collect… because for me, that would be problematic.
In conclusion, after conducting this analysis, I was somewhat able to answer the questions I had:
When it comes to states, California has the highest number of deaths caused by the police, followed by Florida and Texas. Correspondingly, the West and the South are the regions with the highest totals.
When it comes to cities, New York and Chicago had the highest number of deaths of Black people caused by the police in 2014. Numerous factors may lie behind these numbers, from crime to plain racism!
The scatterplots show no visible relationship between poverty and either the number of protests or the number of police-caused deaths. I would still be curious to run a regression analysis with more indicators that could better explain the relationship.
Regarding poverty in the Black community, San Juan Capistrano and Socorro have the highest Black poverty rates in the dataset.
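As a possible starting point for the regression mentioned above, protest counts could be modelled with a Poisson generalized linear model, a common choice for count outcomes. This is a suggestion, not an analysis carried out in this report. The sketch assumes one row per locale with the `tot.protests`, `PovertyRate`, and `deaths` columns; the helper name `fit_protest_model()` and the toy values are invented, and further indicators could be added to the formula.

```r
# Poisson GLM: log(expected protests) = b0 + b1 * PovertyRate + b2 * deaths
fit_protest_model <- function(df) {
  glm(tot.protests ~ PovertyRate + deaths, family = poisson(), data = df)
}

# Hypothetical toy data, one row per locale
toy <- data.frame(
  tot.protests = c(0, 1, 2, 1, 4, 3, 6, 2, 5, 8),
  PovertyRate  = c(10, 15, 20, 25, 30, 12, 18, 22, 28, 35),
  deaths       = c(0, 1, 0, 2, 1, 3, 0, 1, 2, 4)
)

fit <- fit_protest_model(toy)
summary(fit)  # coefficients, standard errors, and p-values
```

If the protest counts turn out to be overdispersed (variance much larger than the mean), a negative binomial model such as `MASS::glm.nb()` is a common alternative.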
Part 1
Strategy of execution: I chose to analyze a topic that I am not only familiar with but can also relate to. The dataset did not contain all the information I needed, but I was still able to gain a substantial understanding of police brutality in 2014, as well as of poverty rates. I included several graphs because I believe data always make more sense with a visual illustration. I also opted for simplicity, avoiding digging too deep into the questions I had, so the answers I came up with are perhaps superficial, since this is not meant to be a long report. However, I hope in the near future to look deeper, considering all the possible indicators for regressions, to test the association between poverty and police-caused deaths, among several others, more accurately.
Choice of questions: I first explored the dataset to see which variables I had available. Looking at the numbers, I was able to combine my own curiosity about Black issues with what I was observing in the dataset to come up with the questions I aimed to answer in this analysis.
Packages: I used the following packages: ggplot2, usmap, pander, and the tidyverse (dplyr, tidyr, and a little bit of forcats).
Difficulties: I had a hard time sorting the data out for the US map illustration, but after trying to replicate my mini project 3, I was able to find the solution for my problem.