kable(ps[1:5,])
| id | name | date | manner_of_death | armed | age | gender | race | city | state | signs_of_mental_illness | threat_level | flee | body_camera |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Tim Elliot | 2015-01-02 | shot | gun | 53 | M | A | Shelton | WA | True | attack | Not fleeing | False |
| 4 | Lewis Lee Lembke | 2015-01-02 | shot | gun | 47 | M | W | Aloha | OR | False | attack | Not fleeing | False |
| 5 | John Paul Quintero | 2015-01-03 | shot and Tasered | unarmed | 23 | M | H | Wichita | KS | False | other | Not fleeing | False |
| 8 | Matthew Hoffman | 2015-01-04 | shot | toy weapon | 32 | M | W | San Francisco | CA | True | attack | Not fleeing | False |
| 9 | Michael Rodriguez | 2015-01-04 | shot | nail gun | 39 | M | H | Evans | CO | False | attack | Not fleeing | False |
#Time series plot of Deaths on account of police shooting.
#Deaths due to police shooting by Year and Month
## Warning: We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
Kindly note that this dataset merely provides the what, where, how and when of the police shootings; it provides no context about the usually complex whys surrounding the deaths. So any conclusions drawn merely from this dataset and analysis are at your own discretion (and risk). In my opinion, this dataset and the corresponding EDA are informative at best.
Link to part 2: https://www.kaggle.com/stansilas/some-naive-forecasting-using-prophet
VERY IMPORTANT NOTE :
* At first glance, it might appear that nearly twice as many whites as blacks are being shot dead by police.
* However, it is important to note that these numbers are absolute counts, not counts weighted by population, and that matters a lot.
* The population of whites in the USA in 2017 was 195,645,900.
* Compare this with the population of blacks, which was 39,257,300, i.e. blacks are roughly 1/5th of the white population.
* So once the police shooting deaths are weighted by population, it becomes clear that blacks are being shot dead by police at a higher rate than whites.
* Let us look at the distribution of the deaths by race in this dataset :

| A | B | H | N | O | W |
|---|---|---|---|---|---|
| 33 | 542 | 367 | 28 | 28 | 1041 |

* So for blacks, the death rate due to police shootings is 542 / 39,257,300 = 0.00001380634.
* For whites, it is 1,041 / 195,645,900 = 0.00000532083.
* To put this in perspective, a death rate of 0.00001380634 means that about 1.38 of every 100,000 blacks were killed in police shootings.
* The death rate for whites, 0.00000532083, translates to about 0.53 of every 100,000 whites.
* For a simpler take on this: though blacks are 1/5th the population of whites, blacks are killed about 2.6 times as often per capita (1.38) as whites are (0.53), i.e. 1.38 / 0.53 ≈ 2.6.
Population Numbers reference : https://goo.gl/9SMIb2
* Assumption: Please note that, for the sake of simplicity, I'm using the population of the USA in 2017. For more accurate values, one should use the population of each race in the city of the shooting on the date of the shooting.
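
The per-capita comparison can be reproduced with a few lines of R. This is only a sketch under the same simplifying assumption (national 2017 population figures); the variable names `deaths`, `population`, `rate_per_100k` and `rate_ratio` are made up for illustration and are not part of the original analysis.

```r
# Death counts for blacks (B) and whites (W), taken from the race distribution in this write-up.
deaths <- c(B = 542, W = 1041)

# 2017 national population figures quoted in this write-up (a simplifying assumption).
population <- c(B = 39257300, W = 195645900)

# Deaths per 100,000 people in each group.
rate_per_100k <- deaths / population * 1e5

# How many times higher the rate for blacks is than the rate for whites.
rate_ratio <- rate_per_100k[["B"]] / rate_per_100k[["W"]]

print(round(rate_per_100k, 2))
print(round(rate_ratio, 2))
```

Swapping in state-level or city-level populations would only change the `population` vector; the rest of the calculation stays the same.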
Ambiguity: The race in the dataset (provided by the Washington Post) is represented only by a single letter: A, B, H, N, O or W, which is very unfortunate.
So I don't know whether A is Asian, American Indian or Alaska Native.
Similarly, I don't know whether H is Hawaiian/Pacific Islander or Hispanic.
* Is N Native American, Native Alaskan or American Indian?
* So I'm assuming:
* W <- “White”,
* B <- “Black” ,
* H <- “Hispanic/Latino”,
* O <- “Other” ,
* A <- “Asian”,
* N <- “Native American”
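
The assumed mapping above could be applied to the data with a named lookup vector. This is a minimal sketch: `race_labels` and `label_race` are hypothetical names, and the labels are the guesses listed above, not meanings confirmed by the dataset.

```r
# Assumed meaning of each single-letter race code (a guess, not confirmed by the data).
race_labels <- c(
  W = "White",
  B = "Black",
  H = "Hispanic/Latino",
  O = "Other",
  A = "Asian",
  N = "Native American"
)

# Hypothetical helper: map codes to labels; unknown or missing codes become NA.
label_race <- function(codes) {
  unname(race_labels[as.character(codes)])
}

print(label_race(c("A", "W", "H")))
```

With the data frame `ps` used at the top of this notebook, something like `ps$race_label <- label_race(ps$race)` would add a readable column while leaving the original codes intact.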
Note to self: Weight the absolute counts by the population of each race in each state/city.
Otherwise the charts could be rather misleading to a lay reader.
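
One possible way to do that weighting at the state level is sketched below. The helper `deaths_per_100k` is hypothetical, the population vector has to be supplied by the reader from a source of their choice, and the toy input uses made-up state codes and made-up populations purely to show the call.

```r
# Hypothetical helper: deaths per 100,000 residents for each state.
# `states` holds one state code per death (for example ps$state);
# `state_pop` is a named numeric vector of populations, supplied by the reader.
deaths_per_100k <- function(states, state_pop) {
  counts <- table(states)
  # Keep only the states that appear in both inputs.
  common <- intersect(names(counts), names(state_pop))
  as.numeric(counts[common]) / state_pop[common] * 1e5
}

# Toy input, invented for illustration only; these are not real states or populations.
toy_states <- c("AA", "AA", "BB")
toy_pop <- c(AA = 200000, BB = 50000)

toy_rates <- deaths_per_100k(toy_states, toy_pop)
print(toy_rates)
```

The same function could be applied per race by first subsetting the deaths and passing race-specific populations for each state.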
LinkedIn: www.linkedin.com/in/vivekmangipudi
Link to part 2: Forecasting deaths: coming shortly
Link to previous studies : www.rpubs.com/stanspwan