I want to investigate gun-involved deaths in the United States between 2012 and 2014.
Are men more likely to die from gun violence than women? Are minorities more likely to die from gun violence than others?
Use the tools in R such as str() and summary() to describe the original dataset you imported.
library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag(): dplyr, stats
# readr, ggplot2 and dplyr are already attached by library(tidyverse), so separate library() calls are not needed
GunDeaths12_14 <- read_csv("C:/Users/Samuel.Bradford/Desktop/GunDeaths12-14.zip")
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
## X1 = col_integer(),
## year = col_integer(),
## month = col_character(),
## intent = col_character(),
## police = col_integer(),
## sex = col_character(),
## age = col_integer(),
## race = col_character(),
## hispanic = col_integer(),
## place = col_character(),
## education = col_integer()
## )
summary(GunDeaths12_14)
## X1 year month intent
## Min. : 1 Min. :2012 Length:100798 Length:100798
## 1st Qu.: 25200 1st Qu.:2012 Class :character Class :character
## Median : 50400 Median :2013 Mode :character Mode :character
## Mean : 50400 Mean :2013
## 3rd Qu.: 75599 3rd Qu.:2014
## Max. :100798 Max. :2014
##
## police sex age race
## Min. :0.00000 Length:100798 Min. : 0.00 Length:100798
## 1st Qu.:0.00000 Class :character 1st Qu.: 27.00 Class :character
## Median :0.00000 Mode :character Median : 42.00 Mode :character
## Mean :0.01391 Mean : 43.86
## 3rd Qu.:0.00000 3rd Qu.: 58.00
## Max. :1.00000 Max. :107.00
## NA's :18
## hispanic place education
## Min. :100.0 Length:100798 Min. :1.000
## 1st Qu.:100.0 Class :character 1st Qu.:2.000
## Median :100.0 Mode :character Median :2.000
## Mean :114.2 Mean :2.296
## 3rd Qu.:100.0 3rd Qu.:3.000
## Max. :998.0 Max. :5.000
## NA's :53
str(GunDeaths12_14)
## Classes 'tbl_df', 'tbl' and 'data.frame': 100798 obs. of 11 variables:
## $ X1 : int 1 2 3 4 5 6 7 8 9 10 ...
## $ year : int 2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
## $ month : chr "01" "01" "01" "02" ...
## $ intent : chr "Suicide" "Suicide" "Suicide" "Suicide" ...
## $ police : int 0 0 0 0 0 0 0 0 0 0 ...
## $ sex : chr "M" "F" "M" "M" ...
## $ age : int 34 21 60 64 31 17 48 41 50 NA ...
## $ race : chr "Asian/Pacific Islander" "White" "White" "White" ...
## $ hispanic : int 100 100 100 100 100 100 100 100 100 998 ...
## $ place : chr "Home" "Street" "Other specified" "Home" ...
## $ education: int 4 3 4 4 2 1 2 2 3 5 ...
## - attr(*, "spec")=List of 2
## ..$ cols :List of 11
## .. ..$ X1 : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ year : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ month : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ intent : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ police : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ sex : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ age : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ race : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ hispanic : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ place : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ education: list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## ..$ default: list()
## .. ..- attr(*, "class")= chr "collector_guess" "collector"
## ..- attr(*, "class")= chr "col_spec"
Describe the steps you took to get from your original dataset to the final dataset you used for your analysis. Include the R code in chunks.
GunDeaths12_14 <- GunDeaths12_14 %>%
  select(year, intent, sex, age, race, education) %>%
  # drop rows missing any of the variables used in the analysis
  filter(!is.na(age), !is.na(intent), !is.na(education))
summary(GunDeaths12_14)
## year intent sex age
## Min. :2012 Length:100726 Length:100726 Min. : 0.00
## 1st Qu.:2012 Class :character Class :character 1st Qu.: 27.00
## Median :2013 Mode :character Mode :character Median : 42.00
## Mean :2013 Mean : 43.87
## 3rd Qu.:2014 3rd Qu.: 58.00
## Max. :2014 Max. :107.00
## race education
## Length:100726 Min. :1.000
## Class :character 1st Qu.:2.000
## Mode :character Median :2.000
## Mean :2.296
## 3rd Qu.:3.000
## Max. :5.000
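The cleaning step keeps only rows where `age`, `intent` and `education` are all present. A minimal base-R sketch of the same logic is below. It assumes a small hypothetical data frame `toy` with invented values in place of the real file. It first counts missing values per column, which is the information `summary()` reports as "NA's". It then keeps complete rows with `complete.cases()`.

```r
# Hypothetical toy data standing in for GunDeaths12_14 (values invented for illustration)
toy <- data.frame(
  year      = c(2012L, 2012L, 2013L, 2014L, 2014L),
  intent    = c("Suicide", NA, "Homicide", "Suicide", "Accidental"),
  sex       = c("M", "F", "M", "M", "F"),
  age       = c(34L, 21L, NA, 64L, 31L),
  race      = c("White", "White", "Black", "Hispanic", "White"),
  education = c(4L, 3L, 2L, NA, 2L),
  stringsAsFactors = FALSE
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Keep rows where age, intent and education are all present
# (same result as filter(!is.na(age), !is.na(intent), !is.na(education)))
keep  <- complete.cases(toy[, c("age", "intent", "education")])
clean <- toy[keep, ]

cat("Rows before:", nrow(toy), " after:", nrow(clean),
    " dropped:", nrow(toy) - nrow(clean), "\n")
```

On the real data, `colSums(is.na(GunDeaths12_14))` gives the per-column counts, and `tidyr::drop_na(GunDeaths12_14, age, intent, education)` is a tidyverse shorthand for the same filter. According to the two summaries, 100,798 − 100,726 = 72 rows were removed.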
Show how you approached the questions you posed at the beginning. Describe how much you were able to accomplish. There should be both graphical and numerical results produced by R code included in chunks. Explain what you did and what it means.
g1 = ggplot(data = GunDeaths12_14, aes(x=sex)) +
geom_bar(aes(fill = intent)) +
ggtitle("Gun deaths by gender") +
theme(axis.text.x = element_text(size = 6, color="#993333",
angle=45))
g1
table(GunDeaths12_14$sex, GunDeaths12_14$intent)
##
## Accidental Homicide Suicide Undetermined
## F 215 5356 8687 169
## M 1410 29777 54475 637
Men account for about six times as many gun deaths as women: 86,299 of the 100,726 deaths in the cleaned data, versus 14,427 for women. Because men and women make up roughly equal shares of the U.S. population, this indicates that men are far more likely than women to die from gun violence. Suicide is the most common intent for both sexes. It accounts for 63.12% of men's gun deaths and 60.21% of women's. Surprisingly, a slightly larger share of women's gun deaths were homicides (37.12%) than men's (34.50%).
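The sex-by-intent figures can be checked directly from the printed table. The base-R sketch below uses counts typed in from the table above rather than read from the data file. It computes each sex's total number of gun deaths and the male-to-female ratio. It then uses `prop.table(..., margin = 1)` to give the percentage of each sex's deaths in each intent category.

```r
# Counts copied from the sex-by-intent table printed above
sex_intent <- matrix(
  c(215,  5356,  8687, 169,
    1410, 29777, 54475, 637),
  nrow = 2, byrow = TRUE,
  dimnames = list(sex    = c("F", "M"),
                  intent = c("Accidental", "Homicide", "Suicide", "Undetermined"))
)

# Total gun deaths per sex and the male-to-female ratio of those totals
deaths_by_sex <- rowSums(sex_intent)
ratio_m_to_f  <- deaths_by_sex[["M"]] / deaths_by_sex[["F"]]

# Percentage of each sex's deaths in each intent category (each row sums to 100)
intent_pct <- round(100 * prop.table(sex_intent, margin = 1), 2)

print(deaths_by_sex)
print(round(ratio_m_to_f, 2))
print(intent_pct)
```

Row percentages answer "what share of women's (or men's) gun deaths were suicides". Column percentages would answer a different question. These are still counts of deaths, so turning the ratio into a per-person likelihood needs population totals by sex.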
GunDeaths12_14$race[GunDeaths12_14$race == "Native American/Native Alaskan"] <- "Native American"
GunDeaths12_14$race[GunDeaths12_14$race == "Black"] <- "African American"
GunDeaths12_14$race[GunDeaths12_14$race == "White"] <- "Caucasian"
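The three assignments above rename race labels one at a time. An alternative sketch keeps the renaming in a single named lookup vector, so adding or changing a label is a one-line edit. The helper `recode_with_lookup()` is hypothetical and not part of any package. Values not listed in the lookup (here "Hispanic" and "Asian/Pacific Islander") pass through unchanged.

```r
# Lookup table: original label -> label used in this report
race_labels <- c(
  "Native American/Native Alaskan" = "Native American",
  "Black"                          = "African American",
  "White"                          = "Caucasian"
)

# Hypothetical helper: replace values found in the lookup, leave all others unchanged
recode_with_lookup <- function(x, lookup) {
  hit <- x %in% names(lookup)
  x[hit] <- unname(lookup[x[hit]])
  x
}

races <- c("White", "Black", "Hispanic", "Asian/Pacific Islander",
           "Native American/Native Alaskan")
recoded <- recode_with_lookup(races, race_labels)
print(recoded)
```

Applied to the report's data, this would be `GunDeaths12_14$race <- recode_with_lookup(GunDeaths12_14$race, race_labels)`.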
g2 = ggplot(data = GunDeaths12_14, aes(x=race)) +
geom_bar(aes(fill = intent)) +
ggtitle("Gun deaths by race") +
theme(axis.text.x = element_text(size = 6, color="#993333",
angle=0))
table(GunDeaths12_14$race, GunDeaths12_14$intent)
##
## Accidental Homicide Suicide Undetermined
## African American 321 19498 3331 125
## Asian/Pacific Islander 12 557 745 10
## Caucasian 1126 9125 55363 585
## Hispanic 145 5628 3169 72
## Native American 21 325 554 14
g2
Caucasians account for more gun deaths than all minority groups combined. Of the 100,726 gun deaths between 2012 and 2014, 65.72% were of Caucasian individuals and 34.28% were of members of minority groups. Caucasians also make up the majority of the U.S. population. These counts alone therefore do not show that Caucasians are more likely to die from gun violence, because comparing likelihoods would require each group's population size. Also notable is the large share of homicides among the gun deaths of African Americans and Hispanics. Of the 23,275 gun deaths among African Americans, 83.77% were homicides, and 62.44% of Hispanic gun deaths were homicides.
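The race percentages follow the same pattern as the sex breakdown. In the base-R sketch below, the counts are typed in from the race-by-intent table above. The sketch computes each group's share of all gun deaths and, within each group, the percentage of deaths that were homicides or suicides.

```r
# Counts copied from the race-by-intent table printed above
race_intent <- matrix(
  c( 321, 19498,  3331, 125,
      12,   557,   745,  10,
    1126,  9125, 55363, 585,
     145,  5628,  3169,  72,
      21,   325,   554,  14),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    race   = c("African American", "Asian/Pacific Islander", "Caucasian",
               "Hispanic", "Native American"),
    intent = c("Accidental", "Homicide", "Suicide", "Undetermined")
  )
)

deaths_by_race <- rowSums(race_intent)

# Each group's share of all gun deaths, in percent
share_of_total <- round(100 * deaths_by_race / sum(deaths_by_race), 2)

# Within each group, the percentage of its deaths in each intent category
within_group <- round(100 * prop.table(race_intent, margin = 1), 2)

print(share_of_total)
print(within_group[, c("Homicide", "Suicide")])
```

Printing the Suicide column next to Homicide shows the contrast with Caucasians, whose gun deaths are mostly suicides (55,363 of 66,199).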
GunDeaths12_14$education[GunDeaths12_14$education == 1] <- "Less than HS"
GunDeaths12_14$education[GunDeaths12_14$education == 2] <- "Graduated HS"
GunDeaths12_14$education[GunDeaths12_14$education == 3] <- "Some college"
GunDeaths12_14$education[GunDeaths12_14$education == 4] <- "College graduate"
GunDeaths12_14$education[GunDeaths12_14$education == 5] <- "Not Available"
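Assigning text labels into the integer `education` column turns it into a character column. As a result, `table()` and the ggplot2 x-axis sort the categories alphabetically, which is why the table that follows lists "College graduate" first. The sketch below assumes the same code-to-label mapping as the five assignments above and converts the codes in one step with `factor()`. The categories then keep their natural order from least to most education. The vector `edu_codes` is a hypothetical stand-in for `GunDeaths12_14$education`.

```r
# Labels in the order of the integer codes 1-5 used in the data
edu_labels <- c("Less than HS", "Graduated HS", "Some college",
                "College graduate", "Not Available")

# Hypothetical education codes standing in for GunDeaths12_14$education
edu_codes <- c(2L, 4L, 1L, 3L, 2L, 5L)

# factor() maps each code to its label and keeps the levels in code order
edu_factor <- factor(edu_codes, levels = 1:5, labels = edu_labels)

print(levels(edu_factor))  # natural order, not alphabetical
print(table(edu_factor))
```

In the report, this would replace the five assignments with `GunDeaths12_14$education <- factor(GunDeaths12_14$education, levels = 1:5, labels = edu_labels)`, run while the column still holds integer codes.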
g3 = ggplot(data = GunDeaths12_14, aes(x=education)) +
geom_bar(aes(fill = intent)) +
ggtitle("Gun deaths by education level") +
theme(axis.text.x = element_text(size = 6, color="#993333",
angle=0))
table(GunDeaths12_14$education, GunDeaths12_14$intent)
##
## Accidental Homicide Suicide Undetermined
## College graduate 146 1559 11147 93
## Graduated HS 633 15649 26321 324
## Less than HS 492 11838 9291 200
## Not Available 27 447 871 9
## Some college 327 5640 15532 180
g3
Based on the data, high school graduates account for almost twice as many gun deaths as any other education group: 42,927 deaths, or 42.6% of all gun-involved deaths. Surprisingly, individuals with less than a high school education and individuals with some college experience have almost identical counts of gun-involved deaths (21,821 and 21,679).
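The education figures can be reproduced the same way. The base-R sketch below uses counts typed in from the education-by-intent table above. It ranks the groups by total gun deaths, converts the totals to percentage shares, and compares the largest group with the runner-up.

```r
# Counts copied from the education-by-intent table printed above
edu_intent <- matrix(
  c(146,  1559, 11147,  93,
    633, 15649, 26321, 324,
    492, 11838,  9291, 200,
     27,   447,   871,   9,
    327,  5640, 15532, 180),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    education = c("College graduate", "Graduated HS", "Less than HS",
                  "Not Available", "Some college"),
    intent    = c("Accidental", "Homicide", "Suicide", "Undetermined")
  )
)

# Total gun deaths per education group, largest first
deaths_by_edu <- sort(rowSums(edu_intent), decreasing = TRUE)

# Each group's share of all gun deaths, in percent
share_pct <- round(100 * deaths_by_edu / sum(deaths_by_edu), 2)

# How many times larger the biggest group is than the runner-up
ratio_top_two <- deaths_by_edu[[1]] / deaths_by_edu[[2]]

print(deaths_by_edu)
print(share_pct)
print(round(ratio_top_two, 2))
```

Like the sex and race comparisons, these are counts of deaths. Saying that one education group is more *likely* to die would require the number of people in each group.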