The goal for this part of the project is to tidy and transform the data from Matthew Farris’ post on mass shootings so that it can be analyzed and questions about it can be answered, either in text or with visualizations. I will look at the 2015 data through October 1st.
Data Sources:
[http://shootingtracker.com/wiki/Main_Page] This page links to the CSV file of shooting data, which includes the date, location, and numbers killed and wounded for each shooting.
[http://www.washingtonpost.com/news/wonkblog/wp/2015/10/01/2015-274-days-294-mass-shootings-hundreds-dead/] This calendar infographic makes it easy to see which day of the week each shooting took place, and it color codes the shootings by level of severity.
## Load packages (only the ones this analysis uses)
library(tidyr)   ## separate(), gather()
library(dplyr)   ## group_by(), summarise(), arrange(); also provides the %>% pipe
library(ggplot2)
### Bring in csv file as dataframe mass_shoot_2015_df
mass_shoot_2015_df <- read.csv("https://raw.githubusercontent.com/karenweigandt/IS607/master/2015CURRENTmassshootingdata.csv", sep=",", stringsAsFactors = FALSE)
The file also has 5 columns of links to associated articles. These are not needed for this analysis, so the first cleanup step keeps only the first 5 columns.
mass_shoot_2015_df <- mass_shoot_2015_df[, 1:5] ## keep all rows and the first 5 columns
Next I separate the date into month, day and year, which lets me total the deaths and injuries by month.
## create a data frame with monthly totals
monthly_tolls_df <-
mass_shoot_2015_df %>%
separate(Date, c("month","day","year")) %>% ## split Date into month, day and year columns
group_by(month) %>%
summarise(death_toll = sum(Dead), injury_toll = sum(Injured))
monthly_tolls_df$month <- as.integer(monthly_tolls_df$month) ## change data type for easy ordering
monthly_tolls_df <- arrange(monthly_tolls_df, month) ## put the months in numerical order
monthly_tolls_df
## Source: local data frame [10 x 3]
##
##    month death_toll injury_toll
##    (int)      (int)       (int)
## 1      1         40          79
## 2      2         44          47
## 3      3         31         113
## 4      4         17          68
## 5      5         46         154
## 6      6         46         158
## 7      7         49         169
## 8      8         48         160
## 9      9         44         134
## 10    10         14          12
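The monthly pipeline above relies on two details that are easy to miss: splitting the date string to get a month, and converting that month to an integer before sorting. The base R sketch below reproduces both steps on a few made-up rows (the values are illustrative only, not from the data set). It assumes the Date column holds month/day/year strings such as "1/3/2015", which is what the `separate(Date, c("month", "day", "year"))` call implies; it is a dependency-free alternative, not the method used above.

```r
# Made-up example rows (NOT from the data set). Assumes Date holds
# month/day/year strings, as separate(Date, c("month", "day", "year")) implies.
shootings <- data.frame(
  Date    = c("1/3/2015", "1/20/2015", "2/14/2015", "10/1/2015"),
  Dead    = c(2L, 0L, 4L, 1L),
  Injured = c(3L, 4L, 1L, 5L),
  stringsAsFactors = FALSE
)

# Parse the strings into Date objects, then extract the month as an integer 1-12
dates <- as.Date(shootings$Date, format = "%m/%d/%Y")
shootings$month <- as.integer(format(dates, "%m"))

# Sum deaths and injuries within each month; aggregate() returns the groups
# in increasing order, and because month is an integer, 10 comes after 2
monthly <- aggregate(shootings[c("Dead", "Injured")],
                     by = list(month = shootings$month), FUN = sum)
names(monthly) <- c("month", "death_toll", "injury_toll")
print(monthly)

# Why the integer conversion matters: sorted as text, "10" lands before "2"
print(sort(c("2", "10", "1")))  # returns "1" "10" "2"
```

Parsing with `as.Date()` also means a malformed date becomes `NA` instead of being split into misleading pieces.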
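Prepare the data for plotting. Because the data stop at October 1st, the October totals cover a single day and are not comparable with the other months.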
## create df where death and injury tolls are in a single column, so both can be plotted as y
toll_for_plot <- monthly_tolls_df
colnames(toll_for_plot) <- c("month", "death", "injury")
toll_for_plot <-
toll_for_plot %>%
gather("outcome", "toll", 2:3)
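`gather()` turns the two toll columns into key/value pairs so that ggplot2 can map `outcome` to color and shape. As a sketch of what that wide-to-long reshape does, here is a base R version built on `stack()`, run on made-up numbers (not from the data set) in the same wide shape as `toll_for_plot`. Like `gather()`, it lists every month's death toll first and then every month's injury toll.

```r
# Made-up monthly totals (NOT from the data set) in the same wide shape as
# toll_for_plot: one row per month, one column per outcome
wide <- data.frame(month = 1:3, death = c(5L, 7L, 2L), injury = c(9L, 4L, 6L))

# stack() places the selected columns end to end in a 'values' column and
# records each value's source column name in the factor column 'ind'
stacked <- stack(wide[c("death", "injury")])

# Rebuild the long table: months repeat once per outcome column
long <- data.frame(
  month   = rep(wide$month, times = 2),
  outcome = as.character(stacked$ind),
  toll    = stacked$values,
  stringsAsFactors = FALSE
)
print(long)
```

In tidyr 1.0.0 and later, `gather()` is superseded by `pivot_longer()`; the equivalent call is `pivot_longer(toll_for_plot, cols = c(death, injury), names_to = "outcome", values_to = "toll")`, which orders the rows by month rather than by outcome.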
ggplot(data=toll_for_plot, aes(x=factor(month), y=toll, group=outcome, color=outcome, shape=outcome)) +
xlab("Month (1 = January)") + ylab("Toll Count") + ## Set axis labels
ggtitle("2015 Death and Injury by Month for Mass Shootings through Oct 1st") + ## Set title
geom_line(size=1.5) + ## in ggplot2 >= 3.4.0, use linewidth = 1.5 instead of size
geom_point(size=3, fill="white") +
scale_shape_manual(values=c(22,21))
This data set offers many more opportunities to tidy and reshape data for exploration: we could look at the data by city or state, or compare it year over year. Many thanks to Matthew for a great data set!