The right to bear arms. What are the consequences of such constitutional rights in one of the world’s most powerful countries? Have they been more pronounced in certain states? Do the effects exhibit some form of influence on neighbouring states? Where do they often take place?
This visualization attempts to identify trends that lead to answering the questions above. The data used for this task consists of records pertaining to individual gun violence records that occurred throughout the United States, complete with geolocation data. The data was initially collected from the non-profit corporation, Gun Violence Archive (https://www.gunviolencearchive.org/), and further supplemented with additional information (e.g. geolocation data) by jamesqo (https://github.com/jamesqo/gun-violence-data).
The variables of interest within this dataset are:
| Variable | Description |
|---|---|
| date | Date of ocurrence |
| state | U.S. State in which incident occurred |
| n_killed | No. of people killed |
| n_injured | No. of people injured |
| latitude | Latitude of incident |
| longitude | Longitude of incident |
Adding onto the primary dataset, a population dataset of each state (https://www.census.gov/data/tables/time-series/demo/popest/2010s-state-total.html#par_textimage), a shapefile of U.S. states (https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html) and a U.S. state codes dataset (https://worldpopulationreview.com/states/state-abbreviations) are used as well.
To answer the questions posed in the introduction, the following visualizations are proposed:
However, there are some challenges in creating the visualizations.
Data Challenges:
| Challenge | Solution |
|---|---|
| Incomplete data, with little records in 2013 and only data up till March in 2018. | Data in 2013 and 2018 will be filtered away to show consistency in representing yearly data. |
| Data corresponds to individual records. | Data needs to be aggregated by the state level to show the disparities in gun violence cases amongst them. Groupby functions have to be utilised. |
Design Challenges:
| Challenge | Solution |
|---|---|
| There are too many states to show in U.S., which will affect the display of the map. | Do not include states that not part of the Contiguous U.S., e.g. Northen Mariana Islands, Puerto Rico, Guam, etc. |
| Constructing bar-charts will be an issue even if only the Contiguous U.S. is considered (49 states including Washington DC). | Utilise only the top and bottom states when comparing via bar charts. Having 49 groups will produce cluttered results. |
Sketch of the proposed design:
The following packages are needed for the visualization:
To properly convert animated maps into gifs, additional software needs to be downloaded at https://imagemagick.org/script/download.php#windows.
packages = c('readr', 'data.table', 'ggplot2', 'gganimate', 'knitr', 'lubridate', 'magick', 'sf', 'tmap', 'tidyverse')
for(p in packages){
if(!require(p, character.only=T)){
install.packages(p)
}
library(p, character.only = T)
}
The datasets containing attribute datasets are read. Along with attributes, a csv containing U.S. states and their codes are read as well. The code names provide for shorter labels which makes the visualization less cluttered.
gun_cases <- read_csv('stage3.csv')
population <- read_csv('nst-est2019-01.csv')
state_codes <- read_csv('statecode.csv')
The dataset contains data from 2013 to 2018, but the data contained within the years 2013 and 2018 are incomplete, hence, they are removed and will not be part of the visualization.
In addition, the visualization will only focus on the Contiguous United States, consisting of the 48 adjoining U.S. states and Washington D.C. It will not include Alaska, Hawaii, American Samoa, Northern Mariana Islands, Guam, Puerto Rico and the United States Virgin Islands.
gun_cases <- gun_cases[!(year(gun_cases$date) %in% c(2013,2018)), ]
gun_cases <- gun_cases[!(gun_cases$state %in% c('Alaska', 'American Samoa', 'Commonwealth of the Northern Mariana Islands', 'Guam', 'Puerto Rico', 'Hawaii', 'United States Virgin Islands')), ]
As for the population dataset, the average population is needed for the overview of the interactive visualization. The population figures are averaged over the 4 years from 2014 to 2017 inclusive as the variable ‘avg_pop’.
population$avg_pop <- rowMeans(population[,c("2014", "2015", "2016", "2017")])
To be able to map the data by states, the dataset needs to first be grouped by states. During this process, the total number of deaths and injuries are summed up. The avg_pop variable previously created is joined and matched to the states as well.
groupby_state_all <- gun_cases %>% group_by(state) %>%
summarise(killed = sum(n_killed), injured = sum(n_injured))
groupby_state_all <- left_join(groupby_state_all, population[, c("Geographic Area", "avg_pop")], by = c("state" = "Geographic Area"))
Purely representing the death and injury rates in percentage values may not be as relatable, especially when the numbers are small. To address this issue, the death and injury rates are represented as the number of cases per 100,000 people.
groupby_state_all$kill_rate <- groupby_state_all$killed/groupby_state_all$avg_pop * 100000
groupby_state_all$injury_rate <- groupby_state_all$injured/groupby_state_all$avg_pop * 100000
To be able to map the states in R, the shapefile needs to loaded via the st_read() function.
After reading the shapefile of U.S. states, the data is joined together with the main dataset to form all_mapped, as well as with the state codes dataset to make labelling easier through the short-forms of the state names.
statesf <- st_read(dsn = "cb_2018_us_state_500k",
layer = "cb_2018_us_state_500k")
## Reading layer `cb_2018_us_state_500k' from data source `C:\Users\Scorp\Desktop\MITB\Visual Analytics\Assignment\Assignment 5\To Submit\cb_2018_us_state_500k' using driver `ESRI Shapefile'
## Simple feature collection with 56 features and 9 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -179.1489 ymin: -14.5487 xmax: 179.7785 ymax: 71.36516
## geographic CRS: NAD83
all_mapped <- statesf %>%
right_join(groupby_state_all, by = c("NAME" = "state")) %>%
left_join(state_codes[,c(1,3)], by = c("NAME" = "State"))
Here, the map visualization of the overall death rates using the summed values of deaths and injuries from 2014 to 2017 is coded. tm_fill() colours the polygons representing U.S. states, with varying hues corresponding to deaths per 100,000 people from 2014 to 2017. tm_bubbles() creates bubbles in each state, with the size corresponding to the injuries per 100,000 people fom 2014 to 2017.
all_map <- tm_shape(all_mapped) +
tm_fill("kill_rate",
palette = "Blues",
title = "Deaths per 100K People",
id = "NAME",
popup.vars = c("Sum of Deaths (2014 to 2017) per 100,000 People: " = "kill_rate", "Total Deaths (2014 to 2017): " = "killed")) +
tm_bubbles(size = "injury_rate",
col = "red",
alpha = 0.5,
id = "NAME",
popup.vars = c("Sum of Injuries (2014 to 2017) per 100,000 People: " = "injury_rate", "Total Injuries (2014 to 2017): " = "injured")) +
tm_layout(legend.show = TRUE) +
tm_borders(alpha = 0.5) +
tm_text("Code", size = 0.8, ymod = 1)
With the overall map visualization completed, the next step is to break down the visualization down into yearly data. Before that can be done, the population dataset has be transformed and elongated into the pop.long variable. A look at a few rows of the dataframe will bring about better clarity.
pop.long <- population %>%
pivot_longer(-"Geographic Area", names_to = "year", values_to = "population")
pop.long$year <- as.numeric(pop.long$year)
top_n(pop.long, -10)
## # A tibble: 10 x 3
## `Geographic Area` year population
## <chr> <dbl> <dbl>
## 1 Vermont 2016 623657
## 2 Vermont 2017 624344
## 3 Vermont 2018 624358
## 4 Wyoming 2013 582122
## 5 Wyoming 2014 582531
## 6 Wyoming 2015 585613
## 7 Wyoming 2016 584215
## 8 Wyoming 2017 578931
## 9 Wyoming 2018 577601
## 10 Wyoming NA 582822.
Following which, the gun_cases dataset is grouped by firstly the states, then by years, with the sum of ‘n_killed’ and ‘n_injured’ being output as values. The result is joined to pop.long. To reduce coding conflicts, the variable name ‘year(date)’ is changed to ‘date_year’.
groupby_state_year <- gun_cases %>% group_by(state, year(date)) %>%
summarise(killed = sum(n_killed), injured = sum(n_injured)) %>%
left_join(pop.long, by = c("state" = "Geographic Area", "year(date)" = "year")) %>%
setnames('year(date)', 'date_year')
Similarly to the overall visualization, the death and injury rates are represented as number of cases per 100,000 people.
And likewise, the data is joined with the shapefile and state codes to form the year_mapped dataset.
groupby_state_year$kill_rate <- groupby_state_year$killed/groupby_state_year$population * 100000
groupby_state_year$injury_rate <- groupby_state_year$injured/groupby_state_year$population * 100000
year_mapped <- statesf %>%
right_join(groupby_state_year, by = c("NAME" = "state")) %>%
left_join(state_codes[, c(1,3)], by = c("NAME" = "State"))
The visualization created through year_map is the same as the overall visualization, but that it has been broken down by the years, made possible by the tm_facets() function.
year_map <- tm_shape(year_mapped) +
tm_fill("kill_rate",
palette = "Blues",
title = "Deaths per 100,000 People",
id = "NAME",
popup.vars = c("Deaths per 100,000 People: " = "kill_rate", "Total Deaths: " = "killed")) +
tm_borders() +
tm_bubbles(size = "injury_rate",
col = "red",
alpha = 0.4,
border.lwd = NA,
id = "NAME",
popup.vars = c("Injuries per 100,000 People: " = "injury_rate", "Total Injuries: " = "injured"),
title.size = "Injuries per 100,000 People") +
tm_facets(by = "date_year",
free.coords=TRUE) +
tm_layout(legend.show = TRUE,
legend.outside = FALSE,
legend.position = c("left", "bottom"),
legend.text.size = 1,
legend.title.size = 1,
legend.stack = "horizontal",
title.size = 20)
Before the coordinates can be fed into the tmap package, it has to be converted to a shapefile. To accomplish the task of mapping the coordinates by months, the year and month value has to be extracted from the gun_cases ‘date’ variable to form a new variable ‘year_month’.
gun_cases$year_month <- format(as.Date(gun_cases$date), "%Y-%m")
coord_pts <- st_as_sf(gun_cases, coords = c("longitude", "latitude"), na.fail = FALSE)
To accomplish the task of creating an animated map using the tmap package, facets have to be created with the parameter along set to ‘year_month’. The map variable is then input into the tmap_animation() function to create a .gif file.
Note that the rmarkdown chunk options is set to eval = FALSE, as the process takes some time to run. For all subsequent chunks that involve creating .gif files, eval will be set to FALSE as well.
anim_map <-
tm_shape(year_mapped) +
tm_polygons('#DDEAF6',
border.alpha = 1,
lwd = 1.5) +
tm_text("Code", size = 0.5) +
tm_shape(coord_pts) +
tm_bubbles(size = "n_killed",
col = "red",
alpha = 0.5,
border.lwd = NA,
title.size = "No. of Deaths") +
tm_facets(along = "year_month",
free.coords=FALSE) +
tm_layout(legend.show = TRUE,
title.position = c("center", "center"),
title.size = 12)
tmap_animation(anim_map, filename = "monthly_map.gif", delay = 30)
The last step in creating the visualization is to create the animated horizontal bar charts.
To make ordering possible for the animated charts, a new variable ‘rank’, has to be created, where the states are ranked in order of gun violence death rates for each year.
anim_data <- groupby_state_year %>%
group_by(date_year) %>%
# The * 1 makes it possible to have non-integer ranks while sliding
mutate(rank = min_rank(kill_rate) * 1) %>%
ungroup()
The rmarkdown chunk below creates the animated horizontal bar chart for the top 10 states by gun violence death rates (killed per 100,000 people). The animation is exported to a .gif file named “anim_bar_top_death_rate.gif”.
ggplot(anim_data, aes(rank, group = state,
fill = as.factor(state),
color = as.factor(state),
height=.5)) +
geom_tile(aes(y = kill_rate/2, height = kill_rate, width = 0.9),
alpha = 0.8,
color = NA) +
geom_text(aes(y = 0, label = paste(state, " ")),
vjust = 0.2,
hjust = 1,
size = 5) +
geom_text(aes(y = kill_rate, label = as.character(round(kill_rate, 2))),
hjust = -0.3,
size = 5) +
scale_y_continuous(limits = c(0,25)) +
coord_flip(clip = "off", expand = FALSE) +
scale_x_continuous(labels = scales::comma, limits = c(39.5,49.5)) +
guides(color = FALSE, fill = FALSE) +
labs(title = 'Top 10 States in Gun Violence Death Rates', subtitle = "{closest_state}", x = "", y = "Deaths per 100,000 People") +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5, size = 18),
axis.ticks.y = element_blank(), # These relate to the axes post-flip
axis.text.y = element_blank(), # These relate to the axes post-flip
plot.margin = margin(1,1,1,4, "cm"),
legend.position = "none") +
# gganimate specific bits:
transition_states(
date_year,
transition_length = 5,
state_length = 5
) +
ease_aes('sine-in-out')
anim_save("anim_bar_top_death_rate.gif")
The rmarkdown chunk below creates the animated horizontal bar chart for the bottom 10 states by gun violence death rates (killed per 100,000 people). The animation is exported to a .gif file named “anim_bar_bottom_death_rate.gif”.
ggplot(anim_data, aes(rank, group = state,
fill = as.factor(state),
color = as.factor(state),
height=.5)) +
geom_tile(aes(y = kill_rate/2, height = kill_rate, width = 0.9),
alpha = 0.8,
color = NA) +
geom_text(aes(y = 0, label = paste(state, " ")),
vjust = 0.2,
hjust = 1,
size = 5) +
geom_text(aes(y = kill_rate, label = as.character(round(kill_rate, 2))),
hjust = -0.3,
size = 5) +
scale_y_continuous(limits = c(0,5)) +
coord_flip(clip = "off", expand = FALSE) +
scale_x_continuous(labels = scales::comma, limits = c(0.5,10.5)) +
guides(color = FALSE, fill = FALSE) +
labs(title = 'Bottom 10 States in Gun Violence Death Rates', subtitle = "{closest_state}", x = "", y = "Deaths per 100,000 People") +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5, size = 18),
axis.ticks.y = element_blank(), # These relate to the axes post-flip
axis.text.y = element_blank(), # These relate to the axes post-flip
plot.margin = margin(1,1,1,4, "cm"),
legend.position = "none") +
# gganimate specific bits:
transition_states(
date_year,
transition_length = 5,
state_length = 5
) +
ease_aes('sine-in-out')
anim_save("anim_bar_bottom_death_rate.gif")
The visualizations created are then pieced together in the next section to create the final visualization.
The first visualization analyses overall death and injury rates from 2014 to 2017 across the states, represented by the shades and bubbles respectively. All rates in the visualizations correspond to the number of cases per 100,000 people. Clicking the polygons reveals more information about the death rates and total deaths, likewise when clicking on the bubbles for information on injuries.
The second visualization breaks down the overview into yearly facets. This section attempts to show if states that exhibited higher gun violence activity have spread their influence to neighbouring states.
The third visualization, the animated bar charts, show how the top and bottom 10 states in death rates have changed throughout the years, and also the general trends that the rates have moved in.
The last visualization, the animated coordinates, shows the actual locations of gun violence cases that resulted in death. It seeks to provide identification of patterns or hotspots of deaths involving firearms.
Useful Information:
The increase of gun violence death rates in states seem to occur if there are neighbouring states with higher rates. E.g. Illinois and Kentucky increased in death rates in 2015 when its neighbouring states of Indian, Missouri and Tennessee had relatively higher rates in 2014. There also appears to be a cluster of states with higher rates stemming from Louisiana. This hints at the possibility of influence from neighbouring states in the increase in gun violence rates.
From the animated bar charts, it appears that the general trend of gun violence rates is that it has been increasing for the top 10 states. The lowest rate of the top 10 states have increased from 5.32 deaths per 100,000 people to 7.2 from 2014 to 2017. The rates of the bottom 10 states on the other hand, do not exhibit obvious trends.
The animated coordinates visualization shows that there are interesting patterns in these gun violence death cases. Notable hotspots are along California’s Interstate 5, Chicago (Illinois), Miami (Florida), just to name a few. It also appears that gun violence resulting in deaths tend to occur on highways and in populated cities.