Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: BuzzFeed News


Objective

The objective of original visualization is to depict that increase in fatalities in school shooting across USA is creating a demand towards social sentiment - an algorithm which search’s social media for potential threats and warn the authorities.

Target audience for the visualization will be school authorities, parents, students and citizens of USA as their lives are most affected in such an event.As the graph depcts schools investing to protect themselves from these threats.

The visualisation chosen had the following three main issues:

  • Ignoring convention :- Given visualization ignores the convention for comparing two continuous variable, which does not necessary makes it hard to interpret but makes it hard to compare against other variable and plotting all datapoint points on a single axis makes the graph congested.
  • No scaling for killed variable :- Given visualization does not provide any proper scale for comparing spending per month on social sentiment systems and fatalities in school shooting, instead the graph is using size to compensate for scale which is not that effective.
  • Inadequate visual comparison of variables :- Given visualization does not use most adequate visual variable for comparing the relationship between spending per month on social sentiment systems and fatalities in school shooting.

Reference

Code

The following code was used to fix the issues identified in the original.

library(readr)
library(tidyverse)
library(ggplot2)
library(lubridate)

rss=read_csv("recent_school_shootings.csv")
sp=read_csv("school_purchases.csv")

school_purchases=sp
recent_school_shootings=rss



# total spending per month
school_purchases_month <- school_purchases %>%
  group_by(year, month) %>%
  summarize(total = sum(price, na.rm = TRUE)) %>%
  mutate(date = as.Date(paste0(year,"-",month,"-01")))

tot_date=seq(as.Date('2014-02-01'),as.Date('2019-09-01'),by = "1 month")
spm_date=as.Date(setdiff(tot_date,school_purchases_month$date), origin="1970-01-01")
zero_val=rep(0,each=length(spm_date))

school_pm=data.frame(date=c(school_purchases_month$date,spm_date),Spending_Per_Month=c(school_purchases_month$total,zero_val))
school_pm=school_pm%>%mutate(month=month(school_pm$date))
school_pm=school_pm%>%mutate(year=year(school_pm$date))


rss=subset(rss, killed > 0)
rss=rss%>%select(date,year,killed)

rss=rss%>%mutate(month=month(rss$date))

rss_month <- rss %>%
  group_by(year, month) %>% summarize(killed_tot = sum(killed, na.rm = TRUE)) %>%
  mutate(date = as.Date(paste0(year,"-",month,"-01")))

tot_date=seq(as.Date('2014-02-01'),as.Date('2019-09-01'),by = "1 month")
x=as.Date(setdiff(tot_date,rss_month$date), origin="1970-01-01")
y=rep(0,each=length(x))

r=data.frame(date=c(rss_month$date,x),killed=c(rss_month$killed_tot,y))

r=r%>%mutate(month=month(r$date))
r=r%>%mutate(year=year(r$date))

xx=merge(x=r, y=school_pm, by = 'date', all.x = TRUE)
xx=xx%>%select(date, killed, Spending_Per_Month)

Data Reference

Reconstruction

The following plot fixes the main issues in the original.

This graph clearly shows that after a major event (first line shows Parkland incident) there is a major increase in spending on social sentiment.