On Twitter, somebody claimed that
guess what, gun ownership correlates with gun deaths in the US https://t.co/PmOMEOnMy1 pic.twitter.com/LYFtpHGeNy
— Arthur Charpentier (@freakonometrics) June 19, 2015
But looking at the data, I wasn’t convinced:
@freakonometrics @neilhall_uk Uhm, no. Clearly two clusters, and next to no correlation within each cluster.
— Konrad Rudolph (@klmr) June 19, 2015
I decided to actually look at the data and verify that the “clusters” with Gun ownership < 20% and > 20%, respectively, were indeed internally uncorrelated.
As a first step, I retrieve the data and put it into a usable format. The first table answers the question,
Are any firearms now kept in or around your home? Include those kept in a garage, outdoor storage area, car, truck, or other motor vehicle?
gun_ownership_url = 'http://www.washingtonpost.com/wp-srv/health/interactives/guns/ownership.html'
gun_ownership = readHTMLTable(gun_ownership_url, header = TRUE, which = 1)
gun_ownership = gun_ownership[-1, ]
parse_num = function (x) as.numeric(sub(',', '', x))
gun_ownership = select(gun_ownership, State = 1, Total = 2, Yes = 3,
`Yes %` = 4, No = 5, `No %` = 6) %>%
mutate_each(funs(parse_num), -State)
head(gun_ownership)
## State Total Yes Yes % No No %
## 1 All Participants 201881 67786 31.7 134095 68.3
## 2 Alabama 2623 1294 51.7 1329 48.3
## 3 Alaska 2716 1627 57.8 1089 42.2
## 4 Arizona 3066 989 31.1 2077 68.9
## 5 Arkansas 2780 1431 55.3 1349 44.7
## 6 California* 3897 846 21.3 3051 78.7
The “*” annotation next to a state name indicates that the state possesses child access prevention laws, according to the website. Let’s annotate this properly.
gun_ownership = gun_ownership %>%
mutate(`Child access prevention` = grepl('\\*$', State),
State = sub('\\*$', '', State))
DC is called “The District”. Let’s fix that.
gun_ownership[gun_ownership$State == 'The District', 'State'] = 'District of Columbia'
Next, here’s the firearms deaths data.
# Website actively prevents scraping, but allows downloading data.
#gun_deaths_url = 'http://kff.org/other/state-indicator/firearms-death-rate-per-100000/'
#gun_deaths = readHTMLTable(gun_deaths_url)
# Use manually downloaded CSV output.
gun_deaths = read.csv('raw_data.csv', skip = 3) %>%
select(State = 1, `Deaths per 100000` = 2)
head(gun_deaths)
## State Deaths per 100000
## 1 United States 10.4
## 2 Alabama 17.6
## 3 Alaska 19.8
## 4 Arizona 14.1
## 5 Arkansas 16.8
## 6 California 7.7
Now merge the data.
data = inner_join(gun_ownership, gun_deaths, by = 'State') %>%
select(State,
`Gun ownership (%)` = `Yes %`,
`Child access prevention`,
`Deaths per 100000`)
head(data)
## State Gun ownership (%) Child access prevention Deaths per 100000
## 1 Alabama 51.7 FALSE 17.6
## 2 Alaska 57.8 FALSE 19.8
## 3 Arizona 31.1 FALSE 14.1
## 4 Arkansas 55.3 FALSE 16.8
## 5 California 21.3 TRUE 7.7
## 6 Colorado 34.7 FALSE 11.5
Plot the data to verify that it’s roughly the same as that published.
plot = ggplot(data, aes(x = `Gun ownership (%)`, y = `Deaths per 100000`)) +
geom_point()
plot
The general shape is certainly correct. A few states however have very different positions. Let’s add state names.
state_names_url = 'http://www.infoplease.com/ipa/A0110468.html'
state_names = readHTMLTable(state_names_url, header = TRUE, which = 1) %>%
{.[-1, ]} %>%
select(State = 1, Code = 3)
state_names[state_names$State == 'Dist. of Columbia', 'State'] = 'District of Columbia'
data = inner_join(data, state_names, by = 'State')
plot = plot %+% data
plot + geom_text(aes(x = `Gun ownership (%)` + 2, label = Code), size = 3)
This shows the correspondence better. The only clearly moved state is “Arizona” (AZ). Let’s look at the correlation now.
model = `Deaths per 100000` ~ `Gun ownership (%)`
fit = lm(model, data)
p = function (fit) format.pval(anova(fit)$`Pr(>F)`[1], digits = 10)
plot + geom_smooth(method = lm) +
annotate('text', x = 40, y = 3,
label = sprintf('R^2 == %0.2f', summary(fit)$r.squared), parse = TRUE)
So, a clear correlation at \(P\) = 1.263963147e-10 significance.
Now let’s examine these clusters. I’m using a gun ownership cut-off of 20% because that’s the clusters I initially perceived:
plot + aes(color = `Gun ownership (%)` < 20)
Here I’ll first look at their correlation and then try to explain the clusters.
data = data %>%
mutate(`Bottom left` = `Gun ownership (%)` < 20)
bottom_left_fit = lm(model, filter(data, `Bottom left`))
top_right_fit = lm(model, filter(data, ! `Bottom left`))
These correlate much less clearly (bottom left: \(R^2\) = 0.2911171, top right: \(R^2\) = 0.3596438) but the correlation in the top right cluster is still significant: \(P\) = 1.692435589e-05 (the bottom left cluster isn’t, \(P\) = 0.2113111988).
Let’s look at the clusters on a map:
us_map = map_data('state')
map = ggplot(data) +
geom_map(aes(map_id = tolower(State)), map = us_map) +
expand_limits(x = us_map$long, y = us_map$lat) +
coord_fixed()
map + aes(fill = `Gun ownership (%)` < 20)
… all states in the Northeast. In fact, if you don’t like guns, or don’t want to get murdered, simply don’t live in the Midwest:
map + aes(fill = `Gun ownership (%)`)
map + aes(fill = `Deaths per 100000`)
Finally let’s look at the states with child access prevention laws.
plot + geom_point(aes(color = `Child access prevention`))
… unsurprisingly, those are the states with low gun ownership, and fewer deaths. Unfortunately this once again doesn’t tell us whether these laws actually contribute to the lower number of guns (the exact opposite is in fact likely: states with a previously lower number of guns per capita have less incentive to oppose gun control laws, because they are less vested in gun rights).
Not much. The correlation between gun possession and gun deaths is real, and even when excluding the exceptionally sane Northeast states, the correlation persists (contrary to what I had assumed). What remains unclear is the cause. Whenever a similar plot gets posted online, the implicit claim is that there’s a causal relationship. But the correlation shows nothing of the sort.
A more comprehensive look at the evidence can be found on Skeptics Stack Exchange. The discussion there explains some of the problems with merely looking at such correlations. The studies linked there show one thing very clearly: there is no consensus. And if there’s any effect at all, it will be small in comparison to other factors such as poverty level etc.
This isn’t to say that (much) stricter gun control laws aren’t desirable, nor that they wouldn’t be effective. However, they are only a very small part of the story, and cross-state correlation is not good evidence to show their effectiveness.
In the end, I think a more compelling argument in favour of gun control laws is the following consideration: would you rather live in a society where citizens (up to 60%!) place so little trust in the system that they fully expect to have to defend themselves using extreme violence, or would you rather live in a society that is peaceful and civilised so that citizens can live with the expectation that risks to their lives are negligible and they do not constantly have to fear bodily harm? For me, the answer is clear.