On Twitter, somebody claimed that

guess what, gun ownership correlates with gun deaths in the US https://t.co/PmOMEOnMy1 pic.twitter.com/LYFtpHGeNy

— Arthur Charpentier (@freakonometrics) June 19, 2015

But looking at the data, I wasn’t convinced:

@freakonometrics @neilhall_uk Uhm, no. Clearly two clusters, and next to no correlation within each cluster.

— Konrad Rudolph (@klmr) June 19, 2015

I decided to actually look at the data and verify that the “clusters” with Gun ownership < 20% and > 20%, respectively, were indeed internally uncorrelated.

Plotting the data

As a first step, I retrieve the data and put it into a usable format. The first table answers the question,

Are any firearms now kept in or around your home? Include those kept in a garage, outdoor storage area, car, truck, or other motor vehicle?

gun_ownership_url = 'http://www.washingtonpost.com/wp-srv/health/interactives/guns/ownership.html'
gun_ownership = readHTMLTable(gun_ownership_url, header = TRUE, which = 1)
gun_ownership = gun_ownership[-1, ]

parse_num = function (x) as.numeric(sub(',', '', x))
gun_ownership = select(gun_ownership, State = 1, Total = 2, Yes = 3,
                       `Yes %` = 4, No = 5, `No %` = 6) %>%
    mutate_each(funs(parse_num), -State)

head(gun_ownership)
##              State  Total   Yes Yes %     No No %
## 1 All Participants 201881 67786  31.7 134095 68.3
## 2          Alabama   2623  1294  51.7   1329 48.3
## 3           Alaska   2716  1627  57.8   1089 42.2
## 4          Arizona   3066   989  31.1   2077 68.9
## 5         Arkansas   2780  1431  55.3   1349 44.7
## 6      California*   3897   846  21.3   3051 78.7

The “*” annotation next to a state name indicates that the state possesses child access prevention laws, according to the website. Let’s annotate this properly.

gun_ownership = gun_ownership %>%
    mutate(`Child access prevention` = grepl('\\*$', State),
           State  = sub('\\*$', '', State))

DC is called “The District”. Let’s fix that.

gun_ownership[gun_ownership$State == 'The District', 'State'] = 'District of Columbia'

Next, here’s the firearms deaths data.

# Website actively prevents scraping, but allows downloading data.
#gun_deaths_url = 'http://kff.org/other/state-indicator/firearms-death-rate-per-100000/'
#gun_deaths = readHTMLTable(gun_deaths_url)
# Use manually downloaded CSV output.
gun_deaths = read.csv('raw_data.csv', skip = 3) %>%
    select(State = 1, `Deaths per 100000` = 2)

head(gun_deaths)
##           State Deaths per 100000
## 1 United States              10.4
## 2       Alabama              17.6
## 3        Alaska              19.8
## 4       Arizona              14.1
## 5      Arkansas              16.8
## 6    California               7.7

Now merge the data.

data = inner_join(gun_ownership, gun_deaths, by = 'State') %>%
    select(State,
           `Gun ownership (%)` = `Yes %`,
           `Child access prevention`,
           `Deaths per 100000`)
head(data)
##        State Gun ownership (%) Child access prevention Deaths per 100000
## 1    Alabama              51.7                   FALSE              17.6
## 2     Alaska              57.8                   FALSE              19.8
## 3    Arizona              31.1                   FALSE              14.1
## 4   Arkansas              55.3                   FALSE              16.8
## 5 California              21.3                    TRUE               7.7
## 6   Colorado              34.7                   FALSE              11.5

Plot the data to verify that it’s roughly the same as that published.

plot = ggplot(data, aes(x = `Gun ownership (%)`, y = `Deaths per 100000`)) +
    geom_point()

plot

The general shape is certainly correct. A few states however have very different positions. Let’s add state names.

state_names_url = 'http://www.infoplease.com/ipa/A0110468.html'
state_names = readHTMLTable(state_names_url, header = TRUE, which = 1) %>%
    {.[-1, ]} %>%
    select(State = 1, Code = 3)

state_names[state_names$State == 'Dist. of Columbia', 'State'] = 'District of Columbia'

data = inner_join(data, state_names, by = 'State')
plot = plot %+% data
plot + geom_text(aes(x = `Gun ownership (%)` + 2, label = Code), size = 3)

This shows the correspondence better. The only clearly moved state is “Arizona” (AZ). Let’s look at the correlation now.

model = `Deaths per 100000` ~ `Gun ownership (%)`
fit = lm(model, data)
p = function (fit) format.pval(anova(fit)$`Pr(>F)`[1], digits = 10)
plot + geom_smooth(method = lm) +
    annotate('text', x = 40, y = 3,
             label = sprintf('R^2 == %0.2f', summary(fit)$r.squared), parse = TRUE)

So, a clear correlation at \(P\) = 1.263963147e-10 significance.

Clusters

Now let’s examine these clusters. I’m using a gun ownership cut-off of 20% because that’s the clusters I initially perceived:

plot + aes(color = `Gun ownership (%)` < 20)

Here I’ll first look at their correlation and then try to explain the clusters.

data = data %>%
    mutate(`Bottom left` = `Gun ownership (%)` < 20)

bottom_left_fit = lm(model, filter(data, `Bottom left`))
top_right_fit = lm(model, filter(data, ! `Bottom left`))

These correlate much less clearly (bottom left: \(R^2\) = 0.2911171, top right: \(R^2\) = 0.3596438) but the correlation in the top right cluster is still significant: \(P\) = 1.692435589e-05 (the bottom left cluster isn’t, \(P\) = 0.2113111988).

Let’s look at the clusters on a map:

us_map = map_data('state')
map = ggplot(data) +
    geom_map(aes(map_id = tolower(State)), map = us_map) +
    expand_limits(x = us_map$long, y = us_map$lat) +
    coord_fixed()
map + aes(fill = `Gun ownership (%)` < 20)

… all states in the Northeast. In fact, if you don’t like guns, or don’t want to get murdered, simply don’t live in the Midwest:

map + aes(fill = `Gun ownership (%)`)

map + aes(fill = `Deaths per 100000`)

Child access prevention laws

Finally let’s look at the states with child access prevention laws.

plot + geom_point(aes(color = `Child access prevention`))

… unsurprisingly, those are the states with low gun ownership, and fewer deaths. Unfortunately this once again doesn’t tell us whether these laws actually contribute to the lower number of guns (the exact opposite is in fact likely: states with a previously lower number of guns per capita have less incentive to oppose gun control laws, because they are less vested in gun rights).

What does this tell us?

Not much. The correlation between gun possession and gun deaths is real, and even when excluding the exceptionally sane Northeast states, the correlation persists (contrary to what I had assumed). What remains unclear is the cause. Whenever a similar plot gets posted online, the implicit claim is that there’s a causal relationship. But the correlation shows nothing of the sort.

A more comprehensive look at the evidence can be found on Skeptics Stack Exchange. The discussion there explains some of the problems with merely looking at such correlations. The studies linked there show one thing very clearly: there is no consensus. And if there’s any effect at all, it will be small in comparison to other factors such as poverty level etc.

This isn’t to say that (much) stricter gun control laws aren’t desirable, nor that they wouldn’t be effective. However, they are only a very small part of the story, and cross-state correlation is not good evidence to show their effectiveness.

In the end, I think a more compelling argument in favour of gun control laws is the following consideration: would you rather live in a society where citizens (up to 60%!) place so little trust in the system that they fully expect to have to defend themselves using extreme violence, or would you rather live in a society that is peaceful and civilised so that citizens can live with the expectation that risks to their lives are negligible and they do not constantly have to fear bodily harm? For me, the answer is clear.


Source as a Gist