I am not a gun-owner, nor do I ever really plan to own a gun. However, like most of us, I have been forced to think about gun violence recently, particularly in the context of mass shootings. I often find myself discussing this issue with friends who are gun-owners, and the conversation typically gravitates toward assault weapons. What is it about assault rifles (like the AR-15) that makes them particularly deadly?

One intuitive answer that I’ve heard is that the .223 round typically used in an AR-15 has a lot of power and not much recoil. That seems very plausible to me, and it is pretty easy to test.

So the main goal of this project was to find some data about the speed and force of various guns and types of ammo, and make a basic scatterplot of muzzle energy (a combination of bullet mass and the speed at which it leaves the gun) and recoil energy.

There should be an overall positive correlation between these two, but if the .223 is particularly powerful for its recoil, it should stand out as a clear outlier.

Dataset

I pulled the dataset from this page. Thanks to Chuck Hawks for unknowingly lending me 236 datapoints.

I did some initial cleaning and took out brand names in Excel. Here’s what the dataset looks like:

library(dplyr); library(ggplot2)  # %>%, mutate(), filter() and the plots
data <- read.csv("BallisticsData.csv")[, 1:5]

Our first few rows of data:
Caliber	bulletWeight	MuzzleVel	RifleWeight	RecoilEnergy
0.170	17	2550	7.5	0.2
0.170	25	4000	8.5	1.6
0.204	33	4225	8.5	2.6
0.218	45	2800	8.5	1.3
0.219	55	3300	8.5	3.2

Note that the data includes multiple rows for the same caliber. Since starting this project, I’ve learned that different companies make different bullets at the same caliber, and those bullets differ somewhat. For example, while the .17 HMR and the .17 Hornet are both a little over .17 inches in diameter, the Hornet’s bullet is slightly heavier (20 grains compared to 17). Hawks also lists data for different rifles, which will come into play later when we look at rifle weight.

High Muzzle Energy, Low Recoil?

Next I calculated muzzle energy (rounded to the nearest foot-pound, ft-lb) using a formula from this site.

# muzzle energy in ft-lb: grains * (ft/s)^2 / 450240
data$MuzzleEnergy <- round(data$bulletWeight * data$MuzzleVel^2 / 450240)
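The constant 450240 folds the unit conversions into the kinetic-energy formula E = ½mv²: bullet weight in grains is divided by 7000 to get pounds, then by g = 32.16 ft/s² to get slugs, and 2 × 7000 × 32.16 = 450240. Here is a minimal base-R sketch of that conversion. The helper name `muzzle_energy_ftlb` is hypothetical and not part of the original analysis.

```r
# Hypothetical helper (not from the original post): muzzle energy in ft-lb
# from bullet weight in grains and muzzle velocity in ft/s, via E = 1/2 m v^2.
muzzle_energy_ftlb <- function(grains, fps, g = 32.16) {
  mass_slugs <- (grains / 7000) / g  # grains -> pounds -> slugs
  0.5 * mass_slugs * fps^2
}

# The combined constant used above
2 * 7000 * 32.16
#> [1] 450240

# A 55-grain bullet at 3300 ft/s (the last preview row above)
round(muzzle_energy_ftlb(55, 3300))
#> [1] 1330
```

Because the formula divides by g, the result is in foot-pounds of force. It matches `bulletWeight * MuzzleVel^2 / 450240` up to floating-point error.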

The correlation between muzzle energy and recoil energy is strong and positive. This makes intuitive sense (i.e., every action has an equal and opposite reaction). Here’s the scatterplot:

data %>%
    ggplot(aes(x = MuzzleEnergy, y = RecoilEnergy)) + theme_light() +
    geom_point(alpha=.3, colour = "#1215E4") +
    labs(x ='Muzzle Energy (ft-lb)', y = 'Recoil Energy (ft-lb)', title='Muzzle & Recoil Energy (r = .87)')
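The r = .87 in the title is typed in by hand. The sketch below computes the Pearson correlation with base R’s `cor()` and builds the title string from it, so the two cannot drift apart. To stay self-contained it uses only the five preview rows shown earlier, so its r will not match the full-data value of .87. With the full dataset you would call `cor(data$MuzzleEnergy, data$RecoilEnergy)` instead.

```r
# The five preview rows from above, entered by hand for illustration only
preview <- read.table(header = TRUE, text = "
Caliber bulletWeight MuzzleVel RifleWeight RecoilEnergy
0.170   17           2550      7.5         0.2
0.170   25           4000      8.5         1.6
0.204   33           4225      8.5         2.6
0.218   45           2800      8.5         1.3
0.219   55           3300      8.5         3.2
")
preview$MuzzleEnergy <- round(preview$bulletWeight * preview$MuzzleVel^2 / 450240)

# Pearson correlation, formatted into the plot title instead of hard-coding it
r <- cor(preview$MuzzleEnergy, preview$RecoilEnergy)
plot_title <- sprintf("Muzzle & Recoil Energy (r = %.2f)", r)
```

The resulting string can then be passed to `labs(title = plot_title)`.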

It looks like there are a couple of very extreme outliers here; maybe those are the .223 rounds? Let’s add color to find out.

# factor to identify .223 (for color in plots)
data$is223 <- rep('No', nrow(data))
data$is223[which(data$Caliber == .223)] <- 'Yes'

data %>%
    ggplot(aes(x = MuzzleEnergy, y = RecoilEnergy)) +  theme_light() +
    geom_point(aes(color = is223, alpha = is223, size = is223)) +
    scale_colour_manual(name = ".223?", values = c("#1215E4", "#E41212")) +
    scale_size_manual(name = ".223?", values = c(1,3)) +
    scale_alpha_manual(name = ".223?", values = c(.3, 1)) +
    labs(x='Muzzle Energy (ft-lb)', y='Recoil Energy (ft-lb)', title="Muzzle & Recoil Energy (.223 highlighted)")

Well, that’s a little underwhelming. The big outlier with over 10,000 ft-lb of muzzle energy isn’t a .223, and the (large, red) .223 data points don’t look like outliers in terms of the relationship between muzzle energy and recoil energy.

One possibility is that weight is a factor here; some of these guns are huge (30 lb!).

Weights in this dataset come in increments of 0.25 lb, which is pretty fine-grained for our purposes. Let’s just look at a histogram of the weights with one-pound bins.

data %>%
    ggplot(aes(x = RifleWeight)) +  theme_light() +
    geom_histogram(binwidth = 1, fill = '#1215E4', alpha = .3, color = "#1215E4") +
    labs(x='Rifle Weight (lb)', y = 'Count', title='Distribution of Rifle Weights')

So the 30 lb gun is the clear outlier here, and most of the distribution is between 7 and 9 lb. Let’s break the scatterplot into facets by weight to see whether, controlling for weight, the .223s look more like outliers.

data %>%
    ggplot(aes(x = MuzzleEnergy, y = RecoilEnergy)) + theme_light() +
    geom_point(aes(colour = is223), alpha = .3) +
    scale_colour_manual(name = ".223?", values = c("#1215E4", "#E41212")) +
    facet_grid(factor(RifleWeight) ~.) +
    labs(x = 'Muzzle Energy (ft-lb)', y='Recoil Energy (ft-lb)', title='Faceted Scatterplot')

Yikes, too many facets! Let’s look at a restricted weight range, and round to the nearest half pound.

data %>%
    mutate(RoundedWeight = plyr::round_any(RifleWeight, accuracy = .5)) %>%  # round_any() lives in plyr
    filter(RoundedWeight >= 6.5 & RoundedWeight <= 10) %>%
    ggplot(aes(x = MuzzleEnergy, y = RecoilEnergy)) + theme_light() +
    geom_point(aes(colour=is223), alpha = .3) +
    scale_colour_manual(name = ".223?", values = c("#1215E4", "#E41212")) +
    facet_grid(RoundedWeight ~ .) +
    labs(x = 'Muzzle Energy (ft-lb)', y='Recoil Energy (ft-lb)', title='Faceted Scatterplot (6.5-10 lb)')
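plyr’s `round_any(x, accuracy)` divides `x` by `accuracy`, rounds with base R’s `round()`, and scales back. Because the weights come in 0.25 lb steps, every quarter-pound weight lands exactly on a .5 tie. R’s `round()` sends such ties to the even integer (the IEC 60559 rule), so a 7.25 lb rifle goes into the 7 lb facet while a 7.75 lb rifle goes into the 8 lb facet. Here is a small base-R sketch on a hypothetical vector of weights, with a round-half-up alternative for comparison:

```r
# Hypothetical weights in 0.25 lb steps, as in the dataset
w <- c(7.00, 7.25, 7.50, 7.75, 8.00, 8.25, 8.50, 8.75)

# Same arithmetic as plyr::round_any(w, 0.5); round() sends exact .5 ties
# to the even integer, so 7.25 -> 7.0 but 7.75 -> 8.0
half_even <- round(w / 0.5) * 0.5

# Round-half-up alternative: every quarter-pound tie goes to the heavier bin
half_up <- floor(w / 0.5 + 0.5) * 0.5

data.frame(w, half_even, half_up)
```

This only changes which facet a rifle is drawn in. The regression later in the post uses the unrounded weights.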

This is a bit clearer, and we can see a couple of patterns. First, there is a fairly clear relationship between rifle weight and muzzle energy: heavier guns (facets near the bottom) tend to be more powerful (further to the right). That makes sense, though the causation probably runs the other way; manufacturers likely make more powerful guns heavier to keep recoil from getting out of hand.
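The physics behind that trade-off is conservation of momentum. The rifle picks up the same momentum p as the bullet (plus the powder gases), and the kinetic energy of a mass M carrying momentum p is p²/(2M). So for a given load, doubling the rifle’s weight halves its recoil energy. Below is a minimal sketch of bullet-only free recoil. The helper name is hypothetical, and ignoring the powder charge is an assumption that makes this a lower bound. For the 55-grain, 3300 ft/s, 8.5 lb preview row it gives about 1.23 ft-lb, compared with the 3.2 ft-lb listed in the table.

```r
# Hypothetical helper: bullet-only free recoil energy in ft-lb.
# Assumes the rifle absorbs exactly the bullet's momentum (powder gases ignored).
bullet_recoil_ftlb <- function(bullet_grains, fps, rifle_lb, g = 32.16) {
  p <- (bullet_grains / 7000 / g) * fps  # bullet momentum, slug * ft/s
  rifle_slugs <- rifle_lb / g
  p^2 / (2 * rifle_slugs)                # E = p^2 / (2M)
}

bullet_recoil_ftlb(55, 3300, 8.5)  # about 1.23 ft-lb
bullet_recoil_ftlb(55, 3300, 17)   # twice the rifle weight, half the recoil

# Equivalent form: muzzle energy scaled by bullet weight / rifle weight
(55 * 3300^2 / 450240) * (55 / 7000) / 8.5
```

The last line shows that, under this simplification, recoil per foot-pound of muzzle energy depends on the ratio of bullet weight to rifle weight. Light bullets fired from heavy rifles therefore produce relatively little recoil.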

The second pattern is that, regardless of rifle weight, the .223 rifles don’t seem to be outliers in the muzzle energy vs. recoil energy plot. Let’s take a closer look by dropping the facets without .223s and adding a regression line. Just eyeballing it, I would guess that the .223 dots are pretty close to a best-fit line.

data %>%
    mutate(RoundedWeight = plyr::round_any(RifleWeight, accuracy = .5)) %>%  # round_any() lives in plyr
    filter(RoundedWeight >= 7 & RoundedWeight <= 8.5) %>%
    ggplot(aes(x = MuzzleEnergy, y = RecoilEnergy)) + theme_light() +
    geom_point(aes(colour=is223), alpha = .3) +
    scale_colour_manual(name = ".223?", values = c("#1215E4", "#E41212")) +
    facet_grid(RoundedWeight ~ .) +
    stat_smooth(method = "lm", formula = y ~ x, aes(group = factor(RoundedWeight)), linewidth = .25) +
    labs(x = 'Muzzle Energy (ft-lb)', y='Recoil Energy (ft-lb)', title='Recoil/Muzzle Energy, Faceted by Weight')

For the rifles weighing 7-8 lb, the .223 points fall under the regression line, meaning those rifles have less recoil than we would expect given their muzzle energy. But they don’t stand out as outliers, and for the rifles weighing 8-9 lb the .223 points sit right on the line (or slightly above it).

It might be helpful to zoom in on the relevant part of the data for added context (although it shouldn’t change whether the .223 is an aberration). On the page with his recoil data table, Hawks mentions that there are military standards for acceptable recoil energy. According to Hawks, most military service rifles produce less than 15 ft-lb of recoil energy, and many shooters develop an involuntary flinch at around 20 ft-lb.

So let’s filter the data to a maximum of 20 ft-lb of recoil energy and zoom in.

d <- data %>%
    mutate(RoundedWeight = plyr::round_any(RifleWeight, accuracy = .5)) %>%  # round_any() lives in plyr
    filter(RoundedWeight >= 7 & RoundedWeight <= 8.5 & RecoilEnergy <= 20)

d %>%
    ggplot(aes(x = MuzzleEnergy, y = RecoilEnergy)) + theme_light() +
    geom_point(aes(colour=is223), alpha = .3) +
    scale_colour_manual(name = ".223?", values = c("#1215E4", "#E41212")) +
    facet_grid(RoundedWeight ~ .) +
    stat_smooth(method = "lm", formula = y ~ x, aes(group = factor(RoundedWeight)), linewidth = .25) +
    labs(x = 'Muzzle Energy (ft-lb)', y = 'Recoil Energy (ft-lb)',
         title="Muzzle and Recoil Energy (Recoil Energy <= 20 ft-lb)")

Here we see that, once the guns with a ton of recoil are removed, the three .223 rifles weighing less than 8.5 lb sit below the regression line (less recoil than expected), while the 8.5 lb one sits right on top of it.

To my eyes it doesn’t look like an outlier, but we can test this a little more objectively by calculating the vertical distance between each point and the regression line. Based on the eyeball test, I don’t expect the .223 points to have a particularly large error, but at least we can put that error in the context of the overall spread around the regression line.

Distribution of model residuals

First I’ll fit a linear regression predicting recoil energy from muzzle energy and rifle weight. Note that although I binned the weights to rounded values for the plot facets, it’s better to use the actual weights in the regression.

d$resids <- lm(RecoilEnergy ~ MuzzleEnergy + RifleWeight, data = d)$residuals

The residuals from our regression give the distance and direction of each point’s error (the observed value minus the value predicted by the regression). Let’s z-score these residuals to put individual residuals in the context of the overall distribution. Then we’ll plot a histogram of the z-scores.

d <- d %>%
    mutate(z_resids = resids/sd(resids))
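Dividing the residuals by their standard deviation without subtracting the mean works here because a least-squares fit with an intercept produces residuals that average to zero. As a result, `resids / sd(resids)` matches a full z-score. Below is a quick check on R’s built-in `mtcars` data, used only because it ships with R; it is not the ballistics data.

```r
# Illustration on built-in data, not the ballistics dataset
fit <- lm(mpg ~ hp + wt, data = mtcars)
res <- resid(fit)

mean(res)  # zero up to floating-point error, because the model has an intercept

z_simple <- res / sd(res)              # what the post does
z_full   <- as.numeric(scale(res))     # subtract the mean, then divide by sd
all.equal(as.numeric(z_simple), z_full)

# Alternative not used in the post: residuals scaled by each point's leverage
z_student <- rstandard(fit)
```

`rstandard()` divides each residual by an estimate of its own standard error, which enlarges the residuals of high-leverage points that pull the fitted line toward themselves. It could serve as a cross-check on the simpler z-scores used in the histograms.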

d %>%
    ggplot(aes(x = z_resids)) + theme_light() +
    geom_vline(xintercept = -1, colour = '#1215E4', alpha = .3, linetype = "dashed") +
    geom_vline(xintercept = 1, colour = '#1215E4', alpha = .3, linetype = "dashed") +
    geom_histogram(binwidth = .25, fill = "#1215E4", alpha = .3, color = "#1215E4") +
    coord_cartesian(xlim=c(-4,4)) +
    labs(x='Z-Scored Residuals', y = 'Count', 
        title = 'Histogram of Z-Scored Residuals', 
        subtitle ='Recoil ~ Energy + Weight', 
        caption = 'Dashed lines indicate +/- 1 standard deviation from mean') +
    scale_x_continuous(breaks = -4:4)

A couple of visual notes: I added dashed lines to show +/- 1 standard deviation. Also, the blue line along the bottom of the histogram trails off to the right because one outlier (not a .223) sits extremely far above the regression line (Z = 7.37). You can see it in the scatterplot above at the left end of the 7.5 lb facet, at around 20 ft-lb of recoil.

If the .223 values are outliers in any real sense, I would expect them to fall far more than 1 SD below the mean (in the realm of Z = -2 or -3). They won’t be there, because there aren’t any data points there. But let’s add color to identify the .223 values anyway.

d %>%
    ggplot(aes(x = z_resids)) + theme_light() +
    geom_vline(xintercept = -1, colour = '#1215E4', alpha = .3, linetype = "dashed") +
    geom_vline(xintercept = 1, colour = '#1215E4', alpha = .3, linetype = "dashed") +
    geom_histogram(aes(fill = is223),binwidth = .25, alpha = .3, color = "#1215E4") +
    scale_fill_manual(name = ".223?", values = c("#1215E4", "#E41212")) +
    coord_cartesian(xlim=c(-4,4)) +
    labs(x='Z-Scored Residuals', y = 'Count', 
        title = 'Histogram of Z-Scored Residuals', 
        subtitle ='Recoil ~ Energy + Weight', 
        caption = 'Dashed lines indicate +/- 1 standard deviation from mean') +
    scale_x_continuous(breaks = -4:4)

We can see that all four of the .223 rifles fall below the regression line; they all have lower recoil than expected for their weight and muzzle energy. However, they are not outliers in any real sense. Half of the .223 rifles in this dataset have residuals within one standard deviation of the regression line.
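As a rough benchmark, if the residuals were approximately normal, about 68% of all points would fall within ±1 SD of the line, and only about 2% would fall below Z = -2. The reference values come from base R’s `pnorm()`. The helper below, whose name is hypothetical, computes the share of a flagged group that falls inside the band:

```r
# Standard normal reference values
pnorm(1) - pnorm(-1)  # share within +/- 1 SD
#> [1] 0.6826895
pnorm(-2)             # share below Z = -2
#> [1] 0.02275013

# Hypothetical helper: share of flagged points whose |z| is at most k
share_within <- function(z, flag, k = 1) {
  mean(abs(z[flag]) <= k)
}
```

With the data frame `d` from above, `share_within(d$z_resids, d$is223 == "Yes")` would return the share of .223 rifles within one standard deviation of the regression line.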

TL;DR:

Maybe there is something about the .223 round, or about AR-15 rifles specifically, that makes them particularly deadly. However, if there really is something qualitatively different, it is not the relationship between muzzle energy and recoil.

Recoil and muzzle energy: Is the .223 round an outlier?
