I bet when you open a bag of M&Ms, you look at a few of them in your hand, admire the pretty color variation that exists, and just eat them. Yes, I used to do that, too. But in March 2014, I spilled a bag of M&Ms onto my desk and noticed something odd. I used a tool from my days as a Six Sigma Black Belt and created a Pareto Chart, and snapped this picture:
This Bag
Wait a minute! Only one brown? Why aren’t they more even?
To make a long story short, I posted this on Facebook, and started routinely making “M&M Pareto Charts” and posting them to an album. A friend joined in and posted ones with Peanut M&Ms. As of today, we have logged 88 bags of M&Ms in this manner.
And recently, I compiled the posts into a data set.
But back to the candy!
I became fascinated by what I was seeing because I saw so much variation. I mean, the naive person might just assume that every bag had (roughly) the same amount of each color. You could imagine a manufacturing process that did that, right? But could you imagine a manufacturing process that wasn’t predictable in terms of the distribution of colors? That’s what I was seeing!
As I looked at these pictures, I came to be curious about a number of things:
Now, let’s get to analyzing this data!
One thing that is clearly missing from the data collection is the Bag Size.
M&Ms come in different sizes. There’s a pretty typical size (which I call Regular), and there’s the size you most often see around Halloween (a small bag called Fun Size – I like to joke that it would be more fun if it were bigger).
This gives us a pretty good first glance at the counts of M&Ms in the bags in our data set. We see there are at least four sizes. After studying those distributions at length, I classified every bag by size.
This table shows how many bags we’ve collected by Kind and Size.
Kind | BagSize | BagCount |
---|---|---|
Peanut | Fun Size | 1 |
Peanut | Regular | 25 |
Peanut Butter | Regular | 1 |
Plain | Fun Size | 25 |
Plain | Really Big | 1 |
Plain | Regular | 29 |
Plain | Sharing Size | 6 |
Let’s get back to the questions I raised up front and start to explore the color distributions in the bags of M&Ms, shall we?
First, the maximum and minimum proportions in a bag.
The maximum proportion tells us the proportion that the most numerous color represents in a given bag. If, say, 40% of the bag is Yellow M&Ms, that might be the max proportion for that bag (40%). It tells us about concentration in a given bag versus more even distribution.
So that starts to tell us that there’s a pretty wide range. The maximum color proportion seems to vary from as little as 20% to as much as 55%! It’s also the case that for the smaller bags, that variability is greater (Fun Size, Plain, has that wide range), whereas the bigger bags have smaller top ends (Regular, about 40%; Sharing Size, less than 30%). That seems to suggest that the bigger the bag, the less likely it is to have a large proportion of one color.
Now, we look at the minimum proportions.
This looks a little more odd. Take Plain:Fun Size, for example: it seems like a lot of the bags have missing colors (look at the tall bar at 0.00). This is also the case for Regular:Peanut, which, as we saw earlier, has about the same number of M&Ms as a Plain:Fun Size bag does.
It’s also the case that, though there is some variation, it is not nearly as wide as for the maximum. The minimums range from 0% to a bit over 10%.
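To make the two statistics above concrete, here is a minimal Python sketch. It computes the maximum and minimum color proportions for a single bag, then simulates bags of a few sizes to show how those numbers move with the count. The example bag, the bag sizes, and the equal-odds color model are all assumptions made for illustration; none of them come from the data set.

```python
import random
from collections import Counter

COLORS = ["Blue", "Brown", "Green", "Orange", "Red", "Yellow"]

def max_min_proportions(counts):
    """Return (max proportion, min proportion) for one bag.

    counts maps color -> number of M&Ms; a color absent from the bag counts as 0.
    """
    total = sum(counts.values())
    shares = [counts.get(color, 0) / total for color in COLORS]
    return max(shares), min(shares)

def simulate_bag(n, rng):
    """Draw n candies with every color equally likely (an assumption)."""
    return Counter(rng.choice(COLORS) for _ in range(n))

def average_max_min(n, trials=2000, seed=0):
    """Average the max and min proportions over many simulated bags of size n."""
    rng = random.Random(seed)
    results = [max_min_proportions(simulate_bag(n, rng)) for _ in range(trials)]
    avg_max = sum(r[0] for r in results) / trials
    avg_min = sum(r[1] for r in results) / trials
    return avg_max, avg_min

# Hypothetical bag of 20 with no Brown at all.
example = {"Blue": 8, "Orange": 5, "Yellow": 4, "Green": 2, "Red": 1}
print(max_min_proportions(example))

# Hypothetical bag sizes, small to large.
for n in (20, 55, 100):
    avg_max, avg_min = average_max_min(n)
    print(f"n={n}: average max={avg_max:.3f}, average min={avg_min:.3f}")
```

In the simulation, smaller bags come out with a higher average maximum and a lower average minimum, which is the same direction the histograms point.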
M&Ms come in six colors, as we saw in the pictures and graphs at the beginning. How often do we see bags that have fewer than six colors?
I find this very interesting! The smallest bags, Plain:Fun Size, have the highest frequency of missing a color. While I don’t have a lot of Plain:Sharing Size in the data set, there are none with missing colors. So the larger the count of M&Ms in the bag, the lower the chance of a missing color (and vice versa?).
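The link between bag count and missing colors can also be sketched from probability alone. The Python snippet below assumes every M&M is equally likely to be any of the six colors (a simplification, not a measured fact) and uses inclusion-exclusion to get the chance that at least one color is absent from a bag of `n` candies. The function name and the bag sizes are hypothetical, chosen only for illustration.

```python
from math import comb

def prob_missing_color(n, k=6):
    """P(at least one of k equally likely colors is absent from n candies).

    Inclusion-exclusion over the number j of colors forced to be absent.
    """
    return sum(
        (-1) ** (j + 1) * comb(k, j) * ((k - j) / k) ** n
        for j in range(1, k + 1)
    )

# Hypothetical bag counts, from very small to large.
for n in (10, 15, 20, 30, 55):
    print(f"n={n}: P(missing a color) = {prob_missing_color(n):.4f}")
```

Under this model the probability falls quickly as the count grows, which fits the pattern in the data. The real bags may well miss colors more often than the model predicts if the colors are not equally likely.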
Remember that bag of M&Ms that set me down the path of collecting this data? At last, let’s check the data set to see how often that happens!
Well! Guess what? That bag, that bag with the lonely, brown M&M, was the only Plain:Regular bag in the data set that had an M&M color with just one in the bag. And that’s out of 29 bags!
This is what got me so curious, and for me, the next step is to determine the probability of this happening from a pure probability-theory perspective.
On average, a Plain, Regular-sized M&M bag (the kind that had the lonely, brown M&M) has about 55 M&Ms in it. Let’s assume each M&M is equally likely to be any of the six colors (probability 1/6). Let’s start with Brown. The probability of getting exactly one Brown in 55 tries:
55 x (1/6) x (5/6)^(55-1) = 0.000472206
The same holds for each of the six colors, so the chance that some color shows up exactly once is roughly six times that: 0.2832%.
This is different from the 3.4% (1 bag in 29) in the data set. Statistically different? It is if the p-value is quite small, and: p-value = 0.0136038
Yes! What I saw was something I would ordinarily expect to see only once in about 353 bags!
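For anyone who wants to rerun the arithmetic, here is a minimal Python sketch of the calculation above. It uses the binomial formula with a bag count rounded to exactly 55 and the same equal-odds (1/6) assumption. Multiplying by six to cover all colors is my reading of how the 0.2832% figure was reached, and it is only an approximation. Because the count is rounded here, the printed values should land near the figures quoted above rather than exactly on them.

```python
from math import comb

def prob_exactly_k(n, k, p):
    """Binomial probability of exactly k successes in n independent tries."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n = 55           # rounded average count for a Plain, Regular bag
p_color = 1 / 6  # assumption: all six colors equally likely

p_one_brown = prob_exactly_k(n, 1, p_color)

# Rough approximation: treat each of the six colors as a separate chance.
p_any_color = 6 * p_one_brown

print(f"P(exactly one Brown)              = {p_one_brown:.6f}")
print(f"P(some color appears exactly once) ~ {p_any_color:.4%}")
print(f"Roughly one bag in {1 / p_any_color:.0f}")
```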
Now is the time to look at the data set as a whole, to see what colors come up most.
My general impression that I was seeing a lot of Blue was not that far off. Also, we see a lot of Orange and Yellow, but considerably fewer of the three other colors.
Hey, remember the iconic image that is on the cover of the 1979 Joy Division Album, Unknown Pleasures?
Turns out, that’s called a Ridgeline plot, and we can look at the color distribution the same way:
This shows how much sharper the modes are for the three least numerous colors, while the more numerous ones have flatter and wider distributions. It seems that Blue, Orange and Green all had more bags with high proportions.
It would be one thing if I were the only person seeing the “lots of Blue” phenomenon, but since my collaborator, BP, was the one responsible for the Peanut M&Ms, let’s see if the distributions are the same with a different graphical technique.
This graph does show us the slight variation of colors across the data set. I would have thought that the predominance of Blue I was finding would not be found in the Peanut bags. But here it shows that the Blue was similarly numerous in the Peanut bags, too. I find that very interesting!
Let’s review my original questions and how they turned out:
how often is there only one of one color?
For a Regular-sized, Plain bag of M&Ms, it’s pretty unusual! However, for bags with smaller counts, it can happen pretty regularly. I even ran the probability calculations to see if the rate is higher or lower than it “should be”, and what I found was that my lonely, brown M&M was from a very rare bag.
how often does a bag have only five of the six colors?
This is also pretty unusual for Regular, Plain M&Ms, but again, with smaller-count bags (Fun Size, or Peanut Regular Size), it can happen more often.
does blue really dominate the counts, as it seems to?
Blue did show up more frequently than any other color, and it was significantly more than the least frequent color (Red). But it was not dominant overall.
In the sense that smaller bags are more likely to have missing colors, yes. But in the sense that a larger-count bag can have a lot more of one color, no.
Yes!
do Peanut and Plain M&Ms have similar color distributions?
Yes, and that was one of the more surprising findings (the last chart).
I hope you had some fun thinking about the delicious candy and the things we could do with data visualization in analyzing the colors! I know this was fun for me.
Last point: more data? Would you consider adding to the data set? All you have to do is:
When I get a lot more data, I can update this analysis and see whether the patterns in my relatively small sample hold up across more bags. I especially need a diversity of bag sizes across Plain, Peanut and Peanut Butter!