Introduction

M&Ms are colorful, fun and make for great data

I bet when you open a bag of M&Ms, you look at a few of them in your hand, admire the pretty color variation that exists, and just eat them. Yes, I used to do that, too. But in March 2014, I spilled a bag of M&Ms onto my desk and noticed something odd. I used a tool from my days as a Six Sigma Black Belt, created a Pareto Chart, and snapped this picture:

This Bag

Wait a minute! Only one brown? Why aren’t they more even?

To make a long story short, I posted this on Facebook, and started routinely making “M&M Pareto Charts” and posting them to an album. A friend joined in and posted ones with Peanut M&Ms. As of today, we have logged 88 bags of M&Ms in this manner.

And recently, I compiled the posts into a data set.

M&Ms and the questions I might ask about colors

But back to the candy!

I became fascinated by what I was seeing because I saw so much variation. I mean, the naive person might just assume that every bag had (roughly) the same amount of each color. You could imagine a manufacturing process that did that, right? But could you imagine a manufacturing process that wasn’t predictable in terms of the distribution of colors? That’s what I was seeing!

As I looked at these pictures, I came to be curious about a number of things:

  • how often is there only one of one color?
  • how often does a bag have only five of the six colors?
  • does blue really dominate the counts, as it seems to?
  • how does the color distribution differ by size of bag?
    • are smaller bags more variable?
    • are smaller bags more likely to have missing colors?
  • do Peanut and Plain M&Ms have similar color distributions?

Now, let’s get to analyzing this data!

Data Analysis

Bag Sizes

One thing that is clearly missing from the data collection is the Bag Size.

M&Ms come in different sizes. There’s a pretty typical size (which I call Regular), and there’s the size you most often see around Halloween (a small bag called Fun Size – I like to joke that it would be more fun if it were bigger).

This gives us a pretty good first glance at the counts of M&Ms in the bags in our data set. We see there are at least four sizes. After studying those distributions at length, I classified all the bags.

This table shows how many bags we’ve collected by Kind and Size.

Count of Bags by Kind and Size

Kind             BagSize          BagCount
Peanut           Fun Size         1
Peanut           Regular          25
Peanut Butter    Regular          1
Plain            Fun Size         25
Plain            Really Big       1
Plain            Regular          29
Plain            Sharing Size     6
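
If you want to tabulate your own bags the same way, here is a minimal sketch in Python. It is not the code behind the table above, and the records and the field names (kind, size) are made-up placeholders rather than rows from my data set.

```python
from collections import Counter

# Hypothetical bag records (illustrative only, not the real data set).
bags = [
    {"kind": "Plain", "size": "Regular"},
    {"kind": "Plain", "size": "Regular"},
    {"kind": "Plain", "size": "Fun Size"},
    {"kind": "Peanut", "size": "Regular"},
]

# Count how many bags fall into each (kind, size) pair.
bag_counts = Counter((bag["kind"], bag["size"]) for bag in bags)

# Print one row per pair, sorted by kind and then size.
for (kind, size), count in sorted(bag_counts.items()):
    print(f"{kind:<17}{size:<17}{count}")
```

A Counter keyed on a (kind, size) tuple keeps the sketch free of extra libraries; a data-frame group-by would do the same job on a bigger data set.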

Color Analysis: Maximums and Minimums

Let’s get back to the questions I raised up front and start to explore the color distributions in the bags of M&Ms, shall we?

First, the maximums and minimums in a bag.

The maximum proportion tells us the proportion that the most numerous color represents in a given bag. If, say, 40% of the bag is Yellow M&Ms, that might be the max proportion for that bag (40%). It tells us about concentration in a given bag versus more even distribution.
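
To make that definition concrete, here is a small Python sketch that works out the maximum proportion for a single bag (the same idea, flipped, gives the minimum proportion). The counts are invented for illustration, chosen so that Yellow comes out at 40%; they are not from a real bag.

```python
# Hypothetical color counts for one bag (illustrative only).
# A color that is missing from the bag should be recorded as 0.
bag = {"Yellow": 8, "Blue": 4, "Orange": 3, "Green": 2, "Red": 2, "Brown": 1}

total = sum(bag.values())
proportions = {color: count / total for color, count in bag.items()}

# The most and least numerous colors, by share of the bag.
max_color = max(proportions, key=proportions.get)
min_color = min(proportions, key=proportions.get)

print(f"max proportion: {max_color} at {proportions[max_color]:.0%}")
print(f"min proportion: {min_color} at {proportions[min_color]:.0%}")
```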

So that starts to tell us that there’s a pretty wide range. The maximum color proportion seems to vary from as little as 20% to as much as 55%! It’s also the case that for the smaller bags, that variability is greater (Fun Size, Plain, has that wide range), whereas the bigger bags have smaller top ends (Regular, about 40%; Sharing Size, less than 30%). That seems to suggest that the bigger the bag, the less likely it is to have a large proportion of one color.

Now, we look at the minimum proportions.

This looks a little more odd. Take Plain:Fun Size, for example: it seems like a lot of the bags have missing colors (look at the tall bar at 0.00). This is also the case for Regular:Peanut, which, as we saw earlier, has about the same number of M&Ms as a Plain:Fun Size bag does.

It’s also the case that, though there is some variation, it is not nearly as wide as for the maximum. The minimums range from 0% to a bit over 10%.

Color Analysis: Missing Colors

M&Ms come in six colors, as we saw in the pictures and graphs at the beginning. How often do we see bags that have fewer than six colors?

I find this very interesting! The smallest bags, Plain:Fun Size, have the highest frequency of missing a color. While I don’t have a lot of Plain:Sharing Size in the data set, there are none with missing colors. So the larger the count of M&Ms in the bag, the lower the chance of a missing color (and vice versa?).
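
Counting bags with a missing color is simple to sketch in Python. The three bags below are invented for illustration, and missing_colors is just a helper name I made up for this example.

```python
COLORS = ["Blue", "Brown", "Green", "Orange", "Red", "Yellow"]

def missing_colors(bag):
    """Return the colors that do not appear in the bag at all."""
    return [color for color in COLORS if bag.get(color, 0) == 0]

# Hypothetical small bags (illustrative only).
bags = [
    {"Blue": 5, "Brown": 2, "Green": 3, "Orange": 4, "Red": 1, "Yellow": 2},
    {"Blue": 6, "Green": 4, "Orange": 5, "Red": 2, "Yellow": 1},  # no Brown
    {"Blue": 3, "Brown": 4, "Green": 0, "Orange": 6, "Red": 2, "Yellow": 3},
]

# A bag counts as "missing a color" if the helper returns anything.
bags_missing = sum(1 for bag in bags if missing_colors(bag))
share_missing = bags_missing / len(bags)

print(f"{bags_missing} of {len(bags)} bags are missing a color ({share_missing:.0%})")
```

The helper treats a color as missing whether it is absent from the record or recorded as zero, since either can happen when bags are logged by hand.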

Color Analysis: The lonely, Brown M&M

Remember that bag of M&Ms that set me down the path of collecting this data? At last, let’s check the data set to see how often that happens!

Well! Guess what? That bag, that bag with the lonely, brown M&M, was the only Plain:Regular bag in the data set that had an M&M color with just one in the bag. And that’s out of 29 bags!
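
Checking a bag for a "lonely" color could look something like this in Python. The counts are hypothetical (they add up to 55, but they are not the actual counts from my photo), and lonely_colors is a name invented for this sketch.

```python
def lonely_colors(bag):
    """Return the colors that appear exactly once in the bag."""
    return [color for color, count in bag.items() if count == 1]

# Hypothetical Plain:Regular bag (illustrative only).
bag = {"Blue": 14, "Orange": 12, "Green": 11, "Yellow": 9, "Red": 8, "Brown": 1}

print(lonely_colors(bag))
```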

This is what got me so curious, and for me, the next step is to determine the probability of this happening from a pure probability-theory perspective.

Probability of the lonely

On average, a Plain, Regular-Sized M&M bag (the kind that had the lonely, brown M&M) has about 55 M&Ms in it. Let’s assume that any given M&M is equally likely to be each of the six colors (1/6). Let’s start with Brown.

  • The chance of drawing a brown M&M out of the bag on any one try is 1/6.
  • The chance of not drawing a brown, then, is 5/6, each try.
  • We can use a binomial distribution to calculate the chance of there being one and only one Brown in 55 tries: 55 × (1/6) × (5/6)^(55-1) = 0.000472206
  • This applies to it happening for one particular color. Since we are asking about any color, not just Brown, the chance is roughly six times that number; hence, the probability is 0.2832%
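
The steps above can be sketched in a few lines of Python. This assumes exactly 55 M&Ms per bag and an even 1/6 chance per color. With n set to exactly 55 the single-color result comes out near 0.00049, a touch above the figure quoted above; my guess (and it is only a guess) is that the quoted figure reflects an unrounded average count.

```python
from math import comb

n = 55       # M&Ms in the bag (the rounded average from the text)
p = 1 / 6    # assumed chance that any one M&M is a given color

def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n tries."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Chance of exactly one Brown in the bag.
p_one_brown = binom_pmf(1, n, p)

# Approximation: add up the same chance for each of the six colors.
# This slightly overcounts bags in which two colors are both lonely.
p_any_lonely = 6 * p_one_brown

print(f"exactly one Brown: {p_one_brown:.6f}")
print(f"any lonely color (approx.): {p_any_lonely:.4%}")
```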

This is different from the 3.4% (1 bag out of 29) that I observed. Statistically different? It is if the p-value is quite small, and: p-value = 0.0136038

Yes! What I saw was something I would ordinarily expect to see only once in about 353 bags!
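
One rough way to sanity-check a number like that is to simulate a lot of bags and count how many have a lonely color. The Python sketch below assumes exactly 55 M&Ms per bag and six equally likely colors, which is a simplification rather than how real bags are filled; the seed and the number of simulated bags are arbitrary choices.

```python
import random

COLORS = ["Blue", "Brown", "Green", "Orange", "Red", "Yellow"]

def has_lonely_color(n_candies, rng):
    """Fill one simulated bag; report whether any color appears exactly once."""
    bag = rng.choices(COLORS, k=n_candies)
    return any(bag.count(color) == 1 for color in COLORS)

rng = random.Random(2014)   # fixed seed so the run is repeatable
n_bags = 50_000

hits = sum(has_lonely_color(55, rng) for _ in range(n_bags))
rate = hits / n_bags

print(f"{hits} of {n_bags} simulated bags had a lonely color ({rate:.3%})")
```

Under these assumptions the simulated rate should land in the neighborhood of a few tenths of a percent, the same ballpark as the hand calculation.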

Conclusion

Let’s review my original questions and how they turned out:

  • how often is there only one of one color?

    For a Regular-sized, Plain bag of M&Ms, it’s pretty unusual! However, for bags with smaller counts, it can happen pretty regularly. I even ran the probability calculations to see if the rate is higher or lower than it “should be”, and what I found was that my lonely, brown M&M was from a very rare bag.

  • how often does a bag have only five of the six colors?

    This is also pretty unusual for Regular, Plain M&Ms, but again, with smaller-count bags (Fun Size, or Peanut Regular Size), it can happen more often.

  • does blue really dominate the counts, as it seems to?

    Blue did show up more frequently than any other color, and it was significantly more than the least frequent color (Red). But it was not dominant overall.

  • how does the color distribution differ by size of bag?
    • are smaller bags more variable?

    In the sense that smaller bags are more likely to have missing colors, yes. But in the sense that a larger-count bag can have a lot more of one color, no.

    • are smaller bags more likely to have missing colors?

    Yes!

  • do Peanut and Plain M&Ms have similar color distributions?

    Yes, and that was one of the more surprising findings (the last chart).

I hope you had some fun thinking about the delicious candy and the things we could do with data visualization in analyzing the colors! I know this was fun for me.

Last point: more data? Would you consider adding to the data set? All you have to do is:

  • Open a bag of M&Ms
  • Sort them by color
  • Take a picture
  • Upload to a public image hosting site
  • Summarize the bag information at this Google Sheet

When I get a lot more data, I can update this analysis and see if my relatively small sample holds true to more bags. I especially need a diversity of bag sizes across Plain, Peanut and Peanut Butter!