1 Introduction

Using samples to make predictions about a population brings uncertainty into our data. As the study of risk and uncertainty, probability is therefore key to understanding statistics. We introduce the ideas here for describing and quantifying uncertainty via probabilities.

2 Introducing probability

Statistics is a powerful tool that allows us to use data to answer questions about the world around us. Last week on the course, we saw that we cannot generally collect data on every member of our population of interest, and so we must take a sample of the population. We then use the sample data to make inferences about the whole population. However, every time we take a different sample from the population, we will obtain different data, and hence a different estimate of whatever it is that we are interested in. One question then is: “How accurate is this estimate?” In order to answer this question, we need to understand the uncertainty in our data and in order to do this we need to understand probability.


2.1 Quantifying chance

Whenever we consider how likely it is that something will happen, we are thinking about chance. Probability is the area of maths that allows us to quantify and study chance.

In probability, we are interested in the likelihood of some event happening. An event is an outcome, or set of outcomes, of an experiment, or an observation, or set of observations, of a variable. If our experiment is rolling a single die, for example, one event is rolling a six. If our variable was the height of females in the UK, an event might be that a given woman is between 1.65 and 1.66 metres tall.

Probability assigns to an event a number between 0 and 1, indicating how likely it is that the event will occur. A probability of 0 means that the event is impossible and cannot happen, whilst a probability of 1 means that the event will certainly happen. Most events have a probability that is greater than 0, but less than 1, and the higher the probability, the more likely the event is to happen. A probability of one half means that an event is just as likely to happen as it is to not happen.

A probability can be expressed in several different, but equivalent, ways. For example, intuitively, the probability of rolling an even number on a fair die is 1/2 (we will explain in more detail how we obtain probabilities shortly). However, we could also write this in the following ways:

  • As a number: 1/2, or 0.5;
  • As a percentage: 50%;
  • As a chance: a 1 in 2 chance;
  • As a ratio: 1:1 (out of 2 outcomes, we expect 1 successful outcome and 1 unsuccessful outcome).

Similarly, consider the event of throwing a six on a fair die. The probability of this is 1/6, which we can again express in similar ways:

  • As a number: 1/6, or approximately 0.17;
  • As a percentage: approximately 17%;
  • As a chance: a 1 in 6 chance;
  • As a ratio: 1:5 (out of 6 outcomes, we expect 1 successful outcome and 5 unsuccessful outcomes).
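The conversions between these forms are simple arithmetic, and the following is a minimal sketch of them in Python. The function name `describe` and the dictionary keys are hypothetical names chosen for this illustration, and the percentage is rounded to the nearest whole number, as in the lists above.

```python
from fractions import Fraction


def describe(p: Fraction) -> dict:
    """Express a probability in the equivalent forms listed above."""
    successes = p.numerator
    failures = p.denominator - p.numerator
    return {
        "number": float(p),
        "percentage": f"{float(p) * 100:.0f}%",   # rounded to a whole percent
        "chance": f"{successes} in {p.denominator}",
        "ratio": f"{successes}:{failures}",        # successes : failures
    }


even = describe(Fraction(1, 2))  # rolling an even number
six = describe(Fraction(1, 6))   # rolling a six
print(even)
print(six)
```

Note that the ratio compares successful outcomes with unsuccessful ones, whereas the "1 in 6" form compares successful outcomes with all outcomes.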

The diagram below shows some examples of events, ordered according to their approximate probability.

Image 1: Probability of events

2.1.1 Questions

Mairi and Chris are playing a game, in which Mairi flips a fair coin three times. If Mairi gets three heads, she wins. Otherwise, Chris wins. The following are events associated with this game:

  • Mairi wins the game
  • Mairi gets either 1 head or 3 heads

In order of probability (from lowest to highest) the following events might occur:

  • Mairi does not finish the game because she spontaneously combusts
  • Mairi wins the game (flips only heads during the game)
  • Mairi loses the game (flips at least one tail during the game)

2.1.2 Notation for probabilities

When we are writing about probabilities, in order to keep our writing concise, we usually represent an event by a capital letter. For example, \(A\) might be the event that we roll a six on a fair die, and \(B\) might be the event that a woman chosen at random is between 1.65 and 1.66 metres tall. We use the capital letter \(P\) to denote probability and write the probability of an event as \(P\)(event). So, for example, \(P(A)\) is the probability of rolling a six on a fair die, and in this case we have that \(P(A)=1/6\). Similarly \(P(B)\) is the probability that a woman chosen at random is between 1.65 and 1.66 metres tall - we will look at how we may work out such a probability in week 5 of this course.

2.1.3 Understanding probabilities

Let \(A\) be the event that Mairi wins the game described above.

  • \(P(A)\) is the probability that Mairi wins the game

Mairi calculates that \(P(A)=1/8\). The following statements are true:

  • The probability of Mairi winning the game is 0.125
  • If Mairi plays the game 100 times we would expect her to win around 12 or 13 times.

Let \(B\) be the event that Mairi loses the game. Mairi calculates that \(P(B)\) is greater than \(P(A)\). The fact that \(P(B)\) is greater than \(P(A)\) means that Mairi is more likely to lose the game than to win it.
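The value \(P(A)=1/8\) can be checked by listing every possible sequence of three coin flips and counting those in which Mairi wins. The sketch below assumes that all eight sequences are equally likely (which is what a fair coin gives us); the helper name `prob` is introduced here purely for illustration.

```python
from fractions import Fraction
from itertools import product

# All equally likely sequences of three flips, e.g. ('H', 'T', 'H').
flips = list(product("HT", repeat=3))


def prob(event):
    """Fraction of flip sequences for which event(sequence) is true."""
    favourable = [f for f in flips if event(f)]
    return Fraction(len(favourable), len(flips))


p_win = prob(lambda f: f.count("H") == 3)              # event A: three heads
p_lose = prob(lambda f: "T" in f)                      # event B: at least one tail
p_one_or_three = prob(lambda f: f.count("H") in (1, 3))

print(p_win, p_lose, p_one_or_three)
print("Expected wins in 100 games:", float(100 * p_win))
```

Multiplying \(P(A)\) by 100 gives 12.5 expected wins, which is where "around 12 or 13 times" comes from.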

2.2 Where do probabilities come from?

Probabilities are important to statistics, as they provide a way of quantifying the uncertainty within our data. But where do these probabilities come from? There are two ways to obtain probabilities.

  1. Firstly, we can use what we know about an object or system to work out the probability in a logical way. For example, when rolling a fair die, we assume that the die is a cube, and that, by the symmetry of the cube, it is equally likely to land on each of the six faces. Therefore we deduce that the probability of rolling any given number is 1/6. We call such a probability a theoretical probability.
  2. Secondly, we can estimate the probabilities experimentally. For example, if we wanted to know the probability that a woman will be between 1.65 and 1.66 metres tall, we cannot deduce this logically, but we could measure the heights of a large number of women, and see how many are between 1.65 and 1.66 metres tall, and use this number to estimate the probability. We call such a probability an experimental probability.

Experimental data only allows us to estimate a probability, but often we cannot deduce the probability by a logical, theoretical argument, and an estimate is the best we can do. In many cases, finding an experimental probability can be thought of as analogous to taking a sample from the population and using the observed sample to estimate the (theoretical) probability.
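To illustrate the difference between the two kinds of probability, the sketch below estimates the probability of rolling a six experimentally, by simulating a large number of rolls, and compares the estimate with the theoretical value of 1/6. The function name `estimate_probability` and the fixed seed are assumptions made for this example; a real experiment would of course use a physical die or real measurements.

```python
import random


def estimate_probability(n_rolls, seed=0):
    """Estimate P(rolling a six) as the proportion of sixes in n_rolls simulated rolls."""
    rng = random.Random(seed)  # seeded so that the run is repeatable
    sixes = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 6)
    return sixes / n_rolls


theoretical = 1 / 6
estimate = estimate_probability(100_000)
print(f"theoretical: {theoretical:.4f}, experimental estimate: {estimate:.4f}")
```

Changing the seed or the number of rolls changes the estimate slightly, in the same way that a different sample from a population gives a different estimate.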

2.2.1 Finding probabilities

A hospital is interested in the probability of success of a new treatment. It calculates the probability based on previous success rates of the treatment. This is an example of an experimental probability.

Mairi wants to know the probability of picking a gummy bear out of her bag of pick ‘n’ mix. She empties the bag, counts the number of gummy bears, and counts the total number of sweets in the bag, to work out a probability. This is an example of a theoretical probability.

A theoretical probability could be found for the following events:

  • The event that you roll 6 dice and the numbers you get add up to 13.
  • The event that you win the jackpot in the National Lottery.

2.3 Paul the Octopus

In 2010 Paul the Octopus, a common octopus living in an aquarium in Oberhausen, Germany, made headlines when he correctly predicted the outcomes of eight World Cup football matches. Was Paul the Octopus an animal oracle, or did he just make a sequence of lucky guesses? Can we use probability to answer this question?


2.4 Calculating probabilities

To calculate a theoretical probability for an event, we first need to know what all the possible outcomes of the experiment are, or what values the variable can possibly take. The set of all possible outcomes of an experiment, or values a variable can take, is called the sample space. For example, when we are rolling a die, the sample space consists of six outcomes, one for each number that can be rolled. The sample space for the heights of women in the UK would consist of all heights that it is possible for a woman to be.

When each outcome of an experiment is equally likely, the probability of an event occurring can be calculated by dividing the number of outcomes contained within the event by the total number of outcomes of the experiment.

For example, say we are rolling a die, and want to work out the probability of rolling a 6. There are 6 outcomes in the sample space. Our event \(A\) is rolling a six, which is just one of these outcomes, and so the probability of rolling a six, denoted \(P(A)\), is 1 divided by 6, or 1/6, as our intuition tells us. If, instead, we wanted to work out the probability that we will roll a number bigger than 3, then our new event, say \(B\), of rolling a number bigger than 3, consists of three outcomes: rolling a 4, rolling a 5 or rolling a 6. There are still 6 possible outcomes in total, so the probability of rolling a number bigger than 3, denoted \(P(B)\), is given by 3 divided by 6, or 1/2.
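The counting rule above can be written as a short function. This is a minimal sketch that is only valid when every outcome in the sample space is equally likely; the names `probability`, `sample_space`, `A` and `B` mirror the text but the function itself is introduced here for illustration.

```python
from fractions import Fraction


def probability(event, sample_space):
    """Number of outcomes in the event divided by the total number of outcomes.

    Only valid when all outcomes in sample_space are equally likely.
    """
    return Fraction(len(event & sample_space), len(sample_space))


sample_space = {1, 2, 3, 4, 5, 6}            # outcomes of rolling one die
A = {6}                                      # rolling a six
B = {n for n in sample_space if n > 3}       # rolling a number bigger than 3

p_A = probability(A, sample_space)
p_B = probability(B, sample_space)
print(p_A, p_B)
```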

2.4.1 Choosing lunch

A restaurant offers a two-course set lunch menu. There are two options for the first course, soup or olives, and three options for the main course, pasta, chicken or haggis. Mairi decides to choose her lunch by randomly picking a starter, and then randomly picking a main course.

  • There are 6 possible combinations of starter and main course.
  • The probability of picking a combination that has pasta as the main course is 2 out of 6 or 1/3.
  • The probability of picking a combination that does not have soup or haggis is 2 out of 6 or 1/3.
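These answers can be checked by listing the sample space explicitly. The sketch below assumes, as in the question, that each starter and each main course is equally likely to be picked, so that all combinations are equally likely; the variable names are chosen for this example.

```python
from fractions import Fraction
from itertools import product

starters = ["soup", "olives"]
mains = ["pasta", "chicken", "haggis"]

# Every (starter, main) pair is one outcome in the sample space.
menu = list(product(starters, mains))

pasta = [(s, m) for s, m in menu if m == "pasta"]
no_soup_no_haggis = [(s, m) for s, m in menu if s != "soup" and m != "haggis"]

p_pasta = Fraction(len(pasta), len(menu))
p_no_soup_no_haggis = Fraction(len(no_soup_no_haggis), len(menu))
print(len(menu), p_pasta, p_no_soup_no_haggis)
```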

2.4.2 Mairi’s sweets

Mairi has counted the numbers of the different types of sweets in her bag of pick ‘n’ mix. The bag contains 20 sweets in total: 6 gummy bears, 3 gummy sour cherries, 4 chocolate mice, 5 fizzy cola bottles and 2 chocolate mini eggs. Mairi selects one sweet at random from the bag.

  • The probability that Mairi will select a fizzy cola bottle from the bag is 5 out of 20 or 1/4.
  • The probability that Mairi will select a chocolate (a chocolate mouse or a mini egg) from the bag is 6/20 or 30%.

Mairi eats four sweets out of the bag: a gummy sour cherry, a chocolate mouse, and 2 gummy bears. She then picks a fifth sweet at random from the remaining sweets.

  • The probability now that Mairi will select a chocolate from the bag is 5/16.
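The following sketch reproduces these calculations by keeping a count of each type of sweet and updating the counts after Mairi has eaten four of them. The helper `prob` and the constant `CHOCOLATES` are hypothetical names used only in this example.

```python
from collections import Counter
from fractions import Fraction

bag = Counter({
    "gummy bear": 6,
    "gummy sour cherry": 3,
    "chocolate mouse": 4,
    "fizzy cola bottle": 5,
    "chocolate mini egg": 2,
})
CHOCOLATES = {"chocolate mouse", "chocolate mini egg"}


def prob(bag, kinds):
    """Probability that one sweet picked at random from bag is of one of the given kinds."""
    total = sum(bag.values())
    return Fraction(sum(bag[k] for k in kinds), total)


p_cola = prob(bag, {"fizzy cola bottle"})
p_chocolate = prob(bag, CHOCOLATES)

# Remove the four sweets that Mairi eats, then recalculate.
eaten = Counter({"gummy sour cherry": 1, "chocolate mouse": 1, "gummy bear": 2})
remaining = bag - eaten
p_chocolate_after = prob(remaining, CHOCOLATES)
print(p_cola, p_chocolate, p_chocolate_after)
```

Both the number of chocolates and the total number of sweets change after eating, which is why the probability moves from 6/20 to 5/16.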

2.5 Horse racing

In this game, choose one of the horses, numbered between 2 and 12. Press the ‘Roll’ button to roll two dice. The numbers on the dice, and the sum of these two numbers, are shown in the top right-hand corner of the game window. The horse labelled with the number that is equal to this sum will move forward. Keep rolling the dice; the first horse to cross the line wins. To reset the game, press the reset button in the top right-hand corner of the game window.

Try playing the game a few times. Can you predict which horse will win? Is there a winning strategy? Why?

2.5.1 Exploring the probabilities

In this game, the probability of a horse winning is determined by how likely it is that that horse will move forward on each go. This is, in turn, determined by how likely it is that the horse’s number will come up as the total of the two numbers rolled on the dice. To work out which horse is most likely to win the game, we need to understand the probability of getting a given number as the total when we roll two dice.

2.5.2 Questions

Consider the simple experiment of rolling two six-sided dice, and noting down which numbers come up on each die. We set a single outcome to be the ordered pair \((a,b)\) where \(a\) corresponds to the number recorded on the first die and \(b\) to the number recorded on the second die.

  • There are 36 different outcomes. For each of the six outcomes on the first die, there are six possible outcomes on the second die, giving \(6\times6=36\) possible outcomes in total.

2.5.3 Calculating the probabilities

Each time we roll two dice, we can write down not just which numbers come up, but also the sum of the two numbers rolled. The set of possible combinations of numbers rolled on the two dice consists of ordered pairs of numbers \((a,b)\), where the first number in the pair, \(a\), corresponds to the number rolled on the first die, and the second, \(b\), to the number rolled on the second die. Note that, in this experiment, the order in which we roll the numbers matters: rolling a 1 and then a 2 is not considered to be the same as rolling a 2 and then a 1. In other words, \((1,2)\ne(2,1)\). For each pair, we can also calculate the sum of the two numbers rolled (i.e. \(a+b\)), which is equal to a number between 2 and 12. The table below shows the possible combinations of numbers rolled on the two dice, along with the corresponding sums of the two numbers.

Image 2: Possible combinations for two dice

From this table, we can see how many combinations lead to the two dice summing to a given number. For example, how many combinations lead to a sum of 3? The number 3 occurs twice in the table, meaning that there are two combinations of the dice-rolling experiment that give a total of 3: rolling 1 and then 2 (i.e. \((1,2)\)), or rolling 2 and then 1 (i.e. \((2,1)\)).

Each paired combination of numbers is equally likely. For example, \((1,2)\) (i.e. rolling a 1 and then a 2) is just as likely as \((6,6)\) (i.e. rolling a 6 and then another 6). Since the total number of combinations is 36, the probability of any given combination is then 1/36. We can use this fact to work out the probability that the sum of the two numbers rolled is equal to a given number by dividing the number of combinations leading to that given number by the total number of outcomes. For example, there are two ways of rolling a sum of 3, and so the probability of rolling a sum of 3 is 2 out of 36, or 1/18.
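The same counting can be carried out for every possible sum at once. The sketch below lists all 36 ordered pairs, counts how many pairs give each sum, and divides by 36; the variable names are chosen for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All ordered pairs (a, b): a is the first die, b is the second die.
outcomes = list(product(range(1, 7), repeat=2))

# Number of pairs that give each possible sum a + b.
sum_counts = Counter(a + b for a, b in outcomes)

# Each pair has probability 1/36, so divide each count by 36.
sum_probs = {
    total: Fraction(count, len(outcomes))
    for total, count in sorted(sum_counts.items())
}

for total, p in sum_probs.items():
    print(f"sum {total:2d}: {sum_counts[total]} of {len(outcomes)} pairs, probability {p}")
```

The sums near the middle of the range can be made in more ways than the sums at either end, which is worth bearing in mind when choosing a horse in the game above.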

3 Laws of probability

3.1 The birthday paradox

So far, we have calculated fairly simple probabilities, in cases where each outcome of the experiment is equally likely. What if this is not the case? There are several rules, called the laws of probability, that we can use to help us calculate more complex probabilities.


3.2 Constructing events

Most events that we are interested in are much more complex than those we have seen so far. To calculate more complex probabilities, we need to be able to construct more complex events out of simpler ones.

Say we have two events, \(A\) and \(B\). From these two events, we can create:

  • The event “\(A\) and \(B\)”, also written as \(A\cap{B}\): The event that both \(A\) and \(B\) happen;
  • The event “\(A\) or \(B\)”, also written as \(A\cup{B}\): The event that either \(A\) happens, \(B\) happens, or both of \(A\) and \(B\) happen;
  • The event “not \(A\)”, also written \(A'\) or \(A^\mathsf{c}\): The event that \(A\) does not happen, sometimes called the complement of \(A\).

For example, say that we are rolling a six-sided die, and that event \(A\) is rolling a number greater than 2 on the die, and event \(B\) is rolling an even number on the die. Then the event \(A \cap B\) is the event that we roll a number that is both even and greater than 2 (i.e. rolling a 4 or a 6). The event \(A \cup B\) is the event that we roll a number that is either greater than 2, even, or both greater than 2 and even (i.e. rolling a 2, 3, 4, 5 or 6). The event \(A'\) is the event that we roll a number that is not greater than 2 (i.e. rolling a 1 or a 2).

We can extend the notation above to consider more than two events. For example, if we have four events, \(A\), \(B\), \(C\) and \(D\), the event \(A \cap B \cap C \cap D\) is the event that all four of the events \(A\), \(B\), \(C\) and \(D\) happen.
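If we represent each event as the set of outcomes it contains, then “and”, “or” and “not” correspond directly to set intersection, union and complement. The sketch below uses Python's built-in sets to reproduce the die example above; the variable names are chosen for this illustration.

```python
sample_space = {1, 2, 3, 4, 5, 6}

A = {n for n in sample_space if n > 2}        # greater than 2
B = {n for n in sample_space if n % 2 == 0}   # even

a_and_b = A & B             # intersection: both A and B happen
a_or_b = A | B              # union: A happens, B happens, or both
not_a = sample_space - A    # complement: A does not happen

print(sorted(a_and_b), sorted(a_or_b), sorted(not_a))
```

The same operators chain for more than two events, for example `A & B & C & D` for four sets.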

3.3 The laws of probability

The laws of probability are useful tools for calculating the probabilities of more complex events. Before introducing the first law of probability though, we need to introduce the concept of independent events.

3.3.1 Independence

Two events, \(A\) and \(B\), are independent of one another if the probability that one of the events happens is unaffected by whether or not the other one happens. The events are dependent on one another if they are not independent of each other.

Consider the example where event \(A\) is rolling a number greater than 3 on a die, and event \(B\) is rolling an even number on the same die. These events are dependent on each other: if we roll a number greater than 3, then we have rolled a 4, 5 or 6, and so we have a 2/3 chance of having rolled an even number (as the numbers 4, 5 and 6 are equally likely to be rolled). However, if we do not roll a number greater than 3, then we have rolled a 1, 2 or 3, and so only have a 1/3 chance of having rolled an even number. Thus the probability of rolling an even number depends on whether or not we roll a number greater than 3.

On the other hand, if we are rolling two dice, and event \(A\) is that we roll a number greater than 3 on the first die, and event \(B\) is that we roll an even number on the second die, then the events \(A\) and \(B\) are independent, because the number we roll on the first die does not affect the probability of rolling each number on the second die.
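Both examples can be checked by counting. The sketch below works out the chance of \(B\) among the outcomes where \(A\) happened, and again among the outcomes where \(A\) did not happen; if the two values agree, knowing about \(A\) tells us nothing about \(B\). The helper name `chance_within` is hypothetical, and all outcomes are assumed to be equally likely.

```python
from fractions import Fraction
from itertools import product


def chance_within(event, given, space):
    """Proportion of the outcomes satisfying `given` that also satisfy `event`."""
    restricted = [o for o in space if given(o)]
    return Fraction(sum(1 for o in restricted if event(o)), len(restricted))


# One die: A = greater than 3, B = even (the dependent example).
one_die = list(range(1, 7))
single_if_a = chance_within(lambda n: n % 2 == 0, lambda n: n > 3, one_die)
single_if_not_a = chance_within(lambda n: n % 2 == 0, lambda n: n <= 3, one_die)

# Two dice: A = first die greater than 3, B = second die even (the independent example).
two_dice = list(product(range(1, 7), repeat=2))
pair_if_a = chance_within(lambda o: o[1] % 2 == 0, lambda o: o[0] > 3, two_dice)
pair_if_not_a = chance_within(lambda o: o[1] % 2 == 0, lambda o: o[0] <= 3, two_dice)

print("one die: ", single_if_a, single_if_not_a)
print("two dice:", pair_if_a, pair_if_not_a)
```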

3.3.2 The law of multiplication

For independent events, the probability that events \(A\) and \(B\) both occur is equal to the probability that event \(A\) happens multiplied by the probability that event \(B\) happens:

\[ P(A \text{ and } B) = P(A \cap B) = P(A) \times P(B) \]

This law can be extended for more than two independent events, as shown by Mairi in the previous video.
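As a worked check of this law, take the two-dice example from the discussion of independence, where \(A\) is rolling a number greater than 3 on the first die and \(B\) is rolling an even number on the second die. Each event contains three of the six numbers on its own die, so:

\[ P(A \cap B) = P(A) \times P(B) = \frac{3}{6} \times \frac{3}{6} = \frac{1}{4} \]

This agrees with counting outcomes directly: \(3 \times 3 = 9\) of the 36 equally likely ordered pairs satisfy both events, and \(9/36 = 1/4\).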

3.3.3 The law of addition

The probability that event \(A\) or event \(B\) occurs is equal to the probability that event \(A\) occurs plus the probability that event \(B\) occurs, minus the probability that both events \(A\) and \(B\) occur:

\[ P(A \cup B)=P(A)+P(B)-P(A \cap B) \]

We say that two events are mutually exclusive if they cannot both happen. For example, we cannot roll both an even and an odd number on a single die, and so the events ‘rolling an odd number’ and ‘rolling an even number’ are mutually exclusive. If two events are mutually exclusive, then by definition the probability of them both happening is zero (i.e. it is impossible that both events can happen). This allows us to simplify the law of addition: If \(A\) and \(B\) are mutually exclusive events, then:

\[ P(A \cup B)=P(A)+P(B) \]

since \(P(A \cap B) = 0\).
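The law of addition can be checked on a single die by counting outcomes. The sketch below uses the events “greater than 2” and “even” for the general law, and “odd” and “even” for the mutually exclusive case; the helper `prob` is a hypothetical name, and all six outcomes are assumed to be equally likely.

```python
from fractions import Fraction

SAMPLE_SPACE = frozenset(range(1, 7))


def prob(event):
    """Probability of an event (a set of die outcomes) on one fair die."""
    return Fraction(len(event), len(SAMPLE_SPACE))


# General law: A = greater than 2, B = even; these events overlap.
A = {n for n in SAMPLE_SPACE if n > 2}
B = {n for n in SAMPLE_SPACE if n % 2 == 0}
direct = prob(A | B)
by_law = prob(A) + prob(B) - prob(A & B)

# Mutually exclusive case: odd and even cannot both happen.
odd = {1, 3, 5}
even = {2, 4, 6}
exclusive_direct = prob(odd | even)
exclusive_by_law = prob(odd) + prob(even)

print(direct, by_law, exclusive_direct, exclusive_by_law)
```

Without the subtraction, the outcomes 4 and 6 would be counted twice, once as part of \(A\) and once as part of \(B\).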

3.3.4 The law of subtraction

We often call the event \(A'\) (“not \(A\)”) the complementary event to \(A\). This is sometimes written as \(A^\mathsf{c}\). The probability that event \(A\) will occur is equal to 1 minus the probability that event \(A\) will not occur:

\[ P(A)=1-P(A') \]

Can you see how the law of subtraction can be derived from the above laws? As a hint, note that \(A\) and \(A'\) are mutually exclusive, but also one of them has to happen.
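One possible derivation, following the hint, is sketched here. Since \(A\) and \(A'\) are mutually exclusive, the simplified law of addition applies, and since one of them must happen, the event \(A \cup A'\) is certain and so has probability 1:

\[ 1 = P(A \cup A') = P(A) + P(A') \quad\Longrightarrow\quad P(A) = 1 - P(A') \]

For instance, if \(A\) is the event of rolling a six on a fair die, then \(P(A') = 5/6\) and so \(P(A) = 1 - 5/6 = 1/6\).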

3.4 Venn diagrams

3.4.1 Favourite animals

We have surveyed a group of first year statistics students on their favourite animals, in an attempt to see what characteristics make an animal more appealing to students. 42% of students responded saying they favoured an animal that was fluffy, while 28% of students responded saying they favoured an animal that had big teeth. 4% of students responded saying they favoured an animal with both of these characteristics. We can use these proportions to estimate the probability that a student will favour an animal that is fluffy, has big teeth, or is fluffy with big teeth.

How can we present this information so that it can be easily understood? The image below shows a Venn diagram of the probabilities that a student’s favourite animal will be fluffy, have big teeth, or be fluffy and have big teeth. Venn diagrams were introduced by the English philosopher John Venn in the late 19th century as a way of expressing statements in the theory of logic, but are nowadays a common way of presenting simple proportions or probabilities.

Image 3: Venn diagram

The left-hand circle represents the probability that a student will favour a fluffy animal, 42%, and the circle on the right represents the probability that a student will favour an animal with big teeth, 28%. The overlap of the circles indicates the probability that a student will favour an animal with both characteristics, 4%. The part of the left-hand circle that is not also in the right-hand circle represents the probability that a student will favour an animal that is fluffy but doesn’t have big teeth, 38% (=42%-4%). Similarly, the part of the right-hand circle that is not also in the left-hand circle represents the probability that a student will favour an animal that has big teeth but is not fluffy, 24% (=28%-4%).

Whenever we have two events, \(A\) and \(B\), we can create a Venn diagram to illustrate the probabilities \(P(A)\), \(P(B)\), and \(P(A \text{ and } B)\) (or, equivalently, \(P(A \cap B)\)): The full left-hand circle represents \(P(A)\), the full right-hand circle represents \(P(B)\) and the overlap represents \(P(A \cap B)\). Similarly, we can represent the probability \(P(A \text{ but not }B)\), or equivalently \(P(A \cap B')\), by the part of the left-hand circle that is not also inside the right-hand circle, and the probability \(P(B \text{ but not }A)\), or equivalently \(P(B \cap A')\), by the part of the right-hand circle that is not also inside the left-hand circle.

3.4.2 Venn diagrams and the addition law

Earlier this week, we introduced the law of addition: \(P(A \text{ or } B)=P(A)+P(B)-P(A \text{ and }B)\) or, equivalently, \(P(A\cup B)=P(A)+P(B)-P(A\cap B)\). Can you see how to work this out from the Venn diagram?
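As a numerical check using the favourite animals survey, adding the three separate regions of the Venn diagram gives the same answer as the law of addition. Here \(F\) stands for favouring a fluffy animal and \(T\) for favouring an animal with big teeth (these labels are introduced only for this illustration):

\[ P(F \cup T) = 0.38 + 0.04 + 0.24 = 0.66 \]

\[ P(F) + P(T) - P(F \cap T) = 0.42 + 0.28 - 0.04 = 0.66 \]

Adding \(P(F)\) and \(P(T)\) counts the overlap twice, which is why it has to be subtracted once.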

3.5 Probability trees

Probability trees are a useful way of allowing us to see and compare all the possible outcomes of an experiment. They are particularly useful when the outcomes we are interested in occur after several different events have happened, and these events are not independent of one another, so that the outcome of one stage of the experiment affects the probabilities at later stages.

To illustrate the idea, we will take a simple example in which we have five counters in a bag, of which two are blue and three are red. We randomly choose two counters from the bag, one after the other, and wish to investigate the probability of drawing two red counters, two blue counters, or one red counter and one blue counter. To calculate these probabilities, it is useful to draw a probability tree in which we consider the possible outcomes of drawing the first counter, then the possible outcomes of drawing the second counter (given the counter we picked in our first draw), and the associated probabilities of these different events. For this example we obtain the following probability tree:

Image 4: Probability tree

The circle on the far left-hand side represents the initial state of the experiment; in our case, we have three red counters and two blue counters. The two branches of the tree leaving this initial circle represent the two outcomes of the first part of our experiment, where we pick one counter out of the bag. The event ‘picking a blue counter on the first go’ is represented by the top branch, and the probability of this happening, 2/5, is shown next to the branch (there are two blue counters out of a total of five counters). The event ‘picking a red counter on the first go’ is represented by the lower branch, and the probability of this happening, 3/5, is shown next to this branch. Both branches lead to new circles, which represent the state of the experiment after one counter is drawn from the bag at random.

Each of the two new states has two branches emerging from it. Each branch represents the outcome of having picked either a blue or a red counter on the second turn, and the probability of each outcome is shown next to the branch. The four circles on the far right-hand side of the tree represent the final states of the experiment.

Each path through the tree represents a unique outcome of the experiment. The probability shown on each branch already takes into account the outcomes earlier on the path (the second-draw probabilities depend on which counter was drawn first), and so the probability of the outcome at the end of a given path is calculated by multiplying together the probabilities on each branch of that path. Additionally, the events corresponding to the different possible paths are mutually exclusive, and so the probability of one or another of them happening is easily found by summing the probabilities corresponding to the individual paths.
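The calculation along the paths can be sketched in code. The function below walks through the tree for the counters example, assuming that the first counter is not put back in the bag before the second is drawn, so that the second-draw probabilities are worked out from the counters that remain. The function name `draw_two` is hypothetical.

```python
from fractions import Fraction


def draw_two(red, blue):
    """Return the probability of each (first, second) path through the tree."""
    counts = {"red": red, "blue": blue}
    total = red + blue
    paths = {}
    for first in counts:
        p_first = Fraction(counts[first], total)
        # One counter of the first colour has been removed from the bag.
        remaining = dict(counts)
        remaining[first] -= 1
        for second in remaining:
            p_second = Fraction(remaining[second], total - 1)
            # Multiply the probabilities along the path.
            paths[(first, second)] = p_first * p_second
    return paths


paths = draw_two(red=3, blue=2)
p_two_red = paths[("red", "red")]
p_two_blue = paths[("blue", "blue")]
# Two different paths give one counter of each colour, so their probabilities are summed.
p_one_each = paths[("red", "blue")] + paths[("blue", "red")]
print(p_two_red, p_two_blue, p_one_each)
```

The four path probabilities add up to 1, as they should, since exactly one of the four paths must occur.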

3.6 Exploring the birthday paradox

In the earlier video, Mairi and Chris looked at the probability that two people in a room would share a birthday (ignoring leap years). Naturally, this probability varies depending on how many people there are in the room. However, the probability is generally a lot higher than people might intuitively think it should be, which is why this problem has become known as the birthday paradox.

This interactive graph allows you to explore the birthday paradox. The horizontal axis represents the number of people in a room, and the vertical axis represents the probability of two people in such a room sharing a birthday. You can move the slider in the centre of the graph to change the number of people in the room, and see how this affects the probability of two people in the room sharing a birthday. This probability can be read off the vertical axis, but is also shown in the centre of the graph, below the slider.

If you are wondering how these probabilities are calculated, you may want to refer back to the earlier video in which Mairi calculates the probability that two people in a room of 23 people will share a birthday: The method used can be extended to calculate the probability for any number of people in a room. Explanations of how these probabilities can be calculated can also be found online, for example in this article on the Statistics by Jim website.
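For readers who would like to reproduce the numbers on the graph, the following is a minimal sketch of one standard way of doing the calculation; it is not necessarily the exact method used in the video. It assumes that there are 365 equally likely birthdays and that people's birthdays are independent of one another. It first works out the probability that everyone has a different birthday, using the law of multiplication, and then applies the law of subtraction. The function name `p_shared_birthday` is hypothetical.

```python
def p_shared_birthday(n_people, days=365):
    """Probability that at least two of n_people share a birthday.

    Assumes `days` equally likely birthdays and independence between people.
    """
    p_all_different = 1.0
    for k in range(n_people):
        # Person k + 1 must avoid the k birthdays already taken.
        p_all_different *= (days - k) / days
    return 1 - p_all_different


for n in (5, 10, 19, 23, 30, 50):
    print(f"{n:2d} people: {p_shared_birthday(n):.3f}")
```

With 23 people the probability is just over one half, which is the value that most people find surprising.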

3.7 Olympic birthdays

In August 2016, the British tabloid newspaper the Daily Mail, amongst other UK newspapers, reported on the fact that four of the greatest British athletes of all time were born on the 23rd March (you can see the web-based article on their website here). Is this a sign of some paranormal link between birthdays and sporting ability, or is it all a huge coincidence? What is the probability that four of the greatest athletes would be born on the same day?

The activity above should make you wonder just how low this probability is. Let’s consider British athletes who have claimed 3 or more Olympic gold medals. There are 19 of them, and so we know that the probability of two of them sharing a birthday is nearly 38%. Of course, the probability that four will share a birthday is considerably lower. This exact figure is much harder to compute, but we can approximate it to be around 1 in 400. This might not seem like a high probability, but it is not zero. And we know that very unlikely events do happen, every day - a product of the fact that we live in a huge world where lots of different events are happening or could be considered all the time.

3.8 Making predictions

One of the key things we want to do using probability is make predictions. Consider, for example, the horse racing game that we looked at earlier in this section. On each go, two dice are rolled, and the two numbers that are rolled are then added together. The horse whose number equals this sum moves forward. On any given go, the probability that a horse moves forward depends on how likely it is that the two numbers rolled on the dice will add up to its number. These probabilities are listed in the table below.

Image 5: Horse game probabilities

We can see from this table, for example, that the probability of rolling a sum of 7 is 0.17, or 17%. We can use this probability to predict that if we were to roll the two dice 100 times, we should roll a sum of 7 roughly 17% of the time, and so around 17 of these rolls should give us a total of 7. However, in practice, this does not mean that if we roll the two dice 100 times we will definitely get a sum of 7 on 17 of these rolls.

The interactive graph below shows the outcome of a simulated experiment in which two dice are rolled a given number of times, and on each dice roll, the two numbers are added together and recorded. The horizontal axis shows the possible sums of the two dice, the numbers 2 to 12, and the vertical axis shows the number of times each possible sum is obtained. The darker histogram shows the results of the simulated experiment, while the paler histogram in the background shows the predicted number of times each number should be obtained, based on the probabilities in the above table.

Moving the slider at the top of the graph allows you to change the number of times the two dice are rolled in the experiment. Pressing the ‘Repeat’ button repeats the experiment. You may be interested to see how the outcome of the experiment varies as you vary the number of dice rolls, or as you repeat the experiment.

Since the outcome of each dice roll is random, the results of such an experiment will nearly always vary if the experiment is repeated. Therefore, the probabilities cannot tell us exactly what will happen, only what is most likely to happen.

Note that one interpretation of probability is that it is equal to the proportion of times an event will occur if we were to repeat the given experiment an infinite number of times.
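A simulated experiment of the kind described above can be sketched in a few lines. The code below rolls two simulated dice a given number of times, counts how often each sum occurs, and compares these counts with the numbers predicted by the probabilities. The function names and the fixed seeds are assumptions made for this example and are not taken from the interactive graph.

```python
import random
from collections import Counter


def simulate_sums(n_rolls, seed=None):
    """Roll two dice n_rolls times and count how often each sum occurs."""
    rng = random.Random(seed)
    return Counter(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_rolls))


def expected_counts(n_rolls):
    """Predicted number of times each sum occurs, using the 36 equally likely pairs."""
    ways = Counter(a + b for a in range(1, 7) for b in range(1, 7))
    return {total: n_rolls * count / 36 for total, count in ways.items()}


observed = simulate_sums(100, seed=1)
expected = expected_counts(100)
for total in range(2, 13):
    print(f"sum {total:2d}: observed {observed[total]:3d}, predicted {expected[total]:5.1f}")

# With many more rolls, the observed proportion settles close to the probability.
many = simulate_sums(200_000, seed=2)
proportion_of_sevens = many[7] / 200_000
print(f"proportion of sevens in 200,000 rolls: {proportion_of_sevens:.4f}")
```

Changing the seed plays the role of the ‘Repeat’ button: the observed counts change from run to run, while the predicted counts stay the same.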

4 Conditional probability