Flipping coins in R
Flipping a coin in R
rbinom(1, 1, .5)
#[1] 1
Flipping multiple coins
rbinom(10, 1, .5)
# [1] 0 1 1 0 1 1 1 0 1 0
rbinom(1, 10, .5)
# [1] 4
rbinom(10, 10, .5)
# [1] 3 6 5 7 4 8 5 6 4 5
Unfair coins
rbinom(10, 10, .8)
# [1] 6 7 9 10 7 7 8 9 9 8
rbinom(10, 10, .2)
# [1] 2 2 1 2 2 4 3 1 0 2
Binomial distribution
\(X_{1 \ldots n} \sim Binomial(size, p)\)
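For reference, base R's rbinom(n, size, prob) maps directly onto this notation:
# n    - number of random draws (trials) to generate
# size - number of coins flipped within each draw
# prob - probability each coin comes up heads
rbinom(5, 10, .5)  # five draws from a Binomial(size = 10, p = .5)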
Simulating coin flips
In these exercises, you’ll practice using the rbinom() function, which generates random “flips” that are either 1 (“heads”) or 0 (“tails”).
# Generate 10 separate random flips with probability .3
rbinom(10, 1, .3)
## [1] 0 1 0 0 0 0 1 0 1 0
That’s much easier than actually flipping all those coins!
Simulating draws from a binomial
In the last exercise, you simulated 10 separate coin flips, each with a 30% chance of heads. Thus, with rbinom(10, 1, .3) you ended up with 10 outcomes that were either 0 (“tails”) or 1 (“heads”).
But by changing the second argument of rbinom() (currently 1), you can flip multiple coins within each draw. Thus, each outcome will end up being a number between 0 and 10, showing the number of flips that were heads in that trial.
# Generate 100 occurrences of flipping 10 coins, each with 30% probability
rbinom(100, 10, .3)
## [1] 3 5 5 2 1 3 1 2 3 1 2 3 3 3 3 2 1 2 1 4 2 4 3 2 4 2 2 1 5 1 3 4 5 1 3 1 3
## [38] 2 4 1 4 1 3 4 3 2 2 5 2 3 1 6 2 3 2 1 3 4 5 3 4 1 0 3 4 4 5 2 4 4 3 5 2 2
## [75] 1 2 2 4 2 4 3 3 2 5 3 2 3 2 4 0 4 2 3 5 3 3 3 2 0 3
Wow, that’s a lot of flips in almost no time at all!
Density and cumulative density
Calculating exact probability density
dbinom(5, 10, .5)
# [1] 0.2460938
Calculating density of a binomial
If you flip 10 coins each with a 30% probability of coming up heads, what is the probability exactly 2 of them are heads?
# Calculate the probability that 2 are heads using dbinom
dbinom(2, 10, .3)
## [1] 0.2334744
# Confirm your answer with a simulation using rbinom
mean(rbinom(10000, 10, .3) == 2)
## [1] 0.2398
Calculating cumulative density of a binomial
If you flip ten coins that each have a 30% probability of heads, what is the probability at least five are heads?
# Calculate the probability that at least five coins are heads
1 - pbinom(4, 10, .3)
## [1] 0.1502683
# Confirm your answer with a simulation of 10,000 trials
mean(rbinom(10000, 10, .3) >= 5)
## [1] 0.1502
Varying the number of trials
In the last exercise you tried flipping ten coins with a 30% probability of heads to find the probability at least five are heads. You found that the exact answer was 1 - pbinom(4, 10, .3) = 0.1502683, then confirmed with 10,000 simulated trials.
Did you need all 10,000 trials to get an accurate answer? Would your answer have been more accurate with more trials?
# Here is how you computed the answer in the last problem
mean(rbinom(10000, 10, .3) >= 5)
## [1] 0.1508
# Try now with 100, 1000, 10,000, and 100,000 trials
mean(rbinom(100, 10, .3) >= 5)
## [1] 0.16
mean(rbinom(1000, 10, .3) >= 5)
## [1] 0.146
mean(rbinom(10000, 10, .3) >= 5)
## [1] 0.1523
mean(rbinom(100000, 10, .3) >= 5)
## [1] 0.15025
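A brief aside on why more trials help: each simulated answer is a sample proportion, and the standard error of a proportion shrinks like one over the square root of the number of trials. A minimal sketch of the error at each sample size:
# Approximate standard error of the simulated estimate for each number of trials
p <- 1 - pbinom(4, 10, .3)
sqrt(p * (1 - p) / c(100, 1000, 10000, 100000))
## approximately 0.0357, 0.0113, 0.0036, 0.0011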
Expected value and variance
\(E[X] = size \cdot p\)
\(Var(X) = size \cdot p \cdot (1-p)\)
Calculating the expected value
What is the expected value of a binomial distribution where 25 coins are flipped, each having a 30% chance of heads?
# Calculate the expected value using the exact formula
25 * .3
## [1] 7.5
# Confirm with a simulation using rbinom
mean(rbinom(10000, 25, .3))
## [1] 7.5118
Great work! Do you prefer using the simulation or the exact formula to find the mean?
Calculating the variance
What is the variance of a binomial distribution where 25 coins are flipped, each having a 30% chance of heads?
# Calculate the variance using the exact formula
25 * .3 * (1 - .3)
## [1] 5.25
# Confirm with a simulation using rbinom
var(rbinom(10000, 25, .3))
## [1] 5.257301
Awesome job! You can simulate the variance just like you can with the mean.
Probability of event A and event B
Simulating the probability of A and B
You can also use simulation to estimate the probability of two events both happening.
# Simulate 100,000 flips of a coin with a 40% chance of heads
A <- rbinom(100000, 1, .4)
# Simulate 100,000 flips of a coin with a 20% chance of heads
B <- rbinom(100000, 1, .2)
# Estimate the probability both A and B are heads
mean(A & B)
## [1] 0.08025
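For independent coins, the exact probability is simply the product, which the estimate above approximates:
# P(A and B) = P(A) * P(B) when A and B are independent
.4 * .2
## [1] 0.08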
Simulating the probability of A, B, and C
Randomly simulate 100,000 flips of A (40% chance), B (20% chance), and C (70% chance). What fraction of the time do all three coins come up heads?
# You've already simulated 100,000 flips of coins A and B
A <- rbinom(100000, 1, .4)
B <- rbinom(100000, 1, .2)
# Simulate 100,000 flips of coin C (70% chance of heads)
C <- rbinom(100000, 1, .7)
# Estimate the probability A, B, and C are all heads
mean(A & B & C)
## [1] 0.05526
Great job! That was much easier than using a nasty formula.
Probability of A or B
Simulating probability of A or B
In the last multiple choice exercise you found that there was a 64% chance that either coin A (60% chance) or coin B (10% chance) would come up heads. Now you’ll confirm that answer using simulation.
# Simulate 100,000 flips of a coin with a 60% chance of heads
A <- rbinom(100000, 1, .6)
# Simulate 100,000 flips of a coin with a 10% chance of heads
B <- rbinom(100000, 1, .1)
# Estimate the probability either A or B is heads
mean(A | B)
## [1] 0.63984
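The 64% figure itself comes from the general addition rule; for independent events, P(A or B) = P(A) + P(B) - P(A)P(B):
# Exact calculation for comparison
.6 + .1 - .6 * .1
## [1] 0.64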
Probability either variable is less than or equal to 4
Suppose X is a random Binom(10, .6) variable (10 flips of a coin with 60% chance of heads) and Y is a random Binom(10, .7) variable (10 flips of a coin with a 70% chance of heads), and they are independent.
What is the probability that either of the variables is less than or equal to 4?
# Use rbinom to simulate 100,000 draws from each of X and Y
X <- rbinom(100000, 10, .6)
Y <- rbinom(100000, 10, .7)
# Estimate the probability either X or Y is <= to 4
mean(X <= 4 | Y <= 4)
## [1] 0.20467
# Use pbinom to calculate the probabilities separately
prob_X_less <- pbinom(4, 10, .6)
prob_Y_less <- pbinom(4, 10, .7)
# Combine these to calculate the exact probability either <= 4
prob_X_less + prob_Y_less - prob_X_less * prob_Y_less
## [1] 0.2057164
Awesome! Simulation is a great way of avoiding difficult calculations.
Multiplying random variables
Simulation: Effect of multiplying on expected value
\(E[k \cdot X] = k \cdot E[X]\)
Simulation: Effect of multiplying on variance
\(Var(k \cdot X) = k^2 \cdot Var(X)\)
Simulating multiplying a random variable
In this exercise you’ll use simulation to confirm the rule you just learned about how multiplying a random variable by a constant affects its expected value.
# Simulate 100,000 draws of a binomial with size 20 and p = .1
X <- rbinom(100000, 20, .1)
# Estimate the expected value of X
mean(X)
## [1] 1.99723
# Estimate the expected value of 5 * X
mean(5 * X)
## [1] 9.98615
Variance of a multiplied random variable
In the last exercise you simulated X from a binomial with size 20 and p = .1. Now you’ll use the same simulation to explore the variance.
# X is simulated from 100,000 draws of a binomial with size 20 and p = .1
X <- rbinom(100000, 20, .1)
# Estimate the variance of X
var(X)
## [1] 1.785386
# Estimate the variance of 5 * X
var(5 * X)
## [1] 44.63466
Stellar work! How does the variance compare to the mean?
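For comparison, the exact values follow from Var(X) = size · p · (1 - p) and the rule above:
# Exact variances: Var(X) = 20 * .1 * .9, and Var(5 * X) = 5^2 * Var(X)
20 * .1 * .9
## [1] 1.8
5^2 * 20 * .1 * .9
## [1] 45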
Adding two random variables
\(X \sim Binom(10, .5)\) \(Y \sim Binom(100, .2)\)
\(E[X + Y] = E[X] + E[Y]\)
\(Var(X + Y) = Var(X) + Var(Y)\) (when X and Y are independent)
Simulating adding two binomial variables
In the last multiple choice exercise, you found the expected value of the sum of two binomials. In this problem you’ll use a simulation to confirm your answer.
# Simulate 100,000 draws of X (size 20, p = .3) and Y (size 40, p = .1)
X <- rbinom(100000, 20, .3)
Y <- rbinom(100000, 40, .1)
# Estimate the expected value of X + Y
mean(X + Y)
## [1] 9.99855
Great job! The fact that you can add the means makes stuff much simpler.
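The exact expected value confirms the simulation:
# E[X + Y] = E[X] + E[Y] = 20 * .3 + 40 * .1
20 * .3 + 40 * .1
## [1] 10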
Simulating variance of sum of two binomial variables
In the last multiple choice exercise, you examined the expected value of the sum of two binomials. Here you’ll estimate the variance.
# Simulation from last exercise of 100,000 draws from X and Y
X <- rbinom(100000, 20, .3)
Y <- rbinom(100000, 40, .1)
# Find the variance of X + Y
var(X + Y)
## [1] 7.773503
# Find the variance of 3 * X + Y
var(3 * X + Y)
## [1] 41.19083
Great simulating! Remember this rule only works when X and Y are independent.
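The exact values line up with both estimates:
# Var(X) = 20 * .3 * .7 = 4.2 and Var(Y) = 40 * .1 * .9 = 3.6
20 * .3 * .7 + 40 * .1 * .9
## [1] 7.8
# Var(3 * X + Y) = 3^2 * Var(X) + Var(Y)
3^2 * (20 * .3 * .7) + 40 * .1 * .9
## [1] 41.4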
Updating with evidence
| indicates “given that”
Updating with simulation
We see 11 out of 20 flips from a coin that is either fair (50% chance of heads) or biased (75% chance of heads). How likely is it that the coin is fair? Answer this by simulating 50,000 fair coins and 50,000 biased coins. Without doing any math, which do you now think is more likely: that the coin is fair, or that the coin is biased?
# Simulate 50000 cases of flipping 20 coins from fair and from biased
fair <- rbinom(50000, 20, .5)
biased <- rbinom(50000, 20, .75)
# How many fair cases, and how many biased, led to exactly 11 heads?
fair_11 <- sum(fair == 11)
biased_11 <- sum(biased == 11)
# Find the fraction of fair coins that are 11 out of all coins that were 11
fair_11 / (fair_11 + biased_11)
## [1] 0.8588423
Great job! How does this compare with your intuition?
Updating with simulation after 16 heads
We see 16 out of 20 flips from a coin that is either fair (50% chance of heads) or biased (75% chance of heads). How likely is it that the coin is fair?
# Simulate 50000 cases of flipping 20 coins from fair and from biased
fair <- rbinom(50000, 20, .5)
biased <- rbinom(50000, 20, .75)
# How many fair cases, and how many biased, led to exactly 16 heads?
fair_16 <- sum(fair == 16)
biased_16 <- sum(biased == 16)
# Find the fraction of fair coins that are 16 out of all coins that were 16
fair_16 / (fair_16 + biased_16)
## [1] 0.02489927
Great job! How does this compare with your intuition?
Prior probability
Updating with priors
We see 14 out of 20 flips are heads, and start with an 80% chance the coin is fair and a 20% chance it is biased to 75%.
You’ll solve this case with simulation, by starting with a “bucket” of 10,000 coins, where 8,000 are fair and 2,000 are biased, and flipping each of them 20 times.
# Simulate 8000 cases of flipping a fair coin, and 2000 of a biased coin
fair_flips <- rbinom(8000, 20, .5)
biased_flips <- rbinom(2000, 20, .75)
# Find the number of cases from each coin that resulted in 14/20
fair_14 <- sum(fair_flips == 14)
biased_14 <- sum(biased_flips == 14)
# Use these to estimate the posterior probability
fair_14 / (fair_14 + biased_14)
## [1] 0.4913928
Awesome! How did adding a prior into the mix change your outcome?
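Note that the simulated estimate above carries sampling noise. A quick exact check, using the same dbinom() approach as the Bayes’ theorem section below but weighted by the priors, gives a somewhat lower value:
# Exact posterior with priors .8 (fair) and .2 (biased)
probability_fair <- dbinom(14, 20, .5)
probability_biased <- dbinom(14, 20, .75)
(probability_fair * .8) / (probability_fair * .8 + probability_biased * .2)
## approximately 0.467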
Updating with three coins
Suppose instead of a coin being either fair or biased, there are three possibilities: the coin is fair (50% heads), biased low (25% heads), or biased high (75% heads). There is an 80% chance it is fair, a 10% chance it is biased low, and a 10% chance it is biased high.
You see 14/20 flips are heads. What is the probability that the coin is fair?
# Simulate 80,000 draws from fair coin, 10,000 from each of high and low coins
flips_fair <- rbinom(80000, 20, .5)
flips_high <- rbinom(10000, 20, .75)
flips_low <- rbinom(10000, 20, .25)
# Compute the number of coins that resulted in 14 heads from each of these piles
fair_14 <- sum(flips_fair == 14)
high_14 <- sum(flips_high == 14)
low_14 <- sum(flips_low == 14)
# Compute the posterior probability that the coin was fair
fair_14 / (fair_14 + low_14 + high_14)
## [1] 0.6366379
Wow! Great work! Adding another coin doesn’t make things too much harder, does it?
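The exact calculation extends to three hypotheses just as easily: weight each likelihood by its prior and normalize. A sketch, using throwaway names d_fair, d_high, and d_low:
# Exact posterior with priors .8 (fair), .1 (high), and .1 (low)
d_fair <- dbinom(14, 20, .5) * .8
d_high <- dbinom(14, 20, .75) * .1
d_low <- dbinom(14, 20, .25) * .1
d_fair / (d_fair + d_high + d_low)
## approximately 0.637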
Bayes’ theorem
Updating with Bayes’ theorem
In this chapter, you used simulation to estimate the posterior probability that a coin that resulted in 11 heads out of 20 is fair. Now you’ll calculate it again, this time using the exact probabilities from dbinom(). There is a 50% chance the coin is fair and a 50% chance the coin is biased.
# Use dbinom to calculate the probability of 11/20 heads with fair or biased coin
probability_fair <- dbinom(11, 20, .5)
probability_biased <- dbinom(11, 20, .75)
# Calculate the posterior probability that the coin is fair
probability_fair / (probability_fair + probability_biased)
## [1] 0.8554755
Awesome job! Do you think calculating or simulating the answer is easier? Which makes more sense?
Updating for other outcomes
In the last exercise, you solved for the probability that the coin is fair if it results in 11 heads out of 20 flips, assuming that beforehand there was an equal chance of it being a fair coin or a biased coin. Recall that the code looked something like:
probability_fair <- dbinom(11, 20, .5)
probability_biased <- dbinom(11, 20, .75)
probability_fair / (probability_fair + probability_biased)
Now you’ll use the dbinom() approach to find the posterior probability for two other outcomes.
# Find the probability that a coin resulting in 14/20 is fair
dbinom(14, 20, .5) / (dbinom(14, 20, .5) + dbinom(14, 20, .75))
## [1] 0.179811
# Find the probability that a coin resulting in 18/20 is fair
dbinom(18, 20, .5) / (dbinom(18, 20, .5) + dbinom(18, 20, .75))
## [1] 0.002699252
More updating with priors
Suppose we see 16 heads out of 20 flips, which would normally be strong evidence that the coin is biased. However, suppose we had set a prior probability of a 99% chance that the coin is fair (50% chance of heads), and only a 1% chance that the coin is biased (75% chance of heads).
You’ll solve this exercise by finding the exact answer with dbinom() and Bayes’ theorem. Recall that Bayes’ theorem looks like:
\(Pr(fair|A) = \frac{Pr(A|fair)Pr(fair)}{Pr(A|fair)Pr(fair)+Pr(A|biased)Pr(biased)}\)
# Use dbinom to find the probability of 16/20 from a fair or biased coin
probability_16_fair <- dbinom(16, 20, .5)
probability_16_biased <- dbinom(16, 20, .75)
# Use Bayes' theorem to find the posterior probability that the coin is fair
(probability_16_fair * .99) / (probability_16_fair * .99 + probability_16_biased * .01)
## [1] 0.7068775
Amazing! It seems like your choice of prior can have a pretty big effect on the final answer.
The normal distribution
The normal (Gaussian) distribution is described by a mean and a standard deviation.
Simulating from the binomial and the normal
In this exercise you’ll see for yourself whether the normal is a reasonable approximation to the binomial by simulating large samples from the binomial distribution and its normal approximation and comparing their histograms.
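The approximation works by matching the binomial’s mean and standard deviation, which is where the numbers used below come from:
# Normal approximation to Binomial(size = 1000, p = .2)
size <- 1000
p <- .2
size * p  # mean
## [1] 200
sqrt(size * p * (1 - p))  # standard deviation
## [1] 12.64911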
# The compare_histograms() helper used throughout (retrieved with getAnywhere() in the course environment)
library(ggplot2)
compare_histograms <- function(variable1, variable2, name1 = "Variable 1", name2 = "Variable 2") {
  x <- data.frame(value = variable1, variable = name1)
  y <- data.frame(value = variable2, variable = name2)
  ggplot(rbind(x, y), aes(value)) +
    geom_histogram() +
    facet_wrap(~ variable, nrow = 2)
}
# Draw a random sample of 100,000 from the Binomial(1000, .2) distribution
binom_sample <- rbinom(100000, 1000, .2)
# Draw a random sample of 100,000 from the normal approximation
normal_sample <- rnorm(100000, 200, sqrt(160))
# Compare the two distributions with the compare_histograms function
compare_histograms(binom_sample, normal_sample, "Binomial Sample", "Normal Sample")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Great work! How similar do the two histograms look to you?
Comparing the cumulative density of the binomial
If you flip 1000 coins that each have a 20% chance of being heads, what is the probability you would get 190 heads or fewer?
You’ll get similar answers if you solve this with the binomial or its normal approximation. In this exercise, you’ll solve it both ways, using both simulation and exact calculation.
# Simulations from the normal and binomial distributions
binom_sample <- rbinom(100000, 1000, .2)
normal_sample <- rnorm(100000, 200, sqrt(160))
# Use binom_sample to estimate the probability of <= 190 heads
mean(binom_sample <= 190)
## [1] 0.2267
# Use normal_sample to estimate the probability of <= 190 heads
mean(normal_sample <= 190)
## [1] 0.21418
# Calculate the probability of <= 190 heads with pbinom
pbinom(190, 1000, .2)
## [1] 0.2273564
# Calculate the probability of <= 190 heads with pnorm
pnorm(190, 200, sqrt(160))
## [1] 0.2145977
Great job! There are a lot of different ways to go about getting similar answers.
Comparing the distributions of the normal and binomial for low n
When we flip a lot of coins, it looks like the normal distribution is a pretty close approximation. What about when we flip only 10 coins, each still having a 20% chance of coming up heads? Is the normal still a good approximation?
# Draw a random sample of 100,000 from the Binomial(10, .2) distribution
binom_sample <- rbinom(100000, 10, .2)
# Draw a random sample of 100,000 from the normal approximation
normal_sample <- rnorm(100000, 2, sqrt(1.6))
# Compare the two distributions with the compare_histograms function
compare_histograms(binom_sample, normal_sample, "Binomial Sample", "Normal Sample")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Good work! How do the sample size and the probability of success affect the accuracy of the normal approximation?
The Poisson distribution
Flipping many coins, each with low probability
Simulating from a Poisson and a binomial
If you were flipping 1,000 coins that each have a 0.2% chance of coming up heads, you could use a Poisson(2) distribution to approximate the number of heads, since λ = 1000 * .002 = 2. Let’s check that through simulation.
# Draw a random sample of 100,000 from the Binomial(1000, .002) distribution
binom_sample <- rbinom(100000, 1000, .002)
# Draw a random sample of 100,000 from the Poisson approximation
poisson_sample <- rpois(100000, 2)
# Compare the two distributions with the compare_histograms function
compare_histograms(binom_sample, poisson_sample, "Binomial Sample", "Poisson Sample")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Fantastic! It’s interesting how the binomial distribution is related to so many others.
Density of the Poisson distribution
In this exercise you’ll find the probability that a Poisson random variable is equal to zero, both by simulating and by using the dpois() function, which gives an exact answer.
# Simulate 100,000 draws from Poisson(2)
poisson_sample <- rpois(100000, 2)
# Find the percentage of simulated values that are 0
mean(poisson_sample == 0)
## [1] 0.13302
# Use dpois to find the exact probability that a draw is 0
dpois(0, 2)
## [1] 0.1353353
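As a sanity check, the Poisson density at zero has the closed form \(P(X = 0) = e^{-\lambda}\):
exp(-2)
## [1] 0.1353353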
Sum of two Poisson variables
One of the useful properties of the Poisson distribution is that when you add independent Poisson random variables together, the result also follows a Poisson distribution.
Here you’ll generate two random Poisson variables to test this.
# Simulate 100,000 draws from Poisson(1)
X <- rpois(100000, 1)
# Simulate 100,000 draws from Poisson(2)
Y <- rpois(100000, 2)
# Add X and Y together to create Z
Z <- X + Y
# Use compare_histograms to compare Z to the Poisson(3)
compare_histograms(Z, rpois(100000, 3), "X(\U03BB = 1) + Y(\U03BB = 2) Poisson sample", "\U03BB = 3 Poisson sample")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Awesome! It’s convenient that two Poisson distributions sum to another Poisson distribution.
The geometric distribution
Waiting for first coin flip
You’ll start by simulating a series of coin flips, and “waiting” for the first heads.
# Simulate 100 instances of flipping a 20% coin
flips <- rbinom(100, 1, .2)
# Use which to find the first case of 1 ("heads")
which(flips == 1)[1]
## [1] 5
Wow! What situations can you think of that are good examples of a geometric distribution?
Using replicate() for simulation
Use the replicate() function to simulate 100,000 trials of waiting for the first heads after flipping coins with 20% chance of heads. Plot a histogram of this simulation by calling qplot().
# Existing code for finding the first instance of heads
which(rbinom(100, 1, 0.2) == 1)[1]
## [1] 2
# Replicate this 100,000 times using replicate()
replications <- replicate(100000, which(rbinom(100, 1, 0.2) == 1)[1])
# Histogram the replications with qplot
qplot(replications)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Great work! How does the density of the geometric distribution compare to the other ones you’ve seen?
Simulating from the geometric distribution
In this exercise you’ll compare your replications with the output of rgeom().
# Replications from the last exercise
replications <- replicate(100000, which(rbinom(100, 1, .2) == 1)[1])
# Generate 100,000 draws from the corresponding geometric distribution
geom_sample <- rgeom(100000, .2)
# Compare the two distributions with compare_histograms
compare_histograms(replications, geom_sample, "Replicated 'first draw' binomial sample", "Geometric sample")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Fantastic simulating! And the rgeom() function is definitely easier to type than using rbinom() to simulate a geometric distribution.
Probability of a machine lasting X days
A new machine arrives in a factory. This type of machine is very unreliable: every day, it has a 10% chance of breaking permanently. How long would you expect it to last?
Notice that this is described by the cumulative distribution of the geometric distribution, and therefore the pgeom() function: pgeom(X, .1) gives the probability that there are at most X working days before the day it breaks (that is, that it breaks on or before day X + 1).
# Find the probability the machine breaks on 5th day or earlier
pgeom(4, .1)
## [1] 0.40951
# Find the probability the machine is still working on 20th day
1 - pgeom(19, .1)
## [1] 0.1215767
Great work! Problems like this are one of the reasons the geometric distribution is so useful.
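The opening question, of how long you would expect the machine to last, also has a closed-form answer: under R’s parameterization of the geometric distribution (the number of failures before the first success), the expected value is (1 - p) / p.
# Expected number of working days before the machine breaks
(1 - .1) / .1
## [1] 9
# Confirm by simulation (the estimate will vary slightly around 9)
mean(rgeom(100000, .1))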
Graphing the probability that a machine still works
If you were a supervisor at the factory with the unreliable machine, you might want to understand how likely the machine is to keep working over time. In this exercise, you’ll plot the probability that the machine is still working across the first 30 days.
# Calculate the probability of machine working on day 1-30
still_working <- 1 - pgeom(0:29, .1)
# Plot the probability for days 1 to 30
qplot(1:30, still_working)
Great work! You’ve learned all the basics you need to start understanding the distributions underlying statistical models!