The different probability distribution functions all start with one of the following 4 letters:
d \(\rightarrow\) density: Find the probability for a specific value \(P(Y=a)\)
p \(\rightarrow\) Find the probability for the specific value and all values less than it (aka, cumulative probability): \(P(Y \le a)\)
q \(\rightarrow\) quantile: Finds the smallest value of the random variable, \(a\), so that \(P(Y \le a) \ge p\)
Parameters:
# Total number of trials
n = 16
# Probability of success
pi = 0.2
dbinom()
has 3 arguments:
x =
the value of the random variable we want to find
the probability for
size =
the number of trials
prob =
the probability of a single success
# Probabilities of a single value: P(Y = a)
# ?dbinom
# dbinom(x = value, size = # of trials, prob = probability of a success)
dbinom(x = 4, size = n, prob = pi)
## [1] 0.2001111
# We can also find the probabilities of all possible values of Y by giving x = a vector of outcomes
tibble(Y = 0:16,
P_Y = dbinom(x = Y,
size = n,
prob = pi)) |>
# Round the results
round(digits = 4) ->
binom_df
binom_df
## # A tibble: 17 × 2
## Y P_Y
## <dbl> <dbl>
## 1 0 0.0281
## 2 1 0.113
## 3 2 0.211
## 4 3 0.246
## 5 4 0.200
## 6 5 0.120
## 7 6 0.055
## 8 7 0.0197
## 9 8 0.0055
## 10 9 0.0012
## 11 10 0.0002
## 12 11 0
## 13 12 0
## 14 13 0
## 15 14 0
## 16 15 0
## 17 16 0
Let’s use ggplot()
to create a graphical representation
of the binomial distribution with varying parameter choices
# Start by creating the data set
tibble(
# Outcome Range
Y = 0:n,
# P(Y = y)
P_Y = dbinom(x = Y, size = n, prob = pi)
) |>
# Graph of probabilities
ggplot(
mapping = aes(x = Y, y = P_Y)
) +
# Using geom_col() to create the bars
geom_col(
fill = "steelblue",
color = "black"
) +
# Adding the line for the mean of the binomial:
geom_vline(
mapping = aes(xintercept = sum(Y * P_Y)),
color = "darkorange",
linewidth = 1
) +
labs(
x = "a",
y = "P(Y = a)",
title = paste0("Y ~ Bin(", n, ", ", pi, ")")
) +
# Changing the theme
theme_bw() +
theme(plot.title = element_text(hjust = 0.5, size = 16)) +
# Changing the tick marks on the x-axis
scale_x_continuous(
breaks = 0:n,
minor_breaks = NULL
) +
# Having the bars sit on the x-axis
scale_y_continuous(expand = c(0, 0, 0.05, 0))
pbinom
If you want to find the cumulative probability, \(P(Y <= a)\), you can use
pbinom()
, which has the same arguments as
dbinom
with one change:
x =
is replaced by q =
## P(Y <= 4)
# pbinom(q = 4, size = n, prob = pi)
# Adding all the cumulative probabilities to binom_df
binom_df <-
binom_df |>
mutate(
cumulative_prob = pbinom(q = Y, size = n, prob = pi)
) |>
round(digits = 4)
binom_df
## # A tibble: 17 × 3
## Y P_Y cumulative_prob
## <dbl> <dbl> <dbl>
## 1 0 0.0281 0.0281
## 2 1 0.113 0.141
## 3 2 0.211 0.352
## 4 3 0.246 0.598
## 5 4 0.200 0.798
## 6 5 0.120 0.918
## 7 6 0.055 0.973
## 8 7 0.0197 0.993
## 9 8 0.0055 0.998
## 10 9 0.0012 1.00
## 11 10 0.0002 1
## 12 11 0 1
## 13 12 0 1
## 14 13 0 1
## 15 14 0 1
## 16 15 0 1
## 17 16 0 1
pbinom()
also has an additional optional argument:
lower = F
will let you find the probability of greater
than \(a\): \(P(Y > a)\)# P(Y > a)
pbinom(q = 4, size = n, prob = pi, lower = F)
## [1] 0.2017546
# We can check to see if the probability of pbinom(lower = T) and pbinom(lower = F) add up to 1
binom_df |>
mutate(
ge_prob = pbinom(q = Y, size = n, prob = pi, lower = F),
total_prob = cumulative_prob + ge_prob
) |>
round(digits = 4)
## # A tibble: 17 × 5
## Y P_Y cumulative_prob ge_prob total_prob
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 0.0281 0.0281 0.972 1
## 2 1 0.113 0.141 0.859 1
## 3 2 0.211 0.352 0.648 1
## 4 3 0.246 0.598 0.402 1
## 5 4 0.200 0.798 0.202 1
## 6 5 0.120 0.918 0.0817 1
## 7 6 0.055 0.973 0.0267 1
## 8 7 0.0197 0.993 0.007 1
## 9 8 0.0055 0.998 0.0015 1
## 10 9 0.0012 1.00 0.0002 1
## 11 10 0.0002 1 0 1
## 12 11 0 1 0 1
## 13 12 0 1 0 1
## 14 13 0 1 0 1
## 15 14 0 1 0 1
## 16 15 0 1 0 1
## 17 16 0 1 0 1
Now let’s create a graph showing the cumulative probabilities of a binomial:
# Start by creating the data set
tibble(
Y = 0:n,
P_Y = pbinom(q = Y, size = n, prob = pi)) |>
ggplot(
mapping = aes(x = Y, y = P_Y)
) +
# Using geom_col() to create the bars
geom_col(
fill = "steelblue",
color = "black"
) +
labs(
x = "a",
y = "P(Y <= a)",
title = paste0("Y ~ Bin(", n, ", ", pi, ")")
) +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5, size = 16)) +
scale_x_continuous(
breaks = 0:n,
minor_breaks = NULL
) +
scale_y_continuous(expand = c(0, 0, 0.05, 0))
qbinom()
What do we do if we want to find the quantile, \(a\), associated with a probability? Aka, what value of \(a\) has a 3% probability?
You can use qbinom()
! It has the same arguments as
pbinom()
but replaces q =
(quantile) with
p =
(probability).
Since we are working with discrete random variables, often there
isn’t a \(a\) value that has the exact
probability that is given to p =
. Instead, what
qbinom()
and other q
functions do is find the
lowest \(a\) such that \(P(Y \le a) > p\)
binom_df
## # A tibble: 17 × 3
## Y P_Y cumulative_prob
## <dbl> <dbl> <dbl>
## 1 0 0.0281 0.0281
## 2 1 0.113 0.141
## 3 2 0.211 0.352
## 4 3 0.246 0.598
## 5 4 0.200 0.798
## 6 5 0.120 0.918
## 7 6 0.055 0.973
## 8 7 0.0197 0.993
## 9 8 0.0055 0.998
## 10 9 0.0012 1.00
## 11 10 0.0002 1
## 12 11 0 1
## 13 12 0 1
## 14 13 0 1
## 15 14 0 1
## 16 15 0 1
## 17 16 0 1
# If p = 0.02, should return 0 since P(Y <= 0) = 0.028
qbinom(p = 0.02, size = n, prob = pi)
## [1] 0
# If p = 0.03, should return 1 since P(Y <= 1) = 0.1126
qbinom(p = 0.03, size = n, prob = pi)
## [1] 1
# What should p = 0.75 return?
# You can also give p a vector to find the corresponding values:
tibble(
cumul_probs = (0:20)/20, # Vector of 5% probability increments
`Y <= a` = qbinom(p = cumul_probs, size = n, prob = pi)
)
## # A tibble: 21 × 2
## cumul_probs `Y <= a`
## <dbl> <dbl>
## 1 0 0
## 2 0.05 1
## 3 0.1 1
## 4 0.15 2
## 5 0.2 2
## 6 0.25 2
## 7 0.3 2
## 8 0.35 2
## 9 0.4 3
## 10 0.45 3
## # ℹ 11 more rows
rbinom()
rbinom()
has 3 arguments
n =
the number of random binomials to be
generated
size =
the number of trials
prob =
the probability of an individual
success
# Generating 10 random binomial outcomes for Y ~ Bin(16, 0.2)
rbinom(n = 10, size = n, prob = pi)
## [1] 2 4 2 3 2 2 4 1 5 1