The different probability distribution functions all start with one of the following 4 letters:
d \(\rightarrow\) density: Finds the probability for a specific value, \(P(Y=a)\)
p \(\rightarrow\) probability: Finds the probability for the specific value and all values less than it (aka, cumulative probability), \(P(Y \le a)\)
q \(\rightarrow\) quantile: Finds the smallest value of the random variable, \(a\), so that \(P(Y \le a) \ge p\)
r \(\rightarrow\) random: Generates a random sample from the distribution
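To make the prefixes concrete, here is a minimal sketch using the binomial family. The values `size = 10` and `prob = 0.5` are assumptions picked only for illustration, and the seed is arbitrary. The `r` prefix (as in `rbinom()`) draws random values from the distribution.

```r
# Illustrative setup (assumed values): Y ~ Binomial(size = 10, prob = 0.5)
n_trials  <- 10
p_success <- 0.5

d_val <- dbinom(x = 3,   size = n_trials, prob = p_success)  # P(Y = 3)
p_val <- pbinom(q = 3,   size = n_trials, prob = p_success)  # P(Y <= 3)
q_val <- qbinom(p = 0.5, size = n_trials, prob = p_success)  # smallest a with P(Y <= a) >= 0.5

set.seed(1)  # arbitrary seed so the random draws can be reproduced
r_val <- rbinom(n = 5, size = n_trials, prob = p_success)    # 5 random draws

# The cumulative probability is the sum of the densities up to that value
all.equal(p_val, sum(dbinom(x = 0:3, size = n_trials, prob = p_success)))
```

The last line checks that the `p` function is just the running total of the `d` function for a discrete distribution.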
Unlike the binomial distribution, the multinomial distribution has just 2 functions:
dmultinom = \(P(Y_1 = a, Y_2 = b, ...)\)
rmultinom for generating a random sample of multinomial data
dmultinom()
dmultinom() has 2 arguments:
x = : Vector - the observed counts, with length equal to the number of different outcomes. With 3 outcome categories, x needs to be a vector with 3 numbers.
prob = : Vector - the probabilities that an observation falls into each category
It also has a size = argument, but it will calculate \(N\) to be the sum of the elements in x, so you shouldn’t use it!
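As a small sketch of why `size =` is redundant: the counts and probabilities below are made up for illustration. Leaving `size` out and passing `size = sum(x)` should give the same answer, while a `size` that disagrees with `sum(x)` is rejected with an error.

```r
# Made-up counts and probabilities, for illustration only
x_counts <- c(2, 3, 5)
p_cat    <- c(0.2, 0.3, 0.5)

without_size <- dmultinom(x = x_counts, prob = p_cat)
with_size    <- dmultinom(x = x_counts, size = sum(x_counts), prob = p_cat)
all.equal(without_size, with_size)

# A size that does not match sum(x) stops with an error,
# which we capture here instead of halting the script
bad_size <- tryCatch(
  dmultinom(x = x_counts, size = 25, prob = p_cat),
  error = function(e) e
)
inherits(bad_size, "error")
```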
Let’s say that in a random sample of 20, there were 4 bad, 11 moderate, and 5 good:
# Done correctly since prob sums to 1:
dmultinom(x = c(4, 11, 5),
          prob = c(0.20, 0.50, 0.30))
## [1] 0.04017656
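The same number can be reproduced by hand from the multinomial probability \(\frac{N!}{y_1! \, y_2! \, y_3!} \pi_1^{y_1} \pi_2^{y_2} \pi_3^{y_3}\). The following is a sketch of that arithmetic; the variable names are placeholders chosen for this example.

```r
x_counts <- c(4, 11, 5)           # bad, moderate, good
p_cat    <- c(0.20, 0.50, 0.30)   # category probabilities (sum to 1)
n_total  <- sum(x_counts)         # N = 20

# Number of ways to arrange the 20 observations into the 3 categories
n_arrangements <- factorial(n_total) / prod(factorial(x_counts))

# Probability of any one such arrangement, times the number of arrangements
manual <- n_arrangements * prod(p_cat^x_counts)

all.equal(manual, dmultinom(x = x_counts, prob = p_cat))
```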
# Done incorrectly: prob = c(0.2, 0.4, 0.3) sums to 0.9, not 1
dmultinom(x = c(4, 11, 5),
          prob = c(0.20, 0.40, 0.30))
## [1] 0.02838653
If the prob vector doesn’t sum to 1, dmultinom() will “normalize” it (aka, force it to sum to 1) by dividing each probability by the total:
\[\left\{ \frac{p_1}{p_1 + p_2 + p_3}, \frac{p_2}{p_1 + p_2 + p_3}, \frac{p_3}{p_1 + p_2 + p_3} \right\}\]
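One way to see the normalization at work is to rescale the probabilities by hand and compare. This is a sketch using the counts from the example above; the variable names are placeholders.

```r
x_counts <- c(4, 11, 5)
p_raw    <- c(0.20, 0.40, 0.30)   # sums to 0.9, not 1
p_norm   <- p_raw / sum(p_raw)    # rescaled by hand so it sums to 1

raw_result  <- dmultinom(x = x_counts, prob = p_raw)
norm_result <- dmultinom(x = x_counts, prob = p_norm)

# Both calls should agree, because dmultinom() rescales p_raw internally
all.equal(raw_result, norm_result)
```

Since no warning is given when this happens, checking `sum(prob)` before calling `dmultinom()` is a cheap safeguard against a typo in the probabilities.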
rmultinom() has 3 arguments:
n = : Scalar - how many random vectors to create
size = : Scalar - the total number of trials
prob = : Vector - the vector of probabilities for each outcome
Let’s create 10 random vectors with 20 trials each for 4 groups of equal probability, \(\pi_i = 0.25\):
rmultinom(n = 10, size = 20, prob = rep(x = 0.25, times = 4))
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    4    4    5    8    1    7    5    3    4     5
## [2,]    6    4    6    6    5    5    4    7    4     8
## [3,]    6    4    5    5    5    4    4    7    5     6
## [4,]    4    8    4    1    9    4    7    3    7     1
# By default, it creates a column for each random multinomial result.
# If you want the rows named after their corresponding groups (say g1 to g4), name the elements of prob =
rmultinom(n = 10,
          size = 20,
          prob = c(g1 = 0.25,
                   g2 = 0.25,
                   g3 = 0.25,
                   g4 = 0.25))
##    [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## g1    3    6    7    4    5    5    7    8    5     4
## g2    6    4    3    5    3    6    4    4    3     5
## g3    7    5    7    5    6    6    6    5    5     7
## g4    4    5    3    6    6    3    3    3    7     4
# It's often more convenient to have each new sample in a row rather than a column.
# We can use t() to transpose the results:
library(dplyr)  # provides mutate() and row_number()
rmultinom(n = 10,
          size = 20,
          prob = c(g1 = 0.25,
                   g2 = 0.25,
                   g3 = 0.25,
                   g4 = 0.25)) |>
  t() |>
  data.frame() |>
  mutate(trial = row_number(),
         .before = g1)
##    trial g1 g2 g3 g4
## 1      1  4  6  3  7
## 2      2  7  4  4  5
## 3      3  2  7  7  4
## 4      4  5  2  8  5
## 5      5  4  6  6  4
## 6      6  6  6  4  4
## 7      7  6  6  4  4
## 8      8  2  6 10  2
## 9      9  3  7  5  5
## 10    10  3  6  6  5
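As a quick sanity check on output like this, each sample should add up to `size`. The sketch below uses base R only; the seed and the names `group_probs`, `samples`, and `samples_df` are assumptions made for this example, not part of the original code.

```r
set.seed(2024)  # arbitrary seed, only so the sketch is reproducible

group_probs <- c(g1 = 0.25, g2 = 0.25, g3 = 0.25, g4 = 0.25)

# Transpose so each row is one random sample and each column is one group
samples <- t(rmultinom(n = 10, size = 20, prob = group_probs))

dim(samples)      # 10 rows (samples) by 4 columns (groups)
rowSums(samples)  # every row adds up to size = 20

# Base R alternative to mutate(): put the sample index in the first column
samples_df <- data.frame(trial = seq_len(nrow(samples)), samples)
head(samples_df, 3)
```

Setting a seed before `rmultinom()` is what makes the counts repeatable from run to run; without it, each call produces a different set of samples.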