Initial Description: Four Distribution Letters

The different probability distribution functions all start with one of the following 4 letters:

  1. d \(\rightarrow\) density: Find the probability for a specific value \(P(Y=a)\)

  2. p \(\rightarrow\) cumulative probability: Find the probability of the specific value and all values less than it: \(P(Y \le a)\)

  3. q \(\rightarrow\) quantile: Finds the smallest value of the random variable, \(a\), so that \(P(Y \le a) \ge p\)

  4. r \(\rightarrow\) random: generate a value of the random variable Y given the parameters
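
As a quick illustration of the four prefixes, the sketch below uses the binomial family (dbinom(), pbinom(), qbinom(), rbinom()). The setup of 10 trials with success probability 0.5, the variable names, and the seed are all assumptions chosen only for this example.

```r
# Assumed example: Y ~ Binomial(size = 10, prob = 0.5)
n_trials  <- 10
p_success <- 0.5

# d: P(Y = 3)
dbinom(x = 3, size = n_trials, prob = p_success)

# p: P(Y <= 3), i.e. P(Y = 0) + P(Y = 1) + P(Y = 2) + P(Y = 3)
pbinom(q = 3, size = n_trials, prob = p_success)

# q: the smallest a such that P(Y <= a) >= 0.5
qbinom(p = 0.5, size = n_trials, prob = p_success)

# r: five random draws of Y (set.seed() only makes them reproducible)
set.seed(1)
rbinom(n = 5, size = n_trials, prob = p_success)
```

Note that the p function is just the running total of the d function, and the q function inverts the p function.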

Multinomial Distribution - The distribution for non-binary, categorical outcomes

Unlike the binomial distribution, which has all four functions, the multinomial distribution has just 2:

  1. dmultinom = \(P(Y_1 = a, Y_2 = b, ...)\)

  2. rmultinom for generating a random sample of multinomial data
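
For reference, the probability that dmultinom() evaluates is the standard multinomial probability mass function. The notation here (\(k\) categories, observed counts \(y_i\), category probabilities \(p_i\), and total \(N\)) is chosen for this illustration rather than taken from the text above:

\[P(Y_1 = y_1, \dots, Y_k = y_k) = \frac{N!}{y_1! \, y_2! \cdots y_k!} \, p_1^{y_1} \, p_2^{y_2} \cdots p_k^{y_k}, \qquad N = \sum_{i=1}^{k} y_i, \qquad \sum_{i=1}^{k} p_i = 1\]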

1) dmultinom()

dmultinom() has 2 arguments that you need to supply:

  1. x = a vector with length equal to the number of different outcomes
  • If it is a trinomial (i.e., poor/moderate/good), x needs to be a vector with 3 numbers:
    1. the number of poor results in the sample
    2. the number of moderate results in the sample
    3. the number of good results in the sample
  2. prob = the vector of probabilities that an observation falls into each category
  • For example, a trinomial with \(\{p_1, p_2, p_3\} = \{0.2, 0.5, 0.3\}\) uses prob = c(0.2, 0.5, 0.3)
  • Make sure that the vector of probabilities sums to 1, or R will force it to sum to 1!

It also has a size = argument, but dmultinom() calculates \(N\) as the sum of the elements in x, so there is no need to supply it!
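
A short sketch of how size = behaves, using assumed counts of 4, 11, and 5: leaving it out and setting it to sum(x) give the same answer, while a value that disagrees with sum(x) is rejected with an error. The variable names are hypothetical, and the tryCatch() wrapper is only there so the example keeps running after the error.

```r
x_counts <- c(4, 11, 5)            # assumed example counts; they sum to 20
probs    <- c(0.20, 0.50, 0.30)

# size omitted: N is taken to be sum(x_counts)
p_default <- dmultinom(x = x_counts, prob = probs)

# size supplied and equal to sum(x_counts): same result
p_explicit <- dmultinom(x = x_counts, size = 20, prob = probs)
all.equal(p_default, p_explicit)

# size that disagrees with sum(x_counts): dmultinom() stops with an error,
# which is captured here as a character string
size_mismatch <- tryCatch(
  dmultinom(x = x_counts, size = 25, prob = probs),
  error = function(e) conditionMessage(e)
)
size_mismatch
```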

Let’s say that in a random sample of 20, there were 4 poor, 11 moderate, and 5 good results:

# Done correctly since prob sums to 1:
dmultinom(x = c(      4,   11,    5), 
          prob = c(0.20, 0.50, 0.30))
## [1] 0.04017656
# Done incorrectly: prob = c(0.2, 0.4, 0.3)
dmultinom(x = c(      4,   11,    5), 
          prob = c(0.20, 0.40, 0.30))
## [1] 0.02838653

If the prob vector doesn’t sum to 1, dmultinom() will “normalize” the vector (aka, force it to sum to 1) by dividing each element by the total:

\[\left\{ \frac{p_1}{p_1 + p_2 + p_3}, \frac{p_2}{p_1 + p_2 + p_3}, \frac{p_3}{p_1 + p_2 + p_3} \right\}\]
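
The normalization can be checked directly. The sketch below, with hypothetical variable names, rescales the probabilities by hand and confirms that dmultinom() returns the same value either way. It also reproduces the result from the multinomial formula with factorial() and prod().

```r
x_counts  <- c(4, 11, 5)
raw_probs <- c(0.20, 0.40, 0.30)          # sums to 0.9, not 1

# Rescale by hand so the probabilities sum to 1
norm_probs <- raw_probs / sum(raw_probs)
sum(norm_probs)

# dmultinom() gives the same answer for the raw and rescaled vectors
p_raw  <- dmultinom(x = x_counts, prob = raw_probs)
p_norm <- dmultinom(x = x_counts, prob = norm_probs)
all.equal(p_raw, p_norm)

# The same probability computed from the formula:
# N! / (y1! * y2! * y3!) * p1^y1 * p2^y2 * p3^y3
p_hand <- factorial(sum(x_counts)) / prod(factorial(x_counts)) *
  prod(norm_probs^x_counts)
all.equal(p_raw, p_hand)
```

Because the rescaling is silent, a typo in prob = produces a plausible-looking but wrong probability rather than an error, so checking sum(prob) before the call is a cheap safeguard.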

2) rmultinom()

rmultinom() has 3 arguments

  1. n =: Scalar - how many random vectors to create

  2. size =: Scalar - the total number of trials

  3. prob =: Vector - the vector of probabilities for each outcome

Let’s create 10 random vectors with 20 trials each for 4 groups of equal probability \(\pi_i = 0.25\)

rmultinom(n = 10, size = 20, prob = rep(x = 0.25, times = 4))
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    4    4    5    8    1    7    5    3    4     5
## [2,]    6    4    6    6    5    5    4    7    4     8
## [3,]    6    4    5    5    5    4    4    7    5     6
## [4,]    4    8    4    1    9    4    7    3    7     1
# by default, it creates a column for each random multinomial result


# If you want the rows named after their corresponding groups (say g1 to g4), name the elements of prob =
rmultinom(n = 10,
          size = 20,
          prob = c(g1 = 0.25,
                   g2 = 0.25,
                   g3 = 0.25,
                   g4 = 0.25))
##    [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## g1    3    6    7    4    5    5    7    8    5     4
## g2    6    4    3    5    3    6    4    4    3     5
## g3    7    5    7    5    6    6    6    5    5     7
## g4    4    5    3    6    6    3    3    3    7     4
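# It's often more convenient to have each new sample in a row rather than a column. We can use t() to transpose the results:
library(dplyr)   # provides mutate() and row_number()

rmultinom(n = 10,
          size = 20,
          prob = c(g1 = 0.25,
                   g2 = 0.25,
                   g3 = 0.25,
                   g4 = 0.25)) |> 
  t() |>            # one row per sample, one column per group
  data.frame() |>   # convert the matrix to a data frame
  mutate(trial = row_number(),   # label each sample 1 to 10
         .before = g1)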
##    trial g1 g2 g3 g4
## 1      1  4  6  3  7
## 2      2  7  4  4  5
## 3      3  2  7  7  4
## 4      4  5  2  8  5
## 5      5  4  6  6  4
## 6      6  6  6  4  4
## 7      7  6  6  4  4
## 8      8  2  6 10  2
## 9      9  3  7  5  5
## 10    10  3  6  6  5
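
Two properties of the rmultinom() output are worth confirming: the result is a matrix with one row per group and one column per sample, and every sample adds up to size. The sketch below checks both and builds the same row-per-sample layout with base R only. The seed and the names draws and draws_df are assumptions made for this example, so the simulated counts will differ from those printed above.

```r
set.seed(123)   # arbitrary seed, used only for reproducibility
probs <- c(g1 = 0.25, g2 = 0.25, g3 = 0.25, g4 = 0.25)

draws <- rmultinom(n = 10, size = 20, prob = probs)

dim(draws)       # 4 rows (groups) by 10 columns (samples)
colSums(draws)   # every column adds up to size = 20

# Base-R version of the transposed layout: one row per sample,
# with a trial column placed before the group counts
draws_df <- data.frame(trial = seq_len(ncol(draws)), t(draws))
names(draws_df)
```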