Discrete Probability
\[\mbox{Pr}(A) = \mbox{probability of event A}\]
- An event is defined as an outcome that can occur when something happens by chance.
- We can determine probabilities related to discrete variables (picking a red bead, choosing 48 Democrats and 52 Republicans from 100 likely voters) and continuous variables (height over 6 feet).
- The sample() function draws random outcomes from a set of options.
- The replicate() function repeats lines of code a set number of times. It is used with sample() and similar functions to run Monte Carlo simulations.
beads <- rep(c("red", "blue"), times = c(2,3)) # Create an urn with 2 red, 3 blue
beads # View beads object
## [1] "red" "red" "blue" "blue" "blue"
sample(beads, 1) # Sample 1 bead at random
## [1] "blue"
B <- 10000 # Number of times to draw 1 bead
events <- replicate(B, sample(beads, 1)) # Draw 1 bead, B times
tab <- table(events) # Make a table of outcome counts
tab # View count table
## events
## blue red
## 6061 3939
prop.table(tab) # View table of outcome proportions
## events
## blue red
## 0.6061 0.3939
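The simulation above draws one bead at a time with replicate(). As a rough sketch of an alternative, the same kind of experiment can be run in a single sample() call by sampling with replacement. The seed value and the name prop_blue are illustrative choices made here, not part of the original notes, and the exact proportions depend on the seed and R version.

```r
# Sketch: Monte Carlo simulation using sampling with replacement
set.seed(1986)                                    # arbitrary seed, only for reproducibility
beads <- rep(c("red", "blue"), times = c(2, 3))   # urn with 2 red, 3 blue
B <- 10000                                        # number of draws
events <- sample(beads, B, replace = TRUE)        # draw B beads, returning each bead to the urn
prop_blue <- mean(events == "blue")               # proportion of blue draws
prop.table(table(events))                         # proportions should be close to 0.6 and 0.4
```

Without replace = TRUE, sample() cannot draw more beads than the urn contains, which is why the replicate() version draws a single bead each time.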
set.seed() function
Before we continue, we will briefly explain the following important line of code:
set.seed(1986)
Throughout this book, we use random number generators. This means that many of the results presented can change by chance, so a frozen version of the book may show a different result from the one you obtain when you run the code as shown. This is fine, since the results are random and change from run to run. However, if you want to ensure that results are exactly the same every time you run them, you can set R's random number generation seed to a specific number. Above we set it to 1986. We want to avoid using the same seed every time. A popular way to pick the seed is the year minus the month minus the day. For example, we picked 1986 on December 20, 2018: \(2018 - 12 - 20 = 1986\).
You can learn more about setting the seed by looking at the documentation:
?set.seed
In the exercises, we may ask you to set the seed to ensure that the results you obtain are exactly what we expect them to be.
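As a small illustration of what setting the seed does, the sketch below resets the seed before each of two identical sample() calls and checks that the draws match. The variable names and the range 1:100 are hypothetical choices for this example.

```r
# Sketch: the same seed reproduces the same random draws
set.seed(1986)
first_draw <- sample(1:100, 5)     # five random integers between 1 and 100

set.seed(1986)                     # reset to the same seed
second_draw <- sample(1:100, 5)    # repeats the same sequence of draws

identical(first_draw, second_draw) # TRUE: both calls started from the same seed
```

The actual numbers drawn may differ between R versions, but within one session the two vectors are identical.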
R was updated to version 3.6 in early 2019. In this update, the default method R uses to generate random samples from a seed changed (the default sample.kind went from "Rounding" to "Rejection"). This means that exercises, videos, textbook excerpts and other code you encounter online may yield a different result based on your version of R.
If you are running R 3.6, you can revert to the original seed setting behavior by adding the argument sample.kind="Rounding". For example:
set.seed(1)
set.seed(1, sample.kind = "Rounding") # Makes R 3.6 sample as R 3.5 did
## Warning in set.seed(1, sample.kind = "Rounding"): non-uniform 'Rounding' sampler used
Using the sample.kind="Rounding" argument will generate a message:
non-uniform 'Rounding' sampler used
Although R labels this message a warning, it is not a cause for alarm. It's a confirmation that R is using the alternate seed generation method, and you should expect to receive this message in your console.
If you use R 3.6, you should always use the second form of set.seed() in this course series (outside of DataCamp assignments). Failure to do so may result in an otherwise correct answer being rejected by the grader. In most cases where a seed is required, you will be reminded of this fact.
mean() function
In R, applying the mean() function to a logical vector returns the proportion of elements that are TRUE. It is very common to use the mean() function in this way to calculate probabilities and we will do so throughout the course.
Suppose you have the vector beads defined earlier:
beads <- rep(c("red", "blue"), times = c(2,3))
beads
## [1] "red" "red" "blue" "blue" "blue"
To find the probability of drawing a blue bead at random, you can run:
mean(beads == "blue")
## [1] 0.6
R evaluates this code in steps. First, R evaluates the logical statement beads == "blue", which generates the vector:
FALSE FALSE TRUE TRUE TRUE
When the mean() function is applied, R coerces the logical values to numeric values, changing TRUE to 1 and FALSE to 0:
0 0 1 1 1
The mean of the zeros and ones thus gives the proportion of TRUE values. As we have learned and will continue to see, probabilities are directly related to the proportion of events that satisfy a requirement.
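The coercion steps described above can be traced explicitly. The following is a minimal sketch using the same beads vector; the intermediate names is_blue, as_numbers and by_hand are introduced here only for illustration.

```r
# Sketch: why mean() of a logical vector is a proportion
beads <- rep(c("red", "blue"), times = c(2, 3))

is_blue <- beads == "blue"                  # FALSE FALSE TRUE TRUE TRUE
as_numbers <- as.numeric(is_blue)           # 0 0 1 1 1
by_hand <- sum(as_numbers) / length(beads)  # 3 / 5 = 0.6

mean(is_blue)                               # 0.6, same as the manual calculation
```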
Independence
Independence, Conditional probability and the Multiplication rule
Conditional probabilities compute the probability that an event occurs given information about dependent events. For example, the probability of drawing a second king given that the first draw is a king is:
\[\mbox{Pr}(\mbox{Card 2 is a king} \mid \mbox{Card 1 is a king}) = 3/51\]
If two events \(A\) and \(B\) are independent, \[\mbox{Pr}(A \mid B) = \mbox{Pr}(A)\]
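The conditional probability in the king example can be checked with a Monte Carlo simulation. The sketch below is one possible approach, not the course's own code: it represents a deck by card ranks only (suits are omitted, which is an assumption that does not affect this calculation), draws two cards without replacement many times, and looks at the second card only in the draws where the first card was a king. The seed, B, and all variable names are illustrative.

```r
# Sketch: estimate Pr(Card 2 is a king | Card 1 is a king) by simulation
set.seed(1)                                    # arbitrary seed for reproducibility
ranks <- rep(c("ace", 2:10, "jack", "queen", "king"), times = 4)  # 52 cards
B <- 50000                                     # number of simulated two-card draws

draws <- replicate(B, sample(ranks, 2))        # 2 x B matrix, sampled without replacement
first_king <- draws[1, ] == "king"             # was card 1 a king?
second_king <- draws[2, ] == "king"            # was card 2 a king?

estimate <- mean(second_king[first_king])      # proportion of kings on card 2, given card 1 is a king
estimate                                       # should be close to the exact value below
3 / 51                                         # exact conditional probability
```

Restricting second_king to the draws where first_king is TRUE is what makes this a conditional probability. The unconditional proportion mean(second_king) is instead close to 4/52, which shows the two draws are not independent.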
To determine the probability of multiple events occurring, we use the multiplication rule.
The multiplication rule for independent events is:
\[\mbox{Pr}(A \mbox{ and }B \mbox{ and }C) = \mbox{Pr}(A) \times \mbox{Pr}(B) \times \mbox{Pr}(C)\]
The multiplication rule for dependent events multiplies the probability of the first event by the conditional probability of the second event given the first:
\[\mbox{Pr}(A \mbox{ and }B) = \mbox{Pr}(A)\times\mbox{Pr}(B \mid A)\]
We can expand the multiplication rule for dependent events to more than 2 events:
\[\mbox{Pr}(A \mbox{ and }B \mbox{ and }C) = \mbox{Pr}(A) \times \mbox{Pr}(B \mid A) \times \mbox{Pr}(C \mid A \mbox{ and } B)\]
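As a worked example of both forms of the multiplication rule, the sketch below computes the probability of drawing three kings in a row from a standard 52-card deck. The scenario and variable names are illustrative assumptions that extend the king example above. Without replacement the draws are dependent; with replacement they are independent.

```r
# Sketch: multiplication rule for dependent vs. independent events

# Dependent events: three kings in a row, drawing without replacement
p_first  <- 4 / 52    # Pr(A): card 1 is a king
p_second <- 3 / 51    # Pr(B | A): card 2 is a king given card 1 was
p_third  <- 2 / 50    # Pr(C | A and B): card 3 is a king given cards 1 and 2 were
p_dependent <- p_first * p_second * p_third
p_dependent           # 24/132600 = 1/5525, about 0.000181

# Independent events: each card is returned to the deck before the next draw
p_independent <- (4 / 52)^3
p_independent         # 1/2197, about 0.000455
```

The dependent case is less likely because each king drawn leaves fewer kings in the deck for the next draw.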