Before our first meeting do the following (all free):
R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows, and MacOS. RStudio is a desktop environment we will be using to run statistical computations.
Below, example R code is shown in a grey box. The output of code is shown below in a white box.
To come up with an anonymous code for each student in this session, suppose I give the following algorithm.
1. Write down the first two letters of your favorite color.
2. Write down how many siblings you have, as two digits.
For example, if your favorite color is blue and you have 3 siblings, your code would be “BL03.”
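The two steps above can be sketched in R. This is only an illustration: the function name `make_code` is made up for this example and is not part of the original activity.

```r
# Hypothetical helper (name invented for this sketch): build the anonymous
# code from a favorite color and a number of siblings.
make_code <- function(color, siblings) {
  prefix <- toupper(substr(color, 1, 2))           # first two letters, upper case
  suffix <- sprintf("%02d", as.integer(siblings))  # sibling count padded to two digits
  paste0(prefix, suffix)
}

make_code("blue", 3)
## [1] "BL03"
```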
Counting:
How many different codes are possible?
How many different codes are likely?
Probability:
What do you think is the chance that two people selected at random have the same code? Would you bet $10 on two codes being the same?
What do you think is the chance that two people in your group have the same code? Would you bet $10 on two codes being the same?
What do you think is the chance that two people in this session have the same code? Would you bet $10 on two codes being the same?
What is the difference between counting and finding probability? (Discuss)
When measuring discrete (whole number) outcomes, the probability of a desired outcome \(A\) is written: \[P(A) = \frac{ \text{number of outcomes in which A occurs}}{\text{total number of possible outcomes}}\]
What is the probability that, given two people selected, they have the same code?
If there are \(n\) equally likely codes, there are \(n^2\) possible pairs of codes for two people, and in \(n\) of those pairs the two codes match, so:
\(P(\text{two given people have the same code}) = \frac{n}{n^2} = \frac{1}{n}\).
What is the probability that, given three people selected, they have the same code?
Sometimes probabilities are easier to calculate if we look at their complement.
The complement of an event \(A\) is the event “\(A\) doesn’t happen.” The notation \(\bar{A}\) is used for the complement of event \(A\). We can compute the probability of the complement using \(P(\bar{A}) = 1 - P(A)\). Notice also that the complement of \(\bar{A}\) is the original event \(A\), so that \(P(A) = 1 - P(\bar{A})\).
\(P(\text{at least two people out of the 5 have the same code}) = 1-P(\text{no two people have the same code})\).
\(P(\text{no two people have the same code}) = \frac{\text{number of ways 5 people can all have different codes}}{\text{total number of ways 5 people can have codes}} = \frac{ n(n-1)(n-2)(n-3)(n-4) }{n^{5}}\)
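The complement calculation can be sketched in R for any group size. The function name `p_shared` and the value `n = 100` are assumptions made only for illustration (the notes do not fix a value of \(n\)), and the sketch treats all \(n\) codes as equally likely, which real codes probably are not.

```r
# Hypothetical helper: probability that at least two of k people share a code,
# assuming n equally likely codes (an idealization).
p_shared <- function(n, k) {
  # (n/n) * ((n-1)/n) * ... * ((n-k+1)/n) = P(all k codes are different)
  p_all_different <- prod((n - 0:(k - 1)) / n)
  1 - p_all_different
}

# n = 100 is an illustrative guess, not a value from the notes.
p_shared(n = 100, k = 5)
## [1] 0.09654976
```

With `k = 2` the function returns \(1/n\), matching the two-person result.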
If we were to randomly assign groups again, what is the probability that YOU are placed in the exact same group?
How many different ways are there to place \(n=15\) students into 3 groups? (Hint: consider an easier problem.)
If we were to randomly assign groups again, what is the probability that YOU are placed in the exact same group?
Example 1 - How many different ways are there to line up 5 students?
5*4*3*2*1
## [1] 120
Factorial: \(n! = n(n-1)(n-2) ... 3 \cdot 2 \cdot 1\)
n = 5
factorial(n)
## [1] 120
Example 2 - How many ways can a four-person executive committee (president, vice-president, secretary, treasurer) be selected from a 16-member board of directors of a non-profit organization?
16*15*14*13
## [1] 43680
Permutations: The number of ways \(k\) items may be selected from among \(n\) choices (without replacement) when order matters is:
\[\begin{align*} _n P_k =& n(n-1)(n-2) ... (n-k+2)(n-k+1) \\ =& n(n-1)(n-2) ... (n-k+2)(n-k+1)\frac{(n-k)(n-k-1) ... 3 \cdot 2 \cdot 1}{(n-k)(n-k-1) ... 3 \cdot 2 \cdot 1} \\ =& \frac{n!}{(n-k)!} \end{align*}\]
# Example 2
n = 16
k = 4
factorial(n)/factorial(n-k)
## [1] 43680
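The first line of the permutation formula, the product \(n(n-1) \cdots (n-k+1)\), can also be computed directly. The sketch below uses a made-up helper name, `n_permutations`. One possible reason to prefer the product form is that `factorial(n)` overflows to `Inf` in R once \(n\) is larger than 170, while the product only involves \(k\) terms.

```r
# Hypothetical helper: number of ordered selections of k items from n,
# computed as n * (n-1) * ... * (n-k+1) without any factorials.
n_permutations <- function(n, k) {
  prod(seq(n, by = -1, length.out = k))
}

n_permutations(16, 4)
## [1] 43680
```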
Example 3 - How many ways can a four-person committee be selected from 16 members if all committee members have equal positions?
There are \(_{16} P_4=43680\) ways to choose the members when order matters. If order doesn’t matter, we have overcounted. By how much? For any given 4-member selection, there are \(4!\) ways to order those 4 members, so we overcounted by a factor of \(4!\).
# Example 3
43680/factorial(4)
## [1] 1820
Combinations: The number of ways \(k\) items may be selected from among \(n\) choices (without replacement) when order does NOT matter is: \[_n C_k = \frac{_n P_k}{k!} = \frac{n!}{k!(n-k)!}\] This is also denoted \(\binom{n}{k}\) and referred to as “n choose k.”
So we could be fancy and calculate it like this:
# Example 3
n = 16
k = 4
factorial(n)/( factorial(k) * factorial(n-k) )
## [1] 1820
Or even this!
# Example 3
choose(16,4)
## [1] 1820
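If the formula feels like magic, one way to check it is to have R list every committee and count them. This is a sketch using base R's `combn()`, with board members labeled 1 through 16 purely for illustration.

```r
# List every unordered 4-member committee from members labeled 1..16.
committees <- combn(16, 4)   # a matrix: each column is one committee

committees[, 1]              # the first committee in the list
## [1] 1 2 3 4
ncol(committees)             # how many committees there are
## [1] 1820
```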
Example 4 - A group of four students is to be chosen from a 35-member class to represent the class on the student council. How many ways can this be done?
Two events are independent if the outcome of one does not affect the probability of the other. If events A and B are independent, then the probability of both \(A\) and \(B\) occurring is \[P(A \text{ and } B) = P(A) \cdot P(B)\] where \(P(A \text{ and } B)\) is the probability of events \(A\) and \(B\) both occurring, \(P(A)\) is the probability of event A occurring, and \(P(B)\) is the probability of event \(B\) occurring.
A brewery utilizes two bottling machines, but they do not operate simultaneously. The second machine acts as a backup system to the first machine and operates only when the first breaks down during operating hours. The probability that the first machine breaks down during operating hours is .20. If, in fact, the first breaks down, then the second machine is turned on and has a probability of .30 of breaking down.
If A is the event that the first machine is broken and B is the event that the second machine is broken, the probability both are broken is \(P(A \text{ and } B) = P(A) \cdot P(B)\), if we assume A and B are independent events.
# a.
.2*.3
## [1] 0.06
The probability that the bottling operation is still running (that is, the two machines are not both broken) is the complement of this event.
# b.
1-.2*.3
## [1] 0.94
Example 5 - A multiple-choice quiz contains 5 questions with four possible answers each. Compute the probability of randomly guessing the answers and getting 100% on the quiz (all five questions correct).
The total number of possible outcomes = \(4 \cdot 4 \cdot 4 \cdot 4 \cdot 4 = 4^5\).
Only one outcome is correct. So \(P(\text{all correct}) = \frac{1}{4^5}\)
# Example 5
1/4^5
## [1] 0.0009765625
Fat chance!
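Example 6 - On this same quiz, what is the probability of getting exactly 4 questions correct?
Exactly one question must be wrong. There are 5 choices for which question is wrong and 3 wrong answers for that question, so there are \(3+3+3+3+3 = 3 \cdot 5 = 15\) ways to get one question wrong and the remaining four correct, and thus \(P(\text{getting four correct}) = \frac{15}{4^5}\).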
# Example 6
15/4^5
## [1] 0.01464844
About 1.46%
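The quiz is small enough that we can list every possible answer sheet and count. The sketch below assumes, only for illustration, that choice 1 is the correct answer on every question; any other answer key gives the same counts.

```r
# Assumption for illustration: choice 1 is the correct answer on each question.
outcomes  <- expand.grid(rep(list(1:4), 5))  # all 4^5 ways to fill in the quiz
n_correct <- rowSums(outcomes == 1)          # number correct on each answer sheet

nrow(outcomes)          # total number of possible outcomes
## [1] 1024
sum(n_correct == 4)     # answer sheets with exactly four correct
## [1] 15
mean(n_correct == 4)    # proportion of answer sheets with exactly four correct
## [1] 0.01464844
```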
Two events are mutually exclusive if they cannot happen at the same time, so \(P(A \text{ and } B) = 0\). If A and B are mutually exclusive, then \[P(A \text{ or } B) = P(A) + P(B)\]
Example 7 - On the quiz, what is the probability of getting a “B” or higher (at least 4 out of 5 answers correct)?
\(P(\text{getting 5 correct or getting 4 correct}) = P(\text{getting 5 correct}) + P(\text{getting 4 correct})\)
# Example 7
1/4^5 + 15/4^5
## [1] 0.015625
About 1.56%.
Expected value provides a way of evaluating a decision that has multiple possible outcomes.
Expected value is defined as the average gain or loss of an event if the procedure is repeated many times. We can compute the expected value by multiplying each outcome by the probability of that outcome, then adding up the products.
Example 8 - You purchase a raffle ticket to help out a charity. The raffle ticket costs $5. The charity is selling 2000 tickets. One of them will be drawn and the person holding the ticket will be given a prize worth $4000. Compute the expected value for this raffle.
So expected value is:
# Example 8
3995*1/2000 + -5*1999/2000
## [1] -3
The winner's net gain is $4000 - $5 = $3995, and everyone else loses the $5 ticket price. On average, each person is giving $3 to charity.
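The recipe "multiply each outcome by its probability, then add" can be written as a weighted sum over vectors. This is a sketch: the names `net_gain` and `prob` are made up here, and each outcome is the ticket holder's net gain (the $4000 prize minus the $5 ticket for the winner, and the $5 ticket price lost otherwise).

```r
# Net outcomes from the ticket holder's point of view, with their probabilities.
net_gain <- c(win = 4000 - 5, lose = -5)
prob     <- c(win = 1/2000,   lose = 1999/2000)

sum(prob)              # sanity check: the probabilities add up to 1
## [1] 1
sum(net_gain * prob)   # expected value of one ticket
## [1] -3
```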
The probability that event B occurs, given that event A has happened, is represented by \(P(B|A)\), read “the probability of B given A.”
Conditional probabilities can be used to find the probability of joint events, even when they are not independent:
\[P(A \text{ and } B) = P(A|B) \cdot P(B)\]
By the above fact both \[P(A \text{ and } B) = P(A|B) \cdot P(B)\] and \[P(A \text{ and } B) = P(B|A) \cdot P(A)\] By setting these equal we get a way to “invert” conditional probabilities: \[P(A|B) \cdot P(B)=P(B|A) \cdot P(A)\] OR
\[P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}\]
If we know \(P(A)\), \(P(B|A)\), and \(P(B|\bar{A})\), we can find \(P(B)\), because either \(B\) occurs together with \(A\) or \(B\) occurs together with \(\bar{A}\).
This accounts for all the ways \(B\) can occur, so \[P(B) = P(B|A) \cdot P(A) + P(B|\bar{A}) \cdot P(\bar{A}).\]
Now, plugging in \(P(B)\) gives us Bayes’ Rule!
Bayes’ Rule: Given two events \(A\) and \(B\),
\[ P(A|B) = \frac{P(B|A) \cdot P(A)} { P(B|A) \cdot P(A) + P(B|\bar{A}) \cdot P(\bar{A})} \]
Example 9 - Suppose COVID19 has an incidence rate of 0.1% (that is, it afflicts 0.1% of the population). A new test has been devised to detect this disease. The test does not produce false negatives (that is, anyone who has the disease will test positive for it), but the false positive rate is 5% (that is, about 5% of people who take the test will test positive, even though they do not have the disease). Should doctors use this test?
Well, suppose you test negative for the disease. Great, that means you don’t have it! But if you test positive, what is the probability that you actually have the disease?
So we want to know P(disease|positive), but we only know the following:
Using Bayes’ Rule (B = positive, A = disease):
\[ P(\text{disease}|\text{positive}) = \frac{P(\text{positive}|\text{disease}) \cdot P(\text{disease})} { P(\text{positive}|\text{disease}) \cdot P(\text{disease}) + P(\text{positive}|\text{no disease}) \cdot P(\text{no disease})} \]
\[ P(\text{disease}|\text{positive}) = \frac{1 \cdot 0.001} { 1 \cdot 0.001 + 0.05 \cdot 0.999} \]
# Example 9
(1*0.001)/(1*0.001+0.999*0.05)
## [1] 0.01962709
This shows that only about 2% of the people who test positive for COVID19 using this test will actually have COVID19!
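Since the same calculation comes up for any yes/no test, it can help to wrap Bayes' Rule in a small function. This is a sketch; the function and argument names below are invented for illustration and are not from any package.

```r
# Hypothetical helper implementing Bayes' Rule for a yes/no test.
#   prior       = P(disease)
#   sensitivity = P(positive | disease)
#   false_pos   = P(positive | no disease)
posterior_given_positive <- function(prior, sensitivity, false_pos) {
  # P(positive), adding up both ways a positive result can occur
  p_positive <- sensitivity * prior + false_pos * (1 - prior)
  sensitivity * prior / p_positive
}

# Example 9: incidence 0.1%, no false negatives, 5% false positives
posterior_given_positive(prior = 0.001, sensitivity = 1, false_pos = 0.05)
## [1] 0.01962709
```

Changing `prior` shows how strongly the answer depends on how common the disease is.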
In your subgroup, select a problem you want to work on. Work together toward finding a solution to the provided questions and/or any related questions you find interesting. In a 1-2 page write up, present:
On Friday, each group will share their work, 8-10 minutes to present, followed by 5 minutes for questions.
Compute the probability of randomly drawing five cards from a deck and getting:
a. a pair
b. three of a kind
c. four of a kind
d. a full house (three of a kind and a pair)
e. a flush (all the same suit)
After your group is convinced of its answers, try checking them: https://en.wikipedia.org/wiki/Poker_probability
Suppose you have three of a kind. What is the probability that someone else has a higher hand?
In a certain state’s lottery, \(64\) balls numbered 1 through \(64\) are placed in a machine and six of them are drawn at random. If the six numbers drawn match the numbers that a player had chosen, the player wins the $1,000,000 jackpot. If they match 5 numbers, they win $1,000. It costs $1 to buy a ticket. Find the expected value.
Over time the jackpot will increase if no one wins. How much would the jackpot have to be for the expected value of playing the lottery to be positive? (In this case, would you still buy a lottery ticket?)
An unmanned monitoring system uses high-tech video equipment and microprocessors to detect intruders. A prototype system has been developed and is in use outdoors at a weapons munitions plant. The system is designed to detect intruders with a probability of .90. However, the design engineers expect this probability to vary with weather condition. The system automatically records the weather condition each time an intruder is detected. Based on a series of controlled tests, in which an intruder was released at the plant under various weather conditions, the following information is available: Given the intruder was, in fact, detected by the system, the weather was clear 75% of the time, cloudy 20% of the time, and raining 5% of the time. When the system failed to detect the intruder, 60% of the days were clear, 30% cloudy, and 10% rainy. Use this information to find the probability of detecting an intruder, given rainy weather conditions. (Assume that an intruder has been released at the plant.)
Suppose a certain type of cancer has an incidence rate of 0.5% (that is, it afflicts 0.5% of the population). A new test has been devised to detect this cancer, which is very cheap and easy to administer in comparison to existing tests. The test produces false negatives at a rate of 1.4% (that is, 1.4% of those that have the disease will test negative), and the false positive rate is 1% (that is, about 1% of people who take the test will test positive, even though they do not have the disease). How accurate is this test?
Based on this, what recommendations would you make to doctors using this test?