1 Preliminaries

Before our first meeting, do the following (both are free):

Download and install R https://cran.r-project.org/
Download and install RStudio Desktop https://rstudio.com/products/rstudio/download/ (choose the free version)

R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows, and MacOS. RStudio is a desktop application for working with R; we will use it to run statistical computations.

Below, example R code is shown in a grey box, and the output of that code is shown beneath it in a white box.

2 Anonymous IDs

To create an anonymous code for each student in this session, suppose we use the following algorithm.

1. Write down the first two letters of your favorite color.
2. Write down how many siblings you have, as two digits.

For example, if your favorite color is blue and you have 3 siblings, your code would be “BL03.”
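The algorithm is meant to be done by hand, but writing it as a tiny function makes each step explicit. This is only a sketch; `make_id` is a hypothetical helper name, not part of any package.

```r
# Sketch of the anonymous-ID algorithm (make_id is a hypothetical helper name)
make_id <- function(color, siblings) {
  letters_part <- toupper(substr(color, 1, 2))  # first two letters of the favorite color
  digits_part  <- sprintf("%02d", siblings)     # siblings as two digits, e.g. 3 -> "03"
  paste0(letters_part, digits_part)             # glue the two parts together
}

make_id("blue", 3)   # "BL03"
```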

Counting:

How many different codes are possible?
How many different codes are likely?

Probability:

What do you think is the chance that two people selected at random have the same code? Would you bet $10 on two codes being the same?
What do you think is the chance that two people in your group have the same code? Would you bet $10 on two codes being the same?
What do you think is the chance that two people in this session have the same code? Would you bet $10 on two codes being the same?

What is the difference between counting and finding probability? (Discuss)

3 Probability

When there are finitely many possible outcomes and all of them are equally likely, the probability of a desired outcome $A$ is written: \[P(A) = \frac{ \text{number of outcomes in which A occurs}}{\text{total number of possible outcomes}}\]

What is the probability that two selected people have the same code?

If there are $n$ different codes, and each person is equally likely to have any of them, then:

the number of ways the two people could have the same code = $n$
the number of possible pairs of codes the two people could have = $n \cdot n = n^2$

$P(\text{two given people have the same code}) = \frac{n}{n^2} = \frac{1}{n}$.
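A quick way to check $P = 1/n$ is to simulate many random pairs and count how often they match. The sketch below assumes, purely for illustration, $n = 100$ equally likely codes; that number is hypothetical, not the count from the ID algorithm above.

```r
# Monte Carlo check of P(two people share a code) = 1/n
# Assumption (hypothetical): n = 100 equally likely codes
set.seed(1)            # make the simulation reproducible
n      <- 100          # number of possible codes
trials <- 100000       # number of simulated pairs
person1 <- sample(1:n, trials, replace = TRUE)  # code of the first person in each pair
person2 <- sample(1:n, trials, replace = TRUE)  # code of the second person in each pair
sim_prob <- mean(person1 == person2)            # fraction of pairs that match
sim_prob
1 / n                  # exact answer, for comparison
```

With more trials the simulated value should settle closer to $1/n$.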

What is the probability that three selected people all have the same code?

4 Complementary events

Sometimes probabilities are easier to calculate if we look at their complement.

The complement of an event $A$ is the event “$A$ doesn’t happen.” The notation $\bar{A}$ is used for the complement of event $A$. We can compute the probability of the complement using $P(\bar{A}) = 1 - P(A)$. Notice also that the complement of $\bar{A}$ is the original event $A$, so that $P(A) = 1 - P(\bar{A})$.

For example, in a group of 5 people with $n$ possible codes:

$P(\text{at least two of the 5 people have the same code}) = 1-P(\text{no two people have the same code})$.

$P(\text{no two people have the same code}) = \frac{\text{number of ways 5 people can all have different codes}}{\text{total number of ways to assign codes to 5 people}} = \frac{ n(n-1)(n-2)(n-3)(n-4) }{n^{5}}$
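The same complement argument works for any group size $k$: the numerator becomes $n(n-1)\cdots(n-k+1)$ and the denominator $n^k$. A minimal sketch, assuming every code is equally likely; `p_shared_code` is a hypothetical helper name, and $n = 100$ below is an illustrative value only.

```r
# Probability that at least two of k people share a code,
# assuming each of the n codes is equally likely (a simplifying assumption)
p_shared_code <- function(n, k) {
  # n(n-1)...(n-k+1) / n^k, computed as a product of fractions to avoid huge numbers
  p_all_different <- prod((n - 0:(k - 1)) / n)
  1 - p_all_different          # complement: at least one shared code
}

p_shared_code(n = 100, k = 5)  # about 0.0965 with 100 hypothetical codes
```

Computing the product of fractions rather than the raw numerator and denominator keeps the numbers small even when $n^k$ is enormous.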

5 Group assignment

If we were to randomly assign groups again, what is the probability that YOU are placed in the exact same group?

How many different ways are there to place $n=15$ students into 3 groups of 5? (Hint: start with easier problems.)
1. How many different ways are there to arrange $5$ given students in a row from left to right?
2. How many different ways are there to arrange $5$ of the $15$ students in a row from left to right?
3. How many different ways are there to choose $5$ of $15$ students (order does not matter)?
4. How many different ways are there to place $10$ of the $15$ students into two uniquely labeled groups (GROUP1, GROUP2), each with $5$ students?
5. How many different ways are there to place $15$ students into three uniquely labeled groups: GROUP1 (5 students), GROUP2 (5 students), and GROUP3 (5 students)?
6. How many different ways are there to place $15$ students into 3 unlabeled groups of 5 students each?
If we were to randomly assign groups again, what is the probability EVERYONE is placed in the exact same groups?
If we were to randomly assign groups again, what is the probability that YOU are placed in the exact same group?
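Once your group has worked through the list, one way to check the last few counts in R is sketched below. It uses `choose()` and `factorial()` (both introduced in Section 6) and interprets "the exact same group" as ending up with the same four groupmates.

```r
# Counting group assignments for 15 students in 3 groups of 5
labeled   <- choose(15, 5) * choose(10, 5) * choose(5, 5)  # fill GROUP1, then GROUP2, then GROUP3
unlabeled <- labeled / factorial(3)                         # the 3! relabelings describe the same split
labeled
unlabeled
# Everyone lands in exactly the same groups: 1 favorable split out of all unlabeled splits
1 / unlabeled
# YOU land in the same group: your 4 groupmates must be the same 4 of the other 14 students
1 / choose(14, 4)
```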

6 Counting Definitions

Example 1 - How many different ways are there to line up 5 students?

5*4*3*2*1

## [1] 120

Factorial: $n! = n(n-1)(n-2) \cdots 3 \cdot 2 \cdot 1$

n = 5
factorial(n)

## [1] 120

Example 2 - How many ways can a four-person executive committee (president, vice-president, secretary, treasurer) be selected from a 16-member board of directors of a non-profit organization?

16*15*14*13

## [1] 43680

Permutations: The number of ways $k$ items may be selected from among $n$ choices (without replacement) when order matters is:

\[\begin{align*} _n P_k =& n(n-1)(n-2) \cdots (n-k+2)(n-k+1) \\ =& n(n-1)(n-2) \cdots (n-k+2)(n-k+1)\frac{(n-k)(n-k-1) \cdots 3 \cdot 2 \cdot 1}{(n-k)(n-k-1) \cdots 3 \cdot 2 \cdot 1} \\ =& \frac{n!}{(n-k)!} \end{align*}\]

# Example 2
n = 16
k = 4
factorial(n)/factorial(n-k)

## [1] 43680
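The first line of the permutation formula, $n(n-1)\cdots(n-k+1)$, can also be computed directly as a product. This sketch shows why that can be handy: R's `factorial()` returns `Inf` (with a warning) once its argument is above 170, so the ratio of factorials breaks down for large $n$ even when the answer itself is modest. The values $n = 200$, $k = 2$ are illustrative only.

```r
# Example 2 again: n(n-1)...(n-k+1) computed directly as a product
n <- 16
k <- 4
prod((n - k + 1):n)                # 13 * 14 * 15 * 16 = 43680

# factorial() overflows to Inf above 170, so Inf / Inf gives NaN here
factorial(200) / factorial(198)    # NaN (with warnings)
prod(199:200)                      # 200 * 199 = 39800
```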

Example 3 - How many ways can a four-person committee be selected from a 16-member board where all committee members have equal positions?

There are $_{16} P_4=43680$ ways to choose the members when order matters. If order doesn’t matter, we have overcounted. By how much? For any given selection of 4 members, there are $4!$ ways to order those 4 members, so we overcounted by a factor of $4!$.

# Example 3
43680/factorial(4)

## [1] 1820

Combinations: The number of ways $k$ items may be selected from among $n$ choices (without replacement) when order does NOT matter is: \[_n C_k = \frac{_n P_k}{k!} = \frac{n!}{k!(n-k)!}\] This is also denoted $\binom{n}{k}$ and referred to as “n choose k.”

So we could be fancy and calculate it like this:

# Example 3
n = 16
k = 4
factorial(n)/( factorial(k) * factorial(n-k) )

## [1] 1820

Or even this!

# Example 3
choose(16,4)

## [1] 1820

Example 4 - A group of four students is to be chosen from a 35-member class to represent the class on the student council. How many ways can this be done?

7 Independent Events

Two events are independent if the outcome of one does not affect the probability of the other. If events $A$ and $B$ are independent, then the probability of both $A$ and $B$ occurring is \[P(A \text{ and } B) = P(A) \cdot P(B)\] where $P(A \text{ and } B)$ is the probability of events $A$ and $B$ both occurring, $P(A)$ is the probability of event $A$ occurring, and $P(B)$ is the probability of event $B$ occurring.

A brewery utilizes two bottling machines, but they do not operate simultaneously. The second machine acts as a backup system to the first machine and operates only when the first breaks down during operating hours. The probability that the first machine breaks down during operating hours is .20. If, in fact, the first breaks down, then the second machine is turned on and has a probability of .30 of breaking down.
1. What is the probability that the brewery’s bottling system is not working during operating hours?
2. The reliability of the bottling process is the probability that the system is working during operating hours. Find the reliability of the bottling process at the brewery.

If A is the event that the first machine is broken and B is the event that the second machine is broken, the probability that both are broken is $P(A \text{ and } B) = P(A) \cdot P(B)$, if we assume A and B are independent events.

# a. 
.2*.3

## [1] 0.06

The reliability (the probability that the bottling system is working) is the complement of this event.

# b. 
1-.2*.3

## [1] 0.94
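A simulation gives an independent check of both answers. The sketch below assumes, as in the text, that the first machine fails with probability 0.2 and that, when it does, the backup fails with probability 0.3; each simulated "day" is one stretch of operating hours.

```r
# Simulating the brewery's bottling system
set.seed(2)
days <- 100000
machine1_fails <- runif(days) < 0.2   # first machine breaks down with probability 0.2
backup_fails   <- runif(days) < 0.3   # backup breaks down with probability 0.3
# the system is down only when machine 1 fails AND the backup also fails
system_down <- machine1_fails & backup_fails
mean(system_down)       # should be close to 0.06
1 - mean(system_down)   # reliability, should be close to 0.94
```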

8 Guessing on a Quiz

Example 5 - A multiple-choice quiz contains 5 questions, each with four possible answers. Compute the probability of getting 100% on the quiz (all five questions correct) by randomly guessing every answer.

The total number of possible outcomes = $4*4*4*4*4 = 4^5$.
Only one outcome is correct. So $P(\text{all correct}) = \frac{1}{4^5}$

# Example 5
1/4^5

## [1] 0.0009765625

Fat chance!

Example 6 - On this same quiz, what is the probability of getting exactly 4 questions correct?

If you miss the 1st question, there are 3 ways to do this and get all the remaining correct.
If you miss the 2nd question, there are 3 ways to do this and get all the remaining correct.
etc.

So there are $3+3+3+3+3 = 3 \cdot 5 = 15$ ways to get exactly four questions correct, and thus $P(\text{getting four correct}) = \frac{15}{4^5}$.

# Example 6
15/4^5

## [1] 0.01464844

About 1.46%

What is the probability of getting exactly 3 questions correct?
What is the probability of getting exactly 2 questions correct?
What is the probability of getting exactly 1 question correct?
What is the probability of getting 0 questions correct? What do you notice?
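The counting in Example 6 generalizes: to get exactly $k$ correct, choose which $k$ questions are right ($\binom{5}{k}$ ways), then pick one of the 3 wrong answers for each of the other $5-k$ questions ($3^{5-k}$ ways). This sketch tabulates every case and cross-checks it against R's built-in binomial probabilities, `dbinom()`.

```r
# Probability of exactly k correct answers when guessing on all 5 questions
k <- 0:5
ways  <- choose(5, k) * 3^(5 - k)   # which k are right, times wrong choices for the rest
probs <- ways / 4^5                 # divide by the 4^5 equally likely answer sheets
data.frame(correct = k, ways = ways, probability = probs)
sum(probs)                                           # the probabilities add up to 1
all.equal(probs, dbinom(k, size = 5, prob = 1/4))    # TRUE: same values as the binomial distribution
```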

Two events are mutually exclusive if they cannot happen at the same time, so $P(A \text{ and } B) = 0$. If A and B are mutually exclusive, then \[P(A \text{ or } B) = P(A) + P(B)\]

Example 7 - On the quiz, what is the probability of getting a “B” or higher (at least 4 out of 5 answers correct)?

$P(\text{getting 5 correct or getting 4 correct}) = P(\text{getting 5 correct}) + P(\text{getting 4 correct})$

# Example 7
1/4^5 + 15/4^5

## [1] 0.015625

About 1.56%.


Which is more likely, passing the quiz with a D or higher (at least 3 out of 5 correct), or getting all 5 wrong?

9 Expected Value

Expected value provides a way of evaluating a decision that has multiple possible outcomes.

The expected value is defined as the average gain or loss per repetition if the procedure is repeated many times. We can compute the expected value by multiplying each outcome by the probability of that outcome, then adding up the products.

Example 8 - You purchase a raffle ticket to help out a charity. The raffle ticket costs $5. The charity is selling 2000 tickets. One of them will be drawn and the person holding the ticket will be given a prize worth $4000. Compute the expected value for this raffle.

If your ticket is drawn, you net $4000-$5 = $3995. The probability of this is 1/2000.
If your ticket is not drawn, you net -$5. The probability of this is 1999/2000.

So expected value is:

# Example 8
3995*1/2000 + -5*1999/2000

## [1] -3

On average, each person is giving about $3 to charity.
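The recipe "multiply each outcome by its probability, then add up the products" can be wrapped in a small function and reused for the exercises below. This is a sketch; `expected_value` is a hypothetical helper name, not a built-in R function.

```r
# Expected value: multiply each outcome by its probability, then add up the products
expected_value <- function(outcomes, probs) {
  stopifnot(isTRUE(all.equal(sum(probs), 1)))  # probabilities must sum to 1
  sum(outcomes * probs)
}

# Raffle from Example 8: net $4000 - $5 if drawn, lose the $5 otherwise
expected_value(outcomes = c(4000 - 5, -5), probs = c(1/2000, 1999/2000))   # -3
```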

A friend offers to play a game in which you roll three standard 6-sided dice. If all the dice show different values, you give him $1. If any two dice match, you get $2. What is the expected value of this game? Would you play?
An insurance company estimates the probability of an earthquake in the next year to be 0.0013. It estimates the average damage done by an earthquake to be $60,000. If the company offers earthquake insurance for $100, what is the company’s expected value for the policy?
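For the dice game, one way to check a hand computation is to list all $6^3 = 216$ equally likely rolls and average the payoff. The sketch assumes fair dice and reads "any two dice match" as "not all different", so a roll with three matching dice also pays $2.

```r
# Enumerate all 6^3 = 216 equally likely rolls of three dice
rolls <- expand.grid(d1 = 1:6, d2 = 1:6, d3 = 1:6)
# a roll is "all different" when it contains 3 distinct values
all_different <- apply(rolls, 1, function(r) length(unique(r)) == 3)
# payoff to you: -$1 if all different, +$2 if any two dice match
payoff <- ifelse(all_different, -1, 2)
mean(payoff)   # expected value per game, since every roll is equally likely
```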

10 Conditional probability

The probability the event B occurs, given that event A has happened is represented by $P(B|A)$, read “the probability of B given A.”

Conditional probabilities can be used to find the probability of joint events, even when they are not independent:

\[P(A \text{ and } B) = P(A|B) \cdot P(B)\]

11 Bayes’ Rule

By the above fact both \[P(A \text{ and } B) = P(A|B) \cdot P(B)\] and \[P(A \text{ and } B) = P(B|A) \cdot P(A)\] By setting these equal we get a way to “invert” conditional probabilities: \[P(A|B) \cdot P(B)=P(B|A) \cdot P(A)\] OR

\[P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}\]

If we know $P(A)$, $P(B|A)$, and $P(B|\bar{A})$, we can find $P(B)$ because:

the probability that A occurs and B occurs is $P(B|A) \cdot P(A)$.
the probability that A does not occur and B occurs is $P(B|\bar{A}) \cdot P(\bar{A})$.

These two cases account for all the ways $B$ can occur, so \[P(B) = P(B|A) \cdot P(A) + P(B|\bar{A}) \cdot P(\bar{A}).\]

Now, plugging in $P(B)$ gives us Bayes’ Rule!

Bayes’ Rule: Given two events $A$ and $B$,

\[ P(A|B) = \frac{P(B|A) \cdot P(A)} { P(B|A) \cdot P(A) + P(B|\bar{A}) \cdot P(\bar{A})} \]

Example 9 - Suppose COVID19 has an incidence rate of 0.1% (that is, it afflicts 0.1% of the population). A new test has been devised to detect this disease. The test does not produce false negatives (that is, anyone who has the disease will test positive for it), but the false positive rate is 5% (that is, about 5% of people who do not have the disease will test positive anyway). Should doctors use this test?

Well, suppose you test negative for the disease. Great, that means you don’t have it! But if you test positive, what is the probability that you actually have the disease?

So we want to know P(disease|positive), but we only know the following:

P(disease) = 0.001
P(no disease) = 0.999
P(positive|disease) = 1
P(negative|disease) = 0
P(positive|no disease) = 0.05
P(negative|no disease) = 0.95

Using Bayes’ Rule (B = positive, A = disease):

\[ P(\text{disease}|\text{positive}) = \frac{P(\text{positive}|\text{disease}) \cdot P(\text{disease})} { P(\text{positive}|\text{disease}) \cdot P(\text{disease}) + P(\text{positive}|\text{no disease}) \cdot P(\text{no disease})} \]

\[ P(\text{disease}|\text{positive}) = \frac{1 \cdot 0.001} { 1 \cdot 0.001 + 0.05 \cdot 0.999} \]

# Example 9
(1*0.001)/(1*0.001+0.999*0.05)

## [1] 0.01962709

This shows that only about 2% of the people who test positive for COVID19 using this test will actually have COVID19!
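The calculation in Example 9 can be wrapped in a small function so you can see how the answer changes with the inputs (this is also handy for the diagnosis problem in the portfolio section). A sketch only: `bayes_posterior` is a hypothetical helper name, and the extra incidence rates in the last line are illustrative values, not data.

```r
# Bayes' Rule as a function (bayes_posterior is a hypothetical helper name)
# p_A:            P(A), e.g. P(disease)
# p_B_given_A:    P(B|A), e.g. P(positive | disease)
# p_B_given_notA: P(B|not A), e.g. P(positive | no disease)
bayes_posterior <- function(p_A, p_B_given_A, p_B_given_notA) {
  p_B <- p_B_given_A * p_A + p_B_given_notA * (1 - p_A)  # all the ways B can occur
  p_B_given_A * p_A / p_B
}

# Example 9: incidence 0.001, no false negatives, 5% false positive rate
bayes_posterior(p_A = 0.001, p_B_given_A = 1, p_B_given_notA = 0.05)

# Vectorized over p_A: hypothetical incidence rates, for illustration only
bayes_posterior(p_A = c(0.001, 0.01, 0.1), p_B_given_A = 1, p_B_given_notA = 0.05)
```

The second call shows that the same test becomes far more informative when the disease is more common.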

12 Portfolio Problems

In your subgroup, select a problem you want to work on. Work together toward finding a solution to the provided questions and/or any related questions you find interesting. In a 1-2 page write-up, present:

the problem you worked on
a solution and the tools and reasoning you used to arrive at a solution
the significance of the result and how it can contribute toward better decision making

On Friday, each group will share their work, 8-10 minutes to present, followed by 5 minutes for questions.

12.1 Poker Odds

Compute the probability of randomly drawing five cards from a standard 52-card deck and getting:

a. a pair
b. three of a kind
c. four of a kind
d. a full house (three of a kind and a pair)
e. a flush (all the same suit)

Once your group is convinced of its answers, check them here: https://en.wikipedia.org/wiki/Poker_probability

Suppose you have three of a kind. What is the probability that someone else has a higher hand?

12.2 Playing the lottery

In a certain state’s lottery, $64$ balls numbered 1 through $64$ are placed in a machine and six of them are drawn at random. If the six numbers drawn match the numbers that a player had chosen, the player wins the $1,000,000 jackpot. If five of the numbers match, the player wins $1,000. It costs $1 to buy a ticket. Find the expected value.

Over time the jackpot will increase if no one wins. How much would the jackpot have to be for the expected value of playing the lottery to be positive? (In this case, would you still buy a lottery ticket?)

12.3 Shared birthday

Suppose two people meet. What is the probability that they share a birthday?

Suppose 3 people are in a room. What is the probability that there is at least one shared birthday among these 3 people?

Suppose 15 people are in BCSSI session together. What is the probability that there is at least one shared birthday among these 15 people?

Suppose $n$ people are in a room. What is the probability that there is at least one shared birthday among these $n$ people?

How many people would need to be in a room before you would bet $100 on the event that at least two people share a birthday? (Find the expected value of the wager for $n=1,2,3,...,50$.)

12.4 Intruder Detection

An unmanned monitoring system uses high-tech video equipment and microprocessors to detect intruders. A prototype system has been developed and is in use outdoors at a weapons munitions plant. The system is designed to detect intruders with a probability of .90. However, the design engineers expect this probability to vary with weather condition. The system automatically records the weather condition each time an intruder is detected. Based on a series of controlled tests, in which an intruder was released at the plant under various weather conditions, the following information is available: Given the intruder was, in fact, detected by the system, the weather was clear 75% of the time, cloudy 20% of the time, and raining 5% of the time. When the system failed to detect the intruder, 60% of the days were clear, 30% cloudy, and 10% rainy. Use this information to find the probability of detecting an intruder, given rainy weather conditions. (Assume that an intruder has been released at the plant.)

12.5 Correct Diagnosis?

Suppose a certain type of cancer has an incidence rate of 0.5% (that is, it afflicts 0.5% of the population). A new test has been devised to detect this cancer, which is very cheap and easy to administer in comparison to existing tests. The test produces false negatives at a rate of 1.4% (that is, 1.4% of those who have the disease will test negative), and the false positive rate is 1% (that is, about 1% of people who do not have the disease will test positive anyway). How accurate is this test?

Suppose a randomly selected person takes the test and tests positive. What is the probability that this person actually has the disease?
Suppose a randomly selected person takes the test and tests negative. What is the probability that this person does not have the disease?

Based on this, what recommendations would you make to doctors using this test?

BSSI 2020 - Probability Session Notes

Paul Regier

6/1 - 6/5/2020