Random Variables, Probability

Rasim Muzaffer Musal

Goals

Learn basic definitions of: {Event, Sample Space, Random Variable, Difference between Discrete and Continuous random variables. Probability Distributions}

Random Variable Definition/Representation

A random variable is a quantified observation from a process whose outcome is not known with certainty. \[\begin{align} X=x \end{align}\]
The capital letter “X” denotes the random variable.
The lowercase letter “x” is a value it can take.

Examples of Random Variables, Definition of Sample Space

Stock Prices, Soccer Game Results, Number of Earthquakes in a year.
For instance let X represent Soccer Game Results for a team
- Outcomes of the game can be Lose, Draw, Win.
- We can map these outcomes to 0,1,3

Examples of Random Variables, Definition of Sample Space

The sample space is the set of possible outcomes of an experiment/process for which the random variable is defined. The sample space for X can be defined as

\[\begin{align} X=\{0,1,3\} \end{align}\]

When we write X=x, we are referring to a case where X is equal to one of these values, for instance X=3.

Events

Events are special random variables that can take one of two values.

\[\begin{align} X = \{0,1\} \end{align}\]

In a couple of slides we will refer back to events to define a probability distribution.

Probability

There are different interpretations of probability.
Words such as chance and likelihood can be used colloquially, but they do not define anything. In fact, in statistics, likelihood refers to something else.
- Objective interpretation: Relative frequency of the value of a random variable in a large number of trials which obeys laws of probability.
- Subjective interpretation: Quantification of uncertainty which obeys laws of probability.

Probability: Thought experiment.

Imagine a coin. You toss the coin and you will assign a probability to heads (X=1) and tails (X=0)
P(X=1)=0.5 & P(X=0) = 0.5
But why? It can not be just because there are 2 outcomes. Would you assign 0.5 as probability to sun rising tomorrow?
You either did a mental calculation of having a really large number of coin tosses and calculated the relative frequency (objective) or you used your mental image of the world and what you have experienced in your life about coin tosses (subjective).

Probability: Types

Let 1 represent heads, 0 for tails.
Marginal: \(P(X=x)\) Probability associated with the value of a single random variable. What is the probability of the first coin toss being heads? P(X=1)
Joint: \(P(X_{1}=x_{1},X_{2}=x_{2})\) Probability of two or more random variables taking particular values together. If I toss a coin two times, what is the probability that the first toss lands heads and the second toss lands tails? \(P(X_{1}=1,X_{2}=0)\)

Probability: Types

Conditional: \(P(X_{1}=x_{1}|X_{2}=x_{2})\) Probability of one random variable taking a value given information about another. The sign \(\vert\) is read as “given”, and the information we condition on is written to its right. What is the probability that the first coin toss is heads if I know the second coin toss is tails? \(P(X_{1}=1|X_{2}=0)\)
The definition of conditional probability is \[P(X=x \vert Y=y) = \frac{P(X=x,Y=y)}{P(Y=y)} \]

Calculating Primitive Probabilities

Random Variable definitions
X=1/0 Middle Aged/Not Middle Aged
Y=1/0 Buys product/Does not Buy Product
Contingency table (table of counts)

	X=0	X=1
Y=0	22	33
Y=1	44	55

Calculating Primitive Probabilities

A random variable’s marginal probability. \[P(X=0)=\frac{(22+44)}{(22+44+33+55)}=0.43 \]
Joint probability of 2 random variables. \[P(X=0,Y=0)=\frac{(22)}{(22+44+33+55)}=0.14 \]

Calculating Primitive Probabilities

Technically, the numerator and denominator should each be divided by the sum of the contingency table (22+44+33+55=154).
This would be done to turn the numerator and the denominator into probabilities before the ratio is evaluated but the sum is common to both so they would cancel out. \(\frac{\frac{22}{154}}{\frac{22+33}{154}}=0.40\)

Calculating Primitive Probabilities

Conditional probability of one random variable given information on the other, skipping the common division of the numerator and denominator by 154. \[P(X=0 \vert Y=0)=\frac{(22)}{(22+33)}=0.40 \]
All of this is required to understand the laws of probability. These laws are used to define properties of probability.
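The three calculations above can be reproduced from the contingency table. This is a minimal sketch in base R; the object names (`counts`, `p_x0`, and so on) are made up for illustration and are not part of the slides.

```r
# Contingency table from the slides: rows are Y (0/1), columns are X (0/1).
counts <- matrix(c(22, 33,
                   44, 55),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Y = c("0", "1"), X = c("0", "1")))

total <- sum(counts)  # 154 individuals in the sample

# Marginal: P(X=0) is the X=0 column total divided by the grand total.
p_x0 <- sum(counts[, "0"]) / total

# Joint: P(X=0, Y=0) is a single cell divided by the grand total.
p_x0_y0 <- counts["0", "0"] / total

# Conditional: P(X=0 | Y=0) = P(X=0, Y=0) / P(Y=0).
p_y0 <- sum(counts["0", ]) / total
p_x0_given_y0 <- p_x0_y0 / p_y0

round(c(marginal = p_x0, joint = p_x0_y0, conditional = p_x0_given_y0), 2)
```

Dividing two probabilities that share the denominator 154 gives the same answer as dividing the raw counts, which is why the shortcut 22/(22+33) works.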

Probability: Laws

Convexity: \(0 \le P(X=x) \le 1\)
Addition: \(P(X_{1}=x_{1} \cup X_{2}=x_{2} )=\)
- \(P(X_{1}=x_{1})+P(X_{2}=x_{2})-P(X_{1}=x_{1},X_{2}=x_{2})\)
Multiplication: \(P(X_{1}=x_{1} , X_{2}=x_{2} )=\) \[ \begin{split} & P(X_{1}=x_{1}\vert X_{2}=x_2) \times P(X_{2}=x_2) = \\ & P(X_{2}=x_{2}\vert X_{1}=x_1) \times P(X_{1}=x_1) \end{split} \]
\(P(\Omega)=1\) where \(\Omega\) is sample space of X.

Probability definitions/properties

Independence: This concept is about the information that a random variable has on another. If X does not provide information on Y then X is independent of Y and vice versa.

\[\begin{align} P(Y=y|X=x) = P(Y=y) \\ P(X=x|Y=y) = P(X=x) \end{align}\]

Mutual exclusivity: Events that cannot happen together. Ex: Assume a single coin toss. You will not observe tails and heads at the same time. \(P(X_{1}=1,X_{1}=0)=0\)

Conceptual Examples of Independence

Assume a fair coin. Given that the first coin toss was Heads (Heads==1) what is the probability that the second coin toss will be Tails (Tails==0)? Independence: \[P(X_{2}=0\vert X_{1}=1)=P(X_{2}=0)=0.5\]
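This says the result of the first coin toss does not inform us about what will happen in the next coin toss. This holds because we are assuming the coin is known to be fair. If the probability of heads were unknown, the first toss would carry some information about the coin, and therefore about the next toss. Not all coins are fair.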

Conceptual Examples of Dependence

Yesterday, at the close of trading, stock price S \((S_{t-1})\) was down. What is the probability distribution of its closing price \((S_{t})\) today?

\[P(S_{t}=s_{t} \vert S_{t-1}=s_{t-1}) \ne P(S_{t}=s_{t}) \]

If we assume that yesterday provides some information regarding today’s stock price, we can think about a possible probabilistic structure.

However formulating the whole distribution might be a challenge for this part of the course so we will focus on the expected value of a probability distribution.

Conceptual Examples of Dependence

You have not yet learned the normal distribution, but imagine a symmetric distribution with a mean \((\mu_{S})\) and standard deviation \((\sigma_{S})\).

Conceptual Examples of Dependence

If the previous day’s closing stock price does not affect today’s closing stock price, we can choose to model the mean of \(S_{t}\), today’s closing stock price, as \[ \begin{aligned} \mu_{S_{t}}=\mu_{S}+\epsilon_{t}\\ \epsilon_{t} \sim Normal(0,\sigma_{S}) \end{aligned} \]
\(\epsilon_{t}\) has a Normal distribution with mean 0 and standard deviation \(\sigma_{S}\),
\(\mu_{S}\) is a constant value.
Together, all of this leads to a constant expected value for all \(t\).

Conceptual Examples of Dependence

Compare this with the structure below for dependence to become apparent where \(s_{t-1}\) is the observed previous day’s closing price. \[ \begin{aligned} \mu_{S_{t}\vert{s_{t-1}}} = s_{t-1}+\epsilon_{t} \end{aligned} \]
The \(\epsilon_{t}\) can have a Normal distribution similar to the one above, with mean 0 and standard deviation \(\sigma_{S}\). What makes the information about \(S_{t}\) dependent on information about \(S_{t-1}\) is that we add the observed \(s_{t-1}\) as a constant when calculating \(\mu_{S_{t}\vert{s_{t-1}}}\).

Conceptual Examples of Dependence

Imagine a customer relationship database for a hair dye products company with age and amount spent. Assuming that age can indicate the greying and eventual whitening of hair age is not independent of amount spent. It might be worth thinking about how much you are willing to pay for such imperfect (dependent does not mean deterministic) information. However for this information to be worth anything you first need to

Probability definitions/properties

There are several properties of probability to keep in mind.
\(P(X=x,Y=y)=P(Y=y,X=x)\) The order in which you write the joint probability does not matter. The probability that a randomly chosen customer is middle aged (X=x) and has more than 10K in annual disposable income (Y=y) is the same as the probability that a randomly chosen customer has more than 10K in annual disposable income (Y=y) and is middle aged (X=x).

Probability definitions/properties

\(P(X=x \vert Y=y) \ne P(Y=y \vert X=x)\) in general. The order in which you write a conditional probability does matter. The probability that a person who is middle aged will buy your product is not the same as the probability that a person who bought your product is middle aged.
If this is hard to think about, let us give an example with frequency counts that we will convert to probability using relative frequency.

Probability definitions/properties

For the demonstration we will need to make up some numbers for a sample of individuals. These numbers need to be joint descriptions.
X=1/0 Middle Aged/Not Middle Aged
Y=1/0 Buys product/Does not Buy Product

	X=0	X=1
Y=0	22	33
Y=1	44	55

Probability definitions/properties

Probability of buying the product given middle aged.
\(P(Y=1 \vert X=1)= \frac{55}{(33+55)}=0.625\)
Probability of being middle aged given the product is bought.
\(P(X=1 \vert Y=1)= \frac{55}{(44+55)}=0.556\)
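The two conditional probabilities above differ only in which total is used as the denominator. A short sketch with base R's `prop.table` makes that explicit; the object names are assumptions chosen for this illustration.

```r
# Same contingency table: rows are Y (buys / does not buy), columns are X (age group).
counts <- matrix(c(22, 33,
                   44, 55),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Y = c("0", "1"), X = c("0", "1")))

# Condition on X: divide each column by its column total, so columns sum to 1.
p_y_given_x <- prop.table(counts, margin = 2)

# Condition on Y: divide each row by its row total, so rows sum to 1.
p_x_given_y <- prop.table(counts, margin = 1)

p_y_given_x["1", "1"]  # P(Y=1 | X=1) = 55 / (33 + 55)
p_x_given_y["1", "1"]  # P(X=1 | Y=1) = 55 / (44 + 55)
```

Both entries come from the same cell count of 55; only the total used for normalization changes.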

Probability definitions/properties

The marginal probability of a random variable can be calculated from joint probabilities.
To calculate the marginal probability of a random variable X at a particular value x, sum the joint probability of X and Y over all the values of y while X is fixed at x.
For a r.v. Y with K values \(P(X=x)=\sum_{i=1}^{K} P(X=x,Y=y_{i})\)

Probability definitions/properties

X=2/1/0 Senior Citizen/Middle Aged/Not Middle Aged
Y=1/0 Buys product/Does not Buy Product

Probability that a randomly selected individual in the sample is a senior citizen is \(P(X=2)=P(X=2,Y=0)+P(X=2,Y=1)\).

	X=0	X=1	X=2
Y=0	22	33	66
Y=1	44	55	77

Probability definitions/properties

This is the sum of two joint probabilities: the probability that a randomly selected individual is a senior citizen and purchases the product, and the probability that a randomly selected individual is a senior citizen and does not purchase the product.
There are a total of 22+33+44+55+66+77=297 individuals in our sample.

\[ \begin{aligned} P(X=2)&=P(X=2,Y=0)+P(X=2,Y=1)\\ &=\frac{66}{297}+\frac{77}{297}= 0.48 \end{aligned} \]
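Summing joint probabilities over the other variable can be checked directly on the three-column table. The sketch below uses base R only; `counts3`, `joint`, and `marginal_x` are illustrative names.

```r
# Three age groups: X = 0 (not middle aged), 1 (middle aged), 2 (senior citizen).
counts3 <- matrix(c(22, 33, 66,
                    44, 55, 77),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(Y = c("0", "1"), X = c("0", "1", "2")))

# Turn counts into joint probabilities P(X=x, Y=y).
joint <- counts3 / sum(counts3)

# Marginal of X: sum each column over the values of Y.
marginal_x <- colSums(joint)

# P(X=2) = P(X=2, Y=0) + P(X=2, Y=1) = (66 + 77) / 297
p_x2 <- joint["0", "2"] + joint["1", "2"]
round(marginal_x, 2)
```

The marginal probabilities of X add up to 1, as required by \(P(\Omega)=1\).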

Conditional Independence

Conditional independence is similar to independence but only if you know the value of another common random variable/parameter.

Think about the consecutive coin tosses again. \[ \begin{split} & P(X_{1}=1,X_{2}=1|p=0.5)= \\ & P(X_{1}=1|p=0.5) \times P(X_{2}=1|p=0.5) \end{split} \] where p is the common probability of the outcome of a coin toss being heads.

Poor Person’s (Intellectually) Independence

For intro statistics sometimes we have to cut corners to make presentation simpler. Even though all probabilities are really conditional we will use the following argument.
If two random variables, say \(X_{1}\) and \(X_{2}\), are independent of each other, their joint probability can be found by multiplying their marginal probabilities.

\[ P(X_{1}=x_{1},X_{2}=x_{2})=P(X_{1}=x_{1}) \times P(X_{2}=x_{2}) \]

Independence Example

Let us work with events (special random variables)
What is the probability of observing 3 heads in a row in independent coin tosses? Use H to represent heads and T for tails. \(P(H)=0.5,P(T)=0.5\)
Below, the subscripts identify the order of the throws. We would like to calculate the probability of \(H_{1},H_{2},H_{3}\) \[P(H_{1},H_{2},H_{3})=P(H_{1}) \times P(H_{2}) \times P(H_{3})\]

Independence Example

\[P(H_{1},H_{2},H_{3})=P(H_{1}) \times P(H_{2}) \times P(H_{3})\] \[0.5 \times 0.5 \times 0.5=0.125\]

Gatos Curiosos

Maddie asks, does that lead to the idea that the following sequences have all the same probability?
\(P(H_{1},H_{2},H_{3},H_{4},H_{5},H_{6},H_{7},H_{8},H_{9},H_{10},H_{11})\)
\(P(T_{1},T_{2},T_{3},T_{4},T_{5},T_{6},T_{7},T_{8},T_{9},T_{10},T_{11})\)
\(P(H_{1},T_{2},H_{3},H_{4},T_{5},T_{6},H_{7},T_{8},H_{9},T_{10},H_{11})\)

Gatos Curiosos

Yes all those sequences have exactly the same probability of occurring.
However if the question you were interested in was what is the probability that you are going to have 11 heads in 11 coin tosses or 6 heads in 11 coin tosses you will not get the same probability.
Why is that?

Gatos Curiosos

There is a difference between specifying a particular sequence of heads and tails in N throws of a coin and asking for the probability of a number of heads in N throws without regard to the order in which they were observed.
If I ask for y (number of) heads in N coin tosses, you will have to think about how many ways this can happen. Once you get the answer to that question, you can multiply it with the probability of a particular sequence of y heads in N coin tosses to get the answer. The combinatorial function \(\binom{N}{y}\) calculates the number of ways.
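The difference between one specific sequence and a count of heads can be sketched in R for the 11-toss case. `choose` and `dbinom` are base R functions; the variable names are illustrative.

```r
n_tosses <- 11

# Any one specific sequence of 11 fair tosses has the same probability.
p_sequence <- 0.5^n_tosses

# Number of distinct sequences that contain exactly y heads.
ways_11_heads <- choose(n_tosses, 11)  # only one way: all heads
ways_6_heads  <- choose(n_tosses, 6)   # many ways to place 6 heads

# Probability of a count = (number of ways) x (probability of one sequence).
p_11_heads <- ways_11_heads * p_sequence
p_6_heads  <- ways_6_heads * p_sequence

c(ways_11 = ways_11_heads, ways_6 = ways_6_heads)
c(p_11_heads = p_11_heads, p_6_heads = p_6_heads)

# The same number comes out of the Binomial probability function.
all.equal(p_6_heads, dbinom(6, size = n_tosses, prob = 0.5))
```

Six heads is far more probable than eleven heads only because many more sequences produce it, not because any single sequence is more likely.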

Inverse Probability

Y=1 if person has a virus 0 otherwise, X=1 if a test returns a positive result and 0 otherwise.
If we have \(P(X=x \vert Y=y)\), can we obtain \(P(Y=y \vert X=x)\)?
Perhaps. First let us understand the difference between these conditional probability constructs.

\[ \begin{aligned} P(X=x \vert Y=y)=\frac{P(X=x,Y=y)}{P(Y=y)}; \\ P(Y=y \vert X=x)= \frac{P(Y=y,X=x)}{P(X=x)}; \end{aligned}\]

Inverse Probability

Both of the conditional probabilities have the same numerator (the order of the variables in the joint probability does not matter). The denominator is where the difference is.
If you remember from the properties of the probability you can sum joint probabilities to get the marginal probability.

\[\begin{aligned} P(X=1)=P(X=1,Y=1)+P(X=1,Y=0) \\ P(Y=1)=P(X=1,Y=1)+P(X=0,Y=1) \end{aligned} \] - In general \(P(X=1)=\sum_{i=1}^{K}P(X=1,Y=y_{i})\)

Inverse Probability- Context

A person is injected with the virus, so we know the person has the disease. What is the probability that the test to detect this disease returns a positive result? \(P(X=1 \vert Y=1)\)
A person goes to the hospital and the test returns a positive result. What is the probability that the person has the disease? \(P(Y=1 \vert X=1)\)

Inverse Probability

\[ \begin{aligned} & P(Y=1 \vert X=1)=\frac{P(Y=1,X=1)}{P(X=1)}= \\ & \frac{P(Y=1,X=1)}{P(X=1,Y=1)+P(X=1,Y=0)}= \\ & \frac{P(X=1 \vert Y=1) \times P(Y=1)}{P(X=1 \vert Y=1) \times P(Y=1)+P(X=1 \vert Y=0) \times P(Y=0)} \\ \end{aligned} \]

\[\text{In general, } P(Y=i \vert X=m) = \frac{P(X=m \vert Y=i) \times P(Y=i)}{ \sum_{j=1}^{K}P(X=m \vert Y=j) \times P(Y=j)} \]

Numerical Example

Probability of virus being present \(P(Y=1)=0.01, P(Y=0)=1-0.01=0.99\)
\(P(X=1 \vert Y=1)=0.98,P(X=1 \vert Y=0)=0.05\) \[\begin{aligned} & P(Y=1 \vert X=1)=\\ & \frac{P(X=1 \vert Y=1)\times P(Y=1)}{P(X=1 \vert Y=1)\times P(Y=1)+P(X=1 \vert Y=0)\times P(Y=0)} \end{aligned} \]
Plugging in the numbers \[\frac{0.98 \times 0.01}{0.98 \times 0.01 + 0.05 \times 0.99}=\frac{0.0098}{0.0593}=0.165 \]
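The same calculation can be wrapped in a small R function so the inputs are easy to vary. This is only a sketch: the function name `posterior_positive` and its argument names are invented for this illustration, with defaults taken from the numbers on this slide.

```r
# P(Y=1 | X=1) from the prevalence P(Y=1), the probability of a positive test
# for an infected person P(X=1 | Y=1), and the probability of a positive test
# for a non-infected person P(X=1 | Y=0).
posterior_positive <- function(prevalence,
                               p_pos_given_virus = 0.98,
                               p_pos_given_no_virus = 0.05) {
  numerator <- p_pos_given_virus * prevalence
  # Marginal P(X=1): sum of the two joint probabilities.
  denominator <- numerator + p_pos_given_no_virus * (1 - prevalence)
  numerator / denominator
}

round(posterior_positive(prevalence = 0.01), 3)
```

Calling the function with a different `prevalence` value changes only the weight given to the two joint probabilities in the denominator.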

Gatos Curiosos

Are you surprised that the probability of actually being infected given that the test returns a positive result is 16.5\(\%\)?
\(P(X=1 \vert Y=1 )=0.98\), so what is the driver behind the low percentage of 16.5\(\%\)?
Try the problem again by changing the virus prevalence rate to 0.001 and then to 0.1. How do the results change? Why?

Probability Distributions

We need to come up with ways that will assign probabilities to uncertain outcomes without us breaking our minds for these calculations hence the need for named probability distributions.
Probability distributions are functions that assign uncertainty to random variables’ values which obey the laws of probability.
If, for any two values the random variable can take, a value between them is also in the sample space, you have a continuous variable. Otherwise you have a discrete one.

Probability Distributions: Discrete

We already discussed a probability distribution. Bernoulli.

\[ \begin{split} & P(X=x) = p^{x}\times(1-p)^{1-x}\\ & P(X=1) = p^{1}\times(1-p)^{1-1}=p^{1}\times (1-p)^{0}=p\\ & P(X=0) = p^{0}\times(1-p)^{1-0}=p^{0}\times (1-p)^{1}=1-p \end{split} \] - The function \(P(X=x)\) assigns probability to the two values that X can take: \(P(X=1)=p\) and \(P(X=0)=1-p\).

Probability Distributions: Parameter(s)

An important concept for the named probability distributions we use is the parameter. A parameter is a fixed quantity (for our purposes) which you do not know for certain but can estimate or assume. In named probability distributions these parameters allow you to distinguish between distributions. They are the values that determine the uncertainties within the function itself, and they have specific meanings.

Probability Distributions: Parameter(s)

The letter p is the parameter of the distribution.
If X has a Bernoulli distribution p is the probability of success and (1-p) is the probability of failure.

- Furthermore, if X has a Bernoulli distribution: \[ \begin{aligned} E(X)=&p\\ Var(X)=&p\times(1-p)\\ \end{aligned} \]

Probability Distributions: Binomial

The sum of n independent Bernoulli variables that share the same p is Binomial. If Y has a Binomial distribution:

\[ P(Y=y) = \binom{n}{y} \times p^{y} \times (1-p)^{n-y} \] - where n is the total number of Bernoulli trials, p is probability of success and y is the total number of successes.

\[ E(Y)=n\times p , Var(Y)=n \times p \times (1-p) \]

Understanding the combinatorial

\[ \binom{n}{y}=\frac{n!}{y!\times(n-y)!} \]

This equation calculates in how many ways you can have y successes out of n Bernoulli trials. An example will be illustrated below.

Inspecting Quality

Assume you are inspecting 4 widgets. The probability of finding a defective widget is 0.2. If this probability does not change, what is the probability that you will find 3 defective items?
Y is the sum of 4 Bernoulli trials, each leading to either defective \((X_{i}=1)\) or nondefective \((X_{i}=0)\) where \(i\) is 1 \(\ldots\) 4.

\[ P(Y=3 \vert n=4, p =0.2)= \binom{4}{3} \times 0.2^{3} \times (1-0.2)^{4-3} \]

The combinatorial

\[ \binom{4}{3} = \frac{4!}{3!\times (4-3)!}=\frac{4 \times 3 \times 2 \times 1}{3 \times 2 \times 1 \times 1}=4 \] - This simply implies that there are 4 different ways you can choose 3 successes (and one failure by default) out of 4 tries.

Recall defective widgets are represented with 1, non-defective 0.

Representing the combinatorial

Each row represents a particular way you can have 3 successes out of 4 trials.

Row	Widget 1	Widget 2	Widget 3	Widget 4
1	1	1	1	0
2	1	1	0	1
3	1	0	1	1
4	0	1	1	1

Each row represents a mutually exclusive set of bernoulli events.

Calculating the joint probability

Recall Row 2.

Row	Widget 1	Widget 2	Widget 3	Widget 4
2	1	1	0	1

This is joint probability and can be represented with notation as \(p(X_{1}=1,X_{2}=1,X_{3}=0,X_{4}=1)\)
How do we calculate this joint probability?
- Assuming independence between the probability of events.

Joint Probability Calculation

\[ \begin{split} & p(X_{i}=1)=0.2 ~\&~ (1-p(X_{i}=1))=0.8\\ & p(X_{1}=1,X_{2}=1,X_{3}=0,X_{4}=1) = \\ & p \times p \times (1-p) \times p = \\ & 0.2 \times 0.2 \times 0.8 \times 0.2 = \\ & 0.2^{3} \times 0.8 = 0.0064 \end{split} \]

Summarizing and concluding

Row	\(X_{1}\)	\(X_{2}\)	\(X_{3}\)	\(X_{4}\)	P(Row)
1	1	1	1	0	0.0064
2	1	1	0	1	0.0064
3	1	0	1	1	0.0064
4	0	1	1	1	0.0064
\(\sum_{1}^{4}\)					\(4 \times 0.0064\)

Therefore probability of observing 3 defectives out of 4 widgets inspected is: \(4 \times 0.0064 = 0.0256\)
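The row-by-row reasoning for the 4-widget example can be reproduced in R. This sketch uses base R's `combn` to list which 3 of the 4 widgets are defective; the variable names are illustrative.

```r
n <- 4    # widgets inspected
y <- 3    # defective widgets of interest
p <- 0.2  # probability that a single widget is defective

# Each column lists the positions of the 3 defective widgets in one scenario.
defective_positions <- combn(n, y)
n_ways <- ncol(defective_positions)

# Every scenario has the same joint probability under independence.
p_one_row <- p^y * (1 - p)^(n - y)

# The scenarios are mutually exclusive, so their probabilities add up.
p_three_defective <- n_ways * p_one_row

c(n_ways = n_ways, p_one_row = p_one_row, p_three_defective = p_three_defective)

# Cross-check against the Binomial probability function.
all.equal(p_three_defective, dbinom(y, size = n, prob = p))
```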

Question

You are inspecting a set of 5 widgets. If everything is stable in the manufacturing process, the probability of an item being defective is 0.1. We will eventually answer “What is the probability that there will be 2 defective items among the 5 inspected items?”

Question 1

In how many different ways can you have 2 defective items among 5? Among the 5 inspected items, if there are 2 defective items there will be exactly 3 non-defective items.
D is defective N is non-defective. Subscript is the order in which the item is observed. You would need to count each possible scenario:

\[ \begin{aligned} D_{1},D_{2},N_{3},N_{4},N_{5}\\ D_{1},N_{2},D_{3},N_{4},N_{5}\\ \cdots \cdots \cdots \cdots \cdots \\ N_{1},N_{2},N_{3},D_{4},D_{5}\\ \end{aligned} \]

Question 1

We are not crazy therefore we need to use a shortcut to obtain the answer to the combinatorial question.
Reminder: n is the number of trials. y is the number of successes. n-y is the number of failures.
A factorial, symbolized with \(!\), of an integer is the integer multiplied with the integers that are smaller than it until you get to 1.
FYI: 0! is defined as 1.

Question 1

If we wanted to find out in how many different ways we can have y (2) successes out of n (5) trials: \[ \begin{aligned} &\binom{n}{y}=\binom{5}{2}= \\ &\frac{n!}{y!\times(n-y)!}=\frac{5!}{2!\times(5-2)!}= \\ &\frac{5\times4\times3\times2\times1}{2\times1\times3\times2\times1}=10 \end{aligned} \]

Question 2

What is the probability of one particular sequence with 2 defective widgets among 5 inspected widgets? Say the following one:

\[ N_{1},N_{2},N_{3},D_{4},D_{5} \]

This question is asking for the calculation of joint probability. Now we really need to apply conditional probability here but we are just going to use the poor person’s independence.

Calculating the joint probability

Technically we should use Latin letters but this should be self explanatory.
Probability of the sequence of \(N_{1},N_{2},N_{3},D_{4},D_{5}\) is \[\begin{aligned} &P(N_{1},N_{2},N_{3},D_{4},D_{5})&=&\\ &P(N_{1})P(N_{2})P(N_{3})P(D_{4})P(D_{5})&=&\\ &0.9\times0.9\times0.9\times0.1\times0.1&=&0.00729 \end{aligned} \]

Calculating P(Y=y|p,n)

So we can simply multiply 10, the number of ways you can have 2 successes out of 5 trials, by 0.00729, the probability of any one such way, to get 0.0729, the probability of having 2 successes out of 5 trials.
This in effect is doing the following operation:

\[ P(N_{1},N_{2},N_{3},D_{4},D_{5})+P(N_{1},N_{2},D_{3},N_{4},D_{5})+\\ \cdots+P(D_{1},D_{2},N_{3},N_{4},N_{5})=0.0729 \]

Calculating P(Y=y|p,n) Equation

\[ \binom{n}{y}p^{y}(1-p)^{(n-y)}=\\ \binom{5}{2}\times0.1^{2}\times(1-0.1)^{(5-2)}=\\ 10\times0.00729=0.0729 \]
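The equation can be evaluated piece by piece in R and compared with the built-in Binomial probability function `dbinom`. The variable names are illustrative.

```r
n <- 5    # inspected widgets (Bernoulli trials)
y <- 2    # defective widgets (successes)
p <- 0.1  # probability that one widget is defective

n_ways <- choose(n, y)                  # number of orderings with 2 defectives
p_one_sequence <- p^y * (1 - p)^(n - y) # probability of any one such ordering
p_two_defective <- n_ways * p_one_sequence

c(n_ways = n_ways, p_one_sequence = p_one_sequence, p_two_defective = p_two_defective)

# dbinom(y, size, prob) evaluates the same formula in one call.
dbinom(y, size = n, prob = p)
```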

Calculating P(Y=y|p,n) via Excel

In calculating \(P(Y=y|p,n)\) the excel function to use is binom.dist. \[ =binom.dist(y,n,p,CUMULATIVE) \]
All we need to do is \[binom.dist(2,5,0.1,FALSE)=\\ P(Y=2\vert p=0.1,n=5)=0.0729\]

Calculating P(Y=y|p,n) via Excel

\[ =binom.dist(y,n,p,CUMULATIVE) \]

y is the number of successes. n is the number of Bernoulli trials. p is the probability of success. CUMULATIVE is either TRUE or FALSE. When we use FALSE we declare that we do not want to calculate the cumulative distribution function. When we executed \(binom.dist(2,5,0.1,FALSE)\) we evaluated \(P(Y=2 \vert n=5,p=0.1)\).

Calculating \(P(Y \le y|p,n)\) via Excel

Change just the last parameter of the Excel function to TRUE, so that it reads \(binom.dist(2,5,0.1,TRUE)\).
With this change we obtain the probability that you will have at most 2 defective items among the 5 inspected.

\[ \begin{aligned} & P(Y \le 2 \vert n=5,p=0.1)=\\ & P(Y=2 \vert n=5,p=0.1)+ \\ & P(Y=1 \vert n=5,p=0.1)+\\ & P(Y=0 \vert n=5,p=0.1) = 0.9914 \end{aligned} \]
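The cumulative probability is just a sum of individual Binomial probabilities, which can be confirmed in R. `dbinom` and `pbinom` are the base R counterparts of the non-cumulative and cumulative calculations; the variable names are illustrative.

```r
# Individual probabilities P(Y=0), P(Y=1), P(Y=2) for n = 5, p = 0.1.
individual <- dbinom(0:2, size = 5, prob = 0.1)
names(individual) <- paste0("P(Y=", 0:2, ")")
round(individual, 4)

# Adding them gives P(Y <= 2).
p_at_most_2_by_sum <- sum(individual)

# pbinom returns the cumulative probability directly.
p_at_most_2 <- pbinom(2, size = 5, prob = 0.1)

round(c(by_sum = p_at_most_2_by_sum, by_pbinom = p_at_most_2), 4)
```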

Conceptual thinking

How is \(P(Y \le 2 \vert n=5,p=0.1)\) different from \(P(Y = 2 \vert n=5,p=0.1)\)
In the cumulative probability, \(P(Y = 2 \vert n=5,p=0.1)\) is included but so are other probabilities.
If you are to stop a manufacturing process when you have found at most 2 defective items among the 5 items you have inspected, that means you will stop the process not just if you see 2 defective items but even if you observe 1 or 0 defective items!
Of course stopping the process with no defective items makes no sense but that is how I formulated the statement.

Visualization \(P(Y=y)\) and \(P(Y \le y)\)

Making it a bit more interesting

You are going to inspect 5 items from lot A and 10 items from lot B. If the process is under control the probability of having a defective item in lot A is 0.1 and it is 0.2 in lot B. You have a quality focus in the organization so you decide to stop the process if you observe at least 1 defective item in lot A and at least 1 defective item in lot B. Find the probability that the process is going to be stopped.
Define your random variables first!

Defining the random variables

A \(\equiv\) Number of defective items in lot A \(A=\{0,1,\ldots5\}\).
B \(\equiv\) Number of defective items in lot B \(B=\{0,1,\ldots10\}\).
We can define these random variables as Binomial distributed because there is a fixed number of trials, and the probability of success is known and has no reason to change from trial to trial. We will assume that the random variables A and B are independent of each other.

Defining the question with probability notation

\[P(A>0,B>0 \vert n_{A}=5,p_{A}=0.1,n_{B}=10,p_{B}=0.2)=\] \[P(A>0 \vert n_{A}=5 ,p_{A}=0.1 ) \times P(B>0 \vert n_{B}=10 ,p_{B}=0.2) \] The decomposition can only happen because we assume A and B are independent.

We can now calculate \(P(A>0)=P(A=1)+P(A=2)+\cdots+P(A=5)\) then repeat the process for \(P(B>0)\) but nobody got time for that. So instead we need to think about the laws of probability.

Thinking back to laws of probability

Sum of all probabilities associated with the values of a random variable has to add up to 1. \(P(\Omega)=1\).
In our example this would simply mean \(P(A=0)+P(A=1)+\cdots+P(A=5)=1\)
This means that to evaluate \(P(A>0)\) I can either sum, as mentioned already, \(P(A=1)+P(A=2)+\cdots +P(A=5)\), or better yet use
\(P(A=0)+P(A=1)+\cdots+P(A=5)=1\), therefore \(P(A=1)+\cdots+P(A=5)=1-P(A=0)\)

Calculating \(P(A>0)\) and \(P(B>0)\)

\[P(A>0)=1-P(A=0)=\\ 1-binom.dist(0,5,0.1,FALSE)=\\ 1-0.59=0.41 \] - The last parameter FALSE/TRUE does not matter in this particular case since A can not be less than 0.

\[P(B>0)=1-P(B=0)=\\1-binom.dist(0,10,0.2,FALSE)=\\1-0.11=0.89\]

Result

\[P(A>0,B>0)=(1-P(A=0))\times(1-P(B=0))\\ 0.41 \times 0.89=0.37 \] - There is a \(37\%\) probability that the process is going to be stopped.
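The same result can be obtained in R with `dbinom`, keeping full precision until the final rounding. The variable names are illustrative, and the multiplication relies on the independence of A and B assumed in the slides.

```r
# P(A > 0) = 1 - P(A = 0) for lot A: 5 items, defect probability 0.1.
p_a_positive <- 1 - dbinom(0, size = 5, prob = 0.1)

# P(B > 0) = 1 - P(B = 0) for lot B: 10 items, defect probability 0.2.
p_b_positive <- 1 - dbinom(0, size = 10, prob = 0.2)

# Independence lets us multiply the two probabilities.
p_stop <- p_a_positive * p_b_positive

round(c(p_a_positive = p_a_positive,
        p_b_positive = p_b_positive,
        p_stop = p_stop), 2)
```

Working with unrounded intermediate values avoids small discrepancies that appear when 0.41 and 0.89 are multiplied after rounding.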

Gatos Curiosos

This is a terrible quality control process decision, why is that?
What is a better one?

Example

You are going to inspect 5 items from lot A and 10 items from lot B. If the process is under control the probability of having a defective item in lot A is 0.1 and it is 0.2 in lot B. You have a quality focus in the organization so you decide to stop the process if you observe at least 1 defective item in lot A or at least 1 defective item in lot B. Find the probability that the process is going to be stopped.

Poisson Distribution

So far, the Bernoulli and Binomial random variables had an upper limit. A Bernoulli variable can at most be 1 (single trial), which means a Binomial variable can be at most n. However, there are cases where there is no theoretical upper limit to the random variable.
Number of patients arriving to the E.R.
Number of calls to a call center.
Number of eggs a chicken hatches.

Poisson Distribution

To be clear just because something is measured in numbers does not mean it is Poisson distributed.
The Poisson probability distribution, just like the other distributions, is a figment of our imagination. But it is a useful figment that allows us to describe uncertainty under certain conditions. When applied, it is usually defined in terms of rates: number of customers per hour, number of software bugs per 100 lines of code, etc.

Poisson Distribution

\[ P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0,1,2,\ldots \] - Assumptions of the Poisson Distribution

Each event (e.g. a customer arrival) occurs independently.
\(\lambda\) does not change and is equal to both the mean and the variance of the r.v., which means the mean and variance are constant.
No simultaneous events. Two customers do not arrive at exactly the same time, etc.

Poisson Distribution Properties

The mean and variance are both equal to \(\lambda\).
Usually a random variable which is assumed to have a Poisson Distribution will have an associated rate. For instance, if 3 customers per hour arrive on average at a store, then per 8 hours the expectation is \(3 \times 8=24\) customers arriving on average.
Fun fact: if a random variable has a Binomial distribution, then as the number of Bernoulli trials becomes “large” while the probability of success becomes small (so that \(n \times p\) stays fixed at \(\lambda\)), its distribution approaches the Poisson distribution.
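The link between the Binomial and Poisson distributions can be illustrated numerically. The sketch below is an illustration under an assumed setup: it holds \(n \times p\) fixed at an arbitrarily chosen \(\lambda=5\) while n grows, and compares \(P(Y=2)\) under both distributions.

```r
lambda <- 5                  # chosen for illustration only
n_values <- c(10, 100, 1000) # increasing numbers of Bernoulli trials

# Binomial P(Y = 2) with p = lambda / n, so that n * p stays equal to lambda.
binomial_probs <- dbinom(2, size = n_values, prob = lambda / n_values)

# Poisson P(X = 2) with the same mean.
poisson_prob <- dpois(2, lambda)

abs_error <- abs(binomial_probs - poisson_prob)

data.frame(n = n_values,
           binomial = binomial_probs,
           poisson = poisson_prob,
           abs_error = abs_error)
```

The absolute difference shrinks as n increases, which is the sense in which the Binomial approaches the Poisson.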

Poisson Distribution Visualization

\(P(X=2|\lambda=5), P(X \le 2|\lambda=5)\)

Excel functions on this slide.
\(P(X=2|\lambda=5)=poisson.dist(2,5,FALSE)\) \[0.08\]
\(P(X \le 2|\lambda=5)=poisson.dist(2,5,TRUE)\) \[0.12\]
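For readers working in R rather than Excel, `dpois` and `ppois` play the roles of the non-cumulative and cumulative Poisson calculations. A minimal sketch, with illustrative variable names:

```r
lambda <- 5

# P(X = 2 | lambda = 5): the probability function evaluated at a single value.
p_exactly_2 <- dpois(2, lambda)

# P(X <= 2 | lambda = 5): the cumulative probability.
p_at_most_2 <- ppois(2, lambda)

# The cumulative value is the sum of the individual probabilities for 0, 1, 2.
p_at_most_2_by_sum <- sum(dpois(0:2, lambda))

round(c(exactly_2 = p_exactly_2, at_most_2 = p_at_most_2), 4)
```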

Example

I will hire a new employee if the number of customers that purchase an item is more than 5 within an hour. We can assume that the customers purchase an item with a Poisson distribution. The standard deviation of the customers that purchase an item within an hour is 1.5. What is the probability that I will hire a new employee.
First define your random variable.

Example

X= number of customers that purchase an item within the hour.
Define the question with probability notation
- \(P(X > 5 \vert \lambda=2.25)\) where 2.25 is \(1.5^{2}\) since variance is squared standard deviation.
We can not evaluate every probability for X greater than 5. Nobody has time for that. So instead?

Example

Yes, we can evaluate \(P(X \le 5 \vert \lambda=2.25)\) and subtract this value from 1.

\[P(X > 5 \vert \lambda=2.25) = 1- P(X \le 5 \vert \lambda=2.25) \] \[1-poisson.dist(5,2.25,TRUE) = 0.03\]
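The same complement calculation in R might look like the sketch below. `ppois` is base R; the variable names are illustrative.

```r
# The standard deviation is 1.5, and for a Poisson variable the variance equals lambda.
std_dev <- 1.5
lambda <- std_dev^2

# P(X > 5) = 1 - P(X <= 5)
p_hire <- 1 - ppois(5, lambda)

# Equivalent call that asks for the upper tail directly.
p_hire_upper <- ppois(5, lambda, lower.tail = FALSE)

round(c(lambda = lambda, p_hire = p_hire, p_hire_upper = p_hire_upper), 4)
```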

Shortcut to R scripts

This calculates P(Y=3|n=4,p=0.2)

dbinom(3,4,0.2,log=FALSE)

[1] 0.0256

The last parameter is FALSE by default. What about \[P(Y \le 3)= \\ P(Y=0)+P(Y=1)+P(Y=2)+P(Y=3)\]

pbinom(3,4,0.2,log=FALSE)

[1] 0.9984

A bit more involved example

The quality control procedure inspects 8 items from lot 1 and 10 items from lot 2. If at least 2 items are defective in lot 1 or at least 3 items are defective in lot 2, you stop the manufacturing process. What is the probability that you will stop the process? The probability of a random item being defective in lot 1 is 0.3. The probability of a random item being defective in lot 2 is 0.25.
Start by defining the random variables and identify prob.
Define the question with notation and the random variables you have defined.

Defining the r.v.s and prob.

Let Y represent the number of items in lot 1 that are defective. \(Y=\{0,\cdots,8\},p_{y}=0.3,n_{y}=8\)
Let Z represent the number of items in lot 2 that are defective. \(Z=\{0,\cdots,10\},p_{z}=0.25,n_{z}=10\)
The question redefined with notation and r.v.s \[ \begin{split} & P(Y \ge 2 \cup Z \ge 3) = \\ & P(Y \ge 2)+P(Z \ge 3) - P(Y \ge 2, Z \ge 3)\\ \end{split} \]

Finding \(P(Y \ge 2)\)

Since the probabilities of all values of Y have to add up to 1: \[ \begin{split} & P(Y \le 1) + P(Y \ge 2) =1 \\ & P(Y \ge 2)=1-P(Y \le 1) \end{split} \]
Find each component in the equation separately.

Finding \(P(Y \ge 2)\)

It is easy to find \(P(Y \le 1)\) and therefore \(P(Y \ge 2)\)

## P(Y<=1)
pbinom(1,8,0.3,log=FALSE)

[1] 0.2552983

## 1-P(Y<=1) = P(Y>=2)
1-pbinom(1,8,0.3,log=FALSE)

[1] 0.7447017

Finding \(P(Z \ge 3)\)

Same reasoning as before

\[ \begin{split} & P(Z \le 2) + P(Z \ge 3) =1 \\ & P(Z \ge 3)=1-P(Z \le 2) \end{split} \]

## P(Z<=2|n=10,p=0.25)
pbinom(2,10,0.25,log=FALSE)

[1] 0.5255928

## 1-P(Z<=2|n=10,p=0.25) = P(Z>=3|n=10,p=0.25)
1-pbinom(2,10,0.25,log=FALSE)

[1] 0.4744072

Finding \(P(Y \ge 2, Z \ge 3)\)

In order to find this probability we need to assume independence between the probability of Y and Z given \(n_y,n_z,p_y,p_z\)

\[ P(Y \ge 2) \times P(Z \ge 3)= 0.7447 \times 0.4744=0.3533\] - Putting it all together

\[ \begin{aligned} & P(Y \ge 2) + P(Z \ge 3) - P(Y \ge 2 , Z \ge 3) = \\ & 0.7447 + 0.4744 - 0.3533 = 0.8658 \end{aligned} \]
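The whole calculation can be collected into one short R script, with a cross-check through the complement: the process is not stopped only if \(Y \le 1\) and \(Z \le 2\) both hold. The variable names are illustrative, and independence of Y and Z is assumed as in the slides.

```r
# Lot 1: n = 8, p = 0.3, stop if Y >= 2.
p_y_ge_2 <- 1 - pbinom(1, size = 8, prob = 0.3)

# Lot 2: n = 10, p = 0.25, stop if Z >= 3.
p_z_ge_3 <- 1 - pbinom(2, size = 10, prob = 0.25)

# Joint probability under independence.
p_both <- p_y_ge_2 * p_z_ge_3

# Addition law: P(Y >= 2 or Z >= 3).
p_stop <- p_y_ge_2 + p_z_ge_3 - p_both

# Cross-check: 1 - P(Y <= 1) * P(Z <= 2).
p_stop_check <- 1 - pbinom(1, 8, 0.3) * pbinom(2, 10, 0.25)

round(c(p_y_ge_2 = p_y_ge_2, p_z_ge_3 = p_z_ge_3,
        p_both = p_both, p_stop = p_stop, p_stop_check = p_stop_check), 4)
```

Both routes give the same answer, which is a quick way to catch a sign error in the addition law.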

Thinking about the numbers

What values do you think were considered when setting the cutoffs that lead to process stoppage?
Think short term. Think long term.
Think about what you are trying to optimize and why.

Continuous Distributions: Normal distribution

Of course there are many more distributions than the ones below.
Discrete random variables (Values it can take):
Bernoulli (0,1),
Binomial (0,1,\(\ldots\),n)
Poisson (0,1,\(\ldots,+\infty\))
Continuous random variable:
Normal Distribution \((-\infty,+\infty)\)

Normal distribution

Normal Distribution and its Properties

The mean (\(\mu\)) divides the distribution into 2 equal halves.
50\(\%\) of the data is above (to the right of) the mean and 50\(\%\) is below the mean.
The standard deviation (\(\sigma\)) is an index of how much the values of the random variable deviate around the mean (same interpretation as variance).

\[\sigma_{X}=\sqrt{\frac{\sum_{i=1}^{N}(x_{i}-\mu_{X})^{2}}{N}} \]
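As a small illustration of the formula, the sketch below computes the population standard deviation by hand. The data vector x is made up for this example and is not from the slides. Note that R's built-in sd() divides by N-1 (the sample version), so it will not match the population formula exactly.

```r
# Hypothetical population of N = 8 values (illustrative data only)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

N  <- length(x)
mu <- mean(x)

# Population standard deviation: divide the sum of squared deviations by N
sigma <- sqrt(sum((x - mu)^2) / N)

# R's sd() uses the sample formula, which divides by N - 1
s <- sd(x)

print(c(mu = mu, sigma = sigma, sample_sd = s))
```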

Visualizations of Normal Distributions

Normal Distribution Probabilities

f(X=x) gives you the height of the distribution
The height of the distribution is the likelihood/density (NOT PROBABILITY).
The probability of a single value, \(X=x\), is infinitely small (zero for a continuous random variable).
We can not calculate meaningful probabilities for single values.
But we can calculate probabilities for intervals of X.
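The difference between height and probability can be seen numerically. The sketch below assumes the \(N(70,10^{2})\) example used elsewhere in these slides. The interval widths are arbitrary choices for illustration. dnorm returns the height (density) and pnorm returns the cumulative probability.

```r
mu    <- 70   # assumed mean (the N(70, 10^2) example)
sigma <- 10   # assumed standard deviation

# Height of the curve at x = 70: a density, not a probability
height <- dnorm(70, mean = mu, sd = sigma)

# Probability of ever narrower intervals centered at 70
widths <- c(1, 0.1, 0.01)
probs  <- pnorm(70 + widths / 2, mu, sigma) - pnorm(70 - widths / 2, mu, sigma)

print(height)
print(data.frame(width = widths, probability = probs))
```

As the interval shrinks, its probability shrinks toward zero while the height at 70 stays the same.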

Normal Distribution Probabilities

\(P(X>x \vert \mu, \sigma)\)
\(P(X<x \vert \mu, \sigma)\)
\(P(x_{l}<X<x_{u} \vert \mu, \sigma)\)
How are we going to calculate these probabilities?
If we did not know about the relationship between arbitrary normal distributions and what we will refer to as the standard normal distribution, we would have to integrate f(X=x) under the curve between the values of interest.

Normal Distribution Probabilities

\[ P(X>50 \vert \mu, \sigma)= \int_{50}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) dx\]

Normal Distribution Probabilities

\[ P(50<X<90 \vert \mu, \sigma)= \int_{50}^{90} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) dx\]
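For the curious, these integrals can be evaluated numerically. The sketch below assumes \(\mu=70\) and \(\sigma=10\) (the \(N(70,10^{2})\) example) and uses base R's integrate() on a hand-written density f. The function name f and the variable names are illustrative. The pnorm lines are there only to confirm that the integration agrees with R's built-in normal probabilities.

```r
mu    <- 70   # assumed mean
sigma <- 10   # assumed standard deviation

# Normal density written out exactly as in the integrand
f <- function(x) 1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))

# P(X > 50): area under the curve from 50 to infinity
p_above_50 <- integrate(f, lower = 50, upper = Inf)$value

# P(50 < X < 90): area under the curve from 50 to 90
p_50_to_90 <- integrate(f, lower = 50, upper = 90)$value

print(c(p_above_50 = p_above_50, p_50_to_90 = p_50_to_90))

# Same probabilities from the built-in cumulative distribution function
print(c(pnorm(50, mu, sigma, lower.tail = FALSE),
        pnorm(90, mu, sigma) - pnorm(50, mu, sigma)))
```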

Finding Probabilities

If we are not to do integration, how are we to calculate probabilities?
We need to make use of the standard normal distribution.
What is a standard normal distribution?
It is a normal distribution with mean 0 and standard deviation 1.

Why is the standard normal distribution important?

Any value of a random variable with a normal distribution can have its value mapped to the x-axis of the standard normal distribution.
The x-axis values of the standard normal distribution are called z-scores.

\[z=\frac{(x-\mu)}{\sigma} \]

Why does the z-score matter?

Because it signifies the number of standard deviations a particular value of X is away from \(\mu_{X}\). This in turn implies that the probability that is associated with a random variable X can be associated with the random variable Z.
\(P(X<x)=P(Z<z)\) for correct x to z mapping.

\[z=\frac{(x-\mu_{x})}{\sigma_{x}} \]

Remember z scores are the number of standard deviations that x is away from \(\mu_{X}\).

Going back to the example of \(N(70,10^{2})\)

\(X \sim N(70,10^{2})\)
\(P(x_{l}<X<x_{u})\)
\(P(50<X<90)\)

\[z_{90}=\frac{90-70}{10}=2 \] \[z_{50}=\frac{50-70}{10}=-2 \]

And then what?

We can use these numbers in order to calculate the probability we are interested in.
Given that \(X \sim N(70,10^{2})\) \[P(z_{50}<Z<z_{90})=P(50<X<90) \] \[P(-2<Z<2)=P(50<X<90) \]
\(P(Z<2)=0.98\)
\(P(Z<-2)=0.02\)
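The last step is a subtraction: \(P(-2<Z<2)=P(Z<2)-P(Z<-2)\). A minimal R sketch of the full chain follows, with illustrative variable names. With unrounded values pnorm gives about 0.9772 and 0.0228, so the interval probability is about 0.9545 (the rounded slide values 0.98 and 0.02 give 0.96).

```r
mu    <- 70
sigma <- 10

# Map the interval limits to z-scores
z_lower <- (50 - mu) / sigma
z_upper <- (90 - mu) / sigma

# P(-2 < Z < 2) on the standard normal scale
p_z <- pnorm(z_upper) - pnorm(z_lower)

# Same probability computed directly on the X scale
p_x <- pnorm(90, mean = mu, sd = sigma) - pnorm(50, mean = mu, sd = sigma)

print(c(z_lower = z_lower, z_upper = z_upper))
print(round(c(p_z = p_z, p_x = p_x), 4))
```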

Visualization

Good what else can we do with it?

X and Z have a direct relationship that is easy to interpret.
A probability is associated with a z-score, which in turn allows you to obtain the X value that you are interested in.
A stats professor just gave an exam and saw that her students had an average of 60 and a standard deviation of 12 (assume these are population parameters). She wants to give a B to the top \(20\%\) of her class. What is the cutoff value for a B?

Visualization of the problem

Explanation of the question:

In Excel the function \(norm.s.inv\) returns you a z-score. You feed into it the probability to the left of the point you find. 0.20 is the area to the right of the point you would like to find. Therefore we need to subtract 0.20 from 1 in order to find the area to the left of the point we are interested in.
In R the same exact value is obtained with the function qnorm requiring the exact same parameter for exactly the same reason.

\[\text{Excel: } \text{=norm.s.inv(0.8)} \quad \text{R: } \text{qnorm(0.8)} \quad \text{both return } 0.84 \]

The final answer

What we find using the norm.s.inv or qnorm function is how many standard deviations the X value is away from the mean, and in which direction. Positive values indicate above the mean and negative values indicate below the mean.
For this particular question we know that the cutoff for a B is 0.84 \(\sigma\) above the mean: \(60+0.84\times 12=70.08\)
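The cutoff can be checked in R. This is a minimal sketch with illustrative variable names. qnorm also accepts mean and sd arguments, so the conversion from z back to x can be done in one call. With the unrounded z-score the cutoff comes out at about 70.10 rather than 70.08, a difference that is due only to rounding z to 0.84.

```r
mu    <- 60   # exam mean
sigma <- 12   # exam standard deviation

# z-score with 80% of the area to its left (top 20% to its right)
z_cut <- qnorm(0.80)

# Convert the z-score back to the exam scale
cutoff <- mu + z_cut * sigma

# Same cutoff in a single call
cutoff_direct <- qnorm(0.80, mean = mu, sd = sigma)

# Or by stating the area to the right directly
cutoff_upper_tail <- qnorm(0.20, mean = mu, sd = sigma, lower.tail = FALSE)

print(round(c(z_cut = z_cut, cutoff = cutoff), 2))
```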

Information and Definitions:

A machine cuts a widget’s length to 7 cm on average, with standard deviation 0.01 cm.
A machine cuts a widget’s width to 6 cm on average, with standard deviation 0.01 cm.
Define a random variable L: the length cut, in cm.
Define a random variable W: the width cut, in cm.
The engineering specs for the widgets are between 6.975 and 7.025 cm on the length and between 5.985 and 6.02 cm on the width.

Questions:

What is the probability that the widget length is going to fit the specs? P(6.975<L<7.025)=?
What is the probability that the widget width is going to fit the specs? P(5.985<W<6.02)=?
What is the probability that the widget is going to fit the specs? P(6.975<L<7.025,5.985<W<6.02)=?
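One way to approach these three questions is sketched below. It rests on assumptions that the slide does not state explicitly: L and W are each normally distributed, the standard deviations are in cm, and L and W are independent of each other (needed for the joint probability). Variable names are illustrative.

```r
# ASSUMPTION: L ~ N(7, 0.01^2) and W ~ N(6, 0.01^2), independent

# P(6.975 < L < 7.025)
p_length <- pnorm(7.025, mean = 7, sd = 0.01) - pnorm(6.975, mean = 7, sd = 0.01)

# P(5.985 < W < 6.02)
p_width <- pnorm(6.02, mean = 6, sd = 0.01) - pnorm(5.985, mean = 6, sd = 0.01)

# P(both within spec), valid only under the independence assumption
p_widget <- p_length * p_width

print(round(c(p_length = p_length, p_width = p_width, p_widget = p_widget), 4))
```

Note that the width specs are not symmetric around the mean, so the two tails have to be handled separately.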

Context

Waiting times in a call center have a mean of 545 seconds with a standard deviation of 45 seconds (an exponential distribution is more appropriate for waiting times, but we will not learn this in 2025). Management does not want any call to wait more than 630 seconds.

Questions and answers

If everything is under control what is the probability that there will be a call that will have to wait more than 630 seconds? Define the random variable. Write the question with probabilistic notation.
T = number of seconds a call has to wait.
\(P(T>630 \vert \mu=545, \sigma = 45)\)

Questions and answers

To answer the question we will first transform 630 into a z-score, which will tell us how many standard deviations above the mean 630 is.

\[\begin{aligned} & z=\frac{(630-545)}{45}=1.89\\ & P(T>630 \vert \mu=545, \sigma=45)=P(Z>1.89) \end{aligned} \]

Both in Excel and R we are given \(P(Z \le z)\). We will need to subtract from \(1\) the value we get from \(P(Z \le z)\).

Questions and answers

\(P(Z \le 1.89)=P(T \le 630 \vert \mu=545, \sigma = 45)\)
Excel norm.s.dist(1.89,TRUE)=0.97 ; R pnorm(1.89) = 0.97 \[\begin{aligned} & 1-P(Z \le 1.89)=P(Z>1.89)=\\ & P(T>630 \vert \mu =545,\sigma=45)= 0.03 \end{aligned} \]
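The same steps in R, as a minimal sketch with illustrative variable names. The lower.tail = FALSE argument of pnorm returns the area to the right directly, so the subtraction from 1 is optional.

```r
mu    <- 545   # mean waiting time in seconds
sigma <- 45    # standard deviation in seconds

# How many standard deviations above the mean is 630?
z <- (630 - mu) / sigma

# P(Z > z) = 1 - P(Z <= z)
p_wait_z <- 1 - pnorm(z)

# Same probability without converting to a z-score first
p_wait_direct <- pnorm(630, mean = mu, sd = sigma, lower.tail = FALSE)

print(round(c(z = z, p_wait_z = p_wait_z, p_wait_direct = p_wait_direct), 4))
```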

Questions and answers

If 10 calls have been received what is the probability that there will be more than 2 calls that have to wait more than 630 seconds? Why is this number different from just calculating the answer to the first question and squaring it?
To answer this question we will have to define a new random variable. W can be defined as the number of calls, out of 10, that wait more than 630 seconds. If we assume that everything is stable, we can model W with a binomial distribution. Recall that we have already identified that the probability that a single call has to wait more than 630 seconds is 0.03. This is your probability of success.

\[P(W>2 \vert n=10, p =0.03)=P(W=3)+\ldots+P(W=10)=? \]

Questions and answers

\[\begin{aligned} 1= P(W>2 \vert n=10, p =0.03)+P(W \le 2 \vert n=10, p =0.03) \\ P(W>2 \vert n=10, p =0.03) = 1-P(W \le 2 \vert n=10, p =0.03) \end{aligned} \]

We will use R and Excel to calculate \(P(W \le 2)\)

\[\begin{aligned} & Excel =BINOM.DIST(2,10,0.03,TRUE)=\\ & R= pbinom(2,10,0.03)=0.997 \end{aligned}\]

\[\begin{aligned}\text{Therefore } P(W>2)=& 1-P(W \le 2)=\\ & 1-0.997=0.003 \end{aligned}\]
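The complement shortcut can be verified against the long sum \(P(W=3)+\ldots+P(W=10)\). This is a minimal sketch with illustrative variable names, using the rounded success probability 0.03 from the slides.

```r
n_calls <- 10     # number of calls received
p_long  <- 0.03   # P(a single call waits more than 630 seconds), rounded

# P(W <= 2)
p_at_most_2 <- pbinom(2, size = n_calls, prob = p_long)

# P(W > 2) through the complement
p_more_than_2 <- 1 - p_at_most_2

# P(W > 2) by adding P(W = 3), ..., P(W = 10) one by one
p_sum <- sum(dbinom(3:10, size = n_calls, prob = p_long))

# P(W > 2) using the upper tail argument
p_upper <- pbinom(2, size = n_calls, prob = p_long, lower.tail = FALSE)

print(round(c(p_at_most_2 = p_at_most_2, p_more_than_2 = p_more_than_2), 3))
```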

Questions: Context

Information: P(X<=0)=0.05, P(X<=1)=0.20, P(X<=2)=0.42, P(X<=3)=0.65, P(X<=4)=0.82, P(X<=5)=0.92, P(X<=6)=0.97.

On average a hospital E.R. receives 3 patients per hour. We can assume that patient arrivals follow a Poisson distribution. You will determine the number of beds to keep open in the hospital. Each open bed costs 3,000 dollars per hour to operate whether it is used or not. You want the number of beds to be such that you will not have more than a 10\(\%\) chance of diverting patients to other hospitals. Each insured patient brings the hospital an expected 5,000 dollars. Each uninsured patient costs the hospital 2,000 dollars. 22\(\%\) of patients are uninsured.

Questions: Hospital

What random variable do you need to define for a question that will answer probability of having to divert a patient? X = Number of patients arriving per hour. Note that number of beds you open up is not a random variable, it is a decision variable.
Write the following question in notational form. Probability of having a diversion event not being more than 0.1 (Hint: Yes you can answer this question with a Bernoulli distribution but ignore that). \(P(X > x) \le 0.10\), where x is the number of open beds.
How many beds should you open so that you will not have more than 10\(\%\) chance of diverting a patient while maximizing profit? 5 beds.
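The table of cumulative probabilities and the 5-bed answer can be reproduced with ppois. The sketch below assumes that a diversion happens whenever the number of arrivals in an hour exceeds the number of open beds, that is, each patient occupies one bed for the hour. That reading is an assumption and the variable names are illustrative.

```r
lambda <- 3      # average arrivals per hour
beds   <- 0:6    # candidate numbers of open beds

# P(X <= x) for each candidate, matching the information given
cdf <- ppois(beds, lambda)

# ASSUMPTION: diversion happens when arrivals exceed open beds, P(X > x)
p_divert <- 1 - cdf

print(data.frame(beds = beds, cdf = round(cdf, 2), p_divert = round(p_divert, 2)))

# Smallest number of beds with at most a 10% chance of diversion
min_beds <- min(beds[p_divert <= 0.10])

# Same answer from the Poisson quantile function
min_beds_q <- qpois(0.90, lambda)

print(c(min_beds = min_beds, min_beds_q = min_beds_q))
```

Since every extra bed costs money, the smallest number of beds that satisfies the constraint is the natural candidate.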

Questions: Hospital

How many beds should you open up to maximize profit disregarding the 10\(\%\) constraint? Each patient arriving can be thought of as a Bernoulli trial, insured or uninsured. Expected profit/cost from a patient regardless of number of beds can be calculated as:
Expected Profits from a patient: \[ \begin{aligned} & p*(Profit)+ (1-p)*Cost= \\ & 0.78*5000+0.22*(-2000)=3,460\\ \end{aligned} \]

Questions: Hospital

Deterministic cost from opening a bed is 3,000 dollars.
Each served patient in an open bed leads to an expected 3460-3000 = 460 dollars in profit.
If you open 5 beds there will be 15,000 dollars of deterministic cost.
If 5 patients actually arrive the hospital will make an expected 2,300 dollars.
But there is a probability distribution.
We will leave this sort of question to Operations Management classes.
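To see why the probability distribution matters, the rough sketch below weights the profit for each possible number of arrivals by its Poisson probability. It makes several simplifying assumptions that are not in the slides: each bed serves at most one patient per hour, diverted patients bring neither revenue nor cost, and arrivals above 50 per hour are ignored as negligible. Variable names are illustrative.

```r
lambda            <- 3                                # average arrivals per hour
value_per_patient <- 0.78 * 5000 + 0.22 * (-2000)     # expected value of one patient
bed_cost          <- 3000                             # cost per open bed per hour
beds              <- 5                                # number of open beds

# Best-case figure: exactly 5 patients arrive
profit_if_5_arrive <- value_per_patient * 5 - bed_cost * beds

# ASSUMPTION: at most `beds` patients can be served in the hour
arrivals <- 0:50
served   <- pmin(arrivals, beds)

# Profit for each possible number of arrivals
profit <- value_per_patient * served - bed_cost * beds

# Expected profit: weight each outcome by its Poisson probability
expected_profit <- sum(dpois(arrivals, lambda) * profit)

print(c(profit_if_5_arrive = profit_if_5_arrive, expected_profit = expected_profit))
```

Under these assumptions the expected profit with 5 beds is well below the figure obtained when exactly 5 patients arrive, because on many hours some of the open beds stay empty.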