Random Variables, Probability

Rasim Muzaffer Musal

Goals

  • Learn basic definitions of: {Event, Sample Space, Random Variable, Difference between Discrete and Continuous random variables, Probability Distributions}

Random Variable Definition/Representation

  • A random variable is a quantified observation from a process whose outcome is not known with certainty. \[\begin{align} X=x \end{align}\]

  • The capital letter “X” denotes the random variable.

  • The lowercase letter “x” is the value it can take.

Examples of Random Variables, Definition of Sample Space

  • Stock Prices, Soccer Game Results, Number of Earthquakes in a year.
  • For instance, let X represent the soccer game results for a team
    • Outcomes of the game can be Lose, Draw, Win.
    • We can map these outcomes to 0,1,3

Examples of Random Variables, Definition of Sample Space

  • The Sample Space is the set of possible outcomes of an experiment/process for which the random variable is defined. The sample space for X can be defined as

\[\begin{align} X=\{0,1,3\} \end{align}\]

  • When we write X=x, we are referring to a case where X is equal to one of these values, for instance X=3.

Events

  • Events can be represented as special random variables that take one of two values.

\[\begin{align} X = \{0,1\} \end{align}\]

  • In a couple of slides we will refer back to events to define a probability distribution.

Probability

  • There are different interpretations of probability.

  • Words such as chance and likelihood can be used colloquially, but they do not define anything precisely. In fact, in statistics, likelihood refers to something else.

    • Objective interpretation: Relative frequency of the value of a random variable in a large number of trials, which obeys the laws of probability.

    • Subjective interpretation: Quantification of uncertainty, which obeys the laws of probability.

Probability: Thinking experiment.

  • Imagine a coin. You toss the coin and you will assign a probability to heads (X=1) and tails (X=0).

  • P(X=1)=0.5 & P(X=0) = 0.5

  • But why? It cannot be just because there are 2 outcomes. Would you assign a probability of 0.5 to the sun rising tomorrow?

  • You either did a mental calculation of a really large number of coin tosses and calculated the relative frequency (objective), or you used your mental image of the world and what you have experienced in your life about coin tosses (subjective). The first view is sketched in the R code below.
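  • A minimal R sketch of the objective view (the seed and the 100000 tosses are arbitrary choices of ours): simulate many tosses of a fair coin and compute the relative frequency of heads.
## simulate 100000 fair-coin tosses; 1 = heads, 0 = tails
set.seed(1)
tosses <- rbinom(100000, size=1, prob=0.5)
## relative frequency of heads, close to 0.5
mean(tosses)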

Probability: Types

  • Let 1 represent heads and 0 tails.

  • Marginal: \(P(X=x)\) Probability associated with the value of a single random variable. What is the probability of the first coin toss being heads? \(P(X=1)\)

  • Joint: \(P(X_{1}=x_{1},X_{2}=x_{2})\) Probability of the values of two or more random variables happening together. What is the probability that I toss a coin two times and the first time it lands heads and the second time it lands tails? \(P(X_{1}=1,X_{2}=0)\)

Probability: Types

  • Conditional: \(P(X_{1}=x_{1}|X_{2}=x_{2})\) Probability of one random variable's value given information on another; \(\vert\) is read as “given”. What is the probability of the first coin toss being heads if I know the second coin toss is tails? \(P(X_{1}=1|X_{2}=0)\)

Probability: Laws

  • Convexity: \(0 \le P(X=x) \le 1\)

  • Addition: \(P(X_{1}=x_{1} \cup X_{2}=x_{2} )=\)

    • \(P(X_{1}=x_{1})+P(X_{2}=x_{2})-P(X_{1}=x_{1},X_{2}=x_{2})\)
  • Multiplication: \[ \begin{split} P(X_{1}=x_{1} , X_{2}=x_{2} ) & = P(X_{1}=x_{1}\vert X_{2}=x_2) \times P(X_{2}=x_2) \\ & = P(X_{2}=x_{2}\vert X_{1}=x_1) \times P(X_{1}=x_1) \end{split} \]

  • \(P(\Omega)=1\) where \(\Omega\) is sample space of X.
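  • As a sketch, we can verify the addition law in R on two independent fair coin tosses (the example values are our own): \(P(X_{1}=1 \cup X_{2}=0)=0.5+0.5-0.25=0.75\).
## enumerate the four equally likely outcomes of two tosses
outcomes <- expand.grid(x1=c(0,1), x2=c(0,1))
## P(X1=1 or X2=0): add up the 0.25 weight of each qualifying outcome
sum((outcomes$x1==1 | outcomes$x2==0)*0.25)
## addition law: P(X1=1)+P(X2=0)-P(X1=1,X2=0), also 0.75
0.5+0.5-0.25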

Probability definitions/properties

  • Independence: This concept is about the information that one random variable carries about another. If X does not provide information on Y, then X is independent of Y, and vice versa.

\[\begin{align} P(Y=y|X=x) = P(Y=y) \\ P(X=x|Y=y) = P(X=x) \end{align}\]

  • Mutual Exclusivity: Events that cannot happen together. Ex: Assume a coin toss. You will not observe tails and heads at the same time. \(P(X_{1}=1,X_{1}=0)=0\)
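  • A quick empirical sketch of independence, using simulated tosses (the seed and sample size are arbitrary choices): knowing the first toss should not change the relative frequency of heads on the second.
## two separately generated fair-coin tosses, repeated 100000 times
set.seed(2)
x <- rbinom(100000, 1, 0.5)
y <- rbinom(100000, 1, 0.5)
## approximates P(Y=1|X=1); compare with mean(y), both near 0.5
mean(y[x==1])
mean(y)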

Conditional Independence

  • Conditional independence is similar to independence, but it holds only when you know the value of another common random variable/parameter.

Think about the consecutive coin tosses again. \[ \begin{split} & P(X_{1}=1,X_{2}=1|p=0.5)= \\ & P(X_{1}=1|p=0.5) \times P(X_{2}=1|p=0.5) \end{split} \] where p is the common probability of the outcome of a coin toss being heads.

Probability Distributions

  • Probability distributions are functions that assign uncertainty to random variables’ values and obey the laws of probability. They come in two types: discrete and continuous.

  • If you take any arbitrary pair of values of a random variable, and a middle point between them can always be found within the sample space of that random variable, what you have is a continuous variable. Otherwise you have a discrete one.

Probability Distributions

  • We already discussed a probability distribution: the Bernoulli.

\[ \begin{split} & P(X=x) = p^{x}\times(1-p)^{1-x}\\ & P(X=1) = p^{1}\times(1-p)^{1-1}=p^{1}\times (1-p)^{0}=p\\ & P(X=0) = p^{0}\times(1-p)^{1-0}=p^{0}\times (1-p)^{1}=1-p \end{split} \]
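  • In R, the Bernoulli pmf above can be evaluated as a Binomial with a single trial (p=0.2 is an assumed value for illustration):
## P(X=1)=p and P(X=0)=1-p for an assumed p of 0.2
dbinom(1, size=1, prob=0.2)
dbinom(0, size=1, prob=0.2)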

Probability Distributions: Parameter(s)

  • A parameter is a fixed quantity (for our purposes) which you do not know for certain but can estimate or assume. In named probability distributions these parameters allow you to distinguish between distributions. They are the values that determine the uncertainties within the function itself and have specific meanings. For instance, p is the probability of success and (1-p) is the probability of failure.

Probability Distributions: Parameter(s)

Furthermore, if X has a Bernoulli distribution: \[ E(X)=p\]
\[ Var(X)=p\times(1-p)\]
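  • A simulation sketch checking these two formulas (the seed and p=0.3 are our own choices):
## sample mean and variance of Bernoulli draws versus p and p*(1-p)
set.seed(3)
x <- rbinom(100000, 1, 0.3)
c(mean(x), 0.3)
c(var(x), 0.3*(1-0.3))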

Probability Distributions: Binomial

  • The sum of independent Bernoulli variables with a common p is Binomial. If Y has a Binomial distribution:

\[ P(Y=y) = \binom{n}{y} \times p^{y} \times (1-p)^{n-y} \]

  • where n is the total number of Bernoulli trials, p is the probability of success and y is the total number of successes.

\[ E(Y)=n\times p , Var(Y)=n \times p \times (1-p) \]
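  • The same kind of simulation check for the Binomial (n=10 and p=0.25 are assumed values):
## sample mean and variance of Binomial draws versus n*p and n*p*(1-p)
set.seed(4)
y <- rbinom(100000, 10, 0.25)
c(mean(y), 10*0.25)
c(var(y), 10*0.25*(1-0.25))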

Understanding the combinatorial

\[ \binom{n}{y}=\frac{n!}{y!\times(n-y)!} \]

  • This equation counts in how many ways you can have y successes out of n Bernoulli trials. An example will be illustrated below.
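  • In R the combinatorial is choose(); as a sketch, both forms below give the same count for the example that follows (n=4, y=3):
## number of ways to place 3 successes among 4 trials
choose(4, 3)
## the same count from the factorial definition
factorial(4)/(factorial(3)*factorial(4-3))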

Inspecting Quality

  • Assume you are inspecting 4 widgets. The probability of finding a defective widget is 0.2. If this probability does not change, what is the probability that you will find 3 defective items?

  • Y is the sum of 4 Bernoulli trials, each leading to either defective \((X_{i}=1)\) or nondefective \((X_{i}=0)\), where \(i\) is 1 \(\ldots\) 4.

\[ P(Y=3 \vert n=4, p =0.2)= \binom{4}{3} \times 0.2^{3} \times (1-0.2)^{4-3} \]
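  • As a sketch, the formula above can be evaluated term by term in R, or in one step with dbinom():
## choose(4,3) * p^3 * (1-p)^1 with p=0.2, giving 0.0256
choose(4,3)*0.2^3*(1-0.2)^(4-3)
## the built-in Binomial pmf gives the same value
dbinom(3, size=4, prob=0.2)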

The combinatorial

\[ \binom{4}{3} = \frac{4!}{3!\times (4-3)!}=\frac{4 \times 3 \times 2 \times 1}{3 \times 2 \times 1 \times 1}=4 \]

  • This simply implies that there are 4 different ways you can choose 3 successes (and one failure by default) out of 4 tries.

  • Recall defective widgets are represented with 1, non-defective 0.

Representing the combinatorial

  • Each row represents a particular way you can have 3 successes out of 4 trials.
Row | Widget 1 | Widget 2 | Widget 3 | Widget 4
----|----------|----------|----------|---------
1 | 1 | 1 | 1 | 0
2 | 1 | 1 | 0 | 1
3 | 1 | 0 | 1 | 1
4 | 0 | 1 | 1 | 1
  • Each row represents a mutually exclusive set of Bernoulli events.
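  • A short R sketch that recovers the table above (the column names W1..W4 are our own labels): enumerate all 4-toss outcomes and keep the rows with exactly three 1s.
## all 16 outcomes of four Bernoulli trials
g <- expand.grid(W1=0:1, W2=0:1, W3=0:1, W4=0:1)
## the 4 rows with exactly three 1s, matching the table (in some order)
g[rowSums(g)==3, ]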

Calculating the joint probability

  • Recall Row 2.
Row | Widget 1 | Widget 2 | Widget 3 | Widget 4
----|----------|----------|----------|---------
2 | 1 | 1 | 0 | 1

  • This is a joint probability and can be represented with notation as \(P(X_{1}=1,X_{2}=1,X_{3}=0,X_{4}=1)\)

  • How do we calculate this joint probability?

    • By assuming independence between the events.

Joint Probability Calculation

\[ \begin{split} & P(X_{1}=1,X_{2}=1,X_{3}=0,X_{4}=1)\\ & P(X_{i}=1)=0.2 ~\&~ (1-P(X_{i}=1))=0.8\\ & p \times p \times (1-p) \times p = \\ & 0.2 \times 0.2 \times 0.8 \times 0.2 = \\ & 0.2^{3} \times 0.8 = 0.0064 \end{split} \]
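  • The same product in R (p=0.2 as in the example):
## 0.2 * 0.2 * 0.8 * 0.2 = 0.0064
prod(c(0.2, 0.2, 0.8, 0.2))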

Summarizing and concluding

Row | \(X_{1}\) | \(X_{2}\) | \(X_{3}\) | \(X_{4}\) | P(Row)
----|-----------|-----------|-----------|-----------|-------
1 | 1 | 1 | 1 | 0 | 0.0064
2 | 1 | 1 | 0 | 1 | 0.0064
3 | 1 | 0 | 1 | 1 | 0.0064
4 | 0 | 1 | 1 | 1 | 0.0064
\(\sum_{1}^{4}\) | | | | | \(4 \times 0.0064\)
  • Therefore the probability of observing 3 defectives out of 4 widgets inspected is: \(4 \times 0.0064 = 0.0256\)

Shortcut to R scripts

  • This calculates P(Y=3|n=4,p=0.2)
dbinom(3,4,0.2,log=FALSE)
[1] 0.0256
  • The last parameter is FALSE by default. What about \[P(Y \le 3)= P(Y=0)+P(Y=1)+P(Y=2)+P(Y=3)\]
pbinom(3,4,0.2,log.p=FALSE)
[1] 0.9984
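  • As a sketch, pbinom() is just the cumulative sum of the dbinom() terms listed above:
## P(Y=0)+P(Y=1)+P(Y=2)+P(Y=3), giving 0.9984
sum(dbinom(0:3, size=4, prob=0.2))
## the cumulative distribution function gives the same value
pbinom(3, size=4, prob=0.2)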

A bit more involved example

  • The quality control procedure inspects 8 items from lot 1 and 10 items from lot 2. If there are at least 2 defective items in lot 1 or at least 3 defective items in lot 2, you stop the manufacturing process. What is the probability that you will stop the process? The probability of a random item being defective is 0.3 in lot 1 and 0.25 in lot 2.

  • Start by defining the random variables and identifying the probabilities.

  • Define the question with notation and the random variables you have defined.

Defining the r.v.s and prob.

  • Let Y represent the number of defective items in lot 1. \(Y=\{0,\cdots,8\},p_{y}=0.3,n_{y}=8\)

  • Let Z represent the number of defective items in lot 2. \(Z=\{0,\cdots,10\},p_{z}=0.25,n_{z}=10\)

  • The question redefined with notation and r.v.s \[ \begin{split} & P(Y \ge 2 \cup Z \ge 3) = \\ & P(Y \ge 2)+P(Z \ge 3) - P(Y \ge 2, Z \ge 3)\\ \end{split} \]

Finding \(P(Y \ge 2)\)

  • Since the probabilities of all values of Y have to add up to 1: \[ \begin{split} & P(Y \le 1) + P(Y \ge 2) =1 \\ & P(Y \ge 2)=1-P(Y \le 1) \end{split} \]

  • Find each component in the equation separately.

Finding \(P(Y \ge 2)\)

  • It is easy to find \(P(Y \le 1)\) and therefore \(P(Y \ge 2)\)
## P(Y<=1)
pbinom(1,8,0.3,log.p=FALSE)
[1] 0.2552983
## 1-P(Y<=1) = P(Y>=2)
1-pbinom(1,8,0.3,log.p=FALSE)
[1] 0.7447017
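  • An equivalent sketch uses the lower.tail argument of pbinom() to get the upper tail directly:
## P(Y>=2) without the explicit 1 - P(Y<=1) step
pbinom(1, 8, 0.3, lower.tail=FALSE)
[1] 0.7447017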

Finding \(P(Z \ge 3)\)

  • Same reasoning as before

\[ \begin{split} & P(Z \le 2) + P(Z \ge 3) =1 \\ & P(Z \ge 3)=1-P(Z \le 2) \end{split} \]

## P(Z<=2|n=10,p=0.25)
pbinom(2,10,0.25,log.p=FALSE)
[1] 0.5255928
## 1-P(Z<=2|n=10,p=0.25) = P(Z>=3|n=10,p=0.25)
1-pbinom(2,10,0.25,log.p=FALSE)
[1] 0.4744072

Finding \(P(Y \ge 2, Z \ge 3)\)

  • In order to find this probability we need to assume independence between Y and Z given \(n_y,n_z,p_y,p_z\).

\[ P(Y \ge 2) \times P(Z \ge 3)= 0.7447 \times 0.4744=0.3533\]

  • Putting it all together

\[ \begin{split} & P(Y \ge 2)+P(Z \ge 3)-P(Y \ge 2 , Z \ge 3) = \\ & 0.7447+0.4744-0.3533 = 0.8658 \end{split} \]
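  • A sketch of the whole computation in R, keeping full precision rather than the rounded values above:
## P(Y>=2), P(Z>=3), and the union probability via independence
py <- 1-pbinom(1, 8, 0.3)
pz <- 1-pbinom(2, 10, 0.25)
## about 0.8658
py + pz - py*pz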

Thinking about the numbers

  • What considerations do you think led to these cutoff values for stopping the process?

  • Think short term. Think long term.

  • Think about what you are trying to optimize and why.