Spring 2018

Success-Failure Trials

Roll a die \(3\) times. Create the probability distribution of the number of sixes.

Let's say,

  • Success = S = six
  • Failure = F = not a six

Now, possible number of successes (i.e. getting sixes) out of \(3\) trials: \(k = 0,1,2,3\)
Note: \(k\) is a random variable

Find: \[ P(k = 0) = ? \\ P(k = 1) = ?\\ P(k = 2) = ?\\ P(k = 3) = ? \]

Probability Distribution

Probability distribution of the number of successes \((k)\) in \(3\) independent success/failure trials, each of which is a success with chance \(\frac{1}{6}.\)

\[ \begin{array}{lclr} k & \text{pattern} & \text{chance of pattern} & \text{chance of value} \\ \hline 0 & FFF & (5/6)^3 & (5/6)^3=0.5787 \\ \hline 1 & SFF & (1/6)(5/6)^2 & \\ & FSF & (1/6)(5/6)^2 & \\ & FFS & (1/6)(5/6)^2 & 3(1/6)(5/6)^2 = 0.3472 \\ \hline 2 & SSF & (1/6)^2(5/6) & \\ & SFS & (1/6)^2(5/6) & \\ & FSS & (1/6)^2(5/6) & 3(1/6)^2(5/6) = 0.0694 \\ \hline 3 & SSS & (1/6)^3 & (1/6)^3 = 0.0046 \end{array} \]
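As a sanity check, the table can be reproduced by enumerating all \(6^3 = 216\) equally likely outcomes of three die rolls; a minimal Python sketch:

```python
from itertools import product
from collections import Counter

# Enumerate all 6^3 = 216 equally likely outcomes of three die rolls
# and count how many sixes appear in each outcome.
counts = Counter(rolls.count(6) for rolls in product(range(1, 7), repeat=3))

total = 6 ** 3
for k in range(4):
    print(f"P(k = {k}) = {counts[k] / total:.4f}")
# Prints 0.5787, 0.3472, 0.0694, 0.0046 -- matching the table.
```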

Binomial Formula

Suppose the probability of a single trial being a success is \(p\). Then the probability of observing exactly \(k\) successes in \(n\) independent trials is given by

\[ \bbox[yellow,5px] { \color{black}{\binom{n}{k}p^k(1-p)^{n-k}=\frac{n!}{k!(n-k)!}p^k(1-p)^{n-k}} } \]

When \(k = 0,\) the chance of no successes (in other words, the chance of \(n\) failures in a row) is

\[ \frac{n!}{0!(n)!}p^0(1-p)^{n} = (1-p)^n \]
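A minimal Python sketch of the formula, using math.comb from the standard library for the binomial coefficient:

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Chance of exactly k successes in n independent trials,
    each a success with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Reproduces the die-rolling table (n = 3, p = 1/6).
for k in range(4):
    print(k, round(binomial_pmf(k, 3, 1/6), 4))
```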

Sampling with Replacement

A random number generator draws at random with replacement from the ten digits \(0,1,2,3,4,5,6,7,8,9\). Run the generator \(20\) times.

Find the chance that \(0\) appears exactly once. \[\text{Binomial: } n=20, \space p = 0.1, \space k = 1: \quad \binom{20}{1}(0.1)^1(0.9)^{19} = 0.2702\]

Find the chance that \(0\) appears at most once. \[\text{Binomial: } n=20, \space p = 0.1, \space k \in \{0,1\}: \\ \binom{20}{0}(0.1)^0(0.9)^{20} + \binom{20}{1}(0.1)^1(0.9)^{19} = 0.3917\]

Find the chance that \(0\) appears more than once. \[\text{Binomial: } n=20, \space p = 0.1, \space k \in \{2,3,...,20\}: \\ 1-P(k \le 1)= 1- 0.3917 = 0.6083 \]
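The three answers can be verified with the same formula; a short Python sketch:

```python
from math import comb

n, p = 20, 0.1   # 20 runs, chance 0.1 of a 0 on each run

def pmf(k: int) -> float:
    return comb(n, k) * p**k * (1 - p)**(n - k)

exactly_once   = pmf(1)               # 0.2702
at_most_once   = pmf(0) + pmf(1)      # 0.3917
more_than_once = 1 - at_most_once     # 0.6083
print(round(exactly_once, 4), round(at_most_once, 4), round(more_than_once, 4))
```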

Simulation

The figure shows the probability distribution of the number of heads out of \(100\) tosses, generated by simulation. Approximately the same distribution can be obtained from the binomial formula:

\[\binom{100}{k}(0.5)^k(1-0.5)^{100-k}, \quad k=0,1,...,100\]
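A figure like this can be produced by simulating many runs of \(100\) tosses and comparing the observed frequencies with the binomial formula; a rough Python sketch (the number of simulated runs and the values of \(k\) checked are arbitrary choices):

```python
import numpy as np
from math import comb

rng = np.random.default_rng(seed=0)

# Simulate the number of heads in 100 tosses, repeated 10,000 times.
heads = rng.binomial(n=100, p=0.5, size=10_000)

# Compare simulated frequencies with the binomial formula for a few k.
for k in (45, 50, 55):
    simulated = np.mean(heads == k)
    exact = comb(100, k) * 0.5**100
    print(k, round(simulated, 4), round(exact, 4))
```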

Probability Distributions

Summary

A probability distribution can be presented as a table of all disjoint outcomes and their associated probabilities (as shown in slide 3).

Rules of probability distributions

A probability distribution is a list of the possible outcomes with corresponding probabilities that satisfies three rules:

  1. The outcomes listed must be disjoint
  2. Each probability must be between \(0\) and \(1\).
  3. The probabilities must total \(1\).

Expected Value

In probability theory, the expected value of a random variable, intuitively, is the long-run average value of repetitions of the experiment it represents. For example, the expected value in rolling a six-sided die is \(3.5\), because the average of all the numbers that come up in an extremely large number of rolls is close to \(3.5\). The law of large numbers states that the arithmetic mean of the values almost surely converges to the expected value as the number of repetitions approaches infinity.

The expected value of a discrete random variable is the probability-weighted average of all possible values. In other words, each possible value the random variable can assume is multiplied by its probability of occurring, and the resulting products are summed to produce the expected value. The same principle applies to an absolutely continuous random variable, except that an integral of the variable with respect to its probability density replaces the sum.

The expected value is a key aspect of how one characterizes a probability distribution; it is one type of location parameter. By contrast, the variance is a measure of dispersion of the possible values of the random variable around the expected value. The variance itself is defined in terms of two expectations: it is the expected value of the squared deviation of the variable's value from the variable's expected value.

Expectation

Random Variable
A random process or variable with a numerical outcome.

Expected value of a discrete random variable

If \(X\) takes outcomes \(x_1, x_2,...,x_m\) with probabilities \(p_1, p_2,..., p_m,\) the expected value of \(X\) is the sum of each outcome multiplied by its corresponding probability:

\[ \begin{align} E(X) &= \mu_x = x_1 \times p_1 + x_2 \times p_2 +...+x_m \times p_m \\ &= \sum^{m}_{i=1}(x_i \times p_i) \end{align} \]

\(\text{Random Variable } X: \text{the number of spots on one roll of a die}\)

Probability distribution table for \(X\)

\[ \begin{array}{c|c|c|c|c|c|c} \text{Value} & 1 & 2 & 3 & 4 & 5 & 6\\ \hline \text{Probability} & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \end{array} \] \[ E(X) = 1 \times (1/6) + 2 \times (1/6) + 3 \times (1/6) + 4 \times (1/6) + 5 \times (1/6) + 6 \times (1/6) = 3.5 \]
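The same calculation as a short Python sketch:

```python
values = [1, 2, 3, 4, 5, 6]
probs = [1/6] * 6

# E(X): each value weighted by its probability, then summed.
expected_value = sum(x * p for x, p in zip(values, probs))
print(expected_value)   # 3.5
```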

Variability in Random Variables

Variance and standard deviation of a discrete random variable

If \(X\) takes outcomes \(x_1, x_2,...,x_m\) with probabilities \(p_1, p_2,...,p_m\) and expected value \(\mu_x = E(X),\) then to find the standard deviation of \(X\), we first find the variance and then take its square root.

\[ \begin{align} Var(X)=\sigma^2_x &= (x_1 - \mu_x)^2 \times p_1 + (x_2 - \mu_x)^2 \times p_2 +...+ (x_m - \mu_x)^2 \times p_m \\ &= \sum^m_{i=1}(x_i-\mu_x)^2 \times p_i \\ \\ SD(X) = \sigma_x &= \sqrt{\sum^m_{i=1}(x_i-\mu_x)^2 \times p_i} \\ \\ Var(X)=\sigma^2_x &= [(1 - 3.5)^2 + (2 - 3.5)^2 +...+ (6 - 3.5)^2]\times (1/6) \\ &= 2.92 \\ \\ SD(X) = \sigma_x &= \sqrt{2.92} = 1.71 \end{align} \]
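And the variance and standard deviation, in a similar sketch:

```python
values = [1, 2, 3, 4, 5, 6]
probs = [1/6] * 6

mu = sum(x * p for x, p in zip(values, probs))                   # 3.5
variance = sum((x - mu)**2 * p for x, p in zip(values, probs))   # 2.9167
sd = variance ** 0.5                                             # 1.7078
print(round(variance, 2), round(sd, 2))   # 2.92 1.71
```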

Expected Value of the Binomial

\(X:\) number of successes in \(n\) independent trials with probability \(p\) of success on each trial. Write \(X = X_1 + X_2 + ... + X_n,\) where \(X_i = 1\) if trial \(i\) is a success and \(X_i = 0\) otherwise.

Probability distribution table for a single trial \(X_i:\)

\[ \begin{array}{c|c|c} \text{Value} & 1 & 0 \\ \hline \text{Probability} & p & 1-p \end{array} \]

\[ \begin{align} E(X_i) &= 1 \times p + 0 \times (1-p) \\ &= p \end{align} \]

Standard Deviation of the Binomial

\(X:\) number of successes in \(n\) independent trials with probability \(p\) of success on each trial, so \(X = X_1 + X_2 + ... + X_n\) with \(X_i\) as above.

Probability distribution table for a single trial \(X_i:\)

\[ \begin{array}{c|c|c} \text{Value} & 1 & 0 \\ \hline \text{Probability} & p & 1-p \end{array} \]

\[ \begin{align} E(X_i) &= p \\ \\ SD(X_i) &= \sqrt{(1-p)^2 \times p + (0-p)^2 \times (1-p)} \\ &= \sqrt{(1-p)^2 \times p + p^2 \times (1-p)} \\ &= \sqrt{p(1-p)(1-p+p)} \\ &= \sqrt{p(1-p)} \end{align} \]
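A quick numerical check of the single-trial formulas; the value of \(p\) below is an arbitrary illustration:

```python
p = 0.3   # illustrative success probability for a single trial

mean = 1 * p + 0 * (1 - p)                                        # = p
sd_from_definition = ((1 - p)**2 * p + (0 - p)**2 * (1 - p)) ** 0.5
sd_shortcut = (p * (1 - p)) ** 0.5

print(mean, round(sd_from_definition, 4), round(sd_shortcut, 4))  # both SDs agree
```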

Linear Transformation of Random Variables

\[ \begin{align} E(aX + b) &= a \times E(X) + b \\ \\ SD(aX + b) &= \vert a \vert \times SD(X) \\ \\ E(aX + bY) &= a \times E(X) + b \times E(Y) \\ \\ Var(aX + bY) &= [a \times SD(X)]^2 + [b \times SD(Y)]^2 \quad (X, Y \text{ independent}) \\ \\ SD(aX + bY) &= \sqrt{[a \times SD(X)]^2 + [b \times SD(Y)]^2} \quad (X, Y \text{ independent}) \\ \end{align} \]
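These rules can be checked by simulation; a rough Python sketch assuming \(X\) and \(Y\) are independent (the distributions and constants are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Two independent (illustrative) normal random variables.
X = rng.normal(loc=10, scale=2, size=1_000_000)
Y = rng.normal(loc=5, scale=3, size=1_000_000)
a, b = 2, 7

# E(aX + b) = a*E(X) + b = 27,   SD(aX + b) = |a|*SD(X) = 4
print(round((a * X + b).mean(), 2), round((a * X + b).std(), 2))

# With X, Y independent:
# E(aX + bY) = 2*10 + 7*5 = 55,  SD(aX + bY) = sqrt(2^2*2^2 + 7^2*3^2) = 21.4
print(round((a * X + b * Y).mean(), 2), round((a * X + b * Y).std(), 2))
```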

Properties of Binomial Distribution

In a nutshell

\(X\) has the binomial distribution with parameters \(n\) and \(p\)

  • \(X\) is the number of successes in \(n\) repeated, independent success-failure trials with probability \(p\) of success on a single trial.

Probability distribution of \(X\):

\[ \binom{n}{k}p^k(1-p)^{n-k}=\frac{n!}{k!(n-k)!}p^k(1-p)^{n-k}, \space k = 0,1,2,...,n \]

\[ \begin{align} \text {Expected value of X: } E(X) &= E(X_1)+E(X_2)+...+E(X_n) \\ &= p+p+...+p = np \\ \\ \text {Variance of X: } Var(X) &= p(1-p)+...+p(1-p) = np(1-p) \\ \\ \text {Standard Error of X: } SE(X) &= \sqrt{np(1-p)} \end{align} \]

Calculate \(E(X)\) and \(SE(X)\) for \(100\) tosses

\(X\) has the binomial distribution with parameters \(n=100\) and \(p=0.5\)

\[ \begin{align} P(X=k) &= \binom{100}{k}(0.5)^k(1-0.5)^{100-k}, k = 0,1,2,...,100 \\ \\ E(X) &= 100 \times 0.5 \\ &= 50 \\ \\ SE(X) &= \sqrt{100 \times 0.5 \times 0.5} \\ &= 5 \end{align} \]

The Approximating Normal Curve

Probability Calculation from the Normal Curve

\(X\) has the binomial distribution with parameters \(n=100\) and \(p=0.5\)

What is the probability that \(X\) is between \(45\) and \(55?\)

\[ \begin{align} E(X) &= 50 \\ SE(X) &= 5 \end{align} \] Using the continuity correction, \(P(45 \le X \le 55) \approx P(44.5 < X < 55.5):\) \[ \begin{align} z_1 = (44.5-50)/5 &= -1.1 \\ z_2 = (55.5 - 50)/5 &= 1.1 \end{align} \] Area under the standard normal curve between \(-1.1\) and \(1.1\) is \(72.87\%.\)
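A Python sketch comparing the normal approximation (with continuity correction) to the exact binomial probability:

```python
from math import comb, erf, sqrt

def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 100, 0.5
mu, se = n * p, sqrt(n * p * (1 - p))    # 50 and 5

# Normal approximation with continuity correction: P(44.5 < X < 55.5)
approx = normal_cdf((55.5 - mu) / se) - normal_cdf((44.5 - mu) / se)

# Exact binomial probability P(45 <= X <= 55)
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(45, 56))

print(round(approx, 4), round(exact, 4))   # both close to 0.7287
```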

Binomial - Large \(n\) and \(p \ne 0.5\)

Example:

According to genetic theory, every plant of a particular species has chance \(25\%\) of being red-flowering, independent of all other plants. What is the chance that among \(10,000\) plants of this species, more than \(2,400\) are red-flowering?

Answer:

\[\text{Binomial } n = 10,000, \space p = 0.25, \space k>2400\]

\[ \begin{align} \text{Expected number red-flowering} &= 10,000 \times 0.25 = 2,500 \\ \\ \text{SE of the number red-flowering} &= \sqrt{10,000 \times 0.25 \times 0.75} = 43.30 \\ \\ z &= (2400 - 2500)/43.30 = -2.3 \\ \\ P(z>-2.3) &= 98.93\% \end{align} \]
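The same approximation in a short Python sketch:

```python
from math import erf, sqrt

def normal_cdf(z: float) -> float:
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 10_000, 0.25
mu = n * p                      # 2,500
se = sqrt(n * p * (1 - p))      # 43.30

z = (2400 - mu) / se            # about -2.31
print(round(1 - normal_cdf(z), 4))   # about 0.99; the slide's 98.93% rounds z to -2.3
```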

Normal Approximation

sums of random variables

We have seen that many distributions are approximately normal. The sum and difference of independent, normally distributed variables are also normal.

Example:

Three friends are playing a cooperative video game in which they have to complete a puzzle as fast as possible. Assume that the individual times of the \(3\) friends are independent of each other. The individual times of the friends on similar puzzles are approximately normally distributed with the following means and standard deviations (in minutes).

\[ \begin{array}{l c c} & \text{Mean} & \text{SD} \\ \text{Friend 1} & 5.6 & 0.11 \\ \text{Friend 2} & 5.8 & 0.13 \\ \text{Friend 3} & 6.1 & 0.12 \end{array} \]

To advance to the next level of the game, the friends' total time must not exceed 17.1 minutes. What is the probability that they will advance to the next level?

Sums of Random Variables

Solution: Because each friend's time is approximately normally distributed, the sum of their times is also approximately normally distributed.

Let the three friends be labeled \(X, Y, Z. \space\) Calculate \(P(X+Y+Z<17.1).\)

\[ \begin{align} \mu_{sum} &= E(X+Y+Z) \\ &= E(X) + E(Y) + E(Z) \\ &= 5.6 + 5.8 + 6.1 = 17.5 \\ \\ \sigma_{sum} &= \sqrt{(SD_x)^2 + (SD_y)^2 + (SD_z)^2} \\ &= \sqrt{(0.11)^2 + (0.13)^2 + (0.12)^2} = 0.208 \\ \\ z &= (17.1 - 17.5)/0.208 = -1.92 \\ &P(z < -1.92) = 2.7\% \end{align} \]

There is a \(2.7\%\) chance that the friends will advance to the next level.
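A brief Python sketch of this calculation:

```python
from math import erf, sqrt

def normal_cdf(z: float) -> float:
    return 0.5 * (1 + erf(z / sqrt(2)))

means = [5.6, 5.8, 6.1]
sds = [0.11, 0.13, 0.12]

mu_sum = sum(means)                        # 17.5
sd_sum = sqrt(sum(s**2 for s in sds))      # 0.208

z = (17.1 - mu_sum) / sd_sum               # -1.92
print(round(normal_cdf(z), 3))             # about 0.027
```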

Next Week


Chapter 15: Sampling Distribution Models
Chapter 16: Confidence Interval for Proportions