Lesson 2: Continuous distributions and the normal distribution

Probability density function: the probability that a continuous random variable falls in the interval $(a,b)$ equals the area under the density function of the random variable between $a$ and $b$.
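The R sketch below gives a quick numerical check of this definition. It integrates the standard normal density with base R's `integrate()` and compares the resulting area with the difference of two `pnorm()` values. The endpoints `a = -1` and `b = 2` are arbitrary example values, not taken from the lesson.

```r
# Endpoints of the interval (example values chosen for illustration)
a <- -1
b <- 2

# Area under the N(0, 1) density between a and b, by numerical integration
area <- integrate(dnorm, lower = a, upper = b)$value

# The same probability from the cumulative distribution function
via_cdf <- pnorm(b) - pnorm(a)

print(c(area = area, via_cdf = via_cdf))
```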

Quick review of Empirical Rule

Empirical Rule
Sigma from mean	% of values within that range
1	68%
2	95%
3	99.7%
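The rounded percentages in the table can be checked exactly in R. This sketch uses `pnorm()` to compute $P(-k<Z<k)$ for $k=1,2,3$. Any normal variable can be standardized, so the same values hold for every normal distribution.

```r
# Exact probability that a standard normal value lies within k standard deviations of 0
k <- 1:3
within_k_sd <- pnorm(k) - pnorm(-k)
round(within_k_sd, 4)   # 0.6827 0.9545 0.9973
```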

R functions

Using probability distributions in R: dnorm, pnorm, qnorm, and rnorm

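R Functions
Function	Description	Main argument
dnorm	density: height of the density curve at a value (not a probability)	value x
pnorm	distribution function: cumulative probability $P(X<q)$	value q
qnorm	quantile function: the value with a given cumulative probability (inverse of pnorm)	probability p
rnorm	random generation: draws random values from the distribution	number of values n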

For a given probability, the qnorm function finds the number such that the area under the standard normal density function to the left of that number equals the given probability.
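Here is a short sketch of the four functions for the standard normal, using the defaults `mean = 0` and `sd = 1`. The inputs (0, 1.5, 0.9, five draws, seed 1) are arbitrary illustration values. The last lines show that `qnorm()` undoes `pnorm()`.

```r
height <- dnorm(0)        # density (curve height) at x = 0, about 0.3989; not a probability
cum_prob <- pnorm(1.5)    # P(Z < 1.5), a cumulative probability
q90 <- qnorm(0.9)         # the value q with P(Z < q) = 0.9
set.seed(1)               # make the random draws reproducible
draws <- rnorm(5)         # five random values from N(0, 1)

# qnorm() is the inverse of pnorm(): a round trip returns the original probabilities
p <- c(0.025, 0.5, 0.975)
round_trip <- pnorm(qnorm(p))
print(round_trip)
```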

Standard normal distribution

A normal distribution is standard if $\mu = 0$ and $\sigma = 1$.

A normal distribution does not have to be standard. Properties of every normal distribution:

Symmetric around the mean
The mean and standard deviation completely determine the distribution
The probability that a normal random variable equals any single point is ZERO
The density extends over the whole real line, from $-\infty$ to $\infty$ (it is never exactly zero)
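Two of these properties can be seen numerically. The sketch assumes an example normal distribution with mean 10 and standard deviation 5 (illustration values). The density has the same height at equal distances below and above the mean. The probability of landing exactly on one point, computed as the area from that point to itself, is zero.

```r
mu <- 10      # example mean
sigma <- 5    # example standard deviation

# Symmetry: equal density 3 units below and 3 units above the mean
below <- dnorm(mu - 3, mean = mu, sd = sigma)
above <- dnorm(mu + 3, mean = mu, sd = sigma)

# A single point has zero probability: the area from 12 up to 12 is 0
point_prob <- pnorm(12, mean = mu, sd = sigma) - pnorm(12, mean = mu, sd = sigma)

print(c(below = below, above = above, point_prob = point_prob))
```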

Standard normal distribution

$\mu =0$
$\sigma =1$

x <- seq(-3, 3, length.out = 1000)
y <- dnorm(x, mean = 0, sd = 1)
plot(x, y, type = "l")  # draw the density as a line

Fat normal distribution

$\mu =0$
$\sigma =5$

x <- seq(-15, 15, length.out = 5000)
y <- dnorm(x, mean = 0, sd = 5)   # larger sd: lower, wider curve
plot(x, y, type = "l")
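The curve looks "fat" for a simple reason. The total area under every normal density is 1, and the peak height at the mean is $\frac{1}{\sigma\sqrt{2\pi}}$. Multiplying $\sigma$ by 5 therefore divides the peak by 5. The sketch below checks both facts with `dnorm()` and `integrate()`.

```r
sigmas <- c(1, 5)                                 # standard and "fat" normal
heights <- dnorm(0, mean = 0, sd = sigmas)        # density at the mean
peak_formula <- 1 / (sigmas * sqrt(2 * pi))       # closed-form peak height
round(heights, 4)                                 # 0.3989 0.0798

# The wider curve still encloses a total area of 1
total_area <- integrate(dnorm, -Inf, Inf, mean = 0, sd = 5)$value
```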

Normal distribution (axis)

$\mu =10$
$\sigma =5$

# different axis of symmetry: the curve is centered at mean = 10
x <- seq(-5, 25, length.out = 5000)
y <- dnorm(x, mean = 10, sd = 5)
plot(x, y, type = "l")

Standardization; Z-score

Standardized value (Z-score): the number of standard deviations a value lies from the mean.

If $X$ follows a normal distribution with mean $\mu$ and standard deviation $\sigma$, THEN $Z = \frac{X-\mu}{\sigma}$ follows the STANDARD normal distribution (this converts any normal variable into a standard normal variable).

$X$ = observed variable that follows a normal distribution
$Z$ = number of standard deviations from the mean

The Z-score (standard score) measures how many standard deviations above or below the population mean a raw score is: \[ Z=\frac{X-\mu}{\sigma} \]
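Standardizing does not change probabilities: $P(X<x)=P\left(Z<\frac{x-\mu}{\sigma}\right)$. The sketch below checks this in R using the numbers from the weight example later in this lesson ($\mu=150$, $\sigma=25$, $x=185$).

```r
mu <- 150
sigma <- 25
x <- 185

p_direct <- pnorm(x, mean = mu, sd = sigma)   # probability on the original scale
z <- (x - mu) / sigma                          # Z-score, 1.4 here
p_standard <- pnorm(z)                         # same probability from N(0, 1)

print(c(z = z, p_direct = p_direct, p_standard = p_standard))
```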

Z-score 1.96 = 95%: proof

Suppose that $Z$ is a standard normal random variable. Find the value $w$ so that $P(-w<Z<+w)=0.95$.

By symmetry, half of that probability lies on each side of zero, and half of the total area lies below zero: \[ \begin{align} P(-w<Z<+w)&=0.95\\ P(0<Z<w)&=0.95/2\\ &=0.475\\ P(Z<0)&=P(Z>0)\\ &= 0.5 \end{align} \] Strategy: use qnorm(), since we’re given a probability. qnorm outputs the value that has a given cumulative (left-tail) probability. (Think: the inverse of pnorm.)

Add up the known pieces of the cumulative probability $P(Z<w)$:

0.5 from all of the negative values, $P(Z<0)$, and
0.475 from the interval $(0,w)$, which is half of the given probability 0.95

The value that qnorm returns for the cumulative probability 0.975 is therefore $w$. This $w$ works because the interval $(0,w)$ carries exactly half of the desired 0.95, and by symmetry $(-w,0)$ carries the other half.

\[ \begin{align} P(Z<w)&=P(Z<0)+P(0<Z<w)\\ &=0.5+0.475\\ &=0.975 \end{align} \]

qnorm(0.975,mean=0,sd=1)

## [1] 1.959964
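The next lines check that $w \approx 1.96$ does capture 95% of the standard normal. They also define a small helper for other central coverage levels. `central_w()` is a hypothetical name introduced here. It generalizes the argument above to $w =$ `qnorm((1 + level) / 2)`.

```r
w <- qnorm(0.975)
coverage <- pnorm(w) - pnorm(-w)   # P(-w < Z < w), should be 0.95

# Hypothetical helper: w such that P(-w < Z < w) = level
central_w <- function(level) qnorm((1 + level) / 2)
central_w(c(0.90, 0.95, 0.99))
```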

Z-score function

# Z-score: number of standard deviations given_value lies from population_mean
zscore <- function(given_value, population_mean, sd) {
  Z <- (given_value - population_mean) / sd
  return(Z)
}

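The original $X$ is $Z$ standard deviations away from the mean of $X$.

The Z-score therefore tells us where an observation lies relative to the mean of its distribution, on a scale shared by all normal distributions.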

NOTATION ($z_a$)

Finding $z_a$ given $P(Z>z_a)=a$. Rewriting this as a left-tail probability lets us use qnorm().

Let $z_a$ be a number such that $P(Z>z_a)=a$, where $Z$ follows the standard normal distribution.
The area to the left of $z_a$ is then $1-a=P(Z<z_a)$. This has the same form as finding $w$ with $P(X<w)$, so we can use qnorm directly: \[ \begin{align} P(Z>z_a)&= a\\ 1-P(Z<z_a)&=a\\ P(Z<z_a)&=1-a \end{align} \]

Since $P(Z<z_a)=1-a$, $z_a$ is qnorm(1-a).
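In R, the notation translates to $z_a$ = `qnorm(1 - a)`. The helper name `z_upper()` is hypothetical. The `lower.tail = FALSE` argument of `qnorm()` asks for the upper-tail value directly and gives the same result.

```r
# Hypothetical helper: z_a with P(Z > z_a) = a, via the left-tail area 1 - a
z_upper <- function(a) qnorm(1 - a)

z_upper(c(0.05, 0.025))                      # about 1.645 and 1.960
qnorm(c(0.05, 0.025), lower.tail = FALSE)    # same values from the upper tail
```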

Revisiting binomial distribution

The binomial distribution looks approximately normal, at least when $n$ is large.

The normal distribution that approximates the binomial distribution has the same mean and standard deviation as the binomial distribution: \[ \begin{align} \mu&=n\times p\\ \sigma&=\sqrt{n\times p \times (1-p)} \end{align} \]
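The mean and standard deviation formulas can be checked against the exact binomial distribution. Comparing cumulative probabilities also shows how good the normal approximation is. The sketch uses the lesson's $p=0.7$ with $n=5$ and $n=10000$. The cutoff 7050 is an arbitrary example value.

```r
p <- 0.7

# Small n: compute the mean and sd directly from the probability mass function
n <- 5
y <- 0:n
mu_exact <- sum(y * dbinom(y, n, p))                          # 3.5 = n * p
sd_exact <- sqrt(sum((y - mu_exact)^2 * dbinom(y, n, p)))
sd_formula <- sqrt(n * p * (1 - p))

# Large n: the normal with the same mean and sd tracks the binomial CDF closely
n_big <- 10000
exact <- pbinom(7050, size = n_big, prob = p)
approx <- pnorm(7050, mean = n_big * p, sd = sqrt(n_big * p * (1 - p)))
print(c(exact = exact, approx = approx))
```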

small $n$

$n=5$
$p=0.7$

y <- 0:5
plot(y,dbinom(y,5,.7),type = "h")

large $n$

y <- 6900:7100
plot(y,dbinom(y,10000,0.7),type="h")

Quick review

Success or failure
Independent trials
2 parameters: $n$, $p$ (number of trials, $P(\text{success})$)

input = desired number of successes
dbinom(input, n, p) yields $P(\text{exactly input successes})$
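`dbinom()` evaluates the binomial formula $\binom{n}{k}p^k(1-p)^{n-k}$. `pbinom()`, which the assessment below uses, adds these values up from 0 to $k$. The example values $k=3$, $n=5$, $p=0.7$ follow the small-$n$ plot above.

```r
k <- 3
n <- 5
p <- 0.7

from_r <- dbinom(k, n, p)                            # P(exactly 3 successes)
by_hand <- choose(n, k) * p^k * (1 - p)^(n - k)     # 10 * 0.343 * 0.09 = 0.3087
at_most_k <- pbinom(k, n, p)                         # P(at most 3 successes)

print(c(from_r = from_r, by_hand = by_hand, at_most_k = at_most_k))
```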

Examples

Basic probabilities

Suppose that $Z$ is a standard normal random variable ($\mu=0$ and $\sigma =1$).

$P(Z>0)$
- The standard normal density function is symmetric around zero
- Therefore $P(Z>0)=0.5$
$P(0<Z<1)$

pnorm(1,mean=0,sd=1)

## [1] 0.8413447

\[ \begin{align} P(Z<1)&=0.8413447\\ P(Z<0)&=0.5\\ P(0<Z<1)&=P(Z<1)-P(Z<0)\\ &=0.8413447-0.5\\ &\approx 0.34 \end{align} \]

$P(-1<Z<1)$
- The distribution is symmetric, so this is twice $P(0<Z<1)$

\[ \begin{align} P(-1<Z<1)&=2\times P(0<Z<1)\\ &= 0.34\times 2\\ &= 0.68 \end{align} \]

$P(-2<Z<2)$

pnorm(2, mean=0, sd=1)  # P(Z<2)

## [1] 0.9772499

pnorm(-2, mean=0, sd=1) # P(Z<-2)

## [1] 0.02275013

\[ \begin{align} P(Z<2)&=0.9772499\\ P(Z<-2)&=0.02275013\\ P(-2<Z<2)&=P(Z<2)-P(Z<-2)\\ &=0.9772499-0.02275013\\ &=0.9544998\\ &\approx 0.95 \end{align} \]

Notice: Empirical rule’s 95%

$P(-3 < Z < 3)$

pnorm(3,mean=0,sd=1) #P(Z<3)

## [1] 0.9986501

pnorm(-3,mean=0,sd=1) #P(Z<-3)

## [1] 0.001349898

\[ \begin{align} P(-3 < Z < 3)&= P(Z<3) - P(Z<-3)\\ &=0.9986501-0.001349898\\ &=0.9973002 \end{align} \]

Notice: Empirical rule’s 99.7%

Standardization

Suppose that we observe $W=185\text{ pounds}$, and $W$ follows a normal distribution, but not the standard normal. How do we interpret this information?
Suppose $\mu=150$ and $\sigma=25$. The standardized value is: \[ \begin{align} Z&=\frac{W-\mu}{\sigma}\\ &=\frac{185-150}{25}\\ &= 1.4 \end{align} \] so 185 pounds is 1.4 standard deviations above the mean.

User defined function `zscore`

You take the SAT and score 1100. The mean score for the SAT is 1026 and the standard deviation is 209. How well did you score on the test compared to the average test taker?

zscore(1100,1026,209)

## [1] 0.354067

A score of 1100 is about 0.35 standard deviations above the mean score.
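Suppose we additionally assume that SAT scores are approximately normal. The example does not state this, so it is an assumption. Under it, `pnorm()` converts the Z-score into a percentile, and roughly 64% of test takers would score below 1100.

```r
z <- (1100 - 1026) / 209    # same value as zscore(1100, 1026, 209), about 0.354
pct_below <- pnorm(z)       # share scoring below 1100, assuming normal scores

# Working on the original scale gives the same answer
pct_direct <- pnorm(1100, mean = 1026, sd = 209)
print(c(pct_below = pct_below, pct_direct = pct_direct))
```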

Upper cumulative

Suppose that $X$ is a normal random variable with mean $\mu=200$ and standard deviation $\sigma=40$. What is the probability that $X$ will take a value greater than 228?

\[ P(X>228)=1-P(X<228) \]

pnorm(228,mean=200,sd=40)

## [1] 0.7580363

\[ \begin{align} P(X>228)&=1-P(X<228)\\ &=1- 0.7580363\\ &=0.2419637 \end{align} \]
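`pnorm()` can also return the upper tail directly through its `lower.tail = FALSE` argument, which avoids the manual complement. Both lines below give $P(X>228)$.

```r
upper_direct <- pnorm(228, mean = 200, sd = 40, lower.tail = FALSE)  # P(X > 228) in one call
upper_complement <- 1 - pnorm(228, mean = 200, sd = 40)              # 1 - P(X < 228), as above
print(c(upper_direct, upper_complement))                              # both about 0.2419637
```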

Auto mufflers

Suppose that an automobile muffler is designed so that its lifetime (in months) is approximately normally distributed with mean 26.4 months and standard deviation 3.8 months. The manufacturer has decided to use a marketing strategy in which the muffler is covered by warranty for 18 months. Approximately what proportion of the mufflers will fail within 18 months?

pnorm(18, mean=26.4,sd=3.8)

## [1] 0.01353433

Approximately 1.35% of the mufflers will fail within the 18-month warranty.

Corn flakes

A machine that dispenses corn-flakes into packages provides amounts that are approximately normally distributed with mean weight 20 ounces and standard deviation 0.6 ounce. Suppose that the weights and measures law under which you must operate allows you to have only 5% of your packages under the weight stated on the package. What weight should you print on the package?

\[ P(X<w)=0.05 \]

qnorm(0.05,mean=20, sd=0.6)

## [1] 19.01309
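In practice the label weight is rounded, and the direction of rounding matters: a smaller printed weight means fewer packages fall below it. The sketch compares two candidate labels near the `qnorm()` answer; 19.01 and 19.02 are illustration values. Rounding down keeps the underweight share below 5%, while rounding up pushes it above.

```r
w <- qnorm(0.05, mean = 20, sd = 0.6)        # exact cutoff, about 19.01309

# Share of packages below each candidate label weight
under_low <- pnorm(19.01, mean = 20, sd = 0.6)    # about 0.0495, within the 5% limit
under_high <- pnorm(19.02, mean = 20, sd = 0.6)   # about 0.0512, over the limit
print(c(w = w, under_low = under_low, under_high = under_high))
```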

Left area

$Z$ follows a standard normal distribution. What is $z_{0.025}$?
REMEMBER: qnorm works with the area to the LEFT of $z_a$, which is $1-a$

\[ \begin{align} P(Z>z_a)&=a\\ P(Z>z_{0.025})&=0.025\\ 1-P(Z<z_{0.025})&=0.025\\ P(Z<z_{0.025})&=0.975 \end{align} \]

qnorm(.975,mean=0,sd=1)

## [1] 1.959964

\[ \therefore z_{0.025}\approx 1.96 \]

What is $z_{0.05}$?

\[ \begin{align} P(Z>z_{0.05})&=0.05\\ 1-P(Z<z_{0.05})&=0.05\\ P(Z<z_{0.05})&=1-0.05\\ &=0.95 \end{align} \]

qnorm(0.95,mean=0,sd=1)

## [1] 1.644854

\[ \therefore z_{0.05}\approx 1.644854 \]

Exercises

1

A normal random variable X has mean 3.0 and standard deviation 0.2. What is the probability that X falls between 2.75 and 3.1?

\[ \begin{align} P(2.75< X <3.1)&=P(X<3.1)-P(X<2.75)\\ &=0.6914625-0.1056498\\ &=0.5858127 \end{align} \]

pnorm(3.1,mean=3,sd=.2)

## [1] 0.6914625

pnorm(2.75,mean=3,sd=.2)

## [1] 0.1056498

pnorm(3.1,mean=3,sd=.2)-pnorm(2.75,mean=3,sd=.2)

## [1] 0.5858127

2

Suppose that X follows normal distribution with mean 5.5 and standard deviation 0.3. Find a number w such that X < w with 30% probability.

qnorm(.3,mean=5.5,sd=0.3)

## [1] 5.34268

3

The quality control section of a purchasing contract for valves specifies that the diameter must be between 2.53 and 2.57 centimeters. Assume that the production equipment is set so that the diameter follows a normal distribution with mean diameter 2.56 centimeters and standard deviation 0.01 centimeters. What percentage of the valves produced will, over the long run, be within these specifications?

Let $X$ be the random variable giving the diameter produced.

\[ \begin{align} P(2.53<X<2.57)&=P(X<2.57)-P(X<2.53)\\ &=0.8413447-0.001349898\\ &=0.8399948 \end{align} \]

pnorm(2.57,mean=2.56,sd=0.01)

## [1] 0.8413447

pnorm(2.53,mean=2.56,sd=0.01)

## [1] 0.001349898

pnorm(2.57,mean=2.56,sd=0.01)-pnorm(2.53,mean=2.56,sd=0.01)

## [1] 0.8399948
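Exercises 1 and 3 follow the same pattern, $P(a<X<b)=P(X<b)-P(X<a)$. A small helper makes that pattern reusable. `prob_between()` is a hypothetical name, not a base R function.

```r
# Hypothetical helper: P(a < X < b) for X ~ Normal(mean, sd)
prob_between <- function(a, b, mean = 0, sd = 1) {
  pnorm(b, mean = mean, sd = sd) - pnorm(a, mean = mean, sd = sd)
}

ex1 <- prob_between(2.75, 3.1, mean = 3, sd = 0.2)       # Exercise 1, about 0.5858
ex3 <- prob_between(2.53, 2.57, mean = 2.56, sd = 0.01)  # Exercise 3, about 0.8400
print(c(ex1 = ex1, ex3 = ex3))
```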

4

The chocolate chip cookies that are produced at Perry’s Cookie Emporium have weights which are approximately normally distributed with mean weight 180 grams and standard deviation 20 grams. The cookies, however, are sold by count, not by weight.
Perry wants to improve his image, so he decides to set aside the lightest 20% of the cookies to be packaged and sold separately.
What cookie weight will divide the lightest 20% from the heaviest 80%?

$\mu=180$
$\sigma=20$
We want the weight at the 20th percentile

qnorm(.2,mean=180,sd=20)

## [1] 163.1676

Assessment 1

1

2

3

4 Casino; $10 mil

The following game is offered in a casino. An employee flips a coin 20 times, but the player does not see the outcomes of these coin flips. After each flip of the coin the player has to guess whether the coin turned up heads or tails. At the end the player receives k dollars, where k is the number of correct guesses, except that if she guesses all twenty coin flips correctly, she receives an additional 10,000,000 dollars (so in that case the total reward is 10,000,000 + 20 = 10,000,020 dollars).

Calculate the expected amount of winnings in this game. Without the additional 10,000,000 dollar reward, the winnings would follow binomial distribution with number of trials $n = 20$ and $p = .5$, in which case the expected winnings would be $\$10$.

However, there is a probability of $0.5^{20} \approx 0.00000095$ that the player guesses all 20 coin flips correctly. This adds $10{,}000{,}000 \times 0.5^{20} \approx 9.54$ dollars to the expected winnings, so the expected winnings are $10 + 9.54 = 19.54 \text{ dollars}$.

In other words, the winnings are $K$ dollars plus a 10,000,000 dollar bonus when $K=20$, where $K$ (the number of correct guesses) follows a binomial distribution with $n=20$ and $p=0.5$:

\[ \begin{align} E(\text{winnings}) &= E(K) + 10{,}000{,}000\times P(K=20)\\ &=(20\times 0.5)+(10{,}000{,}000\times 0.5^{20})\\ &=10+9.536743\\ &\approx 19.54 \end{align} \]

There is a modification of this game in which the player receives the additional 10,000,000 dollars not when she guesses all coin flips correctly, but when she guesses all flips incorrectly. So if the number of correct guesses is zero, then she receives 10,000,000 dollars, and in all other cases she receives as many dollars as the number of correct guesses. What is the expected amount of winnings in this modified game?

The probability of zero correct guesses, $0.5^{20}$, is the same as the probability of 20 correct guesses, and the binomial part still has expected value 10. The expected winnings are therefore the same: about 19.54 dollars.
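Both expected values can be computed exactly in R. The sketch lists every possible number of correct guesses $k$, its binomial probability, and the payoff in each version of the game. The payoff vectors encode the rules as stated; nothing else is assumed.

```r
k <- 0:20                                   # possible numbers of correct guesses
prob <- dbinom(k, size = 20, prob = 0.5)    # P(K = k)

payoff_all_right <- k + 1e7 * (k == 20)     # original game: bonus for 20 correct guesses
payoff_all_wrong <- k + 1e7 * (k == 0)      # modified game: bonus for 0 correct guesses

ev_original <- sum(payoff_all_right * prob)
ev_modified <- sum(payoff_all_wrong * prob)
print(c(ev_original, ev_modified))          # both about 19.54
```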

Casino; random number

There is a game in a casino with the following rules. A machine generates a random number $X$ from a normal distribution with mean $100$ and standard deviation $10$. If $X$ is at least $100$, the player receives $X - 100$ dollars. If $X$ is below $100$, the player pays $100 - X$ dollars. Let $Y$ be the amount the player is going to win, with the understanding that $Y$ is negative if the player has to pay. For example, if $X = 109$ then $Y = 9$, and if $X = 95$ then $Y = -5$.

What is the expected value of $Y$? (10 pts.)
Since $Y = X - 100$ and $X$ has expected value $100$, $Y$ has expected value $0$.
What is the standard deviation of $Y$?
The deviation of $Y$ from $0$ is exactly the same as the deviation of $X$ from $100$, so the standard deviation of $Y$ is also $10$.
A player plans to play this game 20 times. What is the probability that at least 5 times out of the 20 games she will win more than 13 dollars? (10 pts.)

There are two things to consider in this situation:

Probability of winning more than $13
Probability of at least 5 times out of the 20 tries

Winning more than 13 dollars means $Y>13$. Since $Y=X-100$, this is the event $X>113$:
\[ \begin{align} P(X>113)&=1-P(X<113)\\ &=1-\text{pnorm}(113,100, 10)\\ &=0.09680048 \end{align} \]

1-pnorm(113,mean=100,sd=10)

## [1] 0.09680048

At each game the probability of winning more than 13 dollars is about 0.097.
Let $W$ be the number of games, out of 20, in which she wins more than 13 dollars. $W$ follows a binomial distribution with $n=20$ and $p\approx 0.097$, since each game is an independent win/lose trial. pbinom(k, n, p) gives $P(W\leq k)$, so for "at least 5" we subtract $P(W\leq 4)$ from 1:

\[ \begin{align} P(W\geq k)&=1-P(W\leq k-1)\\ &=1-\text{pbinom}(k-1,n,p)\\ P(W\geq5)&=1-\text{pbinom}(4,\text{size}=20,\text{prob}=0.097)\\ &=0.03855293 \end{align} \]

1-pbinom(4,20,.097)

## [1] 0.03855293
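A Monte Carlo simulation with `rnorm()` provides an independent check of this answer. The sketch simulates many 20-game sessions; the seed and the number of sessions are arbitrary choices. It then compares the share of sessions with at least 5 big wins to the binomial answer, computed with the unrounded per-game probability.

```r
set.seed(42)
n_sessions <- 100000

# Each row is one session of 20 games; Y = X - 100 is the amount won in a game
x <- matrix(rnorm(20 * n_sessions, mean = 100, sd = 10), ncol = 20)
y <- x - 100

big_wins <- rowSums(y > 13)             # games per session with winnings above 13 dollars
sim_prob <- mean(big_wins >= 5)         # simulated P(at least 5 such games)

p_win <- 1 - pnorm(113, mean = 100, sd = 10)       # per-game probability, about 0.0968
exact_prob <- 1 - pbinom(4, size = 20, prob = p_win)
print(c(simulated = sim_prob, binomial = exact_prob))
```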

Hence the probability that the player wins more than 13 dollars at least 5 times is about 0.04.
