Probability density function: the probability that a continuous random variable falls in the interval \((a,b)\) equals the area under its density function from \(a\) to \(b\).
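As a quick numerical check of this area interpretation, the sketch below integrates the standard normal density over an interval with base R's `integrate()` and compares the result with a difference of `pnorm()` values. The endpoints `a` and `b` are arbitrary values chosen only for illustration.

```r
# Area under the standard normal density between a and b
# (a and b are illustrative endpoints, not from the notes)
a <- -1
b <- 2
area <- integrate(dnorm, lower = a, upper = b)$value

# The same probability from the cumulative distribution function
prob <- pnorm(b) - pnorm(a)

print(c(area = area, prob = prob))
```

Both numbers should agree to many decimal places, since `pnorm()` is exactly the accumulated area under `dnorm()`.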
Empirical Rule

| % of values within the range | Standard deviations from the mean (\(\pm\)) |
|---|---|
| 68% | 1 |
| 95% | 2 |
| 99.7% | 3 |
Using probability distributions in R: dnorm, pnorm, qnorm, and rnorm
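R Functions

| function | description | main argument |
|---|---|---|
| dnorm | density; height of the density curve at a value (not a probability) | x, a value (quantile) |
| pnorm | distribution function; cumulative probability \(P(X \le q)\) | q, a value (quantile) |
| qnorm | quantile function; finds the value with a given cumulative probability (inverse of pnorm) | p, a probability |
| rnorm | random generation; draws simulated values | n, the number of values |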
For a given probability, the qnorm function in R finds the number such that the area under the standard normal density function to the left of that number equals the given probability.
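The inverse relationship between `pnorm()` and `qnorm()` can be checked directly. This is a minimal sketch; the cutoff 1.5 is an arbitrary illustrative value.

```r
# qnorm() undoes pnorm(): feeding a cumulative probability back in
# recovers the original cutoff (standard normal by default)
p <- pnorm(1.5)        # area to the left of 1.5
cutoff <- qnorm(p)     # value with that area to its left
print(cutoff)

# By symmetry, half the area lies to the left of 0
median_z <- qnorm(0.5)
print(median_z)
```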
A normal distribution is standard if \(\mu = 0\) and \(\sigma =1\).
A normal distribution does not have to be standard: changing \(\mu\) shifts the center, and changing \(\sigma\) changes the spread.
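# Standard normal density: mean 0, sd 1
x <- seq(-3,3,length.out=1000)
y <- dnorm(x,mean=0,sd=1)
plot(x,y,type="l")
# Wider spread: mean 0, sd 5
x <- seq(-15,15,length.out=5000)
y <- dnorm(x,mean=0,sd=5)
plot(x,y,type="l")
# Different axis of symmetry: mean 10, sd 5
x <- seq(-5,25,length.out=5000)
y <- dnorm(x,mean=10,sd=5)
plot(x,y,type="l")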
Standardized value (Z-score): the number of standard deviations a value lies from the mean.
If \(X\) follows a normal distribution
with mean \(\mu\) and standard
deviation \(\sigma\), then \(Z = \frac{X-\mu}{\sigma}\) follows the STANDARD
normal distribution. Standardizing converts any normal variable into a standard
normal variable.
A Z-score (standard score) measures how many standard deviations a raw score lies above or below the population mean: \[ Z=\frac{X-\mu}{\sigma} \]
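Because \(Z = (X-\mu)/\sigma\) is standard normal, a probability computed on the original scale should match the same probability computed on the Z scale. The sketch below checks this; the values \(\mu = 150\), \(\sigma = 25\), \(x = 185\) are illustrative assumptions.

```r
# Illustrative values: X ~ Normal(mean = 150, sd = 25), cutoff x = 185
mu <- 150
sigma <- 25
x <- 185

# Probability P(X < x) computed directly on the original scale
p_original <- pnorm(x, mean = mu, sd = sigma)

# Same probability after converting x to a Z-score
z <- (x - mu) / sigma
p_standard <- pnorm(z)

print(c(z = z, p_original = p_original, p_standard = p_standard))
```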
Suppose that \(Z\) is a standard
normal random variable. Find the value \(w\) so that \(P(-w<Z<w)=0.95\).
By symmetry, \[
\begin{align}
P(-w<Z<w)&=0.95\\
P(0<Z<w)&=0.95/2\\
&=0.475\\
P(Z<0)&=0.5
\end{align}
\] Strategy: use qnorm(), since we are given a
probability. qnorm outputs the value that has a given
cumulative probability to its left. (Think: the inverse of pnorm.)
Add up the known cumulative probabilities:
\[ \begin{align} P(Z<w)&=P(Z<0)+P(0<Z<w)\\ &=0.5+0.475\\ &=0.975 \end{align} \]
Whatever value qnorm returns for the cumulative probability 0.975 is \(w\). This \(w\) works because each tail outside \((-w,w)\) holds half of the remaining 0.05, that is, 0.025.
qnorm(0.975,mean=0,sd=1)
## [1] 1.959964
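The same reasoning works for any central probability, not only 0.95: each tail holds \((1-\text{conf})/2\), so \(w\) is the quantile at \((1+\text{conf})/2\). The helper name `central_w` below is hypothetical, a minimal sketch of that idea.

```r
# Hypothetical helper: half-width w with P(-w < Z < w) = conf, Z standard normal.
# Each tail holds (1 - conf) / 2, so the cumulative probability at w is (1 + conf) / 2.
central_w <- function(conf) {
  qnorm((1 + conf) / 2)
}

w95 <- central_w(0.95)
coverage <- pnorm(w95) - pnorm(-w95)   # should give back 0.95
print(c(w = w95, coverage = coverage))
```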
# Z-score: number of standard deviations given_value lies from population_mean
zscore <- function(given_value,population_mean,sd) {
  Z <- (given_value-population_mean)/sd
  return(Z)
}
The original value \(X\) lies \(Z\) standard deviations away from the mean
of \(X\): above the mean if \(Z>0\), below it if \(Z<0\).
The Z-score also lets us work with upper-tail probabilities: finding \(z_a\) given \(P(Z>z_a)=a\) reduces to a cumulative probability, so we can use
qnorm().
Let \(z_a\) be the number such that \(P(Z>z_a)=a\), where \(Z\) follows the standard normal
distribution.
The area to the left of \(z_a\) is then
\(1-a=P(Z<z_a)\), which has the same form as
finding \(P(X<w)\): \[
\begin{align}
P(Z>z_a)&= a\\
1-P(Z<z_a)&=a\\
P(Z<z_a)&=1-a
\end{align}
\]
Because \(P(Z<z_a)=1-a\) is a cumulative probability, \(z_a\) is
qnorm(1-a).
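A minimal sketch of the upper-tail critical value in R. The function names `z_upper` and `z_upper_alt` are hypothetical; the second form uses the `lower.tail = FALSE` argument of `qnorm()`, which asks directly for the value with area `a` to its right.

```r
# Hypothetical helper: z_a with P(Z > z_a) = a for standard normal Z
z_upper <- function(a) {
  qnorm(1 - a)
}

# Equivalent form using qnorm's lower.tail argument
z_upper_alt <- function(a) {
  qnorm(a, lower.tail = FALSE)
}

print(c(z_upper(0.025), z_upper_alt(0.025), z_upper(0.05)))
```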
For large \(n\) (and \(p\) not too close to 0 or 1), the binomial distribution looks approximately normal.
The normal distribution that approximates the binomial distribution
has the same mean and standard deviation as the binomial distribution:
\[
\begin{align}
\mu&=n\times p\\
\sigma&=\sqrt{n\times p \times (1-p)}
\end{align}
\]
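# Binomial(5, 0.7): small n, not very bell-shaped
y <- 0:5
plot(y,dbinom(y,5,.7),type = "h")
# Binomial(10000, 0.7) near its mean of 7000: bell-shaped
y <- 6900:7100
plot(y,dbinom(y,10000,0.7),type="h")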
dbinom(k, n, p) gives \(P(\text{exactly } k \text{ successes})\) in \(n\) trials with success probability \(p\), where \(k\) is the desired number of successes.
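To see the approximation numerically, the sketch below compares the exact binomial probability at \(k = 7000\) for \(n = 10000\), \(p = 0.7\) with the normal density using \(\mu = np\) and \(\sigma = \sqrt{np(1-p)}\). Evaluating at the single point \(k = 7000\) is an illustrative choice.

```r
# Normal approximation to Binomial(n = 10000, p = 0.7)
n <- 10000
p <- 0.7
mu <- n * p                      # 7000
sigma <- sqrt(n * p * (1 - p))   # about 45.8

k <- 7000
exact  <- dbinom(k, n, p)                  # P(exactly k successes)
approx <- dnorm(k, mean = mu, sd = sigma)  # normal density at k
print(c(exact = exact, approx = approx))
```

The two values should be very close because the bars in the binomial plot are one unit wide, so each bar's height is roughly the normal density at that point.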
Suppose that \(Z\) is a standard normal random variable (\(\mu=0\) and \(\sigma =1\)).
pnorm(1,mean=0,sd=1)
## [1] 0.8413447
\[ \begin{align} P(Z<1)&=0.8413447\\ P(Z<0)&=0.5\\ P(0<Z<1)&=P(Z<1)-P(Z<0)\\ &=0.8413447-0.5\\ &\approx 0.34 \end{align} \]
\[ \begin{align} P(-1<Z<1)&=2\times P(0<Z<1)\\ &= 0.34\times 2\\ &= 0.68 \end{align} \]
pnorm(2, mean=0, sd=1) # P(Z<2)
## [1] 0.9772499
pnorm(-2, mean=0, sd=1) # P(Z<-2)
## [1] 0.02275013
\[ \begin{align} P(Z<2)&=0.9772499\\ P(Z<-2)&=0.02275013\\ P(-2<Z<2)&=P(Z<2)-P(Z<-2)\\ &=0.9772499-0.02275013\\ &=0.9544998\\ &\approx 0.95 \end{align} \]
Notice: this matches the empirical rule's 95%.
pnorm(3,mean=0,sd=1) #P(Z<3)
## [1] 0.9986501
pnorm(-3,mean=0,sd=1) #P(Z<-3)
## [1] 0.001349898
\[ \begin{align} P(-3 < Z < 3)&= P(Z<3) - P(Z<-3)\\ &=0.9986501-0.001349898\\ &=0.9973002 \end{align} \]
Notice: this matches the empirical rule's 99.7% (sometimes rounded to 99%).
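The three checks above can be done in one vectorized call, since `pnorm()` accepts a vector of cutoffs. A minimal sketch:

```r
# Probability within k standard deviations of the mean, standard normal Z
k <- 1:3
within_k <- pnorm(k) - pnorm(-k)
print(round(within_k, 4))
```

The printed values line up with the 68% / 95% / 99.7% rows of the empirical rule table.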
Suppose that we observe \(W=185\text{ pounds}\), where \(W\) follows
a normal distribution, but not the standard normal. How do we interpret
this observation?
Suppose \(\mu=150\) and \(\sigma=25\). The standardized value is:
\[
\begin{align}
Z&=\frac{W-\mu}{\sigma}\\
&=\frac{185-150}{25}\\
&= 1.4
\end{align}
\]
So 185 pounds lies 1.4 standard deviations above the mean of 150 pounds.
You take the SAT and score 1100. The mean SAT score is 1026 and the standard deviation is 209. How well did you score compared with the average test taker?
zscore(1100,1026,209)
## [1] 0.354067
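A Z-score of about 0.35 means the score is roughly a third of a standard deviation above the mean. Assuming SAT scores are approximately normal (an assumption the problem does not state), `pnorm()` turns this into the proportion of test takers scoring below 1100, which should come out to roughly 0.64, around the 64th percentile.

```r
# Proportion of test takers scoring below 1100 (mean 1026, sd 209),
# assuming scores are approximately normally distributed
pct_below <- pnorm(1100, mean = 1026, sd = 209)
print(pct_below)
```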
Suppose that \(X\) is a normal random variable with mean \(\mu=200\) and standard deviation \(\sigma=40\). What is the probability that \(X\) will take a value greater than 228?
\[ P(X>228)=1-P(X<228) \]
pnorm(228,mean=200,sd=40)
## [1] 0.7580363
\[ \begin{align} P(X>228)&=1-P(X<228)\\ &=1- 0.7580363\\ &=0.2419637 \end{align} \]
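Instead of subtracting from 1, `pnorm()` can return the upper tail directly through its `lower.tail = FALSE` argument. A short sketch comparing both forms:

```r
# Upper-tail probability P(X > 228) for X ~ Normal(200, 40)
upper_direct <- pnorm(228, mean = 200, sd = 40, lower.tail = FALSE)
upper_complement <- 1 - pnorm(228, mean = 200, sd = 40)
print(c(upper_direct, upper_complement))
```

For probabilities far out in the tail, the `lower.tail = FALSE` form avoids losing precision when `pnorm()` is very close to 1.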
Suppose that an automobile muffler is designed so that its lifetime (in months) is approximately normally distributed with mean 26.4 months and standard deviation 3.8 months. The manufacturer has decided to use a marketing strategy in which the muffler is covered by warranty for 18 months. Approximately what proportion of the mufflers will fail within 18 months?
pnorm(18, mean=26.4,sd=3.8)
## [1] 0.01353433
A machine that dispenses corn-flakes into packages provides amounts that are approximately normally distributed with mean weight 20 ounces and standard deviation 0.6 ounce. Suppose that the weights and measures law under which you must operate allows you to have only 5% of your packages under the weight stated on the package. What weight should you print on the package?
\[ P(X<w)=0.05 \]
qnorm(0.05,mean=20, sd=0.6)
## [1] 19.01309
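The answer can also be read through the Z-score formula solved for \(X\): \(w = \mu + z\sigma\), where \(z\) is the standard normal value with 5% to its left. This sketch computes the printed weight both ways.

```r
# 5th percentile of package weight, computed two ways
mu <- 20
sigma <- 0.6
w_direct <- qnorm(0.05, mean = mu, sd = sigma)
w_via_z  <- mu + qnorm(0.05) * sigma   # qnorm(0.05) is about -1.645
print(c(w_direct, w_via_z))
```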
\[ \begin{align} P(Z>z_a)&=a\\ P(Z>z_{0.025})&=0.025\\ 1-P(Z<z_{0.025})&=0.025\\ P(Z<z_{0.025})&=0.975 \end{align} \]
qnorm(.975,mean=0,sd=1)
## [1] 1.959964
\[ \therefore z_{0.025}\approx 1.96 \]
\[ \begin{align} P(Z>z_{0.05})&=0.05\\ 1-P(Z<z_{0.05})&=0.05\\ P(Z<z_{0.05})&=1-0.05\\ &=0.95 \end{align} \]
qnorm(0.95,mean=0,sd=1)
## [1] 1.644854
\[ \therefore z_{0.05}\approx 1.644854 \]
A normal random variable X has mean 3.0 and standard deviation 0.2. What is the probability that X falls between 2.75 and 3.1?
\[ \begin{align} P(2.75< X <3.1)&=P(X<3.1)-P(X<2.75)\\ &=0.6914625-0.1056498\\ &=0.5858127 \end{align} \]
pnorm(3.1,mean=3,sd=.2)
## [1] 0.6914625
pnorm(2.75,mean=3,sd=.2)
## [1] 0.1056498
pnorm(3.1,mean=3,sd=.2)-pnorm(2.75,mean=3,sd=.2)
## [1] 0.5858127
Suppose that X follows normal distribution with mean 5.5 and standard deviation 0.3. Find a number w such that X < w with 30% probability.
qnorm(.3,mean=5.5,sd=0.3)
## [1] 5.34268
The quality control section of a purchasing contract for valves
specifies that the diameter must be between 2.53 and 2.57 centimeters.
Assume that the production equipment is set so that the diameter follows a
normal distribution with mean 2.56 centimeters and standard
deviation 0.01 centimeters. Over the long run, what percentage of the valves
produced will be within these specifications?
Let \(X\) be the random variable giving the
diameter of a produced valve.
\[ \begin{align} P(2.53<X<2.57)&=P(X<2.57)-P(X<2.53)\\ &=0.8413447-0.001349898\\ &=0.8399948 \end{align} \]
pnorm(2.57,mean=2.56,sd=0.01)
## [1] 0.8413447
pnorm(2.53,mean=2.56,sd=0.01)
## [1] 0.001349898
pnorm(2.57,mean=2.56,sd=0.01)-pnorm(2.53,mean=2.56,sd=0.01)
## [1] 0.8399948
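As a sanity check, `rnorm()` from the function table can simulate many diameters and count how many fall inside the specification. The seed and the sample size of 100,000 are arbitrary choices; the simulated fraction should land close to the exact 0.84, not exactly on it.

```r
# Monte Carlo check of the in-spec proportion
set.seed(1)                                   # arbitrary seed for reproducibility
diam <- rnorm(1e5, mean = 2.56, sd = 0.01)    # simulated diameters (cm)
in_spec <- mean(diam > 2.53 & diam < 2.57)    # fraction between 2.53 and 2.57 cm
print(in_spec)
```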
The chocolate chip cookies that are produced at Perry’s Cookie
Emporium have weights that are approximately normally distributed with
mean weight 180 grams and standard deviation 20 grams. The
cookies, however, are sold by count, not by weight.
Perry wants to improve his image, so he decides to set aside the lightest
20% of the cookies to be packaged and sold separately.
What cookie weight divides the lightest 20% from the heaviest
80%?
qnorm(.2,mean=180,sd=20)
## [1] 163.1676
The following game is offered in a casino. An employee flips a coin 20 times, but the player does not see the outcomes of these coin flips. After each flip, the player has to guess whether the coin turned up heads or tails. At the end the player receives k dollars, where k is the number of correct guesses, except that if she guesses all twenty coin flips correctly, she receives an additional 10,000,000 dollars (so in that case the total reward is 10,000,000 + 20 = 10,000,020 dollars).
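The number of correct guesses is binomial with \(n=20\) and \(p=0.5\), so its mean is \(20\times 0.5=10\); the bonus adds 10,000,000 times the probability of guessing all 20 correctly:
\[ \begin{align} \mu &=E[\text{correct guesses}]+10000000\times P(\text{all 20 correct})\\ &=(20\times 0.5)+(10000000\times 0.5^{20})\\ &=10+9.536743\\ &\approx 19.54 \end{align} \]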
The probability of zero correct guesses equals the probability of 20 correct guesses (both are \(0.5^{20}\)), so the expected winnings are the same, about 19.54 dollars.
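The expected value can also be computed by brute force: list every possible number of correct guesses, attach its payout and its binomial probability, and sum. This sketch assumes, as in the problem, that each guess is independently correct with probability 0.5.

```r
# Expected winnings: k dollars for k correct guesses out of 20,
# plus a 10,000,000 dollar bonus when all 20 are correct
k <- 0:20
prob_k <- dbinom(k, size = 20, prob = 0.5)
payout <- k + ifelse(k == 20, 1e7, 0)
expected <- sum(payout * prob_k)
print(expected)
```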
There is a game in a casino with the following rules. A machine generates a random number \(X\) from a normal distribution with mean \(100\) and standard deviation \(10\). If \(X\) is at least \(100\), then the player receives \(X - 100\) dollars. If \(X\) is below \(100\), then the player pays \(100 - X\) dollars. Let \(Y\) be the amount the player is going to win, with the understanding that \(Y\) is negative if the player has to pay. For example, if \(X = 109\) then \(Y = 9\), and if \(X = 95\) then \(Y = -5\).
What is the expected value of \(Y\)? (10 pts.)
Since \(Y = X - 100\) and \(X\) has expected value \(100\), \(Y\) has expected value \(E[Y]=E[X]-100=0\).
What is the standard deviation of \(Y\)?
The deviation of \(Y\) from \(0\) is exactly the same as the deviation of
\(X\) from \(100\), so the standard deviation of \(Y\) is also \(10\).
A player plans to play this game 20 times. What is the
probability that at least 5 times out of the 20 games she will win more
than 13 dollars? (10 pts.)
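There are two things to consider in this situation: the probability of winning more than 13 dollars in a single game (that is, \(X>113\)), and the number of such wins in 20 games.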
1-pnorm(113,mean=100,sd=10)
## [1] 0.09680048
At each game the probability of winning more than 13 dollars is
0.097.
The number of wins in 20 games, where each game is independently either a win or a loss with the same probability, follows a binomial distribution. Since pbinom(q, size, prob) gives \(P(W\le q)\), to get \(P(W\geq 5)\) we pass one less than the desired number of
wins, the total number of trials, and the probability
of a win, which in this case is the probability of winning more than 13 dollars.
\[ \begin{align} P(W\geq k)&=1-\text{pbinom}(k-1,n,\text{prob})\\ P(W\geq5)&=1-\text{pbinom}(5-1,\text{size}=20,\text{prob}=0.097)\\ &=0.03855293 \end{align} \]
1-pbinom(4,20,.097)
## [1] 0.03855293
Hence the probability that the player wins more than 13 dollars at least 5 times is 0.04.
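The sketch below redoes this calculation with the unrounded single-game probability \(P(X>113)\) instead of 0.097, and shows three equivalent ways to get \(P(W\geq 5)\). Using the unrounded value shifts the answer slightly in the fourth decimal, but it should still round to about 0.04.

```r
# Single-game probability of winning more than 13 dollars, X ~ Normal(100, 10)
p_win <- pnorm(113, mean = 100, sd = 10, lower.tail = FALSE)

# P(W >= 5) for W ~ Binomial(20, p_win), three equivalent ways
via_complement <- 1 - pbinom(4, size = 20, prob = p_win)
via_upper_tail <- pbinom(4, size = 20, prob = p_win, lower.tail = FALSE)
via_sum        <- sum(dbinom(5:20, size = 20, prob = p_win))
print(c(via_complement, via_upper_tail, via_sum))
```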