Continuous Random Variables - The Normal Distribution

A continuous random variable can take any numerical value in some interval, which may stretch all the way from \(-\infty\) to \(+\infty\). Every continuous random variable has a probability density function that describes how probability is distributed among its possible values. This function is different from the probability distribution function of a discrete random variable; note the difference between the words “distribution” and “density.” The value (height) of a probability density function is not a probability, and it can even be greater than \(1\). In the case of a continuous random variable, the probability of an interval is given by the area under the density function above the interval. The probability of any specific value of a continuous random variable is always \(0\): for these random variables probability is an area, and the region above a single point is a line segment, which has no area.
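The claims in the paragraph above can be checked numerically. This is a small sketch using only base R functions (dnorm(), pnorm(), integrate()). The interval endpoints and the narrow curve with sd = 0.1 are arbitrary choices made only for illustration.

```r
# 1. A density height is not a probability: a narrow normal curve
#    (sd = 0.1) has a peak height well above 1.
peak_height <- dnorm(0, mean = 0, sd = 0.1)
peak_height                         # about 3.99

# 2. The probability of an interval is the area under the density.
#    Numerical integration of dnorm() agrees with a difference of pnorm() values.
area_numeric <- integrate(dnorm, lower = -1, upper = 1)$value
area_exact   <- pnorm(1) - pnorm(-1)
c(area_numeric, area_exact)         # both about 0.683

# 3. Shrinking an interval around 0.5 drives its probability toward 0,
#    which is why the probability of the single value 0.5 is 0.
widths <- c(1, 0.1, 0.01, 0.001)
probs  <- pnorm(0.5 + widths / 2) - pnorm(0.5 - widths / 2)
probs
```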

The most important type of continuous random variable has a normal probability density function. This density has a single central peak, so intervals near the peak have high probability and intervals (of the same width) far away from the peak have low probability. Two parameters characterize a normal random variable: the mean and the standard deviation. In the most important case, the standard normal distribution, the mean is \(0\) and the standard deviation is \(1\). The graph below shows a standard normal distribution and indicates the probability of a value being less than \(1.5\). Strictly speaking, it shows the probability of the interval from \(-4\) to \(+1.5\). However, the amount of probability (area) to the left of \(-4\) is essentially zero.
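To put numbers on the peak-versus-tail behavior, this sketch compares two intervals of the same width (0.5) for a standard normal variable. The two intervals are chosen only for illustration.

```r
# Two intervals of equal width (0.5) for a standard normal variable
near_peak <- pnorm(0.5) - pnorm(0)   # P(0 < Z < 0.5), right next to the peak
far_tail  <- pnorm(3.5) - pnorm(3)   # P(3 < Z < 3.5), far out in the right tail
c(near_peak = near_peak, far_tail = far_tail)   # about 0.19 vs about 0.0011

# Ratio of the two probabilities: the interval near the peak is far more likely
near_peak / far_tail
```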

This is a function I created to plot normal probabilities: it shades the area under a normal density curve between two bounds and reports that area. It involves several tricks that I don’t expect you to learn. However, learning to use it is easy, and I do expect you to do that.

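# MyNormProb() produces a plot and the probability that a normal
# random variable with the given mean and standard deviation falls
# between the given lower and upper bounds.
MyNormProb <- function(
  lb = NA,                       # Lower bound (default: mean - 4 * sd)
  ub = NA,                       # Upper bound (default: mean + 4 * sd)
  mean = 0,                      # Mean
  sd = 1,                        # Standard deviation
  MyLabel = "Standard Normal RV" # Description of the variable
  )
{

# A missing bound is replaced by a point 4 standard deviations from
# the mean, where the remaining tail area is essentially zero.
if (is.na(lb)) {lb <- mean - 4 * sd}
if (is.na(ub)) {ub <- mean + 4 * sd}

# Grid of 1000 x values spanning mean +/- 4 sd, and the density at each
x <- seq(-4, 4, length.out = 1000) * sd + mean
hx <- dnorm(x, mean, sd)

# Empty plot frame, then the density curve
plot(x, hx, type = "n", xlab = MyLabel, ylab = "Density", axes = FALSE)
lines(x, hx)

# Shade the region under the curve between lb and ub
i <- x >= lb & x <= ub
polygon(c(lb, x[i], ub), c(0, hx[i], 0), col = "red")

# Probability of the interval = difference of cumulative probabilities
area <- pnorm(ub, mean, sd) - pnorm(lb, mean, sd)
result <- paste0("P(", lb, " < ", MyLabel, " < ", ub, ") = ",
                 signif(area, digits = 3))
mtext(result, 3)                 # write the result in the top margin

abline(0, 0)                     # horizontal baseline at density 0
# Vertical line from the baseline up to the curve at the mean
segments(x0 = mean, y0 = 0, x1 = mean, y1 = dnorm(mean, mean, sd))
return(area)
}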

Test it. Find the probability that a standard N(0,1) random variable is to the left of z = 1.5.

v <- MyNormProb(ub=1.5,MyLabel="N(0,1)")
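Because only ub is supplied in the call above, MyNormProb() fills in lb = mean - 4 * sd = -4, so v is P(-4 < Z < 1.5) rather than P(Z < 1.5). This self-contained sketch recomputes both quantities with pnorm() directly (without drawing the plot) to show how little the cutoff at -4 matters.

```r
# What MyNormProb(ub = 1.5) returns: the area from -4 to 1.5
v_truncated <- pnorm(1.5) - pnorm(-4)
# The exact P(Z < 1.5), with no lower cutoff
v_full      <- pnorm(1.5)
# The difference is the ignored left tail, pnorm(-4)
gap         <- v_full - v_truncated
c(truncated = v_truncated, full = v_full, gap = gap)

# Rounded to 3 significant digits, as in the plot title, they agree
signif(c(v_truncated, v_full), 3)   # 0.933 0.933
```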

Next I want to show you how the formula for Zstar works. I used it in the code snippet for finding a confidence interval for the mean of a population. Recall that the formula is

Zstar = qnorm(1-.5*(1-CL))

This formula tells you, for example, that if you want a 95% confidence interval, the appropriate value of Zstar is 1.96. To see this, use MyNormProb() to find the probability that a standard normal random variable falls between -1.96 and +1.96.

v = MyNormProb(lb=-1.96,ub=1.96)
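Why does the formula have that form? Here is a short derivation, writing \(z^*\) for Zstar, \(Z\) for a standard normal random variable, and \(\Phi\) for the standard normal cumulative distribution function (the function pnorm() computes; qnorm() computes its inverse \(\Phi^{-1}\)). The central area between \(-z^*\) and \(z^*\) must equal CL, and by symmetry the leftover probability \(1 - CL\) splits equally between the two tails:

\[
\begin{aligned}
P(-z^* < Z < z^*) &= CL \\
P(Z < -z^*) = P(Z > z^*) &= \tfrac{1}{2}(1 - CL) \\
P(Z < z^*) &= 1 - \tfrac{1}{2}(1 - CL) \\
z^* &= \Phi^{-1}\!\left(1 - \tfrac{1}{2}(1 - CL)\right)
\end{aligned}
\]

For \(CL = 0.95\), this gives \(1 - \tfrac{1}{2}(0.05) = 0.975\), so \(z^* = \Phi^{-1}(0.975)\), which is qnorm(.975).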

CL = .95
Zstar = qnorm(1-.5*(1-CL))
Zstar
## [1] 1.959964
qnorm(.975)
## [1] 1.959964
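The same formula works for any confidence level. This sketch computes Zstar for 90%, 95%, and 99% (just common choices) and checks that the area between -Zstar and +Zstar equals CL. The names CL_values and Zstar_values are new, so the CL and Zstar computed above are left unchanged.

```r
CL_values    <- c(0.90, 0.95, 0.99)
Zstar_values <- qnorm(1 - .5 * (1 - CL_values))
round(Zstar_values, 3)              # 1.645 1.960 2.576

# Check: the central area between -Zstar and +Zstar equals CL
central <- pnorm(Zstar_values) - pnorm(-Zstar_values)
all.equal(central, CL_values)       # TRUE

# The rounded value 1.96 used with MyNormProb() gives an area just over 0.95
pnorm(1.96) - pnorm(-1.96)
```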

For a video walkthrough of this, right-click on https://www.youtube.com/watch?v=36BMKszwoMM and select open in new tab.