The Normal Distribution

In this activity we will introduce the normal distribution. There are three goals in this activity:

  1. Sketch the probability density function using R’s dnorm command.

  2. Find a specific area under the probability density function using R’s pnorm command.

  3. Given the area under the probability density function, find a specific percentile (quantile) using R’s qnorm command.

Sketching the Normal Density Function

Our first sketch is of the Standard Normal Distribution. We first create a sequence of 200 numbers, beginning at \(x=-3\) and ending at \(x=3\) (we’ll see why we make this choice in a moment).

x=seq(-3,3,length=200)

Next, we use R’s dnorm command to compute the \(y\)-values of the standard normal probability density function (mean \(\mu=0\), standard deviation \(\sigma=1\)).

y=dnorm(x,mean=0,sd=1)
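To see what dnorm actually computes, here is a minimal sketch that writes out the normal density formula \(f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-(x-\mu)^2/(2\sigma^2)}\) by hand and compares it with dnorm. The function name normal_density is a hypothetical helper introduced only for this illustration.

```r
# Hypothetical helper: the normal density formula written out by hand
normal_density = function(x, mean = 0, sd = 1) {
  exp(-(x - mean)^2 / (2 * sd^2)) / (sd * sqrt(2 * pi))
}

x = seq(-3, 3, length = 200)
# Largest absolute difference between the hand-written formula and dnorm
# (should be at the level of rounding error)
max(abs(normal_density(x) - dnorm(x, mean = 0, sd = 1)))
```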

Now we can plot the result.

plot(x,y)

Note that this method plots only the 200 individual points (like a scatterplot). To connect the points with line segments instead, use the argument type="l", which tells R to draw lines (the character after type= is a lowercase L).

plot(x,y,type="l")

Because it is shaped like a bell, many people call this curve a “Bell Curve.” Note that the normal curve is symmetric about the mean \(\mu=0\), which is its balance point. Moreover, because we entered a standard deviation of \(\sigma=1\), the curve is extremely close to the horizontal axis by the time it is three standard deviations from the mean on either side. (It never actually touches the axis, but almost all of the area lies between \(-3\) and \(3\).) That is why we picked our domain values to vary from \(x=-3\) to \(x=3\).
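A quick numerical check of the symmetry of the curve, and of how close it comes to the horizontal axis at three standard deviations from the mean (very close, but never exactly zero):

```r
x = seq(-3, 3, length = 200)
# Symmetry: the density at -x equals the density at x
all.equal(dnorm(-x), dnorm(x))   # TRUE
# At x = 3 the curve is still positive, but only about 1.1% of its peak height
dnorm(3) > 0                     # TRUE
dnorm(3) / dnorm(0)
```

The ratio equals \(e^{-4.5}\approx 0.011\), since the constant in front of the exponential cancels.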

Let’s draw another normal curve, but this time let’s use a mean of \(\mu=50\) and a standard deviation of \(\sigma=10\).

x=seq(20,80,length=200)
y=dnorm(x,mean=50,sd=10)
plot(x,y,type="l")

Again, note that the curve is “Bell Shaped,” symmetric, and balanced at the mean \(\mu=50\). Because the standard deviation is \(\sigma=10\), the curve is again extremely close to the horizontal axis once it is three standard deviations from the mean \(\mu=50\) on either side. That is why we chose our domain values to vary from \(x=20\) to \(x=80\): those are precisely the numbers you get when you subtract three standard deviations (\(3\sigma=30\)) from the mean \(\mu=50\) and when you add them to it.
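A normal density with mean \(\mu\) and standard deviation \(\sigma\) is just the standard normal density shifted and rescaled: \(f(x)=\frac{1}{\sigma}\varphi\left(\frac{x-\mu}{\sigma}\right)\), where \(\varphi\) is the standard normal density. This short sketch checks that identity with dnorm and recomputes the domain endpoints:

```r
x = seq(20, 80, length = 200)
# Density with mean 50 and sd 10, computed directly
y1 = dnorm(x, mean = 50, sd = 10)
# Same density from the standard normal: standardize x, then divide by sigma
y2 = dnorm((x - 50) / 10) / 10
all.equal(y1, y2)    # TRUE
# Domain endpoints: the mean minus and plus three standard deviations
50 + c(-3, 3) * 10   # 20 80
```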

Find the Area Under the Normal Density Curve

The total area under any normal density curve is always equal to one. We will now learn how to use R’s pnorm command to find areas of a variety of regions under a normal density curve. First, let’s shade the area to the left of the mean under the standard normal density curve (\(\mu=0\) and \(\sigma=1\)).

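# Draw the standard normal curve
x=seq(-3,3,length=200)
y=dnorm(x,mean=0,sd=1)
plot(x,y,type="l")
# Shade under the curve from -3 to 0: the polygon starts at (-3,0),
# follows the curve, and returns to the axis at (0,0)
x=seq(-3,0,length=100)
y=dnorm(x,mean=0,sd=1)
polygon(c(-3,x,0),c(0,y,0),col="red")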

Note: You will not be required to learn how to shade regions like this in R, although the code is here if you’d like to give it a try. You will usually be drawing these regions by hand, then using R’s pnorm command to find the area of the shaded region.
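If you would like to experiment with shading, the repeated plot/polygon pattern can be wrapped in a function. The name shade_normal and its arguments are hypothetical, introduced only for this sketch: it draws the curve over the mean plus or minus three standard deviations, shades between finite limits lower and upper, and returns the shaded area computed with pnorm.

```r
# Hypothetical helper: draw a normal curve, shade [lower, upper], return the area
shade_normal = function(lower, upper, mean = 0, sd = 1, col = "red") {
  # Full curve over mean +/- 3 standard deviations
  x = seq(mean - 3 * sd, mean + 3 * sd, length = 200)
  plot(x, dnorm(x, mean = mean, sd = sd), type = "l", ylab = "y")
  # Region to shade, closed off along the horizontal axis
  xs = seq(lower, upper, length = 100)
  ys = dnorm(xs, mean = mean, sd = sd)
  polygon(c(lower, xs, upper), c(0, ys, 0), col = col)
  # Shaded area: area left of upper minus area left of lower
  invisible(pnorm(upper, mean = mean, sd = sd) - pnorm(lower, mean = mean, sd = sd))
}

area = shade_normal(-1, 1)   # shades the region between -1 and 1
area                         # 0.6826895
```

Returning the area invisibly means the function draws the picture without printing anything unless you ask for the result.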

Now, because the total area under the curve is 1, and because of the symmetry, the area to the left of \(\mu=0\) should be 0.5. To verify this with R, enter the following command.

pnorm(0,mean=0,sd=1)
## [1] 0.5
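Because pnorm returns an area, its answer can be checked against numerical integration of dnorm. This sketch uses base R's integrate function; the results agree with pnorm up to the integration tolerance.

```r
# Total area under the standard normal curve
integrate(dnorm, lower = -Inf, upper = Inf)$value   # approximately 1
# Area to the left of 0, computed two ways
integrate(dnorm, lower = -Inf, upper = 0)$value     # approximately 0.5
pnorm(0, mean = 0, sd = 1)                          # 0.5
```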

Next, let’s see if we can find the area to the left of \(1\). First, we’ll draw an image.

x=seq(-3,3,length=200)
y=dnorm(x,mean=0,sd=1)
plot(x,y,type="l")
x=seq(-3,1,length=100)
y=dnorm(x,mean=0,sd=1)
polygon(c(-3,x,1),c(0,y,0),col="red")

The area is definitely larger than 0.5. Again, let’s use R’s pnorm command to find the area of the shaded region to the left of 1.

pnorm(1,mean=0,sd=1)
## [1] 0.8413447

It is very important to understand that the command pnorm(x,mean,sd), with its default settings, always gives the area TO THE LEFT of x. Now, suppose we’d like to find the area to the right of 2. We begin again with an image, one that you will usually draw by hand on your homework.

x=seq(-3,3,length=200)
y=dnorm(x,mean=0,sd=1)
plot(x,y,type="l")
x=seq(2,3,length=100)
y=dnorm(x,mean=0,sd=1)
polygon(c(2,x,3),c(0,y,0),col="red")

Note that the area to the right of 2 seems quite small.

The command pnorm(2,mean=0,sd=1) will give the area to the left of 2. However, we want the area to the right of 2. Because the total area under the curve is 1, the area to the right of 2 is found by subtracting the area to the left of 2 from the total area 1.

1-pnorm(2,mean=0,sd=1)
## [1] 0.02275013
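pnorm also has a lower.tail argument. Setting lower.tail = FALSE returns the area to the right of x directly, which matches the subtraction above:

```r
# Area to the right of 2, computed two ways (both about 0.02275013)
1 - pnorm(2, mean = 0, sd = 1)
pnorm(2, mean = 0, sd = 1, lower.tail = FALSE)
```

Subtracting from 1 is perfectly fine here; lower.tail = FALSE becomes more accurate only far out in the tail, where 1 - pnorm(x) can round to 0.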

Now, suppose we would like to find the area between \(-1\) and \(1\). Let’s start with a picture that shows the shaded region of interest.

x=seq(-3,3,length=200)
y=dnorm(x,mean=0,sd=1)
plot(x,y,type="l")
x=seq(-1,1,length=100)
y=dnorm(x,mean=0,sd=1)
polygon(c(-1,x,1),c(0,y,0),col="red")

We can find the area of the shaded region between \(-1\) and \(1\) by subtracting the area to the left of \(-1\) from the area to the left of \(1\).

pnorm(1,mean=0,sd=1)-pnorm(-1,mean=0,sd=1)
## [1] 0.6826895
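By symmetry, the area to the left of \(-1\) equals the area to the right of \(1\), so the area between \(-1\) and \(1\) can also be written as 1 minus twice the area to the left of \(-1\):

```r
# Area between -1 and 1, using the symmetry of the standard normal curve
1 - 2 * pnorm(-1, mean = 0, sd = 1)   # 0.6826895
# Symmetry check: area left of -1 minus area right of 1 (essentially 0)
pnorm(-1) - (1 - pnorm(1))
```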

Since the total area under the curve is 1, 0.6827 seems like a reasonable answer: the shaded region covers most, but clearly not all, of the area.

We can use R’s pnorm command to find areas of regions under any normal density curve, regardless of its mean and standard deviation. For example, suppose we want the area within two standard deviations of the mean of the normal distribution with mean \(\mu=50\) and standard deviation \(\sigma=10\). We start again with an image of the shaded region, which lies within two standard deviations (\(2\sigma=20\)) of the mean \(\mu=50\), i.e., between 30 and 70.

x=seq(20,80,length=200)
y=dnorm(x,mean=50,sd=10)
plot(x,y,type="l")
x=seq(30,70,length=100)
y=dnorm(x,mean=50,sd=10)
polygon(c(30,x,70),c(0,y,0),col="red")

Again, remember that pnorm(x,mean,sd) always produces the area to the left of x. So to get the area of the shaded region between 30 and 70, we subtract the area to the left of 30 from the area to the left of 70.

pnorm(70,mean=50,sd=10)-pnorm(30,mean=50,sd=10)
## [1] 0.9544997
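Another way to see this: standardizing with \(z=(x-\mu)/\sigma\) turns 30 and 70 into \(-2\) and \(2\), so the same area can be found under the standard normal curve.

```r
mu = 50
sigma = 10
# Standardize the endpoints: z = (x - mu) / sigma
z = (c(30, 70) - mu) / sigma
z                           # -2  2
# Area between the z-scores under the standard normal curve (0.9544997)
pnorm(z[2]) - pnorm(z[1])
```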

Again, since the total area under the curve is 1, 0.9545 seems like a reasonable answer: the shaded region covers nearly all of the area.
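The areas within one and two standard deviations of the mean (about 0.6827 and 0.9545) are part of the pattern often called the 68-95-99.7 (empirical) rule. Because pnorm accepts a vector, all three areas can be computed at once, and the same values appear for any mean and standard deviation:

```r
k = 1:3   # number of standard deviations from the mean
# Standard normal: area within k standard deviations of 0
pnorm(k) - pnorm(-k)   # 0.6826895 0.9544997 0.9973002
# Mean 50, sd 10: area within k standard deviations of 50 (same values)
pnorm(50 + k * 10, mean = 50, sd = 10) - pnorm(50 - k * 10, mean = 50, sd = 10)
```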

Finding the Quantile (Percentile)

Next we show how to reverse the process. Above, we were given a number \(x\) on the horizontal axis and asked to find the area under the curve to the left of \(x\). Now we are given the area under the curve and asked to find the number \(x\) on the horizontal axis that has that area to its left. That is, we are given the area, and we find the value of \(x\).

Let’s start with an image.

x=seq(-3,3,length=200)
y=dnorm(x,mean=0,sd=1)
plot(x,y,type="l")
x=seq(-3,-0.2533,length=100)
y=dnorm(x,mean=0,sd=1)
polygon(c(-3,x,-0.2533),c(0,y,0),col="red")
text(-1,0.1,"0.40")

We are given that the area of the shaded region is 0.40, and we are asked to find the value of \(x\) such that the area to the left of \(x\) is 0.40. To do this, we use R’s qnorm command.

qnorm(0.40,mean=0,sd=1)
## [1] -0.2533471

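The number \(-0.2533\) seems quite reasonable: because the area 0.40 is less than 0.5, the value of \(x\) must lie to the left of the mean \(\mu=0\), so it should be negative.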
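One way to check a qnorm answer is to feed it back into pnorm, which should return the original area, since qnorm undoes pnorm:

```r
x40 = qnorm(0.40, mean = 0, sd = 1)
x40                            # -0.2533471
pnorm(x40, mean = 0, sd = 1)   # 0.4
```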

x=seq(-3,3,length=200)
y=dnorm(x,mean=0,sd=1)
plot(x,y,type="l")
x=seq(-3,-0.2533,length=100)
y=dnorm(x,mean=0,sd=1)
polygon(c(-3,x,-0.2533),c(0,y,0),col="red")
text(-1,0.1,"0.40")
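# Arrow from the label position (0.5,0.1) down to the axis near x=-0.2533;
# length=.15 sets the size of the arrowhead (in inches)
arrows(0.5,0.1,-0.2,0,length=.15)
# Label the arrow with the value found by qnorm
text(0.5,0.12,"-0.2533")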

Next, suppose we are given a region shaded to the right and asked to find the value of \(x\) such that the area under the normal density curve to the right of \(x\) equals the given area. We start with an image.

x=seq(-3,3,length=200)
y=dnorm(x,mean=0,sd=1)
plot(x,y,type="l")
x=seq(0.5244,3,length=100)
y=dnorm(x,mean=0,sd=1)
polygon(c(0.5244,x,3),c(0,y,0),col="red")
text(1,0.1,"0.30")

Because R’s pnorm and qnorm commands (with their default settings) always work with the area of the region SHADED TO THE LEFT, we must make an adjustment. The area to the right of the \(x\) value of interest is 0.30, so the area to its left is \(1-0.30=0.70\). This is the number we must insert into R’s qnorm command.

qnorm(0.70,mean=0,sd=1)
## [1] 0.5244005

Thus, the area to the right of 0.5244 is 0.30 (and the area to the left of 0.5244 is 0.70).
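Like pnorm, qnorm has a lower.tail argument. With lower.tail = FALSE it treats the given area as the area to the right, which skips the subtraction from 1:

```r
# Value with area 0.30 to its right, computed two ways (both about 0.5244005)
qnorm(0.70, mean = 0, sd = 1)
qnorm(0.30, mean = 0, sd = 1, lower.tail = FALSE)
```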

x=seq(-3,3,length=200)
y=dnorm(x,mean=0,sd=1)
plot(x,y,type="l")
x=seq(0.5244,3,length=100)
y=dnorm(x,mean=0,sd=1)
polygon(c(0.5244,x,3),c(0,y,0),col="red")
text(1,0.1,"0.30")
arrows(-0.5,0.1,0.45,0,length=.15)
text(-0.5,0.12,"0.5244")

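On close examination of the image, the \(x\)-value 0.5244 is quite reasonable: because the area to its right (0.30) is less than 0.5, the value must lie to the right of the mean \(\mu=0\).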

An Application

A professor’s exam scores are normally distributed with a mean of 60 points and a standard deviation of 15 points. The professor promises his students that the top 20% of the scores will receive a grade of A. What is the minimum score you must achieve to receive an A?

To answer the question, let’s begin by drawing a normal distribution with a mean \(\mu=60\) and a standard deviation \(\sigma=15\).

x=seq(15,105,length=200)
y=dnorm(x,mean=60,sd=15)
plot(x,y,type="l",xaxt="n")
axis(1,at=c(15,30,45,60,75,90,105))

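This image uses some extra plotting code, for which you are not responsible, although the code is here if you would like to experiment: xaxt="n" suppresses R’s default horizontal axis, and axis(1,at=...) draws a new one with tick marks at the mean and at one, two, and three standard deviations on either side. Again, you will be drawing these images by hand on your homework.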
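The tick positions 15, 30, ..., 105 are the mean plus or minus 0, 1, 2, and 3 standard deviations. As a small sketch, they can be computed from \(\mu\) and \(\sigma\) instead of typed in, which makes the plotting code easier to reuse (mu, sigma, and ticks are just illustrative variable names):

```r
mu = 60
sigma = 15
# Tick marks at the mean and at 1, 2, and 3 standard deviations on either side
ticks = mu + (-3:3) * sigma
ticks   # 15 30 45 60 75 90 105
# Reuse the ticks to set the plotting range and draw the custom axis
x = seq(min(ticks), max(ticks), length = 200)
plot(x, dnorm(x, mean = mu, sd = sigma), type = "l", xaxt = "n", ylab = "y")
axis(1, at = ticks)
```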

Next, let’s shade the top 20% of the area under the normal density curve.

x=seq(15,105,length=200)
y=dnorm(x,mean=60,sd=15)
plot(x,y,type="l",xaxt="n")
axis(1,at=c(15,30,45,60,75,90,105))
x=seq(72.62,105,length=100)
y=dnorm(x,mean=60,sd=15)
polygon(c(72.62,x,105),c(0,y,0),col="red")
text(80,0.005,"0.20")

Now, the area to the right of the minimum score required for an A in the class is 0.20, so the area to the left of the score is 0.80. Remember, the qnorm command demands the area TO THE LEFT. Thus:

qnorm(0.80,mean=60,sd=15)
## [1] 72.62432

So the minimum score required for an A is approximately 72.62.
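To check the answer, the area to the right of the cutoff should be 0.20. Equivalently, the cutoff is \(\mu+z\sigma\), where \(z\) is the standard normal value with area 0.80 to its left:

```r
cutoff = qnorm(0.80, mean = 60, sd = 15)
# Area to the right of the cutoff should be 0.20
1 - pnorm(cutoff, mean = 60, sd = 15)   # 0.2
# Same cutoff from the standard normal: mean + z * sd
60 + qnorm(0.80) * 15                   # 72.62432
```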

x=seq(15,105,length=200)
y=dnorm(x,mean=60,sd=15)
plot(x,y,type="l",xaxt="n")
axis(1,at=c(15,30,45,60,75,90,105))
x=seq(72.62,105,length=100)
y=dnorm(x,mean=60,sd=15)
polygon(c(72.62,x,105),c(0,y,0),col="red")
text(80,0.005,"0.20")
arrows(60,0.004,71.5,0,length=.15)
text(60,0.0055,"72.62")

Again, the image confirms that this answer is quite reasonable.