z-scores

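A z-score tells us how many standard deviations an observation \(x\) lies from the mean, where \(\bar x\) is the sample mean and \(s\) is the sample standard deviation. A negative z-score means the observation is below the mean and a positive z-score means it is above the mean: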
\[z=\frac{x - \bar x}{s}\]

Example 1: Grades on an exam for a large class had a mean of 75 with a standard deviation of 10. Find the z scores for students who received the following grades: 70, 75 and 90.

Grade | z score | Interpretation
70 | \(\frac{70-75}{10}=-0.5\) | This student’s grade is one half of a standard deviation below the mean.
75 | \(\frac{75-75}{10}=0\) | This student’s grade is the same as the mean.
90 | \(\frac{90-75}{10}=1.5\) | This student’s grade is one and a half standard deviations above the mean.
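The calculation in Example 1 can be sketched in a few lines of Python. The function name `z_score` and the variable names are illustrative choices, not part of the original notes.

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / sd

# Exam grades from Example 1: mean 75, standard deviation 10.
grades = [70, 75, 90]
z_scores = [z_score(grade, 75, 10) for grade in grades]

for grade, z in zip(grades, z_scores):
    print(f"grade {grade}: z = {z}")
```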

Sometimes, we want to compare two measurements from different scales. We can use z scores to make this comparison.

Example 2: Dan plays golf on two different golf courses. Genesee Valley Golf Course has a mean score of 72.1 with a standard deviation of 3.3. Durand Eastman Golf Course has a mean score of 70.5 with a standard deviation of 1.5. If Dan shot 68 at both courses, which one was more impressive?

To answer this question, we will find the z score for Dan on each golf course.

Genesee Valley | Durand Eastman
\(z=\frac{68-72.1}{3.3}\approx -1.24\) | \(z=\frac{68-70.5}{1.5}\approx -1.67\)

Dan scored 1.24 standard deviations below the mean at Genesee Valley Golf Course and 1.67 standard deviations below the mean at Durand Eastman Golf Course. Since lower scores are better in golf, the score that lies further below its course mean is the better one, so his score at Durand Eastman is more impressive.
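As a rough check of Example 2, the comparison can be written in Python. The dictionary layout and the names `courses`, `z_by_course` and `most_impressive` are assumptions made for this sketch.

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / sd

# (mean score, standard deviation) for each course, taken from Example 2.
courses = {
    "Genesee Valley": (72.1, 3.3),
    "Durand Eastman": (70.5, 1.5),
}
dan_score = 68

z_by_course = {
    name: z_score(dan_score, mean, sd) for name, (mean, sd) in courses.items()
}

# Lower scores are better in golf, so the smallest z score is the most impressive.
most_impressive = min(z_by_course, key=z_by_course.get)

for name, z in z_by_course.items():
    print(f"{name}: z = {z:.2f}")
print("Most impressive round:", most_impressive)
```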

Chebyshev’s Theorem

For any dataset, the percent of observations that lie within the interval \(\bar x \pm ks\) is at least \(100 \cdot (1-\frac{1}{k^2})\%\) where \(k>1\), \(\bar x\) is the sample mean and \(s\) is the sample standard deviation.

Example 3: A random sample of data has a mean of 60 and a variance of 25. Without knowing anything else about the sample, what can be said about the percentage of observations that lie between 47.5 and 72.5?

Since the variance is 25, the standard deviation is \(s=\sqrt{25}=5\). Our first step is to determine the z scores for 47.5 and 72.5 so we can determine what \(k\) is.

\[z=\frac{47.5-60}{5}=-2.5 \text{ and } z=\frac{72.5-60}{5}=+2.5\]

Our observations are 2.5 standard deviations above and below the mean, so \(k=2.5\). You can check that \(\bar x \pm ks = 60 \pm 2.5 \cdot 5\) gives us the desired endpoints.

According to Chebyshev’s Rule, we can expect at least \(100 \cdot (1 - \frac{1}{k^2})\% = 100 \cdot (1- \frac{1}{2.5^2})\% = 84\%\) of the observations in this data to lie between 47.5 and 72.5.

Furthermore, we can say that at most \(16\%\) of the observations in the data lie outside this interval, that is, below 47.5 or above 72.5.
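The steps of Example 3 can be sketched in Python as follows. The helper name `chebyshev_lower_bound` is hypothetical and chosen only for this illustration.

```python
import math

def chebyshev_lower_bound(k):
    """Minimum fraction of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's bound requires k > 1")
    return 1 - 1 / k**2

# Values from Example 3.
mean = 60
variance = 25
sd = math.sqrt(variance)           # standard deviation is the square root of the variance
low, high = 47.5, 72.5

# Both endpoints must be the same distance from the mean for the bound to apply directly.
k = (high - mean) / sd
assert math.isclose(k, (mean - low) / sd)

inside = chebyshev_lower_bound(k)  # at least this fraction lies inside the interval
outside = 1 - inside               # at most this fraction lies outside it

print(f"k = {k}")
print(f"at least {inside:.0%} inside, at most {outside:.0%} outside")
```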


Example 4: An auditor finds that the values of a corporation’s accounts receivable have a mean of 295 and a standard deviation of 63. It can be guaranteed that at least 60% of these values will be in what interval?

First solve for \(k\): \[100 \cdot \left(1 - \frac{1}{k^2}\right) = 60 \Rightarrow \frac{1}{k^2} = 0.4 \Rightarrow k^2 = 2.5 \Rightarrow k = \sqrt{2.5} \approx 1.58\]

Next, apply Chebyshev’s Rule with \(k=1.58\): \[\bar x \pm 1.58s = 295 \pm 1.58\cdot 63 = 295 \pm 99.54 = 195.46 \text{ to } 394.54\]

At least 60% of the corporation’s accounts receivable are guaranteed to have values between $195.46 and $394.54.
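Example 4 runs Chebyshev’s Rule in reverse: start from the percentage and solve for \(k\). A minimal Python sketch, with the hypothetical helper name `chebyshev_k`, is below. It keeps \(k\) unrounded, so its endpoints differ by a few cents from those obtained with \(k\) rounded to 1.58.

```python
import math

def chebyshev_k(fraction):
    """Solve 1 - 1/k**2 = fraction for k, where 0 < fraction < 1."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be strictly between 0 and 1")
    return math.sqrt(1 / (1 - fraction))

# Values from Example 4.
mean = 295
sd = 63

k = chebyshev_k(0.60)
low = mean - k * sd
high = mean + k * sd

print(f"k = {k:.4f}")
print(f"interval: {low:.2f} to {high:.2f}")
```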

The Empirical Rule

Figure 1






If a data set is approximately normally distributed, then

  • about 68% of the data lies between \(\bar x - s\) and \(\bar x + s\) (i.e. \(z = \pm 1\))
  • about 95% of the data lies between \(\bar x - 2s\) and \(\bar x + 2s\) (i.e. \(z = \pm 2\))
  • about 99.7% of the data lies between \(\bar x - 3s\) and \(\bar x + 3s\) (i.e. \(z = \pm 3\))

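The main difference between the Empirical Rule and Chebyshev’s Rule is whether we know the shape of the data. Chebyshev’s Rule holds for any data set, while the Empirical Rule applies only to data that is approximately normally distributed. Because a normal distribution is symmetric about its mean, we can make more statements than just the 68%, 95% and 99.7% statements.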

Figure 2




For example, if we know that 68% of the data lie between \(\bar x - s\) and \(\bar x + s\) and we know the data is symmetric around \(\bar x\) then it follows that we must have half of that (i.e. 34% of the data) between \(\bar x - s\) and \(\bar x\) and 34% of the data between \(\bar x\) and \(\bar x + s\).

Furthermore, we must have 95% - 68% = 27% of the data combined between \(\bar x - 2s\) and \(\bar x - s\) and between \(\bar x + s\) and \(\bar x + 2s\). Using symmetry again, we can divide 27% by 2 to get 13.5% between \(\bar x - 2s\) and \(\bar x - s\) and 13.5% between \(\bar x + s\) and \(\bar x + 2s\).

See if you can derive the remaining values of 2.35% and 0.15% in Figure 2.
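One way to check your answer is to let Python repeat the symmetry argument for every band. This is only a sketch; the names `within`, `bands` and `tail` are illustrative.

```python
# Empirical Rule percentages within 1, 2 and 3 standard deviations of the mean.
within = {1: 68.0, 2: 95.0, 3: 99.7}

# The data between k-1 and k standard deviations from the mean is split
# evenly between the left side and the right side of the mean.
bands = {}
previous = 0.0
for k in (1, 2, 3):
    bands[k] = (within[k] - previous) / 2
    previous = within[k]

# Whatever lies beyond 3 standard deviations is also split evenly between the two tails.
tail = (100.0 - within[3]) / 2

for k, percent in bands.items():
    print(f"between {k - 1} and {k} standard deviations on one side: {percent:.2f}%")
print(f"beyond 3 standard deviations on one side: {tail:.2f}%")
```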

Example 5: A random sample of data is approximately normally distributed with mean 100 and standard deviation 6. What can be said about the percentage of observations that lie

  • between 88 and 112?
  • above 112?
  • below 94?
  • between 88 and 94?

  • Approximately 95% of the data lies between 88 and 112 since the z score for 88 is \(z=\frac{88-100}{6}=-2\) and the z score for 112 is \(z=\frac{112-100}{6}=2\).

  • Approximately 2.5% of the data lies above 112. From the previous part, we know that 95% of the data lies between 88 and 112, so 5% must lie outside that interval. By symmetry, we can divide 5% in half to get 2.5% below 88 and 2.5% above 112. Alternatively, Figure 2 shows that 2.35% + 0.15% = 2.5% of the data lies above \(\bar x + 2s\).

  • Approximately 16% of the data lies below 94. The z score for 94 is -1. Adding up the values below z = -1 in Figure 2, we get 13.5% + 2.35% + 0.15% = 16%. Alternatively, draw a figure marking the bottom 16% and the top 16%. This leaves 68% in the middle, so the two cutoffs must be one standard deviation below and above the mean.

  • Approximately 13.5% of the data lies between 88 and 94. The z scores for 88 and 94 are -2 and -1, respectively. Looking at Figure 2, we can see that 13.5% of the data lies between those two values.
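The Empirical Rule values are rounded approximations. As a rough comparison, the sketch below computes the corresponding areas under an exact normal curve with `NormalDist` from Python’s standard `statistics` module, assuming the data in Example 5 followed a normal distribution with mean 100 and standard deviation 6 exactly. The variable names are illustrative.

```python
from statistics import NormalDist

# Idealized model for Example 5: exactly normal with mean 100 and standard deviation 6.
dist = NormalDist(mu=100, sigma=6)

between_88_112 = dist.cdf(112) - dist.cdf(88)  # Empirical Rule: about 95%
above_112 = 1 - dist.cdf(112)                  # Empirical Rule: about 2.5%
below_94 = dist.cdf(94)                        # Empirical Rule: about 16%
between_88_94 = dist.cdf(94) - dist.cdf(88)    # Empirical Rule: about 13.5%

for label, value in [
    ("between 88 and 112", between_88_112),
    ("above 112", above_112),
    ("below 94", below_94),
    ("between 88 and 94", between_88_94),
]:
    print(f"{label}: {value:.2%}")
```

The exact areas differ slightly from the rule-of-thumb percentages, which is why the answers above say "approximately".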


Example 6: The entrance exam for law school is the LSAT. Previous data on the LSAT show that the distribution of scores is approximately normal with a mean of 150 and a standard deviation of 10.

  • Margo is very competitive. She is planning on taking the LSAT and her goal is to score in the top 2.5% of all students taking the exam. How high does Margo need to score on the LSAT to achieve her goal?

  • Sheila is going to take the LSAT too, but she doesn’t want to go to law school, so she wants to be in the bottom 0.15% of all students taking the exam. How low does Sheila need to score on the LSAT to achieve her goal?

  • Margo must score at least a 150 + 2(10) = 170 on the LSAT. To see this, draw a figure marking the top and bottom 2.5% which leaves 95% in the middle so the values marked must be two standard deviations away from the mean.

  • Sheila must score at most a 150 - 3(10) = 120 on the LSAT. To see this, draw a figure marking the top and bottom 0.15% which leaves 99.7% in the middle so the values marked must be three standard deviations away from the mean.
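The cutoffs in Example 6 can be compared with an exact normal model. The sketch below uses `NormalDist.inv_cdf` from Python’s standard `statistics` module and assumes LSAT scores are exactly normal with mean 150 and standard deviation 10, which is an idealization. The variable names are illustrative.

```python
from statistics import NormalDist

# Idealized model for Example 6: exactly normal with mean 150 and standard deviation 10.
lsat = NormalDist(mu=150, sigma=10)

# Top 2.5% means 97.5% of scores fall below the cutoff.
margo_cutoff = lsat.inv_cdf(0.975)

# Bottom 0.15% means 0.15% of scores fall below the cutoff.
sheila_cutoff = lsat.inv_cdf(0.0015)

print(f"Margo needs at least about {margo_cutoff:.1f}")
print(f"Sheila needs at most about {sheila_cutoff:.1f}")
```

Both cutoffs land within one point of the Empirical Rule answers of 170 and 120.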