The practice problems below will ask you to calculate Z scores. To do so, you’ll need to calculate means and sample standard deviations. If you need a refresher on this material, keep reading. If you feel comfortable calculating those quantities, you can go directly to the questions.

A quick review of calculating means

To calculate a mean, we just add up all the values of X and divide the total by the number of data points. This is expressed in the following equation: \[ \bar{X}=\frac{\sum X}{N} \] \(\sum\) is the mathematical symbol for summation. It means "add up each value of X", and \(N\) is the number of data points.

So, if we have the following values of X: \[ 1, 6, 8, 7, 10 \] We first just add them all together:

1 + 6 + 8 + 7 + 10
## [1] 32

Then we divide this sum by \(N = 5\):

(1 + 6 + 8 + 7 + 10) / 5
## [1] 6.4

And we get a mean of 6.4.
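The same calculation can also be written with R's built-in `sum()`, `length()`, and `mean()` functions. This is only a small sketch; the variable names (`x`, `total`, `n`, `mean_by_hand`) are made up for illustration.

```r
# The five example values of X from the text
x <- c(1, 6, 8, 7, 10)

total <- sum(x)            # add up every value of X: 32
n <- length(x)             # number of data points, N: 5
mean_by_hand <- total / n  # the formula: sum of X divided by N

mean_by_hand               # 6.4
mean(x)                    # the built-in function gives the same value
```

Writing out the sum and the division separately mirrors the equation; in everyday work, calling `mean(x)` directly is the usual shortcut.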

A quick review of calculating the standard deviation

Remember that the standard deviation tells us about how spread out the data points are. A larger standard deviation means greater variability than a smaller standard deviation. The formula for the sample standard deviation is:

\[ SD =\sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}} \] That’s a bit hairy looking. But first, notice a couple of similarities between the formula for the SD and the formula for the mean. Ignore the square root and the squared term for now. Both of these formulas take the sum of a set of numbers and divide it by N (or N-1, which is really close to N). This is because the standard deviation tells us roughly the average distance from the mean. So, at its heart, this equation is just calculating the difference between each value of X and the mean, \(\bar{X}\), adding those differences up, and dividing the total by (roughly) the number of data points.

To understand this better, let’s break it up into a few parts. First, let’s look at the numerator (ignoring the square root): \[ \sum(X - \bar{X})^2 \] The part inside the parentheses tells us that we first subtract the mean, \(\bar{X}\), from each value of X. This tells us how far each data point is from the mean. In our example above, this means doing the following:

\[ \begin{aligned} 1 - 6.4 &= -5.4 \\ 6 - 6.4 &= -0.4 \\ 8 - 6.4 &= 1.6 \\ 7 - 6.4 &= 0.6 \\ 10 - 6.4 &= 3.6 \end{aligned} \]

If we added up all those difference scores, we would get 0. This is because the mean is the balance point of the data: the distances of the points above the mean add up to exactly the same total as the distances of the points below it. This is true for any set of numbers, not just normally distributed data. So if we subtract the mean from each data point, the positive and negative values will cancel each other out.
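If you'd like to check the cancelling-out for yourself, here is a short sketch in R using the same five values. The variable names are just for illustration, and because computers store decimals approximately, the sum may come out as a tiny number very close to 0 rather than exactly 0.

```r
# The five example values of X and their deviations from the mean
x <- c(1, 6, 8, 7, 10)
deviations <- x - mean(x)

deviations                    # -5.4 -0.4 1.6 0.6 3.6

# The positive and negative deviations cancel out
sum(deviations)               # 0, up to floating-point rounding
abs(sum(deviations)) < 1e-12  # TRUE

# The cancelling does not need equal numbers of points on each side
sum(deviations > 0)           # 3 points above the mean
sum(deviations < 0)           # 2 points below the mean
```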

However, if we want to know the average deviation from the mean, the signs of the deviations don’t matter. A deviation of -5 is as far from the mean as a deviation of 5. So, we can get rid of the signs by squaring each of the deviations, hence the \((X - \bar{X})^{2}\), which makes every value positive (or zero, for a data point that sits exactly on the mean).
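For our five example values, the squaring step might look like this in R. This is a sketch with illustrative variable names, not the only way to do it.

```r
# Deviations from the mean for the five example values
x <- c(1, 6, 8, 7, 10)
deviations <- x - mean(x)

# Squaring removes the signs
squared_deviations <- deviations^2
squared_deviations            # 29.16 0.16 2.56 0.36 12.96

# The numerator of the SD formula: the sum of squared deviations
sum_sq <- sum(squared_deviations)
sum_sq                        # 45.2
```

Notice that the squared deviations no longer cancel out when we add them up, which is exactly what we wanted.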

Remember that the denominator is N-1, not N. The reason for this is a bit more technical than I want to be here, but it’s because we used a sample estimate of the mean. If we knew the population mean, we could divide by N. For the moment, it’s good enough to notice that N-1 is almost the same as N (especially when N is large), so dividing by N-1 is telling us roughly the average squared deviation from the mean.

The average squared deviation from the mean isn’t easy to interpret, so we take the square root of the value to get back to the scale of the data, and we get (roughly) the average deviation from the mean.
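Putting all the pieces of the formula together for our five example values, a minimal sketch in R could look like the following. The variable names are illustrative; `sd()` is R's built-in function for the sample standard deviation, which also divides by N-1.

```r
# The five example values of X
x <- c(1, 6, 8, 7, 10)
n <- length(x)

sum_sq <- sum((x - mean(x))^2)   # sum of squared deviations: 45.2
mean_sq_dev <- sum_sq / (n - 1)  # divide by N-1: 11.3
sd_by_hand <- sqrt(mean_sq_dev)  # square root to get back to the scale of X

sd_by_hand                       # about 3.36
sd(x)                            # the built-in function gives the same value
```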

How to calculate the SD in practice

Usually the best way to calculate the SD is to make a table. Let’s say that I have 7 data points.

##   X
## 1 4
## 2 7
## 3 8
## 4 5
## 5 6
## 6 7
## 7 8

We can calculate the mean (approximately 6.43), and make a second column telling us how far each data point is from the mean.

##   X      X_dev
## 1 4 -2.4285714
## 2 7  0.5714286
## 3 8  1.5714286
## 4 5 -1.4285714
## 5 6 -0.4285714
## 6 7  0.5714286
## 7 8  1.5714286

We can then make a new column that squares each of those values.

##   X      X_dev  X_dev_sq
## 1 4 -2.4285714 5.8979592
## 2 7  0.5714286 0.3265306
## 3 8  1.5714286 2.4693878
## 4 5 -1.4285714 2.0408163
## 5 6 -0.4285714 0.1836735
## 6 7  0.5714286 0.3265306
## 7 8  1.5714286 2.4693878

Now we can just sum up all of the values of X_dev_sq, divide that quantity by N-1, and take the square root.
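As a sketch of how the whole table approach could be carried out in R, the code below builds the same three columns and then finishes the calculation. The column names match the tables above; the other names (`tab`, `sum_sq`, `n`, `sd_by_hand`) are just chosen for illustration.

```r
# Start with the 7 data points
tab <- data.frame(X = c(4, 7, 8, 5, 6, 7, 8))

# Second column: how far each data point is from the mean
tab$X_dev <- tab$X - mean(tab$X)

# Third column: the square of each deviation
tab$X_dev_sq <- tab$X_dev^2

tab

# Sum the squared deviations, divide by N-1, and take the square root
sum_sq <- sum(tab$X_dev_sq)           # about 13.71
n <- nrow(tab)                        # 7
sd_by_hand <- sqrt(sum_sq / (n - 1))

sd_by_hand                            # about 1.51
sd(tab$X)                             # the built-in function agrees
```

Building the table one column at a time makes it easy to check each step by hand before trusting the final number.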