The sample mean, \(\bar x\), of a set of data, \(x_1, x_2, \ldots, x_n\), is the sum of the data values divided by the number of observations:
\[\bar x = \frac{1}{n}(x_1 + x_2 + \cdots + x_n) = \frac{1}{n} \sum_{i=1}^{n} x_i\]
Example 1: Find the sample mean of the following set of data: 5.3, 4.1, 4.2, 6.6, 2.9
Answer: In this case, we have \(x_1=5.3\), \(x_2=4.1\), \(x_3=4.2\), \(x_4=6.6\), \(x_5=2.9\) and \(n=5\), so \[\bar x = \frac{5.3+4.1+4.2+6.6+2.9}{5}=4.62\]
In R, we can use the mean command:
> ex1.data <-c(5.3,4.1,4.2,6.6,2.9)
> mean(ex1.data)
[1] 4.62
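As a small check, the formula for \(\bar x\) can also be applied directly. The sketch below reuses the Example 1 data; the variable names x, n and xbar are chosen here for illustration only.

```r
# Example 1 data
x <- c(5.3, 4.1, 4.2, 6.6, 2.9)

n <- length(x)        # number of observations
xbar <- sum(x) / n    # sum of the data values divided by n
xbar

# Compare with the built-in mean(); all.equal() allows for rounding error
all.equal(xbar, mean(x))
```

Both approaches give the same value, so in practice mean is the more convenient choice.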
The sample median, \(m\), is the middle observation of a set of observations arranged in increasing order. If the sample size \(n\) is odd, the median is the single middle observation, located at position \(\frac{n+1}{2}\) in the ordered list. If \(n\) is even, \(\frac{n+1}{2}\) is not a whole number and falls halfway between the two middle positions, \(\frac{n}{2}\) and \(\frac{n}{2}+1\); the median is then the average of those two observations.
Example 2: Find the sample median of the following set of data: 5.3, 4.1, 4.2, 6.6, 2.9
Answer: First we must order the data: 2.9, 4.1, 4.2, 5.3, 6.6. Next we must compute \(\frac{n+1}{2}=\frac{5+1}{2}=3\), which means that the median is the third observation in the ordered list, i.e. \(m = 4.2\).
In R, we can use the median command:
> median(ex1.data)
[1] 4.2
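To make the positional rule concrete, here is a minimal sketch that computes the median by hand. The function name manual_median is hypothetical (it is not part of R), and the sketch assumes a non-empty numeric vector with no missing values.

```r
# manual_median is a hypothetical helper written for illustration;
# in practice use the built-in median().
manual_median <- function(x) {
  s <- sort(x)          # arrange the observations in increasing order
  n <- length(s)
  pos <- (n + 1) / 2    # position of the median in the ordered list
  if (n %% 2 == 1) {
    s[pos]              # odd n: the single middle observation
  } else {
    # even n: pos ends in .5, so average the observations on either side
    mean(s[c(floor(pos), ceiling(pos))])
  }
}

odd_data  <- c(5.3, 4.1, 4.2, 6.6, 2.9)   # n = 5 (Example 2 data)
even_data <- c(5.3, 4.1, 4.2, 6.6)        # n = 4 (last value dropped)

manual_median(odd_data)
manual_median(even_data)
```

For both vectors the result should agree with median, which applies the same odd/even rule.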
Often, the mean and the median provide similar values, as in the examples above. However, if one of the values in our data is extremely small or large, the mean and median can differ substantially. This is because the mean is pulled toward extreme observations, while the median depends only on the middle of the ordered data and is largely unaffected by them.
Example 3: Find the sample mean and the sample median of the following data: 8, 10, 4, 56, 2, 18
Answer: The sample mean is \(\bar x = \frac{8+10+4+56+2+18}{6} \approx 16.33\)
To find the sample median we must first order the data: 2, 4, 8, 10, 18, 56
Next, we must compute \(\frac{n+1}{2}=\frac{6+1}{2}=3.5\) which means that the median is the average of the third and fourth observations in the ordered list, i.e. \(m = \frac{8+10}{2} = 9\). In this case, the median was the average of the two middle observations since \(n\) was even.
Using R:
> ex3.data <-c(8, 10, 4, 56, 2, 18)
> mean(ex3.data)
[1] 16.33333
> median(ex3.data)
[1] 9
The sample mean is much larger than the sample median due to the one large observation in our data. For this reason, it is recommended that the median be used to describe skewed data and the mean be used to describe symmetric data.
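One way to see the effect of the large observation is to recompute both measures with it removed. The sketch below does this for the Example 3 data; the variable names ex3 and without_56 are illustrative, and dropping an observation is done here only to show its influence, not as a recommended way to handle real data.

```r
# Example 3 data
ex3 <- c(8, 10, 4, 56, 2, 18)

# The same data without the one large observation
without_56 <- ex3[ex3 != 56]

# Mean and median with the large value included
c(mean = mean(ex3), median = median(ex3))

# Mean and median with the large value removed
c(mean = mean(without_56), median = median(without_56))
```

Removing 56 drops the mean from about 16.33 to 8.4, while the median only moves from 9 to 8.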
In fact, the mean and the median can be useful in determining the shape of a distribution. If a distribution is symmetric then the mean is roughly equal to the median. If the distribution is right skewed then the mean will be larger than the median, because the large data values in the right tail inflate the mean. Conversely, if the distribution is left skewed then the mean will be smaller than the median, because the small data values in the left tail pull the mean down.
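The relationship between shape and the two measures can be illustrated with a few small data sets. The values below are made up for this sketch and are not taken from the examples above; the left-skewed set is simply the mirror image of the right-skewed one.

```r
# Made-up illustrative data sets
symmetric    <- c(1, 2, 3, 4, 5)
right_skewed <- c(1, 2, 2, 3, 3, 3, 4, 10, 20)   # long right tail
left_skewed  <- 21 - right_skewed                 # mirror image: long left tail

# Mean minus median: about 0 if symmetric, positive if right skewed,
# negative if left skewed
gap <- function(x) mean(x) - median(x)

gap(symmetric)
gap(right_skewed)
gap(left_skewed)
```

The sign of the difference matches the direction of the skew in these data sets. This is a rule of thumb rather than a guarantee, so a histogram is still worth checking.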
Example 4: For the following descriptions, would you recommend using the mean or the median as a measure of center?