Reflective commentary is in red.
I was helping my son with his A-level Maths revision and he had the following question from a past paper.
A company manager is investigating the times taken, t minutes, to complete an aptitude test. The human resources manager produced the table below of coded times, x minutes, for a random sample of 30 applicants.
(you may use $\sum fy = 355$ and $\sum fy^2 = 5675$)
Linear interpolation assumes that the data are spread evenly across each class interval, which is a reasonable assumption over short intervals.
The median is midway between the 15th and 16th data points, and it lies in the $5 \le x < 10$ class. Using linear interpolation, it is 12.5/15 of the way across that class. This is 5/6 of the distance from 5 to 10, giving a median of $5 + 5 \times \frac{5}{6} \approx 9.17$ minutes.
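The interpolation step can be sketched in Python as below. This is a minimal illustration rather than part of the original working: the function name `interpolated_median` and its parameters are hypothetical, and it assumes, as the 12.5/15 fraction suggests, that 15 is the frequency of the $5 \le x < 10$ class and that 12.5 is how many data points into that class the median falls.

```python
def interpolated_median(lower: float, width: float,
                        into_class: float, class_freq: float) -> float:
    """Estimate a grouped median by linear interpolation.

    Assumes the class_freq values in the median class are spread evenly
    across [lower, lower + width), so the median sits
    into_class / class_freq of the way across the class.
    """
    return lower + width * into_class / class_freq


# Median class 5 <= x < 10. The 12.5 and 15 come from the 12.5/15 fraction
# in the text; 15 is assumed to be the frequency of that class.
median_x = interpolated_median(lower=5, width=5, into_class=12.5, class_freq=15)
print(round(median_x, 2))  # 9.17
```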
This would usually be done with the calculator's built-in statistics functions, but the question gives the values of $\sum fy$ and $\sum fy^2$, so you can instead use the alternative form of the standard deviation formula:
$$SD = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n}}$$
$$SD = \sqrt{\frac{5675 - \frac{355^2}{30}}{30}}$$
$\approx 7.01$, i.e. about 7 minutes
This is a nice round number, which is often a clue that you have done it correctly.
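As a quick cross-check of the arithmetic, the alternative formula can be evaluated directly from the given sums. This is a sketch: `sd_from_sums` is a hypothetical helper, and it assumes that y is the class midpoint, which the question implies but does not state.

```python
import math


def sd_from_sums(sum_fy: float, sum_fy2: float, n: int) -> float:
    """Standard deviation of grouped data from summary sums.

    Uses sqrt((sum f*y^2 - (sum f*y)^2 / n) / n), where y is the class
    midpoint and n is the total frequency.
    """
    return math.sqrt((sum_fy2 - sum_fy ** 2 / n) / n)


sd_x = sd_from_sums(sum_fy=355, sum_fy2=5675, n=30)
print(round(sd_x, 2))  # 7.01
```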
The company manager is told by the human resources manager that he subtracted 15 from each of the times and then divided by 2, to calculate the coded times.
This is where I got a little stuck. I had no problem with the median, which simply transforms to $2x + 15$, but I could not remember the rule we were taught about the standard deviation. So I took the brute-force approach: I tabulated the transformed data and recalculated the equivalents of $\sum fy$ and $\sum fy^2$ so that I could use the same formula as before.
The median undergoes the same transformation as the data. The median of the coded times was 9.17, so the median of the times t is $(2 \times 9.17) + 15 \approx 33.3$ minutes.
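Keeping the median as an exact fraction avoids carrying rounding error through the transformation. Writing $\tilde{x}$ and $\tilde{t}$ for the medians of the coded times and the actual times, doubling the rounded value 9.17 would give 33.34, whereas the exact value is 33.33 to 2 decimal places:

$$\tilde{x} = 5 + 5 \times \frac{12.5}{15} = 5 + \frac{25}{6} = \frac{55}{6} \approx 9.17$$
$$\tilde{t} = 2 \times \frac{55}{6} + 15 = \frac{55}{3} + 15 = \frac{100}{3} \approx 33.33$$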
The standard deviation can be found (and the median checked) by tabulating the data again for t, replacing each x with $2x + 15$ and taking z as the midpoint of each class of t.
($\sum fz = 1160$ and $\sum fz^2 = 50750$)
$$SD = \sqrt{\frac{50750 - \frac{1160^2}{30}}{30}} \approx 14 \text{ minutes}$$
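The tabulation can also be skipped, because the new sums follow algebraically from the old ones when every midpoint is transformed as $z = 2y + 15$. This shortcut is not part of the original working or the question; `transformed_sums` is a hypothetical helper name, and the sketch assumes y and z are the class midpoints for x and t.

```python
import math


def transformed_sums(sum_fy: float, sum_fy2: float, n: int,
                     a: float, b: float) -> tuple[float, float]:
    """Return (sum f*z, sum f*z^2) for midpoints z = a*y + b.

    Expands sum f(a*y + b) = a*sum f*y + b*n and
    sum f(a*y + b)^2 = a^2*sum f*y^2 + 2ab*sum f*y + b^2*n.
    """
    sum_fz = a * sum_fy + b * n
    sum_fz2 = a ** 2 * sum_fy2 + 2 * a * b * sum_fy + b ** 2 * n
    return sum_fz, sum_fz2


n = 30
sum_fz, sum_fz2 = transformed_sums(355, 5675, n, a=2, b=15)
sd_t = math.sqrt((sum_fz2 - sum_fz ** 2 / n) / n)
print(sum_fz, sum_fz2, round(sd_t, 2))  # 1160 50750 14.02
```

The recovered sums match the tabulated values of 1160 and 50750, which is a useful check that the brute-force table was built correctly.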
The answer of 14 is clearly double the previous answer of 7, so the rule I learned at school must have been that the standard deviation is multiplied by the scale factor but is unaffected by the addition. This got me thinking about why this is true, and it is fairly obvious. Anyway, the intended way of getting the answer, using the knowledge that the standard deviation is affected only by the scale factor and not by the translation, is as follows.
Decoding the times involves a scale factor of 2 and a translation of 15. Only the scale factor affects the standard deviation, so the standard deviation of the times is double that of the coded times: $2 \times 7 = 14$ minutes.
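The reason the translation drops out can be shown in a few lines. For a transformation $t = ax + b$ applied to data with frequencies $f$ and total frequency $n$, the mean is transformed in the same way, so each deviation from the mean loses the $b$ entirely and only the factor $a$ survives:

$$\bar{t} = \frac{\sum f(ax + b)}{n} = a\bar{x} + b$$
$$t - \bar{t} = (ax + b) - (a\bar{x} + b) = a(x - \bar{x})$$
$$\sigma_t^2 = \frac{\sum f(t - \bar{t})^2}{n} = \frac{\sum f\, a^2 (x - \bar{x})^2}{n} = a^2 \sigma_x^2 \quad\Rightarrow\quad \sigma_t = |a|\,\sigma_x = 2\sigma_x \approx 14$$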
By using the terms scale factor and translation, I am subconsciously showing a better understanding of the problem, and if I had thought about it more deeply I could have saved myself a lot of time on the calculation. The expression $2x + 15$ is a transformation: a doubling followed by a translation. In statistics, a data set is described by three properties: its position on the number line (usually represented by the mean or median), its width (represented by the variance or standard deviation) and its shape (not important here, although the data are not normally distributed, which is why we use the median).
When we apply the transformation, it changes the position of the data on the number line: it doubles each value and then translates it 15 units further along the line. The width, however, is NOT affected by the translation; it is changed only by the doubling. I have performed the same transformation on some normally distributed random data.
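The experiment is easy to reproduce. The sketch below is not the code used for the original plots: the sample size of 1000, the mean of 10, the standard deviation of 7 and the random seed are arbitrary choices for illustration. It uses only Python's standard library (`random.gauss`, `statistics.mean`, `statistics.median`, `statistics.pstdev`).

```python
import random
import statistics

# Hypothetical sample: 1000 normally distributed values with mean 10 and
# standard deviation 7 (chosen for illustration, not the author's data).
random.seed(1)
x = [random.gauss(10, 7) for _ in range(1000)]

# Apply the same decoding transformation t = 2x + 15 to every value.
t = [2 * value + 15 for value in x]

for name, data in (("x", x), ("t", t)):
    print(f"{name}: mean={statistics.mean(data):.2f}  "
          f"median={statistics.median(data):.2f}  "
          f"sd={statistics.pstdev(data):.2f}")

# The position statistics (mean, median) are doubled and shifted by 15,
# while the standard deviation is only doubled.
```

Whatever the seed, the mean and median of t come out as twice the corresponding value for x plus 15, while the standard deviation of t is twice that of x, up to floating-point rounding.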
This is my reflection on the reflection. In the original version, the differences between the histograms were not very clear because the axes had different scales, so I added a combined plot.
This shows a few key points: