Making Measurements

Author

Andrew Dalby

Introduction

Previously I wrote a page where I weighed chocolate bars to show that the stated weights on packets are rarely the actual weights, and that the chocolate bars more often than not weigh more than it says on the packet. This is because there is an asymmetry in the consequences of not meeting customer expectations. A customer will complain if they are sold short and might even take legal action, but they are never going to complain about getting too much.

But what else could have happened when we made the measurements? How else might we have introduced errors in the measurements that could explain the difference between what we measure and reality?

This is going to be very much a thought experiment, and I am sure I will miss some potential causes of error, but I am going to try hard to pinpoint what I think are the most important possibilities.

Imagine that at the factory they have a chocolate bar standard. This is the idealised chocolate bar against which all others have to be measured. We have standardised weights and measures on which all our systems of measurement are based. These have recently been updated as we have become aware of effects that cause the standards themselves to drift and fluctuate; the mass of the standard kilogram, for example, was found to change over time, and the standards now often have more complex definitions than the ones originally used. If we make measurements of such a standard then we can test our measuring devices for accuracy and precision.

The Measuring Device

Have I chosen the right measuring device?

In the chocolate bar weighing example I first used the wrong scales. I used kitchen scales that only measured to the nearest 0.5g, which was not suitable for measuring something that is stated to have a particular weight to a precision of 0.1g.

Different quantities have different measurement issues. For lengths you can use a ruler or a tape measure which has a precision of 1mm. When we measure heights in the UK we often give them to the nearest inch - a precision of 2.5cm! But if you are measuring long distances you might use a laser measure, or if you are measuring small distances a caliper.

A general rule of thumb is that the more expensive your caliper or laser measure, the better its precision will usually be. This is because making and calibrating the measuring tool becomes more complex and costly.

Which is worse: an error of 500 metres in the diameter of the Moon, or an error of 0.05mm in the width of a tooled piston head on a lathe? The precision we need is defined by the use we are going to make of the measurement.

Imagine I am looking at the variation in weights of an elephant and a dormouse to see which weight is measured more accurately. If I work on an absolute scale then you would say it is the dormouse, but this is not a fair comparison because they are measured to different levels of precision on different scales. What you need is a dimensionless measure, something that does not depend on the scale or the units. This is the coefficient of variation.

Coefficient of Variation

This is the standard deviation of the data divided by the mean. Because the standard deviation and the mean are in the same units, the result is dimensionless.

DATA WILL ALWAYS HAVE SOME DISPERSION (VARIATION). NO TWO MEASUREMENTS WILL BE COMPLETELY IDENTICAL EVEN IF THEY ARE OF THE SAME THING.

All data will have a standard deviation: even if we measure the same standard multiple times using the same measuring device there will be differences. I used 'we' to indicate that more than one person might make the measurements, which itself introduces error. Between measurements the apparatus might have drifted from its calibrated values (this happens in electrochemistry, for example), there might be contamination adding material, there might be evaporation losing material, there can be chemical reactions …
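As a minimal sketch (the readings here are made up purely for illustration), the coefficient of variation can be calculated directly from a set of repeated measurements of the same standard:

# Five hypothetical repeated weighings (in grams) of the same 10g standard
readings <- c(10.01, 9.99, 10.02, 10.00, 9.98)
# Coefficient of variation = standard deviation divided by the mean
sd(readings) / mean(readings)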

Looking at our dormouse and elephant: I weigh the dormouse on a scale and the mean of three measurements is 34.6g with a standard deviation of 0.2g. The measuring device reads to the first decimal place and so has a precision of 0.1g. Repeating the measurements with a living, breathing creature, and given the difficulties, we managed a standard deviation of 0.2g, which is not bad all things considered. For the elephant we need an entirely different weighing platform that only measures to the nearest 1kg. The precision is 10,000 times worse than for the dormouse. In this case the elephant weighs 3320kg with a standard deviation of 2kg, which is again twice the precision of the weighing instrument.

Now to calculate the coefficient of variation.

library(tidyverse)
library(kableExtra)
# Means and standard deviations of the repeated weighings, all in grams
data <- tibble(Species = c("Dormouse", "Elephant"),
               Mean = c(34.6, 3320000),
               sd = c(0.2, 2000),
               cv = sd / Mean)  # coefficient of variation is dimensionless
data %>%
  kbl() %>%
  kable_styling()
Species     Mean (g)   sd (g)   cv
Dormouse        34.6      0.2   0.0057803
Elephant     3320000     2000   0.0006024

The coefficient of variation shows that the errors from the measuring device are proportionately much LESS in the case of the elephant than in the case of the dormouse, even though in absolute terms they are much larger.

Whether to express differences as a subtraction or as a proportion is a very common issue in statistics, and it can be very difficult to choose which one gives the better understanding. In this case, looking at the effects of measurement error, I think the proportion makes more sense, but it will depend on the question you are asking. If you think about this, then at some point, for a large enough volume, a measuring cylinder is going to be a more accurate measure of volume than a micro-pipette.

In general, when working with very small quantities the measurement errors become proportionately larger, and the costs of measuring accurately increase.
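As a rough sketch of why this happens (the 0.01g device resolution here is an assumed value, matching the scales described later, not a measurement), a fixed absolute error becomes a much larger proportional error as the quantity shrinks:

library(tidyverse)
# A fixed absolute error of 0.01g applied to quantities of decreasing size
tibble(quantity_g = c(100, 10, 1, 0.1, 0.02),
       abs_error_g = 0.01) %>%
  mutate(relative_error_pct = abs_error_g / quantity_g * 100)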

Measuring from a Population

If I make measurements of a sample from a population then the sources of error increase dramatically, as each individual is different. For the chocolate bars from the factory there will be batch effects, factory location effects, even daily effects depending on air temperature, humidity etc. You should feel a certain amount of respect for industrial design that tries to keep these variations to a minimum. Imagine how dangerous life would be if the nuts and bolts that hold cars together varied by more than the tiniest fraction.

Going back to the dormice and elephants: if I weigh a sample of female dormice in the summer months (they hibernate, so there is a seasonal weight fluctuation; there is also a gender effect, with males weighing slightly more; and I also need to discount the age effect), the mean is 34g and the standard deviation is 2.5g (this is made-up data but a reasonable estimate). Then I do the same for female elephants, where the mean weight is 3200kg and the standard deviation is 400kg. Clearly a variation of 400kg is much more than 2.5g, but is it as a proportion?

library(tidyverse)
library(kableExtra)
# Population means and standard deviations, all in grams
data <- tibble(Species = c("Dormouse", "Elephant"),
               Mean = c(34, 3200000),
               sd = c(2.5, 400000),
               cv = sd / Mean)
data %>%
  kbl() %>%
  kable_styling()
Species     Mean (g)   sd (g)   cv
Dormouse          34      2.5   0.0735294
Elephant     3200000   400000   0.1250000

In this case, unlike the measuring device case, it is: the elephants' weights vary more as a proportion as well as in absolute terms. But you cannot always be sure that this will be true. Biology gives perhaps the best examples of just how large the variation can be when trying to use samples to infer the properties of a population.
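A small simulation sketch can show what this sampling variation looks like. The numbers below reuse the made-up dormouse figures from above and assume, purely for illustration, that the weights are roughly normally distributed:

set.seed(1)
# Simulated population of female dormouse weights (g): mean 34, sd 2.5
population <- rnorm(10000, mean = 34, sd = 2.5)
# Means of 1000 repeated samples of 10 individuals: each sample gives a
# slightly different estimate of the population mean
sample_means <- replicate(1000, mean(sample(population, 10)))
sd(sample_means)  # the spread of those estimates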

Measuring Devices

I picked a single measuring device to carry out my experiment. What happens if I have multiple devices available? Then I have a population of measuring devices. Will they all give the same measure?

In an ideal world they would all give the same result but there will be variation in the age of the parts, how much wear they have, the quality of their manufacture and how accurately they measure. Again these errors should be smaller than the population variation but they will be larger than the variation between measures of the same thing on the same device.

One thing we can do to reduce the error of a device, or between devices, is to calibrate it to make sure that it is giving accurate readings. I mentioned standards earlier, and we can use a set of standard weights, volumes and lengths to calibrate our devices.

When I did the chocolate bar weighing experiment I actually did this, as I had multiple sets of scales and also multiple sets of calibration weights. The errors increase significantly as the scales are used close to their level of precision. The scales measure to the nearest 0.01g (their supposed precision), and as you use calibration weights of 0.02g or 0.05g the errors increase. When you use a calibration weight of 20g or 10g there is less variation. I have tabulated the data from the calibrations below.

library(tidyverse)
library(kableExtra)
# Calibration weights (Standard) and the scale readings (Measure), in grams
data <- tibble(Standard = c(50, 20, 20, 10, 5, 2, 2, 1, 0.5, 0.20, 0.10, 0.05, 0.02),
               Measure = c(50.14, 20, 20, 10, 5, 2.01, 2.01, 0.99, 0.48, 0.20, 0.09, 0.04, 0.00),
               Difference = Standard - Measure,
               Percentage_Difference = Difference / Standard * 100)
data %>%
  kbl() %>%
  kable_styling()
Standard (g)   Measure (g)   Difference (g)   Percentage_Difference (%)
       50.00         50.14            -0.14                       -0.28
       20.00         20.00             0.00                        0.00
       20.00         20.00             0.00                        0.00
       10.00         10.00             0.00                        0.00
        5.00          5.00             0.00                        0.00
        2.00          2.01            -0.01                       -0.50
        2.00          2.01            -0.01                       -0.50
        1.00          0.99             0.01                        1.00
        0.50          0.48             0.02                        4.00
        0.20          0.20             0.00                        0.00
        0.10          0.09             0.01                       10.00
        0.05          0.04             0.01                       20.00
        0.02          0.00             0.02                      100.00
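As a quick visual check (this plot is a sketch, not part of the original analysis), plotting the percentage difference against the standard weight on a log scale shows the errors growing as the calibration weights approach the precision of the scales:

library(tidyverse)
# Calibration errors against the size of the standard weight
data %>%
  ggplot(aes(x = Standard, y = Percentage_Difference)) +
  geom_point() +
  scale_x_log10() +
  labs(x = "Standard weight (g)", y = "Difference from standard (%)")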

Again, this reflects the increased difficulty and cost of making small calibration weights: the better your calibration set, the more expensive it will be. We could also repeat the calibration with further sets of calibration weights, which introduces another set of potential errors and gives us a population of calibration measures.

Conclusion

  1. If you do not understand the possible errors in the measurements of the data that you are using then you are going to make mistakes in analysing that data.

  2. As we move to smaller scales the apparatus gets more expensive and the errors get proportionately larger. Biology is now in the age of microfluidics, where samples can be in the nanomole or picomole range and the focus is on single cells. This introduces a larger degree of error and more noise.