Correlation vs Agreement

Author

JC Meyer

Validating a new method of measurement

A recent article compared a point-of-care lateral flow method for measuring insulin in the horse with a laboratory method (Berryhill et al. 2022). The purpose of this type of comparison is to determine whether the new method agrees with the older method well enough to replace it. In this case, the advantage of the new method would be rapid results for both diagnosis and monitoring of the horse.

These types of papers frequently evaluate measures such as the correlation between the results of the older and the newer method. If both continuous variables are normally distributed, Pearson’s correlation coefficient is used; if one of the variables is not normally distributed, Spearman’s rank correlation coefficient is used instead. Neither approach meets the goal of measuring agreement. Correlation measures association, not agreement, and in fact does not require the two variables being compared to be on the same scale. The explanation that follows stems from an article authored by Martin Bland and Doug Altman in 1986, https://doi.org/10.1016/S0140-6736(86)90837-8, which has been cited tens of thousands of times and, despite criticisms, is still the method of choice when the research question is method comparison (Mansournia et al. 2021).
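The distinction between association and agreement can be shown with a minimal Python sketch. The readings below are hypothetical, not data from the article, and `pearson_r` is a small helper written for this illustration. The second method always reads double the first plus 10, so the correlation is perfect even though the two methods never agree.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical readings (assumed for illustration only).
method_a = [50.0, 100.0, 150.0, 200.0, 250.0]
# Method B is a linear function of method A: perfectly associated, never equal.
method_b = [2 * a + 10 for a in method_a]

r = pearson_r(method_a, method_b)
differences = [b - a for a, b in zip(method_a, method_b)]

print("correlation:", round(r, 6))
print("differences (B - A):", differences)
```

The correlation coefficient is 1, yet the per-sample differences range from 60 to 260 units and grow with the size of the measurement.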

Scatter plot with regression line

Data from the article were extracted with a plot digitizer (Online-Plot-Digitizer).


Plotting the two variables is essential to visualize and understand the relationship. The data may be related in a monotonic but non-linear fashion rather than a linear one. Summary statistics can also be identical for very different data sets, as Anscombe’s quartet reveals:
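As a numerical companion to the quartet, the sketch below computes the mean of x, the mean of y, and Pearson's r for Anscombe's four published data sets. It is written in Python with the standard library only; `pearson_r` and the variable names are helpers assumed for this illustration.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Anscombe's quartet (Anscombe 1973); sets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Round to two decimals, the precision at which the four sets coincide.
summary = {
    name: (round(mean(x), 2), round(mean(y), 2), round(pearson_r(x, y), 2))
    for name, (x, y) in quartet.items()
}

for name, (mean_x, mean_y, r) in summary.items():
    print(f"set {name}: mean x = {mean_x}, mean y = {mean_y}, r = {r}")
```

All four sets report a mean x of 9, a mean y of 7.5 and r of 0.82, although only the first is a simple linear relationship with scatter. The numbers alone cannot distinguish them; the plots can.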

In the plot of the lateral flow vs RIA test results the relationship is linear, with an R^2 of 0.82. The plot also shows that the residuals increase in size as the value of insulin rises. However, what does correlation mean?

Correlation is the degree and direction in which two variables are related or associated. Correlation coefficients are therefore used to measure the strength of the association between two variables. The coefficient is a unitless measure ranging between -1 and 1. It would be unexpected for two methods measuring the same quantity not to be correlated; that does not, however, imply concordance in measurement.
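For reference, the standard definition of Pearson's coefficient for paired observations \((x_i, y_i)\) makes both properties visible. The symbols are the usual textbook notation, not taken from the article.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The units of x and y cancel between numerator and denominator, which is why r is unitless. Replacing every \(y_i\) with \(a + b\,y_i\) for any constant \(a\) and any \(b > 0\) leaves r unchanged, so a method that reads consistently high, or on a different scale altogether, can still correlate perfectly with the reference.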

Bland-Altman Measure of Agreement

What a clinician is interested in is the difference between the two methods' measurements in the same individual. If the methods agree perfectly there will be no difference. If one instrument consistently measures higher or lower than the other, that is termed bias, and it is measured as the mean difference. It is unlikely that both instruments measure identically, so some bias is anticipated. What concerns the clinician is the magnitude of the bias and whether it is clinically important, and also whether the variability of the differences is consistent across the range of measurement, in this case insulin.

The 95% limits of agreement are \(\text{mean difference} \pm 1.96 \times \text{standard deviation of the differences}\). These are the limits between which about 95% of the individual differences are expected to lie, provided the differences are approximately normally distributed. The mean difference on the B-A plot below is represented by the dark horizontal line at 23.36 pmol/L, and the limits of agreement are represented by the dotted red lines at -108.98 pmol/L and 155.71 pmol/L. The limits of agreement are more like a reference interval than a confidence interval. In this study there are repeated measures on some individuals; unless this was accounted for, the standard deviation of the differences is likely to be underestimated, making the limits too narrow (Bland and Altman).

The clinician will decide whether these differences in measurement are clinically acceptable, that is, whether they could affect decisions such as treatment or management of the horse. If the differences are acceptable, the tests can be used interchangeably.
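The calculation can be sketched in a few lines of Python. The paired readings are hypothetical, not the digitized data, and `limits_of_agreement` is a helper named for this illustration. The direction of the difference (new method minus reference) is also an assumption. The last lines check that the three figures reported above are mutually consistent.

```python
from statistics import mean, stdev

def limits_of_agreement(new, reference, z=1.96):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Assumes one independent pair per individual; repeated measures on the
    same individual need a method that accounts for the clustering.
    """
    differences = [n - r for n, r in zip(new, reference)]
    bias = mean(differences)
    sd = stdev(differences)  # sample standard deviation (n - 1 denominator)
    return bias, bias - z * sd, bias + z * sd

# Hypothetical paired insulin readings in pmol/L (assumed for illustration).
wrt = [110.0, 205.0, 150.0, 320.0, 95.0, 410.0]
ria = [100.0, 180.0, 160.0, 280.0, 90.0, 350.0]

bias, lower, upper = limits_of_agreement(wrt, ria)
print(f"bias = {bias:.2f} pmol/L, limits = ({lower:.2f}, {upper:.2f}) pmol/L")

# Consistency check on the reported values: the upper limit and the mean
# difference imply a standard deviation, which should reproduce the lower limit.
reported_bias, reported_lower, reported_upper = 23.36426, -108.9795, 155.708
implied_sd = (reported_upper - reported_bias) / 1.96
recomputed_lower = reported_bias - 1.96 * implied_sd
print(f"implied SD = {implied_sd:.2f} pmol/L, recomputed lower limit = {recomputed_lower:.2f}")
```

The reported limits imply a standard deviation of the differences of roughly 67.5 pmol/L, which is the number a clinician should weigh against the insulin cut-offs used for decisions.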

Mean difference: 23.36426 pmol/L
Lower limit of agreement: -108.9795 pmol/L
Upper limit of agreement: 155.708 pmol/L

The Bland-Altman plot demonstrates that the difference in measurement between RIA and the Wellness Ready Test (WRT) increases as the insulin measurement rises above approximately 250 pmol/L. Ideally the distribution of the differences should be normal. The histogram of the differences between WRT and RIA is slightly right-skewed.
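One simple way to put a number on a widening difference is to regress the differences on the pair means; a slope near zero suggests the bias is constant across the range. The Python sketch below uses hypothetical readings, and `difference_vs_mean_slope` is a helper named for this illustration, not a step taken from the article.

```python
from statistics import mean

def difference_vs_mean_slope(new, reference):
    """Least-squares slope of (new - reference) regressed on the pair means."""
    diffs = [n - r for n, r in zip(new, reference)]
    means = [(n + r) / 2 for n, r in zip(new, reference)]
    m_bar, d_bar = mean(means), mean(diffs)
    sxy = sum((m - m_bar) * (d - d_bar) for m, d in zip(means, diffs))
    sxx = sum((m - m_bar) ** 2 for m in means)
    return sxy / sxx

# Hypothetical reference readings in pmol/L (assumed for illustration).
reference = [50.0, 100.0, 200.0, 400.0, 800.0]

# Case 1: the new method reads 10% high, so the difference grows with insulin.
proportional = [1.1 * r for r in reference]
slope_proportional = difference_vs_mean_slope(proportional, reference)

# Case 2: the new method reads a constant 20 pmol/L high at every level.
constant = [r + 20.0 for r in reference]
slope_constant = difference_vs_mean_slope(constant, reference)

print(f"proportional bias slope: {slope_proportional:.4f}")
print(f"constant bias slope:     {slope_constant:.4f}")
```

A clearly non-zero slope, as in the first case, means a single pair of limits of agreement will be too wide at low values and too narrow at high values.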


Bland and Altman recommended transforming the two measures, usually with logarithms, when the differences are not normally distributed or vary with the size of the measurement. Doing so produces this plot:
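A minimal Python sketch of the logarithmic approach follows. It assumes natural logarithms and hypothetical readings; the transformation actually used for the plot is not stated here, and `log_limits_of_agreement` is a helper named for this illustration. Back-transforming with the exponential turns the bias and limits into ratios of the new method to the reference.

```python
from math import exp, log
from statistics import mean, stdev

def log_limits_of_agreement(new, reference, z=1.96):
    """Limits of agreement on the natural-log scale, back-transformed to ratios."""
    log_diffs = [log(n) - log(r) for n, r in zip(new, reference)]
    bias = mean(log_diffs)
    sd = stdev(log_diffs)  # sample standard deviation of the log differences
    return exp(bias), exp(bias - z * sd), exp(bias + z * sd)

# Hypothetical paired insulin readings in pmol/L (assumed for illustration).
reference = [50.0, 100.0, 200.0, 400.0, 800.0]
new = [55.0, 105.0, 230.0, 420.0, 920.0]

ratio, lower_ratio, upper_ratio = log_limits_of_agreement(new, reference)
print(f"geometric mean ratio = {ratio:.3f}")
print(f"95% limits of agreement (ratio) = ({lower_ratio:.3f}, {upper_ratio:.3f})")
```

On this scale the limits are read as "the new method is expected to read between lower_ratio and upper_ratio times the reference", which suits data where the disagreement grows with the measurement.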

The important points to remember:

  • correlation and agreement are not the same

  • clinicians are primarily interested in how well the two instruments agree

  • the clinical significance of the measurement difference is determined by the clinician, not by statistical testing

References

Berryhill, Emily H., Naomi S. Urbina, Sam Marton, William Vernau, and Flavio H. Alonso. 2022. “Validation and Method Comparison for a Point-of-Care Lateral Flow Assay Measuring Equine Whole Blood Insulin Concentrations.” Journal of Veterinary Diagnostic Investigation, December, 104063872211422. https://doi.org/10.1177/10406387221142288.
Mansournia, Mohammad Ali, Rachel Waters, Maryam Nazemipour, Martin Bland, and Douglas G. Altman. 2021. “Bland-Altman Methods for Comparing Methods of Measurement and Response to Criticisms.” Global Epidemiology 3 (November): 100045. https://doi.org/10.1016/j.gloepi.2020.100045.