Other Types of Studies

(lewis?) categorize method comparison studies into three different types. The key difference between the first two is whether or not a gold standard method is used.

In situations where one instrument or method is known to be ‘accurate and precise’, it is considered the ‘gold standard’. A method that is not considered to be a gold standard is referred to as an ‘approximate method’. In calibration studies they are referred to as criterion methods and test methods respectively.

1. Calibration problems.

The purpose is to establish a relationship between methods, one of which is an approximate method, the other a gold standard. The results of the approximate method can be mapped to a known probability distribution of the results of the gold standard (lewis1991?).

In such studies, the gold standard method and the corresponding approximate method are generally referred to as the criterion method and the test method respectively. Altman and Bland (1983) make clear that their framework is not intended for calibration problems.

2. Comparison problems.

Here two approximate methods that use the same units of measurement are to be compared. This is the case for which the Bland-Altman methodology is specifically intended, and it is therefore the most relevant of the three.

3. Conversion problems.

Here two approximate methods that use different units of measurement are to be compared. This situation arises when the measurement methods use ‘different proxies’, i.e. different mechanisms of measurement. (lewis?) deals specifically with this issue. In the context of this study, it is the least relevant of the three.
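For the calibration problem (the first type above), the idea of mapping test-method results onto criterion-method results can be illustrated with a minimal sketch. It assumes a simple straight-line relationship fitted by ordinary least squares, which is only one possible form of calibration; the function names `fit_calibration` and `calibrate` and the paired readings are hypothetical and are not taken from the cited sources.

```python
from statistics import mean

def fit_calibration(test, criterion):
    """Return (slope, intercept) of the least-squares line criterion ~ test."""
    x_bar, y_bar = mean(test), mean(criterion)
    sxx = sum((x - x_bar) ** 2 for x in test)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(test, criterion))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def calibrate(reading, slope, intercept):
    """Map a single test-method reading onto the criterion-method scale."""
    return slope * reading + intercept

# Hypothetical paired readings (test method, criterion method) on the same items.
test_readings = [10.0, 20.0, 30.0, 40.0]
criterion_readings = [12.0, 22.5, 31.5, 42.0]

slope, intercept = fit_calibration(test_readings, criterion_readings)
print(calibrate(25.0, slope, intercept))
```

A straight line is used only because it is the simplest mapping; a real calibration study would also need to characterise the uncertainty of the fitted relationship.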


Gold Standards

Dunn (2002) cautions that gold standards should not be assumed to be error free: ‘It is of necessity a subjective decision when we come to decide that a particular method or instrument can be treated as if it was a gold standard’. The clinician's gold standard, the sphygmomanometer, is used as an example.

The sphygmomanometer leaves considerable room for improvement (Dunn 2002). Pizzi (1999) similarly addresses the issue of gold standards, noting that a well-established gold standard may itself be imprecise or even unreliable.

The NIST F1 caesium fountain atomic clock is considered to be the gold standard for measuring time, and is the primary time and frequency standard for the United States. The NIST F1 is accurate to within one second per 60 million years. Even in such an extreme case, some degree of inaccuracy must be assumed of a gold standard system.
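To put the quoted figure in perspective, a drift of one second per 60 million years can be converted to a fractional error. The sketch below assumes a Julian year of 365.25 days; the function name `fractional_error` is hypothetical.

```python
# Seconds in a Julian year (365.25 days); the choice of year length is an assumption.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

def fractional_error(drift_seconds, years):
    """Drift expressed as a fraction of the elapsed time."""
    return drift_seconds / (years * SECONDS_PER_YEAR)

# One second over 60 million years: roughly 5e-16, small but not zero.
print(fractional_error(1.0, 60e6))
```

The result is tiny, yet it is still a non-zero measurement error, which is the point being made about gold standards.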

Measurements of the interior of the human body are, by definition, invasive medical procedures. The design of a measurement method must balance the need for accuracy against the well-being of the patient. This inevitably leads to measurement error of the kind described by Dunn (2002).

The magnetic resonance angiogram, used to measure internal anatomy, is considered to be the gold standard for detecting aortic dissection. Medical tests based upon the angiogram are reported to have a false positive rate of 5% and a false negative rate of 8% (i.e. a specificity of 95% and a sensitivity of 92%) (“Acute Chest Pain (Suspected Aortic Dissection) - American College of Radiology Expert Group Report” 2008).
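The relationship between the error rates and sensitivity and specificity can be sketched as follows, assuming the usual definitions: sensitivity is one minus the false negative rate, and specificity is one minus the false positive rate. The function name `sensitivity_specificity` is hypothetical; the inputs are the rates quoted above.

```python
def sensitivity_specificity(false_positive_rate, false_negative_rate):
    """Convert error rates (as proportions) to (sensitivity, specificity)."""
    sensitivity = 1.0 - false_negative_rate  # proportion of true cases detected
    specificity = 1.0 - false_positive_rate  # proportion of non-cases correctly cleared
    return sensitivity, specificity

# Rates quoted above: 5% false positives, 8% false negatives.
sens, spec = sensitivity_specificity(0.05, 0.08)
print(round(sens, 2), round(spec, 2))
```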

In the literature, such gold standards are perhaps more accurately referred to as ‘fuzzy gold standards’ (Phelps and Hutson 1994). Consequently, when one of the methods is essentially a fuzzy gold standard, as opposed to a ‘true’ gold standard, the comparison of the criterion and test methods should be considered in the context of a comparison study as well as of a calibration study.

Dunn (2002) discusses the importance of gold standards in the context of method comparison studies.

According to Bland and Altman, one should use the approach previously outlined even when one of the methods is a gold standard.
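As a reminder of what the Bland-Altman approach computes, the following minimal sketch gives the usual formulation: the mean of the paired differences (the bias) and limits of agreement at the bias plus or minus 1.96 standard deviations of the differences. The function name `limits_of_agreement` and the paired readings are hypothetical.

```python
from statistics import mean, stdev

def limits_of_agreement(method_a, method_b, z=1.96):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)   # mean difference between the two methods
    sd = stdev(diffs)    # sample standard deviation of the differences
    return bias, bias - z * sd, bias + z * sd

# Hypothetical paired readings from two methods on the same subjects.
method_a = [120.0, 135.0, 110.0, 142.0, 128.0]
method_b = [118.0, 137.0, 113.0, 140.0, 125.0]

bias, lower, upper = limits_of_agreement(method_a, method_b)
print(bias, lower, upper)
```

Nothing in the calculation treats either method as error free, which is why the same procedure can be applied when one of the methods is regarded as a gold standard.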


Fuzzy Gold Standards

The gold standard is considered to be the most accurate measurement of a particular parameter. But even gold standard raters must be assumed to have some level of measurement error. Fuzzy gold standards are considered by Phelps and Hutson (1994).

Dunn (2002) makes two important points in relation to these categories. Firstly, he remarks that there are no clear-cut differences between the categories. Secondly, he comments on the clinician's gold standard, the sphygmomanometer. Pizzi (1999) also attends to this issue.

Repeatability and Gold Standards

Currently the phrase ‘gold standard’ describes the most accurate method of measurement available. No other criteria are set out. According to Dunn (2002), gold standards have varying levels of repeatability. Dunn cites the example of the sphygmomanometer (i.e. a blood pressure measurement cuff), which is prone to measurement error. Consequently, a measurement method can be the ‘gold standard’ and yet have poor repeatability.
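One common way of quantifying repeatability is the repeatability coefficient, 1.96 × √2 × the within-subject standard deviation. The sketch below assumes exactly two replicate measurements per subject by the same method, in which case the within-subject variance can be estimated as the sum of squared differences divided by 2n. The function name `repeatability_coefficient` and the data are hypothetical, and this definition is not taken from the text above.

```python
from math import sqrt

def repeatability_coefficient(replicate_pairs):
    """Repeatability coefficient from pairs of replicate measurements."""
    n = len(replicate_pairs)
    # Within-subject variance estimated from duplicates: sum(d^2) / (2n).
    within_var = sum((first - second) ** 2 for first, second in replicate_pairs) / (2 * n)
    return 1.96 * sqrt(2) * sqrt(within_var)

# Hypothetical duplicate readings by one method on three subjects.
pairs = [(10.0, 12.0), (20.0, 20.0), (30.0, 28.0)]
print(repeatability_coefficient(pairs))
```

A large coefficient relative to the range of the measurements would indicate poor repeatability, even for a method regarded as the most accurate available.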

Bronze Standard

Dunn (2002) recognizes this problem. Hence, if the most accurate method is considered to have poor repeatability, it is referred to as a ‘bronze standard’. Again, no formal definition of a bronze standard exists.

References

“Acute Chest Pain (Suspected Aortic Dissection) - American College of Radiology Expert Group Report.” 2008.
Altman, DG, and JM Bland. 1983. “Measurement in Medicine: The Analysis of Method Comparison Studies.” Journal of the Royal Statistical Society. Series D (The Statistician) 32 (3): 307–17.
Dunn, Graham. 2002. Statistical Evaluation of Measurement Error. Second. Stanford: American Mathematical Society; Digital Press.
Pizzi, N. J. 1999. “Fuzzy Pre-Processing of Gold Standards as Applied to Biomedical Spectra Classification.” Artificial Intelligence in Medicine 16: 171–82.