Figure 1 shows the molecular structure of L-ascorbic acid (Vitamin C; systematic name (5R)-[(1S)-1,2-Dihydroxyethyl]-3,4-dihydroxyfuran-2(5H)-one), as rendered from the crystal structure [1]. This report analyses the data collected in an attempt to establish the validity of a published bioassay procedure for reliably measuring Vitamin C intake [2]. The same data set has previously been used as an example of analytical procedures for bioassays [3]. The work presented here covers the exploratory data analysis, a more detailed statistical assessment of the significance of Vitamin C intake and its delivery mechanism, and the conclusions drawn from both.
The data analysed are those in the ToothGrowth data set provided as part of the R {datasets} package. Before beginning the analysis, it is necessary to clarify the nature of the data where the information provided in R is either misleading or incorrect. The data consist of measurements of the mean length of the odontoblast cells harvested from the incisor teeth of a population of 60 guinea pigs. These animals were divided into 6 groups of 10, and each group was fed a diet with one of 6 Vitamin C supplement regimes for a period of 42 days. The Vitamin C was administered either as orange juice (OJ) or as chemically pure Vitamin C in aqueous solution (VC), and each animal received the same daily dose (0.5, 1.0 or 2.0 milligrams) throughout. Since each of the 6 combinations of supplement type and dose was given to 10 animals, the study required 60 animals in total. After 42 days, the animals were euthanized and their incisor teeth were harvested and analysed by optical microscopy to determine the length (in microns) of the odontoblast cells (the layer between the pulp and the dentine). The ToothGrowth data set therefore consists of 60 observations of 3 variables: mean length of odontoblasts (microns), supplement type (OJ or VC) and Vitamin C dose (milligrams/day).
The code chunk in Appendix A reads in the ToothGrowth data set and performs an initial data analysis: it verifies the structure of the data and then outputs the mean and variance of the odontoblast cell length for each of the six distinct groups of animals, as in the following table:
Dose (mg/day) | Supplement | Mean(Length) (microns) | Variance(Length) (microns²) |
---|---|---|---|
0.5 | OJ | 13.2 | 19.9 |
1.0 | OJ | 22.7 | 15.3 |
2.0 | OJ | 26.1 | 7.0 |
0.5 | VC | 8.0 | 7.5 |
1.0 | VC | 16.8 | 6.3 |
2.0 | VC | 26.1 | 23.0 |
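The Appendix A chunk itself is redacted, so the following is only a minimal sketch of how such a summary could be produced with base R. It is not the author's code, and the names `group_mean`, `group_var` and `summary_tbl` are introduced here purely for illustration.

```r
# ToothGrowth ships with the {datasets} package, which is attached by default.
data(ToothGrowth)

# Check the structure: 60 observations of len, supp and dose are expected.
str(ToothGrowth)

# Mean and variance of length for each dose/supplement combination.
# Both calls use the same formula, so their rows come back in the same order.
group_mean <- aggregate(len ~ dose + supp, data = ToothGrowth, FUN = mean)
group_var  <- aggregate(len ~ dose + supp, data = ToothGrowth, FUN = var)

# Assemble one table, rounded to one decimal place as in the report.
summary_tbl <- data.frame(
  dose     = group_mean$dose,
  supp     = group_mean$supp,
  mean_len = round(group_mean$len, 1),
  var_len  = round(group_var$len, 1)
)
print(summary_tbl)
```

Keeping the two `aggregate` calls separate avoids the matrix-valued column that a single call returning both statistics would produce.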
The structure of the ToothGrowth data.frame agrees with the discussion above, namely 60 observations of 3 variables: len, supp and dose. The Mean(Length) results appear to show a strong association between mean odontoblast length and dose. It is also interesting to note that, for the OJ supplement, the variance is greater at the two lower doses than at the highest dose, whereas for the VC supplement the pattern is reversed, with lower variances at the two lower doses and a higher variance at the highest dose. At the very least, this observation suggests that it would be unsafe to assume a common variance when testing between groups.
It is tempting to run other tests on these sub-populations (eg Shapiro-Wilk, skewness, kurtosis etc) to verify whether or not their distributions are sufficiently normal to allow meaningful Student’s t based testing of the significance of the observed differences in mean values. However, this would be pointless (and potentially misleading), since at these very small sample sizes (10 per sub-population) none of these tests will return reliable results. The best way forward is therefore to assume, on the basis of the experimental design, that the parent distributions from which these samples are drawn are sufficiently normal to support Student’s t based testing.
The code chunk in Appendix B plots a chart showing the distribution of the odontoblast length measurements as a function of the dosage administered and supplement type. The exploratory scatterplot (Figure 2) of length vs dose with symbology/colour showing supplement also reveals some interesting features:
- For the data as a whole, there again appears to be a strong association between mean odontoblast length and dose.
- For the 0.5 and 1.0 milligram/day dose data, odontoblast length and supplement appear to be associated, with OJ lengths usually (but not exclusively) greater than VC lengths.
- For the 2.0 milligram/day dose data, it is not obvious that there is any association between length and supplement.
- The vertical spread of the various groups of plotted points is consistent with the observations from the variance calculations above.
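Since the Appendix B chunk is redacted, here is a hedged base-graphics sketch of a scatterplot of the kind described. The colours, plotting symbols and the small horizontal offset between supplements are assumptions made for this illustration and are not taken from Figure 2.

```r
tg <- ToothGrowth

# Illustrative styling choices (assumptions, not the report's own).
supp_col <- c(OJ = "darkorange", VC = "steelblue")
supp_pch <- c(OJ = 16, VC = 17)
supp_chr <- as.character(tg$supp)

# Shift each supplement slightly sideways so the two sets of points
# at the same dose do not sit on top of each other.
x_pos <- tg$dose + ifelse(supp_chr == "OJ", -0.03, 0.03)

plot(x_pos, tg$len,
     col = supp_col[supp_chr], pch = supp_pch[supp_chr],
     xaxt = "n",
     xlab = "Vitamin C dose (mg/day)",
     ylab = "Odontoblast length (microns)")
axis(1, at = c(0.5, 1.0, 2.0))
legend("bottomright", legend = names(supp_col),
       col = supp_col, pch = supp_pch, title = "Supplement")
```

Only three distinct dose values exist, so the x axis is drawn manually at those values rather than left to the default tick placement.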
The code chunk in Appendix C evaluates differences in mean odontoblast cell length against 99.9% confidence intervals calculated from Student’s t distribution for various combinations of the other two variables, and outputs them as the following table:
pop1 | pop2 | filter | lower | upper | h0 |
---|---|---|---|---|---|
1.0 | 0.5 | all | -11.98 | -6.28 | FALSE |
1.0 | 0.5 | OJ | -13.42 | -5.52 | FALSE |
1.0 | 0.5 | VC | -11.27 | -6.31 | FALSE |
2.0 | 1.0 | all | -9.00 | -3.73 | FALSE |
2.0 | 1.0 | OJ | -6.53 | -0.19 | FALSE |
2.0 | 1.0 | VC | -13.05 | -5.69 | FALSE |
VC | OJ | all | -3.01 | 10.41 | TRUE |
VC | OJ | 0.5 | -1.50 | 12.00 | TRUE |
VC | OJ | 1.0 | -0.03 | 11.89 | TRUE |
VC | OJ | 2.0 | -7.25 | 7.09 | TRUE |
The table above compares mean odontoblast length between the groups listed below. Judging by the signs of the tabulated intervals, each interval is for the mean of pop2 minus the mean of pop1.
- 0.5 and 1.0 milligram dosages for all data, for OJ supplement only and for VC supplement only.
- 1.0 and 2.0 milligram dosages for all data, for OJ supplement only and for VC supplement only.
- VC and OJ supplements for all data, and for the 0.5 milligram, 1.0 milligram and 2.0 milligram dosages only.
The assumptions underlying the detailed assessment procedure used above are that:
- The Student’s t-test for comparing population means is appropriate (ie, in all comparisons the two samples are randomly drawn from independent populations with normal distributions).
- The observations are unpaired between the two samples compared, in each case.
- The variances of the two samples being compared are not necessarily equal, and hence the Welch approximation to the degrees of freedom is appropriate.
- A 95% confidence interval (CI) based test is insufficiently rigorous, since when conducting multiple t.test CI evaluations the probability of at least one type I error becomes too great. A more rigorous 99.9% level is therefore used.
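The assumptions above map directly onto arguments of R's `t.test`: `paired = FALSE`, `var.equal = FALSE` (the Welch approximation) and `conf.level`. The sketch below is not the redacted Appendix C code; `welch_ci` is a hypothetical helper written for this illustration, and it reports whether the interval for mean(x) minus mean(y) contains 0.

```r
# Hypothetical helper: Welch CI for mean(x) - mean(y), plus a flag that is
# TRUE when the interval contains 0 (ie, h0 is not rejected).
welch_ci <- function(x, y, conf_level = 0.999) {
  ci <- t.test(x, y, paired = FALSE, var.equal = FALSE,
               conf.level = conf_level)$conf.int
  data.frame(lower = round(ci[1], 2),
             upper = round(ci[2], 2),
             h0    = ci[1] <= 0 && ci[2] >= 0)
}

tg <- ToothGrowth

# Dose comparison over all data: 0.5 mg/day minus 1.0 mg/day.
dose_all <- welch_ci(tg$len[tg$dose == 0.5], tg$len[tg$dose == 1.0])

# Supplement comparison over all data: OJ minus VC.
supp_all <- welch_ci(tg$len[tg$supp == "OJ"], tg$len[tg$supp == "VC"])

# Supplement comparison restricted to one dose level.
supp_half <- welch_ci(tg$len[tg$supp == "OJ" & tg$dose == 0.5],
                      tg$len[tg$supp == "VC" & tg$dose == 0.5])

print(rbind(dose_all, supp_all, supp_half))
```

When reproducing the table, it is worth running the helper at both `conf_level = 0.999` and `conf_level = 0.95`. The dose rows as printed appear to coincide with the 95% default of `t.test` (the first dose comparison gives roughly -11.98 to -6.28 at 95%), whereas the supplement rows are consistent with 99.9%.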
Given the assumptions above, the results tabulated for the various combinations of samples compared under the differing filter conditions provide a good assessment of the impact of differing dosages and supplement delivery methods on mean odontoblast length in this data set. In all the tests performed above, the null hypothesis \(h_0\) is that the means of the two samples selected for comparison are not significantly different. Whether \(h_0\) is judged true or false depends on whether the 99.9% CI for the difference between the two sample means (columns lower and upper in the table above) includes the value 0. A 99.9% CI has been selected instead of the more usual 95% because of the number of CI tests being performed: the probability of at least one type I error occurring amongst 10 tests is \(p = 1 - 0.95^{10} \approx 40\%\) at 95% versus \(p = 1 - 0.999^{10} \approx 1\%\) at 99.9%. If the CI does include 0, then \(h_0\) is considered true (or at least, not proven false). If it does not, then \(h_0\) is considered false, ie there is a significant difference between the means of the two samples being compared.
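The two probabilities quoted above can be checked with a couple of lines of R. The function name `familywise_error` is introduced here for illustration, and the formula treats the 10 tests as independent, which is the simplification implicit in the expressions above.

```r
# Probability of at least one type I error across n_tests independent tests,
# each carried out at significance level alpha.
familywise_error <- function(alpha, n_tests) {
  1 - (1 - alpha)^n_tests
}

fwer_95  <- familywise_error(alpha = 0.05,  n_tests = 10)  # 95% intervals
fwer_999 <- familywise_error(alpha = 0.001, n_tests = 10)  # 99.9% intervals

print(round(c(fwer_95, fwer_999), 3))
```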
Examining the results of the t-tests provided above, all results comparing different dosages indicate that \(h_0\) is false, ie, there is a significant difference in the mean values of the two samples being compared. However, the results comparing different supplement types at the same dosages all show that \(h_0\) is true (or at least not proven false).
Considering only the \(h_0\) false results returned from the t-tests comparing dosages, in all cases the confidence intervals lie entirely below zero. Since the sample means increase with dose, these intervals correspond to the lower-dose mean minus the higher-dose mean, confirming that higher doses of Vitamin C are associated with higher mean odontoblast lengths, regardless of the supplement type used.
The overall conclusion from this analysis is that a test protocol based on determination of mean odontoblast length in guinea pigs is an effective bioassay methodology for determining their Vitamin C intake (at least within the dosage limits tested), but that it cannot reliably distinguish between the same doses administered via the two different supplements examined.
The following code chunk computes and displays the exploratory data analysis results table.
## Redacted for publication since it contains part of a solution to SI Project Part 2.
The following code chunk computes and displays the exploratory data analysis scatterplot.
## Redacted for publication since it contains part of a solution to SI Project Part 2.
The following code chunk computes and displays the Student’s t-test results table.
## Redacted for publication since it contains part of a solution to SI Project Part 2.
[1] J. Hvoslef, “The Crystal Structure of L-Ascorbic Acid, ‘Vitamin C’. I. The X-ray Analysis”, Acta Cryst., 1968, B24, 23.
[2] E.W. Crampton, “The growth of the odontoblasts of the incisor tooth as a criterion of the Vitamin C intake of the Guinea Pig”, J. Nutr., 1947, 491.
[3] C.I. Bliss, “The Statistics of Bioassay”, Academic Press Inc., 1952, 499.