Basic Inferential Data Analysis¹

Summary

This second part of the project is aimed at testing and evaluating hypotheses based on data, using the statistical methods discussed in the course. The assignment refers to the ‘TootGrowth’ dataset described below. It investigates: (i) whether the delivery method makes a difference in the outcome; (ii) whether dosage makes a difference; and (iii) what “treatement(s)” appears to be the most effective. I first perform some Exploratory Data Analysis, after having properly processed the data, and then conduct formal statistical testing about differences in means, correcting for the number of the hypotheses tested. Findings suggest that Vitamin C dosage is linked with statistically significant gains in cells’ length, with no significant difference between delivery methods for the highest (and most effective) dose level.

Overview

The ‘TootGrowth’ dataset² contains measurements³ on 60 guinea pigs in total. The response or outcome variable, coded as ‘len’, is the length of odontoblasts⁴. The covariates are ‘dose’, which is formally a numeric variable indicating vitamin C mg/day dosage given to the subjects, and ‘supp’, a categorical variable (factor) indicating delivery method for the drug, with two “levels” (orange juice, OJ, or ascorbic acid, VC). Since only 3 different doses were administered, it can be considered a factor variable as well. Thus, the data are effectively split into either 2 or 3 major groups, according to delivery method and dosage, respectively, for a total of 6 subsamples (dose-supp pairs).

Data Processing & Basic Exploratory Data Analysis

The preliminary data processing is minimal, since there are no missing values in the TootGrowth dataset. The data manipulation necessary for the analysis (subsetting and grouping, for the most part) is carefully reported in the Appendix, where I also report a call to the ‘str’ function, a routinary general ‘summary’ call, and summaries by group.

A boxplot (Figure 1 in the Appendix) helps visualize information about the distribution of the response variable of interest, length, and aids intuition about the hypotheses I want to test, namely that there are differences in outcome based both on delivery method and dosage. In particular, it suggest that OJ might be better for low and medium dosages, while there is no clear-cut difference for high dosage (2 mg/day), and that higher dosages perform better. I assume we want to understand what is most effective in favoring growth; as such, I am interested in testing the following Null Hypotheses, for a total of 6 tests to be performed: whether a higher dose is associated with a statistically significant longer length (high vs medium and medium vs low dosages); whether there is an overall difference between the subjects given orange juice or ascorbic acid; and which supp-dose combination is the more effective (by comparing OJ-High to OJ-Medium, VC-High to VC-Medium, and testing whether OJ-High and VC-High are really different).

Statistical Testing

Assumptions

Normality A full array of Shapiro-Wilk normality test is reported in the Appendix. Following Royston (1995)⁵, since no test reported a p-value greater than 0.01, we fail to reject the Null Hypothesis; we thus assume normality of the data.

Variances If Welch tests are used, unequal variances are allowed. We need no further assumption on variances⁶.

Unpaired Since each animal received one of the three dosages by one of the two delivery methods, there is no overlapping between the (sub-)samples. In other words, the data are unpaired, and we consider the (sub-)samples as independent samples.

Methodology As such, Welch t-tests with unequal variances are performed, correcting for multiplicity.

Results

Naïve Inference: possible p-value inflation

In setups such as the present one, where there is no pre-determined target for the experiment/data analysis, and the outcome of interest can be compared in different ways, we are at risk of incurring into the Multiplicity Problem, an issue that arises whenever multiple comparisons are at hand. The more statistical hypotheses we test, the more likely we are to get an incorrect inference, purely by randomness. There are numerous methods to try to deal with this issue; as seen in class, I’ll apply the Bonferroni correction and the BH-adjustments to my tests’ p-values.

Accounting for n° of tests: comparison with corrections

The classical approach, the Bonferroni correction, is directed towards control of the Family-Wise Error Rate (FWER) “in a strong sense”⁷. While reported for completeness, for the issue at hand the “BH” (Benjamini-Hochberg) approach⁸, which controls the False Discovery Rate (FDR), is more appropriate.

The table below reports the Welch t-test p-values, as well as the Bonferroni-corrected and the BH-adjusted ones⁹.

	Test	p-value	Bonferroni	BH-adjusted
Dosage	High vs Medium	0.00001906\(^{***}\)	0.00011439\(^{***}\)	0.00005719\(^{***}\)
	Medium vs Low	0.00000013\(^{***}\)	0.00000076\(^{***}\)	0.00000076\(^{***}\)
Delivery method	OJ vs VC (overall)	0.06063451\(^{*}\)	0.36380705	0.07276141\(^{*}\)
Combinations	OJ-High vs OJ-Medium	0.03919514\(^{**}\)	0.23517085	0.05879271\(^{*}\)
	VC-High vs VC-Medium	0.00009156\(^{***}\)	0.00054934\(^{***}\)	0.00018311\(^{***}\)
	OJ-High vs VC-High	0.96385159	1.00000000	0.96385159

From the table we see that the differences in mean length between dosages of vitamin C are both statistically significant at the 1% level¹⁰. The significance of the overall difference between mean length by delivery methods gets eaten away by the Bonferroni correction, and at most is significant at the 10% level. Differences between high and medium dose levels are significant even within delivery method subgroups, though only at 10% for the BH-corrected OJ one. As it seems intuitive from the boxplot, there is no real difference between delivery methods for the high dose level.

Conclusions

Results show that while delivery method might have some bearing on the observed differences in mean cells length, it is not that much statistically significant. On the other hand, the level of Vitamin C dosage (mg/day) leads to statistically significant differences, suggesting a direct relationship between dose and cells’ length. Each delivery method at high dosages “performed better” (greater mean lenght) than at medium dosage, in particular. There does not seem to be any statistically appreciable difference between the combinations with associated highest mean (and median) length, namely the Orange Juice-High Dosage and Ascorbic Acid (VC)-High Dosage subsamples.

Appendix

Routinary EDA

library(datasets) 
data("ToothGrowth") 
str(ToothGrowth)

## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

summary(ToothGrowth)

##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

Summaries by groups

library(data.table) 
dg <- ToothGrowth 
setDT(dg) 
# Summary of length 'len'... 
by.supp <- dg[, as.list(summary(len)), by = supp] ; by.supp

##    supp Min. 1st Qu. Median     Mean 3rd Qu. Max.
## 1:   VC  4.2  11.200   16.5 16.96333  23.100 33.9
## 2:   OJ  8.2  15.525   22.7 20.66333  25.725 30.9

by.dose <- dg[, as.list(summary(len)), by = dose] ; by.dose

##    dose Min. 1st Qu. Median   Mean 3rd Qu. Max.
## 1:  0.5  4.2   7.225   9.85 10.605  12.250 21.5
## 2:  1.0 13.6  16.250  19.25 19.735  23.375 27.3
## 3:  2.0 18.5  23.525  25.95 26.100  27.825 33.9

by.both <- dg[, as.list(summary(len)), by = c("supp", "dose")] ; by.both

##    supp dose Min. 1st Qu. Median  Mean 3rd Qu. Max.
## 1:   VC  0.5  4.2   5.950   7.15  7.98  10.900 11.5
## 2:   VC  1.0 13.6  15.275  16.50 16.77  17.300 22.5
## 3:   VC  2.0 18.5  23.375  25.95 26.14  28.800 33.9
## 4:   OJ  0.5  8.2   9.700  12.25 13.23  16.175 21.5
## 5:   OJ  1.0 14.5  20.300  23.45 22.70  25.650 27.3
## 6:   OJ  2.0 22.4  24.575  25.95 26.06  27.075 30.9

Subsetting data by groups

OJ.all <- ToothGrowth[ToothGrowth$supp=="OJ", ] 
VC.all <- ToothGrowth[ToothGrowth$supp=="VC", ] 
OJ.low <- ToothGrowth[ToothGrowth$supp=="OJ" & ToothGrowth$dose==0.5, ] 
VC.low <- ToothGrowth[ToothGrowth$supp=="VC" & ToothGrowth$dose==0.5, ] 
OJ.med <- ToothGrowth[ToothGrowth$supp=="OJ" & ToothGrowth$dose==1, ] 
VC.med <- ToothGrowth[ToothGrowth$supp=="VC" & ToothGrowth$dose==1, ] 
OJ.h   <- ToothGrowth[ToothGrowth$supp=="OJ" & ToothGrowth$dose==2, ] 
VC.h   <- ToothGrowth[ToothGrowth$supp=="VC" & ToothGrowth$dose==2, ] 
OJ.lm  <- ToothGrowth[ToothGrowth$supp=="OJ" & 
                         (ToothGrowth$dose==0.5 | ToothGrowth$dose==1), ] 
VC.lm  <- ToothGrowth[ToothGrowth$supp=="VC" & 
                         (ToothGrowth$dose==0.5 | ToothGrowth$dose==1), ] 
low.all  <- ToothGrowth[ToothGrowth$dose==0.5, ] 
med.all  <- ToothGrowth[ToothGrowth$dose==1, ] 
high.all <- ToothGrowth[ToothGrowth$dose==2, ] 
lm.all   <- ToothGrowth[ToothGrowth$dose==0.5 | ToothGrowth$dose==1, ] 
print("Please see full version on RPubs for the code")

## [1] "Please see full version on RPubs for the code"

Normality Tests (“full array” reported here)

# Overall 
shap1.1 <- shapiro.test(ToothGrowth$len)  
# Delivery method 'supp' 
shap2.1 <- shapiro.test(OJ.all$len)  
shap2.2 <- shapiro.test(VC.all$len)  
# Dosage 'dose' 
shap3.1 <- shapiro.test(low.all$len)  
shap3.2 <- shapiro.test(med.all$len)  
shap3.3 <- shapiro.test(high.all$len)  
# 6 subsamples 
shap4.1 <- shapiro.test(OJ.low$len)  
shap4.2 <- shapiro.test(OJ.med$len)  
shap4.3 <- shapiro.test(OJ.h$len)  
shap4.4 <- shapiro.test(VC.low$len)  
shap4.5 <- shapiro.test(VC.med$len)  
shap4.6 <- shapiro.test(VC.h$len)

shapiro.results <- data.frame(stat = rep(0,12), pval = rep(0,12)) 
shapiro.results$stat <- rbind(shap1.1$statistic, shap2.1$statistic, shap2.2$statistic, 
                              shap3.1$statistic, shap3.2$statistic, shap3.3$statistic, 
                              shap4.1$statistic, shap4.2$statistic, shap4.3$statistic, 
                              shap4.4$statistic, shap4.5$statistic, shap4.6$statistic) 
shapiro.results$pval <- rbind(shap1.1$p.value, shap2.1$p.value, shap2.2$p.value, 
                              shap3.1$p.value, shap3.2$p.value, shap3.3$p.value, 
                              shap4.1$p.value, shap4.2$p.value, shap4.3$p.value, 
                              shap4.4$p.value, shap4.5$p.value, shap4.6$p.value) 
shapiro.cols <- c("W-statistics", "p-value") 
shapiro.rows <- c("All sample", "OJ (all)", "VC (all)", 
                  "Low dose", "Medium dose", "High dose", 
                  "OJ-Low", "OJ-Medium", "OJ-High", 
                  "VC-Low", "VC-Medium", "VC-High") 
colnames(shapiro.results) <- shapiro.cols 
rownames(shapiro.results) <- shapiro.rows 
shapiro.results

##                     W    p-value
## All sample  0.9674286 0.10910049
## OJ (all)    0.9178434 0.02358773
## VC (all)    0.9656715 0.42844646
## Low dose    0.9406451 0.24660149
## Medium dose 0.9313431 0.16388214
## High dose   0.9777535 0.90191150
## OJ-Low      0.8927435 0.18204079
## OJ-Medium   0.9266004 0.41529828
## OJ-High     0.9625752 0.81479083
## VC-Low      0.8899969 0.16956273
## VC-Medium   0.9083405 0.26978551
## VC-High     0.9732764 0.91944968

Full code and output for Welch t-tests

library(ggpubr)

## Loading required package: ggplot2

highVSmed <- t.test(x = high.all$len, y = med.all$len, 
                    alternative = "two.sided", var.equal = FALSE) 
medVSlow  <- t.test(x = med.all$len, y = low.all$len, 
                    alternative = "two.sided", var.equal = FALSE)
overall   <- t.test(x = OJ.all$len, y = VC.all$len, 
                    alternative = "two.sided", var.equal = FALSE) 
lm.OJvsVC    <- t.test(x = OJ.lm$len, y = VC.lm$len, 
                       alternative = "two.sided", var.equal = FALSE) 
OJ.highVSmed <- t.test(x = OJ.h$len, y = OJ.med$len, 
                       alternative = "two.sided", var.equal = FALSE) 
VC.highVSmed <- t.test(x = VC.h$len, y = VC.med$len, 
                       alternative = "two.sided", var.equal = FALSE) 
high.OJvsVC  <- t.test(x = OJ.h$len, y = VC.h$len, 
                       alternative = "two.sided", var.equal = FALSE) 
pvals <- c(highVSmed$p.value, medVSlow$p.value, overall$p.value, 
           OJ.highVSmed$p.value, VC.highVSmed$p.value, high.OJvsVC$p.value) 
round(pvals, 8)

## [1] 0.00001906 0.00000013 0.06063451 0.03919514 0.00009156 0.96385159

# Bonferroni 
bonf <- p.adjust(pvals, method = "bonferroni") 
round(bonf, 8)

## [1] 0.00011439 0.00000076 0.36380705 0.23517085 0.00054934 1.00000000

# Benjamini-Hochberg 
bh.pvals <- p.adjust(pvals, method = "BH") 
round(bh.pvals, 8)

## [1] 0.00005719 0.00000076 0.07276141 0.05879271 0.00018311 0.96385159

print("Test for overall difference in means between High and Medium dosage")

## [1] "Test for overall difference in means between High and Medium dosage"

highVSmed

## 
##  Welch Two Sample t-test
## 
## data:  high.all$len and med.all$len
## t = 4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  3.733519 8.996481
## sample estimates:
## mean of x mean of y 
##    26.100    19.735

print("Test for overall difference in means between Medium and Low dosage")

## [1] "Test for overall difference in means between Medium and Low dosage"

medVSlow

## 
##  Welch Two Sample t-test
## 
## data:  med.all$len and low.all$len
## t = 6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   6.276219 11.983781
## sample estimates:
## mean of x mean of y 
##    19.735    10.605

print("Test for overall difference in means by delivery method: OJ vs VC")

## [1] "Test for overall difference in means by delivery method: OJ vs VC"

overall

## 
##  Welch Two Sample t-test
## 
## data:  OJ.all$len and VC.all$len
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean of x mean of y 
##  20.66333  16.96333

# print("Test for difference in means for low-medium dosage (OJ vs VC)") 
# lm.OJvsVC 
print("Test for difference in means for OJ-High vs OJ-Medium")

## [1] "Test for difference in means for OJ-High vs OJ-Medium"

OJ.highVSmed

## 
##  Welch Two Sample t-test
## 
## data:  OJ.h$len and OJ.med$len
## t = 2.2478, df = 15.842, p-value = 0.0392
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.1885575 6.5314425
## sample estimates:
## mean of x mean of y 
##     26.06     22.70

print("Test for difference in means for VC-High vs VC-Medium")

## [1] "Test for difference in means for VC-High vs VC-Medium"

VC.highVSmed

## 
##  Welch Two Sample t-test
## 
## data:  VC.h$len and VC.med$len
## t = 5.4698, df = 13.6, p-value = 9.156e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   5.685733 13.054267
## sample estimates:
## mean of x mean of y 
##     26.14     16.77

print("Test for difference in means for High dosage, OJ vs VC")

## [1] "Test for difference in means for High dosage, OJ vs VC"

high.OJvsVC

## 
##  Welch Two Sample t-test
## 
## data:  OJ.h$len and VC.h$len
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.79807  3.63807
## sample estimates:
## mean of x mean of y 
##     26.06     26.14

Figure 1

library(ggplot2) 
library(RColorBrewer) 
CBfriendly <- c("#E69F00", "#56B4E9") # We should all try to make colorblindness friendliness the default 
gg <- ggplot(data = ToothGrowth, 
             aes(x = as.factor(dose), y = len, fill = supp)) + 
    scale_fill_manual(values = CBfriendly) + 
    geom_boxplot() + 
    xlab("Vitamin C Dose (mg/day)") + 
    ylab("Tooth Cells Length") + 
    ggtitle("Boxplot of Tooth Cells Length by Delivery Method & Dose") + 
    theme_light() + 
    theme(legend.position = "top", 
          legend.title = element_text(color = "darkgreen", size = 14), 
          legend.text = element_text(color = "darkred", size = 14) 
          ) + 
    theme(plot.title = element_text(size = 16, hjust = 0.5)) + 
    theme(axis.text = element_text(size = 12), 
          axis.title = element_text(size = 12)) 
gg

Please note that blank space and division in sections were used to make the report more readable. Moreover, the Appendix shows some code chunks and extensive output for clarity and reproducibility. Please do factor these: overall, length requirements are (more or less) met!↩︎
See ?TootGrowth (R help page) for further details.↩︎
See first chunk in the Appendix.↩︎
Henceforth “tooth cells”, for simplicity. I apologize to biologists, vets & doctors alike for this.↩︎
Royston, P. (1995), Remark AS R94: A Remark on Algorithm AS 181: The W test for Normality, Applied Statistics, 44, 547–551. doi: 10.2307/2986146.↩︎
Just eyeballing the data already makes it difficult to argue for equal variances. For the sake of completeness, we should perform a F-test for equality of variances. Paraphrasing Fermat, I have a truly marvelous output of this, which this margin is too narrow to contain↩︎
See Benjamini, Y. & Hochberg, Y. (1995), Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing, Journal of the Royal Statistical Society, Series B (Methodological), Vol 57 No. 1, 289-300. The authors specify that “control of the FWER is important when a conclusion from the various individual inferences is likely to be erroneous when at least one of them is”. The Bonferroni correction is the most conservative approach, thus.↩︎
ibidem. The “BH” approach has more power, as the authors show.↩︎
Legend for statistical significance levels: \(^{***}\) 1%, \(^{**}\) 5%, \(^{*}\) 10%↩︎
They’re actually significant up to the 0.0001% level!↩︎

Statistical Inference Course Project - Part 2

Francesco Aldo Tucci

4/19/2021

Basic Inferential Data Analysis¹

Summary

Overview

Data Processing & Basic Exploratory Data Analysis

Statistical Testing

Assumptions

Results

Naïve Inference: possible p-value inflation

Accounting for n° of tests: comparison with corrections

Conclusions

Appendix

Routinary EDA

Summaries by groups

Subsetting data by groups

Normality Tests (“full array” reported here)

Full code and output for Welch t-tests

Figure 1

Statistical Inference Course Project - Part 2

Francesco Aldo Tucci

4/19/2021

Basic Inferential Data Analysis1

Summary

Overview

Data Processing & Basic Exploratory Data Analysis

Statistical Testing

Assumptions

Results

Naïve Inference: possible p-value inflation

Accounting for n° of tests: comparison with corrections

Conclusions

Appendix

Routinary EDA

Summaries by groups

Subsetting data by groups

Normality Tests (“full array” reported here)

Full code and output for Welch t-tests

Figure 1

Basic Inferential Data Analysis¹