Introduction

This analysis is the course project for the Coursera Statistical Inference course. It uses the ToothGrowth data from the R datasets package, which consists of 60 observations of 3 variables. The response is the length of odontoblasts (teeth) in each of 10 guinea pigs at each of three dose levels of Vitamin C (0.5, 1, and 2 mg) with each of two supplement types (orange juice or ascorbic acid). The analysis finds no statistically significant difference in tooth growth between the supplement types at the 5% significance level, whereas a higher dose is associated with significantly greater tooth growth.

Data Description

The data is a data frame of 60 observations on 3 variables:
1. len: the length of odontoblasts (teeth) in each of 10 guinea pigs (numeric)
2. supp: the supplement type, where OJ stands for orange juice and VC stands for ascorbic acid (factor)
3. dose: the dose of Vitamin C given (0.5, 1, or 2 mg) (numeric)
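The structure described above can be checked directly in R. This is a minimal sketch using only the built-in datasets package; the name cell_counts is illustrative and not part of the original analysis.

```r
library(datasets)
data(ToothGrowth)

# Column types and the number of observations
str(ToothGrowth)

# Cross-tabulate supplement type against dose to see the design:
# each supplement/dose combination should hold the same number of animals
cell_counts <- table(ToothGrowth$supp, ToothGrowth$dose)
cell_counts
```

The cross-tabulation shows a balanced design, which matters later because equal group sizes make the t-test more robust.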

Exploratory Data Analysis

The data are loaded into R and copied to a shorter name, “tg”. The variable dose is converted to a factor.

A summary of the data is given below. The average tooth length is 18.81, with a minimum of 4.2 and a maximum of 33.9. The standard deviation is 7.65.

##       len        supp     dose   
##  Min.   : 4.20   OJ:30   0.5:20  
##  1st Qu.:13.07   VC:30   1  :20  
##  Median :19.25           2  :20  
##  Mean   :18.81                   
##  3rd Qu.:25.27                   
##  Max.   :33.90

The average tooth length by supplement type is 20.66 for orange juice and 16.96 for ascorbic acid. The standard deviation is 6.61 for orange juice and 8.27 for ascorbic acid. The difference between the two groups is shown in Appendix Figure 1.

##    Min. 1st Q Median  Mean 3rd Q  Max   sd
## OJ  8.2 15.52   22.7 20.66 25.72 30.9 6.61
## VC  4.2 11.20   16.5 16.96 23.10 33.9 8.27

The average tooth length by dose is 10.605 for the 0.5 mg dose, 19.735 for the 1 mg dose and 26.10 for the 2 mg dose. The standard deviations are 4.50, 4.42 and 3.77 respectively.

The differences between the three groups are shown in Appendix Figure 2.

##     Min.  1st Q Median  Mean 3rd Q  Max   sd
## 0.5  4.2  7.225   9.85 10.60 12.25 21.5 4.50
## 1   13.6 16.250  19.25 19.74 23.38 27.3 4.42
## 2   18.5 23.520  25.95 26.10 27.83 33.9 3.77
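The group means and standard deviations in the tables above can also be obtained with a small helper. This is only a sketch: group_stats is a hypothetical function name introduced here for illustration, and it reports n, mean and sd rather than the full summary shown in the tables.

```r
library(datasets)
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# Hypothetical helper: n, mean and sd of x within each level of g
group_stats <- function(x, g) {
  t(sapply(split(x, g), function(v) c(n = length(v), mean = mean(v), sd = sd(v))))
}

supp_stats <- group_stats(tg$len, tg$supp)  # one row per supplement type
dose_stats <- group_stats(tg$len, tg$dose)  # one row per dose level
round(supp_stats, 2)
round(dose_stats, 2)
```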

T-Tests

Differences in the mean tooth length by supplement type

The first test checks whether the mean tooth length differs between the two supplement groups.

The null and alternative hypotheses are:
H0: The difference between the two group means is 0
H1: The difference between the two group means is not 0

The t-statistic is 1.915 and the p-value is 0.06, so we fail to reject H0 at the 5% significance level. This does not prove H0; it only means the data show no statistically significant difference between the two group means. The 95% confidence interval for the difference in means (OJ minus VC) is -0.171 <= X <= 7.571. It contains zero, which is consistent with failing to reject the null hypothesis. The results of the t-test are presented in the Appendix.
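To make these numbers concrete, the test statistic, degrees of freedom, p-value and confidence interval can be reproduced by hand. The following is a minimal sketch, assuming the Welch two-sample t-test reported in the Appendix output; all variable names (oj, vc, welch_df and so on) are illustrative.

```r
library(datasets)
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error of each group mean
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)
se_diff <- sqrt(se2_oj + se2_vc)

# Welch t-statistic for the difference in means (OJ minus VC)
t_stat <- (mean(oj) - mean(vc)) / se_diff

# Welch-Satterthwaite approximation of the degrees of freedom
welch_df <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval
p_value <- 2 * pt(-abs(t_stat), welch_df)
ci <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, welch_df) * se_diff

c(t = t_stat, df = welch_df, p = p_value, lower = ci[1], upper = ci[2])
```

The values should agree with the t.test output in the Appendix up to rounding.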

Differences in the mean tooth length by dosage

There are three dose groups (0.5, 1 and 2 mg). The group means can be compared in pairs: 0.5 vs 1, 0.5 vs 2, and 1 vs 2. For that reason the dataset is split into 3 subsets:
1. only groups 0.5 and 1
2. only groups 0.5 and 2
3. only groups 1 and 2

The hypotheses are the same as in the previous test.

The t-statistics of the three tests are -6.48, -11.80 and -4.90 respectively. The p-values are very small (1.27e-07, 4.40e-14 and 1.91e-05), all below 0.01, so the null hypothesis is rejected in all three cases: there is a statistically significant difference between the group means in each comparison. None of the 95% confidence intervals contains zero. The full results of the tests are given in the Appendix.
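The three pairwise comparisons can also be run in a loop instead of building each subset by hand. This is a sketch under the same assumptions as the Appendix (default Welch t-test on the dose factor); the names dose_pairs and results are illustrative.

```r
library(datasets)
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# All unordered pairs of dose levels: 0.5 vs 1, 0.5 vs 2, 1 vs 2
dose_pairs <- combn(levels(tg$dose), 2, simplify = FALSE)

# Run one two-sample t-test per pair and collect the key numbers
results <- do.call(rbind, lapply(dose_pairs, function(p) {
  sub <- droplevels(tg[tg$dose %in% p, ])
  tt <- t.test(len ~ dose, data = sub)
  data.frame(comparison = paste(p, collapse = " vs "),
             t = unname(tt$statistic),
             p.value = tt$p.value,
             ci.lower = tt$conf.int[1],
             ci.upper = tt$conf.int[2])
}))
results
```

Each row corresponds to one of the three subsets listed above, with the lower dose as the first group, which is why the t-statistics and interval bounds are negative.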

Assumptions

Student’s t-test assumes that each of the two populations being compared follows a normal distribution. This can be tested using a normality test, such as the Shapiro–Wilk or Kolmogorov–Smirnov test, or it can be assessed graphically using a normal quantile plot. The classical Student’s t-test also assumes that the two populations have the same variance (testable using the F-test, Levene’s test, Bartlett’s test, or the Brown–Forsythe test; or assessable graphically using a Q–Q plot). When the sample sizes in the two groups are equal, as they are here, Student’s original t-test is highly robust to unequal variances. Note that the tests reported in the Appendix are Welch two-sample t-tests, the default of R’s t.test, which do not require equal variances. Finally, the data should be sampled independently from the two populations being compared. For this experiment, that means the guinea pigs should have been randomly assigned to dose and supplement type.
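Some of the checks named above can be sketched in a few lines of base R. This is an illustration only, not part of the original analysis: it applies the Shapiro–Wilk test within each group, the F-test to the two supplement groups and Bartlett’s test to the three dose groups, and the variable names are hypothetical.

```r
library(datasets)
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# Shapiro-Wilk normality test within each group (p-values only)
shapiro_supp <- tapply(tg$len, tg$supp, function(x) shapiro.test(x)$p.value)
shapiro_dose <- tapply(tg$len, tg$dose, function(x) shapiro.test(x)$p.value)

# F-test for equal variances between the two supplement groups
var_supp_p <- var.test(len ~ supp, data = tg)$p.value

# Bartlett's test for equal variances across the three dose groups
bartlett_dose_p <- bartlett.test(len ~ dose, data = tg)$p.value

shapiro_supp
shapiro_dose
c(var_supp = var_supp_p, bartlett_dose = bartlett_dose_p)
```

Small p-values would suggest a departure from the corresponding assumption. With only 20 to 30 observations per group these tests have limited power, so a normal quantile plot (for example with qqnorm and qqline) is a useful complement.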

Conclusions

The t-tests performed show that the supplement type does not have a statistically significant effect on tooth growth at the 5% significance level. However, the result is borderline (p = 0.06): the null hypothesis would be rejected at the 10% significance level, so repeating the experiment with more observations is recommended. In contrast, the dose has a clear, significant effect on tooth growth, with a higher dose leading to greater growth.
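The remark about the 90% level can be checked by rerunning the supplement test with a different confidence level. This is a minimal sketch using the conf.level argument of t.test; the name tt90 is illustrative.

```r
library(datasets)

# Same Welch test as in the Appendix, but with a 90% confidence interval
tt90 <- t.test(len ~ supp, data = ToothGrowth, conf.level = 0.90)
tt90$conf.int   # interval for the difference in means (OJ minus VC)
tt90$p.value    # unchanged by conf.level
```

Because the p-value lies between 0.05 and 0.10, the 90% interval excludes zero while the 95% interval does not.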

Appendix

Data import

library(datasets)
data(ToothGrowth)
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

Summary

summary(tg)
sd(tg$len)

Summary per supplement method

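# Summary statistics of tooth length for each supplement type
supp.sum <- tapply(tg$len, tg$supp, summary)
supp.sum <- as.table(matrix(unlist(supp.sum), nrow = 2, byrow = TRUE))
# Standard deviation per supplement type, rounded to 2 decimals
supp.sd <- round(tapply(tg$len, tg$supp, sd), 2)
supp.sd <- as.table(matrix(unlist(supp.sd), nrow = 2, byrow = TRUE))
# Combine into one table and label the rows and columns
supp.sum <- cbind(supp.sum, supp.sd)
colnames(supp.sum) <- c("Min.", "1st Q", "Median", "Mean", "3rd Q", "Max", "sd")
rownames(supp.sum) <- c("OJ", "VC")
supp.sum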

Tooth Mean Length and Distribution by Supplement type

library(ggplot2)
# Box plot of tooth length for each supplement type
m <- ggplot(tg, aes(x = supp, y = len))
m <- m + geom_boxplot(aes(fill = supp))
m + xlab("Supplement type") + ylab("Length") + ggtitle("Figure 1. Tooth Length analysis by Supplement type")

Summary per dosage

# Summary statistics of tooth length for each dose level
dose.sum <- tapply(tg$len, tg$dose, summary)
dose.sum <- as.table(matrix(unlist(dose.sum), nrow = 3, byrow = TRUE))
# Standard deviation per dose level, rounded to 2 decimals
dose.sd <- round(tapply(tg$len, tg$dose, sd), 2)
dose.sd <- as.table(matrix(unlist(dose.sd), nrow = 3, byrow = TRUE))
# Combine into one table and label the rows and columns
dose.sum <- cbind(dose.sum, dose.sd)
colnames(dose.sum) <- c("Min.", "1st Q", "Median", "Mean", "3rd Q", "Max", "sd")
rownames(dose.sum) <- c("0.5", "1", "2")
dose.sum

Tooth Mean Length and distribution by Dosage

# Box plot of tooth length for each dose level
m <- ggplot(tg, aes(x = dose, y = len))
m <- m + geom_boxplot(aes(fill = dose))
m + xlab("Dose") + ylab("Length") + ggtitle("Figure 2. Tooth Length analysis by Dosage")

T-Tests for Supplement Type

t.test(len ~ supp, data = tg)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333

Subsetting

subtg1 <- subset(tg, dose == "0.5" | dose == "1")
subtg2 <- subset(tg, dose == "0.5" | dose == "2")
subtg3 <- subset(tg, dose == "1" | dose == "2")

T-Tests for dosage

t.test(len ~ dose, data = subtg1)
## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.983781  -6.276219
## sample estimates:
## mean in group 0.5   mean in group 1 
##            10.605            19.735
t.test(len ~ dose, data = subtg2)
## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.15617 -12.83383
## sample estimates:
## mean in group 0.5   mean in group 2 
##            10.605            26.100
t.test(len ~ dose, data = subtg3)
## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2 
##          19.735          26.100