Overview

In this assignment, we perform basic exploratory data analysis on the ToothGrowth dataset in R. The dataset measures the effect of vitamin C on tooth growth in guinea pigs. We also conduct hypothesis tests on the data to see whether we can draw any statistically significant conclusions about the effect of supplement type and dosage on tooth growth in guinea pigs.

Data - ToothGrowth

The dataset records the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC).

The data is stored in a data frame with 60 observations on 3 variables:

Column Name   Class     Description
len           numeric   Tooth length
supp          factor    Supplement type (VC or OJ)
dose          numeric   Dose in milligrams/day (0.5, 1 or 2)

Information about the dataset was obtained from the link below:

https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/ToothGrowth.html


Basic Exploratory Data Analysis

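library(dplyr)    # group_by(), summarise(), %>%
library(moments)  # skewness(), kurtosis()
library(ggplot2)  # plotting

data(ToothGrowth)
ToothGrowth$dose <- as.factor(ToothGrowth$dose)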

str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: Factor w/ 3 levels "0.5","1","2": 1 1 1 1 1 1 1 1 1 1 ...
dfToothGrowth <- ToothGrowth %>% 
    group_by(supp, dose) %>% 
    summarise(Mean = mean(len), Num = length(len), Var = var(len), StdDev = sd(len), 
              Skew = skewness(len),ExcKurt = kurtosis(len)-3)

dfToothGrowth
## Source: local data frame [6 x 8]
## Groups: supp [?]
## 
##     supp   dose  Mean   Num       Var   StdDev       Skew    ExcKurt
##   (fctr) (fctr) (dbl) (int)     (dbl)    (dbl)      (dbl)      (dbl)
## 1     OJ    0.5 13.23    10 19.889000 4.459709  0.5131269 -0.9872929
## 2     OJ      1 22.70    10 15.295556 3.910953 -0.7970101 -0.1357745
## 3     OJ      2 26.06    10  7.049333 2.655058  0.4316050 -0.6367104
## 4     VC    0.5  7.98    10  7.544000 2.746634  0.1558587 -1.5270309
## 5     VC      1 16.77    10  6.326778 2.515309  1.0839861  0.7978228
## 6     VC      2 26.14    10 23.018222 4.797731  0.1880107 -0.8173491
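
To make the last two columns of the table concrete, the sketch below recomputes skewness and excess kurtosis for one group in base R. It assumes the moment-based definitions (third and fourth central moments scaled by powers of the second, all with divisor n), which is our understanding of what skewness() and kurtosis() from the moments package compute. The helper names central_moment, skew_manual and exc_kurt_manual are made up for illustration.

```r
# Illustrative helpers (hypothetical names), assuming moment-based definitions
data(ToothGrowth)

# k-th central moment with divisor n (not n - 1)
central_moment <- function(x, k) mean((x - mean(x))^k)

skew_manual <- function(x) central_moment(x, 3) / central_moment(x, 2)^1.5
exc_kurt_manual <- function(x) central_moment(x, 4) / central_moment(x, 2)^2 - 3

# Orange juice at 0.5 mg/day, i.e. the first row of the summary table
oj_low <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 0.5]

skew_oj_low <- skew_manual(oj_low)
exc_kurt_oj_low <- exc_kurt_manual(oj_low)
c(Skew = skew_oj_low, ExcKurt = exc_kurt_oj_low)
```

Under these assumptions the two values should agree with the Skew and ExcKurt entries in row 1 of dfToothGrowth.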

As expected, there are 60 observations in total. Each record corresponds to a different guinea pig, so the records are independent. supp (supplement type) and dose (dosage) have 2 and 3 factor levels respectively, which tallies with the description of the dataset. We also checked that each unique combination of supplement type and dosage contains 10 records (guinea pigs).

A boxplot of tooth length by dosage and supplement type shows the following:

  1. Increasing the dosage of either supplement type increases tooth length
  2. At dosages of 0.5 and 1 mg, orange juice is more effective than ascorbic acid at increasing tooth length
  3. At a dosage of 2 mg, both supplement types have similar effectiveness
  4. For orange juice, the change in tooth length between 1 and 2 mg is smaller than the change between 0.5 and 1 mg
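
The boxplot itself is not reproduced in this text. A minimal sketch of how such a plot could be drawn with ggplot2 is given below; the aesthetic mapping (dose on the x-axis, boxes filled by supplement type) and the axis labels are our assumptions, not necessarily the layout of the original figure.

```r
library(ggplot2)

data(ToothGrowth)
ToothGrowth$dose <- as.factor(ToothGrowth$dose)

# One box per supplement type within each dose level (6 boxes in total)
p <- ggplot(ToothGrowth, aes(x = dose, y = len, fill = supp)) +
    geom_boxplot() +
    labs(x = "Dose (mg/day)", y = "Tooth length", fill = "Supplement type")

print(p)
```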

In the hypothesis testing below, we seek to confirm, with statistical confidence, the observations made from the boxplot.

Specifically,

  1. whether either of the 2 supplement types is superior to the other in terms of its effect on tooth growth
  2. whether increasing the dosage of each supplement type has an effect on tooth growth

Hypothesis Testing

Data Preparation
TG0 <- subset(ToothGrowth, dose %in% c(0.5))
TG1 <- subset(ToothGrowth, dose %in% c(1))
TG2 <- subset(ToothGrowth, dose %in% c(2))

OJ01 <- subset(ToothGrowth, dose %in% c(0.5, 1) & supp == 'OJ')
OJ12 <- subset(ToothGrowth, dose %in% c(1, 2) & supp == 'OJ')
VC01 <- subset(ToothGrowth, dose %in% c(0.5, 1) & supp == 'VC')
VC12 <- subset(ToothGrowth, dose %in% c(1, 2) & supp == 'VC')

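The data is filtered into two-sample subsets for hypothesis testing: TG0, TG1 and TG2 each hold both supplement types at a single dosage, while OJ01, OJ12, VC01 and VC12 each hold two adjacent dosages for a single supplement type.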
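
As a quick sanity check on the filtering, the sketch below rebuilds two of the subsets and counts the records in each group. Note that dose is a factor at this point, so dose %in% c(0.5, 1) matches on the level labels "0.5" and "1". The use of droplevels() to hide the unused dose level is our addition.

```r
data(ToothGrowth)
ToothGrowth$dose <- as.factor(ToothGrowth$dose)

TG0  <- subset(ToothGrowth, dose %in% c(0.5))
OJ01 <- subset(ToothGrowth, dose %in% c(0.5, 1) & supp == 'OJ')

# Each two-sample subset should contain two groups of 10 guinea pigs
n_TG0  <- table(TG0$supp)
n_OJ01 <- table(droplevels(OJ01)$dose)  # droplevels() removes the empty "2" level

n_TG0
n_OJ01
```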

Assumption
  1. As the data was collected from 60 different guinea pigs, we can safely assume that the 60 records are independent and not paired.
  2. Although the two groups in each two-sample subset have the same size, that size (10) is not large, and the standard deviations of the groups differ noticeably from each other. Hence, for hypothesis testing, we use the non-pooled (separate variance, i.e. Welch) t-test.
  3. As the skewness and excess kurtosis of each group (see dfToothGrowth above) are not large, we assume that each sample is approximately normally distributed.
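
The normality assumption could also be checked more formally. The sketch below applies the Shapiro-Wilk test (shapiro.test from base R's stats package) to each supplement-by-dose group. This check is our addition and not part of the original analysis; with only 10 observations per group the test has little power, so a large p-value is only weak reassurance.

```r
data(ToothGrowth)

# Shapiro-Wilk p-value within each supplement-by-dose group (2 x 3 matrix)
shapiro_p <- with(ToothGrowth,
                  tapply(len, list(supp = supp, dose = dose),
                         function(x) shapiro.test(x)$p.value))

round(shapiro_p, 3)
```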

I) Difference between the 2 sample means of Orange Juice & Ascorbic Acid

\(\mu_1\) = population mean length of Tooth Growth by Orange Juice
\(\mu_2\) = population mean length of Tooth Growth by Ascorbic Acid

\(H_o\): \(\mu_1\) - \(\mu_2\) = 0
\(H_a\): \(\mu_1\) - \(\mu_2\) \(\neq\) 0
\(\alpha\) = 0.05

a) t-test for dosage = 0.5 mg
# Unpaired is the default; newer R versions reject 'paired' in the formula interface
t.test(len ~ supp, var.equal=FALSE, data=TG0)$p.value
## [1] 0.006358607

Since the p-value of 0.006 is less than \(\alpha\) = 0.05, we reject the null hypothesis.

At the 5% level of significance, the data provides sufficient evidence that, at a dosage of 0.5 mg, the mean tooth lengths for orange juice and ascorbic acid are different.
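
To show what the separate-variance t-test computes, the sketch below reproduces the 0.5 mg p-value by hand: the t statistic uses the two sample variances separately, and the degrees of freedom come from the Welch-Satterthwaite approximation. The variable names are illustrative.

```r
data(ToothGrowth)

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 0.5]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == 0.5]

# Squared standard error of each sample mean (variances are not pooled)
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

t_stat <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_oj + se2_vc)^2 /
    (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value
p_manual <- 2 * pt(-abs(t_stat), df = df_welch)

# Compare with the built-in test
ref <- t.test(oj, vc, var.equal = FALSE)
all.equal(p_manual, ref$p.value)
```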

b) t-test for dosage = 1 mg
t.test(len ~ supp, var.equal=FALSE, data=TG1)$p.value
## [1] 0.001038376

Since the p-value of 0.001 is less than \(\alpha\) = 0.05, we reject the null hypothesis.

At the 5% level of significance, the data provides sufficient evidence that, at a dosage of 1 mg, the mean tooth lengths for orange juice and ascorbic acid are different.

c) t-test for dosage = 2 mg
t.test(len ~ supp, var.equal=FALSE, data=TG2)$p.value
## [1] 0.9638516

Since the p-value of 0.964 is greater than \(\alpha\) = 0.05, we cannot reject the null hypothesis.

At the 5% level of significance, the data does not provide sufficient evidence that, at a dosage of 2 mg, the mean tooth lengths for orange juice and ascorbic acid are different.
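
The three comparisons in this section can also be run in one pass. The sketch below splits the data by dose and applies the same separate-variance t-test to each piece; the name p_by_dose is illustrative.

```r
data(ToothGrowth)

# One Welch t-test of len by supp per dose level; names are "0.5", "1", "2"
p_by_dose <- sapply(split(ToothGrowth, ToothGrowth$dose), function(d) {
    t.test(len ~ supp, var.equal = FALSE, data = d)$p.value
})

round(p_by_dose, 3)
```

Collecting the p-values in one vector makes it easy to see that the supplement types differ at 0.5 and 1 mg but not at 2 mg.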

II) Difference between the 2 sample means of 0.5 & 1 mg dosage

\(\mu_1\) = population mean length of Tooth Growth with 0.5 mg dosage
\(\mu_2\) = population mean length of Tooth Growth with 1 mg dosage

\(H_o\): \(\mu_1\) - \(\mu_2\) = 0
\(H_a\): \(\mu_1\) - \(\mu_2\) \(\neq\) 0
\(\alpha\) = 0.05

a) t-test for supplement type = Orange Juice
t.test(len ~ dose, var.equal=FALSE, data=OJ01)$p.value
## [1] 8.784919e-05

Since the p-value of 8.78e-05 is less than \(\alpha\) = 0.05, we reject the null hypothesis.

At the 5% level of significance, the data provides sufficient evidence that, for orange juice, the mean tooth lengths at dosages of 0.5 and 1 mg are different.

b) t-test for supplement type = Ascorbic Acid
t.test(len ~ dose, var.equal=FALSE, data=VC01)$p.value
## [1] 6.811018e-07

Since the p-value of 6.81e-07 is less than \(\alpha\) = 0.05, we reject the null hypothesis.

At the 5% level of significance, the data provides sufficient evidence that, for ascorbic acid, the mean tooth lengths at dosages of 0.5 and 1 mg are different.

III) Difference between the 2 sample means of 1 & 2 mg dosage

\(\mu_1\) = population mean length of Tooth Growth with 1 mg dosage
\(\mu_2\) = population mean length of Tooth Growth with 2 mg dosage

\(H_o\): \(\mu_1\) - \(\mu_2\) = 0
\(H_a\): \(\mu_1\) - \(\mu_2\) \(\neq\) 0
\(\alpha\) = 0.05

a) t-test for supplement type = Orange Juice
t.test(len ~ dose, var.equal=FALSE, data=OJ12)$p.value
## [1] 0.03919514

Since the p-value of 0.039 is less than \(\alpha\) = 0.05, we reject the null hypothesis.

At the 5% level of significance, the data provides sufficient evidence that, for orange juice, the mean tooth lengths at dosages of 1 and 2 mg are different.

b) t-test for supplement type = Ascorbic Acid
t.test(len ~ dose, var.equal=FALSE, data=VC12)$p.value
## [1] 9.155603e-05

Since the p-value of 9.16e-05 is less than \(\alpha\) = 0.05, we reject the null hypothesis.

At the 5% level of significance, the data provides sufficient evidence that, for ascorbic acid, the mean tooth lengths at dosages of 1 and 2 mg are different.
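
A two-sided test only tells us that the means differ, not in which direction. The sketch below extracts the group means and the 95% confidence interval for the difference (mean at 1 mg minus mean at 2 mg) for each supplement type. The name ci_1_vs_2 and the row labels are illustrative.

```r
data(ToothGrowth)

ci_1_vs_2 <- sapply(c("OJ", "VC"), function(s) {
    d <- ToothGrowth[ToothGrowth$supp == s & ToothGrowth$dose %in% c(1, 2), ]
    res <- t.test(len ~ dose, var.equal = FALSE, data = d)
    c(mean_1mg = unname(res$estimate[1]),  # sample mean at 1 mg
      mean_2mg = unname(res$estimate[2]),  # sample mean at 2 mg
      ci_lower = res$conf.int[1],          # 95% CI for mean_1mg - mean_2mg
      ci_upper = res$conf.int[2])
})

round(ci_1_vs_2, 2)
```

An interval that lies entirely below zero indicates that the mean tooth length at 2 mg is higher than at 1 mg, which is the direction the conclusion relies on.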


Conclusion

In hypothesis test I, there is sufficient evidence to conclude that at the lower dosages of 0.5 and 1 mg, orange juice has more impact on tooth length than ascorbic acid. At a dosage of 2 mg, the evidence is not sufficient to draw a similar conclusion. One possible explanation is that tooth growth levels off at a mean length of about 26, but the current data cannot confirm this. Collecting data at additional dosages such as 1.5, 2.5 and 3 mg would allow a more complete analysis.

In hypothesis tests II and III, there is sufficient evidence to conclude that, within the tested range of 0.5 mg to 2 mg, increasing the dosage of either supplement type increases tooth length. As with test I, more data would have to be collected and analysed to draw conclusions beyond a dosage of 2 mg.

In short, both supplement types (orange juice and ascorbic acid) of vitamin C have a positive effect on tooth growth in guinea pigs.


Libraries required for this assignment: ggplot2, moments, dplyr