Executive Summary

This report is a course project for the Statistical Inference course in the Data Science Specialization offered by Johns Hopkins University on Coursera.

The project consists of two parts:

Basic Inferential Data Analysis Instructions

We are going to analyze the ToothGrowth data in the R datasets package.

  1. Load the ToothGrowth data and perform some basic exploratory data analyses
  2. Provide a basic summary of the data.
  3. Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only use the techniques from class, even if there’s other approaches worth considering)
  4. State your conclusions and the assumptions needed for your conclusions.

1. Load the ToothGrowth data and perform some basic exploratory data analyses

About the ToothGrowth Dataset

The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice (coded as OJ) or ascorbic acid (a form of vitamin C and coded as VC).

Loading the Data

# Load the plotting packages used below
library(ggplot2)
library(gridExtra)

# Loading the data
data(ToothGrowth)

# Convert the variable dose from a numeric to a factor variable
ToothGrowth$dose <- as.factor(ToothGrowth$dose)

# Display the data
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: Factor w/ 3 levels "0.5","1","2": 1 1 1 1 1 1 1 1 1 1 ...
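Since each animal received one dose by one delivery method, a quick cross-tabulation can confirm how the 60 observations are spread over the six supplement and dose combinations. The following is a minimal sketch; the names `tg` and `design` are chosen here for illustration and are not part of the original analysis.

```r
# Work on a fresh copy so the converted ToothGrowth object is left untouched
tg <- datasets::ToothGrowth

# Count the observations in each supplement x dose cell
design <- table(supp = tg$supp, dose = tg$dose)
print(design)
```

Equal counts in every cell indicate a balanced design, so the group comparisons that follow are not driven by unequal group sizes.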

Basic exploratory data analyses

# Box plots
plot1 <- ggplot(ToothGrowth, aes(x = dose, y = len)) +
    geom_boxplot(fill = "#ff5c82") +
    ggtitle("Tooth growth by dose (mg/day)") +
    theme(axis.text = element_text(face = "bold")) +
    scale_x_discrete(name = "Dose") +
    scale_y_continuous(name = "Tooth length",
                       breaks = seq(0, 35, 5))

# qplot() is deprecated in recent ggplot2 releases, so the faceted plot uses ggplot() directly
plot2 <- ggplot(ToothGrowth, aes(x = supp, y = len)) +
    geom_boxplot(aes(fill = supp)) +
    facet_wrap(~dose) +
    labs(title = "Tooth growth by supplement and dose (mg/day)",
         x = "Supplement",
         y = "Tooth length") +
    theme(legend.position = "none") +
    theme(axis.text = element_text(face = "bold")) +
    scale_y_continuous(breaks = seq(0, 35, 5)) +
    scale_fill_manual(values = c("#fff68f", "#ffec06"))

# Show both plots side by side
grid.arrange(plot1, plot2, ncol = 2)

The box plots suggest that tooth length increases with dose for both supplement types.
OJ appears more effective than VC at the 0.5 and 1 mg/day dose levels, although the difference is not clear at 2 mg/day.

2. Provide a basic summary of the data.

summary(ToothGrowth)
##       len        supp     dose   
##  Min.   : 4.20   OJ:30   0.5:20  
##  1st Qu.:13.07   VC:30   1  :20  
##  Median :19.25           2  :20  
##  Mean   :18.81                   
##  3rd Qu.:25.27                   
##  Max.   :33.90
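The overall summary does not show how tooth length varies across the six supplement and dose groups that are compared later. A possible per-group summary is sketched below; `tg`, `cell_mean`, `cell_sd` and `cell_summary` are illustrative names, not objects from the original report.

```r
# Fresh copy of the data; dose stays numeric here
tg <- datasets::ToothGrowth

# Mean and standard deviation of tooth length for each supplement x dose group
cell_mean <- aggregate(len ~ supp + dose, data = tg, FUN = mean)
cell_sd   <- aggregate(len ~ supp + dose, data = tg, FUN = sd)

# Both aggregate() calls return the groups in the same order, so the columns line up
cell_summary <- data.frame(cell_mean[, c("supp", "dose")],
                           mean = cell_mean$len,
                           sd   = cell_sd$len)
print(cell_summary)
```

The group means in this table are the same values that `t.test()` reports as "sample estimates" in the supplement comparisons.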

3. Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only use the techniques from class, even if there’s other approaches worth considering)

For this question we will use Welch’s t-test, because we want to test the hypothesis that two populations have equal means. It is an adaptation of Student’s t-test that is more reliable when the two samples have unequal variances and/or unequal sample sizes.
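To make the test concrete, the sketch below computes Welch's statistic by hand: the difference in sample means divided by the square root of the sum of each sample variance over its sample size, with degrees of freedom from the Welch-Satterthwaite approximation. The helper `welch_by_hand()` is a hypothetical function written only for this illustration; it is compared against the built-in `t.test()` for the OJ and VC groups at 0.5 mg/day.

```r
# Hypothetical helper: Welch's t statistic and Welch-Satterthwaite degrees of freedom
welch_by_hand <- function(x, y) {
  vx <- var(x) / length(x)   # squared standard error of mean(x)
  vy <- var(y) / length(y)   # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  c(t = t_stat, df = df)
}

tg <- datasets::ToothGrowth
oj <- tg$len[tg$supp == "OJ" & tg$dose == 0.5]
vc <- tg$len[tg$supp == "VC" & tg$dose == 0.5]

by_hand <- welch_by_hand(oj, vc)
builtin <- t.test(oj, vc, var.equal = FALSE)

print(by_hand)
print(c(t = unname(builtin$statistic), df = unname(builtin$parameter)))
```

Both approaches should give the same statistic and degrees of freedom; the non-integer degrees of freedom are a hallmark of Welch's test.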

Testing influence on tooth growth according to supplement and dose:

Hypothesis 1 - Tooth Growth By Supplement and 0.5 mg/day Dose

H0: M1 - M2 = 0. There is no difference in tooth growth given OJ or VC, for a 0.5 mg dose.
H1: M1 - M2 > 0. Tooth growth is greater when using OJ than when using VC, for a 0.5 mg dose.

Where:
M1 is the average of tooth growth length given the OJ supplement and 0.5 mg dose.
M2 is the average of tooth growth length given the VC supplement and 0.5 mg dose.

Welch’s t-Test

VC.length <- ToothGrowth$len[which(ToothGrowth$supp == "VC" & ToothGrowth$dose == 0.5)]
OJ.length <- ToothGrowth$len[which(ToothGrowth$supp == "OJ" & ToothGrowth$dose == 0.5)]
t.test(OJ.length, VC.length, alternative = "greater", paired = FALSE, var.equal = FALSE, conf.level = 0.95)
## 
##  Welch Two Sample t-test
## 
## data:  OJ.length and VC.length
## t = 3.1697, df = 14.969, p-value = 0.003179
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  2.34604     Inf
## sample estimates:
## mean of x mean of y 
##     13.23      7.98

At the 5% significance level (1 - conf.level), there is sufficient evidence to reject the null hypothesis (p-value = 0.003179 < 0.05). For a 0.5 mg/day dose, mean tooth growth is greater with OJ than with VC.

Hypothesis 2 - Tooth Growth By Supplement and 1 mg/day Dose

H0: M1 - M2 = 0. There is no difference in tooth growth given OJ or VC, for a 1 mg dose.
H1: M1 - M2 > 0. Tooth growth is greater when using OJ than when using VC, for a 1 mg dose.

Where:
M1 is the average of tooth growth length given the OJ supplement and 1 mg dose.
M2 is the average of tooth growth length given the VC supplement and 1 mg dose.

Welch’s t-Test

VC.length <- ToothGrowth$len[which(ToothGrowth$supp == "VC" & ToothGrowth$dose == 1.0)]
OJ.length <- ToothGrowth$len[which(ToothGrowth$supp == "OJ" & ToothGrowth$dose == 1.0)]
t.test(OJ.length, VC.length, alternative = "greater", paired = FALSE, var.equal = FALSE, conf.level = 0.95)
## 
##  Welch Two Sample t-test
## 
## data:  OJ.length and VC.length
## t = 4.0328, df = 15.358, p-value = 0.0005192
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  3.356158      Inf
## sample estimates:
## mean of x mean of y 
##     22.70     16.77

At the 5% significance level (1 - conf.level), there is sufficient evidence to reject the null hypothesis (p-value = 0.0005192 < 0.05). For a 1.0 mg/day dose, mean tooth growth is greater with OJ than with VC.

Hypothesis 3 - Tooth Growth By Supplement and 2 mg/day Dose

H0: M1 - M2 = 0. There is no difference in tooth growth given OJ or VC, for a 2 mg dose.
H1: M1 - M2 > 0. Tooth growth is greater when using OJ than when using VC, for a 2 mg dose.

Where:
M1 is the average of tooth growth length given the OJ supplement and 2 mg dose.
M2 is the average of tooth growth length given the VC supplement and 2 mg dose.

Welch’s t-Test

VC.length <- ToothGrowth$len[which(ToothGrowth$supp == "VC" & ToothGrowth$dose == 2.0)]
OJ.length <- ToothGrowth$len[which(ToothGrowth$supp == "OJ" & ToothGrowth$dose == 2.0)]
t.test(OJ.length, VC.length, alternative = "greater", paired = FALSE, var.equal = FALSE, conf.level = 0.95)
## 
##  Welch Two Sample t-test
## 
## data:  OJ.length and VC.length
## t = -0.046136, df = 14.04, p-value = 0.5181
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -3.1335     Inf
## sample estimates:
## mean of x mean of y 
##     26.06     26.14

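At the 5% significance level (1 - conf.level), we fail to reject the null hypothesis (p-value = 0.5181 > 0.05). For a 2.0 mg/day dose, there is no evidence that tooth growth is greater with OJ than with VC. Failing to reject the null hypothesis does not prove that the two supplements are equivalent.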
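The three supplement comparisons follow the same pattern, so they can be run in one pass and collected in a single table. The sketch below is one way to do that; it also adds a two-sided 95% confidence interval for the OJ minus VC difference, which is an addition for illustration and not part of the original analysis. The names `tg`, `rows` and `supp_tests` are hypothetical.

```r
tg <- datasets::ToothGrowth

# One row per dose: one-sided Welch test (OJ > VC) plus a two-sided 95% CI
supp_tests <- do.call(rbind, lapply(sort(unique(tg$dose)), function(d) {
  rows <- tg[tg$dose == d, ]
  oj <- rows$len[rows$supp == "OJ"]
  vc <- rows$len[rows$supp == "VC"]
  one_sided <- t.test(oj, vc, alternative = "greater", var.equal = FALSE)
  two_sided <- t.test(oj, vc, alternative = "two.sided", var.equal = FALSE)
  data.frame(dose = d,
             t = unname(one_sided$statistic),
             p_one_sided = one_sided$p.value,
             ci_low = two_sided$conf.int[1],
             ci_high = two_sided$conf.int[2])
}))
print(supp_tests)
```

A two-sided interval that contains zero, as expected at 2 mg/day, is consistent with failing to reject the null hypothesis at that dose.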

Testing the impact of different doses

Hypothesis 4 - Tooth Growth By dose 0.5 and 1 mg/day

H0: Null hypothesis, there is no difference in tooth growth given 0.5 or 1 mg/day dose. (M0 - M1 = 0)
H1: Alternative hypothesis, tooth growth is greater when using a 1 mg/day dose than a 0.5 mg/day dose. (M0 - M1 < 0)

Where:
M0 is the average of tooth growth length for a dose of 0.5 mg/day.
M1 is the average of tooth growth length for a dose of 1 mg/day.

Welch’s t-Test
Now we will perform the t-test using the two tooth length vectors dose_0.5 and dose_1.

dose_0.5 <- ToothGrowth$len[ToothGrowth$dose == "0.5"]
dose_1   <- ToothGrowth$len[ToothGrowth$dose == "1"]
t.test(dose_0.5, dose_1, 
       alternative = "less", # alternative: dose_0.5 has a smaller mean than dose_1, as the box plots suggest
       paired = FALSE,       # the data points are not paired with each other
       var.equal = FALSE,    # do not assume equal variances (Welch's test)
       conf.level = 0.95)
## 
##  Welch Two Sample t-test
## 
## data:  dose_0.5 and dose_1
## t = -6.4766, df = 37.986, p-value = 6.342e-08
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##       -Inf -6.753323
## sample estimates:
## mean of x mean of y 
##    10.605    19.735

At the 5% significance level (1 - conf.level), there is sufficient evidence to reject the null hypothesis (p-value = 6.342e-08 < 0.05). Mean tooth growth is greater when a 1 mg/day dose is given instead of a 0.5 mg/day dose.

Hypothesis 5 - Tooth Growth By dose 1 and 2 mg/day

H0: Null hypothesis, there is no difference in tooth growth given 1 or 2 mg/day dose. (M0 - M1 = 0)
H1: Alternative hypothesis, tooth growth is greater when using a 2 mg/day dose than a 1 mg/day dose. (M0 - M1 < 0)

Where:
M0 is the average of tooth growth length for a dose of 1 mg/day.
M1 is the average of tooth growth length for a dose of 2 mg/day.

Welch’s t-Test
Now we will perform the t-test using the two tooth length vectors dose_1 and dose_2.

dose_1   <- ToothGrowth$len[ToothGrowth$dose == "1"]
dose_2   <- ToothGrowth$len[ToothGrowth$dose == "2"]
t.test(dose_1, dose_2, 
       alternative = "less", # alternative: dose_1 has a smaller mean than dose_2, as the box plots suggest
       paired = FALSE,       # the data points are not paired with each other
       var.equal = FALSE,    # do not assume equal variances (Welch's test)
       conf.level = 0.95)
## 
##  Welch Two Sample t-test
## 
## data:  dose_1 and dose_2
## t = -4.9005, df = 37.101, p-value = 9.532e-06
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##      -Inf -4.17387
## sample estimates:
## mean of x mean of y 
##    19.735    26.100

At the 5% significance level (1 - conf.level), there is sufficient evidence to reject the null hypothesis (p-value = 9.532e-06 < 0.05). Mean tooth growth is greater when a 2 mg/day dose is given instead of a 1 mg/day dose.
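Hypotheses 4 and 5 pool both supplement types. To check whether the dose effect also holds within each supplement, the same one-sided Welch test can be repeated separately for OJ and VC. This is a sketch under that assumption, not part of the original analysis; `tg`, `dose_pairs` and `dose_within_supp` are illustrative names.

```r
tg <- datasets::ToothGrowth

# Adjacent dose pairs to compare: lower dose first, higher dose second
dose_pairs <- list(c(0.5, 1), c(1, 2))

# One-sided Welch test (lower dose mean < higher dose mean) within each supplement
dose_within_supp <- do.call(rbind, lapply(levels(tg$supp), function(s) {
  do.call(rbind, lapply(dose_pairs, function(pair) {
    low  <- tg$len[tg$supp == s & tg$dose == pair[1]]
    high <- tg$len[tg$supp == s & tg$dose == pair[2]]
    res  <- t.test(low, high, alternative = "less", var.equal = FALSE)
    data.frame(supp = s, low_dose = pair[1], high_dose = pair[2],
               t = unname(res$statistic), p_value = res$p.value)
  }))
}))
print(dose_within_supp)
```

If every row shows a p-value below 0.05, the pooled conclusion that growth increases with dose also holds for each supplement on its own. Each group then has only 10 observations, so these tests have less power than the pooled ones.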

4. State your conclusions and the assumptions needed for your conclusions.

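After running the hypothesis tests on the ToothGrowth sample, we conclude the following:

  • Tooth length increases with dose: with both supplement types pooled, growth is greater at 1 mg/day than at 0.5 mg/day, and greater at 2 mg/day than at 1 mg/day.
  • OJ is more effective than VC at the 0.5 and 1.0 mg/day doses. At 2.0 mg/day there is no evidence of a difference between the supplement types.

These conclusions assume that the guinea pigs are a representative sample assigned to treatments at random, that the groups are independent (not paired), and that tooth length is approximately normally distributed within each group. Equal variances are not assumed, since Welch’s t-test was used.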