=============================================================

1. Synopsis [1]

In this assignment, we will conduct an exploratory analysis of the ToothGrowth dataset. After a quick summary analysis of the dataset, we will investigate whether the tooth growth of guinea pigs is affected by dosage and/or delivery method. We will use Hypothesis Testing to conduct the investigation, and then publish our conclusions.

=============================================================

2. ToothGrowth Data - Basic Summary

The R Documentation for the ToothGrowth dataset provides the following description:

“The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice or ascorbic acid (a form of vitamin C and coded as VC).”

The dataset has 3 variables:

  • len: A numeric variable that captures tooth length
  • supp: A factor variable that captures the delivery method by which the Vitamin C supplement was fed to the guinea pigs
  • dose: A numeric variable that captures the dosage amount fed to the guinea pigs - 0.5, 1, or 2 mg/day

Before we perform any summary analysis on the data, let’s save the ToothGrowth dataset as a data frame called tg, and convert dose to a factor variable.

Let’s summarize tg using the summary() function.

      len          supp      dose
 Min.   : 4.20    OJ:30    0.5:20
 1st Qu.:13.07    VC:30    1  :20
 Median :19.25             2  :20
 Mean   :18.81
 3rd Qu.:25.27
 Max.   :33.90

Let’s tabulate the supp and dose variables:

      0.5    1    2
 OJ    10   10   10
 VC    10   10   10

The above summary data enable us to make the following observations:

  • The mean length of odontoblasts is 18.81, with a minimum of 4.20 and a maximum of 33.90
  • 30 guinea pigs received the supplement through OJ; 30 through VC
  • 20 guinea pigs received each of the 0.5, 1 and 2 mg/day doses of the supplement
  • 10 guinea pigs received each dosage + delivery method combination
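
The counts above describe the design but not how tooth length differs between groups. As a minimal sketch (base R only; the name group_means is introduced here for illustration and is not part of the original analysis), the mean tooth length for each delivery method and dosage combination could be tabulated like this:

```r
# Work on a copy of the built-in dataset, with dose as a factor
tg <- ToothGrowth
tg$dose <- factor(tg$dose)

# Mean tooth length for each supp x dose cell (6 cells of 10 animals each)
group_means <- aggregate(len ~ supp + dose, data = tg, FUN = mean)
print(group_means)
```

Comparing the rows of this table gives a first numeric impression of the patterns that the box-plot in Section 3 shows graphically.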

=============================================================

3. Hypothesis Testing - Key Assumptions

Before we decide whether to compare tooth growth by delivery method or dosage quantity, let’s run a quick box-plot analysis on the data. A perusal of the box-plot leads us to hypothesise that both delivery method and dosage may be affecting tooth growth in guinea pigs.

We will test the below concrete claims using Hypothesis Testing:

  1. Orange juice (OJ) is a more effective delivery method than ascorbic acid (VC), in terms of its positive effect on tooth growth
  2. A dosage of 2 mg/day is more effective than a dosage of 1 mg/day, in terms of its positive effect on tooth growth

In the first case, each sample has 30 guinea pigs: all OJ guinea pigs vs. all VC guinea pigs. In the second case, each sample has 20 guinea pigs: all 2 mg/day guinea pigs vs. all 1 mg/day guinea pigs.

Before we perform Hypothesis Testing, however, let’s check whether the data are normally distributed.

First, let’s look at a panel plot that compares the distribution of tooth length by delivery method (supp).

Then, let’s look at a panel plot that compares the distribution of tooth length by dosage (dose).

Both plots suggest that the data, while not normally distributed, are roughly symmetrical and not heavily skewed. In such a case, we can use the t-test to analyse the data, provided the sample size is at least 15 [2]. Since both of our analyses use at least 20 guinea pigs in each sample, we can safely perform the t-tests.
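
The plots are a visual check only. As a complementary sketch, and not something the original analysis ran, a Shapiro-Wilk test could be applied to tooth length within each group. The names sw_by_supp and sw_by_dose are introduced here for illustration; a small p-value would indicate a departure from normality in that group.

```r
tg <- ToothGrowth

# Shapiro-Wilk p-value for tooth length within each delivery method
sw_by_supp <- tapply(tg$len, tg$supp, function(x) shapiro.test(x)$p.value)

# Shapiro-Wilk p-value for tooth length within each dosage level
sw_by_dose <- tapply(tg$len, tg$dose, function(x) shapiro.test(x)$p.value)

print(sw_by_supp)
print(sw_by_dose)
```

With samples this small, a formal test has limited power, so it is best read alongside the plots rather than as a replacement for them.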

Other assumptions:

  • Independence: The samples are independent, not paired
  • Unequal Variances: We do not assume that the samples have equal variances, so the t-tests are run with var.equal = FALSE (Welch’s t-test)
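
The unequal-variance assumption can be looked at directly. The sketch below (base R only; vars_by_supp and vars_by_dose are illustrative names) prints the sample variance of tooth length in each group. Treating the variances as unequal is the more conservative choice, since it remains valid even if the variances happen to be similar.

```r
tg <- ToothGrowth

# Sample variance of tooth length within each delivery method
vars_by_supp <- tapply(tg$len, tg$supp, var)

# Sample variance of tooth length within each dosage level
vars_by_dose <- tapply(tg$len, tg$dose, var)

print(vars_by_supp)
print(vars_by_dose)
```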

=============================================================

4. Hypothesis Testing

Hypothesis 1

Orange juice (OJ) is a more effective delivery method than ascorbic acid (VC), in terms of its effect on tooth growth.

This will be our alternate hypothesis (\(H_{a}\)). The null hypothesis (\(H_{0}\)) then posits that OJ is as effective a delivery method as VC.

Let \(\mu_{1}\) denote the mean odontoblast length (len) in guinea pigs who received dosage through ascorbic acid (VC), and \(\mu_{2}\) denote the mean len in guinea pigs who received dosage through orange juice (OJ). In this case:

  • Null Hypothesis:- \(H_{0} : \mu_{2} - \mu_{1} = 0\)
  • Alternate Hypothesis:- \(H_{a} : \mu_{2} - \mu_{1} > 0\)

We will run a one-sided t-test at the 95% confidence level, i.e. \(\alpha = 0.05\). Let’s run the test in R.
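
For reference, with unequal variances as stated in Section 3, the statistic behind this test is Welch’s t statistic, and its degrees of freedom come from the Welch-Satterthwaite approximation. The symbols \(\bar{x}_{i}\), \(s_{i}^{2}\) and \(n_{i}\) (sample mean, sample variance and sample size of group \(i\)) are notation introduced here, with \(i = 1\) for VC and \(i = 2\) for OJ:

\[
t = \frac{\bar{x}_{2} - \bar{x}_{1}}{\sqrt{\dfrac{s_{1}^{2}}{n_{1}} + \dfrac{s_{2}^{2}}{n_{2}}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_{1}^{2}}{n_{1}} + \dfrac{s_{2}^{2}}{n_{2}}\right)^{2}}{\dfrac{\left(s_{1}^{2}/n_{1}\right)^{2}}{n_{1} - 1} + \dfrac{\left(s_{2}^{2}/n_{2}\right)^{2}}{n_{2} - 1}}
\]

The one-sided p-value is the probability that a t-distributed variable with \(\nu\) degrees of freedom exceeds the observed \(t\).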

We get a p-value of 0.0303173. Since \(0.0303173 < 0.05 (\alpha)\), we will reject the Null Hypothesis.

Thus, we reject the claim that Orange juice (OJ) is as effective a delivery method as ascorbic acid (VC), in terms of its effect on tooth growth.
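
To make the test less of a black box, the sketch below recomputes the Welch statistic, its approximate degrees of freedom and the one-sided p-value by hand, then compares the result with t.test(). It uses base R only; the variable names (oj, vc, t_stat, df_welch, p_manual, p_builtin) are illustrative and do not appear in the appendix code.

```r
tg <- ToothGrowth
oj <- tg$len[tg$supp == "OJ"]
vc <- tg$len[tg$supp == "VC"]

# Squared standard error contributed by each group (variances are not pooled)
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic for the difference in means (OJ minus VC)
t_stat <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# One-sided p-value for H_a: mu_OJ - mu_VC > 0
p_manual <- pt(t_stat, df = df_welch, lower.tail = FALSE)

# Cross-check against the built-in test
p_builtin <- t.test(oj, vc, alternative = "greater",
                    var.equal = FALSE)$p.value
print(c(manual = p_manual, builtin = p_builtin))
```

Both values should agree with the p-value reported above.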


Hypothesis 2

A dosage of 2 mg/day is more effective than a dosage of 1 mg/day, in terms of its positive effect on tooth growth.

This will be our alternate hypothesis (\(H_{a}\)). The null hypothesis (\(H_{0}\)) then posits that a 2 mg/day dosage is as effective as a 1 mg/day dosage.

Let \(\mu_{1}\) denote the mean odontoblast length (len) in guinea pigs who received a dosage of 1 mg/day, and \(\mu_{2}\) denote the mean len in guinea pigs who received a dosage of 2 mg/day. In this case:

  • Null Hypothesis:- \(H_{0} : \mu_{2} - \mu_{1} = 0\)
  • Alternate Hypothesis:- \(H_{a} : \mu_{2} - \mu_{1} > 0\)

We will run a more stringent one-sided t-test at the 99% confidence level, since we don’t want to spend additional money on increased dosage unless the case for rejecting \(H_{0}\) is very strong. The confidence level will be 0.99, and \(\alpha = 0.01\). Let’s run the test in R.

We get a p-value of \(9.5321476\times 10^{-6}\). Since \(9.5321476\times 10^{-6} < 0.01 (\alpha)\), we will reject the Null Hypothesis.

Thus, we reject the claim that a dosage of 2 mg/day is as effective as a dosage of 1 mg/day, in terms of its effect on tooth growth.
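
The same decision can be read from a one-sided confidence bound instead of the p-value. In the sketch below (base R only; dose2, dose1 and res are illustrative names), t.test() with alternative = "greater" returns an interval of the form (lower bound, Inf) for \(\mu_{2} - \mu_{1}\). A lower bound above 0 at the 99% level corresponds to rejecting \(H_{0}\) at \(\alpha = 0.01\).

```r
tg <- ToothGrowth
dose2 <- tg$len[tg$dose == 2]
dose1 <- tg$len[tg$dose == 1]

# One-sided Welch test of H_a: mu_2 - mu_1 > 0 at the 99% confidence level
res <- t.test(dose2, dose1, alternative = "greater",
              var.equal = FALSE, conf.level = 0.99)

# Lower confidence bound for the mean difference; the upper bound is Inf
print(res$conf.int)
print(res$p.value)
```

The lower bound also gives a sense of effect size, which the p-value alone does not.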

=============================================================

5. Conclusions

We reject the following null hypotheses \(H_{0}\):

  1. Orange juice is as effective a delivery method as ascorbic acid
  2. A dosage of 2 mg/day is as effective as a dosage of 1 mg/day

However, before accepting the alternate hypotheses (\(H_{a}\)), we need to make the following assumption:

The probability of failing to reject \(H_{0}\) when \(H_{a}\) is true (Type II Error, denoted by \(\beta\)) is lower than the acceptable threshold we set for it (say, 5%).

Once we have made the above assumption, we can confidently conclude that both delivery method and dosage affect tooth growth in guinea pigs.
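
The Type II error rate \(\beta\) (the probability of failing to reject \(H_{0}\) when \(H_{a}\) is true) can be estimated rather than only assumed. The sketch below is a rough, after-the-fact illustration for the dosage comparison and rests on several assumptions not made in the original analysis: it plugs the observed mean difference and an averaged standard deviation into power.t.test(), which itself assumes equal variances in the two groups. The names delta_hat, sd_hat, pw and beta_hat are illustrative.

```r
tg <- ToothGrowth
dose2 <- tg$len[tg$dose == 2]
dose1 <- tg$len[tg$dose == 1]

# Assumed true effect: the observed difference in means (2 mg/day minus 1 mg/day)
delta_hat <- mean(dose2) - mean(dose1)

# Assumed common standard deviation: root of the averaged group variances
sd_hat <- sqrt((var(dose2) + var(dose1)) / 2)

# Power of a one-sided two-sample t-test with 20 animals per group, alpha = 0.01
pw <- power.t.test(n = 20, delta = delta_hat, sd = sd_hat,
                   sig.level = 0.01, type = "two.sample",
                   alternative = "one.sided")

# Estimated Type II error rate under the assumptions above
beta_hat <- 1 - pw$power
print(beta_hat)
```

Power computed from the observed effect tends to be optimistic, so this is a plausibility check on the assumption, not a proof of it.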

Furthermore, we can conclude that:

  1. Orange juice is a more effective delivery method than ascorbic acid.
  2. The higher the dosage, the higher the tooth growth [3]
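
The second conclusion also relies on the comparison between 1 mg/day and 0.5 mg/day mentioned in reference 3, for which the appendix gives no code. A minimal sketch of that comparison, mirroring the settings of Hypothesis Test 2 (the names dose1, dose05 and p_low are illustrative), could look like this:

```r
tg <- ToothGrowth
dose1  <- tg$len[tg$dose == 1]
dose05 <- tg$len[tg$dose == 0.5]

# One-sided Welch test of H_a: mean(1 mg/day) - mean(0.5 mg/day) > 0
p_low <- t.test(dose1, dose05, paired = FALSE, var.equal = FALSE,
                alternative = "greater", conf.level = 0.99)$p.value
print(p_low)
```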

=============================================================

Appendix

A. Plot Code

SECTION 3: Boxplot of Tooth Growth:

ggplot(tg, aes(supp, len)) +
        geom_boxplot(aes(fill = supp)) +
        facet_wrap( ~ dose) +
        labs(title = "Tooth growth box-plot",
             x = "Dosage (panel) and delivery method (box-plot)",
             y = "Tooth growth")

SECTION 3: Density Plot of Tooth Growth:

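# Histogram and density of tooth length, one panel per delivery method.
# after_stat(density) replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0);
# linewidth replaces size for line widths (ggplot2 >= 3.4.0).
ggplot(tg, aes(len, after_stat(density))) +
        geom_histogram(bins = 20, fill = "tomato2", colour = "black") +
        geom_density(linewidth = 2) +
        facet_grid(supp ~ .) +
        labs(title = "Tooth growth density by method",
             x = "Tooth growth in units",
             y = "Density of tooth growth")

# Same plot, one panel per dosage
ggplot(tg, aes(len, after_stat(density))) +
        geom_histogram(bins = 20, fill = "tomato2", colour = "black") +
        geom_density(linewidth = 2) +
        facet_grid(dose ~ .) +
        labs(title = "Tooth growth density by dosage",
             x = "Tooth growth in units",
             y = "Density of tooth growth")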

B. Other Code

SECTION 2: Loading the “tidyverse” set of packages:

# Install and load the tidyverse set of packages
# install.packages("tidyverse")  # Uncomment if tidyverse is not yet installed
library(tidyverse)

SECTION 2: Saving ToothGrowth as data frame “tg”:

tg <- as_tibble(ToothGrowth)
tg$dose <- factor(tg$dose)

SECTION 2: Summary of tg:

#load the kableExtra package
library(kableExtra)

knitr::kable(
        summary(tg), 
        align = "ccc"
) %>%
        kable_styling(full_width = TRUE)

SECTION 2: Table combining delivery method & dosage:

knitr::kable(
        table(tg$supp, tg$dose),
        align = "ccc"
) %>%
        kable_styling(full_width = TRUE)

SECTION 4: Hypothesis Test 1 - Delivery method impact on tooth growth:

# Create separate data frames by delivery method
tg1 <- tg[tg$supp == "VC",]; tg2 <- tg[tg$supp == "OJ",]
# Isolate the "len" variable of each data frame
tg1len <- tg1$len; tg2len <- tg2$len
# Perform a one-sided Welch t-test (OJ vs. VC) and extract the p-value
pval1 <- t.test(tg2len, tg1len, paired = FALSE, var.equal = FALSE,
                alternative = "greater", conf.level = 0.95)$p.value

SECTION 4: Hypothesis Test 2 - Dosage impact on tooth growth:

# Create separate data frames by dosage
tgdose2 <- tg[tg$dose == 2,]; tgdose1 <- tg[tg$dose == 1,]
# Isolate the "len" variable of each data frame
tgdose2_len <- tgdose2$len; tgdose1_len <- tgdose1$len
# Perform a one-sided Welch t-test (2 mg/day vs. 1 mg/day) and extract the p-value
pval2 <- t.test(tgdose2_len, tgdose1_len, paired = FALSE, 
                var.equal = FALSE, alternative = "greater",
                conf.level = 0.99)$p.value

=============================================================

References


  1. This report is based on an assignment for the online course “Statistical Inference” on coursera.org

  2. Statistics For Business and Economics, Anderson et al (https://www.cengage.com/c/statistics-for-business-economics-14e-anderson/9781337901062PF)

  3. The p-value for the difference between the means of 1 mg/day and 0.5 mg/day is 0.00000006, i.e. less than 0.01. We’ve already shown in the assignment that the p-value for the difference between the 2 mg/day and 1 mg/day dosage means is \(9.5321476\times 10^{-6}\), also < \(\alpha\).