This report analyses the ToothGrowth data in the R datasets package. It begins with a basic exploratory data analysis and a summary of the data, then uses confidence intervals and hypothesis tests to compare tooth growth by supplement type (supp) and dose. We start by loading the datasets library and then the ToothGrowth dataset from it.

library(datasets)
data(ToothGrowth)

Before analysing the dataset, we should understand its format and the information it provides. The R command help(ToothGrowth) displays this documentation.

The dataset is a data frame with 60 observations of three variables: the numeric tooth length (len), the supplement type (supp, either VC or OJ) and the numeric dose in milligrams (dose).
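The description above can be checked directly against the data. The following is a minimal sketch using only base R; it assumes ToothGrowth is available after loading the datasets package, as in the code at the top of this report.

```r
library(datasets)

# Inspect the column types and the first few rows
str(ToothGrowth)
head(ToothGrowth)

# Count the observations in each supplement/dose combination
table(ToothGrowth$supp, ToothGrowth$dose)
```

The cross-tabulation is a quick way to see how the 60 observations are spread over the supplement types and dose levels before any testing is done.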

With the measured variables understood, we can start analysing the dataset.

Question 1. Performing Exploratory Data Analyses

We will use the ggplot2 library to generate the required plot to perform a basic exploratory data analysis.

library(ggplot2)

Now we plot the tooth length against the dose, separated by supplement type, as a bar graph to see whether and how the two are related. Because stat="identity" stacks the individual observations, each bar shows the total tooth length for that dose and supplement group.

We can see from the plot that tooth length increases clearly with the dose level of Vitamin C for both delivery methods.

ggplot(data=ToothGrowth, aes(x=as.factor(dose), y=len, fill=supp)) +
    geom_bar(stat="identity") +
    facet_grid(. ~ supp) +
    xlab("Dose in milligrams") +
    ylab("Tooth length") +
    guides(fill=guide_legend(title="Supplement type"))

[Figure: bar graph of tooth length against dose in milligrams, one panel per supplement type (OJ, VC)]
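The trend read off the plot can also be cross-checked numerically. The sketch below uses base R's aggregate() to compute the mean tooth length for each supplement and dose combination; the variable name group_means is introduced here purely for illustration.

```r
library(datasets)

# Mean tooth length for every supplement/dose combination
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(group_means)
```

If the relationship seen in the plot holds, the mean length should rise with dose within each supplement type.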

Question 2. Basic Summary of the data

We use the summary() function in R to provide a basic summary of the data.

summary(ToothGrowth)
##       len       supp         dose     
##  Min.   : 4.2   OJ:30   Min.   :0.50  
##  1st Qu.:13.1   VC:30   1st Qu.:0.50  
##  Median :19.2           Median :1.00  
##  Mean   :18.8           Mean   :1.17  
##  3rd Qu.:25.3           3rd Qu.:2.00  
##  Max.   :33.9           Max.   :2.00

From this, we find that the interquartile range of the tooth length is \(25.27 - 13.07 = 12.20\), and that the tooth length is distributed with a mean of \(18.81\) and a median of \(19.25\). We also find that exactly half of the population (30) were given Vitamin C through orange juice (OJ), whilst the rest were given the Vitamin C supplement directly (VC).
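The interquartile range can also be computed directly rather than read off the summary table. This is a small sketch using base R's quantile() and IQR(); the variable names len, q and iqr_len are chosen here for illustration.

```r
library(datasets)

len <- ToothGrowth$len

# First and third quartiles, using R's default quantile method
q <- quantile(len, probs = c(0.25, 0.75))

# IQR() returns the difference between the third and first quartiles
iqr_len <- IQR(len)

print(q)
print(iqr_len)
```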

Question 3. Comparing tooth growth by supplement and dose

To compare tooth growth by supplement and dose, we first need to split the data into groups, either by supplement or by dose. A two-sample t-test needs a grouping factor with exactly two levels. Supplement type has two levels (OJ and VC), which makes it the natural grouping factor for the tests, whilst dose has three levels, which makes it an apt choice for dividing the population initially. I therefore chose to subset the data by dose.

g1 <- subset(ToothGrowth, ToothGrowth$dose==0.5)
g2 <- subset(ToothGrowth, ToothGrowth$dose==1.0)
g3 <- subset(ToothGrowth, ToothGrowth$dose==2.0)

Here g1 corresponds to the subset of the population that received a dose of 0.5 mg, g2 to the subset that received 1.0 mg and g3 to the subset that received 2.0 mg.

Next, we perform a t-test on each subset, with tooth length grouped by supplement type.
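As an alternative to three separate subset() calls, the same grouping can be sketched with base R's split(), which returns one data frame per dose level. The name by_dose is hypothetical and is not used elsewhere in this report.

```r
library(datasets)

# split() returns a named list with one data frame per dose level
by_dose <- split(ToothGrowth, ToothGrowth$dose)

names(by_dose)         # dose levels used as list names
sapply(by_dose, nrow)  # number of observations in each subset
```

Keeping the subsets in a list makes it easy to apply the same test to every dose level without repeating code.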

t.test(len ~ supp, data=g1, var.equal=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 3.17, df = 14.97, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.719 8.781
## sample estimates:
## mean in group OJ mean in group VC 
##            13.23             7.98
t.test(len ~ supp, data=g2, var.equal=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 4.033, df = 15.36, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.802 9.058
## sample estimates:
## mean in group OJ mean in group VC 
##            22.70            16.77
t.test(len ~ supp, data=g3, var.equal=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = -0.0461, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.798  3.638
## sample estimates:
## mean in group OJ mean in group VC 
##            26.06            26.14

Let the mean tooth length for the orange juice supplement be represented by \(\mu_O\) and the mean tooth length for the Vitamin C supplement be represented by \(\mu_V\).

The null hypothesis is \(H_0 : \mu_O = \mu_V\), and

the alternative hypothesis is \(H_A : \mu_O \neq \mu_V\).
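For reference, the statistic reported by the Welch two-sample t-test can be written as follows. The symbols \(\bar{x}\), \(s^2\) and \(n\) for the sample mean, sample variance and sample size of each supplement group within a dose subset are introduced here for illustration and do not appear elsewhere in this report.

\[ t = \frac{\bar{x}_O - \bar{x}_V}{\sqrt{\dfrac{s_O^2}{n_O} + \dfrac{s_V^2}{n_V}}} \]

The degrees of freedom are approximated by the Welch-Satterthwaite formula, which is why the test output reports non-integer values of df:

\[ \nu \approx \frac{\left(\dfrac{s_O^2}{n_O} + \dfrac{s_V^2}{n_V}\right)^2}{\dfrac{\left(s_O^2/n_O\right)^2}{n_O - 1} + \dfrac{\left(s_V^2/n_V\right)^2}{n_V - 1}} \]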

For a dose of 0.5 mg, the 95% confidence interval for the difference of the means, \((1.719, 8.781)\), lies entirely above zero. Hence for this level of dosage, we reject the null hypothesis and conclude that \(\mu_O > \mu_V\).

For a dose of 1.0 mg, the 95% confidence interval for the difference of the means, \((2.802, 9.058)\), also lies entirely above zero. Hence for this level of dosage, we reject the null hypothesis and conclude that \(\mu_O > \mu_V\).

For a dose of 2.0 mg, the 95% confidence interval for the difference of the means, \((-3.798, 3.638)\), contains zero. Hence for this level of dosage, we fail to reject the null hypothesis: the data provide no evidence that \(\mu_O\) and \(\mu_V\) differ.
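The confidence intervals and p-values used in this comparison can be gathered into a single table instead of being read from three separate printouts. The sketch below is one way to do this under stated assumptions: the helper welch_by_supp and the object results are hypothetical names, and the paired argument is left at its default so that the test is unpaired.

```r
library(datasets)

# Run the Welch t-test of len by supp within one dose subset and
# return the confidence interval bounds and the p-value
welch_by_supp <- function(d) {
  tt <- t.test(len ~ supp, data = d, var.equal = FALSE)
  c(ci_lower = tt$conf.int[1], ci_upper = tt$conf.int[2], p_value = tt$p.value)
}

# One row per dose level, one column per reported quantity
results <- t(sapply(split(ToothGrowth, ToothGrowth$dose), welch_by_supp))
print(round(results, 4))
```

An interval that excludes zero corresponds to rejecting the null hypothesis at the 5% level, whilst an interval that contains zero corresponds to failing to reject it.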

Question 4. Conclusions and Assumptions

We can conclude that if the dosage amount is high (e.g. 2 mg), there is no detectable difference in tooth length between the delivery/supplement types. However, for the smaller dosage amounts (0.5 mg and 1.0 mg), orange juice is a better Vitamin C source than the Vitamin C supplement itself for tooth length.

Note that these conclusions rest on several assumptions: that there is no bias in the sample dataset observed, that the observations in the two supplement groups are independent (the tests are unpaired and do not assume equal variances) and, more importantly, that no other factors affect tooth length, so that Vitamin C is the major factor that affects it.