Basic Inferential Data Analysis

By: Mohammed Teslim
6/10/22

Overview

The ToothGrowth dataset ships with R's datasets package. It records the response of 60 guinea pigs, each of which received one of three doses of vitamin C (0.5, 1, or 2 mg/day) by one of two delivery methods: orange juice (OJ) or ascorbic acid (VC). The response, len, is the length of the odontoblasts (the cells responsible for tooth growth) in each animal. This report carries out a basic inferential analysis of the dataset, with the aim of comparing tooth growth by delivery method and by dose.

Load Data and required Packages

We start by loading the data, which ships with R's datasets package, and storing it in an object.

library(tidyverse)

library(datasets)
ToothGrowth <- ToothGrowth
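
Before transforming anything, it can help to confirm that the data has the shape described in the overview. The following is a minimal sketch using base R only; the names tg and cell_counts are introduced here for illustration and are not part of the original analysis.

```r
# Work on a fresh copy so this check does not depend on later transformations.
tg <- datasets::ToothGrowth

# Structure: 60 observations of len (numeric), supp (factor), dose (numeric).
str(tg)

# Number of animals in each delivery-method-by-dose cell.
cell_counts <- table(tg$supp, tg$dose)
cell_counts
```

Each of the six cells should contain 10 animals, which means every comparison later in the report is between two groups of equal size.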

Exploratory Data Analysis

We perform some basic data transformation and exploratory analyses to get an idea of what the data is like.

ToothGrowth <- ToothGrowth %>% mutate(dose = as_factor(dose))

g <- ggplot(ToothGrowth , aes(dose, len, fill = supp)) +
    geom_boxplot() +
    scale_fill_discrete(name = "Delivery Method",
                        labels = c("Orange Juice", "Ascorbic acid")) +
    ylab("Length of Odontoblast") +
    scale_y_continuous(breaks = seq(0, 35, 5))

print(g)

Note that the code converts the dose variable into a factor with three levels, one per dose, so that ggplot draws a separate group of boxes for each dose.

The figure suggests that odontoblast length increases with dose, and that orange juice generally results in longer lengths than ascorbic acid, although the two delivery methods look similar at the highest dose.

Basic summary

knitr::kable(ToothGrowth %>% group_by(supp, dose) %>% 
                 summarise(mean= mean(len), 
                           median = median(len), 
                           Standard.Dev = round(sd(len),2)),
             caption = "Summary Table")
Summary Table

supp   dose   mean    median   Standard.Dev
OJ     0.5    13.23   12.25    4.46
OJ     1      22.70   23.45    3.91
OJ     2      26.06   25.95    2.66
VC     0.5     7.98    7.15    2.75
VC     1      16.77   16.50    2.52
VC     2      26.14   25.95    4.80

The table summarizes the data grouped by delivery method (supp) and dose, reporting the mean, median, and standard deviation of len for each group.

Inferential Tests

First, we compare the two delivery methods at each of the three doses, i.e. we run t.test three times, once per dose. As the summary table shows, the standard deviations differ between groups, so we do not assume equal variances and use the Welch version of the test. The groups are also independent, since each animal received a single dose by a single delivery method, so the tests are unpaired.

Dose0.5 <- filter(ToothGrowth, dose == 0.5)
Dose1.0 <- filter(ToothGrowth, dose == 1.0)
Dose2.0 <- filter(ToothGrowth, dose == 2.0)

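# Welch two-sample t-tests (unequal variances). Independent samples are the
# default; recent R releases reject a 'paired' argument in the formula interface.
t.test(len ~ supp, var.equal = FALSE, data = Dose0.5)
t.test(len ~ supp, var.equal = FALSE, data = Dose1.0)
t.test(len ~ supp, var.equal = FALSE, data = Dose2.0)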
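
The three calls above differ only in the dose being tested, so the same comparison can also be written as a loop that collects the headline numbers in one table. This is a sketch of an alternative layout, not the code used to produce the results below; the names tg, by_dose and delivery_tests are assumptions made for this example, and it works on the raw dataset so that it runs on its own.

```r
tg <- datasets::ToothGrowth

# One Welch test per dose level, keeping only the key quantities.
by_dose <- lapply(split(tg, tg$dose), function(d) {
    res <- t.test(len ~ supp, data = d, var.equal = FALSE)
    data.frame(t        = unname(res$statistic),
               df       = unname(res$parameter),
               p.value  = res$p.value,
               ci.lower = res$conf.int[1],
               ci.upper = res$conf.int[2])
})

# Stack the per-dose rows; row names are the dose levels.
delivery_tests <- do.call(rbind, by_dose)
round(delivery_tests, 4)
```

Collecting the results this way makes it easier to compare the three doses side by side than reading three separate printouts.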

Next, we examine the effect of increasing the dose on odontoblast length by comparing the lowest dose (0.5) with the highest (2.0), separately for each of the two delivery methods.

supp_VC <- filter(ToothGrowth, supp == "VC" & (dose == 0.5 | dose == 2))
supp_OJ <- filter(ToothGrowth, supp == "OJ" & (dose == 0.5 | dose == 2))

# Welch tests again; 'paired' is omitted because independent samples are the default.
t.test(len ~ dose, var.equal = FALSE, data = supp_OJ)
t.test(len ~ dose, var.equal = FALSE, data = supp_VC)
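
A note on direction: with the formula len ~ dose, t.test subtracts the mean of the second group (dose 2) from the mean of the first (dose 0.5), so a negative difference means the higher dose gave longer odontoblasts. The sketch below makes that sign convention explicit for both delivery methods; tg, low_high and dose_tests are illustrative names, and the code uses the raw dataset so that it is self-contained.

```r
tg <- datasets::ToothGrowth

# Keep only the lowest and highest dose.
low_high <- tg[tg$dose %in% c(0.5, 2), ]

# For each delivery method: mean(dose 0.5) - mean(dose 2), its 95% CI, and the p-value.
dose_tests <- sapply(split(low_high, low_high$supp), function(d) {
    res <- t.test(len ~ dose, data = d, var.equal = FALSE)
    c(mean.diff = unname(res$estimate[1] - res$estimate[2]),
      ci.lower  = res$conf.int[1],
      ci.upper  = res$conf.int[2],
      p.value   = res$p.value)
})
dose_tests
```

If both ends of a confidence interval are negative, the data are consistent only with the lower dose producing shorter lengths.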

Results and Conclusion

Comparing Delivery Methods

Dose 0.5

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means between group OJ and group VC is not equal to 0
## 95 percent confidence interval:
##  1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC 
##            13.23             7.98

The p-value is below 0.05 and the 95% confidence interval lies entirely above 0. We therefore reject the null hypothesis that the difference in means is 0 and conclude, at the 5% significance level, that orange juice produces longer odontoblasts than ascorbic acid at a dose of 0.5.
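
To make explicit what the Welch test computes, the statistic, degrees of freedom, and p-value for dose 0.5 can be reproduced from the group means and variances. This is a worked sketch in base R using the standard Welch formulas; the variable names are chosen for this example only.

```r
tg <- datasets::ToothGrowth
oj <- tg$len[tg$supp == "OJ" & tg$dose == 0.5]
vc <- tg$len[tg$supp == "VC" & tg$dose == 0.5]

# Each group's contribution to the squared standard error: s^2 / n.
v_oj <- var(oj) / length(oj)
v_vc <- var(vc) / length(vc)

# Test statistic: difference in means over its standard error.
t_stat <- (mean(oj) - mean(vc)) / sqrt(v_oj + v_vc)

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch <- (v_oj + v_vc)^2 /
    (v_oj^2 / (length(oj) - 1) + v_vc^2 / (length(vc) - 1))

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df_welch)

round(c(t = t_stat, df = df_welch, p = p_value), 4)
```

The values should agree with the t, df, and p-value in the dose 0.5 printout. The non-integer degrees of freedom are a consequence of not pooling the two variances.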

Dose 1.0

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means between group OJ and group VC is not equal to 0
## 95 percent confidence interval:
##  2.802148 9.057852
## sample estimates:
## mean in group OJ mean in group VC 
##            22.70            16.77

The p-value is below 0.05 and the 95% confidence interval lies entirely above 0. We therefore reject the null hypothesis that the difference in means is 0 and conclude, at the 5% significance level, that orange juice produces longer odontoblasts than ascorbic acid at a dose of 1.0.

Dose 2.0

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means between group OJ and group VC is not equal to 0
## 95 percent confidence interval:
##  -3.79807  3.63807
## sample estimates:
## mean in group OJ mean in group VC 
##            26.06            26.14

The p-value is greater than 0.05 and the 95% confidence interval includes 0. We therefore fail to reject the null hypothesis that the difference in means is 0: there is insufficient evidence that the two delivery methods differ in effect at a dose of 2.0.

Comparing Dose 0.5 and Dose 2.0

Orange Juice

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -7.817, df = 14.668, p-value = 1.324e-06
## alternative hypothesis: true difference in means between group 0.5 and group 2 is not equal to 0
## 95 percent confidence interval:
##  -16.335241  -9.324759
## sample estimates:
## mean in group 0.5   mean in group 2 
##             13.23             26.06

The p-value is below 0.05 and the 95% confidence interval lies entirely below 0. We therefore reject the null hypothesis that the difference in means is 0 and conclude, at the 5% significance level, that a dose of 0.5 results in shorter odontoblasts than a dose of 2.0 when orange juice is the delivery method.

Ascorbic Acid

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -10.388, df = 14.327, p-value = 4.682e-08
## alternative hypothesis: true difference in means between group 0.5 and group 2 is not equal to 0
## 95 percent confidence interval:
##  -21.90151 -14.41849
## sample estimates:
## mean in group 0.5   mean in group 2 
##              7.98             26.14

The p-value is below 0.05 and the 95% confidence interval lies entirely below 0. We therefore reject the null hypothesis that the difference in means is 0 and conclude, at the 5% significance level, that a dose of 0.5 results in shorter odontoblasts than a dose of 2.0 when ascorbic acid is the delivery method.