Dataset Description

In the ToothGrowth dataset from the R environment, the variable of interest is the length of odontoblasts, the cells responsible for tooth growth, measured in 60 guinea pigs. Each guinea pig received one of three dose levels of vitamin C (0.5, 1, or 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC).

Load necessary libraries and the ToothGrowth dataset from the datasets package, preparing it for analysis.

library(ggplot2)
library(dplyr)
library(datasets)
data(ToothGrowth)

Convert the ‘dose’ variable to a factor, ensuring correct categorical representation for analysis.

ToothGrowth$dose <- factor(ToothGrowth$dose)

Examine the structure of the ToothGrowth dataset to understand its variables and data types.

str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: Factor w/ 3 levels "0.5","1","2": 1 1 1 1 1 1 1 1 1 1 ...

Exploratory Data Analysis

Scatterplot & Box Plot

Create a scatter plot and box plot to visualize the relationship between tooth length and dose levels, colored by the supplement delivery method.

set.seed(123)
ggplot(ToothGrowth, aes(dose, len)) + 
  geom_boxplot(aes(fill = supp)) +
  geom_jitter(alpha = 0.7, aes(color = supp)) +
  scale_color_manual(values = c("green", "blue")) +
  labs(title = "Scatter Plot of Tooth Length and Dose Levels", 
       x = "Dose Levels", y = "Tooth Length (Millimeters)") +
  theme_minimal()

Density Plot

Create a density plot to show the distribution of tooth lengths for each supplement delivery method.

ggplot(ToothGrowth, aes(len, fill = supp)) + 
  geom_density(alpha = 0.5) +
  scale_fill_manual(values = c("green", "blue")) +
  labs(title = "Density Plot of Tooth Length by Supplement Type", 
       x = "Tooth Length (Millimeters)", y = "Density") +
  theme_minimal()

Mean Tooth Length by Dose Level and Supplement Delivery Method

Create a faceted bar plot to visualize the mean tooth length for each combination of dose level and supplement delivery method.

# Calculate mean tooth length for each combination of dose level and supplement delivery method
mean_lengths <- ToothGrowth %>%
  group_by(dose, supp) %>%
  summarise(mean_length = mean(len))

# Create a faceted bar plot
ggplot(mean_lengths, aes(x = supp, y = mean_length, fill = supp)) +
  geom_bar(stat = "identity") +
  facet_wrap(~dose, scales = "free_x", ncol = 3) +
  labs(title = "Mean Tooth Length by Dose Level and Supplement Delivery Method",
       x = "Supplement Delivery Method",
       y = "Mean Tooth Length (Millimeters)") +
  scale_fill_manual(values = c("green", "blue")) +
  theme_minimal()

Data Summary

Generate summary statistics to gain insights into the central tendency and spread of the tooth growth data.

Summary Statistics

summary(ToothGrowth)
##       len        supp     dose   
##  Min.   : 4.20   OJ:30   0.5:20  
##  1st Qu.:13.07   VC:30   1  :20  
##  Median :19.25           2  :20  
##  Mean   :18.81                   
##  3rd Qu.:25.27                   
##  Max.   :33.90

Distribution by Dose Levels and Delivery Methods

Tabulate the distribution of observations by supplement delivery method and dose levels, providing a categorical overview of the data.

table(ToothGrowth$supp,ToothGrowth$dose)
##     
##      0.5  1  2
##   OJ  10 10 10
##   VC  10 10 10
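The overall summary above does not show how the centre and spread of tooth length vary across the six treatment cells. A minimal base-R sketch of per-cell statistics follows. The names `tg` and `cell_stats` are illustrative and not part of the original analysis; `tg` is a local copy so the working `ToothGrowth` object is left untouched.

```r
# Local copy of the dataset (hypothetical name `tg`)
tg <- datasets::ToothGrowth

# n, mean, and standard deviation of len for each supp x dose cell
cell_stats <- aggregate(
  len ~ supp + dose,
  data = tg,
  FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x))
)

# aggregate() returns the three statistics as one matrix column;
# flatten it into ordinary columns len.n, len.mean, len.sd
cell_stats <- do.call(data.frame, cell_stats)
print(cell_stats)
```

Each row of `cell_stats` corresponds to one delivery method at one dose level, so the cell means can be read against the group means reported by the t-tests later in the analysis.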

Hypothesis Testing Using Confidence Intervals

Using Supplement Delivery Method as a Factor

We assess whether mean tooth length differs between the two supplement delivery methods, pooling across dose levels. Because the two groups are not assumed to share a common variance, we use Welch's two-sample t-test.

  • Null hypothesis (H0): The mean tooth length is the same for both delivery methods.
  • Alternative hypothesis (H1): The mean tooth length differs between the two delivery methods.
  • The significance level (alpha) is set to 0.05, the conventional choice.

t.test(len ~ supp, var.equal = FALSE, data = ToothGrowth)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means between group OJ and group VC is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333

The calculated 95% confidence interval, [-0.1710156, 7.5710156], encompasses zero, and the resulting p-value, 0.06063, exceeds the significance threshold of 0.05. Consequently, we fail to reject the null hypothesis.

Based on this t-test, the data do not provide sufficient evidence at the 0.05 level that mean tooth length differs between the two delivery methods when all dose levels are pooled.
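To make the link between the test and its confidence interval explicit, the Welch statistic, the Welch-Satterthwaite degrees of freedom, and the 95% interval can be computed by hand. This is a sketch under the same setup as the test above; the variable names (`tg`, `oj`, `vc`, `se`, and so on) are illustrative only.

```r
# Local copy of the dataset (hypothetical name `tg`)
tg <- datasets::ToothGrowth
oj <- tg$len[tg$supp == "OJ"]
vc <- tg$len[tg$supp == "VC"]

# Estimated variance of each group's sample mean
v_oj <- var(oj) / length(oj)
v_vc <- var(vc) / length(vc)

# Difference in means (OJ minus VC) and its standard error
diff_means <- mean(oj) - mean(vc)
se <- sqrt(v_oj + v_vc)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_oj + v_vc)^2 /
  (v_oj^2 / (length(oj) - 1) + v_vc^2 / (length(vc) - 1))

# Test statistic, two-sided p-value, and 95% confidence interval
t_stat <- diff_means / se
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci <- diff_means + c(-1, 1) * qt(0.975, df_welch) * se

print(c(t = t_stat, df = df_welch, p = p_value, lower = ci[1], upper = ci[2]))
```

These quantities should agree with the `t.test()` output above. The interval contains zero exactly when the two-sided p-value exceeds 0.05, which is why the two criteria always lead to the same decision.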

Considering Supplement Dosage Level as a Factor

Subset the data for different dose level combinations for further analysis.

# dose is a factor, so each subset matches on its level labels
# Subset for dose levels 0.5 and 1.0
Dose_0510 <- ToothGrowth %>% filter(dose %in% c("0.5", "1"))
# Subset for dose levels 0.5 and 2.0
Dose_0520 <- ToothGrowth %>% filter(dose %in% c("0.5", "2"))
# Subset for dose levels 1.0 and 2.0
Dose_1020 <- ToothGrowth %>% filter(dose %in% c("1", "2"))

Next we test whether mean tooth length differs between dose levels, comparing each pair of doses in turn. As before, the two groups in each comparison are not assumed to have equal variances.

The hypotheses for each of the following three t-tests are:

  • Null hypothesis (H0): The mean tooth length is the same at the two dose levels being compared.
  • Alternative hypothesis (H1): The mean tooth length differs between the two dose levels being compared.

t.test(len ~ dose, var.equal = FALSE, data = Dose_0510)
## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means between group 0.5 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -11.983781  -6.276219
## sample estimates:
## mean in group 0.5   mean in group 1 
##            10.605            19.735

The 95% confidence interval for the difference in means (dose 0.5 minus dose 1), [-11.983781, -6.276219], excludes zero, and the p-value of 1.268e-07 is below the 0.05 significance level. We therefore reject the null hypothesis: mean tooth length is greater at 1 mg/day than at 0.5 mg/day.

t.test(len ~ dose, var.equal = FALSE, data = Dose_0520)
## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means between group 0.5 and group 2 is not equal to 0
## 95 percent confidence interval:
##  -18.15617 -12.83383
## sample estimates:
## mean in group 0.5   mean in group 2 
##            10.605            26.100

The 95% confidence interval for the difference in means (dose 0.5 minus dose 2), [-18.15617, -12.83383], excludes zero, and the p-value of 4.398e-14 is below the 0.05 significance level. We therefore reject the null hypothesis: mean tooth length is greater at 2 mg/day than at 0.5 mg/day.

t.test(len ~ dose, var.equal = FALSE, data = Dose_1020)
## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
##  -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2 
##          19.735          26.100

The 95% confidence interval for the difference in means (dose 1 minus dose 2), [-8.996481, -3.733519], excludes zero, and the p-value of 1.906e-05 is below the 0.05 significance level. We therefore reject the null hypothesis: mean tooth length is greater at 2 mg/day than at 1 mg/day.
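The three pairwise comparisons can also be run in a single call with `pairwise.t.test()` from the stats package. The sketch below is an alternative formulation, not the method used above: with `pool.sd = FALSE` each pair is compared with its own Welch test, and the Holm adjustment is shown only as one possible way to account for making three comparisons. The names `tg`, `unadjusted`, and `holm` are illustrative.

```r
# Local copy with dose as a factor (hypothetical name `tg`)
tg <- datasets::ToothGrowth
tg$dose <- factor(tg$dose)

# Unadjusted p-values: one Welch test per pair of dose levels
unadjusted <- pairwise.t.test(tg$len, tg$dose,
                              p.adjust.method = "none", pool.sd = FALSE)

# Same tests with Holm's adjustment for multiple comparisons
holm <- pairwise.t.test(tg$len, tg$dose,
                        p.adjust.method = "holm", pool.sd = FALSE)

print(unadjusted$p.value)
print(holm$p.value)
```

The unadjusted matrix should match the p-values of the three separate tests above. Holm-adjusted p-values are never smaller than the unadjusted ones, so they give a more conservative check on the same comparisons.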

Considering Supplement Delivery Method As A Factor Within Dose Levels:

Finally, we test whether mean tooth length differs between the two delivery methods within each dose level, again without assuming equal variances between the two groups.

The hypotheses for each of the following three t-tests are:

  • Null hypothesis (H0): At the given dose level, the mean tooth length is the same for both delivery methods.

  • Alternative hypothesis (H1): At the given dose level, the mean tooth length differs between the two delivery methods.

Dose05 <- ToothGrowth %>% filter(dose == "0.5")
Dose10 <- ToothGrowth %>% filter(dose == "1")
Dose20 <- ToothGrowth %>% filter(dose == "2")

t.test(len ~ supp, var.equal = FALSE, data = Dose05)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means between group OJ and group VC is not equal to 0
## 95 percent confidence interval:
##  1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC 
##            13.23             7.98

The 95% confidence interval for the difference in means (OJ minus VC), [1.719057, 8.780943], excludes zero, and the p-value of 0.006359 is below the 0.05 significance level. We therefore reject the null hypothesis: at 0.5 mg/day, mean tooth length is greater with orange juice than with ascorbic acid.

t.test(len ~ supp, var.equal = FALSE, data = Dose10)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means between group OJ and group VC is not equal to 0
## 95 percent confidence interval:
##  2.802148 9.057852
## sample estimates:
## mean in group OJ mean in group VC 
##            22.70            16.77

The 95% confidence interval for the difference in means (OJ minus VC), [2.802148, 9.057852], excludes zero, and the p-value of 0.001038 is below the 0.05 significance level. We therefore reject the null hypothesis: at 1 mg/day, mean tooth length is greater with orange juice than with ascorbic acid.

t.test(len ~ supp, var.equal = FALSE, data = Dose20)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means between group OJ and group VC is not equal to 0
## 95 percent confidence interval:
##  -3.79807  3.63807
## sample estimates:
## mean in group OJ mean in group VC 
##            26.06            26.14

In this case, the 95% confidence interval ranges from -3.79807 to 3.63807, inclusive of zero. Furthermore, the p-value of 0.9639 exceeds the significance level of 0.05. Therefore, we do not have enough evidence to reject the null hypothesis.
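The three within-dose tests above follow the same pattern, so they can be collected into one table by splitting the data on dose and applying the same test to each piece. This is a sketch of that pattern; the names `tg`, `by_dose`, and `results` and the column names are illustrative.

```r
# Local copy of the dataset (hypothetical name `tg`)
tg <- datasets::ToothGrowth

# Run a Welch test of len by supp within each dose level
by_dose <- lapply(split(tg, tg$dose), function(d) {
  tt <- t.test(len ~ supp, data = d, var.equal = FALSE)
  data.frame(
    dose    = d$dose[1],
    diff    = unname(tt$estimate[1] - tt$estimate[2]),  # OJ minus VC
    ci_low  = tt$conf.int[1],
    ci_high = tt$conf.int[2],
    p_value = tt$p.value
  )
})

# One row per dose level
results <- do.call(rbind, by_dose)
print(results)
```

Laying the intervals side by side makes the pattern easier to see: the difference between delivery methods is clearly positive at the two lower doses and close to zero at the highest dose.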

Assumptions Needed For The Conclusions:

  1. Representativeness of Sample Population: It is assumed that the 60 guinea pigs selected for the study are a representative sample of the broader population of guinea pigs. This assumption allows for generalization of the results to a larger population of guinea pigs.
  2. Random Assignment of Guinea Pigs: It is assumed that the guinea pigs were randomly assigned to the Supplement Dose Levels and Supplement Delivery Methods. Random assignment mitigates potential biases and supports attributing observed differences to the treatments rather than to other factors.
  3. Variances Not Assumed Equal: The t-tests use Welch's method, which does not assume that the groups being compared have equal variances. This is less restrictive than the pooled-variance t-test, which requires equal variances. The tests still assume that the observations are independent and that tooth lengths within each group are approximately normally distributed.
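The variance and distributional assumptions can be examined informally from the data. The sketch below computes the sample variance and a Shapiro-Wilk p-value for each treatment cell. It is an illustrative check that was not part of the original analysis, and the names `tg`, `cell_var`, and `cell_shapiro` are hypothetical. With only 10 observations per cell, such checks have limited power and should be read as rough guidance.

```r
# Local copy of the dataset (hypothetical name `tg`)
tg <- datasets::ToothGrowth

# Sample variance of len in each supp x dose cell (2 x 3 matrix)
cell_var <- tapply(tg$len, list(supp = tg$supp, dose = tg$dose), var)

# Shapiro-Wilk normality test p-value in each cell
cell_shapiro <- tapply(
  tg$len, list(supp = tg$supp, dose = tg$dose),
  function(x) shapiro.test(x)$p.value
)

print(round(cell_var, 2))
print(round(cell_shapiro, 3))
```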

Conclusions:

  1. Increase in Supplement Dose Levels leads to an overall increase in Tooth Length: All three pairwise comparisons show a statistically significant difference in mean tooth length between dose levels. Mean tooth length rises from 10.605 at 0.5 mg/day to 19.735 at 1 mg/day and 26.100 at 2 mg/day. This suggests that higher doses of Vitamin C promote tooth growth in guinea pigs.

  2. Supplement Delivery Method has no overall significant impact on Tooth Length: Pooled across dose levels, the difference in mean tooth length between delivery methods is not statistically significant (p-value 0.06063). Within specific dose levels, however, there are differences. At dose levels 0.5 and 1.0, orange juice is associated with significantly greater mean tooth length than ascorbic acid. At the 2.0 dose level, there is no significant difference in mean tooth length between the two delivery methods. This implies that the effect of delivery method on tooth length depends on the dose level.
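The dose-dependent effect of delivery method described in the second conclusion is what a statistical interaction captures. One way to examine it in a single model is a two-way analysis of variance with a `supp:dose` interaction term. This is a different technique from the pairwise Welch tests used above, and unlike them it assumes equal variances across cells. It is offered only as a sketch, and the names `tg` and `fit` are illustrative.

```r
# Local copy with dose as a factor (hypothetical name `tg`)
tg <- datasets::ToothGrowth
tg$dose <- factor(tg$dose)

# Two-way ANOVA with main effects and the supp:dose interaction
fit <- aov(len ~ supp * dose, data = tg)

# The supp:dose row tests whether the delivery-method effect varies by dose
print(summary(fit))
```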