Part 2: Basic Inferential Data Analysis Instructions

This report pertains to the second part of the “Statistical Inference” course project on Coursera. In this section, we perform fundamental inferential analyses utilizing the ToothGrowth dataset, which is accessible within the R datasets package.

Load the required packages.

library(ggplot2)
library(datasets)

1. Load the ToothGrowth data and perform some basic exploratory data analyses

Load the ToothGrowth dataset

data(ToothGrowth)

Display the structure of the dataset

str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

View the first few rows of the dataset

head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5

2. Provide a basic summary of the data.

The ToothGrowth dataset records the length of odontoblasts (the cells responsible for tooth growth) in 60 guinea pigs, with one measurement per animal. Each guinea pig received one of three dose levels of vitamin C (0.5, 1, or 2 mg/day) by one of two delivery methods: orange juice (OJ) or ascorbic acid (VC).

The dataset contains three variables:

  • len: the odontoblast length (the response), a numeric variable.
  • supp: the supplement type, a factor with levels “OJ” (orange juice) and “VC” (ascorbic acid).
  • dose: the vitamin C dose in mg/day (0.5, 1, or 2). As the str() output shows, it is stored as a numeric variable, but because it takes only three values it is treated as a factor (via factor(dose)) in the plots below.
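
A quick way to confirm the layout of this experiment is to cross-tabulate supplement against dose. The sketch below only counts rows per combination with base R's table(); the object name design_counts is illustrative.

```r
# Load the data (ToothGrowth ships with R's datasets package)
data(ToothGrowth)

# Number of guinea pigs in each supplement x dose combination
design_counts <- table(ToothGrowth$supp, ToothGrowth$dose)
print(design_counts)
```

Each of the six cells should contain 10 animals, i.e. a balanced 2 x 3 design with 60 guinea pigs in total.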

Summarize the dataset

summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

2.1 Create a box plot of tooth growth by supplement type.

The box plot shows the distribution of tooth length for each supplement type. Each box spans the interquartile range (IQR), from the first to the third quartile, and the thick line inside the box marks the median. The whiskers extend to the most extreme observations lying within 1.5 times the IQR of the box; any points beyond that are drawn individually as outliers.
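
The box heights can also be read off numerically. As a small sketch, assuming ggplot2's default hinges (which it computes with quantile()'s default method when no weights are given), the interquartile range per supplement is:

```r
data(ToothGrowth)

# IQR = Q3 - Q1 of tooth length within each supplement group.
# IQR() uses quantile type 7, the same default that quantile() uses.
iqr_by_supp <- tapply(ToothGrowth$len, ToothGrowth$supp, IQR)
print(iqr_by_supp)
```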

ggplot(data = ToothGrowth, aes(x = factor(supp), y = len, fill = factor(supp))) +
  geom_boxplot() +
  labs(title = "Tooth Growth by Supplement",
       x = "Supplement",
       y = "Tooth Length",
       fill = "Supplement")

2.2 Create a box plot of tooth growth by dose level.

The box plot shows the distribution of tooth length for each dose level. Each box spans the interquartile range (IQR), the thick line inside the box marks the median, and the whiskers extend to the most extreme observations lying within 1.5 times the IQR of the box; points beyond the whiskers are plotted individually as outliers.
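
The whisker rule can be written out directly. This is a minimal sketch: whisker_summary() is a hypothetical helper, not a ggplot2 function, and it assumes hinges computed with quantile()'s default method, as ggplot2 uses when no weights are supplied. It is applied here to the 0.5 dose group.

```r
data(ToothGrowth)

# Hypothetical helper illustrating the 1.5 x IQR whisker rule
whisker_summary <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)  # lower and upper hinges (Q1, Q3)
  reach <- coef * (q[2] - q[1])                   # 1.5 times the IQR
  lower_fence <- q[1] - reach
  upper_fence <- q[2] + reach
  inside <- x[x >= lower_fence & x <= upper_fence]
  list(
    lower_whisker = min(inside),                           # lowest point within the fence
    upper_whisker = max(inside),                           # highest point within the fence
    outliers = sort(x[x < lower_fence | x > upper_fence])  # points drawn individually
  )
}

dose_half <- ToothGrowth$len[ToothGrowth$dose == 0.5]
whisker_summary(dose_half)
```

Any values returned in outliers correspond to the isolated points that geom_boxplot() draws beyond the whiskers.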

ggplot(data = ToothGrowth, aes(x = factor(dose), y = len, fill = factor(dose))) +
  geom_boxplot() +
  labs(title = "Tooth Growth by Dose",
       x = "Dose",
       y = "Tooth Length",
       fill = "Dose")

Together, the box plots allow a visual comparison of the center (median) and spread (IQR and whiskers) of tooth length across supplement types and dose levels. Median tooth length rises with dose; the formal tests in Section 3 check whether the apparent differences are statistically significant.
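
For a numeric counterpart to the plots, the mean tooth length can be tabulated for each supplement and dose combination, together with the marginal means that the t-tests below report as sample estimates. A short sketch using base R's aggregate() and tapply():

```r
data(ToothGrowth)

# Mean tooth length in each supplement x dose cell
cell_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(cell_means)

# Marginal means by supplement and by dose
print(tapply(ToothGrowth$len, ToothGrowth$supp, mean))
print(tapply(ToothGrowth$len, ToothGrowth$dose, mean))
```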

3. Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only use the techniques from class, even if there are other approaches worth considering.)

3.1 Checking for group differences due to different supplement types (supp):

Subset the dataset based on supplement categories

supp_VC <- subset(ToothGrowth, supp == "VC")
supp_OJ <- subset(ToothGrowth, supp == "OJ")

Calculate the mean tooth growth for each supplement group

mean_VC <- mean(supp_VC$len)
mean_OJ <- mean(supp_OJ$len)

Perform a Welch two-sample t-test (the default in R's t.test(), which does not assume equal variances) to compare tooth growth between the supplement groups

t.test(supp_VC$len, supp_OJ$len)
## 
##  Welch Two Sample t-test
## 
## data:  supp_VC$len and supp_OJ$len
## t = -1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -7.5710156  0.1710156
## sample estimates:
## mean of x mean of y 
##  16.96333  20.66333

The Welch two-sample t-test gives a p-value of 0.06063, which is above the 0.05 significance level, so we fail to reject the null hypothesis that the true difference in mean tooth length between the VC and OJ groups is 0. Consistently, the 95% confidence interval for the difference in means (VC minus OJ), from -7.571 to 0.171, contains 0.
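
To show where the numbers in the output above come from, the Welch statistic, degrees of freedom, p-value and confidence interval can be reproduced by hand. This is a minimal sketch using the standard Welch formulas; it is intended only as a check and should agree with t.test() up to floating-point rounding.

```r
data(ToothGrowth)
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]

# Squared standard error of each group mean: s^2 / n
se2_vc <- var(vc) / length(vc)
se2_oj <- var(oj) / length(oj)
se_diff <- sqrt(se2_vc + se2_oj)

# Welch t statistic for the difference VC - OJ
mean_diff <- mean(vc) - mean(oj)
t_stat <- mean_diff / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_vc + se2_oj)^2 /
  (se2_vc^2 / (length(vc) - 1) + se2_oj^2 / (length(oj) - 1))

# Two-sided p-value and 95% confidence interval for VC - OJ
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci <- mean_diff + c(-1, 1) * qt(0.975, df_welch) * se_diff

c(t = t_stat, df = df_welch, p = p_value, lower = ci[1], upper = ci[2])
```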

3.2 Checking for group differences due to different dose levels (dose):

Create subsets for the desired dose level pairs

ToothGrowth.doses_0.5_1.0 <- subset(ToothGrowth, dose %in% c(0.5, 1.0))
ToothGrowth.doses_0.5_2.0 <- subset(ToothGrowth, dose %in% c(0.5, 2.0))
ToothGrowth.doses_1.0_2.0 <- subset(ToothGrowth, dose %in% c(1.0, 2.0))

Perform Welch t-tests to compare tooth growth between each selected pair of dose levels, without assuming equal variances.

# Comparing tooth growth between dose levels 0.5 and 1.0
t.test(len ~ dose, data = ToothGrowth.doses_0.5_1.0)
## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means between group 0.5 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -11.983781  -6.276219
## sample estimates:
## mean in group 0.5   mean in group 1 
##            10.605            19.735
# Comparing tooth growth between dose levels 0.5 and 2.0
t.test(len ~ dose, data = ToothGrowth.doses_0.5_2.0)
## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means between group 0.5 and group 2 is not equal to 0
## 95 percent confidence interval:
##  -18.15617 -12.83383
## sample estimates:
## mean in group 0.5   mean in group 2 
##            10.605            26.100
# Comparing tooth growth between dose levels 1.0 and 2.0
t.test(len ~ dose, data = ToothGrowth.doses_1.0_2.0)
## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
##  -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2 
##          19.735          26.100

For all three dose pairs, the p-values are far below the 0.05 significance level (the largest is 1.906e-05), and none of the 95% confidence intervals contains zero. We therefore reject the null hypothesis of equal mean tooth length for each pair. The sample means rise from 10.605 (dose 0.5) to 19.735 (dose 1.0) to 26.100 (dose 2.0), indicating that higher dose levels are associated with greater tooth length.
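
The three comparisons above can also be run in a single loop that collects each p-value and confidence interval into one table, which makes the "p < 0.05 and CI excludes zero" check explicit. A sketch, assuming the same Welch t.test() calls as above; the names dose_pairs and pair_results are illustrative.

```r
data(ToothGrowth)

# All pairs of dose levels: (0.5, 1), (0.5, 2), (1, 2)
dose_pairs <- combn(c(0.5, 1, 2), 2, simplify = FALSE)

# Welch t-test for each pair; keep the p-value and 95% CI
pair_results <- do.call(rbind, lapply(dose_pairs, function(pair) {
  sub <- ToothGrowth[ToothGrowth$dose %in% pair, ]
  tt <- t.test(len ~ dose, data = sub)
  data.frame(
    comparison = paste(pair, collapse = " vs "),
    p_value = tt$p.value,
    ci_lower = tt$conf.int[1],
    ci_upper = tt$conf.int[2]
  )
}))

# A CI excludes zero when both endpoints have the same sign
pair_results$excludes_zero <- pair_results$ci_lower > 0 | pair_results$ci_upper < 0
print(pair_results)
```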

4. State your conclusions and the assumptions needed for your conclusions.

Based on this analysis of the ToothGrowth dataset, we conclude that, with doses pooled, the difference in tooth length between the two supplement types is not statistically significant at the 5% level (p = 0.06063, and the 95% confidence interval includes 0). In contrast, increasing the dose level is associated with a significant increase in tooth length: every pairwise dose comparison gave a p-value below 0.05 and a 95% confidence interval that excludes 0. These conclusions rest on the following assumptions:

  • The guinea pigs were randomly assigned to the supplement and dose groups, so the observations are independent and group differences can be attributed to the treatments.
  • The sample is representative of the population of guinea pigs to which the conclusions are meant to apply.
  • Tooth length within each group is approximately normally distributed, as the t-test requires.
  • The groups are not assumed to have equal variances, which is why Welch's t-test is used throughout.

Under these assumptions, the results indicate that higher doses of vitamin C lead to greater tooth growth in guinea pigs.
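
The decision not to assume equal variances can be illustrated by looking at the sample variance of tooth length within each group. This is an informal descriptive sketch, not a formal test of equal variances; the ratio of largest to smallest variance is used here only as a rough indicator.

```r
data(ToothGrowth)

# Sample variance of tooth length within each supplement and each dose group
var_by_supp <- tapply(ToothGrowth$len, ToothGrowth$supp, var)
var_by_dose <- tapply(ToothGrowth$len, ToothGrowth$dose, var)
print(var_by_supp)
print(var_by_dose)

# Rough indicator: ratio of the largest to the smallest dose-group variance
max(var_by_dose) / min(var_by_dose)
```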