This report covers the second part of the “Statistical Inference” course project on Coursera. In this part, we perform basic inferential analyses of the ToothGrowth dataset, which is available in the R datasets package.
Import libraries.
library(ggplot2)
library(datasets)
Load the ToothGrowth dataset
data(ToothGrowth)
Display the structure of the dataset
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
View the first few rows of the dataset
head(ToothGrowth)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
The ToothGrowth dataset records the length of odontoblasts (the cells responsible for tooth growth) in guinea pigs under different experimental conditions. Specifically, the data includes one measurement for each of 60 guinea pigs.
The dataset contains three variables:
len: The length of odontoblasts observed in the guinea pigs. This is a numeric variable representing the measured length.
supp: The supplement used in the experiment, either “VC” (ascorbic acid) or “OJ” (orange juice). This is a factor variable.
dose: The dose level of Vitamin C administered (mg/day): 0.5, 1, or 2. It is stored as a numeric variable (see the str output above) but takes only three values, so it is treated as a grouping variable in the plots and tests below.
Summarize the dataset
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
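The summary above does not show how the 60 observations are spread across the experimental conditions. As a minimal sketch, the counts per supplement and dose combination can be tabulated with base R; the object names (cell_counts, dose_values) are illustrative and not part of the original report.

```r
library(datasets)

# Cross-tabulate the number of guinea pigs per supplement and dose
cell_counts <- table(ToothGrowth$supp, ToothGrowth$dose)
print(cell_counts)

# dose is stored as numeric; list the distinct levels it actually takes
dose_values <- sort(unique(ToothGrowth$dose))
print(dose_values)
```

A balanced table (the same count in every cell) means that pooled comparisons by supplement or by dose are not distorted by unequal group sizes.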
The box plot shows the distribution of tooth length for each supplement type. Each box spans the interquartile range (IQR), with the median indicated by the thick line inside the box. The whiskers extend to the most extreme observations that lie within 1.5 times the IQR of the box edges; any points beyond that are drawn individually.
ggplot(data = ToothGrowth, aes(x = factor(supp), y = len, fill = factor(supp))) +
  geom_boxplot() +
  labs(title = "Tooth Growth by Supplement",
       x = "Supplement",
       y = "Tooth Length",
       fill = "Supplement")
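To make the box plot elements concrete, the following sketch computes them by hand for the OJ group. It assumes the default geom_boxplot() settings (quantile-based box edges and a whisker coefficient of 1.5); the variable names are illustrative only.

```r
library(datasets)

# Tooth lengths for the orange juice group
len_oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]

# Box: first quartile, median, third quartile
quartiles <- quantile(len_oj, probs = c(0.25, 0.5, 0.75), names = FALSE)
iqr <- quartiles[3] - quartiles[1]

# Fences: 1.5 * IQR beyond the box edges
lower_fence <- quartiles[1] - 1.5 * iqr
upper_fence <- quartiles[3] + 1.5 * iqr

# Whiskers end at the most extreme observations inside the fences;
# anything outside would be plotted as an individual point
inside <- len_oj[len_oj >= lower_fence & len_oj <= upper_fence]
whisker_ends <- range(inside)
outliers <- len_oj[len_oj < lower_fence | len_oj > upper_fence]

print(quartiles)
print(whisker_ends)
print(outliers)
```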
The box plot displays the distribution of tooth length for each dose level. Each box spans the interquartile range (IQR), with the thick line inside the box indicating the median. The whiskers extend to the most extreme observations that lie within 1.5 times the IQR of the box edges.
ggplot(data = ToothGrowth, aes(x = factor(dose), y = len, fill = factor(dose))) +
  geom_boxplot() +
  labs(title = "Tooth Growth by Dose",
       x = "Dose",
       y = "Tooth Length",
       fill = "Dose")
By examining the box plots, we can visually compare the tooth growth patterns across different dose levels and supplement types. The box plots provide insights into the variability and central tendency of tooth length within each category.
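The visual comparison can be complemented with numbers. As a rough sketch, the mean and standard deviation of tooth length for each supplement and dose combination can be computed with aggregate(); the names group_means, group_sds, and group_summary are illustrative and not part of the original report.

```r
library(datasets)

# Mean and standard deviation of tooth length per supplement/dose cell
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
group_sds <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = sd)

# Combine both summaries into one table with columns len_mean and len_sd
group_summary <- merge(group_means, group_sds,
                       by = c("supp", "dose"),
                       suffixes = c("_mean", "_sd"))
print(group_summary)
```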
Tooth growth by supplement (supp):
Subset the dataset based on supplement categories
supp_VC <- subset(ToothGrowth, supp == "VC")
supp_OJ <- subset(ToothGrowth, supp == "OJ")
Calculate the mean tooth growth for each supplement group
mean_VC <- mean(supp_VC$len)
mean_OJ <- mean(supp_OJ$len)
Perform a two-sample t-test to compare tooth growth between supplement groups
t.test(supp_VC$len, supp_OJ$len)
##
## Welch Two Sample t-test
##
## data: supp_VC$len and supp_OJ$len
## t = -1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -7.5710156 0.1710156
## sample estimates:
## mean of x mean of y
## 16.96333 20.66333
The Welch two-sample t-test, which pools all dose levels, estimates the difference in mean tooth length (VC minus OJ) at 16.963 - 20.663 = -3.70. At the 5% significance level we fail to reject the null hypothesis that the true difference in means between the VC and OJ supplement groups is 0 (p-value = 0.06063). The 95% confidence interval for the difference in means, -7.571 to 0.171, includes zero, although only narrowly.
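To show where the reported statistic, degrees of freedom, p-value, and interval come from, the sketch below recomputes them from the group means and variances using the standard Welch formulas and compares them with t.test(). The variable names are illustrative.

```r
library(datasets)

# Tooth lengths by supplement (same groups as in the t-test above)
x <- ToothGrowth$len[ToothGrowth$supp == "VC"]
y <- ToothGrowth$len[ToothGrowth$supp == "OJ"]

# Estimated variance of each sample mean
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# Two-sided p-value and 95% confidence interval for mean(x) - mean(y)
p_value <- 2 * pt(-abs(t_stat), df)
ci <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.975, df) * sqrt(vx + vy)

# Compare the hand computation with R's built-in test
welch <- t.test(x, y)
print(c(t = t_stat, df = df, p = p_value))
print(ci)
print(welch)
```

Because no equal-variance assumption is made, the degrees of freedom are generally not a whole number, which is why the output reports a fractional df.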
Tooth growth by dose (dose):
Create subsets for the desired dose level pairs
ToothGrowth.doses_0.5_1.0 <- subset(ToothGrowth, dose %in% c(0.5, 1.0))
ToothGrowth.doses_0.5_2.0 <- subset(ToothGrowth, dose %in% c(0.5, 2.0))
ToothGrowth.doses_1.0_2.0 <- subset(ToothGrowth, dose %in% c(1.0, 2.0))
Perform t-tests to compare tooth growth between the selected dose level pairs, assuming unequal variances.
# Comparing tooth growth between dose levels 0.5 and 1.0
t.test(len ~ dose, data = ToothGrowth.doses_0.5_1.0)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means between group 0.5 and group 1 is not equal to 0
## 95 percent confidence interval:
## -11.983781 -6.276219
## sample estimates:
## mean in group 0.5 mean in group 1
## 10.605 19.735
# Comparing tooth growth between dose levels 0.5 and 2.0
t.test(len ~ dose, data = ToothGrowth.doses_0.5_2.0)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means between group 0.5 and group 2 is not equal to 0
## 95 percent confidence interval:
## -18.15617 -12.83383
## sample estimates:
## mean in group 0.5 mean in group 2
## 10.605 26.100
# Comparing tooth growth between dose levels 1.0 and 2.0
t.test(len ~ dose, data = ToothGrowth.doses_1.0_2.0)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
## -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2
## 19.735 26.100
For all three dose level pairs, the p-values are far below the significance level of 0.05 and the 95% confidence intervals do not include zero. These findings indicate significant differences in mean tooth length between each pair of dose levels. The group means increase with the dose (10.605, 19.735, and 26.100 for doses 0.5, 1, and 2). Therefore, we reject the null hypothesis of equal means for each pair and conclude that increasing the dose level is associated with an increase in tooth length.
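Because three pairwise tests are run on the same data, it is reasonable to check that the conclusions survive a multiple-comparison correction. The sketch below is an addition to the original analysis, not part of it: it reruns the three Welch tests in a loop and applies the Holm adjustment with p.adjust(). The names dose_pairs, raw_p, and adjusted_p are illustrative.

```r
library(datasets)

# All pairs of distinct dose levels: (0.5, 1), (0.5, 2), (1, 2)
dose_pairs <- combn(sort(unique(ToothGrowth$dose)), 2, simplify = FALSE)

# Welch t-test p-value for each pair of dose levels
raw_p <- sapply(dose_pairs, function(pair) {
  pair_data <- ToothGrowth[ToothGrowth$dose %in% pair, ]
  t.test(len ~ dose, data = pair_data)$p.value
})
names(raw_p) <- sapply(dose_pairs, paste, collapse = " vs ")

# Holm adjustment controls the family-wise error rate across the three tests
adjusted_p <- p.adjust(raw_p, method = "holm")
print(cbind(raw = raw_p, holm = adjusted_p))
```

Given how small the unadjusted p-values are, the adjusted values remain well below 0.05, so the correction does not change the conclusion.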
Based on the analysis of the ToothGrowth dataset, the difference in tooth growth between the two supplement types is not statistically significant at the 5% level when all doses are pooled (p-value = 0.06063). This means the data are inconclusive about a supplement effect; it is not evidence that there is none. Increasing the dose level, in contrast, is associated with a significant increase in tooth length: the t-tests for all three dose level pairs gave p-values below 0.05, and none of the confidence intervals contained zero. These conclusions rest on the following assumptions: the guinea pigs were randomly assigned to the treatments, the sample is representative of the population, the observations in the compared groups are independent (not paired), and the groups may have unequal variances (Welch t-tests). Under these assumptions, the findings suggest that increasing the dose level leads to an increase in tooth growth in guinea pigs.