Inspect the distribution of tooth length to check whether it is approximately normal.
# Load the dataset
data(ToothGrowth)
head(ToothGrowth)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
# Distribution of tooth length - to see normality
hist(ToothGrowth$len,
     main = "Distribution of Tooth Length",
     xlab = "Tooth Length",
     ylab = "Frequency",
     col = "skyblue",
     border = "white")
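A histogram alone is a fairly coarse check of normality. As a complementary sketch (not part of the original analysis), a normal Q-Q plot and a Shapiro-Wilk test from base R could be added. The object name `sw` is an illustrative choice.

```r
# Load the dataset (ships with base R)
data(ToothGrowth)

# Normal Q-Q plot: points close to the reference line suggest approximate normality
qqnorm(ToothGrowth$len, main = "Normal Q-Q Plot of Tooth Length")
qqline(ToothGrowth$len, col = "red")

# Shapiro-Wilk test: the null hypothesis is that the data are normally distributed
sw <- shapiro.test(ToothGrowth$len)
print(sw)
```

A p-value above 0.05 would mean there is no evidence against normality. The t-test assumption applies to each group, so the same check could be repeated separately for each supplement type.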
Supplement type: categorical, tooth length: continuous
Null hypothesis: mean tooth length does not differ between supplement types (OJ and VC). Alternative hypothesis: mean tooth length differs between supplement types.
The Welch t-test gives a p-value of 0.061, which is above the 0.05 significance level, so we fail to reject the null hypothesis: there is insufficient evidence that mean tooth length differs between supplement types.
# Boxplot
boxplot(len ~ supp, data = ToothGrowth,
        main = "Effect of Supplement Type on Tooth Length",
        xlab = "Supplement Type",
        ylab = "Tooth Length",
        col = c("orange", "skyblue"))
t.test(len ~ supp, data = ToothGrowth)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means between group OJ and group VC is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
# p-value = 0.06063 > 0.05 (do not reject the null hypothesis; insufficient evidence to claim that supplement type affects tooth length)
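To make the Welch output above less of a black box, the following sketch recomputes the t statistic, the Welch-Satterthwaite degrees of freedom and the two-sided p-value by hand. The variable names (`oj`, `vc`, `se2_oj`, `se2_vc`, `t_stat`, `welch_df`, `p_value`) are illustrative and not part of the original analysis.

```r
data(ToothGrowth)

# Split tooth length by supplement type
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error of each group mean (variance / sample size)
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic: difference in means over the combined standard error
t_stat <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite approximation of the degrees of freedom
welch_df <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(abs(t_stat), df = welch_df, lower.tail = FALSE)

round(c(t = t_stat, df = welch_df, p = p_value), 4)
```

These values should agree with the `t.test()` output (t = 1.9153, df = 55.309, p-value = 0.06063).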
Dosage: categorical (1 factor with 3 levels: 0.5, 1 and 2 mg/day), tooth length: continuous
Null hypothesis: mean tooth length is the same at all three dosage levels. Alternative hypothesis: mean tooth length differs for at least one dosage level.
Degrees of freedom: n = 60, k = 3. Dosage (ESS df): k - 1 = 3 - 1 = 2. Residuals (RSS df): n - k = 60 - 3 = 57.
F-value: 67.416 (large). The F-value is the ratio of the between-group mean square to the within-group mean square (1213.2 / 18.0), so the variation between the three dosage groups is about 67 times larger than the variation within each group. Together with the very small p-value (9.533e-16), this shows that mean tooth length differs significantly across dosage levels.
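The degrees of freedom and F-value described above can be reproduced by hand from the sums of squares. This is a minimal sketch with illustrative variable names (`dose_f`, `ss_between`, `ss_within`, `f_value`, and so on); it is meant as a cross-check of the `anova()` table, not a replacement for it.

```r
data(ToothGrowth)

len <- ToothGrowth$len
dose_f <- factor(ToothGrowth$dose)   # treat dose as a categorical factor

n <- length(len)                     # total observations
k <- nlevels(dose_f)                 # number of dosage levels

grand_mean  <- mean(len)
group_means <- tapply(len, dose_f, mean)
group_sizes <- tapply(len, dose_f, length)

# Between-group (explained) and within-group (residual) sums of squares
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within  <- sum((len - ave(len, dose_f))^2)  # ave() returns each row's group mean

df_between <- k - 1
df_within  <- n - k

# F is the ratio of the two mean squares
f_value <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

round(c(df1 = df_between, df2 = df_within, F = f_value), 3)
```

The result should match the `factor(dose)` row of the ANOVA table (Df = 2 and 57, F = 67.416).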
# Boxplot by dose
boxplot(len ~ dose, data = ToothGrowth,
        main = "Effect of Dose on Tooth Length",
        xlab = "Dose (mg/day)",
        ylab = "Tooth Length",
        col = c("lightgreen", "skyblue", "orange"))
# Fit linear model
lm_model <- lm(len ~ factor(dose), data = ToothGrowth)
# Perform ANOVA
anova(lm_model)
## Analysis of Variance Table
##
## Response: len
## Df Sum Sq Mean Sq F value Pr(>F)
## factor(dose) 2 2426.4 1213.2 67.416 9.533e-16 ***
## Residuals 57 1025.8 18.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# F value = 67.416 --> indicates a large difference between the group means relative to the within-group variation (consistent with the very small p-value)
# p-value = 9.53e-16 < 0.05 (reject the null hypothesis; statistically significant difference in mean tooth length between doses, and the boxplot shows tooth length increasing with dose)
A Welch two-sample t-test is used to compare the supplement types because tooth length is approximately normally distributed and the Welch test does not assume equal variances in the two groups.
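The ANOVA only establishes that at least one dose group differs. To back up the directional claim that tooth length increases with dose, one possible follow-up (an assumption here, not part of the original analysis) is Tukey's HSD from base R. The names `tg`, `dose_f`, `aov_model` and `tukey` are illustrative.

```r
data(ToothGrowth)

# Work on a copy with dose as an explicit factor column
tg <- ToothGrowth
tg$dose_f <- factor(tg$dose)

# Same one-way model as above, fitted with aov() so TukeyHSD() accepts it
aov_model <- aov(len ~ dose_f, data = tg)

# Pairwise comparisons of dose levels with family-wise error control
tukey <- TukeyHSD(aov_model)
print(tukey)

# Positive "diff" values mean the higher dose has the larger mean tooth length
tukey$dose_f[, c("diff", "p adj")]
```

If every pairwise difference is positive with an adjusted p-value below 0.05, the data support the statement that each increase in dose is associated with longer teeth.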