Loading the data

We first inspect the distribution of tooth length to check whether it is approximately normal.

# Load the dataset
data(ToothGrowth)
head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
# Distribution of tooth length - to see normality
hist(ToothGrowth$len,
     main = "Distribution of Tooth Length",
     xlab = "Tooth Length",
     ylab = "Frequency",
     col = "skyblue",
     border = "white")
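
A histogram alone can make normality hard to judge with only 60 observations. As a possible complement, the sketch below adds a normal Q-Q plot and a Shapiro-Wilk test using base R's `qqnorm()`, `qqline()`, and `shapiro.test()`. The object name `sw` and the plot title are illustrative choices, not part of the original analysis.

```r
# Load the dataset (ships with base R)
data(ToothGrowth)

# Q-Q plot: points close to the reference line suggest approximate normality
qqnorm(ToothGrowth$len, main = "Normal Q-Q Plot of Tooth Length")
qqline(ToothGrowth$len, col = "red")

# Shapiro-Wilk test: the null hypothesis is that the data are normally distributed
sw <- shapiro.test(ToothGrowth$len)
sw
```

A p-value above 0.05 here would mean there is no evidence against normality, which is not the same as proof that the data are normal.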

1) Does supplement type (OJ vs VC) affect tooth length?

Supplement type: categorical, tooth length: continuous

Null Hypothesis: there is no difference in mean tooth length between the two supplement types. Alternative Hypothesis: there is a difference in mean tooth length between the two supplement types.

The p-value is 0.0606, which is greater than 0.05, so the result is not statistically significant at the 5% level. We therefore do not reject the null hypothesis that mean tooth length is the same for both supplement types.

# Boxplot
boxplot(len ~ supp, data = ToothGrowth,
        main = "Effect of Supplement Type on Tooth Length",
        xlab = "Supplement Type",
        ylab = "Tooth Length",
        col = c("orange", "skyblue"))

t.test(len ~ supp, data = ToothGrowth)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means between group OJ and group VC is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333
# p-value = 0.06063 > 0.05 (do not reject the null hypothesis; insufficient evidence to claim that supplement type affects tooth length)
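
To make the Welch output less of a black box, the statistic can be recomputed by hand. The sketch below uses the standard Welch formulas (difference in means divided by the combined standard error, with Welch-Satterthwaite degrees of freedom). The variable names (`oj`, `vc`, `se2_oj`, and so on) are illustrative and do not come from the original analysis.

```r
data(ToothGrowth)

# Split tooth length by supplement type
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error of each group mean
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic: difference in means over the combined standard error
t_stat <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

c(t = t_stat, df = df_welch, p = p_value)
```

These values should agree with the `t`, `df`, and `p-value` reported by `t.test(len ~ supp, data = ToothGrowth)`.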

2) Does increasing dose increase tooth length?

Dose: categorical (one factor with three levels: 0.5, 1, and 2 mg/day), tooth length: continuous

Null Hypothesis: mean tooth length is the same at every dose level. Alternative Hypothesis: mean tooth length differs for at least one dose level.

Degrees of freedom: n = 60, k = 3. Dosage (ESS df): k - 1 = 3 - 1 = 2. Residuals (RSS df): n - k = 60 - 3 = 57.

F-value: 67.416 (large F value). The between-group mean square (variation among the three dose-group means) is therefore 67.416 times the within-group mean square (variation inside each dose group). Together with the very small p-value, this indicates that dose has a statistically significant effect on tooth length.
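
The F-value and its degrees of freedom can be rebuilt from sums of squares, which may help show where the ratio comes from. This is a minimal sketch of the standard one-way ANOVA decomposition. The names `ss_between`, `ss_within`, `ms_between`, `ms_within`, and `f_value` are illustrative and not taken from the original analysis.

```r
data(ToothGrowth)

len  <- ToothGrowth$len
dose <- factor(ToothGrowth$dose)

n <- length(len)      # total observations
k <- nlevels(dose)    # number of dose groups

group_means <- tapply(len, dose, mean)
group_sizes <- tapply(len, dose, length)

# Between-group sum of squares: group means around the grand mean
ss_between <- sum(group_sizes * (group_means - mean(len))^2)

# Within-group sum of squares: observations around their own group mean
ss_within <- sum((len - ave(len, dose))^2)

# Mean squares divide each sum of squares by its degrees of freedom
ms_between <- ss_between / (k - 1)
ms_within  <- ss_within / (n - k)

# F is the ratio of the two mean squares
f_value <- ms_between / ms_within
p_value <- pf(f_value, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

c(F = f_value, p = p_value)
```

The result should match the `F value` and `Pr(>F)` columns produced by `anova()` on the fitted model.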

# Boxplot by dose
boxplot(len ~ dose, data = ToothGrowth,
        main = "Effect of Dose on Tooth Length",
        xlab = "Dose (mg/day)",
        ylab = "Tooth Length",
        col = c("lightgreen", "skyblue", "orange"))

# Fit linear model
lm_model <- lm(len ~ factor(dose), data = ToothGrowth)

# Perform ANOVA
anova(lm_model) 
## Analysis of Variance Table
## 
## Response: len
##              Df Sum Sq Mean Sq F value    Pr(>F)    
## factor(dose)  2 2426.4  1213.2  67.416 9.533e-16 ***
## Residuals    57 1025.8    18.0                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# F value = 67.416 --> indicates a large difference among the group means relative to the within-group variation (consistent with the very small p-value)
# p-value = 9.53e-16 < 0.05 (reject the null hypothesis: mean tooth length differs significantly across doses; the boxplot shows length rising with dose)
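
The ANOVA F-test only says that at least one dose group differs; it does not give the direction or say which pairs differ. One possible follow-up, not part of the original analysis, is to look at the group means and run Tukey's HSD pairwise comparisons with base R's `aov()` and `TukeyHSD()`. The object names `dose_means`, `aov_model`, and `tukey` are illustrative.

```r
data(ToothGrowth)

# Mean tooth length at each dose level, to see the direction of the effect
dose_means <- tapply(ToothGrowth$len, ToothGrowth$dose, mean)
dose_means

# TukeyHSD() needs an aov fit rather than an lm fit
aov_model <- aov(len ~ factor(dose), data = ToothGrowth)

# Pairwise differences between dose levels with adjusted p-values
tukey <- TukeyHSD(aov_model)
tukey
```

If the group means rise with dose and every pairwise interval excludes zero, that would support the claim that higher doses are associated with longer teeth.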

3) Which test would you use to compare tooth length between two supplement types?

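Use the Welch two-sample t-test: it compares the means of two independent groups, tooth length is approximately normal, and the Welch version does not assume that the two groups have equal variances.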
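
As a rough check on this choice, the Welch test can be compared with the pooled-variance (Student) t-test, which `t.test()` runs when `var.equal = TRUE`. The sketch below is illustrative only, and the names `welch` and `pooled` are not from the original analysis.

```r
data(ToothGrowth)

# Welch test (the default): no equal-variance assumption
welch <- t.test(len ~ supp, data = ToothGrowth)

# Student's t-test: assumes both groups share a common variance
pooled <- t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE)

# Compare degrees of freedom and p-values of the two versions
c(welch_df = unname(welch$parameter), pooled_df = unname(pooled$parameter))
c(welch_p = welch$p.value, pooled_p = pooled$p.value)

# Sample variance of tooth length within each supplement group
tapply(ToothGrowth$len, ToothGrowth$supp, var)
```

When the two group variances look noticeably different, the Welch version is the safer default. When they are similar, both tests give nearly the same answer.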