This exercise analyzes the ToothGrowth data set in R's datasets package. It summarizes the data and compares tooth length (len) by supplement type (supp: orange juice, OJ, or ascorbic acid, VC) and by dose.
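The following base R sketch shows one way to build such a summary. It is not the report's own code. It tabulates the number of observations, the mean and the standard deviation of len for every combination of supplement and dose. The supp-by-dose table layout is an illustrative choice.

```r
# Summary of ToothGrowth by supplement (rows) and dose (columns)
data(ToothGrowth)

counts <- with(ToothGrowth, tapply(len, list(supp, dose), length))
means  <- with(ToothGrowth, tapply(len, list(supp, dose), mean))
sds    <- with(ToothGrowth, tapply(len, list(supp, dose), sd))

print(counts)           # observations per supp x dose cell
print(round(means, 2))  # mean len per cell
print(round(sds, 2))    # standard deviation of len per cell
```

The design is balanced, with 10 observations in each of the six cells. This is why every dose comparison later in the report has df = 18.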
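The histogram of len is not a perfect normal distribution, but it is close enough for len to be treated as approximately normal in the t-tests. The two supplement groups (VC and OJ) are assumed to be independent of each other. In this experiment each of the 60 guinea pigs received only one supplement and dose combination.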
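The histogram gives only a rough impression of normality. A supplementary check, not part of the original analysis, combines two tools from base R's stats package:

- a normal Q-Q plot of len;
- a Shapiro-Wilk test run separately within each supplement group.

The sketch does not assume any particular outcome. A small Shapiro-Wilk p-value would point to a departure from normality in that group.

```r
data(ToothGrowth)

# Visual check: the points should lie close to the reference line
# if len is approximately normally distributed
qqnorm(ToothGrowth$len, main = "Normal Q-Q plot of len")
qqline(ToothGrowth$len)

# Shapiro-Wilk test (H0: the data come from a normal distribution),
# run separately for the OJ and VC groups
sw_tests <- lapply(split(ToothGrowth$len, ToothGrowth$supp), shapiro.test)
sw_pvalues <- sapply(sw_tests, function(res) res$p.value)
print(sw_pvalues)
```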
A two-sample Student's t-test (which assumes equal variances in the two groups) is used to determine whether OJ and VC differ in their effect on tooth growth.
##
## Two Sample t-test
##
## data: ToothGrowth$len by ToothGrowth$supp
## t = 1.9153, df = 58, p-value = 0.06039
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1670064 7.5670064
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
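The pooled-variance t statistic behind this output can be reproduced by hand. With sample sizes n1 and n2 and sample variances s1^2 and s2^2:

- the pooled variance is sp^2 = ((n1 - 1) s1^2 + (n2 - 1) s2^2) / (n1 + n2 - 2);
- the test statistic is t = (mean1 - mean2) / (sp * sqrt(1/n1 + 1/n2));
- t has n1 + n2 - 2 degrees of freedom.

The sketch below should reproduce t = 1.9153, df = 58 and p-value = 0.06039 from the output above. Its variable names are illustrative.

```r
data(ToothGrowth)

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]
n1 <- length(oj)
n2 <- length(vc)

# Pooled variance: weighted average of the two sample variances
sp2 <- ((n1 - 1) * var(oj) + (n2 - 1) * var(vc)) / (n1 + n2 - 2)
# t statistic for H0: mu_OJ - mu_VC = 0
t_manual <- (mean(oj) - mean(vc)) / sqrt(sp2 * (1 / n1 + 1 / n2))
df_manual <- n1 + n2 - 2
# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

c(t = t_manual, df = df_manual, p = p_manual)
```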
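The p-value (0.06039) is greater than 0.05, and the 95 percent confidence interval (-0.167, 7.567) contains 0. We therefore fail to reject the null hypothesis at the 5% significance level. With all doses pooled, there is no statistically significant difference in tooth growth between the two supplements.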
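The decision rule can be written out explicitly. For a two-sided test at level alpha = 0.05, two rules give the same decision:

- reject H0 when the p-value is below alpha;
- reject H0 when the (1 - alpha) confidence interval excludes 0.

The sketch below checks both conditions on the pooled OJ vs VC test. The names alpha, reject_by_p and reject_by_ci are illustrative.

```r
data(ToothGrowth)

alpha <- 0.05
res <- t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE,
              conf.level = 1 - alpha)

# Rule 1: reject H0 if the p-value is below alpha
reject_by_p <- res$p.value < alpha
# Rule 2: reject H0 if the (1 - alpha) confidence interval excludes 0
ci <- res$conf.int
reject_by_ci <- ci[1] > 0 || ci[2] < 0

data.frame(p_value = res$p.value, ci_lower = ci[1], ci_upper = ci[2],
           reject_by_p = reject_by_p, reject_by_ci = reject_by_ci)
```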
The dose variable takes three values: 0.5, 1 and 2 mg/day. The analysis tests whether increasing the dose affects growth. The data set is divided into groups by supplement, and within each supplement two adjacent dose levels are compared at a time.
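The four comparisons follow the same pattern, so they can also be generated in one pass. The sketch below is a compact alternative to the separate subsets t1 to t4 used in this report, not the original code. It does three things:

- loops over the two supplements;
- runs a Student's t-test on each adjacent dose pair (0.5 vs 1, then 1 vs 2);
- collects each t statistic and p-value in a data frame.

```r
data(ToothGrowth)

# Adjacent dose pairs to compare within each supplement
dose_pairs <- list(c(0.5, 1), c(1, 2))

results <- do.call(rbind, lapply(levels(ToothGrowth$supp), function(s) {
  do.call(rbind, lapply(dose_pairs, function(d) {
    # Rows for supplement s at the two doses in d
    sub <- subset(ToothGrowth, supp == s & dose %in% d)
    tt <- t.test(len ~ dose, data = sub, var.equal = TRUE)
    data.frame(supp = s, dose_low = d[1], dose_high = d[2],
               t = unname(tt$statistic), p_value = tt$p.value)
  }))
}))
results
```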
For OJ, the dose = 0.5 group is compared with the dose = 1 group:
##
## Two Sample t-test
##
## data: t1$len by t1$dose
## t = -5.0486, df = 18, p-value = 8.358e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -13.410814 -5.529186
## sample estimates:
## mean in group 0.5 mean in group 1
## 13.23 22.70
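The mean increases when the dose is increased (13.23 in group 0.5 vs 22.70 in group 1). The difference is significant: the p-value (8.358e-05) is well below 0.05, and the 95 percent confidence interval (-13.41, -5.53) does not contain 0.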
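The claim here is directional: a higher dose gives longer teeth. A one-sided test matches it more closely than the two-sided test reported above, so the sketch below adds one as a supplementary check.

Here `t.test()` takes the dose 0.5 mean minus the dose 1 mean. "The higher dose has the larger mean" is therefore `alternative = "less"`. Because t is negative here, the one-sided p-value is exactly half the two-sided one.

```r
data(ToothGrowth)
t1 <- subset(ToothGrowth, supp == "OJ" & dose %in% c(0.5, 1))

two_sided <- t.test(len ~ dose, data = t1, var.equal = TRUE)
# H1: mean(dose 0.5) - mean(dose 1) < 0, i.e. the higher dose gives longer teeth
one_sided <- t.test(len ~ dose, data = t1, var.equal = TRUE,
                    alternative = "less")

c(two_sided = two_sided$p.value, one_sided = one_sided$p.value)
```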
For OJ, the dose = 1 group is compared with the dose = 2 group:
##
## Two Sample t-test
##
## data: t2$len by t2$dose
## t = -2.2478, df = 18, p-value = 0.03736
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -6.5005017 -0.2194983
## sample estimates:
## mean in group 1 mean in group 2
## 22.70 26.06
Again, the mean increases with the dose (22.70 in group 1 vs 26.06 in group 2). The difference is significant at the 5% level (p-value = 0.03736), and the confidence interval (-6.50, -0.22) does not contain 0.
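So far this comparison has the largest p-value of the dose tests. It is worth checking that the result does not depend on the equal-variance assumption (`var.equal = TRUE`). A quick robustness check, not part of the original analysis, reruns the comparison as Welch's t-test, which is R's default.

Each group has 10 observations, so the pooled and Welch standard errors are algebraically identical and the t statistic does not change. Only the degrees of freedom, and therefore the p-value, differ.

```r
data(ToothGrowth)
t2 <- subset(ToothGrowth, supp == "OJ" & dose %in% c(1, 2))

student <- t.test(len ~ dose, data = t2, var.equal = TRUE)
welch <- t.test(len ~ dose, data = t2)  # var.equal = FALSE is the default

# Same t statistic (balanced groups), different degrees of freedom
data.frame(test = c("Student", "Welch"),
           t = unname(c(student$statistic, welch$statistic)),
           df = unname(c(student$parameter, welch$parameter)),
           p_value = c(student$p.value, welch$p.value))
```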
For VC, the dose = 0.5 group is compared with the dose = 1 group:
##
## Two Sample t-test
##
## data: t3$len by t3$dose
## t = -7.4634, df = 18, p-value = 6.492e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.264346 -6.315654
## sample estimates:
## mean in group 0.5 mean in group 1
## 7.98 16.77
The mean also increases with the dose for VC (7.98 in group 0.5 vs 16.77 in group 1). The p-value (6.492e-07) is far below 0.05, and the confidence interval (-11.26, -6.32) does not contain 0.
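The p-value shows that the increase is unlikely to be due to chance, but not how large it is. One common way to express its size, not used in the original analysis, is Cohen's d. It divides the difference in means by the pooled standard deviation that the Student's t-test uses.

The sketch below computes d for VC at doses 0.5 and 1. For two groups of 10, d equals |t| * sqrt(1/10 + 1/10).

```r
data(ToothGrowth)
t3 <- subset(ToothGrowth, supp == "VC" & dose %in% c(0.5, 1))

low <- t3$len[t3$dose == 0.5]
high <- t3$len[t3$dose == 1]
n1 <- length(low)
n2 <- length(high)

# Pooled standard deviation, as used by the Student's t-test
sp <- sqrt(((n1 - 1) * var(low) + (n2 - 1) * var(high)) / (n1 + n2 - 2))
# Cohen's d: difference in means in units of the pooled SD
d <- (mean(high) - mean(low)) / sp
d
```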
For VC, the dose = 1 group is compared with the dose = 2 group:
##
## Two Sample t-test
##
## data: t4$len by t4$dose
## t = -5.4698, df = 18, p-value = 3.398e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -12.96896 -5.77104
## sample estimates:
## mean in group 1 mean in group 2
## 16.77 26.14
Again, the mean increases with the dose (16.77 in group 1 vs 26.14 in group 2). The p-value (3.398e-05) is below 0.05, and the confidence interval (-12.97, -5.77) does not contain 0.
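A box plot of len for each dose and supplement combination shows the dose trend from the four comparisons at a glance. The minimal base R sketch below draws one; its axis labels and title are illustrative choices. `boxplot()` also returns the plotted statistics invisibly, which makes it easy to confirm that each box holds 10 observations.

```r
data(ToothGrowth)

# One box per dose x supplement cell (labels such as "0.5.OJ")
bp <- boxplot(len ~ dose + supp, data = ToothGrowth,
              xlab = "Dose (mg/day) and supplement", ylab = "Length",
              main = "Tooth length by dose and supplement")
print(bp$n)  # observations per box
```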
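In summary, the sample mean tooth length is higher with orange juice (20.66) than with ascorbic acid (16.96). With all doses pooled, however, this difference is not statistically significant at the 5% level (p-value = 0.06039). Increasing the dose, from 0.5 to 1 and from 1 to 2, significantly increases tooth growth for both supplements.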
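Four dose comparisons were run on the same data, which raises the chance of at least one false positive above 5%. As a supplementary check not in the original analysis, the four p-values can be adjusted with Holm's method using base R's `p.adjust()`.

The reported p-values are 8.358e-05, 0.03736, 6.492e-07 and 3.398e-05. All four stay below 0.05 after Holm adjustment, so the dose conclusion is unchanged.

```r
data(ToothGrowth)

# The four adjacent-dose comparisons made in this report
comparisons <- data.frame(supp = c("OJ", "OJ", "VC", "VC"),
                          low = c(0.5, 1, 0.5, 1),
                          high = c(1, 2, 1, 2),
                          stringsAsFactors = FALSE)

# Two-sided Student's t-test p-value for each comparison
p_raw <- mapply(function(s, lo, hi) {
  sub <- subset(ToothGrowth, supp == s & dose %in% c(lo, hi))
  t.test(len ~ dose, data = sub, var.equal = TRUE)$p.value
}, comparisons$supp, comparisons$low, comparisons$high)

# Holm's step-down adjustment controls the family-wise error rate
p_holm <- p.adjust(p_raw, method = "holm")

cbind(comparisons, p_raw = unname(p_raw), p_holm = unname(p_holm))
```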
library(datasets)
library(ggplot2)
library(dplyr)
data(ToothGrowth)
hist(ToothGrowth$len, xlab="Length",main="histogram of length")
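# Summary statistics of len for each supplement type
# (bare column names so that each group uses only its own rows)
data1 <- group_by(ToothGrowth, supp) %>%
  summarise(
    count = n(),
    mean = mean(len, na.rm = TRUE),
    sd = sd(len, na.rm = TRUE)
  )
# Student's t-test (equal variances): does mean len differ between OJ and VC?
res <- t.test(ToothGrowth$len ~ ToothGrowth$supp, var.equal = TRUE)
res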
# Subsets for the dose comparisons: t1 (OJ) and t3 (VC) hold doses 0.5 and 1,
# t2 (OJ) and t4 (VC) hold doses 1 and 2
t1 <- ToothGrowth[which(ToothGrowth$supp=="OJ" & ToothGrowth$dose<=1),]
t2 <- ToothGrowth[which(ToothGrowth$supp=="OJ" & ToothGrowth$dose>=1),]
t3 <- ToothGrowth[which(ToothGrowth$supp=="VC" & ToothGrowth$dose<=1),]
t4 <- ToothGrowth[which(ToothGrowth$supp=="VC" & ToothGrowth$dose>=1),]
# OJ: dose 0.5 vs dose 1
dat1 <- group_by(t1, dose) %>%
  summarise(
    count = n(),
    mean = mean(len, na.rm = TRUE),
    sd = sd(len, na.rm = TRUE)
  )
res1 <- t.test(t1$len ~ t1$dose, var.equal = TRUE)
res1
# OJ: dose 1 vs dose 2
dat2 <- group_by(t2, dose) %>%
  summarise(
    count = n(),
    mean = mean(len, na.rm = TRUE),
    sd = sd(len, na.rm = TRUE)
  )
res2 <- t.test(t2$len ~ t2$dose, var.equal = TRUE)
res2
# VC: dose 0.5 vs dose 1
dat3 <- group_by(t3, dose) %>%
  summarise(
    count = n(),
    mean = mean(len, na.rm = TRUE),
    sd = sd(len, na.rm = TRUE)
  )
res3 <- t.test(t3$len ~ t3$dose, var.equal = TRUE)
res3
# VC: dose 1 vs dose 2
dat4 <- group_by(t4, dose) %>%
  summarise(
    count = n(),
    mean = mean(len, na.rm = TRUE),
    sd = sd(len, na.rm = TRUE)
  )
res4 <- t.test(t4$len ~ t4$dose, var.equal = TRUE)
res4