library(datasets)
df = ToothGrowth
head(df, 5)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
summary(df)
##       len        supp         dose
##  Min.   : 4.20   OJ:30   Min.   :0.500
##  1st Qu.:13.07   VC:30   1st Qu.:0.500
##  Median :19.25           Median :1.000
##  Mean   :18.81           Mean   :1.167
##  3rd Qu.:25.27           3rd Qu.:2.000
##  Max.   :33.90           Max.   :2.000
dim(df)
## [1] 60 3
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
The ToothGrowth dataset contains 60 observations of 3 variables: len, supp, and dose.
library(ggplot2)
g <- ggplot(df, aes(x = supp, y = len)) +
  geom_boxplot(aes(fill = supp)) +
  facet_wrap(~ dose) +
  ggtitle("Tooth growth of guinea pigs by supplement type and dosage (mg)") +
  ylab("Tooth length") +
  xlab("Top: dosage (mg); bottom: medication type")
print(g)
The plot suggests that at the highest dosage, 2 mg, tooth growth is independent of the medication type. At the lower dosages there may be a dependence on the medication type, but the overall picture is that tooth growth depends more on dosage than on medication type.
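The visual impression from the boxplots can be cross-checked numerically. The following is a minimal sketch, not part of the original analysis, that tabulates the mean tooth length for each medication/dose combination using base R's `aggregate`; the name `group_means` is introduced here only for illustration.

```r
library(datasets)

# Mean tooth length for each combination of medication type and dose.
# `group_means` is an illustrative name, not used elsewhere in this report.
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(group_means)
```

Each row of the result corresponds to one box in the plot, so the rows for dose 2 can be compared directly with the means computed in the next section.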
#### Let us now check whether these assumptions hold using hypothesis testing.
Split the dataset into OJ and VC parts. Then split each part into a low-dosage group (below 2 mg) and a high-dosage group (2 mg).
oj = ToothGrowth$len[ToothGrowth$supp == 'OJ']
vc = ToothGrowth$len[ToothGrowth$supp == 'VC']
ojHigh = ToothGrowth$len[ToothGrowth$supp == 'OJ' & ToothGrowth$dose == 2]
vcHigh = ToothGrowth$len[ToothGrowth$supp == 'VC' & ToothGrowth$dose == 2]
ojLow = ToothGrowth$len[ToothGrowth$supp == 'OJ' & ToothGrowth$dose < 2]
vcLow = ToothGrowth$len[ToothGrowth$supp == 'VC' & ToothGrowth$dose < 2]
mean_vcHigh = mean(vcHigh)
mean_ojHigh = mean(ojHigh)
mean_diff = mean_vcHigh - mean_ojHigh
mean_diff
## [1] 0.08
Sx2 = var(vcHigh)
Sy2 = var(ojHigh)
Sp2 = ((length(vcHigh) - 1) * Sx2 + (length(ojHigh) - 1) * Sy2) /(length(vcHigh) + length(ojHigh) - 2)
conf_int = mean_diff + c(-1, 1) * qt(0.975, df=length(vcHigh) + length(ojHigh) - 2) * sqrt(Sp2 * (1 / length(vcHigh) + 1 / length(ojHigh) ))
conf_int
## [1] -3.562999 3.722999
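For reference, the calculation above corresponds to the standard pooled-variance confidence interval for a difference of two means. In the notation below, which is introduced here for illustration, $\bar{X}$ and $\bar{Y}$ are the sample means of `vcHigh` and `ojHigh`, $S_x^2$ and $S_y^2$ are their sample variances (`Sx2`, `Sy2`), $n_x$ and $n_y$ are the group sizes, and $S_p^2$ is the pooled variance (`Sp2`). The formula assumes that both groups share a common variance.

$$
S_p^2 = \frac{(n_x - 1)\,S_x^2 + (n_y - 1)\,S_y^2}{n_x + n_y - 2}
$$

$$
\bar{X} - \bar{Y} \;\pm\; t_{0.975,\; n_x + n_y - 2}\; S_p \sqrt{\frac{1}{n_x} + \frac{1}{n_y}}
$$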
Zero lies close to the middle of the 95% confidence interval for the difference in mean tooth length between the two medications at the high dosage. We therefore cannot reject the hypothesis that the mean effect of the high dose is the same for both medications.
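As a sanity check on the hand calculation, the same pooled-variance interval can be obtained from `t.test` by setting `var.equal = TRUE`. This is a minimal sketch under the equal-variance assumption; the names `pooled` and `pooled_ci` are introduced here for illustration only.

```r
library(datasets)

# High-dose groups, defined as in the code above
vcHigh <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == 2]
ojHigh <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 2]

# Student's two-sample t-test with pooled variance (assumes equal variances)
pooled <- t.test(vcHigh, ojHigh, paired = FALSE, var.equal = TRUE, conf.level = 0.95)

# Confidence interval for mean(vcHigh) - mean(ojHigh) as a plain numeric vector
pooled_ci <- as.numeric(pooled$conf.int)
print(pooled_ci)
```

The printed interval should agree with `conf_int` from the manual computation, since both use the same pooled variance and the same degrees of freedom.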
mean(vcHigh)
## [1] 26.14
mean(ojHigh)
## [1] 26.06
t.test(vcHigh, ojHigh, alternative = "two.sided", paired = FALSE, var.equal = FALSE, conf.level = 0.95)
##
## Welch Two Sample t-test
##
## data: vcHigh and ojHigh
## t = 0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.63807 3.79807
## sample estimates:
## mean of x mean of y
## 26.14 26.06
#t.test(vcHigh, ojHigh, alternative = "less", paired = FALSE, var.equal = FALSE, conf.level = 0.95)
The p-value (0.9639) is high, so the Welch t-test also gives no evidence for the alternative hypothesis that the true difference in means between the two medications at the high dose is not equal to 0.
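The decision rule behind this reading of the p-value can be made explicit in code. The sketch below assumes a significance level of 0.05 (matching `conf.level = 0.95`); the names `res`, `alpha`, and `reject_null` are illustrative and not part of the original analysis.

```r
library(datasets)

# High-dose groups, defined as in the code above
vcHigh <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == 2]
ojHigh <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 2]

# Welch two-sample t-test, same settings as in the report
res <- t.test(vcHigh, ojHigh, alternative = "two.sided",
              paired = FALSE, var.equal = FALSE, conf.level = 0.95)

alpha <- 0.05                      # assumed significance level
reject_null <- res$p.value < alpha # TRUE would mean rejecting equal means
print(res$p.value)
print(reject_null)
```

A p-value above `alpha` means only that the data do not contradict equal means at the high dose; it does not prove that the means are equal.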