Overview: This section analyses the ToothGrowth data set using only the statistical methods covered in class.
Questions 1 + 2: Load the ToothGrowth data, perform a basic exploratory data analysis, and provide a basic summary of the data
Load data and packages, plot a basic chart to visualise data.
library(ggplot2)
library(datasets)
colnames(ToothGrowth)
## [1] "len" "supp" "dose"
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 0.5:20
## 1st Qu.:13.07 VC:30 1 :20
## Median :19.25 2 :20
## Mean :18.81
## 3rd Qu.:25.27
## Max. :33.90
#Basic plot
qplot(dose ,len ,data = ToothGrowth,
col = supp,
main = "Tooth growth of guinea pigs by supplement type and dosage (mg)",
xlab = "Dosage (mg)",
ylab = "Tooth length")
Use box plots to better see the differences between the different supplements.
qplot(supp, len, data = ToothGrowth,
facets = ~dose,
main = "Tooth growth of guinea pigs by supplement type and dosage (mg)",
xlab = "Supplement type",
ylab = "Tooth length") +
geom_boxplot(aes(fill = supp))
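As an aside, `qplot()` has been deprecated since ggplot2 3.4.0, so the same faceted box plot can also be written with `ggplot()`. The following is a minimal sketch of one possible equivalent, not the code used to produce the chart above; the object name `box_plot` is an assumption introduced here for illustration.

```r
library(ggplot2)
library(datasets)

# One box per supplement type, one panel per dose
box_plot <- ggplot(ToothGrowth, aes(x = supp, y = len, fill = supp)) +
  geom_boxplot() +
  facet_wrap(~dose) +
  labs(title = "Tooth growth of guinea pigs by supplement type and dosage (mg)",
       x = "Supplement type",
       y = "Tooth length")

print(box_plot)
```

Unlike the `qplot()` call, this version draws only the boxes and does not overlay the individual points.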
OJ generally yields longer teeth than VC. Increasing the dosage (from 0.5 to 1 to 2 mg) increases tooth length for both supplement types.
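The visual impression from the box plots can be checked against the group means. This is a minimal sketch using base R's `aggregate()`; the name `group_means` is an assumption introduced here and does not appear in the original analysis.

```r
library(datasets)

# Mean tooth length for every combination of supplement type and dose
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)

# Sort so that the doses of each supplement appear together
group_means <- group_means[order(group_means$supp, group_means$dose), ]
print(group_means)
```

The resulting table has one row per supplement and dose, which makes it easy to see whether the gap between OJ and VC is similar at every dose.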
Comparing tooth length by supplement and dose
Hypothesis 1
Null hypothesis: there is no difference in tooth growth given OJ or VC. Alternative hypothesis: tooth growth is greater when using OJ than VC.
VC.length <- ToothGrowth$len[ToothGrowth$supp == "VC"]
OJ.length <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
Welch's t-test: use the two vectors defined above to perform a two-sample t-test that does not assume equal variances.
t.test(OJ.length, VC.length,
alternative = "greater", # one-sided test: is the mean for OJ *greater* than the mean for VC?
paired = FALSE, # the data points are not paired with each other
var.equal = FALSE, # do not assume equal variances (Welch's test)
conf.level = 0.95) # 95% is also the default when conf.level is not specified
##
## Welch Two Sample t-test
##
## data: OJ.length and VC.length
## t = 1.9153, df = 55.309, p-value = 0.03032
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 0.4682687 Inf
## sample estimates:
## mean of x mean of y
## 20.66333 16.96333
The p-value of this comparison is about 0.03, which is below the 5% significance level, so we reject the null hypothesis. In other words: if the null hypothesis (no difference in growth between OJ and VC) were true, the chance of observing a difference at least as large as the one in the data would be about 3%, which is too low to accept. The lower bound of the one-sided 95% confidence interval (0.47) is also above 0. The data therefore support the alternative hypothesis: OJ has a greater impact on tooth growth than VC.
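To make the numbers in the output less of a black box, the Welch statistic, its degrees of freedom, and the one-sided p-value can be recomputed by hand. This is a sketch for illustration only; the helper names (`oj`, `vc`, `se`, `t_stat`, `df_welch`, `p_value`) are assumptions introduced here.

```r
library(datasets)

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Per-group variance divided by sample size
v_oj <- var(oj) / length(oj)
v_vc <- var(vc) / length(vc)

# Welch t statistic: difference in means over its standard error
se <- sqrt(v_oj + v_vc)
t_stat <- (mean(oj) - mean(vc)) / se

# Welch-Satterthwaite approximation of the degrees of freedom
df_welch <- (v_oj + v_vc)^2 /
  (v_oj^2 / (length(oj) - 1) + v_vc^2 / (length(vc) - 1))

# One-sided p-value for the alternative "OJ mean is greater"
p_value <- pt(t_stat, df_welch, lower.tail = FALSE)

print(c(t = t_stat, df = df_welch, p = p_value))
```

These values should agree with the `t`, `df`, and `p-value` lines reported by `t.test()` above.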
Hypothesis 2:
For this case, the null hypothesis is that there is no difference in tooth length between doses. The alternative hypothesis is that a higher dose gives greater tooth length.
dose_0.5 <- ToothGrowth$len[ToothGrowth$dose == "0.5"]
dose_1 <- ToothGrowth$len[ToothGrowth$dose == "1"]
dose_2 <- ToothGrowth$len[ToothGrowth$dose == "2"]
Step 1: perform a t-test between dose_0.5 and dose_1
t.test(dose_0.5, dose_1,
alternative = "less", # one-sided test: does dose_0.5 have a smaller mean than dose_1?
paired = FALSE, # the data points are not paired
var.equal = FALSE, # do not assume equal variances (Welch's test)
conf.level = 0.95)
##
## Welch Two Sample t-test
##
## data: dose_0.5 and dose_1
## t = -6.4766, df = 37.986, p-value = 6.342e-08
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -6.753323
## sample estimates:
## mean of x mean of y
## 10.605 19.735
The p-value is very small (6.342e-08), so we reject the null hypothesis for the comparison between dose_0.5 and dose_1: the mean tooth length is smaller at 0.5 mg than at 1 mg.
Step 2: perform a t-test between dose_1 and dose_2
t.test(dose_1, dose_2,
alternative = "less", # one-sided test: does dose_1 have a smaller mean than dose_2?
paired = FALSE,
var.equal = FALSE,
conf.level = 0.95)
##
## Welch Two Sample t-test
##
## data: dose_1 and dose_2
## t = -4.9005, df = 37.101, p-value = 9.532e-06
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -4.17387
## sample estimates:
## mean of x mean of y
## 19.735 26.100
Here too, the p-value is very small (9.532e-06), so we reject the null hypothesis. Across the doses tested (0.5, 1, and 2 mg), a higher dose is associated with greater tooth length.
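The two dose comparisons repeat the same call with different inputs, so they can be run in a loop. This is a minimal sketch assuming the same one-sided Welch test as above; the names `dose_pairs` and `p_values` are hypothetical and not part of the original analysis.

```r
library(datasets)

# Each pair is c(lower dose, higher dose)
dose_pairs <- list(c("0.5", "1"), c("1", "2"))

p_values <- sapply(dose_pairs, function(pair) {
  # as.character() makes the comparison work whether dose is numeric or a factor
  lower  <- ToothGrowth$len[as.character(ToothGrowth$dose) == pair[1]]
  higher <- ToothGrowth$len[as.character(ToothGrowth$dose) == pair[2]]
  # Alternative: the lower dose has a smaller mean tooth length
  t.test(lower, higher,
         alternative = "less",
         paired = FALSE,
         var.equal = FALSE)$p.value
})
names(p_values) <- c("0.5 vs 1", "1 vs 2")

print(p_values)
```

The two values should match the p-values reported in Steps 1 and 2.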
Question 4: State your conclusions and the assumptions needed for your conclusions