This is the project for the statistical inference class. In it, you will use simulation to explore inference and do some simple inferential data analysis. The project consists of two parts:
Now in the second portion, we’re going to analyze the ToothGrowth data in the R datasets package.
1. Load the ToothGrowth data and perform some basic exploratory data analyses
2. Provide a basic summary of the data.
3. Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only use the techniques from class, even if there are other approaches worth considering.)
4. State your conclusions and the assumptions needed for your conclusions.
First, load the packages knitr and ggplot2:
library(knitr)
library(ggplot2)
1. Load the ToothGrowth data and perform some basic exploratory data analyses
### Load the required data
We first load the data (the R code is hidden, but there is a list of required packages in the appendix). It is not very clear from the data set itself what the data describe, so we found a description on the web: https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/ToothGrowth.html .
### Description of data
The response is the length of odontoblasts (teeth) in each of 10 guinea pigs at each of three dose levels of Vitamin C (0.5, 1, and 2 mg) with each of two delivery methods (orange juice or ascorbic acid).
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
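The loading code is hidden in the report, so the following is only a sketch of what it may have looked like. The `str()` output above shows `dose` as numeric, while the later summary and regression output list it by level, so a conversion to a factor is assumed here. The name `tg` is hypothetical and is used only to avoid overwriting the original data set.

```r
# ToothGrowth ships with the datasets package, which R attaches by default
data(ToothGrowth)

# Inspect the raw structure: 60 observations of len, supp and dose
str(ToothGrowth)

# Assumption: dose is converted to a factor before the later steps,
# because the summary and the model output list it by level (0.5, 1, 2)
tg <- ToothGrowth
tg$dose <- factor(tg$dose)
levels(tg$dose)
```

Working on a copy keeps the numeric doses available if they are needed later, for example to label an axis.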
2. Provide a basic summary of the data.
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 0.5:20
## 1st Qu.:13.07 VC:30 1 :20
## Median :19.25 2 :20
## Mean :18.81
## 3rd Qu.:25.27
## Max. :33.90
The plot shows the distribution of tooth length depending on the supplement type received.
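As a numerical companion to the summary and the plot, the mean and standard deviation of tooth length in each supplement/dose cell can be tabulated. This is a minimal sketch using base R only; the names `cell_means`, `cell_sds` and `cell_stats` are hypothetical and do not come from the original analysis.

```r
# Mean tooth length in each supplement/dose cell
cell_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)

# Standard deviation of tooth length in each cell
cell_sds <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = sd)

# Combine both into one table with columns len_mean and len_sd
cell_stats <- merge(cell_means, cell_sds,
                    by = c("supp", "dose"),
                    suffixes = c("_mean", "_sd"))
print(cell_stats)
```

Because every cell holds the same number of observations, the average of the six cell means equals the overall mean reported by `summary()`.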
3. Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only use the techniques from class, even if there are other approaches worth considering.)
table(ToothGrowth$dose, ToothGrowth$supp)
##
## OJ VC
## 0.5 10 10
## 1 10 10
## 2 10 10
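# Cell counts: 10 observations in every dose/supplement combination
dose_supp_counts <- table(ToothGrowth$dose, ToothGrowth$supp)
# Additive linear model; dose is treated as a factor, hence the dose1 and dose2 rows below
fit <- lm(len ~ dose + supp, data = ToothGrowth)
# 95% confidence intervals for the model coefficients
confint(fit)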
## 2.5 % 97.5 %
## (Intercept) 10.475238 14.434762
## dose1 6.705297 11.554703
## dose2 13.070297 17.919703
## suppVC -5.679762 -1.720238
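To make clear where these intervals come from, the sketch below rebuilds them by hand as estimate ± t-quantile × standard error and compares the result with `confint()`. It assumes that `dose` has been converted to a factor, as the coefficient names in the output suggest; `tg`, `manual_ci` and the other intermediate names are hypothetical.

```r
# Assumption: dose is a factor, matching the dose1/dose2 rows in the output
tg <- ToothGrowth
tg$dose <- factor(tg$dose)
fit <- lm(len ~ dose + supp, data = tg)

# Estimates and standard errors from the coefficient table
coefs <- summary(fit)$coefficients
est <- coefs[, "Estimate"]
se <- coefs[, "Std. Error"]

# Two-sided 95% interval: use the 97.5% quantile of the t distribution
# with the residual degrees of freedom of the model
t_crit <- qt(0.975, df = df.residual(fit))
manual_ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)

print(manual_ci)
print(confint(fit))
```

The two printed tables should agree up to floating-point rounding.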
Null hypothesis: “Mean tooth length is the same regardless of the delivery supplement.”
Alternative hypothesis: “Mean tooth length differs between the delivery supplements.”
We apply a two-sided t-test assuming equal variances.
summary(fit)
##
## Call:
## lm(formula = len ~ dose + supp, data = ToothGrowth)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.085 -2.751 -0.800 2.446 9.650
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.4550 0.9883 12.603 < 2e-16 ***
## dose1 9.1300 1.2104 7.543 4.38e-10 ***
## dose2 15.4950 1.2104 12.802 < 2e-16 ***
## suppVC -3.7000 0.9883 -3.744 0.000429 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.828 on 56 degrees of freedom
## Multiple R-squared: 0.7623, Adjusted R-squared: 0.7496
## F-statistic: 59.88 on 3 and 56 DF, p-value: < 2.2e-16
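The hypothesis about supplement type can also be tested directly with a pooled two-sample t-test. The sketch below is one way to do that under the equal-variance assumption stated earlier; the name `tt_supp` is hypothetical. Note that this test ignores dose, whereas the `suppVC` row of the regression summary adjusts for it, so the two p-values need not agree.

```r
# Two-sided, equal-variance t-test of tooth length between supplements,
# pooling all three doses together
tt_supp <- t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE)
print(tt_supp)

# Difference in group means (OJ minus VC)
mean_diff <- unname(tt_supp$estimate[1] - tt_supp$estimate[2])
print(mean_diff)
```

With equal group sizes, the difference in means has the same magnitude as the `suppVC` estimate in the regression, with the opposite sign because the regression reports VC relative to OJ.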
4. State your conclusions and the assumptions needed for your conclusions.
The confidence intervals above give, for each estimated parameter of the linear model, a range of plausible values at the 95% confidence level. For each coefficient (i.e. the intercept, the dose levels and suppVC), the null hypothesis is that the coefficient is zero, meaning that the variable explains no variation in tooth length. All p-values are below 0.05, so at the 5% significance level we reject each null hypothesis and conclude that each variable explains a significant portion of the variability in tooth length. These conclusions assume independent observations, approximately normal residuals and equal variance across groups.
The effect of the dose can also be identified using regression analysis. One interesting question concerns the effect of the supplement type on tooth length.
At a dose of 2.0 mg of Vitamin C there is no statistical evidence that the supplement type matters: the p-value is 0.96, so we fail to reject the null hypothesis. At doses of 0.5 mg and 1 mg, however, there is strong statistical evidence that the supplement type matters, with p-values of 0.005 and 0.001 respectively. We therefore reject the null hypothesis at those doses: the supplement type does affect tooth length at 0.5 and 1 mg.
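The code behind the per-dose comparison is not shown in the report. A minimal sketch of one way to obtain such p-values is given below, assuming a two-sided t-test with equal variances within each dose, as stated for the overall test; the name `p_by_dose` is hypothetical.

```r
# Split the data by dose and compare the two supplements within each dose
p_by_dose <- sapply(split(ToothGrowth, ToothGrowth$dose), function(d) {
  # Assumption: pooled-variance, two-sided test, as stated for the overall test
  t.test(len ~ supp, data = d, var.equal = TRUE)$p.value
})

# One p-value per dose level, named "0.5", "1" and "2"
print(round(p_by_dose, 3))
```

Each test uses only the 20 observations at that dose, so these p-values answer a different question from the `suppVC` row of the additive model, which pools information across doses.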
This analysis was originally created and run in RStudio Version 0.98.987 under Windows Home Premium, on an AMD FX(tm)-4100 Quad-Core Processor at 3.60 GHz with 8 GB RAM. Time and date of the report generation:
## [1] "Sun Nov 23 14:06:09 2014"
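library(ggplot2)
# Stacked bars of tooth length by dose (stat = "identity" sums the observations
# in each group), with one panel per supplement type
ggplot(data = ToothGrowth, aes(x = as.factor(dose), y = len, fill = supp)) +
  geom_bar(stat = "identity") +
  facet_grid(. ~ supp) +
  xlab("Dose in milligrams") +
  ylab("Tooth length") +
  guides(fill = guide_legend(title = "Supplement type"))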