Part 2: Basic inferential data analysis.

Load the ToothGrowth data and perform some basic exploratory data analyses

The data are loaded below.

The names command lists all variables, and the head command shows the first few rows.

The number of observations in ToothGrowth is obtained with the nrow command.

data(ToothGrowth) #load the data

names(ToothGrowth)

## [1] "len"  "supp" "dose"

head(ToothGrowth)

##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5

nrow(ToothGrowth)

## [1] 60

Provide a basic summary of the data.

The summary and str commands below describe each variable.

summary(ToothGrowth)

##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

str(ToothGrowth)

## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
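Because the later tests compare groups, it may help to check how many observations fall in each supplement and dose combination. A minimal sketch using the base R table function (the name counts is introduced here only for illustration):

```r
data(ToothGrowth)

# Cross-tabulate supplement type against dose to count observations per cell
counts <- table(ToothGrowth$supp, ToothGrowth$dose)
print(counts)
```

If every cell holds the same number of observations, the design is balanced and the per-dose comparisons rest on equal group sizes.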

Exploratory Analysis

The plot below shows that tooth length tends to increase with dose, for both supplement types.

library(ggplot2)
# Equivalent of qplot(), which is deprecated in ggplot2 3.4.0 and later
ggplot(ToothGrowth, aes(supp, len)) + geom_point() + facet_grid(. ~ dose)
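The pattern in the plot can also be read from the group means. The sketch below uses the base R aggregate function to compute the mean tooth length for each supplement and dose combination (the name cell_means is an illustrative choice, not part of the original analysis):

```r
data(ToothGrowth)

# Mean tooth length for every supplement and dose combination
cell_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(cell_means)
```

Reading the len column from the lowest dose to the highest, separately for OJ and VC, gives a numeric counterpart to the faceted plot.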

Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose.

The first test compares tooth length between the two supplement types across all doses.

t.test(len ~ supp, paired = FALSE, data = ToothGrowth)

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333
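To see where the numbers in this output come from, the Welch statistic, its degrees of freedom and the confidence interval can be recomputed by hand. This is a sketch for illustration only; the variable names (oj, vc, se2_oj, se2_vc, t_stat, df_welch, ci) are introduced here and do not appear in the original analysis.

```r
data(ToothGrowth)

# Split tooth length by supplement type
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error contributed by each group
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# 95% confidence interval for the difference in means (OJ minus VC)
ci <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df_welch) * sqrt(se2_oj + se2_vc)

print(c(t = t_stat, df = df_welch))
print(ci)
```

The printed values should agree with the t, df and confidence interval lines of the t.test output above, up to rounding.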

The code below splits the data into three subsets, one per dose (0.5, 1.0 and 2.0), and repeats the test on each subset.

# One subset per dose level
a <- subset(ToothGrowth, dose == 0.5)
b <- subset(ToothGrowth, dose == 1)
c <- subset(ToothGrowth, dose == 2)

t.test(len ~ supp, paired = FALSE, data = a)

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC 
##            13.23             7.98

t.test(len ~ supp, paired = FALSE, data = b)

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.802148 9.057852
## sample estimates:
## mean in group OJ mean in group VC 
##            22.70            16.77

t.test(len ~ supp, paired = FALSE, data = c)

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.79807  3.63807
## sample estimates:
## mean in group OJ mean in group VC 
##            26.06            26.14
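The three per-dose tests can also be run in one pass and collected into a small table, which makes the intervals and p-values easier to compare side by side. A minimal sketch, assuming the same unpaired Welch test (the default of t.test) and using the illustrative name by_dose:

```r
data(ToothGrowth)

# Run the same Welch test within each dose and keep the key numbers
by_dose <- sapply(split(ToothGrowth, ToothGrowth$dose), function(d) {
  res <- t.test(len ~ supp, data = d)  # unpaired Welch test is the default
  c(ci_lower = res$conf.int[1], ci_upper = res$conf.int[2], p_value = res$p.value)
})

print(round(by_dose, 4))
```

Each column of by_dose corresponds to one dose, and the rows hold the lower and upper limits of the 95% confidence interval and the p-value.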

State your conclusions and the assumptions needed for your conclusions.

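The exploratory plot and the group means suggest that tooth length increases with dose, although dose was not tested formally in the tests above. Pooled over all doses, the 95% confidence interval for the difference in mean length between OJ and VC is (-0.171, 7.571); it contains zero and the p-value is 0.061, so the difference is not significant at the 5% level. Within doses, OJ gives significantly longer teeth at 0.5 (interval 1.719 to 8.781) and at 1.0 (interval 2.802 to 9.058), while at 2.0 the two supplements are indistinguishable (interval -3.798 to 3.638). These conclusions assume that the two groups are independent (paired is FALSE), that tooth length in each group is roughly normally distributed, and that the group variances may differ (the Welch test).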
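The relationship between dose and tooth length stated above could also be checked with a test. The sketch below is one possible approach, not part of the original analysis: it compares the lowest and highest dose with the same unpaired Welch test, pooling both supplement types. The names low_high and dose_test are illustrative.

```r
data(ToothGrowth)

# Keep only the lowest and highest dose so that the grouping has two levels
low_high <- subset(ToothGrowth, dose %in% c(0.5, 2))

# Unpaired Welch test of tooth length by dose (supplement types pooled)
dose_test <- t.test(len ~ dose, data = low_high)
print(dose_test$conf.int)
print(dose_test$p.value)
```

The interval estimates the mean length at dose 0.5 minus the mean length at dose 2.0, so an interval lying entirely below zero would support longer teeth at the higher dose.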