Project Part 2

Part 2: Basic Inferential Data Analysis Instructions

1. Load the ToothGrowth data and perform some basic exploratory data analyses

library(datasets)
# Look at the structure of the dataset and the distributions of its variables
str(ToothGrowth)

## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

hist(ToothGrowth$len)

hist(ToothGrowth$dose)

The histogram shows that dose takes only three distinct values (0.5, 1 and 2), so it can be treated as a factor rather than a number. I therefore converted dose from a numeric variable to a factor.

ToothGrowth$dose <- as.factor(ToothGrowth$dose)
str(ToothGrowth)

## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: Factor w/ 3 levels "0.5","1","2": 1 1 1 1 1 1 1 1 1 1 ...
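The histograms show each variable on its own. As a possible complement, a grouped boxplot shows len for every supp and dose combination in one picture. This is only a sketch: the names tg and bp are my own, and it reads the package copy of the data so it does not depend on the conversion above.

```r
# Package copy of the data (dose is still numeric here)
tg <- datasets::ToothGrowth

# One box per supp/dose combination; boxplot() also returns the plotted stats
bp <- boxplot(len ~ supp + dose, data = tg,
              xlab = "supp.dose", ylab = "len",
              main = "Tooth length by supplement and dose")

# Number of observations behind each box
print(bp$n)
```

The returned object is kept so that the group sizes can be inspected along with the plot.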

2. Provide a basic summary of the data.

# Get the summary of this dataset
summary(ToothGrowth)

##       len        supp     dose   
##  Min.   : 4.20   OJ:30   0.5:20  
##  1st Qu.:13.07   VC:30   1  :20  
##  Median :19.25           2  :20  
##  Mean   :18.81                   
##  3rd Qu.:25.27                   
##  Max.   :33.90
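summary() describes len over the whole dataset, but the comparisons in the next step are made within groups. A minimal sketch of a per-group summary, assuming the base R function aggregate() is acceptable here (tg and group_means are names I introduce for illustration):

```r
# Package copy of the data (dose is still numeric here)
tg <- datasets::ToothGrowth

# Mean tooth length for every supp/dose combination
group_means <- aggregate(len ~ supp + dose, data = tg, FUN = mean)
print(group_means)
```

Replacing FUN = mean with FUN = sd would give the spread of each group in the same layout.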

3. Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only use the techniques from class, even if there are other approaches worth considering)

Because the dataset has only 60 observations (30 per supplement, and 10 per supplement and dose combination), we use t confidence intervals. In addition, the formula interface of t.test requires the grouping factor to have exactly 2 levels, so we compare the two supp groups within each dose. There are three doses (0.5, 1 and 2), so we run one t test per dose.
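The group sizes behind this choice can be checked with a cross-tabulation. A short sketch (tg and cell_counts are illustrative names, not part of the original analysis):

```r
# Package copy of the data
tg <- datasets::ToothGrowth

# Observations per supplement (rows) and dose (columns)
cell_counts <- table(supp = tg$supp, dose = tg$dose)
print(cell_counts)
```

Each t test below therefore compares two groups of this size, which is why the t distribution is used rather than a normal approximation.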

dose = 0.5

dose0.5_data <- ToothGrowth[which(ToothGrowth$dose == "0.5"), ]
head(dose0.5_data)

##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5

# Welch two-sample t test (unequal variances); the groups are independent, which is the default
t.test(len ~ supp, var.equal = FALSE, data = dose0.5_data)

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC 
##            13.23             7.98

dose = 1

dose1_data <- ToothGrowth[which(ToothGrowth$dose == "1"), ]
head(dose1_data)

##     len supp dose
## 11 16.5   VC    1
## 12 16.5   VC    1
## 13 15.2   VC    1
## 14 17.3   VC    1
## 15 22.5   VC    1
## 16 17.3   VC    1

# Welch two-sample t test (unequal variances); the groups are independent, which is the default
t.test(len ~ supp, var.equal = FALSE, data = dose1_data)

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.802148 9.057852
## sample estimates:
## mean in group OJ mean in group VC 
##            22.70            16.77

dose = 2

dose2_data <- ToothGrowth[which(ToothGrowth$dose == "2"), ]
head(dose2_data)

##     len supp dose
## 21 23.6   VC    2
## 22 18.5   VC    2
## 23 33.9   VC    2
## 24 25.5   VC    2
## 25 26.4   VC    2
## 26 32.5   VC    2

# Welch two-sample t test (unequal variances); the groups are independent, which is the default
t.test(len ~ supp, var.equal = FALSE, data = dose2_data)

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.79807  3.63807
## sample estimates:
## mean in group OJ mean in group VC 
##            26.06            26.14
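The three tests above repeat the same steps, so they could also be collected into one table for easier comparison. A possible sketch, assuming a small data frame of intervals and p-values is enough (tg, results and the column names are my own choices):

```r
# Package copy of the data (dose is still numeric here)
tg <- datasets::ToothGrowth

# One Welch t test of OJ against VC per dose, gathered into a data frame
results <- do.call(rbind, lapply(sort(unique(tg$dose)), function(d) {
  tt <- t.test(len ~ supp, data = tg[tg$dose == d, ], var.equal = FALSE)
  data.frame(dose     = d,
             ci_lower = tt$conf.int[1],
             ci_upper = tt$conf.int[2],
             p_value  = tt$p.value)
}))
print(results)
```

Each row should match the interval and p-value printed for the corresponding dose above.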

4. State your conclusions and the assumptions needed for your conclusions.

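For dose 0.5 and dose 1, the t tests comparing OJ and VC are significant (p = 0.006 and p = 0.001), and both 95% confidence intervals lie entirely above 0. We reject the null hypothesis at these doses: mean len is higher with OJ than with VC.
For dose 2, the interval contains 0 and the p-value is large (0.96), so we fail to reject the null hypothesis that the true difference in means is 0.
The effect of supp therefore depends on dose. These tests compare supplements only and do not test dose directly, but the group means rise with dose for both supplements (7.98, 16.77, 26.14 for VC and 13.23, 22.70, 26.06 for OJ), which suggests that dose has a bigger influence on len than supp does.
These conclusions assume that the observations are independent, that the OJ and VC groups are unpaired, that len is approximately normally distributed within each group, and that the group variances need not be equal (Welch t test).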
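The point about dose can be examined with the same technique by comparing len between adjacent dose levels. The following is a sketch under my own assumptions: both supplements are pooled within each dose, and compare_doses is a hypothetical helper, not part of the original analysis.

```r
# Package copy of the data (dose is still numeric here)
tg <- datasets::ToothGrowth

# Welch t test of len between two dose levels, pooling both supplements
compare_doses <- function(low, high) {
  pair <- tg[tg$dose %in% c(low, high), ]
  t.test(len ~ dose, data = pair, var.equal = FALSE)
}

low_vs_mid  <- compare_doses(0.5, 1)
mid_vs_high <- compare_doses(1, 2)

# Intervals are for mean(lower dose) - mean(higher dose)
print(low_vs_mid$conf.int)
print(mid_vs_high$conf.int)
```

If both intervals lie entirely below 0, that would support the claim that len increases with dose. Running several tests on the same data raises the chance of a false positive, so these p-values should be read with some caution.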