Part 2: Basic Inferential Data Analysis Instructions

1-Load the ToothGrowth data and perform some basic exploratory data analyses

data(ToothGrowth)
str(ToothGrowth)

## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

plot(ToothGrowth$len~ToothGrowth$supp)

ToothGrowth$dose <- as.factor(ToothGrowth$dose)
plot(ToothGrowth$len~ToothGrowth$dose)

table(ToothGrowth$supp,ToothGrowth$dose)

##     
##      0.5  1  2
##   OJ  10 10 10
##   VC  10 10 10
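The table above shows a balanced design with 10 animals per cell. As a small exploratory sketch, the mean tooth length in each cell and in each margin can be tabulated as follows. The name `tg` is an illustrative local copy (not part of the original analysis), used so that the `ToothGrowth` object in the report is left untouched.

```r
# `tg` is a hypothetical local copy of the built-in data set.
tg <- datasets::ToothGrowth
tg$dose <- factor(tg$dose)

# Mean tooth length for each supplement-by-dose cell (6 cells).
cell_means <- aggregate(len ~ supp + dose, data = tg, FUN = mean)
print(cell_means)

# Marginal means by supplement and by dose.
supp_means <- tapply(tg$len, tg$supp, mean)
dose_means <- tapply(tg$len, tg$dose, mean)
print(supp_means)
print(dose_means)
```

These means are the quantities compared by the t-test and the ANOVA later in the report.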

2-Provide a basic summary of the data.

summary(ToothGrowth)

##       len        supp     dose   
##  Min.   : 4.20   OJ:30   0.5:20  
##  1st Qu.:13.07   VC:30   1  :20  
##  Median :19.25           2  :20  
##  Mean   :18.81                   
##  3rd Qu.:25.27                   
##  Max.   :33.90

3-Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only use the techniques from class, even if there are other approaches worth considering.)

-The populations are independent: yes, since the samples in the two supplement groups are not related.

-The population variances are equal: yes (the ratio of the sample standard deviations falls between 0.5 and 2, and the F test does not reject equal variances), so a pooled t-test is used.

sd(ToothGrowth$len[ToothGrowth$supp=="OJ"])/sd(ToothGrowth$len[ToothGrowth$supp=="VC"])

## [1] 0.7991215

var.test(ToothGrowth$len~ToothGrowth$supp)

## 
##  F test to compare two variances
## 
## data:  ToothGrowth$len by ToothGrowth$supp
## F = 0.6386, num df = 29, denom df = 29, p-value = 0.2331
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.3039488 1.3416857
## sample estimates:
## ratio of variances 
##          0.6385951
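To make the link between the two checks above explicit, the F statistic is the ratio of the sample variances, that is, the square of the standard deviation ratio. A minimal sketch that recomputes it by hand, assuming the two-sided p-value is taken as twice the smaller tail probability (the names `tg`, `oj`, and `vc` are illustrative):

```r
# `tg`, `oj`, `vc` are hypothetical helper names for this sketch.
tg <- datasets::ToothGrowth
oj <- tg$len[tg$supp == "OJ"]
vc <- tg$len[tg$supp == "VC"]

# F statistic: ratio of the two sample variances.
f_stat <- var(oj) / var(vc)
df1 <- length(oj) - 1
df2 <- length(vc) - 1

# Two-sided p-value: twice the smaller tail of the F distribution.
p_lower <- pf(f_stat, df1, df2)
p_value <- 2 * min(p_lower, 1 - p_lower)

# The standard deviation ratio is the square root of the F statistic.
sd_ratio <- sd(oj) / sd(vc)
print(c(F = f_stat, p = p_value, sd_ratio = sd_ratio))
```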

-Each population is either normal or the sample size is large: yes, n_1 = n_2 = 30, both greater than 25.

ggpubr::ggqqplot(ToothGrowth$len[ToothGrowth$supp=="OJ"])

ggpubr::ggqqplot(ToothGrowth$len[ToothGrowth$supp=="VC"])

t.test(ToothGrowth$len[ToothGrowth$supp=="OJ"], ToothGrowth$len[ToothGrowth$supp=="VC"], paired = FALSE, var.equal = TRUE)

## 
##  Two Sample t-test
## 
## data:  ToothGrowth$len[ToothGrowth$supp == "OJ"] and ToothGrowth$len[ToothGrowth$supp == "VC"]
## t = 1.9153, df = 58, p-value = 0.06039
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1670064  7.5670064
## sample estimates:
## mean of x mean of y 
##  20.66333  16.96333
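The pooled test above can be reproduced step by step, which may help show where the statistic, the degrees of freedom, and the interval come from. This is a sketch under the equal-variance assumption stated earlier; the variable names are illustrative only.

```r
# Hypothetical helper names; the data are the built-in ToothGrowth set.
tg <- datasets::ToothGrowth
oj <- tg$len[tg$supp == "OJ"]
vc <- tg$len[tg$supp == "VC"]
n1 <- length(oj)
n2 <- length(vc)

# Pooled variance: weighted average of the two sample variances.
sp2 <- ((n1 - 1) * var(oj) + (n2 - 1) * var(vc)) / (n1 + n2 - 2)

# Standard error of the difference in means.
se <- sqrt(sp2 * (1 / n1 + 1 / n2))

# t statistic, degrees of freedom, and two-sided p-value.
mean_diff <- mean(oj) - mean(vc)
t_stat <- mean_diff / se
df <- n1 + n2 - 2
p_value <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval for the difference in means.
ci <- mean_diff + c(-1, 1) * qt(0.975, df) * se
print(c(t = t_stat, df = df, p = p_value, lwr = ci[1], upr = ci[2]))
```

Because the interval contains 0 and the p-value is above 0.05, the difference between supplements is not significant at the 5% level.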

ANOVA assumptions

1-The responses for each factor level have a normal population distribution: yes (none of the Kolmogorov-Smirnov tests below rejects normality).

ks.test(ToothGrowth$len[ToothGrowth$dose == 0.5],"pnorm" , mean=mean(ToothGrowth$len[ToothGrowth$dose == 0.5]), sd=sd(ToothGrowth$len[ToothGrowth$dose == 0.5]))

## Warning in ks.test(ToothGrowth$len[ToothGrowth$dose == 0.5], "pnorm", mean =
## mean(ToothGrowth$len[ToothGrowth$dose == : ties should not be present for the
## Kolmogorov-Smirnov test

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  ToothGrowth$len[ToothGrowth$dose == 0.5]
## D = 0.17117, p-value = 0.6011
## alternative hypothesis: two-sided

ks.test(ToothGrowth$len[ToothGrowth$dose == 1],"pnorm" , mean=mean(ToothGrowth$len[ToothGrowth$dose == 1]), sd=sd(ToothGrowth$len[ToothGrowth$dose == 1]))

## Warning in ks.test(ToothGrowth$len[ToothGrowth$dose == 1], "pnorm", mean =
## mean(ToothGrowth$len[ToothGrowth$dose == : ties should not be present for the
## Kolmogorov-Smirnov test

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  ToothGrowth$len[ToothGrowth$dose == 1]
## D = 0.15935, p-value = 0.6901
## alternative hypothesis: two-sided

ks.test(ToothGrowth$len[ToothGrowth$dose == 2],"pnorm" , mean=mean(ToothGrowth$len[ToothGrowth$dose == 2]), sd=sd(ToothGrowth$len[ToothGrowth$dose == 2]))

## Warning in ks.test(ToothGrowth$len[ToothGrowth$dose == 2], "pnorm", mean =
## mean(ToothGrowth$len[ToothGrowth$dose == : ties should not be present for the
## Kolmogorov-Smirnov test

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  ToothGrowth$len[ToothGrowth$dose == 2]
## D = 0.13684, p-value = 0.848
## alternative hypothesis: two-sided
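The three calls above repeat the same test for each dose level, so they can be written more compactly as a loop over the groups. A minimal sketch, with illustrative names; the tie warnings are suppressed here only to keep the output short. Note that the mean and standard deviation are estimated from the same data being tested, which tends to make these p-values conservative.

```r
# `tg` and `len_by_dose` are hypothetical helper names.
tg <- datasets::ToothGrowth
len_by_dose <- split(tg$len, tg$dose)

# One-sample Kolmogorov-Smirnov test against a normal distribution
# whose mean and sd are estimated from each dose group.
ks_p <- sapply(len_by_dose, function(x) {
  suppressWarnings(ks.test(x, "pnorm", mean = mean(x), sd = sd(x)))$p.value
})
print(round(ks_p, 4))
```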

2-These distributions have the same variance: yes (the ratio of the largest to the smallest sample standard deviation is between 0.5 and 2, and Bartlett's test does not reject equal variances).

# Sample standard deviation of tooth length within each dose level
sds <- tapply(ToothGrowth$len, ToothGrowth$dose, sd)
# Ratio of the largest to the smallest standard deviation
max(sds)/min(sds)

## [1] 1.192259

bartlett.test(ToothGrowth$len~as.factor(ToothGrowth$dose))

## 
##  Bartlett test of homogeneity of variances
## 
## data:  ToothGrowth$len by as.factor(ToothGrowth$dose)
## Bartlett's K-squared = 0.66547, df = 2, p-value = 0.717

3-The data are independent: yes.

anova(lm(ToothGrowth$len~ToothGrowth$dose))

## Analysis of Variance Table
## 
## Response: ToothGrowth$len
##                  Df Sum Sq Mean Sq F value    Pr(>F)    
## ToothGrowth$dose  2 2426.4  1213.2  67.416 9.533e-16 ***
## Residuals        57 1025.8    18.0                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#or summary(aov(ToothGrowth$len~ToothGrowth$dose))
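The entries of the ANOVA table above can be rebuilt from the group means, which may clarify what the F test compares. A sketch with illustrative variable names, treating dose as a factor with three levels:

```r
# Hypothetical helper names; dose is treated as a 3-level factor.
tg <- datasets::ToothGrowth
tg$dose <- factor(tg$dose)

grand_mean <- mean(tg$len)
group_means <- tapply(tg$len, tg$dose, mean)
group_n <- tapply(tg$len, tg$dose, length)

# Between-group sum of squares: spread of group means around the grand mean.
ss_between <- sum(group_n * (group_means - grand_mean)^2)

# Within-group sum of squares: spread of observations around their group mean.
ss_within <- sum((tg$len - ave(tg$len, tg$dose))^2)

df_between <- nlevels(tg$dose) - 1
df_within <- nrow(tg) - nlevels(tg$dose)

# F statistic: ratio of the two mean squares.
f_stat <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)
print(c(SSB = ss_between, SSW = ss_within, F = f_stat, p = p_value))
```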

TukeyHSD(aov(ToothGrowth$len~ToothGrowth$dose))

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = ToothGrowth$len ~ ToothGrowth$dose)
## 
## $`ToothGrowth$dose`
##         diff       lwr       upr    p adj
## 1-0.5  9.130  5.901805 12.358195 0.00e+00
## 2-0.5 15.495 12.266805 18.723195 0.00e+00
## 2-1    6.365  3.136805  9.593195 4.25e-05
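Since every group has the same size, the Tukey intervals above all share one half-width. The sketch below recomputes the interval for the 1 versus 0.5 comparison from the studentized range quantile, assuming 20 observations per dose level as in the summary; the variable names are illustrative.

```r
# Hypothetical helper names; dose is treated as a 3-level factor.
tg <- datasets::ToothGrowth
tg$dose <- factor(tg$dose)
fit <- aov(len ~ dose, data = tg)

# Mean squared error from the ANOVA fit.
mse <- deviance(fit) / df.residual(fit)
n_per_group <- 20

# Critical value of the studentized range for 3 means.
q_crit <- qtukey(0.95, nmeans = 3, df = df.residual(fit))

# With equal group sizes the half-width is q * sqrt(MSE / n).
half_width <- q_crit * sqrt(mse / n_per_group)

dose_means <- tapply(tg$len, tg$dose, mean)
diff_1_05 <- dose_means[["1"]] - dose_means[["0.5"]]
print(c(diff = diff_1_05,
        lwr = diff_1_05 - half_width,
        upr = diff_1_05 + half_width))
```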

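4-State your conclusions and the assumptions needed for your conclusions.

-Supplement: the pooled t-test gives p = 0.06039 and a 95% confidence interval of (-0.167, 7.567) that contains 0, so at the 5% level there is no significant difference in mean tooth length between OJ and VC.

-Dose: the ANOVA F test (F = 67.416, p = 9.533e-16) rejects equal mean tooth length across doses, and the Tukey comparisons show that every pairwise difference is significant, with tooth length increasing as the dose increases.

-Assumptions: the groups are independent samples, tooth length is approximately normal within each group, and the group variances are equal. The checks below look at the normality of tooth length over the whole sample.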

ggpubr::ggdensity(ToothGrowth$len, 
          main = "Density plot of tooth length",
          xlab = "Tooth length")

ggpubr::ggqqplot(ToothGrowth$len)

ks.test(ToothGrowth$len,"pnorm", mean=mean(ToothGrowth$len),sd=sd(ToothGrowth$len))

## Warning in ks.test(ToothGrowth$len, "pnorm", mean = mean(ToothGrowth$len), :
## ties should not be present for the Kolmogorov-Smirnov test

## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  ToothGrowth$len
## D = 0.097092, p-value = 0.6237
## alternative hypothesis: two-sided

Hasan Misaii

5/17/2020