———————————————————-

Below is some of the information returned when help() is called on the dataset.

———————————————————-

The Effect of Vitamin C on Tooth Growth in Guinea Pigs

Description

The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC).

Usage

ToothGrowth

Format

A data frame with 60 observations on 3 variables.

[,1] len numeric Tooth length

[,2] supp factor Supplement type (VC or OJ).

[,3] dose numeric Dose in milligrams/day

———————————————————
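The code in the rest of this report refers to a data frame called df, but the step that creates it is not shown. A minimal sketch, assuming df is simply a copy of R's built-in ToothGrowth data set (this assignment is an assumption, not taken from the original report):

```r
# Assumption: `df` is a plain copy of the built-in ToothGrowth data set
# (shipped with R in the `datasets` package, which is attached by default).
df <- ToothGrowth

# Quick structural check: 60 observations of len, supp and dose
str(df)
```

The plotting and grouping code later in the report also appears to rely on the ggplot2 and dplyr packages having been attached with library() beforehand.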

We call some exploratory functions on the data set below.

dim(df)

## [1] 60  3

names(df)

## [1] "len"  "supp" "dose"

head(df)

##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5

summary(df)

##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

In summary, the data set has 60 rows (one per guinea pig) and 3 columns: len and dose are numeric, and supp is a factor with two levels (OJ and VC) of 30 rows each.

df_split<-split(df,df$supp)
df_OJ<-df_split$OJ
df_VC<-df_split$VC

head(df_VC)

##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5

# Combined supp 
ggplot(df)+geom_point(aes(x=len,y=dose,colour=supp))

# Mean and standard deviation of len for each supp / dose combination
df_group <- df %>% group_by(supp, dose) %>% summarise(mean_len = mean(len), sd_len = sd(len))
#ggplot(df_group)+geom_point(aes(x=mean_len,y=dose,colour=supp))

#df_group<-summarise(df_group,mean_len=(mean(len)))


ggplot(df_group,aes(dose,mean_len))+geom_bar(stat="identity",aes(fill=supp),position="dodge")+xlab("dose amount")+ylab("mean length")+ggtitle("Mean tooth growth (length) under each treatment at different dosages")

ggplot(df_group,aes(dose,sd_len))+geom_bar(stat="identity",aes(fill=supp),position="dodge")+xlab("dose amount")+ylab("standard deviation of length")+ggtitle("Standard deviation of tooth growth (length) under each treatment at different dosages")

3: Comparing groups (supp and dose)

From the graphs produced, we have some hypotheses that we would like to test regarding our supp / dose group combinations.

Our summary thus far hints that tooth length differs between the two supplement types at some dosages.

df_group

## Source: local data frame [6 x 4]
## Groups: supp [?]
## 
##     supp  dose mean_len   sd_len
##   <fctr> <dbl>    <dbl>    <dbl>
## 1     OJ   0.5    13.23 4.459709
## 2     OJ   1.0    22.70 3.910953
## 3     OJ   2.0    26.06 2.655058
## 4     VC   0.5     7.98 2.746634
## 5     VC   1.0    16.77 2.515309
## 6     VC   2.0    26.14 4.797731

From visual inspection, the mean lengths of the two supp groups differ noticeably at dosages 0.5 and 1 (OJ is higher in both cases), whereas at dosage 2 the two groups have similar means.

We now want to test our guesses by way of some hypothesis testing.

We will perform four tests.

Test 1: Is there a statistically significant difference in growth length between the two supp groups when all dosage values are pooled?

Tests 2, 3, 4: Is there a statistically significant difference in growth length between the two supp groups within each dosage level (0.5, 1, and 2)?

Because the difference could go in either direction, each test is a two-sided t-test. We use a 95% confidence level (the default value), which corresponds to a significance level of 0.05.

Test 1:

H0: There is no difference in means across the two supp groups (for all dosage values)

m(VC) = mean of VC group
m(OJ) = mean of OJ group

H0: m(VC) - m(OJ) = 0
H1: m(VC) - m(OJ) ≠ 0

t.test(df$len ~ df$supp)

## 
##  Welch Two Sample t-test
## 
## data:  df$len by df$supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333
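To make explicit what t.test() computes for Test 1, the Welch statistic, its degrees of freedom, and the two-sided p-value can be reproduced by hand. This is a sketch that assumes the report's df is a copy of the built-in ToothGrowth data; the variable names (x, y, vx, vy, t_stat, dof, p_val) are illustrative and not part of the original analysis.

```r
# Assumption: the report's data come from R's built-in ToothGrowth data set.
x <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
y <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error of each group mean
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch t statistic: difference in means over its standard error
t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation to the degrees of freedom
dof <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# Two-sided p-value from the t distribution
p_val <- 2 * pt(-abs(t_stat), df = dof)

# Reference result from the built-in function, for comparison
welch <- t.test(x, y)
print(c(t = t_stat, df = dof, p = p_val))
```

The printed values should agree with the t, df, and p-value reported in the Test 1 output above.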

Test 2:

H0: There is no difference in means across the two supp groups (for dose = 0.5)

m(VC) = mean of VC group for dose 0.5
m(OJ) = mean of OJ group for dose 0.5

H0: m(VC) - m(OJ) = 0
H1: m(VC) - m(OJ) ≠ 0

#Create a data frame to hold the dose 0.5 results 
df_0.5<-subset(df,df$dose==0.5)


t.test(df_0.5$len ~ df_0.5$supp)

## 
##  Welch Two Sample t-test
## 
## data:  df_0.5$len by df_0.5$supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC 
##            13.23             7.98

Test 3:

H0: There is no difference in means across the two supp groups (for dose = 1)

m(VC) = mean of VC group for dose 1
m(OJ) = mean of OJ group for dose 1

H0: m(VC) - m(OJ) = 0
H1: m(VC) - m(OJ) ≠ 0

#Create a data frame to hold the dose 1 results 
df_1<-subset(df,df$dose==1)

t.test(df_1$len ~ df_1$supp)

## 
##  Welch Two Sample t-test
## 
## data:  df_1$len by df_1$supp
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.802148 9.057852
## sample estimates:
## mean in group OJ mean in group VC 
##            22.70            16.77

Test 4:

H0: There is no difference in means across the two supp groups (for dose = 2)

m(VC) = mean of VC group for dose 2
m(OJ) = mean of OJ group for dose 2

H0: m(VC) - m(OJ) = 0
H1: m(VC) - m(OJ) ≠ 0

#Create a data frame to hold the dose 2 results 
df_2<-subset(df,df$dose==2)

t.test(df_2$len ~ df_2$supp)

## 
##  Welch Two Sample t-test
## 
## data:  df_2$len by df_2$supp
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.79807  3.63807
## sample estimates:
## mean in group OJ mean in group VC 
##            26.06            26.14
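Tests 2 to 4 repeat the same comparison on three subsets, so they can also be run in a single pass. The sketch below assumes df is a copy of the built-in ToothGrowth data and uses the formula interface of t.test() with a data argument; the name p_by_dose is illustrative.

```r
# Assumption: `df` is a copy of R's built-in ToothGrowth data set.
df <- ToothGrowth

# Split the rows by dose, then run a Welch two-sample t-test of len by supp
# within each dose level and keep only the p-value.
p_by_dose <- sapply(
  split(df, df$dose),
  function(d) t.test(len ~ supp, data = d)$p.value
)

# Named numeric vector with one p-value per dose level ("0.5", "1", "2")
print(p_by_dose)
```

The three p-values should match those reported for Tests 2, 3, and 4 above.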

4: Conclusions

From our hypothesis test results, we can say the following.

At the 5% significance level, we fail to reject the null hypothesis that there is no difference in mean tooth length between the two supp groups when all dosage levels are pooled (p = 0.061). Failing to reject is not the same as showing that the groups are equal.

When the data are examined at each of the three dosage levels, there is enough evidence (at the 5% significance level) of a difference between the supp groups at dosage levels 0.5 (p = 0.006) and 1 (p = 0.001), but not enough evidence to reject the null hypothesis at dosage level 2 (p = 0.964).

T-test Assumptions

The observed variable is a continuous measurement.

The sample is randomly selected from the population (we assume the experiment designers adhered to this).

The data within each group are approximately normally distributed. Some of the supp / dose combinations may not conform to this. The appendix section plots histograms for each of the supp / dose combinations.

The combinations of most concern are depicted below:

- dose = 0.5 for both OJ and VC supp values
- dose = 1 for OJ supp group

# dose = 0.5 and supp = OJ
df_0.5OJ <- filter(df_0.5, supp == "OJ")
hist(df_0.5OJ$len)

# dose = 0.5 and supp = VC
df_0.5VC <- filter(df_0.5, supp == "VC")
hist(df_0.5VC$len)

# dose = 1 and supp = OJ
df_1OJ <- filter(df_1, supp == "OJ")
hist(df_1OJ$len)
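The same check can be extended to all six supp / dose combinations at once. This is a base R sketch, assuming df is a copy of the built-in ToothGrowth data; the names groups and old_par are illustrative, and the 2 x 3 layout is just one convenient choice.

```r
# Assumption: `df` is a copy of R's built-in ToothGrowth data set.
df <- ToothGrowth

# One vector of tooth lengths per supp / dose combination
# (list names have the form "OJ.0.5", "VC.0.5", "OJ.1", ...)
groups <- split(df$len, list(df$supp, df$dose))

# Draw the six histograms on a 2 x 3 grid, then restore the plot settings
old_par <- par(mfrow = c(2, 3))
for (g in names(groups)) {
  hist(groups[[g]], main = g, xlab = "len")
}
par(old_par)
```

With only 10 animals in each combination, these histograms give a rough impression of shape rather than a firm test of normality.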

Sample size is assumed to be sufficiently large.

The final assumption is homogeneity of variance. Homogeneous, or equal, variance exists when the standard deviations of samples are approximately equal.
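The tests in this report are labelled "Welch Two Sample t-test" in their output, which is what t.test() runs by default (var.equal = FALSE); the Welch version does not require equal variances. As a sketch of how sensitive Test 1 is to this assumption, the code below compares the Welch result with the pooled-variance version. It assumes df is a copy of the built-in ToothGrowth data; the names sd_by_supp, welch, and pooled are illustrative.

```r
# Assumption: `df` is a copy of R's built-in ToothGrowth data set.
df <- ToothGrowth

# Standard deviation of len in each supp group (all doses pooled)
sd_by_supp <- tapply(df$len, df$supp, sd)

# Welch test (default, var.equal = FALSE) versus the pooled-variance test
welch  <- t.test(len ~ supp, data = df)
pooled <- t.test(len ~ supp, data = df, var.equal = TRUE)

print(sd_by_supp)
print(c(welch = welch$p.value, pooled = pooled$p.value))
```

Because both supp groups contain 30 observations, the two versions share the same t statistic and differ only in their degrees of freedom, so the choice has little effect on the p-value here.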

Statistical Inference Course Project: Simulation with basic inferential data analysis

Leslie Bodgan

7 December 2017

Overview:

1: Loading the dataset (ToothGrowth)

2: Basic summary of the data


3: Comparing groups (supp and dose)

4: Conclusions