Part 2 project: Basic Inferential Data Analysis
This part analyses the ToothGrowth data set from the R datasets package.
The experiment behind this data set looked at tooth growth in 60 guinea pigs induced by two different vitamin C supplements, each administered at three different doses.
Load data
# Loading the data
data("ToothGrowth")
# Use the head function to take a quick look at the first rows of the data
head(ToothGrowth)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
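# The data contain 3 columns:
# 1) len: the length of odontoblasts (the cells responsible for tooth growth), used here as the measure of tooth length
# 2) supp: the supplement used for the induction, orange juice (OJ) or ascorbic acid (VC)
# 3) dose: the dose of the supplement in mg/day (0.5, 1 or 2)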
# Plot the distribution of the len column
hist(ToothGrowth$len)
# Plot the distribution of the dose column
hist(ToothGrowth$dose)
# The supp column is categorical so I used the function table to summarize it
table(ToothGrowth$supp)
##
## OJ VC
## 30 30
# I used the str function to look at the structure of the data
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
# I also used the summary function to look at the summary of the variables
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
# Size of the data set
nrow(ToothGrowth)
## [1] 60
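The summary above pools all groups together. A minimal sketch of a per-group breakdown, using only base R's `table()` and `aggregate()`: the cross-tabulation checks whether the design is balanced, and the cell means give a numeric preview of what the plots below show. The names `design` and `group_means` are illustrative.

```r
# ToothGrowth ships with base R's datasets package
data("ToothGrowth")

# Count observations in every supplement x dose cell (a balanced design has equal counts)
design <- with(ToothGrowth, table(supp, dose))
print(design)

# Mean len for every supplement/dose combination
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(group_means)
```

Every cell holds 10 observations, and the OJ means exceed the VC means at doses 0.5 (13.23 vs 7.98) and 1 (22.70 vs 16.77), while at dose 2 they are almost equal (26.06 vs 26.14).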
Now let's see whether the supplement and its dose have an effect on tooth growth.
# Load ggplot2 for plotting
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.3
# The response variable is the len column, so it goes on the y axis
# The explanatory variable of interest is the supplement (supp column),
# so supp goes on the x axis to show the relationship between the OJ or VC supplement and len
# Each supplement was given at a dose of 0.5, 1 or 2 mg/day,
# so we facet the data on dose to get one panel per dose
# I used qplot(), the quick-plot shortcut function in ggplot2
qplot(supp,len,data=ToothGrowth, facets=~dose)
# Individual points are hard to compare between groups at a glance, so I
# add boxplots with the geom_boxplot() function
# and colour-code the boxes by supplement with the fill aesthetic
qplot(supp,len,data=ToothGrowth, facets=~dose) + geom_boxplot(aes(fill=supp))
# From the plots, supplement OJ seems to produce longer teeth than VC,
# and tooth length seems to increase with increasing dose.
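`qplot()` has been deprecated since ggplot2 3.4.0. A sketch of the same faceted boxplot written with `ggplot()`, assuming a current ggplot2 install: `facet_wrap(~ dose)` reproduces the `facets=~dose` panels, and the axis labels are illustrative choices. One small difference from the original: here the points are drawn on top of the boxes instead of underneath.

```r
library(ggplot2)
data("ToothGrowth")

# One panel per dose, supplement on the x axis, boxes filled by supplement
p <- ggplot(ToothGrowth, aes(x = supp, y = len)) +
  geom_boxplot(aes(fill = supp)) +
  geom_point() +                      # individual observations, as in the qplot version
  facet_wrap(~ dose) +
  labs(x = "Supplement", y = "Length (len)", fill = "Supplement")
print(p)
```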
We have now made these observations from the graphs; the question is whether we can provide enough statistical evidence to back them up.
First Analysis: Test whether the OJ supplement induces more tooth growth than the VC supplement
Null hypothesis: There is no difference in the length of teeth induced by OJ compared to VC
Alternative hypothesis: There is a difference in the length of teeth induced by OJ compared to VC
Test: I will do a two-sided unpaired (Welch) t-test to examine the difference between OJ and VC tooth lengths
# To compare the lengths for OJ and VC, we need the data for each supplement separately
# Subset all rows for OJ and for VC, respectively
OJ_data = subset(ToothGrowth, supp=='OJ')
VC_data = subset(ToothGrowth, supp=='VC')
# Keep only the length column for each supplement
Len_OJ = OJ_data$len
Len_VC = VC_data$len
# Let's look at the two vectors
Len_OJ
## [1] 15.2 21.5 17.6 9.7 14.5 10.0 8.2 9.4 16.5 9.7 19.7 23.3 23.6 26.4
## [15] 20.0 25.2 25.8 21.2 14.5 27.3 25.5 26.4 22.4 24.5 24.8 30.9 26.4 27.3
## [29] 29.4 23.0
Len_VC
## [1] 4.2 11.5 7.3 5.8 6.4 10.0 11.2 11.2 5.2 7.0 16.5 16.5 15.2 17.3
## [15] 22.5 17.3 13.6 14.5 18.8 15.5 23.6 18.5 33.9 25.5 26.4 32.5 26.7 21.5
## [29] 23.3 29.5
# We now have the tooth lengths for the two supplements, so we can test whether
# their mean lengths differ
t.test(Len_OJ,Len_VC,paired=FALSE, alternative='two.sided', var.equal= FALSE)
##
## Welch Two Sample t-test
##
## data: Len_OJ and Len_VC
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean of x mean of y
## 20.66333 16.96333
# The two-sided t-test did not provide enough evidence to reject the null hypothesis at the 5% level (p = 0.061)
# However, the graphs suggest a difference:
# at the 0.5 and 1 mg/day doses, OJ induced more tooth growth than VC, while the difference disappears at 2 mg/day.
# So let's recast the question and test specifically whether OJ induces more tooth growth than VC
Null hypothesis: There is no difference in the length of teeth induced by OJ compared to VC
Alternative hypothesis: The OJ supplement induces more tooth growth than VC
Test: I will do a one-sided unpaired t-test to ask specifically whether the mean OJ length is greater than the mean VC length
t.test(Len_OJ,Len_VC,paired=FALSE, alternative='greater', var.equal= FALSE)
##
## Welch Two Sample t-test
##
## data: Len_OJ and Len_VC
## t = 1.9153, df = 55.309, p-value = 0.03032
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 0.4682687 Inf
## sample estimates:
## mean of x mean of y
## 20.66333 16.96333
# The one-sided test of whether OJ length is greater than VC length
# provided enough evidence (p = 0.030) to reject the null hypothesis in favor of the alternative hypothesis
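Why did the one-sided test cross the 5% threshold when the two-sided test did not? A short sketch using the formula interface of `t.test()` (which gives the same Welch tests as the subsetted vectors above): when the t statistic is positive, the "greater" p-value is exactly half of the two-sided p-value, because the two-sided test counts both tails of the t distribution.

```r
data("ToothGrowth")

# Groups follow the factor levels of supp (OJ, then VC), so the difference is mean(OJ) - mean(VC)
two_sided <- t.test(len ~ supp, data = ToothGrowth, alternative = "two.sided")
one_sided <- t.test(len ~ supp, data = ToothGrowth, alternative = "greater")

cat("t statistic:", two_sided$statistic, "\n")
cat("two-sided p:", two_sided$p.value, "\n")
cat("one-sided p:", one_sided$p.value, "\n")
```

With t = 1.9153 the two-sided p-value of 0.061 halves to 0.030, which is the whole difference between the two results.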
The graphs suggest that there is no difference between the supplements at the 2 mg/day dose. That may be why the two-sided t-test did not yield enough statistical evidence to reject the null hypothesis, so let's repeat it without the 2 mg/day data.
# Get the OJ data for doses 0.5 and 1.0 only
OJ_data_0.5_1 = subset(OJ_data,dose== 1.0 | dose==0.5)
OJ_data_0.5_1
## len supp dose
## 31 15.2 OJ 0.5
## 32 21.5 OJ 0.5
## 33 17.6 OJ 0.5
## 34 9.7 OJ 0.5
## 35 14.5 OJ 0.5
## 36 10.0 OJ 0.5
## 37 8.2 OJ 0.5
## 38 9.4 OJ 0.5
## 39 16.5 OJ 0.5
## 40 9.7 OJ 0.5
## 41 19.7 OJ 1.0
## 42 23.3 OJ 1.0
## 43 23.6 OJ 1.0
## 44 26.4 OJ 1.0
## 45 20.0 OJ 1.0
## 46 25.2 OJ 1.0
## 47 25.8 OJ 1.0
## 48 21.2 OJ 1.0
## 49 14.5 OJ 1.0
## 50 27.3 OJ 1.0
Len_OJ_0.5_1 = OJ_data_0.5_1$len
Len_OJ_0.5_1
## [1] 15.2 21.5 17.6 9.7 14.5 10.0 8.2 9.4 16.5 9.7 19.7 23.3 23.6 26.4
## [15] 20.0 25.2 25.8 21.2 14.5 27.3
# Get the VC data for doses 0.5 and 1.0 only
VC_data_0.5_1 = subset(VC_data,dose== 1.0 | dose==0.5)
VC_data_0.5_1
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
## 7 11.2 VC 0.5
## 8 11.2 VC 0.5
## 9 5.2 VC 0.5
## 10 7.0 VC 0.5
## 11 16.5 VC 1.0
## 12 16.5 VC 1.0
## 13 15.2 VC 1.0
## 14 17.3 VC 1.0
## 15 22.5 VC 1.0
## 16 17.3 VC 1.0
## 17 13.6 VC 1.0
## 18 14.5 VC 1.0
## 19 18.8 VC 1.0
## 20 15.5 VC 1.0
Len_VC_0.5_1 = VC_data_0.5_1$len
Len_VC_0.5_1
## [1] 4.2 11.5 7.3 5.8 6.4 10.0 11.2 11.2 5.2 7.0 16.5 16.5 15.2 17.3
## [15] 22.5 17.3 13.6 14.5 18.8 15.5
# Now we can run the two-sided t-test on the data without the 2 mg/day dose
Null hypothesis: At doses 0.5 and 1 mg/day, there is no difference in the length of teeth induced by OJ compared to VC
Alternative hypothesis: At doses 0.5 and 1 mg/day, there is a difference in the length of teeth induced by OJ compared to VC
Test: I will do a two-sided unpaired t-test to examine the difference between OJ and VC tooth lengths
t.test(Len_OJ_0.5_1,Len_VC_0.5_1,paired=FALSE, alternative='two.sided', var.equal= FALSE)
##
## Welch Two Sample t-test
##
## data: Len_OJ_0.5_1 and Len_VC_0.5_1
## t = 3.0503, df = 36.553, p-value = 0.004239
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.875234 9.304766
## sample estimates:
## mean of x mean of y
## 17.965 12.375
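The two `subset()` calls, the `$len` extractions and the test can be condensed into a few lines. A sketch, assuming the same Welch defaults as above: `dose %in% c(0.5, 1)` keeps both low doses in one step, and the formula interface splits the rows by supplement. The name `low_doses` is illustrative.

```r
data("ToothGrowth")

# Keep only the 0.5 and 1 doses (40 rows), then compare OJ and VC
low_doses <- subset(ToothGrowth, dose %in% c(0.5, 1))
res <- t.test(len ~ supp, data = low_doses)
print(res)
```

This should reproduce the p-value of 0.004239 reported above.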
SUMMARY: There is enough statistical evidence to reject the null hypothesis that the supplement has no effect on tooth growth. A two-sided t-test on all the data initially did not yield that evidence (p = 0.061). However, the graphs suggested a more specific hypothesis, namely that supplement OJ induces more tooth growth than supplement VC, so I designed my analysis to test that hypothesis specifically. Instead of a two-sided t-test, which asks the general question of whether there is any difference (and therefore has less power to detect a difference in one particular direction), I performed a one-sided t-test asking whether the mean OJ length is greater than the mean VC length. That gave enough evidence (p = 0.030) to reject the null in favor of the alternative hypothesis. The graphs suggest that the discrepancy between the one-sided and two-sided tests may be explained by the lack of difference in the 2 mg/day groups, so I removed those groups and repeated the two-sided t-test. This time there was enough statistical evidence (p = 0.004) to reject the null in favor of the alternative.
CONCLUSION: The OJ supplement induces more tooth growth than the VC supplement, especially when administered at 0.5 or 1 mg/day. At the highest dose (2 mg/day), both supplements appear to induce a similar amount of growth.
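The statement about the highest dose rests on the graphs. A sketch that checks each dose directly, running the same Welch t-test of OJ against VC within every dose group (`split()` by dose, then `sapply()` over the groups; `by_dose` is an illustrative name). Keep in mind that each comparison uses only 10 observations per group, and that a large p-value means no detectable difference, not proof that the supplements are equivalent.

```r
data("ToothGrowth")

# For each dose: mean difference (OJ - VC) and Welch two-sided p-value
by_dose <- sapply(split(ToothGrowth, ToothGrowth$dose), function(d) {
  tt <- t.test(len ~ supp, data = d)
  c(diff = unname(tt$estimate[1] - tt$estimate[2]), p.value = tt$p.value)
})
print(round(by_dose, 4))
```

The mean differences are 5.25, 5.93 and -0.08 for doses 0.5, 1 and 2; only the first two are significant at the 5% level.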
Second Analysis: Test whether tooth length increases with higher doses of supplement
Null hypothesis: There is no difference in the length of teeth induced by the different doses of supplement
Alternative hypothesis: Increasing doses of supplement lead to increased tooth length
Test: I will do two-sided unpaired t-tests to examine differences in tooth length between a) doses 0.5 and 1.0, b) doses 0.5 and 2.0, c) doses 1.0 and 2.0
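The three `subset()` calls and `$len` extractions that follow can also be written as a single `split()` call, which returns a named list with one length vector per dose, in the original row order. A sketch; `len_by_dose` is an illustrative name.

```r
data("ToothGrowth")

# Named list with elements "0.5", "1" and "2", each holding 20 lengths
len_by_dose <- split(ToothGrowth$len, ToothGrowth$dose)
str(len_by_dose)

# Mean length per dose
sapply(len_by_dose, mean)
```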
# First thing is to subset the data based on the doses
Dose0.5 = subset(ToothGrowth,dose==0.5)
Dose1.0 = subset(ToothGrowth,dose==1.0)
Dose2.0 = subset(ToothGrowth,dose==2.0)
# Now extract the lengths
Len_Dose0.5 = Dose0.5$len
Len_Dose1.0 = Dose1.0$len
Len_Dose2.0 = Dose2.0$len
# Inspecting the length data
Len_Dose0.5
## [1] 4.2 11.5 7.3 5.8 6.4 10.0 11.2 11.2 5.2 7.0 15.2 21.5 17.6 9.7
## [15] 14.5 10.0 8.2 9.4 16.5 9.7
Len_Dose1.0
## [1] 16.5 16.5 15.2 17.3 22.5 17.3 13.6 14.5 18.8 15.5 19.7 23.3 23.6 26.4
## [15] 20.0 25.2 25.8 21.2 14.5 27.3
Len_Dose2.0
## [1] 23.6 18.5 33.9 25.5 26.4 32.5 26.7 21.5 23.3 29.5 25.5 26.4 22.4 24.5
## [15] 24.8 30.9 26.4 27.3 29.4 23.0
# Now I can go ahead and test if there are differences between these lengths
t.test(Len_Dose0.5,Len_Dose1.0,paired=FALSE, alternative='two.sided', var.equal = FALSE)
##
## Welch Two Sample t-test
##
## data: Len_Dose0.5 and Len_Dose1.0
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.983781 -6.276219
## sample estimates:
## mean of x mean of y
## 10.605 19.735
t.test(Len_Dose0.5,Len_Dose2.0,paired=FALSE, alternative='two.sided', var.equal = FALSE)
##
## Welch Two Sample t-test
##
## data: Len_Dose0.5 and Len_Dose2.0
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.15617 -12.83383
## sample estimates:
## mean of x mean of y
## 10.605 26.100
t.test(Len_Dose1.0,Len_Dose2.0,paired=FALSE, alternative='two.sided', var.equal = FALSE)
##
## Welch Two Sample t-test
##
## data: Len_Dose1.0 and Len_Dose2.0
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.996481 -3.733519
## sample estimates:
## mean of x mean of y
## 19.735 26.100
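Three pairwise tests are run on the same data, so the p-values can be adjusted for multiple comparisons. A sketch using base R's `pairwise.t.test()`: with `pool.sd = FALSE` each comparison is an ordinary `t.test()` with its default Welch correction, matching the tests above, and `p.adjust.method = "bonferroni"` multiplies each p-value by the number of comparisons (capped at 1). The name `pw` is illustrative.

```r
data("ToothGrowth")

# Welch t-tests for every pair of doses, Bonferroni-adjusted for 3 comparisons
pw <- pairwise.t.test(ToothGrowth$len, ToothGrowth$dose,
                      pool.sd = FALSE, p.adjust.method = "bonferroni")
print(pw)
```

All three adjusted p-values remain far below 0.05.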
Conclusion: Based on these analyses, there is enough statistical evidence to reject the null hypothesis in favor of the alternative hypothesis that increasing doses of supplement lead to increased tooth length. All three two-sided t-tests gave p-values far below 0.05, and these results also hold after a Bonferroni correction for the three comparisons, since even the largest p-value (1.906e-05) stays far below 0.05 when multiplied by 3. A one-sided t-test is not needed here: every observed mean difference lies in the expected direction (higher dose, longer teeth), and in that case the one-sided p-value is half the two-sided p-value, so the one-sided test rejects the null as well.
In conclusion, increasing the dose of supplement increases tooth length.