Tooth Growth Data Analysis

Overview

In this project, we analyze the ToothGrowth data set that ships with R. As we review the characteristics of the data, we also report summary statistics and confidence intervals.

Let’s load the data and review its structure.

data("ToothGrowth")
str(ToothGrowth) # Provides information about the structure of the data

## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

A quick peek at the data reveals 60 observations (rows) of 3 variables (columns): len (length), supp (supplement) and dose.

Let’s check whether the data set contains any missing values.

any(is.na(ToothGrowth)) # Checks whether there are any missing (NA) values

## [1] FALSE

The result indicates that we don’t have any ‘NA’ values in the dataset.
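
The single TRUE/FALSE answer above can be broken down per column. The following is a minimal sketch using only base R; the variable names na_counts and n_complete are illustrative and not part of the original analysis.

```r
# Count missing values column by column (ToothGrowth ships with base R)
na_counts <- colSums(is.na(ToothGrowth))
print(na_counts)

# Number of rows that have no missing value in any column
n_complete <- sum(complete.cases(ToothGrowth))
print(n_complete)
```

If every count is zero and the number of complete rows equals the number of observations, the data set can be used as is.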

summary(ToothGrowth) # Gives statistical information about the data

##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

The supp field has two levels, OJ and VC, with 30 observations each. At this point, the natural question is whether len is related to dose. We can create a scatter plot of the observations and overlay the average length of each supplement at every dose.
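
Before plotting, the group sizes and the strength of the linear relationship can be checked numerically. This is a small sketch in base R; cell_counts and len_dose_cor are illustrative names, and the Pearson correlation is only one possible summary of the relationship.

```r
# Observations per supplement/dose combination
cell_counts <- table(ToothGrowth$supp, ToothGrowth$dose)
print(cell_counts)

# Pearson correlation between tooth length and dose
len_dose_cor <- cor(ToothGrowth$len, ToothGrowth$dose)
print(len_dose_cor)
```

A balanced table means each supplement/dose group contributes equally to the averages used in the plot.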

library(ggplot2) # plotting
library(dplyr)   # group_by(), summarise() and %>%

d <- ToothGrowth
# Plot the data points
p <- ggplot(d, aes(x = dose, y = len, color = supp)) + geom_point()
p <- p + labs(title = "Length vs. Dose by Supplement", x = "Dose", y = "Length")
p <- p + theme(plot.title = element_text(hjust = 0.5))

# Group data by supp and dose, then take the average length
d_summary <- d %>% group_by(supp, dose) %>% summarise(len = mean(len), .groups = "drop")

# Plot the trend line through the group averages
p <- p + geom_line(data = d_summary, aes(group = supp, colour = supp))
print(p)

Looking at this plot, for doses below 2, the OJ groups have a higher average length than the VC groups. At dose = 2, the two sample averages are nearly identical.
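
The averages behind the trend lines can also be inspected as numbers. The sketch below uses base R only; group_means and oj_minus_vc are illustrative names that do not appear in the original analysis.

```r
# Mean length for every supplement (rows) and dose (columns)
group_means <- with(ToothGrowth,
                    tapply(len, list(supp = supp, dose = dose), mean))
print(round(group_means, 2))

# Difference in mean length, OJ minus VC, at each dose
oj_minus_vc <- group_means["OJ", ] - group_means["VC", ]
print(round(oj_minus_vc, 2))
```

Positive differences indicate doses at which the OJ average exceeds the VC average.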

Let’s look at the distribution of these results in a panel plot, with one panel per dose.

p <- ggplot(ToothGrowth, aes(x = supp, y = len))
p <- p + geom_boxplot(aes(fill = supp)) + facet_wrap(~ dose)
p <- p + labs(title = "Treatment Efficacy by Dose and Method", x = "Supplement", y = "Length")
p <- p + theme(plot.title = element_text(hjust = 0.5))

print(p)

Based on the plot, OJ appears to have a distinct advantage over VC in supporting tooth length at the lower doses (below 2 units).

Confidence Intervals for Dose and Supplement

Dose as a factor

In this section, we look at dose and supplement as factors affecting tooth growth.

To compare growth by dose, we test three null hypotheses (H0), each against the two-sided alternative hypothesis that the two means are not equal:
1. mu0.5 - mu1 = 0
2. mu0.5 - mu2 = 0
3. mu1 - mu2 = 0

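# Separate the data into pairwise dose groups
ds1 <- subset(ToothGrowth, dose %in% c(0.5, 1))
ds2 <- subset(ToothGrowth, dose %in% c(0.5, 2))
ds3 <- subset(ToothGrowth, dose %in% c(1, 2))

# Apply the Welch two-sample t-test (unequal variances) to each pair
# 'paired' is left at its default (unpaired); newer R versions reject it in the formula interface
# For mu0.5 - mu1 = 0
t.test(len ~ dose, data = ds1, var.equal = FALSE)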

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.983781  -6.276219
## sample estimates:
## mean in group 0.5   mean in group 1 
##            10.605            19.735
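
To make the Welch test less of a black box, the statistic, the degrees of freedom and the confidence interval for the 0.5 versus 1 comparison can be recomputed by hand. This is a sketch of the standard Welch formulas in base R; all variable names are illustrative. The results should agree with the t.test() output above (t = -6.4766, df = 37.986).

```r
# Tooth lengths for the two dose groups being compared
x <- ToothGrowth$len[ToothGrowth$dose == 0.5]
y <- ToothGrowth$len[ToothGrowth$dose == 1]

# Squared standard error of each group mean
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)
se_diff <- sqrt(se2_x + se2_y)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (mean(x) - mean(y)) / se_diff
df_welch <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# Two-sided p-value and 95% confidence interval for the mean difference
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.975, df_welch) * se_diff

print(c(t = t_stat, df = df_welch, p = p_value))
print(ci)
```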

# For mu0.5 - mu2 = 0
t.test(len ~ dose, data = ds2, var.equal = FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.15617 -12.83383
## sample estimates:
## mean in group 0.5   mean in group 2 
##            10.605            26.100

# For mu1 - mu2 = 0
t.test(len ~ dose, data = ds3, var.equal = FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2 
##          19.735          26.100

For each comparison, the 95% confidence interval for the difference in means is as follows.

dose 0.5 vs. 1 ==> -11.983781 -6.276219
dose 0.5 vs. 2 ==> -18.15617 -12.83383
dose 1 vs. 2 ==> -8.996481 -3.733519

None of the three confidence intervals contains 0, and all three p-values are far below 0.05, so we reject the null hypothesis in each case. This means that the mean lengths of these groups differ from each other and that dose has a strong effect on tooth growth.
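
Copying intervals from console output by hand is error-prone, so the three intervals can also be collected programmatically. The following is a minimal sketch in base R; dose_pairs and ci_table are illustrative names, and the loop simply repeats the Welch test for each pair of doses.

```r
# The three pairwise dose comparisons
dose_pairs <- list(c(0.5, 1), c(0.5, 2), c(1, 2))

# Run the Welch t-test for each pair and keep the interval and p-value
ci_table <- do.call(rbind, lapply(dose_pairs, function(pair) {
  pair_data <- ToothGrowth[ToothGrowth$dose %in% pair, ]
  res <- t.test(len ~ dose, data = pair_data, var.equal = FALSE)
  data.frame(comparison = paste(pair[1], "vs", pair[2]),
             lower = res$conf.int[1],
             upper = res$conf.int[2],
             p_value = res$p.value)
}))
print(ci_table)

# TRUE for every comparison whose interval excludes 0
print(ci_table$upper < 0 | ci_table$lower > 0)
```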

Supplement as a factor

If we run the same t-test on the sample data to check the effect of the supplement (supp), we find the following confidence interval.

t.test(len ~ supp, data = ToothGrowth, var.equal = FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333

The confidence interval contains 0 and the p-value (0.06063) is above 0.05. Therefore, we fail to reject the null hypothesis that the means of the two groups are equal. At the 5% significance level, delivery method does not have a statistically significant effect on tooth growth when all doses are pooled.
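
The test above pools all doses, while the earlier plots suggest that the supplements differ mainly at the lower doses. As an additional check that is not part of the original analysis, one could repeat the same Welch test within each dose. This is a hedged sketch in base R; supp_by_dose is an illustrative name, and running several tests raises multiple-comparison concerns that are not addressed here.

```r
# Doses present in the data
doses <- sort(unique(ToothGrowth$dose))

# Welch t-test of len by supplement (OJ minus VC), separately for each dose
supp_by_dose <- do.call(rbind, lapply(doses, function(d) {
  dose_data <- ToothGrowth[ToothGrowth$dose == d, ]
  res <- t.test(len ~ supp, data = dose_data, var.equal = FALSE)
  data.frame(dose = d,
             lower = res$conf.int[1],
             upper = res$conf.int[2],
             p_value = res$p.value)
}))
print(supp_by_dose)
```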

Conclusions

Based on the calculations above, the delivery method for vitamin C has no statistically significant effect on tooth growth at the 5% level when all doses are pooled. However, dose has a strong effect.

Assumptions

It is assumed that the data in each group come from a normal distribution, with possibly different means and variances
The group variances are not assumed to be equal (Welch t-test)
It is assumed that the samples are independent and not biased
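
These assumptions can be probed, though not proven, from the data. The sketch below applies the Shapiro-Wilk test and the sample variance to each supplement/dose group using base R; the names groups, shapiro_p and group_var are illustrative. With only 10 observations per group, such checks have limited power and should be read as a rough screen.

```r
# One group per supplement/dose combination
groups <- interaction(ToothGrowth$supp, ToothGrowth$dose)

# Shapiro-Wilk p-value per group (small values hint at non-normality)
shapiro_p <- tapply(ToothGrowth$len, groups,
                    function(x) shapiro.test(x)$p.value)
print(round(shapiro_p, 3))

# Sample variance per group, to see how unequal the variances are
group_var <- tapply(ToothGrowth$len, groups, var)
print(round(group_var, 2))
```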