Introduction

This report uses interactive graphs to show the effect of different supplements and doses on tooth growth.

Executive Summary

We use the ToothGrowth dataset, available in the datasets package in R. The first step is to check the structure of the data, understand what variables we have, and see what pattern the length of teeth follows with respect to the two predictor variables, Supplement and Dose. Next, we build two predictive models and choose the one that fits better.

Exploring the Data

The table has 60 records, with two types of supplement (OJ and VC) and three dose levels (0.5, 1.0, and 2.0) against which tooth length has been observed. More details about the data are in the APPENDIX.

In this analysis we will check:

  1. Which variable results in longer tooth growth

  2. Which predictor variable has a significant effect on tooth growth

Figure 1

Figure 2

Observation

Figure 1 shows that tooth growth was greatest with the OJ supplement and with dose 2.0. Figure 2 shows the relative change in tooth growth, from which we can deduce that the relative change has been approximately equal for OJ and VC across the different doses.
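The pattern read off the figures can also be checked numerically. The following is a minimal sketch, not the code used for the original figures; the variable names `group_means`, `supp_means`, and `dose_means` are introduced here for illustration only.

```r
# Mean tooth length for every supplement/dose combination (6 rows).
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)

# Marginal means by supplement and by dose.
supp_means <- tapply(ToothGrowth$len, ToothGrowth$supp, mean)
dose_means <- tapply(ToothGrowth$len, ToothGrowth$dose, mean)

print(group_means)
print(supp_means)
print(dose_means)
```

Comparing the rows of `group_means` at each dose gives a quick way to see where the two supplements differ and where they come close together.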

We will now predict tooth growth by fitting two models and choosing the one that fits better.

Plotting the Predictor Model

Check the interactive plots here, and come back to this document for the analysis, based on which the best fitted model is selected.

MODEL 1

Model 1 uses a single predictor variable, Supplement. We plot the model diagnostics and check the p-values to see whether Supplement has a significant effect on the length of teeth.

               Estimate Std. Error  t value     Pr(>|t|)
factor(supp)OJ 20.66333    1.36602 15.12667 8.600164e-22
factor(supp)VC 16.96333    1.36602 12.41807 5.601358e-18

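Observations

From the results above, both coefficients have low p-values (< 0.05). Because the model has no intercept, each coefficient is the mean tooth length for that supplement, and the test compares that mean with zero. On this basis we can say that both OJ and VC are statistically significant for the growth of teeth.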
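The Model 1 coefficient table can be reproduced with a call of the following form. The formula is taken from the ANOVA output later in this report; the comparison against the group means is an added check, and `supp_means` is a name introduced here for illustration.

```r
# Model 1: supplement only, no intercept, so each coefficient is a group mean.
model1 <- lm(len ~ factor(supp) - 1, data = ToothGrowth)
print(summary(model1)$coef)

# The coefficients should match the plain per-supplement means of len.
supp_means <- tapply(ToothGrowth$len, ToothGrowth$supp, mean)
print(all.equal(unname(coef(model1)), as.numeric(supp_means)))
```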

However, among the four diagnostic plots we see a non-linear pattern in the ‘Residuals vs Fitted’ graph. This suggests a non-linear relationship between the variables that the model does not capture, so it is left in the residuals.

The ‘Scale-Location’ plot is also non-linear. This implies that the residuals are not spread equally across the range of the predictor; that is, the residuals are not homoscedastic.
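For readers who want static versions of the diagnostic plots discussed above, base R can draw them directly from the fitted model. This is a sketch that assumes the default `plot()` method for `lm` objects is acceptable; the interactive plots linked from this report may have been produced differently.

```r
# Refit Model 1 so that this block is self-contained.
model1 <- lm(len ~ factor(supp) - 1, data = ToothGrowth)

# Draw the four default diagnostic panels in a 2 x 2 grid,
# then restore the previous graphics settings.
op <- par(mfrow = c(2, 2))
plot(model1)
par(op)
```

The same two plotting lines applied to Model 2 give the plots needed for a side-by-side comparison.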

This model definitely needs some further investigation and comparison with other models.

MODEL 2

Model 2 checks the effect of two predictor variables, Supplement and Dose, on the length of teeth.

model2 <- lm(len ~ factor(supp)+factor(dose) -1, data = ToothGrowth)
summary(model2)$coef
               Estimate Std. Error   t value     Pr(>|t|)
factor(supp)OJ   12.455  0.9882795 12.602710 5.490223e-18
factor(supp)VC    8.755  0.9882795  8.858830 3.052224e-12
factor(dose)1     9.130  1.2103903  7.543022 4.382799e-10
factor(dose)2    15.495  1.2103903 12.801656 2.851970e-18

Observations

In this model too, we see low p-values (< 0.05) for all the factors. Hence we can say that both Supplement and Dose are statistically significant for the growth of teeth.

The diagnostic plots of this model give us a better picture. The ‘Residuals vs Fitted’ and ‘Scale-Location’ graphs are fairly linear.

This means that Model 2 captures the relationship between the predictor variables and the response, with roughly homoscedastic residuals. But we cannot yet conclude that Model 2 is better.
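To make the Model 2 coefficients concrete, the sketch below works out one fitted value two ways: by adding the relevant coefficients by hand and by calling `predict()`. The names `new_obs`, `pred`, and `by_hand` are illustrative and do not appear in the original analysis.

```r
model2 <- lm(len ~ factor(supp) + factor(dose) - 1, data = ToothGrowth)

# By hand: supplement term for OJ plus the dose 2 term
# (dose 0.5 is the reference level and contributes nothing).
by_hand <- coef(model2)[["factor(supp)OJ"]] + coef(model2)[["factor(dose)2"]]

# The same fitted value via predict().
new_obs <- data.frame(
  supp = factor("OJ", levels = levels(ToothGrowth$supp)),
  dose = 2
)
pred <- predict(model2, newdata = new_obs)

print(by_hand)  # 12.455 + 15.495 = 27.95
print(pred)
```

Because the model has no interaction term, it assumes the dose effect is the same for both supplements, so the fitted value need not equal the raw mean for that group.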

Identifying the Model

Here is the analysis of variance comparing the two models. The null hypothesis is that Model 1 and Model 2 fit equally well.

anova(model1, model2)
Analysis of Variance Table

Model 1: len ~ factor(supp) - 1
Model 2: len ~ factor(supp) + factor(dose) - 1
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
1     58 3246.9                                  
2     56  820.4  2    2426.4 82.811 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
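The F statistic in the table above follows directly from the two residual sums of squares and their degrees of freedom. The sketch below recomputes it by hand as a cross-check; the helper names (`rss1`, `rss2`, `f_stat`, `p_value`) are introduced here for illustration.

```r
model1 <- lm(len ~ factor(supp) - 1, data = ToothGrowth)
model2 <- lm(len ~ factor(supp) + factor(dose) - 1, data = ToothGrowth)

# Residual sums of squares and residual degrees of freedom.
rss1 <- deviance(model1)
rss2 <- deviance(model2)
df1 <- df.residual(model1)  # 58
df2 <- df.residual(model2)  # 56

# F = (drop in RSS per added parameter) / (residual variance of the larger model).
f_stat <- ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)
p_value <- pf(f_stat, df1 - df2, df2, lower.tail = FALSE)

print(f_stat)   # about 82.8, matching the ANOVA table
print(p_value)
```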

Conclusion

Since the p-value in the analysis of variance is low (< 0.05), we reject the null hypothesis and conclude that the two models do not fit equally well. Furthermore, we now have the following evidence:

  1. As seen from ANOVA table, the Residual Sum of Squares (RSS) is much lower for Model 2 than for Model 1.

  2. From our diagnostic plots earlier we have seen that Model 2 has captured the relationship better between the predictor variables and the response variable.

  3. The summary statistics of Model 2 also show large t-values for the factors OJ and dose 2. A large t-value is evidence against the null hypothesis that there is no relationship between the predictor variable and the response variable.

Therefore, for now we can conclude that Model 2 is a better fit.

A quick look at these plots of the existing data shows that OJ and dose 2.0 have resulted in better tooth growth. At dose 2.0, OJ and VC converge to have the same effect on tooth length.

Figure 5

Figure 6

Figure 7

APPENDIX

'data.frame':   60 obs. of  3 variables:
 $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
 $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
 $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
   len supp dose
1  4.2   VC  0.5
2 11.5   VC  0.5
3  7.3   VC  0.5
4  5.8   VC  0.5
5  6.4   VC  0.5
6 10.0   VC  0.5

OJ VC 
30 30 
[1] 60  3
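
The output above is consistent with the following base R calls. The exact commands used in the original report are not shown, so this is a likely reconstruction.

```r
# Structure of the dataset: 60 observations of len, supp, and dose.
str(ToothGrowth)

# First six rows.
head(ToothGrowth)

# Number of observations per supplement.
table(ToothGrowth$supp)

# Dimensions: rows and columns.
dim(ToothGrowth)
```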