Introduction

While taking a course on Coursera I came across this data set, which was meant to give hands-on experience with the R package “dplyr”. The objective of this project is to do exploratory data analysis on this data and to meet the following goals:

  1. To check the effectiveness of the supplement and the dosage.

  2. To get hands-on experience using dplyr, ggplot2, and the tidyverse.

Data Set & Metadata

Later, after doing some research, I learned that this data was collected by “E. W. Crampton” as part of a research paper titled “THE GROWTH OF ODONTOBLASTS OF THE INCISOR TOOTH AS A CRITERION OF THE VITAMIN C INTAKE OF THE GUINEA PIG”, published in THE JOURNAL OF NUTRITION (Volume 33, Issue 5, May 1947). For more details, see http://jn.nutrition.org/content/33/5/491.full.pdf.

The data set used in this analysis includes the following columns:

  1. len = Tooth length

  2. supp = Vitamin C supplement used (“OJ” (Orange Juice) or “VC” (Ascorbic Acid))

  3. dose = Dose given: 0.5, 1, or 2 mg/day

ROCCC analysis

  1. Reliable

  2. Original

  3. Comprehensive

  4. Current

  5. Cited

The data set meets all five criteria of the ROCCC analysis.

All the data used in this analysis is built into R and can be loaded with the command data("ToothGrowth")

Installing Required Packages

The packages used in this data analysis are: tidyverse, dplyr, ggplot2

They are installed using: install.packages("package_name")

Loading Required Packages

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.2 --
## v ggplot2 3.4.0      v purrr   0.3.5 
## v tibble  3.1.8      v dplyr   1.0.10
## v tidyr   1.2.1      v stringr 1.5.0 
## v readr   2.1.3      v forcats 0.5.2 
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(dplyr)
library(ggplot2)

Loading the Data Set in RStudio

data("ToothGrowth")

Checking Column Names

str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

Checking the First 6 Rows

head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5

Checking Column Names & Data Frame

glimpse(ToothGrowth)
## Rows: 60
## Columns: 3
## $ len  <dbl> 4.2, 11.5, 7.3, 5.8, 6.4, 10.0, 11.2, 11.2, 5.2, 7.0, 16.5, 16.5,~
## $ supp <fct> VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, V~
## $ dose <dbl> 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1.0, 1.0, 1.0, ~

Analyze

Grouping Guinea Pigs by Vitamin C Supplement and Dose and Counting the Observations in Each Group

ToothGrowth %>% group_by(supp, dose) %>% summarise(Number_of_dose = n())
## `summarise()` has grouped output by 'supp'. You can override using the
## `.groups` argument.
## # A tibble: 6 x 3
## # Groups:   supp [2]
##   supp   dose Number_of_dose
##   <fct> <dbl>          <int>
## 1 OJ      0.5             10
## 2 OJ      1               10
## 3 OJ      2               10
## 4 VC      0.5             10
## 5 VC      1               10
## 6 VC      2               10
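The same group counts can also be obtained in a single step. The sketch below assumes dplyr's count() helper is an acceptable shorthand for group_by() followed by summarise(n()); the object name group_counts is only illustrative and is not part of the original analysis.

```r
library(dplyr)

data("ToothGrowth")

# count() groups by the given columns, counts the rows in each group,
# and returns an ungrouped result, so no `.groups` message is printed.
# `group_counts` is a hypothetical name chosen for this sketch.
group_counts <- ToothGrowth %>%
  count(supp, dose, name = "Number_of_dose")

group_counts
```

Each supplement and dose combination should contain the same number of guinea pigs, which is what makes the later comparisons of group averages straightforward.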

Calculating Maximum, Minimum and Average Tooth Length, Grouped by Vitamin C Supplement Used

ToothGrowth %>% group_by(supp) %>% summarise(Number_of_dose = n(), maximum_tooth_length = max(len), Average_tooth_length = mean(len), Minimum_tooth_length = min(len))
## # A tibble: 2 x 5
##   supp  Number_of_dose maximum_tooth_length Average_tooth_length Minimum_tooth~1
##   <fct>          <int>                <dbl>                <dbl>           <dbl>
## 1 OJ                30                 30.9                 20.7             8.2
## 2 VC                30                 33.9                 17.0             4.2
## # ... with abbreviated variable name 1: Minimum_tooth_length

Calculating Maximum, Minimum and Average Tooth Length, Grouped by Given Dose

ToothGrowth %>% group_by(dose) %>% summarise(Number_of_dose = n(), maximum_tooth_length = max(len), Average_tooth_length = mean(len), Minimum_tooth_length = min(len))
## # A tibble: 3 x 5
##    dose Number_of_dose maximum_tooth_length Average_tooth_length Minimum_tooth~1
##   <dbl>          <int>                <dbl>                <dbl>           <dbl>
## 1   0.5             20                 21.5                 10.6             4.2
## 2   1               20                 27.3                 19.7            13.6
## 3   2               20                 33.9                 26.1            18.5
## # ... with abbreviated variable name 1: Minimum_tooth_length

Calculating Maximum, Minimum and Average Tooth Length, Grouped by Given Dose & Supplement Used

ToothGrowth %>% group_by(dose, supp) %>% summarise(Number_of_dose = n(), maximum_tooth_length = max(len), Average_tooth_length = mean(len), Minimum_tooth_length = min(len))
## `summarise()` has grouped output by 'dose'. You can override using the
## `.groups` argument.
## # A tibble: 6 x 6
## # Groups:   dose [3]
##    dose supp  Number_of_dose maximum_tooth_length Average_tooth_length Minimum~1
##   <dbl> <fct>          <int>                <dbl>                <dbl>     <dbl>
## 1   0.5 OJ                10                 21.5                13.2        8.2
## 2   0.5 VC                10                 11.5                 7.98       4.2
## 3   1   OJ                10                 27.3                22.7       14.5
## 4   1   VC                10                 22.5                16.8       13.6
## 5   2   OJ                10                 30.9                26.1       22.4
## 6   2   VC                10                 33.9                26.1       18.5
## # ... with abbreviated variable name 1: Minimum_tooth_length
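The message about grouped output appears because summarise() removes only the last grouping level when there are two grouping columns. A minimal sketch of the same summary with the grouping dropped explicitly is shown below; it reuses the `.groups = "drop"` argument that the plotting code later in this analysis also uses. The object name dose_supp_summary is an assumption made for illustration.

```r
library(dplyr)

data("ToothGrowth")

# Same summary as above, written one statistic per line.
# `.groups = "drop"` returns an ungrouped tibble and silences the message.
# `dose_supp_summary` is a hypothetical name chosen for this sketch.
dose_supp_summary <- ToothGrowth %>%
  group_by(dose, supp) %>%
  summarise(
    Number_of_dose = n(),
    maximum_tooth_length = max(len),
    Average_tooth_length = mean(len),
    Minimum_tooth_length = min(len),
    .groups = "drop"
  )

dose_supp_summary
```

Dropping the grouping is a reasonable default when the summary table is the end result, because later verbs then operate on the whole table rather than per dose.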

Filtering ToothGrowth Data Where Supplement Is Orange Juice (OJ)

Orange_juice <- filter(ToothGrowth, supp == "OJ")

Statistical summary of data where Supplement is Orange Juice

summary(Orange_juice$len)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.20   15.53   22.70   20.66   25.73   30.90

Filtering ToothGrowth Data Where Supplement Is Ascorbic Acid (VC)

Ascorbic_acid <- filter(ToothGrowth, supp == "VC")

Statistical Summary of Data Where Supplement Is Ascorbic Acid (VC)

summary(Ascorbic_acid$len)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.20   11.20   16.50   16.96   23.10   33.90
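The two filtered summaries above can also be produced in one call without creating a separate data frame per supplement. The sketch below uses only base R (split() and lapply()); the name len_by_supp is hypothetical and not part of the original analysis.

```r
data("ToothGrowth")

# split() divides the tooth lengths into one vector per supplement,
# and lapply() applies summary() to each of those vectors.
# `len_by_supp` is a hypothetical name chosen for this sketch.
len_by_supp <- lapply(split(ToothGrowth$len, ToothGrowth$supp), summary)

len_by_supp
```

This returns a named list with one entry for OJ and one for VC, which keeps the two summaries side by side for comparison.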

Checking Effectiveness of Supplements by Plotting: Average Tooth Length vs. Dose

ToothGrowth %>% group_by(dose, supp) %>% summarise(avg_tooth_len = mean(len),.groups = 'drop') %>% ggplot(aes(x = dose, y = avg_tooth_len, fill = supp)) + geom_col(position = "dodge") +labs(title = "Average Tooth Length VS. Dosage & supp" )

Checking Effectiveness of Supplements by Plotting: Tooth Length vs. Dosage & Supplement

ggplot(data = ToothGrowth) + geom_smooth(mapping = aes(x = dose, y = len)) + geom_point(mapping = aes(x = dose, y= len, color = supp))+labs(title = "Tooth Length VS. Dosage & supp" ) # No 1.5 mg dose was given, so the smooth has a gap in the data between 1 and 2 mg; a different plot is used below
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at 0.4925
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 1.5075
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 2.5631e-16
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 2.2726
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : pseudoinverse used at
## 0.4925
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : neighborhood radius
## 1.5075
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : reciprocal condition
## number 2.5631e-16
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : There are other near
## singularities as well. 2.2726
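The loess warnings above most likely arise because dose takes only three distinct values (0.5, 1, and 2), which is too few for a local regression fit to be stable. One possible alternative, sketched below under that assumption, is to connect the group means with stat_summary() instead of fitting a smooth curve. The object name dose_plot and the axis labels are illustrative choices, not part of the original analysis.

```r
library(ggplot2)

data("ToothGrowth")

# Plot every observation, then join the mean tooth length at each dose
# with a line, drawn separately for each supplement (via the color mapping).
# `dose_plot` is a hypothetical name chosen for this sketch.
dose_plot <- ggplot(ToothGrowth, aes(x = dose, y = len, color = supp)) +
  geom_point() +
  stat_summary(fun = mean, geom = "line") +
  labs(
    title = "Tooth Length vs. Dose by Supplement",
    x = "Dose (mg/day)",
    y = "Tooth length"
  )

dose_plot
```

Because the lines pass through the actual group means, this version does not estimate anything between the tested doses, which avoids the loess fitting step entirely.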

Checking Effectiveness of Supplements by Plotting Box Plots: Tooth Length vs. Dosage, Faceted by Supplement

ToothGrowth$dose <- as.factor(ToothGrowth$dose) # treat dose as a category so each dose gets its own box
ggplot(data = ToothGrowth) + geom_boxplot(aes(x = dose, y = len, fill = dose)) + facet_wrap(~supp)

Conclusion

After viewing and analyzing this data set, I found that:

The guinea pigs in the study were given vitamin C in two forms, Orange Juice (OJ) and Ascorbic Acid (VC), at doses of 0.5, 1, and 2 mg/day. As the data shows, tooth length increased with dose: the average tooth length was 10.6 at 0.5 mg/day, 19.7 at 1 mg/day, and 26.1 at 2 mg/day. The maximum tooth length was 33.9 for VC and 30.9 for OJ, while the minimum was 4.2 for VC and 8.2 for OJ. On average, VC-fed guinea pigs had a tooth length of 17.0 while OJ-fed guinea pigs had 20.7. Broken down by dose, OJ had the higher average at 0.5 mg/day (13.2 vs. 7.98) and at 1 mg/day (22.7 vs. 16.8), indicating that OJ was more effective in promoting tooth growth at lower dosages. But when the dosage increased to 2 mg/day, OJ and VC had similar effectiveness, with both averaging 26.1. Overall, this suggests that both the dose of vitamin C and the form in which it is given matter for tooth growth.
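The per-dose comparison behind this conclusion can be checked directly by computing the difference in average tooth length between the two supplements at each dose. The following is a minimal base R sketch, not a formal statistical test; the names means and diff_oj_vc are assumptions made for illustration.

```r
data("ToothGrowth")

# Mean tooth length for every dose (rows) and supplement (columns).
# `means` and `diff_oj_vc` are hypothetical names chosen for this sketch.
means <- with(ToothGrowth, tapply(len, list(dose = dose, supp = supp), mean))

# Positive values mean OJ had the higher average at that dose.
diff_oj_vc <- means[, "OJ"] - means[, "VC"]

round(means, 2)
round(diff_oj_vc, 2)
```

A difference that is clearly positive at the two lower doses and close to zero at the highest dose would match the pattern described above.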