While taking a course on Coursera I came across this data set, which was meant to give hands-on experience with the R package “dplyr”. The objective of this project is to do exploratory data analysis on this data with the following goals:
To check the effectiveness of the supplement and the dosage.
To get hands-on experience with dplyr, ggplot2, and the tidyverse.
Later, after doing some research, I learned that this data was collected by E. W. Crampton as part of a research paper titled “The Growth of the Odontoblasts of the Incisor Tooth as a Criterion of the Vitamin C Intake of the Guinea Pig”, published in The Journal of Nutrition (Volume 33, Issue 5, May 1947). For more details see http://jn.nutrition.org/content/33/5/491.full.pdf.
The data set used in this analysis includes the following variables:
len = Tooth length
supp = Vitamin C Supplement used (“OJ” (Orange Juice), “VC” (Ascorbic Acid) )
dose = Dose of Vitamin C given (0.5, 1, or 2 mg/day)
ROCCC analysis
Reliable
Original
Comprehensive
Current
Cited
The data set satisfies every criterion of the ROCCC analysis.
All the data used in this analysis is built into R and can be loaded with the command data("ToothGrowth")
The packages used in this data analysis are: tidyverse, dplyr, ggplot2
They can be installed using: install.packages("package_name")
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.2 --
## v ggplot2 3.4.0 v purrr 0.3.5
## v tibble 3.1.8 v dplyr 1.0.10
## v tidyr 1.2.1 v stringr 1.5.0
## v readr 2.1.3 v forcats 0.5.2
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(dplyr)
library(ggplot2)
Loading the data set in RStudio
data("ToothGrowth")
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
head(ToothGrowth)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
glimpse(ToothGrowth)
## Rows: 60
## Columns: 3
## $ len <dbl> 4.2, 11.5, 7.3, 5.8, 6.4, 10.0, 11.2, 11.2, 5.2, 7.0, 16.5, 16.5,~
## $ supp <fct> VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, V~
## $ dose <dbl> 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1.0, 1.0, 1.0, ~
ToothGrowth %>% group_by(supp, dose) %>% summarise(Number_of_dose = n())
## `summarise()` has grouped output by 'supp'. You can override using the
## `.groups` argument.
## # A tibble: 6 x 3
## # Groups: supp [2]
## supp dose Number_of_dose
## <fct> <dbl> <int>
## 1 OJ 0.5 10
## 2 OJ 1 10
## 3 OJ 2 10
## 4 VC 0.5 10
## 5 VC 1 10
## 6 VC 2 10
ToothGrowth %>%
  group_by(supp) %>%
  summarise(
    Number_of_dose = n(),
    maximum_tooth_length = max(len),
    Average_tooth_length = mean(len),
    Minimum_tooth_length = min(len)
  )
## # A tibble: 2 x 5
## supp Number_of_dose maximum_tooth_length Average_tooth_length Minimum_tooth~1
## <fct> <int> <dbl> <dbl> <dbl>
## 1 OJ 30 30.9 20.7 8.2
## 2 VC 30 33.9 17.0 4.2
## # ... with abbreviated variable name 1: Minimum_tooth_length
ToothGrowth %>%
  group_by(dose) %>%
  summarise(
    Number_of_dose = n(),
    maximum_tooth_length = max(len),
    Average_tooth_length = mean(len),
    Minimum_tooth_length = min(len)
  )
## # A tibble: 3 x 5
## dose Number_of_dose maximum_tooth_length Average_tooth_length Minimum_tooth~1
## <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.5 20 21.5 10.6 4.2
## 2 1 20 27.3 19.7 13.6
## 3 2 20 33.9 26.1 18.5
## # ... with abbreviated variable name 1: Minimum_tooth_length
ToothGrowth %>%
  group_by(dose, supp) %>%
  summarise(
    Number_of_dose = n(),
    maximum_tooth_length = max(len),
    Average_tooth_length = mean(len),
    Minimum_tooth_length = min(len)
  )
## `summarise()` has grouped output by 'dose'. You can override using the
## `.groups` argument.
## # A tibble: 6 x 6
## # Groups: dose [3]
## dose supp Number_of_dose maximum_tooth_length Average_tooth_length Minimum~1
## <dbl> <fct> <int> <dbl> <dbl> <dbl>
## 1 0.5 OJ 10 21.5 13.2 8.2
## 2 0.5 VC 10 11.5 7.98 4.2
## 3 1 OJ 10 27.3 22.7 14.5
## 4 1 VC 10 22.5 16.8 13.6
## 5 2 OJ 10 30.9 26.1 22.4
## 6 2 VC 10 33.9 26.1 18.5
## # ... with abbreviated variable name 1: Minimum_tooth_length
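The three summaries above repeat the same summarise() call with different grouping columns. One way to avoid the repetition, sketched here as an assumption rather than as part of the original analysis, is to wrap the call in a small function. The names summarise_len and tg are hypothetical.

```r
library(dplyr)

# Hypothetical helper (not part of the original analysis): summarise `len`
# for whatever grouping columns are passed as bare names.
summarise_len <- function(data, ...) {
  data %>%
    group_by(...) %>%
    summarise(
      Number_of_dose       = n(),
      maximum_tooth_length = max(len),
      Average_tooth_length = mean(len),
      Minimum_tooth_length = min(len),
      .groups = "drop"  # return an ungrouped tibble and silence the message
    )
}

# Local copy of the built-in data, so the ToothGrowth object is untouched.
tg <- datasets::ToothGrowth

by_supp      <- summarise_len(tg, supp)
by_dose      <- summarise_len(tg, dose)
by_dose_supp <- summarise_len(tg, dose, supp)

print(by_dose_supp)
```

Setting .groups = "drop" inside the helper also removes the "has grouped output" message that appears in the output above.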
Orange_juice <- filter(ToothGrowth, supp == "OJ")
summary(Orange_juice$len)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.20 15.53 22.70 20.66 25.73 30.90
Ascorbic_acid <- filter(ToothGrowth, supp == "VC")
summary(Ascorbic_acid$len)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.20 11.20 16.50 16.96 23.10 33.90
ToothGrowth %>%
  group_by(dose, supp) %>%
  summarise(avg_tooth_len = mean(len), .groups = "drop") %>%
  ggplot(aes(x = dose, y = avg_tooth_len, fill = supp)) +
  geom_col(position = "dodge") +
  labs(title = "Average Tooth Length VS. Dosage & supp")
# No 1.5 mg/day dose was given, so the smoother has only three distinct dose values to fit and emits the warnings below; a different plot (a box plot) is used afterwards.
ggplot(data = ToothGrowth) +
  geom_smooth(mapping = aes(x = dose, y = len)) +
  geom_point(mapping = aes(x = dose, y = len, color = supp)) +
  labs(title = "Tooth Length VS. Dosage & supp")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at 0.4925
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 1.5075
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 2.5631e-16
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 2.2726
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : pseudoinverse used at
## 0.4925
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : neighborhood radius
## 1.5075
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : reciprocal condition
## number 2.5631e-16
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : There are other near
## singularities as well. 2.2726
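The warnings above arise because dose takes only three distinct values, which gives the default loess smoother too few distinct x positions to fit reliably. As a minimal sketch of one alternative (not the plot used in the original analysis), the mean tooth length of each supplement at each dose can be joined with a line using stat_summary(). The names tg and p are illustrative.

```r
library(ggplot2)

# Local copy of the built-in data, with dose still numeric.
tg <- datasets::ToothGrowth

# Raw observations as points, plus one line per supplement joining the
# group means at each dose. No smoother is fitted, so loess is never called.
p <- ggplot(tg, aes(x = dose, y = len, color = supp)) +
  geom_point() +
  stat_summary(fun = mean, geom = "line") +
  labs(title = "Tooth Length VS. Dosage & supp")

print(p)
```

Because color = supp is set in the top-level aes(), the points and the mean lines are both split by supplement.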
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
ggplot(data = ToothGrowth) +
  geom_boxplot(aes(x = dose, y = len, fill = dose)) +
  facet_wrap(~supp)
After viewing and analyzing this data set, I found that:
The guinea pigs in the study were given Vitamin C in two forms, orange juice (OJ) and ascorbic acid (VC), at doses of 0.5, 1, and 2 mg/day. As the data shows, average tooth length rose with the dose, from 10.6 at 0.5 mg/day to 19.7 at 1 mg/day and 26.1 at 2 mg/day. The maximum tooth length was 33.9 for VC and 30.9 for OJ, while the minimum was 4.2 for VC and 8.2 for OJ. On average, VC-fed guinea pigs had a tooth length of 17.0 while OJ-fed guinea pigs had 20.7. Broken down by dose, OJ gave the longer average at 0.5 mg/day (13.2 vs. 7.98) and at 1 mg/day (22.7 vs. 16.8), indicating that OJ was more effective in promoting tooth growth at lower dosages. At 2 mg/day both supplements averaged 26.1, so OJ and VC were similarly effective at the highest dose. In this data set, both the dose and, at lower doses, the form of Vitamin C are associated with tooth growth.
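The dose-by-dose comparison behind this conclusion can be checked directly. The following is a minimal base R sketch that computes the mean tooth length for each dose and supplement and then the OJ minus VC difference at each dose. The names tg, mean_len, and oj_minus_vc are illustrative and do not appear in the original analysis.

```r
# Local copy of the built-in data, so the ToothGrowth object is untouched.
tg <- datasets::ToothGrowth

# Mean tooth length for every dose (rows) and supplement (columns).
mean_len <- tapply(tg$len, list(dose = tg$dose, supp = tg$supp), mean)

# Positive values mean OJ gave the longer average at that dose.
oj_minus_vc <- mean_len[, "OJ"] - mean_len[, "VC"]

print(round(mean_len, 2))
print(round(oj_minus_vc, 2))
```

The difference is positive at 0.5 and 1 mg/day (roughly 5 to 6) and close to zero at 2 mg/day, which matches the pattern described above.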