# Load in the data
data("ToothGrowth")
# Learn about the data
#?ToothGrowth
# Structure of the dataset
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
# Look at the data
#View(ToothGrowth)
# Load the tidyverse (attaching it also attaches ggplot2)
library(tidyverse)

Question 1

What do the rows of this dataset represent?

Each row of this dataset represents one of the 60 guinea pigs used as test subjects.
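One way to check this reading of the rows is to count them and see how they split across the treatment groups. This is only a sketch in base R, and the names `n_rows` and `group_counts` are illustrative rather than part of the assignment.

```r
# Base R only; ToothGrowth ships with the built-in datasets package.
data("ToothGrowth")

# Number of rows, i.e. the number of observations
n_rows <- nrow(ToothGrowth)
n_rows

# Cross-tabulate supplement type against dose to see the size of each group
group_counts <- table(supp = ToothGrowth$supp, dose = ToothGrowth$dose)
group_counts
```

If each row is one animal, the cell counts in `group_counts` should add up to `n_rows`.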

Question 2

What do the columns of this dataset represent? Indicate whether each variable in the study is numerical or categorical. If numerical, identify as continuous or discrete. If categorical, indicate if the variable is ordinal.

`len` represents the length of odontoblasts, the cells responsible for tooth growth. `supp` represents which supplement the guinea pig was given, either orange juice (OJ) or ascorbic acid (VC). `dose` represents how many mg of vitamin C per day the guinea pig was given. Supplement is a categorical variable and is not ordinal. Length and dose are numerical variables. Length is continuous, and dose is discrete because it takes only three values in this study (0.5, 1, and 2 mg/day).
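The variable types described above can be inspected directly. The following is a minimal sketch in base R, with the object names (`col_classes`, `supp_levels`, `supp_is_ordered`, `dose_values`) chosen only for illustration.

```r
data("ToothGrowth")

# Storage class of each column
col_classes <- sapply(ToothGrowth, class)
col_classes

# Categories of the supplement variable, and whether R treats them as ordered
supp_levels <- levels(ToothGrowth$supp)
supp_is_ordered <- is.ordered(ToothGrowth$supp)
supp_levels
supp_is_ordered

# Distinct dose values that appear in the data
dose_values <- sort(unique(ToothGrowth$dose))
dose_values
```

Note that R stores `dose` as a plain numeric column, so deciding that it is discrete comes from looking at its distinct values, not from its class.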

Question 3

What are the response and explanatory variables in this study?

The response variable is the length of odontoblasts in the guinea pigs. The explanatory variables are the type of supplement and the dose of the supplement.

Question 4

Create a hypothesis about supplement treatment and dosage levels, without first looking at the data.

A guinea pig given vitamin C from orange juice will have greater tooth growth than a guinea pig given a smaller dose of vitamin C from orange juice, or a guinea pig that gets its vitamin C from ascorbic acid.

Question 5

Use ggplot to create a side-by-side boxplot, which illustrates the distribution of each supplement treatment and allows for both visual comparison across and within treatments.

I created a boxplot that compares the length of odontoblasts across the two supplement types.

ggplot(ToothGrowth, aes(x = supp, y = len, color = supp)) +
    geom_boxplot()

Question 6

Now add facets to your data to compare across dosage as well

I added facets by dose so that the two supplements can be compared more clearly at each dose level.

ggplot(ToothGrowth, aes(x = supp, y = len, color = supp)) +
    geom_boxplot() +
    facet_wrap(~dose)
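As an alternative view of the same comparison, dose can be placed on the x-axis as a categorical variable so that all six boxes share one panel. This is a sketch of one possible layout, not the layout the question asks for. The axis labels and the object name `p` are my own choices.

```r
library(ggplot2)
data("ToothGrowth")

# Treat dose as a categorical axis; boxes for OJ and VC are drawn
# side by side within each dose level
p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len, color = supp)) +
    geom_boxplot() +
    labs(x = "Dose (mg/day)", y = "Odontoblast length", color = "Supplement")
p
```

Putting everything on a single shared y-axis can make it easier to follow the trend across doses, while the faceted version keeps each dose visually separate.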

Question 7

Question 8

Did you see anything surprising? Does your hypothesis appear to be supported? Note that this is not a formal hypothesis test but rather an exploration.

My hypothesis appears to be largely supported by the plots. For both supplements, a higher dose is associated with greater tooth growth, and orange juice generally gives better results than ascorbic acid at the 0.5 and 1 mg/day doses. I found it surprising that a 2 mg/day dose of ascorbic acid actually had the same median tooth growth as a 2 mg/day dose of orange juice. I now wonder if a 2.5 mg/day dose of ascorbic acid would surpass a 2.5 mg/day dose of orange juice.
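The medians read off the boxplots can be checked numerically. Below is a minimal sketch using base R's `aggregate()`. The names `medians` and `iqrs` are illustrative, and the interquartile range is used here as just one possible measure of how consistent each group is.

```r
data("ToothGrowth")

# Median odontoblast length for every supplement-by-dose combination
medians <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = median)
medians

# Interquartile range of each group, as a rough measure of spread
iqrs <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = IQR)
iqrs
```

Like the plots, these summaries are exploratory and do not replace a formal hypothesis test.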