dplyr assignment

You have learned how to set up a script, locate and download data, explore that data and summarise it. With dplyr, you now have tools such as select(), slice(), filter() and mutate() for subsetting and transforming data, and tools such as group_by(), summarise() and %>% for summarising it.

Here is your assignment. It uses a new data set on cattle weight gain as a function of the diet they are fed (grains) and the vitamin supplement added to that diet. The data are in growth.csv, in the folder of data sets you’ve downloaded from us.

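# load dplyr for glimpse(), group_by(), summarise(), filter() and %>%
library(dplyr)

# get the data
# (stringsAsFactors = TRUE reproduces the factor columns shown below;
# since R 4.0.0 read.csv() reads text columns as character by default)
cow <- read.csv("../Student Resources/Datasets/growth.csv",
                stringsAsFactors = TRUE)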

# explore the data frame
str(cow)
## 'data.frame':    48 obs. of  3 variables:
##  $ supplement: Factor w/ 4 levels "agrimore","control",..: 3 3 3 3 2 2 2 2 4 4 ...
##  $ diet      : Factor w/ 3 levels "barley","oats",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ gain      : num  17.4 16.8 18.1 15.8 17.7 ...
glimpse(cow)
## Observations: 48
## Variables: 3
## $ supplement <fctr> supergain, supergain, supergain, supergain, contro...
## $ diet       <fctr> wheat, wheat, wheat, wheat, wheat, wheat, wheat, w...
## $ gain       <dbl> 17.37125, 16.81489, 18.08184, 15.78175, 17.70656, 1...

With 4 supplements and 3 diets, there are 4 * 3 = 12 treatment combinations.
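
The same count can be checked in code rather than by hand. The sketch below is only an illustration: `toy` is a hypothetical stand-in for growth.csv that reuses the factor levels from the str() output, assumes 4 rows per combination, and fills gain with random numbers rather than real measurements.

```r
library(dplyr)

# Hypothetical stand-in for the cattle data: real factor levels,
# an assumed 4 rows per combination, and invented gain values.
set.seed(1)
toy <- expand.grid(
  supplement = c("agrimore", "control", "supergain", "supersupp"),
  diet       = c("barley", "oats", "wheat"),
  rep        = 1:4
)
toy$gain <- rnorm(nrow(toy), mean = 20, sd = 2)

# one row per distinct diet-by-supplement combination
combos <- distinct(toy, diet, supplement)
nrow(combos)

# number of rows in each combination
reps <- count(toy, diet, supplement)
reps
```

With the real data, replacing `toy` with `cow` should give the same kind of check.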

# note the use of piping and the definition of 
# the standard error implemented via summarise 
# (sd / sqrt(sample size))
sumDat <- cow %>% 
  group_by(diet, supplement) %>%
  summarise(meanGain = mean(gain),
            seGain = sd(gain)/sqrt(n()))
sumDat
## Source: local data frame [12 x 4]
## Groups: diet [?]
## 
##      diet supplement meanGain    seGain
##    <fctr>     <fctr>    <dbl>     <dbl>
## 1  barley   agrimore 26.34848 0.9187479
## 2  barley    control 23.29665 0.7032491
## 3  barley  supergain 22.46612 0.7710644
## 4  barley  supersupp 25.57530 1.0599015
## 5    oats   agrimore 23.29838 0.6131592
## 6    oats    control 20.49366 0.5056319
## 7    oats  supergain 19.66300 0.3489388
## 8    oats  supersupp 21.86023 0.4132292
## 9   wheat   agrimore 19.63907 0.7099260
## 10  wheat    control 17.40552 0.4604420
## 11  wheat  supergain 17.01243 0.4852821
## 12  wheat  supersupp 19.66834 0.4746443
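
The standard error computed inside summarise() is the sample standard deviation divided by the square root of the sample size. As a minimal worked sketch, here is the same arithmetic on a small invented vector (the values in `x` are hypothetical and not taken from growth.csv):

```r
# Hypothetical gains for a single group (invented numbers)
x <- c(2, 4, 4, 6)

n  <- length(x)     # sample size
s  <- sd(x)         # sample standard deviation (n - 1 denominator)
se <- s / sqrt(n)   # standard error of the mean
se
```

Inside summarise(), n() plays the role of length(x) for each group.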
# keep only the barley rows of the summary table
filter(sumDat, diet == 'barley')
## Source: local data frame [4 x 4]
## Groups: diet [1]
## 
##     diet supplement meanGain    seGain
##   <fctr>     <fctr>    <dbl>     <dbl>
## 1 barley   agrimore 26.34848 0.9187479
## 2 barley    control 23.29665 0.7032491
## 3 barley  supergain 22.46612 0.7710644
## 4 barley  supersupp 25.57530 1.0599015
# use quantile to find the value that 
# marks the 90th percentile
quantile(cow$gain, probs = 0.9)
##      90% 
## 25.11224
# because filter() looks up column names in the data
# frame given as its first argument, no need for cow$
filter(cow, gain >= quantile(gain, probs = 0.9))
##   supplement   diet     gain
## 1  supersupp barley 27.79490
## 2  supersupp barley 26.78869
## 3   agrimore barley 26.04248
## 4   agrimore barley 25.28337
## 5   agrimore barley 29.02916
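
To see what the filter() call is doing, it can help to compare it with the base R version that computes the cutoff first and then subsets with $. This is a sketch on a hypothetical data frame (`toy` and its values are invented for illustration), not the cattle data:

```r
library(dplyr)

# Hypothetical data: ten invented gain values
toy <- data.frame(id   = 1:10,
                  gain = c(11, 12, 13, 14, 15, 16, 17, 18, 19, 30))

# base R: compute the 90th percentile, then subset with $
cutoff   <- quantile(toy$gain, probs = 0.9)
top_base <- toy[toy$gain >= cutoff, ]

# dplyr: the same condition, with column names used directly
top_dplyr <- filter(toy, gain >= quantile(gain, probs = 0.9))

top_dplyr
```

Both approaches keep the same rows; the dplyr version simply avoids repeating the data frame name.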