You have learned how to set up a script, locate and download data, explore those data and summarise them. Through dplyr, you now have several tools, such as select(), slice(), filter() and mutate(), for accessing and modifying data, and others, such as group_by(), summarise() and %>%, that make summarising your data easy.
Here is your assignment. It involves a new data set on cattle weight gain as a function of the diet the cattle are fed (grains) and the vitamin supplements added to that diet. The data are in growth.csv, in the folder of data sets you downloaded from us.
# load dplyr for glimpse(), group_by(), summarise(), filter() and %>%
library(dplyr)
# get the data
cow <- read.csv("../Student Resources/Datasets/growth.csv")
# explore the data frame
str(cow)
## 'data.frame': 48 obs. of 3 variables:
## $ supplement: Factor w/ 4 levels "agrimore","control",..: 3 3 3 3 2 2 2 2 4 4 ...
## $ diet : Factor w/ 3 levels "barley","oats",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ gain : num 17.4 16.8 18.1 15.8 17.7 ...
glimpse(cow)
## Observations: 48
## Variables: 3
## $ supplement <fctr> supergain, supergain, supergain, supergain, contro...
## $ diet <fctr> wheat, wheat, wheat, wheat, wheat, wheat, wheat, w...
## $ gain <dbl> 17.37125, 16.81489, 18.08184, 15.78175, 17.70656, 1...
# 4 supplements x 3 diets = 12 treatment combinations
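The number of treatment combinations can also be counted from the data rather than by multiplying the numbers of factor levels. The sketch below is a minimal illustration on a hypothetical toy data frame (`toy`, built with `expand.grid()` to have the same factor levels as growth.csv and an assumed 4 replicates per combination); it is not the real data.

```r
library(dplyr)

# hypothetical toy data: 4 supplements x 3 diets x 4 replicates (assumed)
toy <- expand.grid(
  supplement = c("agrimore", "control", "supergain", "supersupp"),
  diet = c("barley", "oats", "wheat"),
  rep = 1:4
)

# number of distinct levels of each factor
n_supp <- n_distinct(toy$supplement)
n_diet <- n_distinct(toy$diet)

# one row per diet x supplement combination, with its replicate count in n
combos <- count(toy, diet, supplement)

n_supp * n_diet   # 12
nrow(combos)      # 12
```

Running the same `count(cow, diet, supplement)` call on the real data frame would show whether every combination has the same number of animals.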
# note the use of piping and the definition of
# the standard error implemented via summarise
# (sd / sqrt(sample size))
sumDat <- cow %>%
  group_by(diet, supplement) %>%
  summarise(meanGain = mean(gain),
            seGain = sd(gain) / sqrt(n()))
sumDat
## Source: local data frame [12 x 4]
## Groups: diet [?]
##
## diet supplement meanGain seGain
## <fctr> <fctr> <dbl> <dbl>
## 1 barley agrimore 26.34848 0.9187479
## 2 barley control 23.29665 0.7032491
## 3 barley supergain 22.46612 0.7710644
## 4 barley supersupp 25.57530 1.0599015
## 5 oats agrimore 23.29838 0.6131592
## 6 oats control 20.49366 0.5056319
## 7 oats supergain 19.66300 0.3489388
## 8 oats supersupp 21.86023 0.4132292
## 9 wheat agrimore 19.63907 0.7099260
## 10 wheat control 17.40552 0.4604420
## 11 wheat supergain 17.01243 0.4852821
## 12 wheat supersupp 19.66834 0.4746443
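To see what the standard error column is doing, it can help to compute one group's value by hand and compare it with the summarise() result. This is a minimal sketch on hypothetical toy values (the data frame `toy` and its `group` column are made up for illustration, not taken from growth.csv).

```r
library(dplyr)

# hypothetical toy values, not taken from growth.csv
toy <- data.frame(
  group = rep(c("a", "b"), each = 4),
  gain  = c(10, 12, 14, 16, 20, 21, 22, 23)
)

# standard error by hand for group "a": sd / sqrt(sample size)
a <- toy$gain[toy$group == "a"]
se_a <- sd(a) / sqrt(length(a))

# the same calculation for every group via summarise();
# n() returns the number of rows in the current group
se_tbl <- toy %>%
  group_by(group) %>%
  summarise(meanGain = mean(gain),
            seGain = sd(gain) / sqrt(n()))

# the hand calculation and the grouped summary should agree
all.equal(se_a, se_tbl$seGain[se_tbl$group == "a"])
```

Because n() is evaluated within each group, the sample size used is the group's own, not the number of rows in the whole data frame.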
filter(sumDat, diet == 'barley')
## Source: local data frame [4 x 4]
## Groups: diet [1]
##
## diet supplement meanGain seGain
## <fctr> <fctr> <dbl> <dbl>
## 1 barley agrimore 26.34848 0.9187479
## 2 barley control 23.29665 0.7032491
## 3 barley supergain 22.46612 0.7710644
## 4 barley supersupp 25.57530 1.0599015
# use quantile to find the value that
# marks the 90th percentile
quantile(cow$gain, probs = 0.9)
## 90%
## 25.11224
# because dplyr takes the data frame as first
# argument, no need for $
filter(cow, gain >= quantile(gain, probs = 0.9))
## supplement diet gain
## 1 supersupp barley 27.79490
## 2 supersupp barley 26.78869
## 3 agrimore barley 26.04248
## 4 agrimore barley 25.28337
## 5 agrimore barley 29.02916
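The point about not needing $ can be checked by writing the same subset in base R and in dplyr. The sketch below uses a hypothetical toy data frame (`toy`, with made-up `id` and `gain` columns) and assumes the default quantile() method.

```r
library(dplyr)

# hypothetical toy data: 20 rows with gains 1 to 20
toy <- data.frame(id = 1:20, gain = as.numeric(1:20))

# base R: the data frame has to be named every time a column is used
cutoff <- quantile(toy$gain, probs = 0.9)
top_base <- toy[toy$gain >= cutoff, ]

# dplyr: column names are looked up inside the data frame given first
top_dplyr <- filter(toy, gain >= quantile(gain, probs = 0.9))

# both approaches should keep the same rows
identical(top_base$id, top_dplyr$id)
```

Note that if the data frame were grouped with group_by() first, quantile() inside filter() would be evaluated separately for each group, which gives a different subset.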