Your homework submission should include the markdown file and the knitted document (your choice of html, pdf or Word).
ChickWeight.
You need to load the data in R using data(ChickWeight). The
ChickWeight data frame has 578 rows and 4 columns from an
experiment on the effect of diet on early growth of chicks. Use
?ChickWeight to get more information on every one of the
variables.
data("ChickWeight")
?ChickWeight
Use the dplyr package to identify how many chicks have a complete
set of weight measurements and how many measurements there are in the
incomplete cases. Extract a subset of the data for all chicks with
complete information and name the data set complete. (Hint:
you might want to use mutate to introduce a helper variable
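consisting of the number of observations)
library(dplyr)
# Chicks that have all 12 weight measurements
complete_ids <- ChickWeight %>%
  group_by(Chick) %>%
  tally() %>%
  filter(n == 12)
# Keep only the rows that belong to those chicks
complete <- ChickWeight %>%
  filter(Chick %in% complete_ids$Chick)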
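The hint about mutate, and the two counts the question asks for, could be sketched roughly as follows. This is only one possible approach; the names counted, per_chick, complete_sketch, n_obs and is_complete are illustrative and not part of the assignment.

```r
library(dplyr)

data("ChickWeight")

# Helper variable: number of measurements recorded for each chick
counted <- as.data.frame(ChickWeight) %>%
  group_by(Chick) %>%
  mutate(n_obs = n()) %>%
  ungroup()

# One row per chick, flagged as complete (12 measurements) or incomplete
per_chick <- counted %>%
  distinct(Chick, n_obs) %>%
  mutate(is_complete = n_obs == 12)

# Number of chicks and total number of measurements in each group
per_chick %>%
  group_by(is_complete) %>%
  summarise(chicks = n(), measurements = sum(n_obs))

# Same subset as `complete`, built directly from the helper variable
complete_sketch <- counted %>%
  filter(n_obs == 12)
```

The summary table answers both counting questions at once: the row with is_complete equal to TRUE gives the number of complete chicks, and the other row gives the number of measurements in the incomplete cases.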
Introduce a new variable weightgain that measures the difference between a chick's current weight and its weight on day 0.
complete <- complete %>%
  group_by(Chick) %>%
  mutate(weightgain = weight - weight[Time == 0])
Using the ggplot2 package, create side-by-side boxplots
of weightgain by Diet for day 21. Describe the
relationship in 2-3 sentences. Change the order of the categories in the
Diet variable such that the boxplots are ordered by median
weightgain.
library(ggplot2)
complete %>%
  filter(Time == 21) %>%
  ggplot(aes(x = weightgain, y = reorder(Diet, weightgain, FUN = median))) +
  geom_boxplot() +
  labs(y = "Diet")
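One detail of reorder() is easy to miss: the summary function is passed through the argument FUN, in upper case. The toy data below (hypothetical, not from ChickWeight) sketches what seems to happen when the argument is spelled in lower case: it is not matched to FUN, so the default summary, the mean, decides the order.

```r
# Toy data (hypothetical): group "a" contains one large outlier
toy <- data.frame(
  group = c("a", "a", "a", "b", "b", "b"),
  value = c(1, 2, 30, 5, 6, 7)
)

# FUN = median orders the levels by group median (a = 2, b = 6)
by_median <- levels(reorder(toy$group, toy$value, FUN = median))

# Lower-case `fun` is not matched to FUN; it is passed on through `...`
# and the default mean() is used instead (a = 11, b = 6)
by_mean <- levels(reorder(toy$group, toy$value, fun = median))

print(by_median)
print(by_mean)
```

Because no error or warning is raised, the only visible symptom is a box order that follows the means rather than the medians.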
Using the ggplot2 package, create a plot with
Time along the x axis and weight on the y
axis. Facet by Diet. Use a point layer and also draw one
line for each Chick. Color by Diet. Include
the legend on the bottom (check theme).
# Color is mapped globally so that points and lines are both colored by Diet
ggplot(data = complete, aes(x = Time, y = weight, color = Diet)) +
  geom_point() +
  geom_line(aes(group = Chick)) +
  facet_wrap(~Diet) +
  theme_dark() +
  theme(legend.position = "bottom")
# The medians of Diets 2 and 4 are fairly similar, while Diet 3 surpasses all the others. Interestingly, Diet 2 produced the widest spread of results but was among the furthest from being the best diet.
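Select the Chick with the maximum weight at
Time 21 for each of the diets. Redraw the previous plot
with only these 4 chicks (and don’t facet).
# Restrict to day 21 first, then take the heaviest chick within each diet
complete %>%
  filter(Time == 21) %>%
  group_by(Diet) %>%
  filter(weight == max(weight))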
## # A tibble: 4 × 5
## # Groups:   Diet [4]
##   weight  Time Chick Diet  weightgain
##    <dbl> <dbl> <ord> <fct>      <dbl>
## 1    305    21 7     1            264
## 2    331    21 21    2            291
## 3    373    21 35    3            332
## 4    322    21 48    4            283
# Only the four heaviest chicks, drawn in a single panel (no faceting)
ChickWeight %>%
  filter(Chick %in% c("7", "21", "35", "48")) %>%
  ggplot(aes(x = Time, y = weight, color = Diet)) +
  geom_point() +
  geom_line(aes(group = Chick)) +
  theme_dark() +
  theme(legend.position = "bottom")
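Typing the four chick IDs by hand works, but the IDs could also be taken from the data. The sketch below is one way to do that with dplyr, assuming a dplyr version that provides slice_max() (1.0.0 or later); the names top_chicks and top_growth are illustrative.

```r
library(dplyr)

data("ChickWeight")

# Heaviest chick within each diet on day 21
top_chicks <- as.data.frame(ChickWeight) %>%
  filter(Time == 21) %>%
  group_by(Diet) %>%
  slice_max(weight, n = 1, with_ties = FALSE) %>%
  ungroup()

# All measurements of those chicks, ready to pass on to ggplot()
top_growth <- as.data.frame(ChickWeight) %>%
  filter(Chick %in% top_chicks$Chick)
```

Passing top_growth to the plotting code avoids having to update the IDs by hand if the selection rule changes.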
ChickWeight %>%
group_by(Diet, Time) %>%
summarise(output=mean(weight)) %>%
ggplot(aes(x=Time, y=output)) + geom_point() +
geom_line(aes(color=Diet)) +
theme_dark() + theme(legend.position="bottom")
## `summarise()` has grouped output by 'Diet'. You can override using the
## `.groups` argument.
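The message above is informational: after summarise(), the result is still grouped by Diet. A minimal sketch of how the .groups argument mentioned in the message could be used to return an ungrouped result, and so silence the message, is shown here; the name mean_by_diet is illustrative.

```r
library(dplyr)

data("ChickWeight")

# Mean weight per diet and time point; .groups = "drop" removes all grouping
mean_by_diet <- as.data.frame(ChickWeight) %>%
  group_by(Diet, Time) %>%
  summarise(output = mean(weight), .groups = "drop")

head(mean_by_diet)
```

The plotted values are the same either way; only the grouping of the summarised table differs.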
# The Diet vs. Time plot of mean weights, because all four diets can be compared on the same axes. It also shows that Diet 3 produced the best results and roughly when the diets started to diverge.
Write a function that draws a residual plot using ggplot2 and includes a
horizontal line at zero. The arguments for this function should be
x (explanatory variable), y (response
variable) and col for the horizontal line color. The x
label should be Fitted Value and the y label should be Residuals. For data(LifeCycleSavings), use x = sr,
y = ddpi and col = red.
data("LifeCycleSavings")
?LifeCycleSavings
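residPlot <- function(x, y, col) {
  lmFit <- lm(y ~ x)  # regress the response on the explanatory variable
  resid_df <- data.frame(fitted = fitted(lmFit), residual = resid(lmFit))
  plt <- ggplot(resid_df, aes(x = fitted, y = residual)) +
    geom_point() +
    geom_hline(yintercept = 0, color = col) +  # horizontal reference line at zero
    labs(x = "Fitted Value", y = "Residuals")
  return(plt)
}
residPlot(x = LifeCycleSavings$sr, y = LifeCycleSavings$ddpi, col = "red")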