Load necessary packages and the dataset in question:
library(ggplot2)
library(dplyr)
library(broom)
library(knitr)
load(url("http://www.openintro.org/stat/data/evals.RData"))
evals <- evals %>%
  select(score, ethnicity, gender, language, age, bty_avg, rank)
Exploratory data analysis: Create a visualization that shows the relationship between the instructor's age (age) and the teaching score (score). Comment on this relationship.
# Add your code to create visualization below:
ggplot(data = evals, aes(x = age, y = score)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
Comment here:
Pipe (%>%) the table into kable(digits=3) to get a cleanly formatted table with values rounded to 3 decimal places.
# Add your code to create regression table below:
lm(score ~ age, data = evals) %>%
  tidy() %>%
  kable(digits = 3)
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 4.462 | 0.127 | 35.195 | 0.000 |
| age | -0.006 | 0.003 | -2.311 | 0.021 |
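To make the table concrete, the fitted line can be read off the estimate column as \(\widehat{score} = 4.462 - 0.006 \cdot age\). The sketch below plugs in the rounded coefficients from the table. The helper name predict_score and the example ages are illustrative assumptions, and because the coefficients are rounded to three decimals the results are only approximate.

```r
# Rounded coefficients copied from the regression table
intercept <- 4.462
slope_age <- -0.006

# Hypothetical helper: fitted score for a given age under the simple model
predict_score <- function(age) {
  intercept + slope_age * age
}

# Fitted scores for two illustrative ages
predict_score(c(40, 50))

# The difference for a 10-year age gap is 10 times the slope
predict_score(50) - predict_score(40)
```

Under these rounded values, a 10-year difference in age corresponds to a difference of about -0.06 points in fitted score.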
Answer the two other questions here:
Does there seem to be a systematic pattern in the lack-of-fit of the model? In other words, is there a pattern in the error between the fitted score \(\widehat{y}\) and the observed score \(y\)? Hint:
geom_hline(yintercept=0, col="blue") adds a blue horizontal line at \(y=0\).
mean(evals$score)
# Add the code necessary to answer this question below:
point_by_point_info <- lm(score ~ age, data = evals) %>%
  augment() %>%
  select(score, age, .resid, .fitted)
ggplot(data = point_by_point_info, aes(x = age, y = .resid)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  geom_hline(yintercept = 0, col = "blue")
Comment here:
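One caveat when reading a residual plot like this: for a least-squares fit with an intercept, the residuals have mean zero and are uncorrelated with the predictor, so a straight-line smoother through the residuals is flat at zero by construction and cannot reveal lack of fit. The sketch below checks this on R's built-in mtcars data, used here only as a stand-in so the example runs without downloading evals. The variable names are illustrative.

```r
# Stand-in data: built-in mtcars (not the evals data)
fit <- lm(mpg ~ wt, data = mtcars)

# Residual = observed - fitted
res <- mtcars$mpg - fitted(fit)

# Matches what residuals() reports
all.equal(unname(res), unname(residuals(fit)))

# Regressing the residuals on the predictor gives an intercept and slope of
# (numerically) zero, which is why an lm smoother on a residual plot is flat
refit <- lm(res ~ mtcars$wt)
round(coef(refit), 10)
```

A smoother that can bend, for example geom_smooth(method = "loess", se = FALSE), is one option that may show a systematic pattern in the residuals if there is one.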
Say a college administrator wants to model teaching scores using more than one predictor/explanatory variable, not just age; in particular, using the instructor’s gender as well. Create a visualization that summarizes this relationship and comment on the observed relationship.
# Add your code to create visualization below:
ggplot(data = evals, aes(x = age, y = score, color = gender)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
Comment here: For both genders, score decreases as age increases, but the decline is much less noticeable (steadier) for male instructors than for female instructors, whose scores decrease by a fair amount as age increases.
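A plot with one regression line per group corresponds to a model in which the slope is allowed to differ by group, that is, an interaction model. As a minimal sketch, the code below fits such a model on the built-in mtcars data (a stand-in, with the transmission indicator am playing the role of a two-level factor like gender) and recovers each group's slope. Assuming the evals columns loaded earlier, the analogous call would be lm(score ~ age * gender, data = evals).

```r
# Stand-in data: built-in mtcars, with am recoded as a two-level factor
mt <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))

# Interaction model: separate intercept and slope for each level of am
fit_int <- lm(mpg ~ wt * am, data = mt)
b <- coef(fit_int)

# Slope for the baseline group and for the other group
slope_automatic <- unname(b["wt"])
slope_manual <- unname(b["wt"] + b["wt:ammanual"])

# Same slope as fitting the second group on its own
manual_only <- subset(mt, am == "manual")
slope_manual_alone <- unname(coef(lm(mpg ~ wt, data = manual_only))["wt"])

c(automatic = slope_automatic, manual = slope_manual,
  manual_alone = slope_manual_alone)
```

The interaction coefficient is the difference between the two groups' slopes, so comparing it with its standard error is one way to judge whether the lines in the plot really differ in steepness.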