Load necessary packages and the dataset in question:

library(ggplot2)
library(dplyr)
library(broom)
library(knitr)

load(url("http://www.openintro.org/stat/data/evals.RData"))
evals <- evals %>%
  select(score, ethnicity, gender, language, age, bty_avg, rank)

Question 1

Exploratory data analysis: Create a visualization that shows the relationship between:

  • \(y\): instructor teaching score
  • \(x\): instructor age

Comment on this relationship.

Answer

# Add your code to create visualization below:
ggplot(data = evals, aes(x = age, y = score)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

Comment here:

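A scatterplot is appropriate because both variables, age and score, are numerical. The fitted line slopes slightly downward, suggesting a weak negative relationship: older instructors tend to receive slightly lower teaching scores.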

Question 2

  1. Display the regression table that shows both 1) the fitted intercept and 2) the fitted slope of the regression line. Pipe %>% the table into kable(digits=3) to get a cleanly formatted table with values rounded to 3 decimal places.
  2. Interpret the slope.
  3. For an instructor who is 67 years old, what would you guess their teaching score to be?

Answer

# Add your code to create regression table below:

lm(score~age, data=evals) %>% 
  tidy() %>%
  kable(digits=3)

|term        | estimate| std.error| statistic| p.value|
|:-----------|--------:|---------:|---------:|-------:|
|(Intercept) |    4.462|     0.127|    35.195|   0.000|
|age         |   -0.006|     0.003|    -2.311|   0.021|

Answer the two other questions here:

  1. Interpret the slope. # For every 1-year increase in a professor's age, there is an associated decrease, on average, of about 0.0059 evaluation score points. (Note that associated does not necessarily mean causal.)
  2. For an instructor who is 67 years old, what would you guess their teaching score to be? # I would estimate their teaching score to be about 4.064 (the fitted intercept plus 67 times the fitted slope).
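
The prediction in the second answer can be reproduced with predict() rather than by hand. The following is a minimal sketch, assuming the evals data load as at the top of this document; the names score_model and new_instructor are illustrative and not part of the original assignment.

```r
# Load the same dataset used throughout this document.
load(url("http://www.openintro.org/stat/data/evals.RData"))

# Refit the simple linear regression of score on age.
score_model <- lm(score ~ age, data = evals)

# One-row data frame for the instructor of interest (illustrative name).
new_instructor <- data.frame(age = 67)

# Fitted score = intercept + slope * 67, using the unrounded coefficients.
predicted_score <- predict(score_model, newdata = new_instructor)
predicted_score
```

Because predict() uses the unrounded coefficients, its result can differ slightly from a hand calculation based on the rounded values in the regression table.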

Question 3

Does there seem to be a systematic pattern in the lack-of-fit of the model? In other words, is there a pattern in the error between the fitted score \(\widehat{y}\) and the observed score \(y\)? Hint:
geom_hline(yintercept=0, col="blue") adds a blue horizontal line at \(y=0\).

Answer

mean(evals$score)

# Add the code necessary to answer this question below:
point_by_point_info <- lm(score~age, data=evals) %>% 
  augment() %>%
  select(score, age, .resid, .fitted)

ggplot(data = point_by_point_info, aes(x = age, y = .resid)) + 
  geom_point() +
  geom_smooth(method="lm", se=FALSE) +
  geom_hline(yintercept=0, col="blue")

Comment here:

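The points are scattered around the horizontal line at 0 with no clear systematic pattern, so there does not seem to be a pattern in the error between the fitted score and the observed score. The smoothed regression line through the residuals also lies on the horizontal line at 0, which is expected because least-squares residuals are uncorrelated with the predictor by construction.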
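
A straight line fitted to least-squares residuals against the same predictor always has intercept and slope of zero, so the overlap of the two lines in the plot does not by itself show that the model fits well; the scatter of the points is what matters. The sketch below illustrates this on simulated data (the values and variable names are hypothetical and not taken from evals).

```r
# Simulated data, purely illustrative (not the evals dataset).
set.seed(1)
x <- runif(200, min = 30, max = 70)
y <- 4.5 - 0.006 * x + rnorm(200, sd = 0.5)

# Fit a simple linear regression and extract its residuals.
fit <- lm(y ~ x)
res <- resid(fit)

# Regress the residuals on the same predictor.
resid_fit <- lm(res ~ x)

# Both coefficients are zero up to floating-point error.
coef(resid_fit)
```

To look for curvature or other structure in the residuals, a flexible smoother such as geom_smooth(method = "loess") is more informative than method = "lm".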

Question 4

Say a college administrator wants to model teaching scores using more than one predictor/explanatory variable, in particular using the instructor's gender in addition to age. Create a visualization that summarizes this relationship and comment on the observed relationship.

Answer

# Add your code to create visualization below:
ggplot(data = evals, aes(x = age, y = score, color = gender)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

Comment here: # For both groups, score tends to decrease as age increases, but the decline is much less noticeable (the line is flatter) for males than for females, whose scores decrease by a fair amount as age increases.
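
The difference in slopes seen in the plot can be quantified by fitting a model with an age-by-gender interaction, which corresponds to the separate lines drawn by geom_smooth(). This is a minimal sketch rather than part of the original assignment; it assumes the evals data load as at the top of this document and that gender has two levels, and the name interaction_model is illustrative.

```r
# Load the same dataset used throughout this document.
load(url("http://www.openintro.org/stat/data/evals.RData"))

# Interaction model: each gender gets its own intercept and its own age slope.
interaction_model <- lm(score ~ age * gender, data = evals)

# The age coefficient is the slope for the baseline gender level; the
# age:gender interaction coefficient is the difference in slope between groups.
coef(interaction_model)
```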