Load necessary packages and the dataset in question:

library(ggplot2)
library(dplyr)
library(broom)
library(knitr)

load(url("http://www.openintro.org/stat/data/evals.RData"))
evals <- evals %>%
  select(score, ethnicity, gender, language, age, bty_avg, rank)

Question 1

Exploratory data analysis: Create a visualization that shows the relationship between:

  • \(y\): instructor teaching score
  • \(x\): instructor age

Comment on this relationship.

Answer

# Add your code to create visualization below:
ggplot(data=evals, mapping=aes(x=age,y=score))+
  geom_point()+
  labs(x="Instructor Age", y="Teaching Score")+
  geom_smooth(method = "lm", se=FALSE)

Comment here:

From the regression line in the graph above, there appears to be a slight negative relationship between teaching score and instructor age. In other words, older instructors tend, on average, to receive slightly lower scores.

Question 2

  1. Display the regression table that shows both the 1) fitted intercept and 2) fitted slope of the regression line. Pipe %>% the table into kable(digits=3) to get a cleanly formatted table with values rounded to 3 decimal places.
  2. Interpret the slope.
  3. For an instructor that is 67 years old, what would you guess that their teaching score would be?

Answer

# Add your code to create regression table below:
lm(score~age, data=evals)%>%
  tidy()%>%
  kable(digits=3)
| term        | estimate | std.error | statistic | p.value |
|:------------|---------:|----------:|----------:|--------:|
| (Intercept) |    4.462 |     0.127 |    35.195 |   0.000 |
| age         |   -0.006 |     0.003 |    -2.311 |   0.021 |

Answer the two other questions here:

1. The table shows that the slope of the regression line is -0.006. In other words, each additional year of instructor age is associated with a decrease of 0.006 points in the predicted teaching score, on average.

2. The table gives the fitted line \(\widehat{y} = 4.462 - 0.006x\). Plugging in \(x = 67\) gives \(4.462 - 0.006 \times 67 = 4.06\), so I would guess that a 67-year-old instructor gets a score of about 4.06.
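As a quick check of the arithmetic above, here is a minimal sketch that plugs the rounded coefficients from the regression table into the fitted line. The variable names (intercept, slope, predicted_67, change_10_years) are my own and not part of the assignment.

```r
# Rounded coefficients copied from the regression table above
intercept <- 4.462
slope <- -0.006

# Predicted teaching score for a 67-year-old instructor
predicted_67 <- intercept + slope * 67
print(round(predicted_67, 2))

# Predicted change in score for an instructor who is 10 years older
change_10_years <- slope * 10
print(round(change_10_years, 2))
```

This prints 4.06 and -0.06. Calling predict() on the fitted model with newdata = data.frame(age = 67) would use the unrounded coefficients instead, so its answer may differ slightly from 4.06.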

Question 3

Does there seem to be a systematic pattern in the lack-of-fit of the model? In other words, is there a pattern in the error between the fitted score \(\widehat{y}\) and the observed score \(y\)? Hint: geom_hline(yintercept=0, col="blue") adds a blue horizontal line at \(y=0\).

Answer

# Add the code necessary to answer this question below:
point_by_point_info <- lm(score~age, data=evals) %>%
  augment() %>%
  select(score,age, .fitted, .resid)

ggplot(point_by_point_info, aes(x=age, y=.resid))+
  geom_point()+
  geom_hline(yintercept = 0, col="blue")+
  labs(x="Age", y ="Residual")

Comment here:

From the graph I created above, there does not appear to be a systematic pattern in the residuals of the model.
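One way to back up a visual judgment like this with numbers is to average the residuals within a few age bins: if there is no systematic pattern, the bin averages should all be close to zero. The helper below is only a sketch. The function name mean_resid_by_age and the choice of 4 equal-width bins are my own assumptions, not part of the assignment.

```r
# Hypothetical helper: mean residual of lm(score ~ age) within equal-width age bins.
# Assumes `data` has numeric columns named `score` and `age`.
mean_resid_by_age <- function(data, n_bins = 4) {
  model <- lm(score ~ age, data = data)
  # Use the model frame so the ages line up with the residuals
  # even if rows with missing values were dropped during fitting
  ages <- model.frame(model)$age
  bins <- cut(ages, breaks = n_bins)
  # Average the residuals inside each age bin
  tapply(resid(model), bins, mean)
}
```

With the data loaded at the top of this document, the call would be mean_resid_by_age(evals). Bin averages that drift steadily upward or downward with age would hint at a pattern the straight line is missing.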

Question 4

Say a college administrator wants to model teaching scores using more than just age as a predictor/explanatory variable, in particular using the instructor’s gender as well. Create a visualization that summarizes this relationship and comment on the observed relationship.

Answer

# Add your code to create visualization below:
ggplot(data=evals, mapping=aes(x=age, y=score, color=gender))+
  geom_point()+
  labs(x="Instructor Age", y="Teaching Score")+
  geom_smooth(method = "lm", se=FALSE)

Comment here:

First, the regression lines for both male and female instructors have negative slopes, but the slope for female instructors is much steeper than the one for male instructors. In my opinion, this suggests that age is associated with a larger drop in teaching score for female instructors than for their male counterparts.
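To put numbers on the two slopes, note that geom_smooth(method = "lm") with a color aesthetic fits a separate regression line for each group. The sketch below does the same thing directly by fitting one simple regression per gender and returning the age slopes. The function name slope_by_gender is my own and not part of the assignment.

```r
# Hypothetical helper: fit score ~ age separately for each gender
# and return the fitted age slope for each group.
# Assumes `data` has columns named `score`, `age` and `gender`.
slope_by_gender <- function(data) {
  groups <- split(data, data$gender)
  sapply(groups, function(d) {
    coef(lm(score ~ age, data = d))[["age"]]
  })
}
```

With the data loaded at the top of this document, the call would be slope_by_gender(evals). A more negative slope for female instructors than for male instructors would match what the plot shows.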