Load necessary packages and the dataset in question:

library(ggplot2)
library(dplyr)
library(broom)
library(knitr)

load(url("http://www.openintro.org/stat/data/evals.RData"))
evals <- evals %>%
  select(score, ethnicity, gender, language, age, bty_avg, rank)

Question 1

Exploratory data analysis: Create a visualization that shows the relationship between:

$y$: instructor teaching score
$x$: instructor age

Comment on this relationship.

Answer

# Add your code to create visualization below:
ggplot(data = evals, aes(x = age, y = score)) + 
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

Comment here:

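A scatterplot is appropriate because age and score are both numerical variables. The plot suggests a weak negative relationship: the fitted line slopes slightly downward, so older instructors tend to receive slightly lower teaching scores on average, but the points are widely scattered around the line.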
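One way to put a number on how weak this relationship is would be the correlation coefficient. The sketch below is an illustration, not part of the assigned answer. It reloads the data from the same URL used at the top of this document, so it assumes network access, and it uses base R's `cor()`. The name `age_score_cor` is an illustrative choice.

```r
# Reload the evaluation data from the same URL used at the top of this document
load(url("http://www.openintro.org/stat/data/evals.RData"))

# Pearson correlation between teaching score and instructor age:
# the sign gives the direction of the linear relationship, and a value
# close to 0 indicates that the relationship is weak
age_score_cor <- cor(evals$score, evals$age)
age_score_cor
```

A small negative value would be consistent with the small negative slope reported in the regression table for Question 2.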

Question 2

Display the regression table that shows both 1) the fitted intercept and 2) the fitted slope of the regression line. Pipe (%>%) the table into kable(digits=3) to get a cleanly formatted table with values rounded to 3 decimal places.
Interpret the slope.
For an instructor that is 67 years old, what would you guess that their teaching score would be?

Answer

# Add your code to create regression table below:

lm(score~age, data=evals) %>% 
  tidy() %>%
  kable(digits=3)

|term        | estimate| std.error| statistic| p.value|
|:-----------|--------:|---------:|---------:|-------:|
|(Intercept) |    4.462|     0.127|    35.195|   0.000|
|age         |   -0.006|     0.003|    -2.311|   0.021|

Answer the two other questions here:

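Interpret the slope. For every one-year increase in an instructor's age, there is an associated decrease of about 0.006 points (-0.0059 before rounding) in teaching score, on average. (Note that "associated" does not necessarily mean causal.)
For an instructor that is 67 years old, what would you guess that their teaching score would be? I would estimate their teaching score to be about 4.064. Plugging the rounded table values into the fitted line gives $\widehat{y} = 4.462 - 0.006 \times 67 = 4.060$, and using the unrounded coefficients gives 4.064.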
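Rather than doing this arithmetic by hand, the prediction can be obtained with base R's `predict()` on the fitted model. The sketch below reloads the data from the URL used at the top of this document, so it assumes network access. The object names `score_model` and `new_instructor` are illustrative choices, not part of the original assignment.

```r
# Reload the evaluation data from the same URL used at the top of this document
load(url("http://www.openintro.org/stat/data/evals.RData"))

# Fit the same simple linear regression as in the table above
score_model <- lm(score ~ age, data = evals)

# predict() plugs the new age into the fitted line using the unrounded
# coefficients, avoiding rounding error from reading values off the table
new_instructor <- data.frame(age = 67)
predicted_score <- predict(score_model, newdata = new_instructor)
predicted_score
```

Because `predict()` uses full-precision coefficients, its result should match the unrounded hand calculation rather than the value computed from the 3-decimal table.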

Question 3

Does there seem to be a systematic pattern in the lack-of-fit of the model? In other words, is there a pattern in the error between the fitted score $\widehat{y}$ and the observed score $y$? Hint: geom_hline(yintercept=0, col="blue") adds a blue horizontal line at $y=0$.

Answer

# Overall mean teaching score, for reference
mean(evals$score)

# Add the code necessary to answer this question below:
point_by_point_info <- lm(score~age, data=evals) %>% 
  augment() %>%
  select(score, age, .resid, .fitted)

ggplot(data = point_by_point_info, aes(x = age, y = .resid)) + 
  geom_point() +
  geom_smooth(method="lm", se=FALSE) +
  geom_hline(yintercept=0, col="blue")

Comment here:

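The residuals are scattered above and below zero with no obvious systematic pattern (such as a curve or a funnel shape) across ages, so there does not appear to be a systematic pattern in the error between the fitted score and the observed score. I also noted that the fitted line lies on the horizontal line at zero. However, this is guaranteed by least squares rather than evidence of a good fit: the residuals always average to zero and are uncorrelated with age, so a linear fit of the residuals on age is always flat at zero.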
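The claim that the line of residuals on the predictor is always flat at zero can be checked numerically. This sketch uses R's built-in `mtcars` data purely as an illustrative assumption so that it runs without downloading anything; the same property holds for the `score ~ age` model. The names `fit`, `check`, and `resid_fit` are illustrative.

```r
# Fit a simple least-squares regression on a built-in dataset
fit <- lm(mpg ~ wt, data = mtcars)

# Regress the residuals on the original predictor, which is what
# geom_smooth(method = "lm") does when the y aesthetic is .resid
check <- data.frame(resid = residuals(fit), wt = mtcars$wt)
resid_fit <- lm(resid ~ wt, data = check)

# Intercept and slope are zero up to floating-point error
round(coef(resid_fit), 10)

# Equivalently: residuals average to zero and are uncorrelated with wt
mean(check$resid)
cor(check$resid, check$wt)
```

Because this holds for any least-squares fit, a flexible smoother such as `geom_smooth(method = "loess")` is more likely to reveal curvature in a residual plot, if any exists.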

Question 4

Say a college administrator wants to model teaching scores using more than one predictor/explanatory variable, not just age, in particular using the instructor's gender as well. Create a visualization that summarizes this relationship and comment on the observed relationship.

Answer

# Add your code to create visualization below:
ggplot(data = evals, aes(x = age, y = score, color = gender)) + 
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

Comment here: For male instructors, teaching scores decline only slightly as age increases, whereas for female instructors scores decline noticeably faster with age, so the fitted lines for the two genders have different slopes.
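Two different slopes in the plot correspond to an interaction model, which can be fit by crossing the predictors in the `lm()` formula. The sketch below reloads the data from the URL used at the top of this document, so it assumes network access. The names `interaction_model` and `interaction_table` are illustrative, and this model goes beyond what the question asks for.

```r
library(broom)
library(knitr)

# Reload the evaluation data from the same URL used at the top of this document
load(url("http://www.openintro.org/stat/data/evals.RData"))

# age * gender expands to age + gender + age:gender, which gives each
# gender its own intercept and its own slope for age
interaction_model <- lm(score ~ age * gender, data = evals)

# Tidy coefficient table, printed in the same style as Question 2
interaction_table <- tidy(interaction_model)
kable(interaction_table, digits = 3)
```

With "female" as the baseline level (factor levels are alphabetical unless set otherwise), the `age` estimate is the slope for female instructors, and the slope for male instructors is `age` plus `age:gendermale`. A positive interaction estimate would match the shallower male line in the plot.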

Problem Set 08

Hannah Fox

2017-10-26