Introduction

We will continue working with the course evaluation data from the previous in-class assignment. If you need to refer to that document, see https://rpubs.com/shawnsanto/ica-03-26-19.

We will analyze what goes into course evaluations and how certain variables affect the overall score.

To get started, load the tidyverse and broom packages. Install any missing package with install.packages("package_name").


library(tidyverse)
library(broom)

Data

The data were gathered from end-of-semester student evaluations for a large sample of professors from the University of Texas at Austin. In addition, six students rated the professors’ physical appearance. Each row in evals corresponds to a different course, and the columns are variables describing the courses and professors.

Use read_csv() to read in the data and save it as an object named evals. The data are available on Google Classroom.

evals <- read_csv("evals-mod.csv")

Data dictionary

Variable       Description
score          Average professor evaluation score: (1) very unsatisfactory - (5) excellent
rank           Rank of professor: teaching, tenure track, tenured
ethnicity      Ethnicity of professor: not minority, minority
gender         Gender of professor: female, male
language       Language of school where professor received education: english, non-english
age            Age of professor
cls_perc_eval  Percent of students in class who completed evaluation
cls_did_eval   Number of students in class who completed evaluation
cls_students   Total number of students in class
cls_level      Class level: lower, upper
cls_profs      Number of professors teaching sections in course in sample: single, multiple
cls_credits    Number of credits of class: one credit (lab, PE, etc.), multi credit
bty_f1lower    Beauty rating of professor from lower level female: (1) lowest - (10) highest
bty_f1upper    Beauty rating of professor from upper level female: (1) lowest - (10) highest
bty_f2upper    Beauty rating of professor from second upper level female: (1) lowest - (10) highest
bty_m1lower    Beauty rating of professor from lower level male: (1) lowest - (10) highest
bty_m1upper    Beauty rating of professor from upper level male: (1) lowest - (10) highest
bty_m2upper    Beauty rating of professor from second upper level male: (1) lowest - (10) highest

Before you get started, add the avg_bty variable from last time.

evals <- evals %>%
  rowwise() %>%
  # c_across() gathers the six beauty ratings in each row before averaging
  mutate(avg_bty = mean(c_across(bty_f1lower:bty_m2upper))) %>%
  ungroup()
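
To see what a row-wise mean across several columns does, here is a minimal sketch on a toy tibble. The column names id, r1, r2, and r3 are made up for illustration and are not part of evals. It assumes dplyr 1.0.0 or later, where c_across() is available.

```r
library(dplyr)

# Toy data with hypothetical column names (not the evals columns)
toy <- tibble(
  id = c("a", "b"),
  r1 = c(2, 6),
  r2 = c(4, 7),
  r3 = c(9, 8)
)

toy_avg <- toy %>%
  rowwise() %>%
  # c_across() collects the values of r1 through r3 for the current row
  mutate(avg = mean(c_across(r1:r3))) %>%
  ungroup()

toy_avg$avg  # 5 7
```

Writing mean(r1:r3) inside mutate() would instead build the integer sequence from the value of r1 to the value of r3, which is not an average of the columns.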

Categorical predictors

Task 1

Fit a linear model with score as the response and language as a single predictor. Write out the model output.

Task 2

What is the baseline level in Task 1? Interpret the meaning of coefficient \(b_1\).

Task 3

Based on Task 1, what is the equation of the line for English speaking professors? What about non-English speaking professors?
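
The idea of a baseline level can be sketched on a different data set. The example below uses mtcars, which ships with R, rather than evals; the variable transmission is created here only for illustration. With a two-level categorical predictor, the intercept is the mean response for the baseline level, and the slope is the difference in means between the other level and the baseline.

```r
library(broom)

# mtcars ships with R; am is coded 0 = automatic, 1 = manual
cars <- mtcars
cars$transmission <- factor(cars$am, levels = c(0, 1),
                            labels = c("automatic", "manual"))

fit_cat <- lm(mpg ~ transmission, data = cars)

# The first factor level is the baseline
levels(cars$transmission)[1]

# One row per coefficient: the intercept and the "manual" indicator
tidy(fit_cat)
```

Plugging 0 or 1 into the indicator gives the fitted value for each group, which is the sense in which a single fitted equation describes both levels.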

Task 4

Create a scatter plot of score versus rank with ggplot(). Use geom_jitter().
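
As a rough illustration of why jittering helps with a categorical x variable, the sketch below uses the mpg data that ships with ggplot2 (not evals). The width and height values are arbitrary choices for this example.

```r
library(ggplot2)

# mpg ships with ggplot2; class is categorical, hwy is numeric
p <- ggplot(mpg, aes(x = class, y = hwy)) +
  # Jitter horizontally only, so the y values are not moved
  geom_jitter(width = 0.2, height = 0) +
  labs(x = "Vehicle class", y = "Highway miles per gallon")

p
```

Without jitter, all points in a category fall on one vertical line and overlapping observations hide each other.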

Task 5

Fit a linear model with score as the response and rank as a single predictor. What is the baseline? Write out the model output.

Task 6

Add a new variable to evals called rank_new where the baseline level is set to “tenured”. Hint: relevel()
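
A small sketch of how relevel() changes the baseline, using a made-up vector with hypothetical labels rather than the rank column. Note that relevel() expects a factor, so a character vector (which is what read_csv() typically produces for text columns) has to be converted with factor() first.

```r
# Hypothetical labels, stored as character
grp_chr <- c("low", "mid", "high", "mid")

# factor() orders levels alphabetically by default
grp_fct <- factor(grp_chr)
levels(grp_fct)   # "high" "low" "mid"

# Move "low" to the front so it becomes the baseline in lm()
grp_new <- relevel(grp_fct, ref = "low")
levels(grp_new)   # "low" "high" "mid"
```

Only the order of the levels changes; the values stored in the vector stay the same.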

Task 7

Fit a linear model with score as the response and rank_new as a single predictor. Is the baseline now different from the baseline in Task 5?

Multiple regression

Task 8

Fit a linear model with score as the response and gender, rank, and avg_bty as predictors. Write out the model. Give an interpretation for the coefficient of avg_bty.

Task 9

What are the \(R^2\) and adjusted \(R^2\) values from your model in Task 8?

Task 10

Fit a linear model with score as the response and only gender and avg_bty as predictors. How did the \(R^2\) and adjusted \(R^2\) values change compared to Task 9?
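
One way to pull \(R^2\) and adjusted \(R^2\) from fitted models is glance() from broom. The sketch below compares two nested models on mtcars (not evals); the choice of predictors is arbitrary and only for illustration.

```r
library(broom)

fit_small <- lm(mpg ~ wt, data = mtcars)
fit_large <- lm(mpg ~ wt + hp, data = mtcars)

# glance() returns one row of model-level summaries per model
stats_small <- glance(fit_small)
stats_large <- glance(fit_large)

c(small = stats_small$r.squared, large = stats_large$r.squared)
c(small = stats_small$adj.r.squared, large = stats_large$adj.r.squared)
```

Adding a predictor can never lower \(R^2\), while adjusted \(R^2\) penalizes the extra term and can move in either direction.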

Model selection

Task 11

Fit a full model with score as the response and predictors: rank, ethnicity, gender, language, age, cls_perc_eval, cls_students, cls_level, cls_profs, cls_credits, avg_bty.

Task 12

Why did we not consider cls_did_eval and the individual beauty scores?

Task 13

Use the fitted full model from Task 11 and backward selection to determine the “best” model. What are the \(R^2\) and adjusted \(R^2\) values from this “best” model?
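
Backward selection can be automated in several ways. The sketch below uses step() from base R on mtcars (not evals), with an arbitrary set of predictors. Note the assumption: step() removes terms based on AIC, which may differ from the criterion intended in class (for example, adjusted \(R^2\) or p-values), so treat this as one possible approach rather than the required one.

```r
full_fit <- lm(mpg ~ wt + hp + qsec + drat + disp, data = mtcars)

# step() drops one term at a time for as long as doing so lowers AIC
best_fit <- step(full_fit, direction = "backward", trace = 0)

# Terms that survived the elimination
formula(best_fit)

summary(best_fit)$r.squared
summary(best_fit)$adj.r.squared
```

Setting trace = 0 suppresses the step-by-step printout; remove it to see which term is dropped at each stage.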

Inference

Task 14

Create a 95% prediction interval based on new predictor values of your choosing. Use your “best” model from Task 13.
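
A prediction interval for a new observation can be obtained with predict(). The sketch below uses a small model on mtcars (not evals); the new values wt = 3 and hp = 120 are made up for illustration.

```r
fit_pi <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical new observation; column names must match the predictors
new_car <- data.frame(wt = 3, hp = 120)

# interval = "prediction" accounts for the variability of a single new response
pred <- predict(fit_pi, newdata = new_car,
                interval = "prediction", level = 0.95)

pred  # columns: fit, lwr, upr
```

Using interval = "confidence" instead gives a narrower interval for the mean response at those predictor values.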

Task 15

Create 95% confidence intervals for the coefficients of your “best” model from Task 13.
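
Two common ways to get coefficient confidence intervals are confint() from base R and tidy() from broom with conf.int = TRUE. The sketch below uses a small model on mtcars (not evals) and should give the same interval endpoints from both calls.

```r
library(broom)

fit_ci <- lm(mpg ~ wt + hp, data = mtcars)

# Base R: a matrix with one row per coefficient
ci_base <- confint(fit_ci, level = 0.95)
ci_base

# broom: a tibble with conf.low and conf.high columns added
ci_tidy <- tidy(fit_ci, conf.int = TRUE, conf.level = 0.95)
ci_tidy
```

The tidy() version is convenient when the intervals need to be filtered, joined, or plotted with other tidyverse tools.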

Task 16

Can we use this model to make valid predictions about professors from any university?