Introduction

This lab is adapted from Lab 6 in Duke’s Introduction to Data Science course.

We will analyze what goes into course evaluations and how certain variables affect the overall score.

To get started, load the tidyverse and broom packages. Install any missing package with install.packages("package_name").


library(tidyverse)
library(broom)

Data

The data were gathered from end-of-semester student evaluations for a large sample of professors from the University of Texas at Austin. In addition, six students rated the professors’ physical appearance. Each row in evals represents a different course, and the columns record variables about the course and its professor.

Use read_csv() to read in the data and save it as an object named evals. The data are available on Google Classroom.

evals <- read_csv("evals-mod.csv")

Data dictionary

Variable        Description
score           Average professor evaluation score: (1) very unsatisfactory - (5) excellent
rank            Rank of professor: teaching, tenure track, tenure
ethnicity       Ethnicity of professor: not minority, minority
gender          Gender of professor: female, male
language        Language of school where professor received education: english or non-english
age             Age of professor
cls_perc_eval   Percent of students in class who completed evaluation
cls_did_eval    Number of students in class who completed evaluation
cls_students    Total number of students in class
cls_level       Class level: lower, upper
cls_profs       Number of professors teaching sections in course in sample: single, multiple
cls_credits     Number of credits of class: one credit (lab, PE, etc.), multi credit
bty_f1lower     Beauty rating of professor from lower level female: (1) lowest - (10) highest
bty_f1upper     Beauty rating of professor from upper level female: (1) lowest - (10) highest
bty_f2upper     Beauty rating of professor from second upper level female: (1) lowest - (10) highest
bty_m1lower     Beauty rating of professor from lower level male: (1) lowest - (10) highest
bty_m1upper     Beauty rating of professor from upper level male: (1) lowest - (10) highest
bty_m2upper     Beauty rating of professor from second upper level male: (1) lowest - (10) highest

Exploratory data analysis

Task 1

Create a scatter plot between score and a quantitative variable. Comment on the relationship. What do you think the correlation is between the variables?
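
As a rough sketch of the mechanics only, the code below draws a scatter plot and computes a correlation on a small made-up tibble. The object toy_evals and every value in it are invented for illustration; they are not the evals data, and age is just one possible choice of quantitative variable.

```r
library(tidyverse)

# Toy data invented for illustration; NOT the real evals data.
toy_evals <- tibble(
  score = c(4.7, 4.1, 3.9, 4.8, 4.6, 4.3, 2.8, 4.1),
  age   = c(36, 36, 59, 59, 51, 40, 62, 33)
)

# Scatter plot with the quantitative variable on the x-axis.
p_age <- ggplot(toy_evals, aes(x = age, y = score)) +
  geom_point() +
  labs(x = "Age of professor", y = "Evaluation score")
print(p_age)

# Pearson correlation, to compare against a guess made from the plot.
r_age <- cor(toy_evals$age, toy_evals$score)
print(r_age)
```

Guessing the correlation from the plot first and only then running cor() makes the comparison more informative.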

Task 2

Create a scatter plot between score and a categorical variable. Comment on the relationship. Add geom_jitter() to your plot. What do you notice about the variability in score between different levels?
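
To see why jittering helps with a categorical x-axis, here is a minimal sketch on made-up data. The tibble toy_evals, its values, and the width setting are assumptions chosen for illustration, not part of the lab data.

```r
library(tidyverse)

# Toy data invented for illustration; NOT the real evals data.
toy_evals <- tibble(
  score  = c(4.1, 4.5, 3.8, 4.5, 4.1, 3.9, 4.7, 4.1),
  gender = c("female", "male", "female", "male",
             "female", "male", "female", "male")
)

# geom_point() draws identical (gender, score) pairs on top of each other.
p_points <- ggplot(toy_evals, aes(x = gender, y = score)) +
  geom_point()

# geom_jitter() adds a little random horizontal noise so overlapping points
# separate; height = 0 leaves the score values unchanged.
p_jitter <- ggplot(toy_evals, aes(x = gender, y = score)) +
  geom_jitter(width = 0.2, height = 0)
print(p_jitter)
```

Setting height = 0 is a design choice here: only the categorical axis is perturbed, so the vertical spread within each level still reflects the actual scores.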

Task 3

Create a variable named avg_bty, the average of the six students’ beauty ratings for each professor, and add it to evals. Hint: use rowwise(), then mutate(), then ungroup().
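
The rowwise() pattern from the hint might look like the sketch below. It runs on a three-row toy tibble with invented ratings (toy_evals is hypothetical), and it uses c_across() with starts_with("bty_") as one possible way to select the six rating columns.

```r
library(tidyverse)

# Toy ratings invented for illustration; NOT the real evals data.
toy_evals <- tibble(
  bty_f1lower = c(5, 2, 7),
  bty_f1upper = c(7, 4, 9),
  bty_f2upper = c(6, 3, 8),
  bty_m1lower = c(2, 2, 6),
  bty_m1upper = c(4, 3, 7),
  bty_m2upper = c(6, 4, 5)
)

toy_evals <- toy_evals %>%
  rowwise() %>%                                             # treat each row as its own group
  mutate(avg_bty = mean(c_across(starts_with("bty_")))) %>% # mean of the six ratings in that row
  ungroup()                                                 # return to an ordinary tibble

print(toy_evals$avg_bty)
```

Without rowwise(), mean() would collapse all rows into one number; ungroup() at the end keeps later steps from silently operating row by row.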

Task 4

Create a scatter plot between score and avg_bty. Comment on the relationship. What do you think the correlation is between the variables?

Task 5

Modify your plot from Task 4. Add a layer with geom_jitter(). Comment on the relationship.

Model build

Task 6

Use lm() to fit a linear model with score as the response and avg_bty as the predictor. Save it as an object named m.bty. What are the estimates for \(b_0\) and \(b_1\)? Hint: tidy()
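
A minimal sketch of fitting and tidying a model, assuming score is the response and avg_bty the predictor. The data are made up, and the name m_toy is used so this is not confused with the m.bty object the task asks for.

```r
library(tidyverse)
library(broom)

# Toy data invented for illustration; NOT the real evals data.
toy_evals <- tibble(
  avg_bty = c(2, 3, 4, 5, 6, 7),
  score   = c(3.8, 4.1, 3.9, 4.3, 4.2, 4.6)
)

# Formula syntax is response ~ predictor.
m_toy <- lm(score ~ avg_bty, data = toy_evals)

# tidy() returns one row per coefficient: "(Intercept)" holds b_0 and
# "avg_bty" holds b_1, both in the estimate column.
coef_table <- tidy(m_toy)
print(coef_table)
```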

Task 7

Modify your plot from Task 4 to include the regression line. Use a color other than the default blue.
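
One way to overlay a least-squares line is geom_smooth() with method = "lm". The sketch below uses made-up data (toy_evals is hypothetical), and the color "darkorange" is an arbitrary choice.

```r
library(tidyverse)

# Toy data invented for illustration; NOT the real evals data.
toy_evals <- tibble(
  avg_bty = c(2, 3, 4, 5, 6, 7),
  score   = c(3.8, 4.1, 3.9, 4.3, 4.2, 4.6)
)

p_line <- ggplot(toy_evals, aes(x = avg_bty, y = score)) +
  geom_point() +
  # method = "lm" draws the least-squares line; se = FALSE hides the
  # confidence band; formula is spelled out to avoid the default message.
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "darkorange")
print(p_line)
```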

Task 8

Interpret the slope of the fitted model. Does the intercept have any practical meaning within the scope of our data?

Task 9

What is the \(R^2\) value? Interpret this value. Hint: glance()
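
As a sketch of where glance() puts \(R^2\), the code below fits a model on made-up data (toy_evals and m_toy are hypothetical names) and cross-checks the value against the squared correlation, which equals \(R^2\) for a simple linear regression with an intercept.

```r
library(tidyverse)
library(broom)

# Toy data invented for illustration; NOT the real evals data.
toy_evals <- tibble(
  avg_bty = c(2, 3, 4, 5, 6, 7),
  score   = c(3.8, 4.1, 3.9, 4.3, 4.2, 4.6)
)

m_toy <- lm(score ~ avg_bty, data = toy_evals)

# glance() returns a one-row summary of the model; R^2 is in r.squared.
fit_summary <- glance(m_toy)
r_squared <- fit_summary$r.squared

# Cross-check: with a single predictor, R^2 is the squared correlation.
r_check <- cor(toy_evals$avg_bty, toy_evals$score)^2
print(c(r_squared = r_squared, r_check = r_check))
```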

Residuals

Task 10

Create a plot of the residual value versus avg_bty. Comment on what you observe.
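
A residual plot can be built from the output of augment(), which stores residuals in a column named .resid. This is a sketch on made-up data (toy_evals and m_toy are hypothetical), with a dashed reference line at zero added as a convenience.

```r
library(tidyverse)
library(broom)

# Toy data invented for illustration; NOT the real evals data.
toy_evals <- tibble(
  avg_bty = c(2, 3, 4, 5, 6, 7),
  score   = c(3.8, 4.1, 3.9, 4.3, 4.2, 4.6)
)

m_toy <- lm(score ~ avg_bty, data = toy_evals)

# augment() returns the model data plus per-observation columns such as
# .fitted (predicted value) and .resid (observed minus predicted).
toy_aug <- augment(m_toy)

p_resid <- ggplot(toy_aug, aes(x = avg_bty, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(y = "Residual")
print(p_resid)
```

Points scattered evenly around the zero line, with no curve or funnel shape, are what a well-behaved linear fit would tend to show.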

Task 11

Compute the sum of squared residuals. Hint: augment() returns the residual for every observation.
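
The computation can be sketched as follows on made-up data (toy_evals and m_toy are hypothetical names). The result is compared with deviance(), which for an lm fit returns the same residual sum of squares.

```r
library(tidyverse)
library(broom)

# Toy data invented for illustration; NOT the real evals data.
toy_evals <- tibble(
  avg_bty = c(2, 3, 4, 5, 6, 7),
  score   = c(3.8, 4.1, 3.9, 4.3, 4.2, 4.6)
)

m_toy <- lm(score ~ avg_bty, data = toy_evals)

# Square each residual from augment() and add them up.
ssr <- augment(m_toy) %>%
  summarise(ssr = sum(.resid^2)) %>%
  pull(ssr)

# Sanity check against base R's residual sum of squares for the same fit.
print(c(ssr = ssr, deviance = deviance(m_toy)))
```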