This is adapted from Lab 6 in Duke’s Introduction to Data Science course.
We will analyze what goes into course evaluations and how certain variables affect the overall score.
To get started, load packages `tidyverse` and `broom`. Install any packages with code `install.packages("package_name")`.
library(tidyverse)
library(broom)
The data were gathered from end-of-semester student evaluations for a large sample of professors from the University of Texas at Austin. In addition, six students rated the professors’ physical appearance. Each row in `evals` contains a different course and the columns represent variables about the courses and professors.
Use `read_csv()` to read in the data and save it as an object named `evals`. The data is available on Google Classroom.
evals <- read_csv("evals-mod.csv")
Variable | Description |
---|---|
score | Average professor evaluation score: (1) very unsatisfactory - (5) excellent |
rank | Rank of professor: teaching, tenure track, tenure |
ethnicity | Ethnicity of professor: not minority, minority |
gender | Gender of professor: female, male |
language | Language of school where professor received education: english or non-english |
age | Age of professor |
cls_perc_eval | Percent of students in class who completed evaluation |
cls_did_eval | Number of students in class who completed evaluation |
cls_students | Total number of students in class |
cls_level | Class level: lower, upper |
cls_profs | Number of professors teaching sections in course in sample: single, multiple |
cls_credits | Number of credits of class: one credit (lab, PE, etc.), multi credit |
bty_f1lower | Beauty rating of professor from lower level female: (1) lowest - (10) highest |
bty_f1upper | Beauty rating of professor from upper level female: (1) lowest - (10) highest |
bty_f2upper | Beauty rating of professor from second upper level female: (1) lowest - (10) highest |
bty_m1lower | Beauty rating of professor from lower level male: (1) lowest - (10) highest |
bty_m1upper | Beauty rating of professor from upper level male: (1) lowest - (10) highest |
bty_m2upper | Beauty rating of professor from second upper level male: (1) lowest - (10) highest |
Create a scatter plot between `score` and a quantitative variable. Comment on the relationship. What do you think the correlation is between the variables?
Create a scatter plot between `score` and a categorical variable. Comment on the relationship. Add `geom_jitter()` to your plot. What do you notice about the variability in score between different levels?
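To illustrate why jittering helps with a categorical x-axis, here is a minimal sketch on made-up toy data. The tibble `toy` and its columns `group` and `value` are hypothetical stand-ins and are not part of `evals`.

```r
library(tidyverse)

# Toy data (made up): a two-level category and a numeric outcome with ties.
toy <- tibble(
  group = rep(c("lower", "upper"), each = 4),
  value = c(3.5, 4.0, 4.0, 4.5, 3.0, 4.0, 4.5, 4.5)
)

# Plain points: tied values within a group are drawn on top of each other.
p_points <- ggplot(toy, aes(x = group, y = value)) +
  geom_point()

# Jittered points: a small random horizontal shift separates the ties.
# height = 0 keeps the plotted values themselves unchanged.
p_jitter <- ggplot(toy, aes(x = group, y = value)) +
  geom_jitter(width = 0.2, height = 0)
```

Printing `p_points` and `p_jitter` side by side shows how many observations were hidden by overplotting.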
Create a variable named `avg_bty`. This is the average attractiveness from the six student scores for each professor. Add this variable to `evals`. Hint: `rowwise()` then `mutate()` then `ungroup()`.
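The row-wise pattern in the hint can be sketched on a small made-up table. The tibble `toy` and its `rater_*` columns are hypothetical; with the real data you would average the six `bty_*` columns instead. Using `c_across()` with `starts_with()` is one possible way to select the columns, not the only one.

```r
library(tidyverse)

# Toy data (made up): three professors, three raters each.
toy <- tibble(
  prof    = c("A", "B", "C"),
  rater_1 = c(4, 7, 2),
  rater_2 = c(6, 9, 3),
  rater_3 = c(5, 8, 4)
)

toy_avg <- toy %>%
  rowwise() %>%                                  # compute within each row
  mutate(avg_rating = mean(c_across(starts_with("rater_")))) %>%
  ungroup()                                      # return to a regular tibble

toy_avg
```

Without `rowwise()`, `mean()` would collapse all selected values into a single number rather than one average per row.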
Create a scatter plot between `score` and `avg_bty`. Comment on the relationship. What do you think the correlation is between the variables?
Modify your plot from Task 4 (the scatter plot of `score` versus `avg_bty`). Add a layer with `geom_jitter()`. Comment on the relationship.
Fit a linear model with function `lm()`, using `score` as the response and `avg_bty` as the predictor. Save it as an object `m.bty`. What are the estimates for \(b_0\) and \(b_1\)? Hint: `tidy()`
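As a rough illustration of the `lm()` and `tidy()` workflow, the sketch below fits a line to made-up data. The data frame `toy`, its columns `x` and `y`, and the object name `m_toy` are hypothetical and unrelated to `evals`.

```r
library(broom)

# Toy data (made up): y increases roughly linearly with x.
toy <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1)
)

# Response on the left of ~, predictor on the right.
m_toy <- lm(y ~ x, data = toy)

# tidy() returns one row per coefficient: the "(Intercept)" row holds the
# estimate of b_0 and the "x" row holds the estimate of b_1.
tidy(m_toy)
```

The `estimate` column of the output holds the fitted intercept and slope.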
Modify your plot from Task 4 (the scatter plot of `score` versus `avg_bty`) and include the regression line. Pick a different color than the default of blue.
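One common way to overlay a least-squares line is `geom_smooth(method = "lm")`. The sketch below uses made-up data; `toy`, `x`, `y`, and the color choice are assumptions for illustration only.

```r
library(tidyverse)

# Toy data (made up).
toy <- tibble(
  x = c(1, 2, 3, 4, 5),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1)
)

p_line <- ggplot(toy, aes(x = x, y = y)) +
  geom_point() +
  # method = "lm" draws the least-squares line; se = FALSE hides the
  # confidence band; color replaces the default blue.
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "darkorange")
```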
Interpret the slope of the fitted model. Does the intercept have any practical meaning within the scope of our data?
What is the \(R^2\) value? Interpret this value. Hint: glance()
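To connect `glance()` to the definition of \(R^2\), this sketch compares the reported value with one minus the ratio of the residual sum of squares to the total sum of squares. The data and object names (`toy`, `m_toy`) are made up for illustration.

```r
library(broom)

# Toy data (made up).
toy <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1)
)
m_toy <- lm(y ~ x, data = toy)

# glance() returns a one-row summary of the model; r.squared is one column.
r2_glance <- glance(m_toy)$r.squared

# The same quantity computed by hand: 1 - SS_residual / SS_total.
ss_resid  <- sum(residuals(m_toy)^2)
ss_total  <- sum((toy$y - mean(toy$y))^2)
r2_manual <- 1 - ss_resid / ss_total
```

Both `r2_glance` and `r2_manual` give the proportion of variability in the response explained by the model.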
Create a plot of the residuals versus `avg_bty`. Comment on what you observe.
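A residual plot can be built from the output of `augment()`, which adds a `.resid` column to the data used in the fit. This is a sketch on made-up data; `toy`, `m_toy`, and `toy_aug` are hypothetical names.

```r
library(tidyverse)
library(broom)

# Toy data (made up).
toy <- tibble(
  x = c(1, 2, 3, 4, 5),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1)
)
m_toy <- lm(y ~ x, data = toy)

# augment() returns the original columns plus .fitted, .resid, and others.
toy_aug <- augment(m_toy)

p_resid <- ggplot(toy_aug, aes(x = x, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed")  # reference line at zero
```

When a linear model is a reasonable fit, the points typically scatter around the dashed line with no clear pattern.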
Compute the sum of squared residuals. Function `augment()` will give you all the residuals.
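The calculation can be sketched on made-up data as follows. The names `toy`, `m_toy`, and `ssr` are hypothetical, and the comparison with `deviance()` is included only as a sanity check.

```r
library(tidyverse)
library(broom)

# Toy data (made up).
toy <- tibble(
  x = c(1, 2, 3, 4, 5),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1)
)
m_toy <- lm(y ~ x, data = toy)

# Square each residual from augment() and add them up.
ssr <- augment(m_toy) %>%
  summarise(ssr = sum(.resid^2)) %>%
  pull(ssr)

# For a model fit with lm(), deviance() reports the same quantity.
ssr_check <- deviance(m_toy)
```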