Getting Started

Case Study

Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of characteristics unrelated to teaching, such as the physical appearance of the instructor. In fact, Hamermesh and Parker (2005) found that instructors who are viewed as better looking receive higher instructional ratings. (Daniel S. Hamermesh, Amy Parker, Beauty in the classroom: instructors’ pulchritude and putative pedagogical productivity, Economics of Education Review, Volume 24, Issue 4, August 2005, Pages 369-376, ISSN 0272-7757, 10.1016/j.econedurev.2004.07.013. http://www.sciencedirect.com/science/article/pii/S0272775704001165.)

In this lab we will analyze the data from this study in order to learn what goes into a positive professor evaluation.

The data were gathered from end-of-semester student evaluations for a large sample of professors from the University of Texas at Austin. In addition, six students rated the professors’ physical appearance. (This is a slightly modified version of the original data set that was released as part of the replication data for Data Analysis Using Regression and Multilevel/Hierarchical Models (Gelman and Hill, 2007).) The result is a data frame where each row represents a different course and the columns represent variables about the courses and professors.

Learning goals

Fitting a linear regression with a single numerical explanatory variable
Interpreting regression output in context of the data

Packages and Data

We’ll use the tidyverse package for much of the data wrangling and visualization. The data live in the openintro package.

library(tidyverse) 
library(openintro)

The dataset is called evals, and since it is distributed with the package, we don’t need to load it separately; it becomes available to us when we load the package.

If we have trouble installing the openintro package, the dataset is also available for us to download from Canvas as the file evals.csv and can be loaded into R using the read_csv command. More details on how this is done can be found in Lab 1.
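A minimal sketch of that fallback is shown here. It assumes the file has been saved in the current working directory; the path and the name csv_path are assumptions, and Lab 1 remains the reference for the full details.

```r
library(readr)

# Assumption: evals.csv was downloaded from Canvas into the current working
# directory. Adjust csv_path if the file was saved somewhere else.
csv_path <- "evals.csv"

if (file.exists(csv_path)) {
  # Read the file into a data frame (tibble) named evals
  evals <- read_csv(csv_path)
} else {
  # Report where R looked so the path can be corrected
  message("Could not find ", csv_path, " in ", getwd())
}
```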

Instructions

Except for the code needed to load the packages, delete all the text prior to Exercises
After Exercises, only the questions and your answers to them should appear
Delete all the comments that have been left for you in the code chunks
Include all code in the final submission

Exercises

Exploratory Data Analysis

# Use commands for exploring data that were introduced in the previous labs to
# answer the questions below

What is the sample size?
How many variables are there? How many are qualitative and how many are quantitative?
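As a reminder of what this kind of exploration can look like, here is a small sketch on iris, a data set built into R, rather than on evals. Which commands the previous labs introduced is an assumption; these base R functions are just one option, and the names n_rows, n_cols and col_types are illustrative.

```r
# iris ships with base R, so no packages are needed for this sketch
n_rows <- nrow(iris)   # number of observations (the sample size)
n_cols <- ncol(iris)   # number of variables

# Storage class of each column: numeric columns are quantitative,
# factor (or character) columns are qualitative
col_types <- sapply(iris, class)
table(col_types)

# Compact overview of every column and its first few values
str(iris)
```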

NOTE: Remember to delete eval=FALSE

# Use ggplot to create a histogram of professors' ratings (score)
# Include an informative title and x-axis label
ggplot(,aes())+
  geom_ () +
  labs()

Is the distribution skewed? If so, in which direction?
What does that tell you about how students rate courses?

NOTE: Remember to delete eval=FALSE

# In the following code chunk, do the following:
#   1. Select the variable for the professors' rating (score), and then
#   2. Summarize the variable by its mean, median, min and max
evals %>%
  select() %>%
  summarise()

Considering what you have learned from looking at the graph and the summary statistics, is this what you expected student evaluations of their professors to look like? Why, or why not?

NOTE: Remember to delete eval=FALSE

# Use ggplot to create a scatterplot of professors' ratings (score) against their beauty 
#    average (bty_avg)
# Include an informative title and axis labels
ggplot(,aes())+
  geom_ ()+
  labs()

Describe the relationship between score and bty_avg.

NOTE: Remember to delete eval=FALSE

# Recreate the scatterplot of professors' ratings (score) against their beauty average 
#    (bty_avg), but now using geom_jitter
# Include an informative title and axis labels
ggplot(,aes())+
  geom_jitter()+
  labs()

Based on the change to the plot, what do you think “jitter” means? What was misleading about the initial scatterplot?

Linear regression with a quantitative explanatory variable

Remember that a linear model has the form \(\hat{y}_i = \beta_0 + \beta_1 x_i\), where \(\beta_0\) is the intercept and \(\beta_1\) is the slope.
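To see how the pieces of this equation map onto R output, here is a hedged sketch on cars, a data set built into R that records stopping distance (dist, in feet) against speed (in mph). It is not the evals analysis; the names toy_fit, by_hand and by_predict are illustrative.

```r
# Fit a line with dist as the response and speed as the explanatory variable
toy_fit <- lm(dist ~ speed, data = cars)

# Estimated intercept and slope
b <- coef(toy_fit)
b

# Predicted stopping distance at 10 mph, computed directly from the equation
by_hand <- b[["(Intercept)"]] + b[["speed"]] * 10

# The same prediction using predict()
by_predict <- predict(toy_fit, newdata = data.frame(speed = 10))
```

In this toy fit the slope is roughly 3.9, so each additional mph of speed is associated with about 3.9 more feet of stopping distance, on average.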

NOTE: Remember to delete eval=FALSE

# In this code chunk, fit a linear model where professors' rating (score) is explained by
#   professors' beauty (bty_avg)
# Modify the code below by:
#  1. Change x to the explanatory variable
#  2. Change y to the response variable
#  3. Change DATA to the name of the dataset we are analyzing
score_by_bty <- lm(y~x, data=DATA)
summary(score_by_bty)

Is the slope of the regression line significantly different from zero?
Interpret the slope of the linear model in context of the data.
Interpret the intercept of the linear model in context of the data. Comment on whether or not the intercept makes sense in this context.
Determine the \(R^2\) of the model and interpret it in context of the data.
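As a reminder, for a least-squares line with an intercept, \(R^2\) compares the variation left over around the fitted values with the total variation in the response. In the notation of the model above, with \(\bar{y}\) denoting the sample mean of the response (a symbol not used elsewhere in this lab):

```latex
\[
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
\]
```

With a single quantitative explanatory variable, this equals the square of the correlation between \(x\) and \(y\).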

NOTE: Remember to delete eval=FALSE

# Recreate the scatterplot of professors' ratings (score) against their beauty average
#   (bty_avg), but now including the regression line
# Include an informative title and axis labels
ggplot(,aes())+
  geom_jitter()+
  labs()+
  geom_smooth(method="lm",se=FALSE)

What is the direction, form and strength of the association between a professor’s rating and their average beauty rating?

Linear regression with a qualitative explanatory variable

Explanatory variables can be qualitative as well as quantitative. In these cases, each slope describes how one level of the qualitative variable differs, on average, from the baseline level. The baseline can be defined in different ways, but by default R uses the level that comes first alphabetically. For example, we might look at the association between the cost of a vacation and its location, where location takes the values home, domestic, and foreign. In this case, domestic would be the baseline, and we would have two slopes: one that tells us how foreign travel differs in cost from domestic travel, and one that tells us how staying home for vacation (home) differs in cost from domestic travel.
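The baseline idea can be sketched with a tiny made-up version of the vacation example. The trips data frame and its cost values are invented purely for illustration, and relevel() is shown only as one way to pick a different baseline.

```r
# Hypothetical toy data, invented for illustration only
trips <- data.frame(
  location = c("home", "home", "domestic", "domestic", "foreign", "foreign"),
  cost     = c(200, 300, 900, 1100, 2400, 2600)
)

# Levels are sorted alphabetically, so "domestic" comes first (the baseline)
levels(factor(trips$location))

# Intercept = mean cost for domestic trips;
# locationforeign and locationhome = differences from domestic
cost_by_location <- lm(cost ~ location, data = trips)
coef(cost_by_location)

# Refit with "home" as the baseline instead
trips_home <- trips
trips_home$location <- relevel(factor(trips_home$location), ref = "home")
cost_home_baseline <- lm(cost ~ location, data = trips_home)
coef(cost_home_baseline)
```

Changing the baseline changes what each coefficient is compared against, but not the fitted group means.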

NOTE: Remember to delete eval=FALSE

# Recreate the scatterplot of professors' ratings (score) against their beauty average
#   (bty_avg), but now with the points colored by gender (gender)
# Include an informative title and axis labels
ggplot(,aes())+
  geom_jitter()+
  labs()

Are there any apparent differences in the association between evaluation and beauty scores for professors identified as male or female?

NOTE: Remember to delete eval=FALSE

# In this code chunk, fit a linear model where professors' rating (score) is explained by
#   professors' gender (gender)
# Modify the code below by:
#  1. Change x to the explanatory variable
#  2. Change y to the response variable
#  3. Change DATA to the name of the dataset we are analyzing
score_by_gender <- lm(y~x, data=DATA)
summary(score_by_gender)

Based on the R output, on average are the scores of male professors higher or lower than those of female professors?
What percentage of the variation in professors’ evaluations is explained by gender? What does this mean for the association between gender and evaluation score?
Is it plausible that the association between gender and evaluation score might be zero?