Motivation

Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized, because these measures may reflect the influence of non-teaching-related characteristics, such as the physical appearance of the instructor. The article "Beauty in the classroom: instructors' pulchritude and putative pedagogical productivity" (Hamermesh and Parker, 2005) concludes that instructors who are viewed as better looking receive higher instructional ratings.

In this lab we will analyze the data from this study for ourselves! The full article citation is below for reference; a link to the data itself is provided in the next section of the lab.

Daniel S. Hamermesh, Amy Parker, Beauty in the classroom: instructors’ pulchritude and putative pedagogical productivity, Economics of Education Review, Volume 24, Issue 4, August 2005, Pages 369-376, ISSN 0272-7757, 10.1016/j.econedurev.2004.07.013. http://www.sciencedirect.com/science/article/pii/S0272775704001165

The Data

The data were gathered from end-of-semester student evaluations for a large sample of professors at the University of Texas at Austin. In addition, a beauty score was created for each of the sampled professors. This score was created by having six students rate the physical appearance of each professor.

The data we will be working with are a slightly modified version of the original data set that was released as part of the replication data for Data Analysis Using Regression and Multilevel/Hierarchical Models (Gelman and Hill, 2007). The result is a data frame in which each row represents a course and the columns are variables describing the courses and professors.

To load the data, copy and paste the following line of code into a code chunk and press play!

load(url("http://www.openintro.org/stat/data/evals.RData"))

The variables we will be working with today are X = bty_avg (beauty score) and Y = score (evaluation score).
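If you would like a quick look at the data after loading them, here is a small sketch using only base R (the column names score and bty_avg are the two variables described above):

# Check the dimensions of the data frame and peek at the two variables we will use
dim(evals)
head(evals[, c("score", "bty_avg")])
summary(evals$score)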

Model 1

The fundamental phenomenon suggested by the study is that better looking teachers are evaluated more favorably. Let's create a scatter plot to see if this appears to be the case.

  1. Create a scatter plot to explore the relationship between X = beauty score (bty_avg) and Y = evaluation score (score). Make sure to label your axes! Hint: Remember to use library(ggplot2) before you run your plot; a sketch of one way to start is shown after these two questions.

  1. Based on the plot, does there seem to be a relationship between beauty score and evaluation score? In addition to answering this question, comment on the four things we discuss when describing scatter plots.
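If you are not sure where to begin, here is a minimal sketch of the kind of plotting code you might adapt (the axis labels are placeholders that you should replace with more descriptive ones):

library(ggplot2)

# Scatter plot of evaluation score (Y) versus beauty score (X)
ggplot(evals, aes(x = bty_avg, y = score)) +
  geom_point() +
  labs(x = "Average beauty rating", y = "Course evaluation score")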

Now that we have plotted the data, let's try a model. The model will allow us to determine whether what we saw in the plot is something more than natural variation. The basic R commands you need for these questions are sketched below them.

  1. Fit an LSLR model called m_bty to predict Y = average professor score by using X = average beauty rating. Write out the LSLR line.

  1. Make and interpret a 95% confidence interval for the slope for beauty score.

  1. What is the R-squared for your model? Based on this, would you recommend we put much trust in this confidence interval? Explain.

  1. Based on the interval, does beauty score appear to be a practically significant predictor? Explain. Note: To explore practical significance, it may be helpful to look at a summary of evaluation score, and then look at your confidence interval. Do you think the change suggested in the CI makes a practical difference given the possible values for evaluation score?
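For reference, here is a minimal sketch of the commands these questions rely on (lm(), summary(), and confint() are all base R):

# Fit the LSLR model of evaluation score on beauty score
m_bty <- lm(score ~ bty_avg, data = evals)

# Estimated intercept, slope, and R-squared
summary(m_bty)

# 95% confidence interval for the intercept and slope
confint(m_bty, level = 0.95)

# A summary of evaluation score, useful when thinking about practical significance
summary(evals$score)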

Model 2

We have started to build a model for evaluation score, but our current model is not a great fit to the data. How might we improve the model, i.e., explain more of the variability in evaluation score?

The data set actually contains several variables about the professors that were evaluated in this study. Let's consider the variable class level (cls_level). This variable is recorded as "lower" if the class is a 100 level or 200 level class and "upper" if the class is a 300 level class.

Let's explore whether the evaluation score of a professor seems to differ depending on whether that professor teaches an upper level or a lower level course. To fit an LSLR model for Y = evaluation score and X = class level, we would use the following:

m_level <- lm(score ~ cls_level, data = evals)

If you run summary(m_level), you will notice that the slope for class level is now called cls_levelupper. The reason is that R recodes class level from having the values of upper and lower to being an indicator variable called cls_levelupper that takes a value of \(0\) for lower level classes and a value of \(1\) for upper level classes.

  1. Write down the LSLR line for Y = evaluation score and X = class level. Interpret the slope and intercept. Hint: Remember to use summary(m_level) to get the estimated slope and intercept.

Okay, so we can see how this model works now. However, how did we decide that we were going to indicate the upper level rather than the lower level classes? It turns out that the decision to call the indicator variable cls_levelupper instead of cls_levellower has no deeper meaning. R simply picks one level to be 0 and one level to be 1.

Typically, R makes the choice alphabetically, but for any variable you can use the summary command to check. Whatever value shows up first in the summary table is the baseline, i.e., is coded as 0. So, the indicator variable indicates when the value of the variable is different from the baseline. For class level, the baseline is lower. The indicator variable asks "Is this class an upper level class?" If so, we record a 1; if not, a 0. The 1 indicates an upper level class, i.e., a value different from the baseline.
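As a quick check, here is a small sketch (this assumes cls_level is stored as a factor, which you can verify with str(evals)):

# The level listed first is the baseline (coded as 0)
summary(evals$cls_level)

# The contrasts matrix shows which level gets the indicator value of 1
contrasts(evals$cls_level)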

  1. What is the baseline for the variable pic_outfit?

  2. You can change the baseline level of a categorical variable (the level that is coded as a 0) using the relevel function. Use ?relevel or ask me to learn more.
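For example, here is a minimal sketch of how relevel could be used to make upper level classes the baseline instead (again assuming cls_level is a factor; the names cls_level2 and m_level2 are just illustrative, and using a copy keeps the original baseline intact for later questions):

# Create a copy of class level with "upper" as the baseline level (coded as 0), then refit
evals$cls_level2 <- relevel(evals$cls_level, ref = "upper")
m_level2 <- lm(score ~ cls_level2, data = evals)
summary(m_level2)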

Model 3: Putting Them Together

When we could only work with one predictor, we might look at the model using beauty average and the model using class level and choose which predictor is best. Now, with the power of MLR, we can instead choose to fit a model with both predictors.

Putting two or more X variables together in the model allows us to build models that account for more sources of variation. This can allow us to build models that are stronger fits to the data, explain more variability in Y, and provide more accurate interpretations of the relationships between our X variables and Y. Let's try it by building a model for Y = evaluation score using X1 = beauty score and X2 = class level.

m_bty_level <- lm(score ~ bty_avg + cls_level, data = evals)
summary(m_bty_level)

  1. Using a significance level of .001, do we have evidence that the coefficient (beta hat!) for beauty average is different from 0, even with our new predictor added in? Has the addition of class level to the model changed the estimated slope for beauty average? If so, by how much?

  1. For your current model using beauty average and class level to predict evaluation score, write out the LSLR line corresponding to upper level classes.

  2. Now, let's plot the model you have just created.

ggplot(evals, aes(bty_avg, score, col = cls_level)) +
  geom_abline(intercept = 3.94321, slope = 0.0657, lty = 1, col = "red") +
  geom_abline(intercept = 3.94321 - 0.08885, slope = 0.0657, col = "blue") +
  geom_point()
      

How did I get the values for this graph? The function geom_abline() tells R to draw a line on the graph. We then specify the slope and intercept of each line using the LSLR line.

  1. Add a title and appropriate axis labels to the graph above.

  2. For two courses (one lower level class and one upper level class) whose professors received the same beauty rating, the upper level course professor is predicted to have the higher course evaluation score.

     A) True
     B) False

Model 4: More Levels

Some of the possible predictors in the data are categorical, but have more than two levels. We talked a little last class about how to work with these. Let's put that into practice.

  1. Consider the variable rank. How many levels does this variable have? What are they? Which of these levels is the baseline?

  1. Create a new model called m_bty_rank with class level removed and rank added in (so you should have bty_avg and rank in the model but NOT class level). How many coefficients (beta hats) for rank are in the model?

Instead of one indicator variable, this model uses two, and we can see what these are from the R summary. We have one variable called ranktenure track. This variable asks "Is this professor on the tenure track?", and recording a 1 means yes. The second variable in the summary output asks "Is this professor tenured?", where again a 1 means yes and 0 means no. What is left? Well, a teaching track professor is neither tenured nor tenure track, so such professors will have zeroes for both indicator variables. This means that teaching professors are the baseline we compare to, and the average score for teaching professors (with beauty score 0) is the intercept of the model.
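If you want to see these indicator variables explicitly, one option is the sketch below (model.matrix() is base R, and its column names should match the coefficient names you see in summary(m_bty_rank)):

# Fit the model with beauty average and rank
m_bty_rank <- lm(score ~ bty_avg + rank, data = evals)

# The model matrix shows the two indicator columns R creates for rank
head(model.matrix(m_bty_rank))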

  1. Write down the LSLR lines for (1) tenure track, (2) tenured and (3) teaching track faculty. This means you should have 3 lines. When you write out the model, you will have some terms that are just numbers, i.e., are not attached to an X. Combine these when you write down the models. Is the slope different across these three models? What about the intercept?

  2. Look at the code we used above Question 11 to plot the average beauty score (X) versus the evaluation score (Y) by class level. Now, we want to do the same thing, but we want to use rank instead of class level as our categorical variable. Adapt the code from Question 11 accordingly and show the graph. Make sure to add a title and label your axes!
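If you would rather not type the estimated intercepts and slopes by hand, here is an alternative sketch that draws the fitted lines from the model directly (it assumes ggplot2 is loaded and m_bty_rank has been created; the title and labels are placeholders for you to improve):

# Build a grid of beauty scores for each rank and get the model's predictions
grid <- expand.grid(bty_avg = seq(min(evals$bty_avg), max(evals$bty_avg), length.out = 50),
                    rank = unique(evals$rank))
grid$score_hat <- predict(m_bty_rank, newdata = grid)

# Plot the data and overlay one fitted line per rank
ggplot(evals, aes(x = bty_avg, y = score, col = rank)) +
  geom_point() +
  geom_line(data = grid, aes(y = score_hat)) +
  labs(title = "Evaluation score vs. beauty score by rank",
       x = "Average beauty rating", y = "Course evaluation score")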

The interpretation of the slope in multiple regression is slightly different from that of simple regression. The estimate for beauty average reflects how much higher a group of professors is expected to score if they have a beauty rating that is one point higher while holding all other variables constant. All else held constant, we expect evaluation scores to change on average by the slope of average beauty score. In this case, that translates into considering only professors of the same rank with beauty average scores that are one point apart.
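To make this concrete, here is a small sketch that predicts scores for two hypothetical professors of the same rank whose beauty ratings are one point apart (the beauty scores 4 and 5 and the rank value "tenure track" are just illustrative choices; check the exact level names in your data with unique(evals$rank)):

# Two hypothetical professors of the same rank, with beauty ratings one point apart
new_profs <- data.frame(bty_avg = c(4, 5), rank = c("tenure track", "tenure track"))
preds <- predict(m_bty_rank, newdata = new_profs)
preds

# The difference between these predictions equals the estimated slope for bty_avg
diff(preds)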

  1. Based on your model, interpret the slope for ranktenure track. Look at the notes from last class if you have questions on how to do this!

  2. Which of the following is the correct order of the three levels of rank if we were to order them from lowest predicted course evaluation score to highest predicted course evaluation score?

     A) Teaching, Tenure Track, Tenured
     B) Tenured, Tenure Track, Teaching
     C) Tenure Track, Tenured, Teaching
     D) Teaching, Tenured, Tenure Track
This lab is based on a product of OpenIntro that is released under a Creative Commons Attribution-ShareAlike 3.0 Unported license. The original lab was created for OpenIntro by Andrew Bray and Mine Çetinkaya-Rundel from a lab written by Mark Hansen of UCLA Statistics. It was retrieved on 18 May 2017. This adapted version was last updated by Nicole Dalzell on 2 March 2022. This version is not endorsed by OpenIntro.

The CSS file used to format this lab was retrieved from the GitHub of Mine Çetinkaya-Rundel, version 13 January 2016.