STA 112 Lab 4
Motivation
We have a client today who is interested in examining student evaluation scores of professors in college courses. The use of these student evaluations as an indicator of course quality and teaching effectiveness is criticized for a variety of reasons, including the concern that these measures may reflect the influence of non-teaching related characteristics, such as the physical appearance of the instructor.
One article that explores this concern is titled “Beauty in the classroom: instructors’ pulchritude and putative pedagogical productivity” (Hamermesh and Parker, 2005). The article concluded that instructors who are viewed to be better looking receive higher instructional ratings.
In this lab we will analyze the data from this study for ourselves. The article citation is included below for reference; a link to the data itself is provided in the next section of the lab.
- Daniel S. Hamermesh and Amy Parker, "Beauty in the classroom: instructors' pulchritude and putative pedagogical productivity," Economics of Education Review, Volume 24, Issue 4, August 2005, Pages 369-376, ISSN 0272-7757, https://doi.org/10.1016/j.econedurev.2004.07.013. [http://www.sciencedirect.com/science/article/pii/S0272775704001165]
The Data
The data for this study were gathered from end-of-semester student evaluations for a large sample of professors from the University of Texas at Austin. A beauty score was created for each of the sampled professors by having six students rate the physical appearance of each professor. The beauty score in the data for each professor is the average of the six student scores.
The data set we will be working with is a slightly modified version of the original data set that was released as part of the replication data for Data Analysis Using Regression and Multilevel/Hierarchical Models (Gelman and Hill, 2007). Our data set is a data frame where each row contains a different course and columns represent variables about the courses and professors.
To load the data, copy and paste the following line of code into a code chunk and press play:
load(url("http://www.openintro.org/stat/data/evals.RData"))
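Once the data are loaded, it can help to confirm that everything read in correctly. A quick optional check (not required for the lab):
# Look at the dimensions and preview the first few rows of the data
dim(evals)
head(evals)
The data set is stored in a data frame called evals, and its variables are described below.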
- score: average professor evaluation score: (1) very unsatisfactory - (5) excellent.
- rank: rank of professor: teaching, tenure track, tenured.
- ethnicity: ethnicity of professor: not minority, minority.
- gender: gender of professor: female, male. Note: This variable is actually sex, not gender, and is recorded as binary in this study.
- language: language of school where professor received education: English or non-English.
- age: age of professor.
- cls_perc_eval: percent of students in class who completed evaluation.
- cls_did_eval: number of students in class who completed evaluation.
- cls_students: total number of students in class.
- cls_level: class level: lower, upper.
- cls_profs: number of professors teaching sections in course in sample: single, multiple.
- cls_credits: number of credits of class: one credit (lab, PE, etc.), multi credit.
- bty_avg: average beauty rating of professor.
- pic_outfit: outfit of professor in picture: not formal, formal.
- pic_color: color of professor picture: color, black & white.
Model 1: One Numeric X
The fundamental phenomenon suggested by the study is that better looking teachers are evaluated more favorably. Our client asks us to use our data set to see if we find the same result. In other words, they ask us to build a model to explore the relationship between the beauty score (X = bty_avg) and the evaluation score (Y = score) of the faculty being evaluated.
We know that the first step in any model building is EDA, so we will begin by creating a plot to explore this relationship.
Question 1
Create a plot to explore the relationship between X = beauty score (bty_avg) and Y = evaluation score (score). Make sure to label your axes! Hint: Remember to run library(ggplot2) before you build your plot.
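If you are not sure where to start, here is a minimal sketch of a scatter plot (the axis labels are placeholders; write your own):
# Scatter plot of evaluation score (Y) against beauty score (X)
library(ggplot2)
ggplot(evals, aes(x = bty_avg, y = score)) +
  geom_point() +
  labs(x = "Average Beauty Rating", y = "Average Evaluation Score")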
Question 2
Comment on the four things we discuss when describing scatter plots. Based on the plot, does there seem to be a relationship between beauty score and evaluation score?
Now that we have plotted the data, our client asks us to use a regression model to explore the relationship. The goal of building the model is to allow us to determine if what we saw in the plot is something more than natural variation.
Question 3
Build an LSLR model in R called model_beauty
to predict
Y = average professor score by using X = average beauty rating. You do
not need to graph it.
Write out the LSLR line as the answer to this question. Make sure to use proper notation!
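As a hint, an LSLR model can be fit with the lm function; a minimal sketch:
# Fit the least squares line with Y = score and X = bty_avg
model_beauty <- lm(score ~ bty_avg, data = evals)
summary(model_beauty)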
At this point, we have a model that we can use to describe the relationships in our data set. However, we want to be able to assess whether these data provide strong evidence of a population relationship between beauty and evaluation score. For this, we need inference procedures. You may assume all assumptions needed for inference are reasonable.
Question 4
Build and interpret a 95% confidence interval for the slope for beauty score. Use \(t^{*} = 1.97\).
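You can build this interval by hand from the summary output using the given \(t^{*}\). If you want to check your work, R can also compute the interval directly:
# 95% confidence intervals for the intercept and the slope
confint(model_beauty, level = 0.95)
Note that confint uses the exact \(t^{*}\) value, so its endpoints may differ very slightly from your hand computation.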
There are two things we consider when examining a confidence interval. The first is statistical significance, which for this scenario means checking to see if 0 is included in the CI. If it is, then 0 is a plausible value for \(\beta_1\), suggesting there may be no population relationship between X and Y.
The second thing we check for is practical significance. This means checking to see whether or not the change suggested by the interval is large enough to make a real world impact in the context of the data.
For instance, if we are 95% confident that for every hour you study, your final exam grade on average increases between .005 and .007 points out of 100, does that motivate you to study? Probably not, as you would need to study more than 142 hours to gain even 1 point on your final (even at the upper endpoint, 1/.007 ≈ 143 hours). The interval does not contain 0, so the relationship is statistically significant, but it is not practically significant, as the change suggested by either endpoint of the CI is too small to make a real-world impact on your exam score.
Question 5
For the exam studying example, give an example of a CI that would be both statistically and practically significant. Briefly explain your choice. There are many right answers here!
Question 6
Based on your confidence interval from Question 4, does beauty score appear to have a practically significant relationship with evaluation score? Explain.
Note: To explore practical significance, it may be helpful to look at a summary of evaluation score, and then look at your confidence interval.
One other tool we use to assess the strength of the relationship between X and Y is the \(R^2\).
Question 7
Compute and interpret the \(R^2\) for your model. Comment on what this tells us about how well the model fits the data. In other words, are we capturing a large percentage of the variability in evaluation score with this model?
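The \(R^2\) is printed near the bottom of the summary output; if you prefer, it can also be extracted directly:
# Pull the R^2 out of the model summary
summary(model_beauty)$r.squared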
Model 2: One Categorical X
Our client now asks us to build a second model using the evaluation data. They are interested in the relationship between X = the class level of the course and Y = the evaluation score. Class level (cls_level) is recorded as "lower" if the course is a 100 level or 200 level course and "upper" if the course is a 300 level course.
Why is this an interesting variable to look at? Well, class level can be used as a way to reflect the difficulty level of the course. For instance, it may be reasonable to assume that upper level classes may be more difficult than lower level classes.
Question 8
Create a side-by-side box plot to explore the relationship between class level and evaluation score. Make sure to label your axes. Based on the plot, does it look like there is a relationship between class level and evaluation score?
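If you need a starting point, a minimal sketch using geom_boxplot (again, the axis labels are placeholders):
# Side-by-side box plots of evaluation score by class level
library(ggplot2)
ggplot(evals, aes(x = cls_level, y = score)) +
  geom_boxplot() +
  labs(x = "Class Level", y = "Average Evaluation Score")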
To build a regression model for Y = evaluation score and X = class level, we can use the following:
model_Level <- lm(score ~ cls_level, data = evals)
summary(model_Level)
If you run summary(model_Level), you will notice that the coefficient for class level is now called cls_levelupper. The reason is that R re-codes class level from having the values of upper and lower to being an indicator variable called cls_levelupper that takes a value of 0 for lower level classes and a value of 1 for upper level classes.
Question 9
Write down your fitted model for X = class level and Y = evaluation score using appropriate notation. Interpret \(\hat{\beta}_0\) and \(\hat{\beta}_1\).
Question 10
Build and interpret a 95% confidence interval for \(\beta_1\). You may assume all assumptions are reasonable.
Question 11
Does your confidence interval suggest practical significance, statistical significance, both, or neither? Explain.
Okay, so we can see how this model works now. However, how did we decide that we were going to indicate the upper level rather than the lower level classes? It turns out that the decision to call the indicator variable cls_levelupper instead of cls_levellower has no deeper meaning. R simply picks one level to be 0 and one level to be 1.
Typically, R makes the choice alphabetically, but for any variable, you can use the summary command to check. Whatever value shows up first in the summary table is the baseline, i.e., is coded as 0. So, the indicator variable indicates when the value of the variable is different from the baseline. For class level, the baseline is lower. The indicator variable asks "Is this class an upper level class?" If so, we record a 1; if not, a 0. The 1 indicates a value of upper level class, i.e., a value different from the baseline.
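For example, assuming cls_level is stored as a factor, the following shows its summary table with the baseline level listed first:
# The level listed first ("lower") is the baseline, coded as 0
summary(evals$cls_level)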
Question 12
What is the baseline for the variable pic_outfit?
You can change the baseline level of a categorical variable (the level that is coded as a 0) using the relevel function. Use ?relevel or ask me to learn more.
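For instance, here is a sketch of how you could make upper the baseline instead, assuming cls_level is stored as a factor:
# Re-code the factor so that "upper" is the baseline (coded as 0)
evals$cls_level <- relevel(evals$cls_level, ref = "upper")
If you try this, keep in mind that it changes the coding for every model you fit afterwards.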
Model 3: Putting Them Together
We have built a model with one numeric X and a model with one categorical X. However, there is no reason why we have to pick only one! We can build multiple linear regression (MLR) models, which include multiple X variables.
Why might we want to include multiple X variables? One reason is that putting two or more X variables together in the model allows us to build models that account for more sources of variation. This can allow us to build models that are stronger fits to the data, explain more variability in Y, and provide more accurate interpretations of the relationships between our X variables and Y.
Our client believes that in addition to beauty score, the variation in evaluation score might be related to whether a faculty member is teaching track, tenure track, or tenured. This information is recorded in the variable rank in the data set. Based on this, let's try building a model for \(Y\) = evaluation score using \(X_1\) = beauty score and \(X_2\) = professor rank.
Question 13
Consider the variable professor rank.
- How many levels (different values) does this variable have?
- What are these levels?
- Which of these levels will R treat as the baseline?
Question 14
Create a new model called model_BeautyRank with both beauty average and professor rank as X variables. Write out the fitted model.
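As a hint, additional X variables are added to the right-hand side of the formula with a + sign:
# MLR model with beauty average and professor rank as predictors
model_BeautyRank <- lm(score ~ bty_avg + rank, data = evals)
summary(model_BeautyRank)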
Instead of one indicator variable, you will notice that this model uses two, and we can see what these are from the R summary. We have one variable called ranktenuretrack. This variable asks "Is this professor on the tenure track?", and recording a 1 means yes. The second variable in the summary output asks "Is this professor tenured?", where again a 1 means yes and a 0 means no.
What is left? Well, a teaching track professor is neither tenured nor tenure track, so such professors will have zeroes for both indicator variables. This means that teaching track professors are the baseline we compare to, and the average score for teaching track professors (with a beauty score of 0) is the intercept of the model.
Question 15
Write down the LSLR lines for (1) tenure track, (2) tenured and (3) teaching track faculty. This means you should have 3 lines. Simplify fully.
Is the slope different across these three models?
What about the intercept?
Now that we have the model built, let’s interpret it. The interpretation of the coefficients in multiple regression is slightly different from that of LSLR. The estimate for beauty average reflects how much higher a group of professors is expected to score if they have a beauty rating that is one point higher keeping all other variables fixed. In this case, that translates into considering only professors of the same rank with beauty average scores that are one point apart.
When we write our interpretations, we reflect this with the phrase "after controlling for rank". This indicates that we are interpreting the coefficient as: for professors of the same rank, what happens when we increase the beauty score by 1 point?
Question 16
Based on your fitted model, interpret the coefficient for tenure track rank.
Question 17
Which of the following is the correct order of the three levels of rank if we were to order them from lowest predicted course evaluation score to highest predicted course evaluation score?
- Teaching, Tenure Track, Tenured
- Tenured, Tenure Track, Teaching
- Tenure Track, Tenured, Teaching
- Teaching, Tenured, Tenure Track
Model 4: Controlling for Sources of Variation
Now that we can build multiple regression models, we need to carefully consider the idea of controlling for sources of variation when we build a model. If we know or suspect that something might be related to Y, it is important to build that information into the model. This allows us to build models that are stronger reflections of the patterns in the data and to gain a more accurate understanding of the relationships we are interested in.
Our client suspects that professor rank, ethnicity, sex (which is recorded under the name gender in the data set), age, class level, and the number of students in the class should be controlled for before assessing the relationship between beauty and evaluation score. In other words, our client asks us to build a model that contains all of these variables, including beauty average.
Question 18
Build the model suggested by the client. You do not have to write out the fitted model, but show the summary as your answer to this question.
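As a sketch, using the variable names from the codebook above (the name model_full is just a suggestion):
# Model for evaluation score, controlling for rank, ethnicity, sex,
# age, class level, and class size in addition to beauty average
model_full <- lm(score ~ bty_avg + rank + ethnicity + gender + age +
                   cls_level + cls_students, data = evals)
summary(model_full)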
Question 19
Based on this model, interpret the coefficient for beauty average.
When we evaluate multiple regression models, we use something called the adjusted \(R^2\) to reflect the amount of variability in \(Y\) that is captured by the model. We know that \[R^2 = 1 - \frac{RSS}{TSS}.\] The adjusted \(R^2\) is \[R^2_{adj} = 1 - \left( \frac{RSS}{TSS} \times \frac{n-1}{n-p} \right),\] where \(p\) is the number of \(\beta\) terms in the model, including the intercept.
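In R, the adjusted \(R^2\) is printed near the bottom of the summary output; it can also be extracted directly (using the name model_full from the sketch above):
# Pull the adjusted R^2 out of the model summary
summary(model_full)$adj.r.squared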
Question 20
What is the \(R^2_{adj}\) of this model?
Question 21
Based on your results in Question 20, do you think that including a lot of variables in the model always results in a high amount of variability in \(Y\) being captured by the model?
Next Steps
As we move forward in the course, we will start to learn how we can use this idea of using multiple explanatory variables to build stronger models. This means we want to build models that are a better reflection of the variability in \(Y\) and the relationships between \(Y\) and other variables. We also need to learn how we can compare models, and how we might decide whether including more explanatory variables in the model is effective. All of that is coming up!
References
This lab is based on a product of OpenIntro that is released under a Creative Commons Attribution-ShareAlike 3.0 Unported license. The original lab was created for OpenIntro by Andrew Bray and Mine Cetinkaya-Rundel from a lab written by Mark Hansen of UCLA Statistics. It was pulled 18 May 2017. This adapted version was last updated by Nicole Dalzell on 25 July 2023. This version is not endorsed by OpenIntro.