We have been studying multiple linear regression (MLR), meaning regression models that use more than one predictor to explain the variation in the response variable. In this lab, we are going to build and interpret MLR models, as well as discuss some considerations that we will explore further in future classes.
Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of non-teaching related characteristics, such as the physical appearance of the instructor. The article ``Beauty in the classroom: instructors’ pulchritude and putative pedagogical productivity’’ (Hamermesh and Parker, 2005) found that instructors who are viewed to be better looking receive higher instructional ratings. In this lab we will analyze the data from this study to explore this claim.
Article Citation:
Daniel S. Hamermesh, Amy Parker, Beauty in the classroom: instructors’ pulchritude and putative pedagogical productivity, Economics of Education Review, Volume 24, Issue 4, August 2005, Pages 369-376, ISSN 0272-7757, 10.1016/j.econedurev.2004.07.013. http://www.sciencedirect.com/science/article/pii/S0272775704001165
The data were gathered from end of semester student evaluations for a large sample of professors from the University of Texas at Austin. In addition, a beauty score was created for each of the sampled professors. This score was created by having six students, three male and three female, rate the physical appearance of each professor.
The data we will be working with is a slightly modified version of the original data set that was released as part of the replication data for Data Analysis Using Regression and Multilevel/Hierarchical Models (Gelman and Hill, 2007). The result is a data frame where each row contains a different course and columns represent variables about the courses and professors.
load(url("http://www.openintro.org/stat/data/evals.RData"))
score - average professor evaluation score: (1) very unsatisfactory - (5) excellent.
rank - rank of professor: teaching, tenure track, tenured.
ethnicity - ethnicity of professor: not minority, minority.
gender - gender of professor: female, male.
language - language of school where professor received education: English or non-English.
age - age of professor.
cls_perc_eval - percent of students in class who completed evaluation.
cls_did_eval - number of students in class who completed evaluation.
cls_students - total number of students in class.
cls_level - class level: lower, upper.
cls_profs - number of professors teaching sections in course in sample: single, multiple.
cls_credits - number of credits of class: one credit (lab, PE, etc.), multi credit.
bty_f1lower - beauty rating of professor from lower level female: (1) lowest - (10) highest.
bty_f1upper - beauty rating of professor from upper level female: (1) lowest - (10) highest.
bty_f2upper - beauty rating of professor from second upper level female: (1) lowest - (10) highest.
bty_m1lower - beauty rating of professor from lower level male: (1) lowest - (10) highest.
bty_m1upper - beauty rating of professor from upper level male: (1) lowest - (10) highest.
bty_m2upper - beauty rating of professor from second upper level male: (1) lowest - (10) highest.
bty_avg - average beauty rating of professor.
pic_outfit - outfit of professor in picture: not formal, formal. (not used in this analysis)
pic_color - color of professor’s picture: color, black & white. (not used in this analysis)
The primary goal of our study is to explore the relationship between beauty score and professor evaluation score. However, the data set contains several other variables about the professors who were evaluated in this study. Generally, when we build linear models we try to include every predictor we have that has a relationship with the response variable. Doing this means that the conclusions we draw about our predictor of interest are more accurate. We are going to start by adding two predictors.
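If you want a quick look at the data before modeling, one option (assuming the evals data frame loaded above) is:
dim(evals)   # number of courses (rows) and number of variables (columns)
str(evals)   # names and types of all of the variables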
Fit a linear regression model for evaluation score using both beauty score (bty_avg) and gender as predictors (recall that for this study, gender was recorded as a binary variable, male or female). Name this model m_bty_gen, and write out the linear regression line.
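One way to fit this model is sketched below; it assumes the evals data frame loaded earlier, and the model name m_bty_gen matches the exercise.
m_bty_gen <- lm(score ~ bty_avg + gender, data = evals)  # MLR with two predictors
summary(m_bty_gen)                                        # estimates needed to write out the line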
If you run summary(m_bty_gen), you will notice that the estimate for gender is now called gendermale. You’ll see this name change whenever you introduce a categorical variable. The reason is that R recodes gender from having the values of female and male to being an indicator variable called gendermale that takes a value of \(0\) for females and a value of \(1\) for males. Such variables are often referred to as “dummy” variables or indicator variables.
The decision to call the indicator variable gendermale instead of genderfemale has no deeper meaning. R simply has an ordering of the levels, and the level that comes first is coded as \(0\). Typically, R makes the choice alphabetically, but for any variable you can use the summary command to check. Whatever value shows up first in the summary table is the baseline, i.e., it is coded as 0. So, the indicator variable indicates when the value of the variable is different from the baseline. For gender, the baseline is female. The indicator variable asks "Is this professor male?" If so, we record a 1; if not, a 0. The 1 indicates a value of male, i.e., a value different from the baseline.
You can also use the levels() command for this. For instance, levels(evals$gender) will give you the levels of the gender variable, with the first level as the baseline.
What is the baseline for the variable cls_level?
You can change the baseline level of a categorical variable, which is the level that is coded as a 0, using the relevel function. Use ?relevel to learn more.
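As a sketch of what this looks like in practice (the new variable gender_relevel below is just an illustration, not part of the data set):
levels(evals$gender)                                          # first level listed is the baseline
evals$gender_relevel <- relevel(evals$gender, ref = "male")   # illustrative: make "male" the baseline instead
levels(evals$gender_relevel)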
Interpret both slope terms in the m_bty_gen regression line.
Given that we have gender in the model, do we have evidence that beauty score explains a significant amount of variation in evaluation score? Use a significance level of 0.05. Note: You only need to show the output and write out the final conclusion. You do not need to show all of your steps.
Make a 95% confidence interval for the slope for beauty score. Does this interval suggest statistical significance? Practical significance? Explain. Hint: Practical significance often depends on the scale of the variable, meaning how big or small the values of the response variable can be.
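One way to get this interval in R, assuming the model object m_bty_gen from above, is with the confint function:
confint(m_bty_gen, parm = "bty_avg", level = 0.95)  # 95% CI for the beauty-score slope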
Even though so far we have written it as one regression line, the MLR model we have created actually represents two regression lines, one for each level of the categorical variable.
Think about making predictions for individuals who identify as female. This means that the slope that is multiplied by the indicator for male is in fact multiplied by zero. This leaves a line with an intercept and slope form familiar from simple regression.
\[ \begin{align*} \widehat{score} &= \hat{\beta}_0 +\hat{\beta}_1 \times bty\_avg + \hat{\beta}_2 \times (0) \\ &= \hat{\beta}_0 + \hat{\beta}_1 \times bty\_avg \end{align*} \]
We can plot this line and the line corresponding to males with the following custom function.
multiLines(m_bty_gen)
Write out the regression line for males.
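If you would like to check your answer numerically, one approach (again assuming the model object m_bty_gen from above) is to pull out the estimated coefficients and combine the intercept with the gendermale term:
coef(m_bty_gen)                                                  # intercept, bty_avg slope, gendermale shift
coef(m_bty_gen)["(Intercept)"] + coef(m_bty_gen)["gendermale"]   # intercept of the line for males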
Some of the possible predictors in the data are categorical, but have more than two levels.
Consider the variable rank. How many levels does this variable have? What are they? Which of these levels is the baseline?
Create a new model called m_bty_rank with gender removed and rank added in. How many coefficients for rank are in the model?
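A minimal sketch of fitting this model, using the same approach as before:
m_bty_rank <- lm(score ~ bty_avg + rank, data = evals)  # rank replaces gender as the categorical predictor
summary(m_bty_rank)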
Instead of one indicator variable, this model uses two, and we can see what they are from the R summary. The first is called ranktenure track. This variable asks "Is this professor on the tenure track?", where a 1 means yes. The second variable in the summary output, ranktenured, asks "Is this professor tenured?", where again a 1 means yes and a 0 means no. What is left? Well, a teaching track professor is neither tenured nor on the tenure track, so such professors will have zeroes for both indicator variables. This means that teaching professors are the baseline we compare to, and the average score for teaching professors (with a beauty score of 0) is the intercept of the model.
Write down the regression lines for (1) tenure track, (2) tenured and (3) teaching track faculty. When you write out the model, you will have some terms that are just numbers, i.e., are not attached to a beta. Combine these when you write down the models. Is the slope different across these three models? What about the intercept?
Use the multiLines function to plot the regression lines for the model m_bty_rank and show your result.
The interpretation of the coefficients in multiple regression is slightly different from that of simple regression. The estimate for bty_avg reflects how much higher a group of professors is expected to score if they have a beauty rating that is one point higher while holding all other variables constant. All else held constant, we expect evaluation scores to change on average by the coefficient of average beauty score. In this case, that translates into considering only professors of the same rank with bty_avg scores that are one point apart.
So far, we have fit two models: m_bty_gen and m_bty_rank. Now, fit a third model called m_bty_both that includes beauty, gender, and rank as predictors. How do we decide which of the three models we might prefer to use to model the data?
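A sketch of the third model, following the same pattern as the first two:
m_bty_both <- lm(score ~ bty_avg + gender + rank, data = evals)  # beauty, gender, and rank together
summary(m_bty_both)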
Our instinct may be to reach for R squared, which served us well for this purpose in SLR. However, in MLR we have a problem: the models we are comparing can have different numbers of predictor variables. Why is this a problem? As long as we are using the same data cases to fit the model, adding a new predictor variable to an MLR model can NEVER decrease the percent of variability explained by the model. This means that if we just look at R squared when deciding between a model with predictor A and a model with predictors A and B, we will always choose the latter, regardless of what predictor B might be. I could add each professor's favorite pop song to the model, and the R squared would still go up. This was not a problem in SLR, where all the models we were comparing had only one predictor each. How do we deal with this issue in MLR?
To compare models in MLR, we use something called adjusted R squared. It is called "adjusted" because it "adjusts", or accounts for, the fact that the models we are comparing may have different numbers of predictor variables. Mathematically, we define
\[ \begin{align*} R^2_{adj} &= 1 - \frac{RSS/(n-k-1)}{SSTotal/(n-1)} \end{align*} \]
where \(RSS\) is the residual sum of squares for the model, \(SSTotal\) is the total sum of squares of the response, \(n\) is the number of cases used to fit the model, and \(k\) is the number of predictor terms in the model.
The denominator stays the same for all models (the response variable doesn't change). However, the numerator will change for each model we are comparing. Basically, if adding a predictor variable to the model does not result in enough of a decrease in RSS, the adjusted R squared value will go down. In other words, it applies a penalty if we try to add a predictor into the model that is not useful. This means that, unlike the R squared value, we CAN use adjusted R squared to compare MLR models.
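To see how the formula works, here is a sketch of computing adjusted R squared by hand for m_bty_gen (which has \(k = 2\) predictor terms) and comparing it to what R reports; it assumes the objects defined above.
n   <- nrow(evals)                                # number of cases
k   <- 2                                          # predictor terms in m_bty_gen: bty_avg and gendermale
rss <- sum(residuals(m_bty_gen)^2)                # residual sum of squares
sst <- sum((evals$score - mean(evals$score))^2)   # total sum of squares
1 - (rss / (n - k - 1)) / (sst / (n - 1))         # adjusted R squared from the formula
summary(m_bty_gen)$adj.r.squared                  # should match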
The adjusted R squared value can also be found in the summary output from your linear models.
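For example, assuming all three models have been fit as above, you can pull out and compare the adjusted R squared values directly:
summary(m_bty_gen)$adj.r.squared
summary(m_bty_rank)$adj.r.squared
summary(m_bty_both)$adj.r.squared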
At this point, we have chosen one model, and we will work with that model for the rest of the lab. Now that we have chosen it, it is time to evaluate it!
One thing we have not done so far with regression lines is to actually check the conditions for inference. We know that for simple linear regression models, we often use residual plots and QQ plots to help us check some of our conditions. The same thing is true for MLR models. The QQ plot does not change at all, but residual plots have to change a little.
Residual plots for MLR: When we make residual plots for MLR models, we have to do it a little differently than we did for SLR. We need to plot fitted values on the x-axis rather than the values of our predictor. Why? Because we have a lot of predictors, and we only want one residual plot. For an MLR residual plot, we place the predicted values \(\hat{y}\) on the x-axis. In anticipation of us wanting to do that, R automatically computes the predicted values for all of the data points we have. To obtain those values, we use the code m1$fitted.values, where m1 is the name of our chosen model.
R also computes the residuals for us; we can use the code m1$residuals to obtain those points. Instead of using this code, how could we use R to compute the residuals from the definition of a residual? Hint: We know the code for the predicted (fitted) values and we know the code to pull the actual values of Y. Use this and the definition of a residual to come up with your answer. Now, let's make a residual plot using the following command:
plot(m1$residuals ~ m1$fitted.values, xlab = "Fitted Values", ylab = "Residuals")
abline(h = 0, lty = 3) # adds a horizontal dashed line at y = 0
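Since the QQ plot carries over unchanged from SLR, a possible companion check of the normality condition (again assuming the chosen model is named m1) is:
qqnorm(m1$residuals)  # QQ plot of the residuals
qqline(m1$residuals)  # reference line for comparison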
This lab has given us a first glimpse of model selection and model comparison, a topic we will continue to explore in the next few classes. We now have the power to use more than one predictor in a model. How do we decide which predictors to use, and which to throw out? How do we decide what predictors might be related, or correlated to one another, and how do we handle it? Once we've chosen a model, how do we defend our choice? For all this and more, stay tuned...