1

Is this an observational study or an experiment? The original research question posed in the paper is whether beauty leads directly to the differences in course evaluations. Given the study design, is it possible to answer this question as it is phrased? If not, rephrase the question.

This one is hard to tell. I would assume it is an observational study since it’s analyzing surveys that are taken. If there was a survey without the professor’s picture as the control and a survey with a picture, it could be described as an experimental study.

2

Describe the distribution of score. Is the distribution skewed? What does that tell you about how students rate courses? Is this what you expected to see? Why, or why not?

The distribution is skewed left. The average score is somewhere between 4.0 - 4.5 which means students rarely score a professor with a low number.

hist(evals$score)

3

Excluding score, select two other variables and describe their relationship using an appropriate visualization (scatterplot, side-by-side boxplots, or mosaic plot).

The two variables I used were ‘gender’ and ‘cls_perc_eval’. The boxplot shows that an average higher percentage of students completed the professor evaluations if the professor was female compared to if the professor was male.

ggplot(evals, aes(x = gender, y = cls_perc_eval)) +
      geom_boxplot()

4

Replot the scatterplot, but this time use the function jitter() on the yy- or the xx-coordinate. (Use ?jitter to learn more.) What was misleading about the initial scatterplot?

The initial scatterplot had less points that the number of observations in the data frame. This was because many of the scores vs bty_avg were the same so the points were layered on top of each other making it hard to find an accurate correlation.

plot(jitter(evals$score) ~ evals$bty_avg)

5

Let’s see if the apparent trend in the plot is something more than natural variation. Fit a linear model called m_bty to predict average professor score by average beauty rating and add the line to your plot using abline(m_bty). Write out the equation for the linear model and interpret the slope. Is average beauty score a statistically significant predictor? Does it appear to be a practically significant predictor?

\(score = 3.88 + 0.06(avg\_bty)\)

The slope means for each increase in average beauty by 0.06 the score increases by 1.

The R-Squared value of 0.03 shows that it is not statistically significant and the slope is so small that it is not a practical predictor either.

m_bty <- lm(evals$score ~ evals$bty_avg)
plot(jitter(evals$score) ~ evals$bty_avg)
abline(m_bty)

summary(m_bty)
## 
## Call:
## lm(formula = evals$score ~ evals$bty_avg)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9246 -0.3690  0.1420  0.3977  0.9309 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    3.88034    0.07614   50.96  < 2e-16 ***
## evals$bty_avg  0.06664    0.01629    4.09 5.08e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5348 on 461 degrees of freedom
## Multiple R-squared:  0.03502,    Adjusted R-squared:  0.03293 
## F-statistic: 16.73 on 1 and 461 DF,  p-value: 5.083e-05

6

Use residual plots to evaluate whether the conditions of least squares regression are reasonable. Provide plots and comments for each one

The residual scatterplot shots that the variability is favored to the negative values. There are more values with greater distance from the zero line to a negative value than positive ones. There isn’t a non-linear pattern that’s obvious but there is constant variability.

plot(m_bty$residuals ~ evals$bty_avg)
abline(h = 0, lty = 3)

The residual histogram shows that the residuals are skewed to the left. We are looking for the residuals to have a normal distribution around zero.

hist(m_bty$residuals)

The Q-Q plot shows the residuals tailing off means that it is not following the linear model. Since the residuals are not linear, the distribution isn’t normal and the variability isn’t constant, the least-squares line does not fit the data.

qqnorm(m_bty$residuals)
qqline(m_bty$residuals)

m_bty_gen <- lm(score ~ bty_avg + gender, data = evals)
summary(m_bty_gen)
## 
## Call:
## lm(formula = score ~ bty_avg + gender, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8305 -0.3625  0.1055  0.4213  0.9314 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.74734    0.08466  44.266  < 2e-16 ***
## bty_avg      0.07416    0.01625   4.563 6.48e-06 ***
## gendermale   0.17239    0.05022   3.433 0.000652 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5287 on 460 degrees of freedom
## Multiple R-squared:  0.05912,    Adjusted R-squared:  0.05503 
## F-statistic: 14.45 on 2 and 460 DF,  p-value: 8.177e-07

7

P-values and parameter estimates should only be trusted if the conditions for the regression are reasonable. Verify that the conditions for this model are reasonable using diagnostic plots.

Scatterplot looks linear but the constant variability could be better.

plot(m_bty_gen$residuals ~ evals$bty_avg)
abline(h = 0, lty = 3)

Histogram shows a distribution that is skewed left. The distribution is not normal.

hist(m_bty_gen$residuals)

The Q-Q plot shows that the residuals fall away from the line meaning that even with the ‘gender’ variable the least squares line is still not a good fit for the data.

qqnorm(m_bty_gen$residuals)
qqline(m_bty_gen$residuals)

8

Is bty_avg still a significant predictor of score? Has the addition of gender to the model changed the parameter estimate for bty_avg?

The addition of the gender value gives a slight increase to bty_avg as being a significant predictor value but the change is negligible. The residuals do not meet the conditions for the least squares line meaning the R values and p-values can’t be trusted.

9

What is the equation of the line corresponding to males? (Hint: For males, the parameter estimate is multiplied by 1.) For two professors who received the same beauty rating, which gender tends to have the higher course evaluation score?

The slope indicates that, on average, males will have a .0172 higher score than females with the same beauty rank.

\(score = 3.74734 + 0.074(bty\_avg) + 0.172(gendermale) + 0(genderfemale)\)

summary(m_bty_gen)
## 
## Call:
## lm(formula = score ~ bty_avg + gender, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8305 -0.3625  0.1055  0.4213  0.9314 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.74734    0.08466  44.266  < 2e-16 ***
## bty_avg      0.07416    0.01625   4.563 6.48e-06 ***
## gendermale   0.17239    0.05022   3.433 0.000652 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5287 on 460 degrees of freedom
## Multiple R-squared:  0.05912,    Adjusted R-squared:  0.05503 
## F-statistic: 14.45 on 2 and 460 DF,  p-value: 8.177e-07

10

Create a new model called m_bty_rank with gender removed and rank added in. How does R appear to handle categorical variables that have more than two levels? Note that the rank variable has three levels: teaching, tenure track, tenured.

R picks one level of the variable to be the baseline and compares the others against it. For example, since ‘teaching’ is missing that means it is the baseline and and professor that is tenured will have, on average, a -0.126 lower score.

m_bty_gen <- lm(score ~ bty_avg + rank, data = evals)
summary(m_bty_gen)
## 
## Call:
## lm(formula = score ~ bty_avg + rank, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8713 -0.3642  0.1489  0.4103  0.9525 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       3.98155    0.09078  43.860  < 2e-16 ***
## bty_avg           0.06783    0.01655   4.098 4.92e-05 ***
## ranktenure track -0.16070    0.07395  -2.173   0.0303 *  
## ranktenured      -0.12623    0.06266  -2.014   0.0445 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5328 on 459 degrees of freedom
## Multiple R-squared:  0.04652,    Adjusted R-squared:  0.04029 
## F-statistic: 7.465 on 3 and 459 DF,  p-value: 6.88e-05

11

Which variable would you expect to have the highest p-value in this model? Why? Hint: Think about which variable would you expect to not have any association with the professor score.

Since we’re dealing with beauty, I would assume the ‘language’ variable would be the worst predictor.

12

Check your suspicions from the previous exercise. Include the model output in your response.

The variable with the highest p-value is cls_profs.

m_full <- lm(score ~ rank + ethnicity + gender + language + age + cls_perc_eval 
             + cls_students + cls_level + cls_profs + cls_credits + bty_avg 
             + pic_outfit + pic_color, data = evals)
summary(m_full)
## 
## Call:
## lm(formula = score ~ rank + ethnicity + gender + language + age + 
##     cls_perc_eval + cls_students + cls_level + cls_profs + cls_credits + 
##     bty_avg + pic_outfit + pic_color, data = evals)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.77397 -0.32432  0.09067  0.35183  0.95036 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            4.0952141  0.2905277  14.096  < 2e-16 ***
## ranktenure track      -0.1475932  0.0820671  -1.798  0.07278 .  
## ranktenured           -0.0973378  0.0663296  -1.467  0.14295    
## ethnicitynot minority  0.1234929  0.0786273   1.571  0.11698    
## gendermale             0.2109481  0.0518230   4.071 5.54e-05 ***
## languagenon-english   -0.2298112  0.1113754  -2.063  0.03965 *  
## age                   -0.0090072  0.0031359  -2.872  0.00427 ** 
## cls_perc_eval          0.0053272  0.0015393   3.461  0.00059 ***
## cls_students           0.0004546  0.0003774   1.205  0.22896    
## cls_levelupper         0.0605140  0.0575617   1.051  0.29369    
## cls_profssingle       -0.0146619  0.0519885  -0.282  0.77806    
## cls_creditsone credit  0.5020432  0.1159388   4.330 1.84e-05 ***
## bty_avg                0.0400333  0.0175064   2.287  0.02267 *  
## pic_outfitnot formal  -0.1126817  0.0738800  -1.525  0.12792    
## pic_colorcolor        -0.2172630  0.0715021  -3.039  0.00252 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.498 on 448 degrees of freedom
## Multiple R-squared:  0.1871, Adjusted R-squared:  0.1617 
## F-statistic: 7.366 on 14 and 448 DF,  p-value: 6.552e-14

13

Interpret the coefficient associated with the ethnicity variable.

Ethnicity is a binary variable so R gives “not minority” a value of 1 and “minority” a value of 0. A prof that is not a minority has, on average, a score 0.12 higher that a prof who is a minority.

14

Drop the variable with the highest p-value and re-fit the model. Did the coefficients and significance of the other explanatory variables change? (One of the things that makes multiple regression interesting is that coefficient estimates depend on the other variables that are included in the model.) If not, what does this say about whether or not the dropped variable was collinear with the other explanatory variables?

The cls_profs variable was dropped but there wasn’t any significant change to the other variables. This means that is wasn’t collinear with the other variables.

m_full <- lm(score ~ rank + ethnicity + gender + language + age + cls_perc_eval 
             + cls_students + cls_level + cls_credits + bty_avg 
             + pic_outfit + pic_color, data = evals)
summary(m_full)
## 
## Call:
## lm(formula = score ~ rank + ethnicity + gender + language + age + 
##     cls_perc_eval + cls_students + cls_level + cls_credits + 
##     bty_avg + pic_outfit + pic_color, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7836 -0.3257  0.0859  0.3513  0.9551 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            4.0872523  0.2888562  14.150  < 2e-16 ***
## ranktenure track      -0.1476746  0.0819824  -1.801 0.072327 .  
## ranktenured           -0.0973829  0.0662614  -1.470 0.142349    
## ethnicitynot minority  0.1274458  0.0772887   1.649 0.099856 .  
## gendermale             0.2101231  0.0516873   4.065 5.66e-05 ***
## languagenon-english   -0.2282894  0.1111305  -2.054 0.040530 *  
## age                   -0.0089992  0.0031326  -2.873 0.004262 ** 
## cls_perc_eval          0.0052888  0.0015317   3.453 0.000607 ***
## cls_students           0.0004687  0.0003737   1.254 0.210384    
## cls_levelupper         0.0606374  0.0575010   1.055 0.292200    
## cls_creditsone credit  0.5061196  0.1149163   4.404 1.33e-05 ***
## bty_avg                0.0398629  0.0174780   2.281 0.023032 *  
## pic_outfitnot formal  -0.1083227  0.0721711  -1.501 0.134080    
## pic_colorcolor        -0.2190527  0.0711469  -3.079 0.002205 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4974 on 449 degrees of freedom
## Multiple R-squared:  0.187,  Adjusted R-squared:  0.1634 
## F-statistic: 7.943 on 13 and 449 DF,  p-value: 2.336e-14

15

Using backward-selection and p-value as the selection criterion, determine the best model. You do not need to show all steps in your answer, just the output for the final model. Also, write out the linear model for predicting score based on the final model you settle on

The current model seems to be the best model. The R value gradually declines with each variable that is removed.

\(score = 4.09 - .148(ranktenure track) - 0.1(ranktenured) + 0.127(ethnicitynot minority) + 0.21(gendermale) - 0.23(languagenon-english) - 0.009(age) + 0.00523(cls\_perc\_eval) + 0.005(cls\_students) + 0.06(cls\_levelupper) + 0.506(cls\_creditsone credit) + 0.04(bty\_avg) - 0.108(pic_outfitnot formal) - 0.219(pic\_colorcolor)\)

m_full <- lm(score ~ rank + ethnicity + gender + language + age + cls_perc_eval 
             + cls_students + cls_level + cls_credits + bty_avg 
             + pic_outfit + pic_color, data = evals)

summary(m_full)
## 
## Call:
## lm(formula = score ~ rank + ethnicity + gender + language + age + 
##     cls_perc_eval + cls_students + cls_level + cls_credits + 
##     bty_avg + pic_outfit + pic_color, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7836 -0.3257  0.0859  0.3513  0.9551 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            4.0872523  0.2888562  14.150  < 2e-16 ***
## ranktenure track      -0.1476746  0.0819824  -1.801 0.072327 .  
## ranktenured           -0.0973829  0.0662614  -1.470 0.142349    
## ethnicitynot minority  0.1274458  0.0772887   1.649 0.099856 .  
## gendermale             0.2101231  0.0516873   4.065 5.66e-05 ***
## languagenon-english   -0.2282894  0.1111305  -2.054 0.040530 *  
## age                   -0.0089992  0.0031326  -2.873 0.004262 ** 
## cls_perc_eval          0.0052888  0.0015317   3.453 0.000607 ***
## cls_students           0.0004687  0.0003737   1.254 0.210384    
## cls_levelupper         0.0606374  0.0575010   1.055 0.292200    
## cls_creditsone credit  0.5061196  0.1149163   4.404 1.33e-05 ***
## bty_avg                0.0398629  0.0174780   2.281 0.023032 *  
## pic_outfitnot formal  -0.1083227  0.0721711  -1.501 0.134080    
## pic_colorcolor        -0.2190527  0.0711469  -3.079 0.002205 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4974 on 449 degrees of freedom
## Multiple R-squared:  0.187,  Adjusted R-squared:  0.1634 
## F-statistic: 7.943 on 13 and 449 DF,  p-value: 2.336e-14

16

Verify that the conditions for this model are reasonable using diagnostic plots.

Residual scatterplot shows linearity, mostly constant variation and nearly normal residuals

plot(m_full$residuals ~ evals$bty_avg)
abline(h = 0, lty = 3)

Histogram shows a slight left skew but the distribution looks nearly normal around zero.

hist(m_full$residuals)

Q-Q plot shows a slight skew. The least squares line does not meet the conditions even though it’s closer than the previous models. Even if it did fit, the model does not fit the data very well due to the low R square value.

qqnorm(m_full$residuals)
qqline(m_full$residuals)

17

The original paper describes how these data were gathered by taking a sample of professors from the University of Texas at Austin and including all courses that they have taught. Considering that each row represents a course, could this new information have an impact on any of the conditions of linear regression?

It’s unlikely. It hard to imagine how the type of course would help the beauty rank predict a higher score. If the course was more interesting where the students focused on the teacher more, it might could give a professor a higher beauty ranking that could possibly lead to a higher score.

18

Based on your final model, describe the characteristics of a professor and course at University of Texas at Austin that would be associated with a high evaluation score.

Though the model is not a good predictor the characteristics I would describe would be a non-minority male, speaks the english language, that teaches one credit, that wears formal clothes in a black and white picture.

19

Would you be comfortable generalizing your conclusions to apply to professors generally (at any university)? Why or why not?

I would not be comfortable the R squared value of the linear model is small meaning the model does not describe the data very well.