Data 606 Lab 8

load("more/evals.RData")

This is an observational study. An alternative question could be “Is there a statistical significant relationship between a professor’s beauty and a teaching evaluation score of that professor?”

plot(evals$score)

boxplot(evals$score)

‘Score’ is skewed. It’s not too surprising that the scores tend to be high. People tend to want to be kind when evaluating others. While some universities put a lot of emphasis on research and may not choose their professors for their pedagogy, professors are expected to know their subject matter well and be competent.

Looking at side-by-side boxplots for beauty scores of professors of lower level courses, it appears the female professor is seen as more beautiful.

There appears to be a relationship between beauty and rating, but it appears to have a low correlation.

plot(evals$score ~ jitter(evals$bty_avg),xlab="beauty",ylab="score")

After applying a jitter function, we can see places where the density of points is greater.

m_bty<- lm(evals$score ~ evals$bty_avg)
summary(m_bty)
## 
## Call:
## lm(formula = evals$score ~ evals$bty_avg)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9246 -0.3690  0.1420  0.3977  0.9309 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    3.88034    0.07614   50.96  < 2e-16 ***
## evals$bty_avg  0.06664    0.01629    4.09 5.08e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5348 on 461 degrees of freedom
## Multiple R-squared:  0.03502,    Adjusted R-squared:  0.03293 
## F-statistic: 16.73 on 1 and 461 DF,  p-value: 5.083e-05
plot(evals$score ~ jitter(evals$bty_avg),xlab="beauty",ylab="score")
abline(m_bty)

The model for scores~beauty is \(\tiny score=3.88034 + .0664(beauty\ score)\). It is statistically significant, but is not practical for predicting a score. That really isn’t the point of this model. The researchers were wondering if a relationship exists.

plot(residuals(m_bty))

boxplot(residuals(m_bty))

Our residuals are variance stable, but not normal. Since the mean is close to one end, it’s not possible to be normal. Our conditions for regression are not quite met. The residuals appear to be fairly random. With 463 observations, the fact that the data is skewed does not necessarily prevent a t-test from being accepted. There does not appear to be any autocorrelation. We should use more tests to see if the residuals are random enough to accept our regression results.

m_bty_gen <- lm(score ~ bty_avg + gender, data = evals)
summary(m_bty_gen)
## 
## Call:
## lm(formula = score ~ bty_avg + gender, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8305 -0.3625  0.1055  0.4213  0.9314 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.74734    0.08466  44.266  < 2e-16 ***
## bty_avg      0.07416    0.01625   4.563 6.48e-06 ***
## gendermale   0.17239    0.05022   3.433 0.000652 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5287 on 460 degrees of freedom
## Multiple R-squared:  0.05912,    Adjusted R-squared:  0.05503 
## F-statistic: 14.45 on 2 and 460 DF,  p-value: 8.177e-07
plot.new()
plot(m_bty_gen$residuals)

hist(m_bty_gen$residuals)

qqnorm(m_bty_gen$residuals)
qqline(m_bty_gen$residuals)

Our residuals from our new model are similar to the previous model. They appear to be random. They don’t appear to be autocorrelated. They appear to be variance - stable. However, they are not normal.

Average beauty score is still significant at a .01 level. The R2 for the new model is up slightly. The f-score has fallen, but is still significant.

multiLines(m_bty_gen)

The model for the score for a male professor is score=3.74734 +.07416(beauty score) +.17239. This is a higher score than for a female professor.

m_bty_rank <- lm(score ~ bty_avg + rank, data = evals)
summary(m_bty_rank)
## 
## Call:
## lm(formula = score ~ bty_avg + rank, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8713 -0.3642  0.1489  0.4103  0.9525 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       3.98155    0.09078  43.860  < 2e-16 ***
## bty_avg           0.06783    0.01655   4.098 4.92e-05 ***
## ranktenure track -0.16070    0.07395  -2.173   0.0303 *  
## ranktenured      -0.12623    0.06266  -2.014   0.0445 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5328 on 459 degrees of freedom
## Multiple R-squared:  0.04652,    Adjusted R-squared:  0.04029 
## F-statistic: 7.465 on 3 and 459 DF,  p-value: 6.88e-05

R turns a categorical variable with 3 levels to 2 variables. Categorical variables are always handled this way with dummy variables. Having a variable for each option would introduce unnecessary multicolinearity and introduce an unneccesary variable. The interpretation is that the one not listed would be the default assumption..

I would expect that the proportion of students filling out a form is the least likely variable to have any significance.

m_full <- lm(score ~ rank + ethnicity + gender + language + age + cls_perc_eval 
             + cls_students + cls_level + cls_profs + cls_credits + bty_avg 
             + pic_outfit + pic_color, data = evals)
summary(m_full)
## 
## Call:
## lm(formula = score ~ rank + ethnicity + gender + language + age + 
##     cls_perc_eval + cls_students + cls_level + cls_profs + cls_credits + 
##     bty_avg + pic_outfit + pic_color, data = evals)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.77397 -0.32432  0.09067  0.35183  0.95036 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            4.0952141  0.2905277  14.096  < 2e-16 ***
## ranktenure track      -0.1475932  0.0820671  -1.798  0.07278 .  
## ranktenured           -0.0973378  0.0663296  -1.467  0.14295    
## ethnicitynot minority  0.1234929  0.0786273   1.571  0.11698    
## gendermale             0.2109481  0.0518230   4.071 5.54e-05 ***
## languagenon-english   -0.2298112  0.1113754  -2.063  0.03965 *  
## age                   -0.0090072  0.0031359  -2.872  0.00427 ** 
## cls_perc_eval          0.0053272  0.0015393   3.461  0.00059 ***
## cls_students           0.0004546  0.0003774   1.205  0.22896    
## cls_levelupper         0.0605140  0.0575617   1.051  0.29369    
## cls_profssingle       -0.0146619  0.0519885  -0.282  0.77806    
## cls_creditsone credit  0.5020432  0.1159388   4.330 1.84e-05 ***
## bty_avg                0.0400333  0.0175064   2.287  0.02267 *  
## pic_outfitnot formal  -0.1126817  0.0738800  -1.525  0.12792    
## pic_colorcolor        -0.2172630  0.0715021  -3.039  0.00252 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.498 on 448 degrees of freedom
## Multiple R-squared:  0.1871, Adjusted R-squared:  0.1617 
## F-statistic: 7.366 on 14 and 448 DF,  p-value: 6.552e-14

My suspicions were incorrect. Apparently, students that are in a class that turn in more evaluations rate their professors higher.

The ethnicity variable is not statistically significant. It adds .1235 to the model, but is not significant.

m_minus_number <- lm(score ~ rank + ethnicity + gender + language + age + cls_perc_eval 
             + cls_students + cls_level + cls_credits + bty_avg 
             + pic_outfit + pic_color, data = evals)
summary(m_minus_number)
## 
## Call:
## lm(formula = score ~ rank + ethnicity + gender + language + age + 
##     cls_perc_eval + cls_students + cls_level + cls_credits + 
##     bty_avg + pic_outfit + pic_color, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7836 -0.3257  0.0859  0.3513  0.9551 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            4.0872523  0.2888562  14.150  < 2e-16 ***
## ranktenure track      -0.1476746  0.0819824  -1.801 0.072327 .  
## ranktenured           -0.0973829  0.0662614  -1.470 0.142349    
## ethnicitynot minority  0.1274458  0.0772887   1.649 0.099856 .  
## gendermale             0.2101231  0.0516873   4.065 5.66e-05 ***
## languagenon-english   -0.2282894  0.1111305  -2.054 0.040530 *  
## age                   -0.0089992  0.0031326  -2.873 0.004262 ** 
## cls_perc_eval          0.0052888  0.0015317   3.453 0.000607 ***
## cls_students           0.0004687  0.0003737   1.254 0.210384    
## cls_levelupper         0.0606374  0.0575010   1.055 0.292200    
## cls_creditsone credit  0.5061196  0.1149163   4.404 1.33e-05 ***
## bty_avg                0.0398629  0.0174780   2.281 0.023032 *  
## pic_outfitnot formal  -0.1083227  0.0721711  -1.501 0.134080    
## pic_colorcolor        -0.2190527  0.0711469  -3.079 0.002205 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4974 on 449 degrees of freedom
## Multiple R-squared:  0.187,  Adjusted R-squared:  0.1634 
## F-statistic: 7.943 on 13 and 449 DF,  p-value: 2.336e-14

The variable that we dropped, number of professors in the course, was collinear with some of the variables. Many of the t-values changed, but only by a small amount.

m_backward_sel <- lm(score ~ethnicity + gender  + age + cls_perc_eval 
              + cls_credits + bty_avg 
              + pic_color, data = evals)
summary(m_backward_sel)
## 
## Call:
## lm(formula = score ~ ethnicity + gender + age + cls_perc_eval + 
##     cls_credits + bty_avg + pic_color, data = evals)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.85434 -0.33568  0.09247  0.38288  0.93903 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            3.690771   0.229181  16.104  < 2e-16 ***
## ethnicitynot minority  0.216955   0.071348   3.041  0.00250 ** 
## gendermale             0.201574   0.050220   4.014 6.99e-05 ***
## age                   -0.006034   0.002621  -2.302  0.02176 *  
## cls_perc_eval          0.004719   0.001439   3.278  0.00113 ** 
## cls_creditsone credit  0.527806   0.103839   5.083 5.44e-07 ***
## bty_avg                0.052431   0.016975   3.089  0.00213 ** 
## pic_colorcolor        -0.170149   0.066780  -2.548  0.01116 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5008 on 455 degrees of freedom
## Multiple R-squared:  0.1649, Adjusted R-squared:  0.1521 
## F-statistic: 12.84 on 7 and 455 DF,  p-value: 4.344e-15

Our final model is score= 3.69 + .217 (not minority?) + .202 (male?) -.006 (age) + .0047 (percent_eval) + .528(credits) + .052(beauty)- .1701(pic_color) When we eliminate more variables, the F-statistic and R2 are not as good. All of the remaining factors are statistically significant at a .05 level.

plot.new()
plot(m_backward_sel$residuals)

hist(m_backward_sel$residuals)

qqnorm(m_backward_sel$residuals)
qqline(m_backward_sel$residuals)

Our residuals look similar to those in other models from the same dataset. They are not normal, but look otherwise random and variance-stable.

The original data lost some of its independence when it included multiple classes for the same professor. With random selection, one would expect some to be taught by the same professor. The amount it affects the data would depend how much duplication was present.

Based on our final model, a young, white, handsome male teaching a one-credit class would receive a high recommendation.

Our model has intriguing suggestions. It is consonant with beauty messages seen in other studies and in our media. It should give schools pause when enacting rating systems if they are tied to real consequences. However, it is not a statistically sound inference to assume it proves a relationship would be present at other universities.