library(tidyverse)
library(openintro)
load("evals.RData")This is an observational study, because there is no controlling of variables, etc. To rephrase the question they posed:
“Is there a relationship between beauty and course evaluations at UT Austin?”
The data from evals$score is highly skewed right. This tells me that students tend to rate their courses clustering along the high side of 1-5, and because there is no higher value, the data ends up skewed right.
hist(evals$score)The distributions of beauty average for professors of non-minority ethnicities has a similar sample mean to those of minority ethnicities. There does not seem to be any obvious relationship between whether a professor being an ethnic minority impacts their rated beauty average.
plot(evals$bty_avg ~ evals$ethnicity)Jitter adds noise to variables in order to break ties. I’m not absolutely sure, but in the original graph the data seemed fairly uniformly distributed, but after using the jitter function, there appears to be a slight upward trend? This might not be what OpenIntro wanted me to notice.
plot(evals$score ~ evals$bty_avg)plot(jitter(evals$score) ~ evals$bty_avg)equation of the linear model:
score_hat = 0.067*bty_avg + 3.880
Average beauty score is a statistically significant predictor of scores as given by the p-value 5.03e-05, but it isn’t a practically useful predictor on its own because the R^2 value is 0.035, meaning 96.5% of the variability in scores is not explained by this model (its a pretty bad model).
m_bty <- lm(score ~ bty_avg, data = evals)
plot(evals$score ~ evals$bty_avg)
abline(m_bty)summary(m_bty)##
## Call:
## lm(formula = score ~ bty_avg, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9246 -0.3690 0.1420 0.3977 0.9309
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.88034 0.07614 50.96 < 2e-16 ***
## bty_avg 0.06664 0.01629 4.09 5.08e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5348 on 461 degrees of freedom
## Multiple R-squared: 0.03502, Adjusted R-squared: 0.03293
## F-statistic: 16.73 on 1 and 461 DF, p-value: 5.083e-05
The residuals vs fitted plot seems to have a horizontal line roughly near 0, so I can accept this diagnostic plot as showing no problems. The Q-Q plot of the residuals seems to follow the QQline fairly well until the upper end, so I decided to do a shapiro test which returned a significant p-value, meaning at this point I would stop with the model, but this lab is all about doing it so I won’t. The scale-location graph appears to have a roughly horizontal line, so I can accept that the residuals are roughly evenly spread. The residuals vs. leverage plot shows a couple observations that are very influential, so it might be worthwhile to see if there was something special about these observations and if it would make more sense to remove them from the modeling.
plot(m_bty)shapiro.test(m_bty$residuals)##
## Shapiro-Wilk normality test
##
## data: m_bty$residuals
## W = 0.95491, p-value = 1.089e-10
The residuals are not normally distributed as checked using a shapiro.test, so at this point, again, I would stop, but I won’t because its the whole point of the lab.
plot(evals$bty_avg ~ evals$bty_f1lower)cor(evals$bty_avg, evals$bty_f1lower)## [1] 0.8439112
plot(evals[,13:19])m_bty_gen <- lm(score ~ bty_avg + gender, data = evals)
summary(m_bty_gen)##
## Call:
## lm(formula = score ~ bty_avg + gender, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8305 -0.3625 0.1055 0.4213 0.9314
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.74734 0.08466 44.266 < 2e-16 ***
## bty_avg 0.07416 0.01625 4.563 6.48e-06 ***
## gendermale 0.17239 0.05022 3.433 0.000652 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5287 on 460 degrees of freedom
## Multiple R-squared: 0.05912, Adjusted R-squared: 0.05503
## F-statistic: 14.45 on 2 and 460 DF, p-value: 8.177e-07
plot(m_bty_gen)shapiro.test(m_bty_gen$residuals)##
## Shapiro-Wilk normality test
##
## data: m_bty_gen$residuals
## W = 0.95803, p-value = 3.285e-10
bty_avg is still a significant predictor of score, and the addition of gender has increased the parameter “Estimate” of bty_avg.
Ok now this graph is so pretty. However what it seems to say is sadly not so pretty because it says that male professors tend to be scored higher than female professors.
The equation of the line corresponding to males is given by:
score_hat = 0.074*bty_avg + (3.747+0.172)
multiLines(m_bty_gen)For a categorical variable with more than two levels R seems to split it up into one reference frame embedded within the intercept and levels-1 dummy variables. This is actually super convenient I love it.
m_bty_gen <- lm(score ~ bty_avg + rank, data = evals)
summary(m_bty_gen)##
## Call:
## lm(formula = score ~ bty_avg + rank, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8713 -0.3642 0.1489 0.4103 0.9525
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.98155 0.09078 43.860 < 2e-16 ***
## bty_avg 0.06783 0.01655 4.098 4.92e-05 ***
## ranktenure track -0.16070 0.07395 -2.173 0.0303 *
## ranktenured -0.12623 0.06266 -2.014 0.0445 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5328 on 459 degrees of freedom
## Multiple R-squared: 0.04652, Adjusted R-squared: 0.04029
## F-statistic: 7.465 on 3 and 459 DF, p-value: 6.88e-05
I would expect rank to have the least association with professor score and thereby the highest p-value. I just don’t think that students would know or care if their professor was tenured or tenure-tracked.
m_full <- lm(score ~ rank + ethnicity + gender + language + age + cls_perc_eval
+ cls_students + cls_level + cls_profs + cls_credits + bty_avg
+ pic_outfit + pic_color, data = evals)
summary(m_full)##
## Call:
## lm(formula = score ~ rank + ethnicity + gender + language + age +
## cls_perc_eval + cls_students + cls_level + cls_profs + cls_credits +
## bty_avg + pic_outfit + pic_color, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.77397 -0.32432 0.09067 0.35183 0.95036
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.0952141 0.2905277 14.096 < 2e-16 ***
## ranktenure track -0.1475932 0.0820671 -1.798 0.07278 .
## ranktenured -0.0973378 0.0663296 -1.467 0.14295
## ethnicitynot minority 0.1234929 0.0786273 1.571 0.11698
## gendermale 0.2109481 0.0518230 4.071 5.54e-05 ***
## languagenon-english -0.2298112 0.1113754 -2.063 0.03965 *
## age -0.0090072 0.0031359 -2.872 0.00427 **
## cls_perc_eval 0.0053272 0.0015393 3.461 0.00059 ***
## cls_students 0.0004546 0.0003774 1.205 0.22896
## cls_levelupper 0.0605140 0.0575617 1.051 0.29369
## cls_profssingle -0.0146619 0.0519885 -0.282 0.77806
## cls_creditsone credit 0.5020432 0.1159388 4.330 1.84e-05 ***
## bty_avg 0.0400333 0.0175064 2.287 0.02267 *
## pic_outfitnot formal -0.1126817 0.0738800 -1.525 0.12792
## pic_colorcolor -0.2172630 0.0715021 -3.039 0.00252 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.498 on 448 degrees of freedom
## Multiple R-squared: 0.1871, Adjusted R-squared: 0.1617
## F-statistic: 7.366 on 14 and 448 DF, p-value: 6.552e-14
formula is given b:
rank_hat = 4.095 - 0.148xranktenure_track - 0.097xranktenured etc. etc.
My suspicions were wrong, because apparently cls_profssingle is the most useless variable in the model because it has a p-value of 0.778.
The coefficient of the ethnicity variable can be interpreted as follows:
All else held constant, if a professor belongs to a non-minority ethnic group, their score tends to increase by 0.123.
The values of the parameters in “Estimate” barely changed by removing the cls_profs variable, so it was likely collinear with the other variables
m_full <- lm(score ~ rank + ethnicity + gender + language + age + cls_perc_eval
+ cls_students + cls_level + cls_credits + bty_avg
+ pic_outfit + pic_color, data = evals)
summary(m_full)##
## Call:
## lm(formula = score ~ rank + ethnicity + gender + language + age +
## cls_perc_eval + cls_students + cls_level + cls_credits +
## bty_avg + pic_outfit + pic_color, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7836 -0.3257 0.0859 0.3513 0.9551
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.0872523 0.2888562 14.150 < 2e-16 ***
## ranktenure track -0.1476746 0.0819824 -1.801 0.072327 .
## ranktenured -0.0973829 0.0662614 -1.470 0.142349
## ethnicitynot minority 0.1274458 0.0772887 1.649 0.099856 .
## gendermale 0.2101231 0.0516873 4.065 5.66e-05 ***
## languagenon-english -0.2282894 0.1111305 -2.054 0.040530 *
## age -0.0089992 0.0031326 -2.873 0.004262 **
## cls_perc_eval 0.0052888 0.0015317 3.453 0.000607 ***
## cls_students 0.0004687 0.0003737 1.254 0.210384
## cls_levelupper 0.0606374 0.0575010 1.055 0.292200
## cls_creditsone credit 0.5061196 0.1149163 4.404 1.33e-05 ***
## bty_avg 0.0398629 0.0174780 2.281 0.023032 *
## pic_outfitnot formal -0.1083227 0.0721711 -1.501 0.134080
## pic_colorcolor -0.2190527 0.0711469 -3.079 0.002205 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4974 on 449 degrees of freedom
## Multiple R-squared: 0.187, Adjusted R-squared: 0.1634
## F-statistic: 7.943 on 13 and 449 DF, p-value: 2.336e-14
score_hat = 3.772 + 0.207x(gender) + 0.168x(ethnicity) - 0.206x(language) - 00.6x(age) + 0.005x(cls_perc_eval) + 0.505x(cls_credits) + 0.051x(bty_avg) - 0.191x(pic_color)
best_model <- lm(score ~ gender + ethnicity + language + age + cls_perc_eval
+ cls_credits + bty_avg
+ pic_color, data = evals)
summary(best_model)##
## Call:
## lm(formula = score ~ gender + ethnicity + language + age + cls_perc_eval +
## cls_credits + bty_avg + pic_color, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.85320 -0.32394 0.09984 0.37930 0.93610
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.771922 0.232053 16.255 < 2e-16 ***
## gendermale 0.207112 0.050135 4.131 4.30e-05 ***
## ethnicitynot minority 0.167872 0.075275 2.230 0.02623 *
## languagenon-english -0.206178 0.103639 -1.989 0.04726 *
## age -0.006046 0.002612 -2.315 0.02108 *
## cls_perc_eval 0.004656 0.001435 3.244 0.00127 **
## cls_creditsone credit 0.505306 0.104119 4.853 1.67e-06 ***
## bty_avg 0.051069 0.016934 3.016 0.00271 **
## pic_colorcolor -0.190579 0.067351 -2.830 0.00487 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4992 on 454 degrees of freedom
## Multiple R-squared: 0.1722, Adjusted R-squared: 0.1576
## F-statistic: 11.8 on 8 and 454 DF, p-value: 2.58e-15
The residuals are not normally distributed, so once again the Q-Q diagnostic plot tells us that linear modeling is likely not the best choice for this situation.
plot(best_model)shapiro.test(best_model$residuals)##
## Shapiro-Wilk normality test
##
## data: best_model$residuals
## W = 0.96414, p-value = 3.332e-09
This method of sampling is a type of hierarchical sampling, which influences the true degrees of freedom by violating the principle of independence between observations. In multiple ways this could cause a huge distruption in the modeling.
Based on the final model, the characteristics of a professor and course associated with a high evaluation score are the following:
Male, non-minority ethnicity, English speaking, young, evaluated by many students, only one credit in the course, high beauty score, having a picture in black and white.
I would not be comfortable generalizing these conclusions to professors in general, because this was just an observational study, and the sampled population is not necessarily representative of the population at any university.