There is no control group, so this is an observational study. The question could be reworded as: Is beauty correlated with higher professor evaluations?
ggplot(evals) + geom_histogram(aes(x = score), fill = "darkgreen", color = "black")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
This distribution is heavily left skewed. This tells you that students give mostly perfect or near perfect scores. 3 does not mean an average course. This is inline with other ratings, such as restaraunt reviews. People do not think 4 means 1 standard deviation above the mean.
ggplot(evals) + geom_boxplot(aes(x = pic_outfit, y = bty_avg)) + labs(y= "Beauty Score", x = "Pic Type")
It looks as though the beauty score is higher on average with formal pictures, but the spread is much higher for not formal. If the scores were much higher for one group, bty_avg would be correlated with pic_outfit. There is probably not enough of difference to worry about pic_outfit being collinear with bty_avg.
plot(evals$score ~ evals$bty_avg)
plot(evals$score ~ jitter(evals$bty_avg))
The original plot had many points on top of each other because both dimensions are discrete.
m_bty <- lm(score ~ bty_avg, data = evals)
summary(m_bty)
##
## Call:
## lm(formula = score ~ bty_avg, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9246 -0.3690 0.1420 0.3977 0.9309
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.88034 0.07614 50.96 < 2e-16 ***
## bty_avg 0.06664 0.01629 4.09 5.08e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5348 on 461 degrees of freedom
## Multiple R-squared: 0.03502, Adjusted R-squared: 0.03293
## F-statistic: 16.73 on 1 and 461 DF, p-value: 5.083e-05
\(\hat { y } =3.88+.0667*bty\)
Beauty score is stastically signficant, with a t value over 4. However, R squared is only .03, so it doesn’t explain very much of the variation in score. Additionally, each point increase in beauty corresponds to only a .0667 increase in score. This means that if professor went from a 2 to an 8 in beaty avg, they would be only expected to increase .36 points in eval score. This isn’t irrelevant, but it’s not groundbreaking either.
df <- data.frame(m_bty$model, residuals = m_bty$residuals, predicted = m_bty$fitted.values, residuals_abs = abs(m_bty$residuals))
ggplot(df) + geom_point( aes(x = predicted, y = residuals)) + geom_hline(yintercept = 0)
qqnorm(m_bty$residuals)
qqline(m_bty$residuals)
ggplot(df) + geom_point( aes(x = bty_avg, y = residuals_abs))
ggplot(m_bty) + geom_histogram(aes( x = m_bty$residuals), fill = "darkblue", color = "black")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Based on the scatter from Excercise 4, it appears the linearity assumption holds or at least isn’t obviously violated. The points are scattered, but a non-linear trend isn’t obvious. The normality of residuals assumption does not hold. The distribution is skewed to the left. This can be seen in all three plots. The variance appears to be constant, with the spread about the same over all bty_avg scores and fitted values. This is not time series data, though the same professor provides multiple data points, so there is some dependance in observations.
m_bty_gen <- lm(score ~ bty_avg + gender, data = evals)
summary(m_bty_gen)
##
## Call:
## lm(formula = score ~ bty_avg + gender, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8305 -0.3625 0.1055 0.4213 0.9314
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.74734 0.08466 44.266 < 2e-16 ***
## bty_avg 0.07416 0.01625 4.563 6.48e-06 ***
## gendermale 0.17239 0.05022 3.433 0.000652 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5287 on 460 degrees of freedom
## Multiple R-squared: 0.05912, Adjusted R-squared: 0.05503
## F-statistic: 14.45 on 2 and 460 DF, p-value: 8.177e-07
df_gen <- data.frame(m_bty_gen$model, residuals = m_bty_gen$residuals, predicted = m_bty_gen$fitted.values, residuals_abs = abs(m_bty$residuals))
ggplot(df_gen) + geom_point( aes(x = bty_avg, y = residuals))
ggplot(df_gen) + geom_point( aes(x = predicted, y = residuals)) + geom_hline(yintercept = 0)
qqnorm(m_bty_gen$residuals)
qqline(m_bty_gen$residuals)
ggplot(df_gen) + geom_boxplot(aes(y = residuals, x = gender))
ggplot(df_gen) + geom_histogram( aes( x = residuals), fill = "darkblue", color = "black") + facet_wrap(~gender)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
It appears that the normality of residuals assumption is still violated when gender is added. This is more prounounced for males. There is strong left skew for males. The spread of residuals is at least about the same for males and females, indicating constant variance still holds. The p values for the parameter estimates should be not be entirely trusted because of the non-normal residuals. This violation is not strong enough to disregard the results.
The parameter estimate for bty_avg has changed by almost .01 from the one variable model. It is still signficant, with a p value near 0. The p value shouldn’t be entirely trusted given what we know about the model assumptions.
\(\hat { y } =3.747+.0746*bty+.17239*genmale\)
Since the parameter estimate is positive for genmale, this means that a male professor would have a higher expected score with all else equal.
m_bty_rank <- lm(score ~ bty_avg + rank, data = evals)
summary(m_bty_rank)
##
## Call:
## lm(formula = score ~ bty_avg + rank, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8713 -0.3642 0.1489 0.4103 0.9525
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.98155 0.09078 43.860 < 2e-16 ***
## bty_avg 0.06783 0.01655 4.098 4.92e-05 ***
## ranktenure track -0.16070 0.07395 -2.173 0.0303 *
## ranktenured -0.12623 0.06266 -2.014 0.0445 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5328 on 459 degrees of freedom
## Multiple R-squared: 0.04652, Adjusted R-squared: 0.04029
## F-statistic: 7.465 on 3 and 459 DF, p-value: 6.88e-05
With a 3 level categorical, it gives us two additional parameters. It appearns R creates n - 1 paraemters where n is the number of levels.
I would expect class credits not to have any correlation to professor score. It is in no way reflective of the professor. It should therefore have the lowest p value.
m_full <- lm(score ~ rank + ethnicity + gender + language + age + cls_perc_eval
+ cls_students + cls_level + cls_profs + cls_credits + bty_avg
+ pic_outfit + pic_color, data = evals)
summary(m_full)
##
## Call:
## lm(formula = score ~ rank + ethnicity + gender + language + age +
## cls_perc_eval + cls_students + cls_level + cls_profs + cls_credits +
## bty_avg + pic_outfit + pic_color, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.77397 -0.32432 0.09067 0.35183 0.95036
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.0952141 0.2905277 14.096 < 2e-16 ***
## ranktenure track -0.1475932 0.0820671 -1.798 0.07278 .
## ranktenured -0.0973378 0.0663296 -1.467 0.14295
## ethnicitynot minority 0.1234929 0.0786273 1.571 0.11698
## gendermale 0.2109481 0.0518230 4.071 5.54e-05 ***
## languagenon-english -0.2298112 0.1113754 -2.063 0.03965 *
## age -0.0090072 0.0031359 -2.872 0.00427 **
## cls_perc_eval 0.0053272 0.0015393 3.461 0.00059 ***
## cls_students 0.0004546 0.0003774 1.205 0.22896
## cls_levelupper 0.0605140 0.0575617 1.051 0.29369
## cls_profssingle -0.0146619 0.0519885 -0.282 0.77806
## cls_creditsone credit 0.5020432 0.1159388 4.330 1.84e-05 ***
## bty_avg 0.0400333 0.0175064 2.287 0.02267 *
## pic_outfitnot formal -0.1126817 0.0738800 -1.525 0.12792
## pic_colorcolor -0.2172630 0.0715021 -3.039 0.00252 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.498 on 448 degrees of freedom
## Multiple R-squared: 0.1871, Adjusted R-squared: 0.1617
## F-statistic: 7.366 on 14 and 448 DF, p-value: 6.552e-14
It turns out that the cls_credits variable was signficant. Apparently students like their one credit classes. Maybe my veiwpoint is skewed from having physics lab be my only one crediter. The cls_pfofs variable had by far the highest p value. Apparently students are indifferent to the number of professors.
The students rate the professor .12 points higher on average, all else equal, when the professor is not a minority.
m_2 <- lm(score ~ rank + ethnicity + gender + language + age + cls_perc_eval
+ cls_students + cls_level + cls_credits + bty_avg
+ pic_outfit + pic_color, data = evals)
summary(m_2)
##
## Call:
## lm(formula = score ~ rank + ethnicity + gender + language + age +
## cls_perc_eval + cls_students + cls_level + cls_credits +
## bty_avg + pic_outfit + pic_color, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7836 -0.3257 0.0859 0.3513 0.9551
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.0872523 0.2888562 14.150 < 2e-16 ***
## ranktenure track -0.1476746 0.0819824 -1.801 0.072327 .
## ranktenured -0.0973829 0.0662614 -1.470 0.142349
## ethnicitynot minority 0.1274458 0.0772887 1.649 0.099856 .
## gendermale 0.2101231 0.0516873 4.065 5.66e-05 ***
## languagenon-english -0.2282894 0.1111305 -2.054 0.040530 *
## age -0.0089992 0.0031326 -2.873 0.004262 **
## cls_perc_eval 0.0052888 0.0015317 3.453 0.000607 ***
## cls_students 0.0004687 0.0003737 1.254 0.210384
## cls_levelupper 0.0606374 0.0575010 1.055 0.292200
## cls_creditsone credit 0.5061196 0.1149163 4.404 1.33e-05 ***
## bty_avg 0.0398629 0.0174780 2.281 0.023032 *
## pic_outfitnot formal -0.1083227 0.0721711 -1.501 0.134080
## pic_colorcolor -0.2190527 0.0711469 -3.079 0.002205 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4974 on 449 degrees of freedom
## Multiple R-squared: 0.187, Adjusted R-squared: 0.1634
## F-statistic: 7.943 on 13 and 449 DF, p-value: 2.336e-14
The parameter estimates remained almost identical. Only bty_avg and pic_outfit changed slightly. This means that the cls_profs was independant of the other explanitory variables.
m_3 <-lm(score ~ ethnicity + gender + language + age + cls_perc_eval
+ cls_credits + bty_avg
+ pic_color, data = evals)
summary(m_3)
##
## Call:
## lm(formula = score ~ ethnicity + gender + language + age + cls_perc_eval +
## cls_credits + bty_avg + pic_color, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.85320 -0.32394 0.09984 0.37930 0.93610
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.771922 0.232053 16.255 < 2e-16 ***
## ethnicitynot minority 0.167872 0.075275 2.230 0.02623 *
## gendermale 0.207112 0.050135 4.131 4.30e-05 ***
## languagenon-english -0.206178 0.103639 -1.989 0.04726 *
## age -0.006046 0.002612 -2.315 0.02108 *
## cls_perc_eval 0.004656 0.001435 3.244 0.00127 **
## cls_creditsone credit 0.505306 0.104119 4.853 1.67e-06 ***
## bty_avg 0.051069 0.016934 3.016 0.00271 **
## pic_colorcolor -0.190579 0.067351 -2.830 0.00487 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4992 on 454 degrees of freedom
## Multiple R-squared: 0.1722, Adjusted R-squared: 0.1576
## F-statistic: 11.8 on 8 and 454 DF, p-value: 2.58e-15
\(\hat { y } =3.771+.1679*ethnicity+.2071*gender-.2062*language-.006*age+.0047*cls\_ perc+.5053*cls\_ credit+.051*bty\_ avg-.1906*pic\_ color\)
df_m3 <- data.frame(m_3$model, residuals = m_bty_gen$residuals, predicted = m_bty_gen$fitted.values, residuals_abs = abs(m_bty$residuals))
ggplot(df_m3) + geom_point(aes(x = age, y = score))
ggplot(df_m3) + geom_point(aes(x = cls_perc_eval, y = score))
ggplot(df_m3) + geom_point(aes(x = predicted, y = residuals))
qqnorm(df_m3$residuals)
qqline(df_m3$residuals)
ggplot(df_m3) + geom_point(aes(x = bty_avg, y = residuals_abs))
ggplot(df_m3) + geom_point(aes(x = age, y = residuals_abs))
ggplot(df_m3) + geom_point(aes(x = cls_perc_eval, y = residuals_abs))
ggplot(df_m3) + geom_boxplot(aes(y = residuals, x = ethnicity))
ggplot(df_m3) + geom_boxplot(aes(y = residuals, x = language))
ggplot(df_m3) + geom_boxplot(aes(y = residuals, x = cls_credits))
ggplot(df_m3) + geom_boxplot(aes(y = residuals, x = pic_color))
ggplot(df_m3) + geom_boxplot(aes(y = residuals, x = gender))
This was mentioned earlier, but it does violate the independance of observations assumption. This is probably a necesary cost to gather enough data points and doesn’t totally invalidate the model. However, it would be interesting to randomly choose 1 class from each professor and rerun the model.
It appears that ethnicity of not a minority, gender of male, language of english, class credits of one credit, bty_avg, and black and white profile picture are the most signficant factors in a high evaluation score at UT austin. Age and cl_perc_eval were statistically signficant, but not practically signficant, given the parameter estimates were under .001.
3 of the model variables are related to demographics and probably depend on the demographics of the student body. These variables likely wouldn’t generalize well to other universities where the student body looks different. The other variables would generalize well, so I would rerun the model fit without the demographic variables before generalizing.