download.file("http://www.openintro.org/stat/data/evals.RData", destfile = "evals.RData")
load("evals.RData")
head(evals)
## score rank ethnicity gender language age cls_perc_eval
## 1 4.7 tenure track minority female english 36 55.81395
## 2 4.1 tenure track minority female english 36 68.80000
## 3 3.9 tenure track minority female english 36 60.80000
## 4 4.8 tenure track minority female english 36 62.60163
## 5 4.6 tenured not minority male english 59 85.00000
## 6 4.3 tenured not minority male english 59 87.50000
## cls_did_eval cls_students cls_level cls_profs cls_credits bty_f1lower
## 1 24 43 upper single multi credit 5
## 2 86 125 upper single multi credit 5
## 3 76 125 upper single multi credit 5
## 4 77 123 upper single multi credit 5
## 5 17 20 upper multiple multi credit 4
## 6 35 40 upper multiple multi credit 4
## bty_f1upper bty_f2upper bty_m1lower bty_m1upper bty_m2upper bty_avg
## 1 7 6 2 4 6 5
## 2 7 6 2 4 6 5
## 3 7 6 2 4 6 5
## 4 7 6 2 4 6 5
## 5 4 2 2 3 3 3
## 6 4 2 2 3 3 3
## pic_outfit pic_color
## 1 not formal color
## 2 not formal color
## 3 not formal color
## 4 not formal color
## 5 not formal color
## 6 not formal color
This is an observational study, so it cannot establish causality; it couldn’t claim that beauty leads directly to differences in course evaluations. There are many factors that could affect a student’s evaluation of the course that are not included in the data. The study can, however, show whether there is an association between a professor’s beauty rating and the evaluation score of the course.
The distribution of the course scores is left skewed, with a few low outliers. I didn’t have a strong expectation about the distribution. I think it is left skewed, rather than normally distributed, because it’s human nature to give a good score to someone you’ve interacted with for several months.
boxplot(evals$score, col = 'plum')
hist(evals$score, col = 'mediumorchid')
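A quick numeric check: for a left-skewed distribution the mean falls below the median.
summary(evals$score)  # compare the mean and median of the scores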
There is a weak-to-moderate negative correlation between age and beauty (r ≈ −0.30): the average beauty score decreases as a professor’s age increases.
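The correlation can be computed directly; its square matches the R² of the model below.
cor(evals$age, evals$bty_avg)  # Pearson correlation between age and average beauty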
m1 <- lm(bty_avg ~ age, data = evals)
summary(m1)
##
## Call:
## lm(formula = bty_avg ~ age, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4326 -1.1254 -0.1971 0.6859 3.8724
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.713283 0.341083 19.682 < 2e-16 ***
## age -0.047461 0.006912 -6.866 2.14e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.456 on 461 degrees of freedom
## Multiple R-squared: 0.09278, Adjusted R-squared: 0.09082
## F-statistic: 47.15 on 1 and 461 DF, p-value: 2.137e-11
plot(evals$age, evals$bty_avg, main = "Age v. Beauty",
ylab = "Beauty (avg)", xlab = "Age",
pch = 23, cex=2, col="purple", bg="plum", lwd=2)
abline(m1, col = "magenta4", lwd=2.5)
plot(evals$score ~ evals$bty_avg, pch=10, col="mediumorchid4",
main = "Score x Beauty", ylab = "score", xlab = " avg. beauty")
The number of observations in the data set is 463, but fewer points appear on the plot: there are duplicate (x, y) pairs in the data, and duplicate points plot on top of one another.
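We can count the hidden points directly:
# number of rows whose (bty_avg, score) pair duplicates an earlier row
sum(duplicated(evals[, c("bty_avg", "score")]))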
plot(jitter(evals$score) ~ jitter(evals$bty_avg),
pch=10, col=c("mediumorchid4","mediumvioletred", "pink3"), #using several colors for distinctivity
main = "Score x Beauty", ylab = "score", xlab = " avg. beauty")
There’s a small positive correlation between average beauty and score; the slope of the regression line is 0.067. The p-value for bty_avg is small, so it’s a statistically significant predictor. But the adjusted R² of 0.033 is small and the residual standard error of 0.535 is large, so this isn’t a very good prediction model.
m_bty <- lm(score ~ bty_avg, data = evals)
s1 <- summary(m_bty)
s1
##
## Call:
## lm(formula = score ~ bty_avg, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9246 -0.3690 0.1420 0.3977 0.9309
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.88034 0.07614 50.96 < 2e-16 ***
## bty_avg 0.06664 0.01629 4.09 5.08e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5348 on 461 degrees of freedom
## Multiple R-squared: 0.03502, Adjusted R-squared: 0.03293
## F-statistic: 16.73 on 1 and 461 DF, p-value: 5.083e-05
plot(jitter(evals$score) ~ jitter(evals$bty_avg),
pch=10, col=c("mediumorchid4","mediumvioletred", "pink3"),
main = "Score x Beauty", ylab = "score", xlab = " avg. beauty")
abline(m_bty, col = "magenta1", lwd=2.5)
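For example, the fitted line predicts a score of about 3.880 + 0.0666 × 5 ≈ 4.21 for a professor with an average beauty rating of 5:
predict(m_bty, newdata = data.frame(bty_avg = 5))  # predicted evaluation score at bty_avg = 5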
The residuals are nearly normally distributed, with a bit of left skew. The residuals vs. fitted plot doesn’t indicate any non-linearity. The scale-location plot has a roughly horizontal line with randomly spread points, indicating good homoscedasticity, and there are no influential outliers.
plot(m_bty$residuals ~ evals$bty_avg, col='plum', pch = 19, ylab = "residuals", xlab = "avg. beauty")
abline(h = 0, lty = 3, col='midnightblue')
lines(stats::lowess(m_bty$residuals ~ evals$bty_avg), col='firebrick', lwd=2, lty=5)
par(mfrow=c(1,2))
hist(m_bty$residuals, col='violetred', main = 'score by avg. beauty residuals', xlab = 'residuals')
qqnorm(m_bty$residuals, pch = 8, col = 'purple3')
qqline(m_bty$residuals, col = 'mediumblue', lwd=2) # adds diagonal line to the normal prob plot
par(mfrow=c(2,2))
plot(m_bty, col= 'blueviolet', lwd = 2)
## Multiple linear regression
plot(evals$bty_avg ~ evals$bty_f1lower, main = 'Avg. Beauty vs. Beauty Score from Lower-Level Female Students',
     ylab = 'Avg. Beauty', xlab = 'Beauty score from lower-level female students',
     bg = 'violet', pch = 23)
r <- cor(evals$bty_avg, evals$bty_f1lower)  # named r to avoid masking base::c
The correlation is 0.8439112: bty_f1lower and bty_avg are strongly related, so the beauty variables are collinear.
plot(evals[,13:19], col='darkorchid3')
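The pairwise correlations behind this scatterplot matrix can also be printed numerically:
# correlation matrix of the beauty ratings (columns 13-19 hold the bty_* variables)
round(cor(evals[, 13:19]), 2)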
### Exercise 7 - Beauty and Gender
The diagnostic plots look a little better with gender added to the model. The conditions for linear regression are still met.
m_bg <- lm(score ~ bty_avg + gender, data = evals)
par(mfrow=c(2,2))
plot(m_bg, col= 'darkviolet', lwd = 2)
The p-value for bty_avg is smaller than in the model with bty_avg alone. Adding gender has increased the adjusted R² to 0.055, which still does not make for a very good prediction model.
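Comparing the two models’ adjusted R² values directly:
# adjusted R-squared before and after adding gender
c(bty_only = summary(m_bty)$adj.r.squared, bty_and_gender = summary(m_bg)$adj.r.squared)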
multiLinesP <- function(model, ...){
  # Customized version of openintro's multiLines() with a purple palette:
  # draws one fitted (parallel) line per level of the categorical predictor.
  if(!inherits(model, "lm")){
    warning("Model must be the output of the function lm()")
  }
  if(length(model$xlevels) != 1){
    warning("Model must contain exactly one categorical predictor")
  }
  if(length(model$coef) - length(model$xlevels[[1]]) != 1){
    warning("Model must contain exactly one non-categorical predictor")
  }
  palette <- c("#68228B", "#56B4E9", "#800000", "#009E73", "#CC79A7", "#F0E442", "#0072B2")
  baseIntercept <- model$coef[1]
  nLines <- length(model$xlevels[[1]])
  intercepts <- c(baseIntercept, rep(0, nLines - 1))
  indicatorInd <- c(1, rep(0, nLines)) # used to find the slope parameter by process of elimination
  for(i in 1:(nLines - 1)){
    indicatorName <- paste(names(model$contrasts), model$xlevels[[1]][1 + i], sep = "")
    intercepts[i + 1] <- baseIntercept + model$coef[names(model$coef) == indicatorName]
    indicatorInd <- indicatorInd + (names(model$coef) == indicatorName)
  }
  slope <- model$coef[!indicatorInd]
  num_pred <- which(names(model$model[, -1]) != names(model$xlevels)) + 1
  cat_pred <- which(names(model$model[, -1]) == names(model$xlevels)) + 1
  # assign each level of the categorical predictor its own color and plotting symbol
  model$model$COL <- NA
  model$model$PCH <- NA
  for(i in 1:nLines){
    model$model$COL[model$model[, cat_pred] == levels(model$model[, cat_pred])[i]] <- adjustcolor(palette[i], 0.40)
    model$model$PCH[model$model[, cat_pred] == levels(model$model[, cat_pred])[i]] <- i + 14
  }
  plot(model$model[, 1] ~ jitter(model$model[, num_pred]), col = model$model$COL, pch = model$model$PCH,
       ylab = names(model$model)[1],
       xlab = names(model$model)[num_pred])
  # parallel lines: one intercept per level, shared slope
  for(j in 1:nLines){
    abline(intercepts[j], slope, col = palette[j], lwd = 2, ...)
  }
  if(slope > 0){legend_pos <- "bottomright"}
  if(slope < 0){legend_pos <- "topleft"}
  legend(legend_pos, col = palette[1:nLines], lty = 1, legend = levels(model$model[, cat_pred]), lwd = 2)
}
multiLinesP(m_bg) # customized multiLines function with a purple palette
The fitted equation is score = 3.747 + 0.0742 × bty_avg + 0.172 × gendermale, so for the same beauty rating, male professors are predicted to receive higher course evaluation scores.
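As a quick illustration (bty_avg = 4 is an arbitrary value chosen for the example):
# predicted scores for a male and a female professor with the same beauty rating
predict(m_bg, newdata = data.frame(bty_avg = 4, gender = c("male", "female")))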
s2 <- summary(m_bg)
s2
##
## Call:
## lm(formula = score ~ bty_avg + gender, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8305 -0.3625 0.1055 0.4213 0.9314
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.74734 0.08466 44.266 < 2e-16 ***
## bty_avg 0.07416 0.01625 4.563 6.48e-06 ***
## gendermale 0.17239 0.05022 3.433 0.000652 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5287 on 460 degrees of freedom
## Multiple R-squared: 0.05912, Adjusted R-squared: 0.05503
## F-statistic: 14.45 on 2 and 460 DF, p-value: 8.177e-07
The first level of rank, teaching, is the reference level absorbed into the intercept. The other levels of rank are treated as indicator variables; at most one of them can equal 1 for any observation. So the formula for score is:
score = 3.98 + 0.068 × bty_avg − 0.161 × ranktenure track − 0.126 × ranktenured
So tenure-track professors get lower scores; maybe they’re spending more time publishing in order to earn tenure.
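A look at the design matrix shows this dummy coding, with teaching absorbed into the intercept:
head(model.matrix(~ rank, data = evals))  # indicator columns R builds for the rank factor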
m_br <- lm(score ~ bty_avg + rank, data = evals)
s3 <- summary(m_br)
s3
##
## Call:
## lm(formula = score ~ bty_avg + rank, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8713 -0.3642 0.1489 0.4103 0.9525
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.98155 0.09078 43.860 < 2e-16 ***
## bty_avg 0.06783 0.01655 4.098 4.92e-05 ***
## ranktenure track -0.16070 0.07395 -2.173 0.0303 *
## ranktenured -0.12623 0.06266 -2.014 0.0445 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5328 on 459 degrees of freedom
## Multiple R-squared: 0.04652, Adjusted R-squared: 0.04029
## F-statistic: 7.465 on 3 and 459 DF, p-value: 6.88e-05
I’d expect cls_students, the number of students in the class, to have a high p-value: the scores are calculated only from the students who participated in the evaluation, so total enrollment shouldn’t matter much.
m1 <- lm(score ~ rank + ethnicity + gender + language + age + cls_perc_eval
+ cls_students + cls_level + cls_profs + cls_credits + bty_avg
+ pic_outfit + pic_color, data = evals)
s1 <- summary(m1)
The highest p-value, 0.7780566, belongs to cls_profs, the number of professors teaching sections of the course. cls_students, the number of students, has a p-value of 0.2289607.
The p-value for ethnicity is 0.1169791, which is still fairly large. In this model, holding all other variables constant, a non-minority professor would receive a score 0.123 points higher than a minority professor.
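These values can be pulled from the summary programmatically:
tail(sort(s1$coefficients[, "Pr(>|t|)"]), 3)  # the three largest p-values in the full model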
Removing the cls_profs variable from the model changes the coefficients and p-values of the remaining variables, but only slightly, as the table below shows.
m2 <- lm(score ~ rank + ethnicity + gender + language + age + cls_perc_eval
+ cls_students + cls_level + cls_credits + bty_avg
+ pic_outfit + pic_color, data = evals)
s2 <- summary(m2)
library(dplyr)       # tibble(), filter(), mutate(), relocate(); may already be loaded in the setup chunk
library(knitr)       # kable()
library(kableExtra)  # kable_minimal()
t1 <- tibble(name = labels(s1$coefficients[,'Estimate']), coef_m1 = s1$coefficients[,'Estimate'],
             pval_m1 = s1$coefficients[,'Pr(>|t|)']) %>%
  filter(name != 'cls_profssingle') %>%
  mutate(coef_m2 = s2$coefficients[,1], diff_c = coef_m1 - coef_m2,
         pval_m2 = s2$coefficients[,'Pr(>|t|)'], diff_pv = pval_m1 - pval_m2) %>%
  relocate(pval_m1, .after = diff_c)
kable(t1, table.attr = "style = \"color: indigo;\"") %>% kable_minimal()
| name | coef_m1 | coef_m2 | diff_c | pval_m1 | pval_m2 | diff_pv |
|---|---|---|---|---|---|---|
| (Intercept) | 4.0952141 | 4.0872523 | 0.0079618 | 0.0000000 | 0.0000000 | 0.0000000 |
| ranktenure track | -0.1475932 | -0.1476746 | 0.0000813 | 0.0727793 | 0.0723271 | 0.0004523 |
| ranktenured | -0.0973378 | -0.0973829 | 0.0000451 | 0.1429455 | 0.1423493 | 0.0005962 |
| ethnicitynot minority | 0.1234929 | 0.1274458 | -0.0039528 | 0.1169791 | 0.0998556 | 0.0171234 |
| gendermale | 0.2109481 | 0.2101231 | 0.0008250 | 0.0000554 | 0.0000566 | -0.0000012 |
| languagenon-english | -0.2298112 | -0.2282894 | -0.0015217 | 0.0396509 | 0.0405303 | -0.0008794 |
| age | -0.0090072 | -0.0089992 | -0.0000080 | 0.0042688 | 0.0042616 | 0.0000072 |
| cls_perc_eval | 0.0053272 | 0.0052888 | 0.0000385 | 0.0005903 | 0.0006072 | -0.0000169 |
| cls_students | 0.0004546 | 0.0004687 | -0.0000141 | 0.2289607 | 0.2103843 | 0.0185764 |
| cls_levelupper | 0.0605140 | 0.0606374 | -0.0001235 | 0.2936925 | 0.2922000 | 0.0014925 |
| cls_creditsone credit | 0.5020432 | 0.5061196 | -0.0040764 | 0.0000184 | 0.0000133 | 0.0000051 |
| bty_avg | 0.0400333 | 0.0398629 | 0.0001704 | 0.0226744 | 0.0230315 | -0.0003571 |
| pic_outfitnot formal | -0.1126817 | -0.1083227 | -0.0043589 | 0.1279153 | 0.1340803 | -0.0061650 |
| pic_colorcolor | -0.2172630 | -0.2190527 | 0.0017897 | 0.0025162 | 0.0022052 | 0.0003110 |
The best model is m2 (all variables except cls_profs), with an adjusted R² of 0.163. With only 16.3% of the variability in score explained, it’s still not a great model.
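One way to confirm that dropping cls_profs doesn’t meaningfully reduce fit is a nested-model F-test:
# compare the reduced model m2 to the full model m1; a large p-value supports dropping cls_profs
anova(m2, m1)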
# remove cls_level with p_value 0.292
s3 <- lm(score ~ rank + ethnicity + gender + language + age + cls_perc_eval
+ cls_students + cls_credits + bty_avg
+ pic_outfit + pic_color, data = evals) %>%
summary()
# remove cls_students with p_value 0.319
s4 <- lm(score ~ rank + ethnicity + gender + language + age + cls_perc_eval
+ cls_credits + bty_avg + pic_outfit + pic_color, data = evals) %>%
summary()
# remove rank with p_value 0.205
s5 <- lm(score ~ ethnicity + gender + language + age + cls_perc_eval
+ cls_credits + bty_avg + pic_outfit + pic_color, data = evals) %>%
summary()
# remove pic_outfit with p_value 0.0905
s6 <- lm(score ~ ethnicity + gender + language + age + cls_perc_eval
+ cls_credits + bty_avg + pic_color, data = evals) %>%
summary()
# The remaining reduced models all have smaller adjusted R-squared values
models <- tibble(name = c('m1', 'm2', 'm3', 'm4', 'm5', 'm6'),
                 rsquared = c(s1$adj.r.squared, s2$adj.r.squared, s3$adj.r.squared, s4$adj.r.squared,
                              s5$adj.r.squared, s6$adj.r.squared),
                 SE = c(s1$sigma, s2$sigma, s3$sigma, s4$sigma, s5$sigma, s6$sigma),
                 f_statistic = c(s1$fstatistic[1], s2$fstatistic[1], s3$fstatistic[1], s4$fstatistic[1],
                                 s5$fstatistic[1], s6$fstatistic[1]),
                 # p-value for bty_avg, indexed by row name: its row position differs across models
                 bty.p_value = c(s1$coefficients['bty_avg', 4], s2$coefficients['bty_avg', 4],
                                 s3$coefficients['bty_avg', 4], s4$coefficients['bty_avg', 4],
                                 s5$coefficients['bty_avg', 4], s6$coefficients['bty_avg', 4]))
models <- arrange(models, desc(rsquared))
kable(models, table.attr = "style = \"color: indigo;\"") %>% kable_minimal()
| name | rsquared | SE | f_statistic | bty.p_value |
|---|---|---|---|---|
| m2 | 0.1634262 | 0.4974425 | 7.942500 | 0.0723271 |
| m4 | 0.1632301 | 0.4975008 | 9.193009 | 0.0828135 |
| m3 | 0.1632178 | 0.4975044 | 8.509584 | 0.0831837 |
| m1 | 0.1617076 | 0.4979532 | 7.365741 | 0.0727793 |
| m5 | 0.1610344 | 0.4981531 | 10.853127 | 0.0297983 |
| m6 | 0.1575649 | 0.4991821 | 11.801271 | 0.0262285 |
The diagnostic plots for m2 indicate that the conditions for linear regression are satisfied.
par(mfrow=c(2,2))
plot(m2, col= 'darkviolet', lwd = 2)
Each row represents a course, so the observations describe courses, not professors. Some courses are taught by a single professor and others by multiple professors (cls_profs). If the single-professor courses are filtered out of the data, we can find a model with an adjusted R² of 0.234, which is better than the model fit to all of the observations.
cls_m <- evals %>% filter(cls_profs == 'multiple')
lm(score ~ ethnicity + gender + language + age + cls_perc_eval
+ cls_credits + bty_avg + pic_outfit, data = cls_m) %>%
summary()
##
## Call:
## lm(formula = score ~ ethnicity + gender + language + age + cls_perc_eval +
## cls_credits + bty_avg + pic_outfit, data = cls_m)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.55019 -0.26938 0.04784 0.33733 1.25809
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.994145 0.268938 14.852 < 2e-16 ***
## ethnicitynot minority 0.197930 0.104830 1.888 0.05999 .
## gendermale 0.176262 0.058777 2.999 0.00294 **
## languagenon-english -0.207967 0.138240 -1.504 0.13354
## age -0.014892 0.003236 -4.603 6.19e-06 ***
## cls_perc_eval 0.008509 0.001714 4.964 1.17e-06 ***
## cls_creditsone credit 0.487818 0.119172 4.093 5.48e-05 ***
## bty_avg 0.046966 0.019734 2.380 0.01795 *
## pic_outfitnot formal -0.250986 0.092186 -2.723 0.00686 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4825 on 297 degrees of freedom
## Multiple R-squared: 0.2539, Adjusted R-squared: 0.2338
## F-statistic: 12.64 on 8 and 297 DF, p-value: 1.266e-15
If we look only at the single-professor courses and repeat the model selection, bty_avg becomes non-significant (p-value = 0.541).
cls_s <- evals %>% filter(cls_profs == 'single')
lm(score ~ rank + gender + language + cls_level + cls_credits +
bty_avg + pic_color, data = cls_s) %>%
summary()
##
## Call:
## lm(formula = score ~ rank + gender + language + cls_level + cls_credits +
## bty_avg + pic_color, data = cls_s)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.57972 -0.30552 0.07844 0.33951 0.92990
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.36418 0.24922 17.512 < 2e-16 ***
## ranktenure track -0.33060 0.12852 -2.572 0.01109 *
## ranktenured -0.11285 0.11397 -0.990 0.32369
## gendermale 0.16256 0.09109 1.785 0.07638 .
## languagenon-english -0.23834 0.14588 -1.634 0.10444
## cls_levelupper 0.19254 0.09737 1.977 0.04984 *
## cls_creditsone credit 0.65080 0.23684 2.748 0.00675 **
## bty_avg 0.01923 0.03137 0.613 0.54079
## pic_colorcolor -0.44578 0.13498 -3.303 0.00120 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4788 on 148 degrees of freedom
## Multiple R-squared: 0.2271, Adjusted R-squared: 0.1853
## F-statistic: 5.436 on 8 and 148 DF, p-value: 5.172e-06
A high-scoring professor would be a younger man who is not a minority and was educated at an English-speaking school. He is neither tenured nor tenure track and teaches upper-level, one-credit courses. He is attractive (to students), and he is dressed formally in the black & white photograph shown to students.
t1 <- tibble(name = labels(s2$coefficients[,'Estimate']), coef = s2$coefficients[,'Estimate'])
kable(t1, table.attr = "style = \"color: indigo;\"") %>% kable_minimal()
| name | coef |
|---|---|
| (Intercept) | 4.0872523 |
| ranktenure track | -0.1476746 |
| ranktenured | -0.0973829 |
| ethnicitynot minority | 0.1274458 |
| gendermale | 0.2101231 |
| languagenon-english | -0.2282894 |
| age | -0.0089992 |
| cls_perc_eval | 0.0052888 |
| cls_students | 0.0004687 |
| cls_levelupper | 0.0606374 |
| cls_creditsone credit | 0.5061196 |
| bty_avg | 0.0398629 |
| pic_outfitnot formal | -0.1083227 |
| pic_colorcolor | -0.2190527 |
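As a sanity check, we can score such a professor with m2. The specific values below (age, class size, evaluation rate, beauty rating, and the reference factor labels like "formal" and "black&white") are illustrative assumptions, not values taken from the data:
# hypothetical high-scoring professor; all values are illustrative
best_prof <- data.frame(rank = "teaching", ethnicity = "not minority", gender = "male",
                        language = "english", age = 35, cls_perc_eval = 90, cls_students = 30,
                        cls_level = "upper", cls_credits = "one credit", bty_avg = 7,
                        pic_outfit = "formal", pic_color = "black&white")
predict(m2, newdata = best_prof)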
I would not generalize these results to the population of all professors. This observational study was conducted at a single American university, and the sample is not representative of all college professors.