## Warning: package 'ggplot2' was built under R version 4.5.1
## Warning: package 'GGally' was built under R version 4.5.1
## Warning: package 'broom' was built under R version 4.5.1
In this study, the researchers did not manipulate any variables but collected characteristcs such as a professor’s beauty rating, geneder, age, etc. Because the professor were not randomly assigned to “beauty” levels, and no vairable was controlled, the design is observatinal
## Rows: 463
## Columns: 23
## $ course_id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
## $ prof_id <int> 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5,…
## $ score <dbl> 4.7, 4.1, 3.9, 4.8, 4.6, 4.3, 2.8, 4.1, 3.4, 4.5, 3.8, 4…
## $ rank <fct> tenure track, tenure track, tenure track, tenure track, …
## $ ethnicity <fct> minority, minority, minority, minority, not minority, no…
## $ gender <fct> female, female, female, female, male, male, male, male, …
## $ language <fct> english, english, english, english, english, english, en…
## $ age <int> 36, 36, 36, 36, 59, 59, 59, 51, 51, 40, 40, 40, 40, 40, …
## $ cls_perc_eval <dbl> 55.81395, 68.80000, 60.80000, 62.60163, 85.00000, 87.500…
## $ cls_did_eval <int> 24, 86, 76, 77, 17, 35, 39, 55, 111, 40, 24, 24, 17, 14,…
## $ cls_students <int> 43, 125, 125, 123, 20, 40, 44, 55, 195, 46, 27, 25, 20, …
## $ cls_level <fct> upper, upper, upper, upper, upper, upper, upper, upper, …
## $ cls_profs <fct> single, single, single, single, multiple, multiple, mult…
## $ cls_credits <fct> multi credit, multi credit, multi credit, multi credit, …
## $ bty_f1lower <int> 5, 5, 5, 5, 4, 4, 4, 5, 5, 2, 2, 2, 2, 2, 2, 2, 2, 7, 7,…
## $ bty_f1upper <int> 7, 7, 7, 7, 4, 4, 4, 2, 2, 5, 5, 5, 5, 5, 5, 5, 5, 9, 9,…
## $ bty_f2upper <int> 6, 6, 6, 6, 2, 2, 2, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 9, 9,…
## $ bty_m1lower <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 7, 7,…
## $ bty_m1upper <int> 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 6, 6,…
## $ bty_m2upper <int> 6, 6, 6, 6, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 6, 6,…
## $ bty_avg <dbl> 5.000, 5.000, 5.000, 5.000, 3.000, 3.000, 3.000, 3.333, …
## $ pic_outfit <fct> not formal, not formal, not formal, not formal, not form…
## $ pic_color <fct> color, color, color, color, color, color, color, color, …
The score is skewed to the left meaning students generally rate professors positively. This is to be expected as it aligns with the typical tendency for course evaluations to be inflated towards the high end.
ggplot(evals, aes(x = score)) +
geom_histogram(binwidth = 0.25, fill = "steelblue", color = "white", alpha = 0.8) +
geom_density(aes(y = ..count.. * 0.25), color = "red", linewidth = 1.2) +
labs(
title = "Distribution of Course Evaluation Scores",
x = "Average Course Evaluation Score",
y = "Number of Courses"
) +
theme_minimal(base_size = 14)## Warning: The dot-dot notation (`..count..`) was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(count)` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.300 3.800 4.300 4.175 4.600 5.000
The scatter plot shows there is a general trend “red sloping line” that shows a slightly negative relationship between beauty ratings between younger and older professors. The relationshup is weak tho suggesting that the age and perceived beauty are mostly independent
ggplot(evals, aes(x = age, y = bty_avg)) +
geom_point(color = "steelblue", alpha = 0.6) +
geom_smooth(method = "lm", color = "red", se = TRUE) +
labs(
title = "Relationship Between Professor Age and Average Beauty Rating",
x = "Professor Age",
y = "Average Beauty Rating"
) +
theme_minimal(base_size = 14)## `geom_smooth()` using formula = 'y ~ x'
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.300 3.800 4.300 4.175 4.600 5.000
The first scatter plot can be misleading because many points overlap, making it look like there are fewer observations. The jittered plot spreads overlapping points slightly so the density observations is clearer.
ggplot(data = evals, aes(x = bty_avg, y = score)) +
geom_point() +
labs(
title = "Scatterplot of Beauty and Course Evaluation Score",
x = "Average Beauty Rating",
y = "Course Evaluation Score"
) +
theme_minimal(base_size = 14)ggplot(data = evals, aes(x = bty_avg, y = score)) +
geom_jitter(width = 0.1, height = 0.1, color = "steelblue", alpha = 0.6) +
geom_smooth(method = "lm", color = "red", se = TRUE) +
labs(
title = "Relationship Between Beauty and Course Evaluation Score (Jittered)",
x = "Average Beauty Rating",
y = "Course Evaluation Score"
) +
theme_minimal(base_size = 14)## `geom_smooth()` using formula = 'y ~ x'
A slope of .067 means that every 1 point increase in average beauty rating, a proffessor’s average evaluation score increaese by about .07 points. Because the P-value <0.05, we can say that the beauty is a statistically significant predictor of evaluation scores. However, this is little practical use as an R^2 value of roughly 0.04 means that beauty exolains only about 4% of the variation in course scores.
m_bty <- lm(score ~ bty_avg, data = evals)
ggplot(data = evals, aes(x = bty_avg, y = score)) +
geom_jitter(width = 0.1, height = 0.1, color = "steelblue", alpha = 0.6) +
geom_smooth(method = "lm", color = "red", se = FALSE) +labs(
title = "Relationship Between Beauty Rating and Course Evaluation Score",
x = "Average Beauty Rating",
y = "Average Course Evaluation Score"
) +
theme_minimal(base_size = 14)## `geom_smooth()` using formula = 'y ~ x'
##
## Call:
## lm(formula = score ~ bty_avg, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9246 -0.3690 0.1420 0.3977 0.9309
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.88034 0.07614 50.96 < 2e-16 ***
## bty_avg 0.06664 0.01629 4.09 5.08e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5348 on 461 degrees of freedom
## Multiple R-squared: 0.03502, Adjusted R-squared: 0.03293
## F-statistic: 16.73 on 1 and 461 DF, p-value: 5.083e-05
The residuals vs. fitted plot checks for randomness and shos how the residuals are evenly scatterd. This means there is a slight curve may exist but linearity seems reasonable. The Histogram of Residuals checks if there are roughly symmetric and centered around 0. The residuals appear slightly left-skewed with consistently high average scores but still roughly symmetric; pointed out by a rough bell-shaped histogram. The Normal Q-Q plot visuallizes assesments of normality in residuals. Most points lie close to the red lines with little deviations.
m_bty <- lm(score ~ bty_avg, data = evals)
evals$resid <- residuals(m_bty)
evals$fitted <- fitted(m_bty)
ggplot(evals, aes(x = fitted, y = resid)) +
geom_point(alpha = 0.6, color = "steelblue") +
geom_hline(yintercept = 0, color = "red", linewidth = 1) +
labs(
title = "Residuals vs. Fitted Values",
x = "Fitted Values (Predicted Scores)",
y = "Residuals"
) +
theme_minimal(base_size = 14)ggplot(evals, aes(x = resid)) +
geom_histogram(binwidth = 0.1, fill = "steelblue", color = "white", alpha = 0.8) +
labs(
title = "Histogram of Residuals",
x = "Residuals",
y = "Frequency"
) +
theme_minimal(base_size = 14)ggplot(evals, aes(sample = resid)) +
stat_qq(color = "steelblue") +
stat_qq_line(color = "red") +
labs(
title = "Normal Q-Q Plot of Residuals",
x = "Theoretical Quantiles",
y = "Sample Quantiles"
) +
theme_minimal(base_size = 14)After controlling for gender, beauty remains a statistically significant but small predictor of course evaluation scores—each one-point increase in beauty rating raises the average score by about 0.07 points. Diagnostic plots show that the regression assumptions are reasonable, indicating the model is valid though the practical impact is minimal.
## # A tibble: 3 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 3.75 0.0847 44.3 6.23e-168
## 2 bty_avg 0.0742 0.0163 4.56 6.48e- 6
## 3 gendermale 0.172 0.0502 3.43 6.52e- 4
bty_avg is still a statistically significant predictor of score aftrer adding gender to the model as its p-value remains below 0.05
scoremale=3.99+0.07×bty_avg. Since the coefficent for gendermale is negative .10, this means the male professors tend to have slightly lower course evaluation scores than female prffessors with the same beauty rating.
When Rank is added to the model, R automatically creates dummy variables for each level except the baseline. By default, the first level becomes the refrence group while the other levels are compared agaisnt it. This means R estimates how much higher or lower the scores are for tenured professors relative to teaching faculty.
## [1] "teaching" "tenure track" "tenured"
evals$rank <- relevel(evals$rank, ref = "teaching")
m_bty_rank <- lm(score ~ bty_avg + rank, data = evals)
tidy(m_bty_rank)## # A tibble: 4 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 3.98 0.0908 43.9 2.92e-166
## 2 bty_avg 0.0678 0.0165 4.10 4.92e- 5
## 3 ranktenure track -0.161 0.0740 -2.17 3.03e- 2
## 4 ranktenured -0.126 0.0627 -2.01 4.45e- 2
##
## Call:
## lm(formula = score ~ bty_avg + rank, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8713 -0.3642 0.1489 0.4103 0.9525
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.98155 0.09078 43.860 < 2e-16 ***
## bty_avg 0.06783 0.01655 4.098 4.92e-05 ***
## ranktenure track -0.16070 0.07395 -2.173 0.0303 *
## ranktenured -0.12623 0.06266 -2.014 0.0445 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5328 on 459 degrees of freedom
## Multiple R-squared: 0.04652, Adjusted R-squared: 0.04029
## F-statistic: 7.465 on 3 and 459 DF, p-value: 6.88e-05
## Analysis of Variance Table
##
## Response: score
## Df Sum Sq Mean Sq F value Pr(>F)
## bty_avg 1 4.786 4.7859 16.8595 4.765e-05 ***
## rank 2 1.571 0.7856 2.7673 0.06388 .
## Residuals 459 130.297 0.2839
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can assume that beauty will remain a positive significant predictor of course evaluation scores. Variables such as class response rate and course level are also likely to influence scores
Both Beauty and Class response rate show a strong, positive effect of scores while Rank and Gender showed minor effects .
full <- lm(score ~ rank + gender + ethnicity + language + age + cls_perc_eval +
cls_students + cls_level + cls_profs + cls_credits + bty_avg,
data = evals)
tidy(full)## # A tibble: 13 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 3.53 0.241 14.7 4.65e-40
## 2 ranktenure track -0.107 0.0820 -1.30 1.93e- 1
## 3 ranktenured -0.0450 0.0652 -0.691 4.90e- 1
## 4 gendermale 0.179 0.0515 3.47 5.79e- 4
## 5 ethnicitynot minority 0.187 0.0775 2.41 1.63e- 2
## 6 languagenon-english -0.127 0.108 -1.17 2.41e- 1
## 7 age -0.00665 0.00308 -2.16 3.15e- 2
## 8 cls_perc_eval 0.00570 0.00155 3.67 2.68e- 4
## 9 cls_students 0.000445 0.000358 1.24 2.15e- 1
## 10 cls_levelupper 0.0187 0.0556 0.337 7.37e- 1
## 11 cls_profssingle -0.00858 0.0514 -0.167 8.67e- 1
## 12 cls_creditsone credit 0.509 0.117 4.35 1.70e- 5
## 13 bty_avg 0.0613 0.0167 3.67 2.68e- 4
The variable ethnicity has two levels; Holding all other variables constant, professors are identified as not minority scored as there is only a .03 difference in scores between the two groups ### Exercise 14
After testing each variable, removing ethnicity produced the highest adjusted R² (a small increase), indicating it added little explanatory value. Refitting the model without ethnicity showed that the coefficients and significance of the remaining predictors barely changed; onfirming that ethnicity was not collinear with other vairables and had mimimal impact.
vars <- c("rank","gender","ethnicity","language","age","cls_perc_eval",
"cls_students","cls_level","cls_profs","cls_credits","bty_avg")
sapply(vars, function(v) {
formula <- as.formula(
paste("score ~", paste(setdiff(vars, v), collapse = " + "))
)
lm(formula, data = evals)$adj.r.squared
})## $rank
## NULL
##
## $gender
## NULL
##
## $ethnicity
## NULL
##
## $language
## NULL
##
## $age
## NULL
##
## $cls_perc_eval
## NULL
##
## $cls_students
## NULL
##
## $cls_level
## NULL
##
## $cls_profs
## NULL
##
## $cls_credits
## NULL
##
## $bty_avg
## NULL
## # A tibble: 13 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 3.53 0.241 14.7 4.65e-40
## 2 ranktenure track -0.107 0.0820 -1.30 1.93e- 1
## 3 ranktenured -0.0450 0.0652 -0.691 4.90e- 1
## 4 gendermale 0.179 0.0515 3.47 5.79e- 4
## 5 ethnicitynot minority 0.187 0.0775 2.41 1.63e- 2
## 6 languagenon-english -0.127 0.108 -1.17 2.41e- 1
## 7 age -0.00665 0.00308 -2.16 3.15e- 2
## 8 cls_perc_eval 0.00570 0.00155 3.67 2.68e- 4
## 9 cls_students 0.000445 0.000358 1.24 2.15e- 1
## 10 cls_levelupper 0.0187 0.0556 0.337 7.37e- 1
## 11 cls_profssingle -0.00858 0.0514 -0.167 8.67e- 1
## 12 cls_creditsone credit 0.509 0.117 4.35 1.70e- 5
## 13 bty_avg 0.0613 0.0167 3.67 2.68e- 4
reduced <- lm(score ~ rank + gender + language + age + cls_perc_eval +
cls_students + cls_level + cls_profs + cls_credits + bty_avg,
data = evals)
summary(reduced)##
## Call:
## lm(formula = score ~ rank + gender + language + age + cls_perc_eval +
## cls_students + cls_level + cls_profs + cls_credits + bty_avg,
## data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.79626 -0.31776 0.09361 0.36610 1.09021
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.7215004 0.2286330 16.277 < 2e-16 ***
## ranktenure track -0.1305944 0.0818736 -1.595 0.111396
## ranktenured -0.0619692 0.0651846 -0.951 0.342280
## gendermale 0.1944444 0.0513870 3.784 0.000175 ***
## languagenon-english -0.2017554 0.1040217 -1.940 0.053057 .
## age -0.0066209 0.0030994 -2.136 0.033203 *
## cls_perc_eval 0.0054206 0.0015553 3.485 0.000540 ***
## cls_students 0.0004733 0.0003602 1.314 0.189472
## cls_levelupper 0.0372496 0.0553422 0.673 0.501242
## cls_profssingle -0.0315964 0.0507261 -0.623 0.533677
## cls_creditsone credit 0.4405426 0.1141482 3.859 0.000130 ***
## bty_avg 0.0607238 0.0167628 3.623 0.000325 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5067 on 451 degrees of freedom
## Multiple R-squared: 0.1527, Adjusted R-squared: 0.132
## F-statistic: 7.39 on 11 and 451 DF, p-value: 1.1e-11
The final model keeps beauty, class response rate, rank, and gender as the main predictors of professor scores. Professors with higher beauty ratings and classes with more students completing evaluations tend to earn higher scores. On the other hand, male and tenure-track professors are rated slightly lower on average. Overall, this model captures the key factors that influence evaluations while staying simple and statistically strong.
full <- lm(score ~ rank + gender + ethnicity + language + age + cls_perc_eval +
cls_students + cls_level + cls_profs + cls_credits + bty_avg,
data = evals)
final_model <- step(full, direction = "backward", trace = FALSE)
summary(final_model)##
## Call:
## lm(formula = score ~ gender + ethnicity + language + age + cls_perc_eval +
## cls_credits + bty_avg, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9067 -0.3103 0.0849 0.3712 1.0611
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.446967 0.203191 16.964 < 2e-16 ***
## gendermale 0.184780 0.049889 3.704 0.000238 ***
## ethnicitynot minority 0.204710 0.074710 2.740 0.006384 **
## languagenon-english -0.161463 0.103213 -1.564 0.118427
## age -0.005008 0.002606 -1.922 0.055289 .
## cls_perc_eval 0.005094 0.001438 3.543 0.000436 ***
## cls_creditsone credit 0.515065 0.104860 4.912 1.26e-06 ***
## bty_avg 0.064996 0.016327 3.981 7.99e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.503 on 455 degrees of freedom
## Multiple R-squared: 0.1576, Adjusted R-squared: 0.1446
## F-statistic: 12.16 on 7 and 455 DF, p-value: 2.879e-14
The diagnostic plots look good overall. The residuals vs. fitted plot shows no major pattern or funnel shape, which means the linearity and equal variance assumptions hold. The histogram and Q-Q plot show that residuals are roughly normal, with only slight deviation in the tails. There aren’t any major outliers or violations, so the regression assumptions seem reasonable and the model results are reliable.
final_model <- lm(score ~ bty_avg + cls_perc_eval + rank + gender, data = evals)
evals$resid <- residuals(final_model)
evals$fitted <- fitted(final_model)
ggplot(evals, aes(x = fitted, y = resid)) +
geom_point(alpha = 0.6, color = "steelblue") +
geom_hline(yintercept = 0, color = "red", linewidth = 1) +
labs(title = "Residuals vs Fitted Values", x = "Fitted Values", y = "Residuals") +
theme_minimal(base_size = 14)ggplot(evals, aes(x = resid)) +
geom_histogram(binwidth = 0.1, fill = "steelblue", color = "white", alpha = 0.8) +
labs(title = "Histogram of Residuals", x = "Residuals", y = "Frequency") +
theme_minimal(base_size = 14)ggplot(evals, aes(sample = resid)) +
stat_qq(color = "steelblue") +
stat_qq_line(color = "red") +
labs(title = "Normal Q-Q Plot", x = "Theoretical Quantiles", y = "Sample Quantiles") +
theme_minimal(base_size = 14)Because each row represents a course taught by the same professor, the data includes repeated observations for individual professors. This means the independence assumption of linear regression will not work. If residuals are correlated within professors, standard errors may be underestimated. A better approach would be to use a multilevel model where professor ID is treated as a random effect to account for course structure among professors.
Professors with higher beauty ratings, more student participation in evaluations, and teaching-focused roles—especially females—tend to receive the highest evaluation scores. Engaged classes and positive professor-student connections drive stronger ratings overall.
I would personally dont generalize results as the human interaction is hard to quantify statistically as emotions can be subjective and not follow any logic or reason. Additionally, it’s unfair to take the data as is and apply across over multiple universities.