In this lab we work with the evals dataset from the
openintro package.
Each row represents a single course taught at the University of Texas at
Austin, and columns describe the course, the instructor, and students’
evaluations.
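# Load packages: glimpse() and ggplot() come from the tidyverse
library(tidyverse)
library(openintro)
# Load data that comes with the openintro package
data(evals)
glimpse(evals)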
## Rows: 463
## Columns: 23
## $ course_id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
## $ prof_id <int> 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5,…
## $ score <dbl> 4.7, 4.1, 3.9, 4.8, 4.6, 4.3, 2.8, 4.1, 3.4, 4.5, 3.8, 4…
## $ rank <fct> tenure track, tenure track, tenure track, tenure track, …
## $ ethnicity <fct> minority, minority, minority, minority, not minority, no…
## $ gender <fct> female, female, female, female, male, male, male, male, …
## $ language <fct> english, english, english, english, english, english, en…
## $ age <int> 36, 36, 36, 36, 59, 59, 59, 51, 51, 40, 40, 40, 40, 40, …
## $ cls_perc_eval <dbl> 55.81395, 68.80000, 60.80000, 62.60163, 85.00000, 87.500…
## $ cls_did_eval <int> 24, 86, 76, 77, 17, 35, 39, 55, 111, 40, 24, 24, 17, 14,…
## $ cls_students <int> 43, 125, 125, 123, 20, 40, 44, 55, 195, 46, 27, 25, 20, …
## $ cls_level <fct> upper, upper, upper, upper, upper, upper, upper, upper, …
## $ cls_profs <fct> single, single, single, single, multiple, multiple, mult…
## $ cls_credits <fct> multi credit, multi credit, multi credit, multi credit, …
## $ bty_f1lower <int> 5, 5, 5, 5, 4, 4, 4, 5, 5, 2, 2, 2, 2, 2, 2, 2, 2, 7, 7,…
## $ bty_f1upper <int> 7, 7, 7, 7, 4, 4, 4, 2, 2, 5, 5, 5, 5, 5, 5, 5, 5, 9, 9,…
## $ bty_f2upper <int> 6, 6, 6, 6, 2, 2, 2, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 9, 9,…
## $ bty_m1lower <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 7, 7,…
## $ bty_m1upper <int> 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 6, 6,…
## $ bty_m2upper <int> 6, 6, 6, 6, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 6, 6,…
## $ bty_avg <dbl> 5.000, 5.000, 5.000, 5.000, 3.000, 3.000, 3.000, 3.333, …
## $ pic_outfit <fct> not formal, not formal, not formal, not formal, not form…
## $ pic_color <fct> color, color, color, color, color, color, color, color, …
Question:
Is this an observational study or an experiment? The original paper asks
whether beauty leads directly to differences in course
evaluations. Given how the data were collected, can we answer that
causal question? If not, re‑phrase the research question so it is
appropriate for this study design.
Answer (E1):
This is an observational study, not an experiment, because the
researchers did not randomly assign instructors to different beauty
levels or course conditions. Instead, they collected naturally occurring
course evaluations and then had independent raters score the
instructors’ photos. Because there is no random assignment, we cannot
make a strong causal claim that beauty causes higher course evaluations.
A more appropriate research question is whether instructors’ beauty
scores are associated with their course evaluation scores.
Goal: Describe the distribution of
score (course evaluation score).
ggplot(evals, aes(x = score)) +
  geom_histogram(bins = 15) +
  labs(title = "Distribution of Course Evaluation Scores",
       x = "Score",
       y = "Count")
summary(evals$score)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.300 3.800 4.300 4.175 4.600 5.000
Answer (E2):
The distribution of course evaluation scores is slightly left-skewed,
with most scores clustered toward the high end of the scale. The median
score is about 4.3 and the mean is about 4.18; a mean below the median
is consistent with a longer lower tail. Students therefore tend to give
generally high ratings. Scores range from 2.3 to 5.0, so the total
spread of the data is 2.7 points. Overall, the distribution suggests
that very low scores are relatively rare, while many courses receive
fairly high evaluations.
Goal: Pick two variables other than
score and describe their relationship.
Choose variables that make sense together (for example,
age and bty_avg, or cls_students
and cls_perc_eval, or gender and
bty_avg). Update the code below with your chosen
variables.
# Example: relationship between age and beauty (change to your chosen variables)
ggplot(evals, aes(x = age, y = bty_avg)) +
  geom_point(alpha = 0.6) +
  labs(title = "Example: Relationship between Age and Beauty Rating",
       x = "Age",
       y = "Average Beauty Rating")
Answer (E3):
I examined the relationship between [Variable X] and [Variable Y]. The
plot suggests a [positive / negative / weak / no clear] relationship: as
[X increases / changes], [Y tends to increase / decrease / stay about
the same]. The points are [tightly / loosely] clustered around a trend,
which indicates [stronger / weaker] association. There appear to be a
few potential outliers where [describe any unusual values], but they do
not drastically change the overall pattern.
Goal: Compare the basic scatterplot with one that uses jitter.
ggplot(evals, aes(x = bty_avg, y = score)) +
  geom_point() +
  labs(title = "Score vs. Beauty Rating (no jitter)",
       x = "Average Beauty Rating",
       y = "Course Score")
ggplot(evals, aes(x = bty_avg, y = score)) +
  geom_jitter(width = 0.05, height = 0.05, alpha = 0.6) +
  labs(title = "Score vs. Beauty Rating (with jitter)",
       x = "Average Beauty Rating",
       y = "Course Score")
Answer (E4):
In the plot without jitter, many points lie directly on top of each
other, so it is hard to see how many observations share the same
combination of beauty and score values. Adding jitter spreads the
overlapping points out slightly, revealing the underlying density of
points at each location. The jittered plot makes it clearer that many
courses share the same rounded scores and similar beauty ratings. The
overall relationship between beauty and score does not change, but the
jitter helps us better visualize how many observations are stacked in
each region.
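The amount of overplotting can also be checked numerically. The sketch below is one possible approach, not part of the original lab: it counts how many courses share each exact (bty_avg, score) pair. The object name overlap is an illustrative choice.

```r
library(dplyr)
library(openintro)
data(evals)

# Count how many courses share each exact (bty_avg, score) pair
overlap <- count(evals, bty_avg, score, name = "n")

# Number of plotted locations that hide more than one course
sum(overlap$n > 1)

# The most heavily stacked locations
head(arrange(overlap, desc(n)))
```

Locations with n greater than 1 are the ones where geom_point() draws a single dot for several courses.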
Goal: Fit a simple linear regression model
predicting score from bty_avg and interpret
it.
m_bty <- lm(score ~ bty_avg, data = evals)
summary(m_bty)
##
## Call:
## lm(formula = score ~ bty_avg, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9246 -0.3690 0.1420 0.3977 0.9309
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.88034 0.07614 50.96 < 2e-16 ***
## bty_avg 0.06664 0.01629 4.09 5.08e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5348 on 461 degrees of freedom
## Multiple R-squared: 0.03502, Adjusted R-squared: 0.03293
## F-statistic: 16.73 on 1 and 461 DF, p-value: 5.083e-05
Answer (E5):
The fitted model is
$$\widehat{score} = 3.88034 + 0.06664 \times bty\_avg.$$
The slope of 0.06664 means that for each additional 1-point increase in beauty rating, the predicted course evaluation score increases by about 0.067 points on average. Based on the very small p-value for bty_avg (p = 5.08e-05), there is strong statistical evidence of an association between beauty and course evaluations at the 5% significance level. However, because the score scale runs only from about 1 to 5 and the model explains just 3.5% of the variation in score (R-squared = 0.035), a change of 0.067 points is small in practical terms, even though it is statistically significant.
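To make the size of the slope concrete, the following sketch plugs two beauty ratings into the fitted line, using the coefficients printed in the summary(m_bty) output above. The helper name predict_score and the ratings 2 and 8 are illustrative assumptions, not values from the lab.

```r
# Coefficients copied from the summary(m_bty) output above
b0 <- 3.88034   # intercept
b1 <- 0.06664   # slope for bty_avg

# Hypothetical helper: predicted score for a given average beauty rating
predict_score <- function(bty_avg) b0 + b1 * bty_avg

low  <- predict_score(2)   # instructor with an average rating of 2
high <- predict_score(8)   # instructor with an average rating of 8

# A 6-point gap in beauty rating corresponds to 6 * b1 points of score
high - low
```

Even a 6-point difference in beauty rating shifts the predicted score by only about 0.40 points, which is why the effect looks small on a 1–5 scale.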
Goal: Check regression conditions for the simple
model m_bty.
par(mfrow = c(2, 2))
plot(m_bty)
par(mfrow = c(1, 1))
Answer (E6):
The Residuals vs Fitted plot shows [a roughly horizontal band / some
curvature], which suggests that the [linearity assumption is reasonable
/ there may be some non-linearity]. The spread of residuals across
fitted values appears [approximately constant / increasing /
decreasing], so the constant variance assumption seems [reasonable /
questionable]. The Normal Q–Q plot is [approximately straight / strongly
curved at the tails], indicating that the residuals are [roughly normal
/ somewhat skewed]. Independence cannot be fully checked from these
plots, but since multiple courses come from the same professor, there
may be some dependence among observations.
Goal: Explore collinearity between beauty variables.
ggplot(evals, aes(x = bty_f1lower, y = bty_avg)) +
  geom_point(alpha = 0.6) +
  labs(title = "Average Beauty vs. Rater 1 (Lower Face)",
       x = "bty_f1lower",
       y = "Average Beauty Rating")
cor(evals$bty_f1lower, evals$bty_avg, use = "complete.obs")
## [1] 0.8439112
Answer (E7):
The correlation between bty_f1lower and bty_avg is about 0.84, which
indicates a strong positive
relationship. This means that professors who are rated as more
attractive on the lower-face scale by rater 1 also tend to have higher
average beauty scores overall. Because the correlation is so high, these
two variables contain very similar information. Including many highly
correlated beauty variables in the same regression model could lead to
collinearity, making it harder to interpret individual coefficients and
potentially inflating standard errors.
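To see how far the overlap extends beyond this one pair, a correlation matrix of all the beauty columns can be computed in one step. This is a minimal sketch, assuming every column whose name starts with bty_ is numeric, as the glimpse() output above suggests. The object names bty_cols and bty_cor are illustrative.

```r
library(openintro)
data(evals)

# All columns whose names start with "bty_": six individual ratings plus the average
bty_cols <- grep("^bty_", names(evals), value = TRUE)

# Pairwise correlations, rounded for readability
bty_cor <- round(cor(evals[, bty_cols]), 2)
bty_cor
```

Large off-diagonal entries in bty_cor flag pairs of ratings that would carry largely redundant information if entered together in one model.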
Goal: Fit a multiple regression model with beauty and gender.
m_bty_gen <- lm(score ~ bty_avg + gender, data = evals)
summary(m_bty_gen)
##
## Call:
## lm(formula = score ~ bty_avg + gender, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8305 -0.3625 0.1055 0.4213 0.9314
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.74734 0.08466 44.266 < 2e-16 ***
## bty_avg 0.07416 0.01625 4.563 6.48e-06 ***
## gendermale 0.17239 0.05022 3.433 0.000652 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5287 on 460 degrees of freedom
## Multiple R-squared: 0.05912, Adjusted R-squared: 0.05503
## F-statistic: 14.45 on 2 and 460 DF, p-value: 8.177e-07
Answer (E8):
After adding gender, bty_avg is still a significant predictor of score
(p = 6.48e-06), and its estimate rises slightly from 0.06664 in the
simple model to 0.07416. The coefficient for gendermale is 0.17239
(p = 0.000652), so for a given beauty rating, male instructors are
predicted to score about 0.17 points higher than female instructors.
The adjusted R-squared increases from 0.033 to 0.055, so the model still
explains only a small share of the variation in score.
Goal: Understand how a categorical variable with two levels (picture color) affects the regression line.
ggplot(evals, aes(x = bty_avg, y = score, color = pic_color)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "Score vs. Beauty by Picture Color",
       x = "Average Beauty Rating",
       y = "Course Score")
Answer (E9):
When we include pic_color in the model, the line for the reference category (e.g., black-and-white photos) has equation
$$\widehat{score} = [intercept\_BW] + [slope] \times bty\_avg,$$
while the line for color photos adds the pic_color coefficient to the intercept:
$$\widehat{score} = ([intercept\_BW] + [coef\_color]) + [slope] \times bty\_avg.$$
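The placeholders in these two equations can be read off a fitted additive model. The sketch below is one way to do that, assuming black-and-white is the reference level so that the coefficient is named pic_colorcolor, as in the full-model output later in this lab. The model name m_bty_pic is a hypothetical choice.

```r
library(openintro)
data(evals)

# Additive model: one shared slope, intercept shifted by picture color
m_bty_pic <- lm(score ~ bty_avg + pic_color, data = evals)
b <- coef(m_bty_pic)

# Intercept and slope of each line
line_bw    <- c(intercept = unname(b["(Intercept)"]),
                slope     = unname(b["bty_avg"]))
line_color <- c(intercept = unname(b["(Intercept)"] + b["pic_colorcolor"]),
                slope     = unname(b["bty_avg"]))
rbind(black_and_white = line_bw, color = line_color)
```

Note that geom_smooth(method = "lm") with a color aesthetic fits a separate regression within each group, so the plotted lines need not be parallel. The additive model above forces a common slope.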
Goal: Add rank (with three levels) to
the model and see how R handles it.
m_bty_rank <- lm(score ~ bty_avg + rank, data = evals)
summary(m_bty_rank)
##
## Call:
## lm(formula = score ~ bty_avg + rank, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8713 -0.3642 0.1489 0.4103 0.9525
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.98155 0.09078 43.860 < 2e-16 ***
## bty_avg 0.06783 0.01655 4.098 4.92e-05 ***
## ranktenure track -0.16070 0.07395 -2.173 0.0303 *
## ranktenured -0.12623 0.06266 -2.014 0.0445 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5328 on 459 degrees of freedom
## Multiple R-squared: 0.04652, Adjusted R-squared: 0.04029
## F-statistic: 7.465 on 3 and 459 DF, p-value: 6.88e-05
Answer (E10):
In the model with rank, R creates indicator variables for each
non-reference rank level, ranktenure track and ranktenured, while the
omitted level, teaching, is the reference category. The intercept
(3.98155) is the predicted score for a professor in the teaching group
when bty_avg = 0. The coefficient ranktenured = -0.12623 means that,
holding beauty constant, tenured professors are predicted to score about
0.13 points lower than teaching professors; likewise, tenure-track
professors are predicted to score about 0.16 points lower. The p-values
(0.0445 and 0.0303) indicate that each of these levels differs from the
teaching baseline at the 5% significance level.
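The indicator coding can be inspected directly with model.matrix(). The following sketch uses a three-row toy data frame (an illustrative assumption, not the evals data) with the level order set explicitly to match the lab output.

```r
# Toy data: one row per rank level, with teaching listed first so it is the reference
toy <- data.frame(
  rank = factor(c("teaching", "tenure track", "tenured"),
                levels = c("teaching", "tenure track", "tenured"))
)

levels(toy$rank)   # the first level is the reference category

# Design matrix that lm() would build for a model containing rank
X <- model.matrix(~ rank, data = toy)
colnames(X)
X
```

The teaching row has zeros in both indicator columns, which is why its predicted score is carried entirely by the intercept and the other predictors.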
Goal: Look ahead to a larger model and think about which variable might not matter.
Before fitting the full model, think about the list of predictors and
write down which variable you expect to have the highest p‑value (i.e.,
least evidence of association with score) and
why.
Answer (E11):
Before fitting the full model, I predict that [name the variable] will
have the largest p-value and thus the weakest evidence of association
with score. My reasoning is that this variable seems [less directly
related to teaching quality or student perceptions / more random / less
conceptually relevant] compared to the others. For example, [give a
short explanation, like “whether the professor is wearing a formal
outfit might not strongly affect how students rate the course
overall.”]
Goal: Fit the full model given in the lab and inspect the output.
m_full <- lm(score ~ rank + gender + ethnicity + language + age +
               cls_perc_eval + cls_students + cls_level + cls_profs +
               cls_credits + bty_avg + pic_outfit + pic_color,
             data = evals)
summary(m_full)
##
## Call:
## lm(formula = score ~ rank + gender + ethnicity + language + age +
## cls_perc_eval + cls_students + cls_level + cls_profs + cls_credits +
## bty_avg + pic_outfit + pic_color, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.77397 -0.32432 0.09067 0.35183 0.95036
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.0952141 0.2905277 14.096 < 2e-16 ***
## ranktenure track -0.1475932 0.0820671 -1.798 0.07278 .
## ranktenured -0.0973378 0.0663296 -1.467 0.14295
## gendermale 0.2109481 0.0518230 4.071 5.54e-05 ***
## ethnicitynot minority 0.1234929 0.0786273 1.571 0.11698
## languagenon-english -0.2298112 0.1113754 -2.063 0.03965 *
## age -0.0090072 0.0031359 -2.872 0.00427 **
## cls_perc_eval 0.0053272 0.0015393 3.461 0.00059 ***
## cls_students 0.0004546 0.0003774 1.205 0.22896
## cls_levelupper 0.0605140 0.0575617 1.051 0.29369
## cls_profssingle -0.0146619 0.0519885 -0.282 0.77806
## cls_creditsone credit 0.5020432 0.1159388 4.330 1.84e-05 ***
## bty_avg 0.0400333 0.0175064 2.287 0.02267 *
## pic_outfitnot formal -0.1126817 0.0738800 -1.525 0.12792
## pic_colorcolor -0.2172630 0.0715021 -3.039 0.00252 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.498 on 448 degrees of freedom
## Multiple R-squared: 0.1871, Adjusted R-squared: 0.1617
## F-statistic: 7.366 on 14 and 448 DF, p-value: 6.552e-14
Answer (E12):
In the full model including all predictors, the variable with the
largest p-value is cls_profs (p = 0.778 for cls_profssingle), indicating
the least statistical evidence of association with score after adjusting
for all other variables. I originally guessed that [your guess] would be least
important, and this guess was [correct/incorrect]. This suggests that
[brief interpretation: either your intuition matched the data, or the
data revealed that some other variable is even less related to course
scores than you expected].
Goal: Interpret the coefficient for ethnicity in the full model.
Answer (E13):
In the full model, minority is the reference category for ethnicity, and
the coefficient for ethnicitynot minority represents the difference in
predicted course score between the two groups, holding all other
variables constant. The coefficient is 0.1235, so non-minority
professors are predicted to score about 0.12 points higher than minority
professors with the same values of beauty, rank, class size, etc. The
corresponding p-value (0.117) shows that this difference is not
statistically significant at the 5% level. In practical terms, a
difference of 0.12 points on a 1–5 scale is small.
Goal: Refit the model after dropping the least useful variable.
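Starting from m_full, identify the predictor with the largest
p‑value, then remove it from the formula and refit.
# Template: this formula is still identical to m_full; delete the chosen predictor
m_drop1 <- lm(score ~ rank + gender + ethnicity + language + age +
                cls_perc_eval + cls_students + cls_level + cls_profs +
                cls_credits + bty_avg + pic_outfit + pic_color,
              data = evals)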
summary(m_drop1)
##
## Call:
## lm(formula = score ~ rank + gender + ethnicity + language + age +
## cls_perc_eval + cls_students + cls_level + cls_profs + cls_credits +
## bty_avg + pic_outfit + pic_color, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.77397 -0.32432 0.09067 0.35183 0.95036
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.0952141 0.2905277 14.096 < 2e-16 ***
## ranktenure track -0.1475932 0.0820671 -1.798 0.07278 .
## ranktenured -0.0973378 0.0663296 -1.467 0.14295
## gendermale 0.2109481 0.0518230 4.071 5.54e-05 ***
## ethnicitynot minority 0.1234929 0.0786273 1.571 0.11698
## languagenon-english -0.2298112 0.1113754 -2.063 0.03965 *
## age -0.0090072 0.0031359 -2.872 0.00427 **
## cls_perc_eval 0.0053272 0.0015393 3.461 0.00059 ***
## cls_students 0.0004546 0.0003774 1.205 0.22896
## cls_levelupper 0.0605140 0.0575617 1.051 0.29369
## cls_profssingle -0.0146619 0.0519885 -0.282 0.77806
## cls_creditsone credit 0.5020432 0.1159388 4.330 1.84e-05 ***
## bty_avg 0.0400333 0.0175064 2.287 0.02267 *
## pic_outfitnot formal -0.1126817 0.0738800 -1.525 0.12792
## pic_colorcolor -0.2172630 0.0715021 -3.039 0.00252 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.498 on 448 degrees of freedom
## Multiple R-squared: 0.1871, Adjusted R-squared: 0.1617
## F-statistic: 7.366 on 14 and 448 DF, p-value: 6.552e-14
(Be sure to actually remove the variable you chose from the formula above.)
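One way to drop a predictor without retyping the whole formula is update(). The sketch below assumes the variable chosen is cls_profs, which has the largest p-value (0.778) in the m_full output above; substitute a different name if you chose another variable.

```r
library(openintro)
data(evals)

m_full <- lm(score ~ rank + gender + ethnicity + language + age +
               cls_perc_eval + cls_students + cls_level + cls_profs +
               cls_credits + bty_avg + pic_outfit + pic_color,
             data = evals)

# Refit without cls_profs; ". ~ . - cls_profs" keeps every other term unchanged
m_drop1 <- update(m_full, . ~ . - cls_profs)

# Side-by-side comparison of the coefficients the two models share
shared <- names(coef(m_drop1))
round(cbind(full = coef(m_full)[shared], drop1 = coef(m_drop1)[shared]), 4)
```

The comparison table makes it easy to judge whether the remaining coefficients moved by a small or a noticeable amount.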
Answer (E14):
I removed [variable name] from the full model because it had the largest
p-value and appeared to contribute the least. After refitting the model
without this variable, the key coefficients (such as bty_avg, gender,
and cls_perc_eval) changed by [only a small amount / a noticeable
amount] in both magnitude and p-values. This suggests that [if changes
are small:] the dropped variable was not strongly collinear with the
remaining predictors and did not contribute much unique information; [if
changes are larger:] the dropped variable shared information with
others, and its removal altered how the remaining predictors explain
variation in score.
Goal: Use backward selection (based on p‑values) to find a simpler model.
Starting from m_full, repeatedly remove the variable
with the highest p‑value (above a chosen threshold, such as 0.05 or
0.10) and refit the model until only predictors with reasonably small
p‑values remain.
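The procedure described above can be automated. The following is a minimal sketch under stated assumptions, not the lab's prescribed method: it uses drop1() with an F-test and a threshold of 0.05 (alpha and m_current are illustrative names). For a single-coefficient predictor the F-test p-value equals the t-test p-value in summary(), but for rank, which has two coefficients, drop1() tests both together.

```r
library(openintro)
data(evals)

m_current <- lm(score ~ rank + gender + ethnicity + language + age +
                  cls_perc_eval + cls_students + cls_level + cls_profs +
                  cls_credits + bty_avg + pic_outfit + pic_color,
                data = evals)
alpha <- 0.05   # assumed threshold

repeat {
  # F-test for dropping each term; the first row is "<none>"
  tests <- drop1(m_current, test = "F")
  pvals <- setNames(tests[["Pr(>F)"]][-1], rownames(tests)[-1])
  if (length(pvals) == 0 || max(pvals) <= alpha) break

  # Remove the term with the largest p-value and refit
  worst <- names(which.max(pvals))
  message("Dropping ", worst, " (p = ", signif(max(pvals), 3), ")")
  m_current <- update(m_current, as.formula(paste(". ~ . -", worst)))
}

formula(m_current)
```

Refitting after every removal matters, because the p-values of the remaining predictors change each time a term is dropped.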
Below, place only your final chosen model and its summary.
# Replace the formula with your final chosen set of predictors
m_final <- lm(score ~ bty_avg + gender + cls_perc_eval + cls_credits,
              data = evals)
summary(m_final)
##
## Call:
## lm(formula = score ~ bty_avg + gender + cls_perc_eval + cls_credits,
## data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8421 -0.3384 0.1046 0.3841 1.0547
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.375919 0.128466 26.279 < 2e-16 ***
## bty_avg 0.072030 0.015932 4.521 7.84e-06 ***
## gendermale 0.176206 0.048679 3.620 0.000328 ***
## cls_perc_eval 0.004729 0.001451 3.258 0.001204 **
## cls_creditsone credit 0.457260 0.102579 4.458 1.04e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5107 on 458 degrees of freedom
## Multiple R-squared: 0.1259, Adjusted R-squared: 0.1182
## F-statistic: 16.49 on 4 and 458 DF, p-value: 1.25e-12
Answer (E15):
After performing backward selection based on p-values, my final model
includes bty_avg, gender, cls_perc_eval, and cls_credits. The fitted
regression equation can be written as
$$\widehat{score} = 3.3759 + 0.0720 \times bty\_avg + 0.1762 \times gender_{male} + 0.0047 \times cls\_perc\_eval + 0.4573 \times cls\_credits_{one\ credit},$$
where the gender and credit terms are indicators equal to 1 for male instructors and one-credit courses, respectively.
Goal: Check regression conditions for the final model.
par(mfrow = c(2, 2))
plot(m_final)
par(mfrow = c(1, 1))
Answer (E16):
For the final model, the Residuals vs Fitted plot appears [roughly
horizontal / mildly curved], which suggests that the linearity
assumption is [reasonably satisfied / somewhat questionable]. The spread
of residuals across fitted values is [fairly constant / shows some
funnel-shaped pattern], so the constant variance assumption seems
[reasonable / potentially violated]. The Normal Q–Q plot is [close to a
straight line / clearly curved], indicating that the residuals are
[approximately normal / somewhat non-normal]. As before, independence
cannot be fully verified from these plots, but the presence of multiple
courses from the same professor may mean that some observations are
correlated.
Goal: Think about independence more carefully given how the data were sampled.
Answer (E17):
Because several courses can belong to the same professor, the
observations are not a simple random sample of independent individuals.
Courses taught by the same instructor may tend to have similar
evaluation scores due to instructor characteristics that persist across
classes. This clustering violates the strict independence assumption
underlying standard linear regression, which can lead to underestimated
standard errors and p-values that are too small. A more advanced model
(such as a mixed-effects model) would be better suited to account for
this structure.
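As an illustration of what such a model could look like, the sketch below adds a random intercept for each professor using lmer() from the lme4 package. The package choice, the model name m_mixed, and the fixed-effect predictors (taken from the m_final formula above) are assumptions for illustration; this goes beyond what the lab asks for.

```r
library(lme4)
library(openintro)
data(evals)

# Random intercept per professor: courses taught by the same instructor share a baseline
m_mixed <- lmer(score ~ bty_avg + gender + cls_perc_eval + cls_credits +
                  (1 | prof_id),
                data = evals)
summary(m_mixed)
```

The (1 | prof_id) term lets the model attribute part of the variation in score to differences between professors rather than treating every course as independent.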
Goal: Describe a “high‑scoring” professor/course based on your final model.
Answer (E18):
Based on the final model, a high-scoring course is likely to be taught
by a professor with a higher beauty rating, in a class where a larger
percentage of students complete the evaluation. Holding the other
factors constant, male instructors have predicted scores about 0.18
points higher than female instructors, and one-credit courses have
predicted scores about 0.46 points higher than multi-credit courses.
Rank, class size, and picture characteristics do not appear in the final
model. In general, the model suggests that both instructor-related
variables (beauty rating, gender) and course-level variables (evaluation
response rate, credit level) play a role in how students rate their
classes.
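To compare profiles directly, the sketch below plugs two made-up courses into the fitted equation, using the coefficients printed in the summary(m_final) output above. The helper predict_final and both profiles are illustrative assumptions, not observations from the data.

```r
# Coefficients copied from the summary(m_final) output above
coefs <- c(intercept = 3.375919, bty_avg = 0.072030, gendermale = 0.176206,
           cls_perc_eval = 0.004729, one_credit = 0.457260)

# Hypothetical helper: predicted score for one course profile
# (male and one_credit are 0/1 indicators)
predict_final <- function(bty_avg, male, cls_perc_eval, one_credit) {
  unname(coefs["intercept"] +
           coefs["bty_avg"] * bty_avg +
           coefs["gendermale"] * male +
           coefs["cls_perc_eval"] * cls_perc_eval +
           coefs["one_credit"] * one_credit)
}

# Two made-up profiles for comparison
favorable   <- predict_final(bty_avg = 7, male = 1, cls_perc_eval = 90, one_credit = 1)
unfavorable <- predict_final(bty_avg = 3, male = 0, cls_perc_eval = 50, one_credit = 0)
c(favorable = favorable, unfavorable = unfavorable)
```

The favorable profile is predicted at about 4.94 and the unfavorable one at about 3.83, a gap of roughly 1.1 points on the 1–5 scale.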
Goal: Consider generalizability.
Answer (E19):
These findings are based on data from a single university (UT Austin)
during a specific time period, so we must be cautious about generalizing
to all universities and contexts. Other institutions may differ in
student demographics, campus culture, and evaluation practices, which
could change the relationship between instructor appearance and course
ratings. In addition, the study relies on beauty ratings from a
particular group of raters, which may not reflect perceptions in other
populations. Overall, the results mainly generalize to similar types of
courses and institutions, and we should avoid claiming that the same
patterns necessarily hold everywhere.