Setup

In this lab we work with the evals dataset from the openintro package.
Each row represents a single course taught at the University of Texas at Austin, and columns describe the course, the instructor, and students’ evaluations.

# Load packages, then the data that comes with the openintro package
library(tidyverse)  # glimpse() and ggplot()
library(openintro)  # evals
data(evals)
glimpse(evals)
## Rows: 463
## Columns: 23
## $ course_id     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
## $ prof_id       <int> 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5,…
## $ score         <dbl> 4.7, 4.1, 3.9, 4.8, 4.6, 4.3, 2.8, 4.1, 3.4, 4.5, 3.8, 4…
## $ rank          <fct> tenure track, tenure track, tenure track, tenure track, …
## $ ethnicity     <fct> minority, minority, minority, minority, not minority, no…
## $ gender        <fct> female, female, female, female, male, male, male, male, …
## $ language      <fct> english, english, english, english, english, english, en…
## $ age           <int> 36, 36, 36, 36, 59, 59, 59, 51, 51, 40, 40, 40, 40, 40, …
## $ cls_perc_eval <dbl> 55.81395, 68.80000, 60.80000, 62.60163, 85.00000, 87.500…
## $ cls_did_eval  <int> 24, 86, 76, 77, 17, 35, 39, 55, 111, 40, 24, 24, 17, 14,…
## $ cls_students  <int> 43, 125, 125, 123, 20, 40, 44, 55, 195, 46, 27, 25, 20, …
## $ cls_level     <fct> upper, upper, upper, upper, upper, upper, upper, upper, …
## $ cls_profs     <fct> single, single, single, single, multiple, multiple, mult…
## $ cls_credits   <fct> multi credit, multi credit, multi credit, multi credit, …
## $ bty_f1lower   <int> 5, 5, 5, 5, 4, 4, 4, 5, 5, 2, 2, 2, 2, 2, 2, 2, 2, 7, 7,…
## $ bty_f1upper   <int> 7, 7, 7, 7, 4, 4, 4, 2, 2, 5, 5, 5, 5, 5, 5, 5, 5, 9, 9,…
## $ bty_f2upper   <int> 6, 6, 6, 6, 2, 2, 2, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 9, 9,…
## $ bty_m1lower   <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 7, 7,…
## $ bty_m1upper   <int> 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 6, 6,…
## $ bty_m2upper   <int> 6, 6, 6, 6, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 6, 6,…
## $ bty_avg       <dbl> 5.000, 5.000, 5.000, 5.000, 3.000, 3.000, 3.000, 3.333, …
## $ pic_outfit    <fct> not formal, not formal, not formal, not formal, not form…
## $ pic_color     <fct> color, color, color, color, color, color, color, color, …

Exercise 1

Question:
Is this an observational study or an experiment? The original paper asks whether beauty leads directly to differences in course evaluations. Given how the data were collected, can we answer that causal question? If not, rephrase the research question so that it is appropriate for this study design.

Answer (E1):
This is an observational study, not an experiment, because the researchers did not randomly assign instructors to different beauty levels or course conditions. Instead, they collected naturally occurring course evaluations and then had independent raters score the instructors’ photos. Because there is no random assignment, we cannot make a strong causal claim that beauty causes higher course evaluations. A more appropriate research question is whether instructors’ beauty scores are associated with their course evaluation scores.


Exercise 2

Goal: Describe the distribution of score (course evaluation score).

ggplot(evals, aes(x = score)) +
  geom_histogram(bins = 15) +
  labs(title = "Distribution of Course Evaluation Scores",
       x = "Score",
       y = "Count")

summary(evals$score)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.300   3.800   4.300   4.175   4.600   5.000

Answer (E2):
The distribution of course evaluation scores is [roughly symmetric / slightly left-skewed / slightly right-skewed] with most scores clustered toward the [high / mid] end of the scale. The median score is about [median from summary()], and the mean is about [mean from summary()], suggesting that students tend to give [generally high / moderate] ratings. Scores range from about [min] to [max], so the total spread of the data is about [max − min] points. Overall, the distribution suggests that very low scores are relatively [rare / common], while many courses receive fairly high evaluations.
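The mean-versus-median comparison behind the skewness judgment can be checked directly. The snippet below is a minimal sketch; the name score_stats is introduced here for illustration and is not part of the lab.

```r
library(openintro)  # provides the evals data
data(evals)

# Centre and spread of the evaluation scores
score_stats <- c(
  mean   = mean(evals$score),
  median = median(evals$score),
  range  = diff(range(evals$score))  # max minus min
)
round(score_stats, 3)

# A mean below the median is one informal sign of left skew
score_stats[["mean"]] < score_stats[["median"]]
```

With the summary() output shown above (mean 4.175, median 4.300), the final comparison is TRUE. This is only a rough indicator, so the histogram should still drive the description of shape.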

Exercise 3

Goal: Pick two variables other than score and describe their relationship.

Choose variables that make sense together (for example, age and bty_avg, or cls_students and cls_perc_eval, or gender and bty_avg). Update the code below with your chosen variables.

# Example: relationship between age and beauty (change to your chosen variables)
ggplot(evals, aes(x = age, y = bty_avg)) +
  geom_point(alpha = 0.6) +
  labs(title = "Example: Relationship between Age and Beauty Rating",
       x = "Age",
       y = "Average Beauty Rating")

Answer (E3):
I examined the relationship between [Variable X] and [Variable Y]. The plot suggests a [positive / negative / weak / no clear] relationship: as [X increases / changes], [Y tends to increase / decrease / stay about the same]. The points are [tightly / loosely] clustered around a trend, which indicates [stronger / weaker] association. There appear to be a few potential outliers where [describe any unusual values], but they do not drastically change the overall pattern.
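A numeric summary can back up what the plot suggests. The sketch below assumes the example pair (age and bty_avg) and, for a categorical pairing, gender and bty_avg; the object names r_age_bty and bty_by_gender are illustrative only.

```r
library(openintro)
data(evals)

# Pearson correlation for two numeric variables; swap in your own pair
r_age_bty <- cor(evals$age, evals$bty_avg)
r_age_bty

# For a categorical and a numeric variable, compare group means instead
bty_by_gender <- tapply(evals$bty_avg, evals$gender, mean)
round(bty_by_gender, 2)
```

The sign of the correlation gives the direction of the relationship and its absolute value gives a rough sense of strength, which maps onto the [positive / negative] and [tightly / loosely] choices in the answer.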


Exercise 4

Goal: Compare the basic scatterplot with one that uses jitter.

ggplot(evals, aes(x = bty_avg, y = score)) +
  geom_point() +
  labs(title = "Score vs. Beauty Rating (no jitter)",
       x = "Average Beauty Rating",
       y = "Course Score")

ggplot(evals, aes(x = bty_avg, y = score)) +
  geom_jitter(width = 0.05, height = 0.05, alpha = 0.6) +
  labs(title = "Score vs. Beauty Rating (with jitter)",
       x = "Average Beauty Rating",
       y = "Course Score")

Answer (E4):
In the plot without jitter, many points lie directly on top of each other, so it is hard to see how many observations share the same combination of beauty and score values. Adding jitter spreads the overlapping points out slightly, revealing the underlying density of points at each location. The jittered plot makes it clearer that many courses share the same rounded scores and similar beauty ratings. The overall relationship between beauty and score does not change, but the jitter helps us better visualize how many observations are stacked in each region.
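How much overplotting there is can also be counted rather than judged by eye. This is a minimal sketch; xy, n_distinct, and n_overlapping are names made up for the example.

```r
library(openintro)
data(evals)

xy <- evals[, c("bty_avg", "score")]

# Rows that repeat an earlier (bty_avg, score) pair are drawn on top of it
n_overlapping <- sum(duplicated(xy))
n_distinct    <- nrow(unique(xy))

c(rows = nrow(xy), distinct = n_distinct, overlapping = n_overlapping)
```

Every row counted as overlapping is hidden behind another point in the plain geom_point() plot and becomes visible once jitter is added.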


Exercise 5

Goal: Fit a simple linear regression model predicting score from bty_avg and interpret it.

m_bty <- lm(score ~ bty_avg, data = evals)
summary(m_bty)
## 
## Call:
## lm(formula = score ~ bty_avg, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9246 -0.3690  0.1420  0.3977  0.9309 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.88034    0.07614   50.96  < 2e-16 ***
## bty_avg      0.06664    0.01629    4.09 5.08e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5348 on 461 degrees of freedom
## Multiple R-squared:  0.03502,    Adjusted R-squared:  0.03293 
## F-statistic: 16.73 on 1 and 461 DF,  p-value: 5.083e-05

Answer (E5):

The fitted model is

$$\widehat{\text{score}} = [\text{intercept}] + [\text{slope}] \times \text{bty\_avg}.$$

The slope of [slope] means that for each 1-point increase in average beauty rating, the predicted course evaluation score changes by about [slope] points. Based on the [very small / moderate / large] p-value for bty_avg (p = [p-value]), there is [strong / some / little] statistical evidence of an association between beauty and course evaluations at the 5% significance level. However, because the score scale is only about 1–5, a change of [slope] points may be viewed as [small / moderate] in practical terms, even if it is statistically significant.
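The numbers needed for the blanks can be pulled from the fitted model instead of being copied by hand. The sketch below refits m_bty so that it runs on its own; the comparison of ratings 2 and 8 is a hypothetical example chosen to illustrate practical size.

```r
library(openintro)
data(evals)

m_bty <- lm(score ~ bty_avg, data = evals)

b       <- coef(m_bty)  # named vector: (Intercept), bty_avg
p_slope <- summary(m_bty)$coefficients["bty_avg", "Pr(>|t|)"]

# Predicted gap between a professor rated 8 and one rated 2 (hypothetical values)
gap_2_to_8 <- unname(b["bty_avg"] * (8 - 2))

round(b, 5)
signif(p_slope, 3)
round(gap_2_to_8, 2)
```

Using the slope printed in the summary, 0.06664 × 6 is about 0.40 points, so even a six-point difference in beauty rating corresponds to less than half a point on the evaluation scale.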


Exercise 6

Goal: Check regression conditions for the simple model m_bty.

par(mfrow = c(2, 2))
plot(m_bty)

par(mfrow = c(1, 1))

Answer (E6):
The Residuals vs Fitted plot shows [a roughly horizontal band / some curvature], which suggests that the [linearity assumption is reasonable / there may be some non-linearity]. The spread of residuals across fitted values appears [approximately constant / increasing / decreasing], so the constant variance assumption seems [reasonable / questionable]. The Normal Q–Q plot is [approximately straight / strongly curved at the tails], indicating that the residuals are [roughly normal / somewhat skewed]. Independence cannot be fully checked from these plots, but since multiple courses come from the same professor, there may be some dependence among observations.

Exercise 7

Goal: Explore collinearity between beauty variables.

ggplot(evals, aes(x = bty_f1lower, y = bty_avg)) +
  geom_point(alpha = 0.6) +
  labs(title = "Average Beauty vs. Rating from Lower-Level Female Student",
       x = "bty_f1lower",
       y = "Average Beauty Rating")

cor(evals$bty_f1lower, evals$bty_avg, use = "complete.obs")
## [1] 0.8439112

Answer (E7):
The correlation between bty_f1lower and bty_avg is about [correlation value], which indicates a [very strong / strong / moderate] positive relationship. This means that professors who receive a higher beauty rating from the lower-level female student rater (bty_f1lower) also tend to have higher average beauty scores overall. Part of this overlap is built in, because bty_avg is the average of the six individual ratings (for the first course, (5 + 7 + 6 + 2 + 4 + 6) / 6 = 5). Because the correlation is so high, these two variables contain very similar information. Including many highly correlated beauty variables in the same regression model could lead to collinearity, making it harder to interpret individual coefficients and potentially inflating standard errors.
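To see whether the same pattern holds for the other raters, all beauty columns can be correlated at once. This is a sketch that assumes every column whose name starts with bty_ is numeric, as the glimpse() output indicates; bty_cols and bty_cor are illustrative names.

```r
library(openintro)
data(evals)

# All seven beauty columns: six individual ratings plus their average
bty_cols <- grep("^bty_", names(evals), value = TRUE)

# Pairwise Pearson correlations
bty_cor <- cor(evals[, bty_cols])
round(bty_cor, 2)
```

The bty_avg row of this matrix shows how strongly each individual rating tracks the average, which is the reason to keep only bty_avg in later models.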


Exercise 8

Goal: Fit a multiple regression model with beauty and gender.

m_bty_gen <- lm(score ~ bty_avg + gender, data = evals)
summary(m_bty_gen)
## 
## Call:
## lm(formula = score ~ bty_avg + gender, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8305 -0.3625  0.1055  0.4213  0.9314 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.74734    0.08466  44.266  < 2e-16 ***
## bty_avg      0.07416    0.01625   4.563 6.48e-06 ***
## gendermale   0.17239    0.05022   3.433 0.000652 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5287 on 460 degrees of freedom
## Multiple R-squared:  0.05912,    Adjusted R-squared:  0.05503 
## F-statistic: 14.45 on 2 and 460 DF,  p-value: 8.177e-07

Answer (E8):

In the model score ~ bty_avg + gender, the intercept represents the predicted score for the reference gender (female, since R reports a gendermale coefficient) when bty_avg = 0. It is mostly a mathematical anchor, because beauty ratings in this dataset run from 1 to 10, so a rating of zero lies outside the scale. The slope for bty_avg shows how the predicted score changes for a 1-unit increase in beauty rating, holding gender constant, and it is about [slope value]. The coefficient for gendermale represents the difference in predicted score between male and female professors with the same beauty rating: if the coefficient is [positive/negative], male professors are predicted to score about [coefficient value] points [higher/lower] than female professors. The p-values indicate that [state which predictors] are statistically significant at the 5% level.
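The meaning of the gendermale coefficient can be confirmed by predicting for two otherwise identical professors. The sketch below refits the model so that it is self-contained; the rating of 5 is an arbitrary choice, and same_bty and gender_gap are names invented for the example.

```r
library(openintro)
data(evals)

m_bty_gen <- lm(score ~ bty_avg + gender, data = evals)

# Two hypothetical professors with the same beauty rating
same_bty <- data.frame(
  bty_avg = c(5, 5),
  gender  = factor(c("female", "male"), levels = levels(evals$gender))
)

pred <- predict(m_bty_gen, newdata = same_bty)
names(pred) <- c("female", "male")
pred

# The gap equals the gendermale coefficient, whatever value bty_avg is held at
gender_gap <- unname(pred["male"] - pred["female"])
gender_gap
```

Changing the shared bty_avg value moves both predictions by the same amount and leaves the gap untouched, which is what "holding beauty constant" means in this additive model.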

Exercise 9

Goal: Understand how a categorical variable with two levels (picture color) affects the regression line.

ggplot(evals, aes(x = bty_avg, y = score, color = pic_color)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "Score vs. Beauty by Picture Color",
       x = "Average Beauty Rating",
       y = "Course Score")

Answer (E9):

When we include pic_color in the model, the line for the reference category (e.g., black-and-white photos) has equation

$$\widehat{\text{score}} = [\text{intercept\_BW}] + [\text{slope}] \times \text{bty\_avg},$$

while the line for color photos adds the pic_color coefficient to the intercept:

$$\widehat{\text{score}} = \left([\text{intercept\_BW}] + [\text{coef\_color}]\right) + [\text{slope}] \times \text{bty\_avg}.$$

This means that for two professors with the same beauty rating, the one with a [color / black-and-white] photo is predicted to score about [coef_color] points [higher/lower] than the other. The p-value for pic_color shows that this difference is [statistically significant / not clearly significant].
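The two equations above describe a parallel-lines model, whereas geom_smooth(method = "lm") with a color aesthetic fits a separate regression to each group, so the plotted lines need not share a slope. The values for the blanks come from the additive model, sketched below; m_bty_pic and lines_by_color are hypothetical names, since the lab text does not show this fit.

```r
library(openintro)
data(evals)

# Additive model: one shared slope, intercept shifted by photo type
m_bty_pic <- lm(score ~ bty_avg + pic_color, data = evals)
b <- coef(m_bty_pic)

# Intercept and slope of the line for each photo type
lines_by_color <- rbind(
  black_and_white = c(intercept = b[["(Intercept)"]],
                      slope     = b[["bty_avg"]]),
  color           = c(intercept = b[["(Intercept)"]] + b[["pic_colorcolor"]],
                      slope     = b[["bty_avg"]])
)
lines_by_color

# p-value for the pic_color shift
summary(m_bty_pic)$coefficients["pic_colorcolor", "Pr(>|t|)"]
```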

Exercise 10

Goal: Add rank (with three levels) to the model and see how R handles it.

m_bty_rank <- lm(score ~ bty_avg + rank, data = evals)
summary(m_bty_rank)
## 
## Call:
## lm(formula = score ~ bty_avg + rank, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8713 -0.3642  0.1489  0.4103  0.9525 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       3.98155    0.09078  43.860  < 2e-16 ***
## bty_avg           0.06783    0.01655   4.098 4.92e-05 ***
## ranktenure track -0.16070    0.07395  -2.173   0.0303 *  
## ranktenured      -0.12623    0.06266  -2.014   0.0445 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5328 on 459 degrees of freedom
## Multiple R-squared:  0.04652,    Adjusted R-squared:  0.04029 
## F-statistic: 7.465 on 3 and 459 DF,  p-value: 6.88e-05

Answer (E10):
In the model with rank, R creates indicator variables for each non-reference rank level, such as ranktenure track and ranktenured, while the omitted level (e.g., teaching) is the reference category. The intercept corresponds to the predicted score for a professor in the [reference rank] group when bty_avg = 0. A coefficient like ranktenured = [value] means that, holding beauty constant, tenured professors are predicted to have scores [value] points [higher/lower] than the reference group. The p-values for these coefficients indicate whether each rank level differs significantly from the baseline in predicted score.
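Which level acts as the reference is determined by the factor's level order, and it can be changed. The sketch below works on a copy of the data so that evals itself is not modified; evals_relevel and m_bty_rank_alt are illustrative names.

```r
library(openintro)
data(evals)

levels(evals$rank)  # the first level listed is the reference category

# Copy the data and make "tenured" the reference instead
evals_relevel <- evals
evals_relevel$rank <- relevel(evals_relevel$rank, ref = "tenured")

m_bty_rank_alt <- lm(score ~ bty_avg + rank, data = evals_relevel)
round(coef(m_bty_rank_alt), 5)
```

Releveling changes only how the same fit is expressed: the new rankteaching coefficient is the negative of ranktenured from the original output (0.12623 instead of -0.12623), and the bty_avg slope stays the same.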


Exercise 11

Goal: Look ahead to a larger model and think about which variable might not matter.

Before fitting the full model, think about the list of predictors and write down which variable you expect to have the highest p‑value (i.e., least evidence of association with score) and why.

Answer (E11):
Before fitting the full model, I predict that [name the variable] will have the largest p-value and thus the weakest evidence of association with score. My reasoning is that this variable seems [less directly related to teaching quality or student perceptions / more random / less conceptually relevant] compared to the others. For example, [give a short explanation, like “whether the professor is wearing a formal outfit might not strongly affect how students rate the course overall.”]


Exercise 12

Goal: Fit the full model given in the lab and inspect the output.

m_full <- lm(score ~ rank + gender + ethnicity + language + age +
               cls_perc_eval + cls_students + cls_level + cls_profs +
               cls_credits + bty_avg + pic_outfit + pic_color,
             data = evals)
summary(m_full)
## 
## Call:
## lm(formula = score ~ rank + gender + ethnicity + language + age + 
##     cls_perc_eval + cls_students + cls_level + cls_profs + cls_credits + 
##     bty_avg + pic_outfit + pic_color, data = evals)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.77397 -0.32432  0.09067  0.35183  0.95036 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            4.0952141  0.2905277  14.096  < 2e-16 ***
## ranktenure track      -0.1475932  0.0820671  -1.798  0.07278 .  
## ranktenured           -0.0973378  0.0663296  -1.467  0.14295    
## gendermale             0.2109481  0.0518230   4.071 5.54e-05 ***
## ethnicitynot minority  0.1234929  0.0786273   1.571  0.11698    
## languagenon-english   -0.2298112  0.1113754  -2.063  0.03965 *  
## age                   -0.0090072  0.0031359  -2.872  0.00427 ** 
## cls_perc_eval          0.0053272  0.0015393   3.461  0.00059 ***
## cls_students           0.0004546  0.0003774   1.205  0.22896    
## cls_levelupper         0.0605140  0.0575617   1.051  0.29369    
## cls_profssingle       -0.0146619  0.0519885  -0.282  0.77806    
## cls_creditsone credit  0.5020432  0.1159388   4.330 1.84e-05 ***
## bty_avg                0.0400333  0.0175064   2.287  0.02267 *  
## pic_outfitnot formal  -0.1126817  0.0738800  -1.525  0.12792    
## pic_colorcolor        -0.2172630  0.0715021  -3.039  0.00252 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.498 on 448 degrees of freedom
## Multiple R-squared:  0.1871, Adjusted R-squared:  0.1617 
## F-statistic: 7.366 on 14 and 448 DF,  p-value: 6.552e-14

Answer (E12):
In the full model including all predictors, the variable with the largest p-value is [actual variable name], indicating the least statistical evidence of association with score after adjusting for all other variables. I originally guessed that [your guess] would be least important, and this guess was [correct/incorrect]. This suggests that [brief interpretation: either your intuition matched the data, or the data revealed that some other variable is even less related to course scores than you expected].
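Scanning a long coefficient table by eye is error-prone, so the p-values can be sorted instead. This is a minimal sketch that refits m_full so it runs on its own; coef_table and p_vals are names introduced for the example.

```r
library(openintro)
data(evals)

m_full <- lm(score ~ rank + gender + ethnicity + language + age +
               cls_perc_eval + cls_students + cls_level + cls_profs +
               cls_credits + bty_avg + pic_outfit + pic_color,
             data = evals)

# Coefficient p-values, largest first (the intercept row is dropped)
coef_table <- summary(m_full)$coefficients
p_vals <- sort(coef_table[-1, "Pr(>|t|)"], decreasing = TRUE)
head(round(p_vals, 5), 3)
```

The names returned are coefficient names such as cls_profssingle, so the trailing level label has to be stripped to recover the variable name.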


Exercise 13

Goal: Interpret the coefficient for ethnicity in the full model.

Answer (E13):
In the full model, [ethnicity minority or not minority] is the reference category, and the coefficient for [the other level] represents the difference in predicted course score between the two groups holding all other variables constant. For example, if the coefficient for ethnicitynot minority is [value], then non-minority professors are predicted to score about [value] points [higher/lower] than minority professors with the same values of beauty, rank, class size, etc. The corresponding p-value shows that this difference is [statistically significant / not statistically significant] at the 5% level. In practical terms, a difference of [value] points on a 1–5 scale is [small/moderate/large].
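A confidence interval is a useful companion to the p-value when judging practical size. The sketch below refits m_full and uses confint() from base R; ci_eth is an illustrative name.

```r
library(openintro)
data(evals)

m_full <- lm(score ~ rank + gender + ethnicity + language + age +
               cls_perc_eval + cls_students + cls_level + cls_profs +
               cls_credits + bty_avg + pic_outfit + pic_color,
             data = evals)

# 95% confidence interval for the ethnicity coefficient
ci_eth <- confint(m_full, parm = "ethnicitynot minority", level = 0.95)
round(ci_eth, 3)
```

Because the p-value in the summary (0.117) is above 0.05, this interval includes zero, so the data are compatible with no difference between the two groups after adjustment.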


Exercise 14

Goal: Refit the model after dropping the least useful variable.

  1. From m_full, identify the predictor with the largest p‑value.
  2. Drop that variable and refit the model (update the formula below).

m_drop1 <- lm(score ~ rank + gender + ethnicity + language + age +
                cls_perc_eval + cls_students + cls_level + cls_profs +
                cls_credits + bty_avg + pic_outfit + pic_color,
              data = evals)
summary(m_drop1)
## 
## Call:
## lm(formula = score ~ rank + gender + ethnicity + language + age + 
##     cls_perc_eval + cls_students + cls_level + cls_profs + cls_credits + 
##     bty_avg + pic_outfit + pic_color, data = evals)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.77397 -0.32432  0.09067  0.35183  0.95036 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            4.0952141  0.2905277  14.096  < 2e-16 ***
## ranktenure track      -0.1475932  0.0820671  -1.798  0.07278 .  
## ranktenured           -0.0973378  0.0663296  -1.467  0.14295    
## gendermale             0.2109481  0.0518230   4.071 5.54e-05 ***
## ethnicitynot minority  0.1234929  0.0786273   1.571  0.11698    
## languagenon-english   -0.2298112  0.1113754  -2.063  0.03965 *  
## age                   -0.0090072  0.0031359  -2.872  0.00427 ** 
## cls_perc_eval          0.0053272  0.0015393   3.461  0.00059 ***
## cls_students           0.0004546  0.0003774   1.205  0.22896    
## cls_levelupper         0.0605140  0.0575617   1.051  0.29369    
## cls_profssingle       -0.0146619  0.0519885  -0.282  0.77806    
## cls_creditsone credit  0.5020432  0.1159388   4.330 1.84e-05 ***
## bty_avg                0.0400333  0.0175064   2.287  0.02267 *  
## pic_outfitnot formal  -0.1126817  0.0738800  -1.525  0.12792    
## pic_colorcolor        -0.2172630  0.0715021  -3.039  0.00252 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.498 on 448 degrees of freedom
## Multiple R-squared:  0.1871, Adjusted R-squared:  0.1617 
## F-statistic: 7.366 on 14 and 448 DF,  p-value: 6.552e-14

(Be sure to actually remove the variable you chose from the formula above.)

Answer (E14):
I removed [variable name] from the full model because it had the largest p-value and appeared to contribute the least. After refitting the model without this variable, the key coefficients (such as bty_avg, gender, and cls_perc_eval) changed by [only a small amount / a noticeable amount] in both magnitude and p-values. This suggests that [if changes are small:] the dropped variable was not strongly collinear with the remaining predictors and did not contribute much unique information; [if changes are larger:] the dropped variable shared information with others, and its removal altered how the remaining predictors explain variation in score.
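One way to carry out the drop-and-compare step without retyping the formula is update(). The sketch below is an assumption about workflow rather than the lab's prescribed method; m_drop1_sketch, worst_term, and key are hypothetical names. Note that it picks a whole term based on a single coefficient's p-value, which is a simplification for factors with more than two levels such as rank.

```r
library(openintro)
data(evals)

m_full <- lm(score ~ rank + gender + ethnicity + language + age +
               cls_perc_eval + cls_students + cls_level + cls_profs +
               cls_credits + bty_avg + pic_outfit + pic_color,
             data = evals)

# Map each coefficient to its model term, then find the largest p-value
p_vals      <- summary(m_full)$coefficients[, "Pr(>|t|)"]
term_index  <- attr(model.matrix(m_full), "assign")  # 0 marks the intercept
term_labels <- attr(terms(m_full), "term.labels")
worst_coef  <- which.max(replace(p_vals, term_index == 0, -Inf))
worst_term  <- term_labels[term_index[worst_coef]]
worst_term

# Refit without that term
m_drop1_sketch <- update(m_full, as.formula(paste(". ~ . -", worst_term)))

# Side-by-side comparison of selected coefficients
key <- c("bty_avg", "gendermale", "cls_perc_eval")
round(cbind(full = coef(m_full)[key], dropped = coef(m_drop1_sketch)[key]), 4)
```

Small differences between the two columns point to little collinearity between the dropped variable and the predictors that remain.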


Exercise 15

Goal: Use backward selection (based on p‑values) to find a simpler model.

Starting from m_full, repeatedly remove the variable with the highest p‑value (above a chosen threshold, such as 0.05 or 0.10) and refit the model until only predictors with reasonably small p‑values remain.

Below, place only your final chosen model and its summary.

# Replace the formula with your final chosen set of predictors
m_final <- lm(score ~ bty_avg + gender + cls_perc_eval + cls_credits,
              data = evals)
summary(m_final)
## 
## Call:
## lm(formula = score ~ bty_avg + gender + cls_perc_eval + cls_credits, 
##     data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8421 -0.3384  0.1046  0.3841  1.0547 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           3.375919   0.128466  26.279  < 2e-16 ***
## bty_avg               0.072030   0.015932   4.521 7.84e-06 ***
## gendermale            0.176206   0.048679   3.620 0.000328 ***
## cls_perc_eval         0.004729   0.001451   3.258 0.001204 ** 
## cls_creditsone credit 0.457260   0.102579   4.458 1.04e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5107 on 458 degrees of freedom
## Multiple R-squared:  0.1259, Adjusted R-squared:  0.1182 
## F-statistic: 16.49 on 4 and 458 DF,  p-value: 1.25e-12

Answer (E15):
After performing backward selection based on p-values, my final model includes [list the remaining predictors]. The fitted regression equation can be written as

$$\widehat{\text{score}} = [\text{final\_intercept}] + [b_1] \times [\text{Predictor}_1] + [b_2] \times [\text{Predictor}_2] + \dots$$

I stopped removing variables once all remaining predictors had p-values below [chosen cutoff, e.g., 0.05 or 0.10], indicating reasonably strong evidence that each contributes to explaining variation in score. This final model balances simplicity with explanatory power.
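The repeated remove-and-refit procedure can be written as a loop. The sketch below is one possible implementation, not necessarily the one used to obtain m_final above: it uses drop1() with an F-test, which gives one p-value per term (so rank is tested as a whole rather than level by level), and a cutoff of 0.05. The names alpha, m_backward, and final_terms are hypothetical, and the resulting model may contain different predictors from m_final.

```r
library(openintro)
data(evals)

m_full <- lm(score ~ rank + gender + ethnicity + language + age +
               cls_perc_eval + cls_students + cls_level + cls_profs +
               cls_credits + bty_avg + pic_outfit + pic_color,
             data = evals)

alpha <- 0.05        # chosen cutoff (an assumption)
m_backward <- m_full

repeat {
  # Stop if no predictors are left to test
  if (length(attr(terms(m_backward), "term.labels")) == 0) break

  # One F-test per term; the first row is the "<none>" baseline
  tests  <- drop1(m_backward, test = "F")
  p_vals <- tests[["Pr(>F)"]][-1]
  names(p_vals) <- rownames(tests)[-1]

  # Stop once every remaining term is below the cutoff
  if (max(p_vals) < alpha) break

  # Otherwise drop the term with the largest p-value and refit
  worst <- names(which.max(p_vals))
  m_backward <- update(m_backward, as.formula(paste(". ~ . -", worst)))
}

final_terms <- attr(terms(m_backward), "term.labels")
final_terms
```

Selection by p-value is sensitive to the cutoff, so rerunning with alpha set to 0.10 is a quick way to see how stable the chosen predictors are.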

Exercise 16

Goal: Check regression conditions for the final model.

par(mfrow = c(2, 2))
plot(m_final)

par(mfrow = c(1, 1))

Answer (E16):
For the final model, the Residuals vs Fitted plot appears [roughly horizontal / mildly curved], which suggests that the linearity assumption is [reasonably satisfied / somewhat questionable]. The spread of residuals across fitted values is [fairly constant / shows some funnel-shaped pattern], so the constant variance assumption seems [reasonable / potentially violated]. The Normal Q–Q plot is [close to a straight line / clearly curved], indicating that the residuals are [approximately normal / somewhat non-normal]. As before, independence cannot be fully verified from these plots, but the presence of multiple courses from the same professor may mean that some observations are correlated.


Exercise 17

Goal: Think about independence more carefully given how the data were sampled.

Answer (E17):
Because several courses can belong to the same professor (rows that share a prof_id), the observations are not a simple random sample of independent individuals. Courses taught by the same instructor may tend to have similar evaluation scores due to instructor characteristics that persist across classes. This clustering violates the strict independence assumption underlying standard linear regression, which can lead to underestimated standard errors and p-values that are too small. A more advanced model (such as a mixed-effects model) would be better suited to account for this structure.
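The extent of the clustering can be checked by counting courses per professor. The second half of the sketch shows one possible mixed-effects specification, a random intercept for each professor using lme4::lmer; it is an illustration of the idea, it runs only if the lme4 package is installed, and it is not part of the lab itself. The names courses_per_prof and m_mixed are hypothetical.

```r
library(openintro)
data(evals)

# How many courses does each professor contribute?
courses_per_prof <- table(evals$prof_id)
length(courses_per_prof)              # number of distinct professors
summary(as.vector(courses_per_prof))  # distribution of courses per professor

# Optional: random intercept per professor (assumes lme4 is available)
if (requireNamespace("lme4", quietly = TRUE)) {
  m_mixed <- lme4::lmer(score ~ bty_avg + (1 | prof_id), data = evals)
  print(summary(m_mixed)$coefficients)  # fixed-effect estimates and standard errors
}
```

If the standard error for bty_avg in the mixed model is larger than in m_bty, that is consistent with the concern that ordinary regression understates uncertainty here.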


Exercise 18

Goal: Describe a “high‑scoring” professor/course based on your final model.

Answer (E18):
Based on the final model, a high-scoring course is likely to be taught by a professor with a higher beauty rating and in a class where a larger percentage of students complete the evaluation. If gender or rank matter in the final model, then [describe direction, e.g., one gender or rank level tends to have slightly higher predicted scores], holding other factors constant. Courses with [certain class sizes, credit levels, or picture characteristics] that have positive coefficients also contribute to higher predicted scores. In general, the model suggests that both appearance-related variables and course-level variables play a role in how students rate their classes.
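A concrete profile makes the description easier to check. The sketch below assumes the m_final formula shown in Exercise 15 and a made-up profile (beauty rating 8, male, 95% of students responding, one-credit course); high_profile and pred_high are illustrative names.

```r
library(openintro)
data(evals)

m_final <- lm(score ~ bty_avg + gender + cls_perc_eval + cls_credits,
              data = evals)

# Hypothetical course with values that all push the prediction upward
high_profile <- data.frame(
  bty_avg       = 8,
  gender        = factor("male", levels = levels(evals$gender)),
  cls_perc_eval = 95,
  cls_credits   = factor("one credit", levels = levels(evals$cls_credits))
)

pred_high <- unname(predict(m_final, newdata = high_profile))
round(pred_high, 2)
```

With the coefficients printed in Exercise 15 this comes to about 5.03, slightly above the highest observed score of 5. That is a reminder that a linear model does not respect the bounds of the rating scale, so predictions for extreme profiles should be read with caution.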


Exercise 19

Goal: Consider generalizability.

Answer (E19):
These findings are based on data from a single university (UT Austin) during a specific time period, so we must be cautious about generalizing to all universities and contexts. Other institutions may differ in student demographics, campus culture, and evaluation practices, which could change the relationship between instructor appearance and course ratings. In addition, the study relies on beauty ratings from a particular group of raters, which may not reflect perceptions in other populations. Overall, the results mainly generalize to similar types of courses and institutions, and we should avoid claiming that the same patterns necessarily hold everywhere.