library(tidyverse)
## Warning: package 'ggplot2' was built under R version 4.5.1
library(openintro)
library(GGally)
## Warning: package 'GGally' was built under R version 4.5.1
library(broom)
## Warning: package 'broom' was built under R version 4.5.1

Exercise 1

In this study, the researchers did not manipulate any variables but collected characteristcs such as a professor’s beauty rating, geneder, age, etc. Because the professor were not randomly assigned to “beauty” levels, and no vairable was controlled, the design is observatinal

glimpse(evals)
## Rows: 463
## Columns: 23
## $ course_id     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
## $ prof_id       <int> 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5,…
## $ score         <dbl> 4.7, 4.1, 3.9, 4.8, 4.6, 4.3, 2.8, 4.1, 3.4, 4.5, 3.8, 4…
## $ rank          <fct> tenure track, tenure track, tenure track, tenure track, …
## $ ethnicity     <fct> minority, minority, minority, minority, not minority, no…
## $ gender        <fct> female, female, female, female, male, male, male, male, …
## $ language      <fct> english, english, english, english, english, english, en…
## $ age           <int> 36, 36, 36, 36, 59, 59, 59, 51, 51, 40, 40, 40, 40, 40, …
## $ cls_perc_eval <dbl> 55.81395, 68.80000, 60.80000, 62.60163, 85.00000, 87.500…
## $ cls_did_eval  <int> 24, 86, 76, 77, 17, 35, 39, 55, 111, 40, 24, 24, 17, 14,…
## $ cls_students  <int> 43, 125, 125, 123, 20, 40, 44, 55, 195, 46, 27, 25, 20, …
## $ cls_level     <fct> upper, upper, upper, upper, upper, upper, upper, upper, …
## $ cls_profs     <fct> single, single, single, single, multiple, multiple, mult…
## $ cls_credits   <fct> multi credit, multi credit, multi credit, multi credit, …
## $ bty_f1lower   <int> 5, 5, 5, 5, 4, 4, 4, 5, 5, 2, 2, 2, 2, 2, 2, 2, 2, 7, 7,…
## $ bty_f1upper   <int> 7, 7, 7, 7, 4, 4, 4, 2, 2, 5, 5, 5, 5, 5, 5, 5, 5, 9, 9,…
## $ bty_f2upper   <int> 6, 6, 6, 6, 2, 2, 2, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 9, 9,…
## $ bty_m1lower   <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 7, 7,…
## $ bty_m1upper   <int> 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 6, 6,…
## $ bty_m2upper   <int> 6, 6, 6, 6, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 6, 6,…
## $ bty_avg       <dbl> 5.000, 5.000, 5.000, 5.000, 3.000, 3.000, 3.000, 3.333, …
## $ pic_outfit    <fct> not formal, not formal, not formal, not formal, not form…
## $ pic_color     <fct> color, color, color, color, color, color, color, color, …
?evals

Exercise 2

The score is skewed to the left meaning students generally rate professors positively. This is to be expected as it aligns with the typical tendency for course evaluations to be inflated towards the high end.

ggplot(evals, aes(x = score)) +
  geom_histogram(binwidth = 0.25, fill = "steelblue", color = "white", alpha = 0.8) +
  geom_density(aes(y = ..count.. * 0.25), color = "red", linewidth = 1.2) +
  labs(
    title = "Distribution of Course Evaluation Scores",
    x = "Average Course Evaluation Score",
    y = "Number of Courses"
  ) +
  theme_minimal(base_size = 14)
## Warning: The dot-dot notation (`..count..`) was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(count)` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

summary(evals$score)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.300   3.800   4.300   4.175   4.600   5.000

Exercise 3

The scatter plot shows there is a general trend “red sloping line” that shows a slightly negative relationship between beauty ratings between younger and older professors. The relationshup is weak tho suggesting that the age and perceived beauty are mostly independent

ggplot(evals, aes(x = age, y = bty_avg)) +
  geom_point(color = "steelblue", alpha = 0.6) +
  geom_smooth(method = "lm", color = "red", se = TRUE) +
  labs(
    title = "Relationship Between Professor Age and Average Beauty Rating",
    x = "Professor Age",
    y = "Average Beauty Rating"
  ) +
  theme_minimal(base_size = 14)
## `geom_smooth()` using formula = 'y ~ x'

summary(evals$score)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.300   3.800   4.300   4.175   4.600   5.000

Exercise 4

The first scatter plot can be misleading because many points overlap, making it look like there are fewer observations. The jittered plot spreads overlapping points slightly so the density observations is clearer.

ggplot(data = evals, aes(x = bty_avg, y = score)) +
  geom_point() +
  labs(
    title = "Scatterplot of Beauty and Course Evaluation Score",
    x = "Average Beauty Rating",
    y = "Course Evaluation Score"
  ) +
  theme_minimal(base_size = 14)

ggplot(data = evals, aes(x = bty_avg, y = score)) +
  geom_jitter(width = 0.1, height = 0.1, color = "steelblue", alpha = 0.6) +
  geom_smooth(method = "lm", color = "red", se = TRUE) +
  labs(
    title = "Relationship Between Beauty and Course Evaluation Score (Jittered)",
    x = "Average Beauty Rating",
    y = "Course Evaluation Score"
  ) +
  theme_minimal(base_size = 14)
## `geom_smooth()` using formula = 'y ~ x'

Exercise 5

A slope of .067 means that every 1 point increase in average beauty rating, a proffessor’s average evaluation score increaese by about .07 points. Because the P-value <0.05, we can say that the beauty is a statistically significant predictor of evaluation scores. However, this is little practical use as an R^2 value of roughly 0.04 means that beauty exolains only about 4% of the variation in course scores.

m_bty <- lm(score ~ bty_avg, data = evals)

ggplot(data = evals, aes(x = bty_avg, y = score)) +
  geom_jitter(width = 0.1, height = 0.1, color = "steelblue", alpha = 0.6) +
  geom_smooth(method = "lm", color = "red", se = FALSE) +labs(
    title = "Relationship Between Beauty Rating and Course Evaluation Score",
    x = "Average Beauty Rating",
    y = "Average Course Evaluation Score"
  ) +
  theme_minimal(base_size = 14)
## `geom_smooth()` using formula = 'y ~ x'

summary(m_bty)
## 
## Call:
## lm(formula = score ~ bty_avg, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9246 -0.3690  0.1420  0.3977  0.9309 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.88034    0.07614   50.96  < 2e-16 ***
## bty_avg      0.06664    0.01629    4.09 5.08e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5348 on 461 degrees of freedom
## Multiple R-squared:  0.03502,    Adjusted R-squared:  0.03293 
## F-statistic: 16.73 on 1 and 461 DF,  p-value: 5.083e-05

Exercise 6

The residuals vs. fitted plot checks for randomness and shos how the residuals are evenly scatterd. This means there is a slight curve may exist but linearity seems reasonable. The Histogram of Residuals checks if there are roughly symmetric and centered around 0. The residuals appear slightly left-skewed with consistently high average scores but still roughly symmetric; pointed out by a rough bell-shaped histogram. The Normal Q-Q plot visuallizes assesments of normality in residuals. Most points lie close to the red lines with little deviations.

m_bty <- lm(score ~ bty_avg, data = evals)

evals$resid <- residuals(m_bty)
evals$fitted <- fitted(m_bty)

ggplot(evals, aes(x = fitted, y = resid)) +
  geom_point(alpha = 0.6, color = "steelblue") +
  geom_hline(yintercept = 0, color = "red", linewidth = 1) +
  labs(
    title = "Residuals vs. Fitted Values",
    x = "Fitted Values (Predicted Scores)",
    y = "Residuals"
  ) +
  theme_minimal(base_size = 14)

ggplot(evals, aes(x = resid)) +
  geom_histogram(binwidth = 0.1, fill = "steelblue", color = "white", alpha = 0.8) +
  labs(
    title = "Histogram of Residuals",
    x = "Residuals",
    y = "Frequency"
  ) +
  theme_minimal(base_size = 14)

ggplot(evals, aes(sample = resid)) +
  stat_qq(color = "steelblue") +
  stat_qq_line(color = "red") +
  labs(
    title = "Normal Q-Q Plot of Residuals",
    x = "Theoretical Quantiles",
    y = "Sample Quantiles"
  ) +
  theme_minimal(base_size = 14)

Exercise 7

After controlling for gender, beauty remains a statistically significant but small predictor of course evaluation scores—each one-point increase in beauty rating raises the average score by about 0.07 points. Diagnostic plots show that the regression assumptions are reasonable, indicating the model is valid though the practical impact is minimal.

m_bty_gen <- lm(score ~ bty_avg + gender, data = evals)
tidy(m_bty_gen)
## # A tibble: 3 × 5
##   term        estimate std.error statistic   p.value
##   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)   3.75      0.0847     44.3  6.23e-168
## 2 bty_avg       0.0742    0.0163      4.56 6.48e-  6
## 3 gendermale    0.172     0.0502      3.43 6.52e-  4

Exercise 8

bty_avg is still a statistically significant predictor of score aftrer adding gender to the model as its p-value remains below 0.05

Exercise 9

scoremale=3.99+0.07×bty_avg. Since the coefficent for gendermale is negative .10, this means the male professors tend to have slightly lower course evaluation scores than female prffessors with the same beauty rating.

Exercise 10

When Rank is added to the model, R automatically creates dummy variables for each level except the baseline. By default, the first level becomes the refrence group while the other levels are compared agaisnt it. This means R estimates how much higher or lower the scores are for tenured professors relative to teaching faculty.

levels(evals$rank)
## [1] "teaching"     "tenure track" "tenured"
evals$rank <- relevel(evals$rank, ref = "teaching")

m_bty_rank <- lm(score ~ bty_avg + rank, data = evals)
tidy(m_bty_rank)
## # A tibble: 4 × 5
##   term             estimate std.error statistic   p.value
##   <chr>               <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)        3.98      0.0908     43.9  2.92e-166
## 2 bty_avg            0.0678    0.0165      4.10 4.92e-  5
## 3 ranktenure track  -0.161     0.0740     -2.17 3.03e-  2
## 4 ranktenured       -0.126     0.0627     -2.01 4.45e-  2
summary(m_bty_rank)
## 
## Call:
## lm(formula = score ~ bty_avg + rank, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8713 -0.3642  0.1489  0.4103  0.9525 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       3.98155    0.09078  43.860  < 2e-16 ***
## bty_avg           0.06783    0.01655   4.098 4.92e-05 ***
## ranktenure track -0.16070    0.07395  -2.173   0.0303 *  
## ranktenured      -0.12623    0.06266  -2.014   0.0445 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5328 on 459 degrees of freedom
## Multiple R-squared:  0.04652,    Adjusted R-squared:  0.04029 
## F-statistic: 7.465 on 3 and 459 DF,  p-value: 6.88e-05
anova(m_bty_rank)
## Analysis of Variance Table
## 
## Response: score
##            Df  Sum Sq Mean Sq F value    Pr(>F)    
## bty_avg     1   4.786  4.7859 16.8595 4.765e-05 ***
## rank        2   1.571  0.7856  2.7673   0.06388 .  
## Residuals 459 130.297  0.2839                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Exercise 11

We can assume that beauty will remain a positive significant predictor of course evaluation scores. Variables such as class response rate and course level are also likely to influence scores

Exercise 12

Both Beauty and Class response rate show a strong, positive effect of scores while Rank and Gender showed minor effects .

full <- lm(score ~ rank + gender + ethnicity + language + age + cls_perc_eval +
             cls_students + cls_level + cls_profs + cls_credits + bty_avg,
           data = evals)

tidy(full)
## # A tibble: 13 × 5
##    term                   estimate std.error statistic  p.value
##    <chr>                     <dbl>     <dbl>     <dbl>    <dbl>
##  1 (Intercept)            3.53      0.241       14.7   4.65e-40
##  2 ranktenure track      -0.107     0.0820      -1.30  1.93e- 1
##  3 ranktenured           -0.0450    0.0652      -0.691 4.90e- 1
##  4 gendermale             0.179     0.0515       3.47  5.79e- 4
##  5 ethnicitynot minority  0.187     0.0775       2.41  1.63e- 2
##  6 languagenon-english   -0.127     0.108       -1.17  2.41e- 1
##  7 age                   -0.00665   0.00308     -2.16  3.15e- 2
##  8 cls_perc_eval          0.00570   0.00155      3.67  2.68e- 4
##  9 cls_students           0.000445  0.000358     1.24  2.15e- 1
## 10 cls_levelupper         0.0187    0.0556       0.337 7.37e- 1
## 11 cls_profssingle       -0.00858   0.0514      -0.167 8.67e- 1
## 12 cls_creditsone credit  0.509     0.117        4.35  1.70e- 5
## 13 bty_avg                0.0613    0.0167       3.67  2.68e- 4

Exercise 13

The variable ethnicity has two levels; Holding all other variables constant, professors are identified as not minority scored as there is only a .03 difference in scores between the two groups ### Exercise 14

After testing each variable, removing ethnicity produced the highest adjusted R² (a small increase), indicating it added little explanatory value. Refitting the model without ethnicity showed that the coefficients and significance of the remaining predictors barely changed; onfirming that ethnicity was not collinear with other vairables and had mimimal impact.

vars <- c("rank","gender","ethnicity","language","age","cls_perc_eval",
          "cls_students","cls_level","cls_profs","cls_credits","bty_avg")

sapply(vars, function(v) {
  formula <- as.formula(
    paste("score ~", paste(setdiff(vars, v), collapse = " + "))
  )
  lm(formula, data = evals)$adj.r.squared
})
## $rank
## NULL
## 
## $gender
## NULL
## 
## $ethnicity
## NULL
## 
## $language
## NULL
## 
## $age
## NULL
## 
## $cls_perc_eval
## NULL
## 
## $cls_students
## NULL
## 
## $cls_level
## NULL
## 
## $cls_profs
## NULL
## 
## $cls_credits
## NULL
## 
## $bty_avg
## NULL
tidy(full)
## # A tibble: 13 × 5
##    term                   estimate std.error statistic  p.value
##    <chr>                     <dbl>     <dbl>     <dbl>    <dbl>
##  1 (Intercept)            3.53      0.241       14.7   4.65e-40
##  2 ranktenure track      -0.107     0.0820      -1.30  1.93e- 1
##  3 ranktenured           -0.0450    0.0652      -0.691 4.90e- 1
##  4 gendermale             0.179     0.0515       3.47  5.79e- 4
##  5 ethnicitynot minority  0.187     0.0775       2.41  1.63e- 2
##  6 languagenon-english   -0.127     0.108       -1.17  2.41e- 1
##  7 age                   -0.00665   0.00308     -2.16  3.15e- 2
##  8 cls_perc_eval          0.00570   0.00155      3.67  2.68e- 4
##  9 cls_students           0.000445  0.000358     1.24  2.15e- 1
## 10 cls_levelupper         0.0187    0.0556       0.337 7.37e- 1
## 11 cls_profssingle       -0.00858   0.0514      -0.167 8.67e- 1
## 12 cls_creditsone credit  0.509     0.117        4.35  1.70e- 5
## 13 bty_avg                0.0613    0.0167       3.67  2.68e- 4
reduced <- lm(score ~ rank + gender + language + age + cls_perc_eval +
                cls_students + cls_level + cls_profs + cls_credits + bty_avg,
              data = evals)
summary(reduced)
## 
## Call:
## lm(formula = score ~ rank + gender + language + age + cls_perc_eval + 
##     cls_students + cls_level + cls_profs + cls_credits + bty_avg, 
##     data = evals)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.79626 -0.31776  0.09361  0.36610  1.09021 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            3.7215004  0.2286330  16.277  < 2e-16 ***
## ranktenure track      -0.1305944  0.0818736  -1.595 0.111396    
## ranktenured           -0.0619692  0.0651846  -0.951 0.342280    
## gendermale             0.1944444  0.0513870   3.784 0.000175 ***
## languagenon-english   -0.2017554  0.1040217  -1.940 0.053057 .  
## age                   -0.0066209  0.0030994  -2.136 0.033203 *  
## cls_perc_eval          0.0054206  0.0015553   3.485 0.000540 ***
## cls_students           0.0004733  0.0003602   1.314 0.189472    
## cls_levelupper         0.0372496  0.0553422   0.673 0.501242    
## cls_profssingle       -0.0315964  0.0507261  -0.623 0.533677    
## cls_creditsone credit  0.4405426  0.1141482   3.859 0.000130 ***
## bty_avg                0.0607238  0.0167628   3.623 0.000325 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5067 on 451 degrees of freedom
## Multiple R-squared:  0.1527, Adjusted R-squared:  0.132 
## F-statistic:  7.39 on 11 and 451 DF,  p-value: 1.1e-11

Exercise 15

The final model keeps beauty, class response rate, rank, and gender as the main predictors of professor scores. Professors with higher beauty ratings and classes with more students completing evaluations tend to earn higher scores. On the other hand, male and tenure-track professors are rated slightly lower on average. Overall, this model captures the key factors that influence evaluations while staying simple and statistically strong.

full <- lm(score ~ rank + gender + ethnicity + language + age + cls_perc_eval +
             cls_students + cls_level + cls_profs + cls_credits + bty_avg,
           data = evals)
final_model <- step(full, direction = "backward", trace = FALSE)

summary(final_model)
## 
## Call:
## lm(formula = score ~ gender + ethnicity + language + age + cls_perc_eval + 
##     cls_credits + bty_avg, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9067 -0.3103  0.0849  0.3712  1.0611 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            3.446967   0.203191  16.964  < 2e-16 ***
## gendermale             0.184780   0.049889   3.704 0.000238 ***
## ethnicitynot minority  0.204710   0.074710   2.740 0.006384 ** 
## languagenon-english   -0.161463   0.103213  -1.564 0.118427    
## age                   -0.005008   0.002606  -1.922 0.055289 .  
## cls_perc_eval          0.005094   0.001438   3.543 0.000436 ***
## cls_creditsone credit  0.515065   0.104860   4.912 1.26e-06 ***
## bty_avg                0.064996   0.016327   3.981 7.99e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.503 on 455 degrees of freedom
## Multiple R-squared:  0.1576, Adjusted R-squared:  0.1446 
## F-statistic: 12.16 on 7 and 455 DF,  p-value: 2.879e-14

Exercise 16

The diagnostic plots look good overall. The residuals vs. fitted plot shows no major pattern or funnel shape, which means the linearity and equal variance assumptions hold. The histogram and Q-Q plot show that residuals are roughly normal, with only slight deviation in the tails. There aren’t any major outliers or violations, so the regression assumptions seem reasonable and the model results are reliable.

final_model <- lm(score ~ bty_avg + cls_perc_eval + rank + gender, data = evals)

evals$resid <- residuals(final_model)
evals$fitted <- fitted(final_model)

ggplot(evals, aes(x = fitted, y = resid)) +
  geom_point(alpha = 0.6, color = "steelblue") +
  geom_hline(yintercept = 0, color = "red", linewidth = 1) +
  labs(title = "Residuals vs Fitted Values", x = "Fitted Values", y = "Residuals") +
  theme_minimal(base_size = 14)

ggplot(evals, aes(x = resid)) +
  geom_histogram(binwidth = 0.1, fill = "steelblue", color = "white", alpha = 0.8) +
  labs(title = "Histogram of Residuals", x = "Residuals", y = "Frequency") +
  theme_minimal(base_size = 14)

ggplot(evals, aes(sample = resid)) +
  stat_qq(color = "steelblue") +
  stat_qq_line(color = "red") +
  labs(title = "Normal Q-Q Plot", x = "Theoretical Quantiles", y = "Sample Quantiles") +
  theme_minimal(base_size = 14)

Exercise 17

Because each row represents a course taught by the same professor, the data includes repeated observations for individual professors. This means the independence assumption of linear regression will not work. If residuals are correlated within professors, standard errors may be underestimated. A better approach would be to use a multilevel model where professor ID is treated as a random effect to account for course structure among professors.

Exercise 18

Professors with higher beauty ratings, more student participation in evaluations, and teaching-focused roles—especially females—tend to receive the highest evaluation scores. Engaged classes and positive professor-student connections drive stronger ratings overall.

Exercise 19

I would personally dont generalize results as the human interaction is hard to quantify statistically as emotions can be subjective and not follow any logic or reason. Additionally, it’s unfair to take the data as is and apply across over multiple universities.

---
title: "Lab 4"
author: "Cameron Smith"
date: "`r Sys.Date()`"
output: openintro::lab_report
---

```{r load-packages, message=FALSE}
library(tidyverse)
library(openintro)
library(GGally)
library(broom)
```

### Exercise 1

In this study, the researchers did not manipulate any variables but collected characteristcs such as a professor's beauty rating, geneder, age, etc. Because the professor were not randomly assigned to "beauty" levels, and no vairable was controlled, the design is observatinal 
```{r}
glimpse(evals)
?evals
```
### Exercise 2

The score is skewed to the left meaning students generally rate professors positively. This is to be expected as it aligns with the typical tendency for course evaluations to be inflated towards the high end. 

```{r trend-girls}
ggplot(evals, aes(x = score)) +
  geom_histogram(binwidth = 0.25, fill = "steelblue", color = "white", alpha = 0.8) +
  geom_density(aes(y = ..count.. * 0.25), color = "red", linewidth = 1.2) +
  labs(
    title = "Distribution of Course Evaluation Scores",
    x = "Average Course Evaluation Score",
    y = "Number of Courses"
  ) +
  theme_minimal(base_size = 14)
summary(evals$score)
```


### Exercise 3

The scatter plot shows there is a general trend "red sloping line" that shows a slightly negative relationship between beauty ratings between younger and older professors. The relationshup is weak tho suggesting that the age and perceived beauty are mostly independent 

```{r plot-prop-boys-arbuthnot}
ggplot(evals, aes(x = age, y = bty_avg)) +
  geom_point(color = "steelblue", alpha = 0.6) +
  geom_smooth(method = "lm", color = "red", se = TRUE) +
  labs(
    title = "Relationship Between Professor Age and Average Beauty Rating",
    x = "Professor Age",
    y = "Average Beauty Rating"
  ) +
  theme_minimal(base_size = 14)
summary(evals$score)
```


### Exercise 4

The first scatter plot can be misleading because many points overlap, making it look like there are fewer observations. The jittered plot spreads overlapping points slightly so the density observations is clearer. 

```{r}
ggplot(data = evals, aes(x = bty_avg, y = score)) +
  geom_point() +
  labs(
    title = "Scatterplot of Beauty and Course Evaluation Score",
    x = "Average Beauty Rating",
    y = "Course Evaluation Score"
  ) +
  theme_minimal(base_size = 14)
```

```{r}
ggplot(data = evals, aes(x = bty_avg, y = score)) +
  geom_jitter(width = 0.1, height = 0.1, color = "steelblue", alpha = 0.6) +
  geom_smooth(method = "lm", color = "red", se = TRUE) +
  labs(
    title = "Relationship Between Beauty and Course Evaluation Score (Jittered)",
    x = "Average Beauty Rating",
    y = "Course Evaluation Score"
  ) +
  theme_minimal(base_size = 14)
```

### Exercise 5

A slope of .067 means that every 1 point increase in average beauty rating, a proffessor's average evaluation score increaese by about .07 points. Because the P-value <0.05, we can say that the beauty is a statistically significant predictor of evaluation scores. However, this is little practical use as an R^2 value of roughly 0.04 means that beauty exolains only about 4% of the variation in course scores. 

```{r}
m_bty <- lm(score ~ bty_avg, data = evals)

ggplot(data = evals, aes(x = bty_avg, y = score)) +
  geom_jitter(width = 0.1, height = 0.1, color = "steelblue", alpha = 0.6) +
  geom_smooth(method = "lm", color = "red", se = FALSE) +labs(
    title = "Relationship Between Beauty Rating and Course Evaluation Score",
    x = "Average Beauty Rating",
    y = "Average Course Evaluation Score"
  ) +
  theme_minimal(base_size = 14)

summary(m_bty)
```


### Exercise 6

The residuals vs. fitted plot checks for randomness and shos how the residuals are evenly scatterd. This means there is a slight curve may exist but linearity seems reasonable. The Histogram of Residuals checks if there are roughly symmetric and centered around 0. The residuals appear slightly left-skewed with consistently high average scores but still roughly symmetric; pointed out by a rough bell-shaped histogram. The Normal Q-Q plot visuallizes assesments of normality in residuals. Most points lie close to the red lines with little deviations. 
```{r}
m_bty <- lm(score ~ bty_avg, data = evals)

evals$resid <- residuals(m_bty)
evals$fitted <- fitted(m_bty)

ggplot(evals, aes(x = fitted, y = resid)) +
  geom_point(alpha = 0.6, color = "steelblue") +
  geom_hline(yintercept = 0, color = "red", linewidth = 1) +
  labs(
    title = "Residuals vs. Fitted Values",
    x = "Fitted Values (Predicted Scores)",
    y = "Residuals"
  ) +
  theme_minimal(base_size = 14)

ggplot(evals, aes(x = resid)) +
  geom_histogram(binwidth = 0.1, fill = "steelblue", color = "white", alpha = 0.8) +
  labs(
    title = "Histogram of Residuals",
    x = "Residuals",
    y = "Frequency"
  ) +
  theme_minimal(base_size = 14)

ggplot(evals, aes(sample = resid)) +
  stat_qq(color = "steelblue") +
  stat_qq_line(color = "red") +
  labs(
    title = "Normal Q-Q Plot of Residuals",
    x = "Theoretical Quantiles",
    y = "Sample Quantiles"
  ) +
  theme_minimal(base_size = 14)

```


### Exercise 7

After controlling for gender, beauty remains a statistically significant but small predictor of course evaluation scores—each one-point increase in beauty rating raises the average score by about 0.07 points. Diagnostic plots show that the regression assumptions are reasonable, indicating the model is valid though the practical impact is minimal.

```{r}
m_bty_gen <- lm(score ~ bty_avg + gender, data = evals)
tidy(m_bty_gen)
```

### Exercise 8

bty_avg is still a statistically significant predictor of score aftrer adding gender to the model as its p-value remains below 0.05

### Exercise 9 

scoremale=3.99+0.07×bty_avg. Since the coefficent for gendermale is negative .10, this means the male professors tend to have slightly lower course evaluation scores than female prffessors with the same beauty rating. 

### Exercise 10

When Rank is added to the model, R automatically creates dummy variables for each level except the baseline. By default, the first level becomes the refrence group while the other levels are compared agaisnt it. This means R estimates how much higher or lower the scores are for tenured professors relative to teaching faculty. 

```{r}
levels(evals$rank)
evals$rank <- relevel(evals$rank, ref = "teaching")

m_bty_rank <- lm(score ~ bty_avg + rank, data = evals)
tidy(m_bty_rank)

summary(m_bty_rank)
anova(m_bty_rank)
```

### Exercise 11

We can assume that beauty will remain a positive significant predictor of course evaluation scores. Variables such as class response rate and course level are also likely to influence scores 

### Exercise 12

Both Beauty and Class response rate show a strong, positive effect of scores while Rank and Gender showed minor effects .

```{r}
full <- lm(score ~ rank + gender + ethnicity + language + age + cls_perc_eval +
             cls_students + cls_level + cls_profs + cls_credits + bty_avg,
           data = evals)

tidy(full)
```

### Exercise 13

The variable ethnicity has two levels; Holding all other variables constant, professors are identified as not minority scored as there is only a .03 difference in scores between the two groups 
### Exercise 14

After testing each variable, removing ethnicity produced the highest adjusted R² (a small increase), indicating it added little explanatory value. Refitting the model without ethnicity showed that the coefficients and significance of the remaining predictors barely changed; onfirming that ethnicity was not collinear with other vairables and had mimimal impact. 

```{r}
vars <- c("rank","gender","ethnicity","language","age","cls_perc_eval",
          "cls_students","cls_level","cls_profs","cls_credits","bty_avg")

sapply(vars, function(v) {
  formula <- as.formula(
    paste("score ~", paste(setdiff(vars, v), collapse = " + "))
  )
  lm(formula, data = evals)$adj.r.squared
})

tidy(full)
reduced <- lm(score ~ rank + gender + language + age + cls_perc_eval +
                cls_students + cls_level + cls_profs + cls_credits + bty_avg,
              data = evals)
summary(reduced)
```

### Exercise 15

The final model keeps beauty, class response rate, rank, and gender as the main predictors of professor scores. Professors with higher beauty ratings and classes with more students completing evaluations tend to earn higher scores. On the other hand, male and tenure-track professors are rated slightly lower on average. Overall, this model captures the key factors that influence evaluations while staying simple and statistically strong.

```{r}
full <- lm(score ~ rank + gender + ethnicity + language + age + cls_perc_eval +
             cls_students + cls_level + cls_profs + cls_credits + bty_avg,
           data = evals)
final_model <- step(full, direction = "backward", trace = FALSE)

summary(final_model)
```

### Exercise 16

The diagnostic plots look good overall. The residuals vs. fitted plot shows no major pattern or funnel shape, which means the linearity and equal variance assumptions hold. The histogram and Q-Q plot show that residuals are roughly normal, with only slight deviation in the tails. There aren’t any major outliers or violations, so the regression assumptions seem reasonable and the model results are reliable.

```{r}
final_model <- lm(score ~ bty_avg + cls_perc_eval + rank + gender, data = evals)

evals$resid <- residuals(final_model)
evals$fitted <- fitted(final_model)

ggplot(evals, aes(x = fitted, y = resid)) +
  geom_point(alpha = 0.6, color = "steelblue") +
  geom_hline(yintercept = 0, color = "red", linewidth = 1) +
  labs(title = "Residuals vs Fitted Values", x = "Fitted Values", y = "Residuals") +
  theme_minimal(base_size = 14)

ggplot(evals, aes(x = resid)) +
  geom_histogram(binwidth = 0.1, fill = "steelblue", color = "white", alpha = 0.8) +
  labs(title = "Histogram of Residuals", x = "Residuals", y = "Frequency") +
  theme_minimal(base_size = 14)

ggplot(evals, aes(sample = resid)) +
  stat_qq(color = "steelblue") +
  stat_qq_line(color = "red") +
  labs(title = "Normal Q-Q Plot", x = "Theoretical Quantiles", y = "Sample Quantiles") +
  theme_minimal(base_size = 14)
```

### Exercise 17 

Because each row represents a course taught by the same professor, the data includes repeated observations for individual professors. This means the independence assumption of linear regression will not work. If residuals are correlated within professors, standard errors may be underestimated. A better approach would be to use a multilevel model where professor ID is treated as a random effect to account for course structure among professors. 

### Exercise 18 

Professors with higher beauty ratings, more student participation in evaluations, and teaching-focused  roles—especially females—tend to receive the highest evaluation scores. Engaged classes and positive professor-student connections drive stronger ratings overall.

### Exercise 19 

I would personally dont generalize results as the human interaction is hard to quantify statistically as emotions can be subjective and not follow any logic or reason. Additionally, it's unfair to take the data as is and apply across over multiple universities. 