Multiple linear regression

Introduction

This analysis examines factors influencing professor evaluation scores at the University of Texas at Austin, with particular focus on whether physical attractiveness predicts higher ratings. Using multiple linear regression on 463 course evaluations, I found that while beauty rating is a statistically significant predictor (p < 0.001), it explains only 3.5% of the variance in scores—suggesting minimal practical impact. The final model identifies seven significant predictors: gender, language of degree institution, age, class evaluation participation rate, course credits, beauty rating, and picture color. Key limitations include the single-institution sample and non-independent observations due to professors teaching multiple courses. This project demonstrates skills in exploratory data analysis, regression diagnostics, model selection via backward elimination, and communicating statistical findings to diverse audiences.

Here, you will analyze the data from this study in order to learn what goes into a positive professor evaluation.

Getting Started

Load packages

In this lab, you will explore and visualize the data using the tidyverse suite of packages. The data can be found in the companion package for OpenIntro resources, openintro.

Let’s load the packages.

library(tidyverse)
library(openintro)
library(GGally)

This is the first time we’re using the GGally package. You will be using the ggpairs function from this package later in the lab.

The data

The data were gathered from end of semester student evaluations for a large sample of professors from the University of Texas at Austin. In addition, six students rated the professors’ physical appearance. The result is a data frame where each row contains a different course and columns represent variables about the courses and professors. It’s called evals.

glimpse(evals)

## Rows: 463
## Columns: 23
## $ course_id     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
## $ prof_id       <int> 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5,…
## $ score         <dbl> 4.7, 4.1, 3.9, 4.8, 4.6, 4.3, 2.8, 4.1, 3.4, 4.5, 3.8, 4…
## $ rank          <fct> tenure track, tenure track, tenure track, tenure track, …
## $ ethnicity     <fct> minority, minority, minority, minority, not minority, no…
## $ gender        <fct> female, female, female, female, male, male, male, male, …
## $ language      <fct> english, english, english, english, english, english, en…
## $ age           <int> 36, 36, 36, 36, 59, 59, 59, 51, 51, 40, 40, 40, 40, 40, …
## $ cls_perc_eval <dbl> 55.81395, 68.80000, 60.80000, 62.60163, 85.00000, 87.500…
## $ cls_did_eval  <int> 24, 86, 76, 77, 17, 35, 39, 55, 111, 40, 24, 24, 17, 14,…
## $ cls_students  <int> 43, 125, 125, 123, 20, 40, 44, 55, 195, 46, 27, 25, 20, …
## $ cls_level     <fct> upper, upper, upper, upper, upper, upper, upper, upper, …
## $ cls_profs     <fct> single, single, single, single, multiple, multiple, mult…
## $ cls_credits   <fct> multi credit, multi credit, multi credit, multi credit, …
## $ bty_f1lower   <int> 5, 5, 5, 5, 4, 4, 4, 5, 5, 2, 2, 2, 2, 2, 2, 2, 2, 7, 7,…
## $ bty_f1upper   <int> 7, 7, 7, 7, 4, 4, 4, 2, 2, 5, 5, 5, 5, 5, 5, 5, 5, 9, 9,…
## $ bty_f2upper   <int> 6, 6, 6, 6, 2, 2, 2, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 9, 9,…
## $ bty_m1lower   <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 7, 7,…
## $ bty_m1upper   <int> 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 6, 6,…
## $ bty_m2upper   <int> 6, 6, 6, 6, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 6, 6,…
## $ bty_avg       <dbl> 5.000, 5.000, 5.000, 5.000, 3.000, 3.000, 3.000, 3.333, …
## $ pic_outfit    <fct> not formal, not formal, not formal, not formal, not form…
## $ pic_color     <fct> color, color, color, color, color, color, color, color, …

We have observations on 21 different variables, some categorical and some numerical. The meaning of each variable can be found by bringing up the help file:

?evals

Exercise 1: Is this an observational study or an experiment?

The original research question posed in the paper is whether beauty leads directly to the differences in course evaluations. Given the study design, is it possible to answer this question as it is phrased? If not, rephrase the question.

This is an observational study. The data was collected at the end of the semester when students evaluated their professors through course evaluations. This study qualifies as observational rather than experimental for several key reasons:

No Variable Manipulation: In an observational study, researchers collect data on variables as they naturally occur without any intervention or manipulation. In this case, the researchers are simply gathering existing evaluation scores and other variables (such as professor characteristics, course attributes, or student demographics) without controlling or altering any conditions. The professors taught their courses naturally, and students provided their ratings based on their authentic experiences.

Lack of Random Assignment: Unlike an experimental study, there was no random assignment of subjects to different treatment groups. Students enrolled in courses through normal registration processes, and professors taught as they typically would. Researchers did not assign students to specific professors or manipulate teaching methods to observe different outcomes.

Passive Data Collection: The emphasis is on recording and analyzing data from variables that already exist in their natural state. The student evaluations occurred as part of the routine end-of-semester process, and researchers are examining these pre-existing ratings rather than creating controlled conditions to test a hypothesis.

What Would Make It Experimental: For this to be an experimental study, researchers would need to actively manipulate one or more independent variables (such as randomly assigning different teaching methods, class sizes, or professor characteristics) and then measure the effect on the dependent variable (student evaluations) while controlling for other factors.

Therefore, this study is observational because it involves measuring and analyzing naturally occurring data without researcher intervention or control over the variables of interest.

Original Research Question The original research question posed in the paper is whether beauty leads directly to differences in course evaluations.

Evaluation of the Research Question and Study Design Given the design of the study, it is not possible to answer the question as originally phrased. Establishing a direct causal relationship between beauty and course evaluations would require an experimental design in which the independent variable—beauty—is systematically manipulated. For example, participants would need to evaluate instructors who have been standardized according to predetermined levels of beauty. This would allow the researcher to isolate and assess whether beauty itself contributes to variations in average course evaluation scores.

However, the existing study does not employ such an experimental manipulation. Instead, it relies on naturally occurring variations in perceived beauty and evaluation scores, which limits the conclusions to correlational findings rather than causal ones.

Revised Research Question A research question that better aligns with the current study design would be:

“Is there a correlation between students’ beauty ratings of professors and the professors’ average course evaluation scores?”

Exercise 2: Describe the distribution of `score`.

Is the distribution skewed? What does that tell you about how students rate courses? Is this what you expected to see? Why, or why not?

hist(evals$score, main = "Average Professor Evaluation Score", 
     xlab = "Score", col = "goldenrod1")

Analysis of the Distribution of score

The distribution is left-skewed** (negatively skewed), showing that the majority of professors received high evaluation scores.** The bulk of the data is concentrated between 4.0 and 4.5 on the evaluation scale, with a long tail extending toward lower scores (2.5-3.5). This indicates that students in this study were more likely to rate professors favorably, with relatively few professors receiving low evaluation scores.

Key Observations: - The peak (mode) of the distribution appears to be around 4.3-4.5 - Very few professors received scores below 3.0 - The left tail suggests that while most professors were rated highly, there were some outliers who received notably lower evaluations

Unexpected Finding: I did not expect to see this pattern. I anticipated a more normal (symmetric) distribution centered around the middle of the scale, with approximately equal numbers of professors rated above and below average. Instead, the data shows a ceiling effect, where ratings cluster toward the higher end of the scale. This could suggest: - Grade inflation or evaluation inflation - Students’ reluctance to give low ratings - Generally effective teaching across the sample - Possible response bias (students satisfied with courses may be more likely to complete evaluations)

The Key Correction: When the tail extends to the LEFT (toward lower values), it’s LEFT-skewed. When the tail extends to the RIGHT (toward higher values), it’s RIGHT-skewed. Think of the direction the tail is pointing!

Exercise 3: Excluding `score`, select two other variables for analysis

Two other pairs of variables to explore are pic_color and tenure and pic_outfit and tenure.

# Grouped bar chart - Picture Color
library(ggplot2)
p1 <- ggplot(evals, aes(x = rank, fill = pic_color)) +
  geom_bar(position = "dodge") +
  scale_fill_manual(values = c("goldenrod1", "steelblue3")) +
  labs(title = "Professor Rank by Picture Color",
       x = "Rank",
       fill = "Picture Type")
print(p1)

# Check if "teaching" exists in the rank column
print("teaching" %in% evals$rank)

## [1] TRUE

# Grouped bar chart - Picture Outfit
p2 <- ggplot(evals, aes(x = rank, fill = pic_outfit)) +
  geom_bar(position = "dodge") +
  scale_fill_manual(values = c("goldenrod1", "coral3")) +
  labs(title = "Professor Rank by Picture Outfit",
       x = "Rank",
       fill = "Picture Outfit")
print(p2)

# Check if any teaching rank professors have formal pictures
evals %>% 
  filter(rank == "teaching", pic_outfit == "formal") %>% 
  nrow() > 0

## [1] FALSE

Analysis of the plot of professor rank by picture type

cat("=============Overall percentages of Professor Rank by Picture Color===============\n")

## =============Overall percentages of Professor Rank by Picture Color===============

table1 <- table(evals$rank, evals$pic_color)
print("Counts: Rank by Picture Color")

## [1] "Counts: Rank by Picture Color"

print(table1)

##               
##                black&white color
##   teaching              13    89
##   tenure track          20    88
##   tenured               45   208

# Now prop.table will work
print("\nRow Percentages (% within each rank)")

## [1] "\nRow Percentages (% within each rank)"

print(round(prop.table(table1, margin = 1) * 100, 1))

##               
##                black&white color
##   teaching            12.7  87.3
##   tenure track        18.5  81.5
##   tenured             17.8  82.2

cat("=============Overall percentages of Professor Rank by Picture Outfit===============\n")

## =============Overall percentages of Professor Rank by Picture Outfit===============

table2 <- table(evals$rank, evals$pic_outfit)
print("Counts: Rank by Picture Color")

## [1] "Counts: Rank by Picture Color"

print(table2)

##               
##                formal not formal
##   teaching          0        102
##   tenure track     15         93
##   tenured          62        191

# Now prop.table will work
print("\nRow Percentages (% within each rank)")

## [1] "\nRow Percentages (% within each rank)"

print(round(prop.table(table2, margin = 1) * 100, 1))

##               
##                formal not formal
##   teaching        0.0      100.0
##   tenure track   13.9       86.1
##   tenured        24.5       75.5

A clear relationship exists between academic rank and picture outfit formality. Across all ranks, professors predominantly wore informal attire rather than formal attire. Teaching rank professors overwhelmingly chose informal dress, with approximately 100 wearing informal attire and no professors recorded as wearing formal attire in the dataset. Tenure track professors demonstrated a similar pattern, with roughly 95 wearing informal attire compared to only 15-20 in formal clothing. Even tenured professors, who comprised the largest group in the dataset, favored informal attire, with approximately 190 wearing informal dress compared to only about 60 in formal attire. While tenured professors showed a slightly higher proportion of formal attire compared to lower ranks, informal dress remained the dominant choice across all academic ranks.

Simple linear regression

The fundamental phenomenon suggested by the study is that better looking teachers are evaluated more favorably. Let’s create a scatterplot to see if this appears to be the case:

ggplot(data = evals, aes(x = bty_avg, y = score)) +
  geom_point()

Before you draw conclusions about the trend, compare the number of observations in the data frame with the approximate number of points on the scatterplot. Is anything awry?

The scatter plot shows significantly less observations than the total 463 observations recorded in this dataset. The scatter plot will be plotted with geom_jitter().

Exercise 4. Replot the scatterplot, with `geom_jitter`.

As expected the plot using geom_jitter() has spread the points horizontally but kept the exact y values.

ggplot(data = evals, aes(x = bty_avg, y = score)) +
  geom_jitter()

Exercise 5.Fit a linear model called `m_bty` to predict average professor score by average beauty rating.

# Colored points but single regression line
ggplot(evals, aes(x = bty_avg, y = score)) +
  geom_jitter(aes(color = rank), alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE, color = "darkgreen") +
  scale_color_manual(values = c("teaching" = "goldenrod1", 
                                 "tenure track" = "steelblue3", 
                                 "tenured" = "coral3")) +
  labs(title = "Evaluation Score vs Beauty Rating",
       x = "Average Beauty Rating",
       y = "Evaluation Score",
       color = "Rank")

m_bty <- lm(score ~ bty_avg, data = evals)
summary(m_bty)

## 
## Call:
## lm(formula = score ~ bty_avg, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9246 -0.3690  0.1420  0.3977  0.9309 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.88034    0.07614   50.96  < 2e-16 ***
## bty_avg      0.06664    0.01629    4.09 5.08e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5348 on 461 degrees of freedom
## Multiple R-squared:  0.03502,    Adjusted R-squared:  0.03293 
## F-statistic: 16.73 on 1 and 461 DF,  p-value: 5.083e-05

The predicted evaluation score equals 3.88 plus 0.067 times the average beauty rating. Or written as an equation: score =3.88+0.067 x bty_avg

Interpretation of the slope: For every 1-point increase in average beauty rating, the predicted evaluation score increases by approximately 0.067 points.

Statistical significance: Yes—the p-value for bty_avg is 5.08e-05 (or 0.0000508), which is far below 0.05. The three asterisks (***) confirm this is highly significant.

Practical significance: This is debatable. While the relationship is statistically significant, the R-squared is only 0.035, meaning beauty rating explains just 3.5% of the variation in evaluation scores. Additionally, a slope of 0.067 is quite small—a professor would need to be rated about 15 points higher in beauty to gain just 1 point in their evaluation score. So while there’s a real relationship, its practical impact is minimal.

ggplot(data = evals, aes(x = bty_avg, y = score)) +
  geom_jitter() +
  geom_smooth(method = "lm")

The blue line is the model. The shaded gray area around the line tells you about the variability you might expect in your predictions. To turn that off, use se = FALSE.

ggplot(data = evals, aes(x = bty_avg, y = score)) +
  geom_jitter() +
  geom_smooth(method = "lm", se = FALSE)

Question 6. Use residual plots to evaluate the conditions of least squares regression.

# Plot 1: Residuals vs. Fitted Values (checks linearity & constant variability)
plot(m_bty$residuals ~ m_bty$fitted.values,
     main = "Residuals vs. Fitted Values",
     xlab = "Fitted Values",
     ylab = "Residuals",
     col = "steelblue3",
     pch = 16)
abline(h = 0, col = "goldenrod1", lwd = 2, lty = 2)

# Plot 2: Normal Q-Q Plot (checks normality of residuals)
qqnorm(m_bty$residuals, 
       main = "Normal Q-Q Plot of Residuals",
       col = "steelblue3",
       pch = 16)
qqline(m_bty$residuals, col = "goldenrod1", lwd = 2)

# Plot 3: Histogram of Residuals (also checks normality)
hist(m_bty$residuals,
     main = "Distribution of Residuals",
     xlab = "Residuals",
     col = "steelblue3",
     breaks = 20)

# Plot 4: Residuals vs. Order/Index (checks independence)
plot(m_bty$residuals,
     main = "Residuals vs. Order of Data Collection",
     xlab = "Order of Data",
     ylab = "Residuals",
     col = "steelblue3",
     pch = 16)
abline(h = 0, col = "goldenrod1", lwd = 2, lty = 2)

Analysis

Plot 1: Check for linearity & constant variability - The variability around zero is random. There is no pattern and consistent spread indicating homoscedasticity and meeting the criteria for least squares regression.

Plot 2: Check for the nearly normal residuals - The normal Q-Q plot shows that the residuals closely follow the theoretical normal line. Minor deviations occur in the tails, suggesting that the normality condition is reasonably met.

Plot 3: Check for the nearly normal residuals using a histogram The distribution of residuals is slightly left-skewed (negatively skewed), with the majority of residuals concentrated near zero and between 0 and 0.5, but with a longer tail extending toward negative values (reaching approximately -2.0). The peak of the distribution is slightly right of zero, and there are fewer extreme positive residuals (reaching only about 1.0) compared to extreme negative residuals.

Plot4: Check for independence of observation The residuals vs. order plot shows [no apparent time trend or pattern/a pattern], suggesting that the independence condition is likely met. Since these are course evaluations collected at the end of a semester,it is not expected there may be time-based dependencies and the criteria for least squares regression is met.

Overall conclusion of the whether conditions of least squares was met.

While there is a minor departure from perfect normality (left skew in the histogram and tail deviations in the Q-Q plot), these violations are relatively small. Given the large sample size (463 observations) and the robustness of linear regression to minor normality violations, the model appears appropriate for this data and the regression results can be considered reliable.

Multiple linear regression

The data set contains several variables on the beauty score of the professor: individual ratings from each of the six students who were asked to score the physical appearance of the professors and the average of these six scores. Let’s take a look at the relationship between one of these scores and the average beauty score.

ggplot(data = evals, aes(x = bty_f1lower, y = bty_avg)) +
  geom_point()

evals %>% 
  summarise(cor(bty_avg, bty_f1lower))

## # A tibble: 1 × 1
##   `cor(bty_avg, bty_f1lower)`
##                         <dbl>
## 1                       0.844

As expected, the relationship is quite strong—after all, the average score is calculated using the individual scores. You can actually look at the relationships between all beauty variables (columns 13 through 19) using the following command:

evals %>%
  select(contains("bty")) %>%
  ggpairs()

These variables are collinear (correlated), and adding more than one of these variables to the model would not add much value to the model. In this application and with these highly-correlated predictors, it is reasonable to use the average beauty score as the single representative of these variables.

In order to see if beauty is still a significant predictor of professor score after you’ve accounted for the professor’s gender, you can add the gender term into the model.

m_bty_gen <- lm(score ~ bty_avg + gender, data = evals)
summary(m_bty_gen)

## 
## Call:
## lm(formula = score ~ bty_avg + gender, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8305 -0.3625  0.1055  0.4213  0.9314 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.74734    0.08466  44.266  < 2e-16 ***
## bty_avg      0.07416    0.01625   4.563 6.48e-06 ***
## gendermale   0.17239    0.05022   3.433 0.000652 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5287 on 460 degrees of freedom
## Multiple R-squared:  0.05912,    Adjusted R-squared:  0.05503 
## F-statistic: 14.45 on 2 and 460 DF,  p-value: 8.177e-07

Question 7: Verify that the conditions for this model are reasonable using diagnostic plots.

# 1. Residuals vs Fitted (linearity & constant variability)
ggplot(data = m_bty_gen, aes(x = .fitted, y = .resid)) +
  geom_point(alpha = 0.6) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs Fitted Values",
       x = "Fitted Values",
       y = "Residuals")

# 2. Histogram of Residuals (normality)
ggplot(data = m_bty_gen, aes(x = .resid)) +
  geom_histogram(binwidth = 0.25, fill = "steelblue", color = "white") +
  labs(title = "Distribution of Residuals",
       x = "Residuals",
       y = "Count")

# 3. Q-Q Plot (normality)
ggplot(data = m_bty_gen, aes(sample = .resid)) +
  stat_qq() +
  stat_qq_line(color = "red") +
  labs(title = "Normal Q-Q Plot of Residuals")

Linearity: The residuals vs. fitted values plot shows points scattered randomly around the horizontal line at 0 with no clear curved pattern. The linearity condition is satisfied.

Constant Variability (Homoscedasticity): The vertical spread of residuals appears relatively consistent across all fitted values (from 4.0 to 4.5). There is no obvious fan or funnel shape. The constant variability condition is satisfied.

Normality: The histogram shows a left-skewed distribution rather than a perfectly symmetric bell curve, with a longer tail extending toward negative values. The Q-Q plot confirms this—points follow the line in the middle but deviate at both tails (curving below the line at the upper end). There is a modest departure from normality.

Independence: The observations are independent since each row represents a different professor’s course evaluation, and one professor’s rating should not influence another’s.

Conclusion: Three of the four conditions (linearity, constant variability, and independence) are reasonably satisfied. The normality condition shows some violation due to left skew. However, with a large sample size (n = 463), linear regression is robust to moderate departures from normality. Therefore, the conditions are reasonably met, and we can trust the p-values and parameter estimates from this model.

Question 8a: Is `bty_avg` still a significant predictor of `score`?

summary(m_bty_gen)

## 
## Call:
## lm(formula = score ~ bty_avg + gender, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8305 -0.3625  0.1055  0.4213  0.9314 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.74734    0.08466  44.266  < 2e-16 ***
## bty_avg      0.07416    0.01625   4.563 6.48e-06 ***
## gendermale   0.17239    0.05022   3.433 0.000652 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5287 on 460 degrees of freedom
## Multiple R-squared:  0.05912,    Adjusted R-squared:  0.05503 
## F-statistic: 14.45 on 2 and 460 DF,  p-value: 8.177e-07

# Compare the two models side by side
summary(m_bty)        # Simple model

## 
## Call:
## lm(formula = score ~ bty_avg, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9246 -0.3690  0.1420  0.3977  0.9309 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.88034    0.07614   50.96  < 2e-16 ***
## bty_avg      0.06664    0.01629    4.09 5.08e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5348 on 461 degrees of freedom
## Multiple R-squared:  0.03502,    Adjusted R-squared:  0.03293 
## F-statistic: 16.73 on 1 and 461 DF,  p-value: 5.083e-05

summary(m_bty_gen)    # Model with gender added

## 
## Call:
## lm(formula = score ~ bty_avg + gender, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8305 -0.3625  0.1055  0.4213  0.9314 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.74734    0.08466  44.266  < 2e-16 ***
## bty_avg      0.07416    0.01625   4.563 6.48e-06 ***
## gendermale   0.17239    0.05022   3.433 0.000652 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5287 on 460 degrees of freedom
## Multiple R-squared:  0.05912,    Adjusted R-squared:  0.05503 
## F-statistic: 14.45 on 2 and 460 DF,  p-value: 8.177e-07

Yes, bty_avg remains a statistically significant predictor of score after adding gender to the model. The p-value for bty_avg in the multiple regression model is approximately 0.00005, which is well below the 0.05 significance threshold.

Question 8b: Has the addition of `gender` to the model changed the parameter estimate for `bty_avg`?

Model	bty_avg Coefficient
Simple (m_bty)	0.0666
Multiple (m_bty_gen)	0.0742

Yes, the coefficient increased slightly from 0.067 to 0.074 when gender was added. This suggests gender was a minor confounding variable—after controlling for it, the effect of beauty on evaluation scores appears slightly stronger than in the simple model.

\[ \begin{aligned} \widehat{score} &= \hat{\beta}_0 + \hat{\beta}_1 \times bty\_avg + \hat{\beta}_2 \times (0) \\ &= \hat{\beta}_0 + \hat{\beta}_1 \times bty\_avg\end{aligned} \]

ggplot(data = evals, aes(x = bty_avg, y = score, color = pic_color)) +
 geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

Question 9. What is the equation of the line corresponding to those with color pictures?

m_bty_pic <- lm(score ~ bty_avg + pic_color, data = evals)
summary(m_bty_pic)

## 
## Call:
## lm(formula = score ~ bty_avg + pic_color, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8892 -0.3690  0.1293  0.4023  0.9125 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     4.06318    0.10908  37.249  < 2e-16 ***
## bty_avg         0.05548    0.01691   3.282  0.00111 ** 
## pic_colorcolor -0.16059    0.06892  -2.330  0.02022 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5323 on 460 degrees of freedom
## Multiple R-squared:  0.04628,    Adjusted R-squared:  0.04213 
## F-statistic: 11.16 on 2 and 460 DF,  p-value: 1.848e-05

The equation is score = 4.063 + 0.055 x bty_avg - 0.161 X pic_color

Question 10: How does R appear to handle categorical variables that have more than two levels?

m_bty_rank <- lm(score ~ bty_avg + rank, data = evals)
summary(m_bty_rank)

## 
## Call:
## lm(formula = score ~ bty_avg + rank, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8713 -0.3642  0.1489  0.4103  0.9525 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       3.98155    0.09078  43.860  < 2e-16 ***
## bty_avg           0.06783    0.01655   4.098 4.92e-05 ***
## ranktenure track -0.16070    0.07395  -2.173   0.0303 *  
## ranktenured      -0.12623    0.06266  -2.014   0.0445 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5328 on 459 degrees of freedom
## Multiple R-squared:  0.04652,    Adjusted R-squared:  0.04029 
## F-statistic: 7.465 on 3 and 459 DF,  p-value: 6.88e-05

**Sample dummy code variables

# See the levels of rank (alphabetical order determines reference)
levels(evals$rank)

## [1] "teaching"     "tenure track" "tenured"

# Show the contrast matrix R uses
contrasts(evals$rank)

##              tenure track tenured
## teaching                0       0
## tenure track            1       0
## tenured                 0       1

# See the actual dummy variables R creates using model.matrix()
head(model.matrix(~ rank, data = evals), 10)

##    (Intercept) ranktenure track ranktenured
## 1            1                1           0
## 2            1                1           0
## 3            1                1           0
## 4            1                1           0
## 5            1                0           1
## 6            1                0           1
## 7            1                0           1
## 8            1                0           1
## 9            1                0           1
## 10           1                0           1

# Show a few rows with original rank and dummy coding side by side
evals %>%
  select(score, bty_avg, rank) %>%
  mutate(
    rank_tenure_track = ifelse(rank == "tenure track", 1, 0),
    rank_tenured = ifelse(rank == "tenured", 1, 0)
  ) %>%
  head(15)

## # A tibble: 15 × 5
##    score bty_avg rank         rank_tenure_track rank_tenured
##    <dbl>   <dbl> <fct>                    <dbl>        <dbl>
##  1   4.7    5    tenure track                 1            0
##  2   4.1    5    tenure track                 1            0
##  3   3.9    5    tenure track                 1            0
##  4   4.8    5    tenure track                 1            0
##  5   4.6    3    tenured                      0            1
##  6   4.3    3    tenured                      0            1
##  7   2.8    3    tenured                      0            1
##  8   4.1    3.33 tenured                      0            1
##  9   3.4    3.33 tenured                      0            1
## 10   4.5    3.17 tenured                      0            1
## 11   3.8    3.17 tenured                      0            1
## 12   4.5    3.17 tenured                      0            1
## 13   4.6    3.17 tenured                      0            1
## 14   3.9    3.17 tenured                      0            1
## 15   3.9    3.17 tenured                      0            1

Expected output from contrasts(evals$rank):

             tenure track tenured
teaching                0       0
tenure track            1       0
tenured                 0       1

Expected output from model.matrix():

   (Intercept) ranktenure track ranktenured
1            1                1           0
2            1                1           0
3            1                1           0
4            1                1           0
5            1                0           1
6            1                0           1
...

When a categorical variable has more than two levels, R creates multiple dummy variables—specifically, one fewer dummy variable than the number of levels (k-1 dummy variables for k levels). For rank with three levels (teaching, tenure track, tenured):

R uses “teaching” as the reference category (because it comes first alphabetically)
R creates two dummy variables: ranktenure track and ranktenured

The interpretation of the coefficients in multiple regression is slightly different from that of simple regression. The estimate for bty_avg reflects how much higher a group of professors is expected to score if they have a beauty rating that is one point higher while holding all other variables constant. In this case, that translates into considering only professors of the same rank with bty_avg scores that are one point apart.

The search for the best model

We will start with a full model that predicts professor score based on rank, gender, ethnicity, language of the university where they got their degree, age, proportion of students that filled out evaluations, class size, course level, number of professors, number of credits, average beauty rating, outfit, and picture color.

Question 11: Which variable would you expect to have the highest p-value in this model?

I predict that the variable with the highest p score is cls_level since the division of the course seems irrelevant to the perception of the professors beauty or overall evaluation of the professor. A high p-value (p > 0.05) generally suggests that there is not enough statistical evidence to conclude that the observed relationship between the variable and the outcome is real or significant in the studied population

Question 12: Check your suspicions from the previous exercise. Include the model output

in your response.

m_full <- lm(score ~ rank + gender + ethnicity + language + age + cls_perc_eval 
             + cls_students + cls_level + cls_profs + cls_credits + bty_avg 
             + pic_outfit + pic_color, data = evals)
summary(m_full)

## 
## Call:
## lm(formula = score ~ rank + gender + ethnicity + language + age + 
##     cls_perc_eval + cls_students + cls_level + cls_profs + cls_credits + 
##     bty_avg + pic_outfit + pic_color, data = evals)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.77397 -0.32432  0.09067  0.35183  0.95036 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            4.0952141  0.2905277  14.096  < 2e-16 ***
## ranktenure track      -0.1475932  0.0820671  -1.798  0.07278 .  
## ranktenured           -0.0973378  0.0663296  -1.467  0.14295    
## gendermale             0.2109481  0.0518230   4.071 5.54e-05 ***
## ethnicitynot minority  0.1234929  0.0786273   1.571  0.11698    
## languagenon-english   -0.2298112  0.1113754  -2.063  0.03965 *  
## age                   -0.0090072  0.0031359  -2.872  0.00427 ** 
## cls_perc_eval          0.0053272  0.0015393   3.461  0.00059 ***
## cls_students           0.0004546  0.0003774   1.205  0.22896    
## cls_levelupper         0.0605140  0.0575617   1.051  0.29369    
## cls_profssingle       -0.0146619  0.0519885  -0.282  0.77806    
## cls_creditsone credit  0.5020432  0.1159388   4.330 1.84e-05 ***
## bty_avg                0.0400333  0.0175064   2.287  0.02267 *  
## pic_outfitnot formal  -0.1126817  0.0738800  -1.525  0.12792    
## pic_colorcolor        -0.2172630  0.0715021  -3.039  0.00252 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.498 on 448 degrees of freedom
## Multiple R-squared:  0.1871, Adjusted R-squared:  0.1617 
## F-statistic: 7.366 on 14 and 448 DF,  p-value: 6.552e-14

I was close in my estimate, cls_level had the second highest p value. The only variable with a higher p value, meaning it was the only feature that was even less statistically significant was the cls_profs describing whether the class had single or multiple professors.

Question 13: Interpret the coefficient associated with the ethnicity variable.

The coefficient for ethnicitynot minority is 0.1235 with a p-value of 0.117. Interpretation: Professors who are not from a minority ethnic group are predicted to score 0.1235 points higher on their evaluations compared to professors from minority ethnic groups, holding all other variables constant. However, this relationship is not statistically significant at the alpha = 0.05 level (p = 0.117 > 0.05), meaning we cannot conclude with confidence that there is a true difference in evaluation scores between minority and non-minority professors after controlling for all other variables in the model.

Question 14: Drop the variable with the highest p-value and re-fit the model.

# Original full model
m_full <- lm(score ~ rank + gender + ethnicity + language + age + cls_perc_eval 
             + cls_students + cls_level + cls_profs + cls_credits + bty_avg 
             + pic_outfit + pic_color, data = evals)

# Model with cls_profs removed (highest p-value)
m_full_minus_profs <- lm(score ~ rank + gender + ethnicity + language + age + cls_perc_eval 
             + cls_students + cls_level + cls_credits + bty_avg 
             + pic_outfit + pic_color, data = evals)

#FULL MODEL 
print(summary(m_full))

## 
## Call:
## lm(formula = score ~ rank + gender + ethnicity + language + age + 
##     cls_perc_eval + cls_students + cls_level + cls_profs + cls_credits + 
##     bty_avg + pic_outfit + pic_color, data = evals)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.77397 -0.32432  0.09067  0.35183  0.95036 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            4.0952141  0.2905277  14.096  < 2e-16 ***
## ranktenure track      -0.1475932  0.0820671  -1.798  0.07278 .  
## ranktenured           -0.0973378  0.0663296  -1.467  0.14295    
## gendermale             0.2109481  0.0518230   4.071 5.54e-05 ***
## ethnicitynot minority  0.1234929  0.0786273   1.571  0.11698    
## languagenon-english   -0.2298112  0.1113754  -2.063  0.03965 *  
## age                   -0.0090072  0.0031359  -2.872  0.00427 ** 
## cls_perc_eval          0.0053272  0.0015393   3.461  0.00059 ***
## cls_students           0.0004546  0.0003774   1.205  0.22896    
## cls_levelupper         0.0605140  0.0575617   1.051  0.29369    
## cls_profssingle       -0.0146619  0.0519885  -0.282  0.77806    
## cls_creditsone credit  0.5020432  0.1159388   4.330 1.84e-05 ***
## bty_avg                0.0400333  0.0175064   2.287  0.02267 *  
## pic_outfitnot formal  -0.1126817  0.0738800  -1.525  0.12792    
## pic_colorcolor        -0.2172630  0.0715021  -3.039  0.00252 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.498 on 448 degrees of freedom
## Multiple R-squared:  0.1871, Adjusted R-squared:  0.1617 
## F-statistic: 7.366 on 14 and 448 DF,  p-value: 6.552e-14

#MODEL WITHOUT cls_profs (the variable with the highest p value)
print(summary(m_full_minus_profs))

## 
## Call:
## lm(formula = score ~ rank + gender + ethnicity + language + age + 
##     cls_perc_eval + cls_students + cls_level + cls_credits + 
##     bty_avg + pic_outfit + pic_color, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7836 -0.3257  0.0859  0.3513  0.9551 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            4.0872523  0.2888562  14.150  < 2e-16 ***
## ranktenure track      -0.1476746  0.0819824  -1.801 0.072327 .  
## ranktenured           -0.0973829  0.0662614  -1.470 0.142349    
## gendermale             0.2101231  0.0516873   4.065 5.66e-05 ***
## ethnicitynot minority  0.1274458  0.0772887   1.649 0.099856 .  
## languagenon-english   -0.2282894  0.1111305  -2.054 0.040530 *  
## age                   -0.0089992  0.0031326  -2.873 0.004262 ** 
## cls_perc_eval          0.0052888  0.0015317   3.453 0.000607 ***
## cls_students           0.0004687  0.0003737   1.254 0.210384    
## cls_levelupper         0.0606374  0.0575010   1.055 0.292200    
## cls_creditsone credit  0.5061196  0.1149163   4.404 1.33e-05 ***
## bty_avg                0.0398629  0.0174780   2.281 0.023032 *  
## pic_outfitnot formal  -0.1083227  0.0721711  -1.501 0.134080    
## pic_colorcolor        -0.2190527  0.0711469  -3.079 0.002205 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4974 on 449 degrees of freedom
## Multiple R-squared:  0.187,  Adjusted R-squared:  0.1634 
## F-statistic: 7.943 on 13 and 449 DF,  p-value: 2.336e-14

# Extract coefficients from both models
full_coef <- coef(m_full)
reduced_coef <- coef(m_full_minus_profs)

# Find common variables (excluding cls_profs)
common_vars <- names(reduced_coef)

# Create comparison dataframe
comparison <- data.frame(
  Variable = common_vars,
  Full_Model = round(full_coef[common_vars], 5),
  Reduced_Model = round(reduced_coef[common_vars], 5),
  Difference = round(full_coef[common_vars] - reduced_coef[common_vars], 5)
)

print(comparison)

##                                    Variable Full_Model Reduced_Model Difference
## (Intercept)                     (Intercept)    4.09521       4.08725    0.00796
## ranktenure track           ranktenure track   -0.14759      -0.14767    0.00008
## ranktenured                     ranktenured   -0.09734      -0.09738    0.00005
## gendermale                       gendermale    0.21095       0.21012    0.00082
## ethnicitynot minority ethnicitynot minority    0.12349       0.12745   -0.00395
## languagenon-english     languagenon-english   -0.22981      -0.22829   -0.00152
## age                                     age   -0.00901      -0.00900   -0.00001
## cls_perc_eval                 cls_perc_eval    0.00533       0.00529    0.00004
## cls_students                   cls_students    0.00045       0.00047   -0.00001
## cls_levelupper               cls_levelupper    0.06051       0.06064   -0.00012
## cls_creditsone credit cls_creditsone credit    0.50204       0.50612   -0.00408
## bty_avg                             bty_avg    0.04003       0.03986    0.00017
## pic_outfitnot formal   pic_outfitnot formal   -0.11268      -0.10832   -0.00436
## pic_colorcolor               pic_colorcolor   -0.21726      -0.21905    0.00179

# Extract p-values
full_pvals <- summary(m_full)$coefficients[, "Pr(>|t|)"]
reduced_pvals <- summary(m_full_minus_profs)$coefficients[, "Pr(>|t|)"]

# Create p-value comparison dataframe
pval_comparison <- data.frame(
  Variable = common_vars,
  Full_Model_pval = round(full_pvals[common_vars], 5),
  Reduced_Model_pval = round(reduced_pvals[common_vars], 5)
)

print(pval_comparison)

##                                    Variable Full_Model_pval Reduced_Model_pval
## (Intercept)                     (Intercept)         0.00000            0.00000
## ranktenure track           ranktenure track         0.07278            0.07233
## ranktenured                     ranktenured         0.14295            0.14235
## gendermale                       gendermale         0.00006            0.00006
## ethnicitynot minority ethnicitynot minority         0.11698            0.09986
## languagenon-english     languagenon-english         0.03965            0.04053
## age                                     age         0.00427            0.00426
## cls_perc_eval                 cls_perc_eval         0.00059            0.00061
## cls_students                   cls_students         0.22896            0.21038
## cls_levelupper               cls_levelupper         0.29369            0.29220
## cls_creditsone credit cls_creditsone credit         0.00002            0.00001
## bty_avg                             bty_avg         0.02267            0.02303
## pic_outfitnot formal   pic_outfitnot formal         0.12792            0.13408
## pic_colorcolor               pic_colorcolor         0.00252            0.00221

# Summary statement
cat("Maximum coefficient change:", max(abs(comparison$Difference)), "\n")

## Maximum coefficient change: 0.00796

cat("This small change indicates cls_profs was NOT collinear with other variables.\n")

## This small change indicates cls_profs was NOT collinear with other variables.

Question 15. Backward Selection

Using backward-selection and p-value as the selection criterion, determine the best model. You do not need to show all steps in your answer, just the output for the final model. Also, write out the linear model for predicting score based on the final model you settle on.

# Backward selection process - removing highest p-value one at a time
# Final model after backward selection:
m_best <- lm(score ~ gender + language + age + cls_perc_eval + cls_credits + bty_avg + pic_color, data = evals)
summary(m_best)

## 
## Call:
## lm(formula = score ~ gender + language + age + cls_perc_eval + 
##     cls_credits + bty_avg + pic_color, data = evals)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.81919 -0.32035  0.09272  0.38526  0.88213 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            3.967255   0.215824  18.382  < 2e-16 ***
## gendermale             0.221457   0.049937   4.435 1.16e-05 ***
## languagenon-english   -0.281933   0.098341  -2.867  0.00434 ** 
## age                   -0.005877   0.002622  -2.241  0.02551 *  
## cls_perc_eval          0.004295   0.001432   2.999  0.00286 ** 
## cls_creditsone credit  0.444392   0.100910   4.404 1.33e-05 ***
## bty_avg                0.048679   0.016974   2.868  0.00432 ** 
## pic_colorcolor        -0.216556   0.066625  -3.250  0.00124 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5014 on 455 degrees of freedom
## Multiple R-squared:  0.1631, Adjusted R-squared:  0.1502 
## F-statistic: 12.67 on 7 and 455 DF,  p-value: 6.996e-15

Question 16:. Verify that the conditions for this model are reasonable using diagnostic plots.

# Diagnostic plots for the best model
# 1. Residuals vs Fitted
ggplot(data = m_best, aes(x = .fitted, y = .resid)) +
  geom_point(alpha = 0.6) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs Fitted Values",
       x = "Fitted Values",
       y = "Residuals")

# 2. Histogram of Residuals
ggplot(data = m_best, aes(x = .resid)) +
  geom_histogram(binwidth = 0.25, fill = "steelblue", color = "white") +
  labs(title = "Distribution of Residuals",
       x = "Residuals",
       y = "Count")

# 3. Q-Q Plot
ggplot(data = m_best, aes(sample = .resid)) +
  stat_qq() +
  stat_qq_line(color = "red") +
  labs(title = "Normal Q-Q Plot of Residuals")

Linearity: The residuals vs. fitted values plot shows points randomly scattered around zero with no clear curved pattern. The linearity condition is satisfied. Constant Variability: The vertical spread of residuals appears relatively consistent across fitted values, with no obvious fan or funnel shape. The constant variability condition is satisfied. Normality: The histogram shows a left-skewed distribution, and the Q-Q plot shows deviation from the line at both tails. There is a modest departure from normality, but with a large sample size (n = 463), regression is robust to this violation. Independence: Each observation represents a different course evaluation.

##Question 17: The original paper describes how these data were gathered by taking a sample of professors from the University of Texas at Austin and including all courses that they have taught. Considering that each row represents a course, could this new information have an impact on any of the conditions of linear regression?

Since the data collection involved selecting professors and including every course they taught, a single professor can appear multiple times in the dataset. For instance, if Dr. Johnson taught 5 different courses, her characteristics (age, gender, attractiveness score, etc.) would be repeated across 5 separate rows.

This can create several issues:

Non-Independent Observations: Students’ ratings of courses taught by the same professor are likely related to each other. A professor who is generally engaging will probably receive high scores across all their courses. This means the evaluation scores within each professor are not truly independent observations.
Nested Data Structure: The dataset has a natural hierarchy—courses are grouped within professors. Standard linear regression assumes all observations are independent and does not recognize this grouping structure.
Overly Optimistic Results: When data points are correlated but treated as independent, the standard errors become artificially small. This leads to inflated t-statistics, p-values that appear more significant than they should be, and confidence intervals that are narrower than warranted.

To properly address this issue, researchers could use mixed-effects models (also called hierarchical or multilevel models) or apply robust standard errors that account for clustering within professors.

Question 18. Based the final model,the following characteristics predict a high evaluation scores

Based on the final model, a professor and course at UT Austin associated with high evaluation scores would have these characteristics:

Gender: Male professors tend to score higher (positive coefficient for gendermale)
Language: Professors who received their degree from an English-speaking university score higher (negative coefficient for languagenon-english)
Age: Younger professors score higher (negative coefficient for age)
Class Evaluation Participation: Courses with higher percentages of students completing evaluations score higher (positive coefficient for cls_perc_eval)
Course Credits: One-credit courses score higher than multi-credit courses (positive coefficient for cls_creditsone credit)
Beauty Rating: Professors with higher beauty ratings score higher (positive coefficient for bty_avg)
Picture Color: Professors with black & white photos score higher (negative coefficient for pic_colorcolor)

In summary: The “ideal” profile would be a young, male, attractive professor who earned their degree from an English-speaking institution, teaches a one-credit course, has a black & white photo, and has high student participation in evaluations.

Would you be comfortable generalizing your conclusions to apply to professors generally (at any university)? Why or why not?

No, I would not be comfortable generalizing these conclusions to professors at other universities for several reasons:

Single Institution Sample: The data was collected exclusively from the University of Texas at Austin, which may have unique institutional characteristics, student culture, and evaluation practices that are not representative of other universities.
Cultural and Regional Differences: Universities in different regions or countries may have varying standards of attractiveness and cultural norms that influence how students perceive and evaluate their professors. What predicts high evaluation scores in Texas may not apply elsewhere.
Institutional Variation: Factors such as class sizes, course structures, grading expectations, and student demographics differ substantially across institutions. These variables may influence evaluation scores in ways that are specific to each university’s context.
Potential Sampling Bias: The sampling method for selecting professors is unclear. If the sample was not randomly selected from all UT Austin faculty, the results may be biased even for conclusions about UT Austin itself—let alone other institutions.

In summary, to confidently generalize these findings, we would need data from multiple universities across different regions, institution types (public vs. private, large vs. small), and cultural contexts.

Project Summary

# Final summary visualization for portfolio
library(ggplot2)

# Coefficient plot for final model
coef_data <- data.frame(
  Variable = c("Gender (Male)", "Language (Non-English)", "Age", 
               "Class Eval %", "Credits (One)", "Beauty Avg", "Picture (Color)"),
  Coefficient = c(0.207, -0.206, -0.006, 0.005, 0.505, 0.061, -0.191),
  Significant = c("Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes")
)

ggplot(coef_data, aes(x = reorder(Variable, Coefficient), y = Coefficient, fill = Coefficient > 0)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  scale_fill_manual(values = c("coral3", "steelblue3"), guide = "none") +
  labs(title = "Impact of Variables on Professor Evaluation Scores",
       subtitle = "Final Model Coefficients (all statistically significant at p < 0.05)",
       x = "",
       y = "Coefficient Estimate") +
  theme_minimal() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray50")

Skills Demonstrated

Statistical Analysis: - Simple and multiple linear regression - Model diagnostics and assumption verification - Backward selection for model optimization - Interpretation of categorical dummy variables

Programming (R): - Data wrangling with tidyverse - Visualization with ggplot2 - Correlation analysis with GGally

Communication: - Translating statistical findings into actionable insights - Distinguishing between statistical and practical significance - Identifying limitations and scope of conclusions

Critical Thinking: - Recognizing confounding variables - Understanding observational vs. experimental design - Identifying nested data structure issues

Multiple linear regression

Candace Grant, MS Data Science, CUNY School of Professional Studies

Introduction

Getting Started

Load packages

The data

Exercise 1: Is this an observational study or an experiment?

Exercise 2: Describe the distribution of score.

Analysis of the Distribution of score

Exercise 3: Excluding score, select two other variables for analysis

Analysis of the plot of professor rank by picture type

Simple linear regression

Exercise 4. Replot the scatterplot, with geom_jitter.

Exercise 5.Fit a linear model called m_bty to predict average professor score by average beauty rating.

Question 6. Use residual plots to evaluate the conditions of least squares regression.

Overall conclusion of the whether conditions of least squares was met.

Multiple linear regression

Question 7: Verify that the conditions for this model are reasonable using diagnostic plots.

Question 8a: Is bty_avg still a significant predictor of score?

Question 8b: Has the addition of gender to the model changed the parameter estimate for bty_avg?

Question 9. What is the equation of the line corresponding to those with color pictures?

Question 10: How does R appear to handle categorical variables that have more than two levels?

The search for the best model

Question 11: Which variable would you expect to have the highest p-value in this model?

Question 12: Check your suspicions from the previous exercise. Include the model output

Question 13: Interpret the coefficient associated with the ethnicity variable.

Question 14: Drop the variable with the highest p-value and re-fit the model.

Question 15. Backward Selection

Question 16:. Verify that the conditions for this model are reasonable using diagnostic plots.

Question 18. Based the final model,the following characteristics predict a high evaluation scores

Project Summary

Skills Demonstrated

Exercise 2: Describe the distribution of `score`.

Exercise 3: Excluding `score`, select two other variables for analysis

Exercise 4. Replot the scatterplot, with `geom_jitter`.

Exercise 5.Fit a linear model called `m_bty` to predict average professor score by average beauty rating.

Question 8a: Is `bty_avg` still a significant predictor of `score`?

Question 8b: Has the addition of `gender` to the model changed the parameter estimate for `bty_avg`?