Using R, build a multiple regression model for data that interests you. Include in this model at least one quadratic term, one dichotomous term, and one dichotomous vs. quantitative interaction term. Interpret all coefficients. Conduct residual analysis. Was the linear model appropriate? Why or why not?

This multiple regression model is for the evals dataset from the openintro package. It contains evaluation scores and characteristics for each professor. The regression model focuses on the following terms: age, ethnicity, and bty_avg.

library(tidyverse)
library(openintro)
df <- evals
summary(df)
##    course_id        prof_id          score                 rank    
##  Min.   :  1.0   Min.   : 1.00   Min.   :2.300   teaching    :102  
##  1st Qu.:116.5   1st Qu.:20.00   1st Qu.:3.800   tenure track:108  
##  Median :232.0   Median :43.00   Median :4.300   tenured     :253  
##  Mean   :232.0   Mean   :45.15   Mean   :4.175                     
##  3rd Qu.:347.5   3rd Qu.:70.50   3rd Qu.:4.600                     
##  Max.   :463.0   Max.   :94.00   Max.   :5.000                     
##         ethnicity      gender           language        age       
##  minority    : 64   female:195   english    :435   Min.   :29.00  
##  not minority:399   male  :268   non-english: 28   1st Qu.:42.00  
##                                                    Median :48.00  
##                                                    Mean   :48.37  
##                                                    3rd Qu.:57.00  
##                                                    Max.   :73.00  
##  cls_perc_eval     cls_did_eval     cls_students    cls_level      cls_profs  
##  Min.   : 10.42   Min.   :  5.00   Min.   :  8.00   lower:157   multiple:306  
##  1st Qu.: 62.70   1st Qu.: 15.00   1st Qu.: 19.00   upper:306   single  :157  
##  Median : 76.92   Median : 23.00   Median : 29.00                             
##  Mean   : 74.43   Mean   : 36.62   Mean   : 55.18                             
##  3rd Qu.: 87.25   3rd Qu.: 40.00   3rd Qu.: 60.00                             
##  Max.   :100.00   Max.   :380.00   Max.   :581.00                             
##        cls_credits   bty_f1lower     bty_f1upper     bty_f2upper    
##  multi credit:436   Min.   :1.000   Min.   :1.000   Min.   : 1.000  
##  one credit  : 27   1st Qu.:2.000   1st Qu.:4.000   1st Qu.: 4.000  
##                     Median :4.000   Median :5.000   Median : 5.000  
##                     Mean   :3.963   Mean   :5.019   Mean   : 5.214  
##                     3rd Qu.:5.000   3rd Qu.:7.000   3rd Qu.: 6.000  
##                     Max.   :8.000   Max.   :9.000   Max.   :10.000  
##   bty_m1lower     bty_m1upper     bty_m2upper       bty_avg     
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.667  
##  1st Qu.:2.000   1st Qu.:3.000   1st Qu.:4.000   1st Qu.:3.167  
##  Median :3.000   Median :4.000   Median :5.000   Median :4.333  
##  Mean   :3.413   Mean   :4.147   Mean   :4.752   Mean   :4.418  
##  3rd Qu.:5.000   3rd Qu.:5.000   3rd Qu.:6.000   3rd Qu.:5.500  
##  Max.   :7.000   Max.   :9.000   Max.   :9.000   Max.   :8.167  
##       pic_outfit        pic_color  
##  formal    : 77   black&white: 78  
##  not formal:386   color      :385  
##                                    
##                                    
##                                    
## 
glimpse(df)
## Rows: 463
## Columns: 23
## $ course_id     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1~
## $ prof_id       <int> 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5,~
## $ score         <dbl> 4.7, 4.1, 3.9, 4.8, 4.6, 4.3, 2.8, 4.1, 3.4, 4.5, 3.8, 4~
## $ rank          <fct> tenure track, tenure track, tenure track, tenure track, ~
## $ ethnicity     <fct> minority, minority, minority, minority, not minority, no~
## $ gender        <fct> female, female, female, female, male, male, male, male, ~
## $ language      <fct> english, english, english, english, english, english, en~
## $ age           <int> 36, 36, 36, 36, 59, 59, 59, 51, 51, 40, 40, 40, 40, 40, ~
## $ cls_perc_eval <dbl> 55.81395, 68.80000, 60.80000, 62.60163, 85.00000, 87.500~
## $ cls_did_eval  <int> 24, 86, 76, 77, 17, 35, 39, 55, 111, 40, 24, 24, 17, 14,~
## $ cls_students  <int> 43, 125, 125, 123, 20, 40, 44, 55, 195, 46, 27, 25, 20, ~
## $ cls_level     <fct> upper, upper, upper, upper, upper, upper, upper, upper, ~
## $ cls_profs     <fct> single, single, single, single, multiple, multiple, mult~
## $ cls_credits   <fct> multi credit, multi credit, multi credit, multi credit, ~
## $ bty_f1lower   <int> 5, 5, 5, 5, 4, 4, 4, 5, 5, 2, 2, 2, 2, 2, 2, 2, 2, 7, 7,~
## $ bty_f1upper   <int> 7, 7, 7, 7, 4, 4, 4, 2, 2, 5, 5, 5, 5, 5, 5, 5, 5, 9, 9,~
## $ bty_f2upper   <int> 6, 6, 6, 6, 2, 2, 2, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 9, 9,~
## $ bty_m1lower   <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 7, 7,~
## $ bty_m1upper   <int> 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 6, 6,~
## $ bty_m2upper   <int> 6, 6, 6, 6, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 6, 6,~
## $ bty_avg       <dbl> 5.000, 5.000, 5.000, 5.000, 3.000, 3.000, 3.000, 3.333, ~
## $ pic_outfit    <fct> not formal, not formal, not formal, not formal, not form~
## $ pic_color     <fct> color, color, color, color, color, color, color, color, ~
ggplot(df , aes(x = age, y = score)) +
 geom_point()

ggplot(df , aes(x = ethnicity, y = score)) +
 geom_point()

ggplot(df , aes(x = bty_avg, y = score)) +
 geom_point()

Regression Model

model <- lm(data=df, score ~ age + ethnicity + bty_avg)
summary(model)
## 
## Call:
## lm(formula = score ~ age + ethnicity + bty_avg, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9661 -0.3506  0.1343  0.3829  1.0187 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            3.955742   0.176774  22.377  < 2e-16 ***
## age                   -0.003555   0.002668  -1.332 0.183377    
## ethnicitynot minority  0.140897   0.072161   1.953 0.051484 .  
## bty_avg                0.061004   0.017047   3.579 0.000382 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.533 on 459 degrees of freedom
## Multiple R-squared:  0.04571,    Adjusted R-squared:  0.03947 
## F-statistic: 7.328 on 3 and 459 DF,  p-value: 8.298e-05

Residual Analysis

plot(model)

The plots show us that the residuals are normally distributed. The Normal Q-Q plot shows the residuals following the main diagonal, giving a strong indication of normality. For this data, the linear model was appropriate.