Johnny Mendoza FL1
Before we can answer any questions, we load the data.
# Download the course evaluation data and load it into the workspace as `evals`
download.file("http://www.openintro.org/stat/data/evals.RData", destfile = "evals.RData")
load("evals.RData")
Picking up from the end of Lab 12, let's consider the multiple regression model from Exercise 11. Report and interpret the coefficient associated with the ethnicity variable.
# Full model from Exercise 11: score regressed on every candidate predictor
m_full <- lm(score ~ rank + ethnicity + gender + language + age + cls_perc_eval +
cls_students + cls_level + cls_profs + cls_credits + bty_avg + pic_outfit +
pic_color, data = evals)
summary(m_full)
##
## Call:
## lm(formula = score ~ rank + ethnicity + gender + language + age +
## cls_perc_eval + cls_students + cls_level + cls_profs + cls_credits +
## bty_avg + pic_outfit + pic_color, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7740 -0.3243 0.0907 0.3518 0.9504
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.095214 0.290528 14.10 < 2e-16 ***
## ranktenure track -0.147593 0.082067 -1.80 0.07278 .
## ranktenured -0.097338 0.066330 -1.47 0.14295
## ethnicitynot minority 0.123493 0.078627 1.57 0.11698
## gendermale 0.210948 0.051823 4.07 5.5e-05 ***
## languagenon-english -0.229811 0.111375 -2.06 0.03965 *
## age -0.009007 0.003136 -2.87 0.00427 **
## cls_perc_eval 0.005327 0.001539 3.46 0.00059 ***
## cls_students 0.000455 0.000377 1.20 0.22896
## cls_levelupper 0.060514 0.057562 1.05 0.29369
## cls_profssingle -0.014662 0.051988 -0.28 0.77806
## cls_creditsone credit 0.502043 0.115939 4.33 1.8e-05 ***
## bty_avg 0.040033 0.017506 2.29 0.02267 *
## pic_outfitnot formal -0.112682 0.073880 -1.53 0.12792
## pic_colorcolor -0.217263 0.071502 -3.04 0.00252 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.498 on 448 degrees of freedom
## Multiple R-squared: 0.187, Adjusted R-squared: 0.162
## F-statistic: 7.37 on 14 and 448 DF, p-value: 6.55e-14
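Reading the `ethnicitynot minority` row: holding the other predictors fixed, the model predicts that professors who are not from a minority group score about 0.12 points higher, on average, than minority professors (the reference level). With p = 0.117, this difference is not statistically significant at the 5% level. As a quick check, the sketch below recomputes the t statistic, the two-sided p-value, and a 95% confidence interval using only the estimate, standard error, and residual degrees of freedom printed above; the variable names are illustrative.

```r
# Values copied from the summary(m_full) output above
est      <- 0.123493   # ethnicitynot minority estimate
se       <- 0.078627   # its standard error
resid_df <- 448        # residual degrees of freedom

# t statistic and two-sided p-value; these should match the table
t_val <- est / se
p_val <- 2 * pt(-abs(t_val), df = resid_df)

# 95% confidence interval: estimate +/- t* x SE
t_star <- qt(0.975, df = resid_df)
ci <- est + c(-1, 1) * t_star * se

round(c(t = t_val, p = p_val, lower = ci[1], upper = ci[2]), 3)
```

The interval runs from slightly below zero to roughly 0.28, which is consistent with the non-significant p-value: the data are compatible with no ethnicity effect once the other variables are accounted for.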
Drop the variable with the highest p-value and re-fit the model. Did the coefficients and significance of the other explanatory variables change? (One of the things that makes multiple regression interesting is that coefficient estimates depend on the other variables that are included in the model.) If not, what does this say about whether or not the dropped variable was collinear with the other explanatory variables?
# cls_profs has the highest p-value in m_full (0.778), so it is the term dropped
m_drop <- lm(score ~ rank + ethnicity + gender + language + age + cls_perc_eval +
cls_students + cls_level + cls_credits + bty_avg + pic_outfit + pic_color,
data = evals)
summary(m_drop)
##
## Call:
## lm(formula = score ~ rank + ethnicity + gender + language + age +
## cls_perc_eval + cls_students + cls_level + cls_credits +
## bty_avg + pic_outfit + pic_color, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7836 -0.3257 0.0859 0.3513 0.9551
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.087252 0.288856 14.15 < 2e-16 ***
## ranktenure track -0.147675 0.081982 -1.80 0.07233 .
## ranktenured -0.097383 0.066261 -1.47 0.14235
## ethnicitynot minority 0.127446 0.077289 1.65 0.09986 .
## gendermale 0.210123 0.051687 4.07 5.7e-05 ***
## languagenon-english -0.228289 0.111131 -2.05 0.04053 *
## age -0.008999 0.003133 -2.87 0.00426 **
## cls_perc_eval 0.005289 0.001532 3.45 0.00061 ***
## cls_students 0.000469 0.000374 1.25 0.21038
## cls_levelupper 0.060637 0.057501 1.05 0.29220
## cls_creditsone credit 0.506120 0.114916 4.40 1.3e-05 ***
## bty_avg 0.039863 0.017478 2.28 0.02303 *
## pic_outfitnot formal -0.108323 0.072171 -1.50 0.13408
## pic_colorcolor -0.219053 0.071147 -3.08 0.00221 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.497 on 449 degrees of freedom
## Multiple R-squared: 0.187, Adjusted R-squared: 0.163
## F-statistic: 7.94 on 13 and 449 DF, p-value: 2.34e-14
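If `cls_profs` were strongly collinear with the remaining predictors, removing it would noticeably shift their estimates. In the two outputs above, the shared estimates change by less than 4% (the largest relative change is `pic_outfitnot formal`, from -0.1127 to -0.1083), and only the ethnicity p-value moves appreciably (0.117 to 0.0999). That suggests `cls_profs` shared little information with the other explanatory variables. The helper below, `compare_coefs`, is a hypothetical convenience function (not part of base R) that lines up two fits side by side; it is demonstrated on the built-in `mtcars` data so it runs on its own.

```r
# Hypothetical helper: compare the coefficients two lm fits have in common
compare_coefs <- function(fit_a, fit_b) {
  tab_a <- coef(summary(fit_a))   # Estimate, Std. Error, t value, Pr(>|t|)
  tab_b <- coef(summary(fit_b))
  shared <- intersect(rownames(tab_a), rownames(tab_b))
  a <- tab_a[shared, , drop = FALSE]
  b <- tab_b[shared, , drop = FALSE]
  data.frame(
    est_a      = a[, "Estimate"],
    est_b      = b[, "Estimate"],
    rel_change = (b[, "Estimate"] - a[, "Estimate"]) / abs(a[, "Estimate"]),
    p_a        = a[, "Pr(>|t|)"],
    p_b        = b[, "Pr(>|t|)"]
  )
}

# Self-contained demo: drop qsec from a model for fuel economy
fit_full <- lm(mpg ~ wt + hp + qsec + am, data = mtcars)
fit_drop <- lm(mpg ~ wt + hp + am, data = mtcars)
cmp <- compare_coefs(fit_full, fit_drop)
round(cmp, 3)
```

With the objects from this lab, the same comparison would be `compare_coefs(m_full, m_drop)`.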
Using backward selection with the p-value as the selection criterion, determine the best model. You do not need to show all steps in your answer, just the output for the final model. Also, write out the linear model for predicting score based on the final model you settle on.
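Backward selection by p-value repeats one step: fit the model, find the term with the largest p-value, and drop it if that p-value exceeds a cutoff (0.05 is assumed here). The sketch below automates the loop. It is a variant of the lab's procedure: it uses `drop1(..., test = "F")`, which tests each term as a whole, so a multi-level factor such as `rank` is kept or removed as one unit. For single-coefficient terms the F-test p-value equals the t-test p-value shown by `summary()`, but for multi-level factors the two can differ. The function name `backward_p` and the demo on `mtcars` are illustrative.

```r
# Backward elimination by p-value (sketch). drop1() tests each term as a whole.
backward_p <- function(fit, alpha = 0.05) {
  repeat {
    tests <- drop1(fit, test = "F")
    pvals <- tests[["Pr(>F)"]][-1]          # first row is "<none>"
    names(pvals) <- rownames(tests)[-1]
    if (length(pvals) == 0 || max(pvals) <= alpha) break
    worst <- names(which.max(pvals))        # term with the largest p-value
    fit <- update(fit, as.formula(paste(". ~ . -", worst)))
  }
  fit
}

# Self-contained demo on built-in data, with cyl treated as a factor
mt <- transform(mtcars, cyl = factor(cyl))
start <- lm(mpg ~ wt + hp + qsec + drat + disp + cyl, data = mt)
final <- backward_p(start)
formula(final)
```

Applied to this lab's data as `backward_p(m_full)`, the result may not match the final model below exactly, because `rank` is tested as a single term rather than one dummy variable at a time.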
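# Final model; terms removed from m_full: cls_profs, cls_level, cls_students,
# pic_outfit, and rank
m_drop7 <- lm(score ~ ethnicity + gender + language + age + cls_perc_eval +
cls_credits + bty_avg + pic_color, data = evals)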
summary(m_drop7)
##
## Call:
## lm(formula = score ~ ethnicity + gender + language + age + cls_perc_eval +
## cls_credits + bty_avg + pic_color, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8532 -0.3239 0.0998 0.3793 0.9361
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.77192 0.23205 16.25 < 2e-16 ***
## ethnicitynot minority 0.16787 0.07528 2.23 0.0262 *
## gendermale 0.20711 0.05013 4.13 4.3e-05 ***
## languagenon-english -0.20618 0.10364 -1.99 0.0473 *
## age -0.00605 0.00261 -2.31 0.0211 *
## cls_perc_eval 0.00466 0.00144 3.24 0.0013 **
## cls_creditsone credit 0.50531 0.10412 4.85 1.7e-06 ***
## bty_avg 0.05107 0.01693 3.02 0.0027 **
## pic_colorcolor -0.19058 0.06735 -2.83 0.0049 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.499 on 454 degrees of freedom
## Multiple R-squared: 0.172, Adjusted R-squared: 0.158
## F-statistic: 11.8 on 8 and 454 DF, p-value: 2.58e-15
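Written out from the estimates above, the final model is shown below. Each subscripted term is a 0/1 indicator that equals 1 for the named level (for example, gender_male = 1 for male professors), and the reference levels are minority, female, English, multi-credit, and black-and-white picture.

```latex
\begin{aligned}
\widehat{\text{score}} ={}& 3.77192 + 0.16787\,\text{ethnicity}_{\text{not minority}} + 0.20711\,\text{gender}_{\text{male}} - 0.20618\,\text{language}_{\text{non-english}} \\
&- 0.00605\,\text{age} + 0.00466\,\text{cls\_perc\_eval} + 0.50531\,\text{cls\_credits}_{\text{one credit}} \\
&+ 0.05107\,\text{bty\_avg} - 0.19058\,\text{pic\_color}_{\text{color}}
\end{aligned}
```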
Verify that the conditions for this model are reasonable using diagnostic plots.
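# Linearity and constant variance: residuals against fitted values
plot(m_drop7$residuals ~ m_drop7$fitted.values)
abline(h = 0, lty = 3)
# Nearly normal residuals: histogram and normal probability plot
hist(m_drop7$residuals)
qqnorm(m_drop7$residuals)
qqline(m_drop7$residuals)
# Independence: residuals in the order the rows appear in the data
plot(m_drop7$residuals)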
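Residuals should be plotted against fitted values rather than against the observed response. For least squares with an intercept, residuals are uncorrelated with the fitted values, but their correlation with the observed response is exactly sqrt(1 - R^2), so a plot against the response always shows an upward trend even when the model is fine. For `m_drop7`, with R^2 = 0.172, that correlation would be about sqrt(0.828), roughly 0.91. The sketch below checks both facts on the built-in `mtcars` data (an illustrative stand-in for `evals`).

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
r <- resid(fit)

# Essentially zero for OLS with an intercept
cor_fitted <- cor(r, fitted(fit))
# Equals sqrt(1 - R^2), so a trend against the response is built in
cor_observed <- cor(r, mtcars$mpg)
c(cor_fitted = cor_fitted, cor_observed = cor_observed,
  sqrt_1_minus_r2 = sqrt(1 - summary(fit)$r.squared))

# Side-by-side comparison of the two residual plots
par(mfrow = c(1, 2))
plot(fitted(fit), r, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 3)
plot(mtcars$mpg, r, xlab = "Observed mpg", ylab = "Residuals")
abline(h = 0, lty = 3)
par(mfrow = c(1, 1))
```

For the lab model itself, `plot(m_drop7, which = 1:2)` draws the standard residuals-versus-fitted and normal Q-Q diagnostics in one call.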
The original paper describes how these data were gathered by taking a sample of professors from the University of Texas at Austin and including all courses that they have taught. Considering that each row represents a course, could this new information have an impact on any of the conditions of linear regression?
head(evals)
## score rank ethnicity gender language age cls_perc_eval
## 1 4.7 tenure track minority female english 36 55.81
## 2 4.1 tenure track minority female english 36 68.80
## 3 3.9 tenure track minority female english 36 60.80
## 4 4.8 tenure track minority female english 36 62.60
## 5 4.6 tenured not minority male english 59 85.00
## 6 4.3 tenured not minority male english 59 87.50
## cls_did_eval cls_students cls_level cls_profs cls_credits bty_f1lower
## 1 24 43 upper single multi credit 5
## 2 86 125 upper single multi credit 5
## 3 76 125 upper single multi credit 5
## 4 77 123 upper single multi credit 5
## 5 17 20 upper multiple multi credit 4
## 6 35 40 upper multiple multi credit 4
## bty_f1upper bty_f2upper bty_m1lower bty_m1upper bty_m2upper bty_avg
## 1 7 6 2 4 6 5
## 2 7 6 2 4 6 5
## 3 7 6 2 4 6 5
## 4 7 6 2 4 6 5
## 5 4 2 2 3 3 3
## 6 4 2 2 3 3 3
## pic_outfit pic_color
## 1 not formal color
## 2 not formal color
## 3 not formal color
## 4 not formal color
## 5 not formal color
## 6 not formal color
tail(evals)
## score rank ethnicity gender language age cls_perc_eval
## 458 4.1 tenure track not minority male english 32 42.86
## 459 4.5 tenure track not minority male english 32 60.47
## 460 3.5 tenure track minority female non-english 42 57.14
## 461 4.4 tenure track minority female non-english 42 77.61
## 462 4.4 tenure track minority female non-english 42 81.82
## 463 4.1 tenure track minority female non-english 42 80.00
## cls_did_eval cls_students cls_level cls_profs cls_credits bty_f1lower
## 458 9 21 lower multiple multi credit 6
## 459 52 86 upper multiple multi credit 6
## 460 48 84 upper multiple multi credit 3
## 461 52 67 upper multiple multi credit 3
## 462 54 66 upper multiple multi credit 3
## 463 28 35 lower multiple one credit 3
## bty_f1upper bty_f2upper bty_m1lower bty_m1upper bty_m2upper bty_avg
## 458 6 9 7 8 5 6.833
## 459 6 9 7 8 5 6.833
## 460 8 7 4 6 4 5.333
## 461 8 7 4 6 4 5.333
## 462 8 7 4 6 4 5.333
## 463 8 7 4 6 4 5.333
## pic_outfit pic_color
## 458 not formal color
## 459 not formal color
## 460 not formal color
## 461 not formal color
## 462 not formal color
## 463 not formal color
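The printed rows show why sampling professors and then including all of their courses matters: rows 1 to 4 share the same rank, ethnicity, gender, language, age, and beauty ratings, so they are almost certainly one professor teaching four courses, and rows 5 and 6 look like a second professor. Courses taught by the same professor are unlikely to be independent, which strains the independence condition. This version of `evals` has no professor ID column, so the sketch below builds a hypothetical proxy key by pasting professor-level attributes together. Two different professors with identical attributes would be merged, so the counts are approximate. The demo uses the six rows printed by `head(evals)`.

```r
# Professor-level columns from the first six rows printed by head(evals)
first_rows <- data.frame(
  rank      = c(rep("tenure track", 4), rep("tenured", 2)),
  ethnicity = c(rep("minority", 4), rep("not minority", 2)),
  gender    = c(rep("female", 4), rep("male", 2)),
  language  = rep("english", 6),
  age       = c(rep(36, 4), rep(59, 2)),
  bty_avg   = c(rep(5, 4), rep(3, 2))
)

# Hypothetical proxy ID: paste the professor-level attributes together
prof_cols <- c("rank", "ethnicity", "gender", "language", "age", "bty_avg")
make_prof_key <- function(d) do.call(paste, c(d[prof_cols], sep = "|"))

key <- make_prof_key(first_rows)
courses_per_prof <- table(key)
courses_per_prof
```

On the full data, `table(table(make_prof_key(evals)))` would summarize how many proxy professors contribute one, two, three, or more courses.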
Based on your final model, describe the characteristics of a professor and course at the University of Texas at Austin that would be associated with a high evaluation score.
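In the final model, higher predicted scores go with positive coefficients (not minority, male, one-credit course, a higher percentage of students completing the evaluation, a higher average beauty rating) and with the absence of negative ones (non-English-speaking education, older age, a color picture). The sketch below plugs two hypothetical profiles, chosen only for illustration, into the estimates from `summary(m_drop7)`. The coefficient names in `b` are shorthand invented here, not the names R assigns.

```r
# Estimates copied from summary(m_drop7) above (names are illustrative shorthand)
b <- c(intercept = 3.77192, not_minority = 0.16787, male = 0.20711,
       non_english = -0.20618, age = -0.00605, cls_perc_eval = 0.00466,
       one_credit = 0.50531, bty_avg = 0.05107, color_pic = -0.19058)

# Predicted score for a profile given as 0/1 indicators and numeric values
predict_score <- function(p) {
  unname(b["intercept"] + sum(b[names(p)] * p))
}

# Two hypothetical profiles (values chosen for illustration only)
favorable <- c(not_minority = 1, male = 1, non_english = 0, age = 35,
               cls_perc_eval = 90, one_credit = 1, bty_avg = 7, color_pic = 0)
unfavorable <- c(not_minority = 0, male = 0, non_english = 1, age = 60,
                 cls_perc_eval = 50, one_credit = 0, bty_avg = 3, color_pic = 1)

round(c(favorable = predict_score(favorable),
        unfavorable = predict_score(unfavorable)), 2)
```

The favorable profile's prediction (about 5.22) lands above 5, the top of the rating scale, a reminder that a linear model does not know the scores are bounded and that extreme combinations of predictors extrapolate beyond the data.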
Would you be comfortable generalizing your conclusions to apply to professors generally (at any university)? Why or why not?