Using R, build a multiple regression model for data that interests you. Include in this model at least one quadratic term, one dichotomous term, and one dichotomous vs. quantitative interaction term. Interpret all coefficients. Conduct residual analysis. Was the linear model appropriate? Why or why not?
The data were gathered from end-of-semester student evaluations for a large sample of professors at the University of Texas at Austin.
load("evals.RData")
head(evals)
##   score         rank    ethnicity gender language age cls_perc_eval
## 1   4.7 tenure track     minority female  english  36      55.81395
## 2   4.1 tenure track     minority female  english  36      68.80000
## 3   3.9 tenure track     minority female  english  36      60.80000
## 4   4.8 tenure track     minority female  english  36      62.60163
## 5   4.6      tenured not minority   male  english  59      85.00000
## 6   4.3      tenured not minority   male  english  59      87.50000
##   cls_did_eval cls_students cls_level cls_profs  cls_credits bty_f1lower
## 1           24           43     upper    single multi credit           5
## 2           86          125     upper    single multi credit           5
## 3           76          125     upper    single multi credit           5
## 4           77          123     upper    single multi credit           5
## 5           17           20     upper  multiple multi credit           4
## 6           35           40     upper  multiple multi credit           4
##   bty_f1upper bty_f2upper bty_m1lower bty_m1upper bty_m2upper bty_avg
## 1           7           6           2           4           6       5
## 2           7           6           2           4           6       5
## 3           7           6           2           4           6       5
## 4           7           6           2           4           6       5
## 5           4           2           2           3           3       3
## 6           4           2           2           3           3       3
##   pic_outfit pic_color
## 1 not formal     color
## 2 not formal     color
## 3 not formal     color
## 4 not formal     color
## 5 not formal     color
## 6 not formal     color
length(evals$score)
## [1] 463
The dataset has 21 columns (as listed in the head() output above) and 463 rows.
mr_bty_gen_age_clsize <- lm(score ~ bty_avg + gender + age + cls_students, data = evals)
summary(mr_bty_gen_age_clsize)
##
## Call:
## lm(formula = score ~ bty_avg + gender + age + cls_students, data = evals)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1.85333 -0.36138  0.08768  0.41174  0.93005
##
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)   4.0580973  0.1676976  24.199  < 2e-16 ***
## bty_avg       0.0647559  0.0169816   3.813 0.000156 ***
## gendermale    0.2035758  0.0523620   3.888 0.000116 ***
## age          -0.0058029  0.0027197  -2.134 0.033406 *
## cls_students -0.0001202  0.0003317  -0.362 0.717311
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5272 on 458 degrees of freedom
## Multiple R-squared: 0.06859, Adjusted R-squared: 0.06046
## F-statistic: 8.432 on 4 and 458 DF, p-value: 1.432e-06
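To make the coefficients easier to interpret, the estimates in the summary above can be written as a fitted equation. The values are rounded from the table; the reading of gendermale as an indicator equal to 1 for male professors and 0 for female professors is an assumption based on R's default factor coding and the coefficient name.

```latex
\widehat{\text{score}} = 4.058
  + 0.0648\,\text{bty\_avg}
  + 0.2036\,\text{gendermale}
  - 0.0058\,\text{age}
  - 0.00012\,\text{cls\_students}
```

Read this way, and holding the other predictors fixed, a male professor is predicted to score about 0.20 points higher than a female professor, each additional point of bty_avg adds about 0.065 points, and ten additional years of age lower the predicted score by about 0.058 points.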
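Conclusion: Class size is the only predictor in our model that is not statistically significant (p-value of 0.72). As expected, a professor’s age has a negative coefficient (-0.0058, p-value of 0.033), which suggests that, holding the other predictors fixed, students score younger professors slightly higher. The fit is weak overall: the multiple R-squared of 0.069 means the model explains only about 7% of the variation in scores.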
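The model above has only main effects, while the assignment also asks for a quadratic term, a dichotomous vs. quantitative interaction, and residual analysis. The following is a minimal sketch of one way to extend it, not the analysis actually run here. The helper names fit_extended and plot_residuals are hypothetical, and the choice of age for the quadratic term and gender by bty_avg for the interaction is an assumption made for illustration.

```r
# Hypothetical helper (not part of the original analysis): refits the model
# with a quadratic age term and a gender-by-beauty interaction.
# bty_avg * gender expands to bty_avg + gender + bty_avg:gender.
fit_extended <- function(d) {
  lm(score ~ bty_avg * gender + age + I(age^2) + cls_students, data = d)
}

# Hypothetical helper: two basic residual diagnostics, residuals against
# fitted values and a normal Q-Q plot of the residuals.
plot_residuals <- function(fit) {
  op <- par(mfrow = c(1, 2))
  on.exit(par(op))
  plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
  abline(h = 0, lty = 2)
  qqnorm(resid(fit))
  qqline(resid(fit))
}

# Run only if the evals data frame from load("evals.RData") is in the workspace.
if (exists("evals")) {
  mr_ext <- fit_extended(evals)
  print(summary(mr_ext))
  plot_residuals(mr_ext)
}
```

If the residuals-vs-fitted plot shows no curvature or funnel shape and the Q-Q points lie close to the line, the linear model assumptions look reasonable; the left-skewed residuals in the summary above (minimum -1.85, maximum 0.93) suggest the Q-Q plot is worth checking.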