score. Is the distribution skewed? What does that tell you about how students rate courses? Is this what you expected to see? Why, or why not?
library(psych)
library(kableExtra)
library(knitr)
load("more/evals.RData")
par(mfrow=c(1, 2))
hist(evals$score)
describe(evals$score)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 463 4.17 0.54 4.3 4.22 0.59 2.3 5 2.7 -0.7 0.04 0.03
qqnorm(evals$score)
qqline(evals$score)
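describe() reports a skew of -0.7, and the mean (4.17) sits below the median (4.3). Both point to a left (negative) skew, with a longer tail toward low scores. The sketch below uses the plain moment-based definition of skewness to show what the sign of that number means. It is an illustration on hypothetical toy vectors. psych::describe() may apply a small-sample adjustment by default, so its value need not match this estimate exactly.

```r
# Plain moment-based sample skewness: third central moment / (second)^(3/2)
sample_skew <- function(x) {
  m  <- mean(x)
  m2 <- mean((x - m)^2)  # second central moment (variance, dividing by n)
  m3 <- mean((x - m)^3)  # third central moment
  m3 / m2^(3 / 2)
}

symmetric <- c(1, 2, 3, 4, 5)   # hypothetical symmetric data
left_tail <- c(1, 5, 5, 5, 5)   # hypothetical data with one low outlier

sample_skew(symmetric)  # 0: no skew
sample_skew(left_tail)  # negative: long tail toward low values
```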
score, select two other variables and describe their relationship using an appropriate visualization (scatterplot, side-by-side boxplots, or mosaic plot).
scatter.smooth(evals$bty_avg ~ evals$age)
jitter() on the \(y\)- or the \(x\)-coordinate. (Use ?jitter to learn more.) What was misleading about the initial scatterplot?
plot(evals$score ~ jitter(evals$bty_avg))
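Many professors share identical rating values, so in an unjittered scatterplot several observations land on the same point and look like one. jitter() adds a small amount of uniform random noise so stacked points separate. A minimal illustration on hypothetical tied ratings, using the amount argument to bound the noise:

```r
set.seed(42)
# Hypothetical ratings with heavy ties: only 3 distinct values among 15 points
bty  <- rep(c(3, 3.5, 4), each = 5)
jbty <- jitter(bty, amount = 0.1)  # each value moves by at most 0.1

length(unique(bty))   # 3 distinct plotting positions before jittering
length(unique(jbty))  # 15 distinct positions after jittering
```

The noise is used only for display. Models such as m_bty should still be fit on the unjittered values.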
m_bty to predict average professor score by average beauty rating and add the line to your plot using abline(m_bty). Write out the equation for the linear model and interpret the slope. Is average beauty score a statistically significant predictor? Does it appear to be a practically significant predictor?
m_bty <- lm(evals$score ~ evals$bty_avg)
plot(evals$score ~ jitter(evals$bty_avg))
abline(m_bty)
summary(m_bty)
##
## Call:
## lm(formula = evals$score ~ evals$bty_avg)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9246 -0.3690 0.1420 0.3977 0.9309
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.88034 0.07614 50.96 < 2e-16 ***
## evals$bty_avg 0.06664 0.01629 4.09 5.08e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5348 on 461 degrees of freedom
## Multiple R-squared: 0.03502, Adjusted R-squared: 0.03293
## F-statistic: 16.73 on 1 and 461 DF, p-value: 5.083e-05
\[\widehat{score} = 3.880 + 0.067 \times bty\_avg \]
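The slope is statistically significant (p = 5.08e-05), but the multiple R-squared of 0.035 means bty_avg explains only about 3.5% of the variation in score. One way to judge practical significance is to compare predictions for two hypothetical professors whose beauty ratings differ by 4 points. The coefficients below are copied from summary(m_bty) above.

```r
# Coefficients copied from summary(m_bty)
b0 <- 3.88034  # intercept
b1 <- 0.06664  # slope for bty_avg

predict_score <- function(bty_avg) b0 + b1 * bty_avg

preds <- predict_score(c(3, 7))  # two hypothetical beauty ratings
diff(preds)                      # 4 * 0.06664, about a quarter of a point
```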
par(mfrow=c(2, 2))
plot(m_bty$residuals ~ evals$bty_avg)
hist(m_bty$residuals)
qqnorm(m_bty$residuals)
qqline(m_bty$residuals)
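The residual plots above check the usual least squares conditions:

- the residuals-versus-bty_avg plot checks linearity and constant variability;
- the histogram and normal Q-Q plot check for nearly normal residuals.

m_bty$residuals holds observed minus fitted values. The sketch below verifies that definition on simulated data. The data are hypothetical and only loosely modeled on the fitted coefficients.

```r
set.seed(10)
# Hypothetical data loosely shaped like score ~ bty_avg
bty   <- runif(100, 2, 8)
score <- 3.9 + 0.07 * bty + rnorm(100, sd = 0.5)

fit <- lm(score ~ bty)
res <- residuals(fit)

# Residual = observed - fitted
all.equal(unname(res), unname(score - fitted(fit)))
# With an intercept, least squares residuals average to (numerically) zero
mean(res)
```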
m_bty_gen <- lm(score ~ bty_avg + gender, data = evals)
par(mfrow=c(2, 2))
plot(m_bty_gen)
par(mfrow=c(2, 1))
boxplot(m_bty_gen$residuals~evals$bty_avg)
boxplot(m_bty_gen$residuals~evals$gender)
bty_avg still a significant predictor of score? Has the addition of gender to the model changed the parameter estimate for bty_avg?
multiLines(m_bty_gen)
summary(m_bty_gen)
##
## Call:
## lm(formula = score ~ bty_avg + gender, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8305 -0.3625 0.1055 0.4213 0.9314
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.74734 0.08466 44.266 < 2e-16 ***
## bty_avg 0.07416 0.01625 4.563 6.48e-06 ***
## gendermale 0.17239 0.05022 3.433 0.000652 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5287 on 460 degrees of freedom
## Multiple R-squared: 0.05912, Adjusted R-squared: 0.05503
## F-statistic: 14.45 on 2 and 460 DF, p-value: 8.177e-07
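In m_bty_gen, bty_avg remains significant (p = 6.48e-06), and its estimate moves from 0.06664 to 0.07416. Because gendermale is a 0/1 indicator, the model describes two parallel lines: one for female professors and one shifted up by the gendermale coefficient for male professors. A minimal sketch using the coefficients copied from the output above (the beauty values are hypothetical):

```r
# Coefficients copied from summary(m_bty_gen)
b0     <- 3.74734  # intercept (female professors, the reference level)
b_bty  <- 0.07416  # slope for bty_avg
b_male <- 0.17239  # shift for the gendermale indicator

predict_score <- function(bty_avg, male) b0 + b_bty * bty_avg + b_male * male

bty_grid    <- c(3, 5, 7)  # hypothetical beauty ratings
female_line <- predict_score(bty_grid, male = 0)
male_line   <- predict_score(bty_grid, male = 1)

male_line - female_line  # same gap (0.17239) at every rating: parallel lines
```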
The decision to call the indicator variable gendermale instead of genderfemale has no deeper meaning. R simply codes the category that comes first alphabetically as a \(0\). (You can change the reference level of a categorical variable, which is the level that is coded as a 0, using the relevel function. Use ?relevel to learn more.)
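The indicator columns R builds for a factor can be inspected directly with model.matrix(). This sketch uses a hypothetical toy factor, not the evals data. It shows that the default reference level is female, so the column is named gendermale. After relevel() makes male the reference, the column becomes genderfemale.

```r
# Hypothetical toy data with the same level names as evals$gender
d <- data.frame(gender = factor(c("female", "male", "male", "female")))
levels(d$gender)                          # "female" "male"

mm_default <- model.matrix(~ gender, data = d)
colnames(mm_default)                      # "(Intercept)" "gendermale"

d$gender <- relevel(d$gender, ref = "male")
mm_releveled <- model.matrix(~ gender, data = d)
colnames(mm_releveled)                    # "(Intercept)" "genderfemale"
```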
m_bty_rank with gender removed and rank added in. How does R appear to handle categorical variables that have more than two levels? Note that the rank variable has three levels: teaching, tenure track, tenured.
m_bty_rank <- lm(score ~ bty_avg + rank, data = evals)
summary(m_bty_rank)
##
## Call:
## lm(formula = score ~ bty_avg + rank, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8713 -0.3642 0.1489 0.4103 0.9525
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.98155 0.09078 43.860 < 2e-16 ***
## bty_avg 0.06783 0.01655 4.098 4.92e-05 ***
## ranktenure track -0.16070 0.07395 -2.173 0.0303 *
## ranktenured -0.12623 0.06266 -2.014 0.0445 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5328 on 459 degrees of freedom
## Multiple R-squared: 0.04652, Adjusted R-squared: 0.04029
## F-statistic: 7.465 on 3 and 459 DF, p-value: 6.88e-05
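The output has two rank coefficients, not three. R represents a factor with \(k\) levels using \(k-1\) indicator columns. The first level (teaching) is the reference and is absorbed into the intercept. A sketch on a hypothetical toy factor, with the levels set explicitly so the ordering does not depend on the locale:

```r
# Hypothetical toy factor with the same three levels as evals$rank
rank <- factor(c("teaching", "tenure track", "tenured", "teaching"),
               levels = c("teaching", "tenure track", "tenured"))

mm <- model.matrix(~ rank)
colnames(mm)  # "(Intercept)" "ranktenure track" "ranktenured"
mm            # teaching rows have 0 in both indicator columns
```

Under this coding, each rank coefficient in summary(m_bty_rank) is the expected difference from teaching faculty at the same bty_avg.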
We will start with a full model that predicts professor score based on rank, ethnicity, gender, language of the university where they got their degree, age, proportion of students that filled out evaluations, class size, course level, number of professors, number of credits, average beauty rating, outfit, and picture color.
Let’s run the model…
m_full <- lm(score ~ rank + ethnicity + gender + language + age + cls_perc_eval
+ cls_students + cls_level + cls_profs + cls_credits + bty_avg
+ pic_outfit + pic_color, data = evals)
summary(m_full)
m_full <- lm(score ~ rank + ethnicity + gender + language + age + cls_perc_eval
+ cls_students + cls_level + cls_credits + bty_avg
+ pic_outfit + pic_color, data = evals)
summary(m_full)
##
## Call:
## lm(formula = score ~ rank + ethnicity + gender + language + age +
## cls_perc_eval + cls_students + cls_level + cls_credits +
## bty_avg + pic_outfit + pic_color, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7836 -0.3257 0.0859 0.3513 0.9551
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.0872523 0.2888562 14.150 < 2e-16 ***
## ranktenure track -0.1476746 0.0819824 -1.801 0.072327 .
## ranktenured -0.0973829 0.0662614 -1.470 0.142349
## ethnicitynot minority 0.1274458 0.0772887 1.649 0.099856 .
## gendermale 0.2101231 0.0516873 4.065 5.66e-05 ***
## languagenon-english -0.2282894 0.1111305 -2.054 0.040530 *
## age -0.0089992 0.0031326 -2.873 0.004262 **
## cls_perc_eval 0.0052888 0.0015317 3.453 0.000607 ***
## cls_students 0.0004687 0.0003737 1.254 0.210384
## cls_levelupper 0.0606374 0.0575010 1.055 0.292200
## cls_creditsone credit 0.5061196 0.1149163 4.404 1.33e-05 ***
## bty_avg 0.0398629 0.0174780 2.281 0.023032 *
## pic_outfitnot formal -0.1083227 0.0721711 -1.501 0.134080
## pic_colorcolor -0.2190527 0.0711469 -3.079 0.002205 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4974 on 449 degrees of freedom
## Multiple R-squared: 0.187, Adjusted R-squared: 0.1634
## F-statistic: 7.943 on 13 and 449 DF, p-value: 2.336e-14
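The next step shrinks this model to a smaller set of predictors. A common way to automate such a reduction is backward elimination by p-value: repeatedly drop the least significant predictor and refit until every remaining p-value is at or below a threshold. The sketch below is a hypothetical helper (backward_eliminate is not part of the lab) run on simulated data. It is not necessarily how the reduced model was chosen; for example, ethnicity (p ≈ 0.10) was kept there.

The helper assumes numeric predictors only, so that coefficient names match term names. For factors such as rank, term-level tests from drop1(fit, test = "F") would be needed instead.

```r
# Hypothetical helper: backward elimination on coefficient p-values
backward_eliminate <- function(data, response, predictors, alpha = 0.05) {
  repeat {
    if (length(predictors) == 0) {
      return(lm(reformulate("1", response = response), data = data))
    }
    fit <- lm(reformulate(predictors, response = response), data = data)
    p <- summary(fit)$coefficients[, "Pr(>|t|)"]
    p <- p[names(p) != "(Intercept)"]
    if (max(p) <= alpha) return(fit)  # everything left is significant
    # Drop the single least significant predictor and refit
    predictors <- setdiff(predictors, names(which.max(p)))
  }
}

# Simulated data: only x1 truly affects y
set.seed(2024)
n   <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 + rnorm(n)

fit <- backward_eliminate(sim, "y", c("x1", "x2", "x3"))
names(coef(fit))
```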
m_full <- lm(score ~ ethnicity + gender + language + age + cls_perc_eval
+ cls_credits + bty_avg
+ pic_color, data = evals)
sum_m_full <- summary(m_full)
sum_m_full$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.771921500 0.232053490 16.254535 3.726633e-47
## ethnicitynot minority 0.167872321 0.075275467 2.230107 2.622848e-02
## gendermale 0.207112134 0.050134781 4.131107 4.298711e-05
## languagenon-english -0.206178078 0.103639310 -1.989381 4.725875e-02
## age -0.006045919 0.002612071 -2.314607 2.107974e-02
## cls_perc_eval 0.004655873 0.001435172 3.244123 1.265094e-03
## cls_creditsone credit 0.505306239 0.104119390 4.853143 1.673798e-06
## bty_avg 0.051069320 0.016933952 3.015795 2.706868e-03
## pic_colorcolor -0.190578800 0.067351272 -2.829624 4.866842e-03
# Print the fitted equation using the coefficient names R assigned
est <- round(coef(m_full), 4)
eq_terms <- paste(est[-1], names(est)[-1], sep = " * ")
cat("score =", est[1], "+", paste(eq_terms, collapse = " + "), "\n")
Written out, the fitted equation is
\[\widehat{score} = 3.7719 + 0.1679 \times \text{not minority} + 0.2071 \times \text{male} - 0.2062 \times \text{non-english} - 0.0060 \times age + 0.0047 \times cls\_perc\_eval + 0.5053 \times \text{one credit} + 0.0511 \times bty\_avg - 0.1906 \times \text{color} \]
where not minority, male, non-english, one credit, and color are the 0/1 indicators R reports as ethnicitynot minority, gendermale, languagenon-english, cls_creditsone credit, and pic_colorcolor.
hist(sum_m_full$residuals)
par(mfrow=c(2, 2))
plot(m_full)
par(mfrow=c(2, 2))
boxplot(sum_m_full$residuals~evals$ethnicity)
boxplot(sum_m_full$residuals~evals$gender)
boxplot(sum_m_full$residuals~evals$cls_credits)
boxplot(sum_m_full$residuals~evals$language)
par(mfrow=c(2, 2))
boxplot(sum_m_full$residuals~evals$bty_avg)
boxplot(sum_m_full$residuals~evals$age)
boxplot(sum_m_full$residuals~evals$pic_color)
par(mfrow=c(1, 1))
boxplot(sum_m_full$residuals~evals$cls_perc_eval)
m_full
##
## Call:
## lm(formula = score ~ ethnicity + gender + language + age + cls_perc_eval +
## cls_credits + bty_avg + pic_color, data = evals)
##
## Coefficients:
## (Intercept) ethnicitynot minority gendermale
## 3.771922 0.167872 0.207112
## languagenon-english age cls_perc_eval
## -0.206178 -0.006046 0.004656
## cls_creditsone credit bty_avg pic_colorcolor
## 0.505306 0.051069 -0.190579
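Holding the other variables fixed, each one-point increase in bty_avg adds about 0.051 to the predicted score. Because beauty is rated on a 1 to 10 scale, this can add up: the gap between the lowest and highest possible ratings is roughly 0.051 × 9 ≈ 0.46 points.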
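The sketch below plugs a single professor into the reduced model to show how the indicator variables and the beauty effect combine. The coefficients are copied from m_full above. The professor and the short labels in b (not_minority, male, and so on) are hypothetical and do not match R's coefficient names.

```r
# Coefficients copied from the reduced m_full; labels are hypothetical shorthand
b <- c(intercept = 3.771921500, not_minority = 0.167872321, male = 0.207112134,
       non_english = -0.206178078, age = -0.006045919, cls_perc_eval = 0.004655873,
       one_credit = 0.505306239, bty_avg = 0.051069320, color_pic = -0.190578800)

# Hypothetical professor: not minority, female, English-language degree, age 50,
# 80% of students evaluated, multi-credit course, bty_avg 5, black-and-white photo
prof <- c(intercept = 1, not_minority = 1, male = 0, non_english = 0, age = 50,
          cls_perc_eval = 80, one_credit = 0, bty_avg = 5, color_pic = 0)

predict_reduced <- function(x) sum(b * x[names(b)])

predict_reduced(prof)

# Same professor at the lowest and highest possible beauty ratings
low  <- replace(prof, "bty_avg", 1)
high <- replace(prof, "bty_avg", 10)
predict_reduced(high) - predict_reduced(low)  # 9 * 0.051069320
```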
This is a product of OpenIntro that is released under a Creative Commons Attribution-ShareAlike 3.0 Unported. This lab was written by Mine Çetinkaya-Rundel and Andrew Bray.