Load data:
## score rank ethnicity gender language age cls_perc_eval
## 1 4.7 tenure track minority female english 36 55.81395
## 2 4.1 tenure track minority female english 36 68.80000
## 3 3.9 tenure track minority female english 36 60.80000
## 4 4.8 tenure track minority female english 36 62.60163
## 5 4.6 tenured not minority male english 59 85.00000
## 6 4.3 tenured not minority male english 59 87.50000
## cls_did_eval cls_students cls_level cls_profs cls_credits bty_f1lower
## 1 24 43 upper single multi credit 5
## 2 86 125 upper single multi credit 5
## 3 76 125 upper single multi credit 5
## 4 77 123 upper single multi credit 5
## 5 17 20 upper multiple multi credit 4
## 6 35 40 upper multiple multi credit 4
## bty_f1upper bty_f2upper bty_m1lower bty_m1upper bty_m2upper bty_avg
## 1 7 6 2 4 6 5
## 2 7 6 2 4 6 5
## 3 7 6 2 4 6 5
## 4 7 6 2 4 6 5
## 5 4 2 2 3 3 3
## 6 4 2 2 3 3 3
## pic_outfit pic_color
## 1 not formal color
## 2 not formal color
## 3 not formal color
## 4 not formal color
## 5 not formal color
## 6 not formal color
This is an observational study because the researchers were simply observering and collecting data rather than manipulating variables for the experiment. It is not possible to answer the question because an observational study cannot display causation. Instead, they could ask if beauty was associated with course evaluations.
The distribution is skewed left, and unimodal. The center of the data seems to be between a score of 4.0 and 4.5, and the spread is between 2.5 (min) and 5.0 (max). Because the data is skewed left, this means that majority of the students rated professors between 4.0, and 5.0 scores on the end of year evaluations. This is expected because many of the teachers are tenured or tenure track, and it is less likely for students to give low ratings unless their teachers were extremely bad.
hist(evals$score)
It seems as if over 60% of professors, regardless on whether or not they have received tenure, are not minorities, while less than 20% of professors are minorities.
plot(evals$rank, evals$ethnicity, xlab= "rank", ylab= "ethnicity", main= "rankvsethnicity")
There are 463 observations in the data set though significantly less points on the scatter plot, which means it is not accounting for all of the data.
plot(evals$score ~ evals$bty_avg)
plot(jitter(evals$score),evals$bty_avg)
plot(evals$score, jitter(evals$bty_avg))
The equation for the linear model m_bty is: m_bty= 3.8803379 + 0.06663704*evals$bty_avg The intercept is 3.8803379, and the slope means that for every 1 unit increase of x, y increases by 0.06663704 units. The p-val is = 5.083e-05, and because it is less than 0.05, we can determine that average beauty score is a statistically significant predictor. It appears to be a practically significant predictor because it makes sense that a professors beauty would impact a students impression on them.
m_bty=lm(evals$score~evals$bty_avg, data=evals)
plot(m_bty)
plot(evals$score, evals$bty_avg, col="red")
abline(m_bty)
m_bty$coefficients
## (Intercept) evals$bty_avg
## 3.88033795 0.06663704
summary(m_bty)
##
## Call:
## lm(formula = evals$score ~ evals$bty_avg, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9246 -0.3690 0.1420 0.3977 0.9309
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.88034 0.07614 50.96 < 2e-16 ***
## evals$bty_avg 0.06664 0.01629 4.09 5.08e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5348 on 461 degrees of freedom
## Multiple R-squared: 0.03502, Adjusted R-squared: 0.03293
## F-statistic: 16.73 on 1 and 461 DF, p-value: 5.083e-05
Based on the conditions for least squares regression, linearity, independent observations, nearly normal residuals, and constant variability, I do not think that the conditions of least square regression are reasonable. The distribution of the residuals is not normal, as they lack symmetry, and are skewed left.
res <- resid(m_bty)
plot(fitted(m_bty), res)
abline(0,0)
hist(res)
plot(evals$score ~ evals$bty_avg, col="orange")
abline(m_bty, col="magenta")
qqnorm(m_bty$residuals, col="red")
Based on the residuals vs. fitted plot, the data seems to be skewed, and thus the residuals are not normally distrubuted.
plot(evals$bty_avg ~ evals$bty_f1lower)
cor(evals$bty_avg, evals$bty_f1lower)
## [1] 0.8439112
plot(evals[,13:19])
m_bty_gen <- lm(score ~ bty_avg + gender, data = evals)
summary(m_bty_gen)
##
## Call:
## lm(formula = score ~ bty_avg + gender, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8305 -0.3625 0.1055 0.4213 0.9314
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.74734 0.08466 44.266 < 2e-16 ***
## bty_avg 0.07416 0.01625 4.563 6.48e-06 ***
## gendermale 0.17239 0.05022 3.433 0.000652 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5287 on 460 degrees of freedom
## Multiple R-squared: 0.05912, Adjusted R-squared: 0.05503
## F-statistic: 14.45 on 2 and 460 DF, p-value: 8.177e-07
plot(m_bty_gen)
Yes, because the p-value is still smaller than 0.05 (8.177e-07), we can reject the null hypothesis, and displays that beauty is still a significant predictor of score, also, because the coefficients of beauty avergae for both m_bty_gen and m_bty are not significantly different, that means that gender did not change the parameter estimate.
summary(m_bty_gen)
##
## Call:
## lm(formula = score ~ bty_avg + gender, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8305 -0.3625 0.1055 0.4213 0.9314
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.74734 0.08466 44.266 < 2e-16 ***
## bty_avg 0.07416 0.01625 4.563 6.48e-06 ***
## gendermale 0.17239 0.05022 3.433 0.000652 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5287 on 460 degrees of freedom
## Multiple R-squared: 0.05912, Adjusted R-squared: 0.05503
## F-statistic: 14.45 on 2 and 460 DF, p-value: 8.177e-07
#m_bty_gen$coefficients
#m_bty$coefficients
The equation for the line corresponding to males is: y=3.74734 + 0.07416bty_avg + 0.17239gendermale
For two professors who received the same beauty rating, males tend to have the higher course evaluation score, because of the addition of the parameter estimate, where as females do not have that additional value added. Because of this, if the beauty rating is the same, it is estimated that male professors will always have a higher score.
multiLines(m_bty_gen)
#### 10: R handles the multi-level variables by displaying the three
different subsets of rank: teaching, tenure, and tenure track, as
separate lines in their relation to score and beauty average. Doing this
displays what discrepancies between professors scores with the same
beauty average but different ranks are. By evaluating the summary of
beauty and rank, R only displays tenure track and tenured, which are
dummy variables.
m_bty_rank <- lm(score ~bty_avg + rank, data = evals)
summary(m_bty_rank)
##
## Call:
## lm(formula = score ~ bty_avg + rank, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8713 -0.3642 0.1489 0.4103 0.9525
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.98155 0.09078 43.860 < 2e-16 ***
## bty_avg 0.06783 0.01655 4.098 4.92e-05 ***
## ranktenure track -0.16070 0.07395 -2.173 0.0303 *
## ranktenured -0.12623 0.06266 -2.014 0.0445 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5328 on 459 degrees of freedom
## Multiple R-squared: 0.04652, Adjusted R-squared: 0.04029
## F-statistic: 7.465 on 3 and 459 DF, p-value: 6.88e-05
plot(m_bty_rank)
multiLines(m_bty_rank)
#### 11: I would expect the cls_profs to have the highest p-value as I
would believe the number of profssors teaching sections in a course
would have little to no association with their score.
I was correct, the highest p value was from class professor single, which had a p value of 0.77806.
m_full <- lm(score ~ rank + ethnicity + gender + language + age + cls_perc_eval + cls_students + cls_level + cls_profs + cls_credits + bty_avg + pic_outfit + pic_color, data = evals)
summary(m_full)
##
## Call:
## lm(formula = score ~ rank + ethnicity + gender + language + age +
## cls_perc_eval + cls_students + cls_level + cls_profs + cls_credits +
## bty_avg + pic_outfit + pic_color, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.77397 -0.32432 0.09067 0.35183 0.95036
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.0952141 0.2905277 14.096 < 2e-16 ***
## ranktenure track -0.1475932 0.0820671 -1.798 0.07278 .
## ranktenured -0.0973378 0.0663296 -1.467 0.14295
## ethnicitynot minority 0.1234929 0.0786273 1.571 0.11698
## gendermale 0.2109481 0.0518230 4.071 5.54e-05 ***
## languagenon-english -0.2298112 0.1113754 -2.063 0.03965 *
## age -0.0090072 0.0031359 -2.872 0.00427 **
## cls_perc_eval 0.0053272 0.0015393 3.461 0.00059 ***
## cls_students 0.0004546 0.0003774 1.205 0.22896
## cls_levelupper 0.0605140 0.0575617 1.051 0.29369
## cls_profssingle -0.0146619 0.0519885 -0.282 0.77806
## cls_creditsone credit 0.5020432 0.1159388 4.330 1.84e-05 ***
## bty_avg 0.0400333 0.0175064 2.287 0.02267 *
## pic_outfitnot formal -0.1126817 0.0738800 -1.525 0.12792
## pic_colorcolor -0.2172630 0.0715021 -3.039 0.00252 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.498 on 448 degrees of freedom
## Multiple R-squared: 0.1871, Adjusted R-squared: 0.1617
## F-statistic: 7.366 on 14 and 448 DF, p-value: 6.552e-14
The ethnicity slope means that if all of the other variables stayed constant, for every 1 unit increase in ethnicity of non minority slope, the score would increase by 0.1234929213.
m_full$coefficients
## (Intercept) ranktenure track ranktenured
## 4.0952140795 -0.1475932457 -0.0973377624
## ethnicitynot minority gendermale languagenon-english
## 0.1234929213 0.2109481296 -0.2298111901
## age cls_perc_eval cls_students
## -0.0090071896 0.0053272412 0.0004546339
## cls_levelupper cls_profssingle cls_creditsone credit
## 0.0605139602 -0.0146619208 0.5020431770
## bty_avg pic_outfitnot formal pic_colorcolor
## 0.0400333017 -0.1126816871 -0.2172629964
Yes, almost all of the p values are less than prior. Also, the coefficients barely changed. This means that the dropped value was not co-linear with the other variables, and the remaining variables in the model have a stronger relationship with the dependent variable.
m_full1 <- lm(score ~ rank + ethnicity + gender + language + age + cls_perc_eval + cls_students + cls_level + cls_credits + bty_avg + pic_outfit + pic_color, data = evals)
summary(m_full1)
##
## Call:
## lm(formula = score ~ rank + ethnicity + gender + language + age +
## cls_perc_eval + cls_students + cls_level + cls_credits +
## bty_avg + pic_outfit + pic_color, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7836 -0.3257 0.0859 0.3513 0.9551
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.0872523 0.2888562 14.150 < 2e-16 ***
## ranktenure track -0.1476746 0.0819824 -1.801 0.072327 .
## ranktenured -0.0973829 0.0662614 -1.470 0.142349
## ethnicitynot minority 0.1274458 0.0772887 1.649 0.099856 .
## gendermale 0.2101231 0.0516873 4.065 5.66e-05 ***
## languagenon-english -0.2282894 0.1111305 -2.054 0.040530 *
## age -0.0089992 0.0031326 -2.873 0.004262 **
## cls_perc_eval 0.0052888 0.0015317 3.453 0.000607 ***
## cls_students 0.0004687 0.0003737 1.254 0.210384
## cls_levelupper 0.0606374 0.0575010 1.055 0.292200
## cls_creditsone credit 0.5061196 0.1149163 4.404 1.33e-05 ***
## bty_avg 0.0398629 0.0174780 2.281 0.023032 *
## pic_outfitnot formal -0.1083227 0.0721711 -1.501 0.134080
## pic_colorcolor -0.2190527 0.0711469 -3.079 0.002205 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4974 on 449 degrees of freedom
## Multiple R-squared: 0.187, Adjusted R-squared: 0.1634
## F-statistic: 7.943 on 13 and 449 DF, p-value: 2.336e-14
Yes, the conditions for this model are not resonable because the data is skewed left in the histogram.
plot(m_full1)
hist(m_full1$residuals)
#### 16: This would mean that each row is not independent because a
teacher could have taught multiple classes, which would have resulted in
a skew of data.
plot(m_full1)