library(tidyverse)
Registered S3 methods overwritten by 'dbplyr':
method from
print.tbl_lazy
print.tbl_sql
── Attaching packages ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
✓ ggplot2 3.3.3 ✓ purrr 0.3.4
✓ tibble 3.1.0 ✓ dplyr 1.0.4
✓ tidyr 1.1.2 ✓ stringr 1.4.0
✓ readr 1.4.0 ✓ forcats 0.5.0
── Conflicts ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
download.file("http://www.openintro.org/stat/data/evals.RData", destfile = "evals.RData")
trying URL 'http://www.openintro.org/stat/data/evals.RData'
Content type 'unknown' length 9032 bytes
==================================================
downloaded 9032 bytes
load("evals.RData")
Is this an observational study or an experiment? The original research question posed in the paper is whether beauty leads directly to the differences in course evaluations. Given the study design, is it possible to answer this question as it is phrased? If not, rephrase the question.
It is an observational study. It is not possible to answer the question as phrased because we cannot determine a causal relationship between two variables from an observational study. We could rephrase it as “whether beauty is correlated with a difference in course evaluation”.
Describe the distribution of score. Is the distribution skewed? What does that tell you about how students rate courses? Is this what you expected to see? Why, or why not?
ggplot() +
geom_boxplot(mapping = aes(x = evals$score))
The data is left skewed. If we were to assume for argument that the distribution of course quality is normal, this perhaps suggests that students were overly harsh towards some courses. I am not sure what I was expecting, but I think the graph makes sense. Even assuming no bias among the students, we would expect the the distribution to skew towards positive because bad teachers are generally going to leave the profession, so over time we end up with more 4-5 star teachers than 0 star teachers. Plus, students/teacher relationships are personal to varying degrees and so I think students are generally unlikely to give negative (perhaps, << 3) scores unless they had an especially bad experience in a class.
Excluding score, select two other variables and describe their relationship using an appropriate visualization (scatterplot, side-by-side boxplots, or mosaic plot).
evals %>%
ggplot() +
geom_point(mapping=aes(x = age, y = bty_avg))
Based on the scatter plot, there seems to be a very slight negative correlation between age and bty_avg, though actually not as much as I was expecting, to be honest.
plot(evals$score ~ evals$bty_avg)
Replot the scatterplot, but this time use the function jitter() on the y- or the x-coordinate. (Use ?jitter to learn more.) What was misleading about the initial scatterplot?
Many points were overlapping in the initial plot
Let’s see if the apparent trend in the plot is something more than natural variation. Fit a linear model called m_bty to predict average professor score by average beauty rating and add the line to your plot using abline(m_bty). Write out the equation for the linear model and interpret the slope. Is average beauty score a statistically significant predictor? Does it appear to be a practically significant predictor?
m_bty <- lm(evals$score ~ evals$bty_avg)
summary(m_bty)
Call:
lm(formula = evals$score ~ evals$bty_avg)
Residuals:
Min 1Q Median 3Q Max
-1.9246 -0.3690 0.1420 0.3977 0.9309
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.88034 0.07614 50.96 < 2e-16 ***
evals$bty_avg 0.06664 0.01629 4.09 5.08e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.5348 on 461 degrees of freedom
Multiple R-squared: 0.03502, Adjusted R-squared: 0.03293
F-statistic: 16.73 on 1 and 461 DF, p-value: 5.083e-05
plot(jitter(evals$score) ~ evals$bty_avg)
abline(m_bty)
I would say the importance is both statistically and practically important. We can see that there is a clear relationship. It seems that a teacher would lose about 0.05 points from their evaluations for each point of attractiveness on average. We do not really have enough information to say if that is important, but it could be, especially when it comes to career opportunities and so forth.
Use residual plots to evaluate whether the conditions of least squares regression are reasonable. Provide plots and comments for each one (see the Simple Regression Lab for a reminder of how to make these).
We need to check for linearity, nearly normal residuals, and constant variability.
We need nearly normal residuals. This plot is strongly skewed to the left; I would say generally that it is not normal.
qqnorm(m_bty$residuals)
qqline(m_bty$residuals)
Based on this plot, I am a bit more comfortable calling the distribution of residuals “nearly” normal.
plot(evals$bty_avg ~ evals$bty_f1lower)
cor(evals$bty_avg, evals$bty_f1lower)
[1] 0.8439112
plot(evals[,13:19])
m_bty_gen <- lm(score ~ bty_avg + gender, data = evals)
summary(m_bty_gen)
Call:
lm(formula = score ~ bty_avg + gender, data = evals)
Residuals:
Min 1Q Median 3Q Max
-1.8305 -0.3625 0.1055 0.4213 0.9314
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.74734 0.08466 44.266 < 2e-16 ***
bty_avg 0.07416 0.01625 4.563 6.48e-06 ***
gendermale 0.17239 0.05022 3.433 0.000652 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.5287 on 460 degrees of freedom
Multiple R-squared: 0.05912, Adjusted R-squared: 0.05503
F-statistic: 14.45 on 2 and 460 DF, p-value: 8.177e-07
P-values and parameter estimates should only be trusted if the conditions for the regression are reasonable. Verify that the conditions for this model are reasonable using diagnostic plots.
hist(m_bty_gen$residuals)
qqnorm(m_bty_gen$residuals)
qqline(m_bty_gen$residuals)
Is bty_avg still a significant predictor of score? Has the addition of gender to the model changed the parameter estimate for bty_avg?
Yes, it is still significant, with a p-value of 6.48e-6 compared to 5.08e-05 for gender
multiLines(m_bty_gen)
What is the equation of the line corresponding to males? (Hint: For males, the parameter estimate is multiplied by 1.) For two professors who received the same beauty rating, which gender tends to have the higher course evaluation score?
score = 3.74734 + 0.17239*gender + 0.07416*bty_avg ; where gender = 1 if male
Create a new model called m_bty_rank with gender removed and rank added in. How does R appear to handle categorical variables that have more than two levels? Note that the rank variable has three levels: teaching, tenure track, tenured.
m_bty_rank <- lm(score ~ bty_avg + rank, data = evals)
summary(m_bty_rank)
Call:
lm(formula = score ~ bty_avg + rank, data = evals)
Residuals:
Min 1Q Median 3Q Max
-1.8713 -0.3642 0.1489 0.4103 0.9525
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.98155 0.09078 43.860 < 2e-16 ***
bty_avg 0.06783 0.01655 4.098 4.92e-05 ***
ranktenure track -0.16070 0.07395 -2.173 0.0303 *
ranktenured -0.12623 0.06266 -2.014 0.0445 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.5328 on 459 degrees of freedom
Multiple R-squared: 0.04652, Adjusted R-squared: 0.04029
F-statistic: 7.465 on 3 and 459 DF, p-value: 6.88e-05
Which variable would you expect to have the highest p-value in this model? Why? Hint: Think about which variable would you expect to not have any association with the professor score.
I would guess that cls_credit would have the highest p-value; I would guess that students are as likely to rate their teachers positively or negatively regardless of class credits.
m_full <- lm(score ~ rank + ethnicity + gender + language + age + cls_perc_eval
+ cls_students + cls_level + cls_profs + cls_credits + bty_avg
+ pic_outfit + pic_color, data = evals)
summary(m_full)
Call:
lm(formula = score ~ rank + ethnicity + gender + language + age +
cls_perc_eval + cls_students + cls_level + cls_profs + cls_credits +
bty_avg + pic_outfit + pic_color, data = evals)
Residuals:
Min 1Q Median 3Q Max
-1.77397 -0.32432 0.09067 0.35183 0.95036
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.0952141 0.2905277 14.096 < 2e-16 ***
ranktenure track -0.1475932 0.0820671 -1.798 0.07278 .
ranktenured -0.0973378 0.0663296 -1.467 0.14295
ethnicitynot minority 0.1234929 0.0786273 1.571 0.11698
gendermale 0.2109481 0.0518230 4.071 5.54e-05 ***
languagenon-english -0.2298112 0.1113754 -2.063 0.03965 *
age -0.0090072 0.0031359 -2.872 0.00427 **
cls_perc_eval 0.0053272 0.0015393 3.461 0.00059 ***
cls_students 0.0004546 0.0003774 1.205 0.22896
cls_levelupper 0.0605140 0.0575617 1.051 0.29369
cls_profssingle -0.0146619 0.0519885 -0.282 0.77806
cls_creditsone credit 0.5020432 0.1159388 4.330 1.84e-05 ***
bty_avg 0.0400333 0.0175064 2.287 0.02267 *
pic_outfitnot formal -0.1126817 0.0738800 -1.525 0.12792
pic_colorcolor -0.2172630 0.0715021 -3.039 0.00252 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.498 on 448 degrees of freedom
Multiple R-squared: 0.1871, Adjusted R-squared: 0.1617
F-statistic: 7.366 on 14 and 448 DF, p-value: 6.552e-14
Check your suspicions from the previous exercise. Include the model output in your response.
I was wrong, cls_credit apparently has the lowest p-value with 1.84e-05
Interpret the coefficient associated with the ethnicity variable.
Since the value is positive, individuals who are not minorities were likely to have had their classes rated higher.
Drop the variable with the highest p-value and re-fit the model. Did the coefficients and significance of the other explanatory variables change? (One of the things that makes multiple regression interesting is that coefficient estimates depend on the other variables that are included in the model.) If not, what does this say about whether or not the dropped variable was collinear with the other explanatory variables?
m_full_drop1 <- lm(score ~ rank + ethnicity + gender + language + age + cls_perc_eval
+ cls_students + cls_level + cls_credits + bty_avg
+ pic_outfit + pic_color, data = evals)
summary(m_full_drop1)
Call:
lm(formula = score ~ rank + ethnicity + gender + language + age +
cls_perc_eval + cls_students + cls_level + cls_credits +
bty_avg + pic_outfit + pic_color, data = evals)
Residuals:
Min 1Q Median 3Q Max
-1.7836 -0.3257 0.0859 0.3513 0.9551
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.0872523 0.2888562 14.150 < 2e-16 ***
ranktenure track -0.1476746 0.0819824 -1.801 0.072327 .
ranktenured -0.0973829 0.0662614 -1.470 0.142349
ethnicitynot minority 0.1274458 0.0772887 1.649 0.099856 .
gendermale 0.2101231 0.0516873 4.065 5.66e-05 ***
languagenon-english -0.2282894 0.1111305 -2.054 0.040530 *
age -0.0089992 0.0031326 -2.873 0.004262 **
cls_perc_eval 0.0052888 0.0015317 3.453 0.000607 ***
cls_students 0.0004687 0.0003737 1.254 0.210384
cls_levelupper 0.0606374 0.0575010 1.055 0.292200
cls_creditsone credit 0.5061196 0.1149163 4.404 1.33e-05 ***
bty_avg 0.0398629 0.0174780 2.281 0.023032 *
pic_outfitnot formal -0.1083227 0.0721711 -1.501 0.134080
pic_colorcolor -0.2190527 0.0711469 -3.079 0.002205 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4974 on 449 degrees of freedom
Multiple R-squared: 0.187, Adjusted R-squared: 0.1634
F-statistic: 7.943 on 13 and 449 DF, p-value: 2.336e-14
Using backward-selection and p-value as the selection criterion, determine the best model. You do not need to show all steps in your answer, just the output for the final model. Also, write out the linear model for predicting score based on the final model you settle on.
#m_full_dropout <- lm(score ~ cls_credits, data = evals)
#summary(m_full_dropout)
m_full_dropout <- lm(score ~ ethnicity + gender + cls_perc_eval
+ cls_credits + bty_avg, data = evals)
summary(m_full_dropout)
Call:
lm(formula = score ~ ethnicity + gender + cls_perc_eval + cls_credits +
bty_avg, data = evals)
Residuals:
Min 1Q Median 3Q Max
-1.8857 -0.3294 0.1066 0.3774 1.0540
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.137381 0.146450 21.423 < 2e-16 ***
ethnicitynot minority 0.233794 0.071275 3.280 0.001117 **
gendermale 0.157832 0.048493 3.255 0.001219 **
cls_perc_eval 0.005208 0.001443 3.608 0.000343 ***
cls_creditsone credit 0.541067 0.104669 5.169 3.52e-07 ***
bty_avg 0.073644 0.015773 4.669 3.98e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.5053 on 457 degrees of freedom
Multiple R-squared: 0.146, Adjusted R-squared: 0.1366
F-statistic: 15.62 on 5 and 457 DF, p-value: 3.338e-14
Verify that the conditions for this model are reasonable using diagnostic plots.
hist(m_full_dropout$residuals)
qqnorm(m_full_dropout$residuals)
qqline(m_full_dropout$residuals)
The original paper describes how these data were gathered by taking a sample of professors from the University of Texas at Austin and including all courses that they have taught. Considering that each row represents a course, could this new information have an impact on any of the conditions of linear regression?
Yes, because many of the samples are not independent of one another.
Based on your final model, describe the characteristics of a professor and course at University of Texas at Austin that would be associated with a high evaluation score.
In exercise 15, the variables that remained in the model were: ethnicity, gender, cls_perc_eval, cls_credits, and bty_avg. Therefore, the ideal professor at the University of Texas at Austin is one who is male, not a minority, grades generously, teaches a one credit class, and is good looking.
Would you be comfortable generalizing your conclusions to apply to professors generally (at any university)? Why or why not?
No, the sample is too local, and therefore non-random, to be immediately generalized to any university. It may be that it does generalize, but we cannot say that something is true at any university just because it is true at one. I do think we could generalize our conclusions to other professors at the same university who are not included in the dataset.