download.file("http://www.openintro.org/stat/data/evals.RData", destfile = "evals.RData")
load("evals.RData")
Exercise 1
Is this an observational study or an experiment? The original research question posed in the paper is whether beauty leads directly to the differences in course evaluations. Given the study design, is it possible to answer this question as it is phrased? If not, rephrase the question.
This is an observational study, primarily because there is no control group (no group to compare the “more attractive” professors against). This makes answering the question whether beauty leads directly to the the differences in course evaluations difficult to answer, primarily because correlation does not mean causation. There could be multiple factors that lead to a different evaluation score that have nothing to do with a professor’s attractiveness. A better question to ask (as in, a more manageable question to answer), is whether there exists a relationship between a professor’s attractiveness and course evaluations.
Exercise 2
Describe the distribution of score. Is the distribution skewed? What does that tell you about how students rate courses? Is this what you expected to see? Why, or why not?
hist(evals$score, main = "Histogram of Scores", xlab = "Score")
The distribution of scores is left skewed, meaning that the students rated courses more positively. What I expected to see was a unimodal, normal distribution, but it could very well be possible (though somewhat unlikely) that the professors sampled were all excellent and garnered their students’ appreciation.
Exercise 3
Excluding score, select two other variables and describe their relationship using an appropriate visualization (scatterplot, side-by-side boxplots, or mosaic plot).
boxplot(evals$bty_avg ~ evals$language, main = "Boxplot of Beauty Average Score by Language", ylab = "Beauty Average", xlab = "Language")
It is interesting to note that even though the beauty-average medians of the two groups (professors who received an education in a school that taught in english vs non-english) are relatively similar (non-english being a tad higher), the variability and spread of the two boxplots is starkly different. Professors who received an education in a school that taught in english received beauty average scores up to 8 and as low as 2, however its non-english counterpart had a beauty-average of up to five and a half and down to three and a half with an outlier of approximatly three. Data for the non-english data is densly concentred around the median.
Exercise 4
Replot the scatterplot, but this time use the function jitter() on the y- or the x-coordinate. What was misleading about the initial scatterplot?
plot(evals$score ~ evals$bty_avg)
plot(jitter(evals$score) ~ jitter(evals$bty_avg))
What the jitter() function does is returns data with some noise added in, which I assume in this case means data points that overlap. There is a lot more data here than we originally assumed, giving more weight to certain points.
Exercise 5
Fit a linear model called m_bty to predict average professor score by average beauty rating and add the line to your plot using abline(m_bty). Write out the equation for the linear model and interpret the slope. Is average beauty score a statistically significant predictor? Does it appear to be a practically significant predictor?
m_bty = lm(evals$score ~ evals$bty_avg)
plot(jitter(evals$score) ~ jitter(evals$bty_avg))
abline(m_bty)
summary(m_bty)
##
## Call:
## lm(formula = evals$score ~ evals$bty_avg)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9246 -0.3690 0.1420 0.3977 0.9309
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.88034 0.07614 50.96 < 2e-16 ***
## evals$bty_avg 0.06664 0.01629 4.09 5.08e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5348 on 461 degrees of freedom
## Multiple R-squared: 0.03502, Adjusted R-squared: 0.03293
## F-statistic: 16.73 on 1 and 461 DF, p-value: 5.083e-05
Linear Model Equation: y-hat = 3.88034+ 0.06664(bty_avg) —> this means that for every single count increase in bty_avg, the score increases by 0.6664. Though the average beauty score does appear to be a significant predictor, the model has a very low R-squared value, which implies that this model is not appropriate for the data. As such, beauty average may not be a practically significant predictor.
Exercise 6
Use residual plots to evaluate whether the conditions of least squares regression are reasonable. Provide plots and comments for each one.
hist(m_bty$residuals)
plot(x=m_bty$residuals, y=evals$bty_avg)
abline(h = 0, lty = 3)
The residuals are not normally distributed (left skewed) and, though the residuals do not appear to have a somewhat constant spread (though more densely populated in the right end of the plot) , they are not concentrated around the zero line. The conditions are not being met.
Exercise 7
P-values and parameter estimates should only be trusted if the conditions for the regression are reasonable. Verify that the conditions for this model are reasonable using diagnostic plots.
m_bty_gen = lm(score ~ bty_avg + gender, data = evals)
qqnorm(m_bty_gen$residuals)
qqline(m_bty_gen$residuals)
plot(m_bty_gen$residuals)
abline(h = 0, lty = 3)
We can assume independence based on the assumption that students do not have courses with more than one teacher. As for the qq-plot, most data falls along the normal line, however towards the upper end, the points curve off a bit but not enought to assume non-normality. The plotted residuals are spread along the zero line with no apparent pattern. The conditions for this model are met.
Exercise 8
Is bty_avg still a significant predictor of score? Has the addition of gender to the model changed the parameter estimate for bty_avg?
summary(m_bty_gen)
##
## Call:
## lm(formula = score ~ bty_avg + gender, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8305 -0.3625 0.1055 0.4213 0.9314
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.74734 0.08466 44.266 < 2e-16 ***
## bty_avg 0.07416 0.01625 4.563 6.48e-06 ***
## gendermale 0.17239 0.05022 3.433 0.000652 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5287 on 460 degrees of freedom
## Multiple R-squared: 0.05912, Adjusted R-squared: 0.05503
## F-statistic: 14.45 on 2 and 460 DF, p-value: 8.177e-07
Yes, bty_avg is still a significant predictor of score and adding the gender variable to the model has changed the parameter extimate for bty_avg (increased it from 0.06664 to 0.07416), but not significantly. However, the R-square is still very low for this model, so there exists a chance that, given a more significant predictor is added to the model, score may not be significant at that point.
Exercise 9
What is the equation of the line corresponding to males? For two professors who received the same beauty rating, which gender tends to have the higher course evaluation score?
score-hat = b0-hat+ b1-hat(bty_avg)+ b2-hat(1) = b0-hat+ b1-hat(bty_avg)+ b2-hat. The male gender will tend to have a higher course evaluation score.
Exercise 10
Create a new model called m_bty_rank with gender removed and rank added in. How does R appear to handle categorical variables that have more than two levels?
m_bty_rank = lm(score ~ bty_avg + rank, data = evals)
summary(m_bty_rank)
##
## Call:
## lm(formula = score ~ bty_avg + rank, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8713 -0.3642 0.1489 0.4103 0.9525
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.98155 0.09078 43.860 < 2e-16 ***
## bty_avg 0.06783 0.01655 4.098 4.92e-05 ***
## ranktenure track -0.16070 0.07395 -2.173 0.0303 *
## ranktenured -0.12623 0.06266 -2.014 0.0445 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5328 on 459 degrees of freedom
## Multiple R-squared: 0.04652, Adjusted R-squared: 0.04029
## F-statistic: 7.465 on 3 and 459 DF, p-value: 6.88e-05
Since the rank variable has three variables (teaching, tenure track and tenured), R has added another line into the regression summary to account for it. R leaves out one level but mentions the rest as variables.
Exercise 11
Which variable would you expect to have the highest p-value in this model? Why?
I would expect the pic_color to have the highest p-value because I don’t believe it affects the professor’s score in that I don’t think students will rely on the pictures to determine either attractiveness (if beauty average is relevant) or their rating of their professor’s quality of work (if beauty averge is not relevant).
Exercise 12
Check your suspicions from the previous exercise. Include the model output in your response.
m_full = lm(score ~ rank + ethnicity + gender + language + age + cls_perc_eval +
cls_students + cls_level + cls_profs + cls_credits + bty_avg + pic_outfit +
pic_color, data = evals)
summary(m_full)
##
## Call:
## lm(formula = score ~ rank + ethnicity + gender + language + age +
## cls_perc_eval + cls_students + cls_level + cls_profs + cls_credits +
## bty_avg + pic_outfit + pic_color, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.77397 -0.32432 0.09067 0.35183 0.95036
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.0952141 0.2905277 14.096 < 2e-16 ***
## ranktenure track -0.1475932 0.0820671 -1.798 0.07278 .
## ranktenured -0.0973378 0.0663296 -1.467 0.14295
## ethnicitynot minority 0.1234929 0.0786273 1.571 0.11698
## gendermale 0.2109481 0.0518230 4.071 5.54e-05 ***
## languagenon-english -0.2298112 0.1113754 -2.063 0.03965 *
## age -0.0090072 0.0031359 -2.872 0.00427 **
## cls_perc_eval 0.0053272 0.0015393 3.461 0.00059 ***
## cls_students 0.0004546 0.0003774 1.205 0.22896
## cls_levelupper 0.0605140 0.0575617 1.051 0.29369
## cls_profssingle -0.0146619 0.0519885 -0.282 0.77806
## cls_creditsone credit 0.5020432 0.1159388 4.330 1.84e-05 ***
## bty_avg 0.0400333 0.0175064 2.287 0.02267 *
## pic_outfitnot formal -0.1126817 0.0738800 -1.525 0.12792
## pic_colorcolor -0.2172630 0.0715021 -3.039 0.00252 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.498 on 448 degrees of freedom
## Multiple R-squared: 0.1871, Adjusted R-squared: 0.1617
## F-statistic: 7.366 on 14 and 448 DF, p-value: 6.552e-14
My assumptions in the previous exercise was incorrect. In fact, the cls_profs variable (number of professors teaching sections in course in sample: single, multiple) was the one that had the highest p-value, meaning it was the variable with the least association with score in relation to the other variables.
Exercise 13
Interpret the coefficient associated with the ethnicity variable.
The ethnicity_not_minority variable increases the score by 0.1234929 when all other variables are held constant.
Exercise 14
Drop the variable with the highest p-value and re-fit the model. Did the coefficients and significance of the other explanatory variables change? If not, what does this say about whether or not the dropped variable was collinear with the other explanatory variables?
minus_ethn = lm(score ~ rank + gender + language + age + cls_perc_eval +
cls_students + cls_level + cls_profs + cls_credits + bty_avg + pic_outfit +
pic_color, data = evals)
summary(minus_ethn)
##
## Call:
## lm(formula = score ~ rank + gender + language + age + cls_perc_eval +
## cls_students + cls_level + cls_profs + cls_credits + bty_avg +
## pic_outfit + pic_color, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.73681 -0.32734 0.08283 0.35834 0.98639
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.2676351 0.2694274 15.840 < 2e-16 ***
## ranktenure track -0.1660677 0.0813523 -2.041 0.041801 *
## ranktenured -0.1127978 0.0657022 -1.717 0.086705 .
## gendermale 0.2241744 0.0512176 4.377 1.50e-05 ***
## languagenon-english -0.2862448 0.1055924 -2.711 0.006968 **
## age -0.0092040 0.0031385 -2.933 0.003534 **
## cls_perc_eval 0.0051119 0.0015357 3.329 0.000944 ***
## cls_students 0.0004785 0.0003777 1.267 0.205899
## cls_levelupper 0.0767503 0.0567182 1.353 0.176677
## cls_profssingle -0.0292174 0.0512393 -0.570 0.568817
## cls_creditsone credit 0.4589918 0.1128358 4.068 5.61e-05 ***
## bty_avg 0.0375980 0.0174661 2.153 0.031880 *
## pic_outfitnot formal -0.1208610 0.0738165 -1.637 0.102265
## pic_colorcolor -0.2400696 0.0701264 -3.423 0.000675 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4988 on 449 degrees of freedom
## Multiple R-squared: 0.1826, Adjusted R-squared: 0.159
## F-statistic: 7.717 on 13 and 449 DF, p-value: 6.792e-14
With the removal of the cls_profs variable, the coefficients and significance of the other variables changed (significance for most increased and most coefficients decreased).
Exercise 15
Using backward-selection and p-value as the selection criterion, determine the best model. You do not need to show all steps in your answer, just the output for the final model. Also, write out the linear model for predicting score based on the final model you settle on.
backwards = lm(score ~ ethnicity + gender + language + age + cls_perc_eval + cls_credits + bty_avg + pic_color, data = evals)
summary(backwards)
##
## Call:
## lm(formula = score ~ ethnicity + gender + language + age + cls_perc_eval +
## cls_credits + bty_avg + pic_color, data = evals)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.85320 -0.32394 0.09984 0.37930 0.93610
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.771922 0.232053 16.255 < 2e-16 ***
## ethnicitynot minority 0.167872 0.075275 2.230 0.02623 *
## gendermale 0.207112 0.050135 4.131 4.30e-05 ***
## languagenon-english -0.206178 0.103639 -1.989 0.04726 *
## age -0.006046 0.002612 -2.315 0.02108 *
## cls_perc_eval 0.004656 0.001435 3.244 0.00127 **
## cls_creditsone credit 0.505306 0.104119 4.853 1.67e-06 ***
## bty_avg 0.051069 0.016934 3.016 0.00271 **
## pic_colorcolor -0.190579 0.067351 -2.830 0.00487 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4992 on 454 degrees of freedom
## Multiple R-squared: 0.1722, Adjusted R-squared: 0.1576
## F-statistic: 11.8 on 8 and 454 DF, p-value: 2.58e-15
score-hat = b0-hat+ b1-hat(ethnicty)+ b2-hat(gender)+ b3-hat(language)+ b4-hat(age)+ b5-hat(cls_perc_eval)+ b6-hat(cls_credits)+ b7-hat(bty_avg)+ b8-hat(pic_color)
Exercise 16
Verify that the conditions for this model are reasonable using diagnostic plots.
qqnorm(backwards$residuals)
qqline(backwards$residuals)
plot(backwards$residuals)
abline(h = 0, lty = 3)
Assuming independence, the conditions for normality and homoscedasticity are met since in the qq-plot the data points fall along the normal line (with the exception of the ends, which is permitted more leeway than points near the center of the line) and the spread of the residuals along the zero line does not fall into a pattern.
Exercise 17
The original paper describes how these data were gathered by taking a sample of professors from the University of Texas at Austin and including all courses that they have taught. Considering that each row represents a course, could this new information have an impact on any of the conditions of linear regression?
I think it would affect the linear regression because there would be overlapping data or maybe multicollinearity and, if not, I think at the very least this breaks our assumption of independence because there is a higher probability that at least one student in each course has scored other professors within the sample.
Exercise 18
Based on your final model, describe the characteristics of a professor and course at University of Texas at Austin that would be associated with a high evaluation score.
According to this model, a professor typically associated with a high evalutaion score is male, not a minority, received an education in a school that taught in english, teaches a one-credit course, uses a photo that has color, with a relatively young age, a high average beauty score and a high percentage of students within the course that complete the evaluations.
Exercise 19
Would you be comfortable generalizing your conclusions to apply to professors generally (at any university)? Why or why not?
I would not feel comfortable generalizing these conclusions to those at any university primarily because the sample was restricted to the Austin Texas campus and since each university has its own atmosphere and even personality, these results may not apply to any other location.