Step 1. Load data.
EXERCISE 1. Exploring the Data
Obtain the distribution (histogram) of the variable score.
Describe the distribution (histogram) of the variable score. Is the distribution skewed? What does that tell you about how students rate courses? Is this what you expected to see? Why, or why not?
The distribution is negatively (left) skewed: most scores cluster near the top of the scale, with a tail of lower ratings. This tells us that students rate most courses highly and that low scores are rare. Yes, this is what I expected to see. Research points to a form of ‘grade inflation’ in course surveys, and most professors are seen as competent, so low scores are uncommon.
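The visual impression of skew can be backed up numerically by comparing the mean with the median and by computing a moment-based skewness coefficient. The sketch below uses a small made-up vector of scores (hypothetical values, not the evals data) so that it runs on its own; with the real data one would presumably pass evals$score instead.

```r
# Hypothetical course scores on a 1-5 scale (NOT the real evals data)
score <- c(4.7, 4.5, 4.6, 4.3, 4.8, 4.4, 4.1, 3.9, 4.9, 4.2, 3.2, 2.6)

# Histogram of the scores
hist(score, main = "Distribution of score", xlab = "score")

# Moment-based sample skewness: negative values indicate a longer left tail
centered <- score - mean(score)
skewness <- mean(centered^3) / mean(centered^2)^1.5

# A mean below the median is another informal sign of left skew
mean_below_median <- mean(score) < median(score)
```

For this made-up vector the skewness is negative and the mean falls below the median, which is the pattern a left-skewed histogram should produce.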
EXERCISE 2. Simple Linear Regression
Create a scatterplot of scores against average beauty.
Fit a linear model called m_bty to predict average professor score by average beauty rating.
Add the line based on the model to your plot using abline(m_bty).
Use residual plots to evaluate whether the conditions of least squares regression are reasonable.
Can you say that the model is probably reliable? Why or why not?
Only partly. The model is useful for showing the overall trend, but it is too weak to predict individual scores. The least squares conditions appear to be reasonably met, but the points are widely scattered around the line: the R-squared of 0.03502 means bty_avg explains only about 3.5% of the variation in score.
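The steps of this exercise (scatterplot, model fit, fitted line, residual checks) can be sketched as follows. The data frame sim is a simulated stand-in with made-up parameters, used only so the example runs without the evals file; the residual plots shown are one common choice, not necessarily the ones used for this report.

```r
set.seed(1)  # reproducible simulated data

# Simulated stand-in for evals (hypothetical values, NOT the real data)
n <- 200
sim <- data.frame(bty_avg = runif(n, min = 2, max = 8))
sim$score <- 3.9 + 0.07 * sim$bty_avg + rnorm(n, sd = 0.5)

# Fit the simple linear model, same formula as in the exercise
m_bty <- lm(score ~ bty_avg, data = sim)

# Scatterplot with the fitted line
plot(score ~ bty_avg, data = sim)
abline(m_bty)

# Residuals vs. fitted values: checks linearity and constant variance
plot(fitted(m_bty), resid(m_bty), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot: checks whether the residuals are roughly normal
qqnorm(resid(m_bty))
qqline(resid(m_bty))
```

A random band around zero in the first residual plot and points close to the line in the Q-Q plot would support the least squares conditions.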
Obtain a summary of the model m_bty.

Call:
lm(formula = score ~ bty_avg, data = evals)

Residuals:
    Min      1Q  Median      3Q     Max
-1.9246 -0.3690  0.1420  0.3977  0.9309

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.88034    0.07614   50.96  < 2e-16 ***
bty_avg      0.06664    0.01629    4.09 5.08e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5348 on 461 degrees of freedom
Multiple R-squared:  0.03502, Adjusted R-squared:  0.03293
F-statistic: 16.73 on 1 and 461 DF,  p-value: 5.083e-05
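To make the summary easier to read, the short sketch below reproduces a few of its numbers by hand from the reported values. The beauty rating of 5 used for the prediction is an arbitrary illustrative input, not a value taken from the report.

```r
# Values copied from the summary output of m_bty
b0 <- 3.88034         # intercept
b1 <- 0.06664         # slope for bty_avg
se_b1 <- 0.01629      # standard error of the slope
r_squared <- 0.03502  # multiple R-squared

# The t value is the estimate divided by its standard error
t_b1 <- b1 / se_b1

# Predicted average score at a beauty rating of 5 (hypothetical input)
predicted_at_5 <- b0 + b1 * 5

# Share of the variation in score explained by bty_avg, as a percentage
pct_explained <- 100 * r_squared
```

The computed t value agrees with the reported 4.09 after rounding, and the low percentage explained is why the model is better at showing a trend than at predicting individual scores.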
EXERCISE 3. Multiple Linear Regression
Create a model of evaluation score based on average beauty and professor gender.
Verify that the conditions for linear regression are reasonable for this model using diagnostic plots of the residuals.
Comment on the findings of the diagnostic plots. Do you think the conditions seem to be generally met to use this model? Why or why not?
The conditions are generally met, but not perfectly. The residuals vs. fitted plot indicates roughly constant variance. The normal Q-Q plot indicates that the residuals are NOT perfectly normal, with left skew and outliers in the upper quantiles.
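The two diagnostic plots discussed above can be produced directly from a fitted model with plot(). This is a minimal sketch on simulated data (made-up parameters, not the evals data); the model name m_bty_gen is a hypothetical label chosen for the example.

```r
set.seed(2)  # reproducible simulated data

# Simulated stand-in for evals (hypothetical values, NOT the real data)
n <- 200
sim <- data.frame(
  bty_avg = runif(n, min = 2, max = 8),
  gender  = factor(sample(c("female", "male"), n, replace = TRUE))
)
sim$score <- 3.75 + 0.07 * sim$bty_avg +
  0.17 * (sim$gender == "male") + rnorm(n, sd = 0.5)

# Multiple regression with the same formula as in the exercise
m_bty_gen <- lm(score ~ bty_avg + gender, data = sim)

# Two diagnostic panels side by side
old_par <- par(mfrow = c(1, 2))
plot(m_bty_gen, which = 1)  # residuals vs. fitted
plot(m_bty_gen, which = 2)  # normal Q-Q
par(old_par)
```

Because the simulated errors are normal, these plots should look cleaner than the ones from the real data, where the left skew shows up in the Q-Q plot.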
Obtain the summary details of this model.

Call:
lm(formula = score ~ bty_avg + gender, data = evals)

Residuals:
    Min      1Q  Median      3Q     Max
-1.8305 -0.3625  0.1055  0.4213  0.9314

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.74734    0.08466  44.266  < 2e-16 ***
bty_avg      0.07416    0.01625   4.563 6.48e-06 ***
gendermale   0.17239    0.05022   3.433 0.000652 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5287 on 460 degrees of freedom
Multiple R-squared:  0.05912, Adjusted R-squared:  0.05503
F-statistic: 14.45 on 2 and 460 DF,  p-value: 8.177e-07
Is bty_avg still a significant predictor of score? Why or why not? What is the parameter estimate (coefficient) for bty_avg? Has the addition of gender to the model changed the parameter estimate for bty_avg? Explain.
Yes, bty_avg is still a significant predictor: its p-value (6.48e-06) is far below 0.05, and it is smaller than in the single-predictor model (5.08e-05). The parameter estimate is 0.07416. Adding gender did change the estimate, up from 0.06664, because the coefficient now describes the effect of beauty while controlling for gender.
Is the coefficient for gendermale statistically significant? Why or why not? Is the coefficient positive or negative? What does this mean? For two professors who received the same beauty rating, which gender tends to have the higher course evaluation score?
Yes, the coefficient for gendermale is statistically significant because its p-value (0.000652) is smaller than 0.05, which suggests that gender is associated with the score. The coefficient is positive (0.17239). This means that, for two professors who received the same beauty rating, the male professor tends to have the higher course evaluation score, by about 0.17 points on average.
Plot the linear regression line for females and the one corresponding to males.
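Since the model has no interaction term, the two lines share the slope for bty_avg and differ only in their intercepts. The sketch below draws them from the coefficients reported in the summary; the axis ranges, colors, and the beauty rating of 5 are illustrative assumptions, and the report's own plot may have been produced differently.

```r
# Coefficients copied from the summary of lm(score ~ bty_avg + gender)
b0     <- 3.74734  # intercept (female is the reference level)
b_bty  <- 0.07416  # slope for bty_avg, shared by both genders
b_male <- 0.17239  # shift in intercept for male professors

intercept_female <- b0
intercept_male   <- b0 + b_male

# Empty plotting frame; the axis ranges are illustrative assumptions
plot(0, 0, type = "n", xlim = c(1, 10), ylim = c(3, 5),
     xlab = "bty_avg", ylab = "score")
abline(a = intercept_female, b = b_bty, col = "red")
abline(a = intercept_male,   b = b_bty, col = "blue")
legend("topleft", legend = c("female", "male"),
       col = c("red", "blue"), lty = 1)

# Predicted gap between genders at the same beauty rating
bty <- 5  # hypothetical rating
gap <- (intercept_male + b_bty * bty) - (intercept_female + b_bty * bty)
```

The lines are parallel, so the gap equals the gendermale coefficient at every beauty rating.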