Load data:

##   score         rank    ethnicity gender language age cls_perc_eval
## 1   4.7 tenure track     minority female  english  36      55.81395
## 2   4.1 tenure track     minority female  english  36      68.80000
## 3   3.9 tenure track     minority female  english  36      60.80000
## 4   4.8 tenure track     minority female  english  36      62.60163
## 5   4.6      tenured not minority   male  english  59      85.00000
## 6   4.3      tenured not minority   male  english  59      87.50000
##   cls_did_eval cls_students cls_level cls_profs  cls_credits bty_f1lower
## 1           24           43     upper    single multi credit           5
## 2           86          125     upper    single multi credit           5
## 3           76          125     upper    single multi credit           5
## 4           77          123     upper    single multi credit           5
## 5           17           20     upper  multiple multi credit           4
## 6           35           40     upper  multiple multi credit           4
##   bty_f1upper bty_f2upper bty_m1lower bty_m1upper bty_m2upper bty_avg
## 1           7           6           2           4           6       5
## 2           7           6           2           4           6       5
## 3           7           6           2           4           6       5
## 4           7           6           2           4           6       5
## 5           4           2           2           3           3       3
## 6           4           2           2           3           3       3
##   pic_outfit pic_color
## 1 not formal     color
## 2 not formal     color
## 3 not formal     color
## 4 not formal     color
## 5 not formal     color
## 6 not formal     color

1:

This is an observational study because the researchers were simply observering and collecting data rather than manipulating variables for the experiment. It is not possible to answer the question because an observational study cannot display causation. Instead, they could ask if beauty was associated with course evaluations.

2:

The distribution is skewed left, and unimodal. The center of the data seems to be between a score of 4.0 and 4.5, and the spread is between 2.5 (min) and 5.0 (max). Because the data is skewed left, this means that majority of the students rated professors between 4.0, and 5.0 scores on the end of year evaluations. This is expected because many of the teachers are tenured or tenure track, and it is less likely for students to give low ratings unless their teachers were extremely bad.

hist(evals$score)

3:

It seems as if over 60% of professors, regardless on whether or not they have received tenure, are not minorities, while less than 20% of professors are minorities.

plot(evals$rank, evals$ethnicity, xlab= "rank", ylab= "ethnicity", main= "rankvsethnicity")

4:

There are 463 observations in the data set though significantly less points on the scatter plot, which means it is not accounting for all of the data.

plot(evals$score ~ evals$bty_avg)

plot(jitter(evals$score),evals$bty_avg)

plot(evals$score, jitter(evals$bty_avg))

5:

The equation for the linear model m_bty is: m_bty= 3.8803379 + 0.06663704*evals$bty_avg The intercept is 3.8803379, and the slope means that for every 1 unit increase of x, y increases by 0.06663704 units. The p-val is = 5.083e-05, and because it is less than 0.05, we can determine that average beauty score is a statistically significant predictor. It appears to be a practically significant predictor because it makes sense that a professors beauty would impact a students impression on them.

m_bty=lm(evals$score~evals$bty_avg, data=evals)
plot(m_bty)

plot(evals$score, evals$bty_avg, col="red")
abline(m_bty)

m_bty$coefficients
##   (Intercept) evals$bty_avg 
##    3.88033795    0.06663704
summary(m_bty)
## 
## Call:
## lm(formula = evals$score ~ evals$bty_avg, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9246 -0.3690  0.1420  0.3977  0.9309 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    3.88034    0.07614   50.96  < 2e-16 ***
## evals$bty_avg  0.06664    0.01629    4.09 5.08e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5348 on 461 degrees of freedom
## Multiple R-squared:  0.03502,    Adjusted R-squared:  0.03293 
## F-statistic: 16.73 on 1 and 461 DF,  p-value: 5.083e-05

6:

Based on the conditions for least squares regression, linearity, independent observations, nearly normal residuals, and constant variability, I do not think that the conditions of least square regression are reasonable. The distribution of the residuals is not normal, as they lack symmetry, and are skewed left.

res <- resid(m_bty)
plot(fitted(m_bty), res)
abline(0,0)

hist(res)

plot(evals$score ~ evals$bty_avg, col="orange")
abline(m_bty, col="magenta")

qqnorm(m_bty$residuals, col="red")

7:

Based on the residuals vs. fitted plot, the data seems to be skewed, and thus the residuals are not normally distrubuted.

plot(evals$bty_avg ~ evals$bty_f1lower)

cor(evals$bty_avg, evals$bty_f1lower)
## [1] 0.8439112
plot(evals[,13:19])

m_bty_gen <- lm(score ~ bty_avg + gender, data = evals)
summary(m_bty_gen)
## 
## Call:
## lm(formula = score ~ bty_avg + gender, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8305 -0.3625  0.1055  0.4213  0.9314 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.74734    0.08466  44.266  < 2e-16 ***
## bty_avg      0.07416    0.01625   4.563 6.48e-06 ***
## gendermale   0.17239    0.05022   3.433 0.000652 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5287 on 460 degrees of freedom
## Multiple R-squared:  0.05912,    Adjusted R-squared:  0.05503 
## F-statistic: 14.45 on 2 and 460 DF,  p-value: 8.177e-07
plot(m_bty_gen)

8:

Yes, because the p-value is still smaller than 0.05 (8.177e-07), we can reject the null hypothesis, and displays that beauty is still a significant predictor of score, also, because the coefficients of beauty avergae for both m_bty_gen and m_bty are not significantly different, that means that gender did not change the parameter estimate.

summary(m_bty_gen)
## 
## Call:
## lm(formula = score ~ bty_avg + gender, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8305 -0.3625  0.1055  0.4213  0.9314 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.74734    0.08466  44.266  < 2e-16 ***
## bty_avg      0.07416    0.01625   4.563 6.48e-06 ***
## gendermale   0.17239    0.05022   3.433 0.000652 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5287 on 460 degrees of freedom
## Multiple R-squared:  0.05912,    Adjusted R-squared:  0.05503 
## F-statistic: 14.45 on 2 and 460 DF,  p-value: 8.177e-07
#m_bty_gen$coefficients
#m_bty$coefficients

9:

The equation for the line corresponding to males is: y=3.74734 + 0.07416bty_avg + 0.17239gendermale

For two professors who received the same beauty rating, males tend to have the higher course evaluation score, because of the addition of the parameter estimate, where as females do not have that additional value added. Because of this, if the beauty rating is the same, it is estimated that male professors will always have a higher score.

multiLines(m_bty_gen)

#### 10: R handles the multi-level variables by displaying the three different subsets of rank: teaching, tenure, and tenure track, as separate lines in their relation to score and beauty average. Doing this displays what discrepancies between professors scores with the same beauty average but different ranks are. By evaluating the summary of beauty and rank, R only displays tenure track and tenured, which are dummy variables.

m_bty_rank <- lm(score ~bty_avg + rank, data = evals)
summary(m_bty_rank)
## 
## Call:
## lm(formula = score ~ bty_avg + rank, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8713 -0.3642  0.1489  0.4103  0.9525 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       3.98155    0.09078  43.860  < 2e-16 ***
## bty_avg           0.06783    0.01655   4.098 4.92e-05 ***
## ranktenure track -0.16070    0.07395  -2.173   0.0303 *  
## ranktenured      -0.12623    0.06266  -2.014   0.0445 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5328 on 459 degrees of freedom
## Multiple R-squared:  0.04652,    Adjusted R-squared:  0.04029 
## F-statistic: 7.465 on 3 and 459 DF,  p-value: 6.88e-05
plot(m_bty_rank)

multiLines(m_bty_rank)

#### 11: I would expect the cls_profs to have the highest p-value as I would believe the number of profssors teaching sections in a course would have little to no association with their score.

12:

I was correct, the highest p value was from class professor single, which had a p value of 0.77806.

m_full <- lm(score ~ rank + ethnicity + gender + language + age + cls_perc_eval + cls_students + cls_level + cls_profs + cls_credits + bty_avg + pic_outfit + pic_color, data = evals)
summary(m_full)
## 
## Call:
## lm(formula = score ~ rank + ethnicity + gender + language + age + 
##     cls_perc_eval + cls_students + cls_level + cls_profs + cls_credits + 
##     bty_avg + pic_outfit + pic_color, data = evals)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.77397 -0.32432  0.09067  0.35183  0.95036 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            4.0952141  0.2905277  14.096  < 2e-16 ***
## ranktenure track      -0.1475932  0.0820671  -1.798  0.07278 .  
## ranktenured           -0.0973378  0.0663296  -1.467  0.14295    
## ethnicitynot minority  0.1234929  0.0786273   1.571  0.11698    
## gendermale             0.2109481  0.0518230   4.071 5.54e-05 ***
## languagenon-english   -0.2298112  0.1113754  -2.063  0.03965 *  
## age                   -0.0090072  0.0031359  -2.872  0.00427 ** 
## cls_perc_eval          0.0053272  0.0015393   3.461  0.00059 ***
## cls_students           0.0004546  0.0003774   1.205  0.22896    
## cls_levelupper         0.0605140  0.0575617   1.051  0.29369    
## cls_profssingle       -0.0146619  0.0519885  -0.282  0.77806    
## cls_creditsone credit  0.5020432  0.1159388   4.330 1.84e-05 ***
## bty_avg                0.0400333  0.0175064   2.287  0.02267 *  
## pic_outfitnot formal  -0.1126817  0.0738800  -1.525  0.12792    
## pic_colorcolor        -0.2172630  0.0715021  -3.039  0.00252 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.498 on 448 degrees of freedom
## Multiple R-squared:  0.1871, Adjusted R-squared:  0.1617 
## F-statistic: 7.366 on 14 and 448 DF,  p-value: 6.552e-14

13

The ethnicity slope means that if all of the other variables stayed constant, for every 1 unit increase in ethnicity of non minority slope, the score would increase by 0.1234929213.

m_full$coefficients
##           (Intercept)      ranktenure track           ranktenured 
##          4.0952140795         -0.1475932457         -0.0973377624 
## ethnicitynot minority            gendermale   languagenon-english 
##          0.1234929213          0.2109481296         -0.2298111901 
##                   age         cls_perc_eval          cls_students 
##         -0.0090071896          0.0053272412          0.0004546339 
##        cls_levelupper       cls_profssingle cls_creditsone credit 
##          0.0605139602         -0.0146619208          0.5020431770 
##               bty_avg  pic_outfitnot formal        pic_colorcolor 
##          0.0400333017         -0.1126816871         -0.2172629964

14:

Yes, almost all of the p values are less than prior. Also, the coefficients barely changed. This means that the dropped value was not co-linear with the other variables, and the remaining variables in the model have a stronger relationship with the dependent variable.

m_full1 <- lm(score ~ rank + ethnicity + gender + language + age + cls_perc_eval + cls_students + cls_level + cls_credits + bty_avg + pic_outfit + pic_color, data = evals)
summary(m_full1)
## 
## Call:
## lm(formula = score ~ rank + ethnicity + gender + language + age + 
##     cls_perc_eval + cls_students + cls_level + cls_credits + 
##     bty_avg + pic_outfit + pic_color, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7836 -0.3257  0.0859  0.3513  0.9551 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            4.0872523  0.2888562  14.150  < 2e-16 ***
## ranktenure track      -0.1476746  0.0819824  -1.801 0.072327 .  
## ranktenured           -0.0973829  0.0662614  -1.470 0.142349    
## ethnicitynot minority  0.1274458  0.0772887   1.649 0.099856 .  
## gendermale             0.2101231  0.0516873   4.065 5.66e-05 ***
## languagenon-english   -0.2282894  0.1111305  -2.054 0.040530 *  
## age                   -0.0089992  0.0031326  -2.873 0.004262 ** 
## cls_perc_eval          0.0052888  0.0015317   3.453 0.000607 ***
## cls_students           0.0004687  0.0003737   1.254 0.210384    
## cls_levelupper         0.0606374  0.0575010   1.055 0.292200    
## cls_creditsone credit  0.5061196  0.1149163   4.404 1.33e-05 ***
## bty_avg                0.0398629  0.0174780   2.281 0.023032 *  
## pic_outfitnot formal  -0.1083227  0.0721711  -1.501 0.134080    
## pic_colorcolor        -0.2190527  0.0711469  -3.079 0.002205 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4974 on 449 degrees of freedom
## Multiple R-squared:  0.187,  Adjusted R-squared:  0.1634 
## F-statistic: 7.943 on 13 and 449 DF,  p-value: 2.336e-14

15:

Yes, the conditions for this model are not resonable because the data is skewed left in the histogram.

plot(m_full1)

hist(m_full1$residuals)

#### 16: This would mean that each row is not independent because a teacher could have taught multiple classes, which would have resulted in a skew of data.

plot(m_full1)