Exercise 1

There is no control group, so this is an observational study. The question could be reworded as: Is beauty correlated with higher professor evaluations?

Exercise 2

ggplot(evals) + geom_histogram(aes(x = score), fill = "darkgreen", color = "black")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

This distribution is heavily left skewed. This tells you that students give mostly perfect or near perfect scores. 3 does not mean an average course. This is inline with other ratings, such as restaraunt reviews. People do not think 4 means 1 standard deviation above the mean.

Exercise 3

ggplot(evals) + geom_boxplot(aes(x = pic_outfit, y = bty_avg)) + labs(y= "Beauty Score", x = "Pic Type")

It looks as though the beauty score is higher on average with formal pictures, but the spread is much higher for not formal. If the scores were much higher for one group, bty_avg would be correlated with pic_outfit. There is probably not enough of difference to worry about pic_outfit being collinear with bty_avg.

Excercise 4

plot(evals$score ~ evals$bty_avg)

plot(evals$score ~ jitter(evals$bty_avg))

The original plot had many points on top of each other because both dimensions are discrete.

Exercise 5

m_bty <- lm(score ~ bty_avg, data = evals)

summary(m_bty)
## 
## Call:
## lm(formula = score ~ bty_avg, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9246 -0.3690  0.1420  0.3977  0.9309 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.88034    0.07614   50.96  < 2e-16 ***
## bty_avg      0.06664    0.01629    4.09 5.08e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5348 on 461 degrees of freedom
## Multiple R-squared:  0.03502,    Adjusted R-squared:  0.03293 
## F-statistic: 16.73 on 1 and 461 DF,  p-value: 5.083e-05

\(\hat { y } =3.88+.0667*bty\)

Beauty score is stastically signficant, with a t value over 4. However, R squared is only .03, so it doesn’t explain very much of the variation in score. Additionally, each point increase in beauty corresponds to only a .0667 increase in score. This means that if professor went from a 2 to an 8 in beaty avg, they would be only expected to increase .36 points in eval score. This isn’t irrelevant, but it’s not groundbreaking either.

Exercise 6

df <-  data.frame(m_bty$model, residuals = m_bty$residuals, predicted = m_bty$fitted.values, residuals_abs = abs(m_bty$residuals))

ggplot(df) + geom_point( aes(x = predicted, y = residuals)) + geom_hline(yintercept = 0)

qqnorm(m_bty$residuals)
qqline(m_bty$residuals)

ggplot(df) + geom_point( aes(x = bty_avg, y = residuals_abs))

ggplot(m_bty) + geom_histogram(aes( x = m_bty$residuals), fill = "darkblue", color = "black") 
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Based on the scatter from Excercise 4, it appears the linearity assumption holds or at least isn’t obviously violated. The points are scattered, but a non-linear trend isn’t obvious. The normality of residuals assumption does not hold. The distribution is skewed to the left. This can be seen in all three plots. The variance appears to be constant, with the spread about the same over all bty_avg scores and fitted values. This is not time series data, though the same professor provides multiple data points, so there is some dependance in observations.

Exercise 7

m_bty_gen <- lm(score ~ bty_avg + gender, data = evals)
summary(m_bty_gen)
## 
## Call:
## lm(formula = score ~ bty_avg + gender, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8305 -0.3625  0.1055  0.4213  0.9314 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.74734    0.08466  44.266  < 2e-16 ***
## bty_avg      0.07416    0.01625   4.563 6.48e-06 ***
## gendermale   0.17239    0.05022   3.433 0.000652 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5287 on 460 degrees of freedom
## Multiple R-squared:  0.05912,    Adjusted R-squared:  0.05503 
## F-statistic: 14.45 on 2 and 460 DF,  p-value: 8.177e-07
df_gen <- data.frame(m_bty_gen$model, residuals = m_bty_gen$residuals, predicted = m_bty_gen$fitted.values, residuals_abs = abs(m_bty$residuals))

ggplot(df_gen) + geom_point( aes(x = bty_avg, y = residuals))

ggplot(df_gen) + geom_point( aes(x = predicted, y = residuals)) + geom_hline(yintercept = 0)

qqnorm(m_bty_gen$residuals)
qqline(m_bty_gen$residuals)

ggplot(df_gen) + geom_boxplot(aes(y = residuals, x = gender)) 

ggplot(df_gen) + geom_histogram( aes( x = residuals), fill = "darkblue", color = "black") +  facet_wrap(~gender)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

It appears that the normality of residuals assumption is still violated when gender is added. This is more prounounced for males. There is strong left skew for males. The spread of residuals is at least about the same for males and females, indicating constant variance still holds. The p values for the parameter estimates should be not be entirely trusted because of the non-normal residuals. This violation is not strong enough to disregard the results.

Exercise 8

The parameter estimate for bty_avg has changed by almost .01 from the one variable model. It is still signficant, with a p value near 0. The p value shouldn’t be entirely trusted given what we know about the model assumptions.

Exercise 9

\(\hat { y } =3.747+.0746*bty+.17239*genmale\)

Since the parameter estimate is positive for genmale, this means that a male professor would have a higher expected score with all else equal.

Exercise 10

m_bty_rank <- lm(score ~ bty_avg + rank, data = evals)
summary(m_bty_rank)
## 
## Call:
## lm(formula = score ~ bty_avg + rank, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8713 -0.3642  0.1489  0.4103  0.9525 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       3.98155    0.09078  43.860  < 2e-16 ***
## bty_avg           0.06783    0.01655   4.098 4.92e-05 ***
## ranktenure track -0.16070    0.07395  -2.173   0.0303 *  
## ranktenured      -0.12623    0.06266  -2.014   0.0445 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5328 on 459 degrees of freedom
## Multiple R-squared:  0.04652,    Adjusted R-squared:  0.04029 
## F-statistic: 7.465 on 3 and 459 DF,  p-value: 6.88e-05

With a 3 level categorical, it gives us two additional parameters. It appearns R creates n - 1 paraemters where n is the number of levels.

Exercise 11

I would expect class credits not to have any correlation to professor score. It is in no way reflective of the professor. It should therefore have the lowest p value.

Exercise 12

m_full <- lm(score ~ rank + ethnicity + gender + language + age + cls_perc_eval 
             + cls_students + cls_level + cls_profs + cls_credits + bty_avg 
             + pic_outfit + pic_color, data = evals)
summary(m_full)
## 
## Call:
## lm(formula = score ~ rank + ethnicity + gender + language + age + 
##     cls_perc_eval + cls_students + cls_level + cls_profs + cls_credits + 
##     bty_avg + pic_outfit + pic_color, data = evals)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.77397 -0.32432  0.09067  0.35183  0.95036 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            4.0952141  0.2905277  14.096  < 2e-16 ***
## ranktenure track      -0.1475932  0.0820671  -1.798  0.07278 .  
## ranktenured           -0.0973378  0.0663296  -1.467  0.14295    
## ethnicitynot minority  0.1234929  0.0786273   1.571  0.11698    
## gendermale             0.2109481  0.0518230   4.071 5.54e-05 ***
## languagenon-english   -0.2298112  0.1113754  -2.063  0.03965 *  
## age                   -0.0090072  0.0031359  -2.872  0.00427 ** 
## cls_perc_eval          0.0053272  0.0015393   3.461  0.00059 ***
## cls_students           0.0004546  0.0003774   1.205  0.22896    
## cls_levelupper         0.0605140  0.0575617   1.051  0.29369    
## cls_profssingle       -0.0146619  0.0519885  -0.282  0.77806    
## cls_creditsone credit  0.5020432  0.1159388   4.330 1.84e-05 ***
## bty_avg                0.0400333  0.0175064   2.287  0.02267 *  
## pic_outfitnot formal  -0.1126817  0.0738800  -1.525  0.12792    
## pic_colorcolor        -0.2172630  0.0715021  -3.039  0.00252 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.498 on 448 degrees of freedom
## Multiple R-squared:  0.1871, Adjusted R-squared:  0.1617 
## F-statistic: 7.366 on 14 and 448 DF,  p-value: 6.552e-14

It turns out that the cls_credits variable was signficant. Apparently students like their one credit classes. Maybe my veiwpoint is skewed from having physics lab be my only one crediter. The cls_pfofs variable had by far the highest p value. Apparently students are indifferent to the number of professors.

Exercise 13

The students rate the professor .12 points higher on average, all else equal, when the professor is not a minority.

Exercise 14

m_2 <- lm(score ~ rank + ethnicity + gender + language + age + cls_perc_eval 
             + cls_students + cls_level + cls_credits + bty_avg 
             + pic_outfit + pic_color, data = evals)
summary(m_2)
## 
## Call:
## lm(formula = score ~ rank + ethnicity + gender + language + age + 
##     cls_perc_eval + cls_students + cls_level + cls_credits + 
##     bty_avg + pic_outfit + pic_color, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7836 -0.3257  0.0859  0.3513  0.9551 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            4.0872523  0.2888562  14.150  < 2e-16 ***
## ranktenure track      -0.1476746  0.0819824  -1.801 0.072327 .  
## ranktenured           -0.0973829  0.0662614  -1.470 0.142349    
## ethnicitynot minority  0.1274458  0.0772887   1.649 0.099856 .  
## gendermale             0.2101231  0.0516873   4.065 5.66e-05 ***
## languagenon-english   -0.2282894  0.1111305  -2.054 0.040530 *  
## age                   -0.0089992  0.0031326  -2.873 0.004262 ** 
## cls_perc_eval          0.0052888  0.0015317   3.453 0.000607 ***
## cls_students           0.0004687  0.0003737   1.254 0.210384    
## cls_levelupper         0.0606374  0.0575010   1.055 0.292200    
## cls_creditsone credit  0.5061196  0.1149163   4.404 1.33e-05 ***
## bty_avg                0.0398629  0.0174780   2.281 0.023032 *  
## pic_outfitnot formal  -0.1083227  0.0721711  -1.501 0.134080    
## pic_colorcolor        -0.2190527  0.0711469  -3.079 0.002205 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4974 on 449 degrees of freedom
## Multiple R-squared:  0.187,  Adjusted R-squared:  0.1634 
## F-statistic: 7.943 on 13 and 449 DF,  p-value: 2.336e-14

The parameter estimates remained almost identical. Only bty_avg and pic_outfit changed slightly. This means that the cls_profs was independant of the other explanitory variables.

Exercise 15

m_3 <-lm(score ~ ethnicity + gender + language + age + cls_perc_eval 
           + cls_credits + bty_avg 
          + pic_color, data = evals)
summary(m_3)
## 
## Call:
## lm(formula = score ~ ethnicity + gender + language + age + cls_perc_eval + 
##     cls_credits + bty_avg + pic_color, data = evals)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.85320 -0.32394  0.09984  0.37930  0.93610 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            3.771922   0.232053  16.255  < 2e-16 ***
## ethnicitynot minority  0.167872   0.075275   2.230  0.02623 *  
## gendermale             0.207112   0.050135   4.131 4.30e-05 ***
## languagenon-english   -0.206178   0.103639  -1.989  0.04726 *  
## age                   -0.006046   0.002612  -2.315  0.02108 *  
## cls_perc_eval          0.004656   0.001435   3.244  0.00127 ** 
## cls_creditsone credit  0.505306   0.104119   4.853 1.67e-06 ***
## bty_avg                0.051069   0.016934   3.016  0.00271 ** 
## pic_colorcolor        -0.190579   0.067351  -2.830  0.00487 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4992 on 454 degrees of freedom
## Multiple R-squared:  0.1722, Adjusted R-squared:  0.1576 
## F-statistic:  11.8 on 8 and 454 DF,  p-value: 2.58e-15

\(\hat { y } =3.771+.1679*ethnicity+.2071*gender-.2062*language-.006*age+.0047*cls\_ perc+.5053*cls\_ credit+.051*bty\_ avg-.1906*pic\_ color\)

Exercise 16

df_m3 <- data.frame(m_3$model, residuals = m_bty_gen$residuals, predicted = m_bty_gen$fitted.values, residuals_abs = abs(m_bty$residuals))

ggplot(df_m3) + geom_point(aes(x = age, y = score))

ggplot(df_m3) + geom_point(aes(x = cls_perc_eval, y = score))

ggplot(df_m3) + geom_point(aes(x = predicted, y = residuals))

qqnorm(df_m3$residuals)
qqline(df_m3$residuals)

ggplot(df_m3) + geom_point(aes(x = bty_avg, y = residuals_abs))

ggplot(df_m3) + geom_point(aes(x = age, y = residuals_abs))

ggplot(df_m3) + geom_point(aes(x = cls_perc_eval, y = residuals_abs))

ggplot(df_m3) + geom_boxplot(aes(y = residuals, x = ethnicity)) 

ggplot(df_m3) + geom_boxplot(aes(y = residuals, x = language)) 

ggplot(df_m3) + geom_boxplot(aes(y = residuals, x = cls_credits)) 

ggplot(df_m3) + geom_boxplot(aes(y = residuals, x = pic_color)) 

ggplot(df_m3) + geom_boxplot(aes(y = residuals, x = gender)) 

  • The normality of residuals assumption appears to somewhat hold, though the skew is still present. The is probably unavoidable because of the maximum possible score of 5.
  • There is about equal spread accross the numerical variables, age, cls_perc, and bty_avg. The spread is also similar for the categoricals, other than the ones where one class does’t have many samples, eg language. The contanst variance assumption holds.
  • It is difficult to judge linearity from the pairwise scatters. There aren’t any obvious patterns in the residuals of the numerics, indicating linearity holds.
  • The indepedance of observations remains the same, somewhat holds due to multiple samples from the same professor.

Exercise 17

This was mentioned earlier, but it does violate the independance of observations assumption. This is probably a necesary cost to gather enough data points and doesn’t totally invalidate the model. However, it would be interesting to randomly choose 1 class from each professor and rerun the model.

Exercise 18

It appears that ethnicity of not a minority, gender of male, language of english, class credits of one credit, bty_avg, and black and white profile picture are the most signficant factors in a high evaluation score at UT austin. Age and cl_perc_eval were statistically signficant, but not practically signficant, given the parameter estimates were under .001.

Exercise 19

3 of the model variables are related to demographics and probably depend on the demographics of the student body. These variables likely wouldn’t generalize well to other universities where the student body looks different. The other variables would generalize well, so I would rerun the model fit without the demographic variables before generalizing.