Intro Stuff

getwd()
## [1] "C:/Users/Jerome/Documents/From_Toshiba_HD_Work_Files/0000_Montgomery_College/Math_217/Week_13"
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.0.2
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.4     v dplyr   1.0.2
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.0
## Warning: package 'tibble' was built under R version 4.0.3
## Warning: package 'tidyr' was built under R version 4.0.2
## Warning: package 'dplyr' was built under R version 4.0.2
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(mosaic)
## Warning: package 'mosaic' was built under R version 4.0.2
## Registered S3 method overwritten by 'mosaic':
##   method                           from   
##   fortify.SpatialPolygonsDataFrame ggplot2
## 
## The 'mosaic' package masks several functions from core packages in order to add 
## additional features.  The original behavior of these functions should not be affected by this.
## 
## Attaching package: 'mosaic'
## The following object is masked from 'package:Matrix':
## 
##     mean
## The following objects are masked from 'package:dplyr':
## 
##     count, do, tally
## The following object is masked from 'package:purrr':
## 
##     cross
## The following object is masked from 'package:ggplot2':
## 
##     stat
## The following objects are masked from 'package:stats':
## 
##     binom.test, cor, cor.test, cov, fivenum, IQR, median, prop.test,
##     quantile, sd, t.test, var
## The following objects are masked from 'package:base':
## 
##     max, mean, min, prod, range, sample, sum
library(pastecs)
## Warning: package 'pastecs' was built under R version 4.0.3
## 
## Attaching package: 'pastecs'
## The following objects are masked from 'package:dplyr':
## 
##     first, last
## The following object is masked from 'package:tidyr':
## 
##     extract
download.file("http://www.openintro.org/stat/data/evals.RData", destfile = "evals.RData")
load("evals.RData")

Question 1

The original research question posed in the paper is whether beauty leads directly to the differences in course evaluations. The paper more narrowly focused on beauty vs. professorial productivity, with ratings for the course itself being a proxy for productivity. This is not an experimental study because there is no control group; there is no testing to determine if an intervention had an effect. Had they conducted this study and then subjected the poorer-looking profs to cosmetic surgery (presumably successful) and then re-run the study, then it might be argued it was an experimental study.

Exercise 2 Describe the Distribution of Score

Based on the histogram, I’d say this distribution is skewed to the right. By a wide margin. Must have some good-looking profs at that school.

hist(evals$score)

write.csv(evals, file = "evals.csv", row.names = FALSE)

Exercise 3 Fiddle w/ some plots. See if there is something awry w/ the last plot.

First, the plots

plot(evals$rank ~ evals$gender)

plot(bty_m1upper ~ bty_f1upper, data = evals)

cor(bty_m1upper ~ bty_f1upper, data = evals)
## [1] 0.6906175
boxplot(evals$bty_m1lower, evals$bty_f1lower)

cor(bty_m1lower ~ bty_f1lower, data = evals)
## [1] 0.6124262
table(evals$score, evals$bty_avg)
##      
##       1.667  2 2.333 2.5 2.667 2.833  3 3.167 3.333 3.5 3.667 3.833  4 4.167
##   2.3     0  0     0   0     0     0  0     0     0   0     0     0  0     0
##   2.4     1  0     0   0     0     0  0     0     0   0     0     0  0     0
##   2.5     1  0     0   0     0     0  0     0     0   0     0     0  1     0
##   2.7     0  0     1   0     0     0  0     0     0   0     0     0  0     1
##   2.8     0  1     0   0     0     0  1     0     0   0     0     0  0     0
##   2.9     0  0     0   0     0     0  0     0     0   1     0     0  0     0
##   3       0  1     0   0     0     0  0     0     1   0     0     0  0     0
##   3.1     1  2     0   0     0     0  0     0     1   0     0     0  0     0
##   3.2     0  0     0   0     1     0  0     0     0   1     0     1  0     0
##   3.3     0  1     2   0     0     0  0     0     0   1     0     0  0     0
##   3.4     0  1     0   0     0     2  0     1     1   0     0     0  1     0
##   3.5     0  0     1   0     0     1  1     1     0   0     0     0  2     1
##   3.6     0  0     0   2     0     1  3     2     1   0     1     1  1     0
##   3.7     0  0     0   0     0     0  3     0     1   0     1     0  0     1
##   3.8     1  0     4   0     0     2  0     3     0   0     1     0  0     0
##   3.9     0  0     3   1     0     1  1     3     1   0     1     2  0     0
##   4       2  0     4   0     0     0  0     1     0   0     1     0  0     0
##   4.1     3  0     1   0     1     0  2     2     2   0     0     0  0     2
##   4.2     0  1     2   0     1     1  0     2     2   0     0     0  1     2
##   4.3     0  0     1   2     0     0  4     2     2   0     0     2  2     3
##   4.4     0  0     0   4     0     1  2     3     2   1     1     2  4     2
##   4.5     0  0     2   1     0     0  0     3     4   3     0     1  2     3
##   4.6     0  0     3   0     0     1  3     2     3   2     2     0  0     2
##   4.7     0  0     0   0     0     0  2     1     1   0     2     1  1     1
##   4.8     0  0     0   1     0     1  0     2     3   1     3     0  0     2
##   4.9     0  0     0   0     1     0  0     0     5   2     0     0  1     0
##   5       0  0     0   0     0     1  0     0     2   0     0     0  0     0
##      
##       4.333 4.5 4.667 4.833  5 5.167 5.333 5.5 5.667 5.833  6 6.167 6.333 6.5
##   2.3     0   0     0     0  0     1     0   0     0     0  0     0     0   0
##   2.4     0   0     0     0  0     0     0   0     0     0  0     0     0   0
##   2.5     0   0     0     0  0     0     0   0     0     0  0     0     0   0
##   2.7     0   0     0     0  0     0     0   0     0     0  0     0     0   0
##   2.8     1   0     0     0  0     0     0   0     0     0  0     0     0   0
##   2.9     0   0     0     0  0     0     0   1     0     0  0     0     0   0
##   3       2   0     0     0  0     1     0   0     0     0  0     0     0   0
##   3.1     0   0     0     0  0     0     0   0     0     0  0     0     1   0
##   3.2     0   0     0     0  0     0     0   1     0     0  0     0     0   0
##   3.3     1   0     0     0  0     0     0   2     0     0  0     1     0   0
##   3.4     5   0     0     0  0     0     0   0     0     0  0     0     0   0
##   3.5     3   0     0     3  0     0     1   0     0     1  0     0     0   1
##   3.6     2   0     0     2  0     1     0   0     0     0  0     0     0   0
##   3.7     3   0     1     3  0     0     0   0     0     0  1     0     0   2
##   3.8     2   0     0     2  0     0     0   0     0     1  1     1     0   0
##   3.9     5   0     0     1  1     0     0   1     0     2  0     0     0   1
##   4       4   0     0     2  0     1     0   1     0     0  0     0     2   5
##   4.1     4   1     0     2  1     0     1   0     0     1  0     1     0   1
##   4.2     4   0     0     2  0     0     0   1     1     1  0     0     0   0
##   4.3     4   0     0     2  0     0     0   1     1     0  0     0     0   0
##   4.4    10   0     2     3  0     0     2   3     1     0  0     0     0   0
##   4.5     3   0     1     2  0     0     0   3     2     3  0     0     1   0
##   4.6     1   0     0     2  0     0     0   2     0     1  0     0     0   2
##   4.7     4   0     2     2  1     0     0   3     0     1  0     0     0   1
##   4.8     1   1     2     3  1     0     0   1     0     1  0     4     0   3
##   4.9     2   1     0     0  0     0     0   3     0     2  0     1     0   1
##   5       2   0     0     0  0     0     0   3     0     0  0     0     0   0
##      
##       6.667 6.833  7 7.167 7.333 7.833 8.167
##   2.3     0     0  0     0     0     0     0
##   2.4     0     0  0     0     0     0     0
##   2.5     0     0  0     0     0     0     0
##   2.7     0     0  0     0     0     0     0
##   2.8     0     0  0     0     0     0     0
##   2.9     0     0  0     0     0     0     0
##   3       0     0  0     0     0     0     0
##   3.1     0     0  1     0     0     0     0
##   3.2     0     0  0     0     0     0     0
##   3.3     1     0  0     0     0     1     1
##   3.4     0     0  0     0     0     0     0
##   3.5     1     0  0     0     0     0     0
##   3.6     1     0  0     0     0     1     1
##   3.7     0     0  1     1     0     1     0
##   3.8     0     1  0     0     0     0     0
##   3.9     0     0  0     0     0     2     0
##   4       1     0  0     0     0     0     0
##   4.1     1     1  0     0     0     1     0
##   4.2     0     1  0     0     0     0     0
##   4.3     0     0  0     0     0     0     0
##   4.4     0     0  0     0     0     0     0
##   4.5     0     5  0     1     1     0     0
##   4.6     2     1  0     0     3     1     1
##   4.7     1     0  0     0     0     0     1
##   4.8     0     1  0     0     1     3     0
##   4.9     0     1  1     0     1     3     0
##   5       0     0  0     0     0     3     0
#jitter(evals$score, factor = 1, amount = NULL)
plot(score ~ bty_avg, data = evals)

cor(score ~ bty_avg, data = evals)
## [1] 0.1871424
table(evals$score)
## 
## 2.3 2.4 2.5 2.7 2.8 2.9   3 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9   4 4.1 4.2 4.3 
##   1   1   2   2   3   2   5   6   4  11  11  17  20  19  19  26  24  28  22  26 
## 4.4 4.5 4.6 4.7 4.8 4.9   5 
##  43  41  34  25  35  25  11
table(evals$bty_avg)
## 
## 1.667     2 2.333   2.5 2.667 2.833     3 3.167 3.333   3.5 3.667 3.833     4 
##     9     7    24    11     4    12    22    28    32    12    13    10    16 
## 4.167 4.333   4.5 4.667 4.833     5 5.167 5.333   5.5 5.667 5.833     6 6.167 
##    20    63     3     8    31     4     4     4    26     5    14     2     8 
## 6.333   6.5 6.667 6.833     7 7.167 7.333 7.833 8.167 
##     4    17     8    11     3     2     6    16     4

Now try to answer the question: Is something awry?

To be honest, I can’t tell. I wish R would give Row and Column totals on the tables. The only thing I can saw is this: it seems the higher values on Score do not match the higher values on bty_avg. I downloaded the article and skimmed through it, but I didn’t find anything helpful. I wish the authors had indicated how they normalized the scores.

Exercise 4 - Use Jitter

It seems the dots in the plot are further apart, but that’s what jittering does. I can’t see any difference.

jitter(evals$score, factor = 1, amount = NULL)
##   [1] 4.698963 4.116336 3.906491 4.786684 4.609210 4.314423 2.787481 4.091614
##   [9] 3.414409 4.508752 3.809385 4.495621 4.587424 3.888594 3.885878 4.311737
##  [17] 4.489337 4.819139 4.614017 4.587187 4.917467 4.603587 4.519421 4.397913
##  [25] 4.582218 4.687213 4.515171 4.801850 4.917004 4.507768 4.385235 4.298531
##  [33] 4.106157 4.217860 3.508279 3.409260 4.518526 4.383478 4.401852 2.485750
##  [41] 4.284600 4.489730 4.800900 4.781108 4.409458 4.684905 4.392536 4.713325
##  [49] 4.509044 3.996546 4.293540 4.396816 4.484975 5.003649 4.898360 4.604951
##  [57] 4.996279 4.686748 5.010817 3.596481 3.713821 4.295027 4.100336 4.213391
##  [65] 4.718345 4.695549 3.493573 4.103994 4.194702 3.987485 4.014815 3.919605
##  [73] 4.398360 3.795901 3.515235 4.201877 3.489569 3.595671 2.897533 3.305184
##  [81] 3.306135 3.182908 4.615693 4.183475 4.298882 4.413378 4.100953 4.611700
##  [89] 4.417329 4.815364 4.303224 3.597729 4.304243 4.019566 4.181936 4.102983
##  [97] 4.093132 4.394983 4.284060 4.396922 4.387671 4.905172 4.999128 4.390791
## [105] 4.797515 4.914452 4.287034 5.015290 4.713921 4.496331 3.515278 3.881835
## [113] 4.019236 3.980942 3.704166 3.415801 3.307785 3.794999 3.895949 3.416366
## [121] 3.682451 4.085489 3.690506 3.517748 3.485885 4.381943 3.388025 4.302650
## [129] 3.703066 4.702418 3.896767 3.586083 4.507110 4.496615 4.810379 4.819656
## [137] 4.699432 4.517552 4.300571 4.781426 4.096559 4.380153 4.316057 3.602953
## [145] 4.498000 4.291739 4.408296 4.686286 4.804277 3.493839 3.808885 3.604666
## [153] 4.215436 3.591681 4.388024 3.694727 4.308259 4.611283 4.581169 4.086467
## [161] 3.619176 2.293647 4.313317 4.414081 3.612947 4.419749 3.907688 3.807759
## [169] 3.404960 4.911013 4.085128 3.200456 4.192161 3.882162 4.903395 4.714958
## [177] 4.399878 4.191084 4.007110 4.419296 3.887787 4.408616 3.017454 3.487379
## [185] 2.811051 4.592949 4.306148 3.414494 2.994849 4.186783 4.309138 4.088012
## [193] 4.580373 3.915424 3.487835 3.983427 4.001936 3.914412 3.297751 3.986818
## [201] 3.794338 4.196556 3.998047 3.805307 3.309483 4.110395 4.702953 4.415064
## [209] 4.815411 4.784157 4.581163 4.584251 4.788997 4.416352 4.685274 4.687354
## [217] 3.314939 4.383395 4.314005 4.917113 4.383610 4.690666 4.299205 4.781543
## [225] 4.513431 4.717501 3.300566 4.694653 4.591822 3.583565 3.988124 4.111262
## [233] 4.000961 4.518111 4.614617 4.805908 4.591426 4.889382 3.091593 3.714881
## [241] 3.693762 3.912663 3.885225 3.211238 4.415130 4.198854 4.696701 3.905949
## [249] 3.581147 3.396445 4.394729 4.393264 4.094598 3.586828 3.483396 4.091183
## [257] 3.786367 4.019244 4.783520 4.186335 4.618051 4.306245 4.815473 3.780274
## [265] 4.498773 4.890420 4.882919 4.790796 4.688795 4.599167 4.299129 4.415931
## [273] 4.500804 4.216902 4.807899 4.602362 4.901162 4.782156 4.803697 4.595409
## [281] 4.682606 4.082980 3.788720 4.002029 4.103960 4.000456 4.109439 3.483820
## [289] 4.085324 3.610785 4.010292 3.891466 3.812307 4.386056 4.684578 3.818897
## [297] 4.086256 4.118003 4.686160 4.297557 4.406539 4.488732 3.093909 3.712872
## [305] 4.510742 3.009534 4.611299 3.705460 3.594562 3.208492 3.314096 2.900778
## [313] 4.188528 4.519261 3.818048 3.694493 3.703866 3.982921 3.702574 4.517339
## [321] 3.790669 3.896950 4.596818 4.518262 4.210241 4.005542 3.781270 3.501014
## [329] 2.694020 3.985339 4.616264 3.897172 4.488643 3.714886 2.401315 3.113239
## [337] 2.500994 3.012164 4.494462 4.810481 4.892164 4.502081 4.590002 4.519932
## [345] 4.892939 4.413851 4.601625 4.588113 5.004743 4.891861 4.619596 4.816741
## [353] 4.892871 4.905733 4.903390 5.019752 4.509591 3.494561 3.803534 3.883933
## [361] 3.893880 4.190665 4.107282 4.788930 4.812183 4.807170 4.802718 4.886365
## [369] 4.188379 4.515528 3.891129 4.399783 4.013243 3.616428 3.709856 2.717260
## [377] 4.487913 4.389400 3.914698 3.586123 4.410283 4.386242 4.712929 4.503741
## [385] 4.100090 3.719918 4.290487 3.504343 3.701381 3.991888 4.019647 3.092904
## [393] 4.519114 4.813212 4.186955 4.891002 4.803891 3.500548 3.584334 4.381677
## [401] 3.399861 3.916109 3.819408 4.806738 4.581608 4.986178 3.807969 4.193835
## [409] 3.312387 4.707282 4.589108 4.619343 4.018314 4.192662 4.899887 4.491386
## [417] 4.805527 3.806701 4.795394 5.004447 5.002761 4.896850 4.614138 4.987281
## [425] 4.789750 4.888095 4.916924 3.902817 3.884193 4.517973 4.492456 3.292892
## [433] 3.113166 2.805768 3.097539 4.202154 3.393147 2.995767 3.312062 3.597367
## [441] 3.688641 3.586119 4.293577 4.095071 4.884695 4.791518 3.700892 3.895896
## [449] 4.482442 3.587724 4.395540 3.415468 4.406150 4.481466 4.494846 4.495023
## [457] 4.613771 4.096883 4.489930 3.490656 4.389958 4.404756 4.100002
plot(evals$score ~ evals$bty_avg)

cor(score ~ bty_avg, data = evals)
## [1] 0.1871424
table(evals$score)
## 
## 2.3 2.4 2.5 2.7 2.8 2.9   3 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9   4 4.1 4.2 4.3 
##   1   1   2   2   3   2   5   6   4  11  11  17  20  19  19  26  24  28  22  26 
## 4.4 4.5 4.6 4.7 4.8 4.9   5 
##  43  41  34  25  35  25  11
table(evals$bty_avg)
## 
## 1.667     2 2.333   2.5 2.667 2.833     3 3.167 3.333   3.5 3.667 3.833     4 
##     9     7    24    11     4    12    22    28    32    12    13    10    16 
## 4.167 4.333   4.5 4.667 4.833     5 5.167 5.333   5.5 5.667 5.833     6 6.167 
##    20    63     3     8    31     4     4     4    26     5    14     2     8 
## 6.333   6.5 6.667 6.833     7 7.167 7.333 7.833 8.167 
##     4    17     8    11     3     2     6    16     4

Exercise 5 - Fit a Linear Model

If I read the output correctly, and interpret the notes from class on the 24th correctly, for each additional point in the score, the average beauty increases by only 0.067 points. The model fits the data well; the F statistic is 16.73 with an extremely low p-value. But the model doesn’t explain much; the adjusted R-squared is only 0.033. The model thus has little explanatory value.

Exercise 6 - Interpret the results.

the Normal Q-Q plot shows outliers at the high end; the curve of the observations away from the regression line suggests the model doesn’t fit the data. The other 2 plots similarly show fit lines that have aberrations at one end. The conclusion is the data are not amenable to analysis by linear regression.

m_bty <- lm(score ~ bty_avg, data = evals)

plot(m_bty)

abline(m_bty)

summary(m_bty)
## 
## Call:
## lm(formula = score ~ bty_avg, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9246 -0.3690  0.1420  0.3977  0.9309 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.88034    0.07614   50.96  < 2e-16 ***
## bty_avg      0.06664    0.01629    4.09 5.08e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5348 on 461 degrees of freedom
## Multiple R-squared:  0.03502,    Adjusted R-squared:  0.03293 
## F-statistic: 16.73 on 1 and 461 DF,  p-value: 5.083e-05

Exercise 7

The 3 conditions that must be satisfied are linearity, constant variability, and nearly normal residuals. The Normal Q-Q plot shows non-normality (it’s not linear). I can’t tell about the constant variability, but it does seem the residuals are all over the place, with a number of outliers. It would seem the conditions for regression are not met by this model. Even with highly statistically-significant numbers, the adjusted R-squared value of 0.055 shows the model doesn’t explain much.

plot(evals$bty_avg ~ evals$bty_f1lower)

cor(evals$bty_avg, evals$bty_f1lower)
## [1] 0.8439112
plot(evals[,13:19])

m_bty_gen <- lm(score ~ bty_avg + gender, data = evals)
summary(m_bty_gen)
## 
## Call:
## lm(formula = score ~ bty_avg + gender, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8305 -0.3625  0.1055  0.4213  0.9314 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.74734    0.08466  44.266  < 2e-16 ***
## bty_avg      0.07416    0.01625   4.563 6.48e-06 ***
## gendermale   0.17239    0.05022   3.433 0.000652 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5287 on 460 degrees of freedom
## Multiple R-squared:  0.05912,    Adjusted R-squared:  0.05503 
## F-statistic: 14.45 on 2 and 460 DF,  p-value: 8.177e-07
plot(m_bty_gen)

Exercise 8

As I stated above, the addition of gender to the model does not add much, if any, explanatory value.

multiLines(m_bty_gen)

Exercise 9

The equation for males would be Average Score = 3.747 + 0.074 + 0.172*1 = 3.993 But then why is the mean score for males = 4.234?

males <- filter(evals, gender == "male")
mean(males$score)
## [1] 4.234328
males2 = subset(evals, gender == "male")

Exercise 10

In this model, rank seems to lower the average score. bty_avg still is positive, but the two rank levels shown have negative values and have lower, though still significant, p-values.

m_bty_rank <- lm(score ~ bty_avg + rank, data = evals)
summary(m_bty_rank)
## 
## Call:
## lm(formula = score ~ bty_avg + rank, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8713 -0.3642  0.1489  0.4103  0.9525 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       3.98155    0.09078  43.860  < 2e-16 ***
## bty_avg           0.06783    0.01655   4.098 4.92e-05 ***
## ranktenure track -0.16070    0.07395  -2.173   0.0303 *  
## ranktenured      -0.12623    0.06266  -2.014   0.0445 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5328 on 459 degrees of freedom
## Multiple R-squared:  0.04652,    Adjusted R-squared:  0.04029 
## F-statistic: 7.465 on 3 and 459 DF,  p-value: 6.88e-05
plot(m_bty_rank)

Exercise 11 - The full model

I would expect variables concerning the photograph would have the strongest predictive value. That’s the variable used to determine beauty.

m_full <- lm(score ~ rank + ethnicity + gender + language + age + cls_perc_eval
+ cls_students + cls_level + cls_profs + cls_credits + bty_avg
+ pic_outfit + pic_color, data = evals)
summary(m_full)
## 
## Call:
## lm(formula = score ~ rank + ethnicity + gender + language + age + 
##     cls_perc_eval + cls_students + cls_level + cls_profs + cls_credits + 
##     bty_avg + pic_outfit + pic_color, data = evals)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.77397 -0.32432  0.09067  0.35183  0.95036 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            4.0952141  0.2905277  14.096  < 2e-16 ***
## ranktenure track      -0.1475932  0.0820671  -1.798  0.07278 .  
## ranktenured           -0.0973378  0.0663296  -1.467  0.14295    
## ethnicitynot minority  0.1234929  0.0786273   1.571  0.11698    
## gendermale             0.2109481  0.0518230   4.071 5.54e-05 ***
## languagenon-english   -0.2298112  0.1113754  -2.063  0.03965 *  
## age                   -0.0090072  0.0031359  -2.872  0.00427 ** 
## cls_perc_eval          0.0053272  0.0015393   3.461  0.00059 ***
## cls_students           0.0004546  0.0003774   1.205  0.22896    
## cls_levelupper         0.0605140  0.0575617   1.051  0.29369    
## cls_profssingle       -0.0146619  0.0519885  -0.282  0.77806    
## cls_creditsone credit  0.5020432  0.1159388   4.330 1.84e-05 ***
## bty_avg                0.0400333  0.0175064   2.287  0.02267 *  
## pic_outfitnot formal  -0.1126817  0.0738800  -1.525  0.12792    
## pic_colorcolor        -0.2172630  0.0715021  -3.039  0.00252 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.498 on 448 degrees of freedom
## Multiple R-squared:  0.1871, Adjusted R-squared:  0.1617 
## F-statistic: 7.366 on 14 and 448 DF,  p-value: 6.552e-14
plot(m_full)

Exercise 12

My suspicions were incorrect. Pic_outfit & pic_color had negative values; pic_outfit was not significant. The variable with the greatest value and a significance level of 0.0 was the credits of the class; a one-credit class like PE was the variable w/ the strongest predictive value. Conclusion: if you want a good rating, teach PE, not biostats. :-))

Exercise 13

The ethnicity variable has 2 values: not minority and minority. Minority is built into the intercept. The value of not minority is 0.123, but is not statistically significant.

Exercise 14

The variable w/ the highest p-value is cls_profs, the # of professors teaching single or multiple sections of a course. The p-value for that variable is 0.778. Dropping that variable raised the adjusted R-squared value by 0.17 points. Overall, this model is not very explanatory.

m_full_14 <- lm(score ~ rank + ethnicity + gender + language + age + cls_perc_eval
+ cls_students + cls_level  + cls_credits + bty_avg
+ pic_outfit + pic_color, data = evals)
summary(m_full_14)
## 
## Call:
## lm(formula = score ~ rank + ethnicity + gender + language + age + 
##     cls_perc_eval + cls_students + cls_level + cls_credits + 
##     bty_avg + pic_outfit + pic_color, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7836 -0.3257  0.0859  0.3513  0.9551 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            4.0872523  0.2888562  14.150  < 2e-16 ***
## ranktenure track      -0.1476746  0.0819824  -1.801 0.072327 .  
## ranktenured           -0.0973829  0.0662614  -1.470 0.142349    
## ethnicitynot minority  0.1274458  0.0772887   1.649 0.099856 .  
## gendermale             0.2101231  0.0516873   4.065 5.66e-05 ***
## languagenon-english   -0.2282894  0.1111305  -2.054 0.040530 *  
## age                   -0.0089992  0.0031326  -2.873 0.004262 ** 
## cls_perc_eval          0.0052888  0.0015317   3.453 0.000607 ***
## cls_students           0.0004687  0.0003737   1.254 0.210384    
## cls_levelupper         0.0606374  0.0575010   1.055 0.292200    
## cls_creditsone credit  0.5061196  0.1149163   4.404 1.33e-05 ***
## bty_avg                0.0398629  0.0174780   2.281 0.023032 *  
## pic_outfitnot formal  -0.1083227  0.0721711  -1.501 0.134080    
## pic_colorcolor        -0.2190527  0.0711469  -3.079 0.002205 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4974 on 449 degrees of freedom
## Multiple R-squared:  0.187,  Adjusted R-squared:  0.1634 
## F-statistic: 7.943 on 13 and 449 DF,  p-value: 2.336e-14
plot(m_full_14)

Exercise 15

In the last run, the cls_level had the highest p-value of 0.292. Removing it actually reduced the adjusted R-squared value by 0.02. This is not helpful.

m_full_15a <- lm(score ~ rank + ethnicity + gender + language + age + cls_perc_eval
+ cls_students   + cls_credits + bty_avg
+ pic_outfit + pic_color, data = evals)
summary(m_full_15a)
## 
## Call:
## lm(formula = score ~ rank + ethnicity + gender + language + age + 
##     cls_perc_eval + cls_students + cls_credits + bty_avg + pic_outfit + 
##     pic_color, data = evals)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7761 -0.3187  0.0875  0.3547  0.9367 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            4.0856255  0.2888881  14.143  < 2e-16 ***
## ranktenure track      -0.1420696  0.0818201  -1.736 0.083184 .  
## ranktenured           -0.0895940  0.0658566  -1.360 0.174372    
## ethnicitynot minority  0.1424342  0.0759800   1.875 0.061491 .  
## gendermale             0.2037722  0.0513416   3.969 8.40e-05 ***
## languagenon-english   -0.2093185  0.1096785  -1.908 0.056966 .  
## age                   -0.0087287  0.0031224  -2.795 0.005404 ** 
## cls_perc_eval          0.0053545  0.0015306   3.498 0.000515 ***
## cls_students           0.0003573  0.0003585   0.997 0.319451    
## cls_creditsone credit  0.4733728  0.1106549   4.278 2.31e-05 ***
## bty_avg                0.0410340  0.0174449   2.352 0.019092 *  
## pic_outfitnot formal  -0.1172152  0.0716857  -1.635 0.102722    
## pic_colorcolor        -0.1973196  0.0681052  -2.897 0.003948 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4975 on 450 degrees of freedom
## Multiple R-squared:  0.185,  Adjusted R-squared:  0.1632 
## F-statistic:  8.51 on 12 and 450 DF,  p-value: 1.275e-14
plot(m_full_15a)

Another try

cls_students had the highest p-value in the last run. Removing it gives an adjusted R-squared value of 0.1632, the same as before.

m_full_15b <- lm(score ~ rank + ethnicity + gender + language + age + cls_perc_eval
   + cls_credits + bty_avg
+ pic_outfit + pic_color, data = evals)
summary(m_full_15b)
## 
## Call:
## lm(formula = score ~ rank + ethnicity + gender + language + age + 
##     cls_perc_eval + cls_credits + bty_avg + pic_outfit + pic_color, 
##     data = evals)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.78424 -0.31397  0.09261  0.35904  0.92154 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            4.152893   0.280892  14.785  < 2e-16 ***
## ranktenure track      -0.142239   0.081819  -1.738 0.082814 .  
## ranktenured           -0.083092   0.065532  -1.268 0.205469    
## ethnicitynot minority  0.143509   0.075972   1.889 0.059535 .  
## gendermale             0.208080   0.051159   4.067 5.61e-05 ***
## languagenon-english   -0.222515   0.108876  -2.044 0.041558 *  
## age                   -0.009074   0.003103  -2.924 0.003629 ** 
## cls_perc_eval          0.004841   0.001441   3.359 0.000849 ***
## cls_creditsone credit  0.472669   0.110652   4.272 2.37e-05 ***
## bty_avg                0.043578   0.017257   2.525 0.011903 *  
## pic_outfitnot formal  -0.136594   0.068998  -1.980 0.048347 *  
## pic_colorcolor        -0.189905   0.067697  -2.805 0.005246 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4975 on 451 degrees of freedom
## Multiple R-squared:  0.1832, Adjusted R-squared:  0.1632 
## F-statistic: 9.193 on 11 and 451 DF,  p-value: 6.364e-15
plot(m_full_15b)

Summary

Thanks for the offer of extra credit, but I don’t have time to do any more than this. I’m not sure removing more variables would improve things; the adjusted R-squared value seems to hover around 0.163. I didn’t read the article, so I don’t know what conclusions the authors reached, but given what I’ve found, this project seems like an interesting academic exercise, but it’s not terribly informative. From what little I’ve read about student evaluations of professors, they are all sort of arbitrary. If you look at ratemyprofessors.com, it seems many evaluations are either very high or very low. Profs in the middle of the pack are not evaluated - they don’t generate much emotion, one way or the other.

But this was a fun exercise that allowed us to see how multiple regression works.