Anova & Ancova

Salaries - extra task

The Salaries dataset from the carData package contains the nine-month academic salaries of Assistant Professors, Associate Professors and Professors at a college in the U.S. The data were collected as part of the college administration's ongoing effort to monitor salary differences between male and female faculty members.

  1. Perform a three-way Anova with and without interactions. Interpret the results.

Descriptive statistics

Basic statistics for groups by three factors
rank discipline sex variable n mean sd
AsstProf A Female salary 6 72933 5463
AsstProf B Female salary 5 84190 9792
AssocProf A Female salary 4 72129 6403
AssocProf B Female salary 6 99436 14086
Prof A Female salary 8 109632 15095
Prof B Female salary 10 131836 17504
AsstProf A Male salary 18 74270 4580
AsstProf B Male salary 38 84647 6900
AssocProf A Male salary 22 85049 10612
AssocProf B Male salary 32 101622 9608
Prof A Male salary 123 120619 28505
Prof B Male salary 125 133518 26514
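A table of this shape can be produced with a grouped summary. The following is a minimal sketch that assumes the dplyr and rstatix packages are installed (the report itself names only carData); the object name `desc` is illustrative.

```r
library(dplyr)
library(rstatix)

# Salaries ships with the carData package
data("Salaries", package = "carData")

# Count, mean and standard deviation of salary in each
# rank x discipline x sex cell
desc <- Salaries %>%
  group_by(rank, discipline, sex) %>%
  get_summary_stats(salary, type = "mean_sd")

print(desc)
```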

Assumptions

Outliers

Eighteen outliers were identified within the rank by discipline by sex groups, three of them extreme.

rank discipline sex yrs.since.phd yrs.service salary is.outlier is.extreme
AssocProf B Female 14 7 109650 TRUE TRUE
AssocProf B Female 12 9 71065 TRUE TRUE
AsstProf A Male 2 0 85000 TRUE TRUE
AsstProf A Female 7 6 63100 TRUE FALSE
AssocProf A Female 25 22 62884 TRUE FALSE
AsstProf A Male 3 1 63900 TRUE FALSE
AsstProf A Male 8 4 81035 TRUE FALSE
AssocProf A Male 14 8 100102 TRUE FALSE
AssocProf A Male 9 7 70000 TRUE FALSE
AssocProf A Male 11 1 104800 TRUE FALSE
AssocProf A Male 45 39 70700 TRUE FALSE
AssocProf A Male 10 1 108413 TRUE FALSE
AssocProf A Male 11 8 104121 TRUE FALSE
Prof A Male 29 7 204000 TRUE FALSE
Prof A Male 42 18 194800 TRUE FALSE
Prof A Male 43 43 205500 TRUE FALSE
AssocProf B Male 13 11 126431 TRUE FALSE
Prof B Male 38 38 231545 TRUE FALSE
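The `is.outlier` and `is.extreme` columns suggest a boxplot-rule check within each group. A hedged sketch, assuming rstatix's `identify_outliers()` was the tool (the report does not say so); `outliers` is an illustrative name.

```r
library(dplyr)
library(rstatix)

data("Salaries", package = "carData")

# Flag values beyond 1.5 * IQR (outlier) or 3 * IQR (extreme)
# from the quartiles, separately in each cell of the design
outliers <- Salaries %>%
  group_by(rank, discipline, sex) %>%
  identify_outliers(salary)

print(outliers)
```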

Normality

Shapiro-Wilk test and a Quantile-Quantile plot.
variable statistic p
salary 0.96 0

Testing normality by all 3 factors with Shapiro-Wilk test:
rank discipline sex variable statistic p
Prof A Male salary 0.952 0.000
AssocProf B Female salary 0.635 0.001
AssocProf A Male salary 0.878 0.011
Prof B Male salary 0.978 0.044
AsstProf B Male salary 0.941 0.046
AsstProf A Female salary 0.870 0.226
AssocProf A Female salary 0.863 0.269
AsstProf A Male salary 0.941 0.300
AsstProf B Female salary 0.889 0.354
AssocProf B Male salary 0.967 0.416
Prof A Female salary 0.934 0.549
Prof B Female salary 0.974 0.923

Visualising normality violations by sex, rank and discipline:

Normality is rejected for the pooled salaries and in five of the twelve groups (p < 0.05), so the salaries are log-transformed.
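The transformation and the repeated normality checks might look as follows. This is a sketch under the assumption that the natural logarithm was used (consistent with values such as log(109650) ≈ 11.6 in the tables below) and that rstatix provided the tests; `sal_log`, `overall` and `by_group` are illustrative names.

```r
library(dplyr)
library(rstatix)

data("Salaries", package = "carData")

# Replace salary by its natural logarithm
sal_log <- transform(Salaries, salary = log(salary))

# Shapiro-Wilk test on the pooled log-salaries
overall <- sal_log %>% shapiro_test(salary)

# Shapiro-Wilk test within each rank x discipline x sex cell,
# smallest p-values first
by_group <- sal_log %>%
  group_by(rank, discipline, sex) %>%
  shapiro_test(salary) %>%
  arrange(p)

print(overall)
print(by_group)
```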

Data transformation

Outliers

After the log transformation, seventeen outliers remain, three of them extreme.

rank discipline sex yrs.since.phd yrs.service salary is.outlier is.extreme
AssocProf B Female 14 7 11.6 TRUE TRUE
AssocProf B Female 12 9 11.2 TRUE TRUE
AsstProf A Male 3 1 11.1 TRUE TRUE
AsstProf A Female 7 6 11.1 TRUE FALSE
AssocProf A Female 25 22 11.0 TRUE FALSE
AsstProf A Male 2 0 11.3 TRUE FALSE
AsstProf A Male 8 4 11.3 TRUE FALSE
AssocProf A Male 14 8 11.5 TRUE FALSE
AssocProf A Male 9 7 11.2 TRUE FALSE
AssocProf A Male 11 1 11.6 TRUE FALSE
AssocProf A Male 45 39 11.2 TRUE FALSE
AssocProf A Male 10 1 11.6 TRUE FALSE
AssocProf A Male 11 8 11.6 TRUE FALSE
Prof A Male 51 51 11.0 TRUE FALSE
AssocProf B Male 13 11 11.7 TRUE FALSE
Prof B Male 38 38 12.4 TRUE FALSE
Prof B Male 46 45 11.1 TRUE FALSE

The extreme one is for female no. 57 with high rank, treated with X.

Normality
Shapiro-Wilk test and a Quantile-Quantile plot.
variable statistic p
salary 0.99 0.01

Testing normality by all 3 factors with Shapiro-Wilk test:
rank discipline sex variable statistic p
AssocProf B Female salary 0.611 0.001
AsstProf B Male salary 0.931 0.022
AssocProf A Male salary 0.904 0.036
AsstProf A Female salary 0.852 0.164
AssocProf A Female salary 0.847 0.216
Prof A Male salary 0.986 0.235
AsstProf A Male salary 0.941 0.302
AsstProf B Female salary 0.896 0.387
Prof A Female salary 0.936 0.575
Prof B Male salary 0.991 0.639
AssocProf B Male salary 0.980 0.789
Prof B Female salary 0.976 0.939

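Visualising normality violations by sex, rank and discipline:

Normality is still rejected for the pooled log-salaries (p = 0.01) and in three of the twelve groups, but the analysis proceeds regardless.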

Homogeneity of variance
df1 df2 statistic p
11 385 7.09 0

Levene's test above is significant (p < 0.001), so the homogeneity of variance across the twelve rank by discipline by sex groups is violated.

The plot of residuals versus fitted values also indicates that the variances are not homogeneous.
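A sketch of the homogeneity check, assuming it was run on the log-transformed salaries with rstatix's `levene_test()`; the degrees of freedom (11 and 385) follow from the twelve cells and 397 observations. `sal_log`, `lev` and `fit` are illustrative names.

```r
library(rstatix)

data("Salaries", package = "carData")
sal_log <- transform(Salaries, salary = log(salary))

# Levene's test across all twelve cells of the three-way design
lev <- levene_test(sal_log, salary ~ rank * discipline * sex)
print(lev)

# Residuals versus fitted values for the full factorial model
fit <- lm(salary ~ rank * discipline * sex, data = sal_log)
plot(fit, which = 1)
```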

Anova

Simple 3-way

## Analysis of Variance Table
## 
## Response: salary
##             Df Sum Sq Mean Sq F value               Pr(>F)    
## sex          1   0.58    0.58    17.7            0.0000314 ***
## discipline   1   0.81    0.81    24.5            0.0000011 ***
## rank         2  12.49    6.25   189.5 < 0.0000000000000002 ***
## Residuals  392  12.92    0.03                                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

All interactions

## Analysis of Variance Table
## 
## Response: salary
##                      Df Sum Sq Mean Sq F value               Pr(>F)    
## sex                   1   0.58    0.58   17.67            0.0000327 ***
## discipline            1   0.81    0.81   24.39            0.0000012 ***
## rank                  2  12.49    6.25  188.75 < 0.0000000000000002 ***
## sex:discipline        1   0.05    0.05    1.58                 0.21    
## sex:rank              2   0.03    0.01    0.40                 0.67    
## discipline:rank       2   0.08    0.04    1.25                 0.29    
## sex:discipline:rank   2   0.02    0.01    0.24                 0.79    
## Residuals           385  12.74    0.03                                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

None of the interactions is significant; only the main effects of sex, discipline and rank are.
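The two tables above have the layout of base R's sequential `anova()` output. A minimal sketch that would produce tables of this form, assuming the response is the log-transformed salary and the terms are entered in the order sex, discipline, rank; `fit_main` and `fit_full` are illustrative names.

```r
data("Salaries", package = "carData")
sal_log <- transform(Salaries, salary = log(salary))

# Main effects only
fit_main <- lm(salary ~ sex + discipline + rank, data = sal_log)

# Main effects plus all two-way and three-way interactions
fit_full <- lm(salary ~ sex * discipline * rank, data = sal_log)

# Sequential (type I) ANOVA tables
print(anova(fit_main))
print(anova(fit_full))
```

Calling `anova(fit_main, fit_full)` would additionally test all interaction terms jointly.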

Post-hoc tests

Two-way interactions

The analysis for each case of sex.

rank      sex    Effect     DFn DFd F     p     p<.05 ges
AsstProf  Female discipline 1   385 1.63  0.202       0.004
AssocProf Female discipline 1   385 7.16  0.008 *     0.018
Prof      Female discipline 1   385 4.58  0.033 *     0.012
AsstProf  Male   discipline 1   385 6.16  0.013 *     0.016
AssocProf Male   discipline 1   385 12.88 0.000 *     0.032
Prof      Male   discipline 1   385 22.13 0.000 *     0.054

Main effects

The analysis for each case of sex and rank.

rank      sex    Effect     DFn DFd F     p     p<.05 ges
AsstProf  Female discipline 1   385 1.63  0.202       0.004
AssocProf Female discipline 1   385 7.16  0.008 *     0.018
Prof      Female discipline 1   385 4.58  0.033 *     0.012
AsstProf  Male   discipline 1   385 6.16  0.013 *     0.016
AssocProf Male   discipline 1   385 12.88 0.000 *     0.032
Prof      Male   discipline 1   385 22.13 0.000 *     0.054

Pairwise comparisons

Estimated Marginal Means with Bonferroni correction:
sex rank term .y. group1 group2 df statistic p p.adj p.adj.signif
Female AssocProf discipline salary A B 385 -2.68 0.008 0.008 **
Female AsstProf discipline salary A B 385 -1.28 0.202 0.202 ns
Female Prof discipline salary A B 385 -2.14 0.033 0.033 *
Male AssocProf discipline salary A B 385 -3.59 0.000 0.000 ***
Male AsstProf discipline salary A B 385 -2.48 0.013 0.013 *
Male Prof discipline salary A B 385 -4.70 0.000 0.000 ****
Pairwise by discipline:
.y. group1 group2 n1 n2 p p.signif p.adj p.adj.signif
salary A B 181 216 0 *** 0 ***
Pairwise by rank:
.y. group1 group2 n1 n2 p p.signif p.adj p.adj.signif
salary AsstProf AssocProf 67 64 0 **** 0 ****
salary AsstProf Prof 67 266 0 **** 0 ****
salary AssocProf Prof 64 266 0 **** 0 ****

Mean salaries differ significantly between every pair of ranks (adjusted p-values round to 0).

Pairwise by sex:
.y. group1 group2 n1 n2 p p.signif p.adj p.adj.signif
salary Female Male 39 358 0.003 ** 0.003 **
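The simple-effect and pairwise tables in this section can be sketched as below. The column layouts match rstatix's `anova_test()`, `emmeans_test()` and `pairwise_t_test()`, but that attribution is an assumption, as is the use of the full factorial model for the error term (suggested by DFd = 385). `emmeans_test()` also needs the emmeans package installed. All object names are illustrative.

```r
library(dplyr)
library(rstatix)

data("Salaries", package = "carData")
sal_log <- transform(Salaries, salary = log(salary))

# Full model supplies the pooled error term
model <- lm(salary ~ sex * discipline * rank, data = sal_log)

# Effect of discipline at each combination of rank and sex
simple <- sal_log %>%
  group_by(rank, sex) %>%
  anova_test(salary ~ discipline, error = model)

# Estimated marginal means, A versus B, Bonferroni-adjusted
pwc <- sal_log %>%
  group_by(sex, rank) %>%
  emmeans_test(salary ~ discipline,
               p.adjust.method = "bonferroni", model = model)

# Pairwise t-tests on a single factor, here rank
by_rank <- sal_log %>%
  pairwise_t_test(salary ~ rank, p.adjust.method = "bonferroni")

print(simple)
print(pwc)
print(by_rank)
```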

Ancova

  1. Are years since doctorate (yrs.since.phd) and length of service (yrs.service) significant as covariates?

Independence of yrs.since.phd

## Analysis of Variance Table
## 
## Response: yrs.since.phd
##            Df Sum Sq Mean Sq F value Pr(>F)   
## sex         1   1456    1456    8.94  0.003 **
## Residuals 395  64310     163                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Analysis of Variance Table
## 
## Response: yrs.since.phd
##            Df Sum Sq Mean Sq F value              Pr(>F)    
## rank        2  32390   16195     191 <0.0000000000000002 ***
## Residuals 394  33376      85                                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Analysis of Variance Table
## 
## Response: yrs.since.phd
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## discipline   1   3128    3128    19.7 0.000012 ***
## Residuals  395  62638     159                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Independence of yrs.service

## Analysis of Variance Table
## 
## Response: yrs.service
##            Df Sum Sq Mean Sq F value Pr(>F)   
## sex         1   1583    1583    9.56 0.0021 **
## Residuals 395  65403     166                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Analysis of Variance Table
## 
## Response: yrs.service
##            Df Sum Sq Mean Sq F value              Pr(>F)    
## rank        2  24812   12406     116 <0.0000000000000002 ***
## Residuals 394  42175     107                                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Analysis of Variance Table
## 
## Response: yrs.service
##             Df Sum Sq Mean Sq F value Pr(>F)    
## discipline   1   1815    1815      11  0.001 ***
## Residuals  395  65171     165                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
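The six independence checks above are one-way ANOVAs of each covariate on each factor. A compact sketch of how they could be run in base R; the loop variable and list names are illustrative.

```r
data("Salaries", package = "carData")

covariates <- c("yrs.since.phd", "yrs.service")
factors    <- c("sex", "rank", "discipline")

# One-way ANOVA of every covariate on every factor
checks <- list()
for (cv in covariates) {
  for (f in factors) {
    fit <- lm(reformulate(f, response = cv), data = Salaries)
    checks[[paste(cv, f, sep = " ~ ")]] <- anova(fit)
  }
}

print(checks)
```

A significant F in any of these tables means the covariate is not independent of that factor, which complicates the interpretation of the adjusted group effects.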

yrs.since.phd covariate

Effect              SSn      SSd  DFn DFd F          p     p<.05 ges
(Intercept)         8087.495 12.7 1   384 243864.818 0.000 *     0.998
yrs.since.phd       0.008    12.7 1   384 0.228      0.633       0.001
sex                 0.074    12.7 1   384 2.242      0.135       0.006
rank                4.092    12.7 2   384 61.697     0.000 *     0.243
discipline          0.924    12.7 1   384 27.852     0.000 *     0.068
sex:rank            0.029    12.7 2   384 0.435      0.648       0.002
sex:discipline      0.041    12.7 1   384 1.250      0.264       0.003
rank:discipline     0.067    12.7 2   384 1.014      0.364       0.005
sex:rank:discipline 0.015    12.7 2   384 0.231      0.794       0.001
Sum Sq Df F value Pr(>F)
(Intercept) 748.931 1 22582.766 0.000
yrs.since.phd 0.008 1 0.228 0.633
sex 0.002 1 0.051 0.822
rank 0.737 2 11.110 0.000
discipline 0.055 1 1.647 0.200
sex:rank 0.038 2 0.580 0.561
sex:discipline 0.000 1 0.011 0.915
rank:discipline 0.041 2 0.612 0.543
sex:rank:discipline 0.015 2 0.231 0.794
Residuals 12.735 384 NA NA
yrs.since.phd sex emmean se df conf.low conf.high method
22.3 Female 11.5 0.038 394 11.5 11.6 Emmeans test
22.3 Male 11.6 0.012 394 11.6 11.6 Emmeans test
term .y. group1 group2 df statistic p p.adj p.adj.signif
yrs.since.phd*sex salary Female Male 394 -1.88 0.061 0.061 ns

yrs.service covariate

Effect              SSn       SSd  DFn DFd F          p     p<.05 ges
(Intercept)         11425.097 12.7 1   384 346748.441 0.000 *     0.999
yrs.service         0.090     12.7 1   384 2.731      0.099       0.007
sex                 0.082     12.7 1   384 2.474      0.117       0.006
rank                4.887     12.7 2   384 74.158     0.000 *     0.279
discipline          0.918     12.7 1   384 27.858     0.000 *     0.068
sex:rank            0.028     12.7 2   384 0.428      0.652       0.002
sex:discipline      0.042     12.7 1   384 1.262      0.262       0.003
rank:discipline     0.061     12.7 2   384 0.931      0.395       0.005
sex:rank:discipline 0.015     12.7 2   384 0.221      0.802       0.001
Sum Sq Df F value Pr(>F)
(Intercept) 751.754 1 22815.511 0.000
yrs.service 0.090 1 2.731 0.099
sex 0.002 1 0.047 0.828
rank 0.774 2 11.753 0.000
discipline 0.054 1 1.642 0.201
sex:rank 0.038 2 0.572 0.565
sex:discipline 0.000 1 0.009 0.923
rank:discipline 0.036 2 0.549 0.578
sex:rank:discipline 0.015 2 0.221 0.802
Residuals 12.653 384 NA NA

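Neither covariate is significant at the 0.05 level once sex, rank and discipline are in the model (yrs.since.phd: p = 0.633; yrs.service: p = 0.099). Both covariates also differ significantly across sex, rank and discipline, so they are not independent of the factors.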
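An Ancova of this kind can be sketched by adding the covariate ahead of the factorial terms. The sum-of-squares type behind the tables above is not stated in the report, so the sketch keeps rstatix's default and may not reproduce the figures exactly; `sal_log`, `res_phd` and `res_service` are illustrative names.

```r
library(rstatix)

data("Salaries", package = "carData")
sal_log <- transform(Salaries, salary = log(salary))

# Covariate first, then the three factors with all interactions
res_phd <- anova_test(
  sal_log, salary ~ yrs.since.phd + sex * rank * discipline
)
res_service <- anova_test(
  sal_log, salary ~ yrs.service + sex * rank * discipline
)

tab_phd     <- get_anova_table(res_phd)
tab_service <- get_anova_table(res_service)

print(tab_phd)
print(tab_service)
```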

MANOVA

  1. Is there any significant difference in years since PhD (yrs.since.phd) and seniority (yrs.service) of different rank professors?

Descriptive analysis

rank variable n mean sd
AsstProf yrs.service 67 2.37 1.50
AsstProf yrs.since.phd 67 5.10 2.54
AssocProf yrs.since.phd 64 15.45 9.65
AssocProf yrs.service 64 11.95 10.10
Prof yrs.since.phd 266 28.30 10.11
Prof yrs.service 266 22.82 11.59

Normality

Univariate Shapiro-Wilk test
rank variable statistic p
AsstProf yrs.service 0.934 0.001
AssocProf yrs.service 0.691 0.000
Prof yrs.service 0.978 0.000
AsstProf yrs.since.phd 0.936 0.002
AssocProf yrs.since.phd 0.727 0.000
Prof yrs.since.phd 0.971 0.000

Normality is rejected in all six groups (p < 0.05).

Only the AsstProf values and some of the Prof values come close to a normal distribution.

Shapiro-Wilk test for multivariate normality
statistic p.value
0.877 0

Multivariate normality violated

Pearson’s multicollinearity
var1 var2 cor statistic p conf.low conf.high method
yrs.since.phd yrs.service 0.91 43.5 0 0.891 0.925 Pearson

The two outcomes are strongly correlated (r = 0.91), which indicates multicollinearity, although not an extreme case.

Homogeneity:
variable df1 df2 statistic p
yrs.service 2 394 39.5 0
yrs.since.phd 2 394 35.2 0
statistic p.value parameter method
265 0 6 Box’s M-test for Homogeneity of Covariance Matrices

Both tests are significant: the variances (Levene's test) and the covariance matrices (Box's M-test) are heterogeneous across ranks.
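The Manova assumption checks above can be sketched with rstatix, whose output columns match the tables shown; that the report used these functions is an assumption, and the object names are illustrative.

```r
library(rstatix)

data("Salaries", package = "carData")
outcomes <- Salaries[, c("yrs.since.phd", "yrs.service")]

# Multivariate Shapiro-Wilk test on the two outcomes
mvn <- mshapiro_test(outcomes)

# Pearson correlation between the outcomes (multicollinearity check)
ct <- cor_test(Salaries, yrs.since.phd, yrs.service)

# Levene's test for each outcome across ranks
lev_phd     <- levene_test(Salaries, yrs.since.phd ~ rank)
lev_service <- levene_test(Salaries, yrs.service ~ rank)

# Box's M-test for equality of covariance matrices across ranks
box <- box_m(outcomes, Salaries$rank)

print(mvn)
print(ct)
print(lev_phd)
print(lev_service)
print(box)
```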

Manova

## 
## Type II MANOVA Tests: Pillai test statistic
##      Df test stat approx F num Df den Df              Pr(>F)    
## rank  2     0.499     65.4      4    788 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Rank has a significant multivariate effect on the two outcomes (Pillai's trace = 0.499, p < 0.001).
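The header "Type II MANOVA Tests: Pillai test statistic" is what `car::Manova()` prints, so the test was presumably run along these lines. A minimal sketch, assuming the car package is installed; `fit` and `res` are illustrative names.

```r
data("Salaries", package = "carData")

# Multivariate linear model: both outcomes regressed on rank
fit <- lm(cbind(yrs.since.phd, yrs.service) ~ rank, data = Salaries)

# Type II Manova with Pillai's trace
res <- car::Manova(fit, test.statistic = "Pillai")
print(res)
```

Pillai's trace is commonly preferred when covariance matrices are unequal, as they are here.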

Post-hoc

One-way Welch Anova, which does not assume equal variances:
variable .y. n statistic DFn DFd p method
yrs.service value 397 407 2 143 0 Welch ANOVA
yrs.since.phd value 397 568 2 151 0 Welch ANOVA
Pairwise comparisons
variables .y. group1 group2 p.adj p.adj.signif
yrs.service value AsstProf AssocProf 0 ****
yrs.service value AsstProf Prof 0 ****
yrs.service value AssocProf Prof 0 ****
yrs.since.phd value AsstProf AssocProf 0 ****
yrs.since.phd value AsstProf Prof 0 ****
yrs.since.phd value AssocProf Prof 0 ****

Significant difference between ranks confirmed
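A sketch of the post-hoc step: a Welch Anova for each outcome, followed by pairwise comparisons. The report does not name the pairwise procedure; Games-Howell is used here as an assumption because it also tolerates unequal variances. tidyr is assumed for reshaping, and the object names are illustrative.

```r
library(dplyr)
library(rstatix)

data("Salaries", package = "carData")

# Long format: one row per person and outcome
long <- Salaries %>%
  select(rank, yrs.since.phd, yrs.service) %>%
  tidyr::pivot_longer(-rank, names_to = "variable", values_to = "value")

# Welch one-way Anova per outcome
welch <- long %>%
  group_by(variable) %>%
  welch_anova_test(value ~ rank)

# Games-Howell pairwise comparisons per outcome
gh <- long %>%
  group_by(variable) %>%
  games_howell_test(value ~ rank)

print(welch)
print(gh)
```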

Results

Years since PhD and length of service differ significantly between professors of different rank.