The psychology department at a hypothetical university has been accused of underpaying female faculty members. The data represent salary (in thousands of dollars) for all 22 professors in the department. This problem is from Maxwell and Delaney (2004).
q3.data <- read.csv("psych.csv")
#View(q3.data)
#str(q3.data)
# Convert char to factor for 2 independent cariables:
q3.data$sex <- factor(q3.data$sex)
q3.data$rank <- factor(q3.data$rank)
str(q3.data)
## 'data.frame': 22 obs. of 3 variables:
## $ sex : Factor w/ 2 levels "F","M": 1 1 1 1 1 1 2 2 2 2 ...
## $ rank : Factor w/ 2 levels "Assist","Assoc": 1 1 1 1 1 1 1 1 1 1 ...
## $ salary: int 33 36 35 38 42 37 39 38 40 44 ...
We have two independent variables:
1- Sex: 2 levels (F/M)
2- Rank: 2 levels (Assist/Assoc)
table(q3.data$sex)
##
## F M
## 10 12
table(q3.data$rank)
##
## Assist Assoc
## 10 12
Data of Sex and Rank are unbalanced. Thus, we must use Type I and Type III for checking results.
q3.aov.a <-aov(salary~ sex * rank, data=q3.data)
summary(q3.aov.a)
## Df Sum Sq Mean Sq F value Pr(>F)
## sex 1 155.15 155.15 17.007 0.000637 ***
## rank 1 169.82 169.82 18.616 0.000417 ***
## sex:rank 1 0.63 0.63 0.069 0.795101
## Residuals 18 164.21 9.12
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sex:
H0: there is no sex effect
H1: there exist sex effect
Rank:
H0: there is no rank effect
H1: there exist rank effect
Sex * Rank:
H0: there is no interaction effect
H1: there exist interaction effect
q3.lm.a <- lm(salary ~ sex+rank , data=q3.data)
#aov(q3.lm.a)
summary(q3.lm.a)$r.square
## [1] 0.6634627
The R Square is the percentage of variation in a response variable that is explained by the model.
q3.a.t1.1 <- aov(data=q3.data, salary ~ sex * rank)
summary(q3.a.t1.1)
## Df Sum Sq Mean Sq F value Pr(>F)
## sex 1 155.15 155.15 17.007 0.000637 ***
## rank 1 169.82 169.82 18.616 0.000417 ***
## sex:rank 1 0.63 0.63 0.069 0.795101
## Residuals 18 164.21 9.12
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
q3.a.t1.2 <- aov(data= q3.data, salary ~ rank * sex)
summary(q3.a.t1.2)
## Df Sum Sq Mean Sq F value Pr(>F)
## rank 1 252.22 252.22 27.647 5.33e-05 ***
## sex 1 72.76 72.76 7.975 0.0112 *
## rank:sex 1 0.63 0.63 0.069 0.7951
## Residuals 18 164.21 9.12
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Because Type I is sequential SS, therefore the order does matter, hence outputs are different. we add “Sex” at the first to model and next add the “rank”, thus we will use the first one (q3.a.t1.1) as below:
SS(sex): 155.15
SS(rank|sex): 169.82
Regarding to P-Values of Type I (first sex, rank next):
- there exist significant sex effect.
- there exist significant rank effect.
- there is no interaction effect between sex and rank (P_Value is very large).
Anova(q3.a.t1.1,type=3)
## Anova Table (Type III tests)
##
## Response: salary
## Sum Sq Df F value Pr(>F)
## (Intercept) 8140.2 1 892.2994 < 2e-16 ***
## sex 28.0 1 3.0711 0.09671 .
## rank 70.4 1 7.7189 0.01240 *
## sex:rank 0.6 1 0.0695 0.79510
## Residuals 164.2 18
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova(q3.a.t1.2,type=3)
## Anova Table (Type III tests)
##
## Response: salary
## Sum Sq Df F value Pr(>F)
## (Intercept) 8140.2 1 892.2994 < 2e-16 ***
## rank 70.4 1 7.7189 0.01240 *
## sex 28.0 1 3.0711 0.09671 .
## rank:sex 0.6 1 0.0695 0.79510
## Residuals 164.2 18
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# q3.a.t3.1 <- Anova(q3.a.t1.1,type=3)
# q3.a.t3.1
#
#
# q3.a.t3.2 <- Anova(q3.a.t1.2,type=3)
# q3.a.t3.2
Because Type III is partial SS, therefore the order doesn’t matter. As we can see the results, both outputs are equal.
SS(sex): 28.0
SS(rank|sex): 70.4
Regarding to P-Values:
- There is not significant sex effect.
- There exist significant rank effect.
- There is not significant interaction effect between sex and rank.
leveneTest(q3.aov.a)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 3 0.4446 0.7241
## 18
P-Value is very large, thus we don’t have enough evidence to reject the Null, therefore the variances are equal (Variances are Homogeneity)
par(mfrow=c(2,2))
plot(q3.aov.a)
##### Result:
According to QQ plot, normality assumption is reasonable, although there are some extreme values out of the line, but it’s acceptable, because ANOVA works on the semi-normal style data.
Regarding to Residuals plot, the variances are equal (Same result of Levene Test).
q3.aov.b <-aov(data= q3.data , salary~ sex + rank)
summary(q3.aov.b)
## Df Sum Sq Mean Sq F value Pr(>F)
## sex 1 155.2 155.15 17.88 0.000454 ***
## rank 1 169.8 169.82 19.57 0.000291 ***
## Residuals 19 164.8 8.68
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sex:
H0: there is no sex effect
H1: there exist sex effect
Rank:
H0: there is no rank effect
H1: there exist rank effect
q3.lm.b <- lm(salary ~ sex+rank , data=q3.data)
#aov(q3.lm.b)
summary(q3.lm.b)$r.square
## [1] 0.6634627
The R Square is the percentage of variation in a response variable that is explained by the model.
q3.b.t1 <- aov(data=q3.data, salary ~ sex + rank)
summary(q3.b.t1)
## Df Sum Sq Mean Sq F value Pr(>F)
## sex 1 155.2 155.15 17.88 0.000454 ***
## rank 1 169.8 169.82 19.57 0.000291 ***
## Residuals 19 164.8 8.68
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
SS(sex): 155.2
SS(rank|sex): 169.8
Regarding to P-Values of Type I:
- there exist significant sex effect.
- there exist significant rank effect.
Anova(q3.aov.b,type=3)
## Anova Table (Type III tests)
##
## Response: salary
## Sum Sq Df F value Pr(>F)
## (Intercept) 10227.6 1 1178.8469 < 2.2e-16 ***
## sex 72.8 1 8.3862 0.0092618 **
## rank 169.8 1 19.5743 0.0002912 ***
## Residuals 164.8 19
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# q3.a.t3 <- Anova(q3.aov.b,type=3)
# q3.a.t3
SS(sex): 72.8
SS(rank|sex): 169.8
Regarding to P-Values:
- There exist significant sex effect.
- There exist significant rank effect.
because we have not interaction, we can not run levene test, thus we will use the residuals plot.
par(mfrow=c(2,2))
plot(q3.aov.b)
##### Result:
According to QQ plot, normality assumption is reasonable, although there are some extreme values out of the line, but it’s acceptable, because ANOVA works on the semi-normal style data.
Regarding to Residuals plot, the variances are not equal (data are heterogeneous). thus we must run Welch test.
Normality assumption have been checked at the part (a and b), here we have it again:
par(mfrow=c(2,2))
plot(q3.aov.b)
According to QQ plot, normality assumption is reasonable, although there are some extreme values out of the line, but it’s acceptable, because ANOVA works on the semi-normal style data.
TukeyHSD(q3.aov.b)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = salary ~ sex + rank, data = q3.data)
##
## $sex
## diff lwr upr p adj
## M-F 5.333333 2.693648 7.973019 0.0004544
##
## $rank
## diff lwr upr p adj
## Assoc-Assist 5.377778 2.738092 8.017463 0.0004193
µ(M:Assoc) > µ(M:Assist) AND µ(M:Assoc) > µ(F:Assist)
therefore,
At least male associated have different mean, thus all males and females who are associated ranked gain more salary than others who are assistant.