Two-way ANOVA with RStudio: Two Case Studies
- 1 Butterfat content in cow breed groups and age groups
- 1.1 One-way ANOVA
- 1.2 Hypothesis Testing
- 1.3 One-way ANOVA Model Fit
- 1.4 Tukey Contrast Test: Log_Butterfat ~ Breed
- 1.5 Diagnostic check: Log_Butterfat ~ Breed
- 1.6 Tukey Contrast Test: Log_Butterfat ~ Age
- 1.7 Diagnostic check: Log_Butterfat ~ Age
- 1.8 Two-way ANOVA
- 1.9 Two-way ANOVA Model: Type II Test
- 1.10 Diagnostic check: Log_Butterfat ~ Age + Breed
- 1.11 Two-way ANOVA Model: Type III Test
- 1.12 Diagnostic check: Log_Butterfat ~ Age*Breed
- 1.13 Conclusion
- 2 Salary under factor levels of education (Degree) and gender
- Reference
1 Butterfat content in cow breed groups and age groups
- Dataset
- butterfat content of milk
- 5 cow breeds (20 cows per breed)
- cows aged 2 years (10 per breed)
- mature cows (10 per breed)
1.1 One-way ANOVA
mydataLog_Butterfat = faraway::butterfat
mydataLog_Butterfat[1] = log(mydataLog_Butterfat[1])
colnames(mydataLog_Butterfat)[1] = "Log_Butterfat"
head(mydataLog_Butterfat, 30)
## Log_Butterfat Breed Age
## 1 1.319086 Ayrshire Mature
## 2 1.388791 Ayrshire 2year
## 3 1.327075 Ayrshire Mature
## 4 1.329724 Ayrshire 2year
## 5 1.410987 Ayrshire Mature
## 6 1.401183 Ayrshire 2year
## 7 1.451614 Ayrshire Mature
## 8 1.371181 Ayrshire 2year
## 9 1.413423 Ayrshire Mature
## 10 1.446919 Ayrshire 2year
## 11 1.490654 Ayrshire Mature
## 12 1.474763 Ayrshire 2year
## 13 1.446919 Ayrshire Mature
## 14 1.311032 Ayrshire 2year
## 15 1.406097 Ayrshire Mature
## 16 1.360977 Ayrshire 2year
## 17 1.483875 Ayrshire Mature
## 18 1.413423 Ayrshire 2year
## 19 1.474763 Ayrshire Mature
## 20 1.261298 Ayrshire 2year
## 21 1.366092 Canadian Mature
## 22 1.599388 Canadian 2year
## 23 1.497388 Canadian Mature
## 24 1.453953 Canadian 2year
## 25 1.403643 Canadian Mature
## 26 1.410987 Canadian 2year
## 27 1.477049 Canadian Mature
## 28 1.381282 Canadian 2year
## 29 1.495149 Canadian Mature
## 30 1.619388 Canadian 2year
## 'data.frame': 100 obs. of 3 variables:
## $ Log_Butterfat: num 1.32 1.39 1.33 1.33 1.41 ...
## $ Breed : Factor w/ 5 levels "Ayrshire","Canadian",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Age : Factor w/ 2 levels "2year","Mature": 2 1 2 1 2 1 2 1 2 1 ...
Dataset = mydataLog_Butterfat
# numSummary() is provided by the RcmdrMisc package
with(Dataset, numSummary(Log_Butterfat, groups=Breed, statistics=c("mean", "sd")))
## mean sd data:n
## Ayrshire 1.399189 0.06506757 20
## Canadian 1.487173 0.08087717 20
## Guernsey 1.594831 0.09823049 20
## Holstein-Fresian 1.297799 0.06817269 20
## Jersey 1.660308 0.11185900 20
## mean sd data:n
## 2year 1.476169 0.1562635 50
## Mature 1.499551 0.1569660 50
1.2 Hypothesis Testing
- \(H_0:\) all group means are equal
- \(H_a:\) at least one group mean differs from the others
1.3 One-way ANOVA Model Fit
AnovaModel.21A <- aov(Log_Butterfat ~ Breed, data=Dataset)
summary(AnovaModel.21A)
## Df Sum Sq Mean Sq F value Pr(>F)
## Breed 4 1.7033 0.4258 56.65 <2e-16 ***
## Residuals 95 0.7141 0.0075
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
AnovaModel.21B <- aov(Log_Butterfat ~ Age, data=Dataset)
summary(AnovaModel.21B)
## Df Sum Sq Mean Sq F value Pr(>F)
## Age 1 0.0137 0.01367 0.557 0.457
## Residuals 98 2.4038 0.02453
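Each F value in the two tables above is the factor's mean square divided by the residual mean square. As a quick check, the following sketch recomputes both F statistics and their p-values from the printed sums of squares. The variable names and the helper `f_ratio()` are illustrative, and the inputs are the rounded table values, so the results agree with the output only to the displayed precision.

```r
# Sums of squares and degrees of freedom copied from the two ANOVA tables above
ss_breed <- 1.7033; df_breed <- 4; rss_breed <- 0.7141; df_res_breed <- 95
ss_age   <- 0.0137; df_age   <- 1; rss_age   <- 2.4038; df_res_age   <- 98

# F = (factor mean square) / (residual mean square)
f_ratio <- function(ss, df, rss, df_res) (ss / df) / (rss / df_res)

f_breed <- f_ratio(ss_breed, df_breed, rss_breed, df_res_breed)
f_age   <- f_ratio(ss_age, df_age, rss_age, df_res_age)

# Upper-tail probabilities of the F distribution
p_breed <- pf(f_breed, df_breed, df_res_breed, lower.tail = FALSE)
p_age   <- pf(f_age, df_age, df_res_age, lower.tail = FALSE)

print(round(c(F_breed = f_breed, F_age = f_age), 3))
print(signif(c(p_breed = p_breed, p_age = p_age), 3))
```

Both factors have the same total sum of squares to explain (about 2.417), so the contrast between the two F values comes entirely from how much of it each factor accounts for.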
1.4 Tukey Contrast Test: Log_Butterfat ~ Breed
library(multcomp)  # provides glht(), mcp() and cld()
local({
  .Pairs <- glht(AnovaModel.21A, linfct = mcp(Breed = "Tukey"))
  print(summary(.Pairs)) # pairwise tests
  print(confint(.Pairs)) # confidence intervals
  print(cld(.Pairs)) # compact letter display
  old.oma <- par(oma=c(0,5,0,0))
  plot(confint(.Pairs))
  par(old.oma)
})
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: aov(formula = Log_Butterfat ~ Breed, data = Dataset)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## Canadian - Ayrshire == 0 0.08798 0.02742 3.209 0.01526 *
## Guernsey - Ayrshire == 0 0.19564 0.02742 7.136 < 1e-04 ***
## Holstein-Fresian - Ayrshire == 0 -0.10139 0.02742 -3.698 0.00323 **
## Jersey - Ayrshire == 0 0.26112 0.02742 9.524 < 1e-04 ***
## Guernsey - Canadian == 0 0.10766 0.02742 3.927 0.00151 **
## Holstein-Fresian - Canadian == 0 -0.18937 0.02742 -6.907 < 1e-04 ***
## Jersey - Canadian == 0 0.17313 0.02742 6.315 < 1e-04 ***
## Holstein-Fresian - Guernsey == 0 -0.29703 0.02742 -10.834 < 1e-04 ***
## Jersey - Guernsey == 0 0.06548 0.02742 2.388 0.12757
## Jersey - Holstein-Fresian == 0 0.36251 0.02742 13.222 < 1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
##
##
## Simultaneous Confidence Intervals
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: aov(formula = Log_Butterfat ~ Breed, data = Dataset)
##
## Quantile = 2.7806
## 95% family-wise confidence level
##
##
## Linear Hypotheses:
## Estimate lwr upr
## Canadian - Ayrshire == 0 0.08798 0.01175 0.16422
## Guernsey - Ayrshire == 0 0.19564 0.11941 0.27188
## Holstein-Fresian - Ayrshire == 0 -0.10139 -0.17762 -0.02516
## Jersey - Ayrshire == 0 0.26112 0.18488 0.33735
## Guernsey - Canadian == 0 0.10766 0.03142 0.18389
## Holstein-Fresian - Canadian == 0 -0.18937 -0.26561 -0.11314
## Jersey - Canadian == 0 0.17313 0.09690 0.24937
## Holstein-Fresian - Guernsey == 0 -0.29703 -0.37327 -0.22080
## Jersey - Guernsey == 0 0.06548 -0.01076 0.14171
## Jersey - Holstein-Fresian == 0 0.36251 0.28627 0.43874
##
## Ayrshire Canadian Guernsey Holstein-Fresian
## "b" "c" "d" "a"
## Jersey
## "d"
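The standard error shared by all ten contrasts, and the simultaneous interval for any one of them, can be recovered by hand from the one-way residual mean square. The sketch below does this for "Canadian - Ayrshire", taking the critical value from the `Quantile` line of the output. Variable names are illustrative. The `qtukey()` line is only a cross-check, on the assumption that the single-step Tukey quantile for a balanced one-way layout matches the studentized-range value divided by the square root of 2, up to numerical error.

```r
# Residual mean square of Log_Butterfat ~ Breed (Sum Sq / Df from the ANOVA table)
mse <- 0.7141 / 95
n_per_breed <- 20

# Standard error of a difference between two breed means with equal group sizes
se_diff <- sqrt(mse * (1 / n_per_breed + 1 / n_per_breed))

# Simultaneous 95% interval for Canadian - Ayrshire
estimate <- 0.08798
crit <- 2.7806               # "Quantile" reported by confint() above
ci <- estimate + c(-1, 1) * crit * se_diff

# Cross-check against the classical Tukey HSD critical value
crit_tukey <- qtukey(0.95, nmeans = 5, df = 95) / sqrt(2)

print(round(c(se = se_diff, lwr = ci[1], upr = ci[2], crit_tukey = crit_tukey), 5))
```

Because every breed has 20 cows, the same standard error applies to every pair, which is why the `Std. Error` column is constant in the output.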
1.5 Diagnostic check: Log_Butterfat ~ Breed
## Analysis of Variance Table
##
## Response: Log_Butterfat
## Df Sum Sq Mean Sq F value Pr(>F)
## Breed 4 1.7033 0.42584 56.651 < 2.2e-16 ***
## Residuals 95 0.7141 0.00752
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] "65"
1.6 Tukey Contrast Test: Log_Butterfat ~ Age
local({
.Pairs <- glht(AnovaModel.21B, linfct = mcp(Age = "Tukey"))
print(summary(.Pairs)) # pairwise tests
print(confint(.Pairs)) # confidence intervals
print(cld(.Pairs)) # compact letter display
old.oma <- par(oma=c(0,5,0,0))
plot(confint(.Pairs))
par(old.oma)
})
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: aov(formula = Log_Butterfat ~ Age, data = Dataset)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## Mature - 2year == 0 0.02338 0.03132 0.746 0.457
## (Adjusted p values reported -- single-step method)
##
##
## Simultaneous Confidence Intervals
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: aov(formula = Log_Butterfat ~ Age, data = Dataset)
##
## Quantile = 1.9845
## 95% family-wise confidence level
##
##
## Linear Hypotheses:
## Estimate lwr upr
## Mature - 2year == 0 0.02338 -0.03878 0.08554
##
## 2year Mature
## "a" "a"
1.7 Diagnostic check: Log_Butterfat ~ Age
## Analysis of Variance Table
##
## Response: Log_Butterfat
## Df Sum Sq Mean Sq F value Pr(>F)
## Age 1 0.01367 0.013668 0.5572 0.4572
## Residuals 98 2.40377 0.024528
1.8 Two-way ANOVA
- Differences in Butterfat content in cow breed groups and age groups
- Fox (2015)
- Langsrud (2003)
- Herr (1986)
## Ayrshire Canadian Guernsey Holstein-Fresian Jersey
## 2year 1.375929 1.496848 1.582964 1.296779 1.628324
## Mature 1.422449 1.477498 1.606697 1.298819 1.692292
## Ayrshire Canadian Guernsey Holstein-Fresian Jersey
## 2year 0.06390070 0.1000854 0.11656340 0.05768545 0.12703137
## Mature 0.06043512 0.0598659 0.08044023 0.08050776 0.08947025
## Breed
## Age Ayrshire Canadian Guernsey Holstein-Fresian Jersey
## 2year 10 10 10 10 10
## Mature 10 10 10 10 10
1.9 Two-way ANOVA Model: Type II Test
- \(SS(\text{Age} \mid \text{Breed}) = SS(\text{Age}, \text{Breed}) - SS(\text{Breed})\) for factor Age
- \(SS(\text{Breed} \mid \text{Age}) = SS(\text{Breed}, \text{Age}) - SS(\text{Age})\) for factor Breed
- The Type II test assesses each main effect after adjusting for the other main effect, assuming there is no significant interaction effect
##
## Call:
## lm(formula = Log_Butterfat ~ Age + Breed, data = Dataset)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.22730 -0.05548 -0.01101 0.05986 0.21546
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.38750 0.02114 65.620 < 2e-16 ***
## AgeMature 0.02338 0.01726 1.354 0.178865
## BreedCanadian 0.08798 0.02730 3.223 0.001743 **
## BreedGuernsey 0.19564 0.02730 7.167 1.71e-10 ***
## BreedHolstein-Fresian -0.10139 0.02730 -3.714 0.000346 ***
## BreedJersey 0.26112 0.02730 9.566 1.54e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.08632 on 94 degrees of freedom
## Multiple R-squared: 0.7103, Adjusted R-squared: 0.6948
## F-statistic: 46.09 on 5 and 94 DF, p-value: < 2.2e-16
## Anova Table (Type II tests)
##
## Response: Log_Butterfat
## Sum Sq Df F value Pr(>F)
## Age 0.01367 1 1.8343 0.1789
## Breed 1.70334 4 57.1486 <2e-16 ***
## Residuals 0.70043 94
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
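Because every Age by Breed cell holds 10 cows, the design is balanced, and the Type II sum of squares for Age equals its one-way value. Age's F statistic still rises from 0.557 to about 1.83, because removing the breed variation shrinks the residual mean square from roughly 0.0245 to roughly 0.0075. The sketch below reproduces this arithmetic from the printed tables. The variable names are illustrative and the inputs are rounded, so the last digits can differ slightly from the output.

```r
# Values copied from the one-way and Type II tables
ss_age         <- 0.01367
ss_breed       <- 1.70334
rss_breed_only <- 0.7141    # residual SS of Log_Butterfat ~ Breed
rss_additive   <- 0.70043   # residual SS of Log_Butterfat ~ Age + Breed
df_res         <- 94

# SS(Age | Breed): the drop in residual SS when Age is added to the Breed-only model
ss_age_given_breed <- rss_breed_only - rss_additive

# Type II F statistics use the residual mean square of the additive model
mse     <- rss_additive / df_res
f_age   <- (ss_age / 1) / mse
f_breed <- (ss_breed / 4) / mse

print(round(c(SS_age_given_breed = ss_age_given_breed,
              F_age = f_age, F_breed = f_breed), 4))
```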
1.10 Diagnostic check: Log_Butterfat ~ Age + Breed
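The diagnostic output for this model is not reproduced in the text. As a hedged sketch of what such a check might look like, the code below refits the additive model and inspects the residuals. It assumes the `faraway` package is installed. The choice of the Shapiro-Wilk and Bartlett tests is an assumption for illustration, not necessarily the checks used in the original analysis, and the names `dat` and `fit_add` are hypothetical.

```r
# Assumption: the faraway package is installed (it supplies the butterfat data)
dat <- faraway::butterfat
dat$Log_Butterfat <- log(dat$Butterfat)

# Additive two-way model, as in the Type II analysis
fit_add <- lm(Log_Butterfat ~ Age + Breed, data = dat)

# Residuals-vs-fitted and normal Q-Q plots side by side
old_par <- par(mfrow = c(1, 2))
plot(fit_add, which = 1:2)
par(old_par)

# Normality of residuals (Shapiro-Wilk) and equal variance across breeds (Bartlett)
sw <- shapiro.test(residuals(fit_add))
bt <- bartlett.test(residuals(fit_add), dat$Breed)
print(sw)
print(bt)
```

Large p-values from both tests would be consistent with the normality and constant-variance assumptions behind the F tests. The plots remain the more informative check.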
1.11 Two-way ANOVA Model: Type III Test
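- \(SS(\text{Age} \mid \text{Breed}, \text{Age}\times\text{Breed}) = SS(\text{Age}, \text{Breed}, \text{Age}\times\text{Breed}) - SS(\text{Breed}, \text{Age}\times\text{Breed})\) for factor Age
- \(SS(\text{Breed} \mid \text{Age}, \text{Age}\times\text{Breed}) = SS(\text{Breed}, \text{Age}, \text{Age}\times\text{Breed}) - SS(\text{Age}, \text{Age}\times\text{Breed})\) for factor Breed
- The Type III test assesses each main effect after adjusting for the other main effect and the interaction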
# contr.Sum() is provided by the car package
AnovaModel.21D <- lm(Log_Butterfat ~ Age*Breed, data=Dataset, contrasts=list(Age ="contr.Sum", Breed ="contr.Sum"))
summary(AnovaModel.21D)
##
## Call:
## lm(formula = Log_Butterfat ~ Age * Breed, data = Dataset, contrasts = list(Age = "contr.Sum",
## Breed = "contr.Sum"))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.227129 -0.050731 -0.006887 0.053899 0.235756
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.4878599 0.0086802 171.409 < 2e-16
## Age[S.2year] -0.0116911 0.0086802 -1.347 0.181
## Breed[S.Ayrshire] -0.0886708 0.0173603 -5.108 1.81e-06
## Breed[S.Canadian] -0.0006865 0.0173603 -0.040 0.969
## Breed[S.Guernsey] 0.1069707 0.0173603 6.162 1.99e-08
## Breed[S.Holstein-Fresian] -0.1900612 0.0173603 -10.948 < 2e-16
## Age[S.2year]:Breed[S.Ayrshire] -0.0115690 0.0173603 -0.666 0.507
## Age[S.2year]:Breed[S.Canadian] 0.0213660 0.0173603 1.231 0.222
## Age[S.2year]:Breed[S.Guernsey] -0.0001757 0.0173603 -0.010 0.992
## Age[S.2year]:Breed[S.Holstein-Fresian] 0.0106713 0.0173603 0.615 0.540
##
## (Intercept) ***
## Age[S.2year]
## Breed[S.Ayrshire] ***
## Breed[S.Canadian]
## Breed[S.Guernsey] ***
## Breed[S.Holstein-Fresian] ***
## Age[S.2year]:Breed[S.Ayrshire]
## Age[S.2year]:Breed[S.Canadian]
## Age[S.2year]:Breed[S.Guernsey]
## Age[S.2year]:Breed[S.Holstein-Fresian]
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0868 on 90 degrees of freedom
## Multiple R-squared: 0.7195, Adjusted R-squared: 0.6914
## F-statistic: 25.65 on 9 and 90 DF, p-value: < 2.2e-16
## Anova Table (Type III tests)
##
## Response: Log_Butterfat
## Sum Sq Df F value Pr(>F)
## (Intercept) 221.373 1 29381.0602 <2e-16 ***
## Age 0.014 1 1.8141 0.1814
## Breed 1.703 4 56.5179 <2e-16 ***
## Age:Breed 0.022 4 0.7406 0.5668
## Residuals 0.678 90
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Single term deletions
##
## Model:
## Log_Butterfat ~ Age * Breed
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 0.67811 -479.36
## Age 1 0.01367 0.69178 -479.37 1.8141 0.1814
## Breed 4 1.70334 2.38145 -361.75 56.5179 <2e-16 ***
## Age:Breed 4 0.02232 0.70043 -484.12 0.7406 0.5668
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
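The interaction row of the tables above can be reproduced from two residual sums of squares: that of the full model and that of the additive model (the `RSS` listed in the `Age:Breed` row). The sketch below also recomputes the `AIC` column, using the formula that `extractAIC()` applies to `lm` fits, `n * log(RSS / n) + 2 * p`, where `p` is the number of regression coefficients. Variable names are illustrative.

```r
n <- 100

rss_full <- 0.67811; df_full <- 90; p_full <- 10   # Log_Butterfat ~ Age * Breed
rss_add  <- 0.70043;                p_add  <- 6    # Log_Butterfat ~ Age + Breed

# Interaction sum of squares and its F test on 4 and 90 degrees of freedom
ss_int <- rss_add - rss_full
df_int <- p_full - p_add
f_int  <- (ss_int / df_int) / (rss_full / df_full)
p_int  <- pf(f_int, df_int, df_full, lower.tail = FALSE)

# AIC as reported in the "Single term deletions" table
aic_full   <- n * log(rss_full / n) + 2 * p_full
aic_no_int <- n * log(rss_add / n) + 2 * p_add

print(round(c(SS_int = ss_int, F_int = f_int, p_int = p_int), 4))
print(round(c(AIC_full = aic_full, AIC_without_interaction = aic_no_int), 2))
```

Dropping the interaction lowers the AIC, which agrees with the non-significant F test: the additive model describes these data adequately.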
1.12 Diagnostic check: Log_Butterfat ~ Age*Breed
1.13 Conclusion
Both the one-way and the two-way ANOVA show that only breed has a statistically significant effect on log butterfat content. Neither age (p = 0.18 in the two-way models) nor the Age:Breed interaction (p = 0.57) is significant.
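Since the response is the logarithm of butterfat, a difference between two breed means back-transforms to a ratio on the original scale (a ratio of geometric means). As an illustration, the sketch below exponentiates the Tukey estimate and the simultaneous 95% limits for "Jersey - Holstein-Fresian" from section 1.4. Variable names are illustrative.

```r
# Estimate and 95% family-wise limits for Jersey - Holstein-Fresian (log scale)
log_diff <- c(estimate = 0.36251, lwr = 0.28627, upr = 0.43874)

# Back-transform: a difference of logs becomes a ratio of geometric means
ratio <- exp(log_diff)
pct_higher <- 100 * (ratio - 1)

print(round(ratio, 3))
print(round(pct_higher, 1))
```

Read this way, Jersey milk is estimated to contain roughly 1.44 times the butterfat of Holstein-Fresian milk, and the interval for the ratio lies entirely above 1.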
2 Salary under factor levels of education (Degree) and gender
2.1 One-way ANOVA Model:
mydataedu = read.table("./GENDER_EDU.txt", header=TRUE, sep="", na.strings="NA", dec=".", strip.white=TRUE)
mydataedu[1] = log(mydataedu[1] )
colnames(mydataedu)[1] = "Log_Income"
head(mydataedu, 30)
## Log_Income Degree GENDER
## 1 4.248495 HS F
## 2 4.248495 HS M
## 3 4.248495 HS F
## 4 4.248495 HS F
## 5 4.262680 HS F
## 6 4.262680 HS F
## 7 4.262680 HS F
## 8 4.262680 HS F
## 9 4.276666 HS F
## 10 4.276666 HS F
## 11 4.276666 HS F
## 12 4.276666 HS F
## 13 4.276666 HS F
## 14 4.290459 HS F
## 15 4.290459 HS F
## 16 4.304065 HS F
## 17 4.304065 HS F
## 18 4.304065 HS F
## 19 4.317488 HS F
## 20 4.317488 HS F
## 21 4.330733 BSC M
## 22 4.330733 BSC M
## 23 4.343805 BSC M
## 24 4.343805 BSC M
## 25 4.343805 BSC M
## 26 4.343805 BSC M
## 27 4.343805 BSC M
## 28 4.343805 BSC F
## 29 4.343805 BSC F
## 30 4.356709 BSC F
Dataset = mydataedu
AnovaModel.22A <- aov(Log_Income ~ Degree, data=Dataset)
summary(AnovaModel.22A)
## Df Sum Sq Mean Sq F value Pr(>F)
## Degree 3 0.7721 0.25738 24.26 3.09e-12 ***
## Residuals 115 1.2201 0.01061
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## mean sd data:n
## BSC 4.480160 0.10873569 46
## HS 4.347804 0.12532122 29
## MSC 4.545248 0.08423647 30
## PHD 4.579815 0.05716771 14
local({
.Pairs <- glht(AnovaModel.22A, linfct = mcp(Degree = "Tukey"))
print(summary(.Pairs)) # pairwise tests
print(confint(.Pairs)) # confidence intervals
print(cld(.Pairs)) # compact letter display
old.oma <- par(oma=c(0,5,0,0))
plot(confint(.Pairs))
par(old.oma)
})
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: aov(formula = Log_Income ~ Degree, data = Dataset)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## HS - BSC == 0 -0.13236 0.02442 -5.419 <0.001 ***
## MSC - BSC == 0 0.06509 0.02417 2.693 0.0390 *
## PHD - BSC == 0 0.09965 0.03144 3.170 0.0103 *
## MSC - HS == 0 0.19744 0.02682 7.361 <0.001 ***
## PHD - HS == 0 0.23201 0.03352 6.921 <0.001 ***
## PHD - MSC == 0 0.03457 0.03334 1.037 0.7243
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
##
##
## Simultaneous Confidence Intervals
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: aov(formula = Log_Income ~ Degree, data = Dataset)
##
## Quantile = 2.5974
## 95% family-wise confidence level
##
##
## Linear Hypotheses:
## Estimate lwr upr
## HS - BSC == 0 -0.132356 -0.195791 -0.068921
## MSC - BSC == 0 0.065087 0.002304 0.127871
## PHD - BSC == 0 0.099654 0.017994 0.181314
## MSC - HS == 0 0.197443 0.127774 0.267113
## PHD - HS == 0 0.232010 0.144944 0.319076
## PHD - MSC == 0 0.034567 -0.052025 0.121159
##
## BSC HS MSC PHD
## "b" "a" "c" "c"
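Unlike the butterfat data, the degree groups have unequal sizes (46, 29, 30 and 14), so each contrast has its own standard error, `sqrt(MSE * (1/n_a + 1/n_b))`. The sketch below recomputes the `Std. Error` column from the one-way residual mean square and the group sizes. Variable names are illustrative.

```r
# Residual mean square of Log_Income ~ Degree (Sum Sq / Df from the ANOVA table)
mse <- 1.2201 / 115

# Group sizes from the numSummary() output
n <- c(BSC = 46, HS = 29, MSC = 30, PHD = 14)

# All six pairs, in the same order as the Tukey table
pairs <- combn(names(n), 2)
se_all <- apply(pairs, 2, function(p) sqrt(mse * sum(1 / n[p])))
names(se_all) <- apply(pairs, 2, function(p) paste(p[2], p[1], sep = " - "))

print(round(se_all, 5))
```

The contrasts involving PHD have the largest standard errors because that group has only 14 observations.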
AnovaModel.22B <- aov(Log_Income ~ GENDER, data=Dataset)
summary(AnovaModel.22B)
## Df Sum Sq Mean Sq F value Pr(>F)
## GENDER 1 0.3729 0.3729 26.95 8.92e-07 ***
## Residuals 117 1.6193 0.0138
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## mean sd data:n
## F 4.409128 0.1356974 49
## M 4.522875 0.1032377 70
local({
.Pairs <- glht(AnovaModel.22B, linfct = mcp(GENDER = "Tukey"))
print(summary(.Pairs)) # pairwise tests
print(confint(.Pairs)) # confidence intervals
print(cld(.Pairs)) # compact letter display
old.oma <- par(oma=c(0,5,0,0))
plot(confint(.Pairs))
par(old.oma)
})
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: aov(formula = Log_Income ~ GENDER, data = Dataset)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## M - F == 0 0.11375 0.02191 5.191 8.92e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
##
##
## Simultaneous Confidence Intervals
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: aov(formula = Log_Income ~ GENDER, data = Dataset)
##
## Quantile = 1.9804
## 95% family-wise confidence level
##
##
## Linear Hypotheses:
## Estimate lwr upr
## M - F == 0 0.11375 0.07035 0.15714
##
## F M
## "a" "b"
2.2 Two-way ANOVA Model:
2.2.1 Type III Test
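- \(SS(\text{Degree} \mid \text{GENDER}, \text{Degree}\times\text{GENDER}) = SS(\text{Degree}, \text{GENDER}, \text{Degree}\times\text{GENDER}) - SS(\text{GENDER}, \text{Degree}\times\text{GENDER})\) for factor Degree
- \(SS(\text{GENDER} \mid \text{Degree}, \text{Degree}\times\text{GENDER}) = SS(\text{GENDER}, \text{Degree}, \text{Degree}\times\text{GENDER}) - SS(\text{Degree}, \text{Degree}\times\text{GENDER})\) for factor GENDER
- The Type III test assesses each main effect after adjusting for the other main effect and the interaction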
# contr.Sum() is provided by the car package
AnovaModel.22C <- lm(Log_Income ~ Degree*GENDER, data=Dataset, contrasts=list(Degree ="contr.Sum", GENDER ="contr.Sum"))
summary(AnovaModel.22C)
##
## Call:
## lm(formula = Log_Income ~ Degree * GENDER, data = Dataset, contrasts = list(Degree = "contr.Sum",
## GENDER = "contr.Sum"))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.22460 -0.05612 -0.00387 0.05983 0.33645
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.4996216 0.0115704 388.892 < 2e-16 ***
## Degree[S.BSC] -0.0247453 0.0155253 -1.594 0.11381
## Degree[S.HS] -0.1091032 0.0188408 -5.791 6.62e-08 ***
## Degree[S.MSC] 0.0566198 0.0194546 2.910 0.00436 **
## GENDER[S.F] -0.0234379 0.0115704 -2.026 0.04520 *
## Degree[S.BSC]:GENDER[S.F] -0.0008686 0.0155253 -0.056 0.95548
## Degree[S.HS]:GENDER[S.F] -0.0591426 0.0188408 -3.139 0.00217 **
## Degree[S.MSC]:GENDER[S.F] 0.0417609 0.0194546 2.147 0.03400 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.09692 on 111 degrees of freedom
## Multiple R-squared: 0.4766, Adjusted R-squared: 0.4436
## F-statistic: 14.44 on 7 and 111 DF, p-value: 3.038e-13
## Anova Table (Type III tests)
##
## Response: Log_Income
## Sum Sq Df F value Pr(>F)
## (Intercept) 1420.57 1 1.5124e+05 < 2.2e-16 ***
## Degree 0.37 3 1.3152e+01 2.073e-07 ***
## GENDER 0.04 1 4.1034e+00 0.04520 *
## Degree:GENDER 0.11 3 3.8987e+00 0.01084 *
## Residuals 1.04 111
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## F M
## BSC 4.450570 4.499183
## HS 4.307938 4.473099
## MSC 4.574564 4.537918
## PHD 4.571663 4.582038
## F M
## BSC 0.09564147 0.11396207
## HS 0.08865537 0.14710516
## MSC 0.09502159 0.08187640
## PHD 0.09619556 0.04870832
## GENDER
## Degree F M
## BSC 18 28
## HS 22 7
## MSC 6 24
## PHD 3 11
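The cell counts show a strongly unbalanced design: 22 of the 49 women are in the HS group, against only 7 of the 70 men. This matters for interpretation. The one-way gender means are weighted by these counts, whereas the sum-to-zero (`contr.Sum`) coefficients of the two-way model are built from unweighted averages of the cell means. The sketch below illustrates the difference using only the printed cell means and counts. Variable names are illustrative.

```r
deg <- c("BSC", "HS", "MSC", "PHD")
cell_mean <- matrix(c(4.450570, 4.307938, 4.574564, 4.571663,    # F
                      4.499183, 4.473099, 4.537918, 4.582038),   # M
                    nrow = 4, dimnames = list(Degree = deg, GENDER = c("F", "M")))
cell_n <- matrix(c(18, 22, 6, 3,
                   28, 7, 24, 11),
                 nrow = 4, dimnames = dimnames(cell_mean))

# Count-weighted means: these reproduce the one-way GENDER summary
weighted_gender <- colSums(cell_mean * cell_n) / colSums(cell_n)

# Unweighted means of the cell means: the basis of the sum-contrast coefficients
unweighted_gender <- colMeans(cell_mean)
grand <- mean(cell_mean)                                # model intercept
gender_F_effect <- unweighted_gender[["F"]] - grand     # GENDER[S.F]

print(round(weighted_gender, 6))
print(round(c(intercept = grand, GENDER_F = gender_F_effect), 7))
print(round(c(weighted_gap   = weighted_gender[["M"]] - weighted_gender[["F"]],
              unweighted_gap = unweighted_gender[["M"]] - unweighted_gender[["F"]]), 4))
```

The raw gap of about 0.114 on the log scale shrinks to about 0.047 once each degree level is given equal weight, which is why GENDER is far less significant in the Type III table than in the one-way analysis.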
2.3 Diagnostic check: Log_Income ~ Degree*GENDER
2.4 Making a post-hoc test to evaluate pairwise group differences within a main factor and an interaction
library(phia)  # provides testInteractions()
## Warning: package 'phia' was built under R version 3.6.2
AnovaModel.22D <- lm(Log_Income ~ Degree*GENDER, data=Dataset)
testInteractions(AnovaModel.22D, pairwise="Degree", adjustment="holm")
## F Test:
## P-value adjustment method: holm
## Value Df Sum of Sq F Pr(>F)
## BSC-HS 0.084358 1 0.10181 10.8393 0.005333 **
## BSC-MSC -0.081365 1 0.08839 9.4099 0.008135 **
## BSC-PHD -0.101974 1 0.08069 8.5901 0.008209 **
## HS-MSC -0.165723 1 0.27696 29.4863 2.013e-06 ***
## HS-PHD -0.186332 1 0.22672 24.1372 1.558e-05 ***
## MSC-PHD -0.020609 1 0.00269 0.2859 0.593913
## Residuals 111 1.04262
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## F Test:
## P-value adjustment method: holm
## Value Df Sum of Sq F Pr(>F)
## BSC-HS : F 0.142632 1 0.20140 21.4420 0.0001097 ***
## BSC-MSC : F -0.123995 1 0.06919 7.3657 0.0693853 .
## BSC-PHD : F -0.121093 1 0.03771 4.0143 0.2853143
## HS-MSC : F -0.266627 1 0.33514 35.6795 3.429e-07 ***
## HS-PHD : F -0.263725 1 0.18361 19.5480 0.0002297 ***
## MSC-PHD : F 0.002902 1 0.00002 0.0018 1.0000000
## BSC-HS : M 0.026084 1 0.00381 0.4056 1.0000000
## BSC-MSC : M -0.038736 1 0.01939 2.0644 0.6143671
## BSC-PHD : M -0.082855 1 0.05422 5.7719 0.1435463
## HS-MSC : M -0.064819 1 0.02277 2.4241 0.6116324
## HS-PHD : M -0.108939 1 0.05077 5.4048 0.1532990
## MSC-PHD : M -0.044119 1 0.01468 1.5631 0.6415163
## Residuals 111 1.04262
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## F Test:
## P-value adjustment method: holm
## Value Df Sum of Sq F Pr(>F)
## F-M : BSC -0.048613 1 0.02589 2.7566 0.2990276
## F-M : HS -0.165161 1 0.14486 15.4218 0.0005987 ***
## F-M : MSC 0.036646 1 0.00645 0.6863 0.8184329
## F-M : PHD -0.010375 1 0.00025 0.0270 0.8697503
## Residuals 111 1.04262
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## F Test:
## P-value adjustment method: holm
## Degree1 Degree2 Degree3 Df Sum of Sq F Pr(>F)
## BSC-HS : F -0.121093 -0.26372 0.002902 3 0.49965 17.7313 2.152e-08 ***
## BSC-MSC : F -0.121093 -0.26372 0.002902 3 0.49965 17.7313 2.152e-08 ***
## BSC-PHD : F -0.121093 -0.26372 0.002902 3 0.49965 17.7313 2.152e-08 ***
## HS-MSC : F -0.121093 -0.26372 0.002902 3 0.49965 17.7313 2.152e-08 ***
## HS-PHD : F -0.121093 -0.26372 0.002902 3 0.49965 17.7313 2.152e-08 ***
## MSC-PHD : F -0.121093 -0.26372 0.002902 3 0.49965 17.7313 2.152e-08 ***
## BSC-HS : M -0.082855 -0.10894 -0.044119 3 0.07699 2.7323 0.2831
## BSC-MSC : M -0.082855 -0.10894 -0.044119 3 0.07699 2.7323 0.2831
## BSC-PHD : M -0.082855 -0.10894 -0.044119 3 0.07699 2.7323 0.2831
## HS-MSC : M -0.082855 -0.10894 -0.044119 3 0.07699 2.7323 0.2831
## HS-PHD : M -0.082855 -0.10894 -0.044119 3 0.07699 2.7323 0.2831
## MSC-PHD : M -0.082855 -0.10894 -0.044119 3 0.07699 2.7323 0.2831
## Residuals 111 1.04262
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
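The `holm` adjustment used in these tables can be reproduced with base R's `p.adjust()`. As a sketch, the code below takes the four F statistics from the "F-M" table (the gender difference within each degree level), converts them to unadjusted p-values on 1 and 111 degrees of freedom, and applies the Holm correction. Variable names are illustrative and the F values are the rounded ones printed above.

```r
# F statistics for the gender difference (F - M) within each degree level
f_values <- c(BSC = 2.7566, HS = 15.4218, MSC = 0.6863, PHD = 0.0270)

# Unadjusted p-values: each contrast has 1 numerator and 111 residual df
p_raw <- pf(f_values, df1 = 1, df2 = 111, lower.tail = FALSE)

# Holm step-down adjustment across the family of four tests
p_holm <- p.adjust(p_raw, method = "holm")

print(signif(cbind(raw = p_raw, holm = p_holm), 4))
```

Holm multiplies the smallest p-value by 4, the next by 3, and so on, while keeping the adjusted values monotone. The largest p-value is left unchanged, which is why the PHD row is the same before and after adjustment.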
2.5 Conclusion
- The one-way ANOVA tests indicate that both degree of education and gender are statistically significant group factors.
- The Tukey test on the one-way Degree model indicates that every pairwise difference between degree levels is significantly different from zero, except “PHD - MSC”.
- The Tukey test on the one-way GENDER model indicates that the pairwise difference “M - F” is significantly different from zero.
- The two-way ANOVA Type III test indicates that Degree is highly significant (p ≈ 2.1e-07), while GENDER (p ≈ 0.045) and the “Degree” by “GENDER” interaction (p ≈ 0.011) are significant at the 5% level.
- The two-way ANOVA post-hoc tests (Holm-adjusted) indicate that degree differences are highly significant among females (joint test p ≈ 2.2e-08), driven by the contrasts involving HS (BSC-HS, HS-MSC and HS-PHD), but not among males (p ≈ 0.28). The gender difference “F-M” is significant only at the HS level (p ≈ 0.0006).
Reference
Fox, John. 2015. Applied Regression Analysis and Generalized Linear Models. Sage Publications.
Herr, David G. 1986. “On the History of Anova in Unbalanced, Factorial Designs: The First 30 Years.” The American Statistician 40 (4): 265–70.
Langsrud, Øyvind. 2003. “ANOVA for Unbalanced Data: Use Type 2 Instead of Type 3 Sums of Squares.” Statistics and Computing 13 (2): 163–67.
2020-01-21