TASK
The Salaries dataset from the carData consists of nine-month academic salary for Assistant Professors, Associate Professors and Professors in a college in the U.S. The data were collected as part of the on-going effort of the college’s administration to monitor salary differences between male and female faculty members.
1. Perform the 3-way Anova with and w/o interactions. Interpret the results.
2. Can years since doctorate (yrs.since.phd), length of service (yrs.service) be significant as covariates?
3. Is there any significant difference in years since PhD (yrs.since.phd) and seniority (yrs.service) of different rank professors?
3-WAY ANOVA
In the first point, we want to perform 3-way Anova test. We will start with understanding the data.
Understanding the data
At the beginning we will add an identifier for each observation - it will be useful later, when dealing with outliers.
sal <- Salaries
v <- c(1:nrow(sal))
sal <- cbind(v, sal)
a <- sal %>%
group_by(rank, discipline, sex) %>%
get_summary_stats(salary, type = "mean")
| rank | discipline | sex | variable | n | mean |
|---|---|---|---|---|---|
| AsstProf | A | Female | salary | 6 | 72933.33 |
| AsstProf | A | Male | salary | 18 | 74269.61 |
| AsstProf | B | Female | salary | 5 | 84189.80 |
| AsstProf | B | Male | salary | 38 | 84647.08 |
| AssocProf | A | Female | salary | 4 | 72128.50 |
| AssocProf | A | Male | salary | 22 | 85048.86 |
| AssocProf | B | Female | salary | 6 | 99435.67 |
| AssocProf | B | Male | salary | 32 | 101621.53 |
| Prof | A | Female | salary | 8 | 109631.88 |
| Prof | A | Male | salary | 123 | 120619.26 |
| Prof | B | Female | salary | 10 | 131836.20 |
| Prof | B | Male | salary | 125 | 133518.36 |
We create a summary table using get_summary_stats to observe mean values of observations. We can see that the increasing trend in salaries is visible when looking at rank of teachers - Assistant Professors earn less than Professors. There is also a difference in discipline - in each case professors from group B earn more than those in group A. Another trend may be seen in gender of professors - men tend to earn more than women. This will be three factors we will consider when performing 3-way Anova: rank, discipline, sex.
Visualization of those variables is visible on a plot below:
ggplot(data = sal, aes(y = salary, x = discipline, color = sex)) + geom_boxplot() +
facet_wrap(~rank) + xlab("") + ylab("Salary")
Assumptions
To be able to perform 3-way Anova, we must check following assumptions: - not significant outliers
- samples come from populations that are normally distributed - normality check
- homogeneity of variances in groups - check of homoscedasticity
- Outliers
We will try to identify outliers in our data.
outliers_data <- sal %>%
group_by(sex, rank, discipline) %>%
identify_outliers(salary)
x <- outliers_data %>% filter(is.outlier == 'TRUE')
x <- x[, 4]
x <- as.numeric(unlist(x))
sal1 <- sal[-which(sal$v %in% x),]
| rank | discipline | sex | v | yrs.since.phd | yrs.service | salary | is.outlier | is.extreme |
|---|---|---|---|---|---|---|---|---|
| AsstProf | A | Female | 238 | 7 | 6 | 63100 | TRUE | FALSE |
| AssocProf | A | Female | 124 | 25 | 22 | 62884 | TRUE | FALSE |
| AssocProf | B | Female | 219 | 14 | 7 | 109650 | TRUE | TRUE |
| AssocProf | B | Female | 317 | 12 | 9 | 71065 | TRUE | TRUE |
| AsstProf | A | Male | 227 | 3 | 1 | 63900 | TRUE | FALSE |
| AsstProf | A | Male | 288 | 2 | 0 | 85000 | TRUE | TRUE |
| AsstProf | A | Male | 397 | 8 | 4 | 81035 | TRUE | FALSE |
| AssocProf | A | Male | 141 | 14 | 8 | 100102 | TRUE | FALSE |
| AssocProf | A | Male | 228 | 9 | 7 | 70000 | TRUE | FALSE |
| AssocProf | A | Male | 294 | 11 | 1 | 104800 | TRUE | FALSE |
| AssocProf | A | Male | 300 | 45 | 39 | 70700 | TRUE | FALSE |
| AssocProf | A | Male | 368 | 10 | 1 | 108413 | TRUE | FALSE |
| AssocProf | A | Male | 380 | 11 | 8 | 104121 | TRUE | FALSE |
| AssocProf | B | Male | 323 | 13 | 11 | 126431 | TRUE | FALSE |
| Prof | A | Male | 250 | 29 | 7 | 204000 | TRUE | FALSE |
| Prof | A | Male | 272 | 42 | 18 | 194800 | TRUE | FALSE |
| Prof | A | Male | 365 | 43 | 43 | 205500 | TRUE | FALSE |
| Prof | B | Male | 44 | 38 | 38 | 231545 | TRUE | FALSE |
We chose to delete all outliers to make the data easier for later calculations.
- Normality
To check normality, we will perform shapiro test and visualize the QQ-plots.
a <- sal1 %>%
group_by(rank, discipline, sex) %>%
shapiro_test(salary)
| rank | discipline | sex | variable | statistic | p |
|---|---|---|---|---|---|
| AsstProf | A | Female | salary | 0.8127783 | 0.1025736 |
| AsstProf | A | Male | salary | 0.9534922 | 0.5810276 |
| AsstProf | B | Female | salary | 0.8892976 | 0.3535834 |
| AsstProf | B | Male | salary | 0.9411888 | 0.0457877 |
| AssocProf | A | Female | salary | 0.9760308 | 0.7031217 |
| AssocProf | A | Male | salary | 0.8994292 | 0.0786954 |
| AssocProf | B | Female | salary | 0.9157855 | 0.5135847 |
| AssocProf | B | Male | salary | 0.9760940 | 0.6978685 |
| Prof | A | Female | salary | 0.9335779 | 0.5491953 |
| Prof | A | Male | salary | 0.9667284 | 0.0045659 |
| Prof | B | Female | salary | 0.9737136 | 0.9229797 |
| Prof | B | Male | salary | 0.9856971 | 0.2179443 |
ggqqplot(residuals(lm(salary ~ rank*discipline*sex, data = sal1))) + theme_minimal() +
ggtitle("Residuals for checking normality assumption")
In Shapiro-Wilk test we observe two groups where normality assumption is not met. The QQ plot also does not allow us to see perfectly normal distribution.
- Homogeneity of variances
To check homogeneity of variance we use Levene’s Test.
a <- leveneTest(salary ~ sex * rank * discipline, data = sal1)
| Df | F value | Pr(>F) | |
|---|---|---|---|
| group | 11 | 10.91118 | 0 |
| 367 | NA | NA |
As we can observe, we cannot accept two main assumptions - normality and homogeneity of variance. P-value from Levene’s Test is basically 0. Hence, we will try making some transformations to the data.
Data transformation
Firstly, we will try to change the data by taking log of salaries. We want to try to find a solution when a 3-way ANOVA can be performed.
Transformed data will again go through assumptions checking.
a <- sal22 %>%
group_by(rank, discipline, sex) %>%
shapiro_test(salary)
| rank | discipline | sex | variable | statistic | p |
|---|---|---|---|---|---|
| AsstProf | A | Female | salary | 0.8133727 | 0.1036839 |
| AsstProf | A | Male | salary | 0.9153652 | 0.2498065 |
| AsstProf | B | Female | salary | 0.8957397 | 0.3867889 |
| AsstProf | B | Male | salary | 0.9311392 | 0.0217904 |
| AssocProf | A | Female | salary | 0.9782176 | 0.7170927 |
| AssocProf | A | Male | salary | 0.9326272 | 0.4742249 |
| AssocProf | B | Female | salary | 0.9163571 | 0.5167221 |
| AssocProf | B | Male | salary | 0.9810190 | 0.8404347 |
| Prof | A | Female | salary | 0.9363242 | 0.5753139 |
| Prof | A | Male | salary | 0.9828775 | 0.1351391 |
| Prof | B | Female | salary | 0.9758636 | 0.9392650 |
| Prof | B | Male | salary | 0.9832500 | 0.1318367 |
ggqqplot(residuals(lm(salary ~ rank*discipline*sex, data = sal22))) + theme_minimal() +
ggtitle("Residuals for checking normality assumption")
a <- leveneTest(data = sal22, salary ~ rank * discipline * sex)
| Df | F value | Pr(>F) | |
|---|---|---|---|
| group | 11 | 9.418411 | 0 |
| 356 | NA | NA |
By taking log of salaries we can observe that most of connections rank * discipline * sex turns out to have normal distribution. The only problem occurs with one subgroup. However, we did not manage to increase p-value for homogeneity of variance to be able to accept this assumption.
Hence, we will perform the calculations in two ways. First, we will use log(salaries) to perform a 3-way ANOVA as was asked, even though assumptions are not met. We will assume small sample with not normal distribution does not reflects real population parameters. In second part, we will use non parametric test.
3-way Anova
First, performing 3-way Anova with interactions. We will begin by stating null hypothesis:
H0A : Factor A = rank has no effect on the salary of employees in the university.
H0B : Factor B = discipline the employees work in has no effect on the salary.
H0C : Factor C = sex, gender of employees has no effect on the salary amount.
H0AxB : There is no interaction between rank and discipline that has an affect on the salary they get.
H0AxC : There is no interaction between rank and sex of employees that has an affect on the salary they get.
H0BxC : There is no interaction between discipline and sex of employees performed on participants that has an affect on the salary they get.
H0AxBxC : There is no interaction between rank, discipline and sex that has an affect on the salary the employees get.
Now we can create an Anova model and check the null hypothesis.
model1 <- aov(data = sal22, salary ~ rank * discipline * sex)
a <- anova(model1)
| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| rank | 2 | 9.9298550 | 4.9649275 | 176.6799363 | 0.0000000 |
| discipline | 1 | 1.5383121 | 1.5383121 | 54.7417639 | 0.0000000 |
| sex | 1 | 0.0165964 | 0.0165964 | 0.5905943 | 0.4426989 |
| rank:discipline | 2 | 0.0849683 | 0.0424842 | 1.5118241 | 0.2219196 |
| rank:sex | 2 | 0.0060102 | 0.0030051 | 0.1069385 | 0.8986097 |
| discipline:sex | 1 | 0.0234150 | 0.0234150 | 0.8332386 | 0.3619555 |
| rank:discipline:sex | 2 | 0.0177163 | 0.0088582 | 0.3152232 | 0.7298294 |
| Residuals | 356 | 10.0040459 | 0.0281013 | NA | NA |
From performing 3-way Anova with interactions we can observe these results:
- Rank and discipline have very significant influence on salary value. Hence, we can conclude that there is a significant difference between salaries of Professors, Assistant Professors and Associate Professors as well as between all employees in department A and department B.
Sex does not turn out to have significant impact on salary. Hence, the null hypothesis cannot be rejected and we do not assume any difference between males and females employees.
There is no significant interaction between any pair of variables: rank-discipline, rank-sex, discipline-sex, nor rank-discipline-sex. In each of these cases p-value is higher than our significance level. Hence, null hypothesis about existing significant interactions between variables are all not rejected.
model2 <- aov(data = sal22, salary ~ rank + discipline + sex)
a <- anova(model2)
| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| rank | 2 | 9.9298550 | 4.9649275 | 177.8059375 | 0.0000000 |
| discipline | 1 | 1.5383121 | 1.5383121 | 55.0906394 | 0.0000000 |
| sex | 1 | 0.0165964 | 0.0165964 | 0.5943583 | 0.4412402 |
| Residuals | 363 | 10.1361558 | 0.0279233 | NA | NA |
If we wanted to perform 3-way ANOVA without interactions, we will get only three first values from previous calculations. We only check statements whether rank, discipline or sex has any significant impact on the salary of employees - but without mixing the factors. We can again observe that significant values are reported when taking about rank and discipline, however not with gender.
Post-hoc test
As we managed to reject null hypothesis, we will now perform a post-hoc test to check which pairs of possible ranks, disciplines and genders had significant influence on reject H0.
We should check:
Simple two-way interaction: run two-way interaction at each level of third variable,
Simple simple main effect: run one-way model at each level of second variable,
Simple simple pairwise comparisons: run pairwise or other post-hoc comparisons if necessary.
As we had non significant 3-way interaction nor any significant 2-way interaction, we will only focus on finding main effects.
sal2grouped <- sal2 %>% unite(col = "group", rank, discipline, sex)
pair <- pairwise_t_test(data = sal2grouped, salary ~ group,
p.adjust.method = "bonferroni")
| .y. | group1 | group2 | n1 | n2 | p | p.signif | p.adj | p.adj.signif |
|---|---|---|---|---|---|---|---|---|
| salary | AssocProf_A_Female | AssocProf_A_Male | 3 | 16 | 0.429000 | ns | 1.0000 | ns |
| salary | AssocProf_A_Female | AssocProf_B_Female | 3 | 4 | 0.014600 |
|
0.9640 | ns |
| salary | AssocProf_A_Male | AssocProf_B_Female | 16 | 4 | 0.014300 |
|
0.9450 | ns |
| salary | AssocProf_A_Female | AssocProf_B_Male | 3 | 31 | 0.005870 | ** | 0.3870 | ns |
| salary | AssocProf_A_Male | AssocProf_B_Male | 16 | 31 | 0.000154 | *** | 0.0101 |
|
| salary | AssocProf_B_Female | AssocProf_B_Male | 4 | 31 | 0.709000 | ns | 1.0000 | ns |
| salary | AssocProf_A_Female | AsstProf_A_Female | 3 | 5 | 0.972000 | ns | 1.0000 | ns |
| salary | AssocProf_A_Male | AsstProf_A_Female | 16 | 5 | 0.307000 | ns | 1.0000 | ns |
| salary | AssocProf_B_Female | AsstProf_A_Female | 4 | 5 | 0.004880 | ** | 0.3220 | ns |
| salary | AssocProf_B_Male | AsstProf_A_Female | 31 | 5 | 0.000468 | *** | 0.0309 |
|
From performing pair-wise comparison between grouped three variables we can observe significant interaction between most of the pairs. Taking from first ten for example between AssocProf_A_Female and AssocProf_B_Female -> what supports the fact that null hypothesis stating that discipline has significant impact on salary score. As there are as many as 66 combinations (12 * 11 / 2) we will only show first 10 for observing the p-value.
Non Parametric Anova
Second possibility we can use is to perform a non parametric version of Anova test. As there exist only an option to do 1-way non parametric Anova - Kruskal-Wallis Anova we will use grouped variable corresponding to rank, discipline and sex at once.
sal1grouped <- sal1 %>% unite(col = "group", rank, discipline, sex)
kruskal.test(data = sal1grouped, salary ~ group)
##
## Kruskal-Wallis rank sum test
##
## data: salary by group
## Kruskal-Wallis chi-squared = 210.93, df = 11, p-value < 2.2e-16
Using this test, we managed to again reject null hypothesis. However, the only think we know thanks to that is that the rank, discipline a person works in and his/her gender have significant impact on their salary value.
ANCOVA
In this task we also wanted to take a look at possible covariates - years since phd and years of working at the university. We are interested whether these values should be taken into account when talking about salaries.
Plots
We would like to visualize possible connections between covariates, salary and other variables values.
ggplot(data = sal22, aes(y = salary, x = yrs.since.phd, color = sex)) + geom_point() +
facet_wrap(~discipline~rank) + xlab("") + ylab("Salary") +
ggtitle('Years since phd vs salary for different groups of Professors')
ggplot(data = sal22, aes(y = salary, x = yrs.service, color = sex)) + geom_point() +
facet_wrap(~discipline~rank) + xlab("") + ylab("Salary") +
ggtitle('Years of service vs for different groups of Professors')
On both plots we can see that the Professors are mostly spread - in both years of service / since phd and salary. Assistant Professors earn less and also have the lowest seniority.
Years since phd
In this test we should also take a look at Ancova assumptions. Most of them were checked in previous exercise, however we will also want to take a look at few others. We are using the log(salary) to be closer to our normality and homogeneity assumptions.
Here we want to check whether the normality assumption of covariate (yrs.since.phd) is met.
ggscatter(sal22, x = "yrs.since.phd", y = "salary",
facet.by = c("rank", "discipline")) + stat_smooth(method = "loess")
On this plot we can also observe how years since phd and log(salary) looks like in terms of different ranks and disciplines of employees.
We also must take a look if our covariate variable is independent. To do that, we will perform a Anova test.
a <- anova(aov(salary ~ group * yrs.since.phd, data = sal2grouped))
| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| group | 11 | 12.4732241 | 1.1339295 | 37.3768354 | 0.0000000 |
| yrs.since.phd | 1 | 0.0191795 | 0.0191795 | 0.6321975 | 0.4270817 |
| group:yrs.since.phd | 11 | 0.1896278 | 0.0172389 | 0.5682321 | 0.8544456 |
| Residuals | 355 | 10.7699048 | 0.0303378 | NA | NA |
Here we can observe that interaction between grouped variable and our covariate is not significant (0.85). Hence, this assumption is fulfiled.
Ancova Test
ancova <- aov(salary ~ yrs.since.phd + rank * discipline * sex, data = sal22)
a <- anova(ancova)
| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| yrs.since.phd | 1 | 4.2386151 | 4.2386151 | 150.4140707 | 0.0000000 |
| rank | 2 | 5.7370542 | 2.8685271 | 101.7942950 | 0.0000000 |
| discipline | 1 | 1.4934390 | 1.4934390 | 52.9970837 | 0.0000000 |
| sex | 1 | 0.0160431 | 0.0160431 | 0.5693152 | 0.4510314 |
| rank:discipline | 2 | 0.0850348 | 0.0425174 | 1.5087987 | 0.2225903 |
| rank:sex | 2 | 0.0058061 | 0.0029030 | 0.1030192 | 0.9021366 |
| discipline:sex | 1 | 0.0233634 | 0.0233634 | 0.8290876 | 0.3631543 |
| rank:discipline:sex | 2 | 0.0177897 | 0.0088948 | 0.3156474 | 0.7295210 |
| Residuals | 355 | 10.0037740 | 0.0281796 | NA | NA |
Having performed this Ancova test, we can observe that years since phd does have strong impact on salary value. We could have observed that on also previous plots, hence we reject the null hypothesis.
Years of service
Now we will take a look at other suspicious variable - years of service.
First, ploting it agains salary.
ggscatter(sal22, x = "yrs.service", y = "salary",
facet.by = c("rank", "discipline")) + stat_smooth(method = "loess")
a <- anova(aov(salary ~ group * yrs.service, data = sal2grouped))
| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| group | 11 | 12.4732241 | 1.1339295 | 38.076712 | 0.0000000 |
| yrs.service | 1 | 0.0729543 | 0.0729543 | 2.449765 | 0.1184334 |
| group:yrs.service | 11 | 0.3338114 | 0.0303465 | 1.019018 | 0.4287228 |
| Residuals | 355 | 10.5719463 | 0.0297801 | NA | NA |
We can observe that our variable is independent on group. Hence, we can fulfill that assumption.
ancova2 <- aov(salary ~ yrs.service + rank * discipline * sex, data = sal22)
a <- anova(ancova2)
| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| yrs.service | 1 | 2.9106921 | 2.9106921 | 103.3961891 | 0.0000000 |
| rank | 2 | 7.0785977 | 3.5392989 | 125.7261165 | 0.0000000 |
| discipline | 1 | 1.4849723 | 1.4849723 | 52.7505030 | 0.0000000 |
| sex | 1 | 0.0189667 | 0.0189667 | 0.6737516 | 0.4122975 |
| rank:discipline | 2 | 0.0845366 | 0.0422683 | 1.5014919 | 0.2242090 |
| rank:sex | 2 | 0.0070863 | 0.0035431 | 0.1258626 | 0.8817753 |
| discipline:sex | 1 | 0.0247324 | 0.0247324 | 0.8785674 | 0.3492316 |
| rank:discipline:sex | 2 | 0.0177785 | 0.0088893 | 0.3157720 | 0.7294303 |
| Residuals | 355 | 9.9935568 | 0.0281509 | NA | NA |
This Ancova test also shows significant impact of years of service on salary value.
Hence, both our suspictious variables turned out to be good covariates as they should be taken into account when talking about salary of professors.
KRUSKAL-WALLIS TEST
Is there any significant difference in years since PhD (yrs.since.phd) and seniority (yrs.service) of different rank professors?
Plots & Data
In the last task, we were asked to find whether there exists a difference between years since PhD and years of service of different rank professors. We will start with ploting the data.
ggplot(data = sal1, aes(x = yrs.since.phd, y = yrs.service)) + geom_point() +
facet_grid(~rank) + xlab('Years since PhD') + ylab('Years of service') +
ggtitle('Years since PhD vs years of service for different Professors')
To check last exercise, we will create a new variable called diff. This will keep the value of difference between years since PhD - years of service. Positive value will correspond to a professor that started his/her learning adventure after completing PhD. Negative value will mean that he/she was teaching before that.
saldiff <- sal1 %>% mutate(diff = yrs.since.phd - yrs.service)
a <- saldiff %>%
group_by(rank) %>%
get_summary_stats(diff, type = "mean")
| rank | variable | n | mean |
|---|---|---|---|
| AsstProf | diff | 63 | 2.762 |
| AssocProf | diff | 54 | 3.204 |
| Prof | diff | 262 | 5.393 |
Summary of data can show us that there occurs to be some difference between ranks. Professors tend to have bigger mean difference between years since PhD and years of service. We must take into account however, that these people are usually older and tend to have worked longer. However, we cannot check that using our dataset.
ggplot(data = saldiff, aes(y = diff, x = rank, color = rank)) + geom_boxplot() +
theme(legend.position = 'none') + xlab('') + ylab('yrs since PhD - yrs of service') +
ggtitle('Difference between years since PhD and years of service for different Professors')
Assumptions
In this situation we would like to perform 1-way ANOVA. However, first we must take a look at assumptions.
- samples are independent
- samples come from populations that are normally distributed - normality check
- homogeneity of variances in groups - check of homoscedasticity
- not significant outliers
- Outliers check
outliers_data <- saldiff %>%
group_by(sex, rank, discipline) %>%
identify_outliers(salary)
x <- outliers_data %>% filter(is.extreme == 'TRUE')
x <- x[, 4]
x <- as.numeric(unlist(x))
saldiff2 <- saldiff[-which(saldiff$v %in% x),]
| rank | discipline | sex | v | yrs.since.phd | yrs.service | salary | diff | is.outlier | is.extreme |
|---|---|---|---|---|---|---|---|---|---|
| AsstProf | A | Male | 235 | 8 | 3 | 69700 | 5 | TRUE | FALSE |
| AsstProf | A | Male | 241 | 5 | 3 | 69200 | 2 | TRUE | FALSE |
| AsstProf | A | Male | 360 | 11 | 4 | 78785 | 7 | TRUE | FALSE |
| AssocProf | A | Male | 139 | 10 | 7 | 73877 | 3 | TRUE | TRUE |
| AssocProf | A | Male | 258 | 30 | 23 | 74000 | 7 | TRUE | TRUE |
| AssocProf | A | Male | 261 | 41 | 33 | 88600 | 8 | TRUE | TRUE |
| AssocProf | A | Male | 285 | 8 | 6 | 88650 | 2 | TRUE | TRUE |
| AssocProf | A | Male | 371 | 13 | 8 | 78182 | 5 | TRUE | FALSE |
| AssocProf | A | Male | 383 | 8 | 5 | 86895 | 3 | TRUE | FALSE |
| Prof | A | Male | 390 | 33 | 18 | 186023 | 15 | TRUE | FALSE |
As we found some significant outliers, we will delete them from our dataset.
- Normality check
a <- saldiff2 %>%
group_by(rank) %>%
shapiro_test(diff)
| rank | variable | statistic | p |
|---|---|---|---|
| AsstProf | diff | 0.9077412 | 0.0001785 |
| AssocProf | diff | 0.9089302 | 0.0009604 |
| Prof | diff | 0.9064980 | 0.0000000 |
qqnorm(saldiff2$diff, main="QQ plot", pch=19)
qqline(saldiff2$diff)
As we can observe, we are not dealing with normal data. We will check homoscedasticity, however even then we will use non - parametric test.
- Homoscedasticity
a <- leveneTest(data = saldiff2, diff ~ rank)
| Df | F value | Pr(>F) | |
|---|---|---|---|
| group | 2 | 17.28444 | 1e-07 |
| 372 | NA | NA |
Our data do not contain homogeneous variance. Hence, we must perform non parametric test - Kruskal-Wallis.
Kruskal-Wallis Test
kruskal.test(data = saldiff2, diff ~ rank)
##
## Kruskal-Wallis rank sum test
##
## data: diff by rank
## Kruskal-Wallis chi-squared = 8.4193, df = 2, p-value = 0.01485
We obtained a very nice value of p 0.01 -> meaning with our significance level 0.05 we can reject null hypothesis. Hence, there is a difference in differences between years since PhD and years of service for different rank professors.