Tukey Test Vs Kruskal-Wallis Rank Sum Test
1 Different costs between different housing options in Stockholm
1.1 Log Transformation of housing cost
mydata = read.table("./Cost_month.txt", header=TRUE, sep="", na.strings="NA", dec=".", strip.white=TRUE)
mydata[3] = log(mydata[3] )
colnames(mydata)[3] = "Log_MonthlyCost_SEK"
mydata
## Person Housing Log_MonthlyCost_SEK
## 1 1 Rental 8.070906
## 2 2 Rental 8.517193
## 3 3 Rental 8.294050
## 4 4 Rental 8.160518
## 5 5 Rental 7.972466
## 6 6 Rental 8.411833
## 7 7 Rental 8.366370
## 8 8 Rental 7.600902
## 9 9 Rental 8.131531
## 10 10 Rental 7.972466
## 11 11 Condo 8.853665
## 12 12 Condo 8.987197
## 13 13 Condo 9.392662
## 14 14 Condo 9.024011
## 15 15 Condo 8.974618
## 16 16 Condo 9.350102
## 17 17 Condo 9.104980
## 18 18 Condo 9.392662
## 19 19 Condo 9.126959
## 20 20 Condo 9.210340
## 21 21 Co_op 8.006368
## 22 22 Co_op 7.313220
## 23 23 Co_op 7.600902
## 24 24 Co_op 6.907755
## 25 25 Co_op 6.802395
## 26 26 Co_op 7.824046
## 27 27 Co_op 7.740664
## 28 28 Co_op 8.006368
## 29 29 Co_op 7.937375
## 30 30 Co_op 6.907755
library(RcmdrMisc)  # provides numSummary()
Dataset = mydata
with(Dataset, numSummary(Log_MonthlyCost_SEK, groups=Housing, statistics=c("mean", "sd")))
##            mean        sd data:n
## Co_op  7.504685 0.4830513     10
## Condo  9.141720 0.1898171     10
## Rental 8.149824 0.2671258     10
1.2 Hypothesis Testing
- Question: Does the mean housing cost differ among the housing options in Stockholm?
- \(H_0: \mu(Rental) = \mu(Condo) = \mu(Co\_op)\)
- \(H_a:\) Means are not all equal
1.3 ANOVA Model Fit
##             Df Sum Sq Mean Sq F value   Pr(>F)
## Housing      2 13.600   6.800   59.87 1.19e-10 ***
## Residuals   27  3.067   0.114
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
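The code that produced this table is not shown. A minimal sketch that should reproduce it in base R is given below, assuming the model is the one-way fit named in the Tukey output (`aov(formula = Log_MonthlyCost_SEK ~ Housing, data = Dataset)`) and stored as `AnovaModel.10`, the name used in section 1.4. To keep the sketch self-contained, the log costs are re-entered from the listing in section 1.1 rather than read from `Cost_month.txt`.

```r
# Log monthly costs re-entered from the listing in section 1.1
# (assumption: typed in by hand here instead of reading Cost_month.txt)
Dataset <- data.frame(
  Housing = factor(rep(c("Rental", "Condo", "Co_op"), each = 10)),
  Log_MonthlyCost_SEK = c(
    8.070906, 8.517193, 8.294050, 8.160518, 7.972466,  # Rental
    8.411833, 8.366370, 7.600902, 8.131531, 7.972466,
    8.853665, 8.987197, 9.392662, 9.024011, 8.974618,  # Condo
    9.350102, 9.104980, 9.392662, 9.126959, 9.210340,
    8.006368, 7.313220, 7.600902, 6.907755, 6.802395,  # Co_op
    7.824046, 7.740664, 8.006368, 7.937375, 6.907755
  )
)

# One-way ANOVA of log cost on housing type
AnovaModel.10 <- aov(Log_MonthlyCost_SEK ~ Housing, data = Dataset)
print(summary(AnovaModel.10))
```

Storing the fit under the name `AnovaModel.10` lets the `glht()` call in section 1.4 run unchanged.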
1.4 Tukey Contrast Test
library(multcomp)  # provides glht(), mcp() and cld()
local({
.Pairs <- glht(AnovaModel.10, linfct = mcp(Housing = "Tukey"))
print(summary(.Pairs)) # pairwise tests
print(confint(.Pairs)) # confidence intervals
print(cld(.Pairs)) # compact letter display
old.oma <- par(oma=c(0,5,0,0))
plot(confint(.Pairs))
par(old.oma)
})
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: aov(formula = Log_MonthlyCost_SEK ~ Housing, data = Dataset)
##
## Linear Hypotheses:
##                     Estimate Std. Error t value Pr(>|t|)
## Condo - Co_op == 0    1.6370     0.1507  10.862  < 1e-04 ***
## Rental - Co_op == 0   0.6451     0.1507   4.281 0.000555 ***
## Rental - Condo == 0  -0.9919     0.1507  -6.581  < 1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
##
##
## Simultaneous Confidence Intervals
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: aov(formula = Log_MonthlyCost_SEK ~ Housing, data = Dataset)
##
## Quantile = 2.4804
## 95% family-wise confidence level
##
##
## Linear Hypotheses:
##                     Estimate     lwr     upr
## Condo - Co_op == 0    1.6370  1.2632  2.0109
## Rental - Co_op == 0   0.6451  0.2713  1.0190
## Rental - Condo == 0  -0.9919 -1.3657 -0.6181
##
## Co_op Condo Rental
## "a" "c" "b"
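The standard error and the simultaneous intervals above can be checked by hand. With equal group sizes, the standard error of a difference of two group means is \(\sqrt{MSE\,(1/n + 1/n)}\), and each interval is the estimate plus or minus the reported quantile times that standard error. The sketch below uses only numbers printed in this report (residual mean square 0.1136 from the ANOVA table, n = 10 per group, quantile 2.4804); the variable names are illustrative.

```r
mse  <- 0.1136   # residual mean square from the ANOVA table
n    <- 10       # observations per housing group
crit <- 2.4804   # quantile reported with the simultaneous confidence intervals

# Standard error of a difference between two group means (balanced design)
se <- sqrt(mse * (1 / n + 1 / n))

# Tukey contrast estimates as printed in the output above
est <- c("Condo - Co_op"  =  1.6370,
         "Rental - Co_op" =  0.6451,
         "Rental - Condo" = -0.9919)

# Simultaneous 95% intervals: estimate +/- quantile * standard error
ci <- cbind(Estimate = est, lwr = est - crit * se, upr = est + crit * se)

print(round(se, 4))
print(round(ci, 4))
```

The results should agree with the printed standard error and limits up to rounding of the inputs.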
1.5 Diagnostic tools to check for normality and fit of model
## Analysis of Variance Table
##
## Response: Log_MonthlyCost_SEK
##           Df  Sum Sq Mean Sq F value    Pr(>F)
## Housing    2 13.5998  6.7999  59.871 1.188e-10 ***
## Residuals 27  3.0665  0.1136
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
1.6 Conclusion
The Tukey contrasts indicate that \(\mu(Condo) - \mu(Co\_op)\), \(\mu(Rental) - \mu(Condo)\) and \(\mu(Rental) - \mu(Co\_op)\) are all highly significantly different from zero (adjusted p < 0.001 for each contrast). Hence we reject the null hypothesis: mean log housing cost differs between every pair of housing options in Stockholm, with condos the most expensive and co-ops the cheapest.
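Because the response is the natural log of the monthly cost, a difference of means on the log scale back-transforms to a ratio of geometric mean costs. As a rough illustration, the sketch below exponentiates the Tukey estimates and simultaneous confidence limits from section 1.4; the matrix name `ci_log` and the row labels are illustrative.

```r
# Tukey estimates and simultaneous 95% limits on the log scale (from section 1.4)
ci_log <- rbind(
  "Condo / Co_op"  = c(Estimate =  1.6370, lwr =  1.2632, upr =  2.0109),
  "Rental / Co_op" = c(Estimate =  0.6451, lwr =  0.2713, upr =  1.0190),
  "Rental / Condo" = c(Estimate = -0.9919, lwr = -1.3657, upr = -0.6181)
)

# exp() turns log-scale differences into ratios of geometric means
ratio <- exp(ci_log)
print(round(ratio, 2))
```

On this reading, the geometric mean condo cost is roughly five times that of a co-op, with a simultaneous 95% interval of about 3.5 to 7.5.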
2 Different stress factors effects on heart rate
2.1 Import data
- Definition:
- Cold Water = CW
- Mental Stress = MS
- Physical Exercise = PE
- Heart Rate = HR
mydatastress = read.table("./stress.txt", header=TRUE, sep="", na.strings="NA", dec=".", strip.white=TRUE)
mydatastress[2] = log(mydatastress[2] )
colnames(mydatastress)[2] = "Log_HR"
mydatastress
## Treatment Log_HR
## 1 MS 4.276666
## 2 MS 4.204693
## 3 MS 4.406719
## 4 MS 4.828314
## 5 MS 4.700480
## 6 MS 4.356709
## 7 MS 4.553877
## 8 MS 4.532599
## 9 MS 4.356709
## 10 MS 4.234107
## 11 CW 4.007333
## 12 CW 3.828641
## 13 CW 3.988984
## 14 CW 3.988984
## 15 CW 4.077537
## 16 CW 4.219508
## 17 CW 4.110874
## 18 CW 4.143135
## 19 CW 4.043051
## 20 CW 4.219508
## 21 PE 4.094345
## 22 PE 4.189655
## 23 PE 4.430817
## 24 PE 4.330733
## 25 PE 4.219508
## 26 PE 4.143135
## 27 PE 4.158883
## 28 PE 4.317488
## 29 PE 4.356709
## 30 PE 4.406719
2.2 Hypothesis Testing
- \(H_0: \mu(CW) = \mu(MS) = \mu(PE)\)
- \(H_a:\) Means are not all equal
Dataset = mydatastress
with(Dataset, numSummary(Log_HR, groups=Treatment, statistics=c("mean", "sd")))
##        mean        sd data:n
## CW 4.062756 0.1189262     10
## MS 4.445087 0.2053029     10
## PE 4.264799 0.1183438     10
2.3 ANOVA Model Fit
##             Df Sum Sq Mean Sq F value   Pr(>F)
## Treatment    2 0.7317  0.3658   15.61 3.12e-05 ***
## Residuals   27 0.6327  0.0234
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
2.4 Tukey Contrast Test
local({
.Pairs <- glht(AnovaModel.20, linfct = mcp(Treatment = "Tukey"))
print(summary(.Pairs)) # pairwise tests
print(confint(.Pairs)) # confidence intervals
print(cld(.Pairs)) # compact letter display
old.oma <- par(oma=c(0,5,0,0))
plot(confint(.Pairs))
par(old.oma)
})
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: aov(formula = Log_HR ~ Treatment, data = Dataset)
##
## Linear Hypotheses:
##              Estimate Std. Error t value Pr(>|t|)
## MS - CW == 0  0.38233    0.06846   5.585   <0.001 ***
## PE - CW == 0  0.20204    0.06846   2.951   0.0173 *
## PE - MS == 0 -0.18029    0.06846  -2.634   0.0357 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
##
##
## Simultaneous Confidence Intervals
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: aov(formula = Log_HR ~ Treatment, data = Dataset)
##
## Quantile = 2.4788
## 95% family-wise confidence level
##
##
## Linear Hypotheses:
##              Estimate      lwr      upr
## MS - CW == 0  0.38233  0.21264  0.55202
## PE - CW == 0  0.20204  0.03235  0.37173
## PE - MS == 0 -0.18029 -0.34998 -0.01060
##
## CW MS PE
## "a" "c" "b"
2.5 Diagnostic tools to check for normality and fit of model
## Analysis of Variance Table
##
## Response: Log_HR
##           Df  Sum Sq Mean Sq F value    Pr(>F)
## Treatment  2 0.73168 0.36584  15.612 3.122e-05 ***
## Residuals 27 0.63268 0.02343
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
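This section shows only the ANOVA table. As a hedged sketch of numerical checks that could accompany it, the block below refits the model (named `AnovaModel.20` as in section 2.4) on the log heart rates re-entered from section 2.1, then applies a Shapiro-Wilk test to the residuals and Bartlett's test to the group variances. The choice of these two tests is an assumption about what a reasonable diagnostic step looks like, not necessarily what was run for this report.

```r
# Log heart rates re-entered from the listing in section 2.1
# (assumption: typed in by hand here instead of reading stress.txt)
Dataset <- data.frame(
  Treatment = factor(rep(c("MS", "CW", "PE"), each = 10)),
  Log_HR = c(
    4.276666, 4.204693, 4.406719, 4.828314, 4.700480,  # MS
    4.356709, 4.553877, 4.532599, 4.356709, 4.234107,
    4.007333, 3.828641, 3.988984, 3.988984, 4.077537,  # CW
    4.219508, 4.110874, 4.143135, 4.043051, 4.219508,
    4.094345, 4.189655, 4.430817, 4.330733, 4.219508,  # PE
    4.143135, 4.158883, 4.317488, 4.356709, 4.406719
  )
)

AnovaModel.20 <- aov(Log_HR ~ Treatment, data = Dataset)
print(anova(AnovaModel.20))

# Normality of residuals (null hypothesis: residuals are normally distributed)
normality <- shapiro.test(residuals(AnovaModel.20))
print(normality)

# Homogeneity of variance (null hypothesis: equal variances across treatments)
homogeneity <- bartlett.test(Log_HR ~ Treatment, data = Dataset)
print(homogeneity)
```

Small p-values from either test would suggest that the ANOVA assumptions deserve a closer look, for example with residual and normal Q-Q plots.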
2.6 Conclusion
The Tukey contrasts indicate that \(\mu(PE) - \mu(MS)\), \(\mu(PE) - \mu(CW)\) and \(\mu(MS) - \mu(CW)\) are all significantly different from zero at the 5% family-wise level (adjusted p = 0.0357, 0.0173 and < 0.001, respectively). Hence we reject the null hypothesis: the three stressors do not have the same effect on mean log heart rate.
3 Chlorophyll-α concentrations (μg/l) in the different basins of the Baltic Sea
3.1 Import data
- LARGE_basin Definition:
- Arkona = south-western parts of the Baltic Sea including Öresund
- Baltic Proper = from Bornholm to Ålands sea
- Bothnian = northern part from Ålands sea to Bothnian bay
- Western = Kattegatt and Skagerrak
mydatachla = read.table("./chla.txt", header=TRUE, sep="\t", na.strings="NA", dec=".", strip.white=TRUE)
mydatachla[,2] = as.numeric(gsub("," , ".",mydatachla[ , 2]))
mydatachla[2] = log(mydatachla[2] )
colnames(mydatachla)[2] = "Log_Chlorophyll_a"
head(mydatachla, 30)
## Parameter Log_Chlorophyll_a Basin LARGE_basin
## 1 Chlorophyll_a -1.6094379 Arkona Arkona
## 2 Chlorophyll_a -1.3862944 Arkona Arkona
## 3 Chlorophyll_a -1.2039728 Arkona Arkona
## 4 Chlorophyll_a -1.2039728 Arkona Arkona
## 5 Chlorophyll_a -1.2039728 Arkona Arkona
## 6 Chlorophyll_a -1.2039728 Arkona Arkona
## 7 Chlorophyll_a -1.2039728 Arkona Arkona
## 8 Chlorophyll_a -1.2039728 Arkona Arkona
## 9 Chlorophyll_a -1.2039728 Arkona Arkona
## 10 Chlorophyll_a -1.2039728 Arkona Arkona
## 11 Chlorophyll_a -1.2039728 Arkona Arkona
## 12 Chlorophyll_a -1.2039728 Arkona Arkona
## 13 Chlorophyll_a -1.2039728 Arkona Arkona
## 14 Chlorophyll_a -0.9162907 Arkona Arkona
## 15 Chlorophyll_a -0.9162907 Arkona Arkona
## 16 Chlorophyll_a -0.9162907 Arkona Arkona
## 17 Chlorophyll_a -0.9162907 Arkona Arkona
## 18 Chlorophyll_a -0.9162907 Arkona Arkona
## 19 Chlorophyll_a -0.9162907 Arkona Arkona
## 20 Chlorophyll_a -0.6931472 Arkona Arkona
## 21 Chlorophyll_a -0.6931472 Arkona Arkona
## 22 Chlorophyll_a -0.6931472 Arkona Arkona
## 23 Chlorophyll_a -0.6931472 Arkona Arkona
## [ reached 'max' / getOption("max.print") -- omitted 7 rows ]
3.2 Hypothesis Testing
- \(H_0:\) Mean Chlorophyll-α concentrations in the different basins of the Baltic Sea are all equal
- \(H_a:\) Means are not all equal
Dataset = mydatachla
with(Dataset, numSummary(Log_Chlorophyll_a, groups=LARGE_basin , statistics=c("mean", "sd")))
##                   mean        sd data:n
## Arkona       0.4786409 0.6995542    347
## BalticProper 1.3570184 0.9928270   8056
## Bothnian     0.6412756 0.9554050   3323
## Western      0.5141939 0.9358696   1630
##                          mean        sd data:n
## Alands hav          1.1074110 0.8341088     51
## Arkona              0.4791835 0.7013780    345
## Bornholm_Hano       0.6488659 0.6605049    222
## Bottenhavet         0.7728728 0.9082209   1231
## Bottenviken         0.3978281 0.9881547   1257
## Kattegatt           0.5816730 0.9349497    758
## Norra Gotlandshavet 1.5992880 0.9183594   6154
## Norra Kvarken       0.7946486 0.8978374    784
## Oresund             0.3850541 0.2867071      2
## Skagerrak           0.4555366 0.9332449    872
## West Gotlandshavet  0.5631388 0.8177435   1680
3.3 ANOVA Model Fit
##                Df Sum Sq Mean Sq F value Pr(>F)
## LARGE_basin     3   1896   631.9   671.3 <2e-16 ***
## Residuals   13352  12568     0.9
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
3.4 Tukey Contrast Test: LARGE_basin
local({
.Pairs <- glht(AnovaModel.3A, linfct = mcp(LARGE_basin = "Tukey"))
print(summary(.Pairs)) # pairwise tests
print(confint(.Pairs)) # confidence intervals
print(cld(.Pairs)) # compact letter display
old.oma <- par(oma=c(0,5,0,0))
plot(confint(.Pairs))
par(old.oma)
})
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: aov(formula = Log_Chlorophyll_a ~ LARGE_basin, data = Dataset)
##
## Linear Hypotheses:
##                              Estimate Std. Error t value Pr(>|t|)
## BalticProper - Arkona == 0    0.87838    0.05319  16.513   <0.001 ***
## Bothnian - Arkona == 0        0.16263    0.05474   2.971   0.0136 *
## Western - Arkona == 0         0.03555    0.05736   0.620   0.9189
## Bothnian - BalticProper == 0 -0.71574    0.02000 -35.782   <0.001 ***
## Western - BalticProper == 0  -0.84282    0.02635 -31.986   <0.001 ***
## Western - Bothnian == 0      -0.12708    0.02934  -4.332   <0.001 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
##
##
## Simultaneous Confidence Intervals
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: aov(formula = Log_Chlorophyll_a ~ LARGE_basin, data = Dataset)
##
## Quantile = 2.5207
## 95% family-wise confidence level
##
##
## Linear Hypotheses:
##                              Estimate      lwr      upr
## BalticProper - Arkona == 0    0.87838  0.74429  1.01246
## Bothnian - Arkona == 0        0.16263  0.02466  0.30061
## Western - Arkona == 0         0.03555 -0.10903  0.18014
## Bothnian - BalticProper == 0 -0.71574 -0.76616 -0.66532
## Western - BalticProper == 0  -0.84282 -0.90925 -0.77640
## Western - Bothnian == 0      -0.12708 -0.20104 -0.05313
##
## Arkona BalticProper Bothnian Western
## "a" "c" "b" "a"
3.5 Diagnostic tools to check for normality and fit of model: LARGE_basin
## Analysis of Variance Table
##
## Response: Log_Chlorophyll_a
##                Df  Sum Sq Mean Sq F value    Pr(>F)
## LARGE_basin     3  1895.6  631.86  671.26 < 2.2e-16 ***
## Residuals   13352 12568.3    0.94
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] "1" "2" "3" "4" "5" "6" "7" "8" "9"
## [10] "10" "348" "349" "350" "351" "352" "353" "354" "355"
## [19] "356" "357" "8401" "8402" "8403" "8404" "8405" "8406" "8407"
## [28] "8408" "8409" "8410" "8411" "8412" "8413" "11726" "11725" "11724"
## [37] "11723" "11722" "11721" "11720" "11719" "11716" "11717" "11727" "11728"
## [46] "11729" "11730" "11731" "11732" "11733" "11734" "11735" "11736" "13356"
## [55] "13355" "13354" "13353" "13352" "13350" "13351" "13349" "13348" "13347"
3.6 Kruskal-Wallis rank sum test
##
## Kruskal-Wallis rank sum test
##
## data: Log_Chlorophyll_a by LARGE_basin
## Kruskal-Wallis chi-squared = 1818.4, df = 3, p-value < 2.2e-16
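The call behind this output is not shown; the `data:` line is consistent with `kruskal.test(Log_Chlorophyll_a ~ LARGE_basin, data = Dataset)`. Because the test uses only ranks, it gives the same statistic on the raw and the log-transformed concentrations. The sketch below illustrates this on simulated values; the data frame `toy`, its columns and the basin labels are hypothetical and are not the Baltic Sea data.

```r
set.seed(1)

# Hypothetical positive "concentrations" in three made-up basins (not chla.txt)
toy <- data.frame(
  conc  = c(rlnorm(20, meanlog = 0), rlnorm(20, meanlog = 0.5), rlnorm(20, meanlog = 1)),
  basin = factor(rep(c("A", "B", "C"), each = 20))
)
toy$log_conc <- log(toy$conc)

# Kruskal-Wallis rank sum test on the raw and on the log scale
kw_raw <- kruskal.test(conc ~ basin, data = toy)
kw_log <- kruskal.test(log_conc ~ basin, data = toy)

print(kw_raw)
print(kw_log)

# log() is strictly increasing, so the ranks (and hence the statistic) are unchanged
print(all.equal(unname(kw_raw$statistic), unname(kw_log$statistic)))
```

This is why the Kruskal-Wallis result does not depend on the log transformation, whereas the ANOVA and Tukey results do.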
3.7 Conclusion
- The Tukey contrasts indicate that only \(\mu(Western) - \mu(Arkona)\) is not statistically significantly different from zero (adjusted p = 0.9189).
- The Kruskal-Wallis rank sum test indicates that the concentration distribution of at least one large basin stochastically dominates that of at least one other.
- Hence not all large basins have the same mean Chlorophyll-α concentration.
2020-01-20