class: center, middle, inverse, title-slide .title[ # An Introduction to
R
and RStudio for Educational Researchers ] .subtitle[ ##
Descriptive and Inferential Statistics:
Nonparametric Tests ] .author[ ### Jorge Sinval ] .date[ ### 2025-11-18 ] --- class: inverse, center, middle # `\(\chi^2\)` test <html><div style='float:left'></div><hr color='#EB811B' size=1px width=800px></html> <style> .orange { color: #EB811B; } .kbd { display: inline-block; padding: .2em .5em; font-size: 0.75em; line-height: 1.75; color: #555; vertical-align: middle; background-color: #fcfcfc; border: solid 1px #ccc; border-bottom-color: #bbb; border-radius: 3px; box-shadow: inset 0 -1px 0 #bbb } </style>
--- # `\(\chi^2\)` test (independence) .panelset[ .panel[.panel-name[Assumptions] .pull-left[ 1. `\(X_1\)` and `\(X_2\)` measured on at least a nominal scale, with the data organized in a contingency table `\(\rightarrow\)` 2. Independent samples 3. `\(N>20\)` 4. At least `\(80\% E_{ij} \geq5\)` 5. `\(100\% E_{ij} >1\)` ] .pull-right[ <table class="table table-striped" style="color: black; width: auto !important; margin-left: auto; margin-right: auto;"> <caption>Contingency table</caption> <thead> <tr> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="1"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Variable A</div></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="4"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Variable B</div></th> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> </tr> <tr> <th style="text-align:left;"> </th> <th style="text-align:left;"> 1 </th> <th style="text-align:left;"> 2 </th> <th style="text-align:left;"> ... </th> <th style="text-align:left;"> C </th> <th style="text-align:left;"> </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;font-weight: bold;"> 1 </td> <td style="text-align:left;"> \(O_{11}\) </td> <td style="text-align:left;"> \(O_{12}\) </td> <td style="text-align:left;"> ... </td> <td style="text-align:left;"> \(O_{1C}\) </td> <td style="text-align:left;"> \(n_{1.}\) </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> 2 </td> <td style="text-align:left;"> \(O_{21}\) </td> <td style="text-align:left;"> \(O_{22}\) </td> <td style="text-align:left;"> ... </td> <td style="text-align:left;"> \(O_{2C}\) </td> <td style="text-align:left;"> \(n_{2.}\) </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> ... </td> <td style="text-align:left;"> ... </td> <td style="text-align:left;"> ... 
</td> <td style="text-align:left;"> ... </td> <td style="text-align:left;"> ... </td> <td style="text-align:left;"> ... </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> R </td> <td style="text-align:left;"> \(O_{R1}\) </td> <td style="text-align:left;"> \(O_{R2}\) </td> <td style="text-align:left;"> ... </td> <td style="text-align:left;"> \(O_{RC}\) </td> <td style="text-align:left;"> \(n_{R.}\) </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> </td> <td style="text-align:left;"> \(n_{.1}\) </td> <td style="text-align:left;"> \(n_{.2}\) </td> <td style="text-align:left;"> ... </td> <td style="text-align:left;"> \(n_{.C}\) </td> <td style="text-align:left;"> \(N\) </td> </tr> </tbody> </table> ] ] .panel[.panel-name[Hypotheses] `\(H_0: \pi_{ij} = \pi_{i.} \times \pi_{.j}\)` vs. `\(H_1: \pi_{ij} \neq \pi_{i.} \times \pi_{.j}\)` (two-tailed test) ] .panel[.panel-name[Test Statistic] .pull-left[ `\(X^2= \sum\limits_{i=1}^R\sum\limits_{j=1}^C \frac{(O_{ij}-E_{ij})^2}{E_{ij}} \sim \chi^2_{(R-1)(C-1)}\)` ] .pull-right[ `\(O_{ij}\)` — observed values (frequencies in each `\(ij\)` cell of the contingency table) `\(E_{ij}\)` — expected values under `\(H_0\)`: `\(E_{ij}=\frac{n_{i.}\times n_{.j}}{N}\)` `\(n_{i.}=\sum\limits^C_{j=1}O_{ij}\)` `\(n_{.j}=\sum\limits^R_{i=1}O_{ij}\)` `\(N=\sum\limits^C_{j=1}\sum\limits^R_{i=1}O_{ij}\)` ] ] .panel[.panel-name[Decision] Reject `\(H_0\)` if `\(X^2 \geq \chi^2_{1-\alpha;(R-1)(C-1)}\)` On <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 
86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg>: Reject `\(H_0\)` if `\(p-value \leq \alpha\)` ] .panel[.panel-name[R Code] <div class="pre-name">chisq_test.R</div>

``` r
ds <- readr::read_csv(trimws('https://ndownloader.figshare.com/files/22299075 ')) #load the dataset
x1_x2_table <- table(ds$Sex, ds$Country) #create a contingency table
chisq.test(x1_x2_table, correct = FALSE) #using a table object
*chisq.test(x = ds$Sex, y = ds$Country, correct = FALSE) #directly use the two variables
```

You can pass the `chisq.test()` function either a table (i.e., a contingency table) or the two variables directly. To check whether the expected values meet the assumptions, request the matrix of expected values: <div class="pre-name">chisq_test.R</div>

``` r
chisq_test <- chisq.test(x = ds$Sex, y = ds$Country, correct = FALSE) #assign the test to an object
*chisq_test$expected #check the expected values
```

] .panel[.panel-name[Output]

```
##
## Pearson's Chi-squared test
##
## data: ds$Sex and ds$Country
*## X-squared = 2.2127, df = 1, p-value = 0.1369
```

Expected values (i.e., `\(E_{ij}\)`):

```
## ds$Country
## ds$Sex Brazil Portugal
*## Female 343.5547 334.4453
*## Male 184.4453 179.5547
```

] .panel[.panel-name[Effect Size] .pull-left[ `\(V=\sqrt{\frac{X^2}{N[\min(R,C)-1]}}\)` with `\(0 \leq V \leq 1\)` `\(R\)` — number of rows `\(C\)` — number of columns ] .pull-right[ <table> <caption>How to classify? 
⚠️</caption> <thead> <tr> <th style="text-align:left;"> Effect Size </th> <th style="text-align:left;"> \( V \) 📔 </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Small </td> <td style="text-align:left;"> 0.1 </td> </tr> <tr> <td style="text-align:left;"> Medium </td> <td style="text-align:left;"> 0.3 </td> </tr> <tr> <td style="text-align:left;"> Large </td> <td style="text-align:left;"> 0.5 </td> </tr> </tbody> <tfoot> <tr> <td style = 'padding: 0; border:0;' colspan='100%'><sup></sup> ⚠️ No precise definitions, it is always context dependent.</td> </tr> </tfoot> </table> .font60[ <sup>📔</sup> Cohen, J. (1992). A power primer. _Psychological Bulletin, 112_(1), 155–159. [https://doi.org/10.1037/0033-2909.112.1.155](https://doi.org/10.1037/0033-2909.112.1.155) ] ] ] .panel[.panel-name[R Code] <div class="pre-name">chisq_test.R</div>

``` r
*library(lsr) #package that estimates the effect size measure(s)
*cramersV(ds$Sex, ds$Country) #function to estimate Cramér's V
```

] .panel[.panel-name[Output]

```
*## [1] 0.0440688
```

] .panel[.panel-name[Statistical Analysis] .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt5[ A `\(\chi^2\)` test was conducted to test whether the two variables (i.e., sex and country) are independent, after validating the assumptions (Marôco, 2021): sample size greater than 20 `\((n=1042)\)`, at least `\(80\%~E_{ij}\geq5\)` `\((\frac{4}{4} \times 100\% = 100\%)\)` and `\(100\%~E_{ij}>1\)` `\((\frac{4}{4} \times 100\% = 100\%)\)`. All statistical analyses were conducted with the statistical programming language _R_ (R Core Team, 2022) via the integrated development environment _RStudio_ (RStudio Team, 2022). A significance level of `\(\alpha = .05\)` was considered for all statistical analyses. ]] .panel[.panel-name[Results] .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[ The test did not reject the independence of sex and country `\((\chi^2_{(1)}=2.213; p= .137; V = 0.044)\)`. The contingency table is presented below. 
<table style="width:35%; font-size: 12px; color: black; margin-left: auto; margin-right: auto;" class="table table-striped table-hover table-condensed table-responsive"> <caption style="font-size: initial !important;">Absolute Frequencies: Observed Values</caption> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> Brazil </th> <th style="text-align:right;"> Portugal </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Female </td> <td style="text-align:right;"> 355 </td> <td style="text-align:right;"> 323 </td> </tr> <tr> <td style="text-align:left;"> Male </td> <td style="text-align:right;"> 173 </td> <td style="text-align:right;"> 191 </td> </tr> </tbody> </table> .tr[📚 Marôco, J. (2021). _Análise estatística com o SPSS statistics_ (8<sup>th</sup> ed.). ReportNumber.]]] ] --- name: wmw class: inverse, center, middle # Wilcoxon-Mann-Whitney test <html><div style='float:left'></div><hr color='#EB811B' size=1px width=800px></html> --- # Wilcoxon-Mann-Whitney test .panelset[ .panel[.panel-name[Assumptions] 1. Ordinal or quantitative<sup>💡</sup> dependent variable 2. Independent samples 3. .orange[Dependent variable equally distributed] <br> <br> <br> <br> .footnote[<sup>💡</sup>The Wilcoxon-Mann-Whitney test is a non-parametric alternative to the _t_-test (independent samples). If both the normality and the homoscedasticity assumptions fail, it is preferable to use the Wilcoxon-Mann-Whitney test. However, if only the homoscedasticity assumption fails, Welch's _t_-test is the preferable alternative. <sup>🤓</sup> This test is also called the _Mann–Whitney U-test_ or the _Mann–Whitney–Wilcoxon test_.] ] .panel[.panel-name[Hypotheses] `\(H_0: F_1(Y) = F_2(Y)\)` vs. `\(H_1: F_1(Y) \neq F_2(Y)\)` (two-tailed test) `\(H_0: F_1(Y) \leq F_2(Y)\)` vs. `\(H_1: F_1(Y) > F_2(Y)\)` (right-tailed test) `\(H_0: F_1(Y) \geq F_2(Y)\)` vs. 
`\(H_1: F_1(Y) < F_2(Y)\)` (left-tailed test) or<sup>🤔</sup> .orange[`\\(H_0: \\theta_1(Y) = \\theta_2(Y)\\)` vs. `\\(H_1: \\theta_1(Y) \\neq \\theta_2(Y)\\)`] (two-tailed test) .orange[`\\(H_0: \\theta_1(Y) \\leq \\theta_2(Y)\\)` vs. `\\(H_1: \\theta_1(Y) > \\theta_2(Y)\\)`] (right-tailed test) .orange[`\\(H_0: \\theta_1(Y) \\geq \\theta_2(Y)\\)` vs. `\\(H_1: \\theta_1(Y) < \\theta_2(Y)\\)`] (left-tailed test) <br> <br> .footnote[<sup>🤔</sup> If the 3<sup>rd</sup> assumption is valid (i.e., identical population distribution in all groups).] ] .panel[.panel-name[Test Statistic] .pull-left[For small samples `\((N\leq20)\)`: Rank all `\((N)\)` observations in ascending order.<sup>⚠️</sup> `\(U= n_1n_2+\frac{n_2(n_2+1)}{2}-T_2\)` For "big" samples `\((N>20)\)`: `\(Z=\frac{U-\mathbb{E}(U)}{\sqrt{\mathbb{VAR}(U)}} \overset{a}{\sim} \mathcal{N}(0;1)\)` .font60[<sup>💡</sup> <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> calls the test statistic `\(W\)` instead of `\(U\)`, because the statistic is also known by that label (after Wilcoxon). <sup>⚠️</sup> If there are ties, use the `average` method (i.e., the mean of the ranks that the observations would have if they were not tied).] 
] .pull-right[ `\(n_1\)` — sample size of the 1<sup>st</sup> group (by the hypotheses order) `\(n_2\)` — sample size of the 2<sup>nd</sup> group (by the hypotheses order) `\(N=n_1+n_2\)` — total sample size `\(T_2\)` — sum of ranks of the 2<sup>nd</sup> group `\(\mathbb{E}(U) = \frac{n_1n_2}{2}\)` `\(\mathbb{VAR}(U)=\frac{n_1n_2}{N(N-1)} \left[\frac{N^3 - N}{12}-\sum\limits^g_{i=1}\frac{t^3_i-t_i}{12}\right]\)` `\(g\)` — number of groups of ties `\(t_i\)` — number of ties in `\(i\)` group ] ] .panel[.panel-name[Decision] .pull-left[ For small samples `\((N\leq20)\)`: On <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg>: Reject `\(H_0\)` if `\(p-value_{exact} \leq \alpha\)` ] .pull-right[ For big samples `\((N>20)\)`: On <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 
87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg>: Reject `\(H_0\)` if `\(p-value_{asymptotic} \leq \alpha\)` ] ] .panel[.panel-name[R Code] <div class="pre-name">wmw.R</div>

``` r
ds <- readr::read_csv(trimws('https://ndownloader.figshare.com/files/22299075 ')) #load the dataset
ds_small <- ds[complete.cases(ds[,c("Sex","Socioeconomic_status")]),][1:10,] #example: ordered dv
ds_small$Socioeconomic_status <- ordered(ds_small$Socioeconomic_status, levels = c("B2", "B1", "A2", "A1"))
ds_small$Socioeconomic_status <- as.numeric(ds_small$Socioeconomic_status) #transform to numeric
*exactRankTests::wilcox.exact(Socioeconomic_status ~ Sex, data = ds_small, paired = FALSE, exact = TRUE)
coin::wilcox_test(Socioeconomic_status ~ as.factor(Sex), ds_small, distribution = "asymptotic") #obtain z
```

.pull-left[
`paired` argument can be:

* `TRUE` for a [Wilcoxon test (paired samples)](#wilcoxonpaired); or
* `FALSE` for a [Wilcoxon-Mann-Whitney test](#wmw).
]

.pull-right[
`exact` argument can be:

* `TRUE` for an exact _p_-value (corrected for ties); or
* `FALSE` for an asymptotic _p_-value.
]
<br> .font60[<sup>⚠️</sup> <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> The `wilcox.test` function from the `stats` package (R Core Team, 2022) computes an exact _p_-value only for small samples without ties; when there are ties it falls back to a normal approximation. The `wilcox.exact` function from the `exactRankTests` package (Hothorn and Hornik, 2022) provides an exact _p_-value that accounts for ties. <sup>🤓</sup> Your dependent variable might be ordinal; however, you have to provide it to <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> as `numeric`.] 
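
The statistic and its normal approximation from the Test Statistic panel can also be computed by hand. The following is a minimal sketch under stated assumptions: the two vectors are hypothetical toy values (not the slides' dataset), `U` is taken as the statistic of the 1<sup>st</sup> group (the value R labels `W`), and the cross-check uses `stats::wilcox.test` with `exact = FALSE` and `correct = FALSE` so that no continuity correction is applied.

``` r
# Toy data (hypothetical values, not taken from the slides' dataset)
g1 <- c(1, 2, 2, 3, 4, 4)   # 1st group
g2 <- c(1, 1, 2, 3)         # 2nd group
n1 <- length(g1); n2 <- length(g2); N <- n1 + n2

# Mid-ranks: tied observations receive the mean of the ranks they span
r  <- rank(c(g1, g2), ties.method = "average")
T2 <- sum(r[(n1 + 1):N])    # sum of ranks of the 2nd group

# U of the 1st group, written in terms of T2
U <- n1 * n2 + n2 * (n2 + 1) / 2 - T2

# Expected value and tie-corrected variance of U
t_i   <- table(c(g1, g2))   # size of each group of ties
E_U   <- n1 * n2 / 2
VAR_U <- n1 * n2 / (N * (N - 1)) * ((N^3 - N) / 12 - sum(t_i^3 - t_i) / 12)
Z     <- (U - E_U) / sqrt(VAR_U)
p_asymptotic <- 2 * pnorm(-abs(Z))   # two-tailed

# Cross-check against the asymptotic test without continuity correction
wt <- wilcox.test(g1, g2, exact = FALSE, correct = FALSE)
c(U = U, W = unname(wt$statistic), Z = Z, p = p_asymptotic, p_R = wt$p.value)
```

If the hand computation is right, `U` equals `W` and the two _p_-values coincide.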
] .panel[.panel-name[Output] ``` ## *## Exact Wilcoxon rank sum test ## ## data: Socioeconomic_status by Sex *## W = 16, p-value = 0.4857 ## alternative hypothesis: true mu is not equal to 0 ``` ``` ## *## Asymptotic Wilcoxon-Mann-Whitney Test ## ## data: Socioeconomic_status by as.factor(Sex) (Female, Male) *## Z = 0.89443, p-value = 0.3711 ## alternative hypothesis: true mu is not equal to 0 ``` ] .panel[.panel-name[Effect Size] .pull-left[ .center[ <br> <br> <br> <br> `\(r=\frac{z}{\sqrt{N}}\)` ] ] .pull-right[ <table> <caption>How to classify? ⚠️</caption> <thead> <tr> <th style="text-align:left;"> Effect Size </th> <th style="text-align:left;"> \( r \) 📔 </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Small </td> <td style="text-align:left;"> \( \lbrack .1;.3 \lbrack \) </td> </tr> <tr> <td style="text-align:left;"> Medium </td> <td style="text-align:left;"> \( \lbrack .3;.5 \lbrack \) </td> </tr> <tr> <td style="text-align:left;"> Large </td> <td style="text-align:left;"> \( \lbrack .5;1 \rbrack \) </td> </tr> </tbody> <tfoot> <tr> <td style = 'padding: 0; border:0;' colspan='100%'><sup></sup> ⚠️ No precise definitions, it is always context dependent.</td> </tr> </tfoot> </table> .font60[<sup>📔</sup> Effect size `\(r\)` (Coolican, 2018) interpretation suggested by Cohen (1988). ] ] ] .panel[.panel-name[R Code] <div class="pre-name">wmw.R</div> ``` r library(rstatix) wilcox_effsize(data = ds_small, Socioeconomic_status ~ Sex, paired = F)#rstatix classifies the effect size ``` ] .panel[.panel-name[Output] ``` ## # A tibble: 1 × 7 ## .y. 
group1 group2 effsize n1 n2 magnitude ## * <chr> <chr> <chr> <dbl> <int> <int> <ord> *## 1 Socioeconomic_status Female Male 0.283 6 4 small ``` ] .panel[.panel-name[Statistical Analysis] .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt3[A Wilcoxon-Mann-Whitney test (Wilcoxon, 1945; Mann and Whitney, 1947) was conducted to test whether the distributions of socioeconomic status are equal across sexes (Marôco, 2021). To conduct the Wilcoxon-Mann-Whitney test, the _exactRankTests_ (Hothorn and Hornik, 2022) and the _rstatix_ (Kassambara, 2021) packages were used. All statistical analyses were conducted with the statistical programming language _R_ (R Core Team, 2022) via the integrated development environment _RStudio_ (RStudio Team, 2022). A significance level of `\(\alpha = .05\)` was considered for all statistical analyses. ]] .panel[.panel-name[Results] .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt1[ The groups did not present statistically significantly different distributions `\((n_{Female} =6; n_{Male}=4;U=16; p= .486; r = 0.283)\)`. The effect size was small. .center[ <img src="data:image/png;base64,#slides8of9_files/figure-html/plot_boxplot_wmw-1.png" width="50%" height="99%" /> ] .tr[📚 Marôco, J. (2021). _Análise estatística com o SPSS statistics_ (8<sup>th</sup> ed.). ReportNumber.]]] ] --- name: wilcoxonpaired class: inverse, center, middle # Wilcoxon test (paired samples) <html><div style='float:left'></div><hr color='#EB811B' size=1px width=800px></html> --- # Wilcoxon test (paired samples) .panelset[ .panel[.panel-name[Assumptions] 1. Ordinal or quantitative<sup>💡</sup> dependent variable 2. Paired samples 3. .orange[Dependent variable equally distributed] <br> <br> <br> <br> .footnote[<sup>💡</sup>The Wilcoxon test (paired samples) is a non-parametric alternative to the _t_-test (paired samples). If the normality assumption fails, it is preferable to use the Wilcoxon test (paired samples). <sup>🤓</sup> This test is also known as the _Wilcoxon Signed-Ranks test_.] 
] .panel[.panel-name[Hypotheses] `\(H_0: F_1(Y) = F_2(Y)\)` vs. `\(H_1: F_1(Y) \neq F_2(Y)\)` (two-tailed test) `\(H_0: F_1(Y) \leq F_2(Y)\)` vs. `\(H_1: F_1(Y) > F_2(Y)\)` (right-tailed test) `\(H_0: F_1(Y) \geq F_2(Y)\)` vs. `\(H_1: F_1(Y) < F_2(Y)\)` (left-tailed test) or<sup>🤔</sup> .orange[`\\(H_0: \\theta_1(Y) = \\theta_2(Y)\\)` vs. `\\(H_1: \\theta_1(Y) \\neq \\theta_2(Y)\\)`] (two-tailed test) .orange[`\\(H_0: \\theta_1(Y) \\leq \\theta_2(Y)\\)` vs. `\\(H_1: \\theta_1(Y) > \\theta_2(Y)\\)`] (right-tailed test) .orange[`\\(H_0: \\theta_1(Y) \\geq \\theta_2(Y)\\)` vs. `\\(H_1: \\theta_1(Y) < \\theta_2(Y)\\)`] (left-tailed test) <br> <br> .footnote[<sup>🤔</sup> If the 3<sup>rd</sup> assumption is valid (i.e., identical population distribution in all groups).] ] .panel[.panel-name[Test Statistic] .pull-left[For small samples `\((n\leq20)\)`: <sup>⚠️</sup> `\(T^+=\sum\limits^n_{i=1} r^+_i\)` For big samples `\((n>20)\)`: `\(Z = \frac{T^+ - \mathbb{E}(T)}{\sqrt{\mathbb{VAR}(T)}} \overset{a}{\sim} \mathcal{N}(0;1)\)` .font60[<sup>⚠️</sup> Most statistics books call this value `\(T\)` (not to be confused with the `\(\mathcal{t}\)` distribution) (Pace, 2012) however <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> calls the test statistic `\(V\)` instead of `\(T\)`. 
] ] .pull-right[ `\(D_i= X_{1_i}-X_{2_i}\)` with `\(i=1,...,n;\)` `\(n\)` — number of blocks or subjects with repeated measures with `\(D_i\neq0\)` `\(r\)` — rank in ascending order of `\(|D_i|\)` ignoring `\(D_i=0;\)` `\(r^+_i\)` — the ranks with `\(+\)` sign `\(\mathbb{E}(T) = \frac{n(n+1)}{4}\)` `\(\mathbb{VAR}(T)=\frac{n(n+1) (2n+1)}{24}-\sum\limits_{i=1}^g\frac{t^3_i-t_i}{48}\)` `\(g\)` — number of groups of ties `\(t_i\)` — number of ties in `\(i\)` group ] ] .panel[.panel-name[Decision] .pull-left[ For small samples `\((n\leq20)\)`: On <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg>: Reject `\(H_0\)` if `\(p-value_{exact} \leq \alpha\)` ] .pull-right[ For big samples `\((n>20)\)`: On <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 
36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg>: Reject `\(H_0\)` if `\(p-value_{asymptotic} \leq \alpha\)` ] ] .panel[.panel-name[R Code] <div class="pre-name">wilcoxon_paired.R</div>

``` r
ds <- readr::read_csv(trimws('https://ndownloader.figshare.com/files/22299075 ')) #load the dataset
ds_small <- ds[complete.cases(ds[,c("OLBI3","OLBI12")]),][1:10,] #example: ordered dv
ds_small[,c("OLBI3","OLBI12")] <- lapply(X = ds_small[,c("OLBI3","OLBI12")], ordered, levels = 1:5,
                                         labels = c("Strongly disagree","Disagree","Neutral","Agree","Strongly agree"))
ds_small[,c("OLBI3","OLBI12")] <- lapply(ds_small[,c("OLBI3","OLBI12")], as.numeric) #transform to numeric
*exactRankTests::wilcox.exact(ds_small$OLBI3, ds_small$OLBI12, paired = TRUE, exact = TRUE)
coin::wilcoxsign_test(ds_small$OLBI3 ~ ds_small$OLBI12, paired = TRUE, distribution = "asymptotic") #obtain z
```

.pull-left[
`paired` argument can be:

* `TRUE` for a [Wilcoxon test (paired samples)](#wilcoxonpaired); or
* `FALSE` for a [Wilcoxon-Mann-Whitney test](#wmw).
]

.pull-right[
`exact` argument can be:

* `TRUE` for an exact _p_-value (corrected for ties); or
* `FALSE` for an asymptotic _p_-value.
]
<br> .font60[<sup>⚠️</sup> <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> The `wilcox.test` function (with the argument `paired = TRUE`) from the `stats` package (R Core Team, 2022) computes an exact _p_-value only for small samples without ties or zero differences; otherwise it falls back to a normal approximation. The `wilcox.exact` function from the `exactRankTests` package (Hothorn and Hornik, 2022) provides an exact _p_-value that accounts for ties. <sup>🤓</sup> Your dependent variable might be ordinal; however, you have to provide it to <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> as `numeric`.] 
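
The signed-rank statistic `\(T^+\)` and its normal approximation from the Test Statistic panel can be reproduced step by step. This is a minimal sketch, assuming hypothetical toy scores (not the slides' dataset) and using `stats::wilcox.test` with `exact = FALSE` and `correct = FALSE` only as a cross-check of the hand computation.

``` r
# Toy paired scores (hypothetical values, not taken from the slides' dataset)
x1 <- c(2, 3, 1, 4, 2, 5, 3, 2, 1, 3)
x2 <- c(4, 3, 3, 5, 4, 4, 5, 4, 2, 5)

D <- x1 - x2
D <- D[D != 0]                # zero differences are ignored
n <- length(D)

r      <- rank(abs(D), ties.method = "average")  # mid-ranks of |D_i|
T_plus <- sum(r[D > 0])       # sum of the ranks with a positive sign

# Expected value and tie-corrected variance of T
t_i   <- table(abs(D))        # size of each group of ties
E_T   <- n * (n + 1) / 4
VAR_T <- n * (n + 1) * (2 * n + 1) / 24 - sum(t_i^3 - t_i) / 48
Z     <- (T_plus - E_T) / sqrt(VAR_T)
p_asymptotic <- 2 * pnorm(-abs(Z))   # two-tailed

# Cross-check against the asymptotic test without continuity correction
wt <- wilcox.test(x1, x2, paired = TRUE, exact = FALSE, correct = FALSE)
c(T_plus = T_plus, V = unname(wt$statistic), Z = Z, p = p_asymptotic, p_R = wt$p.value)
```

If the hand computation is right, `T_plus` equals the `V` reported by R and the two _p_-values coincide.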
] .panel[.panel-name[Output] ``` ## *## Exact Wilcoxon signed rank test ## ## data: ds_small$OLBI3 and ds_small$OLBI12 *## V = 2.5, p-value = 0.03906 ## alternative hypothesis: true mu is not equal to 0 ## ## *## Asymptotic Wilcoxon-Pratt Signed-Rank Test ## ## data: y by x (pos, neg) ## stratified by block *## Z = -2.2265, p-value = 0.02598 ## alternative hypothesis: true mu is not equal to 0 ``` ] .panel[.panel-name[Effect Size] .pull-left[ .center[ <br> <br> <br> <br> `\(r=\frac{z}{\sqrt{n}}\)` ] ] .pull-right[ <table> <caption>How to classify? ⚠️</caption> <thead> <tr> <th style="text-align:left;"> Effect Size </th> <th style="text-align:left;"> \( r \) 📔 </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Small </td> <td style="text-align:left;"> \( \lbrack .1;.3 \lbrack \) </td> </tr> <tr> <td style="text-align:left;"> Medium </td> <td style="text-align:left;"> \( \lbrack .3;.5 \lbrack \) </td> </tr> <tr> <td style="text-align:left;"> Large </td> <td style="text-align:left;"> \( \lbrack .5;1 \rbrack \) </td> </tr> </tbody> <tfoot> <tr> <td style = 'padding: 0; border:0;' colspan='100%'><sup></sup> ⚠️ No precise definitions, it is always context dependent.</td> </tr> </tfoot> </table> .font60[<sup>📔</sup> Effect size `\(r\)` (Coolican, 2018) interpretation suggested by Cohen (1988). ] ] ] .panel[.panel-name[R Code] <div class="pre-name">wilcoxon_paired.R</div> ``` r library(rstatix) ds_long <- gather(ds_small, "OLBI3", "OLBI12", key = item, value = score) #transform in long format wilcox_effsize(data = ds_long, formula = score ~ item, paired = T)#rstatix classifies the effect size ``` ] .panel[.panel-name[Output] ``` ## # A tibble: 1 × 7 ## .y. 
group1 group2 effsize n1 n2 magnitude ## * <chr> <chr> <chr> <dbl> <int> <int> <ord> *## 1 score OLBI12 OLBI3 0.704 10 10 large ``` ] .panel[.panel-name[Statistical Analysis] .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[ A Wilcoxon test (paired samples) (Wilcoxon, 1945) was conducted to test whether the distributions of the items "OLBI 3" and "OLBI 12" are equal (Marôco, 2021). To conduct the Wilcoxon test (paired samples), the _exactRankTests_ (Hothorn and Hornik, 2022) and _rstatix_ (Kassambara, 2021) packages were used. All statistical analyses were conducted with the statistical programming language _R_ (R Core Team, 2022) via the integrated development environment _RStudio_ (RStudio Team, 2022). A significance level of `\(\alpha = .05\)` was considered for all statistical analyses. ]] .panel[.panel-name[Results] .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt1[ The two items presented statistically significantly different distributions `\((n=10;T^+=2.5; p= .039; r = 0.704)\)`. The effect size was large. .center[ <img src="data:image/png;base64,#slides8of9_files/figure-html/plot_boxplot_wilcoxon_paired-1.png" width="50%" height="99%" /> ] .tr[📚 Marôco, J. (2021). _Análise estatística com o SPSS statistics_ (8<sup>th</sup> ed.). ReportNumber.]]] ] --- name: kruskalwallis class: inverse, center, middle # Kruskal-Wallis' test <html><div style='float:left'></div><hr color='#EB811B' size=1px width=800px></html> --- # Kruskal-Wallis' test .panelset[ .panel[.panel-name[Assumptions] 1. Ordinal or quantitative<sup>💡</sup> dependent variable 2. Independent samples 3. .orange[Dependent variable equally distributed] <br> <br> <br> <br> <br> .footnote[<sup>💡</sup>The Kruskal-Wallis test is a non-parametric alternative to the One-Way ANOVA test. If both the normality and the homoscedasticity assumptions fail, it is preferable to use the Kruskal-Wallis test. However, if only the homoscedasticity assumption fails, Welch's ANOVA is the preferable alternative. 
<sup>🤓</sup> This test is also called the _Kruskal-Wallis' ANOVA_ or _Kruskal–Wallis test by ranks_ or _Kruskal–Wallis H_.] ] .panel[.panel-name[Hypotheses] `\(H_0: F_1(Y) = F_2(Y)=...=F_k(Y)\)` vs. `\(H_1: \exists{i,j}: F_i(Y) \neq F_j(Y); i\neq j; i,j=1,2,...,k\)` (two-tailed test) <br> <br> or<sup>🤔</sup> .orange[`\\(H_0: \\theta_1(Y) = \\theta_2(Y)=...=\\theta_k(Y)\\)` vs. `\\(H_1: \\exists{i,j}: \\theta_i(Y) \\neq \\theta_j(Y); i\\neq j; i,j=1,2,...,k\\)`] (two-tailed test) <br> <br> <br> .footnote[<sup>🤔</sup> If the 3<sup>rd</sup> assumption is valid (i.e., identical population distribution in all groups).] ] .panel[.panel-name[Test Statistic] .pull-left[1. Rank all `\((N)\)` observations in ascending order.<sup>⚠️</sup> 2. Sum the ranks per sample `\((R_j=\sum\limits_{i=1}^{n_j}r_{ij})\)` `\(H= \frac{\frac{12}{N(N+1)}\sum\limits^k_{j=1}\frac{R^2_j}{n_j}-3(N+1)}{1-\frac{\sum\limits_{i=1}^g (t^3_i-t_i)}{N^3-N}}\)` or<sup>💡</sup> `\(H=df_T \frac{SQF_r}{SQT_r}\)` .font60[<sup>💡</sup> If a one-way ANOVA is conducted on the ranks `\((r)\)`. <sup>⚠️</sup> Lehmann (1975) suggested correcting the `\(H\)` statistic in the presence of ties. In this case the ranks are assigned with the `average` method, also known as mid-ranks (i.e., the mean of the ranks that the observations would have if they were not tied).] 
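
Both forms of `\(H\)` can be checked numerically. This is a minimal sketch, assuming hypothetical toy scores in three groups (not the slides' dataset); `SQF_r` and `SQT_r` are computed directly as the between-group and total sums of squares of the ranks, and `stats::kruskal.test` is used only as a cross-check.

``` r
# Toy data (hypothetical values): an ordinal score observed in k = 3 groups
y <- c(1, 2, 2, 3,  2, 3, 4, 4,  3, 4, 5, 5)
g <- factor(rep(c("a", "b", "c"), each = 4))
N <- length(y)

r   <- rank(y, ties.method = "average")  # step 1: mid-ranks of all N observations
R_j <- tapply(r, g, sum)                 # step 2: sum of ranks per group
n_j <- tapply(r, g, length)              # group sizes

# Tie-corrected H
t_i <- table(y)                          # size of each group of ties
H   <- (12 / (N * (N + 1)) * sum(R_j^2 / n_j) - 3 * (N + 1)) /
  (1 - sum(t_i^3 - t_i) / (N^3 - N))

# Equivalent form: df_T * SQF_r / SQT_r from the sums of squares of the ranks
SQT_r <- sum((r - mean(r))^2)
SQF_r <- sum(n_j * (tapply(r, g, mean) - mean(r))^2)
H_aov <- (N - 1) * SQF_r / SQT_r

# Cross-check against stats::kruskal.test
kw <- kruskal.test(y, g)
c(H = H, H_aov = H_aov, H_R = unname(kw$statistic))
```

If the hand computation is right, the three values agree.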
] .pull-right[ `\(k\)` — number of groups `\(n_1\)` — sample size of the 1<sup>st</sup> group (by the hypotheses order) `\(n_2\)` — sample size of the 2<sup>nd</sup> group (by the hypotheses order) `\(n_k\)` — sample size of the k<sup>th</sup> group (by the hypotheses order) `\(N=n_1+n_2+...+n_k\)` — total sample size `\(g\)` — number of groups of ties `\(t_i\)` — number of ties in `\(i\)` group ] ] .panel[.panel-name[Decision] .pull-left[ On <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg>: Reject `\(H_0\)` if `\(p-value_{approximate} \leq \alpha\)` ] .pull-right[ On <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg>: Reject `\(H_0\)` if 
`\(p-value_{asymptotic} \leq \alpha\)` ]

If the null hypothesis is rejected (and more than two groups were compared), at least one pair of groups has significantly different distributions. To identify which groups differ, .orange[Post-Hoc] tests are needed (see [Dwass-Steel-Critchlow-Fligner's test](#DwassSteelCrichtlowFligner)).
]
.panel[.panel-name[R Code]
<div class="pre-name">kw.R</div>
``` r
ds <- readr::read_csv('https://ndownloader.figshare.com/files/22299075') #load the dataset
ds_small <- dplyr::filter(ds, Academic_level %in% c("Unfinished graduation", "Graduation", "Master"))
set.seed(1416) #to reproduce the desired behavior
ds_small <- sampling::getdata(ds_small, sampling::strata(ds_small, "Academic_level", size = c(7, 6, 4), method = "srswor")) #simple random sampling without replacement
ds_small$Socioeconomic_status <- ordered(ds_small$Socioeconomic_status, levels = c("B2", "B1", "A2", "A1"))
ds_small$Socioeconomic_status <- as.numeric(ds_small$Socioeconomic_status) #transform to numeric
*coin::kruskal_test(Socioeconomic_status ~ as.factor(Academic_level), data = ds_small, distribution = coin::approximate(nresample = 10000))
```
.font60[The `distribution` argument can be:

- `approximate(nresample = N)` (from the `coin` package), which uses the Monte Carlo method (Hothorn, Hornik, van de Wiel, and Zeileis, 2008): instead of evaluating all permutations/combinations, `\(N\)` of them are drawn at random and the test statistic is computed for each of those samples. The estimate converges to the exact _p_-value at a rate of `\(1/\sqrt{N}\)`; or
- `"asymptotic"` (default) for the _p_-value from `\(H \overset{a}{\sim} \mathcal{\chi^2}_{(k-1)}\)` and also for obtaining the **degrees of freedom**.]
.font60[<sup>⚠️</sup> <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> The `kruskal.test` function from the `stats` package (R Core Team, 2022) only estimates the asymptotic _p_-value. The `coin` package (Hothorn Hornik et al., 2008) comes with the `kruskal_test` function which provides the approximative _p_-value (works **even for small samples**). <sup>🤓</sup> your dependent variable might be ordinal, however you have to provide it to <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> as `numeric`.] 
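To make the idea behind `approximate(nresample = N)` more concrete, here is a minimal base-R sketch of a Monte Carlo permutation _p_-value for the Kruskal-Wallis statistic. It is not the `coin` implementation: `mc_kw_pvalue` is a hypothetical helper written for this slide and the data are invented.

```r
# Hypothetical helper: Monte Carlo permutation p-value for the Kruskal-Wallis H
mc_kw_pvalue <- function(y, g, nresample = 10000) {
  h_obs  <- unname(kruskal.test(y, g)$statistic)   # observed H
  h_perm <- replicate(
    nresample,
    unname(kruskal.test(sample(y), g)$statistic)   # H after shuffling the scores across groups
  )
  mean(h_perm >= h_obs - 1e-12)                    # share of resamples at least as extreme
}

set.seed(1416)
y <- c(1, 2, 2, 3, 3, 4, 2, 3, 4, 4, 1, 1, 2, 3)   # invented ordinal scores
g <- factor(rep(c("A", "B", "C"), times = c(5, 5, 4)))

n_mc  <- 2000
p_mc  <- mc_kw_pvalue(y, g, nresample = n_mc)
p_asy <- kruskal.test(y, g)$p.value
c(monte_carlo = p_mc, asymptotic = p_asy)

# Approximate Monte Carlo standard error of p_mc: it shrinks like 1/sqrt(n_mc)
sqrt(p_mc * (1 - p_mc) / n_mc)
```

With small samples the two _p_-values can differ noticeably, which is the reason the slides prefer the approximate (Monte Carlo) distribution over the asymptotic one.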
]
.panel[.panel-name[Output]
```
##
*## Approximative Kruskal-Wallis Test
##
## data: Socioeconomic_status by
## as.factor(Academic_level) (Graduation, Master, Unfinished graduation)
*## chi-squared = 8.1757, p-value = 0.0111
```
]
.panel[.panel-name[Effect Size]
.pull-left[
.center[
<br> <br> <br>

`\(\eta^2_H=\frac{H-k+1}{N-k}\)`

or

`\(\epsilon^2_R=\frac{H}{\frac{N^2-1}{N+1}}\)`
]
]
.pull-right[
<br>
<table> <caption>How to classify? ⚠️</caption> <thead> <tr> <th style="text-align:left;"> Effect Size </th> <th style="text-align:left;"> \(\eta^2_H\) 📔 📜 </th> <th style="text-align:left;"> \(\epsilon^2_R\) 📔 📜 </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Small </td> <td style="text-align:left;"> \( .01 \) </td> <td style="text-align:left;"> \( \lbrack .01;.08 \lbrack \) </td> </tr> <tr> <td style="text-align:left;"> Medium </td> <td style="text-align:left;"> \( .06 \) </td> <td style="text-align:left;"> \( \lbrack .08;.26 \lbrack \) </td> </tr> <tr> <td style="text-align:left;"> Large </td> <td style="text-align:left;"> \( .14 \) </td> <td style="text-align:left;"> \( \lbrack .26;1 \rbrack \) </td> </tr> </tbody> <tfoot> <tr> <td style = 'padding: 0; border:0;' colspan='100%'><sup></sup> ⚠️ No precise definitions, it is always context dependent.</td> </tr> </tfoot> </table>

.font60[<sup>📔📜</sup> The effect sizes `\(\eta^2_H\)` (King, Rosopa, and Minium, 2018; Tomczak and Tomczak, 2014) and `\(\epsilon^2_R\)` (Cohen, 2013; Tomczak and Tomczak, 2014) are interpreted similarly to the guidelines presented in Cohen (1988). ]
]
]
.panel[.panel-name[R Code]
<div class="pre-name">kw.R</div>
``` r
library(rstatix)
kruskal_effsize(data = ds_small, Socioeconomic_status ~ as.factor(Academic_level)) #rstatix classifies the effect size
rcompanion::epsilonSquared(x = ds_small$Socioeconomic_status, g = as.factor(ds_small$Academic_level))
```
<sup>⚠️</sup> Set the independent variable as a factor.
]
.panel[.panel-name[Output]
```
## # A tibble: 1 × 5
## .y.
n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
*## 1 Socioeconomic_status 17 0.441 eta2[H] large
## epsilon.squared
*## 0.511
```
]
.panel[.panel-name[Statistical Analysis]
.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt3[A Kruskal-Wallis test (Kruskal and Wallis, 1952) was conducted to test whether the distributions of the socioeconomic status are equal among academic levels (i.e., Graduation; Master; Unfinished graduation) (Marôco, 2021). To conduct the Kruskal-Wallis test, the _coin_ (Hothorn et al., 2008) and the _rstatix_ (Kassambara, 2021) packages were used. Approximate _p_-values were obtained via the Monte Carlo method with `\(10000\)` iterations (Hothorn et al., 2008). All statistical analyses were conducted with the statistical programming language _R_ (R Core Team, 2022) via the integrated development environment _RStudio_ (RStudio Team, 2022). An `\(\alpha = .05\)` was considered for all statistical analyses.
]]
.panel[.panel-name[Results]
.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt1[At least one pair of groups presented statistically significantly different distributions `\((n_{\text{Graduation}}=7; n_{\text{Master}}=6;\)` `\(n_{\text{Unfinished graduation}}=4; H_{(2)}=8.176; p= .011; \eta^2_H = 0.441)\)`. The effect size was large.

.center[
<img src="data:image/png;base64,#slides8of9_files/figure-html/plot_boxplot_kw-1.png" width="40%" height="99%" />
]

.tr[📚 Marôco, J. (2021). _Análise estatística com o SPSS statistics_ (8<sup>th</sup> ed.).
ReportNumber.]]]
]
---
class: inverse, center, middle

# Post-Hoc
<html><div style='float:left'></div><hr color='#EB811B' size=1px width=800px></html>
---
class: inverse, center, middle

# Dwass-Steel-Critchlow-Fligner's test
<html><div style='float:left'></div><hr color='#EB811B' size=1px width=800px></html>
---
name: DwassSteelCrichtlowFligner

# Dwass-Steel-Critchlow-Fligner's test

.panelset[
.panel[.panel-name[Assumptions]
**Dwass-Steel-Critchlow-Fligner's test** (Dwass, 1960; Steel, 1960; Critchlow and Fligner, 1991) has the same assumptions as [Kruskal-Wallis' test](#kruskalwallis); in fact, .orange[it should only be used if Kruskal-Wallis' `\\(H_0\\)` is rejected.]

The Dwass-Steel-Critchlow-Fligner test<sup>💡</sup> has a specific approach to controlling the familywise error rate (FWER) built in. Unlike procedures that reuse the joint ranking and the pooled variance of the Kruskal-Wallis test, it re-ranks the observations separately within each pair of groups.

<br> <br> <br> <br>

.footnote[<sup>💡</sup>The Dwass-Steel-Critchlow-Fligner test is basically an extension of the U-test, as re-ranking is conducted for each pairwise test.]
]
.panel[.panel-name[Hypotheses]
<br> <br> <br>

`\(H_0: F_i(Y) = F_j(Y)\)` vs. `\(H_1: F_i(Y) \neq F_j(Y); i\neq j; i,j=1,...,k\)` (two-tailed)

<br> <br>

or<sup>🤔</sup> .orange[`\\(H_0: \\theta_i(Y) = \\theta_j(Y)\\)` vs. `\\(H_1: \\theta_i(Y) \\neq \\theta_j(Y); i\\neq j; i,j=1,2,...,k\\)`] (two-tailed test)

<br>

.footnote[<sup>🤔</sup> If the assumption of **identical population distribution in all groups** is valid.]
]
.panel[.panel-name[Test Statistic]
A `\(W\)` test statistic should be calculated for each of the `\(\frac{k(k - 1)}{2}\)` different pairs `\(i\)` and `\(j\)` of the factor under study.
For each pair of treatments `\((i,j)\)`, let:

<br>

.pull-left[
Rank all `\((n_i+n_j)\)` observations in ascending order.<sup>⚠️</sup>

<sup>💡</sup> `\(W_{ij}=\sum\limits_{b=1}^{n_j} r_{jb}\)`, for `\(1 \leq i < j \leq k\)`

<sup>🤓</sup> `\(W^*_{ij}=\frac{W_{ij}-\mathbb{E}(W_{ij})}{\sqrt{\mathbb{VAR}(W_{ij})}}\sim \mathcal{W_{n_1,n_2,...,n_k}}\)`

.font60[<sup>💡</sup> `\(W_{ij}\)`: the Wilcoxon rank sum of the `\(j^{th}\)` sample ranks in the joint two-sample ranking of the `\(i^{th}\)` and `\(j^{th}\)` sample observations.

<sup>🤓</sup> `\(W^*_{ij}\)` is the standardized (under `\(H_0\)`) version of `\(W_{ij}\)` multiplied by `\(\sqrt{2}\)` (Hollander, Wolfe, and Chicken, 2014); this factor is why the variance term is divided by 24 rather than 12.

<sup>⚠️</sup> If there are ties, use the `average` method (i.e., the mean of the ranks that the observations would have if they were not tied).]]
.pull-right[
`\(\mathbb{E}(W_{ij})=\frac{n_j(n_i+n_j+1)}{2}\)`

`\(\mathbb{VAR}(W_{ij})=\frac{n_i n_j}{24} \left[n_i +n_j+1 - \frac{\sum\limits^{g_{ij}}_{I=1}(t_I-1)t_I(t_I+1)}{(n_i+n_j)(n_i+n_j-1)} \right]\)`

`\(n_{i}\)`: sample size of group `\(i\)`

`\(n_{j}\)`: sample size of group `\(j\)`

`\(g_{ij}\)`: number of groups of ties

`\(t_I\)`: number of ties in tie group `\(I\)`
]
]
.panel[.panel-name[Decision]
.pull-left[ On <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg>: Reject `\(H_0\)` if
`\(p-value_{exact} \leq \alpha\)` ] .pull-right[ On <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg>: Reject `\(H_0\)` if `\(p-value_{approximate} \leq \alpha\)` ] <br> <br> .center[ Concluding that the pair of distributions `\(i\)` and `\(j\)` are different. ] <br> <br> <br> <br> .footnote[<sup>💡</sup>The "Asymptotic" _p_-value can be also used.] 
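The pairwise statistic described in the Test Statistic panel can be sketched in base R. `dscf_stat` is a hypothetical helper, not part of `NSM3`; it assumes the tie-corrected standardization of Hollander, Wolfe, and Chicken (2014), in which the variance term uses `\(n_i n_j / 24\)` so that the factor `\(\sqrt{2}\)` is already included. The data are invented.

```r
# Hypothetical helper: standardized DSCF statistic W*_ij for one pair of groups
dscf_stat <- function(y_i, y_j) {
  n_i <- length(y_i)
  n_j <- length(y_j)
  r   <- rank(c(y_i, y_j), ties.method = "average")  # re-rank this pair only (mid-ranks for ties)
  W   <- sum(r[(n_i + 1):(n_i + n_j)])               # rank sum of sample j
  E   <- n_j * (n_i + n_j + 1) / 2                   # expectation of W under H0
  t_I <- as.numeric(table(c(y_i, y_j)))              # sizes of the groups of ties
  V   <- n_i * n_j / 24 *
    (n_i + n_j + 1 -
       sum((t_I - 1) * t_I * (t_I + 1)) / ((n_i + n_j) * (n_i + n_j - 1)))
  (W - E) / sqrt(V)
}

a <- c(1, 2, 2, 3)       # invented scores, group i
b <- c(3, 4, 4, 4, 2)    # invented scores, group j
dscf_stat(a, b)          # positive when group j tends to have the larger ranks
```

Swapping the two groups only flips the sign of the statistic, so each of the `\(\frac{k(k-1)}{2}\)` pairs needs to be evaluated once.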
]
.panel[.panel-name[R Code]
<div class="pre-name">dscf.R</div>
``` r
ds <- readr::read_csv(trimws('https://ndownloader.figshare.com/files/22299075 ')) #load the dataset
ds_small <- dplyr::filter(ds, Academic_level %in% c("Unfinished graduation","Graduation","Master"))
set.seed(1416) #to reproduce the desired behavior
ds_small$Academic_level <- ordered(ds_small$Academic_level, levels = c("Unfinished graduation","Graduation","Master"))
ds_small <- sampling::getdata(ds_small, sampling::strata(ds_small, "Academic_level", size = c(7, 6, 4), method = "srswor")) #simple random sampling without replacement
ds_small$Socioeconomic_status <- ordered(ds_small$Socioeconomic_status, levels = c("B2", "B1", "A2", "A1"))
ds_small$Socioeconomic_status <- as.numeric(ds_small$Socioeconomic_status) #transform to numeric
#note: Academic_level is an ordered factor here, so coin reports a linear-by-linear association test (see Output)
coin::kruskal_test(Socioeconomic_status ~ as.factor(Academic_level), data = ds_small, distribution = coin::approximate(nresample = 10000)) #H0 is rejected, the DSCF's test can be done
library(NSM3)
*pSDCFlig(x = ds_small$Socioeconomic_status, g = as.numeric(as.factor(ds_small$Academic_level)), method = "Monte Carlo")
```
.font60[<sup>⚠️</sup> `method` can be either `"Exact"`, `"Monte Carlo"`, or `"Asymptotic"`, indicating the desired distribution. When `method=NA`, `"Exact"` will be used if the number of permutations is `\(10000\)` or less. Otherwise, `"Monte Carlo"` will be used. To increase the number of iterations when using the `"Monte Carlo"` method, set the argument `n.mc` to the desired number (e.g., `n.mc = 100000`). When setting the desired distribution, be careful to capitalize the first letter: `Exact` (✅) is not the same as `exact` (🛑).

<sup>🧙♂️</sup> The `"Exact"` method is more precise; however, `"Monte Carlo"` is faster and provides fairly close results.
] ]
.panel[.panel-name[Output]
```
##
## Approximative Linear-by-Linear Association Test
##
## data: Socioeconomic_status by
## as.factor(Academic_level) (Unfinished graduation < Graduation < Master)
## Z = 2.4706, p-value = 0.0091
## alternative hypothesis: two.sided
##
## Ties are present, so p-values are based on conditional null distribution.
## Group sizes: 4 7 6
*## Using the Monte Carlo (with 10000 Iterations) method:
##
*## For treatments 1 - 2, the Dwass, Steel, Critchlow-Fligner W Statistic is 0.
*## The smallest experimentwise error rate leading to rejection is 1 .
##
*## For treatments 1 - 3, the Dwass, Steel, Critchlow-Fligner W Statistic is 3.2388.
*## The smallest experimentwise error rate leading to rejection is 0.0443 .
##
*## For treatments 2 - 3, the Dwass, Steel, Critchlow-Fligner W Statistic is 3.646.
*## The smallest experimentwise error rate leading to rejection is 0.0098 .
##
```
]
.panel[.panel-name[Statistical Analysis]
.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[
As a post-hoc test following the Kruskal-Wallis test (independent samples), the Dwass-Steel-Critchlow-Fligner test (Dwass, 1960; Steel, 1960; Critchlow and Fligner, 1991) was used to identify which pairs presented statistically significant differences between their distribution functions (Hollander et al., 2014). To conduct the Dwass-Steel-Critchlow-Fligner test, the _NSM3_ package (Schneider, Chicken, and Becvarik, 2021) was used. Approximate _p_-values were obtained via the Monte Carlo method with `\(10000\)` iterations (Schneider et al., 2021). All statistical analyses were conducted with the statistical programming language _R_ (R Core Team, 2022) via the integrated development environment _RStudio_ (RStudio Team, 2022). An `\(\alpha = .05\)` was considered for all statistical analyses.
]]
.panel[.panel-name[Results]
.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt0[
After rejecting the Kruskal-Wallis `\(H_0\)`, the post-hoc results revealed statistically significant differences between the distribution functions of the academic level groups _Graduation_ vs. _Master_ `\((w= 3.646 ; p = .010)\)`, and between the groups _Master_ vs. _Unfinished graduation_ `\((w= 3.239 ; p = .044)\)`. However, no statistically significant differences were observed between the groups _Graduation_ vs. _Unfinished graduation_ `\((w= 0 ; p > .999)\)`.

.center[
<img src="data:image/png;base64,#slides8of9_files/figure-html/plot_boxplot_dscf-1.png" width="40%" height="99%" />
]

.tr[📚 Hollander, M., Wolfe, D. A., & Chicken, E. (2014). _Nonparametric statistical methods_ (3rd ed.). John Wiley & Sons.]]]]
---
name: friedman
class: inverse, center, middle

# Friedman's test
<html><div style='float:left'></div><hr color='#EB811B' size=1px width=800px></html>
---
# Friedman's test

.panelset[
.panel[.panel-name[Assumptions]

1. Ordinal or quantitative<sup>💡</sup> dependent variable
2. Paired samples
3. .orange[Dependent variable equally distributed]

<br> <br> <br> <br> <br>

.footnote[<sup>💡</sup>Friedman's test is a non-parametric alternative to the Repeated-Measures ANOVA. If the normality assumption fails, it is preferable to use Friedman's test. If only the sphericity assumption fails, it is preferable to use a MANOVA.

<sup>🤓</sup> This test is also called _Friedman's ANOVA_.]
]
.panel[.panel-name[Hypotheses]

`\(H_0: F_1(Y) = F_2(Y)=...=F_k(Y)\)` vs. `\(H_1: \exists{i,j}: F_i(Y) \neq F_j(Y); i\neq j; i,j=1,2,...,k\)` (two-tailed test)

<br> <br>

or<sup>🤔</sup> .orange[`\\(H_0: \\theta_1(Y) = \\theta_2(Y)=...=\\theta_k(Y)\\)` vs.
`\\(H_1: \\exists{i,j}: \\theta_i(Y) \\neq \\theta_j(Y); i\\neq j; i,j=1,2,...,k\\)`] (two-tailed test)

<br> <br> <br>

.footnote[<sup>🤔</sup> If the 3<sup>rd</sup> assumption is valid (i.e., identical population distribution in all groups).]
]
.panel[.panel-name[Test Statistic]
.pull-left[
1. Rank the observations (in ascending order) within each block `\(i\)`.<sup>⚠️</sup>
2. Sum the ranks for each group `\(j\)` `\((R_j=\sum\limits_{i=1}^{b}r_{ij})\)`

`\(S= \frac{\frac{12}{bk(k+1)}\sum\limits^k_{j=1}R^2_j-3b(k+1)}{1-\frac{\sum\limits_{i=1}^b \sum\limits_{j=1}^{g_i} (t^3_{ij}-t_{ij})}{bk^3-bk}}\)`

or<sup>💡</sup>

`\(S=\frac{SSF_r}{\frac{SSF_r+SSE_r}{df_F+df_E}}=\frac{SSF_r}{MST_r}\)`<sup>🧙♂️</sup>
]
.pull-right[
`\(k\)`: number of groups

`\(b\)`: number of blocks

`\(g_{i}\)`: number of groups of ties in block `\(i\)`

`\(t_{ij}\)`: number of ties in the `\(j^{th}\)` tie group of block `\(i\)`

<br> <br>

.font60[<sup>🧙♂️</sup> There is no between-subjects variability (i.e., `\(SSB=0\)`).

<sup>💡</sup> if a Repeated-Measures ANOVA is conducted using the ranks `\((r)\)`.

<sup>⚠️</sup> Marascuilo and McSweeney (1977) suggested correcting the `\(S\)` statistic in the presence of ties. In this case the `average` method, also known as mid-ranks (i.e., the mean of the ranks that the observations would have if they were not tied), is used. The correction matters most when `\(H_0\)` is not rejected with the uncorrected `\(S\)` statistic, because correcting for ties can only increase `\(S\)`.]
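The ranking steps and the tie-corrected `\(S\)` can be sketched in base R. The matrix `m` is invented toy data (blocks in rows, groups in columns), and the result is compared with `stats::friedman.test`.

```r
# Invented toy data: b = 5 blocks (rows) measured on k = 3 items (columns)
m <- matrix(c(3, 4, 4,
              2, 2, 5,
              1, 3, 4,
              4, 4, 4,
              2, 5, 3), ncol = 3, byrow = TRUE)
b <- nrow(m)
k <- ncol(m)

r   <- t(apply(m, 1, rank, ties.method = "average"))  # step 1: mid-ranks within each block
R_j <- colSums(r)                                     # step 2: rank sum per group

# Uncorrected S, then the correction for ties in the denominator
S_raw <- 12 / (b * k * (k + 1)) * sum(R_j^2) - 3 * b * (k + 1)
ties  <- unlist(lapply(seq_len(b), function(i) as.numeric(table(m[i, ]))))  # tie-group sizes per block
S     <- S_raw / (1 - sum(ties^3 - ties) / (b * k^3 - b * k))

c(uncorrected = S_raw, corrected = S,
  friedman.test = unname(friedman.test(m)$statistic))
```

In this toy example the corrected value is larger than the uncorrected one, which illustrates why the correction is relevant when the uncorrected statistic falls just short of rejection.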
] ] .panel[.panel-name[Decision] .pull-left[ On <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg>: Reject `\(H_0\)` if `\(p-value_{approximate} \leq \alpha\)` ] .pull-right[ On <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg>: Reject `\(H_0\)` if `\(p-value_{asymptotic} \leq \alpha\)` ] If the null hypothesis is rejected (and more than two groups were compared) it means that there are at least a couple of significantly different distributions (i.e., groups/treatments ). 
To identify which groups differ, .orange[Post-Hoc] tests are needed (see [Wilcoxon-Nemenyi-McDonald-Thompson's test](#WilcoxonNemenyiMcDonaldThompson)).
]
.panel[.panel-name[R Code]
<div class="pre-name">friedman.R</div>
``` r
ds <- readr::read_csv(trimws('https://ndownloader.figshare.com/files/22299075 ')) #load the dataset
ds_small <- ds[complete.cases(ds[,c("UWES1","UWES3","UWES6")]),][1:10,] #example: ordered dv
ds_small[,c("UWES1","UWES3","UWES6")] <- lapply(X = ds_small[,c("UWES1","UWES3","UWES6")], ordered, levels=0:6,labels = c("Never","Almost never","Rarely","Sometimes","Often","Very often", "Always"))
ds_small[,c("UWES1","UWES3","UWES6")] <- lapply(ds_small[,c("UWES1","UWES3","UWES6")], as.numeric) #as numeric
ds_small$id <- 1:nrow(ds_small) #unique id per block
ds_long <- tidyr::gather(ds_small,"UWES1","UWES3","UWES6", key = item, value = score)
*coin::friedman_test(score~as.factor(item), data = ds_long, distribution = coin::approximate(nresample = 10000))
```
.font60[The `distribution` argument can be:

- `approximate(nresample = N)` (from the `coin` package), which uses the Monte Carlo method (Hothorn et al., 2008): instead of evaluating all permutations/combinations, `\(N\)` of them are drawn at random and the test statistic is computed for each of those samples. The estimate converges to the exact _p_-value at a rate of `\(1/\sqrt{N}\)`; or
- `"asymptotic"` (default) for the _p_-value from `\(S \overset{a}{\sim} \mathcal{\chi^2}_{(k-1)}\)` and also for obtaining the **degrees of freedom**.]
.font60[<sup>⚠️</sup> <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> The `friedman.test` function from the `stats` package (R Core Team, 2022) only estimates the asymptotic _p_-value. The `coin` package (Hothorn Hornik et al., 2008) comes with the `friedman_test` function which provides the approximative _p_-value (works **even for small samples**). <sup>🤓</sup> your dependent variable might be ordinal, however you have to provide it to <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> as `numeric`.] 
] .panel[.panel-name[Output] ``` ## *## Approximative Friedman Test ## ## data: score by ## as.factor(item) (UWES1, UWES3, UWES6) ## stratified by block *## chi-squared = 7.28, p-value = 0.0248 ``` ] .panel[.panel-name[Effect Size] .pull-left[ .center[ <br> <br> <br> <br> <sup>🤔</sup> `\(w=\frac{S}{b(k-1)}\)` ] .font60[ <sup>🤔</sup>The Kendall's `\(w\)` (Kendall and Smith, 1939) is also known as _Kendall’s coefficient_.] ] .pull-right[ <table class="table" style="font-size: 11px; color: black; margin-left: auto; margin-right: auto;"> <caption style="font-size: initial !important;">How to classify? ⚠️</caption> <thead> <tr> <th style="text-align:left;"> Effect Size </th> <th style="text-align:left;"> \( w \) 📜 </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Poor </td> <td style="text-align:left;"> \( \lbrack .00; .01 \rbrack \) </td> </tr> <tr> <td style="text-align:left;"> Slight </td> <td style="text-align:left;"> \( \lbrack .01; .20 \rbrack \) </td> </tr> <tr> <td style="text-align:left;"> Fair </td> <td style="text-align:left;"> \( \lbrack .20; .40 \rbrack \) </td> </tr> <tr> <td style="text-align:left;"> Moderate </td> <td style="text-align:left;"> \( \lbrack .40; .60 \rbrack \) </td> </tr> <tr> <td style="text-align:left;"> Substantial </td> <td style="text-align:left;"> \( \lbrack .60; .80 \rbrack \) </td> </tr> <tr> <td style="text-align:left;"> Almost Perfect </td> <td style="text-align:left;"> \( \lbrack .80; 1.00 \lbrack \) </td> </tr> </tbody> </table> <table class="table" style="font-size: 11px; color: black; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Effect Size </th> <th style="text-align:left;"> \( w \) 📔 📜 </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Small </td> <td style="text-align:left;"> \( .1 \) </td> </tr> <tr> <td style="text-align:left;"> Medium </td> <td style="text-align:left;"> \( .3 \) </td> </tr> <tr> <td style="text-align:left;"> Large </td> <td 
style="text-align:left;"> \( .5 \) </td> </tr> </tbody> <tfoot> <tr> <td style="padding: 0; border:0;" colspan="100%"> <sup></sup> ⚠️ No precise definitions, it is always context dependent.</td> </tr> </tfoot> </table>

.font60[<sup>📜</sup>Interpretation based on the Landis and Koch (1977) guidelines for agreement coefficients.

<sup>📔📜</sup>Interpretation based on the Cohen (1988) guidelines. ]
]
]
.panel[.panel-name[R Code]
<div class="pre-name">friedman.R</div>
``` r
library(rstatix)
friedman_effsize(formula = score~item | id, data = ds_long) #rstatix classifies the effect size
ds_kendall <- xtabs(score ~ item + id, data = ds_long)
DescTools::KendallW(x = ds_kendall, correct = TRUE) #also computable using the DescTools package
```
.font60[ Both functions can be used; however, if the `KendallW` function from the `DescTools` package (Signorell et al., 2022) is used, the argument `correct` must be set to `TRUE`, otherwise the ties correction won't be applied to `\(S\)` when computing `\(w\)`. ]
]
.panel[.panel-name[Output]
```
## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
*## 1 score 10 0.364 Kendall W moderate
*## [1] 0.364
```
]
.panel[.panel-name[Statistical Analysis]
.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt3[A Friedman's test (Friedman, 1937) was conducted to test whether the distributions of the scores are equal among different items (i.e., UWES1; UWES3; UWES6) (Marôco, 2021). To conduct the Friedman's test, the _coin_ (Hothorn et al., 2008) and the _rstatix_ (Kassambara, 2021) packages were used. Approximate _p_-values were obtained via the Monte Carlo method with `\(10000\)` iterations (Hothorn et al., 2008). All statistical analyses were conducted with the statistical programming language _R_ (R Core Team, 2022) via the integrated development environment _RStudio_ (RStudio Team, 2022). An `\(\alpha = .05\)` was considered for all statistical analyses.
]]
.panel[.panel-name[Results]
.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt1[At least one pair of groups presented statistically significantly different distributions `\((n_{\text{UWES1}}=10; n_{\text{UWES3}}=10;\)` `\(n_{\text{UWES6}}=10; S_{(2)}=7.28; p= .025; w = 0.364)\)`. The effect size was moderate.

.center[
<img src="data:image/png;base64,#slides8of9_files/figure-html/plot_boxplot_fr-1.png" width="40%" height="99%" />
]

.tr[📚 Marôco, J. (2021). _Análise estatística com o SPSS statistics_ (8<sup>th</sup> ed.). ReportNumber.]]]
]
---
class: inverse, center, middle

# Wilcoxon-Nemenyi-McDonald-Thompson's test
<html><div style='float:left'></div><hr color='#EB811B' size=1px width=800px></html>
---
name: WilcoxonNemenyiMcDonaldThompson

# Wilcoxon-Nemenyi-McDonald-Thompson's test

.panelset[
.panel[.panel-name[Assumptions]
**Wilcoxon-Nemenyi-McDonald-Thompson's test**<sup>🤓</sup> (McDonald and Thompson, 1967; Nemenyi, 1963; Hollander et al., 2014) has the same assumptions as [Friedman's test](#friedman); in fact, .orange[it should only be used if Friedman's `\\(H_0\\)` is rejected.]

<br> <br>

.font80[The Wilcoxon-Nemenyi-McDonald-Thompson's test<sup>💡</sup> uses the ranks from the Friedman's test and has a specific approach to controlling the familywise error rate (FWER) built in. This makes it more advantageous than the Conover-Iman's test (Conover and Iman, 1979; Conover, 1999), which does not correct for FWER (meaning that a correction should be applied, for example the Bonferroni correction). The Conover-Iman's test uses a _t_-distribution:

- `\(T=\frac{\bar R_i - \bar R_j}{\frac{1}{b}\sqrt{\frac{2 \left( b\sum_{i=1}^b \sum_{j=1}^k R^2_{ij} - \sum_{j=1}^k R^2_j \right)}{df}}} \sim t_{(df)}\)` with `\(df=(b-1)(k-1)\)`.

<sup>🤓</sup> It is also called the _Nemenyi test_, after Peter Nemenyi's (1963) Ph.D. dissertation at Princeton University.
]

<br> <br>

.footnote[<sup>💡</sup>The Wilcoxon-Nemenyi-McDonald-Thompson's test is basically an adaptation of Tukey's HSD test.]
]
.panel[.panel-name[Hypotheses]
<br> <br> <br>

`\(H_0: F_i(Y) = F_j(Y)\)` vs. `\(H_1: F_i(Y) \neq F_j(Y); i\neq j; i,j=1,...,k\)` (two-tailed)

<br> <br>

or<sup>🤔</sup> .orange[`\\(H_0: \\theta_i(Y) = \\theta_j(Y)\\)` vs. `\\(H_1: \\theta_i(Y) \\neq \\theta_j(Y); i\\neq j; i,j=1,2,...,k\\)`] (two-tailed test)

<br>

.footnote[<sup>🤔</sup> If the assumption of **identical population distribution in all groups** is valid.]
]
.panel[.panel-name[Test Statistic]
A `\(Q\)` test statistic should be calculated for each of the `\(\frac{k(k - 1)}{2}\)` different pairs `\(i\)` and `\(j\)` of the factor under study. For each pair of treatments `\((i,j)\)`, let:

<br>

.pull-left[
1. Rank the observations (in ascending order) within each block.<sup>⚠️</sup>
2. Sum the ranks for each group `\(j\)` over the `\(b\)` blocks `\((R_j=\sum\limits_{i=1}^{b}r_{ij})\)`

`\(Q= \frac{|R_i-R_j|}{\sqrt{\frac{bk(k+1)}{12}}} \sim q_{(k, df)}\)`<sup>💡</sup>

.font60[<sup>⚠️</sup> In the presence of ties the `average` method (also known as mid-ranks, i.e., the mean of the ranks that the observations would have if they were not tied) is used.

<sup>💡</sup> Some sources use `\(df=\infty\)`, while others use `\(df=b-k\)`.
] ] .pull-right[ `\(k\)` — number of groups `\(b\)` — number of blocks `\(df\)` — degrees of freedom ] ] .panel[.panel-name[Decision] .pull-left[ On <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg>: Reject `\(H_0\)` if `\(p-value_{exact} \leq \alpha\)` ] .pull-right[ On <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg>: Reject `\(H_0\)` if `\(p-value_{approximate} \leq \alpha\)` ] <br> <br> .center[ Concluding that the pair of distributions `\(i\)` and `\(j\)` are different. <br> <br> <br> <br> .footnote[<sup>💡</sup>The "Asymptotic" _p_-value can be also used.] 
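
As a minimal sketch of the decision rule above, the comparison can be written in R. It assumes `\(\alpha = .05\)` and reuses the three Monte Carlo _p_-values printed in the Output panel; the object names `alpha`, `p_values`, and `reject_h0` are illustrative and are not part of _NSM3_.

```r
# Significance level assumed for this illustration
alpha <- 0.05

# Monte Carlo p-values as printed in the Output panel (treatments 1-2, 1-3, 2-3)
p_values <- c("1 - 2" = 0.0466, "1 - 3" = 0.0642, "2 - 3" = 0.9926)

# Reject H0 for a pair when its p-value is at most alpha
reject_h0 <- p_values <= alpha
reject_h0
```

Only the first pair (treatments 1 and 2) leads to rejecting `\(H_0\)` at this level.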
] ] .panel[.panel-name[R Code] <div class="pre-name">wnmt.R</div>

``` r
ds <- readr::read_csv(trimws('https://ndownloader.figshare.com/files/22299075 ')) # load the dataset
# keep the first 10 complete cases of the three items
ds_small <- ds[complete.cases(ds[, c("UWES1", "UWES3", "UWES6")]), ][1:10, ]
# example: ordered dependent variable
ds_small[, c("UWES1", "UWES3", "UWES6")] <- lapply(X = ds_small[, c("UWES1", "UWES3", "UWES6")], ordered, levels = 0:6,
  labels = c("Never", "Almost never", "Rarely", "Sometimes", "Often", "Very often", "Always"))
# convert the ordered factors to numeric scores
ds_small[, c("UWES1", "UWES3", "UWES6")] <- lapply(ds_small[, c("UWES1", "UWES3", "UWES6")], as.numeric)
ds_small$id <- 1:nrow(ds_small) # unique id per block
ds_long <- tidyr::gather(ds_small, "UWES1", "UWES3", "UWES6", key = item, value = score)
coin::friedman_test(score ~ as.factor(item), data = ds_long, distribution = coin::approximate(nresample = 10000))
# Friedman's H0 is rejected, so the Wilcoxon-Nemenyi-McDonald-Thompson's test can be done
library(NSM3)
*pWNMT(x = ds_long$score, b = ds_long$id, trt = as.numeric(as.factor(ds_long$item)), method = "Monte Carlo", standardized = TRUE)
```

.font60[<sup>⚠️</sup> `method`: either `"Exact"`, `"Monte Carlo"`, or `"Asymptotic"`, indicating the desired distribution. When `method = NA`, `"Exact"` will be used if the number of permutations is `\(10000\)` or less; otherwise, `"Monte Carlo"` will be used. To increase the number of iterations when using the `"Monte Carlo"` method, set the argument `n.mc` to the desired number (e.g., `n.mc = 100000`). The method names are case-sensitive: `Exact` (✅) is not the same as `exact` (🛑). `standardized`: `TRUE` if the test statistic (i.e., `\(|R_i-R_j|\)`) is to be divided by `\(\sqrt{\frac{bk(k+1)}{12}}\)` **or** `FALSE` if the test statistic is just `\(|R_i-R_j|\)`. <sup>🧙♂️</sup> The `"Exact"` method is more precise; however, the `"Monte Carlo"` method, besides being faster, provides fairly close results. 
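
To make the `standardized` argument concrete, the statistic can also be computed by hand. The sketch below uses base R only and a small hypothetical data set (4 blocks, 3 groups); `scores`, `rank_sums`, and `Q` are illustrative names, and this is not a description of how _NSM3_ works internally.

```r
# Hypothetical scores: 4 blocks (rows) by 3 groups (columns); illustrative only
scores <- matrix(c(1, 3, 2,
                   2, 3, 1,
                   1, 2, 3,
                   1, 3, 2),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(NULL, c("A", "B", "C")))
b <- nrow(scores) # number of blocks
k <- ncol(scores) # number of groups

# Step 1: rank the observations within each block (mid-ranks for ties)
ranks <- t(apply(scores, 1, rank, ties.method = "average"))

# Step 2: sum the ranks of each group across blocks
rank_sums <- colSums(ranks)

# Standardized statistic |R_i - R_j| / sqrt(b k (k + 1) / 12) for every pair
Q <- abs(outer(rank_sums, rank_sums, "-")) / sqrt(b * k * (k + 1) / 12)
Q
```

Here the rank sums are 5, 11, and 8, the divisor is `\(\sqrt{4}=2\)`, and the statistic for groups A and B is therefore 3. With `\(b=10\)` and `\(k=3\)`, as in the example above, the divisor would be `\(\sqrt{10}\approx 3.162\)`.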
] ] .panel[.panel-name[Output] .scroll-box-26[

```
## 
##  Approximative Friedman Test
## 
## data:  score by
##   as.factor(item) (UWES1, UWES3, UWES6)
##   stratified by block
## chi-squared = 7.28, p-value = 0.025
## 
## Number of blocks: n=10
## Number of treatments: k=3
*## Using the Monte Carlo (with 10000 Iterations) method:
## 
*## For treatments 1 - 2, the Wilcoxon, Nemenyi, McDonald-Thompson R (standardized) Statistic is 2.6879.
*## The smallest experimentwise error rate leading to rejection is 0.0466.
## 
*## For treatments 1 - 3, the Wilcoxon, Nemenyi, McDonald-Thompson R (standardized) Statistic is 2.5298.
*## The smallest experimentwise error rate leading to rejection is 0.0642.
## 
*## For treatments 2 - 3, the Wilcoxon, Nemenyi, McDonald-Thompson R (standardized) Statistic is 0.1581.
*## The smallest experimentwise error rate leading to rejection is 0.9926.
## 
*
```

] ] .panel[.panel-name[Statistical Analysis] .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[ As a post-hoc test to Friedman's test (for paired samples), the Wilcoxon-Nemenyi-McDonald-Thompson's test (McDonald and Thompson, 1967; Nemenyi, 1963) was used to identify which pairs presented statistically significant differences between their distribution functions (Hollander, Wolfe, et al., 2014). To conduct the Wilcoxon-Nemenyi-McDonald-Thompson's test, the _NSM3_ package (Schneider, Chicken, et al., 2021) was used. Approximate _p_-values were obtained via the Monte Carlo method with `\(10000\)` iterations (Schneider, Chicken, et al., 2021). All statistical analyses were conducted with the statistical programming language _R_ (R Core Team, 2022) via the integrated development environment _RStudio_ (RStudio Team, 2022). A significance level of `\(\alpha = .05\)` was considered for all statistical analyses. 
]] .panel[.panel-name[Results] .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt0[ After rejecting Friedman's `\(H_0\)`, the post-hoc results revealed statistically significant differences between the distribution functions of items _UWES1_ vs. _UWES3_ `\((q = 2.688; p = .047)\)`. However, no statistically significant differences were observed between items _UWES1_ vs. _UWES6_ `\((q = 2.530; p = .064)\)` or between items _UWES3_ vs. _UWES6_ `\((q = 0.158; p = .993)\)`. .center[ <img src="data:image/png;base64,#slides8of9_files/figure-html/plot_boxplot_wnmt-1.png" width="40%" height="99%" /> ] .tr[📚 Hollander, M., Wolfe, D. A., & Chicken, E. (2014). _Nonparametric statistical methods_ (3rd ed.). John Wiley & Sons.]]]] --- # References Cohen, B. H. (2013). _Explaining psychological statistics_. 4th ed. Hoboken, NJ: John Wiley & Sons. Cohen, J. (1988). _Statistical power analysis for the behavioral sciences_. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates. ISBN: 0805802835. Conover, W. J. (1999). _Practical nonparametric statistics_. 3rd ed. New York, NY: Wiley. Conover, W. and R. Iman (1979). _Multiple-comparisons procedures. Informal report_. Los Alamos, NM (United States): Los Alamos National Laboratory (LANL). DOI: [10.2172/6057803](https://doi.org/10.2172%2F6057803). URL: [https://www.osti.gov/servlets/purl/6057803/](https://www.osti.gov/servlets/purl/6057803/). Coolican, H. (2018). _Research methods and statistics in psychology_. Routledge. ISBN: 9781315201009. DOI: [10.4324/9781315201009](https://doi.org/10.4324%2F9781315201009). URL: [https://www.taylorfrancis.com/books/9781351780704](https://www.taylorfrancis.com/books/9781351780704). --- # References Critchlow, D. E. and M. A. Fligner (1991). "On distribution-free multiple comparisons in the one-way analysis of variance". In: _Communications in Statistics - Theory and Methods_ 20.1, pp. 127-139. ISSN: 0361-0926. 
DOI: [10.1080/03610929108830487](https://doi.org/10.1080%2F03610929108830487). URL: [https://www.tandfonline.com/doi/full/10.1080/03610929108830487](https://www.tandfonline.com/doi/full/10.1080/03610929108830487). Dwass, M. (1960). "Some k-sample rank-order tests". In: _Contributions to probability and statistics: Essays in honor of Harold Hotelling_. Ed. by I. Olkin, S. G. Ghurye, W. Hoeffding, W. G. Madow and H. B. Mann. Stanford, CA: Stanford University Press, pp. 198-202. Friedman, M. (1937). "The use of ranks to avoid the assumption of normality implicit in the analysis of variance". In: _Journal of the American Statistical Association_ 32.200, pp. 675-701. ISSN: 0162-1459. DOI: [10.1080/01621459.1937.10503522](https://doi.org/10.1080%2F01621459.1937.10503522). URL: [http://www.tandfonline.com/doi/abs/10.1080/01621459.1937.10503522](http://www.tandfonline.com/doi/abs/10.1080/01621459.1937.10503522). Hollander, M., D. A. Wolfe, et al. (2014). _Nonparametric statistical methods_. 3rd ed. Hoboken, NJ: John Wiley & Sons. ISBN: 9780470387375. Hothorn, T. and K. Hornik (2022). _exactRankTests: Exact distributions for rank and permutation tests (R package version 0.8-35) [Computer software]_. URL: [https://cran.r-project.org/package=exactRankTests](https://cran.r-project.org/package=exactRankTests). --- # References Hothorn, T., K. Hornik, et al. (2008). "Implementing a class of permutation tests: The coin package". In: _Journal of Statistical Software_ 28.8, pp. 1-23. ISSN: 1548-7660. DOI: [10.18637/jss.v028.i08](https://doi.org/10.18637%2Fjss.v028.i08). URL: [http://www.jstatsoft.org/v28/i08/](http://www.jstatsoft.org/v28/i08/). Kassambara, A. (2021). _rstatix: Pipe-friendly framework for basic statistical tests (R package version 0.7.0) [Computer software]_. URL: [https://cran.r-project.org/package=rstatix](https://cran.r-project.org/package=rstatix). Kendall, M. G. and B. B. Smith (1939). "The problem of m rankings". 
In: _The Annals of Mathematical Statistics_ 10.3, pp. 275-287. ISSN: 0003-4851. DOI: [10.1214/aoms/1177732186](https://doi.org/10.1214%2Faoms%2F1177732186). URL: [http://projecteuclid.org/euclid.aoms/1177732186](http://projecteuclid.org/euclid.aoms/1177732186). King, B. M., P. J. Rosopa, et al. (2018). _Statistical reasoning in the behavioral sciences_. 7th ed. Hoboken, NJ: John Wiley & Sons. URL: [https://lccn.loc.gov/2017060694](https://lccn.loc.gov/2017060694). Kruskal, W. H. and W. A. Wallis (1952). "Use of ranks in one-criterion variance analysis". In: _Journal of the American Statistical Association_ 47.260, pp. 583-621. ISSN: 0162-1459. DOI: [10.1080/01621459.1952.10483441](https://doi.org/10.1080%2F01621459.1952.10483441). URL: [http://www.tandfonline.com/doi/abs/10.1080/01621459.1952.10483441](http://www.tandfonline.com/doi/abs/10.1080/01621459.1952.10483441). --- # References Landis, J. R. and G. G. Koch (1977). "The measurement of observer agreement for categorical data". In: _Biometrics_ 33.1, pp. 159-174. ISSN: 0006-341X. DOI: [10.2307/2529310](https://doi.org/10.2307%2F2529310). URL: [https://www.jstor.org/stable/2529310?origin=crossref](https://www.jstor.org/stable/2529310?origin=crossref). Lehmann, E. L. (1975). _Nonparametrics: Statistical methods based on ranks_. San Francisco, CA: Holden-Day. Mann, H. B. and D. R. Whitney (1947). "On a test of whether one of two random variables is stochastically larger than the other". In: _The Annals of Mathematical Statistics_ 18.1, pp. 50-60. URL: [https://www.jstor.org/stable/2236101](https://www.jstor.org/stable/2236101). Marascuilo, L. A. and M. McSweeney (1977). 
_Nonparametric and distribution-free methods for the social sciences_. Belmont, CA: Wadsworth. DOI: [10.2307/2286625](https://doi.org/10.2307%2F2286625). Marôco, J. (2021). _Análise estatística com o SPSS statistics_. 8th ed. Pêro Pinheiro: ReportNumber. --- # References McDonald, B. J. and W. A. Thompson (1967). "Rank sum multiple comparisons in one- and two-way classifications". In: _Biometrika_ 54.3-4, pp. 487-497. ISSN: 0006-3444. DOI: [10.1093/biomet/54.3-4.487](https://doi.org/10.1093%2Fbiomet%2F54.3-4.487). URL: [https://academic.oup.com/biomet/article-lookup/doi/10.1093/biomet/54.3-4.487](https://academic.oup.com/biomet/article-lookup/doi/10.1093/biomet/54.3-4.487). Nemenyi, P. B. (1963). "Distribution-free multiple comparisons". Ph.D. dissertation. Princeton University. Pace, L. A. (2012). _Beginning R: An introduction to statistical programming_. Berkeley, CA: Apress. ISBN: 978-1-4302-4554-4. DOI: [10.1007/978-1-4302-4555-1](https://doi.org/10.1007%2F978-1-4302-4555-1). URL: [http://link.springer.com/10.1007/978-1-4302-4555-1](http://link.springer.com/10.1007/978-1-4302-4555-1). R Core Team (2022). _R: A language and environment for statistical computing (version 4.2.0) [Computer software]_. Vienna. URL: [https://www.r-project.org/](https://www.r-project.org/). RStudio Team (2022). _RStudio: Integrated development for R (version 2022.2.2.485) [Computer software]_. Boston, MA. URL: [http://www.rstudio.com/](http://www.rstudio.com/). --- # References Schneider, G., E. Chicken, et al. (2021). _NSM3: Functions and datasets to accompany Hollander, Wolfe, and Chicken - Nonparametric statistical methods, third edition (R package version 1.16) [Computer software]_. URL: [https://cran.r-project.org/package=NSM3](https://cran.r-project.org/package=NSM3). Signorell, A., K. Aho, et al. (2022). _DescTools: Tools for descriptive statistics (R package version 0.99.45) [Computer software]_. URL: [https://cran.r-project.org/package=DescTools](https://cran.r-project.org/package=DescTools). Steel, R. G. D. (1960). 
"A rank sum test for comparing all pairs of treatments". In: _Technometrics_ 2.2, pp. 197-207. ISSN: 00401706. DOI: [10.2307/1266545](https://doi.org/10.2307%2F1266545). URL: [https://www.jstor.org/stable/1266545?origin=crossref](https://www.jstor.org/stable/1266545?origin=crossref). Tomczak, M. and E. Tomczak (2014). "The need to report effect size estimates revisited. An overview of some recommended measures of effect size". In: _Trends in Sport Sciences_ 1.21, pp. 19-25. URL: [http://www.wbc.poznan.pl/Content/325867/5\_Trends\_Vol21\_2014\_ no1\_20.pdf](http://www.wbc.poznan.pl/Content/325867/5\_Trends\_Vol21\_2014\_ no1\_20.pdf). Wilcoxon, F. (1945). "Individual comparisons by ranking methods". In: _Biometrics Bulletin_ 1.6, pp. 80-83. DOI: [10.2307/3001968](https://doi.org/10.2307%2F3001968). URL: [https://www.jstor.org/stable/10.2307/3001968?origin=crossref](https://www.jstor.org/stable/10.2307/3001968?origin=crossref). --- class: center, bottom, inverse # More info -- Slides created with the <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> package [`xaringan`](https://github.com/yihui/xaringan). 
-- <svg viewBox="0 0 512 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;"> <g label="icon" id="layer6" groupmode="layer"> <path id="path2" d="M 132.62426,316.69067 C 119.2805,301.94483 112.56962,274.5073 112.56962,234.39862 v -54.79191 c 0,-37.32217 -5.81677,-63.58084 -17.532347,-78.83466 -11.6757,-15.293118 -31.159702,-22.922596 -58.353466,-22.922596 -5.958581,0 -11.409226,0.22492 -16.45319,0.5917 -5.04455,0.427121 -9.742846,1.037046 -14.1564111,1.83092 V 95.057199 H 16.671281 c 12.325533,0 20.908335,3.82414 25.667559,11.532201 4.77973,7.74964 7.139712,25.48587 7.139712,53.14663 v 68.01321 c 0,42.12298 13.016861,74.19672 39.233939,96.16314 19.627549,16.47424 46.636229,27.23363 81.030059,32.40064 v -20.17708 c -16.3928,-4.27176 -29.04346,-10.51565 -37.11829,-19.44413 z m 246.75144,0 c 13.34377,-14.74584 20.05466,-42.18337 20.05466,-82.29205 v -54.79191 c 0,-37.32217 5.81673,-63.58084 17.53235,-78.83466 11.67568,-15.293118 31.15971,-22.922596 58.35348,-22.922596 5.95858,0 11.40922,0.22492 16.45315,0.5917 5.04457,0.427121 9.74287,1.037046 14.15645,1.83092 v 14.785125 h -10.59712 c -12.32549,0 -20.90826,3.82414 -25.66752,11.532201 -4.77974,7.74964 -7.13972,25.48587 -7.13972,53.14663 v 68.01321 c 0,42.12298 -13.01688,74.19672 -39.23394,96.16314 -19.6275,16.47424 -46.63622,27.23363 -81.03006,32.40064 v -20.17708 c 16.39279,-4.27176 29.04347,-10.51565 37.11827,-19.44413 z M 303.95857,87.165762 c 8.42049,-6.691524 25.52576,-10.536158 51.23486,-11.492333 V 63.999997 H 156.80716 v 11.673432 c 26.1755,0.956175 43.38268,4.800809 51.68248,11.492333 8.31852,6.73139 12.40691,20.033568 12.40691,39.904818 V 384.6851 c 0,20.80641 -4.08839,34.5146 -12.40691,41.02332 -8.2998,6.56905 -25.50698,10.10729 -51.68248,10.65744 V 448 h 197.71597 l 0.67087,-11.63414 c -25.50471,-0.54955 -42.56835,-4.35266 -51.07201,-11.40918 -8.4182,-6.95638 -12.73153,-20.44184 -12.73153,-40.27158 V 127.07058 c 0,-19.87125 
4.16983,-33.173428 12.56922,-39.904818 z" style="stroke-width:0.0753388"></path> </g></svg> + <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> = <svg viewBox="0 0 512 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:red;"> [ comment ] <path d="M462.3 62.6C407.5 15.9 326 24.3 275.7 76.2L256 96.5l-19.7-20.3C186.1 24.3 104.5 15.9 49.7 62.6c-62.8 53.6-66.1 149.8-9.9 207.9l193.5 199.8c12.5 12.9 32.8 12.9 45.3 0l193.5-199.8c56.3-58.1 53-154.3-9.8-207.9z"></path></svg> -- <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> has infinite possibilities. 
-- Practice is the best strategy for learning. -- . -- _In God we trust; all others bring data_ -- W. Edwards Deming -- . -- . -- . -- THE END --- class: center, bottom, inverse 