class: center, middle, inverse, title-slide .title[ # An Introduction to
R
and RStudio for Educational Researchers ] .subtitle[ ##
Descriptive and Inferential Statistics:
Correlation ] .author[ ### Jorge Sinval ] .date[ ### 2025-11-18 ] --- class: inverse, center, middle <html><div style='float:left'></div><hr color='#EB811B' size=1px width=800px></html> <style> .orange { color: #EB811B; } .kbd { display: inline-block; padding: .2em .5em; font-size: 0.75em; line-height: 1.75; color: #555; vertical-align: middle; background-color: #fcfcfc; border: solid 1px #ccc; border-bottom-color: #bbb; border-radius: 3px; box-shadow: inset 0 -1px 0 #bbb } </style>
# 4. Correlation

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=800px></html>

---

# Pearson correlation

.panelset[
.panel[.panel-name[Test]

**Pearson correlation coefficient significance test**

Is the observed correlation statistically significant? Can we conclude that the population correlation differs from zero?

]

.panel[.panel-name[Assumptions]

1. `\(X_1\)` and `\(X_2\)` are quantitative
2. `\(X_1\)` and `\(X_2\)` are linearly related
3. `\(X_1\)` and `\(X_2\)` follow a bivariate normal distribution

]

.panel[.panel-name[Hypotheses]

`\(H_0: \rho = 0\)` vs. `\(H_1: \rho \neq 0\)`

]

.panel[.panel-name[Test statistic]

$$T_R=\frac{R}{\sqrt{\frac{1-R^2}{n-2}}} \sim t(n-2) $$

]

.panel[.panel-name[Decision]

Reject `\(H_0\)` if `\(|T_R| \geq t_{1-\frac{\alpha}{2};(n-2)}\)`

On <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"><path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg>: Reject `\(H_0\)` if `\(p\text{-value} \leq \alpha\)`

]

.panel[.panel-name[Conclusion]

The variables `\(X_1\)` and `\(X_2\)` [are/are not] significantly correlated `\((t_r(df) = t.ttt; r = .rrr; p = .ppp; N = n)\)`. The correlation is [null/small/moderate/large] [-/and positive/and negative].
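As a quick check of the test statistic, the sketch below recomputes `\(T_R\)` from the estimate and degrees of freedom that `cor.test()` reports for the work engagement and burnout example (`r = -0.7159338`, `df = 1080`). The significance level `alpha = 0.05` is an assumption made for illustration only.

``` r
# Values as reported by cor.test() for work engagement vs. burnout
r  <- -0.7159338   # sample correlation
df <- 1080         # degrees of freedom, so n = df + 2
n  <- df + 2

# Test statistic: T_R = R / sqrt((1 - R^2) / (n - 2))
t_stat <- r / sqrt((1 - r^2) / (n - 2))

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# Critical value for the assumed significance level
alpha  <- 0.05
t_crit <- qt(1 - alpha / 2, df = n - 2)

round(t_stat, 1)        # -33.7, matching the reported t
abs(t_stat) >= t_crit   # TRUE, so H0 is rejected
```

Both decision rules lead to the same conclusion here: the statistic is far beyond the critical value and the p-value is far below `alpha`.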
]

.panel[.panel-name[R code]

Example: _Investigate if work engagement and burnout are significantly correlated._

<div class="pre-name">pearson_cor_test.R</div>

``` r
ds <- readr::read_csv(trimws("https://ndownloader.figshare.com/files/22299075 "))
ds$we <- rowMeans(ds[,paste0("UWES",1:9)]) #create work engagement variable
ds$burnout <- rowMeans(ds[,paste0("OLBI",c(1:12,14:16))]) #create burnout variable
library(ggplot2)
ggplot(ds, aes(x = we, y = burnout)) + #scatter plot
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Work engagement and burnout", x = "Work engagement", y = "Burnout") +
  LittleHelpers::theme_mr() #package LittleHelpers to provide an APA ready theme
*cor.test(x = ds$we,y = ds$burnout, method = "pearson")
```

]

.panel[.panel-name[Output]

<img src="data:image/png;base64,#slides9of9_files/figure-html/plot_pearson_cor-1.png" width="30%" height="99%" />

```
## 
##  Pearson's product-moment correlation
## 
## data:  ds$we and ds$burnout
*## t = -33.7, df = 1080, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.7437948 -0.6855893
## sample estimates:
##        cor 
*## -0.7159338
```

]
]

---

# Spearman correlation

.panelset[
.panel[.panel-name[Test]

**Spearman correlation coefficient significance test**

Is the observed correlation statistically significant? Can we conclude that the population correlation differs from zero?

]

.panel[.panel-name[Assumptions]

1. `\(X_1\)` and `\(X_2\)` are measured on at least an ordinal scale

]

.panel[.panel-name[Hypotheses]

`\(H_0: \rho_s = 0\)` vs.
`\(H_1: \rho_s \neq 0\)`

]

.panel[.panel-name[Test statistic]

`\(T_{R_S} =R_S \sqrt{\frac{n-2}{1-R_S^2}} \sim t(n-2)\)`

]

.panel[.panel-name[Decision]

Reject `\(H_0\)` if `\(|T_{R_S}| \geq t_{1-\frac{\alpha}{2};(n-2)}\)`

On <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"><path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg>: Reject `\(H_0\)` if `\(p\text{-value} \leq \alpha\)`

]

.panel[.panel-name[Conclusion]

The variables `\(X_1\)` and `\(X_2\)` [are/are not] significantly correlated `\((t_{r_S}(df) = t.ttt; r_S = .rrr; p = .ppp; N = n)\)`. The correlation is [null/small/moderate/large] [-/and positive/and negative].

]

.panel[.panel-name[R code]

Example: _Investigate if there is an association between OLBI's 8<sup>th</sup> item (i.e., "During my work, I often feel emotionally drained") and UWES' 1<sup>st</sup> item (i.e., "At my work, I feel bursting with energy"). UWES' items are answered on a 7-point ordinal scale (0 — "Never" to 6 — "Always"), while OLBI's items are answered on a 5-point Likert scale (1 — "Strongly Disagree" to 5 — "Strongly Agree")_.
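As a side note on what `method = "spearman"` computes, here is a minimal sketch on invented responses (the vectors `drained` and `energy` are hypothetical and are not the survey data). It shows that Spearman's coefficient is Pearson's coefficient applied to the ranks, and evaluates the t approximation given in the Test statistic panel.

``` r
# Hypothetical ordinal responses (invented values, not the survey data)
drained <- c(1, 2, 2, 3, 4, 4, 5, 5, 3, 2)   # 5-point item
energy  <- c(6, 5, 4, 4, 2, 3, 1, 0, 3, 5)   # 7-point item

# Spearman's rho is Pearson's r computed on the ranks (ties get average ranks)
rho_ranks <- cor(rank(drained), rank(energy), method = "pearson")
rho_cor   <- cor(drained, energy, method = "spearman")
all.equal(rho_ranks, rho_cor)

# t approximation: T = R_S * sqrt((n - 2) / (1 - R_S^2))
n      <- length(drained)
t_stat <- rho_cor * sqrt((n - 2) / (1 - rho_cor^2))
p_val  <- 2 * pt(-abs(t_stat), df = n - 2)
```

Because the coefficient depends only on ranks, the different response scales of the two items (5 points vs. 7 points) do not matter.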
<div class="pre-name">spearman_cor_test.R</div>

``` r
ds <- readr::read_csv(trimws("https://ndownloader.figshare.com/files/22299075 "))
*cor.test(x = ds$OLBI8,y = ds$UWES1, method = "spearman", exact = TRUE)
```

]

.panel[.panel-name[Output]

```
## 
##  Spearman's rank correlation rho
## 
## data:  ds$OLBI8 and ds$UWES1
*## S = 290153241, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
*## -0.3743497
```

]
]

---

# Cramér's V correlation

.panelset[
.panel[.panel-name[Test]

**Cramér's V correlation coefficient significance test**

Is the observed correlation statistically significant? Can we conclude that the population correlation differs from zero?

]

.panel[.panel-name[Assumptions]

1. `\(X_1\)` and `\(X_2\)` are measured on at least a nominal scale, with the data organized in a contingency table
2. Independent samples
3. `\(N>20\)`
4. At least 80% of the expected counts satisfy `\(E_{ij} \geq 5\)`
5. All expected counts satisfy `\(E_{ij} > 1\)`

]

.panel[.panel-name[Hypotheses]

`\(H_0: \rho_v = 0\)` vs.
`\(H_1: \rho_v \neq 0\)`

]

.panel[.panel-name[Test statistic]

`\(X^2= \sum\limits_{i=1}^R\sum\limits_{j=1}^C \frac{(O_{ij}-E_{ij})^2}{E_{ij}} \sim \chi^2_{(R-1)(C-1)}\)`

]

.panel[.panel-name[Decision]

Reject `\(H_0\)` if `\(X^2 \geq \chi^2_{1-\alpha;(R-1)(C-1)}\)`

On <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"><path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg>: Reject `\(H_0\)` if `\(p\text{-value} \leq \alpha\)`

]

.panel[.panel-name[Conclusion]

The variables `\(X_1\)` and `\(X_2\)` [are/are not] significantly correlated `\((\chi^2_{(df)} = x.xxx; V = .vvv; p = .ppp; N = n)\)`. The correlation is [null/small/moderate/large].
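The `\(V\)` reported in the conclusion can also be obtained by hand from the chi-squared statistic, as `\(V = \sqrt{\frac{X^2}{N(k-1)}}\)` with `\(k = \min(R, C)\)`. The sketch below uses base R only and an invented 2 by 3 table (the counts are hypothetical, chosen purely for illustration); it also checks the two assumptions about expected counts.

``` r
# Hypothetical 2 x 3 contingency table (invented counts, for illustration only)
counts <- matrix(c(30, 20, 25,
                   15, 35, 25),
                 nrow = 2, byrow = TRUE)

# Chi-squared test without continuity correction
test <- chisq.test(counts, correct = FALSE)

x2 <- unname(test$statistic)   # X^2
n  <- sum(counts)              # total sample size N
k  <- min(dim(counts))         # smaller of the number of rows and columns

# Cramér's V = sqrt(X^2 / (N * (k - 1)))
v <- sqrt(x2 / (n * (k - 1)))

# Assumption checks on the expected counts E_ij
mean(test$expected >= 5) >= 0.80   # at least 80% of E_ij >= 5
all(test$expected > 1)             # every E_ij > 1

v
test$p.value
```

Dividing by `\(N(k-1)\)` rescales the statistic so that `\(V\)` lies between 0 and 1 regardless of sample size or table dimensions.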
]

.panel[.panel-name[R code]

Example: _Investigate if sex and country are independent._

<div class="pre-name">cramer_v_test.R</div>

``` r
ds <- readr::read_csv(trimws("https://ndownloader.figshare.com/files/22299075 "))
*library(lsr)
contingency_table <- table(ds$Sex, ds$Country) #contingency table with qualitative variables
*chisq.test(contingency_table, correct = FALSE) #significance test
*cramersV(contingency_table) #Cramér V estimate
```

.pull-left[
The `correct` argument can be:

* `TRUE` (default) applies a continuity correction when computing the test statistic for 2 by 2 tables; one half is subtracted from all `\(|O-E|\)` differences: `\(X^2= \sum\limits_{i=1}^R\sum\limits_{j=1}^C \frac{(|O_{ij}-E_{ij}|-0.5)^2}{E_{ij}} \sim \chi^2_{(R-1)(C-1)}\)`;
]

.pull-right[
* `FALSE` does not apply a continuity correction: `\(X^2= \sum\limits_{i=1}^R\sum\limits_{j=1}^C \frac{(O_{ij}-E_{ij})^2}{E_{ij}} \sim \chi^2_{(R-1)(C-1)}\)`.
]

]

.panel[.panel-name[Output]

```
## 
##  Pearson's Chi-squared test
## 
## data:  contingency_table
*## X-squared = 2.2127, df = 1, p-value = 0.1369
## 
*## [1] 0.0440688
```

]
]

---

class: center, bottom, inverse

# More info

--

Slides created with the <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"><path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> package [`xaringan`](https://github.com/yihui/xaringan).

-- <svg viewBox="0 0 512 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;"> <g label="icon" id="layer6" groupmode="layer"> <path id="path2" d="M 132.62426,316.69067 C 119.2805,301.94483 112.56962,274.5073 112.56962,234.39862 v -54.79191 c 0,-37.32217 -5.81677,-63.58084 -17.532347,-78.83466 -11.6757,-15.293118 -31.159702,-22.922596 -58.353466,-22.922596 -5.958581,0 -11.409226,0.22492 -16.45319,0.5917 -5.04455,0.427121 -9.742846,1.037046 -14.1564111,1.83092 V 95.057199 H 16.671281 c 12.325533,0 20.908335,3.82414 25.667559,11.532201 4.77973,7.74964 7.139712,25.48587 7.139712,53.14663 v 68.01321 c 0,42.12298 13.016861,74.19672 39.233939,96.16314 19.627549,16.47424 46.636229,27.23363 81.030059,32.40064 v -20.17708 c -16.3928,-4.27176 -29.04346,-10.51565 -37.11829,-19.44413 z m 246.75144,0 c 13.34377,-14.74584 20.05466,-42.18337 20.05466,-82.29205 v -54.79191 c 0,-37.32217 5.81673,-63.58084 17.53235,-78.83466 11.67568,-15.293118 31.15971,-22.922596 58.35348,-22.922596 5.95858,0 11.40922,0.22492 16.45315,0.5917 5.04457,0.427121 9.74287,1.037046 14.15645,1.83092 v 14.785125 h -10.59712 c -12.32549,0 -20.90826,3.82414 -25.66752,11.532201 -4.77974,7.74964 -7.13972,25.48587 -7.13972,53.14663 v 68.01321 c 0,42.12298 -13.01688,74.19672 -39.23394,96.16314 -19.6275,16.47424 -46.63622,27.23363 -81.03006,32.40064 v -20.17708 c 16.39279,-4.27176 29.04347,-10.51565 37.11827,-19.44413 z M 303.95857,87.165762 c 8.42049,-6.691524 25.52576,-10.536158 51.23486,-11.492333 V 63.999997 H 156.80716 v 11.673432 c 26.1755,0.956175 43.38268,4.800809 51.68248,11.492333 8.31852,6.73139 12.40691,20.033568 12.40691,39.904818 V 384.6851 c 0,20.80641 -4.08839,34.5146 -12.40691,41.02332 -8.2998,6.56905 -25.50698,10.10729 -51.68248,10.65744 V 448 h 197.71597 l 0.67087,-11.63414 c -25.50471,-0.54955 -42.56835,-4.35266 -51.07201,-11.40918 -8.4182,-6.95638 -12.73153,-20.44184 -12.73153,-40.27158 V 127.07058 c 0,-19.87125 
4.16983,-33.173428 12.56922,-39.904818 z" style="stroke-width:0.0753388"></path> </g></svg> + <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> = <svg viewBox="0 0 512 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:red;"> [ comment ] <path d="M462.3 62.6C407.5 15.9 326 24.3 275.7 76.2L256 96.5l-19.7-20.3C186.1 24.3 104.5 15.9 49.7 62.6c-62.8 53.6-66.1 149.8-9.9 207.9l193.5 199.8c12.5 12.9 32.8 12.9 45.3 0l193.5-199.8c56.3-58.1 53-154.3-9.8-207.9z"></path></svg> -- <svg viewBox="0 0 581 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;position:relative;display:inline-block;top:.1em;fill:#384CB7;"> [ comment ] <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> has infinite possibilities. 
-- Practice is the best strategy for learning. -- . -- _In God we trust, all others bring data_ -- Edwards Deming -- . -- . -- . -- THE END --- class: center, bottom, inverse 