Correlation analysis usually refers to the relationship between two continuous variables (continuous variable vs continuous variable). In that case, the commonly used coefficients are Pearson (linear correlation) and the rank-based Spearman and Kendall coefficients.

The other two cases, two categorical variables (categorical variable vs categorical variable) and a categorical variable with a continuous variable (categorical variable vs continuous variable), are discussed far less often.

Some books and articles that cover correlation analysis with categorical variables:

  • Credit Risk Analytics: Variable selection

1 Statistical hypothesis testing (Significance test)

1.1 Continuous vs. Continuous

  • H0: The correlation coefficient = 0 (no relationship)
  • H1: The correlation coefficient != 0
psych::corr.test(state.x77[,1:5], use="complete", method="spearman")
## Call:psych::corr.test(x = state.x77[, 1:5], use = "complete", method = "spearman")
## Correlation matrix 
##            Population Income Illiteracy Life Exp Murder
## Population       1.00   0.12       0.31    -0.10   0.35
## Income           0.12   1.00      -0.31     0.32  -0.22
## Illiteracy       0.31  -0.31       1.00    -0.56   0.67
## Life Exp        -0.10   0.32      -0.56     1.00  -0.78
## Murder           0.35  -0.22       0.67    -0.78   1.00
## Sample Size 
## [1] 50
## Probability values (Entries above the diagonal are adjusted for multiple tests.) 
##            Population Income Illiteracy Life Exp Murder
## Population       0.00   0.78       0.13     0.78   0.10
## Income           0.39   0.00       0.13     0.13   0.39
## Illiteracy       0.03   0.03       0.00     0.00   0.00
## Life Exp         0.47   0.02       0.00     0.00   0.00
## Murder           0.01   0.13       0.00     0.00   0.00
## 
##  To see confidence intervals of the correlations, print with the short=FALSE option
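When only one pair of variables is of interest, base R's `cor.test()` tests the same hypotheses without the multiple-testing adjustment that `psych::corr.test()` applies above the diagonal. The sketch below is illustrative only; the choice of the Illiteracy and Murder columns is an assumption made for the example.

```r
# Test a single pair of variables from the built-in state.x77 matrix.
illiteracy <- state.x77[, "Illiteracy"]
murder <- state.x77[, "Murder"]

# Pearson: tests H0 that the linear correlation is 0.
pearson_res <- cor.test(illiteracy, murder, method = "pearson")

# Spearman: rank-based; exact = FALSE skips the exact p-value,
# which cannot be computed when the data contain ties.
spearman_res <- cor.test(illiteracy, murder, method = "spearman", exact = FALSE)

pearson_res$estimate
pearson_res$p.value
spearman_res$estimate
spearman_res$p.value
```

The Spearman estimate here should agree with the Illiteracy/Murder entry of the correlation matrix printed above.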

1.2 Continuous vs. Nominal

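# See ?aov: a one-way ANOVA tests H0 that the mean of the continuous
# variable is the same in every level of the nominal variable.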
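A minimal sketch of this test, assuming the built-in `state.region` factor as the nominal variable and the Murder column of `state.x77` as the continuous one (both choices are made for illustration and are not part of the original analysis). `kruskal.test()` is shown as a rank-based alternative when normality is doubtful.

```r
# Combine the continuous variables with the nominal variable (region).
states <- data.frame(state.x77, region = state.region)

# One-way ANOVA: H0 is that the mean murder rate is equal in all regions.
fit <- aov(Murder ~ region, data = states)
summary(fit)

# Rank-based alternative that does not assume normality.
kw <- kruskal.test(Murder ~ region, data = states)
kw
```

A small p-value in either test is evidence that the continuous variable differs across at least some levels of the nominal variable.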

1.3 Nominal vs. Nominal

Use a contingency table (cross tabulation) and the chi-squared statistic to test for an association.

  • H0: The two variables are independent
  • H1: The two variables are dependent
library(vcd)
## Loading required package: grid
mytable <- xtabs( ~ Treatment + Improved, data = Arthritis)
mytable
##          Improved
## Treatment None Some Marked
##   Placebo   29    7      7
##   Treated   13    7     21
chisq.test(mytable)
## 
##  Pearson's Chi-squared test
## 
## data:  mytable
## X-squared = 13.055, df = 2, p-value = 0.001463
  • The p-value is < 0.05, so H0 is rejected: the data provide evidence that the two variables are dependent
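The chi-squared approximation is usually considered reliable only when the expected cell counts are not too small (a common rule of thumb is at least 5). The sketch below rebuilds the table from the counts printed above, so it runs without `vcd`, and then inspects the expected counts and residuals; the object names are illustrative.

```r
# Rebuild the Treatment x Improved table from the counts printed above.
tab <- matrix(c(29, 13, 7, 7, 7, 21), nrow = 2,
              dimnames = list(Treatment = c("Placebo", "Treated"),
                              Improved = c("None", "Some", "Marked")))
res <- chisq.test(tab)

res$expected   # expected counts under independence
res$stdres     # standardised residuals: which cells drive the result
```

Cells with large absolute standardised residuals are the ones that deviate most from what independence would predict.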

Related tests of independence for contingency tables

  • fisher.test: Fisher's exact test, an alternative to Pearson's chi-squared test that does not rely on the large-sample approximation
fisher.test(mytable)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  mytable
## p-value = 0.001393
## alternative hypothesis: two.sided
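  • mantelhaen.test: Cochran-Mantel-Haenszel test of H0 that two nominal variables are conditionally independent within each stratum of a third variable (here, Treatment and Improved within each level of Sex)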
mytable <- xtabs(~Treatment+Improved+Sex, data=Arthritis)
mantelhaen.test(mytable)
## 
##  Cochran-Mantel-Haenszel test
## 
## data:  mytable
## Cochran-Mantel-Haenszel M^2 = 14.632, df = 2, p-value = 0.0006647
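To see why stratifying can matter, the sketch below compares a marginal test with the stratified test on the built-in `UCBAdmissions` table, a dataset often used to show that the two can disagree. This dataset is not part of the original analysis and is used here only as an assumption-free, built-in illustration.

```r
# UCBAdmissions is a built-in 2 x 2 x 6 table: Admit x Gender x Dept.
# Marginal test: collapses over department.
marginal <- chisq.test(margin.table(UCBAdmissions, c(1, 2)))

# Conditional test: Admit vs. Gender within each department.
conditional <- mantelhaen.test(UCBAdmissions)

marginal$p.value
conditional$p.value
```

Comparing the two p-values shows how much of the marginal association is explained by the stratifying variable.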

2 Measuring the strength of association (Effect size)

The hypothesis tests in the previous section check whether there is enough evidence to conclude that two variables are associated. This section measures how strong that association is.

2.1 Continuous vs. Continuous

  • Use a correlation coefficient: Pearson (linear), or Spearman or Kendall (rank-based)
cor(state.x77[,1:5], method="spearman")
##            Population     Income Illiteracy   Life Exp     Murder
## Population  1.0000000  0.1246098  0.3130496 -0.1040171  0.3457401
## Income      0.1246098  1.0000000 -0.3145948  0.3241050 -0.2174623
## Illiteracy  0.3130496 -0.3145948  1.0000000 -0.5553735  0.6723592
## Life Exp   -0.1040171  0.3241050 -0.5553735  1.0000000 -0.7802406
## Murder      0.3457401 -0.2174623  0.6723592 -0.7802406  1.0000000
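The difference between the coefficients is easier to see on one pair of variables: Spearman's rho is the Pearson correlation computed on ranks. The sketch below checks this on the Illiteracy and Murder columns (a pair chosen only for illustration) and also prints Kendall's tau for comparison.

```r
x <- state.x77[, "Illiteracy"]
y <- state.x77[, "Murder"]

# Spearman's rho from the built-in method ...
rho_builtin <- cor(x, y, method = "spearman")
# ... and as the Pearson correlation of the (tie-averaged) ranks.
rho_by_hand <- cor(rank(x), rank(y), method = "pearson")

all.equal(rho_builtin, rho_by_hand)

# Kendall's tau is based on concordant and discordant pairs.
cor(x, y, method = "kendall")
```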

2.2 Continuous vs. Nominal

Use the intraclass correlation coefficient (ICC), which describes how similar the sample units within the same cluster are. The coefficient typically lies between 0 and 1; the closer it is to 1, the more similar the units within a cluster.

There are about 10 forms of the ICC (the psych package provides 6), which raises the question of which form to use.

Choosing an ICC:

library(psych)
data(sai)
sai.xray <- subset(sai,(sai$study=="XRAY") & (sai$time==1))
xray.icc <- ICC(sai.xray[-c(1:3)],lmer=TRUE,check.keys=TRUE)
## Some items were negatively correlated with total scale and were automatically reversed.
##  This is indicated by a negative sign for the variable name.
##  reversed items
##    tense regretful upset worrying anxious nervous jittery high.strung worried rattled
xray.icc
## Call: ICC(x = sai.xray[-c(1:3)], lmer = TRUE, check.keys = TRUE)
## 
## Intraclass correlation coefficients 
##                          type  ICC    F df1  df2        p lower bound
## Single_raters_absolute   ICC1 0.28  8.8 199 3800 3.1e-194        0.24
## Single_random_raters     ICC2 0.29 13.1 199 3781 7.0e-301        0.23
## Single_fixed_raters      ICC3 0.38 13.1 199 3781 7.0e-301        0.33
## Average_raters_absolute ICC1k 0.89  8.8 199 3800 3.1e-194        0.86
## Average_random_raters   ICC2k 0.89 13.1 199 3781 7.0e-301        0.86
## Average_fixed_raters    ICC3k 0.92 13.1 199 3781 7.0e-301        0.91
##                         upper bound
## Single_raters_absolute         0.33
## Single_random_raters           0.35
## Single_fixed_raters            0.43
## Average_raters_absolute        0.91
## Average_random_raters          0.92
## Average_fixed_raters           0.94
## 
##  Number of subjects = 200     Number of Judges =  20
xray.icc$lme
##           variance   Percent
## ID       0.3033248 0.2894686
## Items    0.2434623 0.2323407
## Residual 0.5010804 0.4781906
## Total    1.0478675 1.0000000
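To connect the ICC back to the continuous vs. nominal setting, the one-way form (ICC1) can be computed by hand from the mean squares of a one-way ANOVA, with the nominal variable defining the clusters. The sketch below assumes a balanced design and uses the built-in `iris` data (Species as the nominal variable, Sepal.Length as the continuous one); this dataset and the variable names are illustrative and not part of the original analysis.

```r
# Balanced one-way design: 50 flowers (units) in each of 3 species (clusters).
fit <- aov(Sepal.Length ~ Species, data = iris)
ms <- summary(fit)[[1]][["Mean Sq"]]
msb <- ms[1]   # between-cluster mean square
msw <- ms[2]   # within-cluster mean square
k <- 50        # units per cluster (equal in every cluster)

# One-way ICC: share of variance attributable to cluster membership.
icc1 <- (msb - msw) / (msb + (k - 1) * msw)
icc1
```

With unequal cluster sizes the constant `k` needs an adjustment, so this formula should be read as a sketch for the balanced case only.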

2.3 Nominal vs. Nominal

mytable <- xtabs( ~ Treatment + Improved, data = Arthritis)
assocstats(mytable)
##                     X^2 df  P(> X^2)
## Likelihood Ratio 13.530  2 0.0011536
## Pearson          13.055  2 0.0014626
## 
## Phi-Coefficient   : NA 
## Contingency Coeff.: 0.367 
## Cramer's V        : 0.394

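The phi coefficient, the contingency coefficient, and Cramer's V measure the strength of the association: the larger the value, the stronger the association. Cramer's V lies between 0 and 1. assocstats() reports the phi coefficient only for 2 x 2 tables, which is why it is NA here.

See also Kappa() in the vcd package, which measures agreement.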
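Both reported coefficients can be derived from the Pearson chi-squared statistic, which makes their scale easier to interpret. The sketch below rebuilds the table from the counts printed earlier (so it runs without `vcd`) and recomputes them by hand; the object names are illustrative.

```r
# Treatment x Improved counts from the Arthritis table shown earlier.
tab <- matrix(c(29, 13, 7, 7, 7, 21), nrow = 2,
              dimnames = list(Treatment = c("Placebo", "Treated"),
                              Improved = c("None", "Some", "Marked")))
x2 <- unname(chisq.test(tab)$statistic)   # Pearson chi-squared statistic
n <- sum(tab)                             # total number of observations

# Contingency coefficient: sqrt(X^2 / (X^2 + n))
contingency_coef <- sqrt(x2 / (x2 + n))
# Cramer's V: sqrt(X^2 / (n * (min(rows, cols) - 1)))
cramers_v <- sqrt(x2 / (n * (min(dim(tab)) - 1)))

round(c(contingency = contingency_coef, cramers_v = cramers_v), 3)
```

The rounded values should match the Contingency Coeff. and Cramer's V lines of the assocstats() output above.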
