Correlation analysis usually refers to the relationship between two continuous variables (continuous variable vs continuous variable). In that setting, the commonly used coefficients are Pearson (linear correlation) and the rank-based Spearman and Kendall coefficients.
Correlation between two categorical variables (categorical variable vs categorical variable), or between a categorical and a continuous variable (categorical variable vs continuous variable), is discussed far less often.
Some books and articles that discuss correlation analysis with categorical variables:
- Credit Risk Analytics: Variable selection



Statistical hypothesis testing (Significance test)
Continuous vs. Continuous
- H0: correlation coefficient = 0 (no relationship)
- H1: correlation coefficient != 0
psych::corr.test(state.x77[,1:5], use="complete", method="spearman")
## Call:psych::corr.test(x = state.x77[, 1:5], use = "complete", method = "spearman")
## Correlation matrix
##            Population Income Illiteracy Life Exp Murder
## Population       1.00   0.12       0.31    -0.10   0.35
## Income           0.12   1.00      -0.31     0.32  -0.22
## Illiteracy       0.31  -0.31       1.00    -0.56   0.67
## Life Exp        -0.10   0.32      -0.56     1.00  -0.78
## Murder           0.35  -0.22       0.67    -0.78   1.00
## Sample Size
## [1] 50
## Probability values (Entries above the diagonal are adjusted for multiple tests.)
##            Population Income Illiteracy Life Exp Murder
## Population       0.00   0.78       0.13     0.78   0.10
## Income           0.39   0.00       0.13     0.13   0.39
## Illiteracy       0.03   0.03       0.00     0.00   0.00
## Life Exp         0.47   0.02       0.00     0.00   0.00
## Murder           0.01   0.13       0.00     0.00   0.00
##
## To see confidence intervals of the correlations, print with the short=FALSE option
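To test a single pair of variables rather than the whole matrix, base R's cor.test() can be used. The following is a minimal sketch; the choice of the Illiteracy and Murder columns is only an illustration, and exact = FALSE is an assumption made here to get the asymptotic p-value without a ties warning.

```r
# Spearman rank correlation test for one pair of columns from state.x77
# (built-in dataset). exact = FALSE requests the asymptotic p-value,
# which avoids the warning R issues when the data contain ties.
x <- state.x77[, "Illiteracy"]
y <- state.x77[, "Murder"]

res <- cor.test(x, y, method = "spearman", exact = FALSE)

res$estimate  # Spearman's rho
res$p.value   # p-value for H0: rho = 0
```

Unlike psych::corr.test(), this p-value is not adjusted for multiple testing, so it corresponds to the entries below the diagonal of the probability matrix.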
Continuous vs. Nominal
# ?aov()
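For a continuous response and a nominal grouping variable, a one-way ANOVA tests H0: all group means are equal, against H1: at least one group mean differs. The sketch below assumes the built-in PlantGrowth dataset as a stand-in (it is not used elsewhere in this document) and adds kruskal.test() as a rank-based alternative.

```r
# One-way ANOVA: does mean plant weight differ across treatment groups?
# PlantGrowth is a built-in dataset (weight: numeric, group: 3-level factor).
# It is used here purely as an illustrative assumption.
fit <- aov(weight ~ group, data = PlantGrowth)
summary(fit)

# Extract the p-value of the F test for the group effect
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]
p_value

# Rank-based alternative when normality of the residuals is doubtful
kruskal.test(weight ~ group, data = PlantGrowth)
```

A small p-value is evidence that the continuous variable depends on the nominal one; it says nothing about how strong the association is.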
Nominal vs. Nominal
Use a contingency table (cross-tabulation) and the chi-squared statistic to test for association.
- H0: The two variables are independent
- H1: The two variables are dependent
library(vcd)
## Loading required package: grid
mytable <- xtabs( ~ Treatment + Improved, data = Arthritis)
mytable
##          Improved
## Treatment None Some Marked
##   Placebo   29    7      7
##   Treated   13    7     21
chisq.test(mytable)
##
## Pearson's Chi-squared test
##
## data: mytable
## X-squared = 13.055, df = 2, p-value = 0.001463
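- p-value < 0.05, so H0 is rejected: the data provide evidence that the two variables are dependent
Other tests of independence for contingency tables
- fisher.test: Fisher's exact test. Like Pearson's chi-squared test, it tests the independence of rows and columns, but it computes an exact p-value instead of relying on the chi-squared approximation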
fisher.test(mytable)
##
## Fisher's Exact Test for Count Data
##
## data: mytable
## p-value = 0.001393
## alternative hypothesis: two.sided
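- mantelhaen.test: Cochran-Mantel-Haenszel test of whether two nominal variables are conditionally independent within each stratum of a third variable (here, Treatment and Improved within each level of Sex)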
mytable <- xtabs(~Treatment+Improved+Sex, data=Arthritis)
mantelhaen.test(mytable)
##
## Cochran-Mantel-Haenszel test
##
## data: mytable
## Cochran-Mantel-Haenszel M^2 = 14.632, df = 2, p-value = 0.0006647
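When deciding between Pearson's chi-squared test and Fisher's exact test for the two-way Treatment by Improved table, a common rule of thumb is to inspect the expected cell counts under independence. The sketch below re-enters the counts printed earlier as a plain matrix so that it runs without the vcd package; the threshold of 5 is a conventional guideline, not a strict rule.

```r
# Treatment x Improved counts, copied from the two-way table printed earlier
counts <- matrix(c(29, 7,  7,
                   13, 7, 21),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Treatment = c("Placebo", "Treated"),
                                 Improved  = c("None", "Some", "Marked")))

# Expected cell counts under independence: row total * column total / n
expected <- chisq.test(counts)$expected
expected

# Rule of thumb: the chi-squared approximation is usually considered
# adequate when every expected count is at least 5
all(expected >= 5)
```

If some expected counts fall below the threshold, fisher.test() is generally the safer choice.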
Measuring the strength of association (Effect size)
The hypothesis tests above check whether there is enough evidence to conclude that two variables are associated. This section measures how strong that association is.
Continuous vs. Continuous
- Use a correlation coefficient: Pearson (linear), or Spearman and Kendall (rank-based)
cor(state.x77[,1:5], method="spearman")
##            Population     Income Illiteracy   Life Exp     Murder
## Population  1.0000000  0.1246098  0.3130496 -0.1040171  0.3457401
## Income      0.1246098  1.0000000 -0.3145948  0.3241050 -0.2174623
## Illiteracy  0.3130496 -0.3145948  1.0000000 -0.5553735  0.6723592
## Life Exp   -0.1040171  0.3241050 -0.5553735  1.0000000 -0.7802406
## Murder      0.3457401 -0.2174623  0.6723592 -0.7802406  1.0000000
Continuous vs. Nominal
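Use the intraclass correlation coefficient (ICC), which expresses how similar the sample units within the same cluster are. It typically ranges from 0 to 1; the closer to 1, the more alike the units within a cluster.
There are about 10 ways to compute the ICC (the psych package provides 6 forms), so the question is which formula to use.
Choosing an ICC: the example below reports all six forms from psych::ICC() so they can be compared.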
library(psych)
data(sai)
sai.xray <- subset(sai,(sai$study=="XRAY") & (sai$time==1))
xray.icc <- ICC(sai.xray[-c(1:3)],lmer=TRUE,check.keys=TRUE)
## Some items were negatively correlated with total scale and were automatically reversed.
## This is indicated by a negative sign for the variable name.
## reversed items
## tense regretful upset worrying anxious nervous jittery high.strung worried rattled
xray.icc
## Call: ICC(x = sai.xray[-c(1:3)], lmer = TRUE, check.keys = TRUE)
##
## Intraclass correlation coefficients
##                          type  ICC    F df1  df2        p lower bound upper bound
## Single_raters_absolute   ICC1 0.28  8.8 199 3800 3.1e-194        0.24        0.33
## Single_random_raters     ICC2 0.29 13.1 199 3781 7.0e-301        0.23        0.35
## Single_fixed_raters      ICC3 0.38 13.1 199 3781 7.0e-301        0.33        0.43
## Average_raters_absolute ICC1k 0.89  8.8 199 3800 3.1e-194        0.86        0.91
## Average_random_raters   ICC2k 0.89 13.1 199 3781 7.0e-301        0.86        0.92
## Average_fixed_raters    ICC3k 0.92 13.1 199 3781 7.0e-301        0.91        0.94
##
## Number of subjects = 200 Number of Judges = 20
xray.icc$lme
##           variance   Percent
## ID       0.3033248 0.2894686
## Items    0.2434623 0.2323407
## Residual 0.5010804 0.4781906
## Total    1.0478675 1.0000000
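For the plain "continuous variable grouped by a nominal variable" case, the one-way ICC can also be computed by hand from the ANOVA mean squares, treating each level of the nominal variable as a cluster. The sketch below is an illustration under stated assumptions: it uses the built-in PlantGrowth dataset (not part of the original analysis) and the balanced-design formula ICC(1) = (MSB - MSW) / (MSB + (k - 1) * MSW), where k is the number of observations per group.

```r
# One-way ICC for a continuous variable grouped by a factor.
# PlantGrowth (built-in) has 3 groups with 10 observations each (balanced);
# it is an assumed stand-in dataset for illustration only.
tab <- anova(lm(weight ~ group, data = PlantGrowth))

msb <- tab[["Mean Sq"]][1]  # between-group mean square
msw <- tab[["Mean Sq"]][2]  # within-group (residual) mean square
k   <- 10                   # observations per group (balanced design)

# ICC(1) = (MSB - MSW) / (MSB + (k - 1) * MSW)
icc1 <- (msb - msw) / (msb + (k - 1) * msw)
icc1
```

A value near 0 means observations in the same group are no more alike than observations from different groups; a value near 1 means group membership explains most of the variation.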
Nominal vs. Nominal
mytable <- xtabs( ~ Treatment + Improved, data = Arthritis)
assocstats(mytable)
##                     X^2 df  P(> X^2)
## Likelihood Ratio 13.530  2 0.0011536
## Pearson          13.055  2 0.0014626
##
## Phi-Coefficient : NA
## Contingency Coeff.: 0.367
## Cramer's V : 0.394
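The Phi coefficient, the contingency coefficient, and Cramer's V range from 0 to 1; the larger the value, the stronger the association. Phi is reported as NA here because assocstats() computes it only for 2 x 2 tables.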
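The contingency coefficient and Cramer's V reported by assocstats() can be reproduced from the Pearson chi-squared statistic. The sketch below re-enters the Treatment by Improved counts as a plain matrix so that it runs without vcd, and uses the standard formulas C = sqrt(X^2 / (X^2 + n)) and V = sqrt(X^2 / (n * (min(rows, cols) - 1))).

```r
# Treatment x Improved counts from the Arthritis table shown earlier
counts <- matrix(c(29, 7,  7,
                   13, 7, 21),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Treatment = c("Placebo", "Treated"),
                                 Improved  = c("None", "Some", "Marked")))

n    <- sum(counts)                            # total sample size
chi2 <- unname(chisq.test(counts)$statistic)   # Pearson X-squared

# Contingency coefficient: sqrt(X^2 / (X^2 + n))
contingency_coef <- sqrt(chi2 / (chi2 + n))

# Cramer's V: sqrt(X^2 / (n * (min(rows, cols) - 1)))
cramers_v <- sqrt(chi2 / (n * (min(dim(counts)) - 1)))

round(c(contingency = contingency_coef, cramers_v = cramers_v), 3)
```

Cramer's V is usually preferred for comparing tables of different sizes, because the contingency coefficient cannot reach 1 and its upper limit depends on the table dimensions.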
See also Kappa() in the vcd package, which measures agreement between two raters.