In this session, you will summarize categorical data in R: frequency tables, proportions, cross-tabulations, and measures of association for 2-by-2 tables.
## Load vcd package
library(vcd)
## Load Arthritis dataset (data frame)
data(Arthritis)
## Show the first 17 rows
Arthritis[1:17, ]
   ID Treatment    Sex Age Improved
1  57   Treated   Male  27     Some
2  46   Treated   Male  29     None
3  77   Treated   Male  30     None
4  17   Treated   Male  32   Marked
5  36   Treated   Male  46   Marked
6  23   Treated   Male  58   Marked
7  75   Treated   Male  59     None
8  39   Treated   Male  59   Marked
9  33   Treated   Male  63     None
10 55   Treated   Male  63     None
11 30   Treated   Male  64     None
12  5   Treated   Male  64     Some
13 63   Treated   Male  69     None
14 83   Treated   Male  70   Marked
15 66   Treated Female  23     None
16 40   Treated Female  32     None
17  6   Treated Female  37     Some
summary(Arthritis)
       ID         Treatment      Sex          Age         Improved
 Min.   : 1.0   Placebo:43   Female:59   Min.   :23.0   None  :42
 1st Qu.:21.8   Treated:41   Male  :25   1st Qu.:46.0   Some  :14
 Median :42.5                            Median :57.0   Marked:28
 Mean   :42.5                            Mean   :53.4
 3rd Qu.:63.2                            3rd Qu.:63.0
 Max.   :84.0                            Max.   :74.0
Arthritis$Treatment
[1] Treated Treated Treated Treated Treated Treated Treated Treated Treated Treated Treated Treated Treated Treated
[15] Treated Treated Treated Treated Treated Treated Treated Treated Treated Treated Treated Treated Treated Treated
[29] Treated Treated Treated Treated Treated Treated Treated Treated Treated Treated Treated Treated Treated Placebo
[43] Placebo Placebo Placebo Placebo Placebo Placebo Placebo Placebo Placebo Placebo Placebo Placebo Placebo Placebo
[57] Placebo Placebo Placebo Placebo Placebo Placebo Placebo Placebo Placebo Placebo Placebo Placebo Placebo Placebo
[71] Placebo Placebo Placebo Placebo Placebo Placebo Placebo Placebo Placebo Placebo Placebo Placebo Placebo Placebo
Levels: Placebo Treated
levels(Arthritis$Improved)
[1] "None" "Some" "Marked"
Arthritis$Improved
[1] Some None None Marked Marked Marked None Marked None None None Some None Marked None None
[17] Some None Marked None Marked Marked Marked Marked Marked Marked Marked None Marked Marked Marked Marked
[33] Some Marked Marked Marked Some Marked None Some Some None None None None None None None
[49] None None None Marked None None None Some None Marked None None None None None None
[65] None None None None Marked Marked None Some Some Some Marked None Some None Marked None
[81] None Some Some Marked
Levels: None < Some < Marked
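The `None < Some < Marked` line shows that `Improved` is an ordered factor, so its levels can be compared. The following is a minimal sketch of what that ordering allows. It rebuilds a stand-in vector named `improved` (a hypothetical name) from the counts reported in this session, so it runs without the vcd package; the element order differs from the real data.

```r
## Stand-in for Arthritis$Improved, rebuilt from the counts 42 / 14 / 28
improved <- factor(rep(c("None", "Some", "Marked"), times = c(42, 14, 28)),
                   levels = c("None", "Some", "Marked"),
                   ordered = TRUE)

## Confirm the factor is ordered
is.ordered(improved)

## Ordered factors support comparisons against a level name
at_least_some <- improved >= "Some"

## Number of patients with at least some improvement
sum(at_least_some)
```

With an unordered factor the same comparison would return `NA` with a warning, which is why the ordering matters for this kind of summary.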
length(Arthritis$Improved)
[1] 84
## table()
table(Arthritis$Improved)
None Some Marked
42 14 28
## summary()
summary(Arthritis$Improved)
None Some Marked
42 14 28
## prop.table() converts counts to proportions
tab1 <- table(Arthritis$Improved)
prop.table(tab1)
None Some Marked
0.5000 0.1667 0.3333
## xtabs() builds a cross-tabulation from a formula
xtab1 <- xtabs(~ Treatment + Improved, Arthritis)
xtab1
         Improved
Treatment None Some Marked
  Placebo   29    7      7
  Treated   13    7     21
## Add row and column totals
addmargins(xtab1)
         Improved
Treatment None Some Marked Sum
  Placebo   29    7      7  43
  Treated   13    7     21  41
  Sum       42   14     28  84
prop.table(xtab1) # proportion of the grand total
         Improved
Treatment    None    Some  Marked
  Placebo 0.34524 0.08333 0.08333
  Treated 0.15476 0.08333 0.25000
prop.table(xtab1, margin = 1) # proportion of the row sum
         Improved
Treatment   None   Some Marked
  Placebo 0.6744 0.1628 0.1628
  Treated 0.3171 0.1707 0.5122
prop.table(xtab1, margin = 2) # proportion of the column sum
         Improved
Treatment   None   Some Marked
  Placebo 0.6905 0.5000 0.2500
  Treated 0.3095 0.5000 0.7500
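To make the meaning of the `margin` argument concrete, here is a small sketch that checks which direction sums to 1 and converts the row proportions to percentages. It rebuilds the Treatment by Improved counts shown in this session as a plain table, so it does not need the vcd package; the name `row_pct` is introduced here for illustration only.

```r
## Rebuild the Treatment x Improved counts shown above (filled column by column)
xtab1 <- as.table(matrix(c(29, 13, 7, 7, 7, 21), nrow = 2,
                         dimnames = list(Treatment = c("Placebo", "Treated"),
                                         Improved  = c("None", "Some", "Marked"))))

## margin = 1: each row sums to 1 (distribution of Improved within a treatment arm)
rowSums(prop.table(xtab1, margin = 1))

## margin = 2: each column sums to 1 (distribution of Treatment within an outcome)
colSums(prop.table(xtab1, margin = 2))

## Row percentages, rounded to one decimal place
row_pct <- round(100 * prop.table(xtab1, margin = 1), 1)
row_pct
```

Row proportions are usually the ones to read when the rows are the groups being compared, as with Placebo versus Treated here.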
## Add a third variable (Sex) as a stratifying variable
xtab2 <- xtabs(~ Treatment +Improved +Sex, Arthritis)
xtab2
, , Sex = Female

         Improved
Treatment None Some Marked
  Placebo   19    7      6
  Treated    6    5     16

, , Sex = Male

         Improved
Treatment None Some Marked
  Placebo   10    0      1
  Treated    7    2      5
## flat table
ftable(xtab2)
                   Sex Female Male
Treatment Improved
Placebo   None             19   10
          Some              7    0
          Marked            6    1
Treated   None              6    7
          Some              5    2
          Marked           16    5
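A stratified table can be collapsed back over the stratifying variable. The sketch below rebuilds the three-way counts shown in this session as a base R table (so it runs without the vcd package) and uses `margin.table()` to sum over Sex. The object name `collapsed` is introduced here for illustration.

```r
## Rebuild the Treatment x Improved x Sex counts shown above.
## Values are filled with Treatment varying fastest, then Improved, then Sex.
xtab2 <- as.table(array(
  c(19, 6, 7, 5, 6, 16,   # Sex = Female
    10, 7, 0, 2, 1, 5),   # Sex = Male
  dim = c(2, 3, 2),
  dimnames = list(Treatment = c("Placebo", "Treated"),
                  Improved  = c("None", "Some", "Marked"),
                  Sex       = c("Female", "Male"))))

## Keep dimensions 1 and 2 (Treatment, Improved), summing over Sex
collapsed <- margin.table(xtab2, margin = c(1, 2))
collapsed

## Keep only dimension 3 to get the counts by Sex
margin.table(xtab2, margin = 3)

## ftable() can also put different variables in the rows
ftable(xtab2, row.vars = c("Sex", "Treatment"))
```

The collapsed table should reproduce the two-way Treatment by Improved table from earlier in the session.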
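## Load gmodels package for CrossTable()
library(gmodels)
## Two-way table of Treatment by Improved (this overwrites the one-way tab1 above)
tab1 <- xtabs(~ Treatment + Improved, Arthritis)
CrossTable(tab1)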
   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

Total Observations in Table:  84

             | Improved
   Treatment |      None |      Some |    Marked | Row Total |
-------------|-----------|-----------|-----------|-----------|
     Placebo |        29 |         7 |         7 |        43 |
             |     2.616 |     0.004 |     3.752 |           |
             |     0.674 |     0.163 |     0.163 |     0.512 |
             |     0.690 |     0.500 |     0.250 |           |
             |     0.345 |     0.083 |     0.083 |           |
-------------|-----------|-----------|-----------|-----------|
     Treated |        13 |         7 |        21 |        41 |
             |     2.744 |     0.004 |     3.935 |           |
             |     0.317 |     0.171 |     0.512 |     0.488 |
             |     0.310 |     0.500 |     0.750 |           |
             |     0.155 |     0.083 |     0.250 |           |
-------------|-----------|-----------|-----------|-----------|
Column Total |        42 |        14 |        28 |        84 |
             |     0.500 |     0.167 |     0.333 |           |
-------------|-----------|-----------|-----------|-----------|
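The "Chi-square contribution" row in each cell is the quantity (observed - expected)^2 / expected, where the expected count is row total times column total divided by the table total. As a rough sketch of where those numbers come from, the following recomputes them in base R from the counts shown above; `expected` and `contrib` are names chosen here for illustration, and the comparison against `chisq.test()` is an added cross-check, not part of the original session.

```r
## Rebuild the Treatment x Improved counts shown above (filled column by column)
xtab1 <- as.table(matrix(c(29, 13, 7, 7, 7, 21), nrow = 2,
                         dimnames = list(Treatment = c("Placebo", "Treated"),
                                         Improved  = c("None", "Some", "Marked"))))

## Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(xtab1), colSums(xtab1)) / sum(xtab1)

## Per-cell contribution to the Pearson chi-square statistic
contrib <- (xtab1 - expected)^2 / expected
round(contrib, 3)

## The contributions add up to the Pearson chi-square statistic
sum(contrib)
chisq.test(xtab1)$statistic
```

The largest contributions come from the None and Marked columns, which is where the two treatment arms differ most.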
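## Load epiR package for epi.2by2()
library(epiR)
## 2-by-2 table: epi.2by2() reads rows as exposure and columns as outcome
tab.2by2 <- xtabs(~ Sex + Treatment, Arthritis)
tab.2by2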
        Treatment
Sex      Placebo Treated
  Female      32      27
  Male        11      14
epi.2by2(tab.2by2, units = 1)
             Disease +    Disease -      Total        Inc risk *        Odds
Exposed +           32           27         59             0.542       1.185
Exposed -           11           14         25             0.440       0.786
Total               43           41         84             0.512       1.049

Point estimates and 95 % CIs:
---------------------------------------------------------
Inc risk ratio                         1.23 (0.75, 2.03)
Odds ratio                             1.51 (0.59, 3.87)
Attrib risk *                          0.1 (-0.13, 0.33)
Attrib risk in population *            0.07 (-0.15, 0.29)
Attrib fraction in exposed (%)         18.87 (-33.82, 50.82)
Attrib fraction in population (%)      14.05 (-24.81, 40.81)
---------------------------------------------------------
 * Cases per population unit
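The point estimates in this output can be reproduced by hand, which also makes clear how the table is being read: the first row (Female) is taken as "Exposed +" and the first column (Placebo) as "Disease +". The sketch below is a base R illustration using the counts shown above; it does not call epiR, it does not compute the confidence intervals, and the variable names are introduced here for illustration.

```r
## Rebuild the Sex x Treatment counts shown above (filled column by column)
tab <- matrix(c(32, 11, 27, 14), nrow = 2,
              dimnames = list(Sex = c("Female", "Male"),
                              Treatment = c("Placebo", "Treated")))

## "Risk" of the first column within each row: 32/59 and 11/25
risk <- tab[, "Placebo"] / rowSums(tab)

## Odds of the first column within each row: 32/27 and 11/14
odds <- tab[, "Placebo"] / tab[, "Treated"]

## Ratio and difference measures, first row relative to second row
risk_ratio  <- unname(risk["Female"] / risk["Male"])
odds_ratio  <- unname(odds["Female"] / odds["Male"])
attrib_risk <- unname(risk["Female"] - risk["Male"])

round(c(risk_ratio = risk_ratio, odds_ratio = odds_ratio,
        attrib_risk = attrib_risk), 2)
```

Because Sex and Treatment are not an exposure and an outcome in any substantive sense, these numbers illustrate the layout that `epi.2by2()` expects rather than a meaningful clinical result.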