Introduction

One of the most common tables in medical literature includes summary statistics for a set of variables, often stratified by some group (e.g. treatment arm).This tutorial provides step-by-step directions for using the functions associated with tableby(). All functions presented here are available within the arsenal package. All the examples in this tutorial use a dataset called mockstudy made available by Paul Novotny which includes a variety of types of variables (character, numeric, factor, ordered factor, survival) to use as examples.

Sidenote on utility package kableextra

It is worthwhile mentioning another package i.e., kableExtra before getting started. It helps us build common complex tables and manipulate table styles. It imports the pipe %>% symbol from magrittr and verbalize all the functions, so basically you can add “layers” to a kable output in a way that is similar with ggplot2 and plotly. For more table styles refer to

https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html

  • PS: For users who are not very familiar with the pipe operator %>% in R, it passes the result along the chain for a more literal coding experience. Basically when we say A %>% B, technically it means sending the results of A to B as B’s first argument.
data(mockstudy) ##load data
#dim(mockstudy)  ##look at how many subjects and variables are in the dataset 
#str(mockstudy) ##quick look at the data
dt<-head(mockstudy) # using kable extra to produce nicer tables

dt %>%
  kbl() %>%
  kable_styling()
case age arm sex race fu.time fu.stat ps hgb bmi alk.phos ast mdquality.s age.ord
1 110754 67 F: FOLFOX Male Caucasian 922 2 0 11.5 25.09861 160 35 NA 60-69
2 99706 74 A: IFL Female Caucasian 270 2 1 10.7 19.49786 290 52 1 70-79
4 105271 50 A: IFL Female Caucasian 175 2 1 11.1 NA 700 100 1 40-49
5 105001 71 G: IROX Female Caucasian 128 2 1 12.6 29.42922 771 68 1 70-79
7 112263 69 F: FOLFOX Female NA 233 2 0 13.0 26.35352 350 35 NA 60-69
8 86205 56 G: IROX Male Caucasian 120 2 0 10.2 19.03673 569 27 1 50-59
dt1<-head(mockstudy) # using kable extra to produce nicer tables

dt1 %>%
  kbl(caption = "Recreating booktabs style table") %>%
  kable_classic(full_width = F, html_font = "Cambria")
Recreating booktabs style table
case age arm sex race fu.time fu.stat ps hgb bmi alk.phos ast mdquality.s age.ord
1 110754 67 F: FOLFOX Male Caucasian 922 2 0 11.5 25.09861 160 35 NA 60-69
2 99706 74 A: IFL Female Caucasian 270 2 1 10.7 19.49786 290 52 1 70-79
4 105271 50 A: IFL Female Caucasian 175 2 1 11.1 NA 700 100 1 40-49
5 105001 71 G: IROX Female Caucasian 128 2 1 12.6 29.42922 771 68 1 70-79
7 112263 69 F: FOLFOX Female NA 233 2 0 13.0 26.35352 350 35 NA 60-69
8 86205 56 G: IROX Male Caucasian 120 2 0 10.2 19.03673 569 27 1 50-59

Features specific to HTML only

If you have a huge table and you don’t want to reduce the font size to unreadable, you may want to put your HTML table in a scroll box, of which users can pick the part they like to read. Note that scroll box isn’t printer friendly, so be aware of that when you use this feature. When you use scroll_box, you can specify either height or width. When you specify height, you will get a vertically scrollable box and vice versa. If you specify both, you will get a two-way scrollable box.

kbl(cbind(dt1, dt1)) %>%
  kable_paper() %>%
  scroll_box(width = "500px", height = "200px")
case age arm sex race fu.time fu.stat ps hgb bmi alk.phos ast mdquality.s age.ord case age arm sex race fu.time fu.stat ps hgb bmi alk.phos ast mdquality.s age.ord
1 110754 67 F: FOLFOX Male Caucasian 922 2 0 11.5 25.09861 160 35 NA 60-69 110754 67 F: FOLFOX Male Caucasian 922 2 0 11.5 25.09861 160 35 NA 60-69
2 99706 74 A: IFL Female Caucasian 270 2 1 10.7 19.49786 290 52 1 70-79 99706 74 A: IFL Female Caucasian 270 2 1 10.7 19.49786 290 52 1 70-79
4 105271 50 A: IFL Female Caucasian 175 2 1 11.1 NA 700 100 1 40-49 105271 50 A: IFL Female Caucasian 175 2 1 11.1 NA 700 100 1 40-49
5 105001 71 G: IROX Female Caucasian 128 2 1 12.6 29.42922 771 68 1 70-79 105001 71 G: IROX Female Caucasian 128 2 1 12.6 29.42922 771 68 1 70-79
7 112263 69 F: FOLFOX Female NA 233 2 0 13.0 26.35352 350 35 NA 60-69 112263 69 F: FOLFOX Female NA 233 2 0 13.0 26.35352 350 35 NA 60-69
8 86205 56 G: IROX Male Caucasian 120 2 0 10.2 19.03673 569 27 1 50-59 86205 56 G: IROX Male Caucasian 120 2 0 10.2 19.03673 569 27 1 50-59

Side note on pulling underlying equation corresponding to a statistical model using package equatiomatic

lr <- glm(sex ~ species * bill_length_mm, data = penguins,family = binomial(link = "logit"))

extract_eq(lr, wrap = TRUE)

\[ \begin{aligned} \log\left[ \frac { P( \operatorname{sex} = \operatorname{male} ) }{ 1 - P( \operatorname{sex} = \operatorname{male} ) } \right] &= \alpha + \beta_{1}(\operatorname{species}_{\operatorname{Chinstrap}}) + \beta_{2}(\operatorname{species}_{\operatorname{Gentoo}}) + \beta_{3}(\operatorname{bill\_length\_mm})\ + \\ &\quad \beta_{4}(\operatorname{species}_{\operatorname{Chinstrap}} \times \operatorname{bill\_length\_mm}) + \beta_{5}(\operatorname{species}_{\operatorname{Gentoo}} \times \operatorname{bill\_length\_mm}) \end{aligned} \]

extract_eq(lr, wrap = TRUE, show_distribution = TRUE) # show the how the data are assumed to be distributed.

\[ \begin{aligned} \operatorname{sex} &\sim Bernoulli\left(\operatorname{prob}_{\operatorname{sex} = \operatorname{male}}= \hat{P}\right) \\ \log\left[ \frac { \hat{P} }{ 1 - \hat{P} } \right] &= \alpha + \beta_{1}(\operatorname{species}_{\operatorname{Chinstrap}}) + \beta_{2}(\operatorname{species}_{\operatorname{Gentoo}}) + \beta_{3}(\operatorname{bill\_length\_mm})\ + \\ &\quad \beta_{4}(\operatorname{species}_{\operatorname{Chinstrap}} \times \operatorname{bill\_length\_mm}) + \beta_{5}(\operatorname{species}_{\operatorname{Gentoo}} \times \operatorname{bill\_length\_mm}) \end{aligned} \]

## GLMM with individual-level variability (accounting for overdispersion)
## For this data set the model is the same as one allowing for a period:herd
## interaction, which the plot indicates could be needed.
cbpp$obs <- 1:nrow(cbpp)
gm2 <- glmer(cbind(incidence, size - incidence) ~ period +(1 | herd) +  (1|obs),family = binomial, data = cbpp)
extract_eq(gm2,wrap=TRUE)

\[ \begin{aligned} \operatorname{incidence}_{i} &\sim \operatorname{Binomial}(n = 1, \operatorname{prob}_{\operatorname{incidence} = 1} = \widehat{P}) \\ \log\left[\frac{\hat{P}}{1 - \hat{P}} \right] &=\alpha_{j[i],k[i]} \\ \alpha_{j} &\sim N \left(\gamma_{0}^{\alpha} + \gamma_{1}^{\alpha}(\operatorname{period}_{\operatorname{2}}) + \gamma_{2}^{\alpha}(\operatorname{period}_{\operatorname{3}}) + \gamma_{3}^{\alpha}(\operatorname{period}_{\operatorname{4}}), \sigma^2_{\alpha_{j}} \right) \text{, for obs j = 1,} \dots \text{,J} \\ \alpha_{k} &\sim N \left(\mu_{\alpha_{k}}, \sigma^2_{\alpha_{k}} \right) \text{, for herd k = 1,} \dots \text{,K} \end{aligned} \]

Side note on usage of gtsummary to create publication ready output tables from fitted models

Let’s start by creating a logistic regression model to predict tumor response using the variables age and grade from the trial data set.

tr<-head(trial)
tr %>%
  kbl(caption = "Simulated Trial Data") %>%
  kable_classic(full_width = F, html_font = "Cambria")
Simulated Trial Data
trt age marker stage grade response death ttdeath
Drug A 23 0.160 T1 II 0 0 24.00
Drug B 9 1.107 T2 I 1 0 24.00
Drug A 31 0.277 T1 II 0 0 24.00
Drug A NA 2.067 T3 III 1 1 17.64
Drug A 51 2.767 T4 III 1 1 16.43
Drug B 39 0.613 T4 I 0 1 15.64
m1 <- glm(response ~ age + stage, trial, family = binomial)
tbl_regression(m1, exponentiate = TRUE)
Characteristic OR1 95% CI1 p-value
Age 1.02 1.00, 1.04 0.091
T Stage
T1
T2 0.58 0.24, 1.37 0.2
T3 0.94 0.39, 2.28 0.9
T4 0.79 0.33, 1.90 0.6
1 OR = Odds Ratio, CI = Confidence Interval

Format results into data frame with global p-values

m1 %>%
  tbl_regression(
    exponentiate = TRUE, 
    pvalue_fun = ~style_pvalue(.x, digits = 2),
  ) %>% 
  add_global_p() %>%
  bold_p(t = 0.10) %>%
  bold_labels() %>%
  italicize_levels()
Characteristic OR1 95% CI1 p-value
Age 1.02 1.00, 1.04 0.087
T Stage 0.62
T1
T2 0.58 0.24, 1.37
T3 0.94 0.39, 2.28
T4 0.79 0.33, 1.90
1 OR = Odds Ratio, CI = Confidence Interval

Use wrapper for bells and whiskers

trial %>%
  select(response, age, grade) %>%
  tbl_uvregression(
    method = glm,
    y = response,
    method.args = list(family = binomial),
    exponentiate = TRUE,
    pvalue_fun = ~style_pvalue(.x, digits = 2)
  ) %>%
  add_global_p() %>%  # add global p-value 
  add_nevent() %>%    # add number of events of the outcome
  add_q() %>%         # adjusts global p-values for multiple testing
  bold_p() %>%        # bold p-values under a given threshold (default 0.05)
  bold_p(t = 0.10, q = TRUE) %>% # now bold q-values under the threshold of 0.10
  bold_labels()
## add_q: Adjusting p-values with
## `stats::p.adjust(x$table_body$p.value, method = "fdr")`
Characteristic N Event N OR1 95% CI1 p-value q-value2
Age 183 58 1.02 1.00, 1.04 0.091 0.18
Grade 193 61 0.93 0.93
I
II 0.95 0.45, 2.00
III 1.10 0.52, 2.29
1 OR = Odds Ratio, CI = Confidence Interval
2 False discovery rate correction for multiple testing

Create a simple table stratified by treatment arm

We use a formula statement to specify the variables which we want summarized. The example below uses age (a continuous variable) and sex (a factor). Note results=“asis” when creating the R chunk below changes the layout slightly (compresses it) and bolds the variable names. If you want to use tabl as dataframe use as.data.frame.

tab1 <- tableby(arm ~ sex + age, data=mockstudy)
summary(tab1) # quick look at the table using  summary() on your tableby object, use text =true for pretty text version
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
sex 0.190
   Male 277 (64.7%) 411 (59.5%) 228 (60.0%) 916 (61.1%)
   Female 151 (35.3%) 280 (40.5%) 152 (40.0%) 583 (38.9%)
Age in Years 0.614
   Mean (SD) 59.673 (11.365) 60.301 (11.632) 59.763 (11.499) 59.985 (11.519)
   Range 27.000 - 88.000 19.000 - 88.000 26.000 - 85.000 19.000 - 88.000

Add labels to variables

Either you could add labels to the variables in your dataset or you can use the built-in data.frame method for labels<-: as below

labels(mockstudy)  <- c(age = 'Age, yrs', sex = "Gender")

tab1 <- tableby(arm ~ sex + age, data=mockstudy)
summary(tab1)
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
Gender 0.190
   Male 277 (64.7%) 411 (59.5%) 228 (60.0%) 916 (61.1%)
   Female 151 (35.3%) 280 (40.5%) 152 (40.0%) 583 (38.9%)
Age, yrs 0.614
   Mean (SD) 59.673 (11.365) 60.301 (11.632) 59.763 (11.499) 59.985 (11.519)
   Range 27.000 - 88.000 19.000 - 88.000 26.000 - 85.000 19.000 - 88.000

Change summary statistics globally

This default can be modified using the tableby.control function.Note that test=FALSE and total=FALSE results in the total column and p-value column not being printed.

mycontrols  <- tableby.control(test=FALSE, total=FALSE,
                               numeric.test="kwt", cat.test="chisq",
                               numeric.stats=c("N", "median", "q1q3"),
                               cat.stats=c("countpct"),
                               stats.labels=list(N='Count', median='Median', q1q3='Q1,Q3'))
tab2 <- tableby(arm ~ sex + age, data=mockstudy, control=mycontrols)
summary(tab2)
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380)
Gender
   Male 277 (64.7%) 411 (59.5%) 228 (60.0%)
   Female 151 (35.3%) 280 (40.5%) 152 (40.0%)
Age, yrs
   Count 428 691 380
   Median 61.000 61.000 61.000
   Q1,Q3 53.000, 68.000 52.000, 69.000 52.000, 68.000

Change summary statistics within the formula

tab.test <- tableby(arm ~ kwt(age) + anova(bmi) + notest(ast), data=mockstudy)
tt<-tests(tab.test)
tt %>%
  kbl(caption = "Tests function for a quick check on what tests were performed") %>%
  kable_classic(full_width = F, html_font = "Cambria")
Tests function for a quick check on what tests were performed
Group Variable p.value Method
Treatment Arm age 0.6390614 Kruskal-Wallis rank sum test
Treatment Arm bmi 0.8916552 Linear Model ANOVA
Treatment Arm ast NA No test

Modifying Summary Statistics

Summary statistics for any individual variable can also be modified, but it must be done as secondary arguments to the test function. The function names must be strings that are functions already written for tableby.

summary(tab.test, pfootnote=TRUE) # default summary table with footnote display option TRUE
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
Age, yrs 0.6391
   Mean (SD) 59.673 (11.365) 60.301 (11.632) 59.763 (11.499) 59.985 (11.519)
   Range 27.000 - 88.000 19.000 - 88.000 26.000 - 85.000 19.000 - 88.000
Body Mass Index (kg/m^2) 0.8922
   N-Miss 9 20 4 33
   Mean (SD) 27.290 (5.552) 27.210 (5.173) 27.106 (5.751) 27.206 (5.432)
   Range 14.053 - 53.008 16.649 - 49.130 15.430 - 60.243 14.053 - 60.243
ast
   N-Miss 69 141 56 266
   Mean (SD) 37.292 (28.036) 35.202 (26.659) 35.670 (25.807) 35.933 (26.843)
   Range 10.000 - 205.000 7.000 - 174.000 5.000 - 176.000 5.000 - 205.000
  1. Kruskal-Wallis rank sum test
  2. Linear Model ANOVA
tab.test1 <- tableby(arm ~ kwt(ast, "Nmiss2","median") + anova(age, "N","mean") +
                    notest(bmi, "Nmiss","median"), data=mockstudy)
summary(tab.test1)
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
ast 0.039
   N-Miss 69 141 56 266
   Median 29.000 25.500 27.000 27.000
Age, yrs 0.614
   N 428 691 380 1499
   Mean 59.673 60.301 59.763 59.985
Body Mass Index (kg/m^2)
   N-Miss 9 20 4 33
   Median 26.234 26.525 25.978 26.325

Categorical Tests (Chisq and Fisher’s)

set.seed(100)
tab.catsim <- tableby(arm ~ sex + race, cat.test="fe", simulate.p.value=TRUE, B=500, data=mockstudy)
t1<-tests(tab.catsim)
t1 %>%
  kbl(caption = "Fisher's Exact Test for Count Data with simulated p-value") %>%
  kable_classic(full_width = F, html_font = "Cambria")
Fisher’s Exact Test for Count Data with simulated p-value
Group Variable p.value Method
Treatment Arm sex 0.2195609 Fisher’s Exact Test for Count Data with simulated p-value (based on 500 replicates)
Treatment Arm race 0.3093812 Fisher’s Exact Test for Count Data with simulated p-value (based on 500 replicates)
cat.correct <- tableby(arm ~ sex + race, cat.test="chisq", subset = !grepl("^F", arm), data=mockstudy)
t2<-tests(cat.correct)

t2 %>%
  kbl(caption = "Chi-Square Test with correction for Count Data with simulated p-value") %>%
  kable_classic(full_width = F, html_font = "Cambria")
Chi-Square Test with correction for Count Data with simulated p-value
Group Variable p.value Method
Treatment Arm sex 0.1666280 Pearson’s Chi-squared test
Treatment Arm race 0.8108543 Pearson’s Chi-squared test
cat.nocorrect <- tableby(arm ~ sex + race, cat.test="chisq", subset = !grepl("^F", arm),chisq.correct=FALSE, data=mockstudy)
t3<-tests(cat.nocorrect)
t3 %>%
  kbl(caption = "Chi-Square Test without correction for Count Data with simulated p-value") %>%
  kable_classic(full_width = F, html_font = "Cambria")
Chi-Square Test without correction for Count Data with simulated p-value
Group Variable p.value Method
Treatment Arm sex 0.1666280 Pearson’s Chi-squared test
Treatment Arm race 0.8108543 Pearson’s Chi-squared test

Summarize without a group/by variable

tab.noby <- tableby(~ bmi + sex + age, data=mockstudy)
summary(tab.noby)
Overall (N=1499)
Body Mass Index (kg/m^2)
   N-Miss 33
   Mean (SD) 27.206 (5.432)
   Range 14.053 - 60.243
Gender
   Male 916 (61.1%)
   Female 583 (38.9%)
Age, yrs
   Mean (SD) 59.985 (11.519)
   Range 19.000 - 88.000

Summarize an ordered factor

When comparing groups of ordered data there are two options.

summary(tableby(sex ~ age.ord, data = mockstudy), pfootnote = TRUE)
Male (N=916) Female (N=583) Total (N=1499) p value
age.ord 0.0491
   10-19 1 (0.1%) 0 (0.0%) 1 (0.1%)
   20-29 8 (0.9%) 11 (1.9%) 19 (1.3%)
   30-39 37 (4.0%) 30 (5.1%) 67 (4.5%)
   40-49 127 (13.9%) 83 (14.2%) 210 (14.0%)
   50-59 257 (28.1%) 179 (30.7%) 436 (29.1%)
   60-69 298 (32.5%) 170 (29.2%) 468 (31.2%)
   70-79 168 (18.3%) 101 (17.3%) 269 (17.9%)
   80-89 20 (2.2%) 9 (1.5%) 29 (1.9%)
  1. Trend test for ordinal variables

Summarize a survival variable

Information that is presented by the survfit() function,can be seen with tableby. The default is to show the median survival (time at which the probability of survival = 50%). It is also possible to obtain summaries of the % survival at certain time points (say the probability of surviving 1-year).

# now apply tableby
summary(tableby(sex ~ Surv(fu.time, fu.stat), data=mockstudy))
Male (N=916) Female (N=583) Total (N=1499) p value
Surv(fu.time, fu.stat) 0.975
   Events 829 527 1356
   Median Survival 550.000 543.000 546.000
summary(tableby(sex ~ Surv(fu.time/365.25, fu.stat), data=mockstudy, times=1:5, surv.stats=c("NeventsSurv","NriskSurv"))) # using tableby
Male (N=916) Female (N=583) Total (N=1499) p value
Surv(fu.time/365.25, fu.stat) 0.975
   time = 1 286 (68.7) 202 (65.3) 488 (67.4)
   time = 2 597 (34.4) 391 (32.8) 988 (33.7)
   time = 3 748 (17.5) 481 (17.0) 1229 (17.3)
   time = 4 809 (9.4) 513 (10.9) 1322 (10.1)
   time = 5 825 (6.3) 525 (7.4) 1350 (6.8)
   time = 1 626 (68.7) 380 (65.3) 1006 (67.4)
   time = 2 309 (34.4) 190 (32.8) 499 (33.7)
   time = 3 152 (17.5) 95 (17.0) 247 (17.3)
   time = 4 57 (9.4) 51 (10.9) 108 (10.1)
   time = 5 24 (6.3) 18 (7.4) 42 (6.8)

Summarising Subset data using the subset argument within tableby to subset the data.

summary(tableby(sex ~ ps + hgb + bmi, subset=age>50 & arm=="F: FOLFOX", data=mockstudy))
Male (N=333) Female (N=224) Total (N=557) p value
ps 0.652
   N-Miss 64 44 108
   Mean (SD) 0.554 (0.600) 0.528 (0.602) 0.543 (0.600)
   Range 0.000 - 2.000 0.000 - 2.000 0.000 - 2.000
hgb < 0.001
   N-Miss 64 44 108
   Mean (SD) 12.720 (1.925) 12.063 (1.395) 12.457 (1.760)
   Range 9.000 - 18.200 9.100 - 15.900 9.000 - 18.200
Body Mass Index (kg/m^2) 0.650
   N-Miss 9 6 15
   Mean (SD) 27.539 (4.780) 27.337 (5.508) 27.458 (5.081)
   Range 17.927 - 47.458 16.649 - 49.130 16.649 - 49.130

Create combinations of variables

## create a variable combining the levels of mdquality.s and sex
with(mockstudy, table(interaction(mdquality.s,sex)))

0.Male 1.Male 0.Female 1.Female 77 686 47 437

summary(tableby(arm ~ interaction(mdquality.s,sex), data=mockstudy))
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
interaction(mdquality.s, sex) 0.493
   N-Miss 55 156 41 252
   0.Male 29 (7.8%) 31 (5.8%) 17 (5.0%) 77 (6.2%)
   1.Male 214 (57.4%) 285 (53.3%) 187 (55.2%) 686 (55.0%)
   0.Female 12 (3.2%) 21 (3.9%) 14 (4.1%) 47 (3.8%)
   1.Female 118 (31.6%) 198 (37.0%) 121 (35.7%) 437 (35.0%)
## create a new grouping variable with combined levels of arm and sex
summary(tableby(interaction(mdquality.s, sex) ~  age + bmi, data=mockstudy, subset=arm=="F: FOLFOX"))
0.Male (N=31) 1.Male (N=285) 0.Female (N=21) 1.Female (N=198) Total (N=535) p value
Age, yrs 0.190
   Mean (SD) 63.065 (11.702) 60.653 (11.833) 60.810 (10.103) 58.924 (11.366) 60.159 (11.612)
   Range 41.000 - 82.000 19.000 - 88.000 42.000 - 81.000 29.000 - 83.000 19.000 - 88.000
Body Mass Index (kg/m^2) 0.894
   N-Miss 0 6 1 5 12
   Mean (SD) 26.633 (5.094) 27.387 (4.704) 27.359 (4.899) 27.294 (5.671) 27.307 (5.100)
   Range 20.177 - 41.766 17.927 - 47.458 19.801 - 39.369 16.799 - 44.841 16.799 - 47.458

Variable transformations such as log trasnform

trans <- tableby(arm ~ I(age/10) + log(bmi) + factor(mdquality.s, levels=0:1, labels=c('N','Y')),
                 data=mockstudy)
summary(trans)
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
Age, yrs 0.614
   Mean (SD) 5.967 (1.136) 6.030 (1.163) 5.976 (1.150) 5.999 (1.152)
   Range 2.700 - 8.800 1.900 - 8.800 2.600 - 8.500 1.900 - 8.800
Body Mass Index (kg/m^2) 0.811
   N-Miss 9 20 4 33
   Mean (SD) 3.287 (0.197) 3.286 (0.183) 3.279 (0.200) 3.285 (0.192)
   Range 2.643 - 3.970 2.812 - 3.894 2.736 - 4.098 2.643 - 4.098
factor(mdquality.s, levels = 0:1, labels = c(“N”, “Y”)) 0.694
   N-Miss 55 156 41 252
   N 41 (11.0%) 52 (9.7%) 31 (9.1%) 124 (9.9%)
   Y 332 (89.0%) 483 (90.3%) 308 (90.9%) 1123 (90.1%)
summary(trans)
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
Age per 10 yrs 0.614
   Mean (SD) 5.967 (1.136) 6.030 (1.163) 5.976 (1.150) 5.999 (1.152)
   Range 2.700 - 8.800 1.900 - 8.800 2.600 - 8.500 1.900 - 8.800
log(BMI) 0.811
   N-Miss 9 20 4 33
   Mean (SD) 3.287 (0.197) 3.286 (0.183) 3.279 (0.200) 3.285 (0.192)
   Range 2.643 - 3.970 2.812 - 3.894 2.736 - 4.098 2.643 - 4.098
MD Quality 0.694
   N-Miss 55 156 41 252
   N 41 (11.0%) 52 (9.7%) 31 (9.1%) 124 (9.9%)
   Y 332 (89.0%) 483 (90.3%) 308 (90.9%) 1123 (90.1%)

Subsetting tables

mytab <- tableby(arm ~ sex + alk.phos + age, data=mockstudy)
mytab2 <- mytab[c('age','sex','alk.phos')] #Change the ordering of the variables
summary(mytab2)
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
Age, yrs 0.614
   Mean (SD) 59.673 (11.365) 60.301 (11.632) 59.763 (11.499) 59.985 (11.519)
   Range 27.000 - 88.000 19.000 - 88.000 26.000 - 85.000 19.000 - 88.000
Gender 0.190
   Male 277 (64.7%) 411 (59.5%) 228 (60.0%) 916 (61.1%)
   Female 151 (35.3%) 280 (40.5%) 152 (40.0%) 583 (38.9%)
alk.phos 0.226
   N-Miss 69 141 56 266
   Mean (SD) 175.577 (128.608) 161.984 (121.978) 173.506 (138.564) 168.969 (128.492)
   Range 11.000 - 858.000 10.000 - 1014.000 7.000 - 982.000 7.000 - 1014.000
summary(mytab[c(3,1)], digits = 3) # delete a variable
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
Age, yrs 0.614
   Mean (SD) 59.673 (11.365) 60.301 (11.632) 59.763 (11.499) 59.985 (11.519)
   Range 27.000 - 88.000 19.000 - 88.000 26.000 - 85.000 19.000 - 88.000
Gender 0.190
   Male 277 (64.7%) 411 (59.5%) 228 (60.0%) 916 (61.1%)
   Female 151 (35.3%) 280 (40.5%) 152 (40.0%) 583 (38.9%)
summary(mytab[mytab < 0.5]) # filter p-value<0.5
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
Gender 0.190
   Male 277 (64.7%) 411 (59.5%) 228 (60.0%) 916 (61.1%)
   Female 151 (35.3%) 280 (40.5%) 152 (40.0%) 583 (38.9%)
alk.phos 0.226
   N-Miss 69 141 56 266
   Mean (SD) 175.577 (128.608) 161.984 (121.978) 173.506 (138.564) 168.969 (128.492)
   Range 11.000 - 858.000 10.000 - 1014.000 7.000 - 982.000 7.000 - 1014.000
summary(sort(mytab, decreasing = TRUE)) # sort p-value
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
Age, yrs 0.614
   Mean (SD) 59.673 (11.365) 60.301 (11.632) 59.763 (11.499) 59.985 (11.519)
   Range 27.000 - 88.000 19.000 - 88.000 26.000 - 85.000 19.000 - 88.000
alk.phos 0.226
   N-Miss 69 141 56 266
   Mean (SD) 175.577 (128.608) 161.984 (121.978) 173.506 (138.564) 168.969 (128.492)
   Range 11.000 - 858.000 10.000 - 1014.000 7.000 - 982.000 7.000 - 1014.000
Gender 0.190
   Male 277 (64.7%) 411 (59.5%) 228 (60.0%) 916 (61.1%)
   Female 151 (35.3%) 280 (40.5%) 152 (40.0%) 583 (38.9%)

Add a title to the table

When creating a pdf the tables are automatically numbered and the title appears below the table. In Word and HTML, the titles appear un-numbered and above the table.

t1 <- tableby(arm ~ sex + age, data=mockstudy)
summary(t1, title='Demographics')
Demographics
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
Gender 0.190
   Male 277 (64.7%) 411 (59.5%) 228 (60.0%) 916 (61.1%)
   Female 151 (35.3%) 280 (40.5%) 152 (40.0%) 583 (38.9%)
Age, yrs 0.614
   Mean (SD) 59.673 (11.365) 60.301 (11.632) 59.763 (11.499) 59.985 (11.519)
   Range 27.000 - 88.000 19.000 - 88.000 26.000 - 85.000 19.000 - 88.000
#With multiple left-hand sides, you can pass a vector or list to determine labels for each table:
summary(tableby(list(arm, sex) ~ age, data = mockstudy), title = c("arm table", "sex table"))
arm table
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
Age, yrs 0.614
   Mean (SD) 59.673 (11.365) 60.301 (11.632) 59.763 (11.499) 59.985 (11.519)
   Range 27.000 - 88.000 19.000 - 88.000 26.000 - 85.000 19.000 - 88.000
sex table
Male (N=916) Female (N=583) Total (N=1499) p value
Age, yrs 0.048
   Mean (SD) 60.455 (11.369) 59.247 (11.722) 59.985 (11.519)
   Range 19.000 - 88.000 22.000 - 88.000 19.000 - 88.000

Note: R Markdown combines several different processes together to create documents. The .Rmd document is the original format of the document. It contains a combination of YAML (metadata), text (narratives), and code chunks.To understand and/or change structure of a Markdown document an excellent source to refer to is https://bookdown.org/yihui/rmarkdown-cookbook/rmarkdown-process.html