Bells n Whisker Packages

Tasneem Zaihra

2022-06-15

Loading Required Packages

This is an R Markdown presentation. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents.
For more details on using R Markdown see http://rmarkdown.rstudio.com.
Here we will use slidy to create slides using a markdwon file.

### Load required packages
library(arsenal) # for tableby
require(knitr)
require(survival)
require(kableExtra)
require(equatiomatic)
require(lme4)
require(gtsummary)
require(DiagrammeR)

Introduction

This tutorial provides step-by-step directions for using the functions associated with tableby() to produce one of the most common tables in medical literature includes summary statistics for a set of variables, often stratified by some group (e.g. treatment arm).
All functions presented here are available within the arsenal package.
All the examples in this tutorial use a dataset called mockstudy made available by Paul Novotny which includes a variety of types of variables (character, numeric, factor, ordered factor, survival) to use as examples.

Utility Package KableExtra

It is worthwhile mentioning another package i.e., kableExtra before getting started. It helps us build common complex tables and manipulate table styles.
It imports the pipe %>% symbol from magrittr and verbalize all the functions, so basically you can add “layers” to a kable output in a way that is similar with ggplot2 and plotly. For more table styles refer to https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html
For users who are not very familiar with the pipe operator %>% in R, it passes the result along the chain for a more literal coding experience. Basically when we say A %>% B, technically it means sending the results of A to B as B’s first argument.

Bookstyle Table using KableExtra

Recreating booktabs style table
	case	age	arm	sex	race	fu.time	fu.stat	ps	hgb	bmi	alk.phos	ast	mdquality.s	age.ord
1	110754	67	F: FOLFOX	Male	Caucasian	922	2	0	11.5	25.09861	160	35	NA	60-69
2	99706	74	A: IFL	Female	Caucasian	270	2	1	10.7	19.49786	290	52	1	70-79
4	105271	50	A: IFL	Female	Caucasian	175	2	1	11.1	NA	700	100	1	40-49
5	105001	71	G: IROX	Female	Caucasian	128	2	1	12.6	29.42922	771	68	1	70-79
7	112263	69	F: FOLFOX	Female	NA	233	2	0	13.0	26.35352	350	35	NA	60-69
8	86205	56	G: IROX	Male	Caucasian	120	2	0	10.2	19.03673	569	27	1	50-59

Scroll Box table using KableExtra

-When you use scroll_box, you can specify either or both height or width to get a two-way scrollable box.

kbl(cbind(dt1, dt1)) %>%
  kable_paper() %>%
  scroll_box(width = "500px", height = "200px")

	case	age	arm	sex	race	fu.time	fu.stat	ps	hgb	bmi	alk.phos	ast	mdquality.s	age.ord	case	age	arm	sex	race	fu.time	fu.stat	ps	hgb	bmi	alk.phos	ast	mdquality.s	age.ord
1	110754	67	F: FOLFOX	Male	Caucasian	922	2	0	11.5	25.09861	160	35	NA	60-69	110754	67	F: FOLFOX	Male	Caucasian	922	2	0	11.5	25.09861	160	35	NA	60-69
2	99706	74	A: IFL	Female	Caucasian	270	2	1	10.7	19.49786	290	52	1	70-79	99706	74	A: IFL	Female	Caucasian	270	2	1	10.7	19.49786	290	52	1	70-79
4	105271	50	A: IFL	Female	Caucasian	175	2	1	11.1	NA	700	100	1	40-49	105271	50	A: IFL	Female	Caucasian	175	2	1	11.1	NA	700	100	1	40-49
5	105001	71	G: IROX	Female	Caucasian	128	2	1	12.6	29.42922	771	68	1	70-79	105001	71	G: IROX	Female	Caucasian	128	2	1	12.6	29.42922	771	68	1	70-79
7	112263	69	F: FOLFOX	Female	NA	233	2	0	13.0	26.35352	350	35	NA	60-69	112263	69	F: FOLFOX	Female	NA	233	2	0	13.0	26.35352	350	35	NA	60-69
8	86205	56	G: IROX	Male	Caucasian	120	2	0	10.2	19.03673	569	27	1	50-59	86205	56	G: IROX	Male	Caucasian	120	2	0	10.2	19.03673	569	27	1	50-59

Pulling Underlying Equation Corresponding to a Statistical Model using Package Equatiomatic

lr <- glm(sex ~ species * bill_length_mm, data = penguins,family = binomial(link = "logit"))

extract_eq(lr, wrap = TRUE)

\[ \begin{aligned} \log\left[ \frac { P( \operatorname{sex} = \operatorname{male} ) }{ 1 - P( \operatorname{sex} = \operatorname{male} ) } \right] &= \alpha + \beta_{1}(\operatorname{species}_{\operatorname{Chinstrap}}) + \beta_{2}(\operatorname{species}_{\operatorname{Gentoo}}) + \beta_{3}(\operatorname{bill\_length\_mm})\ + \\ &\quad \beta_{4}(\operatorname{species}_{\operatorname{Chinstrap}} \times \operatorname{bill\_length\_mm}) + \beta_{5}(\operatorname{species}_{\operatorname{Gentoo}} \times \operatorname{bill\_length\_mm}) \end{aligned} \]

extract_eq(lr, wrap = TRUE, show_distribution = TRUE) # show the how the data are assumed to be distributed.

\[ \begin{aligned} \operatorname{sex} &\sim Bernoulli\left(\operatorname{prob}_{\operatorname{sex} = \operatorname{male}}= \hat{P}\right) \\ \log\left[ \frac { \hat{P} }{ 1 - \hat{P} } \right] &= \alpha + \beta_{1}(\operatorname{species}_{\operatorname{Chinstrap}}) + \beta_{2}(\operatorname{species}_{\operatorname{Gentoo}}) + \beta_{3}(\operatorname{bill\_length\_mm})\ + \\ &\quad \beta_{4}(\operatorname{species}_{\operatorname{Chinstrap}} \times \operatorname{bill\_length\_mm}) + \beta_{5}(\operatorname{species}_{\operatorname{Gentoo}} \times \operatorname{bill\_length\_mm}) \end{aligned} \]

## GLMM with individual-level variability (accounting for overdispersion)
## For this data set the model is the same as one allowing for a period:herd
## interaction, which the plot indicates could be needed.
cbpp$obs <- 1:nrow(cbpp)
gm2 <- glmer(cbind(incidence, size - incidence) ~ period +(1 | herd) +  (1|obs),family = binomial, data = cbpp)
extract_eq(gm2,wrap=TRUE)

\[ \begin{aligned} \operatorname{incidence}_{i} &\sim \operatorname{Binomial}(n = 1, \operatorname{prob}_{\operatorname{incidence} = 1} = \widehat{P}) \\ \log\left[\frac{\hat{P}}{1 - \hat{P}} \right] &=\alpha_{j[i],k[i]} \\ \alpha_{j} &\sim N \left(\gamma_{0}^{\alpha} + \gamma_{1}^{\alpha}(\operatorname{period}_{\operatorname{2}}) + \gamma_{2}^{\alpha}(\operatorname{period}_{\operatorname{3}}) + \gamma_{3}^{\alpha}(\operatorname{period}_{\operatorname{4}}), \sigma^2_{\alpha_{j}} \right) \text{, for obs j = 1,} \dots \text{,J} \\ \alpha_{k} &\sim N \left(\mu_{\alpha_{k}}, \sigma^2_{\alpha_{k}} \right) \text{, for herd k = 1,} \dots \text{,K} \end{aligned} \]

Using gtsummary to Create Publication Ready Output Tables from Fitted Models

-Start by creating a logistic regression model to predict tumor response using the variables age and grade from the trial data set.

tr<-head(trial)
tr %>%
  kbl(caption = "Simulated Trial Data") %>%
  kable_classic(full_width = F, html_font = "Cambria")

Simulated Trial Data
trt	age	marker	stage	grade	response	death	ttdeath
Drug A	23	0.160	T1	II	0	0	24.00
Drug B	9	1.107	T2	I	1	0	24.00
Drug A	31	0.277	T1	II	0	0	24.00
Drug A	NA	2.067	T3	III	1	1	17.64
Drug A	51	2.767	T4	III	1	1	16.43
Drug B	39	0.613	T4	I	0	1	15.64

m1 <- glm(response ~ age + stage, trial, family = binomial)

tbl_regression(m1, exponentiate = TRUE)

Characteristic	OR¹	95% CI¹	p-value
Age	1.02	1.00, 1.04	0.091
T Stage
T1	—	—
T2	0.58	0.24, 1.37	0.2
T3	0.94	0.39, 2.28	0.9
T4	0.79	0.33, 1.90	0.6
¹ OR = Odds Ratio, CI = Confidence Interval

Format Results into Data Frame with Global p-values

m1 %>%
  tbl_regression(
    exponentiate = TRUE, 
    pvalue_fun = ~style_pvalue(.x, digits = 2),
  ) %>% 
  add_global_p() %>%
  bold_p(t = 0.10) %>%
  bold_labels() %>%
  italicize_levels()

Characteristic	OR¹	95% CI¹	p-value
Age	1.02	1.00, 1.04	0.087
T Stage			0.62
T1	—	—
T2	0.58	0.24, 1.37
T3	0.94	0.39, 2.28
T4	0.79	0.33, 1.90
¹ OR = Odds Ratio, CI = Confidence Interval

Adding False Dsicovery Rates

trial %>%
  select(response, age, grade) %>%
  tbl_uvregression(
    method = glm,
    y = response,
    method.args = list(family = binomial),
    exponentiate = TRUE,
    pvalue_fun = ~style_pvalue(.x, digits = 2)
  ) %>%
  add_global_p() %>%  # add global p-value 
  add_nevent() %>%    # add number of events of the outcome
  add_q() %>%         # adjusts global p-values for multiple testing
  bold_p() %>%        # bold p-values under a given threshold (default 0.05)
  bold_p(t = 0.10, q = TRUE) %>% # now bold q-values under the threshold of 0.10
  bold_labels()

## add_q: Adjusting p-values with
## `stats::p.adjust(x$table_body$p.value, method = "fdr")`

Characteristic	N	Event N	OR¹	95% CI¹	p-value	q-value²
Age	183	58	1.02	1.00, 1.04	0.091	0.18
Grade	193	61			0.93	0.93
I			—	—
II			0.95	0.45, 2.00
III			1.10	0.52, 2.29
¹ OR = Odds Ratio, CI = Confidence Interval
² False discovery rate correction for multiple testing

Create a Table Stratified by Treatment Arm

We use a formula statement to specify the variables which we want summarized.
The example below uses age (a continuous variable) and sex (a factor).
Note results=“asis” when creating the R chunk below changes the layout slightly (compresses it) and bolds the variable names. If you want to use tabl as dataframe use as.data.frame.

tab1 <- tableby(arm ~ sex + age, data=mockstudy)
summary(tab1) # quick look at the table using  summary() on your tableby object, use text =true for pretty text version

	A: IFL (N=428)	F: FOLFOX (N=691)	G: IROX (N=380)	Total (N=1499)	p value
sex					0.190
Male	277 (64.7%)	411 (59.5%)	228 (60.0%)	916 (61.1%)
Female	151 (35.3%)	280 (40.5%)	152 (40.0%)	583 (38.9%)
Age in Years					0.614
Mean (SD)	59.673 (11.365)	60.301 (11.632)	59.763 (11.499)	59.985 (11.519)
Range	27.000 - 88.000	19.000 - 88.000	26.000 - 85.000	19.000 - 88.000

Add Labels to Variables

Either you could add labels to the variables in your dataset or you can use the built-in data.frame method for labels <-: as below

labels(mockstudy)  <- c(age = 'Age, yrs', sex = "Gender")

tab1 <- tableby(arm ~ sex + age, data=mockstudy)
summary(tab1)

	A: IFL (N=428)	F: FOLFOX (N=691)	G: IROX (N=380)	Total (N=1499)	p value
Gender					0.190
Male	277 (64.7%)	411 (59.5%)	228 (60.0%)	916 (61.1%)
Female	151 (35.3%)	280 (40.5%)	152 (40.0%)	583 (38.9%)
Age, yrs					0.614
Mean (SD)	59.673 (11.365)	60.301 (11.632)	59.763 (11.499)	59.985 (11.519)
Range	27.000 - 88.000	19.000 - 88.000	26.000 - 85.000	19.000 - 88.000

Change Summary Statistics Globally

By default it summarize continuous variables with: Number of missing values, Mean (SD), 25th - 75th quantiles, and Minimum-Maximum values with an ANOVA (t-test with equal variances) p-value.
For categorical variables the default is to show: Number of missing values and count (column percent) with a chi-square p-value.
This default can be modified using the tableby.control function. Note that test=FALSE and total=FALSE results in the total column and p-value column not being printed.

mycontrols  <- tableby.control(test=FALSE, total=FALSE,
                               numeric.test="kwt", cat.test="chisq",
                               numeric.stats=c("N", "median", "q1q3"),
                               cat.stats=c("countpct"),
                               stats.labels=list(N='Count', median='Median', q1q3='Q1,Q3'))
tab2 <- tableby(arm ~ sex + age, data=mockstudy, control=mycontrols)
summary(tab2)

	A: IFL (N=428)	F: FOLFOX (N=691)	G: IROX (N=380)
Gender
Male	277 (64.7%)	411 (59.5%)	228 (60.0%)
Female	151 (35.3%)	280 (40.5%)	152 (40.0%)
Age, yrs
Count	428	691	380
Median	61.000	61.000	61.000
Q1,Q3	53.000, 68.000	52.000, 69.000	52.000, 68.000

Change Summary Statistics Within the Formula

We can also modify the test and summary statistics for specific variables within the formula statement. For example, below we use the kwt (Kruskal-Wallis rank-based) for age and anova for BMI, and no test for the variable “ast”.
The tests function can do a quick check on what tests were performed on each variable in tableby.

tab.test <- tableby(arm ~ kwt(age) + anova(bmi) + notest(ast), data=mockstudy)
tt<-tests(tab.test)
tt %>%
  kbl(caption = "Tests function for a quick check on what tests were performed") %>%
  kable_classic(full_width = F, html_font = "Cambria")

Tests function for a quick check on what tests were performed
Group	Variable	p.value	Method
Treatment Arm	age	0.6390614	Kruskal-Wallis rank sum test
Treatment Arm	bmi	0.8916552	Linear Model ANOVA
Treatment Arm	ast	NA	No test

Modifying Summary Statistics

Summary statistics for any individual variable can also be modified, but it must be done as secondary arguments to the test function.
The function names must be strings that are functions already written for tableby.

summary(tab.test, pfootnote=TRUE) # default summary table with footnote display option TRUE

	A: IFL (N=428)	F: FOLFOX (N=691)	G: IROX (N=380)	Total (N=1499)	p value
Age, yrs					0.639¹
Mean (SD)	59.673 (11.365)	60.301 (11.632)	59.763 (11.499)	59.985 (11.519)
Range	27.000 - 88.000	19.000 - 88.000	26.000 - 85.000	19.000 - 88.000
Body Mass Index (kg/m^2)					0.892²
N-Miss	9	20	4	33
Mean (SD)	27.290 (5.552)	27.210 (5.173)	27.106 (5.751)	27.206 (5.432)
Range	14.053 - 53.008	16.649 - 49.130	15.430 - 60.243	14.053 - 60.243
ast
N-Miss	69	141	56	266
Mean (SD)	37.292 (28.036)	35.202 (26.659)	35.670 (25.807)	35.933 (26.843)
Range	10.000 - 205.000	7.000 - 174.000	5.000 - 176.000	5.000 - 205.000

Kruskal-Wallis rank sum test
Linear Model ANOVA

tab.test1 <- tableby(arm ~ kwt(ast, "Nmiss2","median") + anova(age, "N","mean") +
                    notest(bmi, "Nmiss","median"), data=mockstudy)
summary(tab.test1)

	A: IFL (N=428)	F: FOLFOX (N=691)	G: IROX (N=380)	Total (N=1499)	p value
ast					0.039
N-Miss	69	141	56	266
Median	29.000	25.500	27.000	27.000
Age, yrs					0.614
N	428	691	380	1499
Mean	59.673	60.301	59.763	59.985
Body Mass Index (kg/m^2)
N-Miss	9	20	4	33
Median	26.234	26.525	25.978	26.325

Categorical Tests (Chisq and Fisher’s)

The formal tests for categorical variables against the levels of the by variable, chisq and FE, have options to simulate p-values. We show how to turn on the simulations for these with 500 replicates for the Fisher’s test (FE).
The chi-square test on 2x2 tables applies Yates’ continuity correction by default,with and without the correction that is applied to treatment arm by sex, if we use subset to ignore one of the three treatment arms.

set.seed(100)
tab.catsim <- tableby(arm ~ sex + race, cat.test="fe", simulate.p.value=TRUE, B=500, data=mockstudy)
t1<-tests(tab.catsim)
t1 %>%
  kbl(caption = "Fisher's Exact Test for Count Data with simulated p-value") %>%
  kable_classic(full_width = F, html_font = "Cambria")

Fisher’s Exact Test for Count Data with simulated p-value
Group	Variable	p.value	Method
Treatment Arm	sex	0.2195609	Fisher’s Exact Test for Count Data with simulated p-value (based on 500 replicates)
Treatment Arm	race	0.3093812	Fisher’s Exact Test for Count Data with simulated p-value (based on 500 replicates)

cat.correct <- tableby(arm ~ sex + race, cat.test="chisq", subset = !grepl("^F", arm), data=mockstudy)
t2<-tests(cat.correct)

t2 %>%
  kbl(caption = "Chi-Square Test with correction for Count Data with simulated p-value") %>%
  kable_classic(full_width = F, html_font = "Cambria")

Chi-Square Test with correction for Count Data with simulated p-value
Group	Variable	p.value	Method
Treatment Arm	sex	0.1666280	Pearson’s Chi-squared test
Treatment Arm	race	0.8108543	Pearson’s Chi-squared test

cat.nocorrect <- tableby(arm ~ sex + race, cat.test="chisq", subset = !grepl("^F", arm),chisq.correct=FALSE, data=mockstudy)
t3<-tests(cat.nocorrect)
t3 %>%
  kbl(caption = "Chi-Square Test without correction for Count Data with simulated p-value") %>%
  kable_classic(full_width = F, html_font = "Cambria")

Chi-Square Test without correction for Count Data with simulated p-value
Group	Variable	p.value	Method
Treatment Arm	sex	0.1666280	Pearson’s Chi-squared test
Treatment Arm	race	0.8108543	Pearson’s Chi-squared test

Summarize Without a Group Variable

tab.noby <- tableby(~ bmi + sex + age, data=mockstudy)
summary(tab.noby)

	Overall (N=1499)
Body Mass Index (kg/m^2)
N-Miss	33
Mean (SD)	27.206 (5.432)
Range	14.053 - 60.243
Gender
Male	916 (61.1%)
Female	583 (38.9%)
Age, yrs
Mean (SD)	59.985 (11.519)
Range	19.000 - 88.000

Summarize an Ordered Factor

When comparing groups of ordered data there are two options.

The default uses a general independence test available from the coin package. For two-group comparisons, this is essentially the Armitage trend test.
The other option is to specify the Kruskal Wallis test.

summary(tableby(sex ~ age.ord, data = mockstudy), pfootnote = TRUE)

	Male (N=916)	Female (N=583)	Total (N=1499)	p value
age.ord				0.049¹
10-19	1 (0.1%)	0 (0.0%)	1 (0.1%)
20-29	8 (0.9%)	11 (1.9%)	19 (1.3%)
30-39	37 (4.0%)	30 (5.1%)	67 (4.5%)
40-49	127 (13.9%)	83 (14.2%)	210 (14.0%)
50-59	257 (28.1%)	179 (30.7%)	436 (29.1%)
60-69	298 (32.5%)	170 (29.2%)	468 (31.2%)
70-79	168 (18.3%)	101 (17.3%)	269 (17.9%)
80-89	20 (2.2%)	9 (1.5%)	29 (1.9%)

Trend test for ordinal variables

Summarize a Survival Variable

Information presented by the survfit() function,can be seen with tableby.
The default is to show the median survival (time at which the probability of survival = 50%).
It is also possible to obtain summaries of the % survival at certain time points (say the probability of surviving 1-year).

survfit(Surv(fu.time, fu.stat)~sex, data=mockstudy) # first look at info from survfit func

## Call: survfit(formula = Surv(fu.time, fu.stat) ~ sex, data = mockstudy)
## 
##              n events median 0.95LCL 0.95UCL
## sex=Male   916    829    550     515     590
## sex=Female 583    527    543     511     575

survdiff(Surv(fu.time, fu.stat)~sex, data=mockstudy)

## Call:
## survdiff(formula = Surv(fu.time, fu.stat) ~ sex, data = mockstudy)
## 
##              N Observed Expected (O-E)^2/E (O-E)^2/V
## sex=Male   916      829      830  0.000370  0.000956
## sex=Female 583      527      526  0.000583  0.000956
## 
##  Chisq= 0  on 1 degrees of freedom, p= 1

# now apply tableby
summary(tableby(sex ~ Surv(fu.time, fu.stat), data=mockstudy))

	Male (N=916)	Female (N=583)	Total (N=1499)	p value
Surv(fu.time, fu.stat)				0.975
Events	829	527	1356
Median Survival	550.000	543.000	546.000

summary(survfit(Surv(fu.time/365.25, fu.stat)~sex, data=mockstudy), times=1:5) # using survfit

## Call: survfit(formula = Surv(fu.time/365.25, fu.stat) ~ sex, data = mockstudy)
## 
##                 sex=Male 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     1    626     286   0.6870  0.0153       0.6576       0.7177
##     2    309     311   0.3437  0.0158       0.3142       0.3761
##     3    152     151   0.1748  0.0127       0.1516       0.2015
##     4     57      61   0.0941  0.0104       0.0759       0.1168
##     5     24      16   0.0628  0.0095       0.0467       0.0844
## 
##                 sex=Female 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     1    380     202   0.6531  0.0197       0.6155        0.693
##     2    190     189   0.3277  0.0195       0.2917        0.368
##     3     95      90   0.1701  0.0157       0.1420        0.204
##     4     51      32   0.1093  0.0133       0.0861        0.139
##     5     18      12   0.0745  0.0126       0.0534        0.104

summary(tableby(sex ~ Surv(fu.time/365.25, fu.stat), data=mockstudy, times=1:5, surv.stats=c("NeventsSurv","NriskSurv"))) # using tableby

	Male (N=916)	Female (N=583)	Total (N=1499)	p value
Surv(fu.time/365.25, fu.stat)				0.975
time = 1	286 (68.7)	202 (65.3)	488 (67.4)
time = 2	597 (34.4)	391 (32.8)	988 (33.7)
time = 3	748 (17.5)	481 (17.0)	1229 (17.3)
time = 4	809 (9.4)	513 (10.9)	1322 (10.1)
time = 5	825 (6.3)	525 (7.4)	1350 (6.8)
time = 1	626 (68.7)	380 (65.3)	1006 (67.4)
time = 2	309 (34.4)	190 (32.8)	499 (33.7)
time = 3	152 (17.5)	95 (17.0)	247 (17.3)
time = 4	57 (9.4)	51 (10.9)	108 (10.1)
time = 5	24 (6.3)	18 (7.4)	42 (6.8)

Summarising Subset data using the Subset Argument within tableby

summary(tableby(sex ~ ps + hgb + bmi, subset=age>50 & arm=="F: FOLFOX", data=mockstudy))

	Male (N=333)	Female (N=224)	Total (N=557)	p value
ps				0.652
N-Miss	64	44	108
Mean (SD)	0.554 (0.600)	0.528 (0.602)	0.543 (0.600)
Range	0.000 - 2.000	0.000 - 2.000	0.000 - 2.000
hgb				< 0.001
N-Miss	64	44	108
Mean (SD)	12.720 (1.925)	12.063 (1.395)	12.457 (1.760)
Range	9.000 - 18.200	9.100 - 15.900	9.000 - 18.200
Body Mass Index (kg/m^2)				0.650
N-Miss	9	6	15
Mean (SD)	27.539 (4.780)	27.337 (5.508)	27.458 (5.081)
Range	17.927 - 47.458	16.649 - 49.130	16.649 - 49.130

Create Combinations of Variables

## create a variable combining the levels of mdquality.s and sex
with(mockstudy, table(interaction(mdquality.s,sex)))

0.Male 1.Male 0.Female 1.Female 77 686 47 437

summary(tableby(arm ~ interaction(mdquality.s,sex), data=mockstudy))

	A: IFL (N=428)	F: FOLFOX (N=691)	G: IROX (N=380)	Total (N=1499)	p value
interaction(mdquality.s, sex)					0.493
N-Miss	55	156	41	252
0.Male	29 (7.8%)	31 (5.8%)	17 (5.0%)	77 (6.2%)
1.Male	214 (57.4%)	285 (53.3%)	187 (55.2%)	686 (55.0%)
0.Female	12 (3.2%)	21 (3.9%)	14 (4.1%)	47 (3.8%)
1.Female	118 (31.6%)	198 (37.0%)	121 (35.7%)	437 (35.0%)

## create a new grouping variable with combined levels of arm and sex
summary(tableby(interaction(mdquality.s, sex) ~  age + bmi, data=mockstudy, subset=arm=="F: FOLFOX"))

	0.Male (N=31)	1.Male (N=285)	0.Female (N=21)	1.Female (N=198)	Total (N=535)	p value
Age, yrs						0.190
Mean (SD)	63.065 (11.702)	60.653 (11.833)	60.810 (10.103)	58.924 (11.366)	60.159 (11.612)
Range	41.000 - 82.000	19.000 - 88.000	42.000 - 81.000	29.000 - 83.000	19.000 - 88.000
Body Mass Index (kg/m^2)						0.894
N-Miss	0	6	1	5	12
Mean (SD)	26.633 (5.094)	27.387 (4.704)	27.359 (4.899)	27.294 (5.671)	27.307 (5.100)
Range	20.177 - 41.766	17.927 - 47.458	19.801 - 39.369	16.799 - 44.841	16.799 - 47.458

Variable Transformations

If the transformation includes any of the symbols / - + ^ * then surround the new variable by I(), so that R knows to treat it as a variable transformation and not some special model feature.

trans <- tableby(arm ~ I(age/10) + log(bmi) + factor(mdquality.s, levels=0:1, labels=c('N','Y')),
                 data=mockstudy)
summary(trans)

	A: IFL (N=428)	F: FOLFOX (N=691)	G: IROX (N=380)	Total (N=1499)	p value
Age, yrs					0.614
Mean (SD)	5.967 (1.136)	6.030 (1.163)	5.976 (1.150)	5.999 (1.152)
Range	2.700 - 8.800	1.900 - 8.800	2.600 - 8.500	1.900 - 8.800
Body Mass Index (kg/m^2)					0.811
N-Miss	9	20	4	33
Mean (SD)	3.287 (0.197)	3.286 (0.183)	3.279 (0.200)	3.285 (0.192)
Range	2.643 - 3.970	2.812 - 3.894	2.736 - 4.098	2.643 - 4.098
factor(mdquality.s, levels = 0:1, labels = c(“N”, “Y”))					0.694
N-Miss	55	156	41	252
N	41 (11.0%)	52 (9.7%)	31 (9.1%)	124 (9.9%)
Y	332 (89.0%)	483 (90.3%)	308 (90.9%)	1123 (90.1%)

Change lables to desired ones

labels(trans)[2:4] <- c('Age per 10 yrs', 'log(BMI)', 'MD Quality')
labels(trans)

##                                                     arm 
##                                         "Treatment Arm" 
##                                               I(age/10) 
##                                        "Age per 10 yrs" 
##                                                log(bmi) 
##                                              "log(BMI)" 
## factor(mdquality.s, levels = 0:1, labels = c("N", "Y")) 
##                                            "MD Quality"

summary(trans)

	A: IFL (N=428)	F: FOLFOX (N=691)	G: IROX (N=380)	Total (N=1499)	p value
Age per 10 yrs					0.614
Mean (SD)	5.967 (1.136)	6.030 (1.163)	5.976 (1.150)	5.999 (1.152)
Range	2.700 - 8.800	1.900 - 8.800	2.600 - 8.500	1.900 - 8.800
log(BMI)					0.811
N-Miss	9	20	4	33
Mean (SD)	3.287 (0.197)	3.286 (0.183)	3.279 (0.200)	3.285 (0.192)
Range	2.643 - 3.970	2.812 - 3.894	2.736 - 4.098	2.643 - 4.098
MD Quality					0.694
N-Miss	55	156	41	252
N	41 (11.0%)	52 (9.7%)	31 (9.1%)	124 (9.9%)
Y	332 (89.0%)	483 (90.3%)	308 (90.9%)	1123 (90.1%)

Subsetting Tables

mytab <- tableby(arm ~ sex + alk.phos + age, data=mockstudy)
mytab2 <- mytab[c('age','sex','alk.phos')] #Change the ordering of the variables
summary(mytab2)

	A: IFL (N=428)	F: FOLFOX (N=691)	G: IROX (N=380)	Total (N=1499)	p value
Age, yrs					0.614
Mean (SD)	59.673 (11.365)	60.301 (11.632)	59.763 (11.499)	59.985 (11.519)
Range	27.000 - 88.000	19.000 - 88.000	26.000 - 85.000	19.000 - 88.000
Gender					0.190
Male	277 (64.7%)	411 (59.5%)	228 (60.0%)	916 (61.1%)
Female	151 (35.3%)	280 (40.5%)	152 (40.0%)	583 (38.9%)
alk.phos					0.226
N-Miss	69	141	56	266
Mean (SD)	175.577 (128.608)	161.984 (121.978)	173.506 (138.564)	168.969 (128.492)
Range	11.000 - 858.000	10.000 - 1014.000	7.000 - 982.000	7.000 - 1014.000

summary(mytab[c(3,1)], digits = 3) # delete a variable

	A: IFL (N=428)	F: FOLFOX (N=691)	G: IROX (N=380)	Total (N=1499)	p value
Age, yrs					0.614
Mean (SD)	59.673 (11.365)	60.301 (11.632)	59.763 (11.499)	59.985 (11.519)
Range	27.000 - 88.000	19.000 - 88.000	26.000 - 85.000	19.000 - 88.000
Gender					0.190
Male	277 (64.7%)	411 (59.5%)	228 (60.0%)	916 (61.1%)
Female	151 (35.3%)	280 (40.5%)	152 (40.0%)	583 (38.9%)

summary(mytab[mytab < 0.5]) # filter p-value<0.5

	A: IFL (N=428)	F: FOLFOX (N=691)	G: IROX (N=380)	Total (N=1499)	p value
Gender					0.190
Male	277 (64.7%)	411 (59.5%)	228 (60.0%)	916 (61.1%)
Female	151 (35.3%)	280 (40.5%)	152 (40.0%)	583 (38.9%)
alk.phos					0.226
N-Miss	69	141	56	266
Mean (SD)	175.577 (128.608)	161.984 (121.978)	173.506 (138.564)	168.969 (128.492)
Range	11.000 - 858.000	10.000 - 1014.000	7.000 - 982.000	7.000 - 1014.000

summary(sort(mytab, decreasing = TRUE)) # sort p-value

	A: IFL (N=428)	F: FOLFOX (N=691)	G: IROX (N=380)	Total (N=1499)	p value
Age, yrs					0.614
Mean (SD)	59.673 (11.365)	60.301 (11.632)	59.763 (11.499)	59.985 (11.519)
Range	27.000 - 88.000	19.000 - 88.000	26.000 - 85.000	19.000 - 88.000
alk.phos					0.226
N-Miss	69	141	56	266
Mean (SD)	175.577 (128.608)	161.984 (121.978)	173.506 (138.564)	168.969 (128.492)
Range	11.000 - 858.000	10.000 - 1014.000	7.000 - 982.000	7.000 - 1014.000
Gender					0.190
Male	277 (64.7%)	411 (59.5%)	228 (60.0%)	916 (61.1%)
Female	151 (35.3%)	280 (40.5%)	152 (40.0%)	583 (38.9%)

Add a Title to the Table

When creating a pdf the tables are automatically numbered and the title appears below the table. In Word and HTML, the titles appear un-numbered and above the table.

t1 <- tableby(arm ~ sex + age, data=mockstudy)
summary(t1, title='Demographics')

Demographics
	A: IFL (N=428)	F: FOLFOX (N=691)	G: IROX (N=380)	Total (N=1499)	p value
Gender					0.190
Male	277 (64.7%)	411 (59.5%)	228 (60.0%)	916 (61.1%)
Female	151 (35.3%)	280 (40.5%)	152 (40.0%)	583 (38.9%)
Age, yrs					0.614
Mean (SD)	59.673 (11.365)	60.301 (11.632)	59.763 (11.499)	59.985 (11.519)
Range	27.000 - 88.000	19.000 - 88.000	26.000 - 85.000	19.000 - 88.000

#With multiple left-hand sides, you can pass a vector or list to determine labels for each table:
summary(tableby(list(arm, sex) ~ age, data = mockstudy), title = c("arm table", "sex table"))

arm table
	A: IFL (N=428)	F: FOLFOX (N=691)	G: IROX (N=380)	Total (N=1499)	p value
Age, yrs					0.614
Mean (SD)	59.673 (11.365)	60.301 (11.632)	59.763 (11.499)	59.985 (11.519)
Range	27.000 - 88.000	19.000 - 88.000	26.000 - 85.000	19.000 - 88.000

sex table
	Male (N=916)	Female (N=583)	Total (N=1499)	p value
Age, yrs				0.048
Mean (SD)	60.455 (11.369)	59.247 (11.722)	59.985 (11.519)
Range	19.000 - 88.000	22.000 - 88.000	19.000 - 88.000

The End!!

R Markdown contains a combination of YAML (metadata), text (narratives), and code chunks.To understand and/or change structure of a Markdown document an excellent source to refer to is https://bookdown.org/yihui/rmarkdown-cookbook/rmarkdown-process.html
DiagrammeR provides methods to build graphs for a number of different graphing languages. We will present a Graphviz example in this section,6 but you can also use pure R code to create graphs and diagrams with DiagrammeR