Create publication-ready analytical and summary tables using {gtsummary} package in R


The {gtsummary} package offers an elegant and flexible way to create analytical and summary tables that are ready for publication.
It combines sensible defaults with fully customizable options to summarize datasets, regression models, and more.

Here are the steps for using the {gtsummary} package:

Load the libraries.

# install.packages("gtsummary")
library(gtsummary)
library(gt)
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

# library(mlbench)  # not used in the examples below

Load the dataset.

# gss_cat (a sample of General Social Survey variables) ships with {forcats}
data(gss_cat)
str(gss_cat)

## tibble [21,483 × 9] (S3: tbl_df/tbl/data.frame)
##  $ year   : int [1:21483] 2000 2000 2000 2000 2000 2000 2000 2000 2000 2000 ...
##  $ marital: Factor w/ 6 levels "No answer","Never married",..: 2 4 5 2 4 6 2 4 6 6 ...
##  $ age    : int [1:21483] 26 48 67 39 25 25 36 44 44 47 ...
##  $ race   : Factor w/ 4 levels "Other","Black",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ rincome: Factor w/ 16 levels "No answer","Don't know",..: 8 8 16 16 16 5 4 9 4 4 ...
##  $ partyid: Factor w/ 10 levels "No answer","Don't know",..: 6 5 7 6 9 10 5 8 9 4 ...
##  $ relig  : Factor w/ 16 levels "No answer","Don't know",..: 15 15 15 6 12 15 5 15 15 15 ...
##  $ denom  : Factor w/ 30 levels "No answer","Don't know",..: 25 23 3 30 30 25 30 15 4 25 ...
##  $ tvhours: int [1:21483] 12 NA 2 4 1 NA 3 NA 0 3 ...

ls(gss_cat)

## [1] "age"     "denom"   "marital" "partyid" "race"    "relig"   "rincome"
## [8] "tvhours" "year"

Filter the dataset to the year 2014 and to Black and White respondents only.

gss_cat_2014 <- gss_cat %>%
  filter(year == 2014 & race %in% c("Black", "White")) %>%
  # drop factor levels that have no observations (0%) after filtering
  droplevels()
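Why droplevels() matters: filtering rows does not remove factor levels, so without it the excluded race categories would likely still appear in the summary tables as 0 (0%) rows. A minimal base-R sketch with a small hypothetical factor (not the gss_cat data) shows the behavior:

```r
# Hypothetical toy factor with three levels (illustration only)
race <- factor(c("White", "Black", "Other", "White", "Black"),
               levels = c("Other", "Black", "White"))

# Subsetting keeps the full set of levels, including the now-empty "Other"
kept <- race[race %in% c("Black", "White")]
levels(kept)      # "Other" "Black" "White"
table(kept)       # Other = 0, Black = 2, White = 2

# droplevels() removes levels that no longer have any observations
dropped <- droplevels(kept)
levels(dropped)   # "Black" "White"
```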

Perform exploratory data analysis (EDA) on gss_cat_2014 with dfSummary() from {summarytools}.

library(summarytools)

## 
## Attaching package: 'summarytools'

## The following object is masked from 'package:tibble':
## 
##     view

dfSummary(gss_cat_2014, 
          plain.ascii  = FALSE, 
          style        = "grid", 
          graph.magnif = 0.75, 
          valid.col    = FALSE,
          tmp.img.dir  = "/tmp",
          max.distinct.values = 30)

## ### Data Frame Summary  
## #### gss_cat_2014  
## **Dimensions:** 2276 x 9  
## **Duplicates:** 22  
## 
## +----+-----------+------------------------------+----------------------+----------------------+---------+
## | No | Variable  | Stats / Values               | Freqs (% of Valid)   | Graph                | Missing |
## +====+===========+==============================+======================+======================+=========+
## | 1  | year\     | 1 distinct value             | 2014 : 2276 (100.0%) | ![](/tmp/ds0028.png) | 0\      |
## |    | [integer] |                              |                      |                      | (0.0%)  |
## +----+-----------+------------------------------+----------------------+----------------------+---------+
## | 2  | marital\  | 1\. No answer\               | 4 ( 0.2%)\           | ![](/tmp/ds0029.png) | 0\      |
## |    | [factor]  | 2\. Never married\           | 584 (25.7%)\         |                      | (0.0%)  |
## |    |           | 3\. Separated\               | 69 ( 3.0%)\          |                      |         |
## |    |           | 4\. Divorced\                | 374 (16.4%)\         |                      |         |
## |    |           | 5\. Widowed\                 | 199 ( 8.7%)\         |                      |         |
## |    |           | 6\. Married                  | 1046 (46.0%)         |                      |         |
## +----+-----------+------------------------------+----------------------+----------------------+---------+
## | 3  | age\      | Mean (sd) : 49.8 (17.5)\     | 72 distinct values   | ![](/tmp/ds0030.png) | 8\      |
## |    | [integer] | min < med < max:\            |                      |                      | (0.4%)  |
## |    |           | 18 < 50 < 89\                |                      |                      |         |
## |    |           | IQR (CV) : 28 (0.4)          |                      |                      |         |
## +----+-----------+------------------------------+----------------------+----------------------+---------+
## | 4  | race\     | 1\. Black\                   | 386 (17.0%)\         | ![](/tmp/ds0031.png) | 0\      |
## |    | [factor]  | 2\. White                    | 1890 (83.0%)         |                      | (0.0%)  |
## +----+-----------+------------------------------+----------------------+----------------------+---------+
## | 5  | rincome\  | 1\. Don't know\              | 17 ( 0.7%)\          | ![](/tmp/ds0032.png) | 0\      |
## |    | [factor]  | 2\. Refused\                 | 64 ( 2.8%)\          |                      | (0.0%)  |
## |    |           | 3\. $25000 or more\          | 869 (38.2%)\         |                      |         |
## |    |           | 4\. $20000 - 24999\          | 126 ( 5.5%)\         |                      |         |
## |    |           | 5\. $15000 - 19999\          | 75 ( 3.3%)\          |                      |         |
## |    |           | 6\. $10000 - 14999\          | 95 ( 4.2%)\          |                      |         |
## |    |           | 7\. $8000 to 9999\           | 24 ( 1.1%)\          |                      |         |
## |    |           | 8\. $7000 to 7999\           | 11 ( 0.5%)\          |                      |         |
## |    |           | 9\. $6000 to 6999\           | 21 ( 0.9%)\          |                      |         |
## |    |           | 10\. $5000 to 5999\          | 27 ( 1.2%)\          |                      |         |
## |    |           | 11\. $4000 to 4999\          | 22 ( 1.0%)\          |                      |         |
## |    |           | 12\. $3000 to 3999\          | 33 ( 1.4%)\          |                      |         |
## |    |           | 13\. $1000 to 2999\          | 32 ( 1.4%)\          |                      |         |
## |    |           | 14\. Lt $1000\               | 27 ( 1.2%)\          |                      |         |
## |    |           | 15\. Not applicable          | 833 (36.6%)          |                      |         |
## +----+-----------+------------------------------+----------------------+----------------------+---------+
## | 6  | partyid\  | 1\. No answer\               | 22 ( 1.0%)\          | ![](/tmp/ds0033.png) | 0\      |
## |    | [factor]  | 2\. Don't know\              | 1 ( 0.0%)\           |                      | (0.0%)  |
## |    |           | 3\. Other party\             | 57 ( 2.5%)\          |                      |         |
## |    |           | 4\. Strong republican\       | 238 (10.5%)\         |                      |         |
## |    |           | 5\. Not str republican\      | 277 (12.2%)\         |                      |         |
## |    |           | 6\. Ind,near rep\            | 228 (10.0%)\         |                      |         |
## |    |           | 7\. Independent\             | 416 (18.3%)\         |                      |         |
## |    |           | 8\. Ind,near dem\            | 292 (12.8%)\         |                      |         |
## |    |           | 9\. Not str democrat\        | 354 (15.6%)\         |                      |         |
## |    |           | 10\. Strong democrat         | 391 (17.2%)          |                      |         |
## +----+-----------+------------------------------+----------------------+----------------------+---------+
## | 7  | relig\    | 1\. No answer\               | 10 ( 0.4%)\          | ![](/tmp/ds0034.png) | 0\      |
## |    | [factor]  | 2\. Don't know\              | 3 ( 0.1%)\           |                      | (0.0%)  |
## |    |           | 3\. Inter-nondenominational\ | 4 ( 0.2%)\           |                      |         |
## |    |           | 4\. Christian\               | 124 ( 5.4%)\         |                      |         |
## |    |           | 5\. Orthodox-christian\      | 9 ( 0.4%)\           |                      |         |
## |    |           | 6\. Moslem/islam\            | 7 ( 0.3%)\           |                      |         |
## |    |           | 7\. Other eastern\           | 1 ( 0.0%)\           |                      |         |
## |    |           | 8\. Hinduism\                | 1 ( 0.0%)\           |                      |         |
## |    |           | 9\. Buddhism\                | 18 ( 0.8%)\          |                      |         |
## |    |           | 10\. Other\                  | 20 ( 0.9%)\          |                      |         |
## |    |           | 11\. None\                   | 467 (20.5%)\         |                      |         |
## |    |           | 12\. Jewish\                 | 39 ( 1.7%)\          |                      |         |
## |    |           | 13\. Catholic\               | 494 (21.7%)\         |                      |         |
## |    |           | 14\. Protestant              | 1079 (47.4%)         |                      |         |
## +----+-----------+------------------------------+----------------------+----------------------+---------+
## | 8  | denom\    | 1\. No answer\               | 10 ( 0.4%)\          | ![](/tmp/ds0035.png) | 0\      |
## |    | [factor]  | 2\. Don't know\              | 3 ( 0.1%)\           |                      | (0.0%)  |
## |    |           | 3\. No denomination\         | 276 (12.1%)\         |                      |         |
## |    |           | 4\. Other\                   | 235 (10.3%)\         |                      |         |
## |    |           | 5\. Episcopal\               | 38 ( 1.7%)\          |                      |         |
## |    |           | 6\. Presbyterian-dk wh\      | 26 ( 1.1%)\          |                      |         |
## |    |           | 7\. Presbyterian, merged\    | 11 ( 0.5%)\          |                      |         |
## |    |           | 8\. Other presbyterian\      | 2 ( 0.1%)\           |                      |         |
## |    |           | 9\. United pres ch in us\    | 9 ( 0.4%)\           |                      |         |
## |    |           | 10\. Presbyterian c in us\   | 7 ( 0.3%)\           |                      |         |
## |    |           | 11\. Lutheran-dk which\      | 29 ( 1.3%)\          |                      |         |
## |    |           | 12\. Evangelical luth\       | 16 ( 0.7%)\          |                      |         |
## |    |           | 13\. Other lutheran\         | 3 ( 0.1%)\           |                      |         |
## |    |           | 14\. Wi evan luth synod\     | 4 ( 0.2%)\           |                      |         |
## |    |           | 15\. Lutheran-mo synod\      | 25 ( 1.1%)\          |                      |         |
## |    |           | 16\. Luth ch in america\     | 4 ( 0.2%)\           |                      |         |
## |    |           | 17\. Am lutheran\            | 9 ( 0.4%)\           |                      |         |
## |    |           | 18\. Methodist-dk which\     | 17 ( 0.7%)\          |                      |         |
## |    |           | 19\. Other methodist\        | 4 ( 0.2%)\           |                      |         |
## |    |           | 20\. United methodist\       | 109 ( 4.8%)\         |                      |         |
## |    |           | 21\. Afr meth ep zion\       | 4 ( 0.2%)\           |                      |         |
## |    |           | 22\. Afr meth episcopal\     | 8 ( 0.4%)\           |                      |         |
## |    |           | 23\. Baptist-dk which\       | 151 ( 6.6%)\         |                      |         |
## |    |           | 24\. Other baptists\         | 20 ( 0.9%)\          |                      |         |
## |    |           | 25\. Southern baptist\       | 143 ( 6.3%)\         |                      |         |
## |    |           | 26\. Nat bapt conv usa\      | 3 ( 0.1%)\           |                      |         |
## |    |           | 27\. Nat bapt conv of am\    | 9 ( 0.4%)\           |                      |         |
## |    |           | 28\. Am bapt ch in usa\      | 16 ( 0.7%)\          |                      |         |
## |    |           | 29\. Am baptist asso\        | 22 ( 1.0%)\          |                      |         |
## |    |           | 30\. Not applicable          | 1063 (46.7%)         |                      |         |
## +----+-----------+------------------------------+----------------------+----------------------+---------+
## | 9  | tvhours\  | Mean (sd) : 3 (2.6)\         | 0 : 106 ( 7.0%)\     | ![](/tmp/ds0036.png) | 765\    |
## |    | [integer] | min < med < max:\            | 1 : 278 (18.4%)\     |                      | (33.6%) |
## |    |           | 0 < 2 < 24\                  | 2 : 403 (26.7%)\     |                      |         |
## |    |           | IQR (CV) : 3 (0.9)           | 3 : 266 (17.6%)\     |                      |         |
## |    |           |                              | 4 : 201 (13.3%)\     |                      |         |
## |    |           |                              | 5 : 101 ( 6.7%)\     |                      |         |
## |    |           |                              | 6 :  71 ( 4.7%)\     |                      |         |
## |    |           |                              | 7 :  11 ( 0.7%)\     |                      |         |
## |    |           |                              | 8 :  33 ( 2.2%)\     |                      |         |
## |    |           |                              | 9 :   1 ( 0.1%)\     |                      |         |
## |    |           |                              | 10 :  15 ( 1.0%)\    |                      |         |
## |    |           |                              | 12 :  13 ( 0.9%)\    |                      |         |
## |    |           |                              | 14 :   2 ( 0.1%)\    |                      |         |
## |    |           |                              | 16 :   2 ( 0.1%)\    |                      |         |
## |    |           |                              | 17 :   1 ( 0.1%)\    |                      |         |
## |    |           |                              | 18 :   1 ( 0.1%)\    |                      |         |
## |    |           |                              | 20 :   2 ( 0.1%)\    |                      |         |
## |    |           |                              | 24 :   4 ( 0.3%)     |                      |         |
## +----+-----------+------------------------------+----------------------+----------------------+---------+
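The "Freqs (% of Valid)" column uses only non-missing values as the denominator; for tvhours, for example, 106 / (2,276 - 765) = 106 / 1,511, or about 7.0%. A base-R sketch of the same arithmetic on a small hypothetical vector (not the survey data):

```r
# Hypothetical integer vector with two missing values (illustration only)
tv <- c(2L, 0L, NA, 2L, 1L, NA, 3L, 2L)

counts <- table(tv)                              # table() drops NA by default
valid_pct <- round(100 * prop.table(counts), 1)  # % of valid (non-missing) values

n_missing <- sum(is.na(tv))                      # count of missing values
pct_missing <- round(100 * mean(is.na(tv)), 1)   # % of all rows that are missing

counts
valid_pct
```

Here the six valid values are the denominator for valid_pct, while the missing percentage is computed over all eight rows, mirroring the separate "Missing" column.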

Use tbl_summary() to summarize specific variables from the dataset.

library(gtsummary)
# summarize selected variables from the filtered data
table1 <-
  gss_cat_2014 %>%
  tbl_summary(include = c(age, tvhours, race, marital, rincome)) %>%
  # convert to a gt object to add a title and subtitle
  as_gt() %>%
  gt::tab_header(title = "Table 1. Client Characteristics",
                 subtitle = "January to December, 2014")

table1

Table 1. Client Characteristics
January to December, 2014
Characteristic	N = 2,276¹
age	50 (35, 63)
Unknown	8
tvhours	2 (1, 4)
Unknown	765
race
Black	386 (17%)
White	1,890 (83%)
marital
No answer	4 (0.2%)
Never married	584 (26%)
Separated	69 (3.0%)
Divorced	374 (16%)
Widowed	199 (8.7%)
Married	1,046 (46%)
rincome
Don't know	17 (0.7%)
Refused	64 (2.8%)
$25000 or more	869 (38%)
$20000 - 24999	126 (5.5%)
$15000 - 19999	75 (3.3%)
$10000 - 14999	95 (4.2%)
$8000 to 9999	24 (1.1%)
$7000 to 7999	11 (0.5%)
$6000 to 6999	21 (0.9%)
$5000 to 5999	27 (1.2%)
$4000 to 4999	22 (1.0%)
$3000 to 3999	33 (1.4%)
$1000 to 2999	32 (1.4%)
Lt $1000	27 (1.2%)
Not applicable	833 (37%)
¹ Median (IQR); n (%)
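The footnote shows the tbl_summary() defaults: median (IQR) for continuous variables and n (%) for categorical ones. A sketch of overriding them with the statistic, label, and missing_text arguments; the label text and the smaller variable set are illustrative choices, not part of the original table:

```r
library(dplyr)
library(forcats)     # provides gss_cat
library(gtsummary)

gss_cat_2014 <- gss_cat %>%
  filter(year == 2014, race %in% c("Black", "White")) %>%
  droplevels()

# Report mean (SD) instead of the default median (IQR) for continuous variables
table1_mean <- gss_cat_2014 %>%
  tbl_summary(
    include      = c(age, tvhours, race),
    statistic    = list(all_continuous() ~ "{mean} ({sd})"),
    label        = list(age ~ "Age (years)", tvhours ~ "TV hours per day"),
    missing_text = "Missing"   # replaces the default "Unknown" row label
  )

table1_mean
```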

Use tbl_summary() with customization options to create a cross-tabulation by race.

table2 <-
  tbl_summary(
    gss_cat_2014,
    include = c(age, tvhours, relig, partyid, marital, rincome),
    by = race,      # split table by group
    missing = "no"  # don't list missing data separately
  ) %>%
  # add_n() %>%  # add column with total number of non-missing observations
  # add_p() %>%  # test for a difference between groups
  modify_header(label = "**Variable**") %>%  # update the column header
  bold_labels() %>%
  # convert to a gt object to add a title and subtitle
  as_gt() %>%
  gt::tab_header(title = "Table 2. Client Characteristics by Race",
                 subtitle = "January to December, 2014")

table2

Table 2. Client Characteristics by Race
January to December, 2014
Variable	Black, N = 386¹	White, N = 1,890¹
age	43 (30, 57)	51 (36, 64)
tvhours	3 (2, 5)	2 (1, 4)
relig
No answer	3 (0.8%)	7 (0.4%)
Don't know	0 (0%)	3 (0.2%)
Inter-nondenominational	2 (0.5%)	2 (0.1%)
Christian	36 (9.3%)	88 (4.7%)
Orthodox-christian	0 (0%)	9 (0.5%)
Moslem/islam	3 (0.8%)	4 (0.2%)
Other eastern	0 (0%)	1 (<0.1%)
Hinduism	0 (0%)	1 (<0.1%)
Buddhism	1 (0.3%)	17 (0.9%)
Other	1 (0.3%)	19 (1.0%)
None	62 (16%)	405 (21%)
Jewish	2 (0.5%)	37 (2.0%)
Catholic	28 (7.3%)	466 (25%)
Protestant	248 (64%)	831 (44%)
partyid
No answer	7 (1.8%)	15 (0.8%)
Don't know	0 (0%)	1 (<0.1%)
Other party	5 (1.3%)	52 (2.8%)
Strong republican	7 (1.8%)	231 (12%)
Not str republican	10 (2.6%)	267 (14%)
Ind,near rep	8 (2.1%)	220 (12%)
Independent	52 (13%)	364 (19%)
Ind,near dem	48 (12%)	244 (13%)
Not str democrat	82 (21%)	272 (14%)
Strong democrat	167 (43%)	224 (12%)
marital
No answer	2 (0.5%)	2 (0.1%)
Never married	167 (43%)	417 (22%)
Separated	19 (4.9%)	50 (2.6%)
Divorced	68 (18%)	306 (16%)
Widowed	33 (8.5%)	166 (8.8%)
Married	97 (25%)	949 (50%)
rincome
Don't know	3 (0.8%)	14 (0.7%)
Refused	7 (1.8%)	57 (3.0%)
$25000 or more	128 (33%)	741 (39%)
$20000 - 24999	22 (5.7%)	104 (5.5%)
$15000 - 19999	19 (4.9%)	56 (3.0%)
$10000 - 14999	22 (5.7%)	73 (3.9%)
$8000 to 9999	4 (1.0%)	20 (1.1%)
$7000 to 7999	3 (0.8%)	8 (0.4%)
$6000 to 6999	6 (1.6%)	15 (0.8%)
$5000 to 5999	6 (1.6%)	21 (1.1%)
$4000 to 4999	6 (1.6%)	16 (0.8%)
$3000 to 3999	11 (2.8%)	22 (1.2%)
$1000 to 2999	8 (2.1%)	24 (1.3%)
Lt $1000	8 (2.1%)	19 (1.0%)
Not applicable	133 (34%)	700 (37%)
¹ Median (IQR); n (%)
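The add_n() and add_p() lines are commented out in the code above. A sketch of enabling them on a smaller set of variables, naming the tests explicitly (Wilcoxon rank-sum for continuous, chi-squared for categorical variables); these test choices are an assumption for illustration, and sparse cells such as "No answer" may trigger a chi-squared approximation warning:

```r
library(dplyr)
library(forcats)     # provides gss_cat
library(gtsummary)

gss_cat_2014 <- gss_cat %>%
  filter(year == 2014, race %in% c("Black", "White")) %>%
  droplevels()

table2_p <- gss_cat_2014 %>%
  tbl_summary(include = c(age, tvhours, marital), by = race, missing = "no") %>%
  add_n() %>%    # column with the number of non-missing observations
  add_p(test = list(all_continuous() ~ "wilcox.test",
                    all_categorical() ~ "chisq.test")) %>%
  bold_labels()

table2_p
```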

Regression Models

Use tbl_regression() to display linear regression model results in a table.

# Load the marketing dataset from the {datarium} package
library(datarium)
data(marketing)
head(marketing, 5)

##   youtube facebook newspaper sales
## 1  276.12    45.36     83.04 26.52
## 2   53.40    47.16     54.12 12.48
## 3   20.64    55.08     83.16 11.16
## 4  181.80    49.56     70.20 22.20
## 5  216.96    12.96     70.08 15.48

# Create a scatter plot with smoothed line displaying the sales units versus YouTube advertising budget.
ggplot(marketing, aes(x = youtube, y = sales)) +
  geom_point() +
  stat_smooth()

## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

# Create a scatter plot with smoothed line displaying the sales units versus Facebook advertising budget.
ggplot(marketing, aes(x = facebook, y = sales)) +
  geom_point() +
  stat_smooth()

## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

# Create a scatter plot with smoothed line displaying the sales units versus newspaper advertising budget.
ggplot(marketing, aes(x = newspaper, y = sales)) +
  geom_point() +
  stat_smooth()

## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

# Fit a multiple linear regression predicting sales from the YouTube, Facebook, and newspaper advertising budgets.
model0 <- lm(sales ~ youtube + facebook + newspaper, data = marketing)
summary(model0)

## 
## Call:
## lm(formula = sales ~ youtube + facebook + newspaper, data = marketing)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.5932  -1.0690   0.2902   1.4272   3.3951 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.526667   0.374290   9.422   <2e-16 ***
## youtube      0.045765   0.001395  32.809   <2e-16 ***
## facebook     0.188530   0.008611  21.893   <2e-16 ***
## newspaper   -0.001037   0.005871  -0.177     0.86    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.023 on 196 degrees of freedom
## Multiple R-squared:  0.8972, Adjusted R-squared:  0.8956 
## F-statistic: 570.3 on 3 and 196 DF,  p-value: < 2.2e-16
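The estimates and standard errors above are what tbl_regression() reformats; its 95% CI column for an lm() fit should be the usual t-based interval, estimate ± t(0.975, df) × SE. A base-R sketch of that arithmetic, using the built-in mtcars data as a stand-in so it runs without {datarium}:

```r
# Stand-in model on built-in data (not the marketing model)
fit <- lm(mpg ~ wt + hp, data = mtcars)

est <- coef(fit)                            # point estimates ("Beta")
se <- sqrt(diag(vcov(fit)))                 # standard errors
t_crit <- qt(0.975, df = df.residual(fit))  # two-sided 95% critical value

# 95% CI computed by hand: estimate +/- t critical value * SE
manual_ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)

# Compare with R's built-in confint()
builtin_ci <- confint(fit, level = 0.95)
all.equal(unname(manual_ci), unname(builtin_ci))  # TRUE
```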

# Display table for regression model0
model0 %>%
  tbl_regression() %>%
  # convert to a gt object to add a title and subtitle
  as_gt() %>%
  gt::tab_header(title = "Table 3. Linear Regression Analysis for Sales",
                 subtitle = "Dataset: Marketing {datarium}")

Table 3. Linear Regression Analysis for Sales
Dataset: Marketing {datarium}
Characteristic	Beta	95% CI¹	p-value
youtube	0.05	0.04, 0.05	<0.001
facebook	0.19	0.17, 0.21	<0.001
newspaper	0.00	-0.01, 0.01	0.9
¹ CI = Confidence Interval
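Rounding to two decimals displays the newspaper estimate (-0.001037 in the summary() output) as 0.00. A sketch of showing the intercept, readable labels, and four decimal places through the intercept, label, and estimate_fun arguments of tbl_regression(); the label text and the digit count are illustrative choices:

```r
library(datarium)    # provides the marketing data
library(gtsummary)

data(marketing)
model0 <- lm(sales ~ youtube + facebook + newspaper, data = marketing)

table3_detail <- tbl_regression(
  model0,
  intercept    = TRUE,   # also report the intercept row
  label        = list(youtube   ~ "YouTube budget",
                      facebook  ~ "Facebook budget",
                      newspaper ~ "Newspaper budget"),
  # format estimates with four decimal places instead of the default rounding
  estimate_fun = function(x) style_number(x, digits = 4)
)

table3_detail
```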

Finally, use tbl_regression() to display logistic regression model results in a table.

# Load the trial dataset from the {gtsummary} package
data(trial)
head(trial, 5)

## # A tibble: 5 × 8
##   trt      age marker stage grade response death ttdeath
##   <chr>  <dbl>  <dbl> <fct> <fct>    <int> <int>   <dbl>
## 1 Drug A    23  0.16  T1    II           0     0    24  
## 2 Drug B     9  1.11  T2    I            1     0    24  
## 3 Drug A    31  0.277 T1    II           0     0    24  
## 4 Drug A    NA  2.07  T3    III          1     1    17.6
## 5 Drug A    51  2.77  T4    III          1     1    16.4

# Logistic regression models the probability of a binary outcome. Here it models
# tumor response (0/1) as a function of treatment, age, and grade.
model1 <- glm(response ~ trt + age + grade, data = trial, family = binomial)

# Display table for regression model1
model1_tbl <- model1 %>%
  # exponentiate = TRUE reports odds ratios instead of log-odds coefficients
  tbl_regression(exponentiate = TRUE) %>%
  # convert to a gt object to add a title and subtitle
  as_gt() %>%
  gt::tab_header(title = "Table 4. Logistic Regression Analysis for Tumor Response to Treatment",
                 subtitle = "Dataset: Trial {gtsummary}")

model1_tbl

Table 4. Logistic Regression Analysis for Tumor Response to Treatment
Dataset: Trial {gtsummary}
Characteristic	OR¹	95% CI¹	p-value
Chemotherapy Treatment
Drug A	—	—
Drug B	1.13	0.60, 2.13	0.7
Age	1.02	1.00, 1.04	0.10
Grade
I	—	—
II	0.85	0.39, 1.85	0.7
III	1.01	0.47, 2.15	>0.9
¹ OR = Odds Ratio, CI = Confidence Interval
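With exponentiate = TRUE, the OR column is exp(beta) from the logistic model, and the CI limits are exponentiated on the same scale. A base-R sketch on built-in data (mtcars, modeling transmission type am on weight wt, as a stand-in for the trial model) using Wald intervals; tbl_regression() may report slightly different limits if it uses profile-likelihood intervals for glm fits:

```r
# Stand-in logistic model on built-in data (not the trial model)
fit <- glm(am ~ wt, data = mtcars, family = binomial)

beta <- coef(fit)                  # log-odds scale
odds_ratio <- exp(beta)            # odds ratios

# Wald 95% CI: exponentiate beta +/- z * SE
se <- sqrt(diag(vcov(fit)))
z <- qnorm(0.975)
manual_ci <- exp(cbind(lower = beta - z * se, upper = beta + z * se))

# confint.default() computes the same Wald interval on the log-odds scale
builtin_ci <- exp(confint.default(fit, level = 0.95))
all.equal(unname(manual_ci), unname(builtin_ci))  # TRUE
odds_ratio
```

Because exp() is monotone, exponentiating the interval endpoints gives a valid interval for the odds ratio, though it is no longer symmetric around the estimate.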

A.M.D.G.

Ramon Rodriguez-Santana, MBA, MPH

2024-03-09
