Create publication-ready analytical and summary tables using the {gtsummary} package.

The {gtsummary} package offers an elegant and flexible way to produce publication-ready analytical and summary tables.
It combines sensible defaults with fully customizable features to summarize datasets, regression models, and more.

Here are the steps for using the {gtsummary} package:

  1. Load the libraries.
# install.packages("gtsummary")
library(gtsummary)
library(gt)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(mlbench)
  1. Load the dataset.
# load the gss_cat dataset (it ships with {forcats}, which tidyverse attaches)
data(gss_cat)
str(gss_cat)
## tibble [21,483 × 9] (S3: tbl_df/tbl/data.frame)
##  $ year   : int [1:21483] 2000 2000 2000 2000 2000 2000 2000 2000 2000 2000 ...
##  $ marital: Factor w/ 6 levels "No answer","Never married",..: 2 4 5 2 4 6 2 4 6 6 ...
##  $ age    : int [1:21483] 26 48 67 39 25 25 36 44 44 47 ...
##  $ race   : Factor w/ 4 levels "Other","Black",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ rincome: Factor w/ 16 levels "No answer","Don't know",..: 8 8 16 16 16 5 4 9 4 4 ...
##  $ partyid: Factor w/ 10 levels "No answer","Don't know",..: 6 5 7 6 9 10 5 8 9 4 ...
##  $ relig  : Factor w/ 16 levels "No answer","Don't know",..: 15 15 15 6 12 15 5 15 15 15 ...
##  $ denom  : Factor w/ 30 levels "No answer","Don't know",..: 25 23 3 30 30 25 30 15 4 25 ...
##  $ tvhours: int [1:21483] 12 NA 2 4 1 NA 3 NA 0 3 ...
ls(gss_cat)
## [1] "age"     "denom"   "marital" "partyid" "race"    "relig"   "rincome"
## [8] "tvhours" "year"
  1. Filter dataset for year 2014 and Black and White races only.
gss_cat_2014 <- gss_cat %>%
  filter(year == 2014 & race %in% c("Black", "White")) %>%
  # drop factor levels left with no observations after filtering,
  # so they do not appear as 0 (0%) rows in the summary tables
  droplevels()
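To see why `droplevels()` matters in this step, here is a minimal sketch on a small made-up tibble. The object `toy` and its values are hypothetical and are not part of gss_cat. Filtering rows leaves the unused factor level in place until it is dropped explicitly.

```r
library(dplyr)

# Hypothetical toy data standing in for gss_cat
toy <- tibble(
  year = c(2014L, 2014L, 2000L),
  race = factor(c("Black", "White", "Other"))
)

# Same kind of filter as in the step above
kept <- toy %>%
  filter(year == 2014 & race %in% c("Black", "White"))

levels_before <- nlevels(kept$race)              # "Other" is still a level
levels_after  <- nlevels(droplevels(kept)$race)  # unused level removed

levels_before
levels_after
```

Without this step, the empty level would still be carried by the factor and could show up as a zero-count row in later summaries.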
  1. Perform EDA on gss_cat_2014.
library(summarytools)
## 
## Attaching package: 'summarytools'
## The following object is masked from 'package:tibble':
## 
##     view
dfSummary(gss_cat_2014, 
          plain.ascii  = FALSE, 
          style        = "grid", 
          graph.magnif = 0.75, 
          valid.col    = FALSE,
          tmp.img.dir  = "/tmp",
          max.distinct.values = 30)
## ### Data Frame Summary  
## #### gss_cat_2014  
## **Dimensions:** 2276 x 9  
## **Duplicates:** 22  
## 
## +----+-----------+------------------------------+----------------------+----------------------+---------+
## | No | Variable  | Stats / Values               | Freqs (% of Valid)   | Graph                | Missing |
## +====+===========+==============================+======================+======================+=========+
## | 1  | year\     | 1 distinct value             | 2014 : 2276 (100.0%) | ![](/tmp/ds0028.png) | 0\      |
## |    | [integer] |                              |                      |                      | (0.0%)  |
## +----+-----------+------------------------------+----------------------+----------------------+---------+
## | 2  | marital\  | 1\. No answer\               | 4 ( 0.2%)\           | ![](/tmp/ds0029.png) | 0\      |
## |    | [factor]  | 2\. Never married\           | 584 (25.7%)\         |                      | (0.0%)  |
## |    |           | 3\. Separated\               | 69 ( 3.0%)\          |                      |         |
## |    |           | 4\. Divorced\                | 374 (16.4%)\         |                      |         |
## |    |           | 5\. Widowed\                 | 199 ( 8.7%)\         |                      |         |
## |    |           | 6\. Married                  | 1046 (46.0%)         |                      |         |
## +----+-----------+------------------------------+----------------------+----------------------+---------+
## | 3  | age\      | Mean (sd) : 49.8 (17.5)\     | 72 distinct values   | ![](/tmp/ds0030.png) | 8\      |
## |    | [integer] | min < med < max:\            |                      |                      | (0.4%)  |
## |    |           | 18 < 50 < 89\                |                      |                      |         |
## |    |           | IQR (CV) : 28 (0.4)          |                      |                      |         |
## +----+-----------+------------------------------+----------------------+----------------------+---------+
## | 4  | race\     | 1\. Black\                   | 386 (17.0%)\         | ![](/tmp/ds0031.png) | 0\      |
## |    | [factor]  | 2\. White                    | 1890 (83.0%)         |                      | (0.0%)  |
## +----+-----------+------------------------------+----------------------+----------------------+---------+
## | 5  | rincome\  | 1\. Don't know\              | 17 ( 0.7%)\          | ![](/tmp/ds0032.png) | 0\      |
## |    | [factor]  | 2\. Refused\                 | 64 ( 2.8%)\          |                      | (0.0%)  |
## |    |           | 3\. $25000 or more\          | 869 (38.2%)\         |                      |         |
## |    |           | 4\. $20000 - 24999\          | 126 ( 5.5%)\         |                      |         |
## |    |           | 5\. $15000 - 19999\          | 75 ( 3.3%)\          |                      |         |
## |    |           | 6\. $10000 - 14999\          | 95 ( 4.2%)\          |                      |         |
## |    |           | 7\. $8000 to 9999\           | 24 ( 1.1%)\          |                      |         |
## |    |           | 8\. $7000 to 7999\           | 11 ( 0.5%)\          |                      |         |
## |    |           | 9\. $6000 to 6999\           | 21 ( 0.9%)\          |                      |         |
## |    |           | 10\. $5000 to 5999\          | 27 ( 1.2%)\          |                      |         |
## |    |           | 11\. $4000 to 4999\          | 22 ( 1.0%)\          |                      |         |
## |    |           | 12\. $3000 to 3999\          | 33 ( 1.4%)\          |                      |         |
## |    |           | 13\. $1000 to 2999\          | 32 ( 1.4%)\          |                      |         |
## |    |           | 14\. Lt $1000\               | 27 ( 1.2%)\          |                      |         |
## |    |           | 15\. Not applicable          | 833 (36.6%)          |                      |         |
## +----+-----------+------------------------------+----------------------+----------------------+---------+
## | 6  | partyid\  | 1\. No answer\               | 22 ( 1.0%)\          | ![](/tmp/ds0033.png) | 0\      |
## |    | [factor]  | 2\. Don't know\              | 1 ( 0.0%)\           |                      | (0.0%)  |
## |    |           | 3\. Other party\             | 57 ( 2.5%)\          |                      |         |
## |    |           | 4\. Strong republican\       | 238 (10.5%)\         |                      |         |
## |    |           | 5\. Not str republican\      | 277 (12.2%)\         |                      |         |
## |    |           | 6\. Ind,near rep\            | 228 (10.0%)\         |                      |         |
## |    |           | 7\. Independent\             | 416 (18.3%)\         |                      |         |
## |    |           | 8\. Ind,near dem\            | 292 (12.8%)\         |                      |         |
## |    |           | 9\. Not str democrat\        | 354 (15.6%)\         |                      |         |
## |    |           | 10\. Strong democrat         | 391 (17.2%)          |                      |         |
## +----+-----------+------------------------------+----------------------+----------------------+---------+
## | 7  | relig\    | 1\. No answer\               | 10 ( 0.4%)\          | ![](/tmp/ds0034.png) | 0\      |
## |    | [factor]  | 2\. Don't know\              | 3 ( 0.1%)\           |                      | (0.0%)  |
## |    |           | 3\. Inter-nondenominational\ | 4 ( 0.2%)\           |                      |         |
## |    |           | 4\. Christian\               | 124 ( 5.4%)\         |                      |         |
## |    |           | 5\. Orthodox-christian\      | 9 ( 0.4%)\           |                      |         |
## |    |           | 6\. Moslem/islam\            | 7 ( 0.3%)\           |                      |         |
## |    |           | 7\. Other eastern\           | 1 ( 0.0%)\           |                      |         |
## |    |           | 8\. Hinduism\                | 1 ( 0.0%)\           |                      |         |
## |    |           | 9\. Buddhism\                | 18 ( 0.8%)\          |                      |         |
## |    |           | 10\. Other\                  | 20 ( 0.9%)\          |                      |         |
## |    |           | 11\. None\                   | 467 (20.5%)\         |                      |         |
## |    |           | 12\. Jewish\                 | 39 ( 1.7%)\          |                      |         |
## |    |           | 13\. Catholic\               | 494 (21.7%)\         |                      |         |
## |    |           | 14\. Protestant              | 1079 (47.4%)         |                      |         |
## +----+-----------+------------------------------+----------------------+----------------------+---------+
## | 8  | denom\    | 1\. No answer\               | 10 ( 0.4%)\          | ![](/tmp/ds0035.png) | 0\      |
## |    | [factor]  | 2\. Don't know\              | 3 ( 0.1%)\           |                      | (0.0%)  |
## |    |           | 3\. No denomination\         | 276 (12.1%)\         |                      |         |
## |    |           | 4\. Other\                   | 235 (10.3%)\         |                      |         |
## |    |           | 5\. Episcopal\               | 38 ( 1.7%)\          |                      |         |
## |    |           | 6\. Presbyterian-dk wh\      | 26 ( 1.1%)\          |                      |         |
## |    |           | 7\. Presbyterian, merged\    | 11 ( 0.5%)\          |                      |         |
## |    |           | 8\. Other presbyterian\      | 2 ( 0.1%)\           |                      |         |
## |    |           | 9\. United pres ch in us\    | 9 ( 0.4%)\           |                      |         |
## |    |           | 10\. Presbyterian c in us\   | 7 ( 0.3%)\           |                      |         |
## |    |           | 11\. Lutheran-dk which\      | 29 ( 1.3%)\          |                      |         |
## |    |           | 12\. Evangelical luth\       | 16 ( 0.7%)\          |                      |         |
## |    |           | 13\. Other lutheran\         | 3 ( 0.1%)\           |                      |         |
## |    |           | 14\. Wi evan luth synod\     | 4 ( 0.2%)\           |                      |         |
## |    |           | 15\. Lutheran-mo synod\      | 25 ( 1.1%)\          |                      |         |
## |    |           | 16\. Luth ch in america\     | 4 ( 0.2%)\           |                      |         |
## |    |           | 17\. Am lutheran\            | 9 ( 0.4%)\           |                      |         |
## |    |           | 18\. Methodist-dk which\     | 17 ( 0.7%)\          |                      |         |
## |    |           | 19\. Other methodist\        | 4 ( 0.2%)\           |                      |         |
## |    |           | 20\. United methodist\       | 109 ( 4.8%)\         |                      |         |
## |    |           | 21\. Afr meth ep zion\       | 4 ( 0.2%)\           |                      |         |
## |    |           | 22\. Afr meth episcopal\     | 8 ( 0.4%)\           |                      |         |
## |    |           | 23\. Baptist-dk which\       | 151 ( 6.6%)\         |                      |         |
## |    |           | 24\. Other baptists\         | 20 ( 0.9%)\          |                      |         |
## |    |           | 25\. Southern baptist\       | 143 ( 6.3%)\         |                      |         |
## |    |           | 26\. Nat bapt conv usa\      | 3 ( 0.1%)\           |                      |         |
## |    |           | 27\. Nat bapt conv of am\    | 9 ( 0.4%)\           |                      |         |
## |    |           | 28\. Am bapt ch in usa\      | 16 ( 0.7%)\          |                      |         |
## |    |           | 29\. Am baptist asso\        | 22 ( 1.0%)\          |                      |         |
## |    |           | 30\. Not applicable          | 1063 (46.7%)         |                      |         |
## +----+-----------+------------------------------+----------------------+----------------------+---------+
## | 9  | tvhours\  | Mean (sd) : 3 (2.6)\         | 0 : 106 ( 7.0%)\     | ![](/tmp/ds0036.png) | 765\    |
## |    | [integer] | min < med < max:\            | 1 : 278 (18.4%)\     |                      | (33.6%) |
## |    |           | 0 < 2 < 24\                  | 2 : 403 (26.7%)\     |                      |         |
## |    |           | IQR (CV) : 3 (0.9)           | 3 : 266 (17.6%)\     |                      |         |
## |    |           |                              | 4 : 201 (13.3%)\     |                      |         |
## |    |           |                              | 5 : 101 ( 6.7%)\     |                      |         |
## |    |           |                              | 6 :  71 ( 4.7%)\     |                      |         |
## |    |           |                              | 7 :  11 ( 0.7%)\     |                      |         |
## |    |           |                              | 8 :  33 ( 2.2%)\     |                      |         |
## |    |           |                              | 9 :   1 ( 0.1%)\     |                      |         |
## |    |           |                              | 10 :  15 ( 1.0%)\    |                      |         |
## |    |           |                              | 12 :  13 ( 0.9%)\    |                      |         |
## |    |           |                              | 14 :   2 ( 0.1%)\    |                      |         |
## |    |           |                              | 16 :   2 ( 0.1%)\    |                      |         |
## |    |           |                              | 17 :   1 ( 0.1%)\    |                      |         |
## |    |           |                              | 18 :   1 ( 0.1%)\    |                      |         |
## |    |           |                              | 20 :   2 ( 0.1%)\    |                      |         |
## |    |           |                              | 24 :   4 ( 0.3%)     |                      |         |
## +----+-----------+------------------------------+----------------------+----------------------+---------+
  1. Use tbl_summary() to summarize specific variables from the dataset.
library(gtsummary)
# summarize selected variables with tbl_summary()
table1 <- 
  gss_cat_2014 %>%
  tbl_summary(include = c(age, tvhours, race, marital, rincome)) %>%   
  # add table captions
  as_gt() %>%
  gt::tab_header(title = "Table 1. Client Characteristics",
                 subtitle = "January to December, 2014")

table1
Table 1. Client Characteristics
January to December, 2014

Characteristic        N = 2,276¹
age                   50 (35, 63)
    Unknown           8
tvhours               2 (1, 4)
    Unknown           765
race
    Black             386 (17%)
    White             1,890 (83%)
marital
    No answer         4 (0.2%)
    Never married     584 (26%)
    Separated         69 (3.0%)
    Divorced          374 (16%)
    Widowed           199 (8.7%)
    Married           1,046 (46%)
rincome
    Don't know        17 (0.7%)
    Refused           64 (2.8%)
    $25000 or more    869 (38%)
    $20000 - 24999    126 (5.5%)
    $15000 - 19999    75 (3.3%)
    $10000 - 14999    95 (4.2%)
    $8000 to 9999     24 (1.1%)
    $7000 to 7999     11 (0.5%)
    $6000 to 6999     21 (0.9%)
    $5000 to 5999     27 (1.2%)
    $4000 to 4999     22 (1.0%)
    $3000 to 3999     33 (1.4%)
    $1000 to 2999     32 (1.4%)
    Lt $1000          27 (1.2%)
    Not applicable    833 (37%)
¹ Median (IQR); n (%)
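The defaults in the table above (median and IQR for continuous variables, raw column names as labels) can be changed through arguments of `tbl_summary()`. The following is a sketch only: the label texts, the choice of mean (SD), and the object names `gss_2014` and `table1_mean` are illustrative assumptions, not part of the original analysis.

```r
library(dplyr)
library(forcats)    # provides gss_cat
library(gtsummary)

# Rebuild the filtered dataset so the example is self-contained
gss_2014 <- gss_cat %>%
  filter(year == 2014 & race %in% c("Black", "White")) %>%
  droplevels()

table1_mean <- gss_2014 %>%
  tbl_summary(
    include = c(age, tvhours),
    # report mean (SD) instead of the default median (IQR)
    statistic = all_continuous() ~ "{mean} ({sd})",
    # replace raw column names with readable labels (illustrative wording)
    label = list(age ~ "Age (years)", tvhours ~ "TV hours per day"),
    # text shown on the row that counts missing values
    missing_text = "Missing"
  )

table1_mean
```

The result is still a gtsummary object, so it can be passed to `as_gt()` and `gt::tab_header()` in the same way as `table1`.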
  1. Use tbl_summary() with customization options to create a cross-tabulation.
table2 <-
  tbl_summary(
    gss_cat_2014,
    include = c(age, tvhours, relig, partyid, marital, rincome),
    by = race, # split table by group
    missing = "no" # don't list missing data separately
  ) %>%
  # add_n() %>% # add column with total number of non-missing observations
  # add_p() %>% # test for a difference between groups
  modify_header(label = "**Variable**") %>% # update the column header
  bold_labels() %>%   
  # add table captions
  as_gt() %>%
  gt::tab_header(title = "Table 2. Client Characteristics by Race",
                 subtitle = "January to December, 2014")

table2
Table 2. Client Characteristics by Race
January to December, 2014

Variable                      Black, N = 386¹   White, N = 1,890¹
age                           43 (30, 57)       51 (36, 64)
tvhours                       3 (2, 5)          2 (1, 4)
relig
    No answer                 3 (0.8%)          7 (0.4%)
    Don't know                0 (0%)            3 (0.2%)
    Inter-nondenominational   2 (0.5%)          2 (0.1%)
    Christian                 36 (9.3%)         88 (4.7%)
    Orthodox-christian        0 (0%)            9 (0.5%)
    Moslem/islam              3 (0.8%)          4 (0.2%)
    Other eastern             0 (0%)            1 (<0.1%)
    Hinduism                  0 (0%)            1 (<0.1%)
    Buddhism                  1 (0.3%)          17 (0.9%)
    Other                     1 (0.3%)          19 (1.0%)
    None                      62 (16%)          405 (21%)
    Jewish                    2 (0.5%)          37 (2.0%)
    Catholic                  28 (7.3%)         466 (25%)
    Protestant                248 (64%)         831 (44%)
partyid
    No answer                 7 (1.8%)          15 (0.8%)
    Don't know                0 (0%)            1 (<0.1%)
    Other party               5 (1.3%)          52 (2.8%)
    Strong republican         7 (1.8%)          231 (12%)
    Not str republican        10 (2.6%)         267 (14%)
    Ind,near rep              8 (2.1%)          220 (12%)
    Independent               52 (13%)          364 (19%)
    Ind,near dem              48 (12%)          244 (13%)
    Not str democrat          82 (21%)          272 (14%)
    Strong democrat           167 (43%)         224 (12%)
marital
    No answer                 2 (0.5%)          2 (0.1%)
    Never married             167 (43%)         417 (22%)
    Separated                 19 (4.9%)         50 (2.6%)
    Divorced                  68 (18%)          306 (16%)
    Widowed                   33 (8.5%)         166 (8.8%)
    Married                   97 (25%)          949 (50%)
rincome
    Don't know                3 (0.8%)          14 (0.7%)
    Refused                   7 (1.8%)          57 (3.0%)
    $25000 or more            128 (33%)         741 (39%)
    $20000 - 24999            22 (5.7%)         104 (5.5%)
    $15000 - 19999            19 (4.9%)         56 (3.0%)
    $10000 - 14999            22 (5.7%)         73 (3.9%)
    $8000 to 9999             4 (1.0%)          20 (1.1%)
    $7000 to 7999             3 (0.8%)          8 (0.4%)
    $6000 to 6999             6 (1.6%)          15 (0.8%)
    $5000 to 5999             6 (1.6%)          21 (1.1%)
    $4000 to 4999             6 (1.6%)          16 (0.8%)
    $3000 to 3999             11 (2.8%)         22 (1.2%)
    $1000 to 2999             8 (2.1%)          24 (1.3%)
    Lt $1000                  8 (2.1%)          19 (1.0%)
    Not applicable            133 (34%)         700 (37%)
¹ Median (IQR); n (%)
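The table code above leaves `add_n()` and `add_p()` commented out. As a sketch of what enabling them could look like, the example below restricts the table to the two continuous variables and names the test explicitly. The restriction to `age` and `tvhours`, the explicit Wilcoxon rank-sum test, and the object names `gss_2014` and `table2_tests` are assumptions made for illustration.

```r
library(dplyr)
library(forcats)    # provides gss_cat
library(gtsummary)

# Rebuild the filtered dataset so the example is self-contained
gss_2014 <- gss_cat %>%
  filter(year == 2014 & race %in% c("Black", "White")) %>%
  droplevels()

table2_tests <- gss_2014 %>%
  tbl_summary(
    by = race,                    # one column per race group
    include = c(age, tvhours),
    missing = "no"
  ) %>%
  add_n() %>%                     # number of non-missing observations
  add_p(test = all_continuous() ~ "wilcox.test") %>%  # compare the two groups
  bold_labels()

table2_tests
```

Categorical variables with sparse levels, such as `relig` or `rincome`, may need a deliberately chosen test, which is one reason to add p-values selectively rather than for every row.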
Regression Models
  1. Use tbl_regression() to display linear regression model results in a table.
# Get the marketing dataset from the {datarium} package
library(datarium)
data(marketing)
head(marketing, 5)
##   youtube facebook newspaper sales
## 1  276.12    45.36     83.04 26.52
## 2   53.40    47.16     54.12 12.48
## 3   20.64    55.08     83.16 11.16
## 4  181.80    49.56     70.20 22.20
## 5  216.96    12.96     70.08 15.48
# Create a scatter plot with smoothed line displaying the sales units versus YouTube advertising budget.
ggplot(marketing, aes(x = youtube, y = sales)) +
  geom_point() +
  stat_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

# Create a scatter plot with smoothed line displaying the sales units versus Facebook advertising budget.
ggplot(marketing, aes(x = facebook, y = sales)) +
  geom_point() +
  stat_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

# Create a scatter plot with smoothed line displaying the sales units versus newspaper advertising budget.
ggplot(marketing, aes(x = newspaper, y = sales)) +
  geom_point() +
  stat_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

# Fit a multiple linear regression that predicts sales from the YouTube, Facebook, and newspaper advertising budgets.
model0 <- lm(sales ~ youtube + facebook + newspaper, data = marketing)
summary(model0)
## 
## Call:
## lm(formula = sales ~ youtube + facebook + newspaper, data = marketing)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.5932  -1.0690   0.2902   1.4272   3.3951 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.526667   0.374290   9.422   <2e-16 ***
## youtube      0.045765   0.001395  32.809   <2e-16 ***
## facebook     0.188530   0.008611  21.893   <2e-16 ***
## newspaper   -0.001037   0.005871  -0.177     0.86    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.023 on 196 degrees of freedom
## Multiple R-squared:  0.8972, Adjusted R-squared:  0.8956 
## F-statistic: 570.3 on 3 and 196 DF,  p-value: < 2.2e-16
# Display table for regression model0
model0 %>%
  tbl_regression() %>%
  # add table captions
  as_gt() %>%
  gt::tab_header(title = "Table 3. Linear Regression Analysis for Sales",
                 subtitle = "Dataset: Marketing {datarium}")
Table 3. Linear Regression Analysis for Sales
Dataset: Marketing {datarium}

Characteristic   Beta   95% CI¹       p-value
youtube          0.05   0.04, 0.05    <0.001
facebook         0.19   0.17, 0.21    <0.001
newspaper        0.00   -0.01, 0.01   0.9
¹ CI = Confidence Interval
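In Table 3 the newspaper coefficient is displayed as 0.00 because of the default rounding, and the intercept is not shown. The sketch below shows one possible way to display the intercept and more digits. The choice of three digits and the object name `table3_detail` are illustrative assumptions.

```r
library(gtsummary)

# marketing ships with the {datarium} package
data("marketing", package = "datarium")

model0 <- lm(sales ~ youtube + facebook + newspaper, data = marketing)

table3_detail <- model0 %>%
  tbl_regression(
    intercept = TRUE,  # include the (Intercept) row
    # format estimates and confidence limits with more digits than the default
    estimate_fun = function(x) style_sigfig(x, digits = 3)
  )

table3_detail
```

As with the other tables, `as_gt()` followed by `gt::tab_header()` can then be used to add a title and subtitle.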
  1. Finally, use tbl_regression() to display logistic regression model results in a table.
# Get the trial dataset from the {gtsummary} package
data(trial)
head(trial, 5)
## # A tibble: 5 × 8
##   trt      age marker stage grade response death ttdeath
##   <chr>  <dbl>  <dbl> <fct> <fct>    <int> <int>   <dbl>
## 1 Drug A    23  0.16  T1    II           0     0    24  
## 2 Drug B     9  1.11  T2    I            1     0    24  
## 3 Drug A    31  0.277 T1    II           0     0    24  
## 4 Drug A    NA  2.07  T3    III          1     1    17.6
## 5 Drug A    51  2.77  T4    III          1     1    16.4
# Logistic regression models a binary outcome. Here it models tumor response as a function of treatment, age, and grade.
model1 <- glm(response ~ trt + age + grade, data = trial, family = binomial)

# Display table for regression model1
model1_tbl <- model1 %>%
  tbl_regression(exponentiate = TRUE) %>%
  # add table captions
  as_gt() %>%
  gt::tab_header(title = "Table 4. Logistic Regression Analysis for Tumor Response to Treatment",
                 subtitle = "Dataset: Trial {gtsummary}")

model1_tbl
Table 4. Logistic Regression Analysis for Tumor Response to Treatment
Dataset: Trial {gtsummary}

Characteristic              OR¹    95% CI¹       p-value
Chemotherapy Treatment
    Drug A
    Drug B                  1.13   0.60, 2.13    0.7
Age                         1.02   1.00, 1.04    0.10
Grade
    I
    II                      0.85   0.39, 1.85    0.7
    III                     1.01   0.47, 2.15    >0.9
¹ OR = Odds Ratio, CI = Confidence Interval
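The `exponentiate = TRUE` argument is what turns the model's log-odds coefficients into the odds ratios reported in Table 4. The sketch below reproduces that transformation by hand in base R. It uses Wald-type intervals from `confint.default()` as a simplifying assumption, so the limits may differ slightly from those in the table.

```r
library(gtsummary)   # provides the trial dataset

model1 <- glm(response ~ trt + age + grade, data = trial, family = binomial)

# Coefficients are on the log-odds scale; exponentiating gives odds ratios
odds_ratios <- exp(coef(model1))

# Wald-type 95% confidence limits, moved to the odds-ratio scale the same way
or_ci <- exp(confint.default(model1))

round(cbind(OR = odds_ratios, or_ci), 2)
```

An odds ratio above 1 indicates higher odds of tumor response relative to the reference level (Drug A for treatment, grade I for grade), and an interval that contains 1 is consistent with no association.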

A.M.D.G.