This is a short tutorial on table preparation using gtsummary and flextable packages. We are going to look at a short analysis and prepare descriptive and regression output tables that is publication ready.

Bunch of packages I always load for a project

packages <- c("tidyverse", "knitr", "tibble", "kableExtra", "officer", "compareGroups", "car", "haven", "foreign", "tidyr", "magrittr", "labelled", "ggplot2", "foreign", "lme4", "lmerTest", "MuMIn", "MASS", "arm", "lattice", "pscl", "survival", "survey", "srvyr", "survminer", "effects", "gt", "gtsummary", "glue", "flextable", "huxtable", "modelsummary", "broom", "broom.mixed", "lattice", "cmprsk", "stargazer", "sjstats", "sjPlot", "sjmisc", "sjlabelled")
invisible(lapply(packages, function(x) require(x, character.only = T, quietly = T)))


Let’s say you want to look at bivariate descriptive statistics. It’s possible to do a unweighted and weighted descriptive statistics table using two different functions.

tbl_summary for unweighted descriptive statistics. tbl_svysummary for weighted descriptive statistics.

The difference between these two functions is the inclusion of a complex survey design, which is a requirement for weighted analysis.

Here’s an example of preparing a weighted descriptive table:

Weighted Descriptive table

# To have a compact table (not necessary if you need a compact table or can be postprocessed)
set_gtsummary_theme(theme_gtsummary_compact(set_theme = TRUE))

table_one <- 
  survey::svydesign(ids=~psu, weights = ~mortwtsa, data=paa3, strata=~strata, nest=T) %>% # Remove this line for unweighted descriptive statistics
  tbl_svysummary(by= "male", # stratification variable.
                 include = c(age, time_death, d.event, male,
                                           race_eth, edu, marital_stat), 
                 percent = "column",  #Column or Row percentage, depending on your needs
                 statistic = list(all_continuous() ~ "{mean} ({sd})", # What statistics do you want on your table. 
#In this case, for all continuous variables it will generate means and standard deviation.
#For all categorical variables, it will generate weighted percentage, if weight variable is included above.
                            all_categorical() ~ "{p}"),
                 digits = list(all_continuous() ~ c(1,1), 
                               all_categorical() ~ 1), 
                 label = list(age ~ "Age (in years)",  #**The label list should exclude the stratification variable.**
                              time_death ~ "Follow-up time (in years)",
                              d.event ~ "All-cause",
                              race_eth ~ "Race and Ethnicity",
                              edu ~ "Educational attainment",
                              marital_stat ~ "Marital status"))%>% 
  modify_header(update = list(label ~ "",
                              stat_1 ~ "**Female**, n = 117,732",
                              stat_2 ~ "**Male**, n = 91,585"))%>%
  modify_spanning_header(starts_with("stat_") ~ "**Sex (weighted %)**") %>%
  add_p() %>% 
  bold_labels() %>% 
  italicize_levels() %>% 
  as_flex_table() %>% 
  set_table_properties(width = 1, layout = "autofit")# For pretty word output
  # as_gt() can be used as well. But flextable provides better formatting options

#Saving table output
#save_as_docx(des_sum2, path = "C:/Users/munta/Google Drive/## Spring 2021/Writing Group/Descriptive_Table2_May25.docx")

table_one

Generate survey design

To conduct an analysis using complex survey design we need to create a design object incorporating the weight variable. Here’s an example.

options(survey.lonely.psu = "adjust")
des <- svydesign(ids=~psu, strata=~strata, weights = ~mortwtsa, data=paa3, nest=T)

Let’s say we wanted to fit a logistic regression model to test the association among outcome and predictor variables in our analysis.

The svyglm function is required when the outcome is dichotomous which is the case in the following example. We are looking at the association of mortality outcomes (all-cause and CVD) and a number of covariates for this example.

Let’s run our models.

Run Models

fit1 <- svyglm(d.event ~ age+male+race_eth+edu+marital_stat, design=des, family=binomial(link="logit"))
fit2 <- svyglm(cvd ~ age+male+race_eth+edu+marital_stat, design=des, family=binomial(link="logit"))

Model summary

Take a brief look at the model summary.

# Take a look at the output
gtsummary::tbl_regression(fit1, exp = TRUE) 
Characteristic OR1 95% CI1 p-value
Age (in years) 1.09 1.09, 1.09 <0.001
Sex
Female
Male 1.66 1.60, 1.71 <0.001
Race/Ethnicity
Non-Hispanic white
Alaskan Native/American Indian 1.11 0.82, 1.51 0.5
Asian 0.89 0.79, 0.99 0.040
Cuban 0.81 0.66, 0.98 0.032
Dominican (Republic) 0.43 0.30, 0.62 <0.001
Mexican 0.91 0.84, 0.98 0.015
Multi-racial 0.90 0.61, 1.33 0.6
No response 0.61 0.31, 1.20 0.2
Non-Hispanic black 1.08 1.02, 1.14 0.005
Other 0.49 0.42, 0.57 <0.001
Puerto Rican 0.77 0.64, 0.91 0.003
Educational Attainment
Less than high school
High school graduate or equivalent 0.69 0.66, 0.72 <0.001
Some college 0.58 0.55, 0.60 <0.001
College graduate or more 0.38 0.36, 0.40 <0.001
Unknown 0.82 0.68, 0.98 0.030
Marital Status
Never married
Separated 0.91 0.82, 1.01 0.090
Divorced 0.80 0.75, 0.86 <0.001
Widowed 0.90 0.84, 0.96 0.002
Unknown 0.91 0.66, 1.24 0.5
Married 0.60 0.56, 0.64 <0.001

1 OR = Odds Ratio, CI = Confidence Interval

gtsummary::tbl_regression(fit2, exp = TRUE)
Characteristic OR1 95% CI1 p-value
Age (in years) 1.08 1.08, 1.09 <0.001
Sex
Female
Male 2.07 1.94, 2.22 <0.001
Race/Ethnicity
Non-Hispanic white
Alaskan Native/American Indian 1.18 0.66, 2.11 0.6
Asian 0.84 0.68, 1.04 0.11
Cuban 1.10 0.85, 1.42 0.5
Dominican (Republic) 0.61 0.34, 1.12 0.11
Mexican 0.86 0.74, 1.00 0.052
Multi-racial 0.68 0.34, 1.38 0.3
No response 0.31 0.07, 1.34 0.12
Non-Hispanic black 1.13 1.03, 1.25 0.011
Other 0.69 0.51, 0.94 0.019
Puerto Rican 0.86 0.65, 1.16 0.3
Educational Attainment
Less than high school
High school graduate or equivalent 0.73 0.67, 0.80 <0.001
Some college 0.70 0.63, 0.76 <0.001
College graduate or more 0.43 0.38, 0.48 <0.001
Unknown 0.89 0.63, 1.26 0.5
Marital Status
Never married
Separated 0.87 0.70, 1.07 0.2
Divorced 0.70 0.61, 0.81 <0.001
Widowed 0.77 0.67, 0.88 <0.001
Unknown 0.90 0.48, 1.70 0.7
Married 0.52 0.46, 0.59 <0.001

1 OR = Odds Ratio, CI = Confidence Interval


We can put the model summary in objects that will allow us subsequent analysis (but it’s not manadatory).

Creating object from the model output to merge later

set_gtsummary_theme(theme_gtsummary_compact(set_theme = TRUE))
t1 <- tbl_regression(fit1, exp = TRUE) %>% bold_p() #t1 is just an object containing the model output.
t2 <- tbl_regression(fit2, exp = TRUE) %>% bold_p()

For nested modeling, tbl_merge is an easy fix. Even without nested models, table merge function can merge multiple tables side by side. There’s a tbl_stack option that stacks table on top of each other which is useful to summarize bigger tables.

Merging the two model outputs

tbl_merge_ex1 <-
  tbl_merge( #Table merge function
    tbls = list(t1, t2), # Selecting the tables to be included in the combined table output 
    # More objects can be added here
    tab_spanner = c("**All-Cause**", "**CVD**") # Naming the models
  ) %>% 
  bold_labels()%>% 
  italicize_levels()%>% 
  as_flex_table() #gtsummary packages can crete objects as gt, flextable or huxtable, and many more. These are useful pacakges to further formatting tables according to presentation and publication needs.

#as_gt() 

tbl_merge_ex1
#save_as_docx(tbl_merge_ex1, path = "C:/Users/munta/Google Drive/#### Spring 2020/Jeff Research/Table3_Model_Outputfeb11.docx")

Further formatting of the output

table_two <- tbl_merge_ex1 %>% 
  colformat_char( #This function is from flextable package
      j = c(3,6), # These are the no. of columns.
      prefix = "(", #Putting parenthesis around 95% confidence interval
      suffix = ")") %>%
    set_table_properties(width = 1, layout = "autofit")

table_two
#save_as_docx(qq3.0, path =  "C:/Users/munta/Google Drive/## Spring 2021/Writing Group/Descriptive_Table2_May25.docx")