This is a short tutorial on table preparation using gtsummary and flextable packages. We are going to look at a short analysis and prepare descriptive and regression output tables that is publication ready.
packages <- c("tidyverse", "knitr", "tibble", "kableExtra", "officer", "compareGroups", "car", "haven", "foreign", "tidyr", "magrittr", "labelled", "ggplot2", "foreign", "lme4", "lmerTest", "MuMIn", "MASS", "arm", "lattice", "pscl", "survival", "survey", "srvyr", "survminer", "effects", "gt", "gtsummary", "glue", "flextable", "huxtable", "modelsummary", "broom", "broom.mixed", "lattice", "cmprsk", "stargazer", "sjstats", "sjPlot", "sjmisc", "sjlabelled")
invisible(lapply(packages, function(x) require(x, character.only = T, quietly = T)))
Let’s say you want to look at bivariate descriptive statistics. It’s possible to do a unweighted and weighted descriptive statistics table using two different functions.
tbl_summary for unweighted descriptive statistics. tbl_svysummary for weighted descriptive statistics.
The difference between these two functions is the inclusion of a complex survey design, which is a requirement for weighted analysis.
Here’s an example of preparing a weighted descriptive table:
# To have a compact table (not necessary if you need a compact table or can be postprocessed)
set_gtsummary_theme(theme_gtsummary_compact(set_theme = TRUE))
table_one <-
survey::svydesign(ids=~psu, weights = ~mortwtsa, data=paa3, strata=~strata, nest=T) %>% # Remove this line for unweighted descriptive statistics
tbl_svysummary(by= "male", # stratification variable.
include = c(age, time_death, d.event, male,
race_eth, edu, marital_stat),
percent = "column", #Column or Row percentage, depending on your needs
statistic = list(all_continuous() ~ "{mean} ({sd})", # What statistics do you want on your table.
#In this case, for all continuous variables it will generate means and standard deviation.
#For all categorical variables, it will generate weighted percentage, if weight variable is included above.
all_categorical() ~ "{p}"),
digits = list(all_continuous() ~ c(1,1),
all_categorical() ~ 1),
label = list(age ~ "Age (in years)", #**The label list should exclude the stratification variable.**
time_death ~ "Follow-up time (in years)",
d.event ~ "All-cause",
race_eth ~ "Race and Ethnicity",
edu ~ "Educational attainment",
marital_stat ~ "Marital status"))%>%
modify_header(update = list(label ~ "",
stat_1 ~ "**Female**, n = 117,732",
stat_2 ~ "**Male**, n = 91,585"))%>%
modify_spanning_header(starts_with("stat_") ~ "**Sex (weighted %)**") %>%
add_p() %>%
bold_labels() %>%
italicize_levels() %>%
as_flex_table() %>%
set_table_properties(width = 1, layout = "autofit")# For pretty word output
# as_gt() can be used as well. But flextable provides better formatting options
#Saving table output
#save_as_docx(des_sum2, path = "C:/Users/munta/Google Drive/## Spring 2021/Writing Group/Descriptive_Table2_May25.docx")
table_one
| Sex (weighted %) |
| |
Female, n = 117,7321 | Male, n = 91,5851 | p-value2 | |
Age (in years) | 55.3 (14.0) | 53.9 (13.1) | <0.001 |
Follow-up time (in years) | 8.3 (3.4) | 8.2 (3.4) | <0.001 |
All-cause | 13.0 | 14.7 | <0.001 |
Race and Ethnicity | <0.001 | ||
Non-Hispanic white | 74.4 | 75.0 | |
Alaskan Native/American Indian | 0.6 | 0.7 | |
Asian | 3.6 | 3.7 | |
Cuban | 0.6 | 0.7 | |
Dominican (Republic) | 0.4 | 0.3 | |
Mexican | 5.5 | 6.4 | |
Multi-racial | 0.2 | 0.2 | |
No response | 0.1 | 0.1 | |
Non-Hispanic black | 11.4 | 9.8 | |
Other | 2.0 | 2.1 | |
Puerto Rican | 1.1 | 1.1 | |
Educational attainment | <0.001 | ||
Less than high school | 14.0 | 14.2 | |
High school graduate or equivalent | 32.3 | 30.5 | |
Some college | 28.3 | 25.4 | |
College graduate or more | 24.8 | 29.2 | |
Unknown | 0.7 | 0.7 | |
Marital status | <0.001 | ||
Never married | 8.6 | 10.7 | |
Separated | 2.9 | 2.2 | |
Divorced | 14.8 | 12.5 | |
Widowed | 14.0 | 4.0 | |
Unknown | 0.2 | 0.2 | |
Married | 59.5 | 70.4 | |
1Mean (SD); % | |||
2Wilcoxon rank-sum test for complex survey samples; chi-squared test with Rao & Scott's second-order correction |
To conduct an analysis using complex survey design we need to create a design object incorporating the weight variable. Here’s an example.
options(survey.lonely.psu = "adjust")
des <- svydesign(ids=~psu, strata=~strata, weights = ~mortwtsa, data=paa3, nest=T)
Let’s say we wanted to fit a logistic regression model to test the association among outcome and predictor variables in our analysis.
The svyglm function is required when the outcome is dichotomous which is the case in the following example. We are looking at the association of mortality outcomes (all-cause and CVD) and a number of covariates for this example.
Let’s run our models.
fit1 <- svyglm(d.event ~ age+male+race_eth+edu+marital_stat, design=des, family=binomial(link="logit"))
fit2 <- svyglm(cvd ~ age+male+race_eth+edu+marital_stat, design=des, family=binomial(link="logit"))
Take a brief look at the model summary.
# Take a look at the output
gtsummary::tbl_regression(fit1, exp = TRUE)
Characteristic | OR1 | 95% CI1 | p-value |
---|---|---|---|
Age (in years) | 1.09 | 1.09, 1.09 | <0.001 |
Sex | |||
Female | — | — | |
Male | 1.66 | 1.60, 1.71 | <0.001 |
Race/Ethnicity | |||
Non-Hispanic white | — | — | |
Alaskan Native/American Indian | 1.11 | 0.82, 1.51 | 0.5 |
Asian | 0.89 | 0.79, 0.99 | 0.040 |
Cuban | 0.81 | 0.66, 0.98 | 0.032 |
Dominican (Republic) | 0.43 | 0.30, 0.62 | <0.001 |
Mexican | 0.91 | 0.84, 0.98 | 0.015 |
Multi-racial | 0.90 | 0.61, 1.33 | 0.6 |
No response | 0.61 | 0.31, 1.20 | 0.2 |
Non-Hispanic black | 1.08 | 1.02, 1.14 | 0.005 |
Other | 0.49 | 0.42, 0.57 | <0.001 |
Puerto Rican | 0.77 | 0.64, 0.91 | 0.003 |
Educational Attainment | |||
Less than high school | — | — | |
High school graduate or equivalent | 0.69 | 0.66, 0.72 | <0.001 |
Some college | 0.58 | 0.55, 0.60 | <0.001 |
College graduate or more | 0.38 | 0.36, 0.40 | <0.001 |
Unknown | 0.82 | 0.68, 0.98 | 0.030 |
Marital Status | |||
Never married | — | — | |
Separated | 0.91 | 0.82, 1.01 | 0.090 |
Divorced | 0.80 | 0.75, 0.86 | <0.001 |
Widowed | 0.90 | 0.84, 0.96 | 0.002 |
Unknown | 0.91 | 0.66, 1.24 | 0.5 |
Married | 0.60 | 0.56, 0.64 | <0.001 |
1
OR = Odds Ratio, CI = Confidence Interval
|
gtsummary::tbl_regression(fit2, exp = TRUE)
Characteristic | OR1 | 95% CI1 | p-value |
---|---|---|---|
Age (in years) | 1.08 | 1.08, 1.09 | <0.001 |
Sex | |||
Female | — | — | |
Male | 2.07 | 1.94, 2.22 | <0.001 |
Race/Ethnicity | |||
Non-Hispanic white | — | — | |
Alaskan Native/American Indian | 1.18 | 0.66, 2.11 | 0.6 |
Asian | 0.84 | 0.68, 1.04 | 0.11 |
Cuban | 1.10 | 0.85, 1.42 | 0.5 |
Dominican (Republic) | 0.61 | 0.34, 1.12 | 0.11 |
Mexican | 0.86 | 0.74, 1.00 | 0.052 |
Multi-racial | 0.68 | 0.34, 1.38 | 0.3 |
No response | 0.31 | 0.07, 1.34 | 0.12 |
Non-Hispanic black | 1.13 | 1.03, 1.25 | 0.011 |
Other | 0.69 | 0.51, 0.94 | 0.019 |
Puerto Rican | 0.86 | 0.65, 1.16 | 0.3 |
Educational Attainment | |||
Less than high school | — | — | |
High school graduate or equivalent | 0.73 | 0.67, 0.80 | <0.001 |
Some college | 0.70 | 0.63, 0.76 | <0.001 |
College graduate or more | 0.43 | 0.38, 0.48 | <0.001 |
Unknown | 0.89 | 0.63, 1.26 | 0.5 |
Marital Status | |||
Never married | — | — | |
Separated | 0.87 | 0.70, 1.07 | 0.2 |
Divorced | 0.70 | 0.61, 0.81 | <0.001 |
Widowed | 0.77 | 0.67, 0.88 | <0.001 |
Unknown | 0.90 | 0.48, 1.70 | 0.7 |
Married | 0.52 | 0.46, 0.59 | <0.001 |
1
OR = Odds Ratio, CI = Confidence Interval
|
We can put the model summary in objects that will allow us subsequent analysis (but it’s not manadatory).
set_gtsummary_theme(theme_gtsummary_compact(set_theme = TRUE))
t1 <- tbl_regression(fit1, exp = TRUE) %>% bold_p() #t1 is just an object containing the model output.
t2 <- tbl_regression(fit2, exp = TRUE) %>% bold_p()
For nested modeling, tbl_merge is an easy fix. Even without nested models, table merge function can merge multiple tables side by side. There’s a tbl_stack option that stacks table on top of each other which is useful to summarize bigger tables.
tbl_merge_ex1 <-
tbl_merge( #Table merge function
tbls = list(t1, t2), # Selecting the tables to be included in the combined table output
# More objects can be added here
tab_spanner = c("**All-Cause**", "**CVD**") # Naming the models
) %>%
bold_labels()%>%
italicize_levels()%>%
as_flex_table() #gtsummary packages can crete objects as gt, flextable or huxtable, and many more. These are useful pacakges to further formatting tables according to presentation and publication needs.
#as_gt()
tbl_merge_ex1
| All-Cause | CVD | ||||
Characteristic | OR1 | 95% CI1 | p-value | OR1 | 95% CI1 | p-value |
Age (in years) | 1.09 | 1.09, 1.09 | <0.001 | 1.08 | 1.08, 1.09 | <0.001 |
Sex | ||||||
Female | — | — | — | — | ||
Male | 1.66 | 1.60, 1.71 | <0.001 | 2.07 | 1.94, 2.22 | <0.001 |
Race/Ethnicity | ||||||
Non-Hispanic white | — | — | — | — | ||
Alaskan Native/American Indian | 1.11 | 0.82, 1.51 | 0.5 | 1.18 | 0.66, 2.11 | 0.6 |
Asian | 0.89 | 0.79, 0.99 | 0.040 | 0.84 | 0.68, 1.04 | 0.11 |
Cuban | 0.81 | 0.66, 0.98 | 0.032 | 1.10 | 0.85, 1.42 | 0.5 |
Dominican (Republic) | 0.43 | 0.30, 0.62 | <0.001 | 0.61 | 0.34, 1.12 | 0.11 |
Mexican | 0.91 | 0.84, 0.98 | 0.015 | 0.86 | 0.74, 1.00 | 0.052 |
Multi-racial | 0.90 | 0.61, 1.33 | 0.6 | 0.68 | 0.34, 1.38 | 0.3 |
No response | 0.61 | 0.31, 1.20 | 0.2 | 0.31 | 0.07, 1.34 | 0.12 |
Non-Hispanic black | 1.08 | 1.02, 1.14 | 0.005 | 1.13 | 1.03, 1.25 | 0.011 |
Other | 0.49 | 0.42, 0.57 | <0.001 | 0.69 | 0.51, 0.94 | 0.019 |
Puerto Rican | 0.77 | 0.64, 0.91 | 0.003 | 0.86 | 0.65, 1.16 | 0.3 |
Educational Attainment | ||||||
Less than high school | — | — | — | — | ||
High school graduate or equivalent | 0.69 | 0.66, 0.72 | <0.001 | 0.73 | 0.67, 0.80 | <0.001 |
Some college | 0.58 | 0.55, 0.60 | <0.001 | 0.70 | 0.63, 0.76 | <0.001 |
College graduate or more | 0.38 | 0.36, 0.40 | <0.001 | 0.43 | 0.38, 0.48 | <0.001 |
Unknown | 0.82 | 0.68, 0.98 | 0.030 | 0.89 | 0.63, 1.26 | 0.5 |
Marital Status | ||||||
Never married | — | — | — | — | ||
Separated | 0.91 | 0.82, 1.01 | 0.090 | 0.87 | 0.70, 1.07 | 0.2 |
Divorced | 0.80 | 0.75, 0.86 | <0.001 | 0.70 | 0.61, 0.81 | <0.001 |
Widowed | 0.90 | 0.84, 0.96 | 0.002 | 0.77 | 0.67, 0.88 | <0.001 |
Unknown | 0.91 | 0.66, 1.24 | 0.5 | 0.90 | 0.48, 1.70 | 0.7 |
Married | 0.60 | 0.56, 0.64 | <0.001 | 0.52 | 0.46, 0.59 | <0.001 |
1OR = Odds Ratio, CI = Confidence Interval |
#save_as_docx(tbl_merge_ex1, path = "C:/Users/munta/Google Drive/#### Spring 2020/Jeff Research/Table3_Model_Outputfeb11.docx")
table_two <- tbl_merge_ex1 %>%
colformat_char( #This function is from flextable package
j = c(3,6), # These are the no. of columns.
prefix = "(", #Putting parenthesis around 95% confidence interval
suffix = ")") %>%
set_table_properties(width = 1, layout = "autofit")
table_two
| All-Cause | CVD | ||||
Characteristic | OR1 | 95% CI1 | p-value | OR1 | 95% CI1 | p-value |
Age (in years) | 1.09 | (1.09, 1.09) | <0.001 | 1.08 | (1.08, 1.09) | <0.001 |
Sex | ||||||
Female | — | — | ||||
Male | 1.66 | (1.60, 1.71) | <0.001 | 2.07 | (1.94, 2.22) | <0.001 |
Race/Ethnicity | ||||||
Non-Hispanic white | — | — | ||||
Alaskan Native/American Indian | 1.11 | (0.82, 1.51) | 0.5 | 1.18 | (0.66, 2.11) | 0.6 |
Asian | 0.89 | (0.79, 0.99) | 0.040 | 0.84 | (0.68, 1.04) | 0.11 |
Cuban | 0.81 | (0.66, 0.98) | 0.032 | 1.10 | (0.85, 1.42) | 0.5 |
Dominican (Republic) | 0.43 | (0.30, 0.62) | <0.001 | 0.61 | (0.34, 1.12) | 0.11 |
Mexican | 0.91 | (0.84, 0.98) | 0.015 | 0.86 | (0.74, 1.00) | 0.052 |
Multi-racial | 0.90 | (0.61, 1.33) | 0.6 | 0.68 | (0.34, 1.38) | 0.3 |
No response | 0.61 | (0.31, 1.20) | 0.2 | 0.31 | (0.07, 1.34) | 0.12 |
Non-Hispanic black | 1.08 | (1.02, 1.14) | 0.005 | 1.13 | (1.03, 1.25) | 0.011 |
Other | 0.49 | (0.42, 0.57) | <0.001 | 0.69 | (0.51, 0.94) | 0.019 |
Puerto Rican | 0.77 | (0.64, 0.91) | 0.003 | 0.86 | (0.65, 1.16) | 0.3 |
Educational Attainment | ||||||
Less than high school | — | — | ||||
High school graduate or equivalent | 0.69 | (0.66, 0.72) | <0.001 | 0.73 | (0.67, 0.80) | <0.001 |
Some college | 0.58 | (0.55, 0.60) | <0.001 | 0.70 | (0.63, 0.76) | <0.001 |
College graduate or more | 0.38 | (0.36, 0.40) | <0.001 | 0.43 | (0.38, 0.48) | <0.001 |
Unknown | 0.82 | (0.68, 0.98) | 0.030 | 0.89 | (0.63, 1.26) | 0.5 |
Marital Status | ||||||
Never married | — | — | ||||
Separated | 0.91 | (0.82, 1.01) | 0.090 | 0.87 | (0.70, 1.07) | 0.2 |
Divorced | 0.80 | (0.75, 0.86) | <0.001 | 0.70 | (0.61, 0.81) | <0.001 |
Widowed | 0.90 | (0.84, 0.96) | 0.002 | 0.77 | (0.67, 0.88) | <0.001 |
Unknown | 0.91 | (0.66, 1.24) | 0.5 | 0.90 | (0.48, 1.70) | 0.7 |
Married | 0.60 | (0.56, 0.64) | <0.001 | 0.52 | (0.46, 0.59) | <0.001 |
1OR = Odds Ratio, CI = Confidence Interval |
#save_as_docx(qq3.0, path = "C:/Users/munta/Google Drive/## Spring 2021/Writing Group/Descriptive_Table2_May25.docx")