Bunch of packages I always load for a project

packages <- c("tidyverse", "knitr", "tibble", "kableExtra", "officer", "compareGroups", "car", "haven", "foreign", "tidyr", "magrittr", "labelled", "ggplot2", "foreign", "lme4", "lmerTest", "MuMIn", "MASS", "arm", "lattice", "pscl", "survival", "survey", "srvyr", "survminer", "effects", "gt", "gtsummary", "glue", "flextable", "huxtable", "modelsummary", "broom", "broom.mixed", "lattice", "cmprsk", "stargazer", "sjstats", "sjPlot", "sjmisc", "sjlabelled")
invisible(lapply(packages, function(x) require(x, character.only = T, quietly = T)))

Let’s say you want to look at bivariate descriptive statistics. It’s possible to do a unweighted and weighted descriptive statistics table using two different functions.

tbl_summary for unweighted descriptive statistics. tbl_svysummary for weighted descriptive statistics.

The difference between these two functions is the inclusion of a complex survey design, which is a requirement for weighted analysis.

Here’s an example of preparing a weighted descriptive table:

Weighted Descriptive table

# To have a compact table (not necessary if you need a compact table or can be postprocessed)
set_gtsummary_theme(theme_gtsummary_compact(set_theme = TRUE))

table_one <- 
  survey::svydesign(ids=~psu, weights = ~mortwtsa, data=paa3, strata=~strata, nest=T) %>% # Remove this line for unweighted descriptive statistics
  tbl_svysummary(by= "male", # stratification variable.
                 include = c(age, time_death, d.event, male,
                                           race_eth, edu, marital_stat), 
                 percent = "column",  #Column or Row percentage, depending on your needs
                 statistic = list(all_continuous() ~ "{mean} ({sd})", # What statistics do you want on your table. 
#In this case, for all continuous variables it will generate means and standard deviation.
#For all categorical variables, it will generate weighted percentage, if weight variable is included above.
                            all_categorical() ~ "{p}"),
                 digits = list(all_continuous() ~ c(1,1), 
                               all_categorical() ~ 1), 
                 label = list(age ~ "Age (in years)",  #**The label list should exclude the stratification variable.**
                              time_death ~ "Follow-up time (in years)",
                              d.event ~ "All-cause",
                              race_eth ~ "Race and Ethnicity",
                              edu ~ "Educational attainment",
                              marital_stat ~ "Marital status"))%>% 
  modify_header(update = list(label ~ "",
                              stat_1 ~ "**Female**, n = 117,732",
                              stat_2 ~ "**Male**, n = 91,585"))%>%
  modify_spanning_header(starts_with("stat_") ~ "**Sex (weighted %)**") %>%
  add_p() %>% 
  bold_labels() %>% 
  italicize_levels() %>% 
  as_flex_table() %>% 
  set_table_properties(width = 1, layout = "autofit")# For pretty word output
  # as_gt() can be used as well. But flextable provides better formatting options

#Saving table output
#save_as_docx(des_sum2, path = "C:/Users/munta/Google Drive/## Spring 2021/Writing Group/Descriptive_Table2_May25.docx")

table_one

	Sex (weighted %)
Age (in years)	55.3 (14.0)	53.9 (13.1)	<0.001
Follow-up time (in years)	8.3 (3.4)	8.2 (3.4)	<0.001
All-cause	13.0	14.7	<0.001
Race and Ethnicity			<0.001
Non-Hispanic white	74.4	75.0
Alaskan Native/American Indian	0.6	0.7
Asian	3.6	3.7
Cuban	0.6	0.7
Dominican (Republic)	0.4	0.3
Mexican	5.5	6.4
Multi-racial	0.2	0.2
No response	0.1	0.1
Non-Hispanic black	11.4	9.8
Other	2.0	2.1
Puerto Rican	1.1	1.1
Educational attainment			<0.001
Less than high school	14.0	14.2
High school graduate or equivalent	32.3	30.5
Some college	28.3	25.4
College graduate or more	24.8	29.2
Unknown	0.7	0.7
Marital status			<0.001
Never married	8.6	10.7
Separated	2.9	2.2
Divorced	14.8	12.5
Widowed	14.0	4.0
Unknown	0.2	0.2
Married	59.5	70.4
1Mean (SD); %
2Wilcoxon rank-sum test for complex survey samples; chi-squared test with Rao & Scott's second-order correction

Generate survey design

To conduct an analysis using complex survey design we need to create a design object incorporating the weight variable. Here’s an example.

options(survey.lonely.psu = "adjust")
des <- svydesign(ids=~psu, strata=~strata, weights = ~mortwtsa, data=paa3, nest=T)

Let’s say we wanted to fit a logistic regression model to test the association among outcome and predictor variables in our analysis.

The svyglm function is required when the outcome is dichotomous which is the case in the following example. We are looking at the association of mortality outcomes (all-cause and CVD) and a number of covariates for this example.

Let’s run our models.

Run Models

fit1 <- svyglm(d.event ~ age+male+race_eth+edu+marital_stat, design=des, family=binomial(link="logit"))
fit2 <- svyglm(cvd ~ age+male+race_eth+edu+marital_stat, design=des, family=binomial(link="logit"))

Model summary

Take a brief look at the model summary.

# Take a look at the output
gtsummary::tbl_regression(fit1, exp = TRUE)

Characteristic	OR¹	95% CI¹	p-value
Age (in years)	1.09	1.09, 1.09	<0.001
Sex
Female	—	—
Male	1.66	1.60, 1.71	<0.001
Race/Ethnicity
Non-Hispanic white	—	—
Alaskan Native/American Indian	1.11	0.82, 1.51	0.5
Asian	0.89	0.79, 0.99	0.040
Cuban	0.81	0.66, 0.98	0.032
Dominican (Republic)	0.43	0.30, 0.62	<0.001
Mexican	0.91	0.84, 0.98	0.015
Multi-racial	0.90	0.61, 1.33	0.6
No response	0.61	0.31, 1.20	0.2
Non-Hispanic black	1.08	1.02, 1.14	0.005
Other	0.49	0.42, 0.57	<0.001
Puerto Rican	0.77	0.64, 0.91	0.003
Educational Attainment
Less than high school	—	—
High school graduate or equivalent	0.69	0.66, 0.72	<0.001
Some college	0.58	0.55, 0.60	<0.001
College graduate or more	0.38	0.36, 0.40	<0.001
Unknown	0.82	0.68, 0.98	0.030
Marital Status
Never married	—	—
Separated	0.91	0.82, 1.01	0.090
Divorced	0.80	0.75, 0.86	<0.001
Widowed	0.90	0.84, 0.96	0.002
Unknown	0.91	0.66, 1.24	0.5
Married	0.60	0.56, 0.64	<0.001
¹ OR = Odds Ratio, CI = Confidence Interval

gtsummary::tbl_regression(fit2, exp = TRUE)

Characteristic	OR¹	95% CI¹	p-value
Age (in years)	1.08	1.08, 1.09	<0.001
Sex
Female	—	—
Male	2.07	1.94, 2.22	<0.001
Race/Ethnicity
Non-Hispanic white	—	—
Alaskan Native/American Indian	1.18	0.66, 2.11	0.6
Asian	0.84	0.68, 1.04	0.11
Cuban	1.10	0.85, 1.42	0.5
Dominican (Republic)	0.61	0.34, 1.12	0.11
Mexican	0.86	0.74, 1.00	0.052
Multi-racial	0.68	0.34, 1.38	0.3
No response	0.31	0.07, 1.34	0.12
Non-Hispanic black	1.13	1.03, 1.25	0.011
Other	0.69	0.51, 0.94	0.019
Puerto Rican	0.86	0.65, 1.16	0.3
Educational Attainment
Less than high school	—	—
High school graduate or equivalent	0.73	0.67, 0.80	<0.001
Some college	0.70	0.63, 0.76	<0.001
College graduate or more	0.43	0.38, 0.48	<0.001
Unknown	0.89	0.63, 1.26	0.5
Marital Status
Never married	—	—
Separated	0.87	0.70, 1.07	0.2
Divorced	0.70	0.61, 0.81	<0.001
Widowed	0.77	0.67, 0.88	<0.001
Unknown	0.90	0.48, 1.70	0.7
Married	0.52	0.46, 0.59	<0.001
¹ OR = Odds Ratio, CI = Confidence Interval

We can put the model summary in objects that will allow us subsequent analysis (but it’s not manadatory).

Creating object from the model output to merge later

set_gtsummary_theme(theme_gtsummary_compact(set_theme = TRUE))
t1 <- tbl_regression(fit1, exp = TRUE) %>% bold_p() #t1 is just an object containing the model output.
t2 <- tbl_regression(fit2, exp = TRUE) %>% bold_p()

For nested modeling, tbl_merge is an easy fix. Even without nested models, table merge function can merge multiple tables side by side. There’s a tbl_stack option that stacks table on top of each other which is useful to summarize bigger tables.

Merging the two model outputs

tbl_merge_ex1 <-
  tbl_merge( #Table merge function
    tbls = list(t1, t2), # Selecting the tables to be included in the combined table output 
    # More objects can be added here
    tab_spanner = c("**All-Cause**", "**CVD**") # Naming the models
  ) %>% 
  bold_labels()%>% 
  italicize_levels()%>% 
  as_flex_table() #gtsummary packages can crete objects as gt, flextable or huxtable, and many more. These are useful pacakges to further formatting tables according to presentation and publication needs.

#as_gt() 

tbl_merge_ex1

	All-Cause	CVD
Age (in years)	1.09	1.09, 1.09	<0.001	1.08	1.08, 1.09	<0.001
Sex
Female	—	—		—	—
Male	1.66	1.60, 1.71	<0.001	2.07	1.94, 2.22	<0.001
Race/Ethnicity
Non-Hispanic white	—	—		—	—
Alaskan Native/American Indian	1.11	0.82, 1.51	0.5	1.18	0.66, 2.11	0.6
Asian	0.89	0.79, 0.99	0.040	0.84	0.68, 1.04	0.11
Cuban	0.81	0.66, 0.98	0.032	1.10	0.85, 1.42	0.5
Dominican (Republic)	0.43	0.30, 0.62	<0.001	0.61	0.34, 1.12	0.11
Mexican	0.91	0.84, 0.98	0.015	0.86	0.74, 1.00	0.052
Multi-racial	0.90	0.61, 1.33	0.6	0.68	0.34, 1.38	0.3
No response	0.61	0.31, 1.20	0.2	0.31	0.07, 1.34	0.12
Non-Hispanic black	1.08	1.02, 1.14	0.005	1.13	1.03, 1.25	0.011
Other	0.49	0.42, 0.57	<0.001	0.69	0.51, 0.94	0.019
Puerto Rican	0.77	0.64, 0.91	0.003	0.86	0.65, 1.16	0.3
Educational Attainment
Less than high school	—	—		—	—
High school graduate or equivalent	0.69	0.66, 0.72	<0.001	0.73	0.67, 0.80	<0.001
Some college	0.58	0.55, 0.60	<0.001	0.70	0.63, 0.76	<0.001
College graduate or more	0.38	0.36, 0.40	<0.001	0.43	0.38, 0.48	<0.001
Unknown	0.82	0.68, 0.98	0.030	0.89	0.63, 1.26	0.5
Marital Status
Never married	—	—		—	—
Separated	0.91	0.82, 1.01	0.090	0.87	0.70, 1.07	0.2
Divorced	0.80	0.75, 0.86	<0.001	0.70	0.61, 0.81	<0.001
Widowed	0.90	0.84, 0.96	0.002	0.77	0.67, 0.88	<0.001
Unknown	0.91	0.66, 1.24	0.5	0.90	0.48, 1.70	0.7
Married	0.60	0.56, 0.64	<0.001	0.52	0.46, 0.59	<0.001
1OR = Odds Ratio, CI = Confidence Interval

#save_as_docx(tbl_merge_ex1, path = "C:/Users/munta/Google Drive/#### Spring 2020/Jeff Research/Table3_Model_Outputfeb11.docx")

Further formatting of the output

table_two <- tbl_merge_ex1 %>% 
  colformat_char( #This function is from flextable package
      j = c(3,6), # These are the no. of columns.
      prefix = "(", #Putting parenthesis around 95% confidence interval
      suffix = ")") %>%
    set_table_properties(width = 1, layout = "autofit")

table_two

	All-Cause	CVD
Age (in years)	1.09	(1.09, 1.09)	<0.001	1.08	(1.08, 1.09)	<0.001
Sex
Female	—			—
Male	1.66	(1.60, 1.71)	<0.001	2.07	(1.94, 2.22)	<0.001
Race/Ethnicity
Non-Hispanic white	—			—
Alaskan Native/American Indian	1.11	(0.82, 1.51)	0.5	1.18	(0.66, 2.11)	0.6
Asian	0.89	(0.79, 0.99)	0.040	0.84	(0.68, 1.04)	0.11
Cuban	0.81	(0.66, 0.98)	0.032	1.10	(0.85, 1.42)	0.5
Dominican (Republic)	0.43	(0.30, 0.62)	<0.001	0.61	(0.34, 1.12)	0.11
Mexican	0.91	(0.84, 0.98)	0.015	0.86	(0.74, 1.00)	0.052
Multi-racial	0.90	(0.61, 1.33)	0.6	0.68	(0.34, 1.38)	0.3
No response	0.61	(0.31, 1.20)	0.2	0.31	(0.07, 1.34)	0.12
Non-Hispanic black	1.08	(1.02, 1.14)	0.005	1.13	(1.03, 1.25)	0.011
Other	0.49	(0.42, 0.57)	<0.001	0.69	(0.51, 0.94)	0.019
Puerto Rican	0.77	(0.64, 0.91)	0.003	0.86	(0.65, 1.16)	0.3
Educational Attainment
Less than high school	—			—
High school graduate or equivalent	0.69	(0.66, 0.72)	<0.001	0.73	(0.67, 0.80)	<0.001
Some college	0.58	(0.55, 0.60)	<0.001	0.70	(0.63, 0.76)	<0.001
College graduate or more	0.38	(0.36, 0.40)	<0.001	0.43	(0.38, 0.48)	<0.001
Unknown	0.82	(0.68, 0.98)	0.030	0.89	(0.63, 1.26)	0.5
Marital Status
Never married	—			—
Separated	0.91	(0.82, 1.01)	0.090	0.87	(0.70, 1.07)	0.2
Divorced	0.80	(0.75, 0.86)	<0.001	0.70	(0.61, 0.81)	<0.001
Widowed	0.90	(0.84, 0.96)	0.002	0.77	(0.67, 0.88)	<0.001
Unknown	0.91	(0.66, 1.24)	0.5	0.90	(0.48, 1.70)	0.7
Married	0.60	(0.56, 0.64)	<0.001	0.52	(0.46, 0.59)	<0.001
1OR = Odds Ratio, CI = Confidence Interval

#save_as_docx(qq3.0, path =  "C:/Users/munta/Google Drive/## Spring 2021/Writing Group/Descriptive_Table2_May25.docx")

Creating Descriptive and Regression Output Tables using gtsummary and flextable Packages

Muntasir Masum

4/6/2021