Summary Tables with gtsummary

Revealjs Presentation

Juan De La Cruz

IBM 6400, Cal Poly Pomona

2026-03-02

Prompt 1

1.1: In Step 1, you saw how to create a summary table and modify it for various statistical analyses. Summarize the package’s capabilities as explained by the speaker. How might {gtsummary} benefit you in your schoolwork or career?

Answer:

The {gtsummary} package is designed to take statistical results and turn them into clean, publication‑ready tables with very little manual formatting. What stood out most is how seamlessly it handles descriptive statistics, cross‑tabulations, regression outputs, and model comparisons while keeping everything consistent in style. Instead of spending time formatting tables by hand, the package allows you to focus on the analysis itself, knowing the final output will look polished and professional. For academic work, this is especially helpful because it reduces the friction between running analyses and presenting them clearly in reports or presentations. In a professional setting—particularly in roles that involve analytics, marketing insights, or data‑driven decision‑making—the ability to produce clear, trustworthy tables quickly is a real advantage. It supports reproducibility, improves communication with stakeholders, and helps ensure that statistical results are presented in a way that is easy for non‑technical audiences to understand.
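As a minimal sketch of the workflow described above, the following builds a descriptive table with a p-value column in a few lines. It assumes the `trial` example dataset that ships with {gtsummary}; the choice of variables is only illustrative and is not taken from the lecture.

```r
library(gtsummary)

# `trial` is the example dataset bundled with gtsummary (an assumption of
# this sketch; any data frame with a grouping column works the same way).
tbl_demo <- trial |>
  tbl_summary(
    by = trt,                          # one column per treatment group
    include = c(age, grade, response)  # variables to summarize
  ) |>
  add_p() |>       # add a p-value comparing the groups for each variable
  bold_labels()    # bold the variable labels

tbl_demo
```

The same object can be printed in a Quarto slide or report without any manual formatting.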

Prompt 2

2.1 How does it differ from gt & gtExtras?

Answer

The three packages complement each other, but they serve different purposes. The gt package focuses on table styling—fonts, borders, alignment, colors, and layout. It is essentially a tool for turning a data frame into a visually appealing table. gtExtras builds on gt by adding small visual enhancements such as sparklines, inline bar charts, and color scales. These features help highlight patterns but do not perform statistical analysis. gtsummary, on the other hand, is built specifically for statistical reporting. It automates descriptive summaries, hypothesis tests, regression tables, model diagnostics, and p‑values. While gt and gtExtras are about presentation, gtsummary is about analysis and interpretation. It produces the kinds of tables that appear in academic papers, clinical research, and professional reports, and it can optionally convert its output into gt for further styling if needed.
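The division of labor can be illustrated with a short sketch: {gtsummary} computes the statistics, `as_gt()` hands the result to {gt}, and {gt} functions take over the styling. The dataset (`trial`), the title text, and the font size are assumptions chosen for illustration.

```r
library(gtsummary)
library(gt)

# Step 1: gtsummary computes the summary statistics.
tbl_stats <- tbl_summary(trial, by = trt, include = c(age, grade))

# Step 2: convert to a gt table, then style it with gt functions.
tbl_styled <- tbl_stats |>
  as_gt() |>
  tab_header(title = "Patient characteristics by treatment") |>
  tab_options(table.font.size = px(14))

tbl_styled
```

After `as_gt()`, the object is an ordinary {gt} table, so {gt} and {gtExtras} styling functions can be applied to it.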

2.2 Give three new things you learned that were not explained in the lecture in Step 1

Answer

One new insight from Dr. Sjoberg’s extended presentation was how flexible gtsummary is when working with multiple models. I didn’t realize you could merge several regression models side‑by‑side so easily, which is extremely useful when comparing specifications or testing robustness. Another helpful feature was the ability to apply global themes that instantly standardize the look and feel of all tables in a project. This is especially valuable when preparing a long report or presentation where consistency matters. Finally, I learned about inline reporting, which allows you to pull specific statistics—such as a coefficient or p‑value—directly into narrative text. This eliminates the risk of copying numbers incorrectly and keeps the analysis fully reproducible.
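The three features mentioned above can be sketched together. The models, the `trial` dataset, and the spanner labels are assumptions for illustration only; they are not the examples used in the presentation.

```r
library(gtsummary)

# Global theme: applies compact styling to every table created afterwards.
theme_gtsummary_compact()

# Two specifications of the same outcome, fitted to gtsummary's `trial` data.
m1 <- glm(response ~ age, data = trial, family = binomial)
m2 <- glm(response ~ age + grade, data = trial, family = binomial)

t1 <- tbl_regression(m1, exponentiate = TRUE)
t2 <- tbl_regression(m2, exponentiate = TRUE)

# Side-by-side comparison of the two models.
tbl_both <- tbl_merge(
  tbls = list(t1, t2),
  tab_spanner = c("**Unadjusted**", "**Adjusted**")
)

# Inline reporting: returns a formatted string for one row of the table.
age_text <- inline_text(t2, variable = age)

# Restore the default theme so later tables are unaffected.
reset_gtsummary_theme()
```

In a Quarto document, the `inline_text()` call would typically sit inside an inline R expression so the sentence updates whenever the model is refitted.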

Prompt 3

Simulated Dataset

library(tidyverse)
library(gtsummary)

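set.seed(123)  # fix the random seed so the mock data is reproducible

# 120 simulated respondents. The two columns are sampled independently of
# each other, so no true association is built into the data.
mock <- tibble(
  EventAttendance = sample(c("Attended", "Did Not Attend"), 120, replace = TRUE, prob = c(0.65, 0.35)),
  MembershipInterest = sample(c("High", "Medium", "Low"), 120, replace = TRUE, prob = c(0.40, 0.35, 0.25))
)

# Cross-tabulation with row percentages; add_p() appends a p-value for the
# test of independence between the row and column variables.
tbl_cross(
  data = mock,
  row = EventAttendance,
  col = MembershipInterest,
  percent = "row"
) %>%
  add_p()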

	MembershipInterest
EventAttendance	High	Low	Medium	Total	p-value
Attended	30 (39%)	13 (17%)	33 (43%)	76 (100%)
Did Not Attend	20 (45%)	9 (20%)	15 (34%)	44 (100%)
Total	50 (42%)	22 (18%)	48 (40%)	120 (100%)

Interpretation

The cross-tabulation provides an early look at how event attendance may relate to interest in gym membership. In this simulated sample, 39% of those who attended an RSMT event reported high membership interest, compared with 45% of those who did not attend. Low interest was similar in the two groups (17% vs. 20%), and attendees fell into the medium category somewhat more often (43% vs. 34%). These differences are small: a chi-square test on the displayed counts gives a statistic of about 1.02 on 2 degrees of freedom (p ≈ 0.6), so the relationship between event attendance and membership interest is not statistically significant. That is expected here, because the two variables were sampled independently of each other. The simulated table therefore neither supports nor contradicts our working hypothesis that attending an RSMT event may increase an individual’s likelihood of considering gym membership. It does demonstrate the layout the analysis will use, and it reinforces the importance of tracking event-level behaviors once the full dataset becomes available.
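Because the p-value column is blank in the rendered table, a hedged cross-check is to enter the displayed counts by hand and rerun the test in base R. The matrix below is typed in from the table above; `counts` and `chi` are names introduced only for this sketch.

```r
# Counts copied from the cross-tabulation (rows: attendance, cols: interest).
counts <- matrix(
  c(30, 13, 33,
    20,  9, 15),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    EventAttendance = c("Attended", "Did Not Attend"),
    MembershipInterest = c("High", "Low", "Medium")
  )
)

# Pearson chi-square test of independence (no continuity correction is
# applied for tables larger than 2 x 2).
chi <- chisq.test(counts)
chi$statistic  # about 1.02
chi$parameter  # 2 degrees of freedom
chi$p.value    # about 0.60

# Row percentages, for comparison with the table.
round(prop.table(counts, margin = 1) * 100)
```

A p-value near 0.6 means the observed differences are well within what independent sampling would produce.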

Prompt 4

Model Setup

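set.seed(123)  # fix the random seed so the mock data is reproducible

# 120 simulated respondents. Converted is sampled independently of the
# three predictors, so no true effect is built into the data.
mock2 <- tibble(
  Converted = sample(c(0,1), 120, replace = TRUE, prob = c(0.35, 0.65)),
  Age = sample(18:55, 120, replace = TRUE),
  EngagementScore = runif(120, 20, 95),
  EventAttendance = sample(c(0,1), 120, replace = TRUE, prob = c(0.35, 0.65))
)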

Logistic Regression Model

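# Logistic regression: binary outcome (Converted) on three predictors.
model_logit <- glm(
  Converted ~ Age + EngagementScore + EventAttendance,
  data = mock2,
  family = binomial
)

# exponentiate = TRUE reports odds ratios instead of log-odds coefficients.
tbl_regression(model_logit, exponentiate = TRUE)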

Characteristic	OR	95% CI	p-value
Age	0.99	0.96, 1.03	0.7
EngagementScore	0.99	0.97, 1.00	0.13
EventAttendance	0.60	0.27, 1.31	0.2
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

Interpretation

A logistic regression model was used to examine whether age, engagement level, and event attendance predict the likelihood of membership conversion. Because the dependent variable is binary (converted vs. not converted), logistic regression is the appropriate method.

The results do not show a reliable association for any predictor. Engagement Score has an odds ratio of 0.99 (95% CI 0.97, 1.00; p = 0.13) and Event Attendance has an odds ratio of 0.60 (95% CI 0.27, 1.31; p = 0.2). Both point estimates fall below 1, which would indicate lower rather than higher odds of conversion, but neither is statistically significant at the .05 level. Age (OR 0.99; p = 0.7) is likewise not a significant predictor. This is the expected outcome for this mock data: Converted was sampled independently of Age, EngagementScore, and EventAttendance, so any apparent effect is sampling noise. The expectation that engagement and event participation are meaningful behavioral drivers of conversion therefore remains a hypothesis to test once the full dataset becomes available.
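Since Converted in mock2 is sampled independently of the predictors, the model has nothing to detect. A hedged sketch of an alternative is to generate the outcome from an assumed logistic relationship, so the fitted odds ratios can be compared with known values. The sample size, intercept, and effect sizes below are hypothetical choices made for illustration; they are not estimates from RSMT data.

```r
set.seed(123)
n <- 2000  # hypothetical sample size, larger than the 120 used above

sim <- data.frame(
  Age = sample(18:55, n, replace = TRUE),
  EngagementScore = runif(n, 20, 95),
  EventAttendance = rbinom(n, 1, 0.65)
)

# Assumed true effects on the log-odds scale (hypothetical values):
#   EngagementScore: +0.02 per point, i.e. odds ratio exp(0.02), about 1.02
#   EventAttendance: +1.0,            i.e. odds ratio exp(1.0),  about 2.72
#   Age: no effect
log_odds <- -1.5 + 0.02 * sim$EngagementScore + 1.0 * sim$EventAttendance

# plogis() converts log-odds to probabilities; rbinom() draws the outcome.
sim$Converted <- rbinom(n, 1, plogis(log_odds))

fit <- glm(
  Converted ~ Age + EngagementScore + EventAttendance,
  data = sim,
  family = binomial
)

# Odds ratios are the exponentiated coefficients.
odds_ratios <- exp(coef(fit))
odds_ratios
```

Passing `fit` to `tbl_regression(fit, exponentiate = TRUE)` would present the same odds ratios in the table format shown earlier.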