Summary Tables with gtsummary

Revealjs Presentation

Erika Barajas

IBM 6530, Cal Poly Pomona

2026-05-14

Prompt 1: gtsummary Capabilities

Overview of gtsummary

Prompt: Summarize the package’s capabilities as explained by the speaker.

The {gtsummary} package creates publication-ready summary tables in R. Key capabilities include:

  • tbl_summary() — descriptive statistics for continuous and categorical variables
  • tbl_cross() — cross-tabulation of two categorical variables; a chi-square or Fisher’s exact test can be added with add_p()
  • tbl_regression() — regression model output tables (logistic, OLS)
  • tbl_uvregression() — univariate regression across multiple variables
  • add_p() — adds p-values with appropriate statistical tests
  • add_overall() — adds overall column to grouped tables
  • tbl_merge() / tbl_stack() — combines multiple tables
  • Theme support including JAMA and QJEcon journal styles
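
As a minimal sketch of how several of these functions chain together, the example below uses the `trial` dataset that ships with {gtsummary} rather than the Sino Yoga survey. The choice of columns (age, grade, response) is only illustrative.

```r
library(gtsummary)

# `trial` is an example dataset bundled with gtsummary
tbl <- trial |>
  tbl_summary(
    by = trt,                          # one column per treatment arm
    include = c(age, grade, response)  # illustrative choice of variables
  ) |>
  add_overall() |>                     # extra column for the full sample
  add_p()                              # test selected per variable type

tbl
```

Each step returns a gtsummary object, so further modifiers such as bold_labels() can be appended to the same pipeline.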

How gtsummary Benefits My Work

Prompt: How may {gtsummary} benefit you for your school work or career?

For my Sino Yoga CEP project, {gtsummary} allows me to present survey results in clean, professional tables without manual formatting. For my career in marketing analytics, being able to produce publication-quality tables directly from R code saves time and ensures reproducibility.

Prompt 2: Step 2 Video — Daniel Sjoberg

How gtsummary Differs from gt and gtExtras

Prompt: How does gtsummary differ from gt and gtExtras?

  • {gt} is a general-purpose table formatting package — it gives you full control over appearance but requires manual calculation of statistics
  • {gtExtras} extends {gt} with additional styling and visualization options like inline plots and color themes
  • {gtsummary} is built on top of {gt} and automates the statistical calculations — you don’t need to compute means, percentages, or p-values yourself
  • {gtsummary} is specifically designed for clinical and social science research reporting, while {gt} is general purpose
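
The relationship between the two packages can be sketched as follows: {gtsummary} computes the statistics, and as_gt() hands the result to {gt} for further styling. The bundled `trial` data and the title text are placeholders for illustration.

```r
library(gtsummary)

# gtsummary computes the summaries; gt handles the final presentation
gt_tbl <- trial |>
  tbl_summary(by = trt, include = c(age, grade)) |>
  as_gt() |>                                    # convert to a gt table object
  gt::tab_header(title = "Characteristics by treatment")  # plain gt styling

gt_tbl
```

After as_gt(), the object is an ordinary gt table, so any {gt} or {gtExtras} formatting function can be applied to it.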

Three New Things I Learned

Prompt: Give three things you learned newly that were not explained in the lecture in Step 1.

  1. Themes: {gtsummary} supports journal-specific themes such as JAMA and QJEcon that automatically reformat tables to match publication standards
  2. add_difference() — calculates and displays the difference between groups with confidence intervals, going beyond just p-values
  3. tbl_uvregression() — runs univariate regression across multiple variables simultaneously, which is much more efficient than running each model separately
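
The three features above can be tried with a short sketch on the bundled `trial` data. The variables and the logistic model are assumptions made for the example, not part of the Sino Yoga analysis.

```r
library(gtsummary)

# 1. Journal theme: affects tables created while the theme is active
theme_gtsummary_journal(journal = "jama")
jama_tbl <- trial |>
  tbl_summary(by = trt, include = c(age, grade))
reset_gtsummary_theme()                # return to the default style

# 2. Difference between two groups, with a confidence interval
diff_tbl <- trial |>
  tbl_summary(
    by = trt,
    include = age,
    statistic = all_continuous() ~ "{mean} ({sd})",
    missing = "no"
  ) |>
  add_difference()

# 3. One univariable logistic model per covariate, in a single call
uv_tbl <- trial |>
  tbl_uvregression(
    method = glm,
    y = response,
    include = c(age, grade),
    method.args = list(family = binomial),
    exponentiate = TRUE                # report odds ratios
  )
```

Calling reset_gtsummary_theme() right after the themed table keeps the journal style from leaking into later tables in the same session.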

Prompt 3: Cross-Tabulation with Sino Yoga Data

Variables and Hypotheses

Prompt: Choose two appropriate variables for cross-tabulation and show if the two variables are associated or not.

  • Variable 1: Gender (Question 13) — categorical: Male, Female, Non-binary, Prefer not to say
  • Variable 2: Likelihood to visit website after watching YouTube video (Question 10) — ordinal: 1-5 scale
  • Hypothesis: There is a statistically significant association between gender and likelihood to visit the Sino Yoga website after watching a YouTube video

Results and Interpretation

library(tidyverse)
library(gtsummary)
library(readxl)

# Load data
sino_yoga <- read_excel("Sino Yoga Project - Copy_March 17, 2026_17.55.xlsx", 
                        skip = 1)

# Remove first two metadata rows
sino_yoga <- sino_yoga[-c(1,2), ]

# Recode variables
sino_yoga_clean <- sino_yoga |>
  rename(
    gender = `What is your gender?`,
    website_visit = `After watching a yoga instructor's YouTube video, how likely are you to visit their website to learn more about their courses or services?`
  ) |>
  filter(!is.na(gender), !is.na(website_visit),
         gender != "{\"ImportId\":\"QID9\"}",
         website_visit != "{\"ImportId\":\"QID34\"}") |>
  mutate(
    gender = case_when(
      gender == "1.0" ~ "Male",
      gender == "2.0" ~ "Female",
      gender == "3.0" ~ "Non-binary",
      gender == "4.0" ~ "Prefer not to say",
      TRUE ~ NA_character_
    ),
    website_visit = case_when(
      website_visit == "17.0" ~ "1 - Very Unlikely",
      website_visit == "18.0" ~ "2 - Unlikely",
      website_visit == "19.0" ~ "3 - Neutral",
      website_visit == "20.0" ~ "4 - Likely",
      website_visit == "21.0" ~ "5 - Very Likely",
      TRUE ~ NA_character_
    ),
    website_visit = factor(website_visit, levels = c(
      "1 - Very Unlikely", "2 - Unlikely", "3 - Neutral",
      "4 - Likely", "5 - Very Likely"))
  ) |>
  filter(!is.na(gender), !is.na(website_visit))

# Cross-tabulation
sino_yoga_clean |>
  tbl_cross(
    row = website_visit,
    col = gender,
    percent = "column"
  ) |>
  add_p() |>
  bold_labels()

                                        gender
                         Female      Male       Non-binary   Total       p-value¹
website_visit                                                            >0.9
    1 - Very Unlikely    3 (6.5%)    0 (0%)     0 (0%)       3 (6.0%)
    2 - Unlikely         10 (22%)    0 (0%)     0 (0%)       10 (20%)
    3 - Neutral          10 (22%)    1 (33%)    0 (0%)       11 (22%)
    4 - Likely           17 (37%)    2 (67%)    1 (100%)     20 (40%)
    5 - Very Likely      6 (13%)     0 (0%)     0 (0%)       6 (12%)
Total                    46 (100%)   3 (100%)   1 (100%)     50 (100%)
¹ Fisher’s exact test

Interpretation

The cross-tabulation examined whether gender is associated with likelihood to visit the Sino Yoga website after watching a YouTube video. Using Fisher’s exact test (appropriate for small cell sizes), the p-value was greater than 0.9, indicating no statistically significant association between gender and website visit intention.

The largest share of respondents (40%) indicated they were “Likely” (4 out of 5) to visit the website after watching a YouTube video, and a further 12% were “Very Likely”. This suggests that YouTube content may be an effective driver of website traffic for Sino Yoga. Because 46 of the 50 respondents were female, the sample says little about how the other gender groups respond.
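
As a rough check on why an exact test is appropriate here, the sketch below re-enters the counts from the cross-tabulation above into base R, computes the expected counts under independence, and runs fisher.test() directly. The object names are chosen for this example only.

```r
# Counts copied from the cross-tabulation above (filled column by column)
counts <- matrix(
  c(3, 10, 10, 17, 6,    # Female
    0,  0,  1,  2, 0,    # Male
    0,  0,  0,  1, 0),   # Non-binary
  nrow = 5,
  dimnames = list(
    website_visit = c("1 - Very Unlikely", "2 - Unlikely", "3 - Neutral",
                      "4 - Likely", "5 - Very Likely"),
    gender = c("Female", "Male", "Non-binary")
  )
)

# Expected counts under independence: (row total * column total) / N
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
sum(expected < 5)   # number of cells with an expected count below 5

# Exact test on the same table
fisher_res <- fisher.test(counts)
fisher_res$p.value
```

Most cells have an expected count below 5, which is the usual reason to prefer Fisher’s exact test over the chi-square approximation.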

Prompt 4: Multiple Regression with Sino Yoga Data

Model Setup

Prompt: Using the MSDM CEP data, run multiple regression. Regress a dependent variable on a set of independent variables. Code, produce the table, and interpret the result.

  • Dependent Variable: Likelihood to visit website after watching YouTube video (1-5 ordinal scale, treated as continuous)
  • Independent Variables:
    • Yoga interest (yoga_interest): “I am very interested in yoga related content”
    • Yoga lifestyle (yoga_lifestyle): “Yoga plays an important role in my lifestyle”
    • Actively seeks yoga content (yoga_seeks): “I actively seek out information or videos about yoga”

Results and Interpretation

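# Prepare regression data
# Note: sino_yoga already had its first two rows removed after loading,
# so this line drops two further rows.
sino_yoga_reg <- sino_yoga[-c(1,2), ] |>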
  rename(
    website_visit = `After watching a yoga instructor's YouTube video, how likely are you to visit their website to learn more about their courses or services?`,
    yoga_interest = `How much do you agree with the following statements about yoga? - I am very interested in yoga related content`,
    yoga_lifestyle = `How much do you agree with the following statements about yoga? - Yoga plays an important role in my lifestyle`,
    yoga_seeks = `How much do you agree with the following statements about yoga? - I actively seek out information or videos about yoga`
  ) |>
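  mutate(across(c(website_visit, yoga_interest, yoga_lifestyle, yoga_seeks), 
                as.numeric)) |>
  # website_visit is stored with codes 17-21 (see the recode in Prompt 3);
  # shift it to the 1-5 scale. This changes only the intercept, not the slopes.
  mutate(website_visit = website_visit - 16) |>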
  filter(!is.na(website_visit), !is.na(yoga_interest), 
         !is.na(yoga_lifestyle), !is.na(yoga_seeks))

# Run regression
m1 <- lm(website_visit ~ yoga_interest + yoga_lifestyle + yoga_seeks, 
         data = sino_yoga_reg)

# Table
m1 |>
  tbl_regression() |>
  add_n() |>
  bold_labels() |>
  bold_p(t = 0.05)

Characteristic    N     Beta     95% CI         p-value
yoga_interest     49    0.01     -0.33, 0.35    >0.9
yoga_lifestyle    49    -0.23    -0.65, 0.19    0.3
yoga_seeks        49    0.47     0.14, 0.81     0.007
Abbreviation: CI = Confidence Interval

Interpretation

The multiple regression examined predictors of likelihood to visit the Sino Yoga website after watching a YouTube video (N = 49).

  • Actively seeking yoga content (yoga_seeks) was the only statistically significant predictor (β = 0.47, 95% CI 0.14 to 0.81, p = 0.007). Holding the other two predictors constant, each one-point increase in agreement with “I actively seek out information or videos about yoga” is associated with a 0.47-point increase in stated likelihood to visit the website.
  • Yoga lifestyle (yoga_lifestyle; β = -0.23, p = 0.3) and yoga interest (yoga_interest; β = 0.01, p > 0.9) were not statistically significant predictors; both confidence intervals include zero.

These findings suggest that Sino Yoga should target users who actively seek yoga content, for example through search engine optimization (SEO) and YouTube search targeting, since this was the only measured attitude associated with a higher likelihood of moving from viewer to website visitor.
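
A regression table is easier to read when rows carry descriptive labels and the model fit is reported alongside the coefficients. The sketch below shows one way to do both, using the bundled `trial` data and an illustrative model rather than the Sino Yoga survey. The label text is an assumption made for the example.

```r
library(gtsummary)

# Illustrative model on the bundled `trial` data, not the survey data
m_demo <- lm(marker ~ age + grade, data = trial)

reg_tbl <- m_demo |>
  tbl_regression(
    # readable row labels instead of raw column names
    label = list(age ~ "Age (years)", grade ~ "Tumor grade")
  ) |>
  # model-level statistics appended as a source note
  add_glance_source_note(include = c(r.squared, adj.r.squared, nobs)) |>
  bold_p(t = 0.05)

reg_tbl
```

The same label argument could map yoga_interest, yoga_lifestyle, and yoga_seeks to the survey wording in the model above.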

Published Presentation

RPubs URL

This presentation is published at:

https://rpubs.com/Erika_NB/1433897