Summary Tables with gtsummary

Revealjs Presentation

Erika Barajas

IBM 6530, Cal Poly Pomona

2026-05-14

Prompt 1: gtsummary Capabilities

Overview of gtsummary

Prompt: Summarize the package’s capabilities as explained by the speaker.

The {gtsummary} package creates publication-ready summary tables in R. Key capabilities include:

tbl_summary(): descriptive statistics for continuous and categorical variables
tbl_cross(): cross-tabulation of two categorical variables; add_p() can then add a chi-square or Fisher’s exact test
tbl_regression(): regression model output tables (for example logistic and OLS models)
tbl_uvregression(): univariable regressions fitted across multiple variables
add_p(): adds p-values, choosing a test based on the variable type
add_overall(): adds an overall column to grouped tables
tbl_merge() / tbl_stack(): combine multiple tables side by side or stacked
Theme support, including JAMA and QJEcon journal styles
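
As a rough illustration of how these functions chain together, here is a minimal sketch using `trial`, the example dataset bundled with {gtsummary} (not the Sino Yoga data). The choice of variables is only for demonstration.

```r
library(gtsummary)

# `trial` ships with gtsummary; trt is a two-level treatment group
tbl_demo <- trial |>
  tbl_summary(
    by = trt,                          # one column per treatment group
    include = c(age, grade, response)  # variables to summarize
  ) |>
  add_overall() |>  # adds an "Overall" column
  add_p() |>        # adds a p-value column; the test depends on variable type
  bold_labels()

tbl_demo
```

Each step returns a gtsummary object, so further helpers can be piped on in the same way.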

How gtsummary Benefits My Work

Prompt: How may {gtsummary} benefit you for your school work or career?

For my Sino Yoga CEP project, {gtsummary} allows me to present survey results in clean, professional tables without manual formatting. For my career in marketing analytics, being able to produce publication-quality tables directly from R code saves time and ensures reproducibility.

Prompt 2: Step 2 Video — Daniel D. Sjoberg

How gtsummary Differs from gt and gtExtras

Prompt: How does gtsummary differ from gt and gtExtras?

{gt} is a general-purpose table formatting package — it gives you full control over appearance but requires manual calculation of statistics
{gtExtras} extends {gt} with additional styling and visualization options like inline plots and color themes
{gtsummary} is built on top of {gt} and automates the statistical calculations — you don’t need to compute means, percentages, or p-values yourself
{gtsummary} is specifically designed for clinical and social science research reporting, while {gt} is general purpose
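
To make the contrast concrete, the sketch below builds a similar table twice, once with {gt} after computing the statistics by hand, and once with {gtsummary}. It uses the bundled `trial` dataset as a stand-in; the column names `mean_age` and `sd_age` are made up for this example.

```r
library(dplyr)
library(gt)
library(gtsummary)

# With gt alone: compute the statistics first, then format the result
manual_tbl <- trial |>
  group_by(trt) |>
  summarise(
    n = n(),
    mean_age = mean(age, na.rm = TRUE),
    sd_age = sd(age, na.rm = TRUE)
  ) |>
  gt() |>
  fmt_number(columns = c(mean_age, sd_age), decimals = 1)

# With gtsummary: the statistics are computed from a template string
auto_tbl <- trial |>
  tbl_summary(
    by = trt,
    include = age,
    statistic = all_continuous() ~ "{mean} ({sd})"
  )

# A gtsummary table can be converted to a gt table for further styling
auto_gt <- as_gt(auto_tbl)
```

The last line shows how the two packages connect: once converted with as_gt(), the table can be styled with ordinary {gt} functions.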

Three New Things I Learned

Prompt: Give three things you learned newly that were not explained in the lecture in Step 1.

Themes — {gtsummary} supports journal-specific themes like JAMA and QJEcon that automatically reformat tables to match publication standards
add_difference() — calculates and displays the difference between groups with confidence intervals, going beyond just p-values
tbl_uvregression() — runs univariate regression across multiple variables simultaneously, which is much more efficient than running each model separately
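
A short sketch of the three features, again on the bundled `trial` dataset rather than the survey data. The variables chosen here are illustrative assumptions, not something shown in the video.

```r
library(gtsummary)

# Journal theme: affects tables created while it is active
theme_gtsummary_journal(journal = "jama")

# add_difference(): group difference with a confidence interval
diff_tbl <- trial |>
  tbl_summary(
    by = trt,
    include = age,
    statistic = all_continuous() ~ "{mean} ({sd})",
    missing = "no"
  ) |>
  add_difference()

# tbl_uvregression(): one logistic model per predictor, shown in one table
uv_tbl <- trial |>
  tbl_uvregression(
    method = glm,
    y = response,
    include = c(age, grade),
    method.args = list(family = binomial),
    exponentiate = TRUE  # report odds ratios instead of log-odds
  )

# Return to the default theme so later tables are unaffected
reset_gtsummary_theme()
```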

Prompt 3: Cross-Tabulation with Sino Yoga Data

Variables and Hypotheses

Prompt: Choose two appropriate variables for cross-tabulation and show if the two variables are associated or not.

Variable 1: Gender (Question 13) — categorical: Male, Female, Non-binary, Prefer not to say
Variable 2: Likelihood to visit website after watching YouTube video (Question 10) — ordinal: 1-5 scale
Hypothesis: There is a statistically significant association between gender and likelihood to visit the Sino Yoga website after watching a YouTube video

Results and Interpretation

library(tidyverse)
library(gtsummary)
library(readxl)

# Load data
sino_yoga <- read_excel("Sino Yoga Project - Copy_March 17, 2026_17.55.xlsx", 
                        skip = 1)

# Remove first two metadata rows
sino_yoga <- sino_yoga[-c(1,2), ]

# Recode variables
sino_yoga_clean <- sino_yoga |>
  rename(
    gender = `What is your gender?`,
    website_visit = `After watching a yoga instructor's YouTube video, how likely are you to visit their website to learn more about their courses or services?`
  ) |>
  filter(!is.na(gender), !is.na(website_visit),
         gender != "{\"ImportId\":\"QID9\"}",
         website_visit != "{\"ImportId\":\"QID34\"}") |>
  mutate(
    gender = case_when(
      gender == "1.0" ~ "Male",
      gender == "2.0" ~ "Female",
      gender == "3.0" ~ "Non-binary",
      gender == "4.0" ~ "Prefer not to say",
      TRUE ~ NA_character_
    ),
    website_visit = case_when(
      website_visit == "17.0" ~ "1 - Very Unlikely",
      website_visit == "18.0" ~ "2 - Unlikely",
      website_visit == "19.0" ~ "3 - Neutral",
      website_visit == "20.0" ~ "4 - Likely",
      website_visit == "21.0" ~ "5 - Very Likely",
      TRUE ~ NA_character_
    ),
    website_visit = factor(website_visit, levels = c(
      "1 - Very Unlikely", "2 - Unlikely", "3 - Neutral",
      "4 - Likely", "5 - Very Likely"))
  ) |>
  filter(!is.na(gender), !is.na(website_visit))

# Cross-tabulation
sino_yoga_clean |>
  tbl_cross(
    row = website_visit,
    col = gender,
    percent = "column"
  ) |>
  add_p() |>
  bold_labels()

	gender
	Female	Male	Non-binary	Total	p-value¹
website_visit					>0.9
1 - Very Unlikely	3 (6.5%)	0 (0%)	0 (0%)	3 (6.0%)
2 - Unlikely	10 (22%)	0 (0%)	0 (0%)	10 (20%)
3 - Neutral	10 (22%)	1 (33%)	0 (0%)	11 (22%)
4 - Likely	17 (37%)	2 (67%)	1 (100%)	20 (40%)
5 - Very Likely	6 (13%)	0 (0%)	0 (0%)	6 (12%)
Total	46 (100%)	3 (100%)	1 (100%)	50 (100%)
¹ Fisher’s exact test

Interpretation

The cross-tabulation examined whether gender is associated with likelihood to visit the Sino Yoga website after watching a YouTube video. Using Fisher’s exact test (appropriate because many cells have small counts), the p-value was greater than 0.9, indicating no statistically significant association between gender and website visit intention. The hypothesis is therefore not supported.

The most common response was “4 - Likely”, chosen by 20 of 50 respondents (40%), and 52% chose either “Likely” or “Very Likely”. This suggests that YouTube content may help drive website traffic for Sino Yoga. However, 46 of the 50 respondents were female (3 male, 1 non-binary), so the sample is too unbalanced to support conclusions about differences between gender groups.
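
As a check on why Fisher’s exact test is the appropriate choice here, the sketch below re-enters the counts from the cross-tabulation above into base R and looks at the expected cell counts under independence. The object names are made up for this example; the counts are copied from the table.

```r
# Counts copied from the cross-tabulation above
# rows: website_visit levels, columns: gender
counts <- matrix(
  c( 3, 0, 0,
    10, 0, 0,
    10, 1, 0,
    17, 2, 1,
     6, 0, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    website_visit = c("1 - Very Unlikely", "2 - Unlikely", "3 - Neutral",
                      "4 - Likely", "5 - Very Likely"),
    gender = c("Female", "Male", "Non-binary")
  )
)

# Expected counts if gender and website_visit were independent
# (chisq.test warns about the approximation, which is the point here)
expected <- suppressWarnings(chisq.test(counts))$expected
low_cells <- sum(expected < 5)  # cells below the usual rule of thumb

# Fisher's exact test does not rely on large expected counts
fisher_res <- fisher.test(counts)
fisher_res$p.value
```

With these counts, 11 of the 15 cells have an expected count below 5, which is why a chi-square approximation would be unreliable for this table.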

Prompt 4: Multiple Regression with Sino Yoga Data

Model Setup

Prompt: Using the MSDM CEP data, run multiple regression. Regress a dependent variable on a set of independent variables. Code, produce the table, and interpret the result.

Dependent Variable: Likelihood to visit website after watching YouTube video (1-5 scale, treated as continuous)
Independent Variables:
- Yoga engagement: “I am very interested in yoga related content”
- Yoga lifestyle: “Yoga plays an important role in my lifestyle”
- Actively seeks yoga content: “I actively seek out information or videos about yoga”

Results and Interpretation

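# Prepare regression data
# Note: sino_yoga already had its first two rows dropped when it was loaded above,
# so this second [-c(1,2), ] drops two additional rows
sino_yoga_reg <- sino_yoga[-c(1,2), ] |>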
  rename(
    website_visit = `After watching a yoga instructor's YouTube video, how likely are you to visit their website to learn more about their courses or services?`,
    yoga_interest = `How much do you agree with the following statements about yoga? - I am very interested in yoga related content`,
    yoga_lifestyle = `How much do you agree with the following statements about yoga? - Yoga plays an important role in my lifestyle`,
    yoga_seeks = `How much do you agree with the following statements about yoga? - I actively seek out information or videos about yoga`
  ) |>
  mutate(across(c(website_visit, yoga_interest, yoga_lifestyle, yoga_seeks), 
                as.numeric)) |>
  filter(!is.na(website_visit), !is.na(yoga_interest), 
         !is.na(yoga_lifestyle), !is.na(yoga_seeks))

# Run regression
m1 <- lm(website_visit ~ yoga_interest + yoga_lifestyle + yoga_seeks, 
         data = sino_yoga_reg)

# Table
m1 |>
  tbl_regression() |>
  add_n() |>
  bold_labels() |>
  bold_p(t = 0.05)

Characteristic	N	Beta	95% CI	p-value
yoga_interest	49	0.01	-0.33, 0.35	>0.9
yoga_lifestyle	49	-0.23	-0.65, 0.19	0.3
yoga_seeks	49	0.47	0.14, 0.81	0.007
Abbreviation: CI = Confidence Interval

Interpretation

The multiple regression examined predictors of likelihood to visit the Sino Yoga website after watching a YouTube video (N = 49).

Actively seeking yoga content (yoga_seeks) was the only statistically significant predictor (β = 0.47, 95% CI 0.14 to 0.81, p = 0.007). Holding the other two predictors constant, each one-point increase in agreement with actively seeking out yoga information or videos is associated with a 0.47-point higher rating of likelihood to visit the website.
Yoga lifestyle (β = -0.23, p = 0.3) and yoga interest (β = 0.01, p > 0.9) were not statistically significant predictors; both 95% confidence intervals include zero.

These findings suggest that Sino Yoga should target users who actively seek yoga content, for example through SEO and YouTube search targeting, as they are the most likely to convert from viewers to website visitors.
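
One detail worth checking: the regression code converts the exported survey codes with as.numeric, and for website_visit those codes run from 17 to 21 rather than 1 to 5 (see the recoding in Prompt 3). The sketch below uses simulated data (hypothetical, not the survey responses) to show that shifting a variable by a constant changes only the intercept, not the slope, so the Beta column is unaffected as long as the codes are a constant shift of the scale. Whether the three predictor items are coded that way is an assumption.

```r
set.seed(1)

# Simulated responses on a 1-5 scale (hypothetical, not the survey data)
n <- 49
seeks <- sample(1:5, n, replace = TRUE)
visit <- pmin(5, pmax(1, round(2 + 0.4 * seeks + rnorm(n))))

# The same outcome expressed in export-style codes 17-21
visit_raw <- visit + 16

m_scale <- lm(visit ~ seeks)     # outcome on the 1-5 scale
m_raw <- lm(visit_raw ~ seeks)   # outcome in raw codes

# The slopes are identical; only the intercept moves by 16
slope_scale <- unname(coef(m_scale)["seeks"])
slope_raw <- unname(coef(m_raw)["seeks"])
intercept_shift <- unname(coef(m_raw)[1] - coef(m_scale)[1])
```

Because tbl_regression() omits the intercept by default, a table built from either model would show the same coefficient for the predictor.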

Published Presentation

RPubs URL

This presentation is published at:

https://rpubs.com/Erika_NB/1433897