Mini 11: Table 1 – Descriptive Statistics by Outcome

Author

Jesse McDevitt-Irwin

Assignment Overview

In this mini-assignment, you will construct a Table 1 that shows descriptive statistics by levels of your dependent variable. This is similar to the kind of summary table you might see at the beginning of a published research paper.

Your columns should represent the dependent variable. If your dependent variable is numeric (e.g. age, hemoglobin), recode it into ordered categories, such as “low”, “medium”, and “high”. Aim for 2 or 3 categories that make conceptual and statistical sense for your research question.

Your rows should include your main independent variable and at least five other variables, such as:

  • Wealth index
  • Urban
  • Age
  • Gender
  • Household size (# of people)
  • Educational attainment
  • Marital status
  • Anemia status

Also report the number of missing values for your main dependent and independent variables.

Note: the exact variables you choose will depend on your research question.


Instructions

Step 1: Recode Your Variables

Use mutate() and case_when() to:

  • Simplify categories
  • Rename variables for clarity
  • Recode special values (e.g. “Don’t Know”, “Refused”) as NA
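  • Recode your dependent variable to create 2–3 ordered categories if it is continuous. Convert the result to a factor with explicit levels so the categories appear in their intended order rather than alphabetically. For example:
library(dplyr)

data <- data %>%
  mutate(
    outcome_cat = case_when(
      outcome < 10 ~ "Low",
      outcome >= 10 & outcome < 15 ~ "Medium",
      outcome >= 15 ~ "High"
    ),
    # Set the level order explicitly; missing outcomes stay NA
    outcome_cat = factor(outcome_cat, levels = c("Low", "Medium", "High"))
  )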
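The other bullets (simplifying categories and recoding special values as NA) follow the same pattern. Here is a minimal sketch on mock data. The data frame survey, the variable names, and the category labels are all hypothetical, so replace them with the ones in your own dataset:

```r
library(dplyr)

# Mock data; variable names and category labels are hypothetical.
survey <- tibble(
  education = c("No education", "Primary", "Secondary", "Higher", "Don't know"),
  married   = c("Yes", "No", "Refused", "Yes", "No")
)

survey <- survey %>%
  mutate(
    # Collapse four education levels into two; anything else becomes NA.
    educ_cat = case_when(
      education %in% c("No education", "Primary") ~ "Primary or less",
      education %in% c("Secondary", "Higher")     ~ "Secondary or more",
      TRUE                                        ~ NA_character_
    ),
    # na_if() turns one specific value into NA and leaves the rest unchanged.
    married = na_if(married, "Refused")
  )
```

Check the actual labels in your data (for example with count(education)) before recoding, since spelling and capitalization differ between surveys.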

You can also count missing values using:

data %>% summarise(missing_outcome = sum(is.na(outcome)),
                   missing_main_predictor = sum(is.na(main_predictor)))
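If you want percentages as well as counts, across() can apply the same calculation to several variables at once. The sketch below uses mock data with hypothetical values. It also shows one possible way to handle rows with a missing outcome before building the table; whether dropping them is appropriate depends on your research question:

```r
library(dplyr)

# Mock data with some missing values; names and values are hypothetical.
data <- tibble(
  outcome        = c(8, 12, NA, 16, 11, NA),
  main_predictor = c("A", NA, "B", "A", "B", "A")
)

# Count and percentage of missing values for each variable.
missing_summary <- data %>%
  summarise(across(
    c(outcome, main_predictor),
    list(n_missing   = ~ sum(is.na(.x)),
         pct_missing = ~ 100 * mean(is.na(.x)))
  ))

# One option: keep only rows with a non-missing outcome for the table.
# Otherwise group_by() treats NA as its own outcome group.
table_data <- data %>% filter(!is.na(outcome))
```

The result has one column per variable and statistic, named like outcome_n_missing and main_predictor_pct_missing.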

Step 2: Create Summary Statistics

Use group_by() and summarise() to create a summary table by levels of your dependent variable. Here’s an example using mock variables:

library(dplyr)

data %>%
  group_by(outcome_cat) %>%
  summarise(
    n = n(),
    mean_age = mean(age, na.rm = TRUE),
    sd_age = sd(age, na.rm = TRUE),
    prop_male = mean(sex == "Male", na.rm = TRUE),
    mean_wealth = mean(wealth_index, na.rm = TRUE),
    prop_married = mean(married == "Yes", na.rm = TRUE)
  )

Adapt this code to match your own variables and research question.
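Note that group_by() and summarise() return one row per outcome level, while the assignment asks for the outcome levels as columns. One way to flip the table is with pivot_longer() and pivot_wider() from the tidyr package. This is a sketch on mock data, not the only approach; tidyr and all variable names and values here are assumptions for illustration:

```r
library(dplyr)
library(tidyr)

# Mock data; all variable names and values are hypothetical.
data <- tibble(
  outcome_cat = factor(c("Low", "Low", "Medium", "High", "High", "Medium"),
                       levels = c("Low", "Medium", "High")),
  age = c(20, 30, 40, 50, 60, 44),
  sex = c("Male", "Female", "Male", "Female", "Male", "Male")
)

# One row per outcome level, as in the example above.
by_outcome <- data %>%
  group_by(outcome_cat) %>%
  summarise(
    n = n(),
    mean_age = mean(age, na.rm = TRUE),
    prop_male = mean(sex == "Male", na.rm = TRUE)
  )

# Reshape: one row per statistic, one column per outcome level.
table1 <- by_outcome %>%
  pivot_longer(-outcome_cat, names_to = "statistic", values_to = "value") %>%
  pivot_wider(names_from = outcome_cat, values_from = value)
```

The reshaped table1 has a statistic column followed by one column for each level of outcome_cat, which matches the layout described in the overview.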


Step 3: Format and Export Your Table

Once you have created your summary table, use the flextable package to create a nicely formatted table you can copy and paste into Word or Google Docs.

library(flextable)

summary_table <- data %>%
  group_by(outcome_cat) %>%
  summarise(
    mean_age = mean(age, na.rm = TRUE),
    sd_age = sd(age, na.rm = TRUE)
    # ... add your other variables here, separated by commas
  )

flextable(summary_table)

In RStudio, this displays the table in the Viewer pane. From there, you can copy and paste it directly into your Word document.
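If copying from the Viewer is awkward, flextable can also tidy the table and write it straight to a Word file. The sketch below uses a small mock summary table with made-up values, and the column labels and output path are hypothetical, so substitute your own:

```r
library(flextable)

# Mock summary table standing in for your own; the values are made up.
summary_table <- data.frame(
  outcome_cat = c("Low", "Medium", "High"),
  mean_age = c(24.512, 31.204, 38.77),
  sd_age = c(5.12, 6.348, 7.9)
)

ft <- flextable(summary_table)
# Replace the raw column names with readable headers.
ft <- set_header_labels(ft, outcome_cat = "Outcome",
                        mean_age = "Mean age", sd_age = "SD age")
ft <- colformat_double(ft, digits = 1)  # round numeric columns for display
ft <- autofit(ft)                       # size columns to their contents

# Hypothetical output path; a temporary folder is used here.
out_path <- file.path(tempdir(), "table1.docx")
save_as_docx(ft, path = out_path)
```

To save the file in your project folder instead, set path to something like "table1.docx".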