This notebook walks through the manual generation of summary statistics in \(\LaTeX\) that are broken out by treatment status. It uses the stargazer package, but it’s a bit more advanced than the previous walkthrough.

Step 1: Import necessary libraries and data

library(stargazer)
library(readxl)  # I'm using this to read my data from an Excel file
library(tidyverse)  # A suite of packages for manipulating data
library(stats)

I’ll be using updated survey response data from the Colombia-LfP project.

colombia_LfP_raw <- read_xlsx("../colombia_LfP/HH7.7.23.xlsx")
print(dim(colombia_LfP_raw))  # Just confirming the import worked correctly

## [1] 2024 1823

Step 2: Clean your data (different for each user)

Every new summary statistics table, even if you’ve worked with the source data before, probably requires a somehow unique data cleaning process. That applies here, too, so this notebook unfortunately won’t be able to walk you through that step of things, but stargazer can cut down the amount of this work if you let it.

In short, you want your data cleaning to take your original dataset and shrink it to only the rows and columns you’re interested in. That’s it for now. No additional group-bys or aggregations, just filter()s and select()s, mostly. The aggregations happen later. For now, I’ll apply a toy version of some data cleaning I’ve performed on this dataset before here.

The main change from last time is the addition of a binary column, treat, that indicates treatment status.

# Define vereda geography subset in hh
vereda_status <- c("control-vereda","treatment-vereda")

# NEW: Define a binary column indicating treatment status
colombia_LfP_raw$treat <- ifelse(
  colombia_LfP_raw$treatment_status == "treatment-vereda",
  1,0
)

# Make a subset of six columns including only responses in "Vereda" geographies
hh_vereda <- colombia_LfP_raw %>%
  filter(colombia_LfP_raw$treatment_status %in% vereda_status) %>%
  # NEW: include 'treat' in the select() below
  select(resp_sex, born, born_head, econstatus, hhmem_size, fhhconsume, treat)

print(dim(hh_vereda))  # Confirming data trimming worked

## [1] 1033    7

IMPORTANT: Convert all fields to numeric

stargazer only recognizes numeric fields in its calculation of summary statistics, so ensure that every column that will eventually be included in the data.frame passed to the stargazer() function (which we’ll see soon) is numeric. One example of a process to do this conversion is demonstrated here.

First, find out the types of each column using the str() function

str(hh_vereda)

## tibble [1,033 x 7] (S3: tbl_df/tbl/data.frame)
##  $ resp_sex  : chr [1:1033] "1" "2" "1" "2" ...
##  $ born      : chr [1:1033] "0" "0" "1" "1" ...
##  $ born_head : chr [1:1033] NA NA NA NA ...
##  $ econstatus: num [1:1033] 1 1 1 3 3 2 1 2 1 7 ...
##  $ hhmem_size: num [1:1033] 5 1 3 4 5 1 3 3 2 2 ...
##  $ fhhconsume: chr [1:1033] "1" "2" "3" "3" ...
##  $ treat     : num [1:1033] 1 1 1 1 1 1 1 1 1 1 ...

# Define all character columns that you want to convert to numeric
# ... as items in a list of strings
convert_to_numeric <- c(
  "resp_sex","born","born_head","fhhconsume"
)

# Once the list of columns to convert is defined, use the code below to apply
# ... the conversion to your data.frame
hh_vereda <- hh_vereda %>%
  mutate(across(
    all_of(convert_to_numeric),
    as.numeric
  ))

str(hh_vereda)  # Confirm changes were applied

## tibble [1,033 x 7] (S3: tbl_df/tbl/data.frame)
##  $ resp_sex  : num [1:1033] 1 2 1 2 1 1 1 1 1 2 ...
##  $ born      : num [1:1033] 0 0 1 1 0 0 0 0 0 0 ...
##  $ born_head : num [1:1033] NA NA NA NA NA NA NA NA NA NA ...
##  $ econstatus: num [1:1033] 1 1 1 3 3 2 1 2 1 7 ...
##  $ hhmem_size: num [1:1033] 5 1 3 4 5 1 3 3 2 2 ...
##  $ fhhconsume: num [1:1033] 1 2 3 3 2 1 3 2 2 3 ...
##  $ treat     : num [1:1033] 1 1 1 1 1 1 1 1 1 1 ...

Back to data cleaning

I have one more thing to do: convert a categorical column representing respondent gender to a boolean field representing whether or not the respondent is female.

# Convert resp_sex categorical responses to binary
hh_vereda$resp_f <- ifelse(
  hh_vereda$resp_sex == 2,
  1,0
)

# Drop resp_sex
hh_vereda <- hh_vereda %>%
  select(resp_f, born, born_head, econstatus, hhmem_size, fhhconsume, treat)

Step 3: Prepare and export `stargazer` table

This is the final step, and at the end of it you’ll have a .tex file with a \(\LaTeX\) table of summary statistics broken out by treatment status.

Step 3.1: Choose better column titles

My current dataset includes column titles, like born and fhhconsume, whose meanings might be unclear even to someone who has worked closely on the project. Let’s choose better names for our columns.

NB: No need to include treat in the renaming, because we’re not representing treatment status as a row in our summary. We’re representing it as a horizontal subset. (If that’s a confusing phrase, you’ll see what I mean later.)

# Define a list for renaming columns: "New Name" = "old_name"
renaming_list <- c(
  "Respondent Female" = "resp_f",
  "Born Locally" = "born",
  "Head of HH Born Locally" = "born_head",
  "Economic Status" = "econstatus",
  "HH Size" = "hhmem_size",
  "Forest Covers Basic Needs" = "fhhconsume"
)

# Create a new dataframe with renamed columns
stargazer_prep <- hh_vereda %>%
  data.frame() %>%  # Include this line to ensure formatting is correct later
  rename(all_of(renaming_list))

Step 3.2: Create summary statistics broken out by treatment status

This is where things get more complicated compared to last time. stargazer has no mechanism to easily define groups (like treatment groups) and display summary statistics in separate horizontal groupings on that basis. So, we need to create our own data.frame object that looks like this and pass it through the stargazer() function with the argument summary=FALSE, as you’ll see below.

This is a pretty long for-loop, but it’s doing simple stuff. For each column from your data you want to summarize (names(renaming_list) above), you want to create a row in your summary table. To do this, we’re going to create a list of vectors (summary_cols below) where each vector will eventually represent a column, and the length of those vectors will eventually be the number of rows. So for each data column you’re summarizing, you’ll tack on (aka append()) one element (a summary statistic) to each vector.

# Define empty list with summary table column titles
# This might look a bit different each time based on what stats you want.
summary_cols = list(
  covariate = c(),
  control.N = c(),
  control.mean = c(),
  control.sd = c(),
  control.min = c(),
  control.max = c(),
  treat.N = c(),
  treat.mean = c(),
  treat.sd = c(),
  treat.min = c(),
  treat.max = c()
)

# Iterate through the columns we're summarizing to create our table row-by-row
# Notice that each list item above has its own little code chunk in this loop
for (col in names(renaming_list)) {
  summary_cols$covariate <- summary_cols$covariate %>%
    append(col)  # Covariate name
  summary_cols$control.N <- summary_cols$control.N %>%
    append(  # Add to our summary column
      stargazer_prep %>%  # From our data
        filter(treat == 0) %>%  # Restrict to treatment_status group
        .[, col] %>%  # Grab column as a vector
        is.na() %>% `!` %>%  # Convert to T/F where NA is FALSE
        sum()  # Control count -- sum up the TRUEs
    )
  summary_cols$control.mean <- summary_cols$control.mean %>%
    append(
      stargazer_prep %>%
        filter(treat == 0) %>%
        .[, col] %>%
        mean(na.rm=T)  # Control mean
    )
  summary_cols$control.sd <- summary_cols$control.sd %>%
    append(
      stargazer_prep %>%
        filter(treat == 0) %>%
        .[, col] %>%
        sd(na.rm=T)  # Control standard deviation
    )
  summary_cols$control.min <- summary_cols$control.min %>%
    append(
      stargazer_prep %>%
        filter(treat == 0) %>%
        .[, col] %>%
        min(na.rm=T)  # Control min
    )
  summary_cols$control.max <- summary_cols$control.max %>%
    append(
      stargazer_prep %>%
        filter(treat == 0) %>%
        .[, col] %>%
        max(na.rm=T)  # Control max
    )
  summary_cols$treat.N <- summary_cols$treat.N %>%
    append(
      stargazer_prep %>%
        filter(treat == 1) %>%
        .[, col] %>%
        is.na() %>% `!` %>%
        sum()  # Treat count
    )
  summary_cols$treat.mean <- summary_cols$treat.mean %>%
    append(
      stargazer_prep %>%
        filter(treat == 1) %>%
        .[, col] %>%
        mean(na.rm=T)  # Treat mean
    )
  summary_cols$treat.sd <- summary_cols$treat.sd %>%
    append(
      stargazer_prep %>%
        filter(treat == 1) %>%
        .[, col] %>%
        sd(na.rm=T)  # Treat standard deviation
    )
  summary_cols$treat.min <- summary_cols$treat.min %>%
    append(
      stargazer_prep %>%
        filter(treat == 1) %>%
        .[, col] %>%
        min(na.rm=T)  # Treat min
    )
  summary_cols$treat.max <- summary_cols$treat.max %>%
    append(
      stargazer_prep %>%
        filter(treat == 1) %>%
        .[, col] %>%
        max(na.rm=T)  # Treat max
    )
}

# Convert the list of vectors to a data.frame so it prints out nicely
summary_final <- summary_cols %>%
  data.frame()

# Display a pretty version of our table
summary_final %>% knitr::kable()

covariate	control.N	control.mean	control.sd	control.min	control.max	treat.N	treat.mean	treat.sd	treat.min	treat.max
Respondent Female	509	0.4047151	0.4913197	0	1	513	0.3625731	0.4812123	0	1
Born Locally	509	0.1925344	0.3946783	0	1	513	0.2534113	0.4353889	0	1
Head of HH Born Locally	114	0.1578947	0.3662522	0	1	92	0.2608696	0.4415150	0	1
Economic Status	509	1.9214145	1.2524437	1	9	513	2.0350877	1.4424970	1	10
HH Size	509	3.3418468	1.7937548	1	11	513	3.1598441	1.6833820	1	10
Forest Covers Basic Needs	509	1.9508841	0.7909081	1	3	513	2.0155945	0.7829672	1	3

Step 3.3: Export `stargazer` table

Time to make the table! Call the stargazer() function with a data.frame containing only the columns you’re interested in as the first argument. Use the out= argument to define the file path for your .tex file

stargazer(
  summary_final,
  # Note the "rownames=F" argument! Try taking it away to see why I use it
  summary= FALSE, rownames= FALSE,
  out= "colombia_summary.tex",
  title= "Summary Statistics from Vereda Areas"
)

## 
## % Table created by stargazer v.5.2.3 by Marek Hlavac, Social Policy Institute. E-mail: marek.hlavac at gmail.com
## % Date and time: Tue, Jul 18, 2023 - 1:30:04 PM
## \begin{table}[!htbp] \centering 
##   \caption{Summary Statistics from Vereda Areas} 
##   \label{} 
## \begin{tabular}{@{\extracolsep{5pt}} ccccccccccc} 
## \\[-1.8ex]\hline 
## \hline \\[-1.8ex] 
## covariate & control.N & control.mean & control.sd & control.min & control.max & treat.N & treat.mean & treat.sd & treat.min & treat.max \\ 
## \hline \\[-1.8ex] 
## Respondent Female & $509$ & $0.405$ & $0.491$ & $0$ & $1$ & $513$ & $0.363$ & $0.481$ & $0$ & $1$ \\ 
## Born Locally & $509$ & $0.193$ & $0.395$ & $0$ & $1$ & $513$ & $0.253$ & $0.435$ & $0$ & $1$ \\ 
## Head of HH Born Locally & $114$ & $0.158$ & $0.366$ & $0$ & $1$ & $92$ & $0.261$ & $0.442$ & $0$ & $1$ \\ 
## Economic Status & $509$ & $1.921$ & $1.252$ & $1$ & $9$ & $513$ & $2.035$ & $1.442$ & $1$ & $10$ \\ 
## HH Size & $509$ & $3.342$ & $1.794$ & $1$ & $11$ & $513$ & $3.160$ & $1.683$ & $1$ & $10$ \\ 
## Forest Covers Basic Needs & $509$ & $1.951$ & $0.791$ & $1$ & $3$ & $513$ & $2.016$ & $0.783$ & $1$ & $3$ \\ 
## \hline \\[-1.8ex] 
## \end{tabular} 
## \end{table}

Step 3.4: Directly edit \(\LaTeX\) table header

To check your table’s appearance, copy it into Overleaf somewhere or use an online interpreter like quicklatex.com. Here is my table, directly exported from stargazer:

You’ll probably notice here that the table is too wide. Like, there’s no way it’ll fit on a page properly. This happens because we needed to attach control. or treat. to every column name in our R code to maintain unique column references. Let’s directly edit the \(\LaTeX\) code to make something that looks better! Here’s the original, too-wide header from this export:

\begin{table}[!htbp] \centering 
  \caption{Summary Statistics from Vereda Areas} 
  \label{} 
\begin{tabular}{@{\extracolsep{5pt}} ccccccccccc} 
\\[-1.8ex]\hline 
\hline \\[-1.8ex] 
covariate & control.N & control.mean & control.sd & control.min & control.max & treat.N & treat.mean & treat.sd & treat.min & treat.max \\
\hline \\[-1.8ex]

We want to do a few things with this:

Rename all the columns with better formatting
- Remove the “control.” and “treat.” prefixes
Left-align the survey item names
- In the string of cs in the \begin{tabular} line, change the c on the left to l
Create a new row in our header that indicates treatment status

To add the new “Control/Treatment” row to the header, copy the following lines below the two \hlines and above the new column titles. Remember to edit the number ranges in the \cline commands if you need to realign the treatment groups!

& \multicolumn{5}{l}{Control} & \multicolumn{5}{l}{Treatment}
& \cline{2-6} \cline{7-11} \\[-1.8ex]

And here is the new table, along with the new code for the full header:

\begin{table}[!htbp] \centering 
  \caption{Summary Statistics from Vereda Areas} 
  \label{} 
\begin{tabular}{@{\extracolsep{5pt}} lcccccccccc} 
\\[-1.8ex]\hline 
\hline \\[-1.8ex] 
& \multicolumn{5}{l}{Control} & \multicolumn{5}{l}{Treatment}
& \cline{2-6} \cline{7-11} \\[-1.8ex]
Item & N & Mean & St.Dev & Min & Max & N & Mean & St.Dev & Min & Max \\ 
\hline \\[-1.8ex]

It’s not as easy as the default stargazer tables, and you’ll need to use trial and error to make your \(\LaTeX\) tables look good, but it’s not impossible! Good luck!

Summary Statistic Tables

Jack Jacobs

7/18/2023

Step 1: Import necessary libraries and data

Step 2: Clean your data (different for each user)

IMPORTANT: Convert all fields to numeric

Back to data cleaning

Step 3: Prepare and export `stargazer` table

Step 3.1: Choose better column titles

Step 3.2: Create summary statistics broken out by treatment status

Step 3.3: Export `stargazer` table

Step 3.4: Directly edit \(\LaTeX\) table header

Summary Statistic Tables

Jack Jacobs

7/18/2023

Step 1: Import necessary libraries and data

Step 2: Clean your data (different for each user)

IMPORTANT: Convert all fields to numeric

Back to data cleaning

Step 3: Prepare and export stargazer table

Step 3.1: Choose better column titles

Step 3.2: Create summary statistics broken out by treatment status

Step 3.3: Export stargazer table

Step 3.4: Directly edit \(\LaTeX\) table header

Step 3: Prepare and export `stargazer` table

Step 3.3: Export `stargazer` table