This notebook covers the basic steps needed to create summary statistics tables in LaTeX from an R data.frame.
NB: I’ve heard it pronounced ‘LAY-tex’, ‘LAY-tek’, and ‘LAH-tek’ (where the first syllable is like the middle “la” in “falafel”), and it seems like the more knowledgeable the person is, the less they pronounce it like the English word “latex”. Not really important imo, but a CS professor told me this once so now I’m telling you.
library(stargazer)
library(readxl) # I'm using this to read my data from an Excel file
library(tidyverse) # A suite of packages for manipulating data
I’ll be using survey response data from the Colombia-LfP project.
colombia_LfP_raw <- read_xlsx("../colombia_LfP/HH_Data06.01.23.xlsx")
print(dim(colombia_LfP_raw)) # Just confirming the import worked correctly
## [1] 2024 1823
Every new summary statistics table, even one built from source data you’ve worked with before, probably needs its own data cleaning process. That applies here, too, so this notebook unfortunately can’t walk you through that step, but stargazer can cut down the amount of this work if you let it.
In short, you want your data cleaning to take your original dataset and shrink it to only the rows and columns you’re interested in. That’s it! No additional group-bys or aggregations, mostly just filter()s and select()s. Here I’ll apply a toy version of some data cleaning I’ve previously performed on this dataset.
# Define vereda geography subset in hh
vereda_status <- c("control-vereda","treatment-vereda")
# Make a subset of six columns including only responses in "Vereda" geographies
hh_vereda <- colombia_LfP_raw %>%
  filter(treatment_status %in% vereda_status) %>%
  select(resp_sex, born, born_head, econstatus, hhmem_size, fhhconsume)
print(dim(hh_vereda)) # Confirming data trimming worked
## [1] 1819 6
stargazer only recognizes numeric fields in its calculation of summary statistics, so ensure that every column that will eventually be included in the data.frame passed to the stargazer() function (which we’ll see soon) is numeric. One example of a process to do this conversion is demonstrated here.
First, find out the type of each column using the str() function:
str(hh_vereda)
## tibble [1,819 x 6] (S3: tbl_df/tbl/data.frame)
## $ resp_sex : chr [1:1819] "1" "2" "1" "2" ...
## $ born : chr [1:1819] "0" "0" "1" "1" ...
## $ born_head : chr [1:1819] NA NA NA NA ...
## $ econstatus: num [1:1819] 1 1 1 3 3 2 1 2 1 7 ...
## $ hhmem_size: num [1:1819] 5 1 3 4 5 1 3 3 2 2 ...
## $ fhhconsume: chr [1:1819] "1" "2" "3" "3" ...
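With only six columns it is easy to read the non-numeric ones off the str() output by eye. For wider data, a minimal base-R sketch like the one below could list them for you. The `toy` data.frame and its values are made up for illustration and simply stand in for hh_vereda.

```r
# Toy data.frame standing in for hh_vereda (values are made up)
toy <- data.frame(
  resp_sex   = c("1", "2", "1"),
  econstatus = c(1, 3, 2),
  fhhconsume = c("1", "2", "3"),
  stringsAsFactors = FALSE
)

# Names of every column that is not already numeric;
# here resp_sex and fhhconsume are flagged
non_numeric_cols <- names(toy)[!vapply(toy, is.numeric, logical(1))]
print(non_numeric_cols)
```

The resulting character vector is a reasonable starting point for a conversion list, though you should still check that each flagged column really holds numbers stored as text.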
# Define all character columns that you want to convert to numeric
# ... as elements of a character vector
convert_to_numeric <- c(
  "resp_sex", "born", "born_head", "fhhconsume"
)
# Once the vector of columns to convert is defined, use the code below to apply
# ... the conversion to your data.frame
hh_vereda <- hh_vereda %>%
  mutate(across(
    all_of(convert_to_numeric),
    as.numeric
  ))
str(hh_vereda) # Confirm changes were applied
## tibble [1,819 x 6] (S3: tbl_df/tbl/data.frame)
## $ resp_sex : num [1:1819] 1 2 1 2 1 1 1 1 1 2 ...
## $ born : num [1:1819] 0 0 1 1 0 0 0 0 0 0 ...
## $ born_head : num [1:1819] NA NA NA NA NA NA NA NA NA NA ...
## $ econstatus: num [1:1819] 1 1 1 3 3 2 1 2 1 7 ...
## $ hhmem_size: num [1:1819] 5 1 3 4 5 1 3 3 2 2 ...
## $ fhhconsume: num [1:1819] 1 2 3 3 2 1 3 2 2 3 ...
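One caveat with as.numeric(): any string that cannot be parsed as a number becomes NA (with a warning), so it can be worth checking how many values were lost. The sketch below shows one way to do that in base R. The `raw` vector is a made-up example, not data from the survey.

```r
# Made-up character vector with one unparseable entry and one true NA
raw <- c("1", "2", "n/a", NA, "3")

# Convert, silencing the "NAs introduced by coercion" warning
converted <- suppressWarnings(as.numeric(raw))

# Entries that were present before conversion but are NA afterwards
lost_in_conversion <- !is.na(raw) & is.na(converted)

print(raw[lost_in_conversion]) # the offending original values
print(sum(lost_in_conversion)) # how many values were lost
```

If the count is zero, every NA in the converted column was already missing in the source data.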
I have one more thing to do: convert a categorical column representing respondent gender to a boolean field representing whether or not the respondent is female.
# Convert resp_sex categorical responses to binary
hh_vereda$resp_f <- ifelse(
  hh_vereda$resp_sex == 2,
  1, 0
)
# Drop resp_sex
hh_vereda <- hh_vereda %>%
  select(resp_f, born, born_head, econstatus, hhmem_size, fhhconsume)
stargazer table
This is the final step, and at the end of it you’ll have a .tex file with a nicely formatted LaTeX table of summary statistics.
My current dataset includes column titles, like born and fhhconsume, whose meanings might be unclear even to someone who has worked closely on the project. Let’s choose better names for our columns.
# Define a named character vector for renaming columns: "New Name" = "old_name"
renaming_list <- c(
  "Respondent Female" = "resp_f",
  "Born Locally" = "born",
  "Head of HH Born Locally" = "born_head",
  "Economic Status" = "econstatus",
  "HH Size" = "hhmem_size",
  "Forest Covers Basic Needs" = "fhhconsume"
)
# Create a new dataframe with renamed columns
stargazer_prep <- hh_vereda %>%
  data.frame() %>% # Convert the tibble to a plain data.frame so stargazer formats it correctly later
  rename(all_of(renaming_list))
NB: You could do this in the covariate.labels= argument in stargazer(), but I like doing it this way because the paired-list format makes sure that I won’t mistakenly get the covariate labels out of order.
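For comparison, here is a rough sketch of the covariate.labels= route mentioned above. It uses a made-up two-column `toy` data.frame rather than the survey data, and it assumes the labels are applied to the columns by position.

```r
library(stargazer)

# Toy data.frame standing in for hh_vereda (values are made up)
toy <- data.frame(
  resp_f     = c(1, 0, 1, 0),
  hhmem_size = c(5, 1, 3, 4)
)

# Labels are supplied as a plain vector, so their order has to match
# the column order of the data.frame
labelled <- stargazer(
  toy,
  type = "text",
  covariate.labels = c("Respondent Female", "HH Size")
)
```

Because nothing ties each label to a column name, reordering the columns without reordering the labels would silently mislabel the rows, which is the risk the named-vector approach avoids.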
stargazer table
Time to make the table! Simply call the stargazer() function with a data.frame containing only the columns you’re interested in as the first argument. Use the out= argument to define the file path for your .tex file.
stargazer(
  stargazer_prep,
  out = "colombia_summary.tex",
  title = "Table 1: Summary Statistics from Vereda Areas"
)
##
## % Table created by stargazer v.5.2.3 by Marek Hlavac, Social Policy Institute. E-mail: marek.hlavac at gmail.com
## % Date and time: Wed, Jul 12, 2023 - 5:48:11 PM
## \begin{table}[!htbp] \centering
## \caption{Table 1: Summary Statistics from Vereda Areas}
## \label{}
## \begin{tabular}{@{\extracolsep{5pt}}lccccc}
## \\[-1.8ex]\hline
## \hline \\[-1.8ex]
## Statistic & \multicolumn{1}{c}{N} & \multicolumn{1}{c}{Mean} & \multicolumn{1}{c}{St. Dev.} & \multicolumn{1}{c}{Min} & \multicolumn{1}{c}{Max} \\
## \hline \\[-1.8ex]
## Respondent Female & 1,798 & 0.364 & 0.481 & 0 & 1 \\
## Born Locally & 1,798 & 0.259 & 0.438 & 0 & 1 \\
## Head of HH Born Locally & 339 & 0.251 & 0.434 & 0 & 1 \\
## Economic Status & 1,798 & 2.009 & 1.384 & 1 & 10 \\
## HH Size & 1,798 & 3.314 & 1.728 & 1 & 11 \\
## Forest Covers Basic Needs & 1,796 & 1.981 & 0.795 & 1 & 3 \\
## \hline \\[-1.8ex]
## \end{tabular}
## \end{table}
To see if your code worked, paste the generated LaTeX into an Overleaf document or an online renderer like quicklatex.com.
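If you would rather check the table locally, one option is to have R write a minimal wrapper document that pulls in the .tex file. This is only a sketch: `check_table.tex` is a hypothetical file name, and the compile step is commented out because it assumes a working LaTeX installation.

```r
# File written by stargazer's out= argument, and a hypothetical wrapper name
table_file   <- "colombia_summary.tex"
wrapper_file <- "check_table.tex"

# Smallest document that can host the table
wrapper_lines <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  paste0("\\input{", table_file, "}"),
  "\\end{document}"
)
writeLines(wrapper_lines, wrapper_file)

# Compile locally if a LaTeX installation is available, e.g.:
# tools::texi2pdf(wrapper_file, clean = TRUE)
```

The wrapper and the table file need to sit in the same directory for the \input{} line to resolve.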
A bit more about the arguments passed to the stargazer() function:
out=: This argument defines the file that the function’s output is written to. Exclude it if you only want the output displayed in the console or below your Rmd cell, like I’ll do below.
title=: This argument can be left NULL, but even in that case a \caption{} line will be included in your LaTeX table, so keep in mind you may need to edit the table’s code directly if this clashes with your document’s table labeling system.
type=: You aren’t limited to LaTeX tables! Specifying "html" creates a table that looks nicely formatted in a web browser, and "text" creates a nicely formatted ASCII table that looks better in an R Markdown notebook.
digits=: Specifies the number of digits after the decimal point that the statistics are rounded to.
median=: Pass TRUE to this argument if you would like the median to be included in your summary statistics.
flip=: When set to TRUE, flips the table so that its rows and columns are swapped.
summary=: If set to FALSE, the stargazer() function will not create a summary statistics table. It will simply display the data.frame you have passed to the function in your specified output format.
summary.stat=: Specifies which statistics you would like to include in your summary statistics table. This argument takes a character vector of statistic codes, such as "n", "mean", "sd", "min", and "max".
Let’s look at some of these in an example:
stargazer(
  stargazer_prep,
  title = "Table 1: Summary Statistics from Vereda Areas",
  type = "text",
  digits = 1,
  median = TRUE
)
##
## Table 1: Summary Statistics from Vereda Areas
## ============================================================
## Statistic N Mean St. Dev. Min Median Max
## ------------------------------------------------------------
## Respondent Female 1,798 0.4 0.5 0 0 1
## Born Locally 1,798 0.3 0.4 0 0 1
## Head of HH Born Locally 339 0.3 0.4 0 0 1
## Economic Status 1,798 2.0 1.4 1 2 10
## HH Size 1,798 3.3 1.7 1 3 11
## Forest Covers Basic Needs 1,796 2.0 0.8 1 2 3
## ------------------------------------------------------------
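The example above leaves out summary.stat= and flip=. A rough sketch of both follows, again on a made-up `toy` data.frame instead of the survey data. The column names and values are purely illustrative.

```r
library(stargazer)

# Toy data.frame (values are made up)
toy <- data.frame(
  hh_size     = c(5, 1, 3, 4, 2),
  econ_status = c(1, 1, 3, 2, 7)
)

# Report only N, mean, median, and standard deviation
custom_stats <- stargazer(
  toy,
  type = "text",
  summary.stat = c("n", "mean", "median", "sd")
)

# Same default statistics, but with rows and columns swapped
flipped <- stargazer(
  toy,
  type = "text",
  flip = TRUE
)
```

A flipped table can be easier to read when you have only a few variables but want many statistics for each.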
I use stargazer for summary statistics, regression results, and other nice-looking table exports. For a fuller tour, see the overview of stargazer’s features from Northeastern University. I like it because it’s tailored to the needs of quantitative social scientists in academia: us!