# Core data + modelling
library(tidyverse)
library(lmtest)
library(sandwich)
# Table packages
library(stargazer)
library(texreg)
library(huxtable)
library(modelsummary) # <-- new package added
# Data
tut5 <- read_csv("tut5.csv")
# Keep the variables used in the tutorial regression
dat <- tut5 %>%
select(countryname, gdppc, healthpc) %>%
filter(!is.na(gdppc), !is.na(healthpc))
# Model
m_ols <- lm(healthpc ~ gdppc, data = dat)
# Robust covariance + robust tests
V_hc1 <- vcovHC(m_ols, type = "HC1")
ct_hc1 <- coeftest(m_ols, vcov = V_hc1)
se_hc1 <- sqrt(diag(V_hc1))
p_hc1 <- ct_hc1[, 4]0.1 Package Comparison Overview
| Package | Descriptive Stats | Regression Tables | Robust / HC SE | Custom GOF Control | Formatting Flexibility | LaTeX/HTML/Word | Best Use Case |
|---|---|---|---|---|---|---|---|
| modelsummary | Excellent | Excellent | Excellent | Very Good | High | Excellent | Default modern workflow |
| texreg | Limited | Excellent | Excellent | Excellent | Moderate | Excellent | Manual control of regression output |
| huxtable | Very Good | Very Good | Very Good | High | Excellent | Excellent | Highly polished tables |
| stargazer | Good | Good | Good | Limited | Low–Moderate | Good (LaTeX) | Quick textbook-style tables |
| xtable | Basic | Basic | Manual | Limited | Low | LaTeX only | Simple LaTeX conversion |
Colour Guide:
- 🟢 Green = Strongest capability
- 🔵 Light Blue = Very strong
- 🟡 Yellow = Recommended default workflow
This table summarises the relative strengths of each package. In practice, the choice depends on the balance between ease of use, formatting control, and econometric transparency required.
0.2 Summary (what to use when)
This note compares stargazer, texreg, huxtable, and modelsummary for two common tasks:
- Descriptive statistics tables
- Fastest “one-liner”:
stargazer(..., summary = TRUE) - Most flexible formatting (HTML/Word/PDF):
huxtable(and it plays nicely with Quarto) - Most modern + consistent workflow:
modelsummary::datasummary()(excellent defaults, flexible) - Not really intended for this:
texreg(possible via a workaround, but not its strength)
- Fastest “one-liner”:
- Regression tables (OLS + HC robust SE, and joint tests)
- Econometrics-friendly, minimal code:
texreg(great for multi-model tables + custom GOF rows) - Classic / familiar:
stargazer(robust SEs are possible, but overall/robust F-tests require extra work) - Best control over styling/output formats:
huxtable(especially viahuxreg()) - Best “one interface for everything” (including robust/clustered SE):
modelsummary::modelsummary()
- Econometrics-friendly, minimal code:
In the examples below we use the Tutorial 5 dataset (tut5.csv) and estimate:
\[ \text{healthpc}_i = \beta_0 + \beta_1\,\text{gdppc}_i + u_i \]
0.3 Setup
1 Descriptive statistics tables
We illustrate the same basic descriptive-statistics table using each package.
We will summarise healthpc and gdppc.
vars_for_desc <- dat %>% select(healthpc, gdppc)1.1 Descriptive stats with stargazer
stargazer() can produce a quick “journal-style” summary table with almost no code.
# ensure it's a plain data.frame (not tibble)
tut5_df <- as.data.frame(tut5)
# (optional but recommended) keep numeric columns only for descriptives
tut5_num <- tut5_df[sapply(tut5_df, is.numeric)]
stargazer(
tut5_num[c("healthpc","gdppc")],
type = "html",
digits = 3,
title = "Descriptive statistics (stargazer)"
)| Statistic | N | Mean | St. Dev. | Min | Max |
| healthpc | 187 | 1,139.383 | 1,850.334 | 18.521 | 10,515.300 |
| gdppc | 208 | 18,731.120 | 27,840.020 | 238.784 | 185,978.600 |
Pros: extremely quick; familiar layout.
Cons: limited styling control; not actively developed; can feel dated in HTML.
1.2 Descriptive stats with huxtable
With huxtable, you typically compute the stats (with dplyr or base R) and then convert to a huxtable for styling/output.
desc_tbl <- vars_for_desc %>%
summarise(across(everything(),
list(
N = ~sum(!is.na(.)),
Mean = ~mean(., na.rm = TRUE),
SD = ~sd(., na.rm = TRUE),
Min = ~min(., na.rm = TRUE),
Max = ~max(., na.rm = TRUE)
))) %>%
pivot_longer(everything(),
names_to = c("Variable", ".value"),
names_sep = "_")
ht <- as_hux(desc_tbl)
ht <- set_caption(ht, "Descriptive statistics (huxtable)")
ht| Variable | N | Mean | SD | Min | Max |
|---|---|---|---|---|---|
| healthpc | 184 | 1.16e+03 | 1.86e+03 | 18.5 | 1.05e+04 |
| gdppc | 184 | 1.56e+04 | 2.36e+04 | 239 | 1.86e+05 |
Pros: best styling control; great for Quarto (HTML/PDF/Word).
Cons: a bit more code than stargazer() for a simple table.
1.3 Descriptive stats with modelsummary
modelsummary has a purpose-built descriptive-statistics interface: datasummary() / datasummary_skim().
datasummary_skim(vars_for_desc,
fmt = 3,
title = "Descriptive statistics (modelsummary)")| Unique | Missing Pct. | Mean | SD | Min | Median | Max | Histogram | |
|---|---|---|---|---|---|---|---|---|
| healthpc | 184 | 0 | 1155.162 | 1861.063 | 18.521 | 389.957 | 10515.300 | |
| gdppc | 184 | 0 | 15608.773 | 23637.569 | 238.784 | 6220.876 | 185978.600 |
Pros: modern defaults; flexible; consistent workflow for both descriptives and regressions.
Cons: learning the datasummary() formula interface can take a moment (worth it).
1.4 Descriptive stats with texreg (workaround)
texreg is primarily for regression models. For descriptives, the most honest approach is: - use huxtable, modelsummary, or stargazer, and - reserve texreg for regressions.
That said, if you must stay inside texreg, you can create a custom texreg object. Below is a minimal “mean (sd)” table using createTexreg().
means <- sapply(vars_for_desc, mean, na.rm = TRUE)
sds <- sapply(vars_for_desc, sd, na.rm = TRUE)
tex_desc <- createTexreg(
coef.names = names(means),
coef = as.numeric(means),
se = as.numeric(sds),
pvalues = rep(NA_real_, length(means)),
gof.names = c("N"),
gof = c(nrow(vars_for_desc)),
gof.decimal = c(FALSE)
)
screenreg(list(tex_desc),
custom.model.names = "Descriptives",
digits = 3,
caption = "Descriptive statistics (texreg workaround): coef = mean, se = sd")
======================
Descriptives
----------------------
healthpc 1155.162
(1861.063)
gdppc 15608.773
(23637.569)
----------------------
N 184
======================
*** p < 0.001; ** p < 0.01; * p < 0.05
Pros: possible if you need everything to run through texreg.
Cons: not natural; you lose the richer descriptive layouts other packages provide.
2 Regression tables (OLS vs HC1)
We now report the same regression in two columns:
- Column 1: OLS SE (homoskedasticity-based)
- Column 2: HC1 robust SE (heteroskedasticity-consistent)
We also show the practical issue you identified: the “usual” model F-statistic vs a robust joint test (Wald F).
2.1 Regression output with stargazer
stargazer can accept a list of standard errors, but it will still print the usual model F-stat (from the lm object). A robust overall test needs extra work if you want it reported.
# estimate model **1** using OLS (ignoring any heteroskedasticity)
# OLS Model (Ignore heteroskedasticity)
# dependent variable: healthpc
# explanatory variable: gdppc
reg3 <- lm(healthpc ~ gdppc, data=tut5)
# Sample F statistic
wald_ols <- waldtest(reg3) # sample F statistics of overall significance
RSS_reg3 <- deviance(reg3) # residual sum of squares for reg3
resid_reg3 <- reg3$residuals # generate series for residuals for reg3
yhat_reg3 <- reg3$fitted.values # generate series for fitted values for reg3
demdf1 <- df.residual(reg3) # denominator df
fstat1 <- round(wald_ols$"F"[2], digits=4) # Sample value of F stat
pvalf1 <- round(wald_ols$'Pr(>F)'[2], digits=4) # p value of F test
numdf1 <- abs(wald_ols$"Df"[2]) # numerator df Now estimate model 2 using Huber-White standard errors (this is using the code given in tute5.R )
# Use sandwich package
cov_reg3 <- vcovHC(reg3, type = "HC1")
# print(cov_reg3) # print robust variance=covariance matrix
# coeftest(reg3, vcov=cov_reg3) # print estimates with robust std.errors
robust_se_reg3 <- sqrt(diag(cov_reg3)) # robust std errors as sq. root of main diagonals
# Adjusted F statistic (Robust standard errors)
wald_r <- waldtest(reg3, vcov = cov_reg3) # sample F statistic using the robust variance-cov matrix
# print(wald_r)
fstat2 <- round(wald_r$"F"[2], digits=4) # Sample value of F stat based on robust var-cov matrix
pvalf2 <- round(wald_r$'Pr(>F)'[2], digits=4) # p value of F test
numdf2 <- abs(wald_r$"Df"[2]) # numerator df Then, again using the code provided in tute5.R , report the results using stargazer
Still need to use omit.stat="f" in this and then include add.lines
Little modification to make the table more readable ?
f_C <- qf(p=.01, df1=1, df2=181, lower.tail=FALSE)
F_crit <- sprintf("F(%.0f,%.0f)= %.3f",
numdf1, demdf1,f_C)stargazer(reg3, reg3, type = "html",
dep.var.labels=c("Health Expenditure, per capita (current ($US)"),
covariate.labels=c("Intercept", "GDP, per capita (current ($US)"),
column.labels = c("OLS", "Robust (White)"),
se = list(NULL, robust_se_reg3, NULL),
omit.stat = "f",
title = "Figure 5: Model (4) Estimation Results",
add.lines = list(c("F Statistic", fstat1, fstat2),
c("F critical", F_crit, F_crit),
c("p-value", sprintf("%.5f",pvalf1),
sprintf("%.5f",pvalf2))),
digits=4, align=TRUE,
intercept.bottom=FALSE,
star.cutoffs = c(0.05, 0.01, 0.001))| Dependent variable: | ||
| Health Expenditure, per capita (current ($US) | ||
| OLS | Robust (White) | |
| (1) | (2) | |
| Intercept | 160.4437 | 160.4437 |
| (96.8937) | (191.1788) | |
| GDP, per capita (current ($US) | 0.0637*** | 0.0637*** |
| (0.0034) | (0.0162) | |
| F Statistic | 345.7792 | 15.5561 |
| F critical | F(1,182)= 6.777 | F(1,182)= 6.777 |
| p-value | 0.00000 | 0.00010 |
| Observations | 184 | 184 |
| R2 | 0.6552 | 0.6552 |
| Adjusted R2 | 0.6533 | 0.6533 |
| Residual Std. Error (df = 182) | 1,095.8730 | 1,095.8730 |
| Note: | p<0.05; p<0.01; p<0.001 | |
Pros: quick; familiar.
Cons: robust overall tests (Wald F) are not “native”; custom lines can get fiddly.
2.2 Regression output with texreg (recommended for this use-case)
This is the workflow we used earlier: keep both columns as lm objects and override the SE/p-values in the robust column, then manually insert the robust Wald F (and p-value).
# Classical (usual) F-stat and p-value
fs <- summary(m_ols)$fstatistic
F_lm <- unname(fs["value"])
df1 <- unname(fs["numdf"])
df2 <- unname(fs["dendf"])
p_lm <- pf(F_lm, df1, df2, lower.tail = FALSE)
# Robust (HC1) Wald F and p-value for overall significance
wald <- waldtest(m_ols, vcov = V_hc1, test = "F")
F_hc <- unname(wald[2, "F"])
p_hc_F <- unname(wald[2, "Pr(>F)"])
add_stars <- function(p) {
if (is.na(p)) ""
else if (p < 0.001) "***"
else if (p < 0.01) "**"
else if (p < 0.05) "*"
else if (p < 0.1) "."
else ""
}
F_label <- sprintf("F(%d, %d)", df1, df2)
F_lm_str <- sprintf("%.3f%s", F_lm, add_stars(p_lm))
F_hc_str <- sprintf("%.3f%s", F_hc, add_stars(p_hc_F))
p_lm_str <- sprintf("(%.3f)", p_lm)
p_hc_str <- sprintf("(%.3f)", p_hc_F)
screenreg(
list(m_ols, m_ols),
custom.model.names = c("OLS", "HC1"),
override.se = list(NULL, se_hc1),
override.pvalues = list(NULL, p_hc1),
include.fstat = FALSE,
custom.gof.rows = list(
F_label = c(F_lm_str, F_hc_str),
"(p-value)" = c(p_lm_str, p_hc_str)
),
digits = 3,
caption = "Regression results (texreg): OLS vs HC1, with usual F and robust Wald F"
)Warning in override(models = models, override.coef = override.coef, override.se
= override.se, : Number of SEs provided does not match number of coefficients
in model 1. Using default SEs.
Warning in override(models = models, override.coef = override.coef, override.se
= override.se, : Number of p-values provided does not match number of
coefficients in model 1. Using default p-values.
======================================
OLS HC1
--------------------------------------
(Intercept) 160.444 160.444
(96.894) (191.179)
gdppc 0.064 *** 0.064 ***
(0.003) (0.016)
--------------------------------------
F_label 345.779*** 15.556***
(p-value) (0.000) (0.000)
R^2 0.655 0.655
Adj. R^2 0.653 0.653
Num. obs. 184 184
======================================
*** p < 0.001; ** p < 0.01; * p < 0.05
Pros: very econometrics-friendly; easy robust column; easy custom GOF rows.
Cons: descriptives are awkward; very bespoke styling is less convenient than huxtable.
2.3 Regression output with huxtable
huxreg() is a nice, readable interface. You can supply robust SE/p-values by constructing a second “robust” column (similar conceptually to the texreg override approach).
library(huxtable)
library(sandwich)
library(lmtest)
# --- Fit model (adjust to your variables/data) ---
# Example:
# dat <- read.csv("tut5.csv")
# m_ols <- lm(y ~ gdppc, data = dat)
# Robust covariance + robust SEs
cov1 <- vcovHC(m_ols, type = "HC1")
se_hc1 <- sqrt(diag(cov1))
se_hc1 <- se_hc1[names(coef(m_ols))] # align with coef order
# Build base huxtable (two columns = same model)
hux_tbl <- huxreg(
"OLS" = m_ols,
"HC1" = m_ols,
coefs = c("Intercept" = "(Intercept)", "GDP per capita" = "gdppc")
)
# --- Replace SE strings in HC1 column (col 3) ---
mat <- huxtable::contents(hux_tbl)
to_chr <- function(x) vapply(x, function(z) paste(z, collapse = ""), character(1))
mat_chr <- apply(mat, 2, to_chr)
# SE rows are rows where OLS column begins with "("
se_rows <- which(grepl("^\\(", mat_chr[, 2]))
coef_rows <- se_rows - 1
# Map displayed term labels to underlying coef names
lookup <- c("Intercept" = "(Intercept)", "GDP per capita" = "gdppc")
display_terms <- mat_chr[coef_rows, 1]
coef_names_in_table <- unname(lookup[display_terms])
# Robust SEs in the same order as terms displayed
se_hc1_in_table_order <- se_hc1[coef_names_in_table]
# Overwrite HC1 SE cells
mat[se_rows, 3] <- sprintf("(%.3f)", se_hc1_in_table_order)
huxtable::contents(hux_tbl) <- mat
# --- Compute overall F stats (OLS and robust Wald F) ---
fs <- summary(m_ols)$fstatistic
F_lm <- unname(fs["value"])
df1 <- unname(fs["numdf"])
df2 <- unname(fs["dendf"])
p_lm <- pf(F_lm, df1, df2, lower.tail = FALSE)
wald <- waldtest(m_ols, vcov = cov1, test = "F")
F_hc <- unname(wald[2, "F"])
p_hc_F <- unname(wald[2, "Pr(>F)"])
add_stars <- function(p) {
if (p < 0.001) "***"
else if (p < 0.01) "**"
else if (p < 0.05) "*"
else if (p < 0.1) "."
else ""
}
F_label <- sprintf("F(%d, %d)", df1, df2)
F_lm_str <- sprintf("%.3f%s", F_lm, add_stars(p_lm))
F_hc_str <- sprintf("%.3f%s", F_hc, add_stars(p_hc_F))
p_lm_str <- sprintf("(%.3f)", p_lm)
p_hc_str <- sprintf("(%.3f)", p_hc_F)
# --- Append F rows to table (safest via character matrix -> as_hux) ---
mat2 <- huxtable::contents(hux_tbl)
mat2_chr <- apply(mat2, 2, to_chr)
mat2_chr <- rbind(
mat2_chr,
c(F_label, F_lm_str, F_hc_str),
c("(p-value)", p_lm_str, p_hc_str)
)
hux_final <- as_hux(mat2_chr)
# Simple styling
bold(hux_final)[1, ] <- TRUE
align(hux_final) <- "center"
align(hux_final)[, 1] <- "left"
set_caption(hux_final, "Regression results (huxtable): OLS vs HC1 (robust SE + overall F-tests)")| OLS | HC1 | |
| Intercept | 160.443652225566 | 160.443652225566 |
| (96.8937191911367) | (191.179) | |
| GDP per capita | 0.0637281798232077 *** | 0.0637281798232077 *** |
| (0.00342714168873277) | (0.016) | |
| N | 184 | 184 |
| R2 | 0.655158846931442 | 0.655158846931442 |
| logLik | -1547.95170959669 | -1547.95170959669 |
| AIC | 3101.90341919338 | 3101.90341919338 |
| *** p < 0.001; ** p < 0.01; * p < 0.05. | *** p < 0.001; ** p < 0.01; * p < 0.05. | *** p < 0.001; ** p < 0.01; * p < 0.05. |
| F(1, 182) | 345.779*** | 15.556*** |
| (p-value) | (0.000) | (0.000) |
hux_final| OLS | HC1 | |
| Intercept | 160.443652225566 | 160.443652225566 |
| (96.8937191911367) | (191.179) | |
| GDP per capita | 0.0637281798232077 *** | 0.0637281798232077 *** |
| (0.00342714168873277) | (0.016) | |
| N | 184 | 184 |
| R2 | 0.655158846931442 | 0.655158846931442 |
| logLik | -1547.95170959669 | -1547.95170959669 |
| AIC | 3101.90341919338 | 3101.90341919338 |
| *** p < 0.001; ** p < 0.01; * p < 0.05. | *** p < 0.001; ** p < 0.01; * p < 0.05. | *** p < 0.001; ** p < 0.01; * p < 0.05. |
| F(1, 182) | 345.779*** | 15.556*** |
| (p-value) | (0.000) | (0.000) |
Pros: excellent control over styling and output formats.
Cons: more coding for desired output; robust inference is often easiest via modelsummary rather than manual cell edits.
2.4 Regression output with modelsummary (cleanest “modern” approach)
modelsummary() makes robust SEs straightforward and consistent. You can provide different vcov specs per model column.
models <- list("OLS" = m_ols, "HC1" = m_ols)
modelsummary(
models,
vcov = list(NULL, "HC1"),
statistic = "({std.error})",
stars = TRUE,
fmt = 3,
title = "Regression results (modelsummary): OLS vs HC1"
)| OLS | HC1 | |
|---|---|---|
| + p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001 | ||
| (Intercept) | 160.444+ | 160.444 |
| (96.894) | (191.179) | |
| gdppc | 0.064*** | 0.064*** |
| (0.003) | (0.016) | |
| Num.Obs. | 184 | 184 |
| R2 | 0.655 | 0.655 |
| R2 Adj. | 0.653 | 0.653 |
| AIC | 3101.9 | 3101.9 |
| BIC | 3111.5 | 3111.5 |
| Log.Lik. | -1547.952 | -1547.952 |
| F | 345.779 | 15.556 |
| RMSE | 1089.90 | 1089.90 |
| Std.Errors | HC1 | |
2.4.1 Adding an overall (robust) Wald F line
modelsummary is happiest when you add custom rows via add_rows. We’ll compute the same “usual F” and “robust Wald F” and append them.
extra_rows <- data.frame(
term = c(F_label, "(p-value)"),
OLS = c(F_lm_str, p_lm_str),
HC1 = c(F_hc_str, p_hc_str),
check.names = FALSE
)
modelsummary(
models,
vcov = list(NULL, "HC1"),
statistic = "({std.error})",
stars = TRUE,
fmt = 3,
title = "Regression results (modelsummary): with usual F and robust Wald F",
add_rows = extra_rows
)| OLS | HC1 | |
|---|---|---|
| + p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001 | ||
| (Intercept) | 160.444+ | 160.444 |
| (96.894) | (191.179) | |
| gdppc | 0.064*** | 0.064*** |
| (0.003) | (0.016) | |
| Num.Obs. | 184 | 184 |
| R2 | 0.655 | 0.655 |
| R2 Adj. | 0.653 | 0.653 |
| AIC | 3101.9 | 3101.9 |
| BIC | 3111.5 | 3111.5 |
| Log.Lik. | -1547.952 | -1547.952 |
| F | 345.779 | 15.556 |
| RMSE | 1089.90 | 1089.90 |
| Std.Errors | HC1 | |
| F(1, 182) | 345.779*** | 15.556*** |
| (p-value) | (0.000) | (0.000) |
Pros: arguably the cleanest interface today; robust/clustered SE easy; works great in Quarto.
Cons: learning the full set of options (especially for custom layouts) can take a bit of time.
2.5 Practical notes
- The usual model F-stat in
lmoutput relies on homoskedasticity-based variance assumptions. - The robust “overall significance” test is a Wald/F test computed using a robust covariance matrix (HC1 here).
- If you show both in the same table, it’s good practice to label the robust one as Wald F (HC1) or similar in notes/caption.
2.6 References
stargazercheatsheet: https://www.jakeruss.com/cheatsheets/stargazer/#TOCtexreg: https://jrnold.github.io/intro-methods-notes/formatting-tables.htmlhuxtable: https://hughjonesd.github.io/huxtable/modelsummary: https://modelsummary.com/vignettes/modelsummary.html
2.7 Recommended Workflow
2.7.1 Practical Recommendation
modelsummary: Best all-round choice. Clean syntax, handles OLS, HC, clustered SEs, descriptive statistics, and multiple output formats (LaTeX, HTML, Word). Recommended default for most teaching and applied work.
texreg: Excellent when you want full manual control over goodness-of-fit rows (e.g., inserting classical and robust F-statistics with stars). Very transparent for econometrics teaching.
huxtable: Best when you want maximum formatting flexibility and polished output. Especially useful in Quarto workflows and when styling tables carefully.
stargazer: Still useful for quick textbook-style tables, but development has slowed and customization is limited compared to newer packages.
2.7.2 Suggested Default Strategy
- Use modelsummary for descriptive statistics and regression tables.
- Switch to texreg when you need explicit control over classical vs. robust inference presentation.