Marginal Effects for Logged Outcomes
Sometimes, we want to test treatment effects for one or more
treatments against a control group, but the outcome in our model is
strongly right-skewed. We could log-transform the outcome to produce
more normally distributed residuals, but what in the world do we do with a
beta coefficient for a logged outcome? Fortunately, the
margins package in R is well equipped to produce marginal
effects for a range of models, including a close cousin of the
log-OLS model: the gamma model with a log link.
Packages
First, let’s briefly install and activate the following packages:
# if you haven't already, please uncomment and install these packages.
# install.packages(c("tidyverse", "margins"))
library(tidyverse) # for data wrangling
library(margins) # for marginal effects
Example Dataset
As our example dataset, we will use the mtcars dataset,
a popular simple dataset of 32 automobiles’ performance. We’ll test the
treatment effect of an automatic versus manual transmission on miles per
gallon.
First, let’s load in our dataset. (If you’ve got your own dataset,
you’ll want to read it in using read_csv from the
readr package, pre-loaded via the tidyverse
package.) We’ll save our dataset as dat.
dat <- mtcars %>%
# Let's grab just these variables
select(mpg, am, hp) %>%
# Turn this variable into a factor, ordered via the levels command, with the following labels
mutate(am = factor(am, levels = c(0, 1), labels = c("automatic", "manual")))
Let's take a peek at the first few rows.
Each row is a car model.
- mpg shows the car's miles per gallon, our outcome.
- am shows transmission type (0 is now "automatic", 1 is now "manual"), our categorical variable.
- hp shows horsepower, a relevant control variable.
Note: in am, "automatic" was put first in the factor() function's levels (c(0, 1)) above, so it is now our baseline category.
dat %>% head()
                   mpg        am  hp
Mazda RX4 21.0 manual 110
Mazda RX4 Wag 21.0 manual 110
Datsun 710 22.8 manual 93
Hornet 4 Drive 21.4 automatic 110
Hornet Sportabout 18.7 automatic 175
Valiant 18.1 automatic 105
Distribution
When we check the distribution of mpg, we see that it is
right-skewed. Let's log-transform it to correct that, and then
fit a log-OLS model (and later a gamma model).
dat$mpg %>% hist()
Fairly right skewed, right?
In contrast, the logged version follows a somewhat more normal bell
curve, which will lead to a better fitting OLS model.
dat$mpg %>% log() %>% hist()
Model
So, let's fit a basic ordinary least squares model, log-transforming the outcome variable like so:
m <- lm(formula = log(mpg) ~ am + hp, data = dat)
We can view our model equation here.
m
Call:
lm(formula = log(mpg) ~ am + hp, data = dat)
Coefficients:
(Intercept) ammanual hp
3.296225 0.246651 -0.002992
This shows that, holding all else constant, as transmission type
changes from "automatic" to "manual", the log
of mpg increases by about 0.2467.
We can exponentiate that beta coefficient to interpret it as a multiplicative change in the original units (exp(0.2467) ≈ 1.28, so roughly a 28% increase in mpg), but the standard errors and confidence intervals don't back-transform so simply.
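For reference, here is a quick sketch of that back-transformation; it only exponentiates the point estimates, not the standard errors.
# Exponentiating the coefficients expresses them as multiplicative effects on mpg
# (e.g., exp() of the ammanual coefficient is about 1.28, i.e., ~28% higher mpg).
exp(coef(m))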
Interestingly, this produces nearly identical coefficients to a gamma model with a log-link function.
g <- glm(formula = mpg ~ am + hp, family = Gamma(link = "log"), data = dat)
g
Call: glm(formula = mpg ~ am + hp, family = Gamma(link = "log"), data = dat)
Coefficients:
(Intercept) ammanual hp
3.300476 0.247044 -0.002961
Degrees of Freedom: 31 Total (i.e. Null); 29 Residual
Null Deviance: 2.735
Residual Deviance: 0.576 AIC: 159.6
See? The coefficients are nearly identical. But, fortunately, when you run predictions, the gamma model outputs them in the original units of the outcome, not logged units. Super handy. Those predictions can be converted into marginal effects.
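Here's a minimal sketch of that difference in prediction scales; the exact numbers depend on your data, but the scales differ as described.
# Fitted values from the log-OLS model are on the log scale of mpg...
predict(m) %>% head()
# ...while the gamma model's response-scale predictions are in miles per gallon.
predict(g, type = "response") %>% head()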
Marginal Effects with a Gamma Model
Rather than exponentiating beta coefficients, I recommend that you
adopt the model that best fits your outcome variable's distribution, and
then calculate marginal effects using the margins
package.
# Let's calculate the average marginal effect (AME), with appropriate standard errors, p-values, and lower and upper confidence intervals, right here.
margins(g, variables = "am", type = "response") %>% summary()
   factor    AME     SE      z      p  lower  upper
ammanual 5.0178 1.0640 4.7157 0.0000 2.9323 7.1033
Super! That was simple. This tells us that, averaged across the cars in the dataset, a car is predicted to get 5.0178 (95% CI 2.93 to 7.10) more miles per gallon with a manual transmission than with an automatic transmission. Plus, this difference is clearly statistically significant.
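If you'd like to see where that number comes from, here is a rough sanity check of my own (not part of the margins workflow): for a factor like am, the AME is essentially the average difference in predicted mpg when every car is set to "manual" versus "automatic".
# Predict mpg for every car as if it were manual, then as if it were automatic,
# and average the differences; this should land close to the AME reported above.
d_manual    <- dat %>% mutate(am = factor("manual", levels = levels(dat$am)))
d_automatic <- dat %>% mutate(am = factor("automatic", levels = levels(dat$am)))
mean(predict(g, newdata = d_manual, type = "response") -
     predict(g, newdata = d_automatic, type = "response"))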
On the other hand, maybe you don’t want to know about an average
observation, but want to know how the marginal effect of transmission
type varies at higher or lower levels of other variables. If so,
margins can do that using the at = argument.
For example, I input 3 values of hp below, which produces a
table of 3 marginal effects for ammanual, with details
reported for each.
margins(g, variables = "am", type = "response",
        at = list(hp = c(100, 200, 300))) %>%
  summary()
   factor       hp    AME     SE      z      p  lower  upper
ammanual 100.0000 5.6534 1.1851 4.7705 0.0000 3.3307 7.9761
ammanual 200.0000 4.2046 0.9197 4.5718 0.0000 2.4020 6.0071
ammanual 300.0000 3.1270 0.7300 4.2839 0.0000 1.6964 4.5577
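If you'd like to visualize how the marginal effect shrinks as horsepower rises, the summary() output is just a data frame, so a quick ggplot2 sketch (ggplot2 comes with the tidyverse; the column names match the table above) will do.
# Plot the marginal effect of a manual transmission at each hp value,
# with the lower and upper confidence bounds from the summary table.
margins(g, variables = "am", type = "response",
        at = list(hp = c(100, 200, 300))) %>%
  summary() %>%
  ggplot(aes(x = hp, y = AME)) +
  geom_pointrange(aes(ymin = lower, ymax = upper)) +
  labs(x = "Horsepower", y = "Marginal effect of manual transmission (mpg)")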
In Brief
# You could model a right-skewed rate with lm, with a logged outcome...
m <- lm(formula = log(mpg) ~ am + hp, data = dat)
# But if you model it as a gamma model, which produces nearly the same estimates anyway,
# you can predict in original units of the outcome, due to its log-link function.
g <- glm(formula = mpg ~ am + hp, family = Gamma(link = "log"), data = dat)
# You can quickly calculate average marginal effects, like so:
myeffects <- margins(g, variables = "am", type = "response") %>% summary()
# Or, you can calculate marginal effects, holding observations constant at specific levels, like so:
myeffects2 <- margins(g, variables = "am", type = "response",
                       at = list(hp = c(100, 200, 300))) %>% summary()
Hope this helps!