Marginal Effects for Logged Outcomes

Sometimes, we want to test treatment effects for one or more treatments against a control group, but the outcome in our model is strongly right skewed. We could log-transform the outcome to ensure normally distributed residuals, but what in the world do we do with a beta coefficient for a logged outcome? Fortunately, the margins package in R is well equipped to produce marginal effects for a range of models, including the formal specification of a log-OLS model, known as a gamma model.

Packages

First, let’s briefly install and activate the following packages:

# if you haven't already, please uncomment and install these packages.
# install.packages(c("tidyverse", "margins")) 
library(tidyverse) # for data wrangling
library(margins) # for marginal effects

Example Dataset

As our example dataset, we will use the mtcars dataset, a popular simple dataset of 32 automobiles’ performance. We’ll test the treatment effect of an automatic versus manual transmission on miles per gallon.

First, let’s load in our dataset. (If you’ve got your own dataset, you’ll want to read it in using read_csv from the readr package, pre-loaded via the tidyverse package.) We’ll save our dataset as dat.

dat <- mtcars %>%
  # Let's grab just these variables
  select(mpg, am, hp) %>%
  # Turn this variable into a factor, ordered via the levels command, with the following labels
  mutate(am = factor(am, levels = c(0, 1), labels = c("automatic", "manual")))

Let’s take a peak at the first few rows.

  • Each row is a car model.

  • mpg shows the car’s miles per gallon, our outcome.

  • am shows transmission type (0 now is “automatic”, 1 now is “manual”), our categorical variable.

  • hp shows horsepower, a relevant control variable.

  • Note: in am, "automatic" was put first in the factor() function’s levels ( c(0, 1) ) above, so now it is our baseline category.

dat %>% head()
                   mpg        am  hp
Mazda RX4         21.0    manual 110
Mazda RX4 Wag     21.0    manual 110
Datsun 710        22.8    manual  93
Hornet 4 Drive    21.4 automatic 110
Hornet Sportabout 18.7 automatic 175
Valiant           18.1 automatic 105

Distribution

When we check the distribution of mpg, we see that it is slightly right-skewed. Let’s log-transform to correct that, and then make a log-OLS model (and later a gamma model).

Fairly right skewed, right?

dat$mpg %>% hist()

In contrast, the logged version follows a somewhat more normal bell curve, which will lead to a better fitting OLS model.

dat$mpg %>% log() %>% hist()

Model

So, let’s make a basic ordinary least squares model, transforming the outcome variable as so:

m <- lm(formula = log(mpg) ~ am + hp, data = dat)

We can view our model equation here.

m

Call:
lm(formula = log(mpg) ~ am + hp, data = dat)

Coefficients:
(Intercept)     ammanual           hp  
   3.296225     0.246651    -0.002992  

This shows that, holding all else constant, as transmission type changes from "automatic" to "manual", the log of mpg increases by 0.2466513.

We can exponentiate that beta coefficient to get it back in the original units, the projected increase in y, but you can’t back-transform standard errors and confidence intervals.

Interestingly, this produces the exact same coefficients as a gamma model with a log-link function.

g <- glm(formula = mpg ~ am + hp, family = Gamma(link = "log"), data = dat)
g

Call:  glm(formula = mpg ~ am + hp, family = Gamma(link = "log"), data = dat)

Coefficients:
(Intercept)     ammanual           hp  
   3.300476     0.247044    -0.002961  

Degrees of Freedom: 31 Total (i.e. Null);  29 Residual
Null Deviance:      2.735 
Residual Deviance: 0.576    AIC: 159.6

See? Coefficients are exactly the same. But, fortunately, when you run predictions, the gamma model outputs in the original units of the outcome, not logged units. Super handy. Those predictions can be converted into marginal effects.

Marginal Effects with a Gamma Model

Rather than exponentiating beta coefficients, I recommend that you adopt the model that best fits your outcome variable’s distribution, and then calculate marginal effects instead using the margins package.

# Let's calculate the marginal effect at the means (AME), with appropriate standard errors, p-values, and lower and upper confidence intervals, right here.
margins(g, variables = "am", type = "response") %>% summary()
   factor    AME     SE      z      p  lower  upper
 ammanual 5.0178 1.0640 4.7157 0.0000 2.9323 7.1033

Super! Wow, that was so simple! This tells us that for an otherwise average car in the dataset, that car tends to get 5.0178 (95% CI 2.93 - 7.10) more miles per gallon if it had a manual transmission than if it had an automatic transmission. Plus, this difference is super duper statistically significant.

On the other hand, maybe you don’t want to know about an average observation, but want to know how the marginal effect of transmission type varies at higher or lower levels of other variables. If so, margins can do that using the at = command. For example, I input 3 conditions for hp below, producing a matrix of 3 marginal effects for ammanual, with details reported below.

margins(g, variables = "am", type = "response", 
        at = list(hp = c(100, 200, 300)  )) %>% 
  summary()
   factor       hp    AME     SE      z      p  lower  upper
 ammanual 100.0000 5.6534 1.1851 4.7705 0.0000 3.3307 7.9761
 ammanual 200.0000 4.2046 0.9197 4.5718 0.0000 2.4020 6.0071
 ammanual 300.0000 3.1270 0.7300 4.2839 0.0000 1.6964 4.5577

In Brief

# You could model a right-skewed rate with lm, with a logged outcome...
m <- lm(formula = log(mpg) ~ am + hp, data = dat)
# But if you model it as a gamma model, which produces the same estimates anyway,
# you can predict in original units of the outcome, due to its log-link function.
g <- glm(formula = mpg ~ am + hp, family = Gamma(link = "log"), data = dat)
# You can quickly calculate marginal effects at the means, like so:
myeffects <- margins(g, variables = "am", type = "response") %>% summary()
# Or, you can calculate marginal effects, holding observations constant at specific levels, like so:
myeffects2 <- margins(g, variables = "am", type = "response", 
        at = list(hp = c(100, 200, 300)  )) %>% summary()

Hope this helps!