As Bailey mentions, quadratic terms (that is, models like \(Y= \beta_0 + \beta_1 X + \beta_2 X^2 + \epsilon\)) are special cases of polynomial models (that is, models of the form \(Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \dots + \beta_n X^n + \epsilon\)). We can use OLS to estimate polynomial models.
As Bailey also mentions: quadratics are just about the only version of a polynomial you should ever use; include anything beyond a second-power term and nobody's going to believe you! (That's a bit of an exaggeration, but for now stick to quadratics: no higher powers until I permit it.)
Why do we want to use quadratic terms? Let’s take a look at the following plot of maternal mortality rates and per-capita GDP.
library(ggplot2)
library(foreign)
library(memisc)
library(pander)
data <- read.csv("IHMEMMRdemo.csv")
# Indicator for Sub-Saharan Africa
data$ssa <- 0
data$ssa[which(data$ht_region == "4. Sub-Saharan Africa")] <- 1
# Indicator for petrostates: oil income per capita of $1,000 or more
# (using which() so any missing values of oipc stay NA instead of breaking the assignment)
data$petrostate <- NA
data$petrostate[which(data$oipc >= 1000)] <- 1
data$petrostate[which(data$oipc < 1000 & data$oipc >= 0)] <- 0
# Rescale oil income per capita to thousands of dollars
data$oipc_1k <- data$oipc / 1000
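A quick sanity check that the recodes did what we intended never hurts (optional, but cheap):

table(data$ssa)                          # how many Sub-Saharan African countries?
table(data$petrostate, useNA = "ifany")  # petrostates, non-petrostates, and missing
summary(data$oipc_1k)                    # oil income now in thousands of dollars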
plot1 <- ggplot(data, aes(y = ihme_mmr, x = wdi_gdpc))
plot1 + geom_point() +
  ggtitle("Maternal Mortality Vs. Development") +
  labs(x = "Per Capita GDP", y = "Maternal Mortality") +
  stat_smooth(method = lm, se = TRUE) + # adds the OLS trend line with a standard-error band
  theme_bw()
Is a linear model really appropriate here? Probably not: the data are “L”-shaped, but the predictions we get from the linear model are, well, straight: for every change in X (per capita GDP) we see a constant change in Y (maternal mortality). Just by eyeballing this, we can be pretty confident that a small change in X when X is small has a big effect on Y, while even a large change in X when X is already large will have a very small effect on Y.
In the plot below, we use the I() function in the model formula so that the quadratic (squared) term is treated as arithmetic rather than as formula syntax:
plot1 <- ggplot(data, aes(y = ihme_mmr, x = wdi_gdpc))
plot1 + geom_point() +
  ggtitle("Maternal Mortality Vs. Development") +
  labs(x = "Per Capita GDP", y = "Maternal Mortality") +
  stat_smooth(method = lm, se = TRUE, formula = y ~ x + I(x^2)) +
  theme_bw()
This equation takes the form \(Y = \beta_0 + \beta_1 X + \beta_2 X^2 +\epsilon\). Consequently, to understand the marginal effect of a change in X, we can't just look at \(\beta_1\) in isolation anymore: now, every change in X has to be understood through two terms in our equation. In particular, for a small change in X, we expect Y to change at the rate \(\beta_1 + 2\beta_2 X\) per unit of X.
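Where does that expression come from? Write out the expected change in Y for a one-unit increase in X under the quadratic model: \[E[Y \mid X+1] - E[Y \mid X] = \beta_1(X+1) + \beta_2(X+1)^2 - \beta_1 X - \beta_2 X^2 = \beta_1 + \beta_2(2X+1) \approx \beta_1 + 2\beta_2 X\] For a marginal (infinitesimal) change, this is exactly the derivative \(\frac{dY}{dX} = \beta_1 + 2\beta_2 X\).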
The resulting fitted curve does a better job of matching the observed data. For one thing, it predicts fewer negative maternal mortality values (at the extreme, our linear model predicted that rich countries would have mothers spontaneously appear!). But there's still a wide range of X over which our model returns nonsensical answers, and we're not doing a great job of explaining either extreme of X.
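To check that claim, we can fit the two levels models ourselves and count impossible (negative) fitted values. Neither model object appears above, so this is just a quick sanity-check sketch:

fit_linear <- lm(ihme_mmr ~ wdi_gdpc, data = data)
fit_quadratic <- lm(ihme_mmr ~ wdi_gdpc + I(wdi_gdpc^2), data = data)
# How many countries get an impossible (negative) predicted mortality rate?
sum(fitted(fit_linear) < 0)
sum(fitted(fit_quadratic) < 0)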
At this point, we begin to think about how to transform our variables to come up with an analysis that will make some sense. Consider the distribution of maternal mortality. How should we best model it?
qplot(ihme_mmr,data=data,geom="histogram")
This is a highly right-skewed distribution of a strictly positive variable. The choice of when to use a transformation is always a little bit of art and a little bit of trial and error, but logs and quadratic transformations probably account for 95 percent or more of all transformations in practice, so start with those two:
qplot(log(ihme_mmr),data=data,geom="histogram")
qplot(ihme_mmr^2,data=data,geom="histogram")
The logged variable histogram does a much better job of straightening out the distribution than the squared term. (Some time with the math will explain why: logging compresses the long right tail, while squaring stretches it out even further.)
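If you'd rather have a number than an eyeball judgment, a rough moment-based skewness statistic (coded by hand here so we don't need any extra packages) tells the same story:

# Rough sample skewness: third central moment over the cubed SD
skew <- function(z) {
  z <- z[!is.na(z)]
  mean((z - mean(z))^3) / sd(z)^3
}
skew(data$ihme_mmr)      # large and positive: a long right tail
skew(log(data$ihme_mmr)) # should be much closer to zero
skew(data$ihme_mmr^2)    # squaring should make the skew worse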
So let’s plug our logged term into our equation and plot it and our regression equation:
plot1 <- ggplot(data, aes(y = log(ihme_mmr), x = wdi_gdpc))
plot1 + geom_point() +
  ggtitle("Maternal Mortality Vs. Development") +
  labs(x = "Per Capita GDP", y = "Maternal Mortality (Log)") +
  stat_smooth(method = lm, se = TRUE, formula = y ~ x + I(x^2)) +
  theme_bw()
This seems better. First, visually, our predicted relationship looks much closer to what it should be: the U-shaped curve fits the data much better. Second, substantively, our predictions now make sense: since any predicted value of log(Y) maps back to a positive Y, we're no longer making silly predictions about the spontaneous appearance of mothers (what else does negative maternal mortality mean?).
However!
We're no longer estimating \(Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \epsilon\); now it's \(\log(Y) = \beta_0 + \beta_1 X + \beta_2 X^2 + \epsilon\). That means our interpretation of \(\beta_1\) and \(\beta_2\) has to change as well. Bailey, on page 219, explains this in more detail. Essentially, each \(\hat{\beta_j}\) we estimate now represents, approximately, the proportional change in Y associated with a one-unit increase in the corresponding regressor (multiply by 100 to get a percentage). Turning to the actual regressions…
model1 <- lm(log(ihme_mmr) ~ wdi_gdpc + I(wdi_gdpc^2),data=data)
summary(model1)
##
## Call:
## lm(formula = log(ihme_mmr) ~ wdi_gdpc + I(wdi_gdpc^2), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.4461 -0.6497 0.0394 0.5557 3.6676
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.882e+00 1.259e-01 46.70 < 2e-16 ***
## wdi_gdpc -1.992e-04 1.509e-05 -13.21 < 2e-16 ***
## I(wdi_gdpc^2) 2.455e-09 3.053e-10 8.04 2.09e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9789 on 156 degrees of freedom
## (4 observations deleted due to missingness)
## Multiple R-squared: 0.6582, Adjusted R-squared: 0.6538
## F-statistic: 150.2 on 2 and 156 DF, p-value: < 2.2e-16
We can see that the estimated relationship is indeed statistically significant. (Bear in mind that this is not a great regression: it's really still bivariate, even if it's well specified; there are no other control variables here!)
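One small check on the log-outcome interpretation: for coefficients this small, the "multiply by 100 for a percent change" approximation is essentially exact. Using the coefficient on wdi_gdpc alone (the full marginal effect also involves the squared term, as we discuss next):

b <- coef(model1)
100 * b["wdi_gdpc"]            # approximate percent change in Y per $1 of GDP
100 * (exp(b["wdi_gdpc"]) - 1) # exact percent change implied by the log model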
The interpretation is a little trickier than in Bailey's example, however, because here we have a quadratic term (in effect, X interacted with itself). So, as a first cut, we need to make sure that we interpret marginal changes in X by doing a little arithmetic.
For very low levels of per capita GDP ($1,000 USD, say), the effect of a unit change in X is approximated by the derivative of Y with respect to X: \[\frac{dY}{dX} = \beta_1 + 2 \beta_2 X\] Subbing in our estimated values: \[-0.0002 + 2 \times 0.0000000025 \times 1000 = -0.000194\] (with some adjustments for rounding). Note the sign: the effect is negative, and it isn't much in magnitude, although it is the effect of a mere $1 change on a logged mortality rate; it works out to roughly a 0.02 percent decrease in maternal mortality per additional dollar.
For much higher levels of per capita GDP ($20,000 USD, say), the effect will be different: \[-0.0002 + 2 \times 0.0000000025 \times 20000 = -0.0000992\]
That's a small number, and in absolute value it's about 50 percent smaller than the earlier effect size.
For a very high level of per capita GDP ($50,000 USD): \[-0.0002 + 2 \times 0.0000000025 \times 50000 = 0.0000508\] Notice that the sign has flipped: above a certain income level, the estimated marginal effect is positive.
The magnitude of the negative marginal effect, in other words, is diminishing as X grows; it crosses zero at about \(-\beta_1 / (2\beta_2) \approx \$40{,}000\) of per capita GDP and then turns positive. By the time we get to even larger X's ($80,000 USD, say), it is clearly growing: \[-0.0002 + 2 \times 0.0000000025 \times 80000 = 0.0002\]
At the far right, maternal mortality is predicted to increase with every additional dollar of per capita GDP. That's actually a fascinating finding, isn't it? We will return to it later. For now, just concentrate on the fact that the marginal impact of this X on this Y changes with the value of X itself.
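Rather than grinding through this arithmetic by hand, we can have R evaluate the marginal effect at any values of X we like, straight from model1's coefficients (a minimal sketch; the helper name marg_effect is ours, not Bailey's):

b <- coef(model1)
# Marginal effect of per capita GDP on log(MMR): beta1 + 2 * beta2 * X
marg_effect <- function(x) b["wdi_gdpc"] + 2 * b["I(wdi_gdpc^2)"] * x
marg_effect(c(1000, 20000, 50000, 80000))
# Turning point where the effect flips sign: -beta1 / (2 * beta2)
-b["wdi_gdpc"] / (2 * b["I(wdi_gdpc^2)"])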