Chapter 8: Nonlinear Regression Functions

Karim Naguib (Boston University)
10/15/2013

Introduction

  • So far we've assumed that the effect of a change in \( X \) on \( Y \) is constant across all values of \( X \)
  • If this effect actually changes with \( X \) we have a nonlinear population regression function
  • We're going to consider two types of nonlinear regression functions
    • The slope of the population regression function changes with some variable \( X_1 \)
    • The population regression function is linear in \( X_1 \), but has a different slope depending on \( X_2 \)

A General Strategy for Modeling Nonlinear Regression Functions

Test Scores and District Income

  • In the previous chapter we regressed average test scores in a district on a few variables including some to control for students' economic background
  • Suppose we now use a different variable: the average annual household income in each school district (measured in thousands of dollars)

Test Scores and District Income Scatterplot and Linear Regression Line

(Figure: scatterplot of test scores against district income with the fitted linear regression line)

Curved Regression Line

  • As the scatterplot shows, a straight line does not adequately describe the relationship between district income and test scores
  • What is needed instead is a curve that fits the data: this can be done using a quadratic function instead of a linear function \[ TestScore_i = \beta_0 + \beta_1 Income_i + \beta_2 Income_i^2 + u_i \]

  • This is called a quadratic regression model with the population regression function \[ E[TestScore_i|Income_i] = \beta_0 + \beta_1 Income_i + \beta_2 Income_i^2 \]

  • To carry out OLS estimation using a quadratic model, we simply consider it a multiple regression with two variables: \( Income_i \) and \( Income_i^2 \)

Quadratic Regression (1)

library(lmtest)    # for coeftest()
library(sandwich)  # for vcovHC(), the heteroskedasticity-robust covariance estimator
test.score.data$avginc.squared <- test.score.data$avginc^2
regress.results <- lm(testscr ~ avginc + avginc.squared, data = test.score.data)
coeftest(regress.results, vcov.=vcovHC(regress.results))

t test of coefficients:

                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    607.30174    2.92422  207.68   <2e-16 ***
avginc           3.85099    0.27110   14.20   <2e-16 ***
avginc.squared  -0.04231    0.00488   -8.67   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(regress.results)$adj.r.squared
[1] 0.554

Quadratic Regression (2)

(Figure: scatterplot of test scores against district income with the fitted quadratic regression function)

A General Formula for Population Regression Functions (1)

  • A general form of a regression model is

    \[ Y_i = \underbrace{f(X_{1i}, X_{2i},\dots, X_{ki})}_{\text{regression function}} + u_i, i = 1,\dots,n \]

  • This regression function can also be defined as

    \[ E[Y_i|X_{1i},X_{2i},\dots, X_{ki}] = f(X_{1i}, X_{2i},\dots, X_{ki}) \]

A General Formula for Population Regression Functions (2)

  • In the case of a linear population regression function we would have something like

    \[ f(X_{1i}, X_{2i},\dots, X_{ki}) = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki} \]

  • In the case of a nonlinear regression function it could look something like

    \[ f(Income_i) = \beta_0 + \beta_1 Income_i + \beta_2 Income_i^2 \]

The Effect on Y of a Change in X

  • When we want to know the effect of a change \( \Delta X_1 \) on \( Y \) in a linear population regression model, we calculate it as \[ \Delta Y = \beta_1 \Delta X_1 \]

  • In the case of a nonlinear model this calculation is more complicated, because the effect depends on the initial value of \( X_1 \) (and possibly on the other regressors)

  • For a nonlinear regression function we calculate the expected change in \( Y \), \( \Delta Y \), in response to a change \( \Delta X_1 \) in \( X_1 \) and holding all other variables fixed by \[ \Delta Y = f(X_1 + \Delta X_1, X_2, \dots, X_k) - f(X_1, X_2, \dots, X_k) \]

  • Since we don't observe the population regression function \( f \) we rely on the estimated function \( \hat{f} \)

Changes in District Income

Suppose we wish to calculate the predicted change in test scores in response to a $1,000 increase in district income (recall that income is measured in thousands of dollars). Since this predicted change depends on the initial income, we consider two cases; a short R check follows the cases.

  • an increase from 10 to 11
    \[ \begin{align*} \Delta \widehat{TestScore} = &(\hat{\beta}_0 + \hat{\beta}_1 \times 11 + \hat{\beta}_2 \times 11^2) \\&- (\hat{\beta}_0 + \hat{\beta}_1 \times 10 + \hat{\beta}_2 \times 10^2) \\ = &2.96 \end{align*} \]
  • an increase from 40 to 41
    \[ \begin{align*} \Delta \widehat{TestScore} = &(\hat{\beta}_0 + \hat{\beta}_1 \times 41 + \hat{\beta}_2 \times 41^2) \\&- (\hat{\beta}_0 + \hat{\beta}_1 \times 40 + \hat{\beta}_2 \times 40^2) \\ = &0.42 \end{align*} \]
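The same numbers can be computed directly from the fitted model. A minimal sketch, assuming regress.results is the quadratic fit estimated above (the helper delta.hat is illustrative, not from any package):

# predicted change in test scores between two income levels; both
# regressors must appear in newdata because avginc.squared is a column
delta.hat <- function(fit, x0, x1) {
  nd <- data.frame(avginc = c(x0, x1), avginc.squared = c(x0, x1)^2)
  diff(predict(fit, newdata = nd))
}
delta.hat(regress.results, 10, 11)  # about 2.96
delta.hat(regress.results, 40, 41)  # about 0.42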

Standard Errors of Estimated Effects (1)

  • Now that we've shown how to estimate \( \Delta \hat{Y} \), we want to construct confidence intervals for it
  • For that we need to calculate the standard error of \( \Delta \hat{Y} \)
  • Consider a change in district income from 10 to 11 \[ \Delta \hat{Y} = \hat{\beta}_1 \times (11 - 10) + \hat{\beta}_2 \times (11^2 - 10^2) = \hat{\beta}_1 + 21 \hat{\beta}_2 \]

Standard Errors of Estimated Effects (2)

  • The standard error of \( \Delta \hat{Y} \) would be

    \[ SE(\Delta \hat{Y}) = SE(\hat{\beta}_1 + 21 \hat{\beta}_2) = \frac{|\Delta \hat{Y}|}{\sqrt{F}} \]

    Recall that to test the single restriction \( \beta_1 + 21\beta_2 = 0 \) we use

    \[ F = t^2 = \left[\frac{\hat{\beta}_1 + 21\hat{\beta}_2}{SE(\hat{\beta}_1 + 21\hat{\beta}_2)}\right]^2 \]
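Equivalently, \( SE(\Delta \hat{Y}) \) is the standard error of the linear combination \( \hat{\beta}_1 + 21\hat{\beta}_2 \), which can be computed directly from the robust coefficient covariance matrix. A minimal sketch, assuming regress.results still holds the quadratic fit from above:

a <- c(0, 1, 21)              # weights on (Intercept), avginc, avginc.squared
V <- vcovHC(regress.results)  # robust covariance matrix of the estimates
delta.y <- sum(a * coef(regress.results))
se.delta <- sqrt(as.numeric(t(a) %*% V %*% a))  # sqrt(a' V a)
c(change = delta.y, se = se.delta)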

Interpreting Coefficients in Nonlinear Specifications

In linear specifications it was easy to interpret the meaning of a coefficient

\[ \beta_1 = \frac{\Delta Y}{\Delta X_1} \]

But in a nonlinear specification we cannot use the same interpretation. It is more useful to plot the estimated regression function, showing the effect of changes in \( X_1 \) on \( Y \), or to compute \( \Delta \hat{Y} \) directly for representative changes.

A General Approach to Modeling Nonlinearities Using Multiple Regression

  1. Identify possible nonlinear relationships
  2. Specify a nonlinear function and estimate its parameters using OLS
  3. Determine whether the nonlinear specification improves upon a linear one
  4. Plot the estimated nonlinear regression function
  5. Estimate the effect on \( Y \) of a change in \( X \)

Nonlinear Functions of a Single Independent Variable: Polynomials

Polynomials

The polynomial regression model of degree \( r \) is

\[ Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \cdots + \beta_r X_i^r + u_i \]

This is the general form that includes the quadratic model discussed earlier (\( r = 2 \))

Testing If the Population Regression Function is Linear

In order to test the null hypothesis that the population regression function is linear, we test the following \( q = r - 1 \) restrictions jointly:

\[ \begin{align*} H_0&: \beta_2 = 0, \beta_3 = 0,\dots, \beta_r = 0 \\ H_1&: \text{at least one }\beta_j \ne 0, j = 2,\dots,r \end{align*} \]

Application to District Income and Test Scores

Consider estimating the cubic regression model (\( r = 3 \)) where test scores are regressed on district income

test.score.data$avginc.cubed <- test.score.data$avginc^3
regress.results <- lm(testscr ~ avginc + avginc.squared + avginc.cubed, data = test.score.data)
coeftest(regress.results, vcov.=vcovHC(regress.results))

t test of coefficients:

                Estimate Std. Error t value Pr(>|t|)    
(Intercept)     6.00e+02   5.46e+00  109.86  < 2e-16 ***
avginc          5.02e+00   7.87e-01    6.37  4.9e-10 ***
avginc.squared -9.58e-02   3.41e-02   -2.81   0.0051 ** 
avginc.cubed    6.85e-04   4.37e-04    1.57   0.1174    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
library(car)  # for lht(), an alias for linearHypothesis()
lht(regress.results, c('avginc.squared = 0', 'avginc.cubed = 0'), test = 'F', vcov=vcovHC(regress.results))
Linear hypothesis test

Hypothesis:
avginc.squared = 0
avginc.cubed = 0

Model 1: restricted model
Model 2: testscr ~ avginc + avginc.squared + avginc.cubed

Note: Coefficient covariance matrix supplied.

  Res.Df Df    F  Pr(>F)    
1    418                    
2    416  2 29.7 8.9e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Nonlinear Functions of a Single Independent Variable: Logarithms

Logarithmic Specification

  • We are often interested in the logarithmic relationship between variables: this allows us to represent changes as percentage changes
  • In economics we are often interested in the percentage change in demand in response to a 1% change in prices: elasticity

Natural Logarithms

  • The exponential function of \( x \) is \( e^x \), where \( e = 2.71828\dots \)
  • The natural logarithm is the inverse of the exponential function: it is defined by \( x = \ln(e^x) \)

Logarithms and Percentages

The relationship between logarithms and percentages relies on the following approximation: when \( \frac{\Delta x}{x} \) is small,

\[ \ln(x + \Delta x) - \ln(x) \cong \frac{\Delta x}{x} \]

The percentage change in \( x \) is \( \frac{\Delta x}{x} \times 100 \)
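A quick numerical check of this approximation:

# ln(x + dx) - ln(x) is close to dx/x when the percentage change is small
x <- 100
dx <- 1
log(x + dx) - log(x)  # 0.00995
dx / x                # 0.01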

Case I: X Is In Logarithms, Y Is Not

The regression model is

\[ Y_i = \beta_0 + \beta_1\ln(X_i) + u_i, i = 1,\dots,n \]

which is called the linear-log model

  • A 1% change in \( X \) results in a change in \( Y \) of \( 0.01\beta_1 \)

\[ \begin{align*} \Delta Y &= [\beta_0 + \beta_1 \ln(X + \Delta X)] - [\beta_0 + \beta_1 \ln(X)] \\ &= \beta_1[\ln(X + \Delta X) - \ln(X)] \cong \beta_1\frac{\Delta X}{X} \end{align*} \]

Application to Test Scores and District Income (1)

  • OLS estimation for this form of specification is the same as with a single variable regression except we use a new variable \( \ln(X) \) instead of \( X \)
  • Suppose we now want to regress test scores on the logarithm of district income
regress.results <- lm(testscr ~ log(avginc), data = test.score.data)
coeftest(regress.results, vcov.=vcovHC(regress.results))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   557.83       3.86   144.4   <2e-16 ***
log(avginc)    36.42       1.41    25.9   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Application to Test Scores and District Income (2)

What is the predicted difference in test scores of districts with average income $10,000 vs. $11,000?

\[ \Delta \hat{Y} = [557.8 + 36.42\ln(11)] - [557.8 + 36.42\ln(10)] = 3.47 \]

What is the predicted difference in test scores of districts with average income $40,000 vs. $41,000?

\[ \Delta \hat{Y} = [557.8 + 36.42\ln(41)] - [557.8 + 36.42\ln(40)] = 0.90 \]
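Both differences can be verified from the fitted model. A sketch, assuming regress.results is the linear-log fit estimated above:

# predict() applies log(avginc) from the model formula automatically
diff(predict(regress.results, data.frame(avginc = c(10, 11))))  # about 3.47
diff(predict(regress.results, data.frame(avginc = c(40, 41))))  # about 0.90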

Application to Test Scores and District Income (3)

(Figure: scatterplot of test scores against district income with the fitted linear-log regression function)

Case II: Y Is In Logarithms, X Is Not

The regression model is

\[ \ln(Y_i) = \beta_0 + \beta_1 X_i + u_i, i = 1,\dots,n \]

which is called the log-linear model

  • A unit change in \( X \) (\( \Delta X = 1 \)) is associated with a \( 100 \times \beta_1\% \) change in \( Y \)

\[ \begin{align*} \ln(Y + \Delta Y) - \ln(Y) &= [\beta_0 + \beta_1(X + \Delta X)] - [\beta_0 + \beta_1(X)] \\ &= \beta_1 \Delta X \end{align*} \]

We know from the approximation above that \( \ln(Y + \Delta Y) - \ln(Y) \cong \frac{\Delta Y}{Y} \), and hence \( \frac{\Delta Y}{Y} \cong \beta_1 \Delta X \)

Application to Earnings and Age Data (1)

Suppose we want to regress the logarithm of earnings on the age of college graduates from some 2009 CPS data

regress.results <- lm(log(ahe) ~ age, data = cps.92.08, subset = (bachelor == 1))
coeftest(regress.results, vcov.=vcovHC(regress.results))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.70582    0.06428    26.5   <2e-16 ***
age          0.03756    0.00218    17.2   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Earnings are predicted to increase by 3.76% for each additional year of age

Application to Earnings and Age Data (2)

(Figures: fitted log-linear regression of earnings on age)

Case III: Both X and Y are in logarithms

The regression model is

\[ \ln(Y_i) = \beta_0 + \beta_1\ln(X_i) + u_i \]

which is called a log-log model

  • A 1% change in \( X \) is associated with a \( \beta_1\% \) change in \( Y \).

\[ \begin{align*} \ln(Y + \Delta Y) - \ln(Y) &= [\beta_0 + \beta_1\ln(X + \Delta X)] - [\beta_0 + \beta_1\ln(X)] \\ &= \beta_1[\ln(X + \Delta X) - \ln(X)] \end{align*} \] \[ \therefore \frac{\Delta Y}{Y} \cong \beta_1\frac{\Delta X}{X}\text{ or }\beta_1 = \frac{\Delta Y/Y}{\Delta X/X} = \frac{\text{percentage change in }Y}{\text{percentage change in }X} \]

  • \( \beta_1 \) is the elasticity of \( Y \) with respect to \( X \)

Application to Test Scores and District Income (1)

regress.results <- lm(log(testscr) ~ log(avginc), data = test.score.data)
coeftest(regress.results, vcovHC(regress.results))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  6.33635    0.00596  1063.4   <2e-16 ***
log(avginc)  0.05542    0.00216    25.7   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This means that a 1% increase in district income is associated with an estimated 0.0554% increase in test scores.

Application to Test Scores and District Income (2)

For comparison consider a log-linear model of the logarithm of test scores regressed on income

regress.results <- lm(log(testscr) ~ avginc, data = test.score.data)
coeftest(regress.results, vcovHC(regress.results))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 6.439362   0.002987  2155.4   <2e-16 ***
avginc      0.002844   0.000183    15.5   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Application to Test Scores and District Income (3)

(Figures: fitted log-log and log-linear regression functions for test scores and district income)

Comparing Specifications

Now that we've seen different forms of polynomial and logarithmic specifications, how do we compare them?

  • Logarithmic specifications
    • To compare log-linear and log-log models we can compare their \( \bar{R}^2 \) measures
    • However, we cannot compare a linear-log with a log-log (or a log-linear) since the dependent variable is \( Y \) in the former and \( \ln(Y) \) in the latter
  • To compare a linear-log and a polynomial specification, it is sufficient to compare their \( \bar{R}^2 \), since both have \( Y \) as the dependent variable (see the sketch below)
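For example, a linear-log and a cubic specification can be compared directly because both have \( TestScore \) as the dependent variable. A sketch, reusing the squared and cubed income variables constructed earlier:

linlog.fit <- lm(testscr ~ log(avginc), data = test.score.data)
cubic.fit <- lm(testscr ~ avginc + avginc.squared + avginc.cubed,
                data = test.score.data)
# comparable because both models have testscr as the dependent variable
c(linlog = summary(linlog.fit)$adj.r.squared,
  cubic = summary(cubic.fit)$adj.r.squared)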

Interactions Between Independent Variables

Introduction to Interactions

It is possible that students who are still learning English respond differently to the STR than those who are not. In other words, the effect of the STR could depend on the percentage of English learners in a district.

We will consider three types of possible interactions between independent variables

  • Both variables are binary variables
  • One variable is binary and the other is continuous
  • Both variables are continuous

Interactions Between Two Binary Variables

A Model With Two Binary Variables

Consider a model with two binary variables, where the dependent variable \( Y_i \) is the log of earnings, and the independent variables are whether a worker has a college degree (\( D_{1i} \)), and a worker's gender (\( D_{2i} \)).

\[ Y_i = \beta_0 + \beta_1 D_{1i} + \beta_2 D_{2i} + u_i \]

  • \( \beta_1 \) captures the effect of having a college degree
  • \( \beta_2 \) captures the effect of being female
  • However this specification ignores the possibility that the effect of having a college degree might be different for men and women

Interacting the Two Binary Variables (1)

In order to model the possibility of this interaction between college degrees and gender we use an interaction regression model

\[ Y_i = \beta_0 + \beta_1 D_{1i} + \beta_2 D_{2i} + \beta_3 (D_{1i} \times D_{2i}) + u_i \]

The term \( D_{1i} \times D_{2i} \) is called an interaction term or an interacted regressor

Interacting the Two Binary Variables (2)

  • Consider the expectation of \( Y_i \) conditional on having no college degree (\( D_{1i} = 0 \)) and the gender dummy variable \( D_{2i} = d_2 \)

\[ \begin{align*} E[Y_i|D_{1i} = 0, D_{2i} = d_2] &= \beta_0 + \beta_1\times 0 + \beta_2\times d_2 + \beta_3\times(0\times d_2) \\ &= \beta_0 + \beta_2 d_2 \end{align*} \]

  • Consider the expectation of \( Y_i \) conditional on having a college degree (\( D_{1i} = 1 \)) and the gender dummy variable \( D_{2i} = d_2 \)

\[ \begin{align*} E[Y_i|D_{1i} = 1, D_{2i} = d_2] &= \beta_0 + \beta_1\times 1 + \beta_2\times d_2 + \beta_3\times(1\times d_2) \\ &= \beta_0 + \beta_1 + \beta_2 d_2 + \beta_3 d_2 \end{align*} \]

Interacting the Two Binary Variables (3)

Taking the difference between the two conditional expectations gives

\[ E[Y_i|D_{1i} = 1, D_{2i} = d_2] - E[Y_i|D_{1i} = 0, D_{2i} = d_2] = \beta_1 + \beta_3 d_2 \]

Application To The STR And The Percentage of English Learners

Let us add two dummy variables to capture whether the STR is high and whether there is a high percentage of English learners. Then we regress \( TestScore \) on them and their interaction term

test.score.data$hi.str <- ifelse(test.score.data$str >= 20, 1, 0)
test.score.data$hi.el <- ifelse(test.score.data$el.pct >= 10, 1, 0)
regress.results <- lm(testscr ~ hi.str + hi.el + hi.str : hi.el, data = test.score.data)
coeftest(regress.results, vcov.=vcovHC(regress.results))

t test of coefficients:

             Estimate Std. Error t value Pr(>|t|)    
(Intercept)    664.14       1.39  477.53  < 2e-16 ***
hi.str          -1.91       1.94   -0.98     0.33    
hi.el          -18.16       2.36   -7.70  9.7e-14 ***
hi.str:hi.el    -3.49       3.14   -1.11     0.27    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
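The four conditional means implied by these estimates can be read off the fitted model with predict(). A minimal sketch, using the fit above:

# predicted test scores in each (hi.str, hi.el) cell
cells <- expand.grid(hi.str = 0:1, hi.el = 0:1)
cbind(cells, predicted = predict(regress.results, newdata = cells))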

Interacting a Continuous And a Binary Variable (1)

Consider regressing log earnings on an individual's work experience (\( X_i \)) and a binary variable indicating whether they earned a college degree (\( D_i \))

\[ Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + u_i \]

Using the dummy variable \( D_i \) allows for a different \( y \)-axis intercept depending on whether the worker has a college degree

  • No college degree: \( \beta_0 \)
  • College degree: \( \beta_0 + \beta_2 \)

Interacting a Continuous And a Binary Variable (2)

However, it does not allow for the possibility of having different slopes depending on having a college degree. For that we add an interaction term

\[ Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + \beta_3(X_i\times D_i) + u_i \]

Application To The STR And The Percentage of English Learners (1)

Consider regressing test scores on \( STR \), but allowing for different slopes (i.e., different effects) depending on whether the district has a high or low percentage of English learners

regress.results <- lm(testscr ~ str + hi.el + str : hi.el, data = test.score.data)
coeftest(regress.results, vcov.=vcovHC(regress.results))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  682.246     12.071   56.52   <2e-16 ***
str           -0.968      0.599   -1.62     0.11    
hi.el          5.639     19.889    0.28     0.78    
str:hi.el     -1.277      0.986   -1.30     0.20    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Application To The STR And The Percentage of English Learners (2)

This means that we have two estimated regression lines, depending on whether there is a high or low percentage of English learners in the school district (both are recovered in the sketch after this list)

  • Low percentage:

    \[ 682.2 - 0.97 STR_i \]

  • High percentage:

    \[ (682.2 + 5.6) - (0.97 + 1.28)STR_i = 687.8 - 2.25 STR_i \]
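Both lines can be recovered from the fitted object. A sketch, assuming regress.results is the interacted fit estimated above:

b <- coef(regress.results)
# low percentage of English learners: intercept and slope
c(b['(Intercept)'], b['str'])
# high percentage: intercept and slope both shift by the hi.el terms
c(b['(Intercept)'] + b['hi.el'], b['str'] + b['str:hi.el'])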

Some Hypothesis Testing (1)

To test whether the two regression lines are the same, we test the joint hypothesis that the coefficients on \( HiEL_i \) and \( STR_i \times HiEL_i \) are both zero

lht(regress.results, c('hi.el = 0', 'str:hi.el = 0'), test = 'F', vcov = vcovHC(regress.results))
Linear hypothesis test

Hypothesis:
hi.el = 0
str:hi.el = 0

Model 1: restricted model
Model 2: testscr ~ str + hi.el + str:hi.el

Note: Coefficient covariance matrix supplied.

  Res.Df Df    F Pr(>F)    
1    418                   
2    416  2 88.8 <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Some Hypothesis Testing (2)

  • To test the hypothesis that the two lines have the same slope, we test that the coefficient on \( STR_i\times HiEL_i \) is zero, which we can read from the regression results
  • To test the hypothesis that the two lines have the same intercept, we test that the coefficient on \( HiEL_i \) is zero, which we can also read from the regression results

Some Hypothesis Testing (3)

  • This leads us to the seemingly contradictory conclusion that the two regression lines are different (\( F \)-test), yet their slopes and intercepts are individually indistinguishable (\( t \)-tests).
  • This can be explained by the high correlation between \( HiEL_i \) and \( STR_i\times HiEL_i \), which inflates the individual standard errors (a quick check follows)
  • The \( F \)-test can be relied on to tell us that at least the slope or the intercept differs between these two regression lines, but not which.
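A quick check of that correlation, using the hi.el dummy constructed earlier:

# correlation between the high-EL dummy and the interacted regressor
with(test.score.data, cor(hi.el, str * hi.el))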

Interacting Two Continuous Variables

Consider regressing log earnings on two continuous random variables

  • \( X_{1i} \): years of work experience
  • \( X_{2i} \): years of education
    \[ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i \]
    This would capture the effects of work experience and education separately. But what if the effect of work experience depends on years of education, or vice versa? For that we need to add an interaction term
    \[ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3(X_{1i} \times X_{2i}) + u_i \]

Interpreting the Coefficients

  • The coefficient on the interaction term, \( \beta_3 \), allows the effect of a unit change in \( X_1 \) to depend on the level of \( X_2 \)
  • If a change \( \Delta X_1 \) is made, while holding \( X_{2} \) fixed \[ \Delta Y = (\beta_1 + \beta_3 X_2)\Delta X_1 \] which can be transformed to \[ \frac{\Delta Y}{\Delta X_1} = \beta_1 + \beta_3 X_2 \]
  • A similar calculation can be made for \[ \frac{\Delta Y}{\Delta X_2} = \beta_2 + \beta_3 X_1 \]

Application to STR and the Percentage of English Learners

Consider regressing test scores on \( STR \) and the percentage of English learners (both continuous variables)

regress.results <- lm(testscr ~ str + el.pct + str : el.pct, data = test.score.data)
coeftest(regress.results, vcovHC(regress.results))

t test of coefficients:

             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 686.33852   11.93785   57.49   <2e-16 ***
str          -1.11702    0.59652   -1.87    0.062 .  
el.pct       -0.67291    0.38654   -1.74    0.082 .  
str:el.pct    0.00116    0.01916    0.06    0.952    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
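Given these estimates, the effect on test scores of a one-unit change in \( STR \) at a given percentage of English learners is \( \hat{\beta}_1 + \hat{\beta}_3 \times PctEL \). A minimal sketch, evaluated at the median share (an illustrative choice, not from the text):

b <- coef(regress.results)
el.med <- median(test.score.data$el.pct)
# estimated slope with respect to STR at the median EL percentage
unname(b['str'] + b['str:el.pct'] * el.med)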

Nonlinear Effects on Test Scores of the Student-Teacher Ratio

Regression Functions Relating Test Scores and STR

(Figure: estimated regression functions relating test scores and STR)

Regression Functions Relating Test Scores and STR (Interacted With High/Low PctEL)

(Figure: estimated regression functions relating test scores and STR, interacted with high/low PctEL)