A linear relationship between \(y\) and \(x\) may not be appropriate in some applications.
By appropriately redefining the variables we can easily incorporate nonlinearities into the simple regression model.
The model will still be linear in parameters; we do not use nonlinear transformations of the parameters.
In practice, natural logarithmic transformations, \(\log(y)\) or \(\ln(y)\), are widely used.
Other transformations may also be used, e.g., adding quadratic or cubic terms, the inverse form, etc.
Remember that the linearity of the regression model is determined by linearity in the \(\beta\)'s, not in \(x\) and \(y\).
We can still use nonlinear transformations of \(x\) and \(y\), such as \(\log x\), \(\log y\), \(x^2\), \(\sqrt{x}\), \(\frac{1}{x}\), \(y^{1/4}\); the model is still linear in parameters.
But models that include nonlinear transformations of the \(\beta\)'s are not linear in parameters and cannot be analyzed within the OLS framework.
For example, the following models are not linear in parameters:
\[\begin{align} consumption &= \frac{1}{\beta_0 + \beta_1 income} + u \notag \\ y &=\beta_0 + \beta_1^2x + u \notag \\ y &= \beta_0 + e^{\beta_1x} + u \notag \end{align}\]
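Models like these require nonlinear estimation methods. As a minimal sketch (simulated data and made-up parameter values, not from any data set used in these notes), the first model can be fit by nonlinear least squares with R's nls():

# fit consumption = 1/(b0 + b1*income) + u by nonlinear least squares;
# the data and the "true" values b0 = 0.5, b1 = 0.05 are hypothetical
set.seed(1)
income <- runif(200, 1, 10)
consumption <- 1 / (0.5 + 0.05 * income) + rnorm(200, sd = 0.05)
nls(consumption ~ 1 / (b0 + b1 * income), start = list(b0 = 1, b1 = 0.1))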
The log-level (semi-logarithmic) model is
\[ \log y = \beta_0 + \beta_1 x + u \]
\[\begin{align} \Delta \log y &= \beta_1 \Delta x \notag \\ \% \Delta y &\approx 100\beta_1\, \Delta x \notag \end{align}\]
Interpretation: for a one-unit change in \(x\), \(y\) changes by approximately \((100\beta_1)\%\).
The relationship between \(x\) and \(y\) before the (natural) logarithmic transformation can be written as
\[ y = \exp(\beta_0 + \beta_1 x + u) \equiv e^{\beta_0 + \beta_1 x + u} \]
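Note that \(100\beta_1\) is only an approximation for small changes; the exact percentage change implied by the model is \(100(e^{\beta_1 \Delta x} - 1)\). A quick check with a hypothetical slope value:

# approximate vs. exact %-change in y for a hypothetical b1 = 0.08
b1 <- 0.08
dx <- 1                       # one-unit change in x
100 * b1 * dx                 # approximation: 8
100 * (exp(b1 * dx) - 1)      # exact: about 8.33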
The level-log model is
\[ y = \beta_0 + \beta_1 \log x + u \]
\[\begin{align} \Delta y &= \beta_1 \Delta \log x \notag \\ &= \left( \frac{\beta_1}{100} \right) \underbrace{100\, \Delta \log x}_{\approx\, \% \Delta x} \notag \end{align}\]
Interpretation: a 1% increase in \(x\) is associated with a \(\beta_1/100\) unit change in \(y\).
The log-log (constant elasticity) model is
\[ \log y = \beta_0 + \beta_1 \log x + u \]
\[\begin{align} \Delta \log y &= \beta_1 \Delta \log x \notag \\ \% \Delta y &= \beta_1\, \% \Delta x \notag \\ \beta_1 &= \frac{\% \Delta y}{\% \Delta x} \notag \end{align}\]
Interpretation: \(\beta_1\) is the elasticity of \(y\) with respect to \(x\).
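A quick numerical check of the constant-elasticity interpretation (the values of \(\beta_0\), \(\beta_1\), and \(x\) below are illustrative, not estimates):

# in y = exp(b0) * x^b1 (error term suppressed), a 1% increase in x
# changes y by approximately b1 percent
b0 <- 1; b1 <- 0.25; x <- 100
y     <- exp(b0) * x^b1
y_new <- exp(b0) * (1.01 * x)^b1   # raise x by 1%
100 * (y_new / y - 1)              # about 0.25, i.e. b1 * (%-change in x)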
Example (log-level): the return to education, using the wage1 data set from the wooldridge package.

library(wooldridge)
data(wage1)
# regress log hourly wage on years of education
wage.logl <- lm(log(wage) ~ educ, data = wage1)
summary(wage.logl)
##
## Call:
## lm(formula = log(wage) ~ educ, data = wage1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.21158 -0.36393 -0.07263 0.29712 1.52339
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.583773 0.097336 5.998 3.74e-09 ***
## educ 0.082744 0.007567 10.935 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4801 on 524 degrees of freedom
## Multiple R-squared: 0.1858, Adjusted R-squared: 0.1843
## F-statistic: 119.6 on 1 and 524 DF, p-value: < 2.2e-16
\[ \widehat{\log(wage)} = \underset{(0.097)}{0.584} + \underset{(0.008)}{0.083}\; educ \]
After multiplying the slope estimate by 100, it can be interpreted as a percentage change: an additional year of education is predicted to increase wages by about 8.3%. This is called the return to another year of education.
\(R^2 = 0.186\): Education explains about 18.6% of the variation in log wage.
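Because 8.3% relies on the approximation \(100\hat{\beta}_1\), the exact predicted change, \(100(e^{\hat{\beta}_1} - 1)\), can be recovered from the fitted model:

# exact percentage return to an additional year of education
b1 <- coef(wage.logl)["educ"]
100 * (exp(b1) - 1)    # about 8.63%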
plot(wage1$educ, wage1$lwage,
     col = "steelblue",
     pch = 20,
     main = "Log-level Regression",
     cex.main = 1,
     ylab = "Log Wage",
     xlab = "Education")
# add the fitted log-level regression line
abline(wage.logl,
       col = "red",
       lwd = 2)
Example (level-log): test scores and district income, using the CASchools data set from the AER package.
\[ Score = \beta_0 + \beta_1 \log(income) + u \]
library(AER)
data(CASchools)
# test score = average of the reading and math scores
CASchools$score <- (CASchools$read + CASchools$math) / 2
score.llog <- lm(score ~ log(income), data = CASchools)
summary(score.llog)
##
## Call:
## lm(formula = score ~ log(income), data = CASchools)
##
## Residuals:
## Min 1Q Median 3Q Max
## -43.256 -9.050 0.078 8.230 31.214
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 557.832 4.200 132.81 <2e-16 ***
## log(income) 36.420 1.571 23.18 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.62 on 418 degrees of freedom
## Multiple R-squared: 0.5625, Adjusted R-squared: 0.5615
## F-statistic: 537.4 on 1 and 418 DF, p-value: < 2.2e-16
\[ \widehat{\text{Score}} = \underset{(4.200)}{557.832} + \underset{(1.571)}{36.420}\; \log(income) \]
Interpretation: a 1% increase in district income is associated with a \(\left(\frac{36.42}{100}\right) = 0.3642\) point increase in test scores.
\(R^2 = 0.5625\): log(income) can explain about 56.25% of the variation in test scores.
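To see the level-log interpretation at work, consider a 10% increase in district income, say from 10 to 11 (thousand dollars); by the formula above the predicted score change is roughly \(36.42 \times \log(1.1) \approx 3.47\) points:

# predicted score change for a 10% increase in income (10 -> 11)
p <- predict(score.llog, newdata = data.frame(income = c(10, 11)))
diff(p)    # about 3.47 points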
plot(CASchools$income, CASchools$score,
     col = "steelblue",
     pch = 20,
     xlab = "District Income (thousands of dollars)",
     ylab = "Test Score",
     main = "Test Score vs. District Income",
     cex.main = 1)
# sort by income so the fitted level-log curve is drawn left to right
order_id <- order(CASchools$income)
lines(CASchools$income[order_id],
      fitted(score.llog)[order_id],
      col = "red",
      lwd = 2)
# simple linear (level-level) fit for comparison
abline(lm(score ~ income, data = CASchools),
       col = "blue",
       lwd = 2)
Example (log-log): CEO salary and firm sales, using the ceosal1 data set from the wooldridge package.
\[ \log(salary) = \beta_0 + \beta_1 \log(sales) + u \]

data(ceosal1)
salary.loglog <- lm(log(salary) ~ log(sales), data = ceosal1)
summary(salary.loglog)
##
## Call:
## lm(formula = log(salary) ~ log(sales), data = ceosal1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.01038 -0.28140 -0.02723 0.21222 2.81128
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.82200 0.28834 16.723 < 2e-16 ***
## log(sales) 0.25667 0.03452 7.436 2.7e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5044 on 207 degrees of freedom
## Multiple R-squared: 0.2108, Adjusted R-squared: 0.207
## F-statistic: 55.3 on 1 and 207 DF, p-value: 2.703e-12
\[ \widehat{\log(salary)} = \underset{(0.2883)}{4.822} + \underset{(0.0345)}{0.257}\; \log(sales) \]
Interpretation: a 1% increase in firm sales is predicted to increase CEO salary by about 0.257%. In other words, the elasticity of CEO salary with respect to sales is 0.257.
\(R^2 = 0.2108\): log(sales) can explain about 21.08% of variation in log(salary).
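As in the log-level case, the elasticity interpretation is an approximation for small changes. For a larger change, say a 10% increase in sales, the exact prediction follows from \(salary \propto sales^{\hat{\beta}_1}\):

# approximate vs. exact effect of a 10% increase in sales on salary
b1 <- coef(salary.loglog)["log(sales)"]
b1 * 10                  # approximation: about 2.57%
100 * (1.10^b1 - 1)      # exact: about 2.48%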
plot(ceosal1$sales, ceosal1$salary,
     col = "steelblue",
     pch = 20,
     main = "Salary vs. Sales",
     cex.main = 1,
     xlab = "Sales",
     ylab = "Salary")
# simple linear (level-level) fit for comparison
abline(lm(salary ~ sales,
          data = ceosal1),
       col = "red",
       lwd = 2)
plot(log(salary) ~ log(sales),
     col = "steelblue",
     pch = 20,
     data = ceosal1,
     main = "Log-Log Regression Fit",
     cex.main = 1)
# the log-log model is linear in (log(sales), log(salary)) space
abline(salary.loglog,
       col = "red",
       lwd = 2)