1 Incorporating Nonlinearities in Simple Regression

\[\begin{align} consumption &= \frac{1}{\beta_0 + \beta_1 income} + u \notag \\ y &=\beta_0 + \beta_1^2x + u \notag \\ y &= \beta_0 + e^{\beta_1x} + u \notag \end{align}\]

2 Functional Forms using Natural Logarithms: Log-level model

\[ log\; y = \beta_0 + \beta_1x + u \]

\[\begin{align} \Delta log\; y &= \beta_1 \Delta x \notag \\ \% \Delta y &\approx 100\beta_1 \Delta x \notag \end{align}\]

\[ y = exp(\beta_0 + \beta_1 x + u) \equiv e^{\beta_0 + \beta_1 x + u} \]

3 Functional Forms using Natural Logarithms: Level-log model

\[ y = \beta_0 + \beta_1 log\;x + u \]

\[\begin{align} \Delta y &= \beta_1 \Delta log x \notag \\ &= \left( \frac{\beta_1}{100} \right) \underbrace{100\Delta log x}_{\% \Delta x} \notag \end{align}\]

4 Functional Forms using Natural Logarithms: Log-log model

\[ log \; y = \beta_0 + \beta_1 log\;x + u \]

\[\begin{align} \Delta log \; y &= \beta_1 \Delta log x \notag \\ \% \Delta y &= \beta_1 \% \Delta x \notag \\ \beta_1 &= \frac{\% \Delta y}{\% \Delta x} \end{align}\]
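One way to see why \(\beta_1\) is an elasticity is to differentiate the log-log model with respect to \(x\). This is only a sketch, and it assumes that \(u\) is held fixed and that \(x\) and \(y\) are strictly positive:

\[\begin{align} \frac{1}{y}\frac{dy}{dx} &= \beta_1 \frac{1}{x} \notag \\ \beta_1 &= \frac{dy}{dx} \cdot \frac{x}{y} \notag \end{align}\]

The right-hand side is the usual definition of the elasticity of \(y\) with respect to \(x\), so it does not depend on the units in which either variable is measured.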

5 Some Examples:

5.1 Wage-Education Model (log-level model)

library(wooldridge)
data(wage1)
wage.logl <- lm(log(wage) ~ educ, data = wage1)
summary(wage.logl)
## 
## Call:
## lm(formula = log(wage) ~ educ, data = wage1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.21158 -0.36393 -0.07263  0.29712  1.52339 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.583773   0.097336   5.998 3.74e-09 ***
## educ        0.082744   0.007567  10.935  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4801 on 524 degrees of freedom
## Multiple R-squared:  0.1858, Adjusted R-squared:  0.1843 
## F-statistic: 119.6 on 1 and 524 DF,  p-value: < 2.2e-16
  • The estimated regression equation is

\[ \widehat{log wage} = \underset{(0.097)}{0.584} + \underset{(0.008)}{0.083} educ \]

  • Multiplied by 100, the slope estimate can be read as the approximate percentage change in wage for a one-year increase in education.

  • An additional year of education is predicted to increase average wages by about 8.3%. This is called the return to another year of education.

  • \(R^2 = 0.186\): Education explains about 18.6% of the variation in log wage.
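The 8.3% reading above relies on the approximation \(\% \Delta y \approx 100 \beta_1 \Delta x\). The short sketch below compares it with the exact change implied by the log-level model, \(100(e^{\beta_1 \Delta x} - 1)\). The slope is typed in by hand from the regression output, and the variable names are illustrative only.

```r
# Slope estimate copied from the summary(wage.logl) output above
b1 <- 0.082744

# Approximate percentage change in wage for one more year of education
approx_pct <- 100 * b1

# Exact percentage change implied by the log-level model
exact_pct <- 100 * (exp(b1) - 1)

round(c(approximate = approx_pct, exact = exact_pct), 2)
```

The exact figure is roughly 8.6%, slightly above the approximation. The gap widens as \(\beta_1 \Delta x\) moves further from zero.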

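# Level-level fit, kept for reference. It is not drawn below because its
# fitted values are in dollars, while the plot's vertical axis is log(wage).
wage.lm <- lm(wage ~ educ, data = wage1)

# Scatter of log(wage) against education (lwage is stored in wage1)
plot(wage1$educ, wage1$lwage,
     col = "steelblue",
     pch = 20,
     main = "Log-level Regression", 
     cex.main = 1,
     ylab = "log(Wage)",
     xlab = "Education")

# Fitted line from the log-level regression
abline(wage.logl, 
       col = "red", 
       lwd = 2)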

5.2 Test Score and Regional Income (level-log model)

\[ Score = \beta_0 + \beta_1 \;log(income) + u \]

library(AER)
data(CASchools)
CASchools$score <- (CASchools$read + CASchools$math) / 2
score.llog<- lm(score ~ log(income), data = CASchools)
summary(score.llog)
## 
## Call:
## lm(formula = score ~ log(income), data = CASchools)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -43.256  -9.050   0.078   8.230  31.214 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  557.832      4.200  132.81   <2e-16 ***
## log(income)   36.420      1.571   23.18   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.62 on 418 degrees of freedom
## Multiple R-squared:  0.5625, Adjusted R-squared:  0.5615 
## F-statistic: 537.4 on 1 and 418 DF,  p-value: < 2.2e-16
  • The estimated regression equation is

\[ \widehat{\text{Score}} = \underset{(4.20)}{557.832} + \underset{(1.571)}{36.420} \; \text{log(income)} \]

  • Interpretation: A 1% increase in income is associated with a \(\left(\frac{36.42}{100} \right) = 0.3642\) point increase in test scores.

    • Equivalently, a 3% increase in income is associated with an increase of about \(3 \times 0.3642 = 1.0926\) points in test scores.
  • \(R^2 = 0.5625\): log(income) can explain about 56.25% of the variation in test scores.
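The "divide by 100" reading is an approximation that works best for small percentage changes in income. The sketch below compares it with the exact change implied by the level-log model, \(\beta_1 [log(x_1) - log(x_0)]\). The slope is copied from the output above, and the percentage changes in `pct_income` are hypothetical values chosen for illustration.

```r
# Slope estimate copied from the summary(score.llog) output above
b1 <- 36.420

# Hypothetical percentage increases in district income
pct_income <- c(1, 3, 10, 50)

# Rule-of-thumb change in test score: (b1 / 100) * % change in income
approx_change <- (b1 / 100) * pct_income

# Exact change implied by the model: b1 * [log(new income) - log(old income)]
exact_change <- b1 * log(1 + pct_income / 100)

data.frame(pct_income, approx_change, exact_change)
```

For a 1% or 3% change the two columns nearly coincide. For larger changes the rule of thumb overstates the predicted gain, because \(log(1 + p) < p\) for any \(p > 0\).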

plot(CASchools$income, CASchools$score,
     col = "steelblue",
     pch = 20,
     xlab = "District Income (thousands of dollars)", 
     ylab = "Test Score",
     main = "Test Score vs. District Income", 
     cex.main = 1)

order_id  <- order(CASchools$income)

lines(CASchools$income[order_id],
      fitted(score.llog)[order_id], 
      col = "red", 
      lwd = 2)

abline(lm(score ~ income, data = CASchools),
       col = "blue", 
       lwd = 2)

5.3 CEO salary and firm performance (log-log model)

\[ log(salary) = \beta_0 + \beta_1\; log(sales) + u \]

data(ceosal1)
salary.loglog <- lm(log(salary) ~ log(sales), data = ceosal1)
summary(salary.loglog)
## 
## Call:
## lm(formula = log(salary) ~ log(sales), data = ceosal1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.01038 -0.28140 -0.02723  0.21222  2.81128 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.82200    0.28834  16.723  < 2e-16 ***
## log(sales)   0.25667    0.03452   7.436  2.7e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5044 on 207 degrees of freedom
## Multiple R-squared:  0.2108, Adjusted R-squared:  0.207 
## F-statistic:  55.3 on 1 and 207 DF,  p-value: 2.703e-12
  • Estimated regression:

\[ \widehat{log(salary)} = \underset{(0.2883)}{4.822} + \underset{(0.0345)}{0.257} \; \text{log(sales)} \]

  • Interpretation: A 1% increase in firm sales is associated with a 0.257% increase in CEO salary. In other words, the estimated elasticity of CEO salary with respect to sales is 0.257.

    • Equivalently, an increase of about 4% in firm sales is associated with an increase of about 1% in CEO salary.
  • \(R^2 = 0.2108\): log(sales) can explain about 21.08% of variation in log(salary).
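To illustrate that the log-log slope measures an elasticity, the sketch below simulates data with a known elasticity and checks that the regression recovers it. Everything here is an assumption made for illustration: the data are simulated rather than taken from `ceosal1`, and the sample size, seed, intercept, and `true_elasticity` are arbitrary.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

n <- 1000
true_elasticity <- 0.25  # assumed value, chosen for illustration

# Simulated, strictly positive regressor and a simulated error term
sales <- exp(rnorm(n, mean = 8, sd = 1))
u     <- rnorm(n, mean = 0, sd = 0.5)

# Constant-elasticity relationship: log(salary) is linear in log(sales)
salary <- exp(5 + true_elasticity * log(sales) + u)

# The slope on log(sales) estimates the elasticity
sim.loglog <- lm(log(salary) ~ log(sales))
coef(sim.loglog)["log(sales)"]
```

The estimated slope should land close to the assumed elasticity, and rescaling `sales` or `salary` (for example, from thousands to millions) would change only the intercept.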

plot(ceosal1$sales, ceosal1$salary,
     col = "steelblue",
     pch = 20,
     cex.main = 1,
     xlab = "Sales",
     ylab = "Salary")

abline(lm(salary ~ sales, 
          data = ceosal1),
       col = "red", 
       lwd = 2)

plot(log(salary) ~ log(sales), 
     col = "steelblue",
     pch = 20,
     data = ceosal1,
     main = "Log-Log Regression Fit", 
     cex.main = 1)
abline(salary.loglog, 
       col = "red", 
       lwd = 2)

6 Functional Forms using Natural Logarithms: Summary