Session Overview


Session 1: Polynomial Regression

1.1 Why Nonlinear? The Test Score – Income Relationship

  • Look at the scatterplot of TestScore vs. Income.
  • A straight line over‑predicts at very low and very high incomes, under‑predicts in the middle → curvature.
  • The effect of a $1,000 increase in income is larger for poor districts than for rich districts.

1.2 Quadratic Regression Model

Population model:

\[ \text{TestScore}_i = \beta_1 + \beta_2 \text{Income}_i + \beta_3 \text{Income}_i^2 + \varepsilon_i \]

Here \(\beta_1\) is the intercept, \(\beta_2\) the coefficient on Income, \(\beta_3\) the coefficient on Income², and \(\varepsilon_i\) is the error term.

The Slope Depends on Income

The marginal effect of a change in Income on the expected TestScore is given by the derivative of the regression function with respect to Income:

\[ \frac{\partial E(\text{TestScore} \mid \text{Income})}{\partial \text{Income}} = \beta_2 + 2\beta_3 \text{Income} \]

Explanation:

  • For a linear model (\(\beta_3 = 0\)), the slope is constant (\(\beta_2\)).
  • For a quadratic model, the slope changes linearly with Income.
  • If \(\beta_3\) is negative, the slope decreases as Income increases (diminishing returns).
  • For example, at Income = 10 (thousand dollars), the slope is \(\beta_2 + 20\beta_3\). At Income = 40, the slope is \(\beta_2 + 80\beta_3\). Since \(\beta_3\) is negative, the slope is smaller at higher incomes.
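
Plugging in the estimates reported in Section 1.2 below (\(\hat{\beta}_2 = 3.85\), \(\hat{\beta}_3 = -0.0423\)):

\[ \hat{\beta}_2 + 2\hat{\beta}_3 \times 10 = 3.85 - 0.846 \approx 3.00, \qquad \hat{\beta}_2 + 2\hat{\beta}_3 \times 40 = 3.85 - 3.384 \approx 0.47 \]

So an extra $1,000 of income is worth about 3 points in a low‑income district but less than half a point in a high‑income one.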

Estimation (OLS in GRETL)

Create Income_sq = Income^2. Then regress TestScore on Income and Income_sq.
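
In script form (a minimal sketch, assuming the dataset is loaded and the variables are named testscore and income):

# generate the squared term, then run OLS with a constant
series income_sq = income^2
ols testscore const income income_sq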

Example output (Stock & Watson, Equation 8.2):

Coefficient                  Estimate   Std. Error   t‑ratio   p‑value
\(\hat{\beta}_1\) (const)    607.3      3.05         199.4     <0.0001
\(\hat{\beta}_2\) (Income)   3.85       0.30         12.7      <0.0001
\(\hat{\beta}_3\) (Income²)  -0.0423    0.0062       -6.76     <0.0001

Testing for nonlinearity

  • \(H_0: \beta_3 = 0\) (linear) vs. \(H_1: \beta_3 \neq 0\) (quadratic).
  • t = -6.76 → reject \(H_0\) → quadratic model is better than linear.

1.3 Cubic Regression

The cubic model adds Income³:

\[ \text{TestScore}_i = \beta_1 + \beta_2 \text{Income}_i + \beta_3 \text{Income}_i^2 + \beta_4 \text{Income}_i^3 + \varepsilon_i \]

Coefficient   Estimate   Std. Error   t‑ratio   p‑value
const         600.1      5.8          102.9     <0.0001
Income        5.02       0.86         5.84      <0.0001
Income²       -0.096     0.037        -2.56     0.0107
Income³       0.00069    0.00047      1.45      0.1471

  • The t‑statistic on the cubic term is 1.45. This is statistically insignificant at the 5% level, so we fail to reject the null that the cubic term is zero.

Why include cubic if improvement is small?

  • To be thorough and test whether a more complex shape is needed.
  • To show that a higher‑order term can add little: here the cubic term is statistically insignificant, and even a significant term may be economically negligible.
  • In practice, we often stop at quadratic unless theory or data strongly support cubic.

1.4 The F‑Test: Testing Multiple Restrictions

When we add both quadratic and cubic terms, we may want to test jointly whether they are both zero. The null hypothesis is:

\[ H_0: \beta_3 = 0 \text{ and } \beta_4 = 0 \quad \text{(linear model)} \] \[ H_1: \text{at least one of } \beta_3, \beta_4 \neq 0 \quad \text{(nonlinear)} \]

The F‑test is used for such joint hypotheses.

  • What is the F‑test? It compares the fit of the restricted model (linear) with the unrestricted model (quadratic or cubic). It measures whether the improvement in fit (reduction in sum of squared residuals) is large relative to the number of extra parameters.
  • When do we use it? Whenever we want to test if a group of variables (e.g., Income² and Income³) together add explanatory power. Also for testing multiple linear restrictions (e.g., in interactions or dummy variables).

In GRETL output:

  • The overall regression F‑statistic (testing that all slopes are zero) appears by default.
  • To test specific restrictions (e.g., \(\beta_3 = \beta_4 = 0\)), use:
    After estimating the unrestricted model, go to Tests → Omit variables and select Income_sq and Income_cu. GRETL reports the F‑statistic and p‑value.
    Or use the restrict command for custom linear hypotheses.
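
A script sketch of both routes, assuming income_sq and income_cu have already been created:

# unrestricted cubic model
ols testscore const income income_sq income_cu
# route 1: omit the higher-order terms; GRETL reports the joint F-test
omit income_sq income_cu

# route 2: the same test via restrict (re-estimate first, since omit replaced the model)
ols testscore const income income_sq income_cu
restrict
  b[income_sq] = 0
  b[income_cu] = 0
end restrict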

Example
Testing whether the coefficients on STR² and STR³ are jointly zero gives F = 6.17, p < 0.001 → reject linearity.

1.5 Computing the Effect of a Change in Income

For a change from \(X_0\) to \(X_0 + \Delta X\),

\[ \Delta \widehat{\text{TestScore}} = \hat{f}(X_0 + \Delta X) - \hat{f}(X_0) \]

For the quadratic model: \(\hat{f}(X) = \hat{\beta}_1 + \hat{\beta}_2 X + \hat{\beta}_3 X^2\).

Example: Increase from $10,000 to $11,000 (Income from 10 to 11):

\[ \begin{aligned} \hat{f}(10) &= 607.3 + 3.85\times 10 - 0.0423\times 10^2 = 607.3 + 38.5 - 4.23 = 641.57 \\ \hat{f}(11) &= 607.3 + 3.85\times 11 - 0.0423\times 121 = 607.3 + 42.35 - 5.118 = 644.53 \\ \Delta \hat{Y} &= 644.53 - 641.57 = 2.96 \text{ points} \end{aligned} \]

From $40,000 to $41,000 (Income 40 → 41):

\[ \hat{f}(40) = 607.3 + 154 - 67.68 = 693.62,\quad \hat{f}(41) = 607.3 + 157.85 - 71.11 = 694.04 \] \[ \Delta \hat{Y} = 0.42 \text{ points} \]
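
These hand calculations are easy to reproduce in a short GRETL script, typing in the coefficients from the output above:

scalar b1 = 607.3
scalar b2 = 3.85
scalar b3 = -0.0423
printf "10 -> 11: %.2f points\n", (b1 + b2*11 + b3*11^2) - (b1 + b2*10 + b3*10^2)
printf "40 -> 41: %.2f points\n", (b1 + b2*41 + b3*41^2) - (b1 + b2*40 + b3*40^2)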

Conclusion: The same dollar increase has a much larger effect in poor districts.


Session 2: Logarithmic Models, Threshold Effects, and Multicollinearity

2.1 Why Logarithms?

Many economic relationships are best expressed in percentage terms.
Key fact: For small changes, \(\ln(X+\Delta X) - \ln(X) \approx \Delta X / X\) (the proportional change; multiply by 100 to get the percentage change).

When the dependent variable is in logs, we will use the exact percentage change formula throughout:

\[ \%\Delta Y = 100 \times (e^{\beta \cdot \Delta X} - 1) \]

For a binary regressor or a one‑unit change, this becomes \(100 \times (e^{\beta} - 1)\).
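
To see why the exact formula matters, take an illustrative coefficient (not from our data) of \(\beta = 0.05\) with \(\Delta X = 1\):

\[ 100 \times (e^{0.05} - 1) \approx 5.13\%, \quad \text{while the naive reading } 100 \times 0.05 = 5\% \text{ understates the change.} \]

The gap grows with the size of \(\beta \cdot \Delta X\).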

2.2 Three Logarithmic Specifications

Each specification gives \(\beta_2\) a different interpretation (for \(\Delta X\) = 1 unit or 1%):

  • Linear‑log: \(Y_i = \beta_1 + \beta_2 \ln(X_i) + \varepsilon_i\). A 1% increase in \(X\) → change in \(Y\) of \(0.01\beta_2\) (in units of \(Y\)).
  • Log‑linear: \(\ln(Y_i) = \beta_1 + \beta_2 X_i + \varepsilon_i\). A one‑unit increase in \(X\) → \(100 \times (e^{\beta_2} - 1)\%\) change in \(Y\).
  • Log‑log: \(\ln(Y_i) = \beta_1 + \beta_2 \ln(X_i) + \varepsilon_i\). A 1% increase in \(X\) → approximately \(\beta_2\%\) change in \(Y\) (elasticity).

Important: For the log‑linear model, the coefficient \(\beta_2\) is not the percentage change directly. Use the exact formula.

Exact vs. Approximate Calculations

  • Exact calculation uses the exponential function: \(\%\Delta Y = 100 \times (e^{\beta_2 \Delta X} - 1)\). This is always correct when the dependent variable is \(\ln(Y)\).
  • Approximate calculation uses \(\%\Delta Y \approx 100 \times \beta_2 \Delta X\). This is reliable only when \(\beta_2 \Delta X\) is close to zero.

Exact Calculation in Linear‑Log Models

The linear‑log model yields an exact calculation because we use the difference in logs:
\(\Delta Y = \beta_2 [\ln(X+\Delta X) - \ln(X)]\). No approximation is needed. The approximation \(\Delta Y \approx \beta_2 (\Delta X / X)\) is only valid for small \(\Delta X / X\).

2.3 Application to Test Scores and Income

Linear‑Log Model

\[ \text{TestScore}_i = \beta_1 + \beta_2 \ln(\text{Income}_i) + \varepsilon_i \]

Estimated equation

\[ \widehat{\text{TestScore}} = 557.8 + 36.42 \ln(\text{Income}) \]

  • A 1% increase in income → test scores increase by roughly \(0.01 \times 36.42 = 0.364\) points (exactly, \(36.42 \times \ln(1.01) \approx 0.362\)).
  • To compute the effect of a $1,000 increase from $10k to $11k (a 10% increase):

\[ \Delta \hat{Y} = 36.42 \times [\ln(11) - \ln(10)] = 36.42 \times 0.09531 = 3.47 \text{ points} \]

This is exact. The approximation using \(\Delta X / X = 0.1\) would give \(36.42 \times 0.1 = 3.64\) points – slightly off (4.9% error).
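
A script sketch of this calculation (same assumed variable names as before):

# linear-log regression
series l_income = log(income)
ols testscore const l_income
# exact effect of moving from income = 10 to income = 11
scalar dyhat = $coeff(l_income) * (log(11) - log(10))
printf "Predicted change: %.2f points\n", dyhat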

Log‑Linear Model (less common for test scores)

\[ \ln(\text{TestScore}_i) = \beta_1 + \beta_2 \text{Income}_i + \varepsilon_i \]

Estimated: \(\widehat{\ln(\text{TestScore})} = 6.439 + 0.00284 \text{Income}\).
A $1,000 increase in income → exact percentage change = \(100 \times (e^{0.00284} - 1) \approx 0.284\%\).

Log‑Log Model

\[ \ln(\text{TestScore}_i) = \beta_1 + \beta_2 \ln(\text{Income}_i) + \varepsilon_i \]

Estimated \(\widehat{\ln(\text{TestScore})} = 6.336 + 0.0554 \ln(\text{Income})\).
A 1% increase in income → test scores increase by 0.0554% (the elasticity).

  • Elasticity is a unit‑free measure. Unlike a slope that depends on the units of measurement (e.g., points per thousand dollars), elasticity compares percentage changes, so it works regardless of whether income is measured in dollars, thousands of dollars, or euros.

  • 0.0554 is a small elasticity: test scores are not very responsive to income in percentage terms. But recall that test scores themselves are in points, not percentages – the log‑log model here is less natural than the linear‑log model for this application.

2.4 Threshold Effects

Now we consider a different type of nonlinearity: discontinuous jumps at certain thresholds. An educator claims:

  • STR < 20 → constant high performance
  • 20 ≤ STR ≤ 25 → constant medium performance
  • STR > 25 → constant low performance

Define dummy variables:

\[ \begin{aligned} \text{STRsmall}_i &= 1 \text{ if STR}_i < 20, \text{ else } 0 \\ \text{STRmoderate}_i &= 1 \text{ if } 20 \leq \text{STR}_i \leq 25, \text{ else } 0 \\ \text{STRlarge}_i &= 1 \text{ if STR}_i > 25, \text{ else } 0 \end{aligned} \]

Creating STRsmall, STRmoderate, and STRlarge Dummies in GRETL

For STRsmall:

  1. Go to Add → Define new variable.
  2. In the expression box, type:

strsmall = (str < 20)

  3. Click OK. GRETL creates a dummy variable equal to 1 if the condition is true, 0 otherwise.

For STRmoderate:

  1. Add → Define new variable.
  2. Expression:

strmoderate = (str >= 20) && (str <= 25)

  3. Click OK.

For STRlarge:

  1. Add → Define new variable.

  2. Expression: strlarge = (str > 25)

  3. Click OK.

After creating all three, verify that each observation has exactly one dummy equal to 1 by summing them:

  • Add → Define new variable: check = strsmall + strmoderate + strlarge
  • View summary statistics (Variable → Summary statistics for check). The mean should be 1, and the min/max should also be 1.
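
The menu steps above condense to a short script (assuming the variable is named str):

series strsmall    = (str < 20)
series strmoderate = (str >= 20) && (str <= 25)
series strlarge    = (str > 25)
# sanity check: the three dummies should sum to 1 for every district
series check = strsmall + strmoderate + strlarge
summary check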

Part (a): Regression with two dummies

\[ \text{TestScore}_i = \beta_1 + \beta_2 \text{STRsmall}_i + \beta_3 \text{STRlarge}_i + \varepsilon_i \]

  • Omitted category: STRmoderate (reference).
  • Moderate STR → predicted score = \(\beta_1\)
  • Small STR → \(\beta_1 + \beta_2\)
  • Large STR → \(\beta_1 + \beta_3\)

To match the educator’s story: \(\beta_2 > 0\) (small classes score higher), \(\beta_3 < 0\) (large classes score lower).
The regression function is three horizontal lines with jumps at STR = 20 and 25.
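
In script form, part (a) is a single command (variable names as created above):

# STRmoderate is the omitted reference category
ols testscore const strsmall strlarge
# const    = predicted score for moderate classes (beta_1)
# strsmall = small-vs-moderate difference (beta_2, expected > 0)
# strlarge = large-vs-moderate difference (beta_3, expected < 0)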

Part (b): The dummy variable trap

If we try to include all three dummies plus a constant:

\[ \text{TestScore}_i = \beta_1 + \beta_2 \text{STRsmall}_i + \beta_3 \text{STRmoderate}_i + \beta_4 \text{STRlarge}_i + \varepsilon_i \]

For every observation, exactly one dummy equals 1, so:

\[ \text{STRsmall}_i + \text{STRmoderate}_i + \text{STRlarge}_i = 1 \]

This means the constant term (which equals 1 for every observation) is a perfect linear combination of the three dummies. This is perfect multicollinearity: OLS cannot determine unique coefficient estimates, and the software returns an error message.

Solution: Omit one dummy (e.g., STRmoderate). That’s why part (a) works.
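
You can see the trap directly with a deliberately broken sketch that includes all three dummies plus the constant:

# this fails: strsmall + strmoderate + strlarge = 1 = const in every row
ols testscore const strsmall strmoderate strlarge

Depending on the version, GRETL either stops with an exact‑collinearity error or drops one of the regressors automatically; either way, the four coefficients cannot all be identified.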

2.5 Transition to Multicollinearity

The dummy variable trap is a classic example of perfect multicollinearity: one regressor is an exact linear function of others. Perfect multicollinearity prevents OLS estimation. It often arises from including all categories of a categorical variable along with an intercept.

Perfect Multicollinearity – More Examples

  • Including both PctEL (percentage of English learners) and FracEL (fraction of English learners) because PctEL = 100 * FracEL.
  • Including a binary variable that is constant for all observations (e.g., NVSi).

What is NVSi?

In Stock & Watson, NVSi stands for “not very small” classes. It is a binary variable that equals 1 if the student‑teacher ratio is 12 or more, and 0 otherwise. However, in the California data set, the smallest STR is 14, so NVSi = 1 for every district. Thus NVSi is perfectly collinear with the constant term (since the constant is also 1 for all observations). Trying to include NVSi along with an intercept causes perfect multicollinearity.

  • Including PctEL and PctES (percentage of English speakers), because PctES = 100 - PctEL: together with the constant, PctEL + PctES = 100 × const, an exact linear dependence.

How to Fix Perfect Multicollinearity

  1. Drop one of the redundant variables. For dummy variables, omit one category. For linear dependencies like PctEL and FracEL, keep only one of them.
  2. Drop the constant term (rarely recommended). If you have all categories, you can suppress the intercept. But this makes interpretation less intuitive.
  3. Check your data. Sometimes multicollinearity arises because a variable has no variation (e.g., all observations have the same value). In that case, drop that variable.

In GRETL, perfect multicollinearity triggers an error message. The software may automatically drop one variable, but you should understand why and make a conscious choice.

Imperfect Multicollinearity

Imperfect multicollinearity occurs when two or more regressors are highly correlated but not perfectly. This does not prevent estimation, but it inflates standard errors, making coefficients imprecisely estimated.

  • Example: Including PctEL and PctImmigrants (percentage of first‑generation immigrants). These are positively correlated because immigrant families often have children learning English.
  • The OLS estimates remain unbiased, but the variance of the coefficients becomes large. The t‑statistics may be small even when the true effects are large.

In GRETL: Look for high pairwise correlations (View → Correlation matrix). Also check variance inflation factors (VIF) after regression (Analysis → Collinearity diagnostics). A VIF above 10 (or 5, depending on context) indicates problematic multicollinearity.
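
Script equivalents of those diagnostics (pctel and pctimmig are hypothetical variable names):

corr pctel pctimmig
ols testscore const pctel pctimmig
vif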

Why it matters: Imperfect multicollinearity is not a mistake – it is a data limitation. You may need a larger sample or a different research design to precisely estimate the separate effects.

2.6 Choosing the Best Nonlinear Model

Model        Dependent variable   Adjusted R²         Interpretation
Linear       TestScore            0.508               Constant slope – too rigid
Quadratic    TestScore            0.554               Slope decreases with income
Cubic        TestScore            0.555               Slight improvement over quadratic
Linear‑log   TestScore            0.561               Best fit among level models
Log‑log      ln(TestScore)        0.557 (log scale)   Elasticity interpretation

Note: R² is only comparable across models with the same dependent variable, so the log‑log fit (in logs) cannot be ranked directly against the level models.

Practical advice:

  • For test scores (levels), linear‑log or quadratic are both good.
  • Use F‑tests to decide whether higher‑order polynomials are needed.
  • For percentage‑change interpretations (wages, prices), use log‑linear or log‑log.
  • For threshold effects, use dummy variables but remember to omit one category.

Homework / In‑Class Exercises

  1. Using the California test score data, estimate quadratic and cubic regressions of TestScore on Income. Perform an F‑test for the joint significance of Income² and Income³. Which model do you prefer?

  2. Estimate the linear‑log model: TestScore on ln(Income). Compute the exact predicted change in test scores when income increases from $15,000 to $20,000. Compare with the approximation \(\Delta \hat{Y} \approx 0.01\hat{\beta}_2 \times (\%\Delta \text{Income})\).

  3. Create dummy variables for STR thresholds: STRsmall (STR < 20), STRmoderate (20 ≤ STR ≤ 25), STRlarge (STR > 25). Run the regression with STRsmall and STRlarge only. Interpret the coefficients. What does the constant represent?

  4. (Multicollinearity) Create a new variable PctEL2 = 2 * PctEL. Run a regression of TestScore on PctEL and PctEL2. What happens? Why? Then run a regression of TestScore on PctEL and PctEL2 without a constant. Does it work? Explain.

  5. (Perfect multicollinearity) Create a variable NVSi that equals 1 for all observations. Add it to the regression of TestScore on Income. What happens? Why?