Duration: 2 sessions of 45 minutes each
Dataset: California Test Score Data (420 school districts)
Variables:
TestScore = average test score (dependent variable)
Income = average district income (thousands of dollars)
STR = student‑teacher ratio
PctEL = percentage of English learners in the district
Lunch = percentage of students eligible for subsidized lunch (measure of economic disadvantage)
Objective: Students will learn to model nonlinear relationships using polynomials and logarithms, interpret coefficients, test for nonlinearity using F‑tests, compute exact vs. approximate effects, and understand multicollinearity through threshold models.
Focus: TestScore vs. Income.
Population model:
\[ \text{TestScore}_i = \beta_1 + \beta_2 \text{Income}_i + \beta_3 \text{Income}_i^2 + \varepsilon_i \]
Here \(\beta_1\) is the intercept, \(\beta_2\) the coefficient on Income, \(\beta_3\) the coefficient on Income², and \(\varepsilon_i\) is the error term.
The marginal effect of a change in Income on the expected TestScore is given by the derivative of the regression function with respect to Income:
\[ \frac{\partial E(\text{TestScore} \mid \text{Income})}{\partial \text{Income}} = \beta_2 + 2\beta_3 \text{Income} \]
Explanation:
Create Income_sq = Income^2. Then regress TestScore on Income and Income_sq.
Example output (Stock & Watson, Equation 8.2):
| Coefficient | Estimate | Std. Error | t‑ratio | p‑value |
|---|---|---|---|---|
| \(\hat{\beta}_1\) (const) | 607.3 | 3.05 | 199.4 | <0.0001 |
| \(\hat{\beta}_2\) (Income) | 3.85 | 0.30 | 12.7 | <0.0001 |
| \(\hat{\beta}_3\) (Income²) | -0.0423 | 0.0062 | -6.76 | <0.0001 |
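To make the marginal-effect formula concrete, the short sketch below plugs the estimates from the table above into \(\hat{\beta}_2 + 2\hat{\beta}_3 \text{Income}\). The function names are illustrative choices, not part of GRETL or the textbook.

```python
# Estimates copied from the quadratic regression table above.
B2_HAT = 3.85      # coefficient on Income
B3_HAT = -0.0423   # coefficient on Income^2


def marginal_effect(income: float) -> float:
    """Slope of the fitted quadratic at a given income (in $1,000s)."""
    return B2_HAT + 2 * B3_HAT * income


def turning_point() -> float:
    """Income level at which the fitted slope equals zero."""
    return -B2_HAT / (2 * B3_HAT)


for income in (10, 40):
    slope = marginal_effect(income)
    print(f"Income = {income}: slope = {slope:.3f} points per $1,000")
print(f"Fitted slope reaches zero at Income = {turning_point():.1f}")
```

The slope falls from about 3.0 points per $1,000 at Income = 10 to about 0.47 at Income = 40, which is the diminishing effect that the negative coefficient on Income² encodes.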
The cubic model adds Income³:
\[ \text{TestScore}_i = \beta_1 + \beta_2 \text{Income}_i + \beta_3 \text{Income}_i^2 + \beta_4 \text{Income}_i^3 + \varepsilon_i \]
| Coefficient | Estimate | Std. Error | t‑ratio | p‑value |
|---|---|---|---|---|
| const | 600.1 | 5.8 | 102.9 | <0.0001 |
| Income | 5.02 | 0.86 | 5.84 | <0.0001 |
| Income² | -0.096 | 0.037 | -2.56 | 0.0107 |
| Income³ | 0.00069 | 0.00047 | 1.45 | 0.1471 |
Why include cubic if improvement is small?
When we add both quadratic and cubic terms, we may want to test jointly whether they are both zero. The null hypothesis is:
\[ H_0: \beta_3 = 0 \text{ and } \beta_4 = 0 \quad \text{(linear model)} \] \[ H_1: \text{at least one of } \beta_3, \beta_4 \neq 0 \quad \text{(nonlinear)} \]
The F‑test is used for such joint hypotheses.
In GRETL: test whether Income_sq and Income_cu can be jointly omitted from the cubic model. GRETL reports the F‑statistic and p‑value. The restrict command handles custom linear hypotheses.
Example: Testing whether the coefficients on STR² and STR³ are jointly zero gives F = 6.17, p < 0.001 → reject linearity.
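The mechanics of the joint test can be sketched outside GRETL. The example below computes the homoskedasticity-only F‑statistic from the restricted (linear) and unrestricted (cubic) sums of squared residuals. It uses synthetic data generated inside the script, not the California data, and the helper names `ssr` and `joint_f_stat` are assumptions made for illustration. GRETL's reported statistic may differ if robust standard errors are selected.

```python
import numpy as np


def ssr(y, X):
    """Sum of squared OLS residuals from regressing y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def joint_f_stat(y, X_restricted, X_unrestricted):
    """Homoskedasticity-only F-statistic for the q extra regressors."""
    n, k = X_unrestricted.shape
    q = k - X_restricted.shape[1]
    ssr_r = ssr(y, X_restricted)
    ssr_u = ssr(y, X_unrestricted)
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - k))


# Synthetic data (NOT the California sample): concave relationship plus noise.
rng = np.random.default_rng(0)
income = rng.uniform(5, 55, size=420)
score = 607 + 3.9 * income - 0.04 * income**2 + rng.normal(0, 9, size=420)

const = np.ones_like(income)
X_linear = np.column_stack([const, income])                       # H0: linear
X_cubic = np.column_stack([const, income, income**2, income**3])  # H1: cubic

F = joint_f_stat(score, X_linear, X_cubic)
print(f"F = {F:.1f} with (2, {len(score) - 4}) degrees of freedom")
```

A large F relative to the critical value of the F(2, n − 4) distribution leads to rejecting the linear specification, which is the same decision rule applied to the GRETL output.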
For a change from \(X_0\) to \(X_0 + \Delta X\),
\[ \Delta \widehat{\text{TestScore}} = \hat{f}(X_0 + \Delta X) - \hat{f}(X_0) \]
For the quadratic model: \(\hat{f}(X) = \hat{\beta}_1 + \hat{\beta}_2 X + \hat{\beta}_3 X^2\).
Example: Increase from $10,000 to $11,000 (Income from 10 to 11):
\[ \begin{aligned} \hat{f}(10) &= 607.3 + 3.85\times 10 - 0.0423\times 10^2 = 607.3 + 38.5 - 4.23 = 641.57 \\ \hat{f}(11) &= 607.3 + 3.85\times 11 - 0.0423\times 121 = 607.3 + 42.35 - 5.118 = 644.53 \\ \Delta \hat{Y} &= 644.53 - 641.57 = 2.96 \text{ points} \end{aligned} \]
From $40,000 to $41,000 (Income 40 → 41):
\[ \hat{f}(40) = 607.3 + 154 - 67.68 = 693.62,\quad \hat{f}(41) = 607.3 + 157.85 - 71.11 = 694.04 \] \[ \Delta \hat{Y} = 0.42 \text{ points} \]
Conclusion: The same dollar increase has a much larger effect in poor districts.
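The two calculations above can be reproduced with a few lines of code. This is a minimal sketch using the quadratic estimates from the text; `f_hat` and `predicted_change` are illustrative names.

```python
# Quadratic estimates from the text: const, Income, Income^2.
B1_HAT, B2_HAT, B3_HAT = 607.3, 3.85, -0.0423


def f_hat(income: float) -> float:
    """Predicted test score at a given income (in $1,000s)."""
    return B1_HAT + B2_HAT * income + B3_HAT * income**2


def predicted_change(x0: float, dx: float = 1.0) -> float:
    """Exact predicted change when income moves from x0 to x0 + dx."""
    return f_hat(x0 + dx) - f_hat(x0)


for x0 in (10, 40):
    print(f"Income {x0} -> {x0 + 1}: change = {predicted_change(x0):.2f} points")
```

This prints 2.96 points for the move from 10 to 11 and 0.42 points for the move from 40 to 41, matching the hand calculations.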
Many economic relationships are best expressed in percentage terms.
Key fact: For small changes, \(\ln(X+\Delta X) - \ln(X) \approx \Delta X / X\) (the proportional change; multiply by 100 for the percentage change).
We will use the exact percentage change formula throughout:
\[ \text{Percentage change in } Y = 100 \times (e^{\beta \cdot \Delta X} - 1)\% \]
For a binary or one‑unit change, this becomes \(100 \times (e^{\beta} - 1)\%\).
| Case | Specification | Interpretation of \(\beta_2\) (when \(\Delta X = 1\) or 1%) |
|---|---|---|
| Linear‑log | \(Y_i = \beta_1 + \beta_2 \ln(X_i) + \varepsilon_i\) | A 1% increase in \(X\) → change in \(Y\) = \(0.01\beta_2\) (in units of \(Y\)) |
| Log‑linear | \(\ln(Y_i) = \beta_1 + \beta_2 X_i + \varepsilon_i\) | A one‑unit increase in \(X\) → \(100 \times (e^{\beta_2} - 1)\%\) change in \(Y\) |
| Log‑log | \(\ln(Y_i) = \beta_1 + \beta_2 \ln(X_i) + \varepsilon_i\) | A 1% increase in \(X\) → \(\beta_2\%\) change in \(Y\) (elasticity) |
Important: For the log‑linear model, the coefficient \(\beta_2\) is not the percentage change directly. Use the exact formula.
The linear‑log model yields an exact calculation because we use the difference in logs: \(\Delta Y = \beta_2 [\ln(X+\Delta X) - \ln(X)]\). No approximation is needed. The approximation \(\Delta Y \approx \beta_2 (\Delta X / X)\) is only valid for small \(\Delta X / X\).
\[ \text{TestScore}_i = \beta_1 + \beta_2 \ln(\text{Income}_i) + \varepsilon_i \]
Estimated equation:
\[ \widehat{\text{TestScore}} = 557.8 + 36.42 \ln(\text{Income}) \]
Example: Income rises from 10 to 11 ($10,000 to $11,000):
\[ \Delta \hat{Y} = 36.42 \times [\ln(11) - \ln(10)] = 36.42 \times 0.09531 = 3.47 \text{ points} \]
This is exact. The approximation using \(\Delta X / X = 0.1\) would give \(36.42 \times 0.1 = 3.64\) points – slightly off (4.9% error).
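A short sketch of the exact and approximate linear‑log calculations, using the estimated coefficient 36.42 from the equation above. The function names are illustrative, and the second income pair (40 to 41) is added here only to show how the gap shrinks when \(\Delta X / X\) is smaller.

```python
import math

B2_HAT = 36.42  # coefficient on ln(Income) from the estimated equation


def exact_change(x0: float, x1: float) -> float:
    """Exact predicted change: beta2 * [ln(x1) - ln(x0)]."""
    return B2_HAT * (math.log(x1) - math.log(x0))


def approx_change(x0: float, x1: float) -> float:
    """Small-change approximation: beta2 * (x1 - x0) / x0."""
    return B2_HAT * (x1 - x0) / x0


for x0, x1 in [(10, 11), (40, 41)]:
    exact = exact_change(x0, x1)
    approx = approx_change(x0, x1)
    gap = 100 * (approx - exact) / exact
    print(f"{x0} -> {x1}: exact = {exact:.3f}, approx = {approx:.3f}, gap = {gap:.1f}%")
```

For the 10 to 11 move the approximation overstates the exact change by about 4.9%; for 40 to 41, where the proportional change is only 2.5%, the gap is much smaller.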
\[ \ln(\text{TestScore}_i) = \beta_1 + \beta_2 \text{Income}_i + \varepsilon_i \]
Estimated: \(\widehat{\ln(\text{TestScore})} = 6.439 + 0.00284 \text{Income}\).
A $1,000 increase in income → exact percentage change = \(100 \times (e^{0.00284} - 1) \approx 0.284\%\).
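The sketch below evaluates the exact log‑linear formula and compares it with the shortcut \(100 \times \beta_2 \Delta X\). The coefficient is the estimate quoted above; the larger change of 50 (that is, $50,000) is a hypothetical value chosen only to show when the shortcut starts to drift.

```python
import math

B2_HAT = 0.00284  # coefficient on Income in the log-linear model


def exact_pct_change(beta: float, dx: float = 1.0) -> float:
    """Exact percentage change in Y for a change dx in X: 100 * (e^(beta*dx) - 1)."""
    return 100 * math.expm1(beta * dx)


def shortcut_pct_change(beta: float, dx: float = 1.0) -> float:
    """Approximation that reads 100 * beta * dx directly as a percentage."""
    return 100 * beta * dx


for dx in (1, 50):  # dx = 50 is a hypothetical large change, for contrast
    print(f"dx = {dx}: exact = {exact_pct_change(B2_HAT, dx):.3f}%, "
          f"shortcut = {shortcut_pct_change(B2_HAT, dx):.3f}%")
```

With a coefficient this small, the two agree to three decimals for a one‑unit change, but they separate by roughly a percentage point for the large change.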
\[ \ln(\text{TestScore}_i) = \beta_1 + \beta_2 \ln(\text{Income}_i) + \varepsilon_i \]
Estimated: \(\widehat{\ln(\text{TestScore})} = 6.336 + 0.0554 \ln(\text{Income})\).
A 1% increase in income → test scores increase by 0.0554% (the elasticity).
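Reading the elasticity as "a 1% change in X gives a \(\beta_2\)% change in Y" is itself a small-change approximation. The sketch below computes the exact log‑log effect, \(100 \times [(1 + p)^{\beta_2} - 1]\) for a proportional increase \(p\) in X, using the estimate 0.0554. The doubling scenario is a hypothetical added for contrast.

```python
import math

ELASTICITY = 0.0554  # coefficient on ln(Income) in the log-log model


def exact_pct_change_loglog(beta: float, pct_increase_x: float) -> float:
    """Exact % change in Y when X rises by pct_increase_x percent."""
    ratio = 1 + pct_increase_x / 100
    return 100 * math.expm1(beta * math.log(ratio))


for pct in (1, 100):  # 100% (a doubling of income) is a hypothetical case
    exact = exact_pct_change_loglog(ELASTICITY, pct)
    print(f"{pct}% rise in Income: exact = {exact:.4f}%, "
          f"elasticity shortcut = {ELASTICITY * pct:.4f}%")
```

For a 1% change the exact figure is about 0.0551%, essentially the elasticity; for a doubling of income the exact effect is about 3.9%, noticeably below the 5.54% that the shortcut would suggest.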
Elasticity is a unit‑free measure. Unlike a slope that depends on the units of measurement (e.g., points per thousand dollars), elasticity compares percentage changes, so it works regardless of whether income is measured in dollars, thousands of dollars, or euros.
0.0554 is a small elasticity: test scores are not very responsive to income in percentage terms. But recall that test scores themselves are in points, not percentages – the log‑log model here is less natural than the linear‑log model for this application.
Now we consider a different type of nonlinearity: discontinuous jumps at certain thresholds. An educator claims:
Define dummy variables:
\[ \begin{aligned} \text{STRsmall}_i &= 1 \text{ if STR}_i < 20, \text{ else } 0 \\ \text{STRmoderate}_i &= 1 \text{ if } 20 \leq \text{STR}_i \leq 25, \text{ else } 0 \\ \text{STRlarge}_i &= 1 \text{ if STR}_i > 25, \text{ else } 0 \end{aligned} \]
In GRETL, create each dummy with Add → Define new variable, enter the expression, and click OK.
For STRsmall: strsmall = (str < 20)
For STRmoderate: strmoderate = (str >= 20) && (str <= 25)
For STRlarge: strlarge = (str > 25)
After creating all three, verify that each observation has exactly one dummy equal to 1 by summing them:
check = strsmall + strmoderate + strlarge
Then inspect the result (Variable → Summary statistics for check). The mean should be 1, and the min/max should also be 1.
\[ \text{TestScore}_i = \beta_1 + \beta_2 \text{STRsmall}_i + \beta_3 \text{STRlarge}_i + \varepsilon_i \]
The omitted category is STRmoderate (reference).
To match the educator’s story: \(\beta_2 > 0\) (small classes, higher scores), \(\beta_3 < 0\) (large classes, lower scores).
The regression function is three horizontal lines with jumps at STR = 20 and 25.
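The "three horizontal lines" can be seen directly in a small example. The sketch below builds the three dummies, checks that they sum to 1, and runs the regression with STRmoderate omitted. The eight observations are hypothetical values invented for illustration, not California districts.

```python
import numpy as np

# Hypothetical data for illustration only (not the California sample).
str_ratio = np.array([15.0, 18.0, 19.5, 20.0, 22.0, 25.0, 25.5, 27.0])
score = np.array([670.0, 664.0, 660.0, 655.0, 652.0, 650.0, 641.0, 639.0])

small = (str_ratio < 20).astype(float)
moderate = ((str_ratio >= 20) & (str_ratio <= 25)).astype(float)
large = (str_ratio > 25).astype(float)

# Each observation must fall in exactly one category.
assert np.all(small + moderate + large == 1)

# Regression with a constant, STRsmall and STRlarge (STRmoderate omitted).
X = np.column_stack([np.ones_like(score), small, large])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)

mean_small = score[small == 1].mean()
mean_moderate = score[moderate == 1].mean()
mean_large = score[large == 1].mean()

print("coefficients:", beta.round(3))
print("group means :", mean_small.round(3), mean_moderate.round(3), mean_large.round(3))
```

In this sketch the fitted constant equals the mean score of the moderate group, and each dummy coefficient equals the difference between that group's mean and the moderate group's mean, so the fitted values are three flat levels.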
If we try to include all three dummies plus a constant:
\[ \text{TestScore}_i = \beta_1 + \beta_2 \text{STRsmall}_i + \beta_3 \text{STRmoderate}_i + \beta_4 \text{STRlarge}_i + \varepsilon_i \]
For every observation, exactly one dummy equals 1, so:
\[ \text{STRsmall}_i + \text{STRmoderate}_i + \text{STRlarge}_i = 1 \]
This means the constant term (which is always 1) is a perfect linear combination of the three dummies. This is perfect multicollinearity. The OLS estimator cannot compute unique coefficients; the software returns an error message.
Solution: Omit one dummy (e.g., STRmoderate). That’s why part (a) works.
The dummy variable trap is a classic example of perfect multicollinearity: one regressor is an exact linear function of others. Perfect multicollinearity prevents OLS estimation. It often arises from including all categories of a categorical variable along with an intercept.
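One way to see the trap numerically is to check the rank of the regressor matrix: OLS needs the rank to equal the number of columns. The sketch below uses hypothetical STR values and `numpy.linalg.matrix_rank`; it is an illustration of the linear dependence, not a description of how GRETL detects it.

```python
import numpy as np

# Hypothetical student-teacher ratios, for illustration only.
str_ratio = np.array([15.0, 18.0, 19.5, 20.0, 22.0, 25.0, 25.5, 27.0])

const = np.ones_like(str_ratio)
small = (str_ratio < 20).astype(float)
moderate = ((str_ratio >= 20) & (str_ratio <= 25)).astype(float)
large = (str_ratio > 25).astype(float)

X_trap = np.column_stack([const, small, moderate, large])  # all three dummies
X_ok = np.column_stack([const, small, large])              # STRmoderate omitted

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)

print(f"trap: rank {rank_trap} with {X_trap.shape[1]} columns")
print(f"ok:   rank {rank_ok} with {X_ok.shape[1]} columns")
```

With all three dummies plus the constant, the matrix has four columns but rank 3, so the coefficients are not uniquely determined. Dropping one dummy restores full column rank.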
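PctEL (percentage of English learners) and FracEL (fraction of English learners) are perfectly collinear because PctEL = 100 * FracEL.
A regressor that takes the same value in every observation is another case (NVSi).
What is NVSi?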
In Stock & Watson, NVSi stands for “not very small” classes. It is a binary variable that equals 1 if the student‑teacher ratio is 12 or more, and 0 otherwise. However, in the California data set, the smallest STR is 14, so NVSi = 1 for every district. Thus NVSi is perfectly collinear with the constant term (since the constant is also 1 for all observations). Trying to include NVSi along with an intercept causes perfect multicollinearity.
PctEL and PctES (percentage of English speakers) are perfectly collinear once a constant is included, because PctES = 100 - PctEL. This also involves the constant.
For a pair such as PctEL and FracEL, keep only one of them.
In GRETL, perfect multicollinearity triggers an error message. The software may automatically drop one variable, but you should understand why and make a conscious choice.
Imperfect multicollinearity occurs when two or more regressors are highly correlated but not perfectly. This does not prevent estimation, but it inflates standard errors, making coefficients imprecisely estimated.
Example: PctEL and PctImmigrants (percentage of first‑generation immigrants). These are positively correlated because immigrant families often have children learning English.
In GRETL: Look for high pairwise correlations (View → Correlation matrix). Also check variance inflation factors (VIF) after regression (Analysis → Collinearity diagnostics). A VIF above 10 (or 5, depending on context) is commonly taken to indicate problematic multicollinearity.
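The VIF for regressor \(j\) is \(1 / (1 - R_j^2)\), where \(R_j^2\) comes from regressing that regressor on all the others plus a constant. The sketch below computes it by hand on synthetic data. The variables `pct_el`, `pct_imm` and `income` are simulated stand-ins, not the California data, and the `vif` helper is an illustrative implementation rather than GRETL's routine.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    """VIF for each column of X (regressors only; no constant column)."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        # Regress column j on a constant and the remaining columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        tss = (y - y.mean()) @ (y - y.mean())
        r2 = 1 - (resid @ resid) / tss
        out.append(1 / (1 - r2))
    return np.array(out)


# Synthetic stand-ins (NOT real data): pct_imm is built to track pct_el closely.
rng = np.random.default_rng(1)
pct_el = rng.uniform(0, 80, size=420)
pct_imm = 0.9 * pct_el + rng.normal(0, 5, size=420)
income = rng.uniform(5, 55, size=420)  # generated independently of the others

vifs = vif(np.column_stack([pct_el, pct_imm, income]))
print("VIF (pct_el, pct_imm, income):", vifs.round(1))
```

The two highly correlated regressors show large VIFs while the independent one stays close to 1, which is the pattern to look for in the GRETL collinearity diagnostics.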
Why it matters: Imperfect multicollinearity is not a mistake – it is a data limitation. You may need a larger sample or a different research design to precisely estimate the separate effects.
| Model | Dependent variable | R² (adj) | Interpretation |
|---|---|---|---|
| Linear | TestScore | 0.508 | Constant slope – too rigid |
| Quadratic | TestScore | 0.554 | Slope decreases with income |
| Cubic | TestScore | 0.555 | Slight improvement over quadratic |
| Linear‑log | TestScore | 0.561 | Best fit among level models |
| Log‑log | ln(TestScore) | 0.557 (on log scale) | Elasticity interpretation |
Practical advice:
Using the California test score data, estimate quadratic and cubic regressions of TestScore on Income. Perform an F‑test for the joint significance of Income² and Income³. Which model do you prefer?
Estimate the linear‑log model: TestScore on ln(Income). Compute the exact predicted change in test scores when income increases from $15,000 to $20,000. Compare with the approximation using \(0.01\beta_2 \times\) percentage change.
Create dummy variables for STR thresholds: STRsmall (STR < 20), STRmoderate (20 ≤ STR ≤ 25), STRlarge (STR > 25). Run the regression with STRsmall and STRlarge only. Interpret the coefficients. What does the constant represent?
(Multicollinearity) Create a new variable PctEL2 = 2 * PctEL. Run a regression of TestScore on PctEL and PctEL2. What happens? Why? Then run a regression of TestScore on PctEL and PctEL2 without a constant. Does it work? Explain.
(Perfect multicollinearity) Create a variable NVSi that equals 1 for all observations. Add it to the regression of TestScore on Income. What happens? Why?