Stat 115s (Introduction to Econometrics)

Lesson 2.3- Further Topics in Linear Regression Analysis

Author

Norberto E. Milla, Jr.

Published

April 3, 2023

1 Effects of Data Scaling on OLS Statistics

  • Changing the units of measurement changes the OLS intercept and slope estimates

  • Why might we change the units of measurement? Mostly for cosmetic purposes, such as reducing the number of zeros in coefficient estimates or making interpretation easier

  • Rescaling data does not change the testing outcomes

  • Rescaling data does not change the significance of coefficient estimates:

    • t statistics do not change
    • R^2 remains the same
    • F test statistic remains the same
    • SSE and SSR do change, however, when the dependent variable is rescaled

1.1 Examples

Consider the following model: bwght = \beta_0 + \beta_1 cigs + \beta_2 faminc + u, where bwght is birth weight measured in ounces, cigs is the number of cigarettes smoked per day, and faminc is family income measured in thousands of US dollars

Let us first fit the model with the variables in their original units.

Code
mod1 <- lm(bwght ~ cigs + faminc, data = bwght)
summary(mod1)

Call:
lm(formula = bwght ~ cigs + faminc, data = bwght)

Residuals:
    Min      1Q  Median      3Q     Max 
-96.061 -11.543   0.638  13.126 150.083 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 116.97413    1.04898 111.512  < 2e-16 ***
cigs         -0.46341    0.09158  -5.060 4.75e-07 ***
faminc        0.09276    0.02919   3.178  0.00151 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 20.06 on 1385 degrees of freedom
Multiple R-squared:  0.0298,    Adjusted R-squared:  0.0284 
F-statistic: 21.27 on 2 and 1385 DF,  p-value: 7.942e-10

Let us fit another model with the unit of measurement for the dependent variable rescaled from ounces to grams.

Code
bwght <- bwght %>% 
  mutate(bwght.grams = bwght*28.3495231)
mod2 <- lm(bwght.grams ~ cigs + faminc, data = bwght)
summary(mod2)

Call:
lm(formula = bwght.grams ~ cigs + faminc, data = bwght)

Residuals:
    Min      1Q  Median      3Q     Max 
-2723.3  -327.2    18.1   372.1  4254.8 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 3316.1608    29.7382 111.512  < 2e-16 ***
cigs         -13.1374     2.5962  -5.060 4.75e-07 ***
faminc         2.6298     0.8275   3.178  0.00151 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 568.8 on 1385 degrees of freedom
Multiple R-squared:  0.0298,    Adjusted R-squared:  0.0284 
F-statistic: 21.27 on 2 and 1385 DF,  p-value: 7.942e-10

Next, we rescale cigs into packs of cigarettes (packs = cigs/20).

Code
bwght <- bwght %>% 
  mutate(packs = cigs/20)
mod3 <- lm(bwght.grams ~ packs + faminc, data = bwght)
summary(mod3)

Call:
lm(formula = bwght.grams ~ packs + faminc, data = bwght)

Residuals:
    Min      1Q  Median      3Q     Max 
-2723.3  -327.2    18.1   372.1  4254.8 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 3316.1608    29.7382 111.512  < 2e-16 ***
packs       -262.7477    51.9232  -5.060 4.75e-07 ***
faminc         2.6298     0.8275   3.178  0.00151 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 568.8 on 1385 degrees of freedom
Multiple R-squared:  0.0298,    Adjusted R-squared:  0.0284 
F-statistic: 21.27 on 2 and 1385 DF,  p-value: 7.942e-10

Lastly, we change faminc to dollars instead of thousands of dollars.

Code
bwght <- bwght %>% 
  mutate(famincdol = faminc*1000)
mod4 <- lm(bwght.grams ~ packs + famincdol, data = bwght)
summary(mod4)

Call:
lm(formula = bwght.grams ~ packs + famincdol, data = bwght)

Residuals:
    Min      1Q  Median      3Q     Max 
-2723.3  -327.2    18.1   372.1  4254.8 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.316e+03  2.974e+01 111.512  < 2e-16 ***
packs       -2.627e+02  5.192e+01  -5.060 4.75e-07 ***
famincdol    2.630e-03  8.275e-04   3.178  0.00151 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 568.8 on 1385 degrees of freedom
Multiple R-squared:  0.0298,    Adjusted R-squared:  0.0284 
F-statistic: 21.27 on 2 and 1385 DF,  p-value: 7.942e-10
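
As a quick check, the coefficients scale exactly as described above: rescaling the dependent variable multiplies every coefficient by the conversion factor, while rescaling a regressor rescales only its own coefficient in the opposite direction. A minimal sketch, assuming the objects mod1, mod2 and mod3 fitted above are still in the workspace:

Code
# Rescaling y (ounces to grams) multiplies every coefficient by 28.3495231
all.equal(unname(coef(mod2)), unname(coef(mod1)) * 28.3495231)

# Rescaling a regressor works in the opposite direction:
# packs = cigs/20, so the packs coefficient is 20 times the cigs coefficient
all.equal(unname(coef(mod3)["packs"]), unname(coef(mod2)["cigs"]) * 20)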

2 Standardized Regression

Comparing the magnitudes of coefficient estimates is not meaningful if one of the following is true:

  1. The variables are measured on the same scale, but it does not make intuitive sense to compare the magnitudes.

  2. The variables are not measured on the same scale.

Define the following variables:

\begin{align} z_y &= \frac{y - \overline{y}}{s_y} \notag \\ z_1 &= \frac{x_1 - \overline{x}_1}{s_1} \notag \\ z_2 &= \frac{x_2 - \overline{x}_2}{s_2} \notag \\ &\vdots \notag \\ z_k &= \frac{x_k - \overline{x}_k}{s_k} \notag \end{align}

Then we have the standardized regression model:

z_y = \beta_1z_1 + \beta_2z_2 + \cdots + \beta_kz_k +u

  • Slope coefficients are known as standardized coefficients or beta coefficients

  • Interpretation: In response to a one standard deviation change in x_j, y is expected to change by \beta_j standard deviations

  • The original units of measurement are irrelevant; all variables are now measured in standard deviations

  • Effects of changes in the explanatory variables on the dependent variable can now be easily compared

2.1 Example

Dependent variable: total monthly earnings (wage)

Explanatory Variables:

  • IQ: a measure of intelligence
  • KWW: measure of knowledge of their job
  • educ: years of education
  • exper: years of experience
  • tenure: years at current job

Unstandardized model

Code
lm(wage ~ IQ + KWW + educ + exper + tenure,
      data=wage2) %>% 
  summary()

Call:
lm(formula = wage ~ IQ + KWW + educ + exper + tenure, data = wage2)

Residuals:
    Min      1Q  Median      3Q     Max 
-826.33 -243.85  -44.83  180.83 2253.35 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -531.0392   115.0513  -4.616 4.47e-06 ***
IQ             3.6966     0.9651   3.830 0.000137 ***
KWW            8.2703     1.8273   4.526 6.79e-06 ***
educ          47.2698     7.2980   6.477 1.51e-10 ***
exper         11.8589     3.2494   3.650 0.000277 ***
tenure         6.2465     2.4565   2.543 0.011156 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 365.4 on 929 degrees of freedom
Multiple R-squared:  0.1878,    Adjusted R-squared:  0.1834 
F-statistic: 42.97 on 5 and 929 DF,  p-value: < 2.2e-16

Standardized model

Code
lm(scale(wage) ~ -1 + scale(IQ) + scale(KWW) + scale(educ) + scale(exper) + scale(tenure), data=wage2) %>% 
  summary()

Call:
lm(formula = scale(wage) ~ -1 + scale(IQ) + scale(KWW) + scale(educ) + 
    scale(exper) + scale(tenure), data = wage2)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.0435 -0.6031 -0.1109  0.4472  5.5726 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
scale(IQ)      0.13761    0.03591   3.832 0.000135 ***
scale(KWW)     0.15623    0.03450   4.528 6.71e-06 ***
scale(educ)    0.25679    0.03962   6.481 1.48e-10 ***
scale(exper)   0.12830    0.03513   3.652 0.000275 ***
scale(tenure)  0.07840    0.03082   2.544 0.011113 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9031 on 930 degrees of freedom
Multiple R-squared:  0.1878,    Adjusted R-squared:  0.1835 
F-statistic: 43.01 on 5 and 930 DF,  p-value: < 2.2e-16

  • A one standard deviation increase in IQ corresponds to a 0.138 standard deviation increase in monthly earnings

  • A one standard deviation increase in knowledge corresponds to a 0.156 standard deviation increase in monthly earnings

  • A one standard deviation increase in education leads to a 0.257 standard deviation increase in monthly earnings

  • A one standard deviation increase in experience leads to a 0.128 standard deviation increase in monthly earnings

  • A one standard deviation increase in tenure corresponds to a 0.078 standard deviation increase in monthly earnings
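
The beta coefficients can also be recovered from the unstandardized fit: each slope is multiplied by the ratio of the standard deviation of its regressor to the standard deviation of the dependent variable. A minimal sketch, assuming the wage2 data used above is loaded (e.g., from the wooldridge package) and has no missing values in these columns:

Code
# Recover the standardized (beta) coefficients from the unstandardized model:
# beta_j = b_j * sd(x_j) / sd(y)
unstd <- lm(wage ~ IQ + KWW + educ + exper + tenure, data = wage2)
b     <- coef(unstd)[-1]                 # drop the intercept
sds   <- sapply(wage2[names(b)], sd)     # standard deviation of each regressor
b * sds / sd(wage2$wage)                 # should match the scale() estimates above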

3 More on logarithmic transformation

  • In our previous lectures, we learned how to allow for nonlinear relationships between variables using logarithmic transformation

  • There are many advantages of using logarithms of strictly positive variables (y > 0)

  • Interpretation of coefficients is easier: elasticities or semi-elasticities, which do not depend on the units of measurement of the x’s

  • When y > 0, log(y) often satisfies CLM assumptions more closely than y in levels. Strictly positive variables (prices, income, etc.) often have heteroscedastic or skewed distributions. Taking logs can mitigate these problems.

  • The log transformation often reduces or eliminates skewness and reduces the variance

  • Taking logs narrows the range of the variable leading to estimates which are less sensitive to outliers

3.1 Some rules of thumb for taking logs

  • Strictly positive variables such as wage, income, population, production, sales etc. are generally included in the model using log transformation

  • Proportions or rates such as unemployment rate, interest rate, etc. usually appear in their original form. But sometimes they may be included in log form if strictly positive.

  • If the variable takes nonnegative values (y \geq 0), i.e., it is 0 for some observations, we cannot use the log transformation because log(0) is not defined

    • In this case we can use log(1 + y) transformation instead of log(y)
  • We cannot compare the R^2s from two models in which we have log(y) as the dependent variable in one of the models and y in the other
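
For a variable that is zero for some observations, the log(1 + y) workaround can be implemented with log1p(). A minimal illustration using cigs from the bwght data used earlier (this specification is only a sketch, not one of the models discussed above):

Code
# cigs is 0 for non-smoking mothers, so log(cigs) is undefined;
# log1p(cigs) computes log(1 + cigs), which is defined at cigs = 0
bwght <- bwght %>% 
  mutate(log.cigs = log1p(cigs))
lm(bwght ~ log.cigs + faminc, data = bwght) %>% 
  summary()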

4 Quadratic models

  • Quadratic functions are generally used to capture decreasing or increasing marginal effects

  • In quadratic models the slope is not constant: it depends on the value of x

\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x + \hat{\beta}_2 x^2

  • If \beta_1 > 0 and \beta_2 < 0, the relationship is concave downward (an inverted U-shape)

  • If \beta_1 < 0 and \beta_2 > 0, the relationship is concave upward (a U-shape)

  • The slope between x and y can be approximated as follows

\Delta\hat{y} \approx (\hat{\beta}_1 + 2\hat{\beta}_2 x) \Delta x \implies \frac{\Delta\hat{y}}{\Delta x} \approx \hat{\beta}_1 + 2\hat{\beta}_2 x

  • At x = 0, \hat{\beta}_1 approximates the change in \hat{y} as x increases from 0 to 1; for larger values of x we must also take the second term into account

4.1 Example 1

Code
wage1 <- wage1 |> 
  mutate(expersq = exper^2)

lm(wage ~ exper + expersq, data = wage1) |>
summary()

Call:
lm(formula = wage ~ exper + expersq, data = wage1)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.5916 -2.1440 -0.8603  1.1801 17.7649 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.7254058  0.3459392  10.769  < 2e-16 ***
exper        0.2981001  0.0409655   7.277 1.26e-12 ***
expersq     -0.0061299  0.0009025  -6.792 3.02e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.524 on 523 degrees of freedom
Multiple R-squared:  0.09277,   Adjusted R-squared:  0.0893 
F-statistic: 26.74 on 2 and 523 DF,  p-value: 8.774e-12

The summary of the estimated equation is given below:

\widehat{wage} = 3.73 + 0.298 \; exper - 0.0061\; exper^2

  • The regression above implies that exper has a diminishing marginal effect on wage

  • Slope estimate is:

\frac{\Delta\widehat{wage}}{\Delta exper} \approx 0.298 - (2 \times 0.0061)\; exper

  • The first year of experience is worth approximately $0.298

  • The second year of experience is worth less:

\frac{\Delta\widehat{wage}}{\Delta exper} \approx 0.298 - (2 \times 0.0061)(1) = 0.286

  • If exper changes from 10 to 11, wage is predicted to change by:

\frac{\Delta\widehat{wage}}{\Delta exper} \approx 0.298 - (2 \times 0.0061)(10) = 0.176

  • Turning point (the value of x where the slope is zero):

\frac{\Delta \hat{y}}{\Delta x} \approx \hat{\beta}_1 + 2\hat{\beta}_2 x = 0 \implies x^{*} =\Big| \frac{\hat{\beta}_1}{2\hat{\beta}_2}\Big|

  • Estimated turning point for the wage-exper relationship:

exper^{*} = \Big|\frac{0.298}{ - 2 \times 0.0061} \Big |= 24.4
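
The turning point can also be computed directly from the fitted coefficients. A minimal sketch (the model in the text is piped straight to summary(), so it is refitted and stored here under the illustrative name quad.mod):

Code
# Turning point |b1 / (2*b2)| of the fitted quadratic in exper
quad.mod <- lm(wage ~ exper + expersq, data = wage1)
b <- coef(quad.mod)
abs(b["exper"] / (2 * b["expersq"]))   # roughly 24 years of experience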

4.2 Example 2

Code
hprice2 <- hprice2 %>% 
  mutate(roomsq=rooms^2)

lm(log(price) ~ log(nox) + log(dist) + stratio + rooms + roomsq, data = hprice2)|>
  summary()

Call:
lm(formula = log(price) ~ log(nox) + log(dist) + stratio + rooms + 
    roomsq, data = hprice2)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.04285 -0.12774  0.02038  0.12650  1.25272 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 13.385477   0.566473  23.630  < 2e-16 ***
log(nox)    -0.901682   0.114687  -7.862 2.34e-14 ***
log(dist)   -0.086781   0.043281  -2.005  0.04549 *  
stratio     -0.047590   0.005854  -8.129 3.42e-15 ***
rooms       -0.545113   0.165454  -3.295  0.00106 ** 
roomsq       0.062261   0.012805   4.862 1.56e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2592 on 500 degrees of freedom
Multiple R-squared:  0.6028,    Adjusted R-squared:  0.5988 
F-statistic: 151.8 on 5 and 500 DF,  p-value: < 2.2e-16

Summary of the results is given below:

\widehat{log(price)} = \underset{(0.566)}{13.386} - \underset{(0.115)}{0.902} \;log(nox) - \underset{(0.043)}{0.0868}\; log(dist) - \underset{(0.0059)}{0.0476}\; stratio \\ - \underset{(0.1655)}{0.5451}\; rooms + \underset{(0.0128)}{0.0623}\; rooms^2

  • House value and rooms: First decreasing then increasing

  • As the number of rooms changes from 3 to 4, price is predicted to change by: \frac{\Delta \widehat{log(price)}}{\Delta rooms} = -0.5451 + 2 \times 0.0623 (3) = -0.1713 \implies 17.13\%

    • At rooms = 3, an additional room leads to a decrease in price of approximately 17.13%
  • Turning point:

rooms^{*} = \Big|\frac{-0.5451}{2 \times 0.0623}\Big| \approx 4.4

The impact of an additional room on price:

\Delta \widehat{log(price)} = [-0.5451 + 2\times0.0623 \times rooms] \Delta rooms

Or,

\begin{align} \% \Delta \widehat{price} &= 100 \times [-0.5451 + 2\times 0.0623 \times rooms] \Delta rooms \notag \\ &= [-54.51 + 2\times 6.23 \times rooms] \Delta rooms \notag \end{align}

  • For example as the number of rooms changes from 5 to 6, price increases by -54.51 + 12.46 \times 5 = 7.79\%. Notice that here, \Delta rooms = 1

  • Going from 6 to 7: -54.51 + 12.46 \times 6 = 20.25\%.

  • Going from 5 to 7: (-54.51 + 12.46 \times 5) \times 2 = 15.58\%. Here \Delta rooms = 2
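
These effects follow directly from the fitted coefficients. A minimal sketch, refitting and storing the Example 2 model under the illustrative name hp.mod (the version in the text is piped straight to summary()):

Code
hp.mod <- lm(log(price) ~ log(nox) + log(dist) + stratio + rooms + roomsq,
             data = hprice2)
b <- coef(hp.mod)

# Turning point in the number of rooms: |b_rooms / (2 * b_roomsq)|
abs(b["rooms"] / (2 * b["roomsq"]))    # about 4.4 rooms

# Approximate % change in price from one extra room, evaluated at a given rooms
pct.change <- function(rooms) 100 * (b["rooms"] + 2 * b["roomsq"] * rooms)
pct.change(3)   # about -17%, as computed above
pct.change(5)   # about  +8%, as computed above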

5 Models with interaction terms

  • In some cases, the partial impact of one variable may depend on the magnitude of another explanatory variable

  • To capture this we add interaction terms into the regression model

y = \beta_0 + \beta_1x_1 +\beta_2x_2 + \beta_3\underbrace{x_1x_2}_{interaction} + \beta_4 x_3 + u

  • Interaction variables: x_1 and x_2. The partial impact of x_1 on y depends on x_2

\frac{\Delta y}{\Delta x_1} = \beta_1 + \beta_3x_2

  • To evaluate this partial effect we need to plug in a value for x_2; in practice, we generally use the mean or the median of x_2.

  • Similarly, the partial impact of x_2 depends on x_1:

\frac{\Delta y}{\Delta x_2} = \beta_2 + \beta_3x_1

  • Let the sample mean of x_2 be \overline{x}_2. Using this value we have:

\frac{\Delta y}{\Delta x_1} = \beta_1 + \beta_3 \overline{x}_2

  • This gives us the partial effect of x_1 at x_2 = \overline{x}_2. Is this effect statistically significant?

  • To test this we rewrite the model using x_1 \times (x_2 - \overline{x}_2) instead of x_1 x_2:

y = \beta_0 + \beta_1x_1 +\beta_2x_2 + \beta_3\underbrace{x_1 \times (x_2 - \overline{x}_2)}_{interaction} + \beta_4 x_3 + u

  • In this reparameterized model, a simple significance t-test of H_0: \beta_1 = 0 answers this question

  • Other effects can be tested similarly

5.1 Example

Code
attend <- attend |>
  mutate(priGPA2=priGPA^2, ACT2 = ACT^2, priGPA.atndrte = priGPA*atndrte)

lm(stndfnl ~ atndrte + priGPA + ACT + priGPA2 + ACT2 + priGPA.atndrte,
   data=attend) |>
  summary()

Call:
lm(formula = stndfnl ~ atndrte + priGPA + ACT + priGPA2 + ACT2 + 
    priGPA.atndrte, data = attend)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.1698 -0.5316 -0.0177  0.5737  2.3344 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)     2.050293   1.360319   1.507 0.132225    
atndrte        -0.006713   0.010232  -0.656 0.512005    
priGPA         -1.628540   0.481003  -3.386 0.000751 ***
ACT            -0.128039   0.098492  -1.300 0.194047    
priGPA2         0.295905   0.101049   2.928 0.003523 ** 
ACT2            0.004533   0.002176   2.083 0.037634 *  
priGPA.atndrte  0.005586   0.004317   1.294 0.196173    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8729 on 673 degrees of freedom
Multiple R-squared:  0.2287,    Adjusted R-squared:  0.2218 
F-statistic: 33.25 on 6 and 673 DF,  p-value: < 2.2e-16

Summary of the estimated equation:

\widehat{stndfnl} = \underset{(1.36)}{2.05} - \underset{(0.0102)}{0.0067} \; atndrte - \underset{(0.48)}{1.63} \; priGPA - \underset{(0.098)}{0.128} \; ACT + \underset{(0.101)}{0.296} \; priGPA^2 \\ + \underset{(0.0022)}{0.0045}\; ACT^2 + \underset{(0.0043)}{0.0056}\; priGPA \times atndrte

where stndfnl is the standardized final exam score, atndrte is the attendance rate (%), priGPA is the cumulative GPA in the previous semester (out of 4), and ACT is the achievement test score.

  • The coefficient estimate on atndrte (-0.0067) measures the partial effect of atndrte when priGPA = 0.

  • Since priGPA is never 0 in the sample, this value is not interesting by itself; the coefficient alone does not measure the effect of the attendance rate because of the interaction term with priGPA.

  • We need to take the interaction term (\beta_6) into account. Note that \hat{\beta}_1 and \hat{\beta}_6 are individually insignificant but jointly significant [verify that H_0: \beta_1 = \beta_6 = 0 can be rejected using an F test with p-value = 0.014; a sketch of this test is given at the end of this example]

  • The sample mean of priGPA is 2.59. Using this we get:

\frac{\Delta \widehat{stndfnl}}{\Delta atndrte} = -0.0067 + 0.0056(2.59) \approx 0.0078

  • Interpretation: At the mean GPA (priGPA = 2.59), a 10 percentage point increase in atndrte increases stndfnl by about 0.078 standard deviations of the final score.

  • The partial effect of the attendance rate at the mean GPA is estimated as 0.0078. Is this effect statistically different from zero?

  • To test this we will re-estimate the model using (priGPA - 2.59) \times atndrte instead of priGPA \times atndrte

  • In this regression, the coefficient estimate on atndrte (ie., \hat{\beta}_1) will measure the predicted partial effect when priGPA is fixed at its mean 2.59

Code
attend <- attend %>% 
  mutate(priGPA.cntrd = priGPA-2.59, priGPAcntrd.atndrte = priGPA.cntrd*atndrte)

lm(stndfnl ~ atndrte + priGPA + ACT + priGPA2 + ACT2 + priGPAcntrd.atndrte,
   data=attend) |>
  summary()

Call:
lm(formula = stndfnl ~ atndrte + priGPA + ACT + priGPA2 + ACT2 + 
    priGPAcntrd.atndrte, data = attend)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.1698 -0.5316 -0.0177  0.5737  2.3344 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)          2.050293   1.360319   1.507 0.132225    
atndrte              0.007755   0.002639   2.938 0.003415 ** 
priGPA              -1.628540   0.481003  -3.386 0.000751 ***
ACT                 -0.128039   0.098492  -1.300 0.194047    
priGPA2              0.295905   0.101049   2.928 0.003523 ** 
ACT2                 0.004533   0.002176   2.083 0.037634 *  
priGPAcntrd.atndrte  0.005586   0.004317   1.294 0.196173    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8729 on 673 degrees of freedom
Multiple R-squared:  0.2287,    Adjusted R-squared:  0.2218 
F-statistic: 33.25 on 6 and 673 DF,  p-value: < 2.2e-16

Summary of the estimated equation:

\widehat{stndfnl} = \underset{(1.36)}{2.05} + \underset{(0.0026)}{0.0078} \; atndrte - \underset{(0.481)}{1.6285} \; priGPA + \underset{(0.101)}{0.2959}\; priGPA^2 - \underset{(0.098)}{0.1280}\; ACT \\ + \underset{(0.0022)}{0.0045}\; ACT^2 + \underset{(0.004)}{0.0056}\; (priGPA - 2.59) \times atndrte

  • Test:

    • H_0: \beta_1 = 0.
    • t = \frac{0.007755}{0.002639} = 2.938 with p-value = 0.0034.
    • Therefore we reject H_0.
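
The joint test H_0: \beta_1 = \beta_6 = 0 mentioned earlier can be carried out by comparing the full model with a restricted model that drops atndrte and the interaction term. A minimal sketch (the models in the text are piped straight to summary(), so they are stored here under the illustrative names full and restr):

Code
# Joint F test of H0: beta_1 = beta_6 = 0 via a restricted-vs-full comparison
full  <- lm(stndfnl ~ atndrte + priGPA + ACT + priGPA2 + ACT2 + priGPA.atndrte,
            data = attend)
restr <- lm(stndfnl ~ priGPA + ACT + priGPA2 + ACT2, data = attend)
anova(restr, full)   # the p-value should be around 0.014, as noted above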

6 Prediction of the mean of Y given covariates

\hat{y} = \hat{\beta}_0 + \hat{\beta}_1x_1 + \hat{\beta}_2x_2 + \cdots + \hat{\beta}_k x_k

  • When we plug in particular x values into the model above we obtain a prediction for y which is an estimate of the expected value of y given the particular values for the explanatory variables, E(y|x).

  • Let the particular values be x_1 = c_1, x_2 = c_2, \cdots, x_k = c_k, and let \theta_0 denote the expected value of y at these values

\begin{align} \theta_0 &= \beta_0 + \beta_1c_1 + \beta_2c_2 + \cdots + \beta_k c_k \notag \\ &=E(y|x_1 = c_1, x_2 = c_2, \cdots, x_k = c_k) \notag \end{align}

  • The OLS estimator of \theta_0 is:

\hat{\theta}_0 = \hat{\beta}_0 + \hat{\beta}_1c_1 + \hat{\beta}_2c_2 + \cdots + \hat{\beta}_k c_k

  • 95% confidence interval for \theta_0: \hat{\theta}_0 \pm 1.96 \times se(\hat{\theta}_0)

  • To compute this we need the standard error of \hat{\theta}_0.

  • This standard error can easily be calculated using an auxiliary regression. By definition

\beta_0 = \theta_0 - \beta_1c_1 - \beta_2c_2 - \cdots - \beta_k c_k

  • Substituting into the model and rearranging we get

y = \theta_0 + \beta_1(x_1 - c_1) + \beta_2 (x_2 - c_2) + \cdots + \beta_k (x_k - c_k) + u

  • The standard error on the intercept estimate will give us the standard error of the prediction

  • The variance of \hat{\theta}_0 reaches its smallest value at the arithmetic means of x variables (c_j = \overline{x}_j).

  • Thus, as the values of c_j move farther from \overline{x}_j, Var(\hat{\theta}_0) gets larger and larger

6.1 Example

Let

  • colgpa: GPA
  • sat: combined SAT score
  • hsperc: high school percentile, from top
  • hsize: size of graduating class (in 100s)

Code
gpa2 <- gpa2 %>% 
  mutate(hsize2=hsize^2)

lm(colgpa ~ sat + hsperc + hsize + hsize2, data = gpa2) %>% 
  summary()

Call:
lm(formula = colgpa ~ sat + hsperc + hsize + hsize2, data = gpa2)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.57543 -0.35081  0.03342  0.39945  1.81683 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.493e+00  7.534e-02  19.812  < 2e-16 ***
sat          1.492e-03  6.521e-05  22.886  < 2e-16 ***
hsperc      -1.386e-02  5.610e-04 -24.698  < 2e-16 ***
hsize       -6.088e-02  1.650e-02  -3.690 0.000228 ***
hsize2       5.460e-03  2.270e-03   2.406 0.016191 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5599 on 4132 degrees of freedom
Multiple R-squared:  0.2781,    Adjusted R-squared:  0.2774 
F-statistic:   398 on 4 and 4132 DF,  p-value: < 2.2e-16

Summary of the estimated equation:

\widehat{colgpa} = \underset{(0.075)}{1.493} + \underset{(0.00007)}{0.00149}\; sat - \underset{(0.00056)}{0.01386}\;hsperc - \underset{(0.01650)}{0.06088}\;hsize + \underset{(0.00227)}{0.00546}\;hsize^2

  • What is expected GPA for a student with sat = 1200; hsperc = 30; hsize = 5?

  • Plugging these prediction points into the estimated regression we get colGPA = 2.70.

  • To compute the standard error of this prediction we run an auxiliary regression with the regressors centered at the prediction values. Define:

    • sat0 = sat - 1200
    • hsperc0 = hsperc - 30
    • hsize0 = hsize - 5
    • hsize20 = hsize2 - 25
  • Then, we regress colGPA on these variables.

Code
gpa2 <- gpa2 %>% 
  mutate(sat0 = sat - 1200, hsperc0 = hsperc - 30, 
         hsize0 = hsize - 5, hsize20 = hsize2 - 25)

lm(colgpa ~ sat0 + hsperc0 + hsize0 + hsize20, data = gpa2) %>% 
  summary()

Call:
lm(formula = colgpa ~ sat0 + hsperc0 + hsize0 + hsize20, data = gpa2)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.57543 -0.35081  0.03342  0.39945  1.81683 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.700e+00  1.988e-02 135.833  < 2e-16 ***
sat0         1.492e-03  6.521e-05  22.886  < 2e-16 ***
hsperc0     -1.386e-02  5.610e-04 -24.698  < 2e-16 ***
hsize0      -6.088e-02  1.650e-02  -3.690 0.000228 ***
hsize20      5.460e-03  2.270e-03   2.406 0.016191 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5599 on 4132 degrees of freedom
Multiple R-squared:  0.2781,    Adjusted R-squared:  0.2774 
F-statistic:   398 on 4 and 4132 DF,  p-value: < 2.2e-16

Summary of the estimated equation:

\widehat{colgpa} = \underset{(0.020)}{2.70} + \underset{(0.00007)}{0.00149}\; sat0 - \underset{(0.00056)}{0.01386}\;hsperc0 - \underset{(0.01650)}{0.06088}\;hsize0 + \underset{(0.00227)}{0.00546}\;hsize20

  • 95% Confidence Interval: 2.70 \pm 1.96(0.020) = [2.66, 2.74].
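
The same prediction and confidence interval can also be obtained with predict() using interval = "confidence". A minimal sketch, storing the original (uncentered) model under the illustrative name gpa.mod:

Code
# Confidence interval for E(y|x) at the prediction point via predict()
gpa.mod <- lm(colgpa ~ sat + hsperc + hsize + hsize2, data = gpa2)
new.x   <- data.frame(sat = 1200, hsperc = 30, hsize = 5, hsize2 = 25)
predict(gpa.mod, newdata = new.x, interval = "confidence", level = 0.95)
# should reproduce approximately [2.66, 2.74]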

7 Prediction of a new observation

  • The standard error and the confidence interval computed above are for the average value of y for the subpopulation with a given set of covariates

  • This is not the same as a confidence interval for an individual (out-of-sample) prediction of y

  • In forming a CI for an unknown outcome on y, we must account for another source of variation: the variance in the unobserved error u in addition to the variance in \hat{y}

  • Let y^0 represent a new cross-sectional unit (individual, firm, region, country, etc.) not in our original sample:

y^0 = \beta_0 + \beta_1 x_1^0 + \beta_2 x_2^0 + \cdots + \beta_k x_k^0 + u^0

  • The OLS prediction of y^0 at the values x_j^0:

\hat{y}^0 = \hat{\beta}_0 + \hat{\beta}_1 x_1^0 + \hat{\beta}_2 x_2^0 + \cdots + \hat{\beta}_k x_k^0

  • The prediction error is

\hat{e}^0 = y^0 - \hat{y}^0 = \beta_0 + \beta_1 x_1^0 + \beta_2 x_2^0 + \cdots + \beta_k x_k^0 + u^0 - \hat{y}^0

  • Taking expectations we obtain

E(\hat{e}^0) = 0

  • The variance of the prediction error

Var(\hat{e}^0) = Var(\hat{y}^0) + Var(u^0) = Var(\hat{y}^0) + \sigma^2

  • Var(\hat{y}^0) is inversely related to the sample size n. It gets smaller as n increases.

  • \sigma^2 is the variance of the unobserved error term. It does not decrease as n increases.

  • Thus, \sigma^2 is the dominant term in the variance of the prediction error

  • The standard error of the prediction error:

se(\hat{e}^0) = \sqrt{[se(\hat{y}^0)]^2 + \hat{\sigma}^2}

  • 95% CI: \hat{y}^0 \pm 1.96 \times se(\hat{e}^0)

  • The confidence intervals for the individual predictions will be much wider than the CI for the conditional average of y. The reason is that \sigma^2 is much larger than Var(\hat{y}^0)

7.1 Example

Suppose that we want to construct a 95% CI for the colGPA of a high school student with sat = 1200; hsperc = 30; hsize = 5.

  • Plugging these values in the regression model we obtain colGPA = 2.70 (\hat{y}^0) as before

  • From Example 6.1 we have se(\hat{y}^0) = 0.02 and RMSE = \hat{\sigma} = 0.56. Thus, se(\hat{e}^0) = \sqrt{0.02^2 + 0.56^2} = 0.56

  • The 95% CI is (2.70 \pm 1.96 \times0.56) \implies (1.6, 3.8)

  • This is a very wide confidence interval, so wide that it is almost impossible to accurately pin down an individual’s future college grade point average.
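
The interval for an individual new observation can likewise be obtained with predict() using interval = "prediction", reusing gpa.mod and new.x from the sketch at the end of Section 6:

Code
# Prediction interval for a single new student with the same covariate values
predict(gpa.mod, newdata = new.x, interval = "prediction", level = 0.95)
# much wider than the confidence interval for E(y|x): roughly (1.6, 3.8)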