Lesson 2.3 - Further Topics in Linear Regression Analysis
Author
Norberto E. Milla, Jr.
Published
April 3, 2023
1 Effects of Data Scaling on OLS Statistics
Changing the units of measurement changes the OLS intercept and slope estimates
Why might we be interested in changing the units of measurement? Mostly for cosmetic purposes, such as reducing the number of zeros in coefficient estimates or making interpretation easier
Rescaling data does not change the testing outcomes
Rescaling data does not change the significance of coefficient estimates:
t statistics do not change
R^2 remains the same
F test statistic remains the same
SSE and SSR would change if we rescale the data
1.1 Examples
Consider the following model:
bwght = \beta_0 + \beta_1 cigs + \beta_2 faminc + u
where bwght is measured in ounces, cigs is the number of cigarettes smoked per day, faminc is measured in 1000USD
Let us first fit the model with the variables in their original units
Code
mod1 <- lm(bwght ~ cigs + faminc, data = bwght)
summary(mod1)
Call:
lm(formula = bwght ~ cigs + faminc, data = bwght)
Residuals:
Min 1Q Median 3Q Max
-96.061 -11.543 0.638 13.126 150.083
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 116.97413 1.04898 111.512 < 2e-16 ***
cigs -0.46341 0.09158 -5.060 4.75e-07 ***
faminc 0.09276 0.02919 3.178 0.00151 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 20.06 on 1385 degrees of freedom
Multiple R-squared: 0.0298, Adjusted R-squared: 0.0284
F-statistic: 21.27 on 2 and 1385 DF, p-value: 7.942e-10
Let us fit another model with the unit of measurement for the dependent variable rescaled from ounces to grams.
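A minimal sketch of this rescaled fit, assuming the bwght data frame from the wooldridge package and using 1 ounce ≈ 28.35 grams (the variable bwghtgrams is created here purely for illustration):
Code
bwght$bwghtgrams <- 28.35 * bwght$bwght   # rescale birth weight from ounces to grams
mod2 <- lm(bwghtgrams ~ cigs + faminc, data = bwght)
summary(mod2)
Every coefficient estimate and standard error is simply multiplied by 28.35, while the t statistics, R-squared, and F statistic are identical to those of mod1.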
2 Standardized Coefficients (Beta Coefficients)
To compare the relative magnitudes of the effects of regressors measured in different units, we can standardize all variables (subtract the mean and divide by the standard deviation) before estimation; the resulting slope estimates are called standardized or beta coefficients. Fitting such a model to the wage2 data gives:
Call:
lm(formula = scale(wage) ~ -1 + scale(IQ) + scale(KWW) + scale(educ) +
scale(exper) + scale(tenure), data = wage2)
Residuals:
Min 1Q Median 3Q Max
-2.0435 -0.6031 -0.1109 0.4472 5.5726
Coefficients:
Estimate Std. Error t value Pr(>|t|)
scale(IQ) 0.13761 0.03591 3.832 0.000135 ***
scale(KWW) 0.15623 0.03450 4.528 6.71e-06 ***
scale(educ) 0.25679 0.03962 6.481 1.48e-10 ***
scale(exper) 0.12830 0.03513 3.652 0.000275 ***
scale(tenure) 0.07840 0.03082 2.544 0.011113 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.9031 on 930 degrees of freedom
Multiple R-squared: 0.1878, Adjusted R-squared: 0.1835
F-statistic: 43.01 on 5 and 930 DF, p-value: < 2.2e-16
A one standard deviation increase in IQ corresponds to a 0.138 standard deviation increase in monthly earnings
A one standard deviation increase in knowledge of the world of work (KWW) corresponds to a 0.156 standard deviation increase in monthly earnings
A one standard deviation increase in education leads to a 0.257 standard deviation increase in monthly earnings
A one standard deviation increase in experience leads to a 0.128 standard deviation increase in monthly earnings
A one standard deviation increase in tenure corresponds to a 0.078 standard deviation increase in monthly earnings
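The same beta coefficients can be recovered from an unstandardized fit by rescaling each slope by sd(x_j)/sd(y). A minimal sketch, assuming the wage2 data frame from the wooldridge package:
Code
library(wooldridge)   # assumed source of the wage2 data
data("wage2")
mod_u <- lm(wage ~ IQ + KWW + educ + exper + tenure, data = wage2)
xsd <- sapply(wage2[c("IQ", "KWW", "educ", "exper", "tenure")], sd)
coef(mod_u)[-1] * xsd / sd(wage2$wage)   # reproduces the scale() estimates above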
3 More on logarithmic transformation
In our previous lectures, we learned how to allow for nonlinear relationships between variables using logarithmic transformation
There are many advantages of using logarithms of strictly positive variables (y > 0)
Interpretation of coefficients is easier: it does not depend on the units of measurement of the variables (elasticities or semi-elasticities)
When y > 0, log(y) often satisfies CLM assumptions more closely than y in levels. Strictly positive variables (prices, income, etc.) often have heteroscedastic or skewed distributions. Taking logs can mitigate these problems.
Log transformation reduces or eliminates skewness and reduces variance
Taking logs narrows the range of the variable, leading to estimates that are less sensitive to outliers
3.1 Some rules of thumb for taking logs
Strictly positive variables such as wage, income, population, production, sales etc. are generally included in the model using log transformation
Proportions or rates such as unemployment rate, interest rate, etc. usually appear in their original form. But sometimes they may be included in log form if strictly positive.
If the variable takes nonnegative values (y \geq 0), i.e. it equals 0 for some observations, we cannot use the log transformation because log(0) is not defined
In this case we can use the log(1 + y) transformation instead of log(y) (see the sketch after this list)
We cannot compare the R^2s from two models in which we have log(y) as the dependent variable in one of the models and y in the other
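A minimal sketch of the log(1 + y) workaround in R, using a hypothetical vector y that contains a zero; log1p() computes log(1 + y) accurately:
Code
y <- c(0, 5, 120, 3500)   # hypothetical nonnegative variable with a zero observation
log(y)                    # first element is -Inf, so log(y) cannot be used directly
log1p(y)                  # log(1 + y); the zero observation is mapped to 0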
4 Quadratic models
Quadratic functions are generally used to capture decreasing or increasing marginal effects
In quadratic models the slope coefficient is not constant: it depends on the value of x
\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x + \hat{\beta}_2 x^2
If \beta_1 > 0 and \beta_2 < 0 then the relationship is concave downward
If \beta_1 < 0 and \beta_2 > 0 then the relationship is concave upward
The slope between x and y can be approximated as follows
\Delta\hat{y} \approx (\hat{\beta}_1 + 2\hat{\beta}_2 x) \Delta x \implies \frac{\Delta\hat{y}}{\Delta x} \approx \hat{\beta}_1 + 2\hat{\beta}_2 x
At x = 0, \hat{\beta}_1 can be interpreted as the approximate slope in going from x = 0 to x = 1; for larger values of x, the second term, 2\hat{\beta}_2 x, must be taken into account
House value and rooms: with \hat{\beta}_1 = -0.5451 on rooms and \hat{\beta}_2 = 0.0623 on rooms^2, the predicted log(price) first decreases and then increases in the number of rooms
As the number of rooms changes from 3 to 4, price is predicted to change by:
\frac{\Delta \widehat{log(price)}}{\Delta rooms} = -0.5451 + 2 \times 0.0623 (3) = -0.1713 \implies 17.13\%
At rooms = 3, an additional room leads to an approximately 17.13% decrease in price
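A quick check of this marginal effect and of the turning point implied by the estimates above (b1 and b2 below are just the reported coefficients):
Code
b1 <- -0.5451
b2 <- 0.0623
slope <- function(rooms) b1 + 2 * b2 * rooms   # approximate marginal effect of one more room
slope(3)         # -0.1713, i.e. about a 17% decrease in price at rooms = 3
-b1 / (2 * b2)   # turning point: roughly 4.4 rooms, beyond which price increases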
Consider the following estimated model with quadratic and interaction terms (standard errors in parentheses):
\widehat{stndfnl} = \underset{(1.36)}{2.05} - \underset{(0.0102)}{0.0067}\; atndrte - \underset{(0.48)}{1.63}\; priGPA - \underset{(0.098)}{0.128}\; ACT + \underset{(0.101)}{0.296}\; priGPA^2 \\
+ \underset{(0.0022)}{0.0045}\; ACT^2 + \underset{(0.0043)}{0.0056}\; priGPA \times atndrte
where stndfnl: Standardized final score; atndrte: attendance rate (%); priGPA: cumulative GPA in the previous semester (out of 4); ACT: achievement test score
The coefficient estimate on atndrte (-0.0067) measures the impact of attendance when priGPA = 0.
Since no student has priGPA = 0, this coefficient by itself is not of direct interest; because of the interaction term with priGPA, it alone does not measure the effect of the attendance rate.
We need to take the interaction term (\beta_6) into account. Note that \hat{\beta}_1 and \hat{\beta}_6 are not individually significant by their t statistics, but they are jointly significant [verify that H_0: \beta_1 = \beta_6 = 0 can be rejected using an F test with p-value = 0.014]
The sample mean of priGPA is 2.59. Plugging this into the partial effect gives:
\frac{\Delta \widehat{stndfnl}}{\Delta atndrte} = -0.0067 + 0.0056 \times 2.59 \approx 0.0078
Interpretation: at the mean GPA (priGPA = 2.59), a 10 percentage point increase in atndrte increases stndfnl by about 0.078 standard deviations from the mean final score.
The partial effect of the attendance rate at the mean GPA is estimated as 0.0078. Is this effect statistically different from zero?
To test this we will re-estimate the model using (priGPA - 2.59) \times atndrte instead of priGPA \times atndrte
In this regression, the coefficient estimate on atndrte (i.e., \hat{\beta}_1) will measure the predicted partial effect when priGPA is fixed at its mean, 2.59
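A minimal sketch of this re-estimation, assuming the attend data frame from the wooldridge package (the specification mirrors the estimated equation above):
Code
library(wooldridge)   # assumed source of the attend data
data("attend")
mod_c <- lm(stndfnl ~ atndrte + priGPA + ACT + I(priGPA^2) + I(ACT^2) +
              I((priGPA - 2.59) * atndrte), data = attend)
summary(mod_c)   # the coefficient on atndrte is now the partial effect at priGPA = 2.59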
When we plug in particular x values into the model above we obtain a prediction for y which is an estimate of the expected value of y given the particular values for the explanatory variables, E(y|x).
Let the particular values be x_1 = c_1, x_2 = c_2, \cdots, x_k = c_k, and let \theta_0 = \beta_0 + \beta_1 c_1 + \cdots + \beta_k c_k denote the expected value of y at these values; its OLS estimate is the fitted value \hat{\theta}_0
The standard error and the confidence interval computed above are for the average value of y for the subpopulation with a given set of covariates
This is not the same as the confidence interval for an individual, out-of-sample prediction of y
In forming a CI for an unknown outcome on y, we must account for another source of variation: the variance in the unobserved error u in addition to the variance in \hat{y}
Let y^0 represent a new cross-sectional unit (individual, firm, region, country, etc.) not in our original sample:
y^0 = \beta_0 + \beta_1 x_1^0 + \cdots + \beta_k x_k^0 + u^0
The prediction error is \hat{e}^0 = y^0 - \hat{y}^0, with Var(\hat{e}^0) = Var(\hat{y}^0) + \sigma^2, so that se(\hat{e}^0) = \sqrt{[se(\hat{y}^0)]^2 + \hat{\sigma}^2}
The confidence intervals for individual predictions will be much wider than the CI for the conditional average of y, because \sigma^2 is typically much larger than Var(\hat{y}^0)
7.1 Example
Suppose that we want to construct a 95% CI for the colGPA of a high school student with sat = 1200; hsperc = 30; hsize = 5.
Plugging these values into the regression model we obtain colGPA = 2.70 (= \hat{y}^0) as before
From Example 6.1 we have se(\hat{y}^0) = 0.02 and RMSE = \hat{\sigma} = 0.56. Thus, se(\hat{e}^0) = \sqrt{0.02^2 + 0.56^2} = 0.56
The 95% CI is (2.70 \pm 1.96 \times0.56) \implies (1.6, 3.8)
This is a very wide confidence interval. It is so wide that it is almost impossible to accurately pin down an individual’s future college grade point average
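A minimal sketch of both intervals in R, assuming the gpa2 data frame from the wooldridge package and a model with sat, hsperc, hsize, and hsize^2 (the exact specification is an assumption here):
Code
library(wooldridge)   # assumed source of the gpa2 data
data("gpa2")
mod_gpa <- lm(colgpa ~ sat + hsperc + hsize + I(hsize^2), data = gpa2)
new_student <- data.frame(sat = 1200, hsperc = 30, hsize = 5)
predict(mod_gpa, new_student, interval = "confidence")   # CI for the conditional mean E(colGPA | x)
predict(mod_gpa, new_student, interval = "prediction")   # much wider interval for an individual student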