October 20, 2016

Scaling data

  • If we estimate the model lm(height ~ weight, data) with weight in pounds and height in inches, will we get the same results after converting the data to metric units?

Scaling data

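  • The first model estimated \(\frac{\Delta \text{inches}}{\Delta \text{pounds}}\).
  • Now we're estimating \(\frac{\Delta \text{cm}}{\Delta \text{kg}}\).
  • The numbers change (the slope is multiplied by \(2.54/0.4536 \approx 5.6\)), but they tell the same story.
  • All our hypothesis tests will be exactly the same: the standard errors are rescaled by the same factor, so t statistics, p-values, and \(R^2\) don't change.
  • The units don't really matter for inference.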
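A quick check of this claim in R, assuming the built-in `women` data set (heights in inches, weights in pounds) stands in for the data in the question. The object names (`fit_us`, `women_metric`, `fit_metric`) are illustrative.

```r
# `women` ships with R: height in inches, weight in pounds (15 rows)
fit_us <- lm(height ~ weight, data = women)

# Convert both variables to metric units
women_metric <- data.frame(
  height = women$height * 2.54,        # inches -> centimetres
  weight = women$weight * 0.45359237   # pounds -> kilograms
)
fit_metric <- lm(height ~ weight, data = women_metric)

# The slope changes by the conversion factor 2.54 / 0.45359237 (about 5.6)...
slope_ratio <- unname(coef(fit_metric)["weight"] / coef(fit_us)["weight"])
slope_ratio

# ...but the t statistic on weight and the R-squared are identical
t_us     <- summary(fit_us)$coefficients["weight", "t value"]
t_metric <- summary(fit_metric)$coefficients["weight", "t value"]
c(t_us, t_metric)
c(summary(fit_us)$r.squared, summary(fit_metric)$r.squared)
```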

Scaling data

  • Since the units don't affect our inferences, we can rescale variables whose units are hard to interpret into units that are easier to understand.
  • Normalized coefficients (Wooldridge calls these beta coefficients) are found by converting every variable, dependent and independent, into z-scores: \(z = (x - \bar{x})/s_x\).
    • These coefficients are interpreted as "a one standard deviation increase in \(x_i\) is associated with a \(\beta_i\) standard deviation change in \(y\)."
    • Because every variable is on the same scale, these coefficients are easy to compare to one another.
      • They tell us about the relative strength of different effects.
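A sketch of the standard derivation linking beta coefficients to the coefficients estimated in the original units. Here \(\hat\sigma\) denotes a sample standard deviation and \(\hat b_j\) the beta coefficient on regressor \(j\); these symbols are introduced only for this sketch. The second line uses the fact that, with an intercept, OLS residuals average to zero, so \(\bar y = \hat\beta_0 + \hat\beta_1 \bar x_1 + \cdots + \hat\beta_k \bar x_k\).

```latex
\begin{aligned}
y_i &= \hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_k x_{ik} + \hat u_i \\
y_i - \bar y &= \hat\beta_1 (x_{i1} - \bar x_1) + \cdots + \hat\beta_k (x_{ik} - \bar x_k) + \hat u_i \\
\frac{y_i - \bar y}{\hat\sigma_y} &= \left(\frac{\hat\sigma_{x_1}}{\hat\sigma_y}\hat\beta_1\right)\frac{x_{i1} - \bar x_1}{\hat\sigma_{x_1}} + \cdots + \left(\frac{\hat\sigma_{x_k}}{\hat\sigma_y}\hat\beta_k\right)\frac{x_{ik} - \bar x_k}{\hat\sigma_{x_k}} + \frac{\hat u_i}{\hat\sigma_y} \\
\hat b_j &= \frac{\hat\sigma_{x_j}}{\hat\sigma_y}\,\hat\beta_j
\end{aligned}
```

Because every z-scored variable has mean zero, the intercept in the standardized regression is zero.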

Transforming the data

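library(dplyr)
# Convert a numeric vector into z-scores: subtract the mean, divide by the SD
normalize <- function(vect){
  vbar <- mean(vect)
  z <- (vect - vbar)/sd(vect)
  return(z)
}
# mutate_each() and funs() are deprecated in current dplyr; across() replaces them
mtcars.n <- mtcars %>% mutate(across(everything(), normalize))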
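If dplyr isn't available, here is a base-R sketch of the same transformation, assuming only the built-in `mtcars` data. Base R's `scale()` performs the same centering and division by the sample standard deviation, so the two results should agree. The name `mtcars.s` is illustrative.

```r
# Same z-score transformation as normalize() above, written compactly
normalize <- function(vect) (vect - mean(vect)) / sd(vect)

# Apply it to every column without dplyr; lapply() returns a list, so rebuild a data frame
mtcars.n <- as.data.frame(lapply(mtcars, normalize), row.names = rownames(mtcars))

# scale() centers each column by its mean and divides by its standard deviation
mtcars.s <- scale(mtcars)

# The two approaches agree (ignoring attributes such as "scaled:center")
all.equal(as.matrix(mtcars.n), mtcars.s, check.attributes = FALSE)   # TRUE

# Every normalized column now has mean 0 and standard deviation 1
round(colMeans(mtcars.n), 10)
round(sapply(mtcars.n, sd), 10)
```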

Does it matter?

Regression results

Call: lm(formula = mpg ~ hp + wt + disp, data = mtcars.n)
              Estimate Std. Error t value Pr(>|t|)   
(Intercept)  3.568e-17  7.740e-02   0.000  1.00000   
hp          -3.544e-01  1.301e-01  -2.724  0.01097 * 
wt          -6.171e-01  1.731e-01  -3.565  0.00133 **
disp        -1.927e-02  2.128e-01  -0.091  0.92851   

Residual standard error: 0.4379 on 28 degrees of freedom
Multiple R-squared:  0.8268,    Adjusted R-squared:  0.8083 
F-statistic: 44.57 on 3 and 28 DF,  p-value: < 1e-04

Regression results

library(stargazer)
fit1 <- lm(mpg ~ hp + wt + disp, mtcars)    # original units
fit2 <- lm(mpg ~ hp + wt + disp, mtcars.n)  # all variables as z-scores
stargazer(fit1, fit2, type = "html")        # side-by-side comparison table

Regression results

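================================================================
                                    Dependent variable:
                              ----------------------------------
                                             mpg
                                    (1)               (2)
----------------------------------------------------------------
hp                               -0.031**          -0.354**
                                 (0.011)           (0.130)
wt                              -3.801***         -0.617***
                                 (1.066)           (0.173)
disp                              -0.001            -0.019
                                 (0.010)           (0.213)
Constant                        37.106***           0.000
                                 (2.111)           (0.077)
----------------------------------------------------------------
Observations                        32                32
R2                                0.827             0.827
Adjusted R2                       0.808             0.808
Residual Std. Error (df = 28)     2.639             0.438
F Statistic (df = 3; 28)        44.566***         44.566***
================================================================
Note:                           *p<0.1; **p<0.05; ***p<0.01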
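A quick check of how the two columns relate, assuming only base R and the built-in `mtcars` data: scaling each raw slope by sd(x)/sd(mpg) should reproduce the normalized coefficients. The names `x_vars`, `beta_from_raw`, and `mtcars_z` are illustrative.

```r
# Raw-unit regression on the built-in mtcars data (column 1)
fit1 <- lm(mpg ~ hp + wt + disp, data = mtcars)

# Rescale each raw slope by sd(x) / sd(y) to obtain a beta coefficient
x_vars <- c("hp", "wt", "disp")
beta_from_raw <- coef(fit1)[x_vars] * sapply(mtcars[x_vars], sd) / sd(mtcars$mpg)

# Refit on z-scored data (base R scale()) to compare with column 2
mtcars_z <- as.data.frame(scale(mtcars))
fit2 <- lm(mpg ~ hp + wt + disp, data = mtcars_z)

round(beta_from_raw, 3)        # hp -0.354, wt -0.617, disp -0.019
round(coef(fit2)[x_vars], 3)   # same values
```

Only the point estimates and standard errors are rescaled; the t statistics, \(R^2\), and F statistic are unchanged, which is why both columns carry the same significance stars.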

Logarithms

  • Logarithmic relationships show up when something changes at a constant percentage rate.
    • A constant rate of growth means that a $100 bank deposit earning 10% interest will grow by $10 this year, $11 next year, and so on: the interest payments grow in dollar terms.
    • But it will always take the same amount of time for your account to increase in value by \(x\%\).
  • We interpret a coefficient on a logged variable as (approximately) a percentage change. The underlying unit no longer matters: rescaling by a constant factor, such as converting Canadian dollars to American dollars at a fixed exchange rate, does not change a percentage growth rate.
  • Log transformations often make the data more closely satisfy the Classical Linear Regression assumptions, for example by reducing skewness and heteroskedasticity in strictly positive variables.
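A small numerical sketch of the bank-deposit example, assuming a hypothetical $100 deposit earning 10% per year and a hypothetical exchange rate of 1.3 for the currency rescaling. It shows that the dollar increments grow while log(balance) rises by a constant amount each year.

```r
# Hypothetical deposit: $100 earning 10% per year, tracked for 10 years
years   <- 0:10
balance <- 100 * 1.10^years

# Dollar interest payments grow each year: 10, 11, 12.1, ...
diff(balance)[1:3]

# Time to grow by x% is the same from any starting balance, e.g. doubling:
log(2) / log(1.10)   # about 7.27 years

# log(balance) is linear in years, so a log-level regression fits exactly
fit_log <- lm(log(balance) ~ years)
b <- unname(coef(fit_log)["years"])
b              # log(1.10), about 0.0953
exp(b) - 1     # recovers the 10% growth rate

# Rescaling by a constant (hypothetical exchange rate of 1.3) shifts only the intercept
fit_rescaled <- lm(log(1.3 * balance) ~ years)
unname(coef(fit_rescaled)["years"])   # same slope as fit_log
```

The slope 0.0953 reads as roughly a 9.5% change per year, while exp(b) - 1 gives the exact 10%; this is why the percentage reading of a log coefficient is an approximation that works best when the coefficient is small.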