OLS Point Estimates

Data Selection

library(AER)
data("CPS1985")

# squared experience term
CPS1985$exp2 <- CPS1985$experience^2

str(CPS1985)
'data.frame':   534 obs. of  12 variables:
 $ wage      : num  5.1 4.95 6.67 4 7.5 ...
 $ education : num  8 9 12 12 12 13 10 12 16 12 ...
 $ experience: num  21 42 1 4 17 9 27 9 11 9 ...
 $ age       : num  35 57 19 22 35 28 43 27 33 27 ...
 $ ethnicity : Factor w/ 3 levels "cauc","hispanic",..: 2 1 1 1 1 1 1 1 1 1 ...
 $ region    : Factor w/ 2 levels "south","other": 2 2 2 2 2 2 1 2 2 2 ...
 $ gender    : Factor w/ 2 levels "male","female": 2 2 1 1 1 1 1 1 1 1 ...
 $ occupation: Factor w/ 6 levels "worker","technical",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ sector    : Factor w/ 3 levels "manufacturing",..: 1 1 1 3 3 3 3 3 1 3 ...
 $ union     : Factor w/ 2 levels "no","yes": 1 1 1 1 1 2 1 1 1 1 ...
 $ married   : Factor w/ 2 levels "no","yes": 2 2 1 1 2 1 1 1 2 1 ...
 $ exp2      : num  441 1764 1 16 289 ...
summary(CPS1985[, c("wage", "education", "experience", "exp2")])
      wage          education       experience         exp2       
 Min.   : 1.000   Min.   : 2.00   Min.   : 0.00   Min.   :   0.0  
 1st Qu.: 5.250   1st Qu.:12.00   1st Qu.: 8.00   1st Qu.:  64.0  
 Median : 7.780   Median :12.00   Median :15.00   Median : 225.0  
 Mean   : 9.024   Mean   :13.02   Mean   :17.82   Mean   : 470.6  
 3rd Qu.:11.250   3rd Qu.:15.00   3rd Qu.:26.00   3rd Qu.: 676.0  
 Max.   :44.500   Max.   :18.00   Max.   :55.00   Max.   :3025.0  

Data Description

I use the CPS1985 dataset from the AER package. This is cross-sectional economic data based on a sample of workers from the 1985 Current Population Survey. The dependent variable is wage, measured in dollars per hour. The explanatory variables are education (years of schooling), experience (potential work experience), and exp2 (experience squared). This specification allows wages to vary linearly with education and nonlinearly with experience.

Equation

wagei​=β0​+β1​educationi​+β2​experiencei​+β3​experiencei2​+ui​

OLS using lm()

model_lm <- lm(wage ~ education + experience + exp2, data = CPS1985)
summary(model_lm)

Call:
lm(formula = wage ~ education + experience + exp2, data = CPS1985)

Residuals:
   Min     1Q Median     3Q    Max 
-8.624 -2.827 -0.826  2.010 37.298 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -5.396661   1.222301  -4.415 1.22e-05 ***
education    0.881593   0.082272  10.716  < 2e-16 ***
experience   0.259532   0.055859   4.646 4.27e-06 ***
exp2        -0.003574   0.001231  -2.903  0.00385 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.568 on 530 degrees of freedom
Multiple R-squared:  0.2145,    Adjusted R-squared:  0.2101 
F-statistic: 48.25 on 3 and 530 DF,  p-value: < 2.2e-16

OLS using algebra

y <- as.matrix(CPS1985$wage)

X <- cbind(
  1,
  CPS1985$education,
  CPS1985$experience,
  CPS1985$exp2
)

# (X'X)^(-1) X'y
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
beta_hat
             [,1]
[1,] -5.396661415
[2,]  0.881592611
[3,]  0.259532357
[4,] -0.003573966

These numbers are just the coefficients from the regression using matrix algebra. The first is the intercept, and the rest are for education, experience, and experience squared. For example, the education coefficient (~0.88) means each extra year of school is linked to about $0.88 more per hour. The negative experience² term shows diminishing returns, so wages grow with experience but slow down over time.

Comparing Coeffcients

coef(model_lm)
 (Intercept)    education   experience         exp2 
-5.396661415  0.881592611  0.259532357 -0.003573966 

These match the lm() results, which shows both methods are doing the same thing.

Standard error by hand

n <- nrow(X)
k <- ncol(X)

# residuals
e <- y - X %*% beta_hat

# residual variance estimate
sigma2_hat <- as.numeric(t(e) %*% e / (n - k))

# variance-covariance matrix
vcov_beta <- sigma2_hat * solve(t(X) %*% X)

# standard errors
se_beta <- sqrt(diag(vcov_beta))
se_beta
[1] 1.222300578 0.082272161 0.055858628 0.001231062

To get the standard errors by hand, we first look at the residuals (the errors between actual and predicted values). Bigger residuals mean more noise in the model. We use those to estimate the error variance, then combine it with (X’-X)^-1 to get the variance of each coefficient. Taking the square root gives the standard errors.

Intuitively, standard errors show how confident we are in each coefficient—more noise = bigger standard errors, cleaner data = smaller ones.

hand-calculated SEs to lm() SEs

summary(model_lm)$coefficients[, 2]
(Intercept)   education  experience        exp2 
1.222300578 0.082272161 0.055858628 0.001231062 
se_beta
[1] 1.222300578 0.082272161 0.055858628 0.001231062

The standard errors I calculated by hand match the ones from lm(), which shows that both methods are using the same underlying formula. In both cases, we first measure how much error is left in the model (using the residuals), and then adjust that based on the structure of the data (X’-X)^-1. The reason they match is because lm() is internally doing the exact same matrix algebra calculations, just automatically. This confirms that the standard errors come from the same logic as the coefficient estimates, just adding a measure of uncertainty.