Set-up and background

The assignment is worth 100 points. There are 3 questions. You should have the following packages installed:

library(tidyverse)
library(fixest)

In this problem set you will replicate some results from the paper “Do Workers Work More if Wages Are High? Evidence from a Randomized Field Experiment” (Fehr and Goette, AER 2007).

1 Big picture

[Q1] Why cluster the standard errors? Why not just use messenger fixed effects?

Fixed effects and clustered standard errors solve different issues. Fixed effects deal with omitted variable bias stemming from time invariant characteristics not included in the model. In our specific example of the Fehr and Goette paper, the usage of fixed effects would account for messenger characteristics such as sex, chronic injuries which did not heal over the course of the experiment, age (while age changes, the individuals likely remained the same age over the course of this experiment) and so on.

Clustering standard errors, on the other hand, deals with the heteroskedasicity (non-constant variance) stemming from the autocorrelation present between the observations of a single cluster (in the case of the Fehr and Goette paper, this would be individual messengers). Doing so allows us to make sure we are not underestimating standard errors, and potentially showing statistical significance of estimators which are in fact not significant. Additionally, the use of standard errors allows us to account for individual characteristics contained in the error term which do vary over time. For example, if a biker had an injury, such as a strained calf, this would impact his future performance, even as the injury was healing through the observation period. This autocorrelation would be picked up by the SE clustering.

2 Replication

data <- read.csv("/Users/samuelsinger/Desktop/Behavioral Economics/Problem Set 2/tables1to4.csv")

[Q2] Replicate specification four in Table 3 by hand. That means estimate the coefficients and the – clustered! – standard errors manually. Follow the example from lecture (clustered.Rmd). Note that you still have to control for individual fixed effects. There are a few ways you can – check your PS2!

Please add more code chunks as necessary, and/or use .R file to prototype your answer.

rm(list=ls())

library(tidyverse)
library(fixest)


data <- read.csv("/Users/samuelsinger/Desktop/Behavioral Economics/Problem Set 2/tables1to4.csv")

data <- data %>%
  group_by(fahrer) %>%
  filter(group!= "Other Veloblitz") %>%
  filter(group!= "Flash") %>%
  mutate(mean_totrev = mean(totrev, na.rm = TRUE)) %>%
  mutate(totrev_fe = (totrev - mean_totrev)) %>%
  mutate(mean_block2 = mean(block2, na.rm = TRUE)) %>%
  mutate(block2_fe = (block2 - mean_block2))%>%
  mutate(mean_block3 = mean(block3, na.rm = TRUE)) %>%
  mutate(block3_fe = (block3 - mean_block3))%>%
  mutate(mean_high = mean(high, na.rm = TRUE)) %>%
  mutate(high_fe = (high - mean_high)) %>%
  mutate(mean_shifts = mean(shifts, na.rm = TRUE)) %>%
  mutate(shifts_fe = (shifts - mean_high))

X <- cbind(1, data$high_fe, data$block2_fe, data$block3_fe)
y <- data$shifts_fe

n = nrow(X)

k = ncol(X)

betas = solve(crossprod(X)) %*% crossprod(X,y) 

y_pred = X %*% betas

u = y - y_pred

###########################################################################################################

messengers = unique(data$fahrer)

n_messengers = length(messengers)

vcovs_matrix = vector(mode = 'list', length = n_messengers)

names(vcovs_matrix) = messengers

###########################################################################################################

index <- 1

for (mess in messengers) {
  c <- data$fahrer == mess
  X_c <- X[c, ]
  u_c <- u[c]
  vcov_c <- t(X_c) %*% tcrossprod(u_c) %*% X_c
  vcovs_matrix[[index]] <- vcov_c
  index <- index + 1
}
###########################################################################################################

omega = Reduce(f = "+", x = unname(vcovs_matrix))
omega = ((n-1) / (n-k)) * (n_messengers / (n_messengers - 1)) * omega

vcov = solve(crossprod(X)) %*% omega %*% solve(crossprod(X))
###########################################################################################################

se_clustered = sqrt(diag(vcov))

cbind(betas, se_clustered)

##                se_clustered
## [1,] 11.040323    0.9494942
## [2,]  3.986364    0.8319206
## [3,] -1.275000    1.3928835
## [4,] -2.561364    1.5108988

[Q3] Use fixest::feols() to confirm your results.

fixest::feols(shifts_fe ~ high_fe + block2_fe + block3_fe, data= data, cluster = 'fahrer')

## OLS estimation, Dep. Var.: shifts_fe
## Observations: 124
## Standard-errors: Clustered (fahrer) 
##             Estimate Std. Error   t value   Pr(>|t|)    
## (Intercept) 11.04032   0.949494 11.627583 1.4666e-14 ***
## high_fe      3.98636   0.831921  4.791760 2.1860e-05 ***
## block2_fe   -1.27500   1.392883 -0.915367 3.6535e-01    
## block3_fe   -2.56136   1.510899 -1.695258 9.7609e-02 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 7.30459   Adj. R2: 0.030516

Problem Set 3

Behavioral Economics, Boston College

Samuel C. Singer

Set-up and background

1 Big picture

2 Replication