The assignment is worth 100 points. There are 13 questions. You should have the following packages installed:
library(tidyverse)
library(patchwork)
library(fixest)
library(ggplot2)
library(broom)
library(fixest)In this problem set you will replicate some results from the paper “Do Workers Work More if Wages Are High? Evidence from a Randomized Field Experiment” (Fehr and Goette, AER 2007).
[Q1] Recall the taxi cab studies where reference dependence is studied using observational data. What can an experimental study do that an observational study can’t?
An experimental study allows for the random assignment of treatment to the subjects. Since random assignment is the best way to eliminate endogeneity within a model, the results generated by these experiments are seen as more accurate.
[Q2] Summarize the field experiment design.
Employees of a Swiss messenger company were split into two randomly assorted groups, A and B, based on employee numbers. Group A received a ~25% pay increase (treatment) for four weeks in September 2000, while employees of Group B received their regular salary (control). After the four weeks, the groups switched roles, with group B receiving the treatment for four weeks in November 2000 and group A returning to receiving its standard salary. None of the test subjects withdrew from the experiment over the two data collecting periods, which eliminates a potential source of bias; similar to randomized control drug trials, if a person did drop out, we would have to find out why to assure result accuracy. The messengers did not know the purpose of the study, eliminating potential result manipulation; for example, if the subjects were to know the relationship between effort and salary was being studied, they may have tried harder in hopes of creating favorable results and an argument for pay incentives down the line. Finally, all subjects received their bonuses at the end of the experimental period (December 2000) to make sure that each group faced comparable budget constraints.
[Q3] Summarize the main results of the field experiment. Distinguish between hours and effort.
The paper has found that the treatment has a statistically significant impact on wages. According to the results from the messengers participating in the experiment, individuals who were given the salary boost made, on average, 1,033.6 CHF more compared to the non-treated group, cetris paribus.
Furthermore, the paper found that the treatment had a statistically significant impact on shifts worked. From the messengers participating in the experiment, on average, during the treatment period, individuals who were given the salary bump worked 3.99 shifts more compared to the non-treated group, cetris paribus.
The results then investigate the impact of the treatment on average hourly earnings, finding that the treatment leads to an approximate six percent decrease in hourly earnings compared to the untreated groups.
Since we know that the treatment increases both hours worked and wages but decreases hourly earnings, the results demonstrate a fall in effort caused by the treatment.
Use theme_classic() for all plots.
For this section please use dailycorrs.csv.
[Q4] The authors show that earnings at Veloblitz and Flash are correlated. Show this with a scatter plot with a regression line and no confidence interval. Title your axes and the plot appropriately. Do not print the plot but assign it to an object called
p1.
library(ggplot2)
p1 <- ggplot(data, aes(x = logv, y = logf)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "blue") +
labs(x = "Log of Veloblit Earnings", y = "Log of Flash Earnings", title = "Correlation of the natural logs of Veloblitz's and Flash's earnings") +
theme_minimal()[Q5] Next plot the kernel density estimates of revenues for both companies. Overlay the distributions and make the densities transparent so they are easily seen. Title your axes and the plot appropriately. Do not print the plot but assign it to an object called
p2.
p2 <- ggplot(data) +
geom_density(aes(x = logv), fill = "blue", alpha = 0.5) +
geom_density(aes(x = logf), fill = "red", alpha = 0.5) +
labs(x = "Values", y = "Density", title = "Kernel Density Estimates of Veloblitz's and Flash's logged earnings") +
theme_minimal()[Q6] Now combine both plots using
library(patchwork)and label the plots with letters.
For this section please use tables1to4.csv.
On page 307 the authors write:
“Table 2 controls for individual fixed effects by showing how, on average, the messengers’ revenues deviate from their person-specific mean revenues. Thus, a positive number here indicates a positive deviation from the person-specific mean; a negative number indicates a negative deviation.”
[Q7] Fixed effects are a way to control for heterogeneity across individuals that is time invariant. Why would we want to control for fixed effects? Give a reason how bike messengers could be different from each other, and how these differences might not vary over time.
Fixed effects allow us to control for individual fixed characteristics, which is key in eliminating endogeneity from unobserved personal traits of the different bike messengers. For example, suppose a messenger has a pre-existing injury that does not heal, such as a bad knee. In that case, they may be less responsive to financial incentives because they ara physically unable to capitalize on it. Furthermore, more apparent sources of endogeneity, such as the messenger’s sex, would also be accounted for by using FE, as sex is mostly time-invariant. If women, for example, were less responsive to financial incentives (the treatment) due to physical ability differences, FE would capture it.
[Q8] Create a variable called
totrev_feand add it to the dataframe. This requires you to “average out” each individual’s revenue for a block from their average revenue: \(x_i^{fe} = x_{it} - \bar{x}_i\) where \(x_i^{fe}\) is the fixed effect revenue for \(i\).
data2 <- data2 %>%
group_by(fahrer) %>%
mutate(mean_totrev = mean(totrev, na.rm = TRUE)) %>%
mutate(totrev_fe = (totrev - mean_totrev))[Q9] Use
summarise()to recreate the findings in Table 2 for “Participating Messengers” using your new variabletotrev_fe. (You do not have to calculate the differences in means.)In addition to calculating the fixed-effect controled means, calculate the standard errors. Recall the standard error is \(\frac{s_{jt}}{\sqrt{n_{jt}}}\) where \(s_{jt}\) is the standard deviation for treatment \(j\) in block \(t\) and \(n_{jt}\) are the corresponding number of observations.
(Hint: use
n()to count observations.) Each calculation should be named to a new variable. Assign the resulting dataframe to a new dataframe calleddf_avg_revenue.
df_avg_revenue <- data2 %>%
group_by(block) %>%
summarise(
MainRevenuesA = mean(totrev_fe[odd == 1], na.rm = TRUE),
SDMainRevA = sd(totrev_fe[odd == 1], na.rm = TRUE) / sqrt(sum(!is.na(totrev_fe[odd == 1]))),
MainRevenuesB = mean(totrev_fe[odd == 0], na.rm = TRUE),
SDMainRevB = sd(totrev_fe[odd == 0], na.rm = TRUE) / sqrt(sum(!is.na(totrev_fe[odd == 0])))
)
print(df_avg_revenue)## # A tibble: 3 × 5
## block MainRevenuesA SDMainRevA MainRevenuesB SDMainRevB
## <int> <dbl> <dbl> <dbl> <dbl>
## 1 1 -48.9 361. -120. 303.
## 2 2 722. 193. -278. 241.
## 3 3 -675. 289. 392. 251.
[Q10] Plot
df_avg_revenue. Use points for the means and error bars for standard errors of the means.
To dodge the points and size them appropriately, use
To place the error bars use
geom_errorbar(aes(
x=block,
ymin = [MEAN] - [SE], ymax = [MEAN] + [SE]),
width = .1,
position=position_dodge(width=0.5))You will need to replace [MEAN] with whatever you
named your average revenues and [SE] with whatever you
named your standard errors.
ggplot(df_avg_revenue, aes(x = factor(block))) +
geom_point(aes(y = MainRevenuesA, color = "Group A"),
position = position_dodge(width = 0.5), size = 4) +
geom_point(aes(y = MainRevenuesB, color = "Group B"),
position = position_dodge(width = 0.5), size = 4) +
geom_errorbar(aes(
x = block,
ymin = MainRevenuesA - SDMainRevA, ymax = MainRevenuesA + SDMainRevA, color = "Group A"
), width = 0.1, position = position_dodge(width = 0.5)) +
geom_errorbar(aes(
x = block,
ymin = MainRevenuesB - SDMainRevB, ymax = MainRevenuesB + SDMainRevB, color = "Group B"
), width = 0.1, position = position_dodge(width = 0.5)) +
labs(
x = "Block",
y = "Mean Revenue (totrev_fe)",
title = "Mean Revenue by Block for Group A and Group B with Standard Errors",
color = "Group"
) +
theme_minimal() +
theme(legend.position = "top")[Q11] Recreate the point estimates in Model (1) in Table 3 by hand (you don’t need to worry about the standard errors). Assign it to object
m1. Recreating this model requires you to control for individual fixed effects and estimate the following equation where \(\text{H}\) is the variablehigh, \(\text{B2}\) is the second block (block == 2) and \(\text{B3}\) is the third block (block == 3):
\[ y_{ijt} - \bar{y}_{ij} = \beta_1 (\text{H}_{ijt} - \bar{\text{H}}_{ij}) + \beta_2 (\text{B2}_{ijt} - \bar{\text{B2}}_{ij}) + \beta_3 (\text{B3}_{ijt} - \bar{\text{B3}}_{ij}) + (\varepsilon_{ijt} - \bar{\varepsilon}_{ij}) \]
data3 <- data2 %>%
group_by(fahrer) %>%
filter(group!= "Other Veloblitz") %>%
filter(group!= "Flash") %>%
mutate(mean_block2 = mean(block2, na.rm = TRUE)) %>%
mutate(block2_fe = (block2 - mean_block2))%>%
mutate(mean_block3 = mean(block3, na.rm = TRUE)) %>%
mutate(block3_fe = (block3 - mean_block3))%>%
mutate(mean_high = mean(high, na.rm = TRUE)) %>%
mutate(high_fe = (high - mean_high))
X <- cbind(1, data3$high_fe, data3$block2_fe, data3$block3_fe)
y <- data3$totrev_fe
m1 <- solve(t(X) %*% X) %*% (t(X) %*% y)
m1## [,1]
## [1,] -3.088121e-14
## [2,] 1.033560e+03
## [3,] -2.109725e+02
## [4,] -5.747125e+02
[Q12] Now recreate the same point estimates using
lmand assign it to objectm2. You are estimating the model below where where \(\text{F}_i\) is the dummy variable for each messenger (fahrer). Make sure to cluster the standard errors at the messenger level. (Uselmtestandsandwhichfor this.)
\[ y_{ijt} = \beta_0 + \beta_1 \text{H}_{ijt} + \beta_2 \text{B2}_{ijt} + \beta_3 \text{B3}_{ijt} + \sum_{i=1}^{n} \alpha_i \text{F}_i + \varepsilon_{ijt} \]
library(lmtest)
library(sandwich)
m2 <- lm(totrev_fe ~ high_fe + block2_fe + block3_fe + factor(fahrer), data = data3)
clustered_se <- vcovCL(m2, cluster = ~fahrer)
m2_results <- coeftest(m2, vcov = clustered_se)
print(m2_results)##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.8716e-11 2.3985e-11 -0.7803 0.437523
## high_fe 1.0336e+03 3.2685e+02 3.1621 0.002222 **
## block2_fe -2.1097e+02 4.9725e+02 -0.4243 0.672516
## block3_fe -5.7471e+02 5.4568e+02 -1.0532 0.295454
## factor(fahrer)2 1.9067e-11 2.4179e-11 0.7886 0.432729
## factor(fahrer)3 1.8923e-11 2.4009e-11 0.7881 0.432976
## factor(fahrer)5 1.8617e-11 2.4036e-11 0.7745 0.440933
## factor(fahrer)6 1.8649e-11 2.3965e-11 0.7782 0.438784
## factor(fahrer)7 1.9356e-11 2.4086e-11 0.8036 0.424035
## factor(fahrer)8 1.8943e-11 2.3967e-11 0.7904 0.431677
## factor(fahrer)9 1.8345e-11 2.3956e-11 0.7658 0.446076
## factor(fahrer)14 1.8412e-11 2.4091e-11 0.7643 0.446980
## factor(fahrer)15 1.8606e-11 2.3964e-11 0.7764 0.439828
## factor(fahrer)18 1.8535e-11 2.4105e-11 0.7689 0.444220
## factor(fahrer)19 1.8919e-11 2.4135e-11 0.7839 0.435460
## factor(fahrer)21 1.8385e-11 2.4063e-11 0.7641 0.447111
## factor(fahrer)22 1.9611e-11 2.3903e-11 0.8204 0.414440
## factor(fahrer)23 1.8600e-11 2.4129e-11 0.7709 0.443092
## factor(fahrer)24 1.8977e-11 2.3999e-11 0.7908 0.431452
## factor(fahrer)25 1.9082e-11 2.3994e-11 0.7953 0.428845
## factor(fahrer)28 1.8536e-11 2.3988e-11 0.7727 0.441985
## factor(fahrer)30 1.9168e-11 2.3971e-11 0.7996 0.426318
## factor(fahrer)31 1.8359e-11 2.4006e-11 0.7648 0.446694
## factor(fahrer)32 1.8225e-11 2.4011e-11 0.7590 0.450106
## factor(fahrer)33 1.8481e-11 2.4036e-11 0.7689 0.444257
## factor(fahrer)34 1.8468e-11 2.4011e-11 0.7692 0.444082
## factor(fahrer)35 1.9158e-11 2.3979e-11 0.7990 0.426709
## factor(fahrer)36 1.9233e-11 2.3961e-11 0.8027 0.424569
## factor(fahrer)37 1.9011e-11 2.3967e-11 0.7932 0.430025
## factor(fahrer)38 1.9431e-11 2.4031e-11 0.8086 0.421196
## factor(fahrer)42 1.9593e-11 2.3976e-11 0.8172 0.416285
## factor(fahrer)44 1.9358e-11 2.3961e-11 0.8079 0.421588
## factor(fahrer)45 1.9486e-11 2.4194e-11 0.8054 0.422985
## factor(fahrer)49 1.9593e-11 2.3986e-11 0.8169 0.416464
## factor(fahrer)50 1.8723e-11 2.3961e-11 0.7814 0.436919
## factor(fahrer)51 1.9905e-11 2.4005e-11 0.8292 0.409487
## factor(fahrer)52 1.8719e-11 2.3954e-11 0.7815 0.436873
## factor(fahrer)53 1.9035e-11 2.3942e-11 0.7951 0.428964
## factor(fahrer)55 1.9460e-11 2.3945e-11 0.8127 0.418818
## factor(fahrer)56 1.9108e-11 2.3950e-11 0.7978 0.427373
## factor(fahrer)57 1.7454e-11 2.3955e-11 0.7286 0.468388
## factor(fahrer)58 1.9154e-11 2.3973e-11 0.7990 0.426707
## factor(fahrer)60 1.9698e-11 2.3971e-11 0.8217 0.413700
## factor(fahrer)61 1.9345e-11 2.3944e-11 0.8079 0.421566
## factor(fahrer)63 1.8681e-11 2.3913e-11 0.7812 0.437032
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[Q13] Now use
feolsto recreate Model (1), including the standard errors. Assign your estimates to the objectm3. You are estimating the model below where where \(\alpha_i\) is the individual intercept (i.e. the individual fixed effect):
\[ y_{ijt} = \alpha_i + \beta_1 \text{H}_{ijt} + \beta_2 \text{B2}_{ijt} + \beta_3 \text{B3}_{ijt} + \varepsilon_{ijt} \]
m3 <- fixest::feols(totrev_fe ~ high + block2 + block3 | fahrer, data = data3, cluster = ~fahrer)
summary(m3)## OLS estimation, Dep. Var.: totrev_fe
## Observations: 124
## Fixed-effects: fahrer: 42
## Standard-errors: Clustered (fahrer)
## Estimate Std. Error t value Pr(>|t|)
## high 1033.560 265.202 3.897256 0.00035253 ***
## block2 -210.973 403.458 -0.522911 0.60385066
## block3 -574.713 442.748 -1.298057 0.20152287
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 1,230.2 Adj. R2: -0.365108
## Within R2: 0.123224