Set-up and background

The assignment is worth 100 points. There are 27 questions. You should have the following packages installed:

library(tidyverse)
library(patchwork)
library(fixest)

In this problem set you will summarize the paper “Evolutionary Origins of the Endowment Effect: Evidence from Hunter-Gatherers” (Apicella et al., AER 2011) and recreate some of its findings.

1 Big picture

[Q1] What is the main question asked in this paper?

This paper asks whether the endowment effect is a universal human trait or a product of culture and market exposure, and what the answer implies about its evolutionary origins.

[Q2] Describe the Hadza. How do they differ from market-based societies?

The Hadza are one of the last true hunter-gatherer populations in the world. They live in a remote environment in Northern Tanzania where they are highly isolated from modern culture. However, in the last few years, tourist companies have been interacting more with certain regions of the Hadza community.

[Q3] Summarize the experiment design. Pay attention to the source of randomization.

Over time, the Hadza have naturally split into groups with high versus low exposure to the ‘modern market’. Because the paper is trying to determine whether the endowment effect occurs naturally, these two groups provide a good comparison of pre- versus post-industrial conditions. The experimenters offered the Hadza two items, a biscuit and a lighter, under two different conditions. In the first condition, the participant is physically handed one of the items and is then asked if they want to trade. In the second condition, in an attempt to eliminate any bias from physical touch, the objects are laid on the ground and one is assigned to the participant with a coin flip. Once the participant has been asked whether they want to trade, they are allowed to pick up the final item.

[Q4] Why did the authors use biscuits and lighters in their design?

The authors chose biscuits and lighters to test whether there is an evolutionary bias tied to the instinct to protect food. The biscuit is the food item and the lighter is the non-food item.

[Q5] Summarize the main results of the experiment.

The authors found that the isolated, low-exposure subset of the Hadza does not show the endowment effect, while those with high exposure to the modern market do. This lets the authors argue, with some confidence, that early human populations did not exhibit the endowment effect.

[Q6] How do the results of this study compare to the sports card market study by List (2003)?

These two studies sit at two extremes of market experience. Apicella et al. study a population with no experience of the modern market, while List studies people who are highly specialized in their market (trading sports cards). Apicella et al. find that those who have never experienced the market show no endowment effect, while List finds that the endowment effect can be unlearned by those with a lot of market experience.

[Q7] What do these results tell us about preferences? Are they endogenous or exogenous?

This paper found that the endowment effect is endogenous. The authors set out to learn whether the endowment effect occurs naturally in humans or whether the modern market has imposed it. If it were naturally occurring, it would be exogenous, because no other variable would affect its presence. However, the authors found that populations unaffected by the market, like the low-exposure Hadza, do not exhibit the endowment effect. This means it is endogenous.

[Q8] Why are these results valuable? What have we learned? Motivate your discussion with a real-world example.

These results are valuable because they help explain consumer behavior. Why do people over-value items in their possession? Why are they so unwilling to give them up? Is it possible for this phenomenon to be reversed? We learned that the modern market has had dramatic effects on consumer preferences and decisions. Now, how do we work with those preferences? We see the endowment effect all the time. People hold on to stocks for too long, sports fans over-value their own players, and potential car buyers fall more in love with the cars they test drive than with the ones they just look at. Marketing companies can use this effect to their advantage, as can those in the business of buying and flipping used items.

2 Replication

Use theme_classic() for all plots.

Load the data. You may need to update your path depending on where you stored it.

df = read_csv("apicella_al_2011.csv")

2.1 Figure 2

[Q9] The column magnola_region is the treatment condition. Use mutate() to create a new column called magnola_region_cat, a categorical variable, that takes the value High Exposure if magnola_region == 1, otherwise Low Exposure. Then use mutate() again and factor() to force the new column magnola_region_cat into a factor variable. Factors are how categorical variables are represented in R. Do both mutations in one pipe chain.

df = df %>%
  mutate(magnola_region_cat = ifelse(magnola_region == 1, "High Exposure", "Low Exposure")) %>%
  mutate(magnola_region_cat = factor(magnola_region_cat))

[Q10] Factor variables in R have “levels” or categories. R chooses a default order for these levels. Check the order of the levels in magnola_region_cat with levels():

levels(df$magnola_region_cat)
## [1] "High Exposure" "Low Exposure"

[Q11] Notice how High Exposure is the first level. That means it will be drawn first when we re-create Figure 2. If we want to perfectly re-create Figure 2, we need High Exposure to be drawn second. So, we have to re-order the levels in the column. Do so with fct_relevel():

df$magnola_region_cat <- fct_relevel(df$magnola_region_cat, "Low Exposure")

[Q12] Re-run levels() to check the new ordering of levels in magnola_region_cat:

levels(df$magnola_region_cat)
## [1] "Low Exposure"  "High Exposure"
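For comparison, here is a minimal base-R sketch of the same re-ordering on a made-up vector. The `region` values are hypothetical stand-ins for magnola_region, and this is an alternative to fct_relevel(), not the call the question asks for.

```r
# Toy vector standing in for magnola_region (hypothetical values)
region <- c(1, 0, 0, 1)
exposure <- factor(ifelse(region == 1, "High Exposure", "Low Exposure"))
levels(exposure)  # default order is alphabetical

# Option 1: pass the levels explicitly to control the order
exposure2 <- factor(exposure, levels = c("Low Exposure", "High Exposure"))
levels(exposure2)

# Option 2: relevel() moves a single level to the front
exposure3 <- relevel(exposure, ref = "Low Exposure")
levels(exposure3)
```

Re-ordering the levels changes only how the categories are ordered, not which category each observation belongs to.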

[Q13] OK, let’s make figure 2A. Use stat_summary(fun = mean) to plot the averages and stat_summary(fun.data = mean_se) to plot the error bars (hint: set the width of the error bars to something like 0.1). Assign the output to the object fig2a. Use ylim() to set the limits of the axis to [0, 1], and make sure to label both axes.

fig2a <-
  ggplot(df, aes(x = magnola_region_cat, y = trade)) +
  # Draw the bars first so the error bars are not hidden behind them.
  # fill is a fixed colour, so it is set outside aes() and no legend is created.
  stat_summary(fun = mean, geom = "bar", width = 0.3, fill = "red", colour = "black") +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.1) +
  geom_hline(yintercept = 0.5) +
  ylim(0, 1) +
  xlab("Category") +
  ylab("Average Trading") +
  labs(title = "Panel A") +
  theme_classic()
fig2a
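To see what the error bars represent, here is a small base-R sketch on made-up 0/1 data. To my understanding, mean_se() reports the mean plus and minus one standard error by default; the vector `trade_toy` is hypothetical and not taken from the data set.

```r
# Hypothetical trade decisions for eight subjects (1 = traded, 0 = kept)
trade_toy <- c(1, 0, 0, 1, 1, 0, 0, 0)

m  <- mean(trade_toy)                          # height of the bar
se <- sd(trade_toy) / sqrt(length(trade_toy))  # standard error of the mean

# Lower end, centre, and upper end of the error bar
c(ymin = m - se, y = m, ymax = m + se)
```

With a 0/1 outcome, the bar height is simply the fraction of subjects who traded.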

[Q14] Figure 2b shows the fraction of subjects that traded by camp and distance to the village Mangola. This one is a bit more challenging. We have to scatter plot distance on the x-axis and mean trade on the y-axis – and then size each point by total trade. Let’s start by making these summaries. Use summarise() to create three columns by campname: mean_trade (the average trade), sum_trade (the total trade), and distance (hint: use unique(distance_to_mangola)):

# One row per camp: share that traded, number that traded, distance to Mangola
table1 <- df %>%
  group_by(campname) %>%
  summarise(mean_trade = mean(trade),
            sum_trade = sum(trade),
            distance = unique(distance_to_mangola))
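As a quick check of how the per-camp summary behaves, here is a sketch on a tiny made-up table. The camp names and values are hypothetical; only the column names mirror the data set.

```r
library(dplyr)

# Hypothetical data: two camps, distance is constant within each camp
toy <- tibble(
  campname = c("A", "A", "A", "B", "B"),
  trade = c(1, 0, 1, 0, 0),
  distance_to_mangola = c(10, 10, 10, 40, 40)
)

toy_summary <- toy %>%
  group_by(campname) %>%
  summarise(mean_trade = mean(trade),   # share of subjects who traded
            sum_trade = sum(trade),     # number of subjects who traded
            distance = unique(distance_to_mangola))
toy_summary
```

unique() works here only because every row in a camp has the same distance. If a camp had more than one distinct value, the summary would no longer be one row per camp.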

[Q15] OK, now pipe the output of what you just did to ggplot to plot mean_trade as a function of distance and size each point by sum_trade. Assign the plot to fig2b.

fig2b <-
  ggplot(table1, aes(x = distance, y = mean_trade, label = campname)) +
  geom_hline(yintercept = 0.5) +
  # shape 21 is a filled circle with an outline; size is mapped inside aes()
  # so that scale_size_continuous() below actually applies
  geom_point(aes(size = sum_trade), shape = 21, fill = "red", colour = "black", alpha = 0.8) +
  geom_text(position = position_nudge(x = -1, y = 0.1)) +
  scale_size_continuous(range = c(1, 10)) +
  xlab("Distance from Mangola Village (km)") +
  ylab("Average Trading") +
  ylim(0, 1) +
  labs(title = "Panel B") +
  theme_classic() +
  theme(legend.position = "none")
fig2b

[Q16] Use library(patchwork) to combine the two plots and complete the replication.

fig2a | fig2b
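For reference, a short sketch of the main patchwork layout operators, using the built-in mtcars data rather than the replication data. The plots `p1` and `p2` are placeholders for illustration.

```r
library(ggplot2)
library(patchwork)

# Two placeholder plots built from a built-in data set
p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point() + theme_classic()
p2 <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot() + theme_classic()

side_by_side <- p1 | p2   # panels in one row
stacked      <- p1 / p2   # panels in one column

# Optionally tag the panels A, B, ...
labelled <- (p1 | p2) + plot_annotation(tag_levels = "A")
```

The `|` operator forces a side-by-side layout, which matches the two-panel arrangement of Figure 2.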

2.2 Table 1

The main finding is that the High Exposure subjects are less likely to trade and thus exhibit endowment effects. This finding is seen in Table 1.

[Q17] Pipe the data to lm() and then to summary() to replicate the coefficients in the fifth specification (the fifth column in Table 1).

lm1 <- lm(trade ~ magnola_region_cat + distance_to_mangola + lighter + (lighter*distance_to_mangola), data=df)
summary(lm1)
## 
## Call:
## lm(formula = trade ~ magnola_region_cat + distance_to_mangola + 
##     lighter + (lighter * distance_to_mangola), data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.6140 -0.2847 -0.2157  0.4732  0.7847 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)   
## (Intercept)                      0.496565   0.153222   3.241  0.00142 **
## magnola_region_catHigh Exposure -0.285900   0.145628  -1.963  0.05119 . 
## distance_to_mangola              0.001437   0.002627   0.547  0.58508   
## lighter                          0.079017   0.098073   0.806  0.42150   
## distance_to_mangola:lighter     -0.002969   0.002272  -1.307  0.19293   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.471 on 177 degrees of freedom
## Multiple R-squared:  0.09322,    Adjusted R-squared:  0.07273 
## F-statistic: 4.549 on 4 and 177 DF,  p-value: 0.001603

Notice how the coefficients above are the same as Table 1 Specification 5 but the standard errors are different. This is because the authors cluster the standard errors at the village level. Before we dive into clustering, we need to appreciate why we care about the standard errors.

The standard error is an estimate of the standard deviation of a regression coefficient's sampling distribution (the square root of its estimated variance), and it plays a huge role in hypothesis testing. Recall that the null hypothesis on any coefficient is that its true value is zero (i.e., no or “null” effect of the variable on the outcome). Under the null, the test statistic is distributed around zero, and the probability of observing a test statistic at least as extreme as ours is the area in the two tails of that distribution, beyond the test statistic in absolute value. This probability is the p-value, and the p-value determines whether we reject or fail-to-reject the null hypothesis. So, if we have the wrong estimate of the standard error, we will make the wrong inference about our regression coefficient.
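A minimal sketch of where the standard errors in a regression table come from, using the built-in mtcars data rather than the replication data. The model is an arbitrary example; the point is only that the reported standard errors are the square roots of the diagonal of the estimated coefficient covariance matrix.

```r
# Arbitrary example regression on a built-in data set
fit <- lm(mpg ~ wt, data = mtcars)

# Standard errors: square roots of the estimated coefficient variances
se_manual <- sqrt(diag(vcov(fit)))

# The same numbers as reported by summary()
se_table <- coef(summary(fit))[, "Std. Error"]

# Test statistic for each coefficient: estimate divided by standard error
t_manual <- coef(fit) / se_manual
```

Changing how vcov is estimated (for example by clustering) changes the standard errors and the test statistics, but leaves the coefficients untouched.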

[Q18] This test statistic is the “t value”, and it is simply the estimated coefficient divided by the standard error. Verify the t value for the treatment indicator. (No functions needed. You just have to divide two numbers from the regression output.)

t1 <- -0.285900/0.145628

t1
## [1] -1.963221

[Q19] Now verify the p-value for the estimated treatment effect using pt(). (Hint: the t-distribution is symmetric around the mean! And mind the degrees-of-freedom, the df argument in pt(). The degrees of freedom can be found in the regression table from above.)

p1 <- 2*pt(-1.963, 177)
p1
## [1] 0.05121313
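The symmetry hint can be checked directly. This sketch reuses the estimate, standard error, and degrees of freedom from the regression output above and computes the two-sided p-value from either tail; the variable names are my own.

```r
est <- -0.285900   # treatment coefficient from the lm() output
se  <- 0.145628    # its standard error
dof <- 177         # residual degrees of freedom

t_val <- est / se

# Twice the lower-tail area below -|t| ...
p_lower <- 2 * pt(-abs(t_val), df = dof)
# ... equals twice the upper-tail area above +|t|
p_upper <- 2 * pt(abs(t_val), df = dof, lower.tail = FALSE)

c(p_lower = p_lower, p_upper = p_upper)
```

Using abs() makes the calculation work whether the estimated coefficient is negative or positive.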

[Q20] The authors cluster standard errors within villages to account for arbitrary, unobserved correlation between subjects in the same village. Why might there be such correlation? Recall the main decision made by villagers: to trade or not to trade.

There could be correlation within villages because each village probably has a small internal market where members trade goods with one another. People in the same village therefore share exposure to the same internal market, which creates unintentional bias and correlation in their decisions to trade or not.

[Q21] Use feols() from library(fixest) to re-run the regression. Assign the output to the object model. (Hint: you don’t need to change your model call from before!)

model <- feols(trade ~ magnola_region_cat + distance_to_mangola + lighter + (lighter*distance_to_mangola), data=df)

[Q22] Run summary() on model to view the standard errors and p-values. They should be the same as before. (The formatting will look a bit different because feols() returns a different type of data object than lm().)

summary(model)
## OLS estimation, Dep. Var.: trade
## Observations: 182 
## Standard-errors: Standard 
##                                  Estimate Std. Error   t value Pr(>|t|))    
## (Intercept)                      0.496565   0.153222  3.240800  0.001424 ** 
## magnola_region_catHigh Exposure -0.285900   0.145628 -1.963200  0.051186 .  
## distance_to_mangola              0.001437   0.002627  0.546975  0.585085    
## lighter                          0.079017   0.098073  0.805699  0.421497    
## distance_to_mangola:lighter     -0.002969   0.002272 -1.306900  0.192925    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 0.464483   Adj. R2: 0.07273

[Q23] Now use the se and cluster arguments to summary() to cluster the standard errors at the village level (campname in the data set).

Here is a helpful resource from the fixest author: https://cran.r-project.org/web/packages/fixest/vignettes/standard_errors.html

summary(model, cluster = "campname")
## OLS estimation, Dep. Var.: trade
## Observations: 182 
## Standard-errors: Clustered (campname) 
##                                  Estimate Std. Error   t value Pr(>|t|))    
## (Intercept)                      0.496565   0.095157  5.218400  0.001228 ** 
## magnola_region_catHigh Exposure -0.285900   0.082226 -3.477000  0.010308 *  
## distance_to_mangola              0.001437   0.001552  0.925606  0.385450    
## lighter                          0.079017   0.085915  0.919710  0.388318    
## distance_to_mangola:lighter     -0.002969   0.001622 -1.830500  0.109869    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 0.464483   Adj. R2: 0.07273
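To make the idea of clustering concrete, here is a rough base-R sketch of a cluster-robust (sandwich) variance on simulated data. Everything in it is an assumption for illustration: the data are simulated, the variable names are made up, and no small-sample correction is applied, whereas fixest applies one by default as far as I know, so the numbers would not match feols() exactly.

```r
set.seed(1)

# Simulated data: 8 clusters of 25 subjects with a shared shock per cluster
G <- 8
n_per <- 25
cluster <- rep(seq_len(G), each = n_per)
x <- rnorm(G)[cluster]                       # regressor varies only across clusters
u <- rnorm(G)[cluster] + rnorm(G * n_per)    # cluster shock + individual noise
y <- 0.5 * x + u

fit <- lm(y ~ x)
X <- model.matrix(fit)
e <- residuals(fit)

# Sandwich estimator: bread %*% meat %*% bread
bread  <- solve(crossprod(X))        # (X'X)^{-1}
scores <- rowsum(X * e, cluster)     # one row per cluster: X_g' e_g
meat   <- crossprod(scores)          # sum over clusters of (X_g' e_g)(X_g' e_g)'
V_cl   <- bread %*% meat %*% bread

se_cl  <- sqrt(diag(V_cl))           # cluster-robust standard errors
se_iid <- sqrt(diag(vcov(fit)))      # standard errors assuming independence
rbind(se_iid, se_cl)
```

The coefficients are the same under both approaches; only the variance estimate changes. Clustered standard errors can come out larger or smaller than the standard ones, and in this problem set they come out smaller.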

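[Q24] What changed? The estimated coefficients? The standard errors? The p-values? Do your numbers (the coefficients and the standard errors) match the numbers in Table 1 Specification 5?

Everything changed except for the coefficients. Table 1, the feols() model, and the model with clustered standard errors have the same coefficients across the board. However, once we clustered the standard errors, the standard errors decreased, so the t-values grew in magnitude and the p-values decreased. For example, the standard error on the treatment indicator fell from 0.145628 to 0.082226, its t-value moved from -1.9632 to -3.4770, and its p-value fell from 0.051186 to 0.010308, which makes the effect significant at the 5% level.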