# Clear the workspace
rm(list = ls()) # Clear environment
gc()            # Clear unused memory
##          used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 450076 24.1     966460 51.7   638942 34.2
## Vcells 804307  6.2    8388608 64.0  1633064 12.5
cat("\f")       # Clear the console

Set-up and background

The assignment is worth 100 points. There are 27 questions. You should have the following packages installed:

library(tidyverse)
library(patchwork)
library(fixest)

In this problem set you will summarize the paper “Evolutionary Origins of the Endowment Effect: Evidence from Hunter-Gatherers” (Apicella et al., AER 2011) and recreate some of its findings.

1 Big picture

[Q1] What is the main question asked in this paper?

What factors influence rational economic behavior through biases such as the endowment effect?

[Q2] Describe the Hadza. How do they differ from market-based societies?

The Hadza are a population of hunter-gatherers in Northern Tanzania. Much of their society is based on sharing resources equally among all members of the community, and in doing so, the sense of individual ownership is absent. In market-based societies, individual ownership is present as members of those societies act on behalf of themselves or their immediate family.

[Q3] Summarize the experiment design. Pay attention to the source of randomization.

The Hadza are a population of hunter-gatherers with villages spread across Northern Tanzania, with some villages more isolated and other villages located closer to modern communities. Having similar populations located near and far from exposure to modern markets offers the ability to test if that exposure has any influence on the endowment effect. This randomization of geographic proximity to modern society can help determine if it is a factor in the endowment effect.

[Q4] Why did the authors use biscuits and lighters and their design?

The author used biscuits and lighters because it introduces the idea of durable vs. non-durable goods. Biscuits are non-durable so they are not as easily shared with the camp and can only be consumed once, whereas lighters can be passed around and used multiple times.

[Q5] Summarize the main results of the experiment.

The experiment yielded that higher exposure to modern markets leads to the endowment effect being more present in the Hadza population.

[Q6] How do the results of this study compare to the sportcards market study by List (2003)?

In the study by List, the participants live in a modern society so their exposure to modern markets is more prevalent. The study is testing the influence trading experience on the endowment effect. The type of good being tested is sportcards. The difference between this study and List’s is the population (modern vs. hunter-gatherers) and exposure to markets vs. the population’s experience in trading. Also, the type of good matters here too as biscuits and lighters are basic goods that are used for survival while sportscards are a leisure product.

[Q7] What do these results tell us about preferences? Are they endogenous or exogenous?

The results are exogenous because the level of exposure directly impacted the rate of trading. Those of the Hadza populations with lower exposure exhibited rational rates of trading. With the Hadza largely similar genetically and culturally homogeneous, the external factor of exposure to markets was able to be isolated.

[Q8] Why are these results valuable? What have we learned? Motivate your discussion with a real-world example.

These results are important because they can help explain how environments can affect consumer behavior and decision-making. We learned that more experience with market interactions leads to consumers valuing their endowed goods higher than what they would be willing to pay for that same product in the market, leading to less purchasing. I could relate this to Amazon’s “try before you buy” option when buying clothes. You can opt-in to receive clothes and try them on and return them without being charged if it doesn’t fit or you don’t like it. Once the consumer receives the product and tries it on, it could lead to a WTA > WTP since they actually have the clothes rather than comparing between two sites. Say the consumer likes the clothes, they are not going to return the clothes to Amazon and buy the same clothes from another retailer for the same price.

2 Replication

Use theme_classic() for all plots.

Load the data. You may need to update your path depending on where you stored it.

setwd("/Users/spoll/OneDrive/Documents/Boston College/Behavioral Economics/Week 2")
df = read.csv("apicella_al_2011.csv"
                           , check.names = FALSE
                           , stringsAsFactors = FALSE
                           , na.strings = ""
)

2.1 Figure 2

[Q9] The column magnola_region is the treatment condition. Use mutate() to create a new column called magnola_region_cat, a categorical variable, that takes the value High Exposure if magnola_region == 1, otherwise Low Exposure. Then use mutate() again and factor() to force the new column magnola_region_cat into a factor variable. Factors are how categorical variables are represented in R. Do both mutations in one pipe chain.

df = df %>% 
  mutate(mangola_region_cat = ifelse(magnola_region == 1, 'High Exposure', 'Low Exposure')) %>% 
  mutate(mangola_region_cat = factor(mangola_region_cat))

[Q10] Factor variables in R have “levels” or categories. R chooses a default order for these levels. Check the order of the levels in magnola_region_cat with levels():

df$mangola_region_cat %>% levels()
## [1] "High Exposure" "Low Exposure"

[Q11] Notice how High Exposure is the first level. That means it will be drawn first when we re-create Figure 2. If we want to perfectly re-create Figure 2, we need High Exposure to be drawn second. So, we have to re-order the levels in the column. Do so with fct_relevel():

df$mangola_region_cat = fct_relevel(df$mangola_region_cat,"Low Exposure")

[Q12] Re-run levels() to check the new ordering of levels in magnola_region_cat:

df$mangola_region_cat %>% levels()
## [1] "Low Exposure"  "High Exposure"

[Q13] OK, let’s make figure 2A. Use stat_summary(fun = mean) to plot the averages and stat_summary(fun.data = mean_se) to plot the error bars (hint: set the width of the error bars to something like 0.1). Assign the output to the object fig2a. Use ylim() to set the limits of the axis to \([0,1]\), and make sure to label both axes.

fig2a = ggplot(df, aes(x = mangola_region_cat, y = trade, fill = "tomato")) +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.02, colour = "black") +
  stat_summary(fun = "mean", geom = "bar", width = 0.3, colour = "black") +
  ylim(0,1) +
  geom_hline(yintercept = 0.5) +
  ggtitle("Panel A") +
  theme_classic() +
   theme(legend.position = "none") +
  labs(y = "Trade", x = "Exposure to Markets")

fig2a

[Q14] Figure 2b shows the fraction of subjects that traded by camp and distance to the village Mangola. This one is a bit more challenging. We have to scatter plot distance on the x-axis and mean trade on the y-axis – and then size each point by total trade. Let’s start by making these summaries. Use summarise() to create three columns by campname: mean_trade (the average trade), sum_trade (the total trade), and distance (hint: use unique(distance_to_mangola)):

df %>% 
  group_by(campname) %>% 
  summarise(mean_trade = mean(trade)
            , sum_trade = sum(trade)
            , distance = unique(distance_to_mangola))
## # A tibble: 8 x 4
##   campname     mean_trade sum_trade distance
##   <chr>             <dbl>     <int>    <dbl>
## 1 Endadubu          0.5          16    36.7 
## 2 Mayai             0.7           7    77.8 
## 3 Mizeu             1             2    52.9 
## 4 Mkwajuni          0.227        10     3.52
## 5 Mwashilatu        0.467        14    81.8 
## 6 Setako Chini      0.562         9    42.0 
## 7 Shibibunga        0.214         6     3.25
## 8 Sonai             0.35          7     4.44

[Q15] OK, now pipe the output of what you just did to ggplot to plot mean_trade as a function of distance and size each point by sum_trade. Assign the plot to fig2b.

fig2b = df %>% 
  group_by(campname) %>% 
  summarise(mean_trade = mean(trade)
            , sum_trade = sum(trade)
            , distance = unique(distance_to_mangola)) %>% 
  ggplot(aes(x = distance
             , y = mean_trade)) +
  geom_point(aes(size = sum_trade
                 , colour = "tomato")) +
  geom_point(shape = 1,aes(size = sum_trade),colour = "black") +
  ylim(0,1) + 
  scale_y_continuous(limits = c(0,1)) +
  geom_text(aes(label = campname)
            , nudge_y = 0.05
            , check_overlap = TRUE) +
  geom_hline(yintercept = 0.5) +
  ggtitle("Panel B") +
  theme_classic() +
   theme(legend.position = "none") +
  labs(y = "Trade"
       , x = "Distance to Mangola Village (km)")

fig2b

[Q16] Use library(patchwork) to combine the two plots and complete the replication.

fig2a + fig2b

2.2 Table 1

The main finding is that the High Exposure subjects are less likely to trade and thus exhibit endowment effects. This finding is seen in Table 1.

[Q17] Pipe the data to lm() and then to summary() to replicate the coefficients in fifth specification (the fifth column in Table 1).

df %>% 
  lm(trade ~ mangola_region_cat + 
       distance_to_mangola + 
       lighter + 
       (lighter*distance_to_mangola)
     , data = .) %>% 
  summary()
## 
## Call:
## lm(formula = trade ~ mangola_region_cat + distance_to_mangola + 
##     lighter + (lighter * distance_to_mangola), data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.6140 -0.2847 -0.2157  0.4732  0.7847 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)   
## (Intercept)                      0.496565   0.153222   3.241  0.00142 **
## mangola_region_catHigh Exposure -0.285900   0.145628  -1.963  0.05119 . 
## distance_to_mangola              0.001437   0.002627   0.547  0.58508   
## lighter                          0.079017   0.098073   0.806  0.42150   
## distance_to_mangola:lighter     -0.002969   0.002272  -1.307  0.19293   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.471 on 177 degrees of freedom
## Multiple R-squared:  0.09322,    Adjusted R-squared:  0.07273 
## F-statistic: 4.549 on 4 and 177 DF,  p-value: 0.001603

Notice how the coefficients above are the same as Table 1 Specification 5 but the standard errors are different. This is because the authors cluster the standard errors at the village level. Before we dive into clustering, we need to appreciate why we care about the standard errors.

The standard error is the estimate of the variance of a regression coefficient, and it plays a huge role in hypothesis testing. Recall that the null hypothesis test on any coefficient is that its expected value is zero (i.e., no or “null” effect of the variable on the outcome). The test statistic of the hypothesis test is thus distributed around zero, and the probability that we should observe our regression coefficient assuming the null hypothesis is true is the area underneath the curve above and below the test statistic. This probability is the p-value, and the p-value determines whether we reject or fail-to-reject the null hypothesis. So, if we have the wrong estimate of the standard error, we will make the wrong inference about our regression coefficient.

[Q18] This test statistic is the “t value”, and it is simply the estimated coefficient divided by the standard error. Verify the t value for the treatment indicator. (No functions needed. You just have to divide two numbers from the regression output.)

-0.285900/0.145628
## [1] -1.963221

[Q19] Now verify the p-value to the estimated treatment effect using pt(). (Hint: the t-distribution is symmetric around the mean! And mind the degrees-of-freedom, the df argument in pt(). The degrees of freedom can be found in the regression table from above.)

2*pt(q = -1.963, df = 177)
## [1] 0.05121313

[Q20] The authors cluster standard errors within villages to account for arbitrary, unobserved correlation between subjects in the same village. Why might there be such correlation? Recall the main decision made by villagers: to trade or not to trade.

There is unobserved correlation here caused by the culture of the village. Since the village follows egalitarianism, there could be some bias because the individuals in the experiment could be making decisions on what they think the group would make, rather than their own individual considerations. This presents biases in the experiment, so the standard errors are clustered to remove that bias.

[Q21] Use feols() from library(fixest) to re-run the regression. Assign the output to the object model. (Hint: you don’t need to change your model call from before!)

model = df %>% 
  feols(trade ~ mangola_region_cat + 
       distance_to_mangola + 
       lighter + 
       (lighter*distance_to_mangola)
     , data = .) 

[Q22] Run summary() on model to view the standard errors and p-values. They should be the same as before. (The formatting will look a bit different because feols() returns a different type of data object than lm().)

model %>% 
  summary()
## OLS estimation, Dep. Var.: trade
## Observations: 182 
## Standard-errors: IID 
##                                  Estimate Std. Error   t value Pr(>|t|)    
## (Intercept)                      0.496565   0.153222  3.240814 0.001424 ** 
## mangola_region_catHigh Exposure -0.285900   0.145628 -1.963228 0.051186 .  
## distance_to_mangola              0.001437   0.002627  0.546975 0.585085    
## lighter                          0.079017   0.098073  0.805699 0.421497    
## distance_to_mangola:lighter     -0.002969   0.002272 -1.306950 0.192925    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 0.464483   Adj. R2: 0.07273

[Q23] Now use the se and cluster arguments to summary() to cluster the standard errors at the village level (campname in the data set).

Here is a helpful resource from the fixest author: https://cran.r-project.org/web/packages/fixest/vignettes/standard_errors.html

model %>% 
  summary(se = "standard")
## OLS estimation, Dep. Var.: trade
## Observations: 182 
## Standard-errors: IID 
##                                  Estimate Std. Error   t value Pr(>|t|)    
## (Intercept)                      0.496565   0.153222  3.240814 0.001424 ** 
## mangola_region_catHigh Exposure -0.285900   0.145628 -1.963228 0.051186 .  
## distance_to_mangola              0.001437   0.002627  0.546975 0.585085    
## lighter                          0.079017   0.098073  0.805699 0.421497    
## distance_to_mangola:lighter     -0.002969   0.002272 -1.306950 0.192925    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 0.464483   Adj. R2: 0.07273
model %>% 
  summary(cluster = "campname")
## OLS estimation, Dep. Var.: trade
## Observations: 182 
## Standard-errors: Clustered (campname) 
##                                  Estimate Std. Error   t value Pr(>|t|)    
## (Intercept)                      0.496565   0.095157  5.218352 0.001228 ** 
## mangola_region_catHigh Exposure -0.285900   0.082226 -3.476996 0.010308 *  
## distance_to_mangola              0.001437   0.001552  0.925606 0.385450    
## lighter                          0.079017   0.085915  0.919710 0.388318    
## distance_to_mangola:lighter     -0.002969   0.001622 -1.830471 0.109869    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 0.464483   Adj. R2: 0.07273

[Q24] What changed? The estimated coefficients? The standard errors? The p-values? Do your numbers (the coefficients and the standard errors) match the numbers in Table 1 Specification?

The estimated coefficients went unchanged while the standard errors changed. In doing so, the t-values changed as well, since it is equal to the coefficients/std. error. Yes, both my coefficients and standard errors are the same as Table 1 in the paper.