Set-up and background

The assignment is worth 100 points. There are 27 questions. You should have the following packages installed:

library(tidyverse)
library(patchwork)
library(fixest)

In this problem set you will summarize the paper “Evolutionary Origins of the Endowment Effect: Evidence from Hunter-Gatherers” (Apicella et al., AER 2011) and recreate some of its findings.

1 Big picture

[Q1] What is the main question asked in this paper?

The main question is whether the tendency to value possessions more than equivalent non-possessions (the endowment effect) is a learned behavior or an innate trait with evolutionary origins.

[Q2] Describe the Hadza. How do they differ from market-based societies?

The Hadza are a nomadic foraging people whose way of life resembles how most of humanity lived, physically and socially, before the widespread adoption of agriculture. It is this way of life, which is strongly egalitarian, that differentiates them from market-based societies.

The possessions owned by the Hadza are limited to what they can carry, and the resources (such as food) acquired by the camp are shared widely among its members.

[Q3] Summarize the experiment design. Pay attention to the source of randomization.

The experiment compared two groups based on their exposure to tourists. The Low Exposure (LE) group resides in a location too remote and treacherous for trips to be offered to tourists, while the High Exposure (HE) group receives almost daily visits from tourists during parts of the year.

For both groups, two experimental conditions were used to address the concern that the endowment effect might be an artifact of experimental procedures. In the first, the endowed item is randomly assigned: the subject is handed one of two items and asked whether they would like to exchange it for the other. In the second, the subject never holds the item, which minimizes any transaction cost of keeping the endowed item: the two items are simply placed on the ground in front of the subject. Note that exposure itself is not randomized by the experimenters; it comes from geography, i.e., how close a camp sits to the tourist hub of Mangola.

[Q4] Why did the authors use biscuits and lighters in their design?

To avoid the possibility that the endowment effect operates differently depending on the item, two separate trials were conducted with each subject: one between two food items (biscuits) and one between two lighters.

No difference was found between the two item types.

[Q5] Summarize the main results of the experiment.

The results show that living in the high-exposure region reduces the probability of trading the endowed item by 28.3 percentage points.

The authors conclude that the isolated population displays no endowment effect, while the group with greater contact with modern society and markets does display the bias.

[Q6] How do the results of this study compare to the sportscard market study by List (2003)?

They are not directly comparable. The subjects in List (2003) are experienced sportscard traders who, through repeated market participation, have learned to overcome the endowment effect, whereas this study compares populations with different degrees of market exposure.

[Q7] What do these results tell us about preferences? Are they endogenous or exogenous?

These results suggest that preferences are endogenous: rather than being fixed and given, they are learned and shaped by societal and environmental influences such as market exposure.

[Q8] Why are these results valuable? What have we learned? Motivate your discussion with a real-world example.

These results provide valuable insight into how behaviors and preferences are formed. For a marketer, this means that environmental factors can alter an individual's preferences and economic behavior. In real-world terms, social fads, if they can be seeded, could spread organically through a population. For example, if the HE group of Hadza were repeatedly met by tourists exhibiting a specific, controlled behavior, the preferences the Hadza adopt would likely track what they observed from the tourists. More practically, simple awareness of this phenomenon improves the odds of reaching a desired outcome: knowing what we know about the HE and LE groups, someone hoping to trade with the Hadza would be more likely to find success with the LE group.

With this knowledge in mind, marketers can spend more efficiently by identifying and targeting the groups in a population most likely to buy certain products. For example, Black Rifle Coffee Company likely would not find the same ROI spending most of its marketing budget in Boston or NYC as it would in Texas or Florida.

2 Replication

Use theme_classic() for all plots.

Load the data. You may need to update your path depending on where you stored it.

df = read_csv("~/Desktop/NY_summer/ADEC7810.01/data/apicella_al_2011.csv")
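
Before transforming anything, it can help to confirm that the columns referenced later (trade, magnola_region, campname, distance_to_mangola, lighter) loaded as expected. A minimal check:

# quick structural check: column names and types
glimpse(df)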

2.1 Figure 2

[Q9] The column magnola_region is the treatment condition. Use mutate() to create a new column called magnola_region_cat, a categorical variable, that takes the value High Exposure if magnola_region == 1, otherwise Low Exposure. Then use mutate() again and factor() to force the new column magnola_region_cat into a factor variable. Factors are how categorical variables are represented in R. Do both mutations in one pipe chain.

df = df %>%
  mutate(magnola_region_cat = ifelse(
    magnola_region == 1,
    'High Exposure',
    'Low Exposure'
  )) %>%
  mutate(magnola_region_cat = factor(magnola_region_cat))

# can also wrap ifelse in factor()

[Q10] Factor variables in R have “levels” or categories. R chooses a default order for these levels. Check the order of the levels in magnola_region_cat with levels():

levels(df$magnola_region_cat)
## [1] "High Exposure" "Low Exposure"

[Q11] Notice how High Exposure is the first level. That means it will be drawn first when we re-create Figure 2. If we want to perfectly re-create Figure 2, we need High Exposure to be drawn second. So, we have to re-order the levels in the column. Do so with fct_relevel():

df = df %>%
  mutate(magnola_region_cat = fct_relevel(magnola_region_cat, "Low Exposure"))

[Q12] Re-run levels() to check the new ordering of levels in magnola_region_cat:

levels(df$magnola_region_cat)
## [1] "Low Exposure"  "High Exposure"

[Q13] OK, let’s make figure 2A. Use stat_summary(fun = mean) to plot the averages and stat_summary(fun.data = mean_se) to plot the error bars (hint: set the width of the error bars to something like 0.1). Assign the output to the object fig2a. Use ylim() to set the limits of the axis to \([0,1]\), and make sure to label both axes.

fig2a = df %>%
  ggplot(aes(x = magnola_region_cat, y = trade)) +
  geom_hline(yintercept = 0.5) +
  stat_summary(fun = mean, geom = "bar",
               fill = "red", col = "black",
               alpha = 0.9) +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.1) +
  ggtitle("Panel A") +
  xlab("") + ylab("") +
  ylim(0, 1) +
  theme_classic() +
  # apply theme_classic() first: complete themes reset earlier theme() calls,
  # so the title adjustment must come after it
  theme(plot.title = element_text(hjust = 0))

[Q14] Figure 2b shows the fraction of subjects that traded by camp and distance to the village Mangola. This one is a bit more challenging. We have to scatter plot distance on the x-axis and mean trade on the y-axis – and then size each point by total trade. Let’s start by making these summaries. Use summarise() to create three columns by campname: mean_trade (the average trade), sum_trade (the total trade), and distance (hint: use unique(distance_to_mangola)):

d2b = df %>% 
  group_by(campname) %>%
  summarize(mean_trade = mean(trade),
            sum_trade = sum(trade),
            distance = unique(distance_to_mangola))

[Q15] OK, now pipe the output of what you just did to ggplot to plot mean_trade as a function of distance and size each point by sum_trade. Assign the plot to fig2b.

fig2b = d2b %>%
  ggplot(aes(x = distance, y = mean_trade, label = campname)) +
  geom_hline(yintercept = 0.5) +
  geom_point(aes(size = sum_trade),
             color = "red", alpha = 0.7,
             show.legend = FALSE) +
  geom_text(hjust = 0.2, vjust = 1.6) +
  ylim(0, 1) + xlim(0, 100) +
  ggtitle("Panel B") +
  xlab("Distance from Mangola Village (km)") + ylab("") +
  theme_classic() +
  # again, theme_classic() goes before the title adjustment so it
  # does not override it
  theme(plot.title = element_text(hjust = 0))

[Q16] Use library(patchwork) to combine the two plots and complete the replication.

fig2a + fig2b
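
Optionally, patchwork's plot_annotation() can add a shared title across the combined panels. A minimal sketch (the title text is a placeholder, not taken from the paper):

# add a shared title to the combined figure (placeholder text)
(fig2a + fig2b) +
  plot_annotation(title = "Figure 2: Fraction of subjects trading the endowed item")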

2.2 Table 1

The main finding is that the High Exposure subjects are less likely to trade and thus exhibit endowment effects. This finding is seen in Table 1.

[Q17] Pipe the data to lm() and then to summary() to replicate the coefficients in the fifth specification (the fifth column in Table 1).

df %>% 
  lm(trade~ magnola_region + distance_to_mangola + lighter +
       lighter*distance_to_mangola, data=.) %>%
  summary()
## 
## Call:
## lm(formula = trade ~ magnola_region + distance_to_mangola + lighter + 
##     lighter * distance_to_mangola, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.6140 -0.2847 -0.2157  0.4732  0.7847 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)   
## (Intercept)                  0.496565   0.153222   3.241  0.00142 **
## magnola_region              -0.285900   0.145628  -1.963  0.05119 . 
## distance_to_mangola          0.001437   0.002627   0.547  0.58508   
## lighter                      0.079017   0.098073   0.806  0.42150   
## distance_to_mangola:lighter -0.002969   0.002272  -1.307  0.19293   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.471 on 177 degrees of freedom
## Multiple R-squared:  0.09322,    Adjusted R-squared:  0.07273 
## F-statistic: 4.549 on 4 and 177 DF,  p-value: 0.001603

Notice how the coefficients above are the same as Table 1 Specification 5 but the standard errors are different. This is because the authors cluster the standard errors at the village level. Before we dive into clustering, we need to appreciate why we care about the standard errors.

The standard error is the estimate of the standard deviation of a regression coefficient's sampling distribution, and it plays a central role in hypothesis testing. Recall that the null hypothesis for any coefficient is that its expected value is zero (i.e., no or "null" effect of the variable on the outcome). The test statistic is therefore distributed around zero, and the probability of observing a coefficient at least as extreme as ours, assuming the null hypothesis is true, is the area under the curve beyond the test statistic in both tails. This probability is the p-value, and the p-value determines whether we reject or fail to reject the null hypothesis. So, if we have the wrong estimate of the standard error, we will make the wrong inference about our regression coefficient.
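
To make this chain concrete, here is a small sketch going from the magnola_region estimate and its standard error (taken from the regression output above) to the t value and two-sided p-value; the next two questions walk through the same arithmetic:

# from coefficient and SE to t value and two-sided p-value
beta_hat <- -0.285900                       # magnola_region estimate
se_hat   <-  0.145628                       # its (unclustered) standard error
t_stat   <- beta_hat / se_hat               # the "t value" column
p_val    <- 2 * pt(-abs(t_stat), df = 177)  # two-tail area, df = 182 - 5
c(t = t_stat, p = p_val)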

[Q18] This test statistic is the “t value”, and it is simply the estimated coefficient divided by the standard error. Verify the t value for the treatment indicator. (No functions needed. You just have to divide two numbers from the regression output.)

-0.2859 / 0.145628
## [1] -1.963221

[Q19] Now verify the p-value to the estimated treatment effect using pt(). (Hint: the t-distribution is symmetric around the mean! And mind the degrees-of-freedom, the df argument in pt(). The degrees of freedom can be found in the regression table from above.)

# 1 minus the probability mass between -|t| and |t| gives the two-tail area
round(
  1 - abs(
    pt(-1.963221, df = 177, lower.tail = TRUE) -
      pt(-1.963221, df = 177, lower.tail = FALSE)
  ),
  digits = 5)
## [1] 0.05119

[Q20] The authors cluster standard errors within villages to account for arbitrary, unobserved correlation between subjects in the same village. Why might there be such correlation? Recall the main decision made by villagers: to trade or not to trade.

The standard errors are clustered within villages because subjects in the same village are unlikely to decide independently: communication among camp members plausibly produces shared attitudes about whether to trade or not to trade.
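
As a rough check rather than a formal test: if trading decisions were independent of camp, camp-level trade rates should vary only by sampling noise. A quick summary using columns already in the data:

# trade rates and sample sizes by camp; a large spread suggests
# within-camp correlation that IID standard errors ignore
df %>%
  group_by(campname) %>%
  summarize(trade_rate = mean(trade), n = n()) %>%
  arrange(trade_rate)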

[Q21] Use feols() from library(fixest) to re-run the regression. Assign the output to the object model. (Hint: you don’t need to change your model call from before!)

model = df %>%
  feols(trade ~ magnola_region + distance_to_mangola + lighter +
          lighter * distance_to_mangola, data = .)

[Q22] Run summary() on model to view the standard errors and p-values. They should be the same as before. (The formatting will look a bit different because feols() returns a different type of data object than lm().)

summary(model) # I like this!
## OLS estimation, Dep. Var.: trade
## Observations: 182 
## Standard-errors: IID 
##                              Estimate Std. Error   t value Pr(>|t|)    
## (Intercept)                  0.496565   0.153222  3.240814 0.001424 ** 
## magnola_region              -0.285900   0.145628 -1.963228 0.051186 .  
## distance_to_mangola          0.001437   0.002627  0.546975 0.585085    
## lighter                      0.079017   0.098073  0.805699 0.421497    
## distance_to_mangola:lighter -0.002969   0.002272 -1.306950 0.192925    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 0.464483   Adj. R2: 0.07273

[Q23] Now use the se and cluster arguments to summary() to cluster the standard errors at the village level (campname in the data set).

Here is a helpful resource from the fixest author: https://cran.r-project.org/web/packages/fixest/vignettes/standard_errors.html

# https://rdrr.io/cran/fixest/man/feols.html

se(model)  # unclustered (IID) standard errors, for comparison
##                 (Intercept)              magnola_region 
##                 0.153222199                 0.145627602 
##         distance_to_mangola                     lighter 
##                 0.002626839                 0.098073004 
## distance_to_mangola:lighter 
##                 0.002271535
summary(model, cluster = ~campname)
## OLS estimation, Dep. Var.: trade
## Observations: 182 
## Standard-errors: Clustered (campname) 
##                              Estimate Std. Error   t value Pr(>|t|)    
## (Intercept)                  0.496565   0.095157  5.218352 0.001228 ** 
## magnola_region              -0.285900   0.082226 -3.476996 0.010308 *  
## distance_to_mangola          0.001437   0.001552  0.925606 0.385450    
## lighter                      0.079017   0.085915  0.919710 0.388318    
## distance_to_mangola:lighter -0.002969   0.001622 -1.830471 0.109869    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 0.464483   Adj. R2: 0.07273

[Q24] What changed? The estimated coefficients? The standard errors? The p-values? Do your numbers (the coefficients and the standard errors) match the numbers in Table 1 Specification 5?

The numbers match Table 1 Specification 5. The estimated coefficients did not change, but the standard errors, and therefore the t values and p-values, did. The clustered standard errors are smaller here, and the treatment coefficient on magnola_region is now significant at the 5 percent level (p ≈ 0.010).
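
For a side-by-side view, fixest's etable() can print the same model under both variance estimates; a minimal sketch (exact formatting varies by fixest version):

# compare IID and village-clustered standard errors for the same model
etable(model, summary(model, cluster = ~campname),
       headers = c("IID", "Clustered"))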