Set-up and background

The assignment is worth 100 points. There are 27 questions. You should have the following packages installed:

library(tidyverse)
library(patchwork)
library(fixest)

In this problem set you will summarize the paper “Evolutionary Origins of the Endowment Effect: Evidence from Hunter-Gatherers” (Apicella et al., AER 2011) and recreate some of its findings.

1 Big picture

[Q1] What is the main question asked in this paper?

This paper asks whether the endowment effect is a universal feature of human behavior or a learned one. The question breaks down into three components: the universality of the endowment effect, its dependence on cultural factors, and the role of evolution in creating it.

[Q2] Describe the Hadza. How do they differ from market-based societies?

The Hadza are almost entirely removed from modern culture. They live in an isolated region of Tanzania and follow a lifestyle that resembles pre-agricultural human society: they are nomadic, sleep outside, and do not practice herding or agriculture. Their way of life is very close to that of early humans.

[Q3] Summarize the experiment design. Pay attention to the source of randomization.

The endowment effect was tested on two groups of Hadza. The high-exposure (HE) group interacted with tourists frequently. The low-exposure (LE) group resides to the west and southeast of the lake, in areas that are very difficult to reach, and therefore rarely sees tourists.

Both the HE and LE groups were tested. The experimenters designed the study to address two concerns from the academic literature: that the endowment effect is driven by experimental procedure, and that it may only be relevant for items of evolutionary importance (i.e., food).

To address the second concern, subjects took part in two different trials: one in which they were endowed with food items (two packages of biscuits) and one in which they were endowed with non-food items (two packages of lighters).

To keep experimental procedure from interfering with the results, the experimenters randomized the distribution of endowed items and varied whether subjects physically handled the endowed item. In one condition, the experimenter physically handed an item to the subject and then asked whether he or she would like to trade it. In the other condition, the experimenter placed the items on the ground and flipped a coin to determine which subject received which item. This randomization helps ensure that the results reflect the endowment effect itself rather than the experimental design or the type of good being endowed.

[Q4] Why did the authors use biscuits and lighters in their design?

Biscuits and lighters were used to dispel concerns in the literature that the endowment effect is only relevant for items of evolutionary importance (Brosnan et al. 2007). The biscuits served as a food item of evolutionary importance, while the lighters served as an item without it. The results suggest that whether the item was food or non-food did not matter for the endowment effect.

[Q5] Summarize the main results of the experiment.

The results of this experiment show that the high exposure group exhibited the endowment effect while the low exposure group did not exhibit the endowment effect. This is likely because the high exposure group has been exposed to markets and the concepts of value and possession that arise as a result of market exposure. Therefore, the key takeaway from this study is that the endowment effect may be caused by market exposure.

[Q6] How do the results of this study compare to the sportscard market study by List (2003)?

The findings of List (2003) suggest that dealers, who are experts in trading and markets, exhibit less of an endowment effect when trading than non-dealers do. Dealers' trading experience and previous market exposure allow them to operate more rationally than ordinary consumers. In contrast, Apicella et al. find that market exposure does not decrease the endowment effect but increases it: the high-exposure Hadza group shows more of an endowment effect than the low-exposure group. In this sense, List and Apicella et al. arrive at opposite conclusions about the effect of market exposure on the endowment effect.

[Q7] What do these results tell us about preferences? Are they endogenous or exogenous?

These results suggest that preferences are endogenous: they are shaped by forces in a person's environment, such as exposure to markets, rather than being fixed in advance. If increased market exposure causes someone to behave less rationally, that preference has been shaped by the economic environment the person lives in.

[Q8] Why are these results valuable? What have we learned? Motivate your discussion with a real-world example.

These results help us understand a root cause of seemingly irrational behavior. If market exposure shapes our preferences and our unwillingness to trade, even when a trade is in our best interest, then knowing this lets us recognize the bias and respond more rationally. The endowment effect shows up in daily life: people are often unwilling to make reasonable trades because of sentimentality, emotional attachment, or feelings of possession. Consider someone who drives an old car that is not particularly safe or well maintained. Under the endowment effect, that person might struggle to trade the car for a better one because of sentimentality or a sense of possession. If the person understands that this reluctance is merely a consequence of exposure to markets, he or she is more likely to make the rational choice and trade in the car for something safer, newer, and cleaner.

2 Replication

Use theme_classic() for all plots.

Load the data. You may need to update your path depending on where you stored it.

df = read_csv("apicella_al_2011.csv")

2.1 Figure 2

[Q9] The column magnola_region is the treatment condition. Use mutate() to create a new column called magnola_region_cat, a categorical variable, that takes the value High Exposure if magnola_region == 1, otherwise Low Exposure. Then use mutate() again and factor() to force the new column magnola_region_cat into a factor variable. Factors are how categorical variables are represented in R. Do both mutations in one pipe chain.

df = df %>% mutate(magnola_region_cat = ifelse(magnola_region == 1,
    'High Exposure',
    'Low Exposure')) %>%
  mutate(magnola_region_cat = factor(magnola_region_cat))

[Q10] Factor variables in R have “levels” or categories. R chooses a default order for these levels. Check the order of the levels in magnola_region_cat with levels():

levels(df$magnola_region_cat)

## [1] "High Exposure" "Low Exposure"

[Q11] Notice how High Exposure is the first level. That means it will be drawn first when we re-create Figure 2. If we want to perfectly re-create Figure 2, we need High Exposure to be drawn second. So, we have to re-order the levels in the column. Do so with fct_relevel():

df = df %>%
  mutate(magnola_region_cat = fct_relevel(magnola_region_cat, "Low Exposure"))

[Q12] Re-run levels() to check the new ordering of levels in magnola_region_cat:

levels(df$magnola_region_cat)

## [1] "Low Exposure"  "High Exposure"
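The re-ordering in Q10 to Q12 can also be illustrated in base R. The sketch below uses a small toy vector (the name `exposure` and its values are hypothetical, not the study data) to show that factor() sorts levels alphabetically by default and that relevel() moves one level to the front, much as fct_relevel() does above.

```r
# Toy vector (hypothetical values, not the study data)
exposure <- factor(c("High Exposure", "Low Exposure", "High Exposure"))

# Default ordering is alphabetical, so "High Exposure" comes first
default_levels <- levels(exposure)

# Base-R counterpart of fct_relevel(): move "Low Exposure" to the front
exposure_releveled <- relevel(exposure, ref = "Low Exposure")
releveled_levels <- levels(exposure_releveled)

print(default_levels)
print(releveled_levels)
```

The first level also acts as the reference category in a regression, which is why the treatment coefficient in the Table 1 output is labeled magnola_region_catHigh Exposure.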

[Q13] OK, let’s make figure 2A. Use stat_summary(fun = mean) to plot the averages and stat_summary(fun.data = mean_se) to plot the error bars (hint: set the width of the error bars to something like 0.1). Assign the output to the object fig2a. Use ylim() to set the limits of the axis to \([0,1]\), and make sure to label both axes.

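fig2a <- ggplot(data = df, aes(x = magnola_region_cat, y = trade)) +
  # Bars: fraction of subjects who traded in each exposure group
  stat_summary(fun = mean, geom = "bar", width = 0.3,
               fill = "red1", alpha = 0.8, color = "black") +
  # Error bars: mean plus or minus one standard error
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.1) +
  geom_hline(yintercept = 0.5) + ylim(0, 1) +
  labs(x = "Exposure group", y = "Fraction of subjects that traded", title = "Panel A") +
  theme_classic()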

fig2a

[Q14] Figure 2b shows the fraction of subjects that traded by camp and distance to the village Mangola. This one is a bit more challenging. We have to scatter plot distance on the x-axis and mean trade on the y-axis – and then size each point by total trade. Let’s start by making these summaries. Use summarise() to create three columns by campname: mean_trade (the average trade), sum_trade (the total trade), and distance (hint: use unique(distance_to_mangola)):

df2 = df %>% group_by(campname) %>% summarize(mean_trade = mean(trade),
            sum_trade = sum(trade),
            distance = unique(distance_to_mangola))
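The unique(distance_to_mangola) step only returns one row per camp if every camp has exactly one distance. A minimal sketch of how one might check that assumption first, using a hypothetical toy data set (`toy` and its values are invented for illustration; only the column names mirror the real data):

```r
library(dplyr)

# Hypothetical toy data: two camps, one distance per camp
toy <- tibble(
  campname = c("A", "A", "B", "B", "B"),
  trade = c(1, 0, 0, 0, 1),
  distance_to_mangola = c(10, 10, 45, 45, 45)
)

# Count the distinct distances recorded within each camp
distance_check <- toy %>%
  group_by(campname) %>%
  summarise(n_distances = n_distinct(distance_to_mangola))

# Same summaries as in Q14, applied to the toy data
toy_summary <- toy %>%
  group_by(campname) %>%
  summarise(
    mean_trade = mean(trade),
    sum_trade = sum(trade),
    distance = unique(distance_to_mangola)
  )

print(distance_check)
print(toy_summary)
```

If any camp showed more than one distinct distance, the summary would no longer have one row per camp, so the check is worth running on the real data before plotting.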

[Q15] OK, now pipe the output of what you just did to ggplot to plot mean_trade as a function of distance and size each point by sum_trade. Assign the plot to fig2b.

fig2b <- df2 %>%
  ggplot(aes(x = distance, y = mean_trade)) +
  geom_hline(yintercept = 0.5) +
  # Point size reflects the total number of trades in each camp
  geom_point(aes(size = sum_trade), show.legend = FALSE,
             color = "red1", alpha = 0.8) + ylim(0, 1) +
  labs(x = "Distance from Mangola Village (km)",
       y = "Fraction of subjects that traded", title = "Panel B") +
  # Apply the complete theme first so the title tweak is not overridden
  theme_classic() +
  theme(plot.title = element_text(hjust = 0))



fig2b

[Q16] Use library(patchwork) to combine the two plots and complete the replication.

fig2a + fig2b

2.2 Table 1

The main finding is that the High Exposure subjects are less likely to trade and thus exhibit endowment effects. This finding is seen in Table 1.

[Q17] Pipe the data to lm() and then to summary() to replicate the coefficients in the fifth specification (the fifth column in Table 1).

lm(trade ~ magnola_region_cat + distance_to_mangola + lighter + (lighter *distance_to_mangola) , data=df) %>% summary()

## 
## Call:
## lm(formula = trade ~ magnola_region_cat + distance_to_mangola + 
##     lighter + (lighter * distance_to_mangola), data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.6140 -0.2847 -0.2157  0.4732  0.7847 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)   
## (Intercept)                      0.496565   0.153222   3.241  0.00142 **
## magnola_region_catHigh Exposure -0.285900   0.145628  -1.963  0.05119 . 
## distance_to_mangola              0.001437   0.002627   0.547  0.58508   
## lighter                          0.079017   0.098073   0.806  0.42150   
## distance_to_mangola:lighter     -0.002969   0.002272  -1.307  0.19293   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.471 on 177 degrees of freedom
## Multiple R-squared:  0.09322,    Adjusted R-squared:  0.07273 
## F-statistic: 4.549 on 4 and 177 DF,  p-value: 0.001603

Notice how the coefficients above are the same as Table 1 Specification 5 but the standard errors are different. This is because the authors cluster the standard errors at the village level. Before we dive into clustering, we need to appreciate why we care about the standard errors.

The standard error is an estimate of the standard deviation of a regression coefficient's sampling distribution, and it plays a central role in hypothesis testing. Recall that the null hypothesis for any coefficient is that its true value is zero (i.e., no or “null” effect of the variable on the outcome). Under the null, the test statistic is distributed around zero, and the probability of observing a test statistic at least as extreme as ours is the area under the curve in the two tails beyond the test statistic and its mirror image. This probability is the p-value, and the p-value determines whether we reject or fail to reject the null hypothesis. So, if we have the wrong estimate of the standard error, we will make the wrong inference about our regression coefficient.
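As a compact restatement of the paragraph above, the test statistic and the two-sided p-value can be written as follows. The notation is introduced here for illustration only: \(\hat{\beta}\) is the estimated coefficient, \(\widehat{\mathrm{SE}}(\hat{\beta})\) is its standard error, and \(T_{\nu}\) is a t-distributed random variable with \(\nu\) degrees of freedom.

```latex
t = \frac{\hat{\beta}}{\widehat{\mathrm{SE}}(\hat{\beta})},
\qquad
p = 2 \, \Pr\!\left( T_{\nu} \le -\lvert t \rvert \right)
```

For the lm() output above, \(\nu = 177\), which is the 182 observations minus the 5 estimated coefficients.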

[Q18] This test statistic is the “t value”, and it is simply the estimated coefficient divided by the standard error. Verify the t value for the treatment indicator. (No functions needed. You just have to divide two numbers from the regression output.)

tvalue <- (-0.285900/0.145628)
tvalue

## [1] -1.963221

[Q19] Now verify the p-value to the estimated treatment effect using pt(). (Hint: the t-distribution is symmetric around the mean! And mind the degrees-of-freedom, the df argument in pt(). The degrees of freedom can be found in the regression table from above.)

# Two-sided p-value: double the lower-tail area at -|t| (valid for either sign of t)
pvalue <- 2 * pt(-abs(tvalue), df = 177)
pvalue

## [1] 0.0511872

[Q20] The authors cluster standard errors within villages to account for arbitrary, unobserved correlation between subjects in the same village. Why might there be such correlation? Recall the main decision made by villagers: to trade or not to trade.

Villagers may be influenced by similar value structures, norms, customs, or trading patterns tied to their specific location, so the trade decisions of subjects in the same village are likely to be correlated. For example, while villagers in a low-exposure village might not have markets or currency, they may share customs for bartering with one another. Ignoring this correlation would not change the estimated coefficients, but it would make the conventional standard errors unreliable.
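To make the role of within-village correlation concrete, here is a minimal base-R simulation. Everything in it is hypothetical (8 camps of 25 subjects, a continuous outcome, a shared camp-level shock); it is not the study data. The cluster-robust variance is computed by hand as a plain sandwich estimator with no small-sample correction, so it is only a sketch of the idea and will not reproduce the fixest numbers exactly.

```r
set.seed(1)

# Hypothetical setup: 8 camps with 25 subjects each (not the study data)
n_camps <- 8
n_per_camp <- 25
camp <- rep(seq_len(n_camps), each = n_per_camp)

# Treatment varies only at the camp level, like regional exposure
treated <- as.numeric(camp <= n_camps / 2)

# A shared camp-level shock makes outcomes correlated within a camp
camp_shock <- rnorm(n_camps, sd = 1)[camp]
y <- 0.5 * treated + camp_shock + rnorm(length(camp), sd = 1)

fit <- lm(y ~ treated)
X <- model.matrix(fit)
u <- residuals(fit)
bread <- solve(crossprod(X))

# Conventional (IID) standard errors, as reported by summary() on an lm fit
se_iid <- sqrt(diag(vcov(fit)))

# Cluster-robust sandwich: sum the score contributions within each camp
scores <- rowsum(X * u, group = camp)
meat <- crossprod(scores)
se_cluster <- sqrt(diag(bread %*% meat %*% bread))

print(rbind(iid = se_iid, cluster = se_cluster))
```

In this simulation the shared shock pushes the clustered standard error on `treated` above the IID one. Clustering can move standard errors in either direction, though: in the replication, the clustered standard errors come out smaller than the IID ones.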

[Q21] Use feols() from library(fixest) to re-run the regression. Assign the output to the object model. (Hint: you don’t need to change your model call from before!)

model <- feols(trade ~ magnola_region_cat + distance_to_mangola + lighter + (lighter *distance_to_mangola), data=df)

[Q22] Run summary() on model to view the standard errors and p-values. They should be the same as before. (The formatting will look a bit different because feols() returns a different type of data object than lm().)

summary(model)

## OLS estimation, Dep. Var.: trade
## Observations: 182 
## Standard-errors: IID 
##                                  Estimate Std. Error   t value Pr(>|t|)    
## (Intercept)                      0.496565   0.153222  3.240814 0.001424 ** 
## magnola_region_catHigh Exposure -0.285900   0.145628 -1.963228 0.051186 .  
## distance_to_mangola              0.001437   0.002627  0.546975 0.585085    
## lighter                          0.079017   0.098073  0.805699 0.421497    
## distance_to_mangola:lighter     -0.002969   0.002272 -1.306950 0.192925    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 0.464483   Adj. R2: 0.07273

[Q23] Now use the se and cluster arguments to summary() to cluster the standard errors at the village level (campname in the data set).

Here is a helpful resource from the fixest author: https://cran.r-project.org/web/packages/fixest/vignettes/standard_errors.html

model %>% summary(se = "cluster", cluster = "campname")

## OLS estimation, Dep. Var.: trade
## Observations: 182 
## Standard-errors: Clustered (campname) 
##                                  Estimate Std. Error   t value Pr(>|t|)    
## (Intercept)                      0.496565   0.095157  5.218352 0.001228 ** 
## magnola_region_catHigh Exposure -0.285900   0.082226 -3.476996 0.010308 *  
## distance_to_mangola              0.001437   0.001552  0.925606 0.385450    
## lighter                          0.079017   0.085915  0.919710 0.388318    
## distance_to_mangola:lighter     -0.002969   0.001622 -1.830471 0.109869    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 0.464483   Adj. R2: 0.07273

[Q24] What changed? The estimated coefficients? The standard errors? The p-values? Do your numbers (the coefficients and the standard errors) match the numbers in Table 1 Specification 5?

The coefficients are the same in every model, as is the adjusted R-squared. However, the standard errors, t-values, and p-values change once the standard errors are clustered. The model from Q23, which clusters standard errors by camp, yields standard errors identical to those in the Apicella et al. paper. The earlier models matched the paper's coefficients, but the clustered model is the first to match its standard errors as well.

Problem Set 2

Behavioral Economics, Boston College

Joey Preziosi
