Why do we overestimate others’ willingness to pay?
In this WPA, we will analyze data from Matthews et al. (2016): Why do we overestimate others’ willingness to pay? The purpose of this research was to test if our beliefs about other people’s affluence (i.e.; wealth) affect how much we think they will be willing to pay for items. You can find the full paper at http://journal.sjdm.org/15/15909/jdm15909.pdf.
Study 1
In this WPA, we will analyze data from their first study. In study 1, participants indicated the proportion of other people taking part in the survey who have more than themselves, and then whether other people would be willing to pay more than them for each of 10 items.
The following table shows a table of the 10 projects and proportion of participants who indicated that others would be more willing to pay for the product than themselves (Table 1 in Matthews et al., 2016).
Product Number | Product | Reported p(other > self) |
---|---|---|
1 | A freshly-squeezed glass of apple juice | .695 |
2 | A Parker ballpoint pen | .863 |
3 | A pair of Bose noise-cancelling headphones | .705 |
4 | A voucher giving dinner for two at Applebee’s | .853 |
5 | A 16 oz jar of Planters dry-roasted peanuts | .774 |
6 | A one-month movie pass | .800 |
7 | An Ikea desk lamp | .863 |
8 | A Casio digital watch | .900 |
9 | A large, ripe pineapple | .674 |
10 | A handmade wooden chess set | .732 |
Table 1: Proportion of participants who indicated that the “typical participant” would pay more than they would for each product in Study 1.
Study 1 variable description
Here are descriptions of the data variables (taken from the author’s dataset notes available at http://journal.sjdm.org/15/15909/Notes.txt)
id
: participant id codegender
: participant gender. 1 = male, 2 = femaleage
: participant ageincome
: participant annual household income on categorical scale with 8 categorical options: Less than $15,000; $15,001–$25,000; $25,001–$35,000; $35,001–$50,000; $50,001–$75,000; $75,001–$100,000; $100,001–$150,000; greater than $150,000.p1-p10
: whether the “typical” survey respondent would pay more (coded 1) or less (coded 0) than oneself, for each of the 10 productstask
: whether the participant had to judge the proportion of other people who “have more money than you do” (coded 1) or the proportion who “have less money than you do” (coded 0)havemore
: participant’s response when task = 1haveless
: participant’s response when task = 0pcmore
: participant’s estimate of the proportion of people who have more than they do (calculated as 100-haveless when task=0)
Open your R project from last week (I recommended calling it
RCourse
or something similar). There should be at least two folders in this working directory:data
andR
.Open a new R script and save it as wpa3.R in the R folder in your project directory
The data are stored at http://journal.sjdm.org/15/15909/data1.csv. Load the data into R by using
read.table()
into a new object calledmatthews.df
.
matthews.df <- read.table(file = "http://journal.sjdm.org/15/15909/data1.csv",
sep = ",",
header = TRUE)
- Using
write.table()
, save the data as a tab–delimited text file calledmatthews.txt
in the data folder of your working directory.
write.table(x = matthews.df,
file = "data/study1.txt",
sep = "\t")
- Look at the first few rows of
matthews.df
usinghead()
,View()
, andstr()
head(matthews.df)
## id gender age income p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 task
## 1 R_3PtNn51LmSFdLNM 2 26 7 1 1 1 1 1 1 1 1 1 1 0
## 2 R_2AXrrg62pgFgtMV 2 32 4 1 1 1 1 1 1 1 1 1 1 0
## 3 R_cwEOX3HgnMeVQHL 1 25 2 0 1 1 1 1 1 1 1 0 0 0
## 4 R_d59iPwL4W6BH8qx 1 33 5 1 1 1 1 1 1 1 1 1 1 0
## 5 R_1f3K2HrGzFGNelZ 1 24 1 1 1 0 1 1 1 1 1 1 1 1
## 6 R_3oN5ijzTfoMy4ca 1 22 2 1 1 0 0 1 1 1 1 0 1 0
## havemore haveless pcmore
## 1 NA 50 50
## 2 NA 25 75
## 3 NA 10 90
## 4 NA 50 50
## 5 99 NA 99
## 6 NA 20 80
# View(matthews.df)
str(matthews.df)
## 'data.frame': 190 obs. of 18 variables:
## $ id : Factor w/ 190 levels "R_0JLtfRpOyh8pOCN",..: 126 52 156 158 17 125 173 87 44 60 ...
## $ gender : int 2 2 1 1 1 1 1 1 1 1 ...
## $ age : int 26 32 25 33 24 22 47 26 29 32 ...
## $ income : int 7 4 2 5 1 2 3 4 1 7 ...
## $ p1 : int 1 1 0 1 1 1 1 1 1 1 ...
## $ p2 : int 1 1 1 1 1 1 1 1 1 1 ...
## $ p3 : int 1 1 1 1 0 0 0 0 0 1 ...
## $ p4 : int 1 1 1 1 1 0 0 0 1 1 ...
## $ p5 : int 1 1 1 1 1 1 1 0 0 1 ...
## $ p6 : int 1 1 1 1 1 1 1 1 1 1 ...
## $ p7 : int 1 1 1 1 1 1 1 1 1 1 ...
## $ p8 : int 1 1 1 1 1 1 1 1 1 0 ...
## $ p9 : int 1 1 0 1 1 0 1 0 1 1 ...
## $ p10 : int 1 1 0 1 1 1 0 0 1 1 ...
## $ task : int 0 0 0 0 1 0 1 0 1 1 ...
## $ havemore: int NA NA NA NA 99 NA 95 NA 70 25 ...
## $ haveless: int 50 25 10 50 NA 20 NA 30 NA NA ...
## $ pcmore : int 50 75 90 50 99 80 95 70 70 25 ...
- What are the names of the data columns?
names(matthews.df)
## [1] "id" "gender" "age" "income" "p1" "p2"
## [7] "p3" "p4" "p5" "p6" "p7" "p8"
## [13] "p9" "p10" "task" "havemore" "haveless" "pcmore"
- Currently gender is coded as 1 and 2. Let’s create a new character column called
gender.a
that codes the data asmale
andfemale
. Do this by running the following code:
# Create a new column called gender.a that codes gender as a string
matthews.df$gender.a <- NA
matthews.df$gender.a[matthews.df$gender == 1] <- "male"
matthews.df$gender.a[matthews.df$gender == 2] <- "female"
- What percent of participants were male?
mean(matthews.df$gender.a == "male")
## [1] 0.6263158
- What was the mean age?
mean(matthews.df$age)
## [1] 31.71579
- Create a new dataframe called
product.df
that only contain columns p1, p2, … p10 frommatthews.df
by running the following code.
# Create product.df, a dataframe containing only columns p1, p2, ... p10
product.df <- matthews.df[,paste("p", 1:10, sep = "")]
- The
colMeans()
function takes a dataframe as an argument, and returns a vector showing means across rows for each column of data. UsingcolMeans()
, calculate the percentage of participants who indicated that the ‘typical’ participant would be willing to pay more than them for each item. Do your values match what the authors reported in Table 1?
colMeans(product.df)
## p1 p2 p3 p4 p5 p6 p7
## 0.6947368 0.8631579 0.7052632 0.8526316 0.7736842 0.8000000 0.8631579
## p8 p9 p10
## 0.9000000 0.6736842 0.7315789
# Yes the numbers match!!!
- The
rowMeans()
function is likecolMeans()
, but for calculating means across columns for every row of data. UsingrowMeans()
calculate for each participant, the percentage of the 10 items that the participant believed other people would spend more on. Save this data as a vector calledpall
.
pall <- rowMeans(product.df)
- Add the
pall
vector as a new column calledpall
to thematthews.df
dataframe
matthews.df$pall <- pall
- What was the mean value of
pall
across participants? This value is the answer to the question: “How often does the average participant think that someone else would pay more for an item than themselves?”
mean(matthews.df$pall)
## [1] 0.7857895
- I created a new table containing fictional demographic information about each participant. The data are stored in a tab–delimited text file (with a header row) at http://nathanieldphillips.com/wp-content/uploads/2016/10/matthews_demographics.txt. Load the data into an object called
demo.df
into R.
demo.df <- read.table("http://nathanieldphillips.com/wp-content/uploads/2016/10/matthews_demographics.txt",
sep = "\t",
header = TRUE)
- Using
merge
add the demographic data tomatthews.df
matthews.df <- merge(x = matthews.df,
y = demo.df,
by = "id")
- Using either basic indexing or
subset()
, calculate the mean age for males only.
mean(matthews.df$age[matthews.df$gender.a == "male"])
## [1] 29.76471
# OR
male.df <- subset(matthews.df, subset = gender.a == "male")
mean(male.df$age)
## [1] 29.76471
- Using either basic indexing or
subset()
, calculate the mean age for females only.
mean(matthews.df$age[matthews.df$gender.a == "female"])
## [1] 34.98592
# OR
female.df <- subset(matthews.df, subset = gender.a == "female")
mean(female.df$age)
## [1] 34.98592
- Using
aggregate()
calculate the mean age of male and female participants separately. Do you get the same answers as before?
aggregate(formula = age ~ gender.a,
FUN = mean,
data = matthews.df)
## gender.a age
## 1 female 34.98592
## 2 male 29.76471
# Yes the answers are the same!
- Using
aggregate()
calculate the meanpall
value for male and female participants separately. Which gender tends to think that others would pay more for products than them?
aggregate(formula = pall ~ gender.a,
FUN = mean,
data = matthews.df)
## gender.a pall
## 1 female 0.8014085
## 2 male 0.7764706
# Males tend to think that others will pay more for items than them relative to females.
- Using
aggregate()
calculate the meanpall
value of participants for each level of income. Do you find a consistent relationship betweenpall
and income?
aggregate(formula = pall ~ income,
FUN = mean,
data = matthews.df)
## income pall
## 1 1 0.9037037
## 2 2 0.8044444
## 3 3 0.7370370
## 4 4 0.7862069
## 5 5 0.7500000
## 6 6 0.6958333
## 7 7 0.8142857
## 8 8 0.8666667
# The values decrease from income = 1 to income = 6, then they go up again!
- Now repeat the previous analysis, but only for females (Hint: use the
subset
argument within theaggregate
function)
aggregate(formula = age ~ income,
FUN = mean,
data = matthews.df,
subset = gender.a == "female")
## income age
## 1 1 31.12500
## 2 2 36.35294
## 3 3 33.14286
## 4 4 36.75000
## 5 5 35.00000
## 6 6 34.00000
## 7 7 37.60000
## 8 8 38.50000
- What was the mean age for participants for each combination of gender and income?
aggregate(formula = age ~ income + gender.a,
FUN = mean,
data = matthews.df)
## income gender.a age
## 1 1 female 31.12500
## 2 2 female 36.35294
## 3 3 female 33.14286
## 4 4 female 36.75000
## 5 5 female 35.00000
## 6 6 female 34.00000
## 7 7 female 37.60000
## 8 8 female 38.50000
## 9 1 male 28.73684
## 10 2 male 30.14286
## 11 3 male 29.45000
## 12 4 male 28.52381
## 13 5 male 31.00000
## 14 6 male 29.60000
## 15 7 male 43.50000
## 16 8 male 23.00000
- The variable
pcmore
reflects the question: “What percent of people taking part in this survey do you think earn more than you do?”. Usingaggregate()
, calculate the median value of this variable separately for each level of income. What does the result tell you?
aggregate(formula = pcmore ~ income,
FUN = median,
data = matthews.df)
## income pcmore
## 1 1 80
## 2 2 75
## 3 3 50
## 4 4 60
## 5 5 50
## 6 6 45
## 7 7 50
## 8 8 50
# The higher one's income, the less people think that other people make more than them.
- For the remaining problems, we’ll be using
dplyr
. Load thedplyr
library:
library(dplyr)
- Using
dplyr
, for each level of gender, calculate the summary statistics in the following table. Save the summary statistics to an object calledgender.df
variable | description |
---|---|
n | Number of participants |
age.mean | Mean age |
age.sd | Standard deviation of age |
income.mean | Mean income |
pcmore.mean | Mean value of pcmore |
pall.mean | Mean value of pall |
gender.df <- matthews.df %>%
group_by(gender) %>%
summarise(
n = n(),
age.mean = mean(age),
age.sd = sd(age),
income.mean = mean(income),
pcmore.mean = mean(pcmore),
pall.mean = mean(pall)
)
gender.df
## # A tibble: 2 × 7
## gender n age.mean age.sd income.mean pcmore.mean pall.mean
## <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 119 29.76471 7.648757 3.285714 62.25210 0.7764706
## 2 2 71 34.98592 10.430029 3.943662 58.80282 0.8014085
- Using
dplyr
, for each level ofincome
, calculate the summary statistics in the following table – only for participants older than 21 – and save them to a new object calledincome.df
.
variable | description |
---|---|
n | Number of participants |
age.min | Minimum age |
age.mean | Mean age |
male.p | Percent of males |
female.p | Percent of females |
pcmore.mean | Mean value of pcmore |
pall.mean | Mean value of pall |
income.df <- matthews.df %>%
filter(age > 21) %>%
group_by(income) %>%
summarise(
n = n(),
age.mean = mean(age),
male.p = mean(gender == 1),
female.p = mean(gender == 2),
pcmore.mean = mean(pcmore),
pall.mean = mean(pall)
)
income.df
## # A tibble: 8 × 7
## income n age.mean male.p female.p pcmore.mean pall.mean
## <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 26 29.76923 0.6923077 0.3076923 74.88462 0.9038462
## 2 2 43 33.06977 0.6279070 0.3720930 70.09302 0.8069767
## 3 3 25 31.20000 0.7600000 0.2400000 54.60000 0.7400000
## 4 4 27 31.62963 0.7037037 0.2962963 61.25926 0.7814815
## 5 5 26 33.30769 0.6153846 0.3846154 53.65385 0.7500000
## 6 6 23 32.78261 0.3913043 0.6086957 46.00000 0.6826087
## 7 7 7 39.28571 0.2857143 0.7142857 41.42857 0.8142857
## 8 8 3 33.33333 0.3333333 0.6666667 33.33333 0.8666667
- Using
dplyr
, calculate several summary statistics (you choose which ones!) aggregated at each level of race and gender. Save the results to an object calledracegender.df
racegender.df <- matthews.df %>%
group_by(race, gender.a) %>%
summarise(
n = n(), # N
age.max = max(age), # Oldest person
income.mean = mean(income) # Mean income
)
racegender.df
## Source: local data frame [8 x 5]
## Groups: race [?]
##
## race gender.a n age.max income.mean
## <fctr> <chr> <int> <int> <dbl>
## 1 asian female 14 60 3.714286
## 2 asian male 27 58 3.444444
## 3 black female 13 49 3.615385
## 4 black male 24 44 3.291667
## 5 hispanic female 7 67 4.571429
## 6 hispanic male 16 52 2.625000
## 7 white female 37 59 4.027027
## 8 white male 52 57 3.403846
- Using
dplyr
, calculate several summary statistics (you choose which ones!) aggregated at each level of independent variables of your choice. Save the results to an object calledXXX.df
, where XXX are the names of the variables you aggregated.
taskgender.df <- matthews.df %>%
group_by(task, gender.a) %>%
summarise(
n = n(), # N
pcmore.mean = mean(pcmore), # mean pcmore
p.black = mean(race == "black") # Percent black
)
taskgender.df
## Source: local data frame [4 x 5]
## Groups: task [?]
##
## task gender.a n pcmore.mean p.black
## <int> <chr> <int> <dbl> <dbl>
## 1 0 female 38 61.71053 0.1578947
## 2 0 male 55 63.69091 0.2181818
## 3 1 female 33 55.45455 0.2121212
## 4 1 male 64 61.01562 0.1875000
- Using
save()
, savematthews.df
,gender.df
,income.df
,racegender.df
, andXXX.df
objects to a file calledmatthews.RData
in thedata
folder in your working directory.
save(matthews.df, gender.df, income.df, racegender.df, taskgender.df, file = "data/matthews.RData")
Submit!
Save and email your wpa_X_LastFirst.R
file to me at nathaniel.phillips@unibas.ch. Then, go to https://goo.gl/forms/UblvQ6dvA76veEWu1 to complete the WPA submission form.