WPA #4 – YaRrr! Chapter 8

Why do we overestimate others’ willingness to pay?

In this WPA, we will analyze data from Matthews et al. (2016): Why do we overestimate others’ willingness to pay? The purpose of this research was to test whether our beliefs about other people’s affluence (i.e., wealth) affect how much we think they will be willing to pay for items. You can find the full paper at http://journal.sjdm.org/15/15909/jdm15909.pdf.

Study 1

We will analyze data from their first study. In Study 1, participants indicated the proportion of other people taking part in the survey who have more money than themselves, and then whether other people would be willing to pay more than them for each of 10 items.

The following table shows the 10 products and the proportion of participants who indicated that others would be willing to pay more for the product than themselves (Table 1 in Matthews et al., 2016).

Product Number	Product	Reported p(other > self)
1	A freshly-squeezed glass of apple juice	.695
2	A Parker ballpoint pen	.863
3	A pair of Bose noise-cancelling headphones	.705
4	A voucher giving dinner for two at Applebee’s	.853
5	A 16 oz jar of Planters dry-roasted peanuts	.774
6	A one-month movie pass	.800
7	An Ikea desk lamp	.863
8	A Casio digital watch	.900
9	A large, ripe pineapple	.674
10	A handmade wooden chess set	.732

Table 1: Proportion of participants who indicated that the “typical participant” would pay more than they would for each product in Study 1.

Study 1 variable description

Here are descriptions of the data variables (taken from the authors’ dataset notes, available at http://journal.sjdm.org/15/15909/Notes.txt).

id: participant id code
gender: participant gender. 1 = male, 2 = female
age: participant age
income: participant annual household income on categorical scale with 8 categorical options: Less than $15,000; $15,001–$25,000; $25,001–$35,000; $35,001–$50,000; $50,001–$75,000; $75,001–$100,000; $100,001–$150,000; greater than $150,000.
p1-p10: whether the “typical” survey respondent would pay more (coded 1) or less (coded 0) than oneself, for each of the 10 products
task: whether the participant had to judge the proportion of other people who “have more money than you do” (coded 1) or the proportion who “have less money than you do” (coded 0)
havemore: participant’s response when task = 1
haveless: participant’s response when task = 0
pcmore: participant’s estimate of the proportion of people who have more than they do (calculated as 100-haveless when task=0)

Open your R project from last week (I recommended calling it RCourse or something similar). There should be at least two folders in this working directory: data and R.
Open a new R script and save it as wpa_4_LastFirst.R (using your own last and first names) in the R folder in your project directory.
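The data are stored at http://journal.sjdm.org/15/15909/data1.csv. Load the data into R as a new object called matthews.df by using read.table(). Because the file is comma-separated, you will need to set the sep argument to "," (and header = TRUE if the first row contains the column names).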
Using write.table(), save the data as a tab-delimited text file called matthews.txt in the data folder of your working directory.
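As a rough illustration of the read.table() and write.table() steps, here is a minimal sketch on made-up data. The object names toy.df, toy.path and toy.check, and all of the values, are hypothetical; the file is written to a temporary location rather than to your data folder.

```r
# Toy data standing in for a comma-separated file (hypothetical values)
csv.text <- "id,gender,age\n1,1,25\n2,2,31\n3,2,22"

# read.table() needs sep = "," and header = TRUE for a CSV with a header row
toy.df <- read.table(text = csv.text, sep = ",", header = TRUE)

# Write the data as a tab-delimited text file (here, to a temporary file)
toy.path <- tempfile(fileext = ".txt")
write.table(toy.df, file = toy.path, sep = "\t", row.names = FALSE)

# Read the file back in to check that nothing was lost
toy.check <- read.table(toy.path, sep = "\t", header = TRUE)
str(toy.check)
```

Setting row.names = FALSE keeps write.table() from adding an extra, unnamed column of row numbers to the file.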
Look at the first few rows of matthews.df using head(), View(), and str()
What are the names of the data columns?
Currently gender is coded as 1 and 2. Let’s create a new character column called gender.a that codes the data as male and female. Do this by running the following code:

# Create a new column called gender.a that codes gender as a string
matthews.df$gender.a <- NA
matthews.df$gender.a[matthews.df$gender == 1] <- "male"
matthews.df$gender.a[matthews.df$gender == 2] <- "female"

What percent of participants were male?
What was the mean age?
Create a new dataframe called product.df that contains only columns p1, p2, … p10 from matthews.df by running the following code:

# Create product.df, a dataframe containing only columns p1, p2, ... p10
product.df <- matthews.df[,paste("p", 1:10, sep = "")]

The colMeans() function takes a dataframe as an argument and returns a vector containing the mean of each column (that is, averaging over rows). Using colMeans(), calculate the proportion of participants who indicated that the ‘typical’ participant would be willing to pay more than them for each item. Do your values match what the authors reported in Table 1?
The rowMeans() function is like colMeans(), but it calculates the mean of each row (that is, averaging over columns). Using rowMeans(), calculate, for each participant, the proportion of the 10 items that the participant believed other people would spend more on. Save the result as a vector called pall.
Add the pall vector as a new column called pall to the matthews.df dataframe
What was the mean value of pall across participants? This value is the answer to the question: “How often does the average participant think that someone else would pay more for an item than themselves?”
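To see how colMeans() and rowMeans() behave on 0/1 data, here is a small sketch with a made-up dataframe of three items and four participants. The values and the names toy.df, item.means and person.means are hypothetical and are not taken from the Matthews et al. data.

```r
# Toy 0/1 responses: 4 participants (rows) by 3 items (columns)
toy.df <- data.frame(p1 = c(1, 0, 1, 1),
                     p2 = c(0, 0, 1, 1),
                     p3 = c(1, 1, 1, 0))

# Mean of each column: proportion of participants who answered 1 per item
item.means <- colMeans(toy.df)

# Mean of each row: proportion of items each participant answered 1 on
person.means <- rowMeans(toy.df)

# Add the row means as a new column, then average across participants
toy.df$pall <- person.means
mean(toy.df$pall)
```

Because the responses are coded 0 and 1, the mean of a column or row is simply the proportion of 1s in it.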
I created a new table containing fictional demographic information about each participant. The data are stored in a tab-delimited text file (with a header row) at http://nathanieldphillips.com/wp-content/uploads/2016/10/matthews_demographics.txt. Load the data into R as an object called demo.df.
Using merge(), add the demographic data to matthews.df.
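The following sketch shows the general shape of a merge() call on two made-up dataframes. It assumes that both tables share a key column called id; check the column names in your own data, since that name, the race values and the objects survey.df, demo.toy and merged.df are all hypothetical here.

```r
# Two toy dataframes that share an id column (hypothetical values)
survey.df <- data.frame(id = c(1, 2, 3),
                        pall = c(0.8, 0.5, 0.9))
demo.toy <- data.frame(id = c(3, 1, 2),
                       race = c("a", "b", "c"))

# merge() matches rows on the shared key, whatever order they appear in
merged.df <- merge(x = survey.df, y = demo.toy, by = "id")
merged.df
```

Note that the rows of demo.toy are in a different order from survey.df; merge() lines them up by id rather than by position.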
Using either basic indexing or subset(), calculate the mean age for males only.
Using either basic indexing or subset(), calculate the mean age for females only.
Using aggregate(), calculate the mean age of male and female participants separately. Do you get the same answers as before?
Using aggregate(), calculate the mean pall value for male and female participants separately. Which gender tends to think that others would pay more for products than them?
Using aggregate() calculate the mean pall value of participants for each level of income. Do you find a consistent relationship between pall and income?
Now repeat the previous analysis, but only for females (Hint: use the subset argument within the aggregate function)
What was the mean age for participants for each combination of gender and income?
The variable pcmore reflects the question: “What percent of people taking part in this survey do you think earn more than you do?”. Using aggregate(), calculate the median value of this variable separately for each level of income. What does the result tell you?
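The aggregate() patterns used in the problems above can be sketched on a small made-up dataframe. All values and object names below are hypothetical; the column names simply mirror the ones in this WPA, and income is assumed to be stored as a number.

```r
# Toy data with the same kinds of columns as matthews.df (hypothetical values)
toy.df <- data.frame(
  gender.a = c("male", "female", "female", "male", "female", "male"),
  income   = c(1, 1, 2, 2, 2, 1),
  age      = c(20, 30, 40, 50, 60, 25),
  pall     = c(0.5, 0.7, 0.9, 0.6, 0.8, 0.4)
)

# Mean of one variable for each level of one grouping variable
pall.by.gender <- aggregate(pall ~ gender.a, data = toy.df, FUN = mean)

# Same idea, restricted to some rows with the subset argument
pall.by.income.f <- aggregate(pall ~ income, data = toy.df,
                              subset = gender.a == "female", FUN = mean)

# Two grouping variables: one row per combination that occurs in the data
age.by.both <- aggregate(age ~ gender.a + income, data = toy.df, FUN = mean)

# Any summary function can be passed to FUN, for example median
pall.median <- aggregate(pall ~ income, data = toy.df, FUN = median)
```

The formula reads as "dependent variable ~ grouping variable(s)", and the result is always a dataframe with one row per group.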
For the remaining problems, we’ll be using dplyr. Load the dplyr package by running library(dplyr).
Using dplyr, for each level of gender, calculate the summary statistics in the following table. Save the summary statistics to an object called gender.df

variable	description
n	Number of participants
age.mean	Mean age
age.sd	Standard deviation of age
income.mean	Mean income
pcmore.mean	Mean value of pcmore
pall.mean	Mean value of pall

Using dplyr, for each level of income, calculate the summary statistics in the following table – only for participants older than 21 – and save them to a new object called income.df.

variable	description
n	Number of participants
age.min	Minimum age
age.mean	Mean age
male.p	Percent of males
female.p	Percent of females
pcmore.mean	Mean value of pcmore
pall.mean	Mean value of pall
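As a sketch of the general dplyr pattern (filter, then group, then summarise), here is an example on made-up data. It assumes the dplyr package is installed; the values and the names toy.df and toy.summary are hypothetical, and male.p is computed as a proportion, so multiply by 100 if you want a percentage.

```r
library(dplyr)

# Toy data with the same kinds of columns as matthews.df (hypothetical values)
toy.df <- data.frame(
  gender.a = c("male", "female", "female", "male", "female", "male"),
  income   = c(1, 1, 2, 2, 2, 1),
  age      = c(20, 30, 40, 50, 60, 25),
  pall     = c(0.5, 0.7, 0.9, 0.6, 0.8, 0.4)
)

toy.summary <- toy.df %>%
  filter(age > 21) %>%                   # keep only some rows
  group_by(income) %>%                   # one group per income level
  summarise(
    n = n(),                             # number of rows in the group
    age.min = min(age),
    age.mean = mean(age),
    male.p = mean(gender.a == "male"),   # proportion of males
    pall.mean = mean(pall)
  )

toy.summary
```

Each argument to summarise() becomes one column in the result, and each group becomes one row.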

Using dplyr, calculate several summary statistics (you choose which ones!) aggregated at each level of race and gender. Save the results to an object called racegender.df
Using dplyr, calculate several summary statistics (you choose which ones!) aggregated at each level of independent variables of your choice. Save the results to an object called XXX.df, where XXX are the names of the variables you aggregated.
Using save(), save matthews.df, gender.df, income.df, racegender.df, and XXX.df objects to a file called matthews.RData in the data folder in your working directory.
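For reference, here is a minimal sketch of how save() and load() work together, using two made-up objects and a temporary file. The names a.df, b.df and toy.path are hypothetical; in your own script the file argument would point into your data folder instead.

```r
# Two toy objects to store (hypothetical values)
a.df <- data.frame(x = 1:3)
b.df <- data.frame(y = c("a", "b"))

# save() takes any number of objects, followed by the file to write to
toy.path <- tempfile(fileext = ".RData")
save(a.df, b.df, file = toy.path)

# Remove the objects, then restore them under their original names
rm(a.df, b.df)
load(toy.path)
ls()
```

Unlike write.table(), an .RData file can hold several objects at once, and load() brings them back with the names they had when they were saved.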

Submit!

Save and email your wpa_X_LastFirst.R file to me at nathaniel.phillips@unibas.ch. Then, go to https://goo.gl/forms/UblvQ6dvA76veEWu1 to complete the WPA submission form.