WPA #4: Chapters 7 and 8 – Data management, Advanced dataframe manipulation

Why do we overestimate others’ willingness to pay?

In this WPA, we will analyze data from Matthews et al. (2016): Why do we overestimate others’ willingness to pay? The purpose of this research was to test whether our beliefs about other people’s affluence (i.e., wealth) affect how much we think they will be willing to pay for items. You can find the full paper at http://journal.sjdm.org/15/15909/jdm15909.pdf.

Study 1

In this WPA, we will analyze data from their first study. In Study 1, participants estimated the proportion of other people taking part in the survey who have more money than they do, and then indicated whether other people would be willing to pay more than them for each of 10 items.

The following table lists the 10 products and the proportion of participants who indicated that others would be willing to pay more for the product than they would themselves (Table 1 in Matthews et al., 2016).

Product Number	Product	Reported p(other > self)
1	A freshly-squeezed glass of apple juice	.695
2	A Parker ballpoint pen	.863
3	A pair of Bose noise-cancelling headphones	.705
4	A voucher giving dinner for two at Applebee’s	.853
5	A 16 oz jar of Planters dry-roasted peanuts	.774
6	A one-month movie pass	.800
7	An Ikea desk lamp	.863
8	A Casio digital watch	.900
9	A large, ripe pineapple	.674
10	A handmade wooden chess set	.732

Table 1: Proportion of participants who indicated that the “typical participant” would pay more than they would for each product in Study 1.

Study 1 variable description

Here are descriptions of the data variables (taken from the authors’ dataset notes, available at http://journal.sjdm.org/15/15909/Notes.txt).

id: participant id code
gender: participant gender. 1 = male, 2 = female
age: participant age
income: participant annual household income on a categorical scale with 8 options: Less than $15,000; $15,001–$25,000; $25,001–$35,000; $35,001–$50,000; $50,001–$75,000; $75,001–$100,000; $100,001–$150,000; greater than $150,000.
p1-p10: whether the “typical” survey respondent would pay more (coded 1) or less (coded 0) than oneself, for each of the 10 products
task: whether the participant had to judge the proportion of other people who “have more money than you do” (coded 1) or the proportion who “have less money than you do” (coded 0)
havemore: participant’s response when task = 1
haveless: participant’s response when task = 0
pcmore: participant’s estimate of the proportion of people who have more than they do (calculated as 100-haveless when task=0)
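
The relationship between task, havemore, haveless, and pcmore described above can be checked with a short sketch. The values below are copied from the six rows shown in the data preview in this WPA; the object name pcmore.check is made up for illustration, and ifelse() is just one way to express the rule, not necessarily how the authors computed the column.

```r
# Values copied from the six rows in this WPA's data preview
task     <- c(0, 0, 0, 0, 1, 0)
havemore <- c(NA, NA, NA, NA, 99, NA)
haveless <- c(50, 25, 10, 50, NA, 20)

# Rule from the variable notes:
#   task == 1 -> pcmore is the havemore response
#   task == 0 -> pcmore is 100 - haveless
pcmore.check <- ifelse(task == 1, havemore, 100 - haveless)

# Matches the pcmore column in the preview: 50 75 90 50 99 80
pcmore.check
```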

Create a new R Project called matthews2016. Set the working directory of the project to an appropriate folder on your computer.
Outside of RStudio, navigate to your project folder and create three new folders: data, papers, and r.
Go back to RStudio. Open a new R script called analysis and save the script in the r folder you just created. You will do the rest of your analyses in this script.
Using read.table(), load the data as a new dataframe called study1.df in R. The data for study 1 are available at http://journal.sjdm.org/15/15909/data1.csv.
Using the write.table() function, save study1.df as a tab-delimited text file called study1.txt into the data folder.
Look at the first few rows of study1.df using head() or indexing. The data should look like this:

##                  id gender age income p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 task
## 1 R_3PtNn51LmSFdLNM      2  26      7  1  1  1  1  1  1  1  1  1   1    0
## 2 R_2AXrrg62pgFgtMV      2  32      4  1  1  1  1  1  1  1  1  1   1    0
## 3 R_cwEOX3HgnMeVQHL      1  25      2  0  1  1  1  1  1  1  1  0   0    0
## 4 R_d59iPwL4W6BH8qx      1  33      5  1  1  1  1  1  1  1  1  1   1    0
## 5 R_1f3K2HrGzFGNelZ      1  24      1  1  1  0  1  1  1  1  1  1   1    1
## 6 R_3oN5ijzTfoMy4ca      1  22      2  1  1  0  0  1  1  1  1  0   1    0
##   havemore haveless pcmore
## 1       NA       50     50
## 2       NA       25     75
## 3       NA       10     90
## 4       NA       50     50
## 5       99       NA     99
## 6       NA       20     80
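
The loading and saving steps above can be sketched on a small made-up file so that nothing depends on the network. Everything prefixed with toy is hypothetical and stands in for the real data; the point is only that a comma-separated file with a header row needs the sep and header arguments of read.table(), and that write.table() with sep = "\t" produces a tab-delimited file.

```r
# Hypothetical toy CSV standing in for data1.csv (the values are made up)
toy.csv <- file.path(tempdir(), "toy.csv")
writeLines(c("id,gender,age",
             "a1,2,26",
             "a2,1,33",
             "a3,1,24"),
           toy.csv)

# A .csv file is comma-separated and has a header row,
# so both must be stated explicitly for read.table()
toy.df <- read.table(file = toy.csv,
                     sep = ",",
                     header = TRUE,
                     stringsAsFactors = FALSE)

# Save as a tab-delimited text file (no row names, no quotes)
toy.txt <- file.path(tempdir(), "toy.txt")
write.table(toy.df,
            file = toy.txt,
            sep = "\t",
            row.names = FALSE,
            quote = FALSE)

# Read the tab-delimited file back in to confirm the round trip
toy.check <- read.table(file = toy.txt,
                        sep = "\t",
                        header = TRUE,
                        stringsAsFactors = FALSE)

# Look at the first rows
head(toy.df)
```

For the real data you would replace toy.csv with the URL or file path of data1.csv and write into your data folder instead of tempdir().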

What are the names of the data columns (use names())?
What percent of participants were male? (Hint: create a logical index from the gender column, then use mean())
What percent of participants were female?
What was the mean age?
What was the standard deviation of ages?
Re-order the study1.df dataframe by age (in increasing order). You’ll need to use the order() function to do this.
Create a new dataframe called study1.stimuli that contains only columns p1, p2, … p10 from study1.df. (Hint: your code should look something like this: study1.stimuli <- study1.df[…])
Using colMeans(), calculate the percentage of participants who indicated that the ‘typical’ participant would be willing to pay more than them for each item. Do your values match what the authors reported in Table 1?
Using rowMeans(), calculate, for each participant, the percentage of the 10 items that the participant believed other people would spend more on. Save this data as a vector called pall.
Add the pall vector to the study1.df dataframe.
Using aggregate(), calculate the mean age of male and female participants separately. Which gender tends to be older?
Using aggregate(), calculate the mean age of participants for each level of income. What do the results tell you?
Using aggregate(), calculate the mean age of female participants only for each level of income.
Using aggregate(), calculate the mean age of participants separated by both gender and income.
The variable pcmore reflects the question: “What percent of people taking part in this survey do you think earn more than you do?”. Using aggregate(), calculate the mean value of this variable separately for each level of income. What does the result tell you?
Load the dplyr library using the library() function.
Using dplyr, for each level of gender, calculate the following summary statistics: n (the number of participants), age.mean (mean age), age.sd (sd of age), income.mean (mean income), pcmore.mean (mean of pcmore), pall.mean (mean of pall). Save the summary statistics to an object called gender.summary.
Using dplyr, for each level of income, calculate the following summary statistics: n (number of participants), age.mean (mean age), male.p (percent of men), female.p (percent of women), pcmore.mean (mean of pcmore), pall.mean (mean of pall). Save the summary statistics to an object called income.summary.
Now repeat the previous question, but only include participants older than 25. Save the summary statistics to an object called income.u25.summary.
Save study1.df, gender.summary, income.summary and income.u25.summary objects to a file called summary.RData in the data folder in your working directory.
Clear your workspace using the rm(list = ls()) command. Run the ls() command to make sure that your workspace is empty.
Load summary.RData back into your workspace. Run the ls() command to make sure all the objects are back.
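
The grouping, summarising, and save/load steps in the list above can be sketched on a tiny made-up dataframe. All objects prefixed with toy (and age.by.gender, age.by.income.f) are hypothetical and the values are invented, so the numbers say nothing about the Matthews et al. data. The sketch assumes the dplyr package is installed, and it removes only its own objects rather than clearing the whole workspace.

```r
library(dplyr)  # assumes dplyr is installed

# Hypothetical toy data (NOT the Matthews et al. data)
toy.df <- data.frame(
  gender = c(1, 2, 1, 2, 1, 2),
  age    = c(30, 22, 41, 35, 27, 24),
  income = c(1, 1, 2, 2, 3, 3),
  p1     = c(1, 0, 1, 1, 0, 1),
  p2     = c(1, 1, 0, 1, 0, 1)
)

# Proportion of the (two) toy items each participant marked 1
toy.df$pall <- rowMeans(toy.df[, c("p1", "p2")])

# Base R: mean age for each level of gender
age.by.gender <- aggregate(age ~ gender, data = toy.df, FUN = mean)

# Base R: mean age for each level of income, female participants only
age.by.income.f <- aggregate(age ~ income,
                             data = subset(toy.df, gender == 2),
                             FUN = mean)

# dplyr: several summary statistics for each level of gender
toy.summary <- toy.df %>%
  group_by(gender) %>%
  summarise(
    n = n(),
    age.mean = mean(age),
    age.sd = sd(age),
    pall.mean = mean(pall)
  )

# Save two objects to an .RData file, remove them, and load them back
toy.file <- file.path(tempdir(), "toy_summary.RData")
save(toy.df, toy.summary, file = toy.file)
rm(toy.df, toy.summary)
load(toy.file)
ls()  # toy.df and toy.summary should be listed again
```

The same pattern extends to the real data by swapping in study1.df and adding further arguments to summarise(); adding filter() before group_by() is one way to restrict the summary to a subset of participants.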

Basel Spring 2016