WPA #4: Chapters 7 and 8 – Data management, Advanced dataframe manipulation

Why do we overestimate others’ willingness to pay?

In this WPA, we will analyze data from Matthews et al. (2016): Why do we overestimate others’ willingness to pay? The purpose of this research was to test whether our beliefs about other people’s affluence (i.e., wealth) affect how much we think they will be willing to pay for items. You can find the full paper at http://journal.sjdm.org/15/15909/jdm15909.pdf.

Study 1

In this WPA, we will analyze data from their first study. In Study 1, participants indicated the proportion of other people taking part in the survey who have more money than they do, and then whether other people would be willing to pay more than them for each of 10 items.

The following table lists the 10 products and the proportion of participants who indicated that others would be willing to pay more for the product than they themselves would (Table 1 in Matthews et al., 2016).

Product Number	Product	Reported p(other > self)
1	A freshly-squeezed glass of apple juice	.695
2	A Parker ballpoint pen	.863
3	A pair of Bose noise-cancelling headphones	.705
4	A voucher giving dinner for two at Applebee’s	.853
5	A 16 oz jar of Planters dry-roasted peanuts	.774
6	A one-month movie pass	.800
7	An Ikea desk lamp	.863
8	A Casio digital watch	.900
9	A large, ripe pineapple	.674
10	A handmade wooden chess set	.732

Table 1: Proportion of participants who indicated that the “typical participant” would pay more than they would for each product in Study 1.

Study 1 variable description

Here are descriptions of the data variables (taken from the authors’ dataset notes available at http://journal.sjdm.org/15/15909/Notes.txt).

id: participant id code
gender: participant gender. 1 = male, 2 = female
age: participant age
income: participant annual household income on categorical scale with 8 categorical options: Less than $15,000; $15,001–$25,000; $25,001–$35,000; $35,001–$50,000; $50,001–$75,000; $75,001–$100,000; $100,001–$150,000; greater than $150,000.
p1-p10: whether the “typical” survey respondent would pay more (coded 1) or less (coded 0) than oneself, for each of the 10 products
task: whether the participant had to judge the proportion of other people who “have more money than you do” (coded 1) or the proportion who “have less money than you do” (coded 0)
havemore: participant’s response when task = 1
haveless: participant’s response when task = 0
pcmore: participant’s estimate of the proportion of people who have more than they do (calculated as 100-haveless when task=0)
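
The relationship between task, havemore, haveless and pcmore can be sketched on a small made-up dataframe. The object name toy and its values are hypothetical and are not taken from the study data; this is only an illustration of the rule described above.

```r
# Hypothetical responses: task = 1 answered "havemore", task = 0 answered "haveless"
toy <- data.frame(task     = c(0, 0, 1),
                  havemore = c(NA, NA, 99),
                  haveless = c(50, 25, NA))

# pcmore is havemore when task = 1, and 100 - haveless when task = 0
toy$pcmore <- ifelse(toy$task == 1, toy$havemore, 100 - toy$haveless)

toy$pcmore
```

Because ifelse() picks one value per row, the NA in the column that does not apply to a participant never reaches pcmore.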

Create a new R Project called matthews2016. Set the working directory of the project to an appropriate folder on your computer.

# Ok I did that

Outside of RStudio, navigate to your project folder and create three new folders: data, papers, and r.

# Yep, I did that too

Go back to RStudio. Open a new R script called analysis and save the script in the r folder you just created. You will do the rest of your analyses in this script.

# yes captain!

Using read.table(), load the data as a new dataframe called study1.df in R. The data for study 1 are available at http://journal.sjdm.org/15/15909/data1.csv.

study1.df <- read.table(file = "http://journal.sjdm.org/15/15909/data1.csv",
                        sep = ",",
                        header = TRUE   # first line of the file holds the column names
                        )

Using the write.table() function, save study1.df as a tab-delimited text file called study1.txt into the data folder.

write.table(study1.df, 
            file = "data/study1.txt", 
            sep = "\t")
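
As a quick check that a tab-delimited file can be read back in, here is a minimal round-trip sketch. It uses a hypothetical toy dataframe and a temporary file (via tempfile()) rather than your project's data folder, and it sets row.names = FALSE, which is optional; without it, write.table() also writes the row names as an extra first column.

```r
# Hypothetical toy data standing in for study1.df
toy <- data.frame(id = c("a", "b"), age = c(26, 32))

# Write to a temporary tab-delimited file so nothing in your project is touched
path <- tempfile(fileext = ".txt")
write.table(toy, file = path, sep = "\t", row.names = FALSE)

# Read it back; header = TRUE because the first line holds the column names
toy.back <- read.table(file = path, sep = "\t", header = TRUE)
toy.back
```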

Look at the first few rows of study1.df using head() or indexing. The data should look like this:

head(study1.df)

##                  id gender age income p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 task
## 1 R_3PtNn51LmSFdLNM      2  26      7  1  1  1  1  1  1  1  1  1   1    0
## 2 R_2AXrrg62pgFgtMV      2  32      4  1  1  1  1  1  1  1  1  1   1    0
## 3 R_cwEOX3HgnMeVQHL      1  25      2  0  1  1  1  1  1  1  1  0   0    0
## 4 R_d59iPwL4W6BH8qx      1  33      5  1  1  1  1  1  1  1  1  1   1    0
## 5 R_1f3K2HrGzFGNelZ      1  24      1  1  1  0  1  1  1  1  1  1   1    1
## 6 R_3oN5ijzTfoMy4ca      1  22      2  1  1  0  0  1  1  1  1  0   1    0
##   havemore haveless pcmore
## 1       NA       50     50
## 2       NA       25     75
## 3       NA       10     90
## 4       NA       50     50
## 5       99       NA     99
## 6       NA       20     80

What are the names of the data columns (use names())?

names(study1.df)

##  [1] "id"       "gender"   "age"      "income"   "p1"       "p2"      
##  [7] "p3"       "p4"       "p5"       "p6"       "p7"       "p8"      
## [13] "p9"       "p10"      "task"     "havemore" "haveless" "pcmore"

What percent of participants were male? (Hint: create a logical index from the gender column, then use mean())

# males are coded as 1
mean(study1.df$gender == 1)

## [1] 0.6263158

What percent of participants were female?

mean(study1.df$gender == 2)

## [1] 0.3736842
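
The hint above works because R treats TRUE as 1 and FALSE as 0 when taking a mean. A small sketch with a hypothetical gender vector (not the study data):

```r
gender <- c(1, 2, 1, 1)     # hypothetical codes: 1 = male, 2 = female

is.male <- gender == 1      # logical index: TRUE FALSE TRUE TRUE
p.male <- mean(is.male)     # proportion of TRUE values

p.male * 100                # multiply by 100 to express it as a percent
```

Note that mean() on a logical vector returns a proportion between 0 and 1, so the values printed above correspond to roughly 63% male and 37% female.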

What was the mean age?

mean(study1.df$age)

## [1] 31.71579

What was the standard deviation of ages?

sd(study1.df$age)

## [1] 9.123101

Re-order the study1.df dataframe by age (in increasing order). You’ll need to use the order() function to do this.

study1.df <- study1.df[order(study1.df$age),]
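
To see what order() does, consider this sketch on a hypothetical three-row dataframe. order() returns row positions rather than sorted values, which is why it is used inside the row index.

```r
toy <- data.frame(id = c("a", "b", "c"), age = c(33, 22, 25))

order(toy$age)                        # row positions from youngest to oldest

toy.sorted <- toy[order(toy$age), ]   # increasing age
toy.desc   <- toy[order(toy$age, decreasing = TRUE), ]   # decreasing age
```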

Create a new dataframe called study1.stimuli that contains only columns p1, p2, … p10 from study1.df. (Hint: your code should look something like this: study1.stimuli <- study1.df[…])

study1.stimuli <- study1.df[,5:14]

# OR

study1.stimuli <- study1.df[c("p1", "p2", "p3", "p4", "p5",
                              "p6", "p7", "p8", "p9", "p10"
                              )]

Using colMeans(), calculate the percentage of participants who indicated that the ‘typical’ participant would be willing to pay more than them for each item. Do your values match what the authors reported in Table 1?

colMeans(study1.stimuli)

##        p1        p2        p3        p4        p5        p6        p7 
## 0.6947368 0.8631579 0.7052632 0.8526316 0.7736842 0.8000000 0.8631579 
##        p8        p9       p10 
## 0.9000000 0.6736842 0.7315789
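
Typing many column names by hand makes it easy to repeat or skip one. One alternative, sketched here on a hypothetical toy dataframe with only three items, is to build the names with paste0() and select with that vector.

```r
# Hypothetical toy data with three items instead of ten
toy <- data.frame(id = c("a", "b", "c", "d"),
                  p1 = c(1, 1, 0, 1),
                  p2 = c(1, 0, 0, 1),
                  p3 = c(1, 1, 1, 1))

item.names <- paste0("p", 1:3)      # builds "p1" "p2" "p3"
toy.stimuli <- toy[, item.names]    # select the item columns by name

colMeans(toy.stimuli)               # proportion of 1s in each column
```

For the real data, paste0("p", 1:10) would generate the ten names p1 to p10.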

Using rowMeans(), calculate, for each participant, the percentage of the 10 items that the participant believed other people would spend more on. Save this data as a vector called pall.

pall <- rowMeans(study1.stimuli)

Add the pall vector to the study1.df dataframe.

study1.df$pall <- pall
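
A minimal sketch of the same two steps on hypothetical data with four items. Restricting rowMeans() to the item columns matters: if it were run on the whole dataframe, other numeric columns would be averaged in as well.

```r
# Hypothetical toy data: two participants, four items, plus an unrelated column
toy <- data.frame(age = c(26, 32),
                  p1 = c(1, 0), p2 = c(1, 0), p3 = c(0, 1), p4 = c(1, 0))

# Average across the item columns only, one value per row (participant)
pall.toy <- rowMeans(toy[, paste0("p", 1:4)])

# Attach the result as a new column
toy$pall <- pall.toy
toy
```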

Using aggregate(), calculate the mean age of male and female participants separately. Which gender tends to be older?

# The formula is passed unnamed so the call works across R versions
aggregate(age ~ gender,
          FUN = mean,
          data = study1.df
          )

##   gender      age
## 1      1 29.76471
## 2      2 34.98592
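
To make clear what aggregate() computes here, the following sketch compares it with logical indexing on a hypothetical four-row dataframe. The formula age ~ gender reads as "age as a function of gender".

```r
toy <- data.frame(gender = c(1, 1, 2, 2), age = c(20, 30, 40, 50))

# One row per level of gender, with the mean age in that group
age.by.gender <- aggregate(age ~ gender, data = toy, FUN = mean)
age.by.gender

# The same group means computed by hand with logical indexing
mean(toy$age[toy$gender == 1])
mean(toy$age[toy$gender == 2])
```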

Using aggregate(), calculate the mean age of participants for each level of income. What do the results tell you?

aggregate(age ~ income,
          FUN = mean,
          data = study1.df
          )

##   income      age
## 1      1 29.44444
## 2      2 32.48889
## 3      3 30.40741
## 4      4 30.79310
## 5      5 32.42857
## 6      6 32.16667
## 7      7 39.28571
## 8      8 33.33333

Using aggregate(), calculate the mean age of female participants only for each level of income.

aggregate(age ~ income,
          FUN = mean,
          data = subset(study1.df, gender == 2)
          )

##   income      age
## 1      1 31.12500
## 2      2 36.35294
## 3      3 33.14286
## 4      4 36.75000
## 5      5 35.00000
## 6      6 34.00000
## 7      7 37.60000
## 8      8 38.50000

Using aggregate(), calculate the mean age of participants separated by both gender and income.

aggregate(age ~ income + gender,
          FUN = mean,
          data = study1.df
          )

##    income gender      age
## 1       1      1 28.73684
## 2       2      1 30.14286
## 3       3      1 29.45000
## 4       4      1 28.52381
## 5       5      1 31.00000
## 6       6      1 29.60000
## 7       7      1 43.50000
## 8       8      1 23.00000
## 9       1      2 31.12500
## 10      2      2 36.35294
## 11      3      2 33.14286
## 12      4      2 36.75000
## 13      5      2 35.00000
## 14      6      2 34.00000
## 15      7      2 37.60000
## 16      8      2 38.50000
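
The two variations used above (several grouping variables, and a subset of the data) can be sketched together on hypothetical data. Adding a second variable to the right-hand side of the formula gives one row per combination that occurs in the data; passing subset() as the data argument restricts the rows before grouping.

```r
toy <- data.frame(gender = c(1, 1, 2, 2, 2),
                  income = c(1, 2, 1, 1, 2),
                  age    = c(20, 30, 40, 50, 60))

# Mean age for every income x gender combination present in the data
by.both <- aggregate(age ~ income + gender, data = toy, FUN = mean)
by.both

# Mean age per income level, women (gender == 2) only
women.only <- aggregate(age ~ income, data = subset(toy, gender == 2), FUN = mean)
women.only
```

The rows of women.only should match the gender == 2 rows of by.both, just as in the two outputs above.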

The variable pcmore reflects the question: “What percent of people taking part in this survey do you think earn more than you do?”. Using aggregate(), calculate the mean value of this variable separately for each level of income. What does the result tell you?

aggregate(pcmore ~ income,
          FUN = mean,
          data = study1.df
          )

##   income   pcmore
## 1      1 75.44444
## 2      2 69.64444
## 3      3 56.66667
## 4      4 62.72414
## 5      5 54.46429
## 6      6 47.83333
## 7      7 41.42857
## 8      8 33.33333

Load the dplyr library using the library() function.

library(dplyr)

Using dplyr, for each level of gender, calculate the following summary statistics: n (the number of participants), age.mean (mean age), age.sd (sd of age), income.mean (mean income), pcmore.mean (mean of pcmore), pall.mean (mean of pall). Save the summary statistics to an object called gender.summary.

gender.summary <- study1.df %>%
  group_by(gender) %>%
  summarise(
    n = n(),
    age.mean = mean(age),
    age.sd = sd(age),
    income.mean = mean(income),
    pcmore.mean = mean(pcmore),
    pall.mean = mean(pall)
  )

gender.summary

## Source: local data frame [2 x 7]
## 
##   gender     n age.mean    age.sd income.mean pcmore.mean pall.mean
##    (int) (int)    (dbl)     (dbl)       (dbl)       (dbl)     (dbl)
## 1      1   119 29.76471  7.648757    3.285714    62.25210 0.7631780
## 2      2    71 34.98592 10.430029    3.943662    58.80282 0.8040973

Using dplyr, for each level of income, calculate the following summary statistics: n (number of participants), age.mean (mean age), male.p (percent of men), female.p (percent of women), pcmore.mean (mean of pcmore), pall.mean (mean of pall). Save the summary statistics to an object called income.summary.

income.summary <- study1.df %>%
  group_by(income) %>%
  summarise(
    n = n(),
    age.mean = mean(age),
    male.p = mean(gender == 1),
    female.p = mean(gender == 2),
    pcmore.mean = mean(pcmore),
    pall.mean = mean(pall)
  )

income.summary

## Source: local data frame [8 x 7]
## 
##   income     n age.mean    male.p  female.p pcmore.mean pall.mean
##    (int) (int)    (dbl)     (dbl)     (dbl)       (dbl)     (dbl)
## 1      1    27 29.44444 0.7037037 0.2962963    75.44444 0.8855219
## 2      2    45 32.48889 0.6222222 0.3777778    69.64444 0.8000000
## 3      3    27 30.40741 0.7407407 0.2592593    56.66667 0.7239057
## 4      4    29 30.79310 0.7241379 0.2758621    62.72414 0.7805643
## 5      5    28 32.42857 0.6428571 0.3571429    54.46429 0.7467532
## 6      6    24 32.16667 0.4166667 0.5833333    47.83333 0.6931818
## 7      7     7 39.28571 0.2857143 0.7142857    41.42857 0.8181818
## 8      8     3 33.33333 0.3333333 0.6666667    33.33333 0.8484848

Now repeat the previous question, but only include participants younger than 25. Save the summary statistics to an object called income.u25.summary.

income.u25.summary <- study1.df %>%
  filter(age < 25) %>%
  group_by(income) %>%
  summarise(
    n = n(),
    age.mean = mean(age),
    male.p = mean(gender == 1),
    female.p = mean(gender == 2),
    pcmore.mean = mean(pcmore),
    pall.mean = mean(pall)
  )

income.u25.summary

## Source: local data frame [7 x 7]
## 
##   income     n age.mean    male.p  female.p pcmore.mean pall.mean
##    (int) (int)    (dbl)     (dbl)     (dbl)       (dbl)     (dbl)
## 1      1     9 22.88889 0.5555556 0.4444444    78.22222 0.9191919
## 2      2    12 22.66667 0.5000000 0.5000000    69.58333 0.7575758
## 3      3     6 22.16667 0.8333333 0.1666667    57.50000 0.6818182
## 4      4     4 21.25000 1.0000000 0.0000000    63.75000 0.7954545
## 5      5     5 22.20000 1.0000000 0.0000000    56.00000 0.7636364
## 6      6     3 21.33333 0.6666667 0.3333333    70.00000 0.6666667
## 7      8     1 23.00000 1.0000000 0.0000000     0.00000 0.7272727
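
The order of the steps in the pipeline matters: filter() comes before group_by(), so the group sizes and means are computed only from the rows that pass the filter. A minimal sketch on hypothetical data, assuming dplyr is installed:

```r
library(dplyr)

# Hypothetical toy data: five participants
toy <- data.frame(income = c(1, 1, 2, 2, 2),
                  gender = c(1, 2, 2, 2, 1),
                  age    = c(22, 30, 24, 40, 23))

toy.summary <- toy %>%
  filter(age < 25) %>%              # keep only participants younger than 25
  group_by(income) %>%              # then split the remaining rows by income
  summarise(
    n = n(),                        # rows left in each group
    age.mean = mean(age),
    male.p = mean(gender == 1)      # proportion coded 1 (male)
  )

toy.summary
```

An income level whose participants are all removed by the filter does not appear in the result at all, which is why the output above has 7 rows rather than 8.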

Save study1.df, gender.summary, income.summary and income.u25.summary objects to a file called summary.RData in the data folder in your working directory.

save(study1.df, gender.summary, income.summary, income.u25.summary,
     file = "data/summary.RData"
     )

Clear your workspace using the rm(list = ls()) command. Run the ls() command to make sure that your workspace is empty.

rm(list = ls())
ls()

## character(0)

Load summary.RData back into your workspace. Run the ls() command to make sure all the objects are back.

load("data/summary.RData")
ls()
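
The full save, clear and load cycle can be tried safely with a temporary file. The object name x.summary below is hypothetical, and tempfile() is used so that nothing in your data folder is overwritten.

```r
x.summary <- data.frame(group = c("a", "b"), n = c(3, 5))   # hypothetical object

path <- tempfile(fileext = ".RData")
save(x.summary, file = path)    # write the object to disk

rm(x.summary)                   # remove it from the workspace

loaded <- load(path)            # restore it; load() returns the restored names
loaded
x.summary
```

Unlike read.table(), load() restores objects under the names they were saved with, so there is no need to assign its result to get the data back.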

Basel Spring 2016
