Sexual Economics Theory (SET) predicts that women want sex to be seen as rare and special, that is, as something of high value. Vohs et al. wanted to test whether this holds in real life. They reasoned that, if the theory is right, women should prefer sex to be associated with expensive items, whereas men should show no such preference, and they designed an experiment to test this. I will carry out the same experiment using methods similar to the original paper, collecting data on Prolific and then running an ANOVA to see whether I obtain similar statistical results.
First of all, I found the topic really interesting while scrolling through the list of published studies online (before realizing it was published in a volume earlier than the preferred years). Since I don’t have a particular area of research as an undergrad (I work in a wet lab, where reproducibility would be tough and the experiments are not very similar to the area we are looking at in class), I prioritized what I found most interesting, alongside what seemed reproducible. I found this topic really refreshing: I had never heard of Sexual Economics Theory or the logic behind it, and upon discovering it I wanted to test the hypothesis. A lot of studies would be hard or impossible to reproduce with the population that we have access to, but this one seems doable.
To reproduce this study, I will recruit 100 participants who identify as male or female. The core design is 2 (gender) x 2 (price), and participants are additionally randomized into two groups: the sexy ad group and the non-sexy ad (control) group.
For the sexy ad group, we will use a watch ad in which explicit sexual imagery takes up the majority of the ad, with an image of the product in the bottom corner. We will show participants an ad with a cheap price tag ($10) for 20 seconds. We repeat the same procedure with a higher price tag ($1250). We will also add a neutral ad (non-sexual for both groups) with a medium price ($500). The order in which the ads are shown is randomized. Before we show the participants the ads, we will impose a cognitive load to encourage more spontaneous reactions. This means that we will give them 7 digits to remember before viewing the pictures, and after viewing each image we will ask them to recite the digits back. For the non-sexy ad group, we use the exact same procedure, but the ads show neutral nature backgrounds taking up the majority of the image instead of sexual content.
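The randomization and cognitive-load steps described above can be illustrated in a few lines of R. This is only a sketch of the logic for a single participant: in the actual survey the randomization is handled inside Qualtrics, and the object names (`condition`, `ad_order`, `load_digits`, `recall_correct`) are hypothetical.

```r
# Hypothetical sketch of the randomization logic for one participant.
set.seed(251)  # fixed seed only so the sketch is reproducible

# Randomly assign the participant to one ad condition.
condition <- sample(c("sexy", "control"), size = 1)

# Shuffle the order of the three ads (cheap, neutral, expensive).
ad_order <- sample(c("cheap_$10", "neutral_$500", "expensive_$1250"))

# Draw a 7-digit string for the cognitive load; digits may repeat.
load_digits <- paste(sample(0:9, size = 7, replace = TRUE), collapse = "")

# Score recall: TRUE only if the typed response matches the target exactly
# (surrounding whitespace is ignored).
recall_correct <- function(response, target) {
  identical(trimws(response), target)
}
```

Keeping the digits as a character string avoids losing a leading zero, which would happen if the load were stored as a number.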
For both groups, after the participants view each of the images, they will be asked to rate the image on three different scales. The first is attitude towards the ad, rated on a scale of 1-7, with 1 = unlikeable/bad and 7 = likable/good. Next, they will rate their emotional reaction on two scales. The first is positive affect (explained as feeling happy, energized, in a good mood, and interested) and the second is negative affect (feeling upset, disgusted, unpleasantly surprised, and angry). Both are rated out of 7, with 1 = none and 7 = very much so. After data collection, we will use the attitude ratings to carry out ANOVA F tests to see whether mean ratings differ significantly across conditions.
In terms of challenges, I think that the sexual nature of the content might make participants feel uncomfortable, so I can foresee having trouble recruiting for the research, or participants coming out of the study feeling uncomfortable. This study also uses ANOVA to analyze the data, and I don’t have experience with that, so I can foresee a learning curve for me there.
Project repository (on Github): https://github.com/psych251/Vohs2013.git
Original paper (as hosted in your repo): https://journals.sagepub.com/doi/full/10.1177/0956797613502732
(View procedure section above for more experimental details.)
I will need to create a survey on Qualtrics that is able to collect the desired data. The following sections will be included in the survey: (a) Consent and general questions about gender and age (b) Introducing the cognitive load (asking participants to remember 7 digits) (c) Asking them to view the three images, each for 20 s, and rate each image on three (1-7) scales after viewing it. The scales are: 1) how did you like the ad 2) to what extent did it make you feel good 3) to what extent did it make you feel bad. Participants will be randomly assigned to a condition: the sexual condition sees sexual watch ads with different prices, whereas the non-sexual condition sees nature background ads with different prices. The order in which the images are viewed is randomized.
I will connect that survey to Prolific, and set up Prolific correctly to make the data collection process smooth. I will first run pilot B with 20 participants. Then I will adjust as needed and recruit 100 participants for the final trial.
After I’ve collected the desired data, I will carry out ANOVA data analysis, obtaining F-test p values to see if those are significant.
First of all, the original paper carries this study out twice, and I will only be doing the study once. Secondly, I do not have the exact images the authors used for their experiments (I only have two examples, not the whole set). I will be generating images based on the examples and believe that they will be extremely similar and will have the same effects on participants. Thirdly, I’ll be carrying this experiment out on Prolific instead of in person, meaning certain details are likely different, for example having participants remember a 7-digit number instead of 10 digits (to encourage actual memorization online while still inducing a load), and also recruiting not only college students, but any participant who is willing to join.
Firstly, I will try to reach out to the author of the paper and confirm that my procedures are similar to those used in the original experiment. At the moment, it’s a little hard to tell whether or not I’m doing a good job of exactly replicating the steps, as some steps were unclear (for example, the author mentioned showing participants 3 images without specifying what the third image is).
A general aspect of success for myself would be to see if I am able to make similar graphs as the original study/carry out the same statistical tests as the original study by working with the raw data.
On top of that, we could look at the numbers and compare the values. The author lists means and average scores for certain ratings, and I can compare those to the averages I’ve obtained. If they are wildly different, it could suggest that my procedure differed greatly from the original study and that we’ve changed it in a way where we’re not obtaining the same data anymore. It could also be that the populations just happen to be different, or that times have changed since this study was carried out, but it is definitely something to look out for regardless.
Finally, of course, we could see if the results we obtain are similar to the original study. Different results don’t necessarily mean the replication was not carried out properly; it may be that the replication was carried out perfectly and that this time the participants’ responses simply did not support the theory. However, I do think that looking at these numbers could give us a sense of how much the replication differs from the original, and we can think about what that may mean.
Progress 1: So far, I’ve made my Qualtrics survey and connected it to Prolific (unactivated on Prolific right now). Here’s the link: https://stanforduniversity.qualtrics.com/jfe/form/SV_8CxHBBRflzqb1v8 I had many different people test the survey (using the Qualtrics link) and have since improved the survey many times. My first version used the exact same image for both prices, and I think people easily assumed they were the same ad. I think people were also shocked to see sexual images because I did not warn them in advance. In the new version, I made sure to warn people about the potentially strong adult content. To deal with people thinking they are seeing the same ads, I added very slight background variations to keep it interesting and added one more ad as a buffer. I also realized the way I set up the survey made it hard to tell who got which version of the survey, so I made that clear in the new version.
Below, I have also coded up ways to clean up the expected data and how to obtain the numbers we want.
I will try to reach out to the author this week to see if I could obtain more details on how exactly to run the experiment as there were some procedures that are unclear in the paper and we don’t have the exact images from the experiment.
Update for Nov 13: Tried contacting author, no luck. Edited 10 digit number to 7 digits, changed consent part to warn for sexual content, and played around with more data from friends.
Update for Nov 20: Just launched the pilot study on prolific, will play around with more data analysis and adjust things as needed as the results come in.
Update for Nov 25: I plugged in the data from the Prolific pilot study. So far, we are not seeing the results the author originally got. Some observations so far: 1) Female ratings are higher than male ratings overall, and the ANOVA F test shows that this difference is significant. 2) Price seems to impact ratings too: the rating means increase with price. The difference is not significant, but it is visible in the means.
The data I will obtain are a series of ratings of the images by different participants.
I will isolate the attitude ratings from women who rated sexual, cheap images and compare them with the attitude ratings from women who rated sexual, expensive images. We will then do the same analysis on men for the same categories. We then take the women’s ratings of sexual cheap images and compare them with the same category in men, and repeat this step with sexual expensive images.
For the hypothesis to be supported, we would expect men’s attitudes towards the cheap and expensive sexual images not to differ. On top of that, we would expect women to rate the expensive sexual image more highly than the cheap sexual image.
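The pattern described above (a price effect for women but not for men) corresponds to a Gender by Price interaction, which a model with only additive terms does not test. A minimal sketch of the interaction model follows. It runs on simulated data that has the predicted pattern built in, so the numbers mean nothing; the simulated ratings are continuous and not restricted to the 1-7 scale, and `sim`, `boost`, and `fit` are hypothetical names.

```r
set.seed(1)

# Simulated ratings for the sexual-ad condition only (NOT real data).
sim <- expand.grid(
  rep = seq_len(25),                      # 25 simulated ratings per cell
  Gender = c("Female", "Male"),
  Price = c("Cheap", "Expensive"),
  stringsAsFactors = TRUE
)

# Build in the predicted pattern: only women rate the expensive ad higher.
boost <- ifelse(sim$Gender == "Female" & sim$Price == "Expensive", 1, 0)
sim$Ratings <- 3.5 + boost + rnorm(nrow(sim), sd = 1.5)

# `Gender * Price` expands to both main effects plus the interaction.
fit <- aov(Ratings ~ Gender * Price, data = sim)
summary(fit)
```

The row labelled `Gender:Price` in the summary is the test of whether the price effect differs between women and men.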
We are only taking results from those who identify as either male or female. While I would like to include other options, the original study assumed gender is binary and if any participants did not identify as either male or female, their data would be hard to incorporate.
Since the original study did not mention other exclusions, we will assume that all participants otherwise are welcomed and should be included.
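The exclusion rule above can be written as a single filtering step. The sketch below uses a small made-up data frame; `Q3` is the gender column used in the analysis code, while the response label `"Non-binary"` and the names `raw`, `keep`, `included`, and `n_excluded` are assumptions for illustration.

```r
# Hypothetical raw responses (NOT real data).
raw <- data.frame(
  Q3 = c("Female", "Male", "Non-binary", NA, "Male"),
  Q13_1 = c(5, 3, 4, 6, 2)
)

# Keep only participants who answered Male or Female; missing answers
# are excluded as well.
keep <- !is.na(raw$Q3) & raw$Q3 %in% c("Male", "Female")
included <- raw[keep, ]

# Record how many participants were excluded so it can be reported.
n_excluded <- sum(!keep)
```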
One difference that was mentioned earlier was that since we are carrying out the study on Prolific, we won’t be filtering age. The original study did use undergraduate students from a university, so my replication is more inclusive in terms of age/education.
### Data Preparation
#### Load Relevant Libraries and Functions
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching packages
## ───────────────────────────────────────
## tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.5
## ✔ tibble 3.1.8 ✔ stringr 1.4.1
## ✔ tidyr 1.2.1 ✔ forcats 0.5.2
## ✔ readr 2.1.3
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(ggplot2)
#### Import data
results <- read.csv("/Users/yishuchen/Downloads/Vohs2013_FinalData.csv", header=T, na.strings=c("","NA"))
#### Prepare data for analysis - create columns etc.
# Isolating relevant columns
filter <- as.vector(c("Q3","Q13_1","Q49_1","Q22_1", "Q52_1", "Q55_1"))
filtered_results <- results[, filter]
fr <- mutate(filtered_results, Control = is.na(Q13_1))
fr[is.na(fr)] <- ""
fr$Q13_1 <- paste(fr$Q13_1, fr$Q52_1, sep = '')
fr$Q22_1 <- paste(fr$Q22_1, fr$Q55_1, sep = '')
fr <- subset(fr, select = -c(Q52_1, Q55_1) )
#Renaming columns to more intuitive names
names(fr)[names(fr) == 'Q13_1'] <- 'Expensive'
names(fr)[names(fr) == 'Q22_1'] <- 'Cheap'
names(fr)[names(fr) == 'Q49_1'] <- 'Neutral'
names(fr)[names(fr) == 'Q3'] <- 'Gender'
fr$Neutral <- as.character(fr$Neutral)
#Convert to long form
fr_new <- pivot_longer(fr, cols=c('Expensive','Neutral','Cheap'), names_to = "Price", values_to = "Ratings")
fr_final <- fr_new[fr_new$Price != 'Neutral',]
fr_final$Ratings <- as.numeric(as.character(fr_final$Ratings))
fr_final$Gender <- as.factor(fr_final$Gender)
fr_final$Control <- as.factor(fr_final$Control)
fr_final$Price <- as.factor(fr_final$Price)
#Mean Table
mean_table <- aggregate(fr_final$Ratings, list(fr_final$Gender, fr_final$Control, fr_final$Price), mean)
names(mean_table)[names(mean_table) == 'Group.1'] <- 'Gender'
names(mean_table)[names(mean_table) == 'Group.2'] <- 'Control'
names(mean_table)[names(mean_table) == 'Group.3'] <- 'Price'
names(mean_table)[names(mean_table) == 'x'] <- 'Mean_Ratings'
print(mean_table)
## Gender Control Price Mean_Ratings
## 1 Female FALSE Cheap 3.736842
## 2 Male FALSE Cheap 3.290323
## 3 Female TRUE Cheap 3.892857
## 4 Male TRUE Cheap 4.818182
## 5 Female FALSE Expensive 4.157895
## 6 Male FALSE Expensive 4.225806
## 7 Female TRUE Expensive 3.928571
## 8 Male TRUE Expensive 5.227273
#Plot
ggplot(mean_table, aes(fill=interaction(Price, Control), y=Mean_Ratings, x=Gender)) + geom_bar(position='dodge', stat='identity') + ggtitle("Attitude Rating to Ads")
As seen in the plot above, differences in attitude rating across prices and groups are not very clear for female participants. For men, we see higher ratings for expensive ads and for control ads.
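Bars of cell means are easier to compare when each mean comes with a measure of spread. The sketch below computes the mean, standard error, and cell size for every Gender by Control by Price cell in base R. It uses simulated ratings, not the study data, and the names `sim` and `cell_stats` are hypothetical; the resulting columns could be passed to `geom_errorbar()` in ggplot2.

```r
set.seed(3)

# Simulated ratings on the 1-7 scale (NOT real data), 10 per cell.
sim <- data.frame(
  Gender = rep(c("Female", "Male"), each = 40),
  Control = rep(rep(c(FALSE, TRUE), each = 20), times = 2),
  Price = rep(c("Cheap", "Expensive"), times = 40),
  Ratings = sample(1:7, size = 80, replace = TRUE)
)

# Mean, standard error, and n for each cell.
cell_stats <- aggregate(
  Ratings ~ Gender + Control + Price,
  data = sim,
  FUN = function(x) c(mean = mean(x), se = sd(x) / sqrt(length(x)), n = length(x))
)

# aggregate() returns the three statistics as one matrix column;
# flatten it into Ratings.mean, Ratings.se, and Ratings.n.
cell_stats <- do.call(data.frame, cell_stats)
cell_stats
```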
#Overall
#Price & Control significant
#Gender not significant
overall0 = aov(Ratings ~ Price + Gender + Control, data = fr_final)
summary(overall0)
## Df Sum Sq Mean Sq F value Pr(>F)
## Price 1 11.0 11.045 3.935 0.04869 *
## Gender 1 6.4 6.367 2.268 0.13365
## Control 1 20.8 20.796 7.409 0.00707 **
## Residuals 196 550.1 2.807
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Overall, we see that the control does significantly contribute to rating difference. Specifically, control ratings are overall higher than sexy ratings. Price has a significant effect too, with higher prices having higher ratings.
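Because each participant rated both a cheap and an expensive ad, the two ratings from one person are not independent of each other. One common way to account for this in R is an `Error()` term in `aov`. The sketch below shows the model form on simulated data with a hypothetical participant column `id` (the cleaned data above does not carry a participant identifier), so it illustrates the approach under those assumptions and is not a re-analysis.

```r
set.seed(2)
n <- 40  # hypothetical number of participants

# One row per simulated participant (NOT real data).
subjects <- data.frame(
  id = factor(seq_len(n)),
  Gender = factor(rep(c("Female", "Male"), each = n / 2)),
  Control = factor(rep(c(FALSE, TRUE), times = n / 2))
)

# Cross every participant with both price levels (no shared columns,
# so merge() returns the Cartesian product).
long <- merge(subjects, data.frame(Price = factor(c("Cheap", "Expensive"))))

# Ratings share a per-participant offset, which makes them correlated.
subject_effect <- rnorm(n)
long$Ratings <- 4 + subject_effect[as.integer(long$id)] + rnorm(nrow(long))

# Gender and Control vary between participants; Price varies within them.
fit_mixed <- aov(Ratings ~ Gender * Control * Price + Error(id / Price),
                 data = long)
summary(fit_mixed)
```

In the summary, between-participant effects (Gender, Control) are tested in the `id` stratum, and Price with its interactions in the `id:Price` stratum.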
#Female
#Price & Control not significant
female <- fr_final %>%
filter(Gender == 'Female')
overall1 = aov(Ratings ~ Price + Control, data = female)
summary(overall1)
## Df Sum Sq Mean Sq F value Pr(>F)
## Price 1 0.86 0.8617 0.291 0.591
## Control 1 0.03 0.0304 0.010 0.920
## Residuals 91 269.59 2.9625
We do not see significance for price or control in female attitude scores.
#Male
#Price & Control significant
male <- fr_final %>%
filter(Gender == 'Male')
overall2 = aov(Ratings ~ Price + Control, data = male)
summary(overall2)
## Df Sum Sq Mean Sq F value Pr(>F)
## Price 1 13.62 13.62 5.465 0.0213 *
## Control 1 41.16 41.16 16.514 9.45e-05 ***
## Residuals 103 256.73 2.49
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
For male attitude ratings, we see that price and control are both significant. Men rated more expensive ads higher and control ads higher.
male_c <- male %>%
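filter(Control == TRUE) # `==` tests equality; `Control = TRUE` does not select the control rows
# Note: the output shown below has 104 residual df, the same as the full male sample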
overall3 = aov(Ratings ~ Price, data = male_c)
summary(overall3)
## Df Sum Sq Mean Sq F value Pr(>F)
## Price 1 13.62 13.623 4.756 0.0314 *
## Residuals 104 297.89 2.864
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
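For male attitude ratings in this model, price is significant (expensive is rated higher). Note that the residual degrees of freedom (104) and the Price sum of squares (13.62) are the same as for the full male sample, so this output reflects all male participants and not only the control group.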
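A quick way to confirm that a subset really dropped rows is to compare row counts before and after subsetting. The sketch below does this in base R on a tiny made-up data frame (the name `male` mirrors the analysis code, but the values are not real data). `Control` is stored as a factor, so the comparison is against the level label `"TRUE"`.

```r
# Tiny made-up stand-in for the male subset (NOT real data).
male <- data.frame(
  Control = factor(c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)),
  Price = factor(c("Cheap", "Expensive", "Cheap", "Expensive", "Cheap", "Cheap")),
  Ratings = c(5, 6, 3, 4, 5, 2)
)

# Keep only control rows, then drop the now-unused "FALSE" level.
male_c <- droplevels(male[male$Control == "TRUE", ])

# Sanity checks: the subset is smaller and contains only control rows.
stopifnot(nrow(male_c) < nrow(male))
stopifnot(all(male_c$Control == "TRUE"))

nrow(male)
nrow(male_c)
```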
sexy <- fr_final %>%
filter(Control == FALSE)
overall4 = aov(Ratings ~ Price + Gender, data = sexy)
summary(overall4)
## Df Sum Sq Mean Sq F value Pr(>F)
## Price 1 13.69 13.690 4.963 0.0282 *
## Gender 1 0.84 0.844 0.306 0.5814
## Residuals 97 267.58 2.759
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
For the sexy group, price is significant. Expensive ads were rated more highly.
#Sexy group, male participants only
#Price significant
sexy_m <- sexy %>%
filter(Gender == 'Male')
overall5 = aov(Ratings ~ Price, data = sexy_m)
summary(overall5)
## Df Sum Sq Mean Sq F value Pr(>F)
## Price 1 13.56 13.56 5.157 0.0267 *
## Residuals 60 157.81 2.63
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Males in the sexy group rated expensive images more highly.
sexy_f <- sexy %>%
filter(Gender == 'Female')
overall6 = aov(Ratings ~ Price, data = sexy_f)
summary(overall6)
## Df Sum Sq Mean Sq F value Pr(>F)
## Price 1 1.68 1.684 0.56 0.459
## Residuals 36 108.21 3.006
Females in the sexy group rated cheap and expensive images similarly.
#Mean Table with Neutral
fr_new$Ratings <- as.numeric(as.character(fr_new$Ratings))
mean_table_N <- aggregate(fr_new$Ratings, list(fr_new$Gender, fr_new$Control, fr_new$Price), mean)
To explore the difference in ratings between genders and between conditions, I’d like to compare the neutral mean ratings, which the original study may not have mentioned or even tested. The sexy group mean is around 4.8 and the control group mean is around 4.75, which also rounds to 4.8. The male mean is around 5.1 and the female mean is around 4.45, so women in general rated the neutral image lower.
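The neutral-ad means by gender and by condition can be computed with the formula interface of `aggregate`, which drops rows with a missing rating by default. This is a minimal sketch on made-up values, not the study data; `ratings_long`, `neutral`, `by_gender`, and `by_condition` are hypothetical names.

```r
# Made-up long-format ratings for the neutral ad (NOT real data).
ratings_long <- data.frame(
  Gender = c("Female", "Female", "Male", "Male", "Female", "Male"),
  Control = c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE),
  Price = "Neutral",
  Ratings = c(4, 5, 6, NA, 5, 4)
)

neutral <- ratings_long[ratings_long$Price == "Neutral", ]

# Mean neutral rating per gender and per condition; the row with the
# missing rating is omitted by aggregate()'s default na.action.
by_gender <- aggregate(Ratings ~ Gender, data = neutral, FUN = mean)
by_condition <- aggregate(Ratings ~ Control, data = neutral, FUN = mean)

by_gender
by_condition
```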
names(results)[names(results) == 'Q41'] <- 'Age'
# Keep participants aged 22 or younger; which() drops rows with a missing age
results <- results[which(results$Age <= 22), ]
#### Prepare data for analysis - create columns etc.
# Isolating relevant columns
filter <- as.vector(c("Q3","Q13_1","Q49_1","Q22_1", "Q52_1", "Q55_1"))
filtered_results <- results[, filter]
fr <- mutate(filtered_results, Control = is.na(Q13_1))
fr[is.na(fr)] <- ""
fr$Q13_1 <- paste(fr$Q13_1, fr$Q52_1, sep = '')
fr$Q22_1 <- paste(fr$Q22_1, fr$Q55_1, sep = '')
fr <- subset(fr, select = -c(Q52_1, Q55_1) )
#Renaming columns to more intuitive names
names(fr)[names(fr) == 'Q13_1'] <- 'Expensive'
names(fr)[names(fr) == 'Q22_1'] <- 'Cheap'
names(fr)[names(fr) == 'Q49_1'] <- 'Neutral'
names(fr)[names(fr) == 'Q3'] <- 'Gender'
fr$Neutral <- as.character(fr$Neutral)
#Convert to long form
fr_new <- pivot_longer(fr, cols=c('Expensive','Neutral','Cheap'), names_to = "Price", values_to = "Ratings")
fr_final <- fr_new[fr_new$Price != 'Neutral',]
fr_final$Ratings <- as.numeric(as.character(fr_final$Ratings))
fr_final$Gender <- as.factor(fr_final$Gender)
fr_final$Control <- as.factor(fr_final$Control)
fr_final$Price <- as.factor(fr_final$Price)
#Mean Table
mean_table <- aggregate(fr_final$Ratings, list(fr_final$Gender, fr_final$Control, fr_final$Price), mean)
names(mean_table)[names(mean_table) == 'Group.1'] <- 'Gender'
names(mean_table)[names(mean_table) == 'Group.2'] <- 'Control'
names(mean_table)[names(mean_table) == 'Group.3'] <- 'Price'
names(mean_table)[names(mean_table) == 'x'] <- 'Ratings'
#Plot
ggplot(mean_table, aes(fill=interaction(Price, Control), y=Ratings, x=Gender)) + geom_bar(position='dodge', stat='identity') + ggtitle("Attitude Rating to Ads")
When we restrict the sample to college-age participants (18-22), we see a clearer overall trend. Both women and men preferred the more expensive ads, and both also liked the non-sexual ads more.
Overall, control ads were rated more highly than sexy ads (F(1, 196) = 7.409, p = 0.00707), which could suggest that people find sexual ads slightly more distasteful. The influence of price was also significant (F(1, 196) = 3.935, p = 0.04869), and the influence of gender was not significant (F(1, 196) = 2.268, p = 0.13365).
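The degrees of freedom for each F statistic can be read from the fitted model instead of being typed by hand. The sketch below defines a small hypothetical helper, `report_f`, that pulls them out of an `aov` summary. It is demonstrated on simulated ratings with the same layout as the overall model (200 ratings, three additive factors), not on the study data.

```r
set.seed(4)

# Simulated ratings (NOT real data): 25 per cell, 200 rows in total.
sim <- expand.grid(
  rep = seq_len(25),
  Price = c("Cheap", "Expensive"),
  Gender = c("Female", "Male"),
  Control = c("FALSE", "TRUE"),
  stringsAsFactors = TRUE
)
sim$Ratings <- sample(1:7, size = nrow(sim), replace = TRUE)

fit <- aov(Ratings ~ Price + Gender + Control, data = sim)

# Format one term of an aov fit as "F(df1, df2) = ..., p = ...".
report_f <- function(fit, term) {
  tab <- summary(fit)[[1]]
  rownames(tab) <- trimws(rownames(tab))  # row labels are space-padded
  sprintf("F(%d, %d) = %.3f, p = %.4f",
          as.integer(tab[term, "Df"]),
          as.integer(tab["Residuals", "Df"]),
          tab[term, "F value"],
          tab[term, "Pr(>F)"])
}

report_f(fit, "Control")
```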
The original study concluded that women in the sexy group rated expensive and cheap ads significantly differently. However, as seen in both the graph and the statistical analysis in this study, price does not significantly influence ratings for sexy group women (F(1, 36) = 0.56, p = 0.459). This means that, contrary to SET and the original study results, women rated sexy ads with high and low prices relatively equally.
On the other hand, in the original study and according to SET, we shouldn’t see a difference in attitude rating between different prices for men. However, in my replication, men in the sexy condition preferred the expensive ads (F(1, 60) = 5.157, p = 0.0267). The test I ran for the control condition (F(1, 104) = 4.756, p = 0.0314) has the same residual degrees of freedom as the full male sample, so it should be read as a price effect across all male participants and not specifically within the control group. Taken together, these results may suggest that men simply prefer ads that promote more expensive products. Men also rated control ads significantly higher than sexy ads (F(1, 103) = 16.514, p = 9.45e-05).
As seen in the results above, I was not able to reproduce the results the author originally produced. These results raise questions about several different concepts:
First of all, it’s been almost a decade since this study was carried out. I wonder if women have become more sexually liberated since the time the original study was carried out. Change in attitude toward sex might change how people view the relationship between sex and value and influence the outcome of this study and explain the discrepancies we encountered.
Secondly, I wonder if people simply preferred certain background images over others, introducing confounding factors. Since the original study was extremely unclear about the images used (especially in terms of whether different price levels used the same image), and I had no luck contacting the original author, I used similar but different images for different price levels for both conditions (I got the initial feedback that people couldn’t tell the difference between ads and began to simply look for the differences). I wonder if certain background images are simply more aesthetically pleasing to people, causing the rating to be higher. This means that people may have rated certain images higher/lower for different reasons than the sex/price aspect.
Thirdly, I wonder if this experiment even accurately represents the concept being tested. From the feedback I got from those who took the survey, some didn’t even notice the price difference (I used the same font size as the example ad), so I wonder if price even factored into people’s decisions about whether or not they liked the ad. Perhaps we need a better experimental design to truly be able to present sex and value at the same time without being too obvious.
Of course, another factor that could have influenced the results is that participants on Prolific may not have paid enough attention to answer the questions truthfully. This study doesn’t use attention checks, so it’s hard to tell if the participants followed instructions and gave thoughtful answers.
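One option for a future run would be to treat the digit-recall task as a rough attention check. The sketch below scores how many digits each participant recalled in the correct position and flags low scorers. Everything in it is an assumption for illustration: the data are made up, the column names (`target_digits`, `recalled_digits`) are hypothetical, and the cut-off of 5 correct digits is arbitrary and not taken from the original study.

```r
# Made-up recall responses (NOT real data).
responses <- data.frame(
  id = 1:5,
  target_digits = rep("4829173", 5),
  recalled_digits = c("4829173", "4829137", "1234567", "482917", "4820000"),
  stringsAsFactors = FALSE
)

# Count digits recalled in the correct position.
digits_correct <- function(recalled, target) {
  r <- strsplit(recalled, "")[[1]]
  g <- strsplit(target, "")[[1]]
  n <- min(length(r), length(g))
  sum(r[seq_len(n)] == g[seq_len(n)])
}

responses$n_correct <- mapply(
  digits_correct,
  responses$recalled_digits,
  responses$target_digits,
  USE.NAMES = FALSE
)

# Arbitrary, hypothetical threshold: at least 5 of 7 digits in place.
responses$attentive <- responses$n_correct >= 5
responses
```

Any such exclusion rule would need to be fixed before data collection so that it cannot be tuned to the results.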
The reproduction is successful in the sense that I was able to recreate a similar study and produce results from the raw data. It is not as successful in the sense that I obtained very different results from the original study. As described above, many factors may have contributed to this, including a shift in values, method differences in the replication, and the efficacy of the study design. I believe that for the future, it would be worth reproducing a study where the original author is able to provide more insight into the materials and analysis methods used, so that we could better understand what went well and what went wrong and, if there are any discrepancies, what might have led to them. I realized that without the details of the original paradigm, it is quite hard to gauge how well the replication was carried out.