library(tidyverse)
library(truncnorm)
library(scales)
suitors <- read.csv("C:/Users/Alan/Documents/suitors.csv")

Read in data currently contains only suitor numbers, which have been anonymised to protect the identities of various men who have sent messages. Only the identity of the study’s author has not been masked, for transparency purposes.

For each suitor, data was collected about the number of messages sent and the number of those messages that were creepy. However given that the total number of sent messages varies from 4 to 997, the mere number of creepy messages is unimportant. Instead, we must obtain a proportion of creepy messages.

suitors$percentcreepy <- suitors$Creepy/suitors$Messages

Attractiveness data for each potential suitor was collected via mechanical turk. For each suitor three random pictures were chosen and rated by 10 random participants. Each mechanical turker gave ratings for 50 random suitors. No turker gave ratings for multiple pictures from more than one suitor. Overall, inter-rater reliability was very high, especially for suitor Alan Nielsen, for whom there was no variation in attractiveness rating.

In addition to attractiveness data, an additional set of mechanical turk workers was recruited to rate the suitability of candidates for a date. These ratings were provided by workers who were provided with the dating profiles of both the study subject (Whitney McLellan) and the target suitor.

All ratings were provided on a 10 point scale.

suitors$TrueAttractiveness <- c(rtruncnorm(n= 142, a= 1, b= 10, mean= 5, sd= 1.5), 8)
suitors$TrueCompatibility <- c(rtruncnorm(n= 142, a= 1, b= 10, mean= 6, sd= 1.5), 10)

Below are some outputs of the basic data collected from raters.

ggplot(suitors, aes(x=Messages, y=TrueAttractiveness)) +
  geom_point(size=2, shape=23)+
  geom_text(aes(label= Suitor),size = 2, hjust = 1) +
  geom_smooth(method=lm)

cor.test(suitors$Messages, suitors$TrueAttractiveness)
## 
##  Pearson's product-moment correlation
## 
## data:  suitors$Messages and suitors$TrueAttractiveness
## t = -0.66808, df = 141, p-value = 0.5052
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.2183095  0.1089795
## sample estimates:
##         cor 
## -0.05617396

There was a slight negative correlation between Number of Messages sent and Attractiveness (r= -.10), but it was not statistically significant (t141= 1.23, p0 0.22).

ggplot(suitors, aes(x=Messages, y=TrueCompatibility)) +
  geom_point(size=2, shape=23)+
  geom_text(aes(label= Suitor),size = 2, hjust = 1) +
  geom_smooth(method=lm)

cor.test(suitors$Messages, suitors$TrueCompatibility)
## 
##  Pearson's product-moment correlation
## 
## data:  suitors$Messages and suitors$TrueCompatibility
## t = 1.5072, df = 141, p-value = 0.134
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.0390406  0.2841896
## sample estimates:
##       cor 
## 0.1259149

There was a slight positive correlation between Number of Messages sent and Compatibility (r= 0.05), but it was not statistically significant (t141= 0.59, p= 0.56).

The lack of correlation between these variables led the study authors to question fundamental assumptions about the relationship between interest and communicative contact. However, at least two alternative hypotheses are possible:

  1. The study subject is too nice, and cares neither about attractiveness nor compatibility, so simply responds to all messages or
  2. Correlations of this type will only be meaningful if they are compared to actual interaction from the study subject - i.e. rather than the number of messages sent by a suitor being relevant, the percentage of messages responded to in a meaningful fashion by the subject are more relevant.

To test for the possibility of H2, we again recruited raters from Amazon Mechanical Turk to rate the meaningfulness of responses. Data was collected on a scale from 1-10, but inter-rater reliability was very high and the data seemed to be heavily binned, so each message was instead given a binary value as either meaningful (1) or not meaningful (0). For each participant, the proportion of Messages that received meaningful responses was calculated.

suitors$Meaningful1 <-  c(rtruncnorm(n= 142, a= 0, b= .9, mean= .8, sd= .15), 0.95)

suitors$Meaningful <- suitors$Meaningful1 * (suitors$TrueAttractiveness/10) * (suitors$TrueCompatibility/10)

Now we can look again at the same plots

ggplot(suitors, aes(x=Meaningful, y=TrueAttractiveness)) +
  geom_point(size=2, shape=23)+
  geom_text(aes(label= Suitor),size = 2, hjust = -0.5) +
  geom_smooth(method=lm) +
  labs(x="Proportion of Meaningful Responses", y="Attractiveness Rating") + 
  xlim(0,.8) +
  ylim(0,10)
## Warning: Removed 2 rows containing missing values (geom_smooth).

cor.test(suitors$Meaningful, suitors$TrueAttractiveness)
## 
##  Pearson's product-moment correlation
## 
## data:  suitors$Meaningful and suitors$TrueAttractiveness
## t = 11.261, df = 141, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5906888 0.7657693
## sample estimates:
##      cor 
## 0.688117

This new data provided a much more robust positive correlation (r= 0.714) which was statistically significant (t141= 12.1, p <0.001) - the study subject (Whitney McLellan) was more likely to respond meaningfully to Attractive suitors

ggplot(suitors, aes(x=Meaningful, y=TrueCompatibility)) +
  geom_point(size=2, shape=23)+
  geom_text(aes(label= Suitor),size = 2, hjust = -0.5) +
  geom_smooth(method=lm) +
  labs(x="Proportion of Meaningful Responses", y="Compatibility Rating") + 
  xlim(0,.8) +
  ylim(0,10)
## Warning: Removed 13 rows containing missing values (geom_smooth).

cor.test(suitors$Meaningful, suitors$TrueCompatibility)
## 
##  Pearson's product-moment correlation
## 
## data:  suitors$Meaningful and suitors$TrueCompatibility
## t = 9.6076, df = 141, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.518374 0.718921
## sample estimates:
##       cor 
## 0.6290007

There was also a significant positive correlation between proportion of meaningful responses and compatibility ratings (r= 0.57, t141= 8.27, p<0.001) - The study subject was more likely to respond meaningfully to suitors who were higher in rated compatibility (despite the fact that raters of compatibility were not privy to the content of exchanged messages)

The results of these two correlations might suggest that the study subject is more interested in attractiveness than in potential compatibility - however more complete statistical models revealed high colinearity between these two factors. They also revealed that more attractive suitors were more likely to send a higher proportion of creepy messages (Chi squared = 16.23, p= 0.007), which compatibility raters were not aware of. In the interest of brevity, further details of these analyses are not included here - they can however be found (along with all data) in the github repository for this study.


From this information, we attempted to calculate an overall probability of a date/hangout. These preliminary models were based on research conducted over the last decade from online dating and psychometic profiling of the study subject (Whitney McLellan).

A number of alernative models were created, but given that the predictions of those models were all similar, here we present the simplest model, shown below, where A= Attractiveness, C = Compatibility, M = Proportion of Meaningful Responses, and XXX = Proportion of Creepy Messages.

\[\sqrt{A^2 + C^2 + (1-M)^2 + (1-XXX)^2}\] In the interest of showing only suitable candidates, the graphs below show only participants with a dating probability higher than 60% according to our model

suitors$DateProb <- suitors$TrueAttractiveness^2 + suitors$TrueCompatibility^2 + (1-suitors$Meaningful)^2 + (1-suitors$percentcreepy)^2

suitors$DateProb <- rescale(suitors$DateProb, to = c(0,100))

suitors2 <- subset(suitors, DateProb > 59)

ggplot(data=suitors2, aes(x=Suitor, y=DateProb)) +
  geom_bar(stat="identity") +
  labs(x="Suitor", y="Predicted Date Probability")

Our analyses show a clear outlier in the data- suitor Alan Nielsen (the author of this study), who our model predicts a 100% probability of a date for. However, the authors recognize that this data is based on a number of assumptions and imperfect data. Further field research will be required to validate the predictions of our model, as well as the attractiveness, compatibility, and other ratings required to make these model predictions. Nonetheless, the authors are confident in their predictions and the feasibility of this approach for future work in the field of weird, weird ways to try to get a date with an excellent human female.