Background

Purpose

We recently asked participants to write response emails to a hypothetical landlord who wants to increase their rent. Then, we piped these responses to another set of participants and asked them to rate these emails on a number of dimensions.

Original scenario

Participants were given the following scenario:

1. You’ve been living in this apartment for almost a year at $2,100 a month.
2. You like the apartment you’re in and would generally want to stay, but not at any price.
3. You looked into some online rental boards and even went to a few open houses, but you were not happy with what you saw.

Imagine that you’re in this situation and you receive this email from the landlord:

Dear tenant,

Your annual lease is coming to an end in two months. As preparation for the renewal of the lease, we want to reach out and ask if you would like to stay or move out. If you were to stay, the rent will increase from 2,100 Dollars to 2,450 Dollars a month.

Would you like to renew your lease?

Please let us know as soon as possible!

Sincerely,
Management

Original manipulation

Before answering the email, you reach out to a friend of yours who recently negotiated their rent down. This is what they said:

My main piece of advice for you with your management company is this: Try to understand your [opponent/partner]’s motives going into this negotiation. Is your [opponent/partner] trying to play hardball with you, like bullying or intimidating you? Or is your [opponent/partner] being reasonable, just trying to solve the problems that they’re facing? Your answer can shape how you respond to your [opponent/partner] and which moves might be effective in achieving your goal.

Raters

Raters saw the following:

We have previously asked a set of participants on Connect to respond to a hypothetical email from a landlord who is raising their rent. In the following screens, you will be asked to:

Read background information about the scenario that our participants were in.
Read the email that the landlord wrote to our participants in this fictitious scenario.
Read ten randomly-selected response emails (no more than a paragraph each) that participants wrote.
And, finally, indicate your perceptions about each of the response emails.

They were then told what the original participants did and completed two attention checks. One of the checks turned out to be too hard, so I excluded only raters who failed the other one (passcheck_1 in the code below).

Inter-rater reliability

First, let’s examine inter-rater reliability.

Because the ratings are treated as continuous and the number of raters per email varies a lot (from 1 to 6), we’ll compute an intraclass correlation coefficient (ICC) (see here: https://stats.stackexchange.com/questions/263217/can-you-run-intraclass-correlations-with-different-raters-and-different-numbers).

Leaving the code visible below in case we want to check the parameters of the ICC.

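library(dplyr)
library(agreement)  # assumed source of dim_icc() (GitHub: jmgirard/agreement)

# Keep raters who passed the retained attention check, then reshape to
# one row per (email, rater) pair holding the competitiveness rating
comp <- df_ratings %>% 
  filter(passcheck_1 == 1) %>% 
  dplyr::select(PID_og,PID,comp) %>% 
  rename(subject = PID_og,  # original participant who wrote the email
         rater = PID)       # participant who rated the email

# ICC for a single rater's score, consistency definition, model "2A",
# with 2000 bootstrap resamples for the confidence interval
icc_2A_A_1 <- 
  dim_icc(
    comp, 
    model = "2A", 
    type = "consistency", 
    unit = "single",
    object = subject, 
    rater = rater, 
    score = comp, 
    bootstrap = 2000
  )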
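As a rough cross-check on the dim_icc() output, the sketch below computes a one-way random-effects ICC(1) from a plain ANOVA, using the usual adjusted group size k0 for unbalanced designs. This is a swapped-in, simpler model (it treats raters as interchangeable and ignores rater main effects), so it need not match the model "2A" consistency estimate exactly. The function name icc_oneway() is hypothetical, and it assumes there are no missing scores.

```r
# One-way random-effects ICC(1) for unbalanced designs (ANOVA estimator).
# Assumes no missing scores; NOT the same model as dim_icc(model = "2A").
icc_oneway <- function(score, subject) {
  subject <- factor(subject)
  tab <- anova(lm(score ~ subject))
  msb <- tab["subject", "Mean Sq"]    # between-email mean square
  msw <- tab["Residuals", "Mean Sq"]  # within-email (between-rater) mean square
  n_i <- as.vector(table(subject))    # number of raters per email
  N <- sum(n_i)
  a <- length(n_i)
  # Adjusted average group size for unequal numbers of raters per email
  k0 <- (N - sum(n_i^2) / N) / (a - 1)
  (msb - msw) / (msb + (k0 - 1) * msw)
}

# Tiny hand-checkable example: two emails, two ratings each
icc_oneway(score = c(1, 3, 5, 7), subject = c("A", "A", "B", "B"))  # 7/9, about 0.78
```

With the data frame built above, it would be called as icc_oneway(comp$comp, comp$subject).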

Inter-rater ICC (single rater, consistency): 0.146831. Hmm, that’s not great. But it’s understandable, given how subjective and hard to interpret the emails are. We might need more people to rate each email and then use the mean rating, accepting that we can’t eliminate the rater variance.
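To get a sense of how many raters "more people" would mean, the Spearman-Brown formula projects the reliability of the mean of k ratings from the single-rater ICC: ICC_k = k * ICC_1 / (1 + (k - 1) * ICC_1). This is a sketch that assumes the formula applies to this ICC and that additional raters behave like the current ones; the 0.70 target is an arbitrary illustration, and the helper names are hypothetical.

```r
# Single-rater ICC reported above
icc1 <- 0.146831

# Spearman-Brown: projected reliability of the mean of k ratings
spearman_brown <- function(icc1, k) k * icc1 / (1 + (k - 1) * icc1)

# Smallest number of raters whose mean reaches a target reliability
raters_needed <- function(icc1, target) {
  ceiling(target * (1 - icc1) / (icc1 * (1 - target)))
}

round(spearman_brown(icc1, c(1, 3, 5, 10)), 2)
raters_needed(icc1, 0.70)  # 14
```

Under these assumptions, even 10 raters per email would put the mean rating’s reliability only around .63, so the current 1 to 6 raters per email are well short of a conventional .70.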

Analysis

Perceived competitiveness

How cooperative or competitive do you think this participant’s response is? (1 = Extremely Cooperative to 7 = Extremely Competitive)

condition  mean      sd
opponent   4.167663  1.398019
partner    4.029853  1.344436


t-test: t(187.32) = 0.69, p = .490, d = 0.10
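The fractional degrees of freedom suggest a Welch two-sample t-test (unequal variances), which is what R’s t.test() does by default. The sketch below spells out the Welch statistic, the Welch-Satterthwaite df, and a pooled-SD Cohen’s d, and checks them against t.test() on simulated 1 to 7 ratings. The simulated data and the function name welch_with_d() are illustrative assumptions, not the study data or the exact code behind the numbers above.

```r
# Welch t-test plus pooled-SD Cohen's d, computed by hand
welch_with_d <- function(x, y) {
  n1 <- length(x); n2 <- length(y)
  v1 <- var(x); v2 <- var(y)
  se <- sqrt(v1 / n1 + v2 / n2)
  t_stat <- (mean(x) - mean(y)) / se
  # Welch-Satterthwaite approximation for the degrees of freedom
  df <- (v1 / n1 + v2 / n2)^2 /
    ((v1 / n1)^2 / (n1 - 1) + (v2 / n2)^2 / (n2 - 1))
  p <- 2 * pt(-abs(t_stat), df)
  # Cohen's d standardized by the pooled within-group SD
  sd_pooled <- sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
  list(t = t_stat, df = df, p = p, d = (mean(x) - mean(y)) / sd_pooled)
}

# Simulated 1-7 ratings standing in for the two conditions (hypothetical)
set.seed(1)
opponent <- sample(1:7, 95, replace = TRUE)
partner <- sample(1:7, 97, replace = TRUE)

res <- welch_with_d(opponent, partner)
ref <- t.test(opponent, partner)  # Welch by default (var.equal = FALSE)
```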

OK, not much here. But that’s a general question and pretty hard to answer, so there’s probably just a lot of noise. If there is an effect, we’d need a much bigger sample to detect it.
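To put "much bigger" into numbers, base R’s power.t.test() gives the per-group sample size for a two-sample t-test. This sketch assumes the true effect is about as large as the observed d = 0.10, a two-sided alpha of .05, and 80% power, and it treats each email’s mean rating as the outcome; noisier ratings would attenuate the effect and push the required n higher still.

```r
# Per-condition sample size to detect d = 0.10 (assumed true effect)
pw <- power.t.test(delta = 0.10, sd = 1, sig.level = 0.05, power = 0.80,
                   type = "two.sample", alternative = "two.sided")
ceiling(pw$n)  # emails needed per condition
```

This comes out to well over a thousand emails per condition, far beyond the roughly 95 per condition implied by the t-test’s degrees of freedom.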

Perceptions of the email

To what extent do you believe that this participant used the following method in their response email? (1 = Not at All to 5 = Extremely)

raters_deceit: They misled the management company about how they feel about the apartment and their other options
raters_demeanor: They used blunt or tough language toward the management company
raters_terms: They suggested or threatened that they might walk away and find an apartment somewhere else

item             condition  mean      sd
raters_deceit    opponent   1.466354  0.6013026
raters_deceit    partner    1.426075  0.6836900
raters_demeanor  opponent   2.414750  0.9399230
raters_demeanor  partner    2.342989  0.9506758
raters_terms     opponent   2.727018  0.9857057
raters_terms     partner    2.586748  1.2109822


raters_deceit: t(185.88) = 0.43, p = .667, d = 0.06
raters_demeanor: t(187.98) = 0.52, p = .601, d = 0.08
raters_terms: t(181.96) = 0.88, p = .382, d = 0.13
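As a quick consistency check, the reported effect sizes can be recomputed from the summary tables alone. This sketch pools the two SDs with equal weight, which assumes the two conditions have roughly equal sample sizes; the helper name d_from_summary() is hypothetical.

```r
# Cohen's d from summary statistics, pooling the two SDs with equal weight
# (assumes roughly equal group sizes)
d_from_summary <- function(m1, m2, s1, s2) (m1 - m2) / sqrt((s1^2 + s2^2) / 2)

# Means and SDs copied from the tables above (opponent vs. partner)
summ <- data.frame(
  item   = c("raters_competitive", "raters_deceit", "raters_demeanor", "raters_terms"),
  m_opp  = c(4.167663, 1.466354, 2.414750, 2.727018),
  m_par  = c(4.029853, 1.426075, 2.342989, 2.586748),
  sd_opp = c(1.398019, 0.6013026, 0.9399230, 0.9857057),
  sd_par = c(1.344436, 0.6836900, 0.9506758, 1.2109822)
)
summ$d <- with(summ, d_from_summary(m_opp, m_par, sd_opp, sd_par))
summ[, c("item", "d")]
```

The results match the reported d values to two decimals, which is consistent with (though not proof of) this pooling.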

No significant effects. But all three differences are in the hypothesized direction (higher in the opponent condition). There’s just so much noise in these open responses that effects this small are hard to detect, so I still find these results encouraging, fwiw.

Correlation matrix

Let’s see how strongly the self-reports, the ChatGPT coding, and the human coding correlate with one another, along with the raters’ competitiveness item.

pretask_deceit: [SELF-REPORT] Misleading my [opponent/partner] about things like my readiness to pursue alternative options
pretask_demeanor: [SELF-REPORT] Using tough or aggressive language toward my [opponent/partner]
pretask_terms: [SELF-REPORT] Standing my ground on the current rent and resisting concessions that would increase it
gpt_deceit: [GPT-CODING] Please rate the extent to which my response is misleading my counterpart about how I feel about the apartment and my other options on a 1 to 5 scale where 1 means it’s not misleading at all and 5 means it’s extremely misleading
gpt_demeanor: [GPT-CODING] Please rate the extent to which my response uses blunt or tough language toward my counterpart on a 1 to 5 scale where 1 means it’s not at all blunt or tough and 5 means it’s extremely blunt or tough
gpt_terms: [GPT-CODING] Please rate the extent to which my response suggests or threatens that I might walk away and find an apartment elsewhere on a 1 to 5 scale where 1 means it’s not at all suggestive or threatening that I’ll walk away and 5 means it’s extremely suggestive or threatening that I’ll walk away
raters_competitive: [HUMAN-CODING] How cooperative or competitive do you think this participant’s response is?
raters_deceit: [HUMAN-CODING] They misled the management company about how they feel about the apartment and their other options
raters_demeanor: [HUMAN-CODING] They used blunt or tough language toward the management company
raters_terms: [HUMAN-CODING] They suggested or threatened that they might walk away and find an apartment somewhere else
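The cross-method comparison for a single construct can be pulled out of the full correlation matrix by matching the shared column suffix. This is a sketch on simulated toy data: it assumes one row per email, with the human ratings already averaged across raters, and the data frame toy and helper construct_cor() are hypothetical.

```r
# Correlations among all columns ending in "_<construct>"
# (e.g. pretask_terms, gpt_terms, raters_terms)
construct_cor <- function(df, construct) {
  cols <- grep(paste0("_", construct, "$"), names(df), value = TRUE)
  cor(df[, cols], use = "pairwise.complete.obs")
}

# Hypothetical toy data: "terms" shares a latent signal across methods,
# "demeanor" is pure noise
set.seed(42)
n <- 50
latent <- rnorm(n)
toy <- data.frame(
  pretask_terms    = latent + rnorm(n),
  gpt_terms        = latent + rnorm(n),
  raters_terms     = latent + rnorm(n),
  pretask_demeanor = rnorm(n),
  gpt_demeanor     = rnorm(n),
  raters_demeanor  = rnorm(n)
)

round(construct_cor(toy, "terms"), 2)
round(construct_cor(toy, "demeanor"), 2)
```

Using pairwise-complete observations keeps emails that are missing one method’s score, at the cost of each correlation resting on a slightly different subset.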

Very interesting. The terms items seem to be measured fairly consistently across these methods; demeanor and deceit, not so much.