Are Happy People Helpful People?

Author

Annet Isa

Source: NY Times

Introduction

This project asks whether a person in a good mood is more likely to be helpful. A 1975 study examined whether a positive mood led to increased optimism and to an increase in helpfulness. The study leads, Dr. Paula F. Levin and Dr. Alice M. Isen, designed an experiment to see how often a stranger would mail a sealed envelope found in a phone booth. Sometimes the envelope was stamped. Sometimes the subject also found a dime in the coin return slot of the telephone. At the time of the study, a first-class stamp cost $0.08, and $0.10 would cover the price of a phone call.

The dataset was found on openintro.org and is titled mail_me (1). Additional background information came from the 1975 article in Sociometry in which Levin and Isen discuss their study (2). Because Levin and Isen state in that article that “there were no differences between the sexes on this measure”, I am not including gender in my analyses (4).

Data Analysis

The existing data frame consists of 4 character columns. For further analysis, I will duplicate the data frame and convert the found_coin, mailed_letter, and stamped columns to numeric 0/1 indicators. I will also explore the summary statistics for the dataset and create a simple visualization of the data.

#load relevant libraries
library(ggplot2)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ lubridate 1.9.3     ✔ tibble    3.2.1
✔ purrr     1.0.2     ✔ tidyr     1.3.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
#import csv file
mail_me <- read_csv("C:/Users/bombshellnoir/Dropbox (Personal)/00000 Montgomery College/DATA 101/Projects/FINAL PROJECT/mail_me.csv")
Rows: 42 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): stamped, found_coin, gender, mailed_letter

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#duplicate data set, convert observations to numeric
num_mail_me <- mail_me %>%
  mutate(
    mailed_letter = case_match(
      mailed_letter,
      "no" ~ 0,
      "yes" ~ 1),
    found_coin = case_match(
      found_coin,
      "no_coin" ~ 0,
      "coin" ~ 1),
    stamped = case_match(
      stamped,
      "no" ~ 0,
      "yes" ~ 1)
)
head(num_mail_me)
# A tibble: 6 × 4
  stamped found_coin gender mailed_letter
    <dbl>      <dbl> <chr>          <dbl>
1       0          1 male               1
2       0          1 male               1
3       0          1 male               1
4       0          1 male               1
5       0          1 male               0
6       0          1 female             1
#perform summary statistics
summary(num_mail_me)
    stamped         found_coin        gender          mailed_letter   
 Min.   :0.0000   Min.   :0.0000   Length:42          Min.   :0.0000  
 1st Qu.:0.0000   1st Qu.:0.0000   Class :character   1st Qu.:0.0000  
 Median :1.0000   Median :0.0000   Mode  :character   Median :1.0000  
 Mean   :0.5714   Mean   :0.4524                      Mean   :0.5238  
 3rd Qu.:1.0000   3rd Qu.:1.0000                      3rd Qu.:1.0000  
 Max.   :1.0000   Max.   :1.0000                      Max.   :1.0000  

Because each column is coded 0/1, each mean is the proportion of "yes" outcomes. The summary shows that although a coin was found less than half the time (mean = 0.4524), the envelope was mailed more than half the time (mean = 0.5238).

# how many letters were stamped?
table(mail_me$stamped)

 no yes 
 18  24 
# how many letters were mailed?
table(mail_me$mailed_letter)

 no yes 
 20  22 
# how many found coins?
table(mail_me$found_coin)

   coin no_coin 
     19      23 
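
As a quick check on the summary above, the mean of a 0/1 column is the same thing as the proportion of 1s. The sketch below rebuilds a 0/1 vector from the mailed_letter counts shown in the table (22 "yes", 20 "no") rather than reading mail_me.csv; the name `mailed_indicator` is illustrative and not part of the original analysis.

```r
# Rebuild a 0/1 indicator from the reported counts: 22 "yes" (1) and 20 "no" (0)
mailed_indicator <- rep(c(1, 0), times = c(22, 20))

# The mean of a 0/1 indicator ...
indicator_mean <- mean(mailed_indicator)

# ... equals the share of "yes" responses
share_yes <- sum(mailed_indicator == 1) / length(mailed_indicator)

round(c(mean = indicator_mean, share = share_yes), 4)
```

Both values round to 0.5238, the mean that summary() reports for mailed_letter.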

For the simple visualization, I will create grouped bar charts. For consistency, I have to recode the observations for the found_coin variable to match stamped and mailed_letter.

mail_me_2 <- mail_me %>%
  mutate(
    found_coin = case_match(
      found_coin,
      "no_coin" ~ "no",
      "coin" ~ "yes")
  ) %>%
  select(stamped, found_coin, mailed_letter)

I also need to convert the data set to long format.

long_mail_me_2 <- mail_me_2 %>%
  pivot_longer(cols = everything(), names_to = "variable", values_to = "response")

bar1 <- ggplot(long_mail_me_2, aes(x = variable, fill = response)) +
  geom_bar(position = "dodge") +
  labs(title = "Response Frequency For Coin Study",
       x = "Variable",
       y = "Count",
       caption = "Source: OpenIntro",
       fill = "Response")

bar1

Statistical Analysis

My null hypothesis is that a stamped letter found without a dime is as likely to be mailed as an unstamped letter found with a dime. My alternative hypothesis is that a stamped letter found without a dime is more likely to be mailed than an unstamped letter found with a dime.

PS - proportion of yes-stamp, no-dime letters mailed
PD - proportion of no-stamp, yes-dime letters mailed

H0: PS = PD
HA: PS > PD

alpha/significance level = 0.05

count_yes_stamps_no_coin <- mail_me_2 %>%
  filter(stamped == "yes", found_coin =="no") %>%
  summarise(count = n())

count_yes_stamps_no_coin
# A tibble: 1 × 1
  count
  <int>
1    13
total_yes_stamps_no_coin <- 13

mailed_count_yes_stamps_no_coin <- mail_me_2 %>%
  filter(stamped == "yes", found_coin =="no", mailed_letter == "yes") %>%
  summarise(count = n())

mailed_count_yes_stamps_no_coin
# A tibble: 1 × 1
  count
  <int>
1     4
tm_yes_stamps_no_coin <- 4

count_no_stamps_yes_coin <- mail_me_2 %>%
  filter(stamped == "no", found_coin =="yes") %>%
  summarise(count = n())

count_no_stamps_yes_coin
# A tibble: 1 × 1
  count
  <int>
1     8
total_no_stamps_yes_coin <- 8

mailed_count_no_stamps_yes_coin <- mail_me_2 %>%
  filter(stamped == "no", found_coin =="yes", mailed_letter == "yes") %>%
  summarise(count = n())

mailed_count_no_stamps_yes_coin
# A tibble: 1 × 1
  count
  <int>
1     7
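
The four counts above are typed back in by hand after each filter. A possible alternative, sketched here in base R, is to cross-tabulate group by outcome once and pull the successes and totals from the table. The data frame `two_groups` and its columns `group` and `mailed` are illustrative: it is rebuilt from the counts reported above (13 letters with 4 mailed, 8 letters with 7 mailed) and is not read from mail_me.csv.

```r
# Rebuild only the two groups of interest from the counts reported above
# (illustrative data frame; not read from mail_me.csv)
two_groups <- data.frame(
  group  = rep(c("stamp_no_coin", "no_stamp_coin"), times = c(13, 8)),
  mailed = c(rep(c("yes", "no"), times = c(4, 9)),   # stamp, no dime
             rep(c("yes", "no"), times = c(7, 1)))   # no stamp, dime
)

# Cross-tabulate group by outcome in one step
counts <- table(two_groups$group, two_groups$mailed)
counts

# Successes (mailed) and group sizes, in the order used for the test
group_order  <- c("stamp_no_coin", "no_stamp_coin")
mailed_yes   <- counts[group_order, "yes"]
group_totals <- rowSums(counts)[group_order]

# Sample proportions mailed in each group
mailed_yes / group_totals
```

The vectors `mailed_yes` and `group_totals` could then be passed directly as the first two arguments of prop.test.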

Prop.Test

tm__no_stamps_yes_coin <- 7

#correct = FALSE disables the continuity correction, which can be a concern with such a small sample size
results <- prop.test(
  c(tm_yes_stamps_no_coin, tm__no_stamps_yes_coin),
  c(total_yes_stamps_no_coin, total_no_stamps_yes_coin), alternative = "greater",
  correct = FALSE
)
Warning in prop.test(c(tm_yes_stamps_no_coin, tm__no_stamps_yes_coin),
c(total_yes_stamps_no_coin, : Chi-squared approximation may be incorrect
results

    2-sample test for equality of proportions without continuity correction

data:  c(tm_yes_stamps_no_coin, tm__no_stamps_yes_coin) out of c(total_yes_stamps_no_coin, total_no_stamps_yes_coin)
X-squared = 6.3899, df = 1, p-value = 0.9943
alternative hypothesis: greater
95 percent confidence interval:
 -0.8524793  1.0000000
sample estimates:
   prop 1    prop 2 
0.3076923 0.8750000 
#with continuity correction (the default, correct = TRUE)
results2 <- prop.test(
  c(tm_yes_stamps_no_coin, tm__no_stamps_yes_coin),
  c(total_yes_stamps_no_coin, total_no_stamps_yes_coin),
  alternative = "greater"
)
Warning in prop.test(c(tm_yes_stamps_no_coin, tm__no_stamps_yes_coin),
c(total_yes_stamps_no_coin, : Chi-squared approximation may be incorrect
results2

    2-sample test for equality of proportions with continuity correction

data:  c(tm_yes_stamps_no_coin, tm__no_stamps_yes_coin) out of c(total_yes_stamps_no_coin, total_no_stamps_yes_coin)
X-squared = 4.3179, df = 1, p-value = 0.9811
alternative hypothesis: greater
95 percent confidence interval:
 -0.9534408  1.0000000
sample estimates:
   prop 1    prop 2 
0.3076923 0.8750000 

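Wow. That is an incredibly high p-value. I do NOT reject the null hypothesis. The sample proportions (0.31 for yes-stamp/no-dime versus 0.88 for no-stamp/yes-dime) point in the opposite direction from my alternative hypothesis, which is why the one-sided p-value is so large.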

But what if I reverse the alternative hypothesis? What if I predict that no-stamp/yes-dime letters are more likely to be mailed?

#correct = FALSE disables the continuity correction, which can be a concern with such a small sample size
results3 <- prop.test(
  c(tm_yes_stamps_no_coin, tm__no_stamps_yes_coin),
  c(total_yes_stamps_no_coin, total_no_stamps_yes_coin), alternative = "less",
  correct = FALSE
)
Warning in prop.test(c(tm_yes_stamps_no_coin, tm__no_stamps_yes_coin),
c(total_yes_stamps_no_coin, : Chi-squared approximation may be incorrect
results3

    2-sample test for equality of proportions without continuity correction

data:  c(tm_yes_stamps_no_coin, tm__no_stamps_yes_coin) out of c(total_yes_stamps_no_coin, total_no_stamps_yes_coin)
X-squared = 6.3899, df = 1, p-value = 0.005738
alternative hypothesis: less
95 percent confidence interval:
 -1.0000000 -0.2821361
sample estimates:
   prop 1    prop 2 
0.3076923 0.8750000 

Look at the tiny p-value!
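
The "greater" and "less" p-values (0.9943 and 0.005738) come from the same test statistic and add up to 1. A minimal sketch of the uncorrected two-proportion z-test worked by hand, using only the counts above (4 of 13 and 7 of 8), shows this. The variable names are illustrative, and this is meant as a check on the printed output rather than a description of how prop.test is implemented internally.

```r
# Counts from the analysis above
mailed <- c(4, 7)    # letters mailed: stamp/no dime, no stamp/dime
totals <- c(13, 8)   # letters found in each group

p_hat  <- mailed / totals             # sample proportions
p_pool <- sum(mailed) / sum(totals)   # pooled proportion under H0

# Standard error of the difference under H0, then the z statistic
se <- sqrt(p_pool * (1 - p_pool) * sum(1 / totals))
z  <- (p_hat[1] - p_hat[2]) / se

z^2                                   # compare with X-squared in the output

# One-sided p-values are the two tails of the same normal curve
p_greater <- pnorm(z, lower.tail = FALSE)   # HA: PS > PD
p_less    <- pnorm(z)                       # HA: PS < PD
c(greater = p_greater, less = p_less, total = p_greater + p_less)
```

Because z is negative here, almost all of the probability sits in the upper tail, so the "greater" test cannot reject while the "less" test does.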

If I had hypothesized that the two proportions were not equal, I would also be able to reject the null hypothesis.

#correct = FALSE disables the continuity correction, which can be a concern with such a small sample size
#no alternative is given, so prop.test defaults to a two-sided test
results4 <- prop.test(
  c(tm_yes_stamps_no_coin, tm__no_stamps_yes_coin),
  c(total_yes_stamps_no_coin, total_no_stamps_yes_coin),
  correct = FALSE
)
Warning in prop.test(c(tm_yes_stamps_no_coin, tm__no_stamps_yes_coin),
c(total_yes_stamps_no_coin, : Chi-squared approximation may be incorrect
results4

    2-sample test for equality of proportions without continuity correction

data:  c(tm_yes_stamps_no_coin, tm__no_stamps_yes_coin) out of c(total_yes_stamps_no_coin, total_no_stamps_yes_coin)
X-squared = 6.3899, df = 1, p-value = 0.01148
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.9071106 -0.2275048
sample estimates:
   prop 1    prop 2 
0.3076923 0.8750000 
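
Every prop.test call above warns that the chi-squared approximation may be incorrect. That warning is typically tied to small expected cell counts. One common alternative for a 2x2 table this small is Fisher's exact test, sketched below as a cross-check and not as part of the original analysis. The matrix `tab` is built from the counts reported above (4 mailed and 9 not mailed out of 13; 7 mailed and 1 not mailed out of 8), and its name and labels are illustrative.

```r
# Rows: the two groups; columns: mailed vs. not mailed
tab <- matrix(c(4, 9,
                7, 1),
              nrow = 2, byrow = TRUE,
              dimnames = list(group  = c("stamp_no_coin", "no_stamp_coin"),
                              mailed = c("yes", "no")))

# Expected counts if both groups shared the same mailing rate
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
round(expected, 2)

# Fisher's exact test does not rely on the chi-squared approximation
fisher_res <- fisher.test(tab)   # two-sided by default
fisher_res$p.value
```

Two of the expected counts fall below 5, which is consistent with the warning. The two-sided exact p-value comes out at roughly 0.024, still below the 0.05 significance level, so the two-sided conclusion does not appear to depend on the approximation.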

Tableau Plot

Visit this interactive plot.

Screenshot of Visualization

LINK TO TABLEAU.

Conclusion

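The statistical analysis is eye-opening. My first prop.test could not reject the null hypothesis (that there is no difference in the proportion of letters mailed in the yes-stamp/no-dime and no-stamp/yes-dime scenarios), even though the sample proportions differ sharply, because my one-sided alternative pointed in the wrong direction. As I gain familiarity with statistical testing, I will make it a habit to think carefully about the direction of my alternative hypothesis before running a test, and to use a two-sided alternative when I am unsure, since choosing the direction after seeing the data overstates the evidence.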

I am surprised people in 1975 were more likely to buy a stamp to mail a letter than drop an already-stamped letter in a mailbox.

An interesting aspect of this vintage 1975 study is that the study leads expected someone to be happy or to be in a better mood if they found a dime. Between inflation (a 1974 dime was worth $0.57 in 2023, when a first-class stamp cost $0.66 (3)) and the lack of payphones in 2024, the study would be difficult to replicate today.

Citations

  1. “Influence of a Good Mood on Helpfulness” https://www.openintro.org/data/index.php?data=mail_me

  2. “Further Studies on the Effect of Feeling Good on Helping” https://www.jstor.org/stable/2786238

  3. “Inflation Calculator” https://www.minneapolisfed.org/about-us/monetary-policy/inflation-calculator

  4. “Further Studies on the Effect of Feeling Good on Helping” (page 5 of 7) https://www.jstor.org/stable/2786238