Week 9 Learning Log

Week 9 Goals

To try and finish my exploratory analysis for the verification report…no such luck

Challenges and successes

Well…I have code for all of my exploratory analysis…but I just feel like it’s all wrong and looks terrible. Feeling pretty hopeless this week! Here’s how I went…

# Load packages

library(tidyverse)
library(patchwork)
library(extrafont)
library(cowplot)
library(rstatix)
library(ggeasy)
library(ggpubr)
library(corrplot)
library(car)
library(psych)
library(BayesFactor)
library(gt)
library(jmv)

# Read data

# The data in the OSF repo is in .csv format, so I need to use the `read_csv()` function from the `reader` package. 

data_1_raw = read_csv('WTR_Comfort_S1.csv')
data_2 = read_csv('WTR_Comfort_S2.csv')
data_3 = read_csv('WTR_Comfort_S3.csv')

3a. Are there gender differences in the threshold for disgust?

My experience is that women tend to feel higher levels of disgust compared to men. This is supported in the literature where women have continually been found to have noticably higher levels of disgust compared to men, across many different types of disgust, such as insects, open sores, dirty clothing, and faeces (Curtis et al., 2004; Prokop & Fančovičová, 2010), and animals, death, hygiene, food, and sex (Haidt et al., 1994). In Tybur et al. (2020), disgust was measured on a likert scale and disgsut statements related to dog faeces, red sores, sweaty palms, mouldy food, body odor, cockroaches, and bloody cuts.

According to the codebook, the gender variable is sex (where 1 = male, 2 = female, 3 = other). The disgust scale for seeing a cockroach run across the floor is DS6. First, I checked that the data had these variables - it does. However, it’s spread across study 1-3. I wanted to see if I could somehow merge the data in these variables into one dataset by their participant number to answer this question. However, after trying some code, I realised this probably wouldn’t work because it was a between subjects design where the participants in each study were separate. I first tried merge(by=) but this threw up an error.

I used dim() to get some information about the 3 datasets so I could see if the merge worked.

dim(data_1_raw)

## [1] 504 100

dim(data_2)

## [1] 430  85

dim(data_3)

## [1] 905  68

data_1_raw has 504 rows and 100 columns, data_2 has 430 rows and 85 columns and data_3 has 905 rows and 68 columns. I then merged the data using full_join() and joined by participant. This would supposedly return all rows and all columns from all datasets in a final dataset. I then used slice to show a tibble of the new dataset. I went from (1:905) because that was the range of rows I got from dim().

total <- full_join(data_1_raw, data_2, data_3, by="participant")
total %>% slice(1:905)

## # A tibble: 504 x 184
##    participant sex.x age.x relat.x income.x poli_soc.x poli_econ.x trust_gen.x
##          <dbl> <dbl> <dbl>   <dbl>    <dbl>      <dbl>       <dbl>       <dbl>
##  1           1     2    21       1        4          3           6           9
##  2           2     2    22       2        1          2           2           5
##  3           3     1    25       2        1          4           4           4
##  4           4     2    28       1        5          7           7           8
##  5           5     1    38       1       20          2           2           2
##  6           6     1    49       1       20          3           6           7
##  7           7     2    32       1       10          4           4           8
##  8           8     2    39       1       16          6           6           7
##  9           9     2    47       2        7          1           1           7
## 10          10     2    19       2       10          4           4           3
## # … with 494 more rows, and 176 more variables: DS1.x <dbl>, DS2.x <dbl>,
## #   DS3.x <dbl>, DS4.x <dbl>, DS5.x <dbl>, DS6.x <dbl>, DS7.x <dbl>,
## #   relationship_category <dbl>, part_leng <dbl>, part_sex <dbl>,
## #   part_age <dbl>, HH1.x <dbl>, HH2.x <dbl>, HH3.x <dbl>, HH4.x <dbl>,
## #   HH5.x <dbl>, HH6.x <dbl>, HH7.x <dbl>, HH8.x <dbl>, HH9.x <dbl>,
## #   HH10.x <dbl>, comf1.x <dbl>, comf2.x <dbl>, comf3.x <dbl>, comf4.x <dbl>,
## #   comf5.x <dbl>, comf6.x <dbl>, comf7.x <dbl>, comf8.x <dbl>, comf9.x <dbl>,
## #   comf10.x <dbl>, 37_54 <dbl>, 37_46 <dbl>, 37_39 <dbl>, 37_31 <dbl>,
## #   37_24 <dbl>, 37_17 <dbl>, 37_9 <dbl>, 37_2 <dbl>, 37_-6 <dbl>,
## #   37_-13 <dbl>, 23_33 <dbl>, 23_29 <dbl>, 23_24 <dbl>, 23_20 <dbl>,
## #   23_15 <dbl>, 23_10 <dbl>, 23_6 <dbl>, 23_1 <dbl>, 23_-3 <dbl>, 23_-8 <dbl>,
## #   75_109.x <dbl>, 75_94.x <dbl>, 75_79.x <dbl>, 75_64.x <dbl>, 75_49.x <dbl>,
## #   75_34.x <dbl>, 75_19.x <dbl>, 75_4.x <dbl>, 75_-11.x <dbl>, 75_-26.x <dbl>,
## #   19_28.x <dbl>, 19_24.x <dbl>, 19_20.x <dbl>, 19_16.x <dbl>, 19_12.x <dbl>,
## #   19_9.x <dbl>, 19_5.x <dbl>, 19_1.x <dbl>, 19_-3.x <dbl>, 19_-7.x <dbl>,
## #   46_67.x <dbl>, 46_58.x <dbl>, 46_48.x <dbl>, 46_39.x <dbl>, 46_30.x <dbl>,
## #   46_21.x <dbl>, 46_12.x <dbl>, 46_2.x <dbl>, 46_-7.x <dbl>, 46_-16.x <dbl>,
## #   68_99 <dbl>, 68_85 <dbl>, 68_71 <dbl>, 68_58 <dbl>, 68_44 <dbl>,
## #   68_31 <dbl>, 68_17 <dbl>, 68_3 <dbl>, 68_-10 <dbl>, 68_-24 <dbl>,
## #   English_exclude.x <dbl>, sex.y <dbl>, age.y <dbl>, relat.y <dbl>,
## #   income.y <dbl>, poli_soc.y <dbl>, poli_econ.y <dbl>, trust_gen.y <dbl>,
## #   HH1.y <dbl>, …

Hmmm… this wasn’t quite what I wanted. The datasets are side by side and I want them on top of each other. I’m wondering if pivot_longer() would help?

 total %>% pivot_longer(participant, names_to = NULL, values_to = 'total_participants')

## # A tibble: 504 x 184
##    sex.x age.x relat.x income.x poli_soc.x poli_econ.x trust_gen.x DS1.x DS2.x
##    <dbl> <dbl>   <dbl>    <dbl>      <dbl>       <dbl>       <dbl> <dbl> <dbl>
##  1     2    21       1        4          3           6           9     7     5
##  2     2    22       2        1          2           2           5     5     3
##  3     1    25       2        1          4           4           4     3     2
##  4     2    28       1        5          7           7           8     7     5
##  5     1    38       1       20          2           2           2     6     6
##  6     1    49       1       20          3           6           7     4     3
##  7     2    32       1       10          4           4           8     7     7
##  8     2    39       1       16          6           6           7     6     4
##  9     2    47       2        7          1           1           7     6     3
## 10     2    19       2       10          4           4           3     7     2
## # … with 494 more rows, and 175 more variables: DS3.x <dbl>, DS4.x <dbl>,
## #   DS5.x <dbl>, DS6.x <dbl>, DS7.x <dbl>, relationship_category <dbl>,
## #   part_leng <dbl>, part_sex <dbl>, part_age <dbl>, HH1.x <dbl>, HH2.x <dbl>,
## #   HH3.x <dbl>, HH4.x <dbl>, HH5.x <dbl>, HH6.x <dbl>, HH7.x <dbl>,
## #   HH8.x <dbl>, HH9.x <dbl>, HH10.x <dbl>, comf1.x <dbl>, comf2.x <dbl>,
## #   comf3.x <dbl>, comf4.x <dbl>, comf5.x <dbl>, comf6.x <dbl>, comf7.x <dbl>,
## #   comf8.x <dbl>, comf9.x <dbl>, comf10.x <dbl>, 37_54 <dbl>, 37_46 <dbl>,
## #   37_39 <dbl>, 37_31 <dbl>, 37_24 <dbl>, 37_17 <dbl>, 37_9 <dbl>, 37_2 <dbl>,
## #   37_-6 <dbl>, 37_-13 <dbl>, 23_33 <dbl>, 23_29 <dbl>, 23_24 <dbl>,
## #   23_20 <dbl>, 23_15 <dbl>, 23_10 <dbl>, 23_6 <dbl>, 23_1 <dbl>, 23_-3 <dbl>,
## #   23_-8 <dbl>, 75_109.x <dbl>, 75_94.x <dbl>, 75_79.x <dbl>, 75_64.x <dbl>,
## #   75_49.x <dbl>, 75_34.x <dbl>, 75_19.x <dbl>, 75_4.x <dbl>, 75_-11.x <dbl>,
## #   75_-26.x <dbl>, 19_28.x <dbl>, 19_24.x <dbl>, 19_20.x <dbl>, 19_16.x <dbl>,
## #   19_12.x <dbl>, 19_9.x <dbl>, 19_5.x <dbl>, 19_1.x <dbl>, 19_-3.x <dbl>,
## #   19_-7.x <dbl>, 46_67.x <dbl>, 46_58.x <dbl>, 46_48.x <dbl>, 46_39.x <dbl>,
## #   46_30.x <dbl>, 46_21.x <dbl>, 46_12.x <dbl>, 46_2.x <dbl>, 46_-7.x <dbl>,
## #   46_-16.x <dbl>, 68_99 <dbl>, 68_85 <dbl>, 68_71 <dbl>, 68_58 <dbl>,
## #   68_44 <dbl>, 68_31 <dbl>, 68_17 <dbl>, 68_3 <dbl>, 68_-10 <dbl>,
## #   68_-24 <dbl>, English_exclude.x <dbl>, sex.y <dbl>, age.y <dbl>,
## #   relat.y <dbl>, income.y <dbl>, poli_soc.y <dbl>, poli_econ.y <dbl>,
## #   trust_gen.y <dbl>, HH1.y <dbl>, HH2.y <dbl>, HH3.y <dbl>, …

Well..that seemed to do something, but I just don’t know what I’m looking at… I think I’ll scratch merging the data. I will use the information I got from the dim() function though, and choose to do my further analyses on Study 3 because it has 905 rows (and therefore 905 participants) so should afford me the greatest statistical power. I chose the first variable to examine - dog poo!

I made a summary table of descriptive statistics instead. Working out the best way to do this took me hours and many meltdowns. I couldn’t work out how Jenny had done it despite being at the Q&A and re-watching her code it. Finally came up with what is quite a pretty descriptives table! I just started it with one variable (DS1) to start off with.

DS1summary <- data_3 %>%
  filter(sex != '3') %>%
  group_by(sex) %>%
  summarise (mean = mean(DS1),
             sd = sd(DS1),
             n = n(),
             se = sd/sqrt(n))
print(DS1summary)

## # A tibble: 2 x 5
##     sex  mean    sd     n     se
##   <dbl> <dbl> <dbl> <int>  <dbl>
## 1     1  5.15  1.40   452 0.0657
## 2     2  5.56  1.38   445 0.0656

DS1summary %>% gt() %>% fmt_number(columns = 2:5, decimals = 2)

sex	mean	sd	n	se
1	5.15	1.40	452.00	0.07
2	5.56	1.38	445.00	0.07

Well… according to this table, females feel higher levels of disgust on average (5.15) compared to males (5.56).

I then plotted this to have a visual comparison, before doing some inferential statistical testing.

DS1summary %>% ggplot(aes(x=sex, y=mean, fill=sex)) +
  geom_col() +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se),
                width = .2) +
  scale_x_discrete(name = 'gender', breaks = c(1:2), labels = c('Males', 'Females')) +
  scale_y_continuous(name = 'Average disgust - dog poo', expand = c(0,0), limits = c(0, 6)) + 
  easy_text_size(15) + 
  theme(legend.position = "none")

Yep.. females are definitely still higher! Now let’s see if this is statistically significant.

data_3 %>%
  filter(sex != '3')

## # A tibble: 897 x 68
##    participant   sex   age relat income DS_Inst   DS1   DS2   DS3   DS4   DS5
##          <dbl> <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1         599     2    28     1     10       1     7     6     7     5     7
##  2         542     2    37     1      7       1     5     7     5     6     5
##  3         685     1    37     1      6       1     7     7     7     7     7
##  4         778     2    77     1      4       1     7     7     5     7     7
##  5         484     2    73     2      4       1     4     6     4     4     4
##  6          23     1    70     1      7       1     3     3     2     5     4
##  7         643     1    70     1      5       1     4     5     3     6     5
##  8         276     2    69     2      2       1     7     5     4     7     4
##  9         360     1    69     1     10       1     7     6     4     6     5
## 10          33     2    68     2      4       1     6     7     4     4     7
## # … with 887 more rows, and 57 more variables: DS6 <dbl>, DS7 <dbl>,
## #   Target <dbl>, Value <dbl>, Alex_inst <dbl>, Alex_height <dbl>,
## #   Alex_weight <dbl>, Alex_age <dbl>, Alex_wealth <dbl>,
## #   Alex_intelligent <dbl>, Alex_attractive <dbl>, Alex_kind <dbl>,
## #   Alex_honest <dbl>, Alex_care <dbl>, comfort_inst <dbl>, comf1 <dbl>,
## #   comf2 <dbl>, comf3 <dbl>, comf4 <dbl>, comf5 <dbl>, comf6 <dbl>,
## #   comf7 <dbl>, comf8 <dbl>, comf9 <dbl>, comf10 <dbl>, WTR_INST <dbl>,
## #   75_109 <dbl>, 75_94 <dbl>, 75_79 <dbl>, 75_64 <dbl>, 75_49 <dbl>,
## #   75_34 <dbl>, 75_19 <dbl>, 75_4 <dbl>, 75_-11 <dbl>, 75_-26 <dbl>,
## #   19_28 <dbl>, 19_24 <dbl>, 19_20 <dbl>, 19_16 <dbl>, 19_12 <dbl>,
## #   19_9 <dbl>, 19_5 <dbl>, 19_1 <dbl>, 19_-3 <dbl>, 19_-7 <dbl>, 46_67 <dbl>,
## #   46_58 <dbl>, 46_48 <dbl>, 46_39 <dbl>, 46_30 <dbl>, 46_21 <dbl>,
## #   46_12 <dbl>, 46_2 <dbl>, 46_-7 <dbl>, 46_-16 <dbl>, English_Exclude <dbl>

data3sexxds1 <- select(data_3, sex, DS1)

female <- data3sexxds1 %>%
  filter(sex == '2')

male <- data3sexxds1 %>%
  filter(sex == '1')

t.test(female, male, var.equal = TRUE)

## 
##  Two Sample t-test
## 
## data:  female and male
## t = 6.8712, df = 1792, p-value = 8.753e-12
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.5034617 0.9056812
## sample estimates:
## mean of x mean of y 
##  3.780899  3.076327

Okay this doesn’t appear to be significant, as p = 8.75. I’m not actually sure if this is right…I can’t quite work out how to do a t-test in R because this seems wrong…

3b. Does the type of social politic view (from very liberal to very conservative) differ based on annual household income?

The general consensus (and wider literature) finds that there is a demonstrated correlation between income and social political views, such that people with higher household income have increasingly conservative social and political views. I will look into this in our dataset.

I made a sumamry table of descriptive statsistcs.

INCOMExSOCPOL <- data_1_raw %>%
  group_by(income) %>%
  summarise (mean = mean(poli_soc),
             sd = sd(poli_soc),
             n = n(),
             se = sd/sqrt(n))
print(INCOMExSOCPOL)

## # A tibble: 20 x 5
##    income  mean    sd     n    se
##     <dbl> <dbl> <dbl> <int> <dbl>
##  1      1  2.9   1.65    20 0.369
##  2      2  2.88  2.03    24 0.414
##  3      3  3.26  1.75    27 0.336
##  4      4  2.78  1.55    37 0.254
##  5      5  3.10  2.09    29 0.389
##  6      6  3.5   1.68    30 0.306
##  7      7  3.33  2.06    33 0.358
##  8      8  3.07  1.98    27 0.381
##  9      9  3.35  1.90    37 0.313
## 10     10  4.30  1.61    37 0.265
## 11     11  3.42  1.57    19 0.361
## 12     12  3.03  1.75    30 0.320
## 13     13  3.15  2.01    26 0.395
## 14     14  3.33  2.09    15 0.540
## 15     15  3.73  1.74    11 0.524
## 16     16  3.5   1.85     8 0.655
## 17     17  4.83  1.94     6 0.792
## 18     18  3.38  1.66    13 0.460
## 19     19  3.10  1.89    21 0.413
## 20     20  2.54  1.50    54 0.204

INCOMExSOCPOL %>% gt() %>% fmt_number(columns = 2:5, decimals = 2)

income	mean	sd	n	se
1	2.90	1.65	20.00	0.37
2	2.88	2.03	24.00	0.41
3	3.26	1.75	27.00	0.34
4	2.78	1.55	37.00	0.25
5	3.10	2.09	29.00	0.39
6	3.50	1.68	30.00	0.31
7	3.33	2.06	33.00	0.36
8	3.07	1.98	27.00	0.38
9	3.35	1.90	37.00	0.31
10	4.30	1.61	37.00	0.27
11	3.42	1.57	19.00	0.36
12	3.03	1.75	30.00	0.32
13	3.15	2.01	26.00	0.39
14	3.33	2.09	15.00	0.54
15	3.73	1.74	11.00	0.52
16	3.50	1.85	8.00	0.65
17	4.83	1.94	6.00	0.79
18	3.38	1.66	13.00	0.46
19	3.10	1.89	21.00	0.41
20	2.54	1.50	54.00	0.20

This table isn’t really that helpful because there’s so many levels of income. I’ll plot it for a visual comparison, before doing some inferential statistical testing.

INCOMExSOCPOL %>% ggplot(aes(x=income, y=mean, fill=income)) +
  geom_col() +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se),
                width = .2) +
  scale_x_discrete(name = 'Annual Household Income', limits = c("5,000", "", "", "", "", "", "", "", "", "50,000", "", "", "", "", "","", "", "", "", "100,000")) +
  scale_y_continuous(name = 'Average social political views', expand = c(0,0), limits = c(0, 6)) + 
  easy_text_size(15) +
  scale_fill_continuous(name=
                          'annual household 
      income') + 
  theme(legend.position = "none")

Nope…nothing very interesting appears to be happening here either. There are two spikes at $50,000 and $85,000 where political views were more conservative but on general they were pretty liberal. This is probably given the sample and sampling method. Given the exploratory analysis hasn’t shown anything interesting, I won’t go onto any inferential tests.

3c. Do people in a relationship have higher levels of contact comfort than those in platonic relationships?

I would assume that people in a relationship would have higher levels of contact comfort than those who aren’t…so let’s see! I’m going to use the comf2 variable which is ‘using target’s deodorant stick on yourself’ because that seems like something that people in relationships would do all the time, but people who aren’t as used to being close to someone, may find pretty gross… This variable is measured from 1= very uncomfortable to 7 = very comfortable.

I first made a summary table of descriptive statistics.

comf2summary <- data_1_raw %>%
  group_by(relat) %>%
  summarise (mean = mean(comf2),
             sd = sd(comf2),
             n = n(),
             se = sd/sqrt(n))
print(comf2summary)

## # A tibble: 2 x 5
##   relat  mean    sd     n    se
##   <dbl> <dbl> <dbl> <int> <dbl>
## 1     1  4.25  2.37   335 0.130
## 2     2  2.99  2.01   169 0.155

comf2summary %>% gt() %>% fmt_number(columns = 2:5, decimals = 2)

relat	mean	sd	n	se
1	4.25	2.37	335.00	0.13
2	2.99	2.01	169.00	0.15

Okay! Definitely looks like there is a difference, with people in a romantic relationship about 1.25 points higher on the level of comfort. Let’s see if the plot supports this…

comf2summary %>% ggplot(aes(x=relat, y=mean, fill=relat)) +
  geom_col() +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se),
                width = .2) +
  scale_x_continuous(name = 'Relationship Category', breaks = c(1:2), labels = c('In a romantic relationship', 'Not in a romantic relationship')) +
  scale_y_continuous(name = 'Average contact comfort - deoderant', expand = c(0,0), limits = c(0, 6)) + 
  easy_text_size(15) + 
  theme(legend.position = "none")

Oh yeah, definitely some sort of difference! Wonder if it’s statistically significant?

data1relxcomf2 <- select(data_1_raw, relat, comf2)

yesrelationship <- data1relxcomf2 %>%
  filter(relat == '1')

norelationship <- data1relxcomf2 %>%
  filter(relat == '2')

t.test(yesrelationship, norelationship, var.equal = TRUE)

## 
##  Two Sample t-test
## 
## data:  yesrelationship and norelationship
## t = 0.95002, df = 1006, p-value = 0.3423
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1414894  0.4070551
## sample estimates:
## mean of x mean of y 
##  2.626866  2.494083

Wow! Looks like it’s significant too! At least one of my questions came up with something interesting!

Week 10 Goals

Get someone to check my statistical analyses
Write up my recommendations
Knit my report into word
Don’t throw my computer at the wall