Week 8 Learning Log

Week 8 Goals

To make a start on my part 3 analyses in R
To successfully transfer my Figure 2 code to R Markdown

Challenges and successes

I have both successfully (well… sort of) transferred my Figure 2 code to R Markdown (although I have no idea why it wasn’t working before!) and made a start on my part 3 analyses in R!
- The code is there but the figure is all skew-whiff! I’ll need to work out how to fix this before I publish…
My first task was to change my part 3 questions slightly after Jenny’s feedback.
My new questions are:
1. Are there gender differences in the amount of disgust when you see a cockroach run across the floor?
2. Is there a relationship between annual household income and level of disgust felt?
3. Do people in a relationship have higher levels of contact comfort than those in platonic relationships?
To answer question 1, I went to the codebook. According to the codebook, the gender variable is sex (where 1 = male, 2 = female, 3 = other), and the disgust item for seeing a cockroach run across the floor is DS6. First, I checked that the data had these variables, and it does. However, they’re spread across studies 1-3, so I wanted to see if I could merge them into one dataset by participant number. I first tried merge(by=), but this threw an error. I knew merge() was base R, so I tried googling how to merge datasets with the tidyverse. I can’t even knit the merge() code, so here it is commented out.

library(tidyverse)
library(patchwork)
library(extrafont)
library(cowplot)

data_1_raw = read_csv('WTR_Comfort_S1.csv')
data_2 = read_csv('WTR_Comfort_S2.csv')
data_3 = read_csv('WTR_Comfort_S3.csv')

#total <- merge(data_1_raw, data_2, data_3, by="participant")
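
A likely reason the commented-out line errors: base R's merge() combines exactly two data frames per call (its x and y arguments), so a third data frame passed positionally is not treated as another table. Below is a minimal sketch of merging three tables pairwise. The data frames s1, s2 and s3 and their columns are made-up stand-ins, not the real study files.

```r
# Toy stand-ins for the three study files (hypothetical values)
s1 <- data.frame(participant = 1:3, DS6_s1 = c(7, 5, 6))
s2 <- data.frame(participant = 2:4, DS6_s2 = c(4, 3, 6))
s3 <- data.frame(participant = c(1, 4), DS6_s3 = c(2, 7))

# merge() joins two data frames at a time, so fold it over a list;
# all = TRUE keeps participants that appear in any of the tables
merged <- Reduce(
  function(x, y) merge(x, y, by = "participant", all = TRUE),
  list(s1, s2, s3)
)

print(merged)
```

Note that merging by participant puts the studies side by side, which only makes sense if the same ID refers to the same person in every study.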

I used dim() to get some information about the 3 datasets so I could see if the merge worked.

dim(data_1_raw)

## [1] 504 100

dim(data_2)

## [1] 430  85

dim(data_3)

## [1] 905  68

data_1_raw has 504 rows and 100 columns, data_2 has 430 rows and 85 columns, and data_3 has 905 rows and 68 columns. I then tried to merge the data using full_join(), joining by participant. This would supposedly return all rows and all columns from all datasets in a final dataset. I then used slice() to show a tibble of the new dataset. I went from 1:905 because 905 was the largest number of rows I got from dim().

total <- full_join(data_1_raw, data_2, data_3, by="participant")
total %>% slice(1:905)

## # A tibble: 504 x 184
##    participant sex.x age.x relat.x income.x poli_soc.x poli_econ.x trust_gen.x
##          <dbl> <dbl> <dbl>   <dbl>    <dbl>      <dbl>       <dbl>       <dbl>
##  1           1     2    21       1        4          3           6           9
##  2           2     2    22       2        1          2           2           5
##  3           3     1    25       2        1          4           4           4
##  4           4     2    28       1        5          7           7           8
##  5           5     1    38       1       20          2           2           2
##  6           6     1    49       1       20          3           6           7
##  7           7     2    32       1       10          4           4           8
##  8           8     2    39       1       16          6           6           7
##  9           9     2    47       2        7          1           1           7
## 10          10     2    19       2       10          4           4           3
## # … with 494 more rows, and 176 more variables: DS1.x <dbl>, DS2.x <dbl>,
## #   DS3.x <dbl>, DS4.x <dbl>, DS5.x <dbl>, DS6.x <dbl>, DS7.x <dbl>,
## #   relationship_category <dbl>, part_leng <dbl>, part_sex <dbl>,
## #   part_age <dbl>, HH1.x <dbl>, HH2.x <dbl>, HH3.x <dbl>, HH4.x <dbl>,
## #   HH5.x <dbl>, HH6.x <dbl>, HH7.x <dbl>, HH8.x <dbl>, HH9.x <dbl>,
## #   HH10.x <dbl>, comf1.x <dbl>, comf2.x <dbl>, comf3.x <dbl>, comf4.x <dbl>,
## #   comf5.x <dbl>, comf6.x <dbl>, comf7.x <dbl>, comf8.x <dbl>, comf9.x <dbl>,
## #   comf10.x <dbl>, 37_54 <dbl>, 37_46 <dbl>, 37_39 <dbl>, 37_31 <dbl>,
## #   37_24 <dbl>, 37_17 <dbl>, 37_9 <dbl>, 37_2 <dbl>, 37_-6 <dbl>,
## #   37_-13 <dbl>, 23_33 <dbl>, 23_29 <dbl>, 23_24 <dbl>, 23_20 <dbl>,
## #   23_15 <dbl>, 23_10 <dbl>, 23_6 <dbl>, 23_1 <dbl>, 23_-3 <dbl>, 23_-8 <dbl>,
## #   75_109.x <dbl>, 75_94.x <dbl>, 75_79.x <dbl>, 75_64.x <dbl>, 75_49.x <dbl>,
## #   75_34.x <dbl>, 75_19.x <dbl>, 75_4.x <dbl>, 75_-11.x <dbl>, 75_-26.x <dbl>,
## #   19_28.x <dbl>, 19_24.x <dbl>, 19_20.x <dbl>, 19_16.x <dbl>, 19_12.x <dbl>,
## #   19_9.x <dbl>, 19_5.x <dbl>, 19_1.x <dbl>, 19_-3.x <dbl>, 19_-7.x <dbl>,
## #   46_67.x <dbl>, 46_58.x <dbl>, 46_48.x <dbl>, 46_39.x <dbl>, 46_30.x <dbl>,
## #   46_21.x <dbl>, 46_12.x <dbl>, 46_2.x <dbl>, 46_-7.x <dbl>, 46_-16.x <dbl>,
## #   68_99 <dbl>, 68_85 <dbl>, 68_71 <dbl>, 68_58 <dbl>, 68_44 <dbl>,
## #   68_31 <dbl>, 68_17 <dbl>, 68_3 <dbl>, 68_-10 <dbl>, 68_-24 <dbl>,
## #   English_exclude.x <dbl>, sex.y <dbl>, age.y <dbl>, relat.y <dbl>,
## #   income.y <dbl>, poli_soc.y <dbl>, poli_econ.y <dbl>, trust_gen.y <dbl>,
## #   HH1.y <dbl>, …

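Hmmm… this wasn’t quite what I wanted. The result has 504 rows and 184 columns (100 + 85 - 1), so it looks like only the first two datasets were joined; full_join() takes two tables at a time. The datasets are also side by side (hence the .x and .y suffixes) and I want them on top of each other. I’m wondering if pivot_longer() would help?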

total %>% pivot_longer(participant, names_to = NULL, values_to = 'total_participants')

## # A tibble: 504 x 184
##    sex.x age.x relat.x income.x poli_soc.x poli_econ.x trust_gen.x DS1.x DS2.x
##    <dbl> <dbl>   <dbl>    <dbl>      <dbl>       <dbl>       <dbl> <dbl> <dbl>
##  1     2    21       1        4          3           6           9     7     5
##  2     2    22       2        1          2           2           5     5     3
##  3     1    25       2        1          4           4           4     3     2
##  4     2    28       1        5          7           7           8     7     5
##  5     1    38       1       20          2           2           2     6     6
##  6     1    49       1       20          3           6           7     4     3
##  7     2    32       1       10          4           4           8     7     7
##  8     2    39       1       16          6           6           7     6     4
##  9     2    47       2        7          1           1           7     6     3
## 10     2    19       2       10          4           4           3     7     2
## # … with 494 more rows, and 175 more variables: DS3.x <dbl>, DS4.x <dbl>,
## #   DS5.x <dbl>, DS6.x <dbl>, DS7.x <dbl>, relationship_category <dbl>,
## #   part_leng <dbl>, part_sex <dbl>, part_age <dbl>, HH1.x <dbl>, HH2.x <dbl>,
## #   HH3.x <dbl>, HH4.x <dbl>, HH5.x <dbl>, HH6.x <dbl>, HH7.x <dbl>,
## #   HH8.x <dbl>, HH9.x <dbl>, HH10.x <dbl>, comf1.x <dbl>, comf2.x <dbl>,
## #   comf3.x <dbl>, comf4.x <dbl>, comf5.x <dbl>, comf6.x <dbl>, comf7.x <dbl>,
## #   comf8.x <dbl>, comf9.x <dbl>, comf10.x <dbl>, 37_54 <dbl>, 37_46 <dbl>,
## #   37_39 <dbl>, 37_31 <dbl>, 37_24 <dbl>, 37_17 <dbl>, 37_9 <dbl>, 37_2 <dbl>,
## #   37_-6 <dbl>, 37_-13 <dbl>, 23_33 <dbl>, 23_29 <dbl>, 23_24 <dbl>,
## #   23_20 <dbl>, 23_15 <dbl>, 23_10 <dbl>, 23_6 <dbl>, 23_1 <dbl>, 23_-3 <dbl>,
## #   23_-8 <dbl>, 75_109.x <dbl>, 75_94.x <dbl>, 75_79.x <dbl>, 75_64.x <dbl>,
## #   75_49.x <dbl>, 75_34.x <dbl>, 75_19.x <dbl>, 75_4.x <dbl>, 75_-11.x <dbl>,
## #   75_-26.x <dbl>, 19_28.x <dbl>, 19_24.x <dbl>, 19_20.x <dbl>, 19_16.x <dbl>,
## #   19_12.x <dbl>, 19_9.x <dbl>, 19_5.x <dbl>, 19_1.x <dbl>, 19_-3.x <dbl>,
## #   19_-7.x <dbl>, 46_67.x <dbl>, 46_58.x <dbl>, 46_48.x <dbl>, 46_39.x <dbl>,
## #   46_30.x <dbl>, 46_21.x <dbl>, 46_12.x <dbl>, 46_2.x <dbl>, 46_-7.x <dbl>,
## #   46_-16.x <dbl>, 68_99 <dbl>, 68_85 <dbl>, 68_71 <dbl>, 68_58 <dbl>,
## #   68_44 <dbl>, 68_31 <dbl>, 68_17 <dbl>, 68_3 <dbl>, 68_-10 <dbl>,
## #   68_-24 <dbl>, English_exclude.x <dbl>, sex.y <dbl>, age.y <dbl>,
## #   relat.y <dbl>, income.y <dbl>, poli_soc.y <dbl>, poli_econ.y <dbl>,
## #   trust_gen.y <dbl>, HH1.y <dbl>, HH2.y <dbl>, HH3.y <dbl>, …

Well… that seemed to do something, but I just don’t know what I’m looking at. I think I’ll scratch merging the data and look at each study separately for now.
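
For stacking the studies on top of each other rather than side by side, one option is dplyr's bind_rows(), which matches columns by name and fills columns missing from a study with NA. Here is a minimal sketch on made-up data. The toy data frames and the `study` column name are assumptions for illustration only.

```r
library(dplyr)

# Toy stand-ins for the three study files (hypothetical values);
# s1 has an extra column to mimic the studies having different variables
s1 <- data.frame(participant = 1:3, sex = c(2, 2, 1), DS6 = c(7, 5, 6), comf1 = c(3, 4, 5))
s2 <- data.frame(participant = 1:2, sex = c(1, 2), DS6 = c(4, 6))
s3 <- data.frame(participant = 1:2, sex = c(2, 1), DS6 = c(7, 2))

# Stack the rows; .id adds a column recording which list element each row came from
stacked <- bind_rows(list(S1 = s1, S2 = s2, S3 = s3), .id = "study")

# Keep just the variables needed for question 1
q1_data <- stacked %>% select(study, participant, sex, DS6)

dim(stacked)
count(q1_data, study)
```

Keeping a `study` column matters here because participant numbers can repeat across studies, so participant alone would not identify a person in the stacked data.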

I’m using the geom_bar() function in the ggplot2 package to draw a bar chart. I’m plotting disgust about seeing a cockroach as a function of gender. I’m using a bar chart because gender is categorical, as is the DS scale (a Likert scale from ‘1 = not at all disgusting’ to ‘7 = extremely disgusting’).

S1_disgust_gender_plot = 
  ggplot(
    data = data_1_raw, 
    mapping = aes(
      x = DS6, 
      fill = factor(sex))) +
  geom_bar() +
  labs (
    x = 'Disgust level', 
    y = 'Number of participants') + 
  scale_x_continuous(
    breaks = c(1,7), 
    labels = c('not at all disgusting', 'extremely disgusting'))

print(S1_disgust_gender_plot)

S2_disgust_gender_plot = 
  ggplot(
    data = data_2, 
    mapping = aes(
      x = DS6, 
      fill = factor(sex))) +
  geom_bar() +
  labs (
    x = 'Disgust level', 
    y = 'Number of participants') + 
  scale_x_continuous(
    breaks = c(1,7), 
    labels = c('not at all disgusting', 'extremely disgusting'))

print(S2_disgust_gender_plot)

S3_disgust_gender_plot = 
  ggplot(
    data = data_3, 
    mapping = aes(
      x = DS6, 
      fill = factor(sex))) +
  geom_bar() +
  labs (
    x = 'Disgust level', 
    y = 'Number of participants') + 
  scale_x_continuous(
    breaks = c(1,7), 
    labels = c('not at all disgusting', 'extremely disgusting'))

print(S3_disgust_gender_plot)

S1_disgust_gender_plot + S2_disgust_gender_plot + S3_disgust_gender_plot

Interesting… I totally need to rename these variables but I can’t work out how so, for now, remember that 1 = males and 2 = females. It looks like females are more disgusted than males by seeing a cockroach run across the floor, although this seems to change across different levels of the Likert scale… Also, this plot is skew-whiff too when I knit it!
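
On renaming the 1/2/3 codes: one common approach is to give factor() explicit levels and labels before plotting. The sketch below also wraps the repeated plotting code in a small helper. It uses made-up data, and the helper name plot_disgust_by_sex() is hypothetical. The labels follow the codebook description above (1 = male, 2 = female, 3 = other), and position = "dodge" is just one option for making the groups easier to compare.

```r
library(ggplot2)

# Hypothetical helper: bar chart of DS6 responses, filled by labelled sex
plot_disgust_by_sex <- function(data, title) {
  # Turn the numeric codes into a labelled factor
  data$sex_label <- factor(
    data$sex,
    levels = c(1, 2, 3),
    labels = c("Male", "Female", "Other")
  )
  ggplot(data, aes(x = DS6, fill = sex_label)) +
    geom_bar(position = "dodge") +  # bars side by side instead of stacked
    labs(
      title = title,
      x = "Disgust level",
      y = "Number of participants",
      fill = "Sex"
    ) +
    scale_x_continuous(
      breaks = c(1, 7),
      labels = c("not at all disgusting", "extremely disgusting")
    )
}

# Toy data standing in for one study (hypothetical values)
toy <- data.frame(sex = c(1, 2, 2, 1, 2, 3), DS6 = c(3, 7, 6, 4, 7, 5))
p <- plot_disgust_by_sex(toy, "Study 1")
print(p)
```

With the real data, the same helper could be called once per study and the three plots combined with patchwork as before.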

Let’s do some actual statistics to see if this is significant. Both of my variables are categorical and I want to see the relationship between them, so I need to use a chi-squared test. I’m using chisq.test() for this.

# Study 1

chisq_S1_table = data_1_raw %>% select(sex, DS6)
chisq_S1 <- chisq.test(chisq_S1_table)

## Warning in chisq.test(chisq_S1_table): Chi-squared approximation may be
## incorrect

print(chisq_S1)

## 
##  Pearson's Chi-squared test
## 
## data:  chisq_S1_table
## X-squared = 135.05, df = 503, p-value = 1

# Study 2

chisq_S2_table = data_2 %>% select(sex, DS6)
chisq_S2 <- chisq.test(chisq_S2_table)

## Warning in chisq.test(chisq_S2_table): Chi-squared approximation may be
## incorrect

print(chisq_S2)

## 
##  Pearson's Chi-squared test
## 
## data:  chisq_S2_table
## X-squared = 126.96, df = 429, p-value = 1

# Study 3

chisq_S3_table = data_3 %>% select(sex, DS6)
chisq_S3 <- chisq.test(chisq_S3_table)

## Warning in chisq.test(chisq_S3_table): Chi-squared approximation may be
## incorrect

print(chisq_S3)

## 
##  Pearson's Chi-squared test
## 
## data:  chisq_S3_table
## X-squared = 250.69, df = 904, p-value = 1

Okay… a p-value of 1 for every study, so p > .05 and non-significant. Well! That sorts that out: none of the results were significant, and therefore no inference can be made about gender differences in disgust at seeing a cockroach run across the floor. I’m also not entirely sure whether this warning message makes my results invalid. It’s been 2 years since I’ve done chi-squared tests, so my use of them may be terrible. The test statistic seems extremely high…
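
A possible explanation for the odd output: chisq.test() treats a two-column data frame as a ready-made contingency table with one row per participant, which would give df = (rows - 1) × (2 - 1). That matches the 503, 429 and 904 reported above. To test sex against DS6, the responses need to be cross-tabulated into counts first. Here is a minimal sketch with simulated data (the values are made up, not from the study).

```r
set.seed(1)

# Simulated stand-in for one study: sex coded 1/2, DS6 on a 1-7 scale
toy <- data.frame(
  sex = sample(1:2, 200, replace = TRUE),
  DS6 = sample(1:7, 200, replace = TRUE)
)

# Cross-tabulate first: rows = sex, columns = DS6 response
counts <- table(sex = toy$sex, DS6 = toy$DS6)
print(counts)

# Test of independence on the 2 x 7 table of counts
result <- chisq.test(counts)
print(result)

# Degrees of freedom for a 2 x 7 table: (2 - 1) * (7 - 1)
result$parameter
```

The "approximation may be incorrect" warning tends to appear when some expected cell counts are small. In that case chisq.test(counts, simulate.p.value = TRUE) is one option worth checking with whoever reviews the analysis.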

Week 9 Goals

Complete statistical analyses for the remaining two questions
Get someone to check my statistical analyses for question 1
Find out how to make my plots knit better than they currently are!
Make my bar plots more visually appealing and easy to read…