Week 8 Goals
- To make a start on my part 3 analyses in R
- To successfully transfer my Figure 2 code to R Markdown
Challenges and successes
- Well, I have both successfully (well…sort of) transferred my Figure 2 code to R Markdown (although I have no idea why it wasn’t working before!) and made a start on my part 3 analyses in R!
- The code is there but the figure is all skew-whiff! I’ll need to work out how to fix this before I publish…
- My first task was to change my part 3 questions slightly after Jenny’s feedback.
- My new questions are:
- Are there gender differences in the amount of disgust felt when you see a cockroach run across the floor?
- Is there a relationship between annual household income and level of disgust felt?
- Do people in a relationship have higher levels of contact comfort than those in platonic relationships?
- To answer question 1, I went to the codebook. According to the codebook, the gender variable is sex (where 1 = male, 2 = female, 3 = other). The disgust item for seeing a cockroach run across the floor is DS6. First, I checked that the data had these variables, and it does. However, it’s spread across studies 1-3. I want to see if I can somehow merge these variables into one dataset by participant number to answer this question. I first tried
merge(by=), but this threw an error. I knew merge() was base R, so I tried googling how to merge datasets with the tidyverse. I can’t even knit the merge() code, so here it is commented out:
library(tidyverse)
library(patchwork)
library(extrafont)
library(cowplot)
data_1_raw = read_csv('WTR_Comfort_S1.csv')
data_2 = read_csv('WTR_Comfort_S2.csv')
data_3 = read_csv('WTR_Comfort_S3.csv')
#total <- merge(data_1_raw, data_2, data_3, by="participant")
I used dim() to get some information about the 3 datasets so I could see if the merge worked.
dim(data_1_raw)
## [1] 504 100
dim(data_2)
## [1] 430 85
dim(data_3)
## [1] 905 68
data_1_raw has 504 rows and 100 columns, data_2 has 430 rows and 85 columns and data_3 has 905 rows and 68 columns. I then merged the data using full_join() and joined by participant. This would supposedly return all rows and all columns from all datasets in a final dataset. I then used slice to show a tibble of the new dataset. I went from (1:905) because that was the range of rows I got from dim().
total <- full_join(data_1_raw, data_2, data_3, by="participant")
total %>% slice(1:905)
## # A tibble: 504 x 184
## participant sex.x age.x relat.x income.x poli_soc.x poli_econ.x trust_gen.x
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 2 21 1 4 3 6 9
## 2 2 2 22 2 1 2 2 5
## 3 3 1 25 2 1 4 4 4
## 4 4 2 28 1 5 7 7 8
## 5 5 1 38 1 20 2 2 2
## 6 6 1 49 1 20 3 6 7
## 7 7 2 32 1 10 4 4 8
## 8 8 2 39 1 16 6 6 7
## 9 9 2 47 2 7 1 1 7
## 10 10 2 19 2 10 4 4 3
## # … with 494 more rows, and 176 more variables: DS1.x <dbl>, DS2.x <dbl>,
## # DS3.x <dbl>, DS4.x <dbl>, DS5.x <dbl>, DS6.x <dbl>, DS7.x <dbl>,
## # relationship_category <dbl>, part_leng <dbl>, part_sex <dbl>,
## # part_age <dbl>, HH1.x <dbl>, HH2.x <dbl>, HH3.x <dbl>, HH4.x <dbl>,
## # HH5.x <dbl>, HH6.x <dbl>, HH7.x <dbl>, HH8.x <dbl>, HH9.x <dbl>,
## # HH10.x <dbl>, comf1.x <dbl>, comf2.x <dbl>, comf3.x <dbl>, comf4.x <dbl>,
## # comf5.x <dbl>, comf6.x <dbl>, comf7.x <dbl>, comf8.x <dbl>, comf9.x <dbl>,
## # comf10.x <dbl>, 37_54 <dbl>, 37_46 <dbl>, 37_39 <dbl>, 37_31 <dbl>,
## # 37_24 <dbl>, 37_17 <dbl>, 37_9 <dbl>, 37_2 <dbl>, 37_-6 <dbl>,
## # 37_-13 <dbl>, 23_33 <dbl>, 23_29 <dbl>, 23_24 <dbl>, 23_20 <dbl>,
## # 23_15 <dbl>, 23_10 <dbl>, 23_6 <dbl>, 23_1 <dbl>, 23_-3 <dbl>, 23_-8 <dbl>,
## # 75_109.x <dbl>, 75_94.x <dbl>, 75_79.x <dbl>, 75_64.x <dbl>, 75_49.x <dbl>,
## # 75_34.x <dbl>, 75_19.x <dbl>, 75_4.x <dbl>, 75_-11.x <dbl>, 75_-26.x <dbl>,
## # 19_28.x <dbl>, 19_24.x <dbl>, 19_20.x <dbl>, 19_16.x <dbl>, 19_12.x <dbl>,
## # 19_9.x <dbl>, 19_5.x <dbl>, 19_1.x <dbl>, 19_-3.x <dbl>, 19_-7.x <dbl>,
## # 46_67.x <dbl>, 46_58.x <dbl>, 46_48.x <dbl>, 46_39.x <dbl>, 46_30.x <dbl>,
## # 46_21.x <dbl>, 46_12.x <dbl>, 46_2.x <dbl>, 46_-7.x <dbl>, 46_-16.x <dbl>,
## # 68_99 <dbl>, 68_85 <dbl>, 68_71 <dbl>, 68_58 <dbl>, 68_44 <dbl>,
## # 68_31 <dbl>, 68_17 <dbl>, 68_3 <dbl>, 68_-10 <dbl>, 68_-24 <dbl>,
## # English_exclude.x <dbl>, sex.y <dbl>, age.y <dbl>, relat.y <dbl>,
## # income.y <dbl>, poli_soc.y <dbl>, poli_econ.y <dbl>, trust_gen.y <dbl>,
## # HH1.y <dbl>, …
Hmmm… this wasn’t quite what I wanted. The datasets are side by side and I want them on top of each other. I’m wondering if pivot_longer() would help?
total %>% pivot_longer(participant, names_to = NULL, values_to = 'total_participants')
## # A tibble: 504 x 184
## sex.x age.x relat.x income.x poli_soc.x poli_econ.x trust_gen.x DS1.x DS2.x
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2 21 1 4 3 6 9 7 5
## 2 2 22 2 1 2 2 5 5 3
## 3 1 25 2 1 4 4 4 3 2
## 4 2 28 1 5 7 7 8 7 5
## 5 1 38 1 20 2 2 2 6 6
## 6 1 49 1 20 3 6 7 4 3
## 7 2 32 1 10 4 4 8 7 7
## 8 2 39 1 16 6 6 7 6 4
## 9 2 47 2 7 1 1 7 6 3
## 10 2 19 2 10 4 4 3 7 2
## # … with 494 more rows, and 175 more variables: DS3.x <dbl>, DS4.x <dbl>,
## # DS5.x <dbl>, DS6.x <dbl>, DS7.x <dbl>, relationship_category <dbl>,
## # part_leng <dbl>, part_sex <dbl>, part_age <dbl>, HH1.x <dbl>, HH2.x <dbl>,
## # HH3.x <dbl>, HH4.x <dbl>, HH5.x <dbl>, HH6.x <dbl>, HH7.x <dbl>,
## # HH8.x <dbl>, HH9.x <dbl>, HH10.x <dbl>, comf1.x <dbl>, comf2.x <dbl>,
## # comf3.x <dbl>, comf4.x <dbl>, comf5.x <dbl>, comf6.x <dbl>, comf7.x <dbl>,
## # comf8.x <dbl>, comf9.x <dbl>, comf10.x <dbl>, 37_54 <dbl>, 37_46 <dbl>,
## # 37_39 <dbl>, 37_31 <dbl>, 37_24 <dbl>, 37_17 <dbl>, 37_9 <dbl>, 37_2 <dbl>,
## # 37_-6 <dbl>, 37_-13 <dbl>, 23_33 <dbl>, 23_29 <dbl>, 23_24 <dbl>,
## # 23_20 <dbl>, 23_15 <dbl>, 23_10 <dbl>, 23_6 <dbl>, 23_1 <dbl>, 23_-3 <dbl>,
## # 23_-8 <dbl>, 75_109.x <dbl>, 75_94.x <dbl>, 75_79.x <dbl>, 75_64.x <dbl>,
## # 75_49.x <dbl>, 75_34.x <dbl>, 75_19.x <dbl>, 75_4.x <dbl>, 75_-11.x <dbl>,
## # 75_-26.x <dbl>, 19_28.x <dbl>, 19_24.x <dbl>, 19_20.x <dbl>, 19_16.x <dbl>,
## # 19_12.x <dbl>, 19_9.x <dbl>, 19_5.x <dbl>, 19_1.x <dbl>, 19_-3.x <dbl>,
## # 19_-7.x <dbl>, 46_67.x <dbl>, 46_58.x <dbl>, 46_48.x <dbl>, 46_39.x <dbl>,
## # 46_30.x <dbl>, 46_21.x <dbl>, 46_12.x <dbl>, 46_2.x <dbl>, 46_-7.x <dbl>,
## # 46_-16.x <dbl>, 68_99 <dbl>, 68_85 <dbl>, 68_71 <dbl>, 68_58 <dbl>,
## # 68_44 <dbl>, 68_31 <dbl>, 68_17 <dbl>, 68_3 <dbl>, 68_-10 <dbl>,
## # 68_-24 <dbl>, English_exclude.x <dbl>, sex.y <dbl>, age.y <dbl>,
## # relat.y <dbl>, income.y <dbl>, poli_soc.y <dbl>, poli_econ.y <dbl>,
## # trust_gen.y <dbl>, HH1.y <dbl>, HH2.y <dbl>, HH3.y <dbl>, …
Well… that seemed to do something, but I just don’t know what I’m looking at… I think I’ll scrap merging the data and look at each study separately for now.
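One possible way to get the studies "on top of each other" is bind_rows() from dplyr, which stacks rows and fills any column a dataset lacks with NA. As far as I can tell, full_join() only joins two tables at a time, which would be consistent with the result above having 504 rows and 100 + 85 - 1 = 184 columns (data_1_raw and data_2 only). The sketch below is only an illustration: toy_s1, toy_s2 and toy_s3 are made-up stand-ins for the real files, and it assumes participant numbers restart in each study, so a study column is added to keep them apart.

```r
library(dplyr)

# Toy stand-ins for the three study files (made-up values, not the real data)
toy_s1 <- tibble(participant = 1:3, sex = c(2, 2, 1), DS6 = c(7, 5, 3))
toy_s2 <- tibble(participant = 1:2, sex = c(1, 2), DS6 = c(4, 6))
toy_s3 <- tibble(participant = 1:2, sex = c(2, 1), DS6 = c(6, 2), extra = c(1, 0))

# Stack the rows; .id adds a column recording which study each row came from
stacked <- bind_rows(S1 = toy_s1, S2 = toy_s2, S3 = toy_s3, .id = "study")

# Keep only the variables needed for question 1
question_1 <- stacked %>% select(study, participant, sex, DS6)

dim(question_1)
```

If this works on the real data, the number of rows should equal the sum of the rows reported by dim() for the three studies.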
I’m using the geom_bar() function from the ggplot2 package to draw a bar graph. I’m plotting disgust about seeing a cockroach as a function of gender. I’m using a bar graph because gender is categorical, and I’m treating the DS scale as categorical too (a Likert scale from ‘1 = not at all disgusting’ to ‘7 = extremely disgusting’).
S1_disgust_gender_plot =
  ggplot(
    data = data_1_raw,
    mapping = aes(
      x = DS6,
      fill = factor(sex))) +
  geom_bar() +
  labs(
    x = 'Disgust level',
    y = 'Number of participants') +
  scale_x_continuous(
    breaks = c(1, 7),
    labels = c('not at all disgusting', 'extremely disgusting'))
print(S1_disgust_gender_plot)
S2_disgust_gender_plot =
  ggplot(
    data = data_2,
    mapping = aes(
      x = DS6,
      fill = factor(sex))) +
  geom_bar() +
  labs(
    x = 'Disgust level',
    y = 'Number of participants') +
  scale_x_continuous(
    breaks = c(1, 7),
    labels = c('not at all disgusting', 'extremely disgusting'))
print(S2_disgust_gender_plot)
S3_disgust_gender_plot =
  ggplot(
    data = data_3,
    mapping = aes(
      x = DS6,
      fill = factor(sex))) +
  geom_bar() +
  labs(
    x = 'Disgust level',
    y = 'Number of participants') +
  scale_x_continuous(
    breaks = c(1, 7),
    labels = c('not at all disgusting', 'extremely disgusting'))
print(S3_disgust_gender_plot)
S1_disgust_gender_plot + S2_disgust_gender_plot + S3_disgust_gender_plot
Interesting… I totally need to rename these variables but I can’t work out how so, for now, remember that 1 = males and 2 = females. Looks like females are more disgusted than males by seeing a cockroach run across the floor! This seems to change across different levels of the Likert scale… Also, this plot is skew-whiff when I knit it too!
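For the renaming problem, one option might be to turn the numeric codes into a labelled factor with factor() inside mutate(), using the codebook values (1 = male, 2 = female, 3 = other). This is just a sketch on made-up data: toy, toy_labelled and toy_plot are hypothetical names, and position = "dodge" is my own suggestion for making the groups easier to compare, not something from the original plots.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for one study (made-up values, not the real data)
toy <- tibble(
  sex = c(1, 2, 2, 1, 2, 3),
  DS6 = c(3, 7, 6, 4, 7, 5)
)

# Turn the numeric codes into a labelled factor, following the codebook
toy_labelled <- toy %>%
  mutate(sex = factor(sex, levels = c(1, 2, 3),
                      labels = c("male", "female", "other")))

# position = "dodge" puts the bars side by side instead of stacking them
toy_plot <- ggplot(toy_labelled, aes(x = DS6, fill = sex)) +
  geom_bar(position = "dodge") +
  labs(x = "Disgust level", y = "Number of participants", fill = "Gender")

levels(toy_labelled$sex)
```

The legend should then show the labels instead of 1, 2 and 3, so there is no need to remember which number is which.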
Let’s do some actual statistics to see if this is significant. Both of my variables are categorical and I want to test the relationship between them, so I need a chi-square test. I’m using the chisq.test() function for this.
# Study 1
chisq_S1_table = data_1_raw %>% select(sex, DS6)
chisq_S1 <- chisq.test(chisq_S1_table)
## Warning in chisq.test(chisq_S1_table): Chi-squared approximation may be
## incorrect
print(chisq_S1)
##
## Pearson's Chi-squared test
##
## data: chisq_S1_table
## X-squared = 135.05, df = 503, p-value = 1
# Study 2
chisq_S2_table = data_2 %>% select(sex, DS6)
chisq_S2 <- chisq.test(chisq_S2_table)
## Warning in chisq.test(chisq_S2_table): Chi-squared approximation may be
## incorrect
print(chisq_S2)
##
## Pearson's Chi-squared test
##
## data: chisq_S2_table
## X-squared = 126.96, df = 429, p-value = 1
# Study 3
chisq_S3_table = data_3 %>% select(sex, DS6)
chisq_S3 <- chisq.test(chisq_S3_table)
## Warning in chisq.test(chisq_S3_table): Chi-squared approximation may be
## incorrect
print(chisq_S3)
##
## Pearson's Chi-squared test
##
## data: chisq_S3_table
## X-squared = 250.69, df = 904, p-value = 1
Okay… a p-value of 1 for every study. p > .05, so non-significant. Well! That sorts that out: taken at face value, the results are non-significant, so no inference can be made about gender differences in the disgust of seeing a cockroach run across the floor. I’m also not entirely sure if this warning message makes my results totally invalid. It’s also been 2 years since I’ve done chi-squared tests so my use of them may be terrible. The test statistic seems extremely high…
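On the worry about the test itself: when chisq.test() is handed a two-column data frame, it treats every row as a row of a contingency table, which would explain why the degrees of freedom above are always the number of participants minus one (503, 429, 904). A minimal sketch of the more usual approach, cross-tabulating the two variables with table() first, is below. It runs on made-up data (toy is hypothetical, not the real data), so the numbers it prints say nothing about the actual studies.

```r
# Toy stand-in for one study (made-up values, not the real data)
toy <- data.frame(
  sex = rep(c(1, 2), each = 30),
  DS6 = rep(c(1, 4, 7), times = 20)
)

# Cross-tabulate first: one row per gender, one column per disgust rating
toy_counts <- table(sex = toy$sex, DS6 = toy$DS6)
print(toy_counts)

# Run the test on the table of counts rather than on the raw columns
toy_test <- chisq.test(toy_counts)

# Degrees of freedom are (rows - 1) * (columns - 1), here (2 - 1) * (3 - 1)
toy_test$parameter
```

With the real data, some cells (for example sex = 3) may contain very few participants, in which case the "approximation may be incorrect" warning could still appear.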
Week 9 Goals
- Complete statistical analyses for the remaining two questions
- Get someone to check my statistical analyses for question 1
- Find out how to make my plots knit better than they currently are!
- Make my bar plots more visually appealing and easy to read…