Just a disclaimer: I realised that a document will not knit if any of the code is faulty, so I’ll show screenshots of the mistakes I’ve made instead. It may be possible to display failing code in the chunks, but I don’t yet know how, so in the meantime this is how I am dealing with it.
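For reference, knitr does have a chunk option aimed at exactly this: with error = TRUE, a chunk that throws an error no longer stops the knit, and the error message is printed in the output like any other result. A minimal sketch, assuming the post is knitted with R Markdown and the knitr package is installed; the failing line in the comment is a made-up example, not one of the mistakes from this post:

```r
# Put this in a setup chunk near the top of the .Rmd file.
# error = TRUE tells knitr to print errors in the output instead of halting.
knitr::opts_chunk$set(error = TRUE)

# In a later chunk, a faulty line like this (hypothetical example) would then
# knit, with its error message shown underneath the code:
# filter(swow, not_a_column == 1)
```

The same option can also be set on a single chunk by adding error = TRUE to that chunk's header, which keeps the rest of the document strict.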

The Goal

This week, my goal was to finish the data wrangling series and, more specifically, attempt to do Exercise 7, which was to create a graph showing the backwards responses.

Progress

After following along with Danielle’s tutorials and managing to produce the forwards graph, I figured the backwards one shouldn’t be that difficult.

If only I knew how wrong I was.

To get an idea of all the difficulties I faced, let’s start at the beginning:

#load packages

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.0.6     v dplyr   1.0.4
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
# import data and make some minor adjustments

swow <- "file_show.zip" %>% 
  read_tsv() %>% 
  mutate(id = 1:n()) %>% 
  rename(
    n_response = R1,
    n_total = N,
    strength = R1.Strength
  )
## ! Multiple files in zip: reading ''swow.csv''
## 
## -- Column specification --------------------------------------------------------
## cols(
##   cue = col_character(),
##   response = col_character(),
##   R1 = col_double(),
##   N = col_double(),
##   R1.Strength = col_double()
## )
# words associated with 'man' and 'woman'

woman_fwd <- swow %>% 
  filter(cue == "woman", n_response > 1) %>% 
  select(cue, response, strength, id) %>% 
  mutate(
    rank = rank(-strength),
    type = "forward",
    word = "woman",
    associate = response
  )

woman_bck <- swow %>% 
  filter(response == "woman", n_response > 1) %>% 
  arrange(desc(strength)) %>% 
  select(cue, response, strength, id) %>% 
  mutate(
    rank = rank(-strength),
    type = "backward",
    word = "woman",
    associate = cue
  )

man_fwd <- swow %>%
  filter(cue == "man", n_response > 1)  %>%
  select(-n_response, -n_total)   %>%
  mutate(
    rank = rank(-strength),  
    type = "forward",        
    word = "man",            
    associate = response  
  )

man_bck <- swow %>%
  filter(response == "man", n_response > 1) %>%
  arrange(desc(strength)) %>%
  select(-starts_with("n_")) %>% 
  mutate(
    rank = rank(-strength),  
    type = "backward",       
    word = "man",            
    associate = cue          
  )

# combine data sets

gender <- bind_rows(woman_fwd, woman_bck, man_bck, man_fwd) %>% 
  select(id:associate) %>% 
  filter(associate != "man", associate != "woman")

Alright, so that caught us up to where the tutorial ended - with our data sets for backwards and forwards associations for the words woman and man.

Now to attempt the graph from Danielle’s video, which shows the associated words and their relative strengths for man and woman in the backwards direction.

# First, need to make the gender back data set 

gender_bck <- gender %>% 
  filter(
    type == "backward"
  ) %>% 
  pivot_wider(
    id_cols = associate,
    names_from = word,
    values_from = rank
  ) %>% 
  mutate(
    woman = (1/woman) %>% replace_na(0),
    man = (1/man) %>%  replace_na(0),
    diff = woman - man
  ) %>% 
  arrange(diff)

# let's see what this looks like 

print(gender_bck)
## # A tibble: 477 x 4
##    associate    woman    man    diff
##    <chr>        <dbl>  <dbl>   <dbl>
##  1 macho            0 1      -1     
##  2 handy            0 0.333  -0.333 
##  3 guy              0 0.25   -0.25  
##  4 super            0 0.2    -0.2   
##  5 Mr               0 0.167  -0.167 
##  6 bloke            0 0.143  -0.143 
##  7 gingerbread      0 0.125  -0.125 
##  8 fellow           0 0.105  -0.105 
##  9 testosterone     0 0.105  -0.105 
## 10 hunk             0 0.0909 -0.0909
## # ... with 467 more rows
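
To make the 1/rank step concrete, here is a tiny made-up example (the words and ranks are invented for illustration, not taken from the SWOW data). pivot_wider() gives each word its own column of ranks, an associate that only appears for one of the two words gets NA in the other column, and replace_na() turns the reciprocal of that NA into 0:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in the same long format as `gender`
toy <- tibble(
  word      = c("man", "man", "woman", "woman"),
  associate = c("bloke", "person", "lady", "person"),
  rank      = c(1, 2, 1, 4)
)

toy_wide <- toy %>%
  pivot_wider(
    id_cols = associate,
    names_from = word,
    values_from = rank
  ) %>%
  mutate(
    woman = replace_na(1 / woman, 0),  # rank 1 -> 1, rank 4 -> 0.25, NA -> 0
    man   = replace_na(1 / man, 0),
    diff  = woman - man                # positive leans "woman", negative leans "man"
  ) %>%
  arrange(diff)

print(toy_wide)
```

A word ranked first for one cue and absent for the other ends up at +1 or -1, which matches macho sitting at -1 in the real output.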

Well, there appear to be a lot of rows - more than could be graphed clearly. I noticed that Danielle’s plot only included the top 19 and the bottom 19 words, but before filtering the rest out I was a little curious to see how it would look if I graphed it as is.

pic_bck <- ggplot(
  gender_bck,
  aes(
    x = associate %>% reorder(diff),
    y = diff
  )
) +
  geom_col() +
  coord_flip()

plot(pic_bck)

Yep, that looks absolutely horrendous.

Moving on, let’s see if I can filter out most of those words and leave only the first and last 19 to be graphed.

Attempt no. 1

I tried to see if I could insert an extra line into the filter function when making the gender_bck data set.

Well it appears that did not work at all. Maybe instead I should replace associate with diff?

Attempt no. 2

And yet another success. It might be time for the good ol’ Google.

Use the top_n() function

Attempt no. 3

# third time's the charm?

gender_bck <- gender %>% 
  filter(
    type == "backward"
  ) %>% 
  top_n(19) %>% 
  top_n(-19) %>% 
  pivot_wider(
    id_cols = associate, 
    names_from = word, 
    values_from = rank
  ) %>%
  mutate(
    woman = replace_na(1/woman, 0),
    man = replace_na(1/man, 0),
    diff = woman - man
  )  %>%  
  arrange(diff)
## Selecting by associate
## Selecting by associate
print(gender_bck)
## # A tibble: 19 x 4
##    associate   woman     man     diff
##    <chr>       <dbl>   <dbl>    <dbl>
##  1 wise      0       0.0179  -0.0179 
##  2 waiter    0       0.0142  -0.0142 
##  3 women     0       0.0118  -0.0118 
##  4 wicker    0       0.01    -0.01   
##  5 wing      0       0.00775 -0.00775
##  6 working   0       0.00678 -0.00678
##  7 woodsman  0       0.00501 -0.00501
##  8 wanted    0       0.00373 -0.00373
##  9 weather   0       0.00373 -0.00373
## 10 wolf      0       0.00373 -0.00373
## 11 young     0       0.00373 -0.00373
## 12 waist     0.00526 0        0.00526
## 13 waitress  0.00631 0        0.00631
## 14 wonder    0.0139  0        0.0139 
## 15 womanly   0.0175  0        0.0175 
## 16 whore     0.0198  0        0.0198 
## 17 widow     0.0238  0        0.0238 
## 18 wife      0.0238  0        0.0238 
## 19 womb      0.0294  0        0.0294

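Aha, so… it kinda worked. I got 19 rows, just not the ones I wanted. The “Selecting by associate” message is the clue: I never told top_n() which column to rank by, so it used the last column, associate, and ranked the words alphabetically, which is why everything left starts with w or y. I also called it before diff had even been calculated, and by the time top_n(-19) ran there were only those rows left to choose from, so it had nothing more to remove.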
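
The “Selecting by associate” message in the output above can be reproduced on a small made-up table (a sketch with invented values). When no wt argument is given, top_n() ranks by the last column in the data, whatever that happens to be; naming the column explicitly avoids the surprise:

```r
library(dplyr)

# Hypothetical toy data: `associate` is deliberately the last column
toy <- tibble(
  diff      = c(0.5, -1, 0.1, -0.2),
  associate = c("apple", "zebra", "mango", "yak")
)

# No wt: top_n() falls back to the last column (associate),
# so "top" means last in alphabetical order
by_last <- toy %>% top_n(2)

# wt given: ranks by diff instead
by_diff <- toy %>% top_n(2, wt = diff)

print(by_last)
print(by_diff)
```

The first call prints “Selecting by associate” and keeps zebra and yak; the second keeps apple and mango, the two rows with the largest diff.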

It may be possible to insert the functions in a way where both the top and bottom 19 are kept, but at this point I am running out of patience and am going to take the long approach.

I intend to create two separate data sets (one containing the top 19 and the other containing the bottom 19) and then merge them together to create the complete and, hopefully, final set of data.

# Here's the first one

gender_bckone <- gender %>% 
  filter(
    type == "backward",
  ) %>% 
  pivot_wider(
    id_cols = associate, 
    names_from = word, 
    values_from = rank
  ) %>%
  mutate(
    woman = replace_na(1/woman, 0),
    man = replace_na(1/man, 0),
    diff = woman - man,
  )  %>%  
  arrange(diff) %>% 
  top_n(-19)
## Selecting by diff
# And the second one

gender_bcktwo <- gender %>% 
  filter(
    type == "backward",
  ) %>% 
  pivot_wider(
    id_cols = associate, 
    names_from = word, 
    values_from = rank
  ) %>%
  mutate(
    woman = replace_na(1/woman, 0),
    man = replace_na(1/man, 0),
    diff = woman - man,
  )  %>%  
  arrange(diff) %>% 
  top_n(19)
## Selecting by diff
# Now, let's merge these

pleasework <- bind_rows(gender_bckone, gender_bcktwo)

print(pleasework)
## # A tibble: 38 x 4
##    associate    woman    man    diff
##    <chr>        <dbl>  <dbl>   <dbl>
##  1 macho            0 1      -1     
##  2 handy            0 0.333  -0.333 
##  3 guy              0 0.25   -0.25  
##  4 super            0 0.2    -0.2   
##  5 Mr               0 0.167  -0.167 
##  6 bloke            0 0.143  -0.143 
##  7 gingerbread      0 0.125  -0.125 
##  8 fellow           0 0.105  -0.105 
##  9 testosterone     0 0.105  -0.105 
## 10 hunk             0 0.0909 -0.0909
## # ... with 28 more rows

IT WORKED

38 rows. What an absolute relief.

I’m sure there is a faster, more efficient way of doing that but for now I’m quite happy with what I’ve accomplished.
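
One possible shortcut, sketched here on made-up data rather than the real table: rank diff from both ends with min_rank() and keep a row if it falls in either the lowest n or the highest n. The toy tibble, its values, and n_keep = 2 are assumptions for illustration; on the real data n_keep would be 19.

```r
library(dplyr)

# Hypothetical stand-in for the pivoted data (associate, diff)
toy <- tibble(
  associate = c("a", "b", "c", "d", "e", "f"),
  diff      = c(-1, -0.5, -0.1, 0.2, 0.6, 1)
)

n_keep <- 2  # would be 19 for the real plot

both_ends <- toy %>%
  filter(
    min_rank(diff) <= n_keep |        # the n most negative differences
      min_rank(desc(diff)) <= n_keep  # the n most positive differences
  ) %>%
  arrange(diff)

print(both_ends)
```

This mirrors what top_n(-19) and top_n(19) do separately, including keeping ties, in a single pipeline with no bind_rows() step. It has not been checked against the real data here.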

Now for the actual plotting.

yay <- ggplot(
  pleasework,
  aes(
    x = associate %>% reorder(diff),
    y = diff
  )
) +
  geom_col() +
  coord_flip()

plot(yay)

What a beauty that is

And it is done, though I really am not a fan of the default axis labels, so let me quickly clean this up a bit.

yay <- ggplot(
  pleasework,
  aes(
    x = associate %>% reorder(diff),
    y = diff
  )
) +
  geom_col() +
  coord_flip() +
  scale_x_discrete(name = "Associated Words") +
  scale_y_continuous(name = "Difference In Associative Strength") +
  ggtitle("Word Associations For 'Man' and 'Woman'")

plot(yay)

Voila

Successes/Challenges

I managed to fulfill my goal of finishing the final exercise of the final coding series and recreating the backwards plot for the man and woman word associations.

However, it was quite the challenge. Although I showed some of my errors above, there were many other instances of sheer mental inadequacy along the way, which I have left out of this logbook entry for the sake of brevity and pride. It was tough, but ultimately a good experience: I feel a little more confident in my ability and more familiar with R.

Next steps

Now that I have finished all the coding tutorials, I will start applying what I’ve learnt to the COVID paper data, to get a sense of what awaits me and of the code I still need to learn in order to reproduce it successfully.