My goals for the week

  1. This week’s coding module had two parts, so I planned to start watching the videos early to leave enough time to work through them. My goal was to finish the exercises before the Q&A session so that I could come with questions prepared; however, I was unable to do so. This forced me to solve my problems independently, which I’m grateful for (as arduous as the journey was!).

  2. To learn more about data wrangling and practise using functions that might be useful for the reproducibility project. In particular, I wanted a solid understanding of not just how to do something, but also why those functions exist and what exactly they do.

How did I go?

# Exercise 1

# Load the packages needed

library(tidyverse)
## ── Attaching packages ────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.2.1     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.4
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ── Conflicts ───────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
# Import the swow data

swow <- read_tsv(file = "data_swow.csv.zip")
## Multiple files in zip: reading 'swow.csv'
## Parsed with column specification:
## cols(
##   cue = col_character(),
##   response = col_character(),
##   R1 = col_double(),
##   N = col_double(),
##   R1.Strength = col_double()
## )
swow <- swow %>% mutate(id = 1:n())

# Adding code to rename variables

swow <- swow %>%
  rename(
    n_response = R1,
    n_total = N,
    strength = R1.Strength
    )
# Exercise 2: Backward associates to the word "woman"

woman_bck <- swow %>%
  filter(response == "woman", n_response > 1)
# Exercise 3: Forward and backward associates to the word "man" in descending order

# Forward associates 

man_fwd <- swow %>%
  filter(cue == "man", n_response > 1)

# Backward associates

man_bck <- swow %>%
  filter(response == "man", n_response > 1) %>%
  arrange(desc(strength))
# Exercise 4: Using select() to remove the n_response and n_total columns

# Forward associates

man_fwd <- swow %>%
  filter(cue == "man", n_response > 1) %>%
  select(-n_response, -n_total)

# Backward associates

man_bck <- swow %>%
  filter(response == "man", n_response > 1) %>%
  arrange(desc(strength)) %>%
  select(-starts_with("n_"))
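# Exercise 5: Adding the mutate code to the "man_fwd" and "man_bck" variables

# Forward associates

man_fwd <- swow %>%
  filter(cue == "man", n_response > 1) %>%
  select(-n_response, -n_total) %>%
  mutate(
    rank = rank(-strength),  # negating strength gives rank 1 to the strongest associate
    type = "forward",        # direction of the association
    word = "man",            # the target word
    associate = response     # forward: the associate is the response
  )

# Backward associates

man_bck <- swow %>%
  filter(response == "man", n_response > 1) %>%
  arrange(desc(strength)) %>%
  select(-starts_with("n_")) %>%
  mutate(
    rank = rank(-strength),  # negating strength gives rank 1 to the strongest associate
    type = "backward",       # direction of the association
    word = "man",            # the target word
    associate = cue          # backward: the associate is the cue
  )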
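# Exercise 6: Creating the 'gender' variable using bind_rows()

# Note: woman_bck from Exercise 2 has no rank, type, word or associate columns,
# so bind_rows() fills them with NA and the filter() below drops those rows.
# Exercise 7 adds the missing mutate() step.

gender <- bind_rows(woman_bck, man_fwd, man_bck) %>%
  select(id:associate) %>%
  filter(associate != "man", associate != "woman")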
# Exercise 7: Trying to re-create the plot shown in the slides 

# load packages
library(tidyverse)

# import data
swow <- "data_swow.csv.zip" %>%
  read_tsv() %>%           
  mutate(id = 1:n()) %>%  
  rename(
    n_response = R1,       
    n_total = N,           
    strength = R1.Strength   
  )
## Multiple files in zip: reading 'swow.csv'
## Parsed with column specification:
## cols(
##   cue = col_character(),
##   response = col_character(),
##   R1 = col_double(),
##   N = col_double(),
##   R1.Strength = col_double()
## )
# words associated with "man" and "woman" ---------------------------------

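woman_fwd <- swow %>%
  filter(cue == "woman", n_response > 10) %>%  # responses to "woman" given more than 10 times
  select(cue, response, strength, id) %>%      # keep only the columns needed
  mutate(
    rank = rank(-strength),  # rank 1 = strongest associate
    type = "forward",        # direction of the association
    word = "woman",          # the target word
    associate = response     # forward: the associate is the response
  )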

woman_bck <- swow %>%
  filter(response == "woman", n_response > 10)  %>%
  arrange(desc(strength)) %>%
  select(cue, response, strength, id) %>%
  mutate(
    rank = rank(-strength),  
    type = "backward",       
    word = "woman",          
    associate = cue          
  )

man_fwd <- swow %>%
  filter(cue == "man", n_response > 10)  %>%
  select(-n_response, -n_total)   %>%
  mutate(
    rank = rank(-strength),  
    type = "forward",        
    word = "man",           
    associate = response    
  )

man_bck <- swow %>%
  filter(response == "man", n_response > 10) %>%
  arrange(desc(strength)) %>%
  select(-starts_with("n_")) %>% 
  mutate(
    rank = rank(-strength),  
    type = "backward",       
    word = "man",            
    associate = cue          
  )


# combining the data sets ---------------------------------------------------

gender <- bind_rows(woman_fwd, woman_bck, 
                    man_fwd, man_bck) %>%
  select(id:associate) %>%
  filter(associate != "man", associate != "woman")


# creating and plotting "gender_bck" ----------------------------------------------

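gender_bck <- gender %>%
  filter(
    type == "backward"             # keep only the backward associations
  ) %>%
  pivot_wider(                     # one row per associate, one column per word
    id_cols = associate,
    names_from = word,
    values_from = rank
  ) %>%
  mutate(
    woman = replace_na(1/woman, 0),  # reciprocal rank; 0 if not an associate of "woman"
    man = replace_na(1/man, 0),      # reciprocal rank; 0 if not an associate of "man"
    diff = woman - man               # positive = more strongly linked to "woman"
  )  %>%
  arrange(diff)                    # sort from most "man" to most "woman"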


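picture_bck <- ggplot(
  data = gender_bck,
  mapping = aes(
    x = associate %>% reorder(diff),  # order the bars by diff
    y = diff
  )) +
  geom_col() +    # one bar per associate
  coord_flip()    # flip the axes so the associates are listed down the side

plot(picture_bck)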
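
To see what the pivot_wider and mutate steps in "gender_bck" are doing, here is a minimal sketch on a tiny made-up table. The associates and ranks in "toy" are invented for illustration and are not taken from the swow data; the names "toy" and "toy_wide" are hypothetical too.

```r
library(dplyr)
library(tidyr)

# Made-up ranks for illustration only (not from the swow data)
toy <- tibble(
  associate = c("lady", "lady", "boy", "girl"),
  word      = c("woman", "man", "man", "woman"),
  rank      = c(1, 4, 2, 2)
)

toy_wide <- toy %>%
  # one row per associate, with a "woman" column and a "man" column of ranks;
  # an associate that is missing for one word gets NA in that column
  pivot_wider(id_cols = associate, names_from = word, values_from = rank) %>%
  mutate(
    woman = replace_na(1 / woman, 0),  # reciprocal rank, 0 where the rank was NA
    man   = replace_na(1 / man, 0),
    diff  = woman - man
  ) %>%
  arrange(diff)

print(toy_wide)
```

Because a rank is never smaller than 1, each reciprocal rank lies between 0 and 1, so "diff" always falls between -1 and +1. It can only reach the ends of that range when an associate is ranked first for one word and is missing for the other, so which rows survive the earlier filter steps affects how wide the plot looks.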

Challenges

  • Exercise 3: When I tried to use the arrange function, I kept getting an error saying that the object ‘strength’ wasn’t found, even though I had renamed the original ‘R1.Strength’ variable to ‘strength’. When I removed the “arrange(desc(strength))” code from the forward associations, the issue resolved. That line had looked a bit off to me while I was coding, so I’m glad that my initial intuition was correct.

  • Exercise 6: I was confused by the wording of the instructions. I wasn’t sure what exactly “boring” associations were, so I refrained from doing anything because I didn’t want to mess up the data. I kind of cheated by looking at exercise 7 for guidance on that particular instruction. Do I regret it? No. Do I wish I could’ve figured it out on my own? Yes, but my hair might’ve turned grey by then.

  • Exercise 7: This was a CHALLENGE. Initially, I looked at the differences between the two plots and noticed that the one on the slides looked narrower. I thought that using ‘pivot_longer’ instead of ‘pivot_wider’ might get my plot looking more like the one on the slide. However, I kept getting error messages, so I wasn’t sure what to do. I then realised that the ‘diff’ variable on the y axis ranged from -0.5 to +0.5 in my plot, whereas it ranged from -1 to +1 on the slide. So I thought maybe changing the n_response threshold in the filter function would help. I changed it from 1 to 5 and noticed that the number of associates had decreased. It was then a process of trial and error, changing the number until the plot looked the same.

  • General: One of the tasks in exercise 7 was to explain what the code does by leaving comments. I realised that while I was able to write the code, I wasn’t actually sure what it was doing. This is a bit of an issue because leaving comments that explain what the code is doing is kind of…fundamental really, especially when sharing your data. I might need to explore that skill more, but I’m not exactly sure how.
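
One possible way to work out what each step of a pipeline does is to run it one step at a time on a tiny table and look at the result after every step. The sketch below does this for the filter threshold and the rank step discussed above. The table "toy_swow" and its values are invented for illustration (they are not the real swow data), and the names "step1_low", "step1_high" and "step2" are hypothetical.

```r
library(dplyr)

# Hypothetical mini version of the swow data (values invented for illustration)
toy_swow <- tibble(
  cue        = c("man", "man", "man", "man"),
  response   = c("woman", "boy", "person", "beard"),
  n_response = c(40, 12, 6, 1),
  strength   = c(0.40, 0.12, 0.06, 0.01)
)

# Step 1: keep only responses given more often than a threshold.
# A stricter threshold keeps fewer rows, so fewer associates reach the plot.
step1_low  <- toy_swow %>% filter(cue == "man", n_response > 1)
step1_high <- toy_swow %>% filter(cue == "man", n_response > 10)

print(nrow(step1_low))   # number of rows kept with the looser threshold
print(nrow(step1_high))  # number of rows kept with the stricter threshold

# Step 2: rank(-strength) gives rank 1 to the row with the largest strength
step2 <- step1_high %>% mutate(rank = rank(-strength))
print(step2)
```

Printing each intermediate table makes it easier to write the comment for that step, because the comment only has to describe the change that is visible between one table and the next.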

Strengths

  • This week was slightly more difficult, for a number of reasons. I do think that I am getting more comfortable with coding, with making mistakes, and with persevering through it all. My initial feeling about whether or not the code looks right tends to be correct, and I’m happy that I’m learning!