From the Top

Everything here covers my overall thoughts; I still want to keep the admittedly long-winded code inputs and outputs further down so I can refer back to my own challenges and how they were fixed, as seen in the code. I would like to use these learning logs as a self-reference for future improvement.

Goals For Week 3

  1. Install R onto both my desktop and laptop, and understand how to export all the files and get them working together

  2. Attend the Tuesday Q&A session

  3. Complete all Data Wrangling videos

  4. Experiment with R Markdown and just have fun with it, exploring all the different possible options

How Week 3 Went

Overall, I think this was a much harder week than the last two in terms of absorbing the overall content. There was a lot more to take in regarding functions and the many ways they can interact, in particular the use of the pipe. I know their general purpose, but understanding them on a more fundamental and deeper level is going to take some time and practice. I hit a lot of errors along the way, but it’s just par for the course with coding :)

Challenges

The challenges are mentioned in the code but listed here as well:

  • The main challenge I had was exporting files from the online RStudio into the desktop versions and working with the different working directories. Trying to sort that out was a bit of a nightmare, but I eventually got it down.

  • I’ve learnt that running code in R Markdown usually requires the output to be fully knitted rather than just running the chunk itself, especially when some components are separated across chunks. This took some time to get used to.

  • Another thing I’ve noticed is that I still don’t fully understand the utility of each function. I think this comes from a lack of self-practice and experimentation, and I aim to do more in the coming weeks.

  • The last big challenge I’ve had is time management and workflow. I tend to get the lectures and learning log done within the last two days instead of working consistently across the week. I really aim to change this as well.

Successes

  • The main error was sorting out the working directory, but apart from that I was for the most part able to solve each error that popped up much more easily than in previous weeks, since I’m getting used to spotting what to look out for in the error listed in the console. So dealing with these errors has been a personal success.

  • I was able to solve most of my issues on my own. Of course this isn’t necessarily a good thing, since getting help from others is extremely valuable, but if it comes down to it, I’m getting more used to the pressure of dealing with coding issues.

  • The last success is that I completed three of the four coding goals I had set for the week.

Next steps in my coding journey

  1. Improving workflow and time management with regard to the workload
  2. Reviewing content from weeks 1 to 3 in preparation for the group project
  3. Experimenting with designs and exploring how far I can truly take my coding skills!

Just Getting Started

"load packages"
## [1] "load packages"
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.4     v purrr   0.3.4
## v tibble  3.1.2     v dplyr   1.0.6
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
"read the data"
## [1] "read the data"
frames <- read.csv(file = "data_reasoning.csv")

Inspecting the data

This would normally sit in a code chunk with the command print(frames) to inspect all of the data; however, there are just WAY too many observations to actually output it here.
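A few options I’ve since come across for peeking at a large data frame without printing all of it (a quick sketch using standard base R and tidyverse functions, not something from the lecture itself):

head(frames)      # just the first six rows
glimpse(frames)   # one line per column, with the type of each
nrow(frames)      # how many observations there are in total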

Learning about The Pipe

"Utilising the pipe"  
## [1] "Utilising the pipe"
frames %>% group_by(test_item) %>% summarise(mean_resp = mean(response)) 
## # A tibble: 7 x 2
##   test_item mean_resp
##       <int>     <dbl>
## 1         1      6.77
## 2         2      6.88
## 3         3      5.71
## 4         4      4.48
## 5         5      3.76
## 6         6      3.43
## 7         7      3.26

One note: running the chunk above by itself does not output anything and instead gives an error about the pipe function. In the overall knit, however, it does produce the output.
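For my own reference, the way I currently understand the pipe: x %>% f(y) is just another way of writing f(x, y), with the left-hand side passed in as the first argument. So the chunk above should be equivalent to this nested version:

summarise(group_by(frames, test_item), mean_resp = mean(response))   # same result, no pipes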

"No grouping statement = grand mean"
## [1] "No grouping statement = grand mean"
frames %>% summarise(mean_resp = mean(response), sd_resp = sd(response))
##   mean_resp  sd_resp
## 1  4.898201 3.042252
"same code but tidier and breaking down all the variables"
## [1] "same code but tidier and breaking down all the variables"
my_summary <- frames %>% 
  group_by(test_item, condition, sample_size) %>% 
  summarise(
    mean_resp = mean(response), 
    sd_resp = sd(response)
  )
## `summarise()` has grouped output by 'test_item', 'condition'. You can override using the `.groups` argument.
print(my_summary)
## # A tibble: 42 x 5
## # Groups:   test_item, condition [14]
##    test_item condition sample_size mean_resp sd_resp
##        <int> <chr>     <chr>           <dbl>   <dbl>
##  1         1 category  large            7.60    2.36
##  2         1 category  medium           7.32    2.49
##  3         1 category  small            6.07    2.82
##  4         1 property  large            7.16    2.23
##  5         1 property  medium           6.66    2.40
##  6         1 property  small            5.78    2.57
##  7         2 category  large            7.51    2.01
##  8         2 category  medium           7.17    1.99
##  9         2 category  small            6.26    2.28
## 10         2 property  large            7.20    1.84
## # ... with 32 more rows
"Ungrouping, and it get rids of all the groups - a wise programming strategy but I am unsure as to why"
## [1] "Ungrouping, and it get rids of all the groups - a wise programming strategy but I am unsure as to why"
my_summary <- frames %>% 
  group_by(test_item, condition, sample_size) %>% 
  summarise(
    mean_resp = mean(response), 
    sd_resp = sd(response)
  ) %>% 
  ungroup()
## `summarise()` has grouped output by 'test_item', 'condition'. You can override using the `.groups` argument.
print(my_summary)
## # A tibble: 42 x 5
##    test_item condition sample_size mean_resp sd_resp
##        <int> <chr>     <chr>           <dbl>   <dbl>
##  1         1 category  large            7.60    2.36
##  2         1 category  medium           7.32    2.49
##  3         1 category  small            6.07    2.82
##  4         1 property  large            7.16    2.23
##  5         1 property  medium           6.66    2.40
##  6         1 property  small            5.78    2.57
##  7         2 category  large            7.51    2.01
##  8         2 category  medium           7.17    1.99
##  9         2 category  small            6.26    2.28
## 10         2 property  large            7.20    1.84
## # ... with 32 more rows
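Having thought about it some more, I believe the reason ungrouping is a wise strategy is that a grouped tibble stays grouped, so any later verbs quietly keep operating within the groups. A small sketch of how that could bite, reusing my_summary from above (the grand_mean name is just made up for illustration):

my_summary %>% 
  group_by(condition) %>% 
  mutate(grand_mean = mean(mean_resp))   # computed per condition, because groups remain

my_summary %>% 
  mutate(grand_mean = mean(mean_resp))   # ungrouped: one mean across the whole table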

Attempting Exercise 3

forensic <- read.csv("data_forensic.csv")
"Participant 1 summary - unecessarily complex since reading it inside out"
## [1] "Participant 1 summary - unecessarily complex since reading it inside out"
participant1 <- ungroup(
  summarise(
    group_by(
      filter(forensic, participant == 1),
      band
    ),
    mean = mean(est), 
    sd = sd(est)  
  )
)

print(participant1)
## # A tibble: 6 x 3
##   band     mean    sd
##   <chr>   <dbl> <dbl>
## 1 Band 01   9.5  16.8
## 2 Band 25  46.7  31.2
## 3 Band 50  53.5  28.3
## 4 Band 75  50.4  24.1
## 5 Band 99  80.9  29.1
## 6 Band NA  NA    NA

One thing I think I’ve just resolved about the earlier problem of a chunk not running by itself: loading the data happens in a separate chunk from the chunk containing the code, so when I run the code chunk on its own the data isn’t registered, whereas when knitting, the chunks run in tandem and it works.

"Participant 2 Summary"
## [1] "Participant 2 Summary"
x <- filter(forensic, participant == 2)
y <- group_by(x, band)
z <- summarise(y, mean = mean(est), sd = sd(est))
participant2 <- ungroup(z)

print(participant2)
## # A tibble: 5 x 3
##   band     mean    sd
##   <chr>   <dbl> <dbl>
## 1 Band 01  6.26 14.9 
## 2 Band 25 10.9   9.62
## 3 Band 50 17.6  15.9 
## 4 Band 75 21.9  16.5 
## 5 Band 99 49.5  20.6
"Participant 3 Summary"
## [1] "Participant 3 Summary"
participant3 <- forensic %>%
  filter(participant == 3) %>%
  group_by(band) %>%
  summarise(mean = mean(est), sd = sd(est)) %>%
  ungroup()

print(participant3)
## # A tibble: 5 x 3
##   band     mean    sd
##   <chr>   <dbl> <dbl>
## 1 Band 01  31.2  44.1
## 2 Band 25  46.2  49.4
## 3 Band 50  48.3  47.8
## 4 Band 75  60.7  47.8
## 5 Band 99  85    33.7

Overall, it seems like the participant 3 version is the cleanest, though I’m still not fully sure why; I can’t really see the nuance in why it is much cleaner, or the difference between the participant 2 and 3 versions, although the difference from participant 1 is starkly clear. The basic idea is that the grouping goes first (group_by for the variable groupings) and summarise (for results like the mean and sd) is used second, working hand in hand. You then ungroup so the results are not treated as grouped again.

Writing Data

After pressing Ctrl+Shift+S (sourcing), it will create a new data file in the Files pane for you.

my_summary <- frames %>% 
  group_by(test_item, condition, sample_size) %>% 
  summarise(
    mean_resp = mean(response), 
    sd_resp = sd(response)
  ) %>% 
  ungroup()
## `summarise()` has grouped output by 'test_item', 'condition'. You can override using the `.groups` argument.
write_csv(my_summary, path = "my_data_summary.csv")
## Warning: The `path` argument of `write_csv()` is deprecated as of readr 1.4.0.
## Please use the `file` argument instead.
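Going by the warning above, the non-deprecated version of that call would simply swap path for file:

write_csv(my_summary, file = "my_data_summary.csv")   # same output, no warning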

Part II - dplyr, or a dance with data

Importing the swow data

"read the data"
## [1] "read the data"
swow <- readr::read_tsv("data_swow.csv.zip")
## ! Multiple files in zip: reading ''swow.csv''
## 
## -- Column specification --------------------------------------------------------
## cols(
##   cue = col_character(),
##   response = col_character(),
##   R1 = col_double(),
##   N = col_double(),
##   R1.Strength = col_double()
## )
swow <- swow %>% mutate(id = 1:n())

One thing I wanted to note here is that I was often having trouble finding read_csv for the earlier imports, so I just went with the function read.csv. However, I have now found the package that provides read_csv by searching up how to get read_tsv. Searching that package up gave me far more options than I usually get just from the automatic fill-in.
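For future reference, a quick sketch of the difference as I currently understand it (the package in question is readr, part of the tidyverse; the variable names here are just for illustration):

frames_base <- read.csv("data_reasoning.csv")          # base R: returns a plain data.frame
frames_tidy <- readr::read_csv("data_reasoning.csv")   # readr: returns a tibble, which prints more compactly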

Automatic name cleaning

library(janitor)
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
swow <- clean_names(swow)
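As I understand it, clean_names() converts the column names to snake_case, so for this data R1, N and R1.Strength should become r1, n and r1_strength. A quick way to check:

names(swow)   # the cleaned column names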

What is better: manual name cleaning

swow <- readr::read_tsv("data_swow.csv.zip")
## ! Multiple files in zip: reading ''swow.csv''
## 
## -- Column specification --------------------------------------------------------
## cols(
##   cue = col_character(),
##   response = col_character(),
##   R1 = col_double(),
##   N = col_double(),
##   R1.Strength = col_double()
## )
swow <- swow %>% mutate(id = 1:n())

swow <- swow %>% 
  rename(n_response = R1, n_total = N, strength = R1.Strength)

I was having an issue where it wouldn’t read the swow data even though it is clearly in the lines above. I just checked again and now it works, which is really strange. No clue what the initial cause of the error was.

Data Filtering, Arranging, Selection

woman_fwd <- swow %>% 
  filter(cue == "woman",
         n_response > 1)

ggplot(woman_fwd) + 
    geom_col(aes(
      x = response, 
      y = strength 
    )) + 
    coord_flip()

Attempting Exercise 2 - Backwards Association

woman_bck <- swow %>% 
  filter(response == "woman",
         n_response > 1) %>% 
  arrange(desc(strength))

Forward and Backward associations for man

man_fwd <- swow %>%  
  filter(cue == "man", n_response > 1)

man_bck <- swow %>%  
  filter(response == "man", n_response > 1) %>% 
  arrange(desc(strength))

Selecting variables

woman_fwd <- swow %>% 
  filter(cue == "woman", n_response > 1) %>%
  select(cue, response, strength, id)

print(woman_fwd)
## # A tibble: 8 x 4
##   cue   response strength     id
##   <chr> <chr>       <dbl>  <int>
## 1 woman man          0.38 477315
## 2 woman female       0.22 477316
## 3 woman girl         0.07 477317
## 4 woman lady         0.05 477318
## 5 woman beauty       0.02 477319
## 6 woman me           0.02 477320
## 7 woman strong       0.02 477321
## 8 woman wife         0.02 477322

For a while the id variable couldn’t be found, and the reason was that I had not created it with mutate() when reading the data - I had deleted that step along with the automatic clean-up version when I opted for the manual one.

Testing Mutate once more

woman_fwd %>% 
  mutate(rank = rank(-strength))
## # A tibble: 8 x 5
##   cue   response strength     id  rank
##   <chr> <chr>       <dbl>  <int> <dbl>
## 1 woman man          0.38 477315   1  
## 2 woman female       0.22 477316   2  
## 3 woman girl         0.07 477317   3  
## 4 woman lady         0.05 477318   4  
## 5 woman beauty       0.02 477319   6.5
## 6 woman me           0.02 477320   6.5
## 7 woman strong       0.02 477321   6.5
## 8 woman wife         0.02 477322   6.5
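One thing in the output above that confused me at first: the four responses tied at a strength of 0.02 all get a rank of 6.5. That is because rank() by default gives tied values the average of the ranks they would span, here (5 + 6 + 7 + 8) / 4 = 6.5. A quick check on the strengths alone:

rank(-c(0.38, 0.22, 0.07, 0.05, 0.02, 0.02, 0.02, 0.02))   # 1 2 3 4 6.5 6.5 6.5 6.5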

What I’ve done above has been a bit chaotic and unorganised, because of the small tricks I tested here and there, sometimes doubling up for a comparison. Therefore, for the rest of the lectures I would like a fresh start.

A fresh start with Binding and Pivoting

"forward associations for woman"
## [1] "forward associations for woman"
woman_fwd <- swow %>% 
  filter(cue == "woman", n_response > 1) %>% 
  select(cue, response, strength, id) %>%
  mutate(
    rank = rank(-strength),
    type = "forward", 
    word = "woman", 
    association = response
  )
"backward associations for woman"
## [1] "backward associations for woman"
woman_bck <- swow %>% 
  filter(response == "woman", n_response > 1) %>% 
  arrange(desc(strength)) %>% 
  select(-n_response, -n_total) %>%
  mutate(
    rank = rank(-strength),
    type = "backward", 
    word = "woman", 
    association = cue
  )
"forward associations for man"
## [1] "forward associations for man"
man_fwd <- swow %>% 
  filter(cue == "man", n_response > 1) %>% 
  select(cue, response, strength, id) %>%
  mutate(
    rank = rank(-strength),
    type = "forward", 
    word = "man", 
    association = response
  )
"backward associations for man"
## [1] "backward associations for man"
man_bck <- swow %>% 
  filter(response == "man", n_response > 1) %>% 
  select(-n_response, -n_total) %>%
  arrange(desc(strength)) %>% 
  mutate(
    rank = rank(-strength),
    type = "backward", 
    word = "man", 
    association = cue
  )

Binding the variables

gender <- bind_rows(
  woman_fwd, woman_bck, man_fwd, man_bck) %>% 
  select(id:association) %>% 
  filter(association != "man", association !="woman")
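To make sure I understand the step above: bind_rows() stacks data frames with matching columns on top of each other, and select(id:association) keeps every column from id through to association. A minimal sketch with made-up toy tibbles:

a <- tibble(x = 1:2, y = c("a", "b"))
b <- tibble(x = 3:4, y = c("c", "d"))
bind_rows(a, b)   # four rows: a stacked on top of b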

Pivoting the Data

love <- read_csv("data_love.csv")
## 
## -- Column specification --------------------------------------------------------
## cols(
##   colour = col_character(),
##   heart = col_character(),
##   book = col_character()
## )
pivot_longer(data = love, cols = c(heart, book), names_to = "object", values_to = "emoji")
## # A tibble: 8 x 3
##   colour object emoji       
##   <chr>  <chr>  <chr>       
## 1 blue   heart  "\U0001f499"
## 2 blue   book   "\U0001f4d8"
## 3 green  heart  "\U0001f49a"
## 4 green  book   "\U0001f4d7"
## 5 yellow heart  "\U0001f49b"
## 6 yellow book   "\U0001f4d2"
## 7 orange heart  "\U0001f9e1"
## 8 orange book   "\U0001f4d9"
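As I understand it, pivot_wider() is the inverse of pivot_longer(), so feeding the long table back in with names_from and values_from should recreate the original layout. A quick sketch (love_long is just my name for the result above):

love_long <- pivot_longer(data = love, cols = c(heart, book), names_to = "object", values_to = "emoji")
pivot_wider(love_long, names_from = object, values_from = emoji)   # back to one row per colour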

Reshaping the data

gender_fwd <- gender %>% 
  filter(
    type == "forward"
  ) %>% 
  pivot_wider(
    id_cols = association, 
    names_from = word,
    values_from = rank
  ) %>% 
  mutate(
    woman = (1/woman) %>% replace_na(0),
    man = (1/man) %>% replace_na(0),
    diff = woman - man
  )

ggplot(data = gender_fwd) + 
  geom_col(mapping = aes(
    y = diff,
    x = association %>% reorder(diff)
  )) + 
  coord_flip()
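To unpack the mutate() step above for myself: 1/rank turns rank 1 into the biggest score, words that never appear for one of the cues come out as NA after pivoting, and replace_na() turns those into 0 so the difference can still be computed. A tiny check on a made-up vector:

replace_na(1 / c(1, 2, 4, NA), 0)   # 1.00 0.50 0.25 0.00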

Another thing that I’ve realised is that I copy the code while Dani is on the slides, without paying attention to a lot of the key functions and the reasons why we use them, just using her typing it up as confirmation. It shouldn’t be this way: I should listen first, then type in the code.