Coding goals for this week:

One of my goals for this week was to go through the final module on data wrangling. I got through most of it and only have a few videos remaining, so I will finish watching those next week! My other goal was to practice adding summary statistics and box plots to violin plots, since I will need to know this for my group project :D

Successes and challenges:

I was able to successfully create a basic violin plot with summary statistics, followed by a more complex one that included box plots and colours, using the demo data set called “ToothGrowth”. This made me very happy!!

data("ToothGrowth")
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
head(ToothGrowth, 4)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
library(ggplot2)
theme_set(
  theme_classic() +
    theme(legend.position = "top")
  )
e <- ggplot(ToothGrowth, aes(x = dose, y = len))


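# Violin plot with the mean +/- 1 SD drawn as a point range
# (mean_sdl relies on the Hmisc package being installed)
e + geom_violin(trim = FALSE) + 
  stat_summary(
    fun.data = "mean_sdl",  fun.args = list(mult = 1), 
    geom = "pointrange", color = "black"
    )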

# Violin plot filled by dose, with a narrow box plot overlaid
e + geom_violin(aes(fill = dose), trim = FALSE) + 
  geom_boxplot(width = 0.2)+
  scale_fill_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))+
  theme(legend.position = "none")
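To see what the point range in the first plot is drawing, here is a minimal sketch that works out the same numbers by hand. It assumes that mean_sdl with mult = 1 gives the mean plus or minus one standard deviation, and the names tg and summary_by_dose are made up for this example.

```r
# Fresh copy of the demo data so the earlier objects are left alone
tg <- datasets::ToothGrowth

# Mean and standard deviation of tooth length for each dose
dose_mean <- tapply(tg$len, tg$dose, mean)
dose_sd   <- tapply(tg$len, tg$dose, sd)

# One row per dose: lower end, middle point, and upper end of the range
summary_by_dose <- data.frame(
  dose = names(dose_mean),
  ymin = as.numeric(dose_mean - dose_sd),
  y    = as.numeric(dose_mean),
  ymax = as.numeric(dose_mean + dose_sd)
)

print(summary_by_dose)
```

Each row should line up with one point range on the plot: y is the dot, and ymin and ymax are the ends of the line.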

However, this certainly did not come without challenges.

I kept getting this warning message:

Continuous x aesthetic – did you forget aes(group=…)?

and I had NO IDEA how to fix it. I sought help from Google but alas, none of the solutions offered helped. I will definitely attend the live Q and A this week to learn more about this type of warning! Attending the live Q and A sessions (and watching the recordings when I haven’t been able to make it) has been great so far, but not knowing what I don’t know unfortunately means I can’t get the most out of these sessions.
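For what it's worth, here is a sketch of one likely cause, assuming the warning came from a version of the plot where dose was still numeric when geom_boxplot() was added. I can't be sure that is what happened in my session. The object names tg, p_warn, p_factor and p_group are made up for this example.

```r
library(ggplot2)

# Fresh copy of the data: here dose is still numeric (0.5, 1, 2)
tg <- datasets::ToothGrowth

# Likely trigger: continuous x, a box plot, and no grouping
# (the warning only shows up when the plot is printed)
p_warn <- ggplot(tg, aes(x = dose, y = len)) +
  geom_boxplot(width = 0.2)

# Option 1: treat dose as a category
p_factor <- ggplot(tg, aes(x = factor(dose), y = len)) +
  geom_violin(trim = FALSE) +
  geom_boxplot(width = 0.2)

# Option 2: keep dose numeric, but tell ggplot how to group the rows
p_group <- ggplot(tg, aes(x = dose, y = len, group = dose)) +
  geom_violin(trim = FALSE) +
  geom_boxplot(width = 0.2)
```

Option 1 does the same job as the as.factor() line in the code above, so the warning may have come from running a plot before that conversion had happened.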

I was also happy that I was able to apply what I learnt about the pipe and make a dot graph using the dataset from the module!

library(tidyverse)
## ── Attaching packages ──────── tidyverse 1.3.0 ──
## ✓ tibble  2.1.3     ✓ dplyr   0.8.4
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0
## ✓ purrr   0.3.3
## ── Conflicts ─────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
frames <- read_csv(file = "data_reasoning.csv")
## Parsed with column specification:
## cols(
##   id = col_double(),
##   gender = col_character(),
##   age = col_double(),
##   condition = col_character(),
##   sample_size = col_character(),
##   n_obs = col_double(),
##   test_item = col_double(),
##   response = col_double()
## )
by_item <- frames %>%
  group_by(gender) %>%
  summarise(
    mean_response = mean(response),
    sd_response = sd(response),
    n_response = n()
  )

pic <- ggplot(
  data = by_item,
  mapping = aes(
    x = gender, 
    y = mean_response
  )
) +
geom_point()

pic + theme_minimal()

Admittedly, I did struggle with what exactly the pipe meant at first. Based on Dani’s explanations and some wider reading, pipes aim to make a chain of several functions more readable, and they seem to translate to “and then”.

E.g. by_item <- frames “and then” group_by(gender) “and then” summarise()…

All of these things, plus seeing the pipes play out in R, really helped me understand the concept a lot more!
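The “and then” reading can be checked with a small sketch that writes the same steps with and without the pipe. It uses the built-in ToothGrowth data rather than the module dataset so it runs on its own, and the names tg, nested and piped are made up for this example.

```r
library(dplyr)

tg <- datasets::ToothGrowth

# Without the pipe: the calls are nested and read from the inside out
nested <- summarise(group_by(tg, supp), mean_len = mean(len))

# With the pipe: take tg, and then group it, and then summarise it
piped <- tg %>%
  group_by(supp) %>%
  summarise(mean_len = mean(len))

# Both versions should give the same table
all.equal(nested, piped)
```

The pipe passes whatever is on its left in as the first argument of the function on its right, which is why the two versions match.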

Next steps in my coding journey:

In the coming week, I aim to finish the remaining videos from the data wrangling module, learn more about how to create violin plots with box plots included for the upcoming group project, and meet with my group! :D