Goals for this week

Hey everyone, my goals for this week were to move from RStudio Cloud onto the PC version of RStudio and to complete the data wrangling modules.

Creating histograms for the group 5 project paper

After the data visualisation module, I decided to try to recreate one of the histograms from my group's paper before the Friday workshop. I downloaded the data used in the paper and imported it into RStudio.

I first had to load a few packages:

library(lm.beta)
library(lsr)
library(tidyr)
library(psych) 
library(ggplot2)

I then looked at the histogram in the paper to determine which variables I was trying to plot:

So here, I needed to plot frequency against social connectedness difference. To do this, I looked up how to make a histogram in R and learned about the hist function:

hist(Data$SCdiff, 
     xlab = "Social Connectedness Difference Score (T2 - T1)", 
     main = "Distribution of social connectedness difference scores (Study 1)", 
     cex.main = 0.75, 
     col = "darkgrey")

There are a few things to note here:

  • hist(Data$SCdiff, ...) tells R to draw a histogram of the values in the SCdiff (social connectedness difference) column of the Data dataframe.

  • xlab = sets the x-axis label.

  • main = sets the plot title.

  • cex.main = scales the title font relative to the default size (0.75 makes it 75% of the default).

  • col = sets the fill colour of the histogram bars.
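To try these arguments out without the paper's dataset, here is a minimal sketch using simulated scores. The sim_scores vector and the breaks value are my own stand-ins, not values from the paper, so treat this as an illustration only.

```r
# Simulated stand-in for Data$SCdiff (hypothetical values, not the paper's data)
set.seed(1)
sim_scores <- rnorm(200, mean = 0, sd = 1)

# hist() draws the plot and also invisibly returns the binned data
h <- hist(sim_scores,
          breaks = 10,        # a suggested number of bins; R may adjust it
          xlab = "Simulated difference score",
          main = "Distribution of simulated scores",
          cex.main = 0.75,    # title at 75% of the default size
          col = "darkgrey")   # fill colour of the bars

# The returned object holds the bin edges and the count in each bin
h$breaks
h$counts
sum(h$counts)  # every value falls into exactly one bin
```

Looking at h$breaks and h$counts is a handy way to check how R has binned the data when the plot does not look like the one in the paper.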

Something I would like to learn is how to control the height and width of the plot. I might also have to recreate the plot using geom_histogram instead to get an exact replication, as the aesthetics of my version do not exactly match the original.
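As a possible starting point for both of those goals, here is a rough sketch of the ggplot2 route. It uses simulated data in place of the paper's dataframe, and the number of bins, the colours, and the 6 x 4 inch size are all guesses on my part rather than the settings used in the paper.

```r
library(ggplot2)

# Simulated stand-in for the paper's dataframe (hypothetical values)
set.seed(1)
sim_data <- data.frame(SCdiff = rnorm(200))

# Build the histogram; bins, fill and colour are assumptions to be tuned
p <- ggplot(sim_data, aes(x = SCdiff)) +
  geom_histogram(bins = 15, fill = "darkgrey", colour = "black") +
  labs(x = "Social Connectedness Difference Score (T2 - T1)",
       y = "Frequency",
       title = "Distribution of simulated difference scores")

# ggsave() controls the size of the saved image; width and height are in inches by default
out_file <- file.path(tempdir(), "histogram_sketch.png")
ggsave(out_file, plot = p, width = 6, height = 4, dpi = 150)
```

In an RMarkdown document, the fig.width and fig.height chunk options should do the same job for the plot that appears in the knitted output.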

Data wrangling - Hello data

Piping

In the Hello data modules, I was able to get a grasp on how to use piping to make code look neat (input from exercise 4/5):

forensic_banded <- forensic %>%
  group_by(band, participant, handwriting_expert)  %>%
  summarise(mean_est = mean(est), sd_est = sd(est)) %>%
  ungroup()

# Print a small table of the data
print(forensic_banded)
## # A tibble: 478 x 5
##    band    participant handwriting_expert mean_est sd_est
##    <chr>         <dbl> <chr>                 <dbl>  <dbl>
##  1 Band 01           1 HW Expert              9.5   16.8 
##  2 Band 01           2 Novice                 6.26  14.9 
##  3 Band 01           3 Novice                31.2   44.1 
##  4 Band 01           4 Novice                31.7   26.1 
##  5 Band 01           5 Novice                23.1   15.2 
##  6 Band 01           6 Novice                 3.42   2.70
##  7 Band 01           7 Novice                12      7.20
##  8 Band 01           8 HW Expert             17.1   16.8 
##  9 Band 01           9 HW Expert              8.25   8.85
## 10 Band 01          10 HW Expert             11.8   11.7 
## # ... with 468 more rows
# Draw a plot
picture <- ggplot(data = forensic_banded) + 
  geom_violin(mapping = aes(x = band, y = mean_est)) + 
  xlab("Stimulus Band") + 
  ylab("Responses") + 
  ggtitle("Distribution of responses")

plot(picture)

The way I understand it, piping lets you essentially say “and then do this” after each step, which makes the code feel quite intuitive to read.
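To convince myself of this, it helps to compare a piped version with the same steps written as nested function calls. This is a small sketch using a made-up dataframe (toy, with hypothetical group and est columns), not the forensic data from the exercise.

```r
library(dplyr)

# Tiny made-up dataframe, only for illustration
toy <- tibble(
  group = c("a", "a", "b", "b"),
  est   = c(10, 20, 30, 50)
)

# Without the pipe: the steps have to be read from the inside out
nested <- ungroup(summarise(group_by(toy, group), mean_est = mean(est)))

# With the pipe: read top to bottom as
# "take toy, and then group it, and then summarise it, and then ungroup it"
piped <- toy %>%
  group_by(group) %>%
  summarise(mean_est = mean(est)) %>%
  ungroup()

# Compare the two results
all.equal(nested, piped)
```

Both versions run the same functions in the same order; the pipe just passes the result of each line in as the first argument of the next.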

Writing data

Next, I learned how to save the output of my code as a new csv file. In this case, I grouped and summarised the data_reasoning.csv data and wrote the summary to a new csv file called “my_data_summary.csv”:

#load packages
library(tidyverse)

#read data
frames <- read_csv(file = "data_reasoning.csv")

# piping and summarising
my_summary <- frames %>%
  group_by(test_item, condition, sample_size) %>%
  summarise(
    mean_resp = mean(response),
    sd_resp = sd(response)
  ) %>% 
  ungroup()

# write summary to file
# (readr 1.4.0 and later call this argument `file`; older versions called it `path`)
write_csv(my_summary, file = "my_data_summary.csv")
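To check that a file like this was written as expected, one option is to read it back in and compare it with the summary. The sketch below uses made-up responses and hypothetical condition labels in place of data_reasoning.csv, and writes to a temporary file so nothing in the project folder gets overwritten.

```r
library(dplyr)
library(readr)

# Made-up responses standing in for data_reasoning.csv (hypothetical values)
toy_frames <- tibble(
  condition = c("category", "category", "property", "property"),
  response  = c(4, 6, 7, 9)
)

# Same group-then-summarise pattern as above, on the toy data
toy_summary <- toy_frames %>%
  group_by(condition) %>%
  summarise(mean_resp = mean(response), sd_resp = sd(response)) %>%
  ungroup()

# Write to a temporary file rather than the working directory
tmp_file <- tempfile(fileext = ".csv")
write_csv(toy_summary, tmp_file)

# Read the file back in and compare each column with what was written
check <- read_csv(tmp_file, show_col_types = FALSE)
identical(check$condition, toy_summary$condition)
all.equal(check$mean_resp, toy_summary$mean_resp)
all.equal(check$sd_resp, toy_summary$sd_resp)
```

I used all.equal rather than identical for the numeric columns, since values that have been written to text and read back may differ by a tiny rounding error.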

This concludes the Hello data section of the data wrangling module.

Challenges and successes

Successes:

  • Transitioning off RStudio Cloud and onto RStudio.

  • Discovering the hist function and successfully using it, along with some aesthetic tweaks, to recreate a histogram from my paper.

  • Completing Hello data modules.

  • Becoming more familiar with RMarkdown for learning logs.

Challenges:

  • Not being able to understand some error messages.

  • Not being able to exactly replicate the aesthetics of the histogram in my paper using the hist function.

  • I planned to get through all of the data wrangling modules this week, but they took longer than I anticipated.

Goals for week 4

  1. Learn how to use geom_histogram for my paper.

  2. Finish the second half of the data wrangling module.

  3. Meet with my workshop group.