Hey everyone, my goals for this week were to transition onto the PC version of RStudio and to complete the data wrangling modules.
After the data visualisation module, I decided to try to recreate one of the histograms from my group's paper before the Friday workshop. I downloaded the data used in the paper and imported it into RStudio.
I first had to load a few packages:
library(lm.beta)
library(lsr)
library(tidyr)
library(psych)
library(ggplot2)
I then looked at the histogram in the paper to determine which variables I was trying to plot:
So here, I need to plot frequency against social connectedness difference. To do this, I looked up how to make a histogram in R and learned about the hist function:
hist(Data$SCdiff,
     xlab = "Social Connectedness Difference Score (T2 - T1)",
     main = "Distribution of social connectedness difference scores (Study 1)",
     cex.main = 0.75,
     col = "darkgrey")
There are a few things to note here:
hist(Data$SCdiff, ...) tells R to draw a histogram of the values in the SCdiff (social connectedness difference) column of the Data dataframe.
xlab = sets the x-axis label.
main = sets the plot title.
cex.main = scales the title font size relative to the default (0.75 makes it three quarters of the default size).
col = sets the fill colour of the histogram bars.
Something I would like to learn is how to control the height and width of the plot. I might have to recreate the plot using geom_histogram instead to get an exact replication, because the aesthetics of my version do not exactly match the figure in the paper.
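As a starting point for that, here is a minimal sketch of what a geom_histogram version might look like. It is not the paper's exact figure: demo_data is simulated stand-in data (replace it with the real Data), and the binwidth, theme, title size and output size are all guesses that would need tuning against the original. Saving with ggsave is one way to set the height and width explicitly; in an RMarkdown chunk, the fig.width and fig.height chunk options do a similar job.

```r
library(ggplot2)

# Simulated stand-in data, for illustration only (NOT the data from the paper)
set.seed(1)
demo_data <- data.frame(SCdiff = rnorm(100))

# Build the histogram; binwidth, colours and theme are assumptions to adjust
p <- ggplot(demo_data, aes(x = SCdiff)) +
  geom_histogram(binwidth = 0.5, fill = "darkgrey", colour = "black") +
  labs(
    x = "Social Connectedness Difference Score (T2 - T1)",
    y = "Frequency",
    title = "Distribution of social connectedness difference scores (Study 1)"
  ) +
  theme_classic() +
  theme(plot.title = element_text(size = 9))

# Save the plot with an explicit width and height (in inches)
out_file <- file.path(tempdir(), "sc_hist.png")
ggsave(out_file, plot = p, width = 6, height = 4, units = "in")
```

Changing width and height in the ggsave call changes the shape of the saved image without touching the plot code itself.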
In the Hello data modules, I was able to get a grasp on how to use piping to make code look neat (input from exercise 4/5):
forensic_banded <- forensic %>%
group_by(band, participant, handwriting_expert) %>%
summarise(mean_est = mean(est), sd_est = sd(est)) %>%
ungroup()
# Print a small table of the data
print(forensic_banded)
## # A tibble: 478 x 5
## band participant handwriting_expert mean_est sd_est
## <chr> <dbl> <chr> <dbl> <dbl>
## 1 Band 01 1 HW Expert 9.5 16.8
## 2 Band 01 2 Novice 6.26 14.9
## 3 Band 01 3 Novice 31.2 44.1
## 4 Band 01 4 Novice 31.7 26.1
## 5 Band 01 5 Novice 23.1 15.2
## 6 Band 01 6 Novice 3.42 2.70
## 7 Band 01 7 Novice 12 7.20
## 8 Band 01 8 HW Expert 17.1 16.8
## 9 Band 01 9 HW Expert 8.25 8.85
## 10 Band 01 10 HW Expert 11.8 11.7
## # ... with 468 more rows
# Draw a plot
picture <- ggplot(data = forensic_banded) +
geom_violin(mapping = aes(x = band, y = mean_est)) +
xlab("Stimulus Band") +
ylab("Responses") +
ggtitle("Distribution of responses")
plot(picture)
The way I understand it, piping lets you essentially say “and then do this” after each step, which makes the code feel quite intuitive to read.
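To convince myself of this, here is a small sketch comparing the same calculation written with and without the pipe. It uses the built-in mtcars dataset rather than the module data, so the column names (cyl, mpg) are just for illustration.

```r
library(dplyr)

# Nested version: has to be read from the inside out
nested <- summarise(group_by(mtcars, cyl), mean_mpg = mean(mpg))

# Piped version: take mtcars, and then group it, and then summarise it
piped <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))

# Both versions should produce the same table
stopifnot(isTRUE(all.equal(nested, piped)))
```

The piped version does exactly the same work, but the steps appear in the order they happen.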
The final step was creating a new csv file from the output of the code I had written. In this case, that meant grouping and summarising the data_reasoning.csv data and saving the result as a new csv file called “my_data_summary.csv”:
#load packages
library(tidyverse)
#read data
frames <- read_csv(file = "data_reasoning.csv")
# piping and summarising
my_summary <- frames %>%
group_by(test_item, condition, sample_size) %>%
summarise(
mean_resp = mean(response),
sd_resp = sd(response)
) %>%
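ungroup()
# write summary to file (the file name is passed by position, because newer versions of readr call this argument file rather than path)
write_csv(my_summary, "my_data_summary.csv")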
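One way to check that a file like this was written correctly is to read it straight back in and compare it with the summary. The sketch below does this with a tiny made-up dataset (toy and its values are invented for illustration, not taken from data_reasoning.csv) and a temporary file, so it does not touch any real files.

```r
library(dplyr)
library(readr)

# Invented stand-in data, for illustration only
toy <- tibble(
  condition = c("a", "a", "b", "b"),
  response  = c(1.5, 2.5, 4.0, 6.0)
)

# Same group-and-summarise pattern as above
toy_summary <- toy %>%
  group_by(condition) %>%
  summarise(mean_resp = mean(response), sd_resp = sd(response)) %>%
  ungroup()

# Write to a temporary file, then read it back in
tmp <- tempfile(fileext = ".csv")
write_csv(toy_summary, tmp)
check <- read_csv(tmp)

# The file should have the same shape and values as the summary
stopifnot(
  nrow(check) == nrow(toy_summary),
  identical(names(check), names(toy_summary)),
  isTRUE(all.equal(check$mean_resp, toy_summary$mean_resp))
)
```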
This concludes the Hello data section of the data wrangling module.
Successes:
Transitioning off RStudio Cloud and onto RStudio
Discovering and successfully implementing the hist function + aesthetics for my paper.
Completing Hello data modules.
Becoming more familiar with RMarkdown for learning logs.
Challenges:
Not being able to understand some error messages.
Not being able to exactly replicate the aesthetics of the histogram in my paper using the hist function.
I planned to get through all of the data wrangling modules this week, but they took longer than I anticipated.
Next steps:
Learn how to use geom_histogram for my paper.
Finish the second half of data wrangling module.
Meet with my workshop group.