Welcome to the PSYC3361 coding W2 self test. The test assesses your ability to use the coding skills covered in the Week 2 online coding modules.

In particular, it assesses your ability to…

choose packages/functions
read in data
make a scatter plot
use facet_wrap
customise your plot with themes, colours and labels
use group_by and summarise

It is IMPORTANT to document the code that you write so that someone who is looking at your code can understand what it is doing. Above each chunk, write a few sentences outlining which packages/functions you have chosen to use and what the function is doing to your data. Where relevant, also write a sentence that interprets the output of your code.

Your notes should also document the troubleshooting process you went through to arrive at the code that worked.

For each of the challenges below, the documentation is JUST AS IMPORTANT as the code.

Good luck!!

Jenny

PS- if you get stuck have a look in the /images folder for inspiration

load the packages you need

This task requires the tidyverse and here packages. The tidyverse provides read_csv() for reading in the data, ggplot() for plotting, and group_by() and summarise() for creating grouped summaries. The here package makes it easy to tell R where the data is when you are reading it in.

library(tidyverse)
library(here)

read in the dino data

To read the data, I created a data frame called “dino” to store the dino data, and used the read_csv() function together with here() to tell R to find dino.csv within the data folder.

dino <- read_csv(here("data", "dino.csv"))

separating data to include only dino

I created another data frame called “dino_only”. I filtered the data to keep only the rows where the dataset column is exactly equal to “dino”; the “==” operator tests for exact equality. Grouping by dataset first is not needed here, because filter() checks each row on its own.

dino_only <- dino %>% 
  filter(dataset == "dino")

reproduce this plot

The dino dataset comes from a paper illustrating the importance of plotting your data. In each of these datasets, the mean and variance of x and y are identical and the two variables are correlated in the same way (r = -0.06). When plotted, however, each reveals a very different pattern.

picture <- ggplot(data = dino_only) +
  geom_point(mapping = aes(x, y))
plot(picture)

dino %>%
  ggplot(aes(x = x, y = y)) +
  geom_point() + 
  geom_smooth(method = "lm") +
  facet_wrap(~ dataset) +
  scale_y_continuous(limits = c(0,100))

## `geom_smooth()` using formula = 'y ~ x'

what can you do to make it prettier?

HINT: add some colour, play with palettes, try a different theme, add a title, subtitle, caption

picture <- ggplot(data = dino_only) +
  geom_point(mapping = aes(x, y), colour = "green4") +
  theme_minimal() +
  ggtitle(label = "Dino Graph", subtitle = "Wow dinos are so cool") +
  labs(caption = "This is a caption")
plot(picture)

dino %>%
  ggplot(aes(x = x, y = y, colour = x)) +
  geom_point() + 
  scale_color_gradientn(colours = rainbow(5)) +
  facet_wrap(~ dataset) +
  scale_y_continuous(limits = c(0,100)) + 
  theme_minimal() +
  ggtitle(label = "Plotting Graphs", subtitle = "Each graph has its own unique shape") +
  labs(caption = "Plotting graphs is an important skill to learn")

extra challenge

Can you write code to show that the mean and variance of x and y, and the correlation between them, are the same for each of the datasets? HINT: this is a group_by and summarise problem

dino %>% 
  group_by(dataset) %>% 
  summarise(
    mean_x = mean(x), 
    mean_y = mean(y), 
    var_x = var(x), 
    var_y = var(y)
  )

## # A tibble: 13 × 5
##    dataset    mean_x mean_y var_x var_y
##    <chr>       <dbl>  <dbl> <dbl> <dbl>
##  1 away         54.3   47.8  281.  726.
##  2 bullseye     54.3   47.8  281.  726.
##  3 circle       54.3   47.8  281.  725.
##  4 dino         54.3   47.8  281.  726.
##  5 dots         54.3   47.8  281.  725.
##  6 h_lines      54.3   47.8  281.  726.
##  7 high_lines   54.3   47.8  281.  726.
##  8 slant_down   54.3   47.8  281.  726.
##  9 slant_up     54.3   47.8  281.  726.
## 10 star         54.3   47.8  281.  725.
## 11 v_lines      54.3   47.8  281.  726.
## 12 wide_lines   54.3   47.8  281.  726.
## 13 x_shape      54.3   47.8  281.  725.
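
The summary above covers the means and variances but not the correlation that the challenge also asks about. A minimal sketch of how a correlation column could be added is shown below. It uses a small made-up data frame called toy (a hypothetical stand-in, not the dino data) so that it runs on its own; the column name cor_xy is also just an illustrative choice.

```r
library(tidyverse)

# Hypothetical toy data (NOT the dino data): two groups with made-up values
toy <- tibble(
  dataset = rep(c("a", "b"), each = 4),
  x = c(1, 2, 3, 4, 1, 2, 3, 4),
  y = c(2, 4, 6, 8, 8, 6, 4, 2)
)

# Same group_by + summarise pattern as above, with one extra line for cor()
toy_summary <- toy %>% 
  group_by(dataset) %>% 
  summarise(
    mean_x = mean(x), 
    mean_y = mean(y), 
    var_x = var(x), 
    var_y = var(y), 
    cor_xy = cor(x, y)  # Pearson correlation between x and y within each group
  )

toy_summary
```

Swapping toy for dino in this pipeline should add a cor_xy column to the table above, which could then be compared against the correlation of about -0.06 mentioned earlier.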

Wk 2 self test

Daphne Ly

2023-06-08