Welcome to the PSYC3361 coding W2 self test. The test assesses your ability to use the coding skills covered in the Week 2 online coding modules.

In particular, it assesses your ability to…

  • choose packages/functions
  • read in data
  • make a scatter plot
  • use facet_wrap
  • customise your plot with themes, colours and labels
  • use group_by and summarise

It is IMPORTANT to document the code that you write so that someone who is looking at your code can understand what it is doing. Above each chunk, write a few sentences outlining which packages/functions you have chosen to use and what the function is doing to your data. Where relevant, also write a sentence that interprets the output of your code.

Your notes should also document the troubleshooting process you went through to arrive at the code that worked.

For each of the challenges below, the documentation is JUST AS IMPORTANT as the code.

Good luck!!

Jenny

PS: if you get stuck, have a look in the /images folder for inspiration.

Load the packages you need

library(tidyverse)

Read in the dino data

dino_data <- read_csv("data/dino.csv")
print(dino_data)
## # A tibble: 1,846 × 3
##    dataset     x     y
##    <chr>   <dbl> <dbl>
##  1 dino     55.4  97.2
##  2 dino     51.5  96.0
##  3 dino     46.2  94.5
##  4 dino     42.8  91.4
##  5 dino     40.8  88.3
##  6 dino     38.7  84.9
##  7 dino     35.6  79.9
##  8 dino     33.1  77.6
##  9 dino     29.0  74.5
## 10 dino     26.2  71.4
## # ℹ 1,836 more rows
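
The printout shows only the first ten rows, all of them from the dino dataset. One possible way to see which datasets the file contains, and how many rows each has, is to count the dataset column. The chunk below is a minimal sketch: toy_data is a made-up stand-in (same column names, invented values) so that it runs on its own, and dataset_counts is just an illustrative name. In the self test you would pipe dino_data into count() instead.

```r
library(tidyverse)

# toy_data is a hypothetical stand-in for dino_data (invented values)
toy_data <- tibble(
  dataset = c("dino", "dino", "dino", "star", "star"),
  x = c(55.4, 51.5, 46.2, 58.2, 58.2),
  y = c(97.2, 96.0, 94.5, 91.9, 92.2)
)

# count() returns one row per dataset, with the number of rows in column n
dataset_counts <- toy_data %>%
  count(dataset)

print(dataset_counts)
```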

Reproduce this plot

The dino dataset comes from a paper illustrating the importance of plotting your data. In each of the datasets in the file, the mean and variance of x and y are nearly identical, and the two variables are correlated in the same way (r between -0.06 and -0.07). When plotted, however, each reveals a very different pattern.

# dino_data is already piped in, so ggplot() does not need a data argument
dino_data %>%
  ggplot(mapping = aes(x, y)) +
  geom_point() + facet_wrap(~dataset) + geom_smooth(method = "lm")
## `geom_smooth()` using formula = 'y ~ x'

What can you do to make it prettier?

HINT: add some colour, play with palettes, try a different theme, add a title, subtitle, caption

# dino_data is already piped in, so ggplot() does not need a data argument
dino_data %>%
  ggplot(mapping = aes(x, y)) +
  geom_point(colour = "blue") + facet_wrap(~dataset) + geom_smooth(method = "lm", colour = "purple") + labs(title = "Plots", caption = "haha trinity")
## `geom_smooth()` using formula = 'y ~ x'
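
The hint also mentions palettes, themes and subtitles. The chunk below is a hedged sketch of how those pieces might fit together, not the expected answer. It maps colour to dataset, uses the viridis discrete palette (chosen here because it copes with any number of groups, including the 13 in dino_data), switches to theme_minimal(), and adds a title, subtitle and caption. toy_data, pretty_plot and all of the label text are illustrative assumptions; swap in dino_data and your own wording.

```r
library(tidyverse)

# toy_data is a hypothetical stand-in for dino_data (invented values)
toy_data <- tibble(
  dataset = rep(c("dino", "star"), each = 4),
  x = c(55, 51, 46, 42, 58, 56, 60, 63),
  y = c(97, 96, 94, 91, 92, 90, 85, 80)
)

pretty_plot <- toy_data %>%
  ggplot(mapping = aes(x = x, y = y, colour = dataset)) +
  geom_point() +
  facet_wrap(~dataset) +
  scale_colour_viridis_d() +            # discrete palette, one colour per dataset
  theme_minimal() +                     # lighter background than the default theme
  theme(legend.position = "none") +     # the facet labels already name each dataset
  labs(
    title = "x against y",
    subtitle = "One panel per dataset",
    caption = "Illustrative labels; replace with your own"
  )

print(pretty_plot)
```

Hiding the legend is a design choice: because each panel is already labelled by facet_wrap, the colour legend would repeat the same information.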

Extra challenge

Can you write code to show that the mean and variance of x and y, and the correlation between them, are the same for each of the datasets? HINT: this is a group_by and summarise problem.

dino_data %>%
  group_by(dataset) %>%
  summarise(meanx = mean(x), meany = mean(y), 
            varx = var(x), vary = var(y), cor_xy = cor(x, y))
## # A tibble: 13 × 6
##    dataset    meanx meany  varx  vary  cor_xy
##    <chr>      <dbl> <dbl> <dbl> <dbl>   <dbl>
##  1 away        54.3  47.8  281.  726. -0.0641
##  2 bullseye    54.3  47.8  281.  726. -0.0686
##  3 circle      54.3  47.8  281.  725. -0.0683
##  4 dino        54.3  47.8  281.  726. -0.0645
##  5 dots        54.3  47.8  281.  725. -0.0603
##  6 h_lines     54.3  47.8  281.  726. -0.0617
##  7 high_lines  54.3  47.8  281.  726. -0.0685
##  8 slant_down  54.3  47.8  281.  726. -0.0690
##  9 slant_up    54.3  47.8  281.  726. -0.0686
## 10 star        54.3  47.8  281.  725. -0.0630
## 11 v_lines     54.3  47.8  281.  726. -0.0694
## 12 wide_lines  54.3  47.8  281.  726. -0.0666
## 13 x_shape     54.3  47.8  281.  725. -0.0656
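
Comparing 13 rows by eye is error prone. One possible follow-up, sketched here under the assumption that the grouped summary is saved to an object (called summary_stats below, a name that does not appear in the original code), is to take the maximum minus the minimum of each statistic across datasets. A spread close to zero means the datasets agree on that statistic. toy_data is an invented stand-in whose two groups deliberately contain the same points in a different order, so every spread should come out as zero or very nearly so.

```r
library(tidyverse)

# toy_data is a hypothetical stand-in for dino_data (invented values)
toy_data <- tibble(
  dataset = rep(c("a", "b"), each = 4),
  x = c(1, 2, 3, 4, 4, 3, 2, 1),
  y = c(2, 4, 1, 3, 3, 1, 4, 2)
)

# Same grouped summary as in the challenge, stored so it can be reused
summary_stats <- toy_data %>%
  group_by(dataset) %>%
  summarise(meanx = mean(x), meany = mean(y),
            varx = var(x), vary = var(y), cor_xy = cor(x, y))

# For every numeric column, compute max minus min across the datasets
stat_spread <- summary_stats %>%
  summarise(across(where(is.numeric), ~ max(.x) - min(.x)))

print(stat_spread)
```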

Knit your document to PDF