Welcome to the PSYC3361 coding W2 self test. The test assesses your ability to use the coding skills covered in the Week 2 online coding modules.
In particular, it assesses your ability to…
It is IMPORTANT to document the code that you write so that someone who is looking at your code can understand what it is doing. Above each chunk, write a few sentences outlining which packages/functions you have chosen to use and what the function is doing to your data. Where relevant, also write a sentence that interprets the output of your code.
Your notes should also document the troubleshooting process you went through to arrive at the code that worked.
For each of the challenges below, the documentation is JUST AS IMPORTANT as the code.
Good luck!!
Jenny
PS- if you get stuck have a look in the /images folder for inspiration
library(tidyverse)
dino = read_csv("data/dino.csv")
print(dino)
## # A tibble: 1,846 × 3
## dataset x y
## <chr> <dbl> <dbl>
## 1 dino 55.4 97.2
## 2 dino 51.5 96.0
## 3 dino 46.2 94.5
## 4 dino 42.8 91.4
## 5 dino 40.8 88.3
## 6 dino 38.7 84.9
## 7 dino 35.6 79.9
## 8 dino 33.1 77.6
## 9 dino 29.0 74.5
## 10 dino 26.2 71.4
## # ℹ 1,836 more rows
The dino dataset comes from a paper illustrating the importance of plotting your data. In each of these datasets, the mean and variance of x and y are identical and the two variables are correlated in the same way (R = -0.06). When plotted, however, each reveals a very different pattern
dino = read_csv("data/dino.csv")
## Rows: 1846 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): dataset
## dbl (2): x, y
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
picture <- ggplot(dino, aes (x = x, y = y)) +
geom_point() + # Plotting the Data
geom_smooth(method ='lm') + # Adding Regression Line
scale_x_continuous(
name = NULL,
labels = NULL
) +
scale_y_continuous(
limits = c(0,100)
) +
facet_wrap((vars(dataset)))
print(picture)
## `geom_smooth()` using formula = 'y ~ x'
HINT: add some colour, play with palettes, try a different theme, add a title, subtitle, caption
dino = read_csv("data/dino.csv")
## Rows: 1846 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): dataset
## dbl (2): x, y
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
picture <- ggplot(dino, aes (x, y)) +
geom_point(colour = "deepskyblue1") + # Plotting the Data
geom_smooth(method = 'lm', colour = "coral1") + # Adding Regression Line
labs(title = "Plotted Data", # Title, Subtitle, Caption
subtitle = "that's colourful",
caption = "Brought to you by me"
) +
scale_x_continuous(
name = NULL,
labels = NULL
) +
scale_y_continuous(
limits = c(0, 100)
) +
facet_wrap((vars(dataset))) +
theme_minimal()
print(picture)
## `geom_smooth()` using formula = 'y ~ x'
Can you write code to show that the mean, variance, and correlation between x and y is the same for each of the datasets?? HINT: this is a group_by and summarise problem
dino = read_csv("data/dino.csv")
## Rows: 1846 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): dataset
## dbl (2): x, y
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
proof_dino <- dino %>%
group_by(dataset) %>%
summarise(
mean_x = mean(x),
mean_y = mean(y),
var_x = var(x),
var_y = var(y),
correlation = cor(x,y)
)
picture <- ggplot(dino, aes(x = x, y = y, colour = x)) +
geom_point() + # Plotting the Data
geom_smooth(method = 'lm') + # Adding Regression Line
labs(title = "Plotted Data", # Title, Subtitle, Caption
subtitle = "that's colourful",
caption = "Brought to you by me"
) +
scale_x_discrete(
name = NULL,
labels = NULL
) +
scale_y_continuous(
limits = c (0,100)
) +
facet_wrap((vars(dataset))) +
theme_minimal() +
scale_colour_gradientn(colours = rainbow(8))
print(picture)
## `geom_smooth()` using formula = 'y ~ x'
## Warning: The following aesthetics were dropped during statistical transformation:
## colour.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
## The following aesthetics were dropped during statistical transformation:
## colour.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
## The following aesthetics were dropped during statistical transformation:
## colour.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
## The following aesthetics were dropped during statistical transformation:
## colour.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
## The following aesthetics were dropped during statistical transformation:
## colour.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
## The following aesthetics were dropped during statistical transformation:
## colour.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
## The following aesthetics were dropped during statistical transformation:
## colour.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
## The following aesthetics were dropped during statistical transformation:
## colour.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
## The following aesthetics were dropped during statistical transformation:
## colour.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
## The following aesthetics were dropped during statistical transformation:
## colour.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
## The following aesthetics were dropped during statistical transformation:
## colour.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
## The following aesthetics were dropped during statistical transformation:
## colour.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
## The following aesthetics were dropped during statistical transformation:
## colour.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?