Goals for this week

Hey everyone! My goal for this week was to do all of the exercises in the data visualisation module while trying not to refer back to the module notes for help. I also picked up some tips from the Tuesday Q&A, which I've included as learnings below.

What I learned in the Tuesday session

How to change workspace style:

  1. Click Tools
  2. Click Global Options
  3. Go into the Appearance tab
  4. Select a theme! (I like the dark themes)

 

How to run code in RMarkdown but hide both the code and its output in the html:

{r, include = FALSE}

 

How to hide the printed results of a code chunk in the RMarkdown html (the code itself still shows):

{r, results = "hide"}
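To see the two chunk options side by side, here is a minimal sketch that knits a tiny two-chunk document from text. The document contents and the variable `x` are made up for illustration, and the backtick fence is built in code only so it can sit inside this example.

```r
library(knitr)

# Build the three-backtick chunk fence in code so it can sit inside this example
fence <- strrep("`", 3)

# A tiny made-up RMarkdown document with two chunks, written as lines of text
doc <- c(
  paste0(fence, "{r, include = FALSE}"),
  "x <- 1 + 1",   # runs, but the code and its output are both left out
  fence,
  "",
  paste0(fence, "{r, results = 'hide'}"),
  "print(x)",     # the code is shown, but the printed result is left out
  fence
)

# knit() returns the rendered markdown as text when it is given `text =`
out <- knit(text = doc, quiet = TRUE)
cat(out, sep = "\n")
```

In the rendered text, only the `print(x)` code should appear: the first chunk leaves no trace, and the second shows its code without the printed value.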

 

Learnings during the data visualisation exercises

Before we start any exercises, we need to load in our library:

library(tidyverse)

Exercise 6 - Emoji translation dino

Here is the untouched exercise:

As you can see, the left-hand side shows you where there are errors that will prevent the code from running, so let's fix it!

dino <- read_csv("data_dino.csv")
print(dino)
# Create a new "picture"...
picture <- ggplot(data = dino) + 
  geom_point(mapping = aes(x = horizontal, y = vertical), colour = "purple")

# ... and plot it
plot(picture)

Here is our output!

One challenge I had with this was making the dinosaur a different colour. I learned that you have to put colour outside of the aes() brackets, because "purple" is a fixed value and not a variable in the data.
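Here is a small sketch of the difference, using a made-up three-row data frame called `toy` in place of the dino data (the object names `fixed` and `mapped` are also just for illustration):

```r
library(ggplot2)

# Hypothetical toy data standing in for data_dino.csv
toy <- data.frame(horizontal = c(1, 2, 3), vertical = c(2, 4, 3))

# Colour OUTSIDE aes(): a fixed setting, so every point is drawn in purple
fixed <- ggplot(data = toy) +
  geom_point(mapping = aes(x = horizontal, y = vertical), colour = "purple")

# Colour INSIDE aes(): "purple" is treated as a data value, so ggplot picks
# a default colour for it and adds a legend instead of drawing purple points
mapped <- ggplot(data = toy) +
  geom_point(mapping = aes(x = horizontal, y = vertical, colour = "purple"))

plot(fixed)
plot(mapped)
```

Anything inside aes() is read as "map this to the data", while anything outside it is read as "set this for the whole layer".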

Because exercises 7 - 8 were just harder versions of exercise 6, I decided not to include them here.

Exercise 9 - Emoji Translation Forensic

I learned how to use the glimpse function, which turns the data on its side: each variable gets its own line, showing its type and its first few values:

glimpse(forensic)
## Observations: 5,700
## Variables: 14
## $ participant         <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ handwriting_expert  <chr> "HW Expert", "HW Expert", "HW Expert", "HW Expert…
## $ us                  <chr> "Non-US", "Non-US", "Non-US", "Non-US", "Non-US",…
## $ condition           <chr> "Non-US HW Expert", "Non-US HW Expert", "Non-US H…
## $ age                 <dbl> 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 5…
## $ forensic_scientist  <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", …
## $ forensic_specialty  <chr> "Handwriting", "Handwriting", "Handwriting", "Han…
## $ handwriting_reports <dbl> 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 2…
## $ confidence          <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5…
## $ familiarity         <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5…
## $ feature             <chr> "PLCW.6.5", "PLCY.6.A", "PLCZ.3.2", "PUCX.4.b", "…
## $ est                 <dbl> 1, 60, 1, 2, 5, 5, 1, 20, 3, 4, 10, 2, 90, 50, 20…
## $ true                <dbl> 1.571, 1.971, 2.100, 1.096, 1.104, 1.132, 1.376, …
## $ band                <chr> "Band 01", "Band 01", "Band 01", "Band 01", "Band…
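For comparison, here is a quick sketch of the two views on the same data. It uses the built-in `mtcars` dataset as a stand-in, since it is not part of the exercise:

```r
library(dplyr)

# mtcars is a built-in R dataset, used here only as a stand-in for forensic
head(mtcars)     # usual view: variables run across the page as columns
glimpse(mtcars)  # glimpse view: one variable per line, with its type and first values
```

The glimpse view is handy when a dataset has too many columns to fit across the screen.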

 

Now I had to try and fix the code to create a scatterplot with the variables I had got from the glimpse function. Specifically, the “true” variable against the “est” variable.

# Okay, now we can see what all the variables are. Now what we want to do
# is create a scatter plot. The plot we want should be the exact same 
# style as the one we drew for the dino data, but we want to use
# the forensic data instead. On the x-axis we want to plot the true value
# (true frequency of a feature), and on the y-axis we want to plot what 
# the participant guessed:

"picture <- 🎨(🙂 = 💖) + 
  🎨(🙂 = 🎨(🙂 = 💖, 🙂 = 💖))

plot(picture)"

 

There we go:

picture <- ggplot(data = forensic) + 
  geom_point(mapping = aes(x = true, y = est))

plot(picture)
## Warning: Removed 4 rows containing missing values (geom_point).

Not a very pretty looking scatterplot. I wasn’t sure if receiving the warning message was part of the exercise, but apparently there were missing values in a few of the rows.

 

(╯°□°)╯︵ ┻━┻
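One way to check a warning like the one in exercise 9 is to count the missing values yourself. This is only a sketch on a made-up data frame called `toy` with one missing guess, not the real forensic data:

```r
library(ggplot2)

# Hypothetical toy data with one missing guess, standing in for forensic
toy <- data.frame(true = c(1.5, 2.0, 2.1, 1.1), est = c(1, 60, NA, 2))

# Count the missing values in each column
colSums(is.na(toy))

# na.rm = TRUE drops the incomplete rows without printing the warning
picture <- ggplot(data = toy) +
  geom_point(mapping = aes(x = true, y = est), na.rm = TRUE)

plot(picture)
```

The warning is ggplot saying it skipped rows it could not draw, so the plot itself is still fine.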

Exercise 10

In this exercise, we solved the problem of the not very helpful scatterplot produced in exercise 9. I learned the functions geom_boxplot and geom_violin. When used to plot the forensic data, after changing the x variable to ‘band’, we produced more useful looking graphs.

picture <- ggplot(data = forensic) + 
  geom_boxplot(mapping = aes(x = band, y = est))

plot(picture)
## Warning: Removed 4 rows containing non-finite values (stat_boxplot).

picture <- ggplot(data = forensic) + 
  geom_violin(mapping = aes(x = band, y = est))

plot(picture)
## Warning: Removed 4 rows containing non-finite values (stat_ydensity).

Challenges and successes

Starting off with successes, I’ve found it really fulfilling to learn new ways of formatting my RMarkdown document for these learning logs, whether it be including more spacing between paragraphs/figures, or cleaning up code chunks to create a more aesthetically pleasing document. With the exercises themselves, I haven’t run into too many roadblocks. I’m starting to get my head around what separates a function, an argument and a variable, and R makes it really easy to figure out where a mistake has been made by giving an error message in the console and pointing out the exact line(s) where your code is not working.

Goals for week 3

  1. Transition off RStudio Cloud (as I’m running out of hours).

  2. Get started on the data wrangling module and try and finish it by this time next week!