Working from home and pivoting to teaching online has made me realise that my wifi connection is really bad, particularly when it rains. I have been teaching R to new honours students and needed a little dataset to demo how to get data into R, so I made a Google form and put it out on Twitter to confirm to myself that my connection really is worse than most other people's. You can contribute to the data here.

Packages for reading data into R

library(tidyverse) # includes readr for .csv files
library(readxl) # for Excel files
library(googlesheets4) #read straight from google sheets
library(datapasta) # for pasting data into R

1. read csv or xlsx

The standard way to get data into R is to read a file that you have downloaded.

speed1 <- read_csv("crappy_internet.csv")

speed2 <- read_excel("crappy_internet.xlsx")
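If you want read_csv() to be explicit about what it expects, you can spell out the column types yourself. This is a sketch, not the post's original workflow: it writes a tiny hypothetical CSV (made-up values, with column names borrowed from the survey columns used later in this post) to a temporary file so it runs anywhere. With your real download you would just use "crappy_internet.csv" instead.

```r
library(readr)

# Hypothetical demo file with made-up values, written to a temp folder
demo_path <- tempfile(fileext = ".csv")
writeLines(c(
  "live,raining,home_download,home_upload",
  "city,No,45.2,12.1",
  "regional,Yes,8.3,1.2"
), demo_path)

# Supplying col_types documents what each column should be, and
# read_csv() then skips printing its column specification message
speed_demo <- read_csv(
  demo_path,
  col_types = cols(
    live = col_character(),
    raining = col_character(),
    home_download = col_double(),
    home_upload = col_double()
  )
)

speed_demo
```

read_excel() has a similar helper for workbooks with several tabs: the sheet argument picks which one to read, e.g. read_excel("crappy_internet.xlsx", sheet = 1).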

2. direct from Google Sheets

But the googlesheets4 package allows you to authenticate with your Google account and read data straight from Google Sheets using only the URL. More info here https://googlesheets4.tidyverse.org/

speed3 <- read_sheet("https://docs.google.com/spreadsheets/d/1yyl4fNMErNQ5mQaYgc2ELF7zF6cEPcfRNUtWr56nkg8/edit#gid=552570759")

## Using an auto-discovered, cached token.
## To suppress this message, modify your code or options to clearly consent to the use of a cached token.
## See gargle's "Non-interactive auth" vignette for more details:
## https://gargle.r-lib.org/articles/non-interactive-auth.html
## The googlesheets4 package is using a cached token for jennyrichmond@gmail.com.
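The messages above appear because googlesheets4 found a cached Google token and used it. If a sheet is shared so that anyone with the link can view it (an assumption; private sheets still need you to log in), one way to skip authentication is to call gs4_deauth() first. The sketch below also uses as_sheets_id() to pull the sheet ID out of the long URL, which is a bit tidier to keep in a script.

```r
library(googlesheets4)

# Don't look for a Google token at all. Assumes the sheet is shared as
# "anyone with the link can view"; private sheets still need gs4_auth()
gs4_deauth()

sheet_url <- "https://docs.google.com/spreadsheets/d/1yyl4fNMErNQ5mQaYgc2ELF7zF6cEPcfRNUtWr56nkg8/edit#gid=552570759"

# Extract just the ID part of the URL; read_sheet() accepts either form
sheet_id <- as_sheets_id(sheet_url)
```

With that in place, read_sheet(sheet_id) would read the data without using (or messaging about) a cached token. It needs an internet connection, so it is left out of the sketch.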

3. datapasta

Alternatively, you can copy and “paste” the data into R using the datapasta package. Find the vignette here

speed4 <- # select your data and press Ctrl-C, put your cursor here, choose Addins > Paste as data.frame, then run the chunk
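It helps to know that datapasta doesn't load anything behind the scenes; it writes R code into your script. Its "Paste as tribble" addin, for example, inserts code shaped roughly like the sketch below (exact spacing may differ). The values here are made-up placeholders, not real survey responses. Because the result is ordinary code, the chunk will run later even on a computer without datapasta installed.

```r
# Roughly the shape of code datapasta's "Paste as tribble" addin inserts.
# Hypothetical placeholder values, not real survey data.
speed4 <- tibble::tribble(
  ~live,      ~raining, ~home_download, ~home_upload,
  "city",     "No",     45.2,           12.1,
  "regional", "Yes",    8.3,            1.2,
  "city",     "Yes",    20.5,           5.0
)

speed4
```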

Packages for plotting

library(ggbeeswarm) # spread out overlapping points in point plots
library(ggeasy) # easy wrappers for hard-to-remember ggplot things
library(papaja) # this is mostly a package for writing APA-formatted manuscripts, but it also includes a ggplot theme that is nice

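First, make the data long, so there is one row per speed measurement and a column (updown) saying whether it is a download or an upload speed.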

speedlong <- speed3 %>%
  select(live, raining, starts_with("home")) %>%
  pivot_longer(cols = home_download:home_upload, names_to = "updown", values_to = "speed")
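To see what pivot_longer() does, here is the same pipeline run on a tiny hypothetical data frame (made-up values). Each response row turns into two rows, one for the download speed and one for the upload speed, so 3 rows become 6.

```r
library(tidyverse)

# Hypothetical wide data: one row per response, made-up values
speed_demo <- tribble(
  ~live,      ~raining, ~home_download, ~home_upload,
  "city",     "No",     45.2,           12.1,
  "regional", "Yes",    8.3,            1.2,
  "city",     "Yes",    20.5,           5.0
)

speedlong_demo <- speed_demo %>%
  select(live, raining, starts_with("home")) %>%
  pivot_longer(cols = home_download:home_upload,
               names_to = "updown", values_to = "speed")

# Columns: live, raining, updown ("home_download"/"home_upload"), speed
speedlong_demo
```

If you'd rather the x axis said "download" and "upload", adding names_prefix = "home_" to pivot_longer() strips that prefix from the new updown values.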

Plot upload and download speeds.

1. geom_point()

speedlong %>%
  ggplot(aes(x = updown, y = speed)) +
  geom_point()
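For comparison, ggplot2 on its own can also spread points out with geom_jitter(), which adds random noise. This is a swapped-in alternative, not what the rest of the post uses. A sketch on hypothetical data: width = 0.2 jitters the points sideways only, and height = 0 stops it from nudging the speeds themselves.

```r
library(tidyverse)

# Hypothetical long data in the same shape as speedlong (made-up values)
speedlong_demo <- tribble(
  ~raining, ~updown,         ~speed,
  "No",     "home_download", 45.2,
  "No",     "home_upload",   12.1,
  "No",     "home_download", 38.0,
  "No",     "home_upload",   10.4,
  "Yes",    "home_download", 8.3,
  "Yes",    "home_upload",   1.2,
  "Yes",    "home_download", 20.5,
  "Yes",    "home_upload",   5.0
)

# Random horizontal jitter only; y values stay exactly as measured
jitter_plot <- speedlong_demo %>%
  ggplot(aes(x = updown, y = speed)) +
  geom_jitter(width = 0.2, height = 0)

jitter_plot
```

Because the jitter is random, the points move each time you redraw the plot unless you call set.seed() first.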

This plot is OK, but all the points sit on top of each other. Use the ggbeeswarm package to spread them out.

2. geom_beeswarm()

speedlong %>%
  ggplot(aes(x = updown, y = speed)) +
  geom_beeswarm()
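geom_beeswarm() arranges points side by side so they don't overlap. In recent ggbeeswarm versions its cex argument scales the spacing between points, so raising it is one way to get a wider swarm before reaching for geom_quasirandom(). A sketch on hypothetical data; cex = 3 is an arbitrary example value.

```r
library(tidyverse)
library(ggbeeswarm)

# Hypothetical long data in the same shape as speedlong (made-up values)
speedlong_demo <- tribble(
  ~raining, ~updown,         ~speed,
  "No",     "home_download", 45.2,
  "No",     "home_upload",   12.1,
  "No",     "home_download", 38.0,
  "No",     "home_upload",   10.4,
  "Yes",    "home_download", 8.3,
  "Yes",    "home_upload",   1.2,
  "Yes",    "home_download", 20.5,
  "Yes",    "home_upload",   5.0
)

# Larger cex = more space between neighbouring points in the swarm
swarm_plot <- speedlong_demo %>%
  ggplot(aes(x = updown, y = speed)) +
  geom_beeswarm(cex = 3)

swarm_plot
```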

Beeswarm is better, but I’d like the points spread out a bit more.

3. geom_quasirandom()

speedlong %>%
  ggplot(aes(x = updown, y = speed)) +
  geom_quasirandom(width = 0.2)

Now I want to know which of these points were collected when it was raining.

4. colouring the points

speedlong %>%
  ggplot(aes(x = updown, y = speed, colour = raining)) +
  geom_quasirandom(width = 0.2)
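If you want to pick the colours yourself, scale_colour_manual() takes a vector of colours named by the values in the colour column. The sketch below assumes raining holds "No" and "Yes" (hypothetical; check the real values with unique(speedlong$raining) and adjust the names to match). The colour choices are arbitrary.

```r
library(tidyverse)
library(ggbeeswarm)

# Hypothetical long data in the same shape as speedlong (made-up values)
speedlong_demo <- tribble(
  ~raining, ~updown,         ~speed,
  "No",     "home_download", 45.2,
  "No",     "home_upload",   12.1,
  "No",     "home_download", 38.0,
  "No",     "home_upload",   10.4,
  "Yes",    "home_download", 8.3,
  "Yes",    "home_upload",   1.2,
  "Yes",    "home_download", 20.5,
  "Yes",    "home_upload",   5.0
)

colour_plot <- speedlong_demo %>%
  ggplot(aes(x = updown, y = speed, colour = raining)) +
  geom_quasirandom(width = 0.2) +
  # Names must match the values in `raining` (assumed "No"/"Yes" here)
  scale_colour_manual(values = c(No = "grey50", Yes = "steelblue"))

colour_plot
```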

5. facet_wrap()

speedlong %>%
  ggplot(aes(x = updown, y = speed, colour = raining)) +
  geom_quasirandom(width = 0.2) +
  facet_wrap(~ raining)
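By default the facet strips show only the value of raining. facet_wrap() also takes a labeller argument, and ggplot2's built-in label_both writes strips like "raining: Yes" instead. A sketch on hypothetical data:

```r
library(tidyverse)
library(ggbeeswarm)

# Hypothetical long data in the same shape as speedlong (made-up values)
speedlong_demo <- tribble(
  ~raining, ~updown,         ~speed,
  "No",     "home_download", 45.2,
  "No",     "home_upload",   12.1,
  "No",     "home_download", 38.0,
  "No",     "home_upload",   10.4,
  "Yes",    "home_download", 8.3,
  "Yes",    "home_upload",   1.2,
  "Yes",    "home_download", 20.5,
  "Yes",    "home_upload",   5.0
)

# label_both puts "variable: value" in each strip
facet_plot <- speedlong_demo %>%
  ggplot(aes(x = updown, y = speed, colour = raining)) +
  geom_quasirandom(width = 0.2) +
  facet_wrap(~ raining, labeller = label_both)

facet_plot
```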

Making ggplot easy

Now this version has lots of duplicated information: rain is shown by both the colour and the facets, so we probably don’t need the legend. How to remove the legend is something I have to google EVERY TIME. The ggplot solution is + theme(legend.position = "none"), which is hard to remember.

6. easily remove the legend

speedlong %>%
  ggplot(aes(x = updown, y = speed, colour = raining)) +
  geom_quasirandom(width = 0.2) +
  facet_wrap(~ raining) +
  easy_remove_legend()
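For comparison, here is a sketch of the plain ggplot2 ways to get the same result on hypothetical data. theme(legend.position = "none") drops every legend on the plot, while guides(colour = "none") drops only the legend for the colour mapping.

```r
library(tidyverse)
library(ggbeeswarm)

# Hypothetical long data in the same shape as speedlong (made-up values)
speedlong_demo <- tribble(
  ~raining, ~updown,         ~speed,
  "No",     "home_download", 45.2,
  "No",     "home_upload",   12.1,
  "No",     "home_download", 38.0,
  "No",     "home_upload",   10.4,
  "Yes",    "home_download", 8.3,
  "Yes",    "home_upload",   1.2,
  "Yes",    "home_download", 20.5,
  "Yes",    "home_upload",   5.0
)

base_plot <- speedlong_demo %>%
  ggplot(aes(x = updown, y = speed, colour = raining)) +
  geom_quasirandom(width = 0.2) +
  facet_wrap(~ raining)

# Option A: remove all legends via the theme
no_legend <- base_plot + theme(legend.position = "none")

# Option B: remove only the colour legend
no_colour_legend <- base_plot + guides(colour = "none")

no_legend
```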

7. fix the formatting

I really dislike the grey default background of ggplot. Use theme_apa() to get nice formatting.

speedlong %>%
  ggplot(aes(x = updown, y = speed, colour = raining)) +
  geom_quasirandom(width = 0.2) +
  facet_wrap(~ raining) +
  theme_apa() + 
  easy_remove_legend()
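If you'd rather not install papaja just for its theme, ggplot2's built-in theme_classic() also drops the grey background and gridlines. It is not the same as theme_apa(), just a close-enough stand-in. A complete theme like theme_classic() replaces any theme settings added before it, so put it before tweaks such as theme(legend.position = "none"). A sketch on hypothetical data:

```r
library(tidyverse)
library(ggbeeswarm)

# Hypothetical long data in the same shape as speedlong (made-up values)
speedlong_demo <- tribble(
  ~raining, ~updown,         ~speed,
  "No",     "home_download", 45.2,
  "No",     "home_upload",   12.1,
  "No",     "home_download", 38.0,
  "No",     "home_upload",   10.4,
  "Yes",    "home_download", 8.3,
  "Yes",    "home_upload",   1.2,
  "Yes",    "home_download", 20.5,
  "Yes",    "home_upload",   5.0
)

classic_plot <- speedlong_demo %>%
  ggplot(aes(x = updown, y = speed, colour = raining)) +
  geom_quasirandom(width = 0.2) +
  facet_wrap(~ raining) +
  theme_classic() +                 # complete theme first...
  theme(legend.position = "none")   # ...then tweaks, so they are not overwritten

classic_plot
```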

Getting plots out of ggplot

Option 1: ggsave()

Put ggsave("nameofplot.png") at the end of each chunk and it will export the most recent plot.

ggsave("testplot.png")
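Because ggsave() saves the most recent plot by default, it can grab the wrong one if a chunk draws more than one. A slightly more defensive sketch stores the plot in an object and passes it with plot =, and sets the size explicitly so ggsave() doesn't have to take it from the current graphics device (the source of the "Saving 7 x 5 in image" message). The data, file name and dimensions here are hypothetical examples, and the file goes to a temporary folder.

```r
library(tidyverse)

# Hypothetical long data in the same shape as speedlong (made-up values)
speedlong_demo <- tribble(
  ~updown,         ~speed,
  "home_download", 45.2,
  "home_upload",   12.1,
  "home_download", 8.3,
  "home_upload",   1.2
)

speed_plot <- speedlong_demo %>%
  ggplot(aes(x = updown, y = speed)) +
  geom_point()

out_file <- file.path(tempdir(), "testplot.png")

# Save this specific plot at a fixed size (6 x 4 inches at 300 dpi)
ggsave(out_file, plot = speed_plot, width = 6, height = 4, units = "in", dpi = 300)
```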

## Saving 7 x 5 in image

Option 2: RMarkdown magic

Use fig.path in your setup chunk to export all your plots to a folder. This is where chunk labels are important. If your chunks are not labelled, the exported files will be called “unnamed-chunk-somenumber.png”, BUT if you label the chunk, the file name of the exported plot will be meaningful.
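A sketch of what that setup chunk might contain. The folder name figs/ is just an example; the trailing slash makes knitr treat it as a folder rather than a prefix glued onto each file name. With this in place, a chunk labelled speed-plot (a hypothetical label) would save its first plot as figs/speed-plot-1.png.

```r
# Put this in the setup chunk at the top of the .Rmd
knitr::opts_chunk$set(
  fig.path = "figs/",  # folder (note the trailing slash) where plot files are written
  dev = "png"          # file format for the saved plots
)
```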

Check out the RMarkdown reference guide for details

ggplot tips show and tell

Jen Richmond

22/05/2020