Code club session 0.0

The challenges

We ran an introductory workshop on plotting in R and ggplot2 that you can find here. In the second part of that workshop, there were a lot of challenges that we didn’t get round to doing. Here, we’re going to use the same dataset to see if you can remember the basic steps involved in getting a scatter plot up and running in ggplot2.

So, here, you’re going to make some scatter plots for the hills dataset that is found in the MASS package of R.

This document

This is an R-markdown document, and working with one is slightly different from working with the R-scripts we used in the workshop. Both R-scripts (suffix \*.R) and R-markdown documents (suffix \*.Rmd) are plain-text files (so you can view them in notepad or similar), and both need to be processed by R before your results / graphs are visible. The easiest way to work with R-markdown documents, however, is inside R-studio.

To process an R-markdown file in R-studio:

  1. Make sure the file is open (in the source panel; use File -> Open File if not);

  2. Either click Run -> Run All at the top of the source panel (this intersperses the compiled figures with your raw text/code);

  3. Or, preferably, knit the whole document to an .html file using the Knit button (again, this should be at the top of the source panel in R-studio; shortcut: Ctrl-Shift-K or Cmd-Shift-K).
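If you are not working in R-studio, knitting can also be done from the R console with the rmarkdown package. The snippet below is a minimal sketch: the file name `code_club_session.Rmd` is an assumption, so replace it with the path to your own copy of this document.

```r
# Hypothetical file name: replace with the path to your own .Rmd file
rmd_file <- "code_club_session.Rmd"

if (file.exists(rmd_file) && requireNamespace("rmarkdown", quietly = TRUE)) {
  # render() knits the document; here we ask for an .html file
  output_path <- rmarkdown::render(rmd_file, output_format = "html_document")
  print(output_path)
} else {
  message("Could not find ", rmd_file, " (or the rmarkdown package is missing).")
}
```

The check on `file.exists()` is only there so that the snippet runs without an error when the file is somewhere else.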

In the document, we have challenges interspersed with plain-text and formatting information. You are going to write your own code to solve the challenges. However, you should be able to knit this document at any point (even without having attempted the challenges) to produce a viewable .html file. So try running Knit right now.

Load the required packages

# Load the packages that are used later in the document
library(MASS)     # Contains the `hills` dataset
library(dplyr)    # Data manipulation tools
library(ggplot2)  # Plotting tools

Load the dataset

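We import the dataset from the MASS package and rearrange it slightly: the names of the hill races are stored as row names in hills, so we copy them into a proper column called peak.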

# Loads the `hills` dataset from the `MASS` package
data(hills)
tidy_hills <- mutate(hills, peak = row.names(hills))
str(tidy_hills)
## 'data.frame':    35 obs. of  4 variables:
##  $ dist : num  2.5 6 6 7.5 8 8 16 6 5 6 ...
##  $ climb: int  650 2500 900 800 3070 2866 7500 800 800 650 ...
##  $ time : num  16.1 48.4 33.6 45.6 62.3 ...
##  $ peak : chr  "Greenmantle" "Carnethy" "Craig Dunain" "Ben Rha" ...
head(tidy_hills)

From here, you should use the tidy_hills dataset.

Challenge 1: Time-versus-Distance

You made a number of scatter plots using the Anscombe dataset in the initial workshop. Now try to plot the record time versus the race distance for the Scottish hills dataset.

If you need a refresher, check the examples in the help-pages for ggplot and geom_point.

The steps (for ggplot2) are:

  1. Define your dataset

  2. Map columns of that dataset to aesthetic entities in your chart

  3. Define the type of chart you want to generate
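As a reminder of how those three steps look in code, here is a minimal sketch using the built-in `anscombe` dataset from the workshop rather than the hills data (so it doesn't give away the challenge). The choice of the `x1` and `y1` columns is just for illustration.

```r
library(ggplot2)

# Step 1: define the dataset (`data = anscombe`)
# Step 2: map columns to aesthetics (`aes(x = x1, y = y1)`)
# Step 3: define the chart type (`geom_point()` gives a scatter plot)
p <- ggplot(data = anscombe, aes(x = x1, y = y1)) +
  geom_point()

print(p)
```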

# Your code goes here:
# Plot time on the y-axis, and distance on the x-axis



# .. End of your code
# Challenge 1 Solution:
ggplot(data = tidy_hills, aes(x = dist, y = time)) +
  geom_point() +
  xlim(0, NA) +
  ylim(0, NA)

Having filled in the code, you can now either run that code or knit the document to generate an .html file.

Challenge 2: Axis labels

This time, try to make a scatter plot of time-against-height, but add axis labels to the x- (height) and y- (time) axes.

Check the help-page for labs if you need to work out how to add titles or axis-labels.

If you want to include the units for the height-climbed and the record-time, have a look at the help-page for hills (?hills).
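If the help-page for labs is hard to follow, the sketch below shows the general pattern on a different dataset (the built-in `mtcars`), so you can adapt it to the hills data yourself. The label text is our own wording, based on the units given in `?mtcars`.

```r
library(ggplot2)

# labs() takes one argument per thing you want to label:
# `x` and `y` for the axes, and `title` for the chart as a whole
p <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(
    title = "Fuel economy against weight",
    x = "Weight / 1000 lb",
    y = "Fuel economy / miles per gallon"
  )

print(p)
```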

# Your code goes here:
# - Plot time on the y-axis and height on the x-axis and add labels for the x-
# and y- axes.


# .. End of your code
# Challenge 2 Solution:
ggplot(data = tidy_hills, aes(x = climb, y = time)) +
  geom_point() +
  labs(
    x = "Height climbed / feet",
    y = "Record time / min"
    ) +
  xlim(0, NA) +
  ylim(0, NA)

Challenge 3: Dynamic sizing of the points

This time, try to plot height-against-distance for the hill runs but encode the time-taken using the size of the scatter-plot points.

Hint: If you look at the help-page for geom_point you can see that it “… understands the following aesthetics: x, y, …, shape, size, stroke”. This means that you can make a mapping of a column from your dataset into the size attribute of the corresponding points just as you would map a column into the x-position attribute.
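To illustrate the hint without solving the challenge, here is a small sketch that maps a third column to point size using the built-in `mtcars` dataset. The columns chosen (`wt`, `mpg`, `hp`) are just an example.

```r
library(ggplot2)

# `size = hp` sits inside aes(), alongside x and y, so each point's
# size is taken from the `hp` column of the dataset
p <- ggplot(data = mtcars, aes(x = wt, y = mpg, size = hp)) +
  geom_point()

print(p)
```

Because size is mapped inside `aes()`, ggplot2 adds a legend for it automatically, and you can rename that legend with `labs(size = ...)`.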

# Your code goes here:
# - Plot distance on the x-axis and height on the y-axis
# - Use the time-taken to determine the size of the corresponding points


# .. End of your code
# Challenge 3 Solution:
ggplot(data = tidy_hills, aes(x = dist, y = climb, size = time)) +
  geom_point() +
  labs(
    x = "Race distance / miles",
    y = "Height climbed / feet",
    size = "Record time / min"
    ) +
  lims(
    x = c(0, NA),
    y = c(0, NA)
    )

Now knit the whole document together.

Hopefully those three graphs came out as you wanted. Feel free to modify them to make them a bit prettier: there are lots of suggestions for how to do this on the ggplot2 website. For example, you could change the axis ranges, the point colours, the point styles, or the theme of the charts.
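As a starting point, the sketch below applies each of those four changes to the time-versus-distance plot. The particular colour, shape, axis ranges and theme are arbitrary choices of ours, so swap in whatever you prefer.

```r
library(ggplot2)

# Load `hills` directly from MASS so this snippet runs on its own
data(hills, package = "MASS")

p <- ggplot(data = hills, aes(x = dist, y = time)) +
  # Point colour and style: fixed values go outside aes()
  geom_point(colour = "steelblue", shape = 17, size = 3) +
  # Axis ranges: zoom the view without dropping any data points
  coord_cartesian(xlim = c(0, 30), ylim = c(0, 220)) +
  labs(x = "Race distance / miles", y = "Record time / min") +
  # Theme: a plain black-and-white look
  theme_bw()

print(p)
```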