Introduction


I am beginning to learn about ggplot and am working through this tutorial as practice. The functions covered are as follows:

  1. geom_line() to make line graphs
  2. () to make scatter plots with two variables plotted on x and y

What is ggplot?

Before we dig into creating line graphs with the ggplot geom_line function, I want to briefly touch on ggplot and why I think it’s the best choice for plotting graphs in R.

ggplot is a package for creating graphs in R, but it’s also a method of thinking about and decomposing complex graphs into logical subunits.

ggplot takes each component of a graph–axes, scales, colours, objects, etc.–and allows you to build graphs up sequentially one component at a time. You can then modify each of those components in a way that’s both flexible and user-friendly. When components are unspecified, ggplot uses sensible defaults. This makes ggplot a powerful and flexible tool for creating all kinds of graphs in R.

geom_line()


We are going to use the built-in Orange dataset, which records the growth of orange trees over time. Let’s have a quick look at the data:

head(Orange)
## Grouped Data: circumference ~ age | Tree
##   Tree  age circumference
## 1    1  118            30
## 2    1  484            58
## 3    1  664            87
## 4    1 1004           115
## 5    1 1231           120
## 6    1 1372           142

simple example using ggplot + geom_line()

library(tidyverse)

# filter the data we need
tree_1 <- filter(Orange, Tree==1)
# graph the data
ggplot(tree_1) +
  geom_line(aes(x = age, y = circumference))

For this graph we only plotted the first tree’s age and circumference. The filter function is part of the dplyr package. It is another way to subset data.
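As a rough sketch of what "another way to subset data" means, the snippet below compares filter() with base R logical indexing. The object names (orange_df, tree_1_dplyr, tree_1_base) are my own, and converting Orange to a plain data frame first is an assumption made only to keep the comparison simple.

```r
library(dplyr)

# Orange ships with R; convert it to a plain data frame for this comparison
orange_df <- as.data.frame(Orange)

# dplyr: keep only the rows where Tree equals 1
tree_1_dplyr <- filter(orange_df, Tree == 1)

# base R: the same subset using logical indexing
tree_1_base <- orange_df[orange_df$Tree == 1, ]

# both approaches should select the same measurements
nrow(tree_1_dplyr)
nrow(tree_1_base)
```

Either form works for this tutorial; filter() tends to read more naturally once several conditions are combined.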

First, I called ggplot, which creates a new ggplot graph. It’s essentially a blank canvas on which we add data and graphics. In this case, I passed tree_1 to ggplot, indicating that we’ll be using the tree_1 data for this particular ggplot graph.

Next, I added my geom_line call to the base ggplot graph in order to create this line. In ggplot, you use the + symbol to add new layers to an existing graph. In this second layer, I told ggplot to use age as the x-axis variable and circumference as the y-axis variable.
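To make the layering idea more concrete, here is a small sketch that builds the same graph in two explicit steps. The names base_plot and line_plot are hypothetical, and I use base subset() rather than filter() only to keep the snippet self-contained.

```r
library(ggplot2)

# the same subset as before, taken with base R
tree_1 <- subset(as.data.frame(Orange), Tree == 1)

# step 1: the base graph, a blank canvas that only knows about the data
base_plot <- ggplot(tree_1)

# step 2: add a line layer with + and map columns to the axes
line_plot <- base_plot +
  geom_line(aes(x = age, y = circumference))

# the base plot has no layers yet, while the line plot has one
length(base_plot$layers)
length(line_plot$layers)

# printing the object draws the graph
line_plot
```

Storing the intermediate plot in a variable is optional, but it can make it easier to see what each + contributes.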

And that’s it, we have a simple line graph!

changing the line colour in ggplot + geom_line()

# filter the data we need
tree_1 <- filter(Orange, Tree==1)

# graph the data
ggplot(tree_1) +
  geom_line(aes(x = age, y = circumference), colour = "violet")

Here, all I have done is add colour = "..." after the aes() call (but still inside geom_line()) to set the colour of the line.

Now we can do something a little different with the data:

ggplot(Orange) +
  geom_line(aes(x = age, y = circumference, colour = Tree))

Here, there are three differences from the previous graph I made:

  1. The dataset changed from the filtered tree_1 to Orange.
  2. Instead of specifying colour = "violet" we specified colour = Tree.
  3. We moved the colour = argument inside the aes() call.

For each change, this is what happens:

  1. We wanted to graph the data for all trees, rather than just one as before.
  2 & 3. Because colour = Tree sits inside the aes() call, the colours are defined by the levels of the Tree column, so there is a different colour for each Tree.
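One way to check that the colours really follow the levels of Tree is to inspect the data ggplot2 computes for the layer. This is only a sketch: layer_data() is a ggplot2 helper, while the names p and n_colours are my own.

```r
library(ggplot2)

# Tree is a factor, so each of its levels can get its own colour
levels(Orange$Tree)

p <- ggplot(Orange) +
  geom_line(aes(x = age, y = circumference, colour = Tree))

# layer_data() returns the data ggplot2 computed for a layer,
# including the colour assigned to every row
n_colours <- length(unique(layer_data(p)$colour))

# one colour per tree is expected
n_colours == nlevels(Orange$Tree)
```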

changing line type

Now, instead of changing the colour according to Tree, we could change the line type, which is useful if, for example, the document will be printed in black and white:

ggplot(Orange) +
  geom_line(aes(x = age, y = circumference, linetype = Tree))

deeper review of aes() (aesthetic) mapping in ggplot

We can create graphs in ggplot that map the Tree variable to colour or linetype in a line graph. ggplot refers to these mappings as aesthetic mappings, and they encompass everything you see within the aes() in ggplot.

Aesthetic mappings are a way of mapping variables in your data to particular visual properties (aesthetics) of a graph.

Reviewing the list of geom_line aesthetic mappings

The main aesthetic mappings for ggplot + geom_line() include:

  x: Map a variable to a position on the x-axis
  y: Map a variable to a position on the y-axis
  colour: Map a variable to a line colour
  linetype: Map a variable to a linetype
  group: Map a variable to a group (each value of the variable on a separate line)
  size: Map a variable to a line size
  alpha: Map a variable to a line transparency

x and y are what we used in our first ggplot + geom_line() function call to map the variables age and circumference to x-axis and y-axis values. Then, we experimented with using colour and linetype to map the Tree variable to different coloured lines or linetypes.

In addition to those, there are 3 other main aesthetic mappings often used with geom_line.

The group mapping allows us to map a variable to different groups. Within geom_line, that means mapping a variable to different lines. Think of it as a pared down version of the colour and linetype aesthetic mappings you already saw. While the colour aesthetic mapped each Tree to a different line with a different color, the group aesthetic maps each Tree to a different line, but does not differentiate the lines by color or anything else. Let’s take a look:

changing the group aesthetic mapping in ggplot + geom_line

ggplot(Orange) +
    geom_line(aes(x = age, y = circumference, group = Tree))

Now the lines are all black, with no legend. This is not too useful in this case, so it is up to the UseR to decide what is appropriate for the data being displayed.

In our Orange tree dataset, if you’re interested in investigating how specific orange trees grew over time, you’d want to use the colour or linetype aesthetics to make sure you can track the progress for specific trees. If, instead, you’re interested in only how orange trees in general grow, then using the group aesthetic is more appropriate, simplifying your graph and discarding unnecessary detail.
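To see what the group aesthetic actually changes, the sketch below compares a plot with no grouping against one with group = Tree and counts how many separate lines ggplot2 draws. The object names are hypothetical; layer_data() is a ggplot2 helper that returns the computed data for a layer.

```r
library(ggplot2)

# without any grouping, every observation is joined into a single line
ungrouped <- ggplot(Orange) +
  geom_line(aes(x = age, y = circumference))

# with group = Tree, each tree is drawn as its own line
grouped <- ggplot(Orange) +
  geom_line(aes(x = age, y = circumference, group = Tree))

# the computed data has a group column; count its distinct values
n_lines_ungrouped <- length(unique(layer_data(ungrouped)$group))
n_lines_grouped <- length(unique(layer_data(grouped)$group))

n_lines_ungrouped
n_lines_grouped
```

The grouped version gives one line per tree while keeping every line visually identical.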

changing line transparency in ggplot + geom_line() with alpha

ggplot(Orange) +
  geom_line(aes(x = age, y = circumference, alpha = Tree))

Not particularly appropriate here, but you get the point.

changing the size aesthetic in ggplot + geom_line()

ggplot(Orange) +
  geom_line(aes(x = age, y = circumference, size = Tree))

This is hideous!
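A note for anyone running this on a newer setup than the one in the session information: as far as I know, from ggplot2 3.4.0 onwards the thickness of a line is controlled by the linewidth aesthetic, and using size for lines gives a deprecation message. The sketch below assumes such a version is installed; the name p_width is my own.

```r
library(ggplot2)

# assumes ggplot2 3.4.0 or later, where lines use linewidth instead of size
p_width <- ggplot(Orange) +
  geom_line(aes(x = age, y = circumference, linewidth = Tree))

# each tree should end up with its own line width
n_widths <- length(unique(layer_data(p_width)$linewidth))
n_widths

p_width
```

The result should be just as hideous as the size version, only without the deprecation message.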

Aesthetic mapping vs. parameters in ggplot

Before, we saw that we are able to use color in two different ways with geom_line. First, we were able to set the colour of a line outside of our aes() mappings. Then, we were able to map the variable Tree to colour by specifying colour = Tree inside of our aes() mappings. But how does this work with all of the other aesthetics you just learned about?

Essentially, they all work the same as colour! That’s the beautiful thing about graphing in ggplot–once you understand the syntax, it’s very easy to expand your capabilities.

Each of the aesthetic mappings you’ve seen can also be used as a parameter, that is, a fixed value defined outside of the aes() aesthetic mappings. You saw how to do this with colour when we set the line to violet with colour = 'violet' before. Now let’s look at an example of how to do this with linetype in the same manner:

ggplot(Orange) +
  geom_line(aes(x = age, y = circumference, group = Tree), linetype = "dotdash")

To review what values linetype, size, and alpha accept, just run ?linetype, ?size, or ?alpha from the console.
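The same pattern should carry over to the other aesthetics. As a small sketch, here colour and alpha are both set as fixed parameters outside aes(); the particular values are arbitrary choices of mine.

```r
library(ggplot2)

# group is a mapping (inside aes); colour and alpha are fixed parameters
p_fixed <- ggplot(Orange) +
  geom_line(
    aes(x = age, y = circumference, group = Tree),
    colour = "violet",
    alpha = 0.5
  )

# the computed layer data shows the fixed values applied to every row
fixed <- layer_data(p_fixed)
unique(fixed$colour)
unique(fixed$alpha)

p_fixed
```

Because the values are fixed rather than mapped, no legend is drawn for colour or alpha.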

Common errors with aesthetic mappings and parameters in ggplot

When getting started, the distinction between aesthetic mappings (values included inside your aes()) and parameters (values outside the aes()) can be confusing, and mixing them up leads to two common errors.

Trying to include aesthetic mappings outside your aes() call

If you’re trying to map the Tree variable to linetype, you should include linetype = Tree within the aes() of your geom_line call. What happens if you accidentally include it outside, and instead run ggplot(Orange) + geom_line(aes(x = age, y = circumference), linetype = Tree)? You’ll get an error message that looks like this:

## Error in layer(data = data, mapping = mapping, stat = stat, geom = GeomLine, : object 'Tree' not found

Whenever you see this error about an object not being found, check that your aesthetic mappings are inside the aes() call!
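The error can be reproduced safely by wrapping the call in tryCatch(), which captures the message instead of stopping the script. This is a sketch that assumes there is no standalone object called Tree in the workspace; msg and fixed_plot are hypothetical names.

```r
library(ggplot2)

# linetype = Tree sits outside aes(), so R looks for a standalone object
# called Tree instead of the Tree column in Orange
msg <- tryCatch(
  {
    ggplot(Orange) +
      geom_line(aes(x = age, y = circumference), linetype = Tree)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg

# moving linetype = Tree inside aes() resolves the problem
fixed_plot <- ggplot(Orange) +
  geom_line(aes(x = age, y = circumference, linetype = Tree))
fixed_plot
```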

Trying to specify parameters inside your aes() call

Alternatively, if we try to set a fixed parameter value (for example, colour = 'red') inside the aes() mapping, we get a less intuitive issue:

ggplot(Orange) +
    geom_line(aes(x = age, y = circumference, color = 'red'))

In this case, ggplot actually does produce a line graph (success!), but it doesn’t have the result we intended. The graph looks odd because it puts the values for all 5 trees on a single line, rather than on 5 separate lines like we had before. The line does come out reddish, but only because ggplot treats 'red' as a data value and gives it the first colour of its default palette; it also adds a legend that simply says ‘red’. When you run into issues like this, double check to make sure you’re including the parameters of your graph outside your aes() call!
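A minimal sketch of the fix, assuming the goal was red lines with one line per tree (adding group = Tree is my own choice here). The object names inside and outside are hypothetical, and layer_data() is used only to compare the colour values ggplot2 ends up using.

```r
library(ggplot2)

# constant inside aes(): "red" is treated as data, not as a colour name
inside <- ggplot(Orange) +
  geom_line(aes(x = age, y = circumference, colour = "red"))

# constant outside aes(): the lines are drawn in the colour red,
# and group = Tree keeps one line per tree
outside <- ggplot(Orange) +
  geom_line(aes(x = age, y = circumference, group = Tree), colour = "red")

# compare the colour values that were actually used in each version
colour_inside <- unique(layer_data(inside)$colour)
colour_outside <- unique(layer_data(outside)$colour)
colour_inside
colour_outside

outside
```

In the first version the colour is picked by the default colour scale, which is why changing 'red' to another colour name inside aes() should leave the line looking the same.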

You should now have a solid understanding of how to use R to plot line graphs using ggplot and geom_line!

Session Information - reproducibility

sessionInfo() # provides the details about what packages have been used during the session to improve reproducibility
## R version 3.4.1 (2017-06-30)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United Kingdom.1252 
## [2] LC_CTYPE=English_United Kingdom.1252   
## [3] LC_MONETARY=English_United Kingdom.1252
## [4] LC_NUMERIC=C                           
## [5] LC_TIME=English_United Kingdom.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] bindrcpp_0.2.2  forcats_0.3.0   stringr_1.2.0   dplyr_0.7.5    
##  [5] purrr_0.2.4     readr_1.1.1     tidyr_0.8.1     tibble_1.4.2   
##  [9] ggplot2_3.0.0   tidyverse_1.2.1
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.16      cellranger_1.1.0  pillar_1.3.0     
##  [4] compiler_3.4.1    plyr_1.8.4        bindr_0.1.1      
##  [7] tools_3.4.1       digest_0.6.13     viridisLite_0.3.0
## [10] lubridate_1.7.4   jsonlite_1.5      evaluate_0.10.1  
## [13] nlme_3.1-131      gtable_0.2.0      lattice_0.20-35  
## [16] pkgconfig_2.0.1   rlang_0.2.0       cli_1.0.0        
## [19] rstudioapi_0.7    yaml_2.1.16       haven_1.1.2      
## [22] withr_2.1.0       xml2_1.2.0        httr_1.3.1       
## [25] knitr_1.20        hms_0.4.1         rprojroot_1.3-2  
## [28] grid_3.4.1        tidyselect_0.2.4  glue_1.2.0       
## [31] R6_2.2.2          readxl_1.1.0      rmarkdown_1.10   
## [34] modelr_0.1.2      magrittr_1.5      backports_1.1.2  
## [37] scales_1.0.0      htmltools_0.3.6   rvest_0.3.2      
## [40] assertthat_0.2.0  colorspace_1.3-2  labeling_0.3     
## [43] stringi_1.1.6     lazyeval_0.2.1    munsell_0.5.0    
## [46] broom_0.5.0       crayon_1.3.4