Adapted from Visualize Data with ggplot2 by Garrett Grolemund at RStudio.

First step: load the ggplot2 library, or load the tidyverse (which includes ggplot2).

library(tidyverse)

For this exercise we will start by using an “on-board” (ggplot2) highway mileage data set. The data set is called mpg. To learn about this data set use the help features of R.

?mpg
#Various ways to explore the data
mpg
str(mpg)
tibble [234 × 11] (S3: tbl_df/tbl/data.frame)
 $ manufacturer: chr [1:234] "audi" "audi" "audi" "audi" ...
 $ model       : chr [1:234] "a4" "a4" "a4" "a4" ...
 $ displ       : num [1:234] 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
 $ year        : int [1:234] 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
 $ cyl         : int [1:234] 4 4 4 4 6 6 6 4 4 4 ...
 $ trans       : chr [1:234] "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
 $ drv         : chr [1:234] "f" "f" "f" "f" ...
 $ cty         : int [1:234] 18 21 20 21 16 18 18 18 16 20 ...
 $ hwy         : int [1:234] 29 29 31 30 26 26 27 26 25 28 ...
 $ fl          : chr [1:234] "p" "p" "p" "p" ...
 $ class       : chr [1:234] "compact" "compact" "compact" "compact" ...
view(mpg)
glimpse(mpg)
Rows: 234
Columns: 11
$ manufacturer <chr> "audi", "audi", "audi", "a…
$ model        <chr> "a4", "a4", "a4", "a4", "a…
$ displ        <dbl> 1.8, 1.8, 2.0, 2.0, 2.8, 2…
$ year         <int> 1999, 1999, 2008, 2008, 19…
$ cyl          <int> 4, 4, 4, 4, 6, 6, 6, 4, 4,…
$ trans        <chr> "auto(l5)", "manual(m5)", …
$ drv          <chr> "f", "f", "f", "f", "f", "…
$ cty          <int> 18, 21, 20, 21, 16, 18, 18…
$ hwy          <int> 29, 29, 31, 30, 26, 26, 27…
$ fl           <chr> "p", "p", "p", "p", "p", "…
$ class        <chr> "compact", "compact", "com…
#What if we want to work with the data under a different name? Assign it to a new object.
my_data <- mpg
glimpse(my_data)
Rows: 234
Columns: 11
$ manufacturer <chr> "audi", "audi", "audi", "a…
$ model        <chr> "a4", "a4", "a4", "a4", "a…
$ displ        <dbl> 1.8, 1.8, 2.0, 2.0, 2.8, 2…
$ year         <int> 1999, 1999, 2008, 2008, 19…
$ cyl          <int> 4, 4, 4, 4, 6, 6, 6, 4, 4,…
$ trans        <chr> "auto(l5)", "manual(m5)", …
$ drv          <chr> "f", "f", "f", "f", "f", "…
$ cty          <int> 18, 21, 20, 21, 16, 18, 18…
$ hwy          <int> 29, 29, 31, 30, 26, 26, 27…
$ fl           <chr> "p", "p", "p", "p", "p", "…
$ class        <chr> "compact", "compact", "com…
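
A few base R functions complement str() and glimpse() when exploring a data set. The sketch below assumes only that ggplot2 (which provides mpg) is installed; the particular selection of functions is just one reasonable starting point.

```r
library(ggplot2)  # provides the mpg data set

dim(mpg)             # number of rows and columns
names(mpg)           # column names
head(mpg, n = 5)     # first five rows
summary(mpg$hwy)     # minimum, quartiles, mean, and maximum of highway mileage
table(mpg$class)     # number of cars in each vehicle class
```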

Now we are ready to build a graphic. We are going to explore the relationship between engine size (displ) and mileage (hwy) in different ways.

Before we get started, take a quick look at the base function:

?ggplot

The two most important arguments are the data argument and the mapping argument. So based on this we can create a basic template for generating a plot.

ggplot2 template

ggplot(data = <data>, mapping = aes(x = <x>, y = <y>)) + geom_<type>()

Let's make a simple graph.

Pay strict attention to spelling, capitalization, and parentheses!

ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point()


?ggplot

ggplot(data = mpg) +
  geom_point(aes(x = displ, y = hwy))



##my preference: save the base plot as an object, then add layers to it

p1 <- ggplot(data = mpg, aes(x = displ, y = hwy))
p1 + geom_point()
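
The forms above draw the same plot, but where a mapping is placed matters once there is more than one layer. As a minimal sketch (the object names p_global and p_local are made up for illustration): a mapping in ggplot() is inherited by every layer, while a mapping inside a geom applies to that layer only.

```r
library(ggplot2)

# Mapping in ggplot(): every layer inherits color = drv,
# so geom_smooth() fits one curve per drive type.
p_global <- ggplot(data = mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  geom_smooth()

# Mapping in geom_point() only: points are colored by drv,
# but geom_smooth() sees no grouping and fits a single curve.
p_local <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = drv)) +
  geom_smooth()

p_global
p_local
```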

We can customize the aesthetics aes() in a wide variety of ways. We can do this in the base function or in any of the layers separately. For instance, we might want to change one or more of the following aesthetics.

Aesthetics: color, size, line type, opacity, shape

aes(blues9)
Aesthetic mapping: 
* `x` -> `blues9`

We can do this by adding arguments for each to our aesthetics. Add color, size, alpha, and shape aesthetics to your graph.

p1 + geom_point(aes(color = class))


p1 + geom_point(aes(size = cyl))


p1 + geom_point(aes(shape = drv))


p1 + geom_point(aes(alpha = hwy))


p1 + geom_point(aes(color = class,
                           size = cyl,
                           shape = drv,
                           alpha = hwy))

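Note: You have to be careful about where you specify the arguments. Inside aes(), an argument maps a variable to an aesthetic; outside aes(), it sets a fixed value.

#"blue" inside aes() is treated as data, so the points are not drawn in blue
p1 + geom_point(aes(size = cyl,color="blue"))

#vs
#color outside aes() sets every point to blue
p1 + geom_point(aes(size = cyl),color="blue")

#mapping
p1 + geom_point(aes(size = cyl,color=cyl))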
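
One way to check the difference between mapping and setting without looking at the plot is layer_data(), which returns the values ggplot2 computed for a layer. This is only a sketch; p_mapped and p_set are illustrative names, and the size mapping is left out to keep the comparison small.

```r
library(ggplot2)

p1 <- ggplot(data = mpg, aes(x = displ, y = hwy))

# "blue" inside aes() is treated as a one-level variable and sent through the
# default color scale, so the points are not blue and a legend appears.
p_mapped <- p1 + geom_point(aes(color = "blue"))

# "blue" outside aes() is a fixed setting applied to every point.
p_set <- p1 + geom_point(color = "blue")

# Inspect the colors actually assigned to the points in each plot.
unique(layer_data(p_mapped)$colour)
unique(layer_data(p_set)$colour)
```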

Discrete vs. continuous data

Aesthetic arguments (color, size, shape, etc.) behave differently depending on the mode and type of the data.

Is your data discrete (categorical: ordinal or nominal) or continuous (numeric)?

p1 + geom_point(aes(color = cyl))
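
Because cyl is stored as an integer, the plot above uses a continuous color gradient. A minimal sketch of the discrete alternative, assuming you want cylinder counts treated as categories, is to wrap the variable in factor() (p_cont and p_disc are illustrative names):

```r
library(ggplot2)

p1 <- ggplot(data = mpg, aes(x = displ, y = hwy))

# cyl is an integer column, so color = cyl gives a continuous gradient.
p_cont <- p1 + geom_point(aes(color = cyl))

# factor(cyl) is categorical, so each cylinder count gets its own distinct color.
p_disc <- p1 + geom_point(aes(color = factor(cyl)))

p_cont
p_disc
```

Some aesthetics only accept discrete data: mapping a continuous variable such as cyl to shape produces an error, whereas factor(cyl) works.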

GEOM TYPES

geoms = geometric objects

We will use some of the more common options below, but there are many more that can be found here (https://ggplot2.tidyverse.org/reference/index.html#section-layer-geoms)

Let's add a smoothed (loess fit) curve to our plot from above.

p1+geom_point()+geom_smooth()
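
geom_smooth() chooses loess by default for a data set of this size and prints a message saying so. As a sketch, the method can be stated explicitly, or swapped for a straight-line fit; the choice of "lm" and se = FALSE here is only for illustration.

```r
library(ggplot2)

p1 <- ggplot(data = mpg, aes(x = displ, y = hwy))

# State the loess method explicitly instead of relying on the default.
p1 + geom_point() + geom_smooth(method = "loess", formula = y ~ x)

# Straight-line (linear model) fit, without the confidence band.
p_lm <- p1 + geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
p_lm
```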

Now let's change the mapping and make a strip plot, a boxplot, and a violin plot of the variable hwy as a function of class.

p2=ggplot(data = mpg, aes(x = class, y = hwy))
p2+geom_point() #strip chart


##note that you can override the mappings in the base command
p1+geom_point(aes(class, hwy))

##

p2+geom_boxplot(aes(class, hwy))

p2+geom_violin(aes(class, hwy))
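
In the strip chart above, cars with the same class and hwy are drawn on top of each other. One possible remedy, sketched here with arbitrary jitter and transparency values, is to layer jittered points over a boxplot:

```r
library(ggplot2)

p2 <- ggplot(data = mpg, aes(x = class, y = hwy))

# height = 0 jitters points sideways only, so the hwy values are not altered.
p_jit <- p2 +
  geom_boxplot(outlier.shape = NA) +   # hide boxplot outliers so they are not drawn twice
  geom_jitter(width = 0.2, height = 0, alpha = 0.5)
p_jit
```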

One last geom: make a histogram of hwy.

#a histogram needs only an x mapping
p2 <- ggplot(data = mpg, aes(hwy))+geom_histogram()
p2

#set the number of bins with the bins argument of geom_histogram()
#(stat_bin(30) fails because the first argument of stat_bin() is the mapping)
ggplot(data = mpg, aes(hwy))+geom_histogram(bins = 30)

Challenge

Make a density plot of hwy colored by class.

#hwy is a column of mpg, not a standalone object, so refer to it inside aes()
ggplot(data = mpg, aes(x = hwy, color = class)) +
  geom_density()

Challenge

Make a bar chart of hwy colored by class.

Faceting

Faceting is an effective way to summarize data, visualize interactions, or display patterns when there are multiple processes happening. Use facet_grid() and facet_wrap().

p1+geom_point() 

Now let's create facets based on the category class:

p1+geom_point()+facet_wrap(~ class)

You can do this with as much aes mapping as you want

p1+geom_point(aes(color = class,size = cyl,shape = drv, alpha = hwy))+facet_wrap(~class)

You can also facet by multiple factors:

p1+geom_point(aes(color = class,alpha = hwy))+facet_grid(cyl~drv)
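
Both faceting functions take further arguments that control the layout. A brief sketch of a few of them follows; the specific values (nrow = 2, free y scales) are arbitrary choices for illustration.

```r
library(ggplot2)

p1 <- ggplot(data = mpg, aes(x = displ, y = hwy))

# Control the panel layout of facet_wrap() with nrow (or ncol).
p1 + geom_point() + facet_wrap(~ class, nrow = 2)

# Let each panel choose its own y-axis range instead of sharing one.
p_free <- p1 + geom_point() + facet_wrap(~ class, scales = "free_y")
p_free

# In facet_grid(rows ~ cols), a "." leaves that dimension unfaceted.
p1 + geom_point() + facet_grid(. ~ drv)
```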

Beautifying/customizing your plot

Changing axis labels

labs(): name all axes and add titles

xlab() or ylab(): rename just one axis

ggtitle(): add just a title

p1+geom_point()


p1+geom_point()+labs(x = "Displacement", y = "Highway mpg")


p1+geom_point()+labs(x = "Displacement", y = "Highway mpg",title = "Automobile mileage", caption = "Source:  ggplot2::mpg")
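
labs() also accepts the names of other mapped aesthetics, which renames the corresponding legend. A small sketch, where the legend title "Vehicle class" is just an example label:

```r
library(ggplot2)

p_lab <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  labs(x = "Displacement", y = "Highway mpg",
       color = "Vehicle class")   # renames the legend for the color aesthetic
p_lab
```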

Changing axis limits: use ylim() and xlim()

p1+geom_point()+labs(x = "Displacement", y = "Highway mpg")+ylim(0,100)
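
Widening the limits, as above, is harmless, but narrowing them with ylim() removes the observations that fall outside the range before anything is drawn. If the goal is only to zoom in, coord_cartesian() keeps all the data. A sketch of the difference, using an arbitrary range of 20 to 30:

```r
library(ggplot2)

p1 <- ggplot(data = mpg, aes(x = displ, y = hwy))

# ylim() sets the scale limits: points outside 20-30 are dropped (with a warning).
p_clip <- p1 + geom_point() + ylim(20, 30)

# coord_cartesian() only zooms the view: all points are kept.
p_zoom <- p1 + geom_point() + coord_cartesian(ylim = c(20, 30))

p_clip
p_zoom
```

The distinction matters most for layers that compute something from the data, such as geom_smooth() or geom_boxplot(), because dropped points change the result.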

Changing the text size and other aspects of appearance

Themes

Several simple built in theme functions are available in ggplot2 (https://ggplot2.tidyverse.org/reference/ggtheme.html). These include:

theme_gray(): Gray background color and white grid lines. It puts the data forward to make comparisons easy.

theme_bw(): White background and gray grid lines. May work best for presentations displayed with a projector.

theme_linedraw(): A theme with black lines of various widths on white backgrounds, reminiscent of a line drawing.

theme_light(): A theme similar to theme_linedraw() but with light grey lines and axes, to direct more attention towards the data.

theme_dark(): The dark cousin of theme_light(), with similar line sizes but a dark background. Useful to make thin coloured lines pop out.

theme_tufte(): Theme based on Chapter 6, ‘Data-Ink Maximization and Graphical Design’, of Edward Tufte's The Visual Display of Quantitative Information. This theme is not part of ggplot2; it comes from the ggthemes package.

Let's try a few:

p1+geom_point()+labs(x = "Displacement", y = "Highway mpg")+theme_gray()

p1+geom_point()+labs(x = "Displacement", y = "Highway mpg")+theme_bw()

p1+geom_point()+labs(x = "Displacement", y = "Highway mpg")+theme_linedraw()

p1+geom_point()+labs(x = "Displacement", y = "Highway mpg")+theme_light()

p1+geom_point()+labs(x = "Displacement", y = "Highway mpg")+theme_dark()

library(ggthemes)
p1+geom_point()+labs(x = "Displacement", y = "Highway mpg")+theme_tufte()

You can also modify specific aspects of the theme

Changing text size: use theme()

theme(axis.text.x = element_text(size=15,colour = "black"), axis.title.x = element_text(size=15,face="bold"))

theme(axis.text.y = element_text(size=15,colour = "black"), axis.title.y = element_text(size=15,face="bold"))

theme(plot.background = element_blank(),panel.grid.major = element_blank(), panel.grid.minor = element_blank())

For example: To increase text size on x axis:

p6 <-p1+geom_point()+labs(x = "Displacement", y = "Highway mpg")+theme_bw()
p6


p6+theme(axis.text.x = element_text(size=15,colour = "black", hjust=1),
      axis.title.x = element_text(size=15,face="bold")) 

To increase text size on the y axis:


p6+theme(axis.text.y = element_text(size=15,colour = "black"),
      axis.title.y = element_text(size=15,face="bold")) 

Get rid of grid lines:

p6+theme(plot.background = element_blank(),panel.grid.major = element_blank(),
      panel.grid.minor = element_blank())

Put them all together:

p6+
  theme(axis.text.x = element_text(size=15,colour = "black", hjust=1),
      axis.title.x = element_text(size=15,face="bold")) +
  theme(axis.text.y = element_text(size=15,colour = "black"),
      axis.title.y = element_text(size=15,face="bold")) +
  theme(plot.background = element_blank(),panel.grid.major = element_blank(),
      panel.grid.minor = element_blank())
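
If the aim is simply larger text everywhere, the built-in themes take a base_size argument (in points) that scales all text at once. This sketch rebuilds p6 with base_size = 15 to match the sizes used above; individual theme() tweaks can still be added on top.

```r
library(ggplot2)

p6_big <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  labs(x = "Displacement", y = "Highway mpg") +
  theme_bw(base_size = 15)   # all text is sized relative to this base size
p6_big
```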

You can also change the text in the facets

p1+geom_point(aes(color = class,size = cyl,shape = drv, alpha = hwy))+
  facet_wrap(~drv)+
  theme(strip.text.x = element_text(size = 15,face="bold"),
      strip.text.y = element_text(size = 15,face="bold"))

Statistical summaries

Load in the “Culcita” data.

Culcita - Another new type of data

load("culcita.RData")
summary(culcita_dat)
     block      predation         ttt    
 1      : 8   Min.   :0.000   none  :20  
 2      : 8   1st Qu.:0.000   crabs :20  
 3      : 8   Median :1.000   shrimp:20  
 4      : 8   Mean   :0.625   both  :20  
 5      : 8   3rd Qu.:1.000              
 6      : 8   Max.   :1.000              
 (Other):32                              

These data are from McKeon et al. 2012 “Multiple defender effects: synergistic coral defense by mutualist crustaceans” Oecologia, 169(4):1095-1103.

The basic data can be reduced, for the purposes of this exercise, to a single treatment (ttt), which consists of combinations of different symbionts (crabs, shrimp, both, or none), and a binary response (predation). We want to generate a plot displaying summary statistics from this experiment.

plot1<- ggplot(culcita_dat,aes(x=ttt,y=predation))
plot1+geom_point()

plot1+geom_point()+geom_violin()

Summary functions

You can either supply summary functions individually (fun, fun.min, fun.max) or as a single function (fun.data):

fun.data: complete summary function. Should take a numeric vector as input and return a data frame as output.

fun.min: min summary function (should take a numeric vector and return a single number).

fun: main summary function (should take a numeric vector and return a single number).

fun.max: max summary function (should take a numeric vector and return a single number).

ggplot(culcita_dat,aes(x=ttt,y=predation))+
  stat_summary(fun=mean,size=2)+
  ylim(c(0,1))+xlab("Treatment")


ggplot(culcita_dat,aes(x=ttt,y=predation))+
  stat_summary(fun=mean, fun.min = min, fun.max = max, size=2)+
  ylim(c(0,1))+xlab("Treatment")


ggplot(culcita_dat,aes(x=ttt,y=predation))+
  stat_summary(fun=median,size=2)+
  ylim(c(0,1))+xlab("Treatment")


ggplot(culcita_dat,aes(x=ttt,y=predation))+
  stat_summary(fun.data=mean_se,size=2)+
  ylim(c(0,1))+xlab("Treatment")


#mean_cl_boot requires the Hmisc package to be installed
ggplot(culcita_dat,aes(x=ttt,y=predation))+
  stat_summary(fun.data=mean_cl_boot,size=2)+
  ylim(c(0,1))+xlab("Treatment")
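
The fun.data contract described above can also be met with a hand-written function. The sketch below defines a hypothetical helper, mean_sd_manual (not part of ggplot2), that returns the mean plus or minus one standard deviation. It uses mpg so that it runs without culcita.RData.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): mean plus/minus one standard deviation.
# A fun.data function takes a numeric vector and returns a data frame
# with the columns y, ymin, and ymax.
mean_sd_manual <- function(x) {
  m <- mean(x)
  s <- sd(x)
  data.frame(y = m, ymin = m - s, ymax = m + s)
}

# Try the helper on its own before handing it to stat_summary().
mean_sd_manual(c(1, 2, 3))

p_sum <- ggplot(data = mpg, aes(x = class, y = hwy)) +
  stat_summary(fun.data = mean_sd_manual) +
  xlab("Class")
p_sum
```

The same function could be passed to stat_summary() in the Culcita plots in place of mean_se.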

Challenges

The data are from L. Partridge and M. Farquhar (1981), Sexual activity and the lifespan of male fruitflies, Nature 294: 580-581. The experiment placed male fruit flies with varying numbers of previously-mated or virgin females to investigate how mating activity affects male lifespan.

Read the data file (fruitflies.csv) into your R environment and name it.

1. Create a boxplot displaying Longevity for each “treatment”.

Be sure to label the axes and make sure the text is readable.

2. The variable thorax stands for thorax length, which was used as a measure of body size. The measurement was included in case body size also affected longevity. Produce a scatter plot of thorax length and longevity. Make longevity the response variable (i.e., plot it on the vertical axis). Make the symbols and colors differ among treatments.

3. Redraw the scatter plot above except create separate facets for each treatment.

Final Challenge

Load the data “WRC_Plant Diversity.csv”

These data are from “Long‐term nutrient enrichment, mowing, and ditch drainage interact in the dynamics of a wetland plant community” by Goodwillie et al. 2020, Ecosphere. The experimental design was a 2x2 factorial experiment with 2 levels of disturbance (mowed or unmowed) and 2 levels of nutrient addition (fertilized or unfertilized). The experiment was arranged in 8 spatial blocks, each containing the 4 treatment plots. Three fixed quadrats were created within each plot and sampled annually. The spatial blocks were arranged in two rows that differed in proximity to a ditch. Your challenge is to draw a figure that shows the mean and confidence intervals for alpha diversity over time (Year) in a way that highlights the independent effects of mowing, fertilizer treatment, and proximity to the ditch.

