Alban Guillaumet, Troy University
“Numerical quantities focus on expected values, graphical summaries on unexpected values.”
- John Tukey
Objectives & rules of graphing
Making some of our first graphs in R
Data visualization is one step in exploratory data analysis.
Quote: …the first step in any data analysis or statistical procedure is to graph the data and look at it. Humans are a visual species, with brains evolved to process visual information. Take advantage of millions of years of evolution, and look at visual representations of your data before doing anything else.
- Whitlock & Schluter
Strip chart (left) vs. bar plot (right) of serotonin levels in the central nervous system of desert locusts that were experimentally crowded for 0 (the control group), 1, and 2 hours. Which one is better?
R code: http://whitlockschluter.zoology.ubc.ca/r-code
Data sets: http://whitlockschluter.zoology.ubc.ca/wp-content/data
Alternative data set links (older / less complete?):
Read the data and store them in a data frame.
locustData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02f1_2locustSerotonin.csv")
Show the first few lines of the data to check that they were read correctly.
head(locustData)
  serotoninLevel treatmentTime
1            5.3             0
2            4.6             0
3            4.5             0
4            4.3             0
5            4.2             0
6            3.6             0
Check the object type of the variables using str.
str(locustData)
'data.frame': 30 obs. of 2 variables:
$ serotoninLevel: num 5.3 4.6 4.5 4.3 4.2 3.6 3.7 3.3 12.1 18 ...
$ treatmentTime : int 0 0 0 0 0 0 0 0 0 0 ...
Draw a strip chart. The tilde “~” makes the first argument a formula: the response variable (serotoninLevel) on the left is plotted for each value of the grouping variable (treatmentTime) on the right.
stripchart(serotoninLevel ~ treatmentTime,
           data = locustData)
A fancier strip chart, obtained by including more options.
stripchart(serotoninLevel ~ treatmentTime,
           data = locustData,
           method = "jitter",
           vertical = TRUE,
           xlab = "Treatment time (hours)",
           ylab = "Serotonin (pmoles)",
           cex.lab = 1.5)
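For comparison with the strip chart, a bar plot of group means could be sketched as follows. The data frame `toy` and its values are made up for illustration (they are not the locust measurements), and `groupMeans` is a name chosen here; with the real data, `locustData` would take the place of `toy`.

```r
# Made-up values for illustration only; NOT the locust data.
toy <- data.frame(
  serotoninLevel = c(4, 5, 6, 5, 9, 7, 12, 6, 15),
  treatmentTime  = rep(c(0, 1, 2), each = 3)
)

# Mean response within each treatment group.
groupMeans <- tapply(toy$serotoninLevel, toy$treatmentTime, mean)

# The bar plot shows one bar per group mean; individual points are hidden.
barplot(groupMeans,
        xlab = "Treatment time (hours)",
        ylab = "Mean serotonin (pmoles)",
        cex.lab = 1.5)
```

The bar plot reduces each group to a single number, whereas the strip chart keeps every observation visible, including the spread within each group.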
A histogram uses the height of rectangular bars to display the frequency distribution of a numerical variable.
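Example: Hemoglobin concentration in males of four distinct human populations.
The R example below uses a different data set: the abundance (count) of each bird species in a desert bird census.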
desert_bird <- read.csv("http://www.zoology.ubc.ca/~whitlock/ABD/teaching/datasets/02/02e1bDesertBirdCensus.csv")
str(desert_bird)
'data.frame': 43 obs. of 2 variables:
$ Species: Factor w/ 43 levels "American Kestrel",..: 7 39 23 36 1 17 37 43 31 22 ...
$ Count : int 64 23 3 16 7 148 7 625 135 1 ...
head(desert_bird)
           Species Count
1    Black Vulture    64
2   Turkey Vulture    23
3    Harris's Hawk     3
4  Red-tailed Hawk    16
5 American Kestrel     7
6   Gambel's Quail   148
d <- desert_bird
hist(d$Count)
range(d$Count)
[1] 1 625
(bin <- seq(0, 650, by = 50))
[1] 0 50 100 150 200 250 300 350 400 450 500 550 600 650
hist(d$Count, breaks = bin, col = "forestgreen",
     xlab = "Abundance", ylab = "Number of species",
     main = "", cex.lab = 1.5)
hist(d$Count, breaks = bin, plot = FALSE)
$breaks
[1] 0 50 100 150 200 250 300 350 400 450 500 550 600 650
$counts
[1] 28 4 3 3 1 3 0 0 0 0 0 0 1
$density
[1] 0.0130232558 0.0018604651 0.0013953488 0.0013953488 0.0004651163
[6] 0.0013953488 0.0000000000 0.0000000000 0.0000000000 0.0000000000
[11] 0.0000000000 0.0000000000 0.0004651163
$mids
[1] 25 75 125 175 225 275 325 375 425 475 525 575 625
$xname
[1] "d$Count"
$equidist
[1] TRUE
attr(,"class")
[1] "histogram"
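The `$density` values follow from the counts: each one is the bin count divided by (total number of observations × bin width), so that the bar areas sum to 1. A quick check using the counts printed above; the variable names are chosen here for illustration.

```r
# Counts copied from the hist() output above (13 bins of width 50).
counts   <- c(28, 4, 3, 3, 1, 3, 0, 0, 0, 0, 0, 0, 1)
binWidth <- 50

n <- sum(counts)                    # total number of species
density <- counts / (n * binWidth)  # count / (n * bin width)

n
round(density[1:2], 10)
sum(density * binWidth)             # total area under the histogram
```

The first density value, 28 / (43 × 50), matches the 0.0130232558 reported by `hist()`.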
hist(d$Count, breaks = bin, right = FALSE, col = "pink",
     xlab = "Abundance", ylab = "Number of species",
     main = "", cex.lab = 1.5)
arrows(325, 15, 325, 5, col = "forestgreen", lwd = 2)
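The `right` argument decides which edge of each bin is closed, which matters only for values that fall exactly on a break. A small sketch with made-up values placed on bin boundaries (the names `x`, `breaks`, `closedRight`, and `closedLeft` are chosen here for illustration):

```r
x      <- c(50, 100)          # made-up values lying exactly on two breaks
breaks <- c(0, 50, 100, 150)

# Default (right = TRUE): bins are (a, b], so 50 falls in the first bin.
closedRight <- hist(x, breaks = breaks, plot = FALSE)$counts

# right = FALSE: bins are [a, b), so 50 falls in the second bin.
closedLeft <- hist(x, breaks = breaks, right = FALSE, plot = FALSE)$counts

closedRight  # 1 1 0
closedLeft   # 0 1 1
```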
Three different histograms that depict the body mass of 228 female sockeye salmon
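The effect of bin width on the shape of a histogram can be explored with a sketch like the one below. It assumes the three panels differ only in bin width; the widths 0.1, 0.3, and 0.5, the kg unit in the axis label, and the simulated `mass` values are illustrative choices, not the salmon measurements.

```r
set.seed(1)
mass <- rnorm(228, mean = 3, sd = 0.5)  # simulated values, NOT the salmon data

widths <- c(0.1, 0.3, 0.5)              # illustrative bin widths

oldPar <- par(mfrow = c(1, 3))          # three panels side by side
for (w in widths) {
  hist(mass, breaks = seq(0, 6, by = w), right = FALSE,
       col = "forestgreen",
       xlab = "Body mass (kg)", ylab = "Frequency",
       main = paste("Bin width =", w))
}
par(oldPar)                             # restore the previous layout
```

Narrow bins tend to give a ragged outline, while wide bins can hide features of the distribution such as a second peak.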
Question: Which variable is the explanatory variable and which is the response variable?
Answer: Neither. A histogram displays the distribution of a single variable.
Death by tiger
Frequency distribution for a discrete variable
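For a discrete variable, a frequency distribution can be tabulated with `table()` and drawn with `barplot()`. A minimal sketch follows; the vector `eventsPerUnit` holds made-up values for illustration and is not the data set shown on the slide.

```r
# Made-up discrete observations, for illustration only.
eventsPerUnit <- c(0, 1, 1, 2, 0, 3, 1, 0, 2, 1)

# Count how often each distinct value occurs.
freq <- table(eventsPerUnit)
freq

# One bar per observed value; bar height is the frequency.
barplot(freq,
        xlab = "Number of events",
        ylab = "Frequency",
        cex.lab = 1.5)
```

Note that `table()` lists only values that actually occur in the data, so a value with zero observations gets no bar unless the variable is first converted to a factor with explicit levels.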
Scatter plot showing the relationship between the ornamentation of male guppies and the average attractiveness of their sons.
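A scatter plot of this kind can be drawn with `plot()`, placing the explanatory variable on the horizontal axis. The sketch below uses made-up numbers and hypothetical variable names (`fatherOrnament`, `sonAttract`); it is not the guppy data set.

```r
# Made-up values for illustration only; NOT the guppy data.
fatherOrnament <- c(0.2, 0.4, 0.5, 0.7, 0.9, 1.1)
sonAttract     <- c(-0.1, 0.1, 0.3, 0.2, 0.6, 0.8)

# Explanatory variable on the x axis, response on the y axis.
plot(fatherOrnament, sonAttract,
     pch = 16, col = "forestgreen",
     xlab = "Father's ornamentation",
     ylab = "Son's attractiveness",
     cex.lab = 1.5)
```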