Displaying Data

Alban Guillaumet, Troy University

“Numerical quantities focus on expected values, graphical summaries on unexpected values.”

- John Tukey

Objectives

  • Objectives & rules of graphing

  • Making some of our first graphs in R

Visualize before you analyze!!!

Data visualization is one step in exploratory data analysis.

Quote: …the first step in any data analysis or statistical procedure is to graph the data and look at it. Humans are a visual species, with brains evolved to process visual information. Take advantage of millions of years of evolution, and look at visual representations of your data before doing anything else.
- Whitlock & Schluter

Important!

W&S 4 Rules of Graphing

  • Show the data.
  • Make patterns in the data easy to see.
  • Represent magnitudes honestly.
  • Draw graphical elements clearly.

Good example...or not?

One more...

Locust serotonin

Strip chart (left) vs. bar plot (right) of serotonin levels in the central nervous system of desert locusts that were experimentally crowded for 0 (the control group), 1, and 2 hours. Which one is better?

R code + datasets for the book!

Locust serotonin - Load data

Read the data and store it in a data frame.

locustData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02f1_2locustSerotonin.csv")

Locust serotonin - Look at data

Show the first few rows of the data to check that it was read correctly.

head(locustData)
  serotoninLevel treatmentTime
1            5.3             0
2            4.6             0
3            4.5             0
4            4.3             0
5            4.2             0
6            3.6             0

Locust serotonin - Look at data

Check the object type of the variables using str.

str(locustData)
'data.frame':   30 obs. of  2 variables:
 $ serotoninLevel: num  5.3 4.6 4.5 4.3 4.2 3.6 3.7 3.3 12.1 18 ...
 $ treatmentTime : int  0 0 0 0 0 0 0 0 0 0 ...

Locust serotonin - basic graph

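Draw a strip chart. The tilde “~” means that the first argument below is a formula relating one variable to the other: the variable to plot (serotoninLevel) goes on the left, and the grouping variable (treatmentTime) goes on the right.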

stripchart(serotoninLevel ~ treatmentTime,
           data = locustData)
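
As a small aside, a formula is an ordinary R object and can be inspected on its own. The sketch below uses only base R; the name `f` is introduced here for illustration and is not part of the original slides.

```r
# A formula names variables without evaluating them, so no data is needed here.
f <- serotoninLevel ~ treatmentTime   # 'f' is a name chosen for this sketch

class(f)      # "formula"
all.vars(f)   # "serotoninLevel" "treatmentTime"

# Left of the tilde:  the variable whose values are plotted.
# Right of the tilde: the variable that defines the groups.
```

Plotting functions such as stripchart look these names up in the data frame passed through the data argument.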

Locust serotonin - better graph

stripchart(serotoninLevel ~ treatmentTime, 
           data = locustData, 
           method = "jitter", 
           vertical = TRUE, 
           xlab = "Treatment time (hours)",
           ylab = "Serotonin (pmoles)",
           cex.lab = 1.5)

Locust serotonin - Fancier graph

A fancier strip chart, made by adding more options.
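
The code behind this figure is not shown in the slides. One possible version is sketched below, assuming the same column names as locustData; the helper name `fancy_stripchart` and the specific option values (color, point size, axis style) are choices made for this sketch, not the author's settings.

```r
# Hypothetical helper (not from the original slides): a strip chart with
# a few more graphical options than the "better graph" above.
fancy_stripchart <- function(df) {
  old_par <- par(bty = "l")          # draw only the left and bottom axis lines
  on.exit(par(old_par))              # restore the previous setting on exit
  stripchart(serotoninLevel ~ treatmentTime,
             data = df,
             method = "jitter",      # spread overlapping points sideways
             vertical = TRUE,
             pch = 16,               # filled circles
             col = "firebrick",
             cex = 1.5,              # larger points
             las = 1,                # horizontal axis tick labels
             xlab = "Treatment time (hours)",
             ylab = "Serotonin (pmoles)",
             ylim = c(0, max(df$serotoninLevel)))  # start the y-axis at zero
  invisible(NULL)
}

# Usage, once locustData has been loaded as shown earlier:
# fancy_stripchart(locustData)
```

Starting the y-axis at zero is one way to follow the rule "Represent magnitudes honestly".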

Do it yourself!

Showing numerical data - Histogram

A histogram uses the height of rectangular bars to display the frequency distribution of a numerical variable.

Example: Hemoglobin concentration in males of four distinct human populations.

Showing numerical data - Histogram

Creating a histogram in R

desert_bird <- read.csv("http://www.zoology.ubc.ca/~whitlock/ABD/teaching/datasets/02/02e1bDesertBirdCensus.csv", stringsAsFactors = TRUE)
str(desert_bird)
'data.frame':   43 obs. of  2 variables:
 $ Species: Factor w/ 43 levels "American Kestrel",..: 7 39 23 36 1 17 37 43 31 22 ...
 $ Count  : int  64 23 3 16 7 148 7 625 135 1 ...

Creating a histogram in R

head(desert_bird)
           Species Count
1    Black Vulture    64
2   Turkey Vulture    23
3    Harris's Hawk     3
4  Red-tailed Hawk    16
5 American Kestrel     7
6   Gambel's Quail   148
d <- desert_bird  # shorter name used in the code that follows

Creating a histogram in R

hist(d$Count)

Creating a histogram in R

range(d$Count)
[1]   1 625
(bin <- seq(0, 650, by = 50))
 [1]   0  50 100 150 200 250 300 350 400 450 500 550 600 650
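
The breaks above were typed by hand after looking at the range. As a sketch, they could also be derived from the range and a chosen bin width; the names `counts_range`, `bin_width`, and `bin_auto` are introduced here for illustration, and the width of 50 is simply the value used in the slides.

```r
# Sketch: derive histogram breaks from the data range and a chosen bin width.
counts_range <- c(1, 625)   # range(d$Count), as printed above
bin_width    <- 50          # assumed bin width, matching the slides

# Round the minimum down and the maximum up to the nearest multiple of the width.
lower <- floor(counts_range[1] / bin_width) * bin_width
upper <- ceiling(counts_range[2] / bin_width) * bin_width

bin_auto <- seq(lower, upper, by = bin_width)
bin_auto
```

With these values the result matches the hand-typed bin vector, running from 0 to 650 in steps of 50.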

Creating a histogram in R

hist(d$Count, breaks = bin, col = "forestgreen",
     xlab = "Abundance", ylab = "Number of species",
     main = "", cex.lab = 1.5)

Access the results

hist(d$Count, breaks = bin, plot = FALSE)
$breaks
 [1]   0  50 100 150 200 250 300 350 400 450 500 550 600 650

$counts
 [1] 28  4  3  3  1  3  0  0  0  0  0  0  1

$density
 [1] 0.0130232558 0.0018604651 0.0013953488 0.0013953488 0.0004651163
 [6] 0.0013953488 0.0000000000 0.0000000000 0.0000000000 0.0000000000
[11] 0.0000000000 0.0000000000 0.0004651163

$mids
 [1]  25  75 125 175 225 275 325 375 425 475 525 575 625

$xname
[1] "d$Count"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"
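
The printed list can also be stored and queried piece by piece. The sketch below avoids downloading the file by rebuilding a stand-in vector from the counts printed above (one value at each bin midpoint); these are not the raw census counts, and the names `x`, `h`, and `dens` are introduced here for illustration.

```r
# Stand-in data rebuilt from the printed output: each bin midpoint repeated
# 'counts' times. These are NOT the raw census values, only a reconstruction.
bin    <- seq(0, 650, by = 50)
counts <- c(28, 4, 3, 3, 1, 3, 0, 0, 0, 0, 0, 0, 1)
mids   <- bin[-1] - 25
x      <- rep(mids, times = counts)

h <- hist(x, breaks = bin, plot = FALSE)   # store the result instead of printing it

h$counts                       # number of species in each bin
sum(h$counts)                  # total number of species: 43
h$mids[which.max(h$counts)]    # midpoint of the fullest bin: 25

# Density is the count divided by (sample size * bin width),
# so the areas of the bars sum to 1.
dens <- h$counts / (sum(h$counts) * diff(h$breaks))
all.equal(dens, h$density)
```

For the first bin this gives 28 / (43 * 50), which matches the first density value printed above.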

Histogram in R: 'right' argument

# right = FALSE: each bin includes its left edge and excludes its right edge
hist(d$Count, breaks = bin, right = FALSE, col = "pink",
     xlab = "Abundance", ylab = "Number of species", main = "", cex.lab = 1.5)
arrows(325, 15, 325, 5, col = "forestgreen", lwd = 2)
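
The effect of the right argument only shows up for values that fall exactly on a bin boundary. A minimal sketch with made-up toy values (not taken from the bird census) illustrates this; the names `x_toy`, `breaks_toy`, `closed_right`, and `closed_left` are introduced here for illustration.

```r
# Toy values placed exactly on bin boundaries (made up for illustration).
x_toy      <- c(50, 100, 100, 150)
breaks_toy <- c(0, 50, 100, 150)

# right = TRUE (the default): bins are [0,50], (50,100], (100,150].
# The first bin also includes 0 because include.lowest = TRUE by default.
closed_right <- hist(x_toy, breaks = breaks_toy, right = TRUE, plot = FALSE)$counts
closed_right   # 1 2 1

# right = FALSE: bins are [0,50), [50,100), [100,150].
# The last bin also includes 150 because include.lowest = TRUE by default.
closed_left <- hist(x_toy, breaks = breaks_toy, right = FALSE, plot = FALSE)$counts
closed_left    # 0 1 3
```

The total count is the same in both cases; only the bin assigned to boundary values changes.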

Histograms

Three different histograms that depict the body mass of 228 female sockeye salmon

Question: What are the explanatory and response variables?

Answer: Neither. A histogram shows the distribution of a single numerical variable.

Bar plot

Death by tiger