# Displaying Data

Alban Guillaumet, Troy University

“Numerical quantities focus on expected values, graphical summaries on unexpected values.”

- John Tukey

### Objectives

• Objectives & rules of graphing

• Making some of our first graphs in R

### Visualize before you analyze!!!

Data visualization is one step in exploratory data analysis.

Quote: …the first step in any data analysis or statistical procedure is to graph the data and look at it. Humans are a visual species, with brains evolved to process visual information. Take advantage of millions of years of evolution, and look at visual representations of your data before doing anything else.
- Whitlock & Schluter

## W&S 4 Rules of Graphing

• Show the data.
• Make patterns in the data easy to see.
• Represent magnitudes honestly.
• Draw graphical elements clearly.

### Locust serotonin

Strip chart (left) vs. bar plot (right) of serotonin levels in the central nervous system of desert locusts that were experimentally crowded for 0 (the control group), 1, and 2 hours. Which one is better?

### Locust serotonin - Load data

Read the data and store it in a data frame.

``````
locustData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02f1_2locustSerotonin.csv")
``````

### Locust serotonin - Look at data

Show the first few rows of the data to check that it was read correctly.

``````
head(locustData)
``````
``````
  serotoninLevel treatmentTime
1            5.3             0
2            4.6             0
3            4.5             0
4            4.3             0
5            4.2             0
6            3.6             0
``````

### Locust serotonin - Look at data

Check the object type of the variables using `str`.

``````
str(locustData)
``````
``````
'data.frame':   30 obs. of  2 variables:
 $ serotoninLevel: num  5.3 4.6 4.5 4.3 4.2 3.6 3.7 3.3 12.1 18 ...
 $ treatmentTime : int  0 0 0 0 0 0 0 0 0 0 ...
``````

### Locust serotonin - basic graph

Draw a strip chart. The tilde “~” makes the first argument below a formula: the numerical variable on the left is plotted separately for each group of the variable on the right.

``````
stripchart(serotoninLevel ~ treatmentTime,
           data = locustData)
``````
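To see what the tilde does, it can help to treat the formula as an ordinary R object. The short sketch below only inspects the formula itself; it reads no data and draws nothing. The name `f` is introduced here purely for illustration.

```r
# A formula is an unevaluated description: numeric variable ~ grouping variable
f <- serotoninLevel ~ treatmentTime

class(f)      # "formula"
all.vars(f)   # the variable names stripchart() will look up in `data`
f[[2]]        # left-hand side: the variable that is plotted
f[[3]]        # right-hand side: the variable that defines the groups
```

Because the formula only names variables, the same formula can be reused with any data frame that has columns called `serotoninLevel` and `treatmentTime`.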

### Locust serotonin - better graph

``````
stripchart(serotoninLevel ~ treatmentTime,
           data = locustData,
           method = "jitter",
           vertical = TRUE,
           xlab = "Treatment time (hours)",
           ylab = "Serotonin (pmoles)",
           cex.lab = 1.5)
``````

### Locust serotonin - Fancier graph

Including more options gives a fancier strip chart.
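One possible version is sketched here. So that it runs without a network connection, it uses simulated stand-in values (`demoData`, made up for illustration and not the real measurements); substituting `locustData` should give the real figure. The particular options (`pch`, `col`, `las`, and the group-mean segments) are illustrative choices, not necessarily the ones used for the original slide.

```r
# Simulated stand-in for locustData (hypothetical values, NOT the real data)
set.seed(1)
demoData <- data.frame(
  serotoninLevel = c(rnorm(10, mean = 6,  sd = 2),
                     rnorm(10, mean = 8,  sd = 3),
                     rnorm(10, mean = 12, sd = 4)),
  treatmentTime  = rep(c(0, 1, 2), each = 10)
)

stripchart(serotoninLevel ~ treatmentTime,
           data = demoData,
           method = "jitter",      # spread overlapping points sideways
           jitter = 0.1,           # amount of sideways spread
           vertical = TRUE,
           pch = 16,               # filled circles
           col = "firebrick",
           las = 1,                # horizontal tick labels
           xlab = "Treatment time (hours)",
           ylab = "Serotonin (pmoles)",
           cex.lab = 1.5)

# Overlay each group's mean as a short horizontal segment.
# With vertical = TRUE the groups sit at x = 1, 2, 3.
groupMeans <- tapply(demoData$serotoninLevel, demoData$treatmentTime, mean)
segments(x0 = seq_along(groupMeans) - 0.2, y0 = groupMeans,
         x1 = seq_along(groupMeans) + 0.2, y1 = groupMeans,
         lwd = 2)
```

Every data point stays visible and the means are added on top, in line with the rule "Show the data."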

### Showing numerical data - Histogram

A histogram uses the height of rectangular bars to display the frequency distribution of a numerical variable.

Example: Hemoglobin concentration in males of four distinct human populations.

### Creating a histogram in R

``````
# Note: in R >= 4.0, add stringsAsFactors = TRUE to get the Factor column shown below
desert_bird <- read.csv("http://www.zoology.ubc.ca/~whitlock/ABD/teaching/datasets/02/02e1bDesertBirdCensus.csv")
str(desert_bird)
``````
``````
'data.frame':   43 obs. of  2 variables:
 $ Species: Factor w/ 43 levels "American Kestrel",..: 7 39 23 36 1 17 37 43 31 22 ...
 $ Count  : int  64 23 3 16 7 148 7 625 135 1 ...
``````

### Creating a histogram in R

``````
head(desert_bird)
``````
``````
           Species Count
1    Black Vulture    64
2   Turkey Vulture    23
3    Harris's Hawk     3
4  Red-tailed Hawk    16
5 American Kestrel     7
6   Gambel's Quail   148
``````
``````
d <- desert_bird  # short alias used on the following slides
``````

### Creating a histogram in R

``````
hist(d$Count)
``````

### Creating a histogram in R

``````
range(d$Count)
``````
``````
[1]   1 625
``````
``````
# The outer parentheses print the value as it is assigned
(bin <- seq(0, 650, by = 50))
``````
``````
 [1]   0  50 100 150 200 250 300 350 400 450 500 550 600 650
``````
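Rather than typing the upper limit by hand, the breaks can be derived from the largest value. This is a small sketch: `binWidth`, `maxCount`, and `upper` are names introduced here for illustration, and 625 is the maximum reported by `range()`.

```r
binWidth <- 50
maxCount <- 625    # largest count, as reported by range(d$Count)

# Round the maximum up to the next multiple of the bin width
upper <- ceiling(maxCount / binWidth) * binWidth
bin   <- seq(0, upper, by = binWidth)
bin
```

With real data, `maxCount` would be replaced by `max(d$Count)`, so the breaks adapt if the data change.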

### Creating a histogram in R

``````
hist(d$Count,
     breaks = bin,
     col = "forestgreen",
     xlab = "Abundance",
     ylab = "Number of species",
     main = "",
     cex.lab = 1.5)
``````

### Access the results

``````
hist(d$Count, breaks = bin, plot = FALSE)
``````
``````
$breaks
 [1]   0  50 100 150 200 250 300 350 400 450 500 550 600 650

$counts
 [1] 28  4  3  3  1  3  0  0  0  0  0  0  1

$density
 [1] 0.0130232558 0.0018604651 0.0013953488 0.0013953488 0.0004651163
 [6] 0.0013953488 0.0000000000 0.0000000000 0.0000000000 0.0000000000
[11] 0.0000000000 0.0000000000 0.0004651163

$mids
 [1]  25  75 125 175 225 275 325 375 425 475 525 575 625

$xname
[1] "d$Count"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"
``````
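The components of this result are related in a simple way, which can be checked from the printed values alone. In the sketch below, `breaks` and `counts` are copied from the output above; `widths`, `density`, and `mids` are recomputed from them as an illustration.

```r
# Values copied from the hist() output shown above
breaks <- seq(0, 650, by = 50)
counts <- c(28, 4, 3, 3, 1, 3, 0, 0, 0, 0, 0, 0, 1)

sum(counts)                 # total number of observations (43 species)

# Density = count / (total count * bin width), so the bar areas sum to 1
widths  <- diff(breaks)
density <- counts / (sum(counts) * widths)
density[1]                  # compare with the first $density value
sum(density * widths)       # total area under the histogram

# Midpoints are the averages of adjacent breaks
mids <- (head(breaks, -1) + tail(breaks, -1)) / 2
mids
```

On the real object, the same pieces are available as `h$counts`, `h$density`, and so on, after storing the result with something like `h <- hist(d$Count, breaks = bin, plot = FALSE)`.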

### Histogram in R: 'right' argument

``````
hist(d$Count,
     breaks = bin,
     right = FALSE,
     col = "pink",
     xlab = "Abundance",
     ylab = "Number of species",
     main = "",
     cex.lab = 1.5)
# Downward arrow above the 300-350 bin
arrows(325, 15, 325, 5, col = "forestgreen", lwd = 2)
``````
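The `right` argument controls which edge of each bin is closed. With the default `right = TRUE`, bins have the form (a, b]; with `right = FALSE` they have the form [a, b), so a value sitting exactly on a break moves to the next bin up. A toy vector (made up for illustration) shows the effect:

```r
x      <- c(50, 100, 150)        # toy values placed exactly on the breaks
breaks <- seq(0, 200, by = 50)

right_closed <- hist(x, breaks = breaks, right = TRUE,  plot = FALSE)$counts
left_closed  <- hist(x, breaks = breaks, right = FALSE, plot = FALSE)$counts

right_closed   # 1 1 1 0 : each value is counted in the bin it closes
left_closed    # 0 1 1 1 : each value is counted in the bin it opens
```

Only observations that fall exactly on a break are affected, so the two histograms differ only where the data contain such values.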

### Histograms

Three different histograms that depict the body mass of 228 female sockeye salmon.
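Assuming the three histograms differ only in bin width (an assumption, since the figure itself is not reproduced here), the effect can be imitated with simulated values. The masses below are random stand-ins generated for illustration, not the salmon measurements, and the three widths are arbitrary choices.

```r
set.seed(42)
mass <- rnorm(228, mean = 2.5, sd = 0.4)   # simulated stand-in, NOT real data

# Use a common range so the three panels are comparable
lo <- floor(min(mass))
hi <- ceiling(max(mass))

op <- par(mfrow = c(1, 3))                 # three panels side by side
for (w in c(0.1, 0.25, 0.5)) {
  hist(mass,
       breaks = seq(lo, hi, by = w),
       col = "grey",
       main = paste("Bin width =", w),
       xlab = "Body mass (simulated)",
       ylab = "Frequency")
}
par(op)                                    # restore the previous layout
```

Narrow bins tend to give a ragged picture, while wide bins can hide features of the distribution; the same data can look quite different depending on this one choice.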

Question: What are the explanatory and response variables?