Note: This document was converted to R-Markdown from this page, with modifications, by M. Drew LaMar. You can download the R-Markdown here.
The variable names are plucked from the examples further below.
Read data from a comma-separated (.csv) file into a data frame. R converts columns containing only numbers to numeric variables in the data frame. In R versions before 4.0.0, columns containing characters are converted to factor variables by default; from R 4.0.0 onward they are kept as character variables unless stringsAsFactors = TRUE is supplied.
locustData <- read.csv("chap02f1_2locustSerotonin.csv")
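Because the handling of character columns depends on the R version, it can help to state the choice explicitly. The following is a minimal sketch, not part of the original examples: it reads from an inline string through the text argument instead of a file, and the three rows and the group column are made up for illustration.

```r
# Made-up CSV content, passed via `text =` so that no file is needed
csvText <- "serotoninLevel,treatmentTime,group
5.3,0,control
4.6,0,control
11.2,2,crowded"

# Convert character columns to factors (the default before R 4.0.0)
asFactor <- read.csv(text = csvText, stringsAsFactors = TRUE)

# Keep character columns as character (the default from R 4.0.0 onward)
asCharacter <- read.csv(text = csvText, stringsAsFactors = FALSE)

class(asFactor$group)           # "factor"
class(asCharacter$group)        # "character"
class(asFactor$serotoninLevel)  # "numeric"
```

Setting stringsAsFactors explicitly makes a script behave the same way on old and new R installations.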
Inspect data in a data frame. The head function shows the first few lines of the data frame to confirm the data was read properly. The nrow function indicates the number of cases (rows) in the data frame.
head(locustData)
nrow(locustData)
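A few other base R functions are often used alongside head and nrow when checking a freshly read data frame. This is a small sketch on a made-up data frame (exampleData is a hypothetical stand-in, not one of the data files used on this page).

```r
# Made-up data frame standing in for one read from a file
exampleData <- data.frame(serotoninLevel = c(5.3, 4.6, 4.5, 11.2),
                          treatmentTime  = c(0, 0, 1, 2))

head(exampleData, n = 2)  # first two rows only
nrow(exampleData)         # number of cases (rows)
ncol(exampleData)         # number of variables (columns)
str(exampleData)          # type and first few values of each column
```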
Basic strip chart relating a numeric variable to a categorical variable:
stripchart(serotoninLevel ~ treatmentTime, data = locustData, method = "jitter", vertical = TRUE)
Frequency table or contingency table:
table(tigerData$activity)
Bar graph for count data:
barplot(tigerTable, ylab = "Frequency", cex.names = 0.5)
Histogram for a numeric variable:
hist(birdAbundanceData$abundance, right = FALSE)
Scatter plot showing association between two numeric variables:
plot(sonAttractiveness ~ fatherOrnamentation, data = guppyFatherSonData)
Other new methods:
These tables were created by M. Drew LaMar, based on ones at the end of Chapter 2 of Whitlock & Schluter’s textbook.
Frequency distributions of univariate data
Type of data | Graphical method | R command | Example |
---|---|---|---|
Categorical | Bar graph | barplot | Figure 2.1-3. Education spending (processed) |
Categorical | Bar graph | barplot | Example 2.2A. Deaths from tigers (raw) |
Numerical | Histogram | hist | Figure 2.2-5. Salmon body size |
Showing association of bivariate data
Type of data | Graphical method | R command | Example |
---|---|---|---|
Two numerical | Scatter plot | plot | Example 2.3B. Guppy father and son comparison |
Two numerical | Line plot | plot | Example 2.4A. Measles outbreaks |
Two numerical | Map | - | - |
Two categorical | Grouped bar graph | barplot | Example 2.3A. Bird malaria |
Two categorical | Mosaic plot | mosaicplot | Example 2.3A. Bird malaria |
Mixed | Strip chart | stripchart | Example 2.3C. Human hemoglobin and elevation |
Mixed | Box plot | boxplot | Example 2.3C. Human hemoglobin and elevation |
Mixed | Multiple histograms | hist | Example 2.3C. Human hemoglobin and elevation |
Mixed | Cumulative frequency distributions | - | - |
Strip chart of serotonin levels in the central nervous system of desert locusts that were experimentally crowded for 0 (the control group), 1, and 2 hours.
Read the data and store it in a data frame (here named locustData). The following command uses read.csv to grab the data from a file on the internet (on the current web site).
locustData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02f1_2locustSerotonin.csv")
Show the first few lines of the data, to ensure it read correctly. Determine the number of cases in the data.
head(locustData)
## serotoninLevel treatmentTime
## 1 5.3 0
## 2 4.6 0
## 3 4.5 0
## 4 4.3 0
## 5 4.2 0
## 6 3.6 0
nrow(locustData)
## [1] 30
Draw a stripchart (the tilde “~” means that the first argument below is a formula, relating one variable to the other).
stripchart(serotoninLevel ~ treatmentTime,
data = locustData,
method = "jitter",
vertical = TRUE)
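Roughly speaking, the formula serotoninLevel ~ treatmentTime asks R to split the first variable into groups defined by the second, and the strip chart then draws one strip per group. The sketch below shows that grouping step on its own with split; the six measurements are made up and are not the locust data.

```r
# Made-up measurements: 2 observations in each of 3 treatment groups
exampleData <- data.frame(serotoninLevel = c(5.3, 4.6, 6.1, 7.0, 11.2, 9.8),
                          treatmentTime  = c(0, 0, 1, 1, 2, 2))

# Group the response by the explanatory variable, as the formula does
groups <- split(exampleData$serotoninLevel, exampleData$treatmentTime)

groups           # a list with one numeric vector per treatment time
names(groups)    # "0" "1" "2"
lengths(groups)  # number of observations in each group
```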
A fancier strip chart, closer to that shown in Figure 2.1-2, by including more options.
# Stripchart with options
par(bty = "l") # plot x and y axes only, not a complete box
stripchart(serotoninLevel ~ treatmentTime,
data = locustData,
vertical = TRUE,
method = "jitter",
pch = 16,
col = "firebrick",
cex = 1.5,
las = 1,
ylab = "Serotonin (pmoles)",
xlab = "Treatment time (hours)",
ylim = c(0, max(locustData$serotoninLevel)))
# The following command calculates the means in each treatment group (time)
meanSerotonin = tapply(locustData$serotoninLevel,
locustData$treatmentTime,
mean)
# "segments" draws lines to indicate the means
segments(x0 = c(1,2,3) - 0.1,
y0 = meanSerotonin,
x1 = c(1,2,3) + 0.1,
y1 = meanSerotonin, lwd = 2)
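The tapply call above returns one mean per group, and the strip chart places the groups at horizontal positions 1, 2, 3, which is why segments uses c(1,2,3). The following sketch uses made-up values (not the locust data) to show the shape of the tapply result and how seq_along could generate the positions for any number of groups.

```r
# Made-up values: three observations in each of three treatment times
serotonin <- c(5, 4, 3, 6, 8, 7, 10, 12, 14)
time      <- rep(c(0, 1, 2), each = 3)

# One mean per treatment time, named by the group label
groupMeans <- tapply(serotonin, time, mean)
groupMeans         # 0 -> 4, 1 -> 7, 2 -> 12
names(groupMeans)  # "0" "1" "2"

# Horizontal positions of the groups, without hard-coding c(1, 2, 3)
xPositions <- seq_along(groupMeans)
xPositions         # 1 2 3
```

With these two objects, the segments call could use xPositions - 0.1 and xPositions + 0.1 in place of the hard-coded positions.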
A bar graph of education spending per student in different years in British Columbia.
Read the data into a data frame named educationSpending, and inspect.
educationSpending <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02f1_3EducationSpending.csv")
head(educationSpending)
## year spendingPerStudent
## 1 1998 5844
## 2 1999 5983
## 3 2000 6216
## 4 2001 6328
## 5 2002 6455
## 6 2003 6529
Draw a bar graph. The names.arg argument takes the values of the year column (1998 to 2004 in 1-year increments), which are used as labels along the horizontal axis.
barplot(educationSpending$spendingPerStudent,
names.arg = educationSpending$year,
ylab = "Education spending ($ per student)")
A slightly fancier bar graph like that in Figure 2.1-3, which includes additional options, is shown here.
# Bar graph with more options
barplot(educationSpending$spendingPerStudent,
names.arg = educationSpending$year,
las = 1,
col = "firebrick",
ylim = c(0,8000),
ylab = "Education spending ($ per student)")
Frequency table and bar graph showing activities of 88 people at the time they were killed by tigers near Chitwan National Park, Nepal, from 1979 to 2006.
Read the data into a data frame named tigerData.
tigerData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e2aDeathsFromTigers.csv")
head(tigerData)
## person activity
## 1 1 Disturbing tiger kill
## 2 2 Forest products
## 3 3 Grass/fodder
## 4 4 Fuelwood/timber
## 5 5 Grass/fodder
## 6 6 Forest products
Generate a frequency table. The sort function is included to sort the categories by their frequencies.
tigerTable <- sort(table(tigerData$activity), decreasing = TRUE)
tigerTable
##
## Grass/fodder Forest products Fishing
## 44 11 8
## Herding Disturbing tiger kill Fuelwood/timber
## 7 5 5
## Sleeping in house Walking Toilet
## 3 3 2
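To see what sort adds, compare the table before and after sorting. This is a small sketch on a made-up vector of six activities (not the tiger data): table orders the categories alphabetically, and sort with decreasing = TRUE reorders them from most to least frequent.

```r
# Made-up categorical data
activity <- c("Fishing", "Herding", "Fishing",
              "Grass/fodder", "Grass/fodder", "Grass/fodder")

# Unsorted frequency table: categories appear in alphabetical order
counts <- table(activity)
counts

# Sorted frequency table: most frequent category first
sortedCounts <- sort(counts, decreasing = TRUE)
names(sortedCounts)  # "Grass/fodder" "Fishing" "Herding"
```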
You can arrange the frequency table vertically.
data.frame(Frequency = tigerTable)
## Frequency.Var1 Frequency.Freq
## 1 Grass/fodder 44
## 2 Forest products 11
## 3 Fishing 8
## 4 Herding 7
## 5 Disturbing tiger kill 5
## 6 Fuelwood/timber 5
## 7 Sleeping in house 3
## 8 Walking 3
## 9 Toilet 2
Use the addmargins command to include sums in your frequency table.
data.frame(Frequency = addmargins(tigerTable))
## Frequency.Var1 Frequency.Freq
## 1 Grass/fodder 44
## 2 Forest products 11
## 3 Fishing 8
## 4 Herding 7
## 5 Disturbing tiger kill 5
## 6 Fuelwood/timber 5
## 7 Sleeping in house 3
## 8 Walking 3
## 9 Toilet 2
## 10 Sum 88
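The column names Frequency.Var1 and Frequency.Freq in the output above are generated automatically. One possible way to get tidier names, sketched here on made-up data, is to call as.data.frame directly with its responseName argument and then rename the category column. The name Activity is chosen for illustration only.

```r
# Made-up categorical data
activity <- c("Fishing", "Herding", "Fishing",
              "Grass/fodder", "Grass/fodder", "Grass/fodder")
activityTable <- sort(table(activity), decreasing = TRUE)

# Vertical frequency table with a total, using a custom count column name
freqData <- as.data.frame(addmargins(activityTable),
                          responseName = "Frequency")
names(freqData)[1] <- "Activity"  # rename the category column
freqData
```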
Draw a bar graph. The additional argument cex.names = 0.5 shrinks the category labels to 50% of their default size, and las = 2 rotates the labels so that they are perpendicular to the axis and all fit in the window.
barplot(tigerTable,
ylab = "Frequency",
cex.names = 0.5,
las = 2)
A slightly fancier bar graph of the data with more options, like that shown in Figure 2.2-1, is shown here.
oldpar = par(no.readonly = TRUE) # stores a backup copy of current graph settings in "oldpar"
par(mar = c(8, 4, 4, 2) + 0.1) # creates more room for labels below the x-axis
barplot(tigerTable,
las = 2,
col = "firebrick",
cex.names = 0.8,
ylim = c(0,50),
xlab = "",
ylab = "Frequency (number of people)")
mtext("Activity",
side = 1,
line = 7,
cex = 0.9) # adds the text under the x-axis
par(oldpar) # reverts graph settings back to default
Frequency table and histogram illustrating the frequency distribution of bird species abundance at Organ Pipe Cactus National Monument.
Read and inspect the data.
birdAbundanceData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e2bDesertBirdAbundance.csv")
head(birdAbundanceData)
## species abundance
## 1 Black Vulture 64
## 2 Turkey Vulture 23
## 3 Harris's Hawk 3
## 4 Red-tailed Hawk 16
## 5 American Kestrel 7
## 6 Gambel's Quail 148
Generate a frequency table of the numeric bird abundance variable. The option right = FALSE ensures that abundance value 300 (for example) is counted in the 300-350 bin rather than in the 250-300 bin.
birdAbundanceTable <- table(cut(birdAbundanceData$abundance,
breaks = seq(0,650,by=50),
right = FALSE))
birdAbundanceTable
##
## [0,50) [50,100) [100,150) [150,200) [200,250) [250,300) [300,350)
## 28 4 3 3 1 2 1
## [350,400) [400,450) [450,500) [500,550) [550,600) [600,650)
## 0 0 0 0 0 1
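The effect of right = FALSE is easiest to see on values that fall exactly on a bin boundary. The sketch below applies cut to three made-up abundance values with the same breaks as above and compares the two settings.

```r
# Made-up abundance values; 50 and 300 sit exactly on bin boundaries
abundance <- c(49, 50, 300)
binBreaks <- seq(0, 650, by = 50)

# Bins closed on the left: a boundary value goes into the higher bin
leftClosed <- cut(abundance, breaks = binBreaks, right = FALSE)
as.character(leftClosed)   # "[0,50)" "[50,100)" "[300,350)"

# Bins closed on the right (the default): a boundary value goes into the lower bin
rightClosed <- cut(abundance, breaks = binBreaks, right = TRUE)
as.character(rightClosed)  # "(0,50]" "(0,50]" "(250,300]"
```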
The same table oriented vertically and including the sum.
data.frame(Frequency = addmargins(birdAbundanceTable))
## Frequency.Var1 Frequency.Freq
## 1 [0,50) 28
## 2 [50,100) 4
## 3 [100,150) 3
## 4 [150,200) 3
## 5 [200,250) 1
## 6 [250,300) 2
## 7 [300,350) 1
## 8 [350,400) 0
## 9 [400,450) 0
## 10 [450,500) 0
## 11 [500,550) 0
## 12 [550,600) 0
## 13 [600,650) 1
## 14 Sum 43
Draw a histogram of bird abundances.
hist(birdAbundanceData$abundance, right = FALSE)
Commands to draw a histogram with more options, such as Figure 2.2-3, are here.
# Histogram with options
hist(birdAbundanceData$abundance,
right = FALSE,
breaks = seq(0,650,by=50),
col = "firebrick",
las = 1,
xlab = "Abundance (No. individuals)",
ylab = "Frequency (No. species)", main = "")
Histograms of body mass of 228 female sockeye salmon sampled from Pick Creek in Alaska. The same data are plotted for three different interval widths: 0.1 kg, 0.3 kg, and 0.5 kg.
Read the data into a data frame named salmonSizeData.
salmonSizeData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02f2_5SalmonBodySize.csv")
head(salmonSizeData)
## year sex oceanAgeYears lengthMm massKg
## 1 1996 FALSE 3 513 3.090
## 2 1996 FALSE 3 513 2.909
## 3 1996 FALSE 3 525 3.056
## 4 1996 FALSE 3 501 2.690
## 5 1996 FALSE 3 513 2.876
## 6 1996 FALSE 3 501 2.978
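The sex column prints as FALSE in the output above. The most likely explanation is that every entry in that column of the file is the single letter F (all fish are female), which read.csv interprets as the logical value FALSE. This does not affect the histograms below, but if the column is needed, one option is to request a character column through colClasses. The sketch uses two made-up rows rather than the salmon file.

```r
# Made-up CSV content with a column containing only the letter F
csvText <- "sex,massKg
F,3.090
F,2.909"

# Default type conversion treats a column of "F" values as logical
defaultRead <- read.csv(text = csvText)
class(defaultRead$sex)  # "logical"
defaultRead$sex         # FALSE FALSE

# Naming the column in colClasses keeps the original letters
characterRead <- read.csv(text = csvText,
                          colClasses = c(sex = "character"))
characterRead$sex       # "F" "F"
```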
Histograms with different bin widths (0.1, 0.3, and 0.5).
hist(salmonSizeData$massKg,
right = FALSE,
breaks = seq(1,4,by=0.1),
col = "firebrick")
hist(salmonSizeData$massKg,
right = FALSE,
breaks = seq(1,4,by=0.3),
col = "firebrick")
hist(salmonSizeData$massKg,
right = FALSE,
breaks = seq(1,4,by=0.5),
col = "firebrick")
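The effect of the interval width can also be inspected numerically: with plot = FALSE, hist returns the bin counts without drawing anything. The sketch below uses eight made-up masses (not the salmon data) and two illustrative widths, 0.5 kg as above and a wider 1 kg.

```r
# Made-up body masses in kg
mass <- c(1.2, 1.8, 2.1, 2.4, 2.6, 2.9, 3.1, 3.4)

# Same data, two interval widths; plot = FALSE returns counts only
narrowBins <- hist(mass, right = FALSE,
                   breaks = seq(1, 4, by = 0.5), plot = FALSE)
wideBins   <- hist(mass, right = FALSE,
                   breaks = seq(1, 4, by = 1), plot = FALSE)

narrowBins$counts  # 1 1 2 2 2 0
wideBins$counts    # 2 4 2
```

Wider intervals pool neighboring bins, which smooths the shape of the distribution but hides detail.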
Contingency table, grouped bar plot, and mosaic plot showing the association between egg removal treatment and incidence of malaria in female great tits.
Read the data from the data file and inspect.
birdMalariaData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e3aBirdMalaria.csv")
head(birdMalariaData)
## bird treatment response
## 1 1 Control Malaria
## 2 2 Control Malaria
## 3 3 Control Malaria
## 4 4 Control Malaria
## 5 5 Control Malaria
## 6 6 Control Malaria
Optional step: Set the desired order of treatment categories in tables and graphs. The factor command allows us to order the levels so that the category "Egg removal" comes before "Control" in tables and graphs (categories are otherwise ordered alphabetically).
birdMalariaData$treatment <- factor(birdMalariaData$treatment, levels= c("Egg removal", "Control"))
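The reordering can be checked with levels. This is a small sketch on a made-up vector of three treatment labels, showing the default alphabetical order and the order after supplying levels.

```r
# Made-up treatment labels
treatment <- c("Control", "Egg removal", "Control")

# Default: levels are sorted alphabetically
levels(factor(treatment))  # "Control" "Egg removal"

# Explicit level order: "Egg removal" first
reordered <- factor(treatment, levels = c("Egg removal", "Control"))
levels(reordered)          # "Egg removal" "Control"

# Tables (and graphs built from them) follow the level order
table(reordered)
```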
Create a contingency table. Use table with the names of two categorical variables as arguments, beginning with the response variable.
birdMalariaTable <- table(birdMalariaData$response, birdMalariaData$treatment)
birdMalariaTable
##
## Egg removal Control
## Malaria 15 7
## No Malaria 15 28
Include row and column sums in the contingency table.
addmargins(birdMalariaTable,
margin = c(1,2),
FUN = sum,
quiet = TRUE)
##
## Egg removal Control sum
## Malaria 15 7 22
## No Malaria 15 28 43
## sum 30 35 65
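The counts can also be expressed as proportions within each treatment, which is the quantity a mosaic plot displays. The following sketch rebuilds the table by hand from the counts printed above, rather than from the data file, and uses prop.table with margin = 2 so that each column sums to 1.

```r
# Contingency table entered by hand from the counts shown above
malariaCounts <- as.table(matrix(c(15, 15, 7, 28), nrow = 2,
                                 dimnames = list(response  = c("Malaria", "No Malaria"),
                                                 treatment = c("Egg removal", "Control"))))

# Proportions within each treatment (each column sums to 1)
malariaProps <- prop.table(malariaCounts, margin = 2)
malariaProps
# Malaria row: 15/30 = 0.5 for egg removal, 7/35 = 0.2 for control
```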
Draw a grouped bar graph using the contingency table.
barplot(as.matrix(birdMalariaTable), beside = TRUE)
Commands to produce a grouped bar graph with more options are shown here.
# Grouped bar graph with options
barplot(as.matrix(birdMalariaTable),
beside = TRUE,
space = c(0.1, 0.3),
las = 1,
xlab = "Treatment",
ylab = "Frequency",
col = c("firebrick", "goldenrod1"),
legend.text = rownames(birdMalariaTable),
args.legend = list(x = 0.3,
y = max(birdMalariaTable),
xjust = 0))
Draw a mosaic plot. The t function flips (transposes) the table to ensure that the explanatory variable is along the horizontal axis.
mosaicplot(t(birdMalariaTable),
col = c("firebrick", "goldenrod1"),
sub = "Treatment",
ylab = "Relative frequency",
cex.axis = 1.1,
main = "")
Scatter plot of the relationship between the ornamentation of male guppies and the average attractiveness of their sons.
Read the data from the file.
guppyFatherSonData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e3bGuppyFatherSonAttractiveness.csv")
head(guppyFatherSonData)
## fatherOrnamentation sonAttractiveness
## 1 0.35 -0.32
## 2 0.03 -0.03
## 3 0.14 0.11
## 4 0.10 0.28
## 5 0.22 0.31
## 6 0.23 0.18
Draw a scatter plot using the formula approach (with the "~").
plot(sonAttractiveness ~ fatherOrnamentation, data = guppyFatherSonData)
Commands to draw a fancier scatter plot, like that in Figure 2.3-3, are shown here.
# Scatter plot with options
plot(sonAttractiveness ~ fatherOrnamentation,
data = guppyFatherSonData,
las = 1,
pch = 16,
col = "firebrick",
cex = 1.5,
bty = "l",
xlab = "Father's ornamentation",
ylab = "Son's attractiveness")
Strip chart, box plot, and multiple histograms showing hemoglobin concentration in men living at high altitude in three different parts of the world and in a sea level population (USA).
Read the data.
hemoglobinData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e3cHumanHemoglobinElevation.csv")
head(hemoglobinData)
## id hemoglobin population
## 1 US.Sea.level1 10.40 USA
## 2 US.Sea.level2 11.20 USA
## 3 US.Sea.level3 11.70 USA
## 4 US.Sea.level4 11.80 USA
## 5 US.Sea.level5 11.90 USA
## 6 US.Sea.level6 12.05 USA
Obtain the sample sizes in each of the four populations.
table(hemoglobinData$population)
##
## Andes Ethiopia Tibet USA
## 71 128 59 1704
Draw a strip chart of the hemoglobin data.
stripchart(hemoglobin ~ population,
data = hemoglobinData,
method = "jitter",
vertical = TRUE)
Commands for a fancier strip chart like that shown in Figure 2.3-4 are here.
par(bty = "l")
stripchart(hemoglobin ~ population,
data = hemoglobinData,
vertical = TRUE,
method = "jitter",
jitter = 0.2,
pch = 1,
col = "firebrick",
las = 1,
xlab = "Male population",
ylab = "Hemoglobin concentration (g/dL)")
Draw a box plot, relating hemoglobin to population.
boxplot(hemoglobin ~ population, data = hemoglobinData)
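The numbers behind a box plot can be retrieved by calling boxplot with plot = FALSE, which returns the summary statistics instead of drawing. The sketch below uses a made-up data frame with two hypothetical populations, A and B, not the hemoglobin data.

```r
# Made-up data: five measurements in each of two populations
exampleData <- data.frame(hemoglobin = c(11, 12, 13, 14, 15,
                                         16, 17, 18, 19, 20),
                          population = rep(c("A", "B"), each = 5))

# Compute the box plot statistics without drawing
boxStats <- boxplot(hemoglobin ~ population, data = exampleData, plot = FALSE)

boxStats$names  # "A" "B"
boxStats$stats  # one column per group; rows are lower whisker, lower hinge,
                # median, upper hinge, upper whisker
boxStats$n      # number of observations in each group
```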
A box plot with more options, like that in Figure 2.3-4, is shown here.
par(bty = "l")
boxplot(hemoglobin ~ population,
data = hemoglobinData,
col = "goldenrod1",
boxwex = 0.5,
whisklty = 1,
outcol = "black",
outcex = 1,
outlty = "blank",
las = 1,
xlab="Male population",
ylab = "Hemoglobin concentration (g/dL)")
Show the association between hemoglobin and population using the multiple histograms approach (Figure 2.3-5). The following commands use the lattice package, which must first be loaded.
library(lattice)
histogram(~ hemoglobin | population,
data = hemoglobinData,
layout = c(1,4),
col = "firebrick",
breaks = seq(10,26,by=1))
Here we show commands to draw the multiple histograms in base R, without using the lattice package. This approach is more tedious, but the resulting graphs are often easier to modify.
# Multiple histograms using base graphics, plotting them one at a time.
# The "oma" option adjusts the outer margins of the whole figure
# The "mar" option tweaks the margins within each of the 4 plots.
oldpar = par(no.readonly = TRUE) # make backup of default graph settings
par(mfrow = c(4,1),
oma = c(4, 6, 2, 6),
mar = c(2, 5, 4, 2))
hist(hemoglobinData$hemoglobin[hemoglobinData$population == "Andes"],
col = "firebrick",
las = 1,
main = "Andes",
breaks = seq(10,26,by=1),
ylab = "Frequency")
hist(hemoglobinData$hemoglobin[hemoglobinData$population == "Ethiopia"],
col = "firebrick",
las = 1,
main = "Ethiopia",
breaks = seq(10,26,by=1),
ylab = "Frequency")
hist(hemoglobinData$hemoglobin[hemoglobinData$population == "Tibet"],
col = "firebrick",
las = 1,
main = "Tibet",
breaks = seq(10,26,by=1),
ylab = "Frequency")
hist(hemoglobinData$hemoglobin[hemoglobinData$population == "USA"],
col = "firebrick",
las = 1,
main = "USA",
breaks = seq(10,26,by=1),
ylab = "Frequency")
mtext("Hemoglobin concentration (g/dL)",
side = 1,
outer = TRUE,
padj = 1.5)
par(oldpar) # revert to default graph settings
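The four nearly identical hist calls above could also be written as a loop over the populations. This is one possible sketch, not the original authors' code. It uses a made-up data frame with two populations, and it opens a null PDF device so that it also runs in a non-interactive session; leave out the pdf(NULL) and dev.off() lines to see the plots on screen.

```r
# Made-up data: two populations, four measurements each
exampleData <- data.frame(hemoglobin = c(12.5, 13.1, 14.8, 15.2,
                                         17.3, 18.9, 19.4, 21.0),
                          population = rep(c("USA", "Andes"), each = 4))

# One numeric vector per population (in alphabetical order)
groups <- split(exampleData$hemoglobin, exampleData$population)

pdf(NULL)                          # null device: nothing is written to disk
par(mfrow = c(length(groups), 1))  # one panel per population, stacked

# Draw one histogram per population with shared breaks
histList <- lapply(names(groups), function(pop) {
  hist(groups[[pop]],
       breaks = seq(10, 26, by = 1),
       col = "firebrick",
       las = 1,
       main = pop,
       xlab = "",
       ylab = "Frequency")
})
dev.off()                          # close the device
```

Using the same breaks in every panel keeps the horizontal axes comparable across populations.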
Line graph showing confirmed cases of measles in England and Wales from 1995 to 2011. The counts are given per quarter.
Read the data from file.
measlesData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e4aMeaslesOutbreaks.csv")
head(measlesData)
## year quarter confirmedCases yearByQuarter
## 1 2011 4th 136 2011.88
## 2 2011 3rd 154 2011.62
## 3 2011 2nd 346 2011.38
## 4 2011 1st 151 2011.12
## 5 2010 4th 31 2010.88
## 6 2010 3rd 134 2010.62
A line graph uses the same command as a scatter plot, with the added argument type = "l" to connect the data with a line instead of drawing points.
plot(confirmedCases ~ yearByQuarter,
data = measlesData,
type="l")
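With type = "l", the points are joined in the order in which the rows appear. The measles file lists the most recent quarter first, which still yields a clean line because the rows are consistently ordered in time. If the rows were in arbitrary order, sorting by the time variable first would avoid a zigzag. The sketch below shuffles the four 2011 rows shown above and restores the order with order.

```r
# The four 2011 rows from the output above, deliberately shuffled
measles <- data.frame(yearByQuarter  = c(2011.38, 2011.88, 2011.12, 2011.62),
                      confirmedCases = c(346, 136, 151, 154))

# Sort the rows by time before drawing a line graph
sortedMeasles <- measles[order(measles$yearByQuarter), ]
sortedMeasles$yearByQuarter   # 2011.12 2011.38 2011.62 2011.88
sortedMeasles$confirmedCases  # 151 346 154 136
```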
Commands to draw a fancier line plot (like that in Figure 2.4-1) are shown here.
# Line plot with options
plot(confirmedCases ~ yearByQuarter,
data = measlesData,
type = "l",
las = 1,
lwd = 1,
bty = "l",
xlab = "Year",
ylab = "Number of cases",
xaxp = c(1995,2012,2012-1995))
points(confirmedCases ~ yearByQuarter,
data = measlesData,
pch = 16,
col = "firebrick",
cex = 0.7)