See https://en.wikipedia.org/wiki/Iris_flower_data_set for more info.
———————————————-
Fisher’s iris data comes automatically with R. You can load it into memory using the command “data()”.
#Load the iris data
data(iris)
You can check that it was loaded using the ls() command (“list”).
ls()
## [1] "iris"
You can get info about the nature of the data frame using commands like dim()
dim(iris)
## [1] 150 5
This tells us that the iris data is essentially a spreadsheet that has 150 rows and 5 columns.
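The two numbers returned by dim() can also be retrieved one at a time. The following is a minimal sketch using the base R functions nrow() and ncol(), which are not used elsewhere in this tutorial:

```r
# Load the built-in iris data
data(iris)

# dim() returns the number of rows first, then the number of columns
iris_dim <- dim(iris)

# nrow() and ncol() return each value on its own
n_rows <- nrow(iris)
n_cols <- ncol(iris)

n_rows
n_cols
```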
We can get the column names with names()
names(iris)
## [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
## [5] "Species"
Note that the first letter of each word is capitalized. What are the implications of this?
The top of the data and the bottom of the data can be checked with head() and tail()
#top of data
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
#bottom of data
tail(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 145 6.7 3.3 5.7 2.5 virginica
## 146 6.7 3.0 5.2 2.3 virginica
## 147 6.3 2.5 5.0 1.9 virginica
## 148 6.5 3.0 5.2 2.0 virginica
## 149 6.2 3.4 5.4 2.3 virginica
## 150 5.9 3.0 5.1 1.8 virginica
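By default head() and tail() show six rows. As a small sketch, both functions also accept an n argument to change how many rows are shown (the object names first_three and last_two are just illustrative):

```r
data(iris)

# Show only the first 3 rows instead of the default 6
first_three <- head(iris, n = 3)
first_three

# Show only the last 2 rows
last_two <- tail(iris, n = 2)
last_two
```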
Another common R command is is(), which tells you what something is in R land
is(iris)
## [1] "data.frame" "list" "oldClass" "vector"
R might spew a lot of things out at you; usually the first item is most important. Here, it tells us that the “object” called iris in your workspace is first and foremost a “data.frame”, which is essentially a spreadsheet of data loaded into R.
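If the output of is() feels like too much, a few other base R functions answer the same question more directly. This is only a sketch of some alternatives, not a replacement for is():

```r
data(iris)

# class() returns just the main type of the object
iris_class <- class(iris)
iris_class

# is.data.frame() answers the yes/no question directly
is.data.frame(iris)

# str() prints a compact overview of every column
str(iris)
```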
You can get basic info about the data themselves using commands like summary()
summary(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
## 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
## Median :5.800 Median :3.000 Median :4.350 Median :1.300
## Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
## 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
## Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
## Species
## setosa :50
## versicolor:50
## virginica :50
##
##
##
If you wanted information on just a single column, you would tell R to isolate that column like this
summary(iris$Sepal.Width)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 2.800 3.000 3.057 3.300 4.400
That is, the name of the data frame, a dollar sign ($), and the name of the column.
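The dollar sign is the most common way to isolate a column, but it is not the only one. As a sketch, the square-bracket forms below should return the same values (the object names are just illustrative):

```r
data(iris)

# Three equivalent ways to pull out the Sepal.Width column
by_dollar  <- iris$Sepal.Width
by_bracket <- iris[["Sepal.Width"]]
by_index   <- iris[, "Sepal.Width"]

# All three return the same numeric vector
identical(by_dollar, by_bracket)
identical(by_dollar, by_index)
```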
What happens when you don’t capitalize something?
#all lower case
# summary(iris$sepal.width) # this won't work
#just "s" in "sepal" lower case
# summary(iris$sepal.Width) #this won't work either
#or what if you capitalize "i" in "Iris"?
# summary(Iris$Sepal.Width) #won't work either
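The first two do not produce an error message at all: a misspelled column name returns NULL, so summary() just prints an uninformative Length/Class/Mode table. The third one fails with the error “object 'Iris' not found”, which does make a little sense.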
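A minimal sketch of what happens with the misspelled names above. It uses tryCatch(), a base R function not otherwise covered here, to capture the error so that the rest of the script keeps running:

```r
data(iris)

# A misspelled column name does not raise an error; it returns NULL
wrong_column <- iris$sepal.width
is.null(wrong_column)

# A misspelled object name does raise an error.
# tryCatch() captures the message so the script keeps running.
wrong_object <- tryCatch(
  summary(Iris$Sepal.Width),
  error = function(e) conditionMessage(e)
)
wrong_object

# Checking the spelling against names() catches column typos
"Sepal.Width" %in% names(iris)
"sepal.width" %in% names(iris)
```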
Many scientists develop software for R, and they often include datasets to demonstrate how the software works. This software is distributed in “packages”. Some packages come with R already and just need to be loaded. This is done with the library() command.
The MASS package comes with R when you download it.
#Load the MASS package
library(MASS)
MASS contains a dataset called “mammals”
data(mammals)
You can confirm that the mammals data is in your workspace using ls()
ls()
## [1] "iris" "mammals"
You should now have the iris and the mammals data.
What is in the mammals dataset? Datasets usually have useful help files. Access help using the “?” function.
?mammals
## starting httpd help server ...
## done
The help screen should pop up. It tells us that mammals is “A data frame with average brain and body weights for 62 species of land mammals.” At the bottom we can see that these data come from the paper
“Selected from: Allison, T. and Cicchetti, D. V. (1976) Sleep in mammals: ecological and constitutional correlates. Science 194, 732-734.”
We can learn about the mammals data using the usual commands
dim(mammals)
## [1] 62 2
names(mammals)
## [1] "body" "brain"
head(mammals)
## body brain
## Arctic fox 3.385 44.5
## Owl monkey 0.480 15.5
## Mountain beaver 1.350 8.1
## Cow 465.000 423.0
## Grey wolf 36.330 119.5
## Goat 27.660 115.0
tail(mammals)
## body brain
## Echidna 3.000 25.0
## Brazilian tapir 160.000 169.0
## Tenrec 0.900 2.6
## Phalanger 1.620 11.4
## Tree shrew 0.104 2.5
## Red fox 4.235 50.4
summary(mammals)
## body brain
## Min. : 0.005 Min. : 0.14
## 1st Qu.: 0.600 1st Qu.: 4.25
## Median : 3.342 Median : 17.25
## Mean : 198.790 Mean : 283.13
## 3rd Qu.: 48.203 3rd Qu.: 166.00
## Max. :6654.000 Max. :5712.00
Most packages don’t come with R when you download it but are stored in a central site called CRAN. As an example, we’ll install the “doBy” package.
RStudio makes it easy to find and load packages.
* In the panel of RStudio that has the tabs “Plots”, “Packages”, “Help”, “Viewer”, click on “Packages”.
* On the next line it says “Install” and “Update”. Click on “Install”.
* A window will pop up. In the white field in the middle of the window under “Packages” type the name of the package you want.
* RStudio will automatically bring up potential packages as you type.
* Finish typing “doBy” or click on the name.
* Click on the “Install” button.
* In the source viewer some miscellaneous text should show up. Most of the time this works. If it doesn’t, talk to the professor!
You can also use the install.packages() command to install the package. I have already downloaded “doBy”, so I have “commented out” the code with a “#”. To run the code, remove the “#”. If you already followed the instructions above you don’t need to run the code.
#install.packages("doBy")
If you already have the package downloaded to your computer and it is in use, a window may pop up asking you if you want to restart R. Normally this isn’t necessary; just click “no”. You might see a “warning” message pop up in the console such as “Warning in install.packages: package ‘doBy’ is in use and will not be installed”. This isn’t a problem.
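One way to avoid re-installing a package you already have is to check for it first. The sketch below defines a hypothetical helper, is_installed(), which is not part of R; it wraps the base function requireNamespace(). The install line is left commented out, as above:

```r
# Hypothetical helper (not part of R): TRUE if a package is installed
is_installed <- function(pkg) {
  isTRUE(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with every R installation, so this is TRUE
is_installed("stats")

# Install doBy only when it is missing
if (!is_installed("doBy")) {
  # install.packages("doBy")  # remove the "#" to install
  message("doBy is not installed yet")
}
```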
Making a basic plot is fairly easy. This basic structure is used frequently in R so we will discuss it in some detail.
#Load the iris data using the data() command
data(iris) #load iris data in case you haven't
#make a very basic boxplot
boxplot(Sepal.Length ~ Species, #the plotting formula of y ~ x
data = iris) #the data to plot
Key features of the command are
* the “~”
* the “comma”
* the “data = iris”
* capitalization
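To see how the “~” and the capitalization fit together, it can help to store the formula as an object and inspect it. This is a sketch only; plot_formula and formula_vars are illustrative names, and all.vars() is a base R function not otherwise used in this tutorial:

```r
data(iris)

# A formula can be stored like any other object
plot_formula <- Sepal.Length ~ Species
class(plot_formula)

# all.vars() lists the names used: y (left of ~) first, then x
formula_vars <- all.vars(plot_formula)
formula_vars

# Every name in the formula must match a column name exactly
all(formula_vars %in% names(iris))

# The stored formula works in place of the typed one
boxplot(plot_formula, data = iris)
```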
R will usually generate labels for the x and y axes based on the command. These need to be changed.
“xlab =” sets the label for the x-axis, “ylab =” for the y-axis. Note that these both occur inside the parentheses, and that the text for the labels goes in quotes. Forgetting the quotes will cause the code to fail.
boxplot(Sepal.Length ~ Species, #the plotting formula of y ~ x
data = iris, #the data to plot
xlab = "Iris Species", #Label for x axis
ylab = "Sepal Length (cm)" ) #Label for y axis, w/ units
If we wanted we could change the color of the boxplots using the argument “col =”. This argument can be used to change the color of most types of plots in R. This doesn’t increase the information content of the figure but maybe makes it nicer to look at.
boxplot(Sepal.Length ~ Species, #the plotting formula of y ~ x
data = iris, #the data to plot
xlab = "Iris Species", #Label for x axis
ylab = "Sepal Length",
col = 3)
If we want we could give each box a different color. This emphasizes that each box is a different Iris species. The code “col = 2:4” tells R to use the sequence of colors “2, 3, 4”. I skip “1” because it’s black and will obscure the bar that indicates the median.
#boxplot w/ each color different
boxplot(Sepal.Length ~ Species, #the plotting formula of y ~ x
data = iris, #the data to plot
xlab = "Iris Species", #Label for x axis
ylab = "Sepal Length",
col = 2:4)
This next code uses a slight variant. I’ve reversed the order of the colors so you can see the difference. I’ve used “col = c(4,3,2)”, which is longhand for “col = 4:2”.
boxplot(Sepal.Length ~ Species, #the plotting formula of y ~ x
data = iris, #the data to plot
xlab = "Iris Species", #Label for x axis
ylab = "Sepal Length",
col = c(4,3,2))
We could also specify the colors by name
boxplot(Sepal.Length ~ Species, #the plotting formula of y ~ x
data = iris, #the data to plot
xlab = "Iris Species", #Label for x axis
ylab = "Sepal Length",
col = c("red","green","blue"))
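The numbers given to “col =” are positions in R’s current color palette. As a sketch, the base functions palette() and colors() can be used to look up what those numbers and names refer to. The exact colors in positions 2 to 4 depend on your version of R, so no output is shown here:

```r
# The current palette; position 1 is black by default
current_palette <- palette()
current_palette[1]

# The three colors selected by col = 2:4
current_palette[2:4]

# 4:2 and c(4,3,2) contain the same values
all(4:2 == c(4, 3, 2))

# colors() lists the color names that R recognizes
all(c("red", "green", "blue") %in% colors())
```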
library(MASS)
data(mammals)
#The errbar() function used below is in the Hmisc package
library(Hmisc)
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## Loading required package: ggplot2
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
##
## format.pval, round.POSIXt, trunc.POSIXt, units
n <- dim(mammals)[1]
u.body <- mean(mammals$body)
u.brain <- mean(mammals$brain)
se.body <- sd(mammals$body)/sqrt(n)
se.brain <- sd(mammals$brain)/sqrt(n)
mam.df <- data.frame(body.part = c("body","brain"),
mean = c(u.body,u.brain),
SE = c(se.body,se.brain))
errbar(x = 1:2,
y = mam.df$mean,
yplus = mam.df$mean + mam.df$SE,
yminus = mam.df$mean - mam.df$SE,
xlab = "Body Part",ylab = "Mass",
xlim=c(0.5,2.5),
xaxt="n",
cex =3)
axis(side=1,at=1:2,labels=mam.df$body.part)
legend("topleft", legend = "Error bars = SE",bty = "n")
legend("bottomleft", legend = "n = 62 spp",bty = "n")
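The error bars above are standard errors (SE), calculated as the standard deviation divided by the square root of the sample size. The sketch below wraps that calculation in a hypothetical helper function, standard_error(), and applies it to a small made-up sample; neither is part of the original analysis:

```r
# Hypothetical helper: standard error of the mean
standard_error <- function(x) {
  sd(x) / sqrt(length(x))
}

# Small made-up sample, for illustration only
masses <- c(1, 2, 3)

se_masses <- standard_error(masses)
se_masses

# Upper and lower ends of the error bar: mean plus/minus one SE
mean(masses) + se_masses
mean(masses) - se_masses
```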
#Data from Science paper by biogeographer...not Gaston...
Island <- c("Hawaii","Kauai","Lanai","Maui","Molokai","Oahu")
Extinctions <- c(11,1,7,4,7,7)
i <- order(Extinctions,decreasing = TRUE)
barplot(height = Extinctions[i], names.arg = Island[i],
col = 1:length(Island))
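The order() step is what sorts the bars from tallest to shortest. A minimal sketch using only three of the islands, to show that order() returns positions rather than values:

```r
# Subset of the data above, for illustration
Island      <- c("Hawaii", "Kauai", "Maui")
Extinctions <- c(11, 1, 4)

# order() returns positions, not values: largest value first
i <- order(Extinctions, decreasing = TRUE)
i

# Indexing both vectors with i keeps each island paired with its count
Extinctions[i]
Island[i]
```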
This plot shows the distribution of birthweights from a set of babies born in a hospital.
library(MASS)
data(birthwt)
hist(birthwt$bwt)
library(MASS)
hist(birthwt$bwt,
xlab = "Birthweight (grams)",
ylab = "Frequency")
library(MASS)
hist(birthwt$bwt,
xlab = "Birthweight (grams)",
ylab = "Frequency")
abline(v = mean(birthwt$bwt), col = 2, lwd = 3,lty =3)
hist(birthwt$bwt,
xlab = "Birthweight (grams)",
ylab = "Frequency")
abline(v = mean(birthwt$bwt), col = 2, lwd = 3,lty =3) #first reference line for the mean
abline(v = 4422.525607500001, col = 3, lwd = 3,lty =1) #Dr. Brouwer's son Jude
#"Mammal body-brain allometry"
library(MASS)
data(mammals)
plot(log(brain) ~ log(body),
data = mammals)
#"Mammal body-brain allometry"
library(MASS)
data(mammals)
plot(log(brain) ~ log(body),
data = mammals,
xlab = "Log of animal body mass",
ylab = "log(brain mass)")
Increase the size of the points with cex = 2
plot(log(brain) ~ log(body),
data = mammals,
cex = 2,
xlab = "Log of animal body mass",
ylab = "log(brain mass)")
Change the shape of the point being plotted w/ “pch = 2”
plot(log(brain) ~ log(body),
data = mammals,
cex = 2,
pch =2,
xlab = "Log of animal body mass",
ylab = "log(brain mass)")
Increase the thickness of the lines used to draw the points with “lwd=2”
plot(log(brain) ~ log(body),
data = mammals,
cex = 2,
pch =2,
lwd = 2,
xlab = "Log of animal body mass",
ylab = "log(brain mass)")
#install.packages("primer")
library(primer)
## Loading required package: deSolve
##
## Attaching package: 'deSolve'
## The following object is masked from 'package:graphics':
##
## matplot
data(sparrows)
plot(Count ~ Year, data = sparrows, type = "b",
lwd = 2, pch = 2, xlab = "Year",ylab = "Sparrows on Route")
sparrow.lm <- lm(Count ~ Year, data = sparrows)
abline(sparrow.lm, col = 2, lwd = 3, lty = 2)
Song Sparrow (Melospiza melodia) counts in Darrtown, OH, USA. From USGS Breeding Bird Survey (BBS)
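The dashed trend line in the sparrow plot comes from passing a fitted lm() model to abline(), which draws the line using the model’s intercept and slope. The sketch below repeats the same pattern on the built-in cars dataset, chosen here only because it needs no extra package:

```r
# Built-in dataset: stopping distance (dist) versus speed
data(cars)

# Fit a straight line: dist = intercept + slope * speed
cars.lm <- lm(dist ~ speed, data = cars)

# coef() returns the intercept and slope that abline() draws
cars.coef <- coef(cars.lm)
cars.coef

# Scatterplot with the fitted line added on top
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
abline(cars.lm, col = 2, lwd = 3, lty = 2)
```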
Websites
www.biostat.wisc.edu/~kbroman/topten_worstgraphs/
www.americanscientist.org/issues/pub/population-growth-technology-and-tricky-graphs
Papers
Wainer. 1984. How to Display Data Badly. Am. Statistician.