R is a very flexible and powerful programming language. It can be used as a calculator or to import, manipulate, and export data. It has many built-in statistical functions, and you can readily create your own functions. It can create publication-ready graphics. It is free and well supported by a programming community.
So what is not to like? Well, for starters, there is a steep learning curve. It requires writing code (see note 1 at the end of this document). As an object-based programming language, R is simple in concept but sometimes complex to implement. There is often more than one way to do things, but none of them may be obvious to a beginner. Because you opened this document, I assume you are interested. Yes, the effort is worth it.
An object-based programming language treats different data types the same, as objects. There are many types of objects, some of which are numeric:
but some are something else entirely:
If this is not familiar to you, don’t worry; this chapter will explain it.
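As a minimal sketch of the distinction between numeric objects and everything else, the lines below create four small objects and ask R whether each one is numeric. The object names (n.fish, avg.len, site, is.adult) are made up for this illustration.

```r
# Hypothetical example objects; the names are illustrative only
n.fish   <- 12              # a single number
avg.len  <- c(10.5, 18.2)   # a vector of numbers
site     <- "Florida"       # a character string
is.adult <- TRUE            # a logical value

is.numeric(n.fish)          # TRUE
is.numeric(avg.len)         # TRUE
is.numeric(site)            # FALSE
is.numeric(is.adult)        # FALSE
```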
The goal of this chapter is to introduce you to expressions, assignments, and these different types of objects. Along the way you should pick up some elementary coding syntax. This document is designed to be short and self-contained, but you will certainly get more out of it if you follow along with the R program running. If so, please enter the commands in the console and see how they work for yourself.
The tutorial assumes you already have R on your computer. If not, search the web for ‘R Core Development Team’ or go directly to the R website (www.r-project.org/) to find, download, and configure R on your computer. It is free, but it may take a while to get it all set up.
This tutorial is not meant to be a detailed reference. At the end of this document, I refer to other websites that provide more introductory information about R, and still others that demonstrate a wide variety of fishery science applications using R.
An expression is evaluated and printed. In the following examples, it is recommended that you type this code into your R console to follow along.
When writing an expression, R acts like a calculator. Type the expression ‘1 + 1’ after the prompt (>) in your console. Text following the hashtag (#) is not evaluated by the program. It is used here specifically to supplement the text, and you do not need to type it in yourself if you are using the console at the same time.
1 + 1 #A simple arithmetic calculation
## [1] 2
A double hashtag precedes output from the code. It is added by the program used to create this document (i.e., ‘rmarkdown’), so you will not see the double hashtag in the console environment. Instead, focus on the number in brackets (i.e., [1]). This is the position of the first value printed on that line; here the output has only one value (i.e., 2, a scalar).
R has many familiar functions,
log (10) #The natural log of the number 10
## [1] 2.302585
log10(10) #The log (base 10) of the number 10
## [1] 1
and some special ones too.
pi #The value of the number pi
## [1] 3.141593
R will let you know when an operation does not make sense. Here the output is not a number (NaN) together with a warning message.
sqrt (-5) #This shows the ‘not a number’ response
## Warning in sqrt(-5): NaNs produced
## [1] NaN
In summary, when using expressions as demonstrated here, R acts like a calculator. In the console environment, an expression is typed in at the R prompt (>), and the printed output is preceded by a [1].
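A few more calculator-style expressions may help before moving on. This is only a sketch of standard R arithmetic operators; none of these lines come from the original examples.

```r
2 + 3 * 4      # multiplication is done before addition: 14
(2 + 3) * 4    # parentheses change the order: 20
2^3            # exponentiation: 8
7 %/% 2        # integer division: 3
7 %% 2         # remainder after division: 1
exp(log(10))   # exp() undoes the natural log
```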
An assignment is ‘evaluated’ but not printed. The R symbol for assigning data to a variable name is “<-”.
x <- log10(10) #Assigns x = log (base 10) of the number 10
y <- x #One assignment can be assigned to another variable
x + y -> z #Assignments can be given 'to the right'
An assignment can be printed by typing it at the prompt or by using the print command.
x #Prints the current value of ‘x’ (defined, as above)
## [1] 1
print (y) #A print command can be used as well if you like to type
## [1] 1
z
## [1] 2
More recent versions of R allow the use of the equal sign instead of the assignment symbol used above (i.e., you may use = instead of <-). This is fine as long as there is no ambiguity: at the prompt, = assigns just as <- does, but inside a function call = names an argument instead. Whichever symbol you use, a name cannot begin with a number. It is simple to use =, and some examples from now on will do so.
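One place where = and <- behave differently is inside a function call. The sketch below uses median() only as a convenient example, and tmp.vals is a made-up name.

```r
# Inside a call, '=' matches the argument named 'x'; no object called x is created
median(x = c(2, 4, 6))

# Inside a call, '<-' creates the object tmp.vals and then passes it to median()
median(tmp.vals <- c(2, 4, 6))
tmp.vals       # the vector now exists in the workspace
```

For this reason many R users keep <- for assignments and reserve = for naming arguments.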
Assignment names are case sensitive, and they may be written as words, which gives them virtually limitless possibilities:
M = "April" #Assignments can be for character strings
M #Notice a character string is printed in quotations ...
## [1] "April"
m = "11-May-1960" #Assignments are case sensitive
m
## [1] "11-May-1960"
height.cm = 123 #A period can be used to create more complex names
weight.kg = 6.2 #Try printing these out yourself in the R console
Each assignment creates an object. So far we have assigned objects as either a scalar or a character object. In this environment, a numeric symbol (e.g., 3) can be either a scalar or a character. This would be problematic if we tried to add scalar and character objects together. Moreover, there are other modes for data objects. Scalars can be extended to vectors and matrices. There are also logical modes, factors, lists, and data frames, which we will cover below. Therefore we need a way to keep track of the mode of an object.
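To see why this matters, here is a small sketch of what happens when a number stored as a character meets arithmetic. The names num.three and chr.three are hypothetical, and tryCatch() is used only so that the error does not stop the script.

```r
num.three <- 3      # stored as a number
chr.three <- "3"    # stored as a character string

num.three + 1       # ordinary arithmetic: 4

# chr.three + 1 stops with an error, so the error is caught and reported here
result <- tryCatch(chr.three + 1, error = function(e) "error")
result              # "error"
```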
Can you tell the mode of these assignments?
mode.1 = T
mode.2 = "T"
mode.3 = "3.5"
mode.4 = factor ("Florida")
The answer is revealed with the class command.
class (mode.1) #The mode of example 1 is …
## [1] "logical"
class (mode.2) #This is not the same class, is it?
## [1] "character"
class (mode.3) #"3.5" was not input as a number but as a character
## [1] "character"
class (mode.4) #A factor stores categorical (nominal or ordinal) data
## [1] "factor"
Let’s examine the same object from several different angles.
class (m)
## [1] "character"
mode (m) #Mode is another way to check for object type
## [1] "character"
str (m) #This displays the data type and the datum
## chr "11-May-1960"
summary (m) #This displays the number of values and both class and mode
##    Length     Class      Mode
##         1 character character
The mode of an object can be changed. This is really helpful when you are programming because the default mode may not be what you expected or want. The following examples change the previously assigned mode.3 and m.
mode.3n = as.numeric(mode.3) # Remember mode.3 had been a character
str(mode.3)
## chr "3.5"
str(mode.3n)
## num 3.5
m.date = as.Date (m, "%d-%B-%Y") #Change m’s mode from character to date
m.date #Go ahead: verify the class of m & m.date
## [1] "1960-05-11"
format (m.date, format = "%d-%B-%Y") #Reformat the date output
## [1] "11-May-1960"
as.integer (m.date) #Days since January 1, 1970 (negative for earlier dates)
## [1] -3522
Try this yourself by changing the class of mode.4 from ‘factor’ to ‘character’ in the manner demonstrated above.
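One possible answer to this exercise is sketched below, together with a common pitfall when a factor holds numbers. The names mode.4c and f.num are made up for this illustration.

```r
mode.4  <- factor("Florida")
mode.4c <- as.character(mode.4)   # convert the factor to a character
class(mode.4)                     # "factor"
class(mode.4c)                    # "character"

# Pitfall: as.numeric() on a factor returns the internal level codes
f.num <- factor("3.5")
as.numeric(f.num)                 # 1, the level code, not 3.5
as.numeric(as.character(f.num))   # 3.5
```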
Remember, the bracketed index [1] reflects the fact that R stores data as vectors. So far, the examples have all been scalars (i.e., vectors with one value). Longer vectors may extend across multiple lines of output, and the bracketed number at the start of each line gives the position of the first value on that line.
rep(0, 100) #Create repeating values with 'rep'
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [36] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [71] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Matrix algebra is rather simple in R because scalars, vectors, matrices, and arrays are all treated as objects. The simplicity and power of an object-based program should become apparent in the following example using vectors and matrices.
num.first = seq(1,5) #Assign numbers quickly with ‘sequence’
num.last = c(6, 7, 8, 9, 10) #Use c() for concatenation
num.both = c(num.first, num.last) #Vectors can be combined easily
num.first
## [1] 1 2 3 4 5
num.last
## [1] 6 7 8 9 10
num.both
## [1] 1 2 3 4 5 6 7 8 9 10
num.first+num.last #Vectors can be added
## [1] 7 9 11 13 15
sum (num.both) #Or the sum can be tallied
## [1] 55
num.first [5] #The value at the fifth position can be checked
## [1] 5
num.both > 8 #A logical vector can identify certain values
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
sum (num.both>8) #And these identified values can be counted
## [1] 2
In this manner, R can be used to add, describe, and audit an object.
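The same logical vector can also be placed inside the square brackets to pull out the identified values. This is a short sketch that recreates num.both so it can be run on its own.

```r
num.both <- 1:10

num.both[num.both > 8]   # the values greater than 8: 9 10
which(num.both > 8)      # their positions in the vector: 9 10
num.both[c(1, 3)]        # values at positions 1 and 3: 1 3
mean(num.both > 8)       # proportion of values greater than 8: 0.2
```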
Arrays and matrices can also be generated and evaluated quickly.
X = rbind(num.first, num.last) #Arrange by rows, retaining the names
X #A matrix is often named with a capital letter
##           [,1] [,2] [,3] [,4] [,5]
## num.first    1    2    3    4    5
## num.last     6    7    8    9   10
x #Remember that R names are case sensitive
## [1] 1
matrix (num.both, 2, 5) #Notice the notation of [,c] and [r,]
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    3    5    7    9
## [2,]    2    4    6    8   10
Check the difference between rbind and cbind and how the order of matrix elements is altered by matrix (num.both, 2, 5, byrow=T).
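A sketch of that comparison follows. The object names X.rows, X.cols, M.bycol, and M.byrow are made up here, and the two vectors are recreated so the block runs on its own.

```r
num.first <- 1:5
num.last  <- 6:10

X.rows <- rbind(num.first, num.last)   # each vector becomes a row
X.cols <- cbind(num.first, num.last)   # each vector becomes a column
dim(X.rows)                            # 2 5
dim(X.cols)                            # 5 2

# By default matrix() fills column by column; byrow = TRUE fills row by row
M.bycol <- matrix(c(num.first, num.last), 2, 5)
M.byrow <- matrix(c(num.first, num.last), 2, 5, byrow = TRUE)
M.bycol[2, 1]                          # 2
M.byrow[2, 1]                          # 6
```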
In general, all the parts of an object should be the same class: numeric, logical, character, etc. If they are not, then R will try to force (coerce) them into the same class. The problem is obvious: you may not expect the class it chooses, or it may not be able to coerce some values and you will get missing values (NA) as a result. An exception is the list, which has no restriction on mixing classes. This is demonstrated here, where an integer vector, a character, a logical value, and a complex number are combined in a list.
mix <- list (1:5L, "a", T, 1+0i)
mix
## [[1]]
## [1] 1 2 3 4 5
##
## [[2]]
## [1] "a"
##
## [[3]]
## [1] TRUE
##
## [[4]]
## [1] 1+0i
You can check for yourself that the class of mix is ‘list’ and that the class of the first element of mix is ‘integer’. The colon operator already produces integers, so the ‘L’ is optional here; for a single number it does matter (e.g., the class of 5L is ‘integer’ but the class of 5 is ‘numeric’).
Lists are very flexible in this way. A common form of data – the data frame – is a type of list. Many outputs in R are in the form of lists.
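To contrast a list with an ordinary vector, the sketch below mixes classes with c() and shows what R does to them. The names mixed.vec and num.lgl are hypothetical.

```r
# c() builds a vector, so mixed values are coerced to one common class
mixed.vec <- c(1, "a", TRUE)
class(mixed.vec)              # "character"
mixed.vec                     # "1" "a" "TRUE"

num.lgl <- c(1, TRUE, FALSE)  # logical values become 1 and 0
class(num.lgl)                # "numeric"

# Values that cannot be coerced become NA (with a warning)
as.numeric(c("3.5", "abc"))

# A list keeps each element in its own class
mix <- list(1:5, "a", TRUE, 1+0i)
class(mix)                    # "list"
class(mix[[1]])               # "integer"
class(mix[[4]])               # "complex"
```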
One more object type worth mentioning is the data frame. This is a basic data structure in R, which combines the properties of matrices and lists, similar to how a spreadsheet is structured. You can import these from external data sources or create your own in the R console. Here is an example of the latter along with several commands to examine the data set.
age <- seq(1, 8, 1) #A sequence from 1 to 8
size <- c(10, 18, 24, 28, 30, 31, 31.5, 31.75)
growth.fish <- data.frame(age,size)
This dataset is small enough to print out entirely by typing growth.fish, but what if it were very large? Here are some basic commands to understand the dataset.
names(growth.fish) #The variable names
## [1] "age" "size"
str(growth.fish) #The structure of the data frame
## 'data.frame': 8 obs. of 2 variables:
## $ age : num 1 2 3 4 5 6 7 8
## $ size: num 10 18 24 28 30 ...
dim(growth.fish) #The dimensions (RxC) of the data frame
## [1] 8 2
head(growth.fish) #A peek at the first few rows (default is 6)
##   age size
## 1   1   10
## 2   2   18
## 3   3   24
## 4   4   28
## 5   5   30
## 6   6   31
summary(growth.fish) #A description of the data (e.g., quantiles)
##       age            size
##  Min.   :1.00   Min.   :10.00
##  1st Qu.:2.75   1st Qu.:22.50
##  Median :4.50   Median :29.00
##  Mean   :4.50   Mean   :25.53
##  3rd Qu.:6.25   3rd Qu.:31.12
##  Max.   :8.00   Max.   :31.75
Variables from a data frame can be plotted with the plot command. The data argument tells R to look for age and size inside growth.fish.
plot (size ~ age, data = growth.fish, type='b', xlab="Age", ylab="Size")
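Individual variables and rows of a data frame can also be reached directly. The sketch below rebuilds growth.fish so it runs on its own; the names older.fish and increment are made up for this illustration.

```r
age  <- seq(1, 8, 1)
size <- c(10, 18, 24, 28, 30, 31, 31.5, 31.75)
growth.fish <- data.frame(age, size)

growth.fish$size                  # one variable, extracted with $
growth.fish[3, ]                  # the third row, all columns
older.fish <- growth.fish[growth.fish$age > 5, ]   # rows where age exceeds 5
nrow(older.fish)                  # 3

# Add a new variable: the change in size from the previous age
growth.fish$increment <- c(NA, diff(growth.fish$size))
head(growth.fish, 3)
```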
This brief review introduced the concepts of expressions, assignments, and objects so that you can better understand the flexibility of the R programming language and get a feel for how to write, understand, and debug simple code. You need to understand the data completely before you can write code to analyze it, so this introduction also included some commands to evaluate the data matrix itself. This point should be re-emphasized to say: The flexibility of R can sometimes trip you up; when you find that a command is not working, it is wise to check the class or str of the object to make sure it is what you think it is!
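As a sketch of that advice in practice, suppose some lengths were read in as text rather than numbers. The name lengths.raw and its values are hypothetical.

```r
lengths.raw <- c("10", "18", "24")   # hypothetical: numbers stored as text

mean(lengths.raw)                    # returns NA with a warning
class(lengths.raw)                   # "character", which explains the problem
str(lengths.raw)                     # chr [1:3] "10" "18" "24"

lengths.num <- as.numeric(lengths.raw)   # change the mode, then try again
mean(lengths.num)
```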
This chapter does not attempt to serve as more than a basic introduction. Many very good guides to R are available on the web, and some are listed below. I have also listed several other websites that are more advanced and cover topics related to fishery science.
http://cran.r-project.org/ The R website, the source of the program and tons of documentation
On the web you should also try http://www.rseek.org/ and http://stackoverflow.com/tags/r
In the console, you can also type help (function) or ?function to learn more about each function. When you do this, be sure to scroll all the way down to see the examples at the end.
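For instance, the following lines look up the log function used earlier. This is only a sketch of the built-in help tools; how the help page is displayed depends on how R is being run.

```r
help("log")                 # open the help page for log; same as ?log
args(log)                   # show the arguments that log accepts
found <- apropos("^mean")   # list objects whose names start with 'mean'
found
```

If you do not know the name of a function, help.search("keyword") searches the installed documentation by topic.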
http://cran.r-project.org/doc/contrib/Short-refcard.pdf
http://cran.r-project.org/doc/contrib/usingR.pdf Maindonald. Using R for Data Analysis and Graphics: Introduction, code, and commentary
http://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf Paradis. R for Beginners
http://cran.r-project.org/doc/manuals/R-intro.pdf Venables, Smith, et al. An Introduction to R
http://cran.r-project.org/doc/contrib/Verzani-SimpleR.pdf Verzani. Using R for Introductory Statistics
http://www.rforge.net/FSA/vignettes.html A series of fishery datasets, programs, and ‘vignettes’ written by Derek Ogle
http://cran.r-project.org/web/packages/fishmethods/fishmethods.pdf A variety of datasets and programs written by Gary Nelson.
http://code.google.com/p/r4ss/ R code for stock synthesis, from the Northwest Fisheries Science Center, a part of the US National Marine Fisheries Service
http://flr-project.org/ Fisheries Library in R
http://sourceforge.net/apps/trac/ichthyoanalysis/wiki Ichthyoplankton analysis
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0092725 ss3sim: An R Package for Fisheries Stock Assessment Simulation with Stock Synthesis
http://stats-lab.com/user-2015-aalborg/ This looks particularly interesting because it assembles materials from a number of different sources
1. There is a related commercial product with a graphical user interface, called S-PLUS, which you can buy.