Know how to organise your working directory
Understand the use of logical operators
Understand how to use vectors and data frames and how to subset them
Understand the key object classes in R
Know a few useful functions to create random numbers and to explore a data frame
Know a few simple functions to visualise data quickly
The working directory is the folder where R looks for files when you load data and where it saves any files you write out. You can think of it as your target folder or user folder. It is best to set the working directory to a known folder (e.g. the desktop or your memory stick) using the setwd() command described below, replacing the path with your own (press Tab to autocomplete the path). We can query the current working directory using the following command:
## Check working directory
getwd()
## [1] "/Users/em8189/Desktop/SCIE802"
We can list all files in the current working directory (all files in the current target folder).
list.files()
list.files(pattern = "\\.docx$") # confine search to MS Word documents (file names ending in .docx)
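The pattern argument of list.files() is interpreted as a regular expression, so pattern = "docx" matches every file name that merely contains those four letters anywhere. The sketch below creates a throwaway folder inside tempdir() with three made-up file names (purely hypothetical, for illustration) to show the difference between a loose and an anchored pattern:

```r
# Create a temporary folder holding three hypothetical files
demo_dir <- file.path(tempdir(), "list_files_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("notes.docx", "data.csv", "docx_list.txt"))))

list.files(demo_dir)                                   # all three files
loose  <- list.files(demo_dir, pattern = "docx")       # also matches "docx_list.txt"
strict <- list.files(demo_dir, pattern = "\\.docx$")   # only names ending in ".docx"
loose
strict
```

In "\\.docx$" the escaped dot matches a literal full stop and $ anchors the match at the end of the file name.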
The working directory is set either with the setwd() command, which requires you to type out the full path to the folder (tedious), or more conveniently by clicking Session, then Set Working Directory, then Choose Directory… in the menu bar of RStudio (the bar at the top of the window). This lets you navigate to your target folder with a familiar file browser. Once you have selected the folder, a setwd() command appears in the console. Copy and paste this command from the console into your R script, so that the next time you run the script the working directory is set by running this line of code.
setwd("~/Desktop") # replace with your own path; the easiest way to get it is via the menu bar, as described above
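setwd() invisibly returns the working directory that was active before the change, which makes it easy to switch to another folder temporarily and then switch back. A minimal sketch, using tempdir() as a stand-in for your own project folder (an assumption made only so the example runs on any computer):

```r
target <- tempdir()        # stand-in for your own project folder
old_wd <- setwd(target)    # change folder; the previous path is returned invisibly
current_wd <- getwd()      # now points at the temporary folder
setwd(old_wd)              # go back to where we started
```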
R has a number of comparison operators (such as <, > and ==) and logical operators (such as &, | and !) that are useful for subsetting and manipulating data:
x <- c(2, 5, 7)
x
## [1] 2 5 7
x < 4
## [1] TRUE FALSE FALSE
Vectors can be subset using square brackets.
x[2] # extracts the second element of the x-vector
## [1] 5
x[c(1, 3)] # extracts the first and third element of the x-vector
## [1] 2 7
Using logical operators creates vectors of TRUE and FALSE values, which we can use for subsetting like this:
x[x < 4]
## [1] 2
x[x > 4]
## [1] 5 7
x[x >= 5]
## [1] 5 7
x[x <= 5]
## [1] 2 5
x[x == 2]
## [1] 2
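To see what happens under the hood, we can store the logical vector first and use it directly. The sketch also uses which() and sum(), two base R functions not introduced above, to find and count the TRUE values:

```r
x <- c(2, 5, 7)
keep <- x < 4                  # logical vector: TRUE FALSE FALSE
x[keep]                        # keeps the elements where keep is TRUE
x[c(TRUE, FALSE, FALSE)]       # the same selection written out by hand
which(keep)                    # positions of the TRUE values
sum(keep)                      # TRUE counts as 1, so this counts the matches
```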
The exclamation mark ! means ‘not’ and can be used to exclude values (!= means ‘not equal to’), and the ampersand symbol & can be used as a logical ‘AND’:
x[x != 2]
## [1] 5 7
x[x > 2 & x < 7]
## [1] 5
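The vertical bar | (logical ‘OR’) and the exclamation mark ! can also be combined with brackets to negate a whole condition. A short sketch using the same x vector:

```r
x <- c(2, 5, 7)
x[x < 3 | x > 6]        # logical OR: keeps 2 and 7
x[!(x > 2 & x < 7)]     # ! negates the whole bracketed condition: 2 and 7
x[!(x == 2)]            # same result as x[x != 2]: 5 and 7
```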
To select one specific value, use the ‘equal to’ operator, written as two equal signs == (a single = is used for assignment and function arguments, not for comparison), like this:
x[x == 2]
## [1] 2
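Because == compares element by element, it is not the right tool for picking out several values at once. The %in% operator (not covered above) instead asks, for each element of x, whether it occurs anywhere in a set of values. A minimal sketch:

```r
x <- c(2, 5, 7)
x %in% c(2, 7)          # TRUE FALSE TRUE
x[x %in% c(2, 7)]       # keeps the elements found in the set: 2 and 7
```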
Comparison and logical operators also work between two vectors, comparing them element by element.
y <- c(2, 2, 2)
x == y # Logical equal
## [1] TRUE FALSE FALSE
test <- x == y
## Combine x, y and the comparison result in a data frame to see how R compares them element by element
data.frame(x, y, test)
x < y
## [1] FALSE FALSE FALSE
x <= y
## [1] TRUE FALSE FALSE
x != y # not equal
## [1] FALSE TRUE TRUE
x == y & x < y
## [1] FALSE FALSE FALSE
x == y | x < y
## [1] TRUE FALSE FALSE
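Comparisons between two vectors are made position by position. When one side is a single value, R recycles it to the length of the other vector, which is why x == 2 and x == y give the same result here:

```r
x <- c(2, 5, 7)
y <- c(2, 2, 2)
x == y              # position 1 with 1, 2 with 2, 3 with 3
x == 2              # the single value 2 is recycled: same result as x == y
x > c(1, 6, 7)      # TRUE FALSE FALSE
```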
Let’s create some vectors other than numeric ones, e.g. character vectors:
sites <- c("site1", "site2", "site3")
sites
## [1] "site1" "site2" "site3"
sites[sites == "site3"]
## [1] "site3"
sites[sites != "site3"]
## [1] "site1" "site2"
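Comparisons of character strings are case-sensitive, so a capitalised "Site3" does not match "site3". One possible workaround, sketched below, converts both sides to lower case with the base R function tolower() before comparing:

```r
sites <- c("site1", "site2", "site3")
sites[sites == "Site3"]                        # character(0): no match
sites[tolower(sites) == tolower("Site3")]      # "site3"
```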
Yet another approach to extract an element from a vector is to use its position within the vector, like this:
sites[2]
## [1] "site2"
Selecting or excluding multiple elements by position requires combining the positions into a vector with the c() function:
sites[c(1, 3)]
## [1] "site1" "site3"
We can easily exclude elements by putting a minus sign in front of their positions:
sites[3]
## [1] "site3"
sites[-3]
## [1] "site1" "site2"
sites[-c(1, 3)]
## [1] "site2"
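Positive and negative positions cannot be mixed within one subset; R stops with an error instead. The sketch below catches that error with tryCatch() so the script keeps running:

```r
sites <- c("site1", "site2", "site3")
drop <- c(1, 3)
kept <- sites[-drop]                                    # "site2"
res <- tryCatch(sites[c(-1, 2)], error = function(e) e) # mixing signs is an error
inherits(res, "error")                                  # TRUE
conditionMessage(res)                                   # R's explanation of the problem
```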
Now let’s take it to the next level and try these tools out on a simple data set. Data sets (tables) are called data frames in R. They can be read in from files (e.g. with read.csv()) or created within R using the data.frame() function:
dat <- data.frame(site = sites, biomass = x)
dat
Data frames can be subset in various ways; here we will use the square bracket method. We write the name of the data frame directly followed by square brackets [ ]. Within the square brackets, the row selection comes first, followed by a comma and then the column selection. If no row or column selection is made, we leave the respective slot blank.
dat[1, ] # select row 1
dat[1, 2] # select row 1 and column 2
## [1] 2
dat[, 2] # select column 2
## [1] 2 5 7
dat[, "biomass"] # select column two by name
## [1] 2 5 7
dat$biomass # using the dollar operator to extract a single column by name
## [1] 2 5 7
dat[c(1, 3), ] # select rows 1 and 3
dat[c(1, 2), ] # select rows 1 and 2
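Row and column selections can be combined in a single call. Note also that selecting a single column returns a plain vector; adding drop = FALSE keeps the result as a one-column data frame. A short sketch using the dat data frame from above (recreated so the block runs on its own):

```r
sites <- c("site1", "site2", "site3")
x <- c(2, 5, 7)
dat <- data.frame(site = sites, biomass = x)

dat[3, "biomass"]                    # row 3 of the biomass column: 7
dat[c(1, 3), "site"]                 # rows 1 and 3 of the site column
one_col <- dat[, 2, drop = FALSE]    # stays a data frame with one column
class(dat[, 2])                      # "numeric"
class(one_col)                       # "data.frame"
```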
We can also use logical operators to create conditional subsets.
dat[dat$biomass < 5, ]
dat[dat$biomass < 5 & dat$site != "site2", ] # we can string multiple logical conditions together
dat[dat$biomass < 5 | dat$site != "site2", ] # note the difference between the logical AND &, and the logical OR |
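Counting the rows returned makes the difference between AND and OR concrete. With the dat data frame from above, only site1 satisfies both conditions, whereas site1 and site3 each satisfy at least one (site2 has biomass 5, which is not below 5, and is excluded by the second condition):

```r
sites <- c("site1", "site2", "site3")
x <- c(2, 5, 7)
dat <- data.frame(site = sites, biomass = x)

and_rows <- dat[dat$biomass < 5 & dat$site != "site2", ]  # both conditions: site1 only
or_rows  <- dat[dat$biomass < 5 | dat$site != "site2", ]  # either condition: site1 and site3
nrow(and_rows)
nrow(or_rows)
```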
R has various functions to create numeric and character vectors. The most important ones are c() (the combine function), the colon operator : for sequences in steps of 1, seq() (the sequence function) for all sorts of sequences, and rep() (the replicate function):
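y <- c(3, 5, 6, 8, 10)                            # combine individual values
y1 <- 2:20                                        # integers from 2 to 20
y2 <- seq(from = 2, to = 20, by = 0.5)            # from 2 to 20 in steps of 0.5
length(y2)
## [1] 37
y3 <- seq(from = 0, to = 10, length.out = 100)    # 100 evenly spaced values from 0 to 10
length(y3)
## [1] 100
y4 <- rep(x = 3, times = 5)                       # 3 3 3 3 3
y5 <- rep(c(3, 5), times = 5)                     # the whole vector repeated: 3 5 3 5 ...
y6 <- rep(c(3, 5), each = 5)                      # each value repeated: 3 3 3 3 3 5 5 5 5 5
y7 <- rep(c(3, 5), times = 5, each = 2)           # 3 3 5 5, repeated 5 times
y7 <- rep(c("high N", "low N"), each = 3)         # overwrites y7 with a character vector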
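These functions combine well when building labels for an experimental design. A small sketch, assuming a hypothetical design with two nitrogen treatments and three plots each; paste0() and paste() are base R functions for gluing strings together that are not covered above:

```r
sites <- paste0("site", 1:3)                       # "site1" "site2" "site3"
treatment <- rep(c("high N", "low N"), each = 3)   # hypothetical treatment labels
plot_id <- paste(treatment, rep(1:3, times = 2))   # "high N 1" ... "low N 3"
plot_id
length(rep(c(3, 5), times = 5, each = 2))          # 20: each = 2 is applied first, then times = 5
```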
We have seen that the type of a vector can vary: so far we have had vectors containing numeric values and others containing character strings (words). We can query the class of an object using the class() function like this:
class(x)
## [1] "numeric"
class(sites)
## [1] "character"
class(y7)
## [1] "character"
class(dat)
## [1] "data.frame"
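A vector can hold only one type of value. When c() combines different types, R silently converts (coerces) everything to the most flexible type, which class() reveals:

```r
class(c(2, 5, 7))        # "numeric"
class(c(2, "site1"))     # "character": the number is converted to text
class(c(2, TRUE))        # "numeric": TRUE is stored as 1
class(c(TRUE, FALSE))    # "logical"
```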
R has numerous built-in data sets that you can view and access using the data() function.
data() # view built-in data sets
mtcars # a built-in data set on car specs
Illustrate the most important objects in R (explanation, examples):
numeric
character
data.frame
function
logical
Make use of the function class()!
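One possible way to illustrate the object classes listed above is to create one small example of each and ask class() about all of them at once. The object names below are hypothetical, and sapply() (not covered above) simply applies class() to every element of the list:

```r
num <- c(2, 5, 7)                              # numeric
chr <- c("site1", "site2", "site3")            # character
lgl <- num > 4                                 # logical: FALSE TRUE TRUE
dat <- data.frame(site = chr, biomass = num)   # data.frame
fun <- function(v) mean(v)                     # a hypothetical user-defined function
sapply(list(num, chr, lgl, dat, fun), class)
```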
Explain the use of
rnorm()
rep()
head()
tail()
summary()
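A minimal sketch of these five functions working together, using simulated (entirely made-up) biomass values for two hypothetical sites; set.seed() is added so the random numbers are the same every time the code is run:

```r
set.seed(1)                                   # makes the random draws reproducible
biomass <- rnorm(n = 20, mean = 10, sd = 2)   # 20 draws from a normal distribution
site <- rep(c("site1", "site2"), each = 10)   # matching hypothetical site labels
dat <- data.frame(site, biomass)
head(dat)          # first 6 rows
tail(dat, n = 3)   # last 3 rows
summary(dat)       # per-column summaries (min, quartiles, mean, max for numeric columns)
```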
Explain and illustrate the use of
plot()
boxplot()
pairs()
hist()
…
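A minimal sketch of the four plotting functions, using the built-in mtcars data set (mpg = miles per gallon, wt = weight in 1000 lbs, hp = horsepower, cyl = number of cylinders). The choice of columns is only an illustration:

```r
plot(mpg ~ wt, data = mtcars)            # scatter plot: fuel economy against weight
b <- boxplot(mpg ~ cyl, data = mtcars)   # one box per number of cylinders; stats returned invisibly
pairs(mtcars[, c("mpg", "wt", "hp")])    # scatter plot matrix of three columns
h <- hist(mtcars$mpg)                    # histogram of fuel economy
```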