How to organise your work directory
Understand the use of logic operators
Understand how to use vectors and data frames and know how to subset them
Understand the key object classes in R
Know a few useful functions to create random numbers and to explore a data frame
Know a few simple functions to visualise data quickly
The working directory is the folder where R looks for files when it loads your data and where it saves files by default; it is essentially a target folder. (The objects you create live in the workspace, which is R's in-memory environment.) It is best to set your working directory to a known folder (e.g. your desktop or a USB stick).
You can query the working directory using the following command:
getwd()
## [1] "C:/Users/Samir Brown/Desktop/R Notes 2018"
#checks working directory
You are also able to list all the files in the current working directory with the code below:
list.files()
## [1] "SCIE802 Lab 2 Notes.Rmd" "SCIE802 Lab 3 Notes.Rmd"
## [3] "SCIE802_Lab_1_Notes.html" "SCIE802_Lab_2_Notes.Rmd"
list.files(pattern = "docx") #confines search to MS WORD Docs only
## character(0)
You are also able to change the working directory by using the command setwd(), but typing the path by hand is rather tedious. It is more convenient to click Session, then Set Working Directory, then Choose Directory in the menu bar above.
Once you do this, a setwd command pops up in the console. Now simply copy and paste this code from the console into your R script.
setwd("C:/Users/Samir Brown/Desktop/R Notes 2018")
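As a minimal sketch of the workflow above, the snippet below switches to a folder, lists the files in it and switches back. The folder returned by tempdir() and the file name example.csv are stand-ins chosen for illustration and are not part of the original notes; substitute your own project folder.

```r
old_wd <- getwd()                  # remember the current working directory
setwd(tempdir())                   # tempdir() is a placeholder for your own folder
file.create("example.csv")         # hypothetical file, created only for this demo
csv_files <- list.files(pattern = "\\.csv$")  # regular expression: names ending in .csv
print(csv_files)
unlink("example.csv")              # tidy up the demo file
setwd(old_wd)                      # restore the original working directory
```

Storing the old directory first makes it easy to return to where you started.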
R has several logical (comparison) operators that allow data subsetting and manipulation, as seen below:
x <- c(2,5,7)
x
## [1] 2 5 7
x < 4 # returns TRUE for each element of x that is less than 4 and FALSE otherwise
## [1] TRUE FALSE FALSE
Vectors can also be subsetted through the use of square brackets as seen below:
x[2] # extracts only the second element of the x-vector
## [1] 5
x[c(1,3)] # extracts the first and third elements of the x-vector; note that you must use the combine function c() if you want to pull out multiple elements
## [1] 2 7
Using logical operators creates vectors of TRUE and FALSE, which we can use for subsetting like this:
x[x < 4] #pulls up data that is less than 4
## [1] 2
x[x > 4] #pulls up data that is more than 4
## [1] 5 7
x[x >= 5] #pulls up data that is greater than or equal to 5
## [1] 5 7
x[x <= 5] #Pulls up data that is less than or equal to 5
## [1] 2 5
x[x == 2] # to select one specific value, use the 'equal to' operator, written with two equal signs ==
## [1] 2
The exclamation mark ! means "not" and is used to exclude values, and the ampersand symbol & can be used as a logical AND, as seen below:
x[x != 2] # exclude values that are equal to 2 from the x vector
## [1] 5 7
x[x > 2 & x < 7] # creates a subset in which values are greater than 2 and less than 7
## [1] 5
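Two related helpers can make such conditions easier to read. The sketch below uses which() and %in%, which are part of base R but are not covered in the notes above, so treat it as an optional extra rather than part of the lab material.

```r
x <- c(2, 5, 7)

which(x > 4)            # positions (not values) of the elements greater than 4
## [1] 2 3
x[x %in% c(2, 7)]       # keep the elements that match any value in a set
## [1] 2 7
x[!(x > 2 & x < 7)]     # ! negates a whole condition: everything not strictly between 2 and 7
## [1] 2 7
```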
The logical operators can also be used with multiple vectors as seen below
y <- c(2, 2, 2)
x == y #This compares values from the x and y vector to see if they are equal to one another in the order that they are listed
## [1] TRUE FALSE FALSE
test <- x == y
# How does R make the comparison? We can visualise it by gathering all the information in a data frame
data.frame(x, y, test)
## x y test
## 1 2 2 TRUE
## 2 5 2 FALSE
## 3 7 2 FALSE
x < y #are the values in Y greater than values in X
## [1] FALSE FALSE FALSE
x <= y #are the values in Y greater than or equal to the ones in X
## [1] TRUE FALSE FALSE
x != y #not equal
## [1] FALSE TRUE TRUE
x == y & x < y #are the values in X equal to those in Y AND are values in y greater than in X
## [1] FALSE FALSE FALSE
x == y | x < y #are the values in X equal to those in Y OR are values in Y greater than those in X
## [1] TRUE FALSE FALSE
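Because every element of y is 2, the comparisons above behave exactly like comparing x with the single number 2. The sketch below illustrates this recycling, together with any() and all(), two base R functions not used in the notes that collapse a logical vector into a single answer.

```r
x <- c(2, 5, 7)
y <- c(2, 2, 2)

identical(x == y, x == 2)   # a single value is recycled across the whole vector
## [1] TRUE
any(x == y)                 # is at least one pair equal?
## [1] TRUE
all(x >= y)                 # is every x at least as large as its y?
## [1] TRUE
```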
As mentioned in the notes before, vectors can be numeric or character in nature, as seen below:
sites <- c("site1", "site2", "site3") #note: use the combine function to join the values into one vector and wrap each string in quotation marks
sites
## [1] "site1" "site2" "site3"
sites[sites == "site3"] #selects specific values that are only site3 in the sites vector
## [1] "site3"
sites[sites != "site3"] #selects all values that are NOT site 3
## [1] "site1" "site2"
You can also extract elements from a vector by using their position within the vector, as seen below:
sites[2] #extracts only the second item in the sites vector
## [1] "site2"
x[c(1, 3)] #extracts both the 1st and 3rd item from the vector
## [1] 2 7
sites[c(1, 3)] # note if you want to select/exclude multiple items from the vector you must use the combine function
## [1] "site1" "site3"
You can also deselect elements from the vector through the use of a minus sign - :
sites[3]
## [1] "site3"
sites[-3]
## [1] "site1" "site2"
x[c(1, 3)]
## [1] 2 7
x[-c(1, 3)] # when excluding several elements, the minus sign goes in front of c(), not in front of the individual numbers
## [1] 5
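Negative indices also work with computed positions. The short sketch below, which is an addition to the notes, drops the last element and a range of elements; note the parentheses around 1:2, since -1:2 would mean the sequence from -1 to 2.

```r
sites <- c("site1", "site2", "site3")

sites[-length(sites)]   # drop the last element, whatever the length of the vector
## [1] "site1" "site2"
sites[-(1:2)]           # drop the first two elements
## [1] "site3"
```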
Data sets (tables) are called data frames in R and can be read in or created within R using the data.frame command.
dat <- data.frame(site = sites, biomass = x)
dat
## site biomass
## 1 site1 2
## 2 site2 5
## 3 site3 7
Data frames can be subsetted in various ways, such as the square bracket method. You write the name of the data frame followed directly by square brackets []. Within the square brackets, everything to do with the rows comes first; then, after the comma, we specify the columns. If no row or column selection is made, we leave the respective slot blank.
dat[1, ] #Select Row 1
## site biomass
## 1 site1 2
dat[1, 2] #select row 1 and column 2
## [1] 2
dat[, 2] #select column 2
## [1] 2 5 7
dat[, "biomass"] #selects column by name
## [1] 2 5 7
dat[c(1, 3), ] #selects rows 1 and 3 only
## site biomass
## 1 site1 2
## 3 site3 7
dat$biomass #using the dollar operator to extract a single column by name
## [1] 2 5 7
Logical operators can also come into play here to create conditional subsets as seen below
dat[dat$biomass < 5, ] #tells you which sites had a biomass less than 5; the comma is needed so that the condition is applied to the rows
## site biomass
## 1 site1 2
dat[dat$biomass < 5 & dat$site != "site2", ]
## site biomass
## 1 site1 2
dat[dat$biomass < 5 | dat$site != "site2" , ]
## site biomass
## 1 site1 2
## 3 site3 7
# we can string multiple logical conditions together
# note the difference between the logical AND &, and the logical OR |
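The same conditional subsets can also be written with the base R function subset(), which is not used in the notes above but lets you refer to columns without the dat$ prefix. A minimal sketch, rebuilding the dat data frame from earlier:

```r
sites <- c("site1", "site2", "site3")
dat <- data.frame(site = sites, biomass = c(2, 5, 7))

# Square-bracket version, as in the notes
by_bracket <- dat[dat$biomass < 5 | dat$site != "site2", ]

# subset() version: column names can be used directly inside the condition
by_subset <- subset(dat, biomass < 5 | site != "site2")
by_subset

# select = picks columns at the same time as filtering rows
subset(dat, biomass >= 5, select = site)
```

Both versions keep rows 1 and 3. Bracket subsetting works everywhere, while subset() is mainly a convenience for interactive work.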
R has multiple functions to create numeric and character vectors. The most important ones are the combine function c(), the colon operator :, the sequence function seq() for all sorts of sequences, and the replicate function rep().
q <- c(3, 5, 6, 8, 10)
q1 <- 2:20
q2 <- seq(from = 2, to=20, by = 0.5)
length(q2)
## [1] 37
q3 <- seq(from = 0, to = 10, length.out = 100)
length(q3)
## [1] 100
q4 <- rep(x = 3, times = 5)
q5 <- rep(c(3,5), times = 5)
q6 <- rep(c(3, 5), each = 5)
q7 <- rep(c(3, 5), times = 5, each = 2)
p7 <- rep(c("high N", "low N"), each = 3)
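The vectors above are created but not printed, so the difference between the times and each arguments is easy to miss. The following sketch uses shorter inputs than the notes so that the results fit on one line.

```r
rep(c(3, 5), times = 3)                 # repeat the whole vector three times
## [1] 3 5 3 5 3 5
rep(c(3, 5), each = 3)                  # repeat each element three times in turn
## [1] 3 3 3 5 5 5
rep(c(3, 5), times = 2, each = 2)       # 'each' is applied first, then 'times'
## [1] 3 3 5 5 3 3 5 5
seq(from = 0, to = 1, length.out = 5)   # fix the number of values instead of the step size
## [1] 0.00 0.25 0.50 0.75 1.00
```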
We have seen that the nature of a vector can vary: so far we have had vectors containing numeric values and others containing character strings (words). We can query the nature of a vector by using the class function like this:
class(x)
## [1] "numeric"
class(sites)
## [1] "character"
class(q7)
## [1] "numeric"
class(dat)
## [1] "data.frame"
R has numerous built-in data sets that you can view and access using the data() function.
data() # view built-in data sets
mtcars # a built-in data set on car specs
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
Illustrate the most important objects in R (explanation, examples):
Numeric is a data class that can contain decimals as well as whole numbers. Note that the integer class can only contain whole numbers.
numerix <- seq(from = 2, to = 10, by = 1)
class(numerix)
## [1] "numeric"
is.numeric(numerix)
## [1] TRUE
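The difference between the numeric and integer classes can be checked directly. This is a small sketch using only base R; the values are arbitrary examples.

```r
class(c(2, 5, 7))   # numbers typed normally are stored as "numeric" (double precision)
## [1] "numeric"
class(2:20)         # the colon operator produces integers
## [1] "integer"
class(2L)           # the L suffix marks a single integer
## [1] "integer"
is.numeric(2:20)    # integers still count as numeric
## [1] TRUE
as.integer(3.9)     # conversion truncates towards zero; it does not round
## [1] 3
```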
Text in R is represented by character vectors. A character vector is a vector consisting of character strings. Remember to wrap each string in quotation marks "".
Cavs5 <- c("lebron", "JR", "love", "TT", "Hill")
class(Cavs5)
## [1] "character"
is.character(Cavs5)
## [1] TRUE
Data frames are used for storing data tables. A data frame is a list of vectors of equal length. The top line is the header containing all the column names. Each horizontal row denotes a data row, which begins with the name of the row followed by the actual data. Each data member of a row is called a cell.
u <- data.frame(x, y, test)
class(u)
## [1] "data.frame"
is.data.frame(u)
## [1] TRUE
Functions are used to break our code into simpler parts, which are easier to maintain and understand. A function takes an input, runs a series of commands and returns an output.
This is the general structure:
Function_Name <- function(argument(s)) {
  Expressions
  return(output)
}
Arguments are the input, i.e. what the function uses to produce an output. Expressions are the intermediate steps.
circ.area <- function(r){
  Area <- pi*r^2
  return(Area)
}
# r is the radius; the argument can have any name as long as it is used consistently inside the function
circ.area(1)
## [1] 3.141593
circ.area(5)
## [1] 78.53982
class(circ.area)
## [1] "function"
is.function(circ.area)
## [1] TRUE
radi <- c(3, 5, 6, 8, 10)
circ.area(radi)
## [1] 28.27433 78.53982 113.09734 201.06193 314.15927
circle <- function(r){
  Area <- pi*r^2
  Circumference <- 2*pi*r
  return(list(Area = Area, Circumference = Circumference))
}
circle(radi)
## $Area
## [1] 28.27433 78.53982 113.09734 201.06193 314.15927
##
## $Circumference
## [1] 18.84956 31.41593 37.69911 50.26548 62.83185
# "Area =" and "Circumference =" give names to the different elements of the list
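Once a function returns a named list, the individual results can be pulled out by name. A minimal sketch, repeating the circle function so that the snippet runs on its own; the object name res is chosen here for illustration.

```r
circle <- function(r) {
  Area <- pi * r^2
  Circumference <- 2 * pi * r
  return(list(Area = Area, Circumference = Circumference))
}

res <- circle(c(3, 5))
res$Area                    # extract one element of the list by name
## [1] 28.27433 78.53982
res[["Circumference"]][2]   # double brackets also work; [2] picks the value for r = 5
## [1] 31.41593
names(res)                  # the names given inside list()
```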
Logical values are often created via comparisons between variables:
i <- x < y #is x less than y
class(i)
## [1] "logical"
is.logical(i)
## [1] TRUE
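Logical vectors can also be used in arithmetic, where TRUE counts as 1 and FALSE as 0. The sketch below is an addition to the notes and uses this to count matching values and work out their proportion.

```r
x <- c(2, 5, 7)
i <- x > 4

sum(i)     # TRUE counts as 1, so this is the number of values above 4
## [1] 2
mean(i)    # proportion of values above 4
## [1] 0.6666667
```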
Make use of the function class!
Explain the use of
rnorm() creates a vector of normally distributed random numbers. n is the number of values you want generated, mean is the mean and sd is the standard deviation.
set.seed(12062018) # the seed should be a single number; 12-06-2018 would be evaluated as a subtraction (-2012)
pop <- rnorm(n = 8, mean = 50, sd = 2 )
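The reason for calling set.seed() first is reproducibility: the same seed gives the same "random" numbers. A short sketch follows, in which the seed 42 and the sample size 10000 are arbitrary values chosen for illustration.

```r
set.seed(42)                            # 42 is an arbitrary seed chosen for this sketch
a <- rnorm(n = 5, mean = 50, sd = 2)
set.seed(42)                            # resetting the seed restarts the same sequence
b <- rnorm(n = 5, mean = 50, sd = 2)
identical(a, b)
## [1] TRUE

big <- rnorm(n = 10000, mean = 50, sd = 2)
mean(big)                               # close to 50, but not exactly 50
sd(big)                                 # close to 2
```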
rep() replicates the values of a vector.
rep(x, times = 5, length.out = NA, each = 1)
## [1] 2 5 7 2 5 7 2 5 7 2 5 7 2 5 7
head() returns the first several rows of a matrix or data frame, and tail() returns the last several rows. Both functions may also be applied to a vector to obtain its first or last values.
head(x, n = 6): x is a matrix, data frame, or vector; n is the number of rows (or values, if x is a vector) to return from the start.
library(MASS)
data("Boston")
head(Boston, 3)
## crim zn indus chas nox rm age dis rad tax ptratio black
## 1 0.00632 18 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90
## 2 0.02731 0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90
## 3 0.02729 0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83
## lstat medv
## 1 4.98 24.0
## 2 9.14 21.6
## 3 4.03 34.7
tail(x, n = 6): x is a matrix, data frame, or vector; n is the number of rows (or values, if x is a vector) to return from the end.
tail(Boston, 4)
## crim zn indus chas nox rm age dis rad tax ptratio black
## 503 0.04527 0 11.93 0 0.573 6.120 76.7 2.2875 1 273 21 396.90
## 504 0.06076 0 11.93 0 0.573 6.976 91.0 2.1675 1 273 21 396.90
## 505 0.10959 0 11.93 0 0.573 6.794 89.3 2.3889 1 273 21 393.45
## 506 0.04741 0 11.93 0 0.573 6.030 80.8 2.5050 1 273 21 396.90
## lstat medv
## 503 9.08 20.6
## 504 5.64 23.9
## 505 6.48 22.0
## 506 7.88 11.9
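Both functions behave the same way on a plain vector, which makes the role of n easy to see. A small sketch, in which the vector v is an example made up for illustration; dim() is added to show how to check the size of a data frame before looking at its head or tail.

```r
v <- 1:10

head(v, 3)     # first three values
## [1] 1 2 3
tail(v, 3)     # last three values
## [1]  8  9 10
head(v, -3)    # a negative n returns everything except the last three values
## [1] 1 2 3 4 5 6 7
dim(cars)      # number of rows and columns of the built-in cars data frame
## [1] 50  2
```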
summary() is a generic function: what it returns depends on the class of its argument. For a data frame it gives descriptive statistics for each column; for a fitted model it summarises the results of the fit.
summary(cars)
## speed dist
## Min. : 4.0 Min. : 2.00
## 1st Qu.:12.0 1st Qu.: 26.00
## Median :15.0 Median : 36.00
## Mean :15.4 Mean : 42.98
## 3rd Qu.:19.0 3rd Qu.: 56.00
## Max. :25.0 Max. :120.00
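To illustrate what "generic" means, the sketch below calls summary() on three different kinds of object. The linear model is only an example input, using the built-in cars data; it is not part of the original notes.

```r
x <- c(2, 5, 7)
summary(x)                             # numeric vector: minimum, quartiles, mean and maximum

fit <- lm(dist ~ speed, data = cars)   # a simple linear model, used only as an example input
class(summary(cars))                   # the summary of a data frame is a table
## [1] "table"
class(summary(fit))                    # the summary of a model has its own class
## [1] "summary.lm"
```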
Explain and illustrate the use of
plot() is a generic function; in its simplest case we get a scatter plot, as seen below:
plot(numerix)
boxplot() produces box-and-whisker plot(s) of the given (grouped) values.
boxplot(Boston$tax)
boxplot(Boston$tax, ylab = 'Tax', main = 'Full value property tax rate per ten thousand dollars')
pairs() creates a scatterplot matrix, with one panel for each pair of variables in the data set.
pairs(cars)
hist() creates a histogram.
hist(radi)
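The same functions accept arguments for axis labels and titles, and some of them return useful information as well as drawing. A sketch using the built-in cars and mtcars data sets; the label texts and the choice of breaks = 10 are illustrative.

```r
# Scatter plot of two variables with labelled axes
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     main = "Stopping distance against speed")

# Histogram; breaks suggests roughly how many bins to use
h <- hist(cars$dist, breaks = 10,
          xlab = "Stopping distance (ft)", main = "Histogram of stopping distance")

# Grouped boxplot: one box of mpg values for each number of cylinders
b <- boxplot(mpg ~ cyl, data = mtcars,
             xlab = "Number of cylinders", ylab = "Miles per gallon")
b$n   # number of cars in each cylinder group (4, 6 and 8 cylinders)
## [1] 11  7 14
```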