Learning Targets

  1. Know how to organise your working directory

  2. Understand the use of logic operators

  3. Understand how to use vectors and data frames and how to subset them

  4. Understand the key object classes in R

  5. Know a few useful functions to create random numbers and to explore a data frame

  6. Know a few simple functions to visualise data quickly

Organising the working directory

The working directory is the folder where R looks for files when you load data, and where it saves any files you write out. (The objects you create are kept in your workspace, R's global environment, in memory.) It is best to set your working directory to a known folder (e.g. a folder on your desktop or USB stick).

You can query the working directory using the following command:

getwd() # checks the working directory
## [1] "C:/Users/Samir Brown/Desktop/R Notes 2018"

You are also able to list all the files in the current working directory with the code below:

list.files()
## [1] "SCIE802 Lab 2 Notes.Rmd"  "SCIE802 Lab 3 Notes.Rmd" 
## [3] "SCIE802_Lab_1_Notes.html" "SCIE802_Lab_2_Notes.Rmd"
list.files(pattern = "docx") # only lists files whose names contain "docx", e.g. Word documents
## character(0)
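The pattern argument is a regular expression matched against file names, so "docx" matches any name that contains those letters anywhere. A minimal sketch, assuming you only want files that end in .Rmd; it creates two empty example files (hypothetical names) in a temporary folder so it runs on any machine:

```r
# Make a throwaway folder with two example files (hypothetical names)
demo_dir <- file.path(tempdir(), "list_files_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("lab_notes.Rmd", "lab_notes.html"))))

# "\\.Rmd$" means: a literal dot, then Rmd, at the very end of the name
rmd_files <- list.files(path = demo_dir, pattern = "\\.Rmd$")
rmd_files
## [1] "lab_notes.Rmd"
```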

You can also change the working directory with the command setwd(), but typing the path is rather tedious. In RStudio it is more convenient to use the menu bar: Session > Set Working Directory > Choose Directory...

Once you do this, the matching setwd() command appears in the console. Copy and paste it from the console into your R script so the working directory is set every time the script runs.

setwd("C:/Users/Samir Brown/Desktop/R Notes 2018")
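setwd() invisibly returns the directory you were in before the change. A minimal sketch of using that to switch somewhere temporarily (here R's own temporary folder, chosen only for illustration) and then switch back:

```r
old_wd <- setwd(tempdir())  # move to R's temporary folder, remembering the old one
getwd()                     # now points at the temporary folder
setwd(old_wd)               # return to the original working directory
```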

Relational Operators in R

R has several relational (comparison) operators: <, >, <=, >=, == and !=. Each one returns TRUE or FALSE for every element of a vector, which allows for data subsetting and manipulation, as seen below:

x <- c(2,5,7)
x
## [1] 2 5 7
x < 4 # tells you which numbers in the vector x are less than 4
## [1]  TRUE FALSE FALSE

Vectors can also be subsetted through the use of square brackets as seen below:

x[2] # extracts only the second element of the x-vector
## [1] 5
x[c(1,3)] # extracts the first and third elements of x; note you must use the combine function c() to pull out several elements
## [1] 2 7

Using relational operators creates vectors of TRUE and FALSE, which we can use for subsetting like this:

x[x < 4] #pulls up data that is less than 4
## [1] 2
x[x > 4] #pulls up data that is more than 4
## [1] 5 7
x[x >= 5] #pulls up data that is greater than or equal to 5
## [1] 5 7
x[x <= 5] #Pulls up data that is less than or equal to 5
## [1] 2 5
x[x == 2] # to select one specific value, use the 'equal to' operator, written as two equal signs ==
## [1] 2
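What happens inside the brackets is that R keeps every element whose position holds TRUE. A short sketch showing that a stored comparison, or a logical vector typed by hand, subsets exactly like x[x < 4]:

```r
x <- c(2, 5, 7)
keep <- x < 4              # one TRUE/FALSE per element: TRUE FALSE FALSE
x[keep]                    # same as x[x < 4]
## [1] 2
x[c(TRUE, FALSE, TRUE)]    # a hand-written logical vector works the same way
## [1] 2 7
```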

Logical Operators in R

The exclamation mark ! means NOT (so != means 'not equal to' and can be used to exclude values), the ampersand & is the logical AND, and the vertical bar | is the logical OR, as seen below:

x[x!= 2] # exclude values that are equal to 2 from the x vector
## [1] 5 7
x[x > 2 & x < 7] # creates a subset in which values are greater than 2 and less than 7
## [1] 5
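The NOT and OR operators can be used in the same way. A short sketch on the same x vector: ! flips a whole condition, and | keeps an element if either condition holds:

```r
x <- c(2, 5, 7)
x[!(x > 2 & x < 7)]   # NOT (between 2 and 7): everything outside that range
## [1] 2 7
x[x < 3 | x > 6]      # OR: smaller than 3 or larger than 6
## [1] 2 7
```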

The logical operators can also be used with multiple vectors as seen below

y <- c(2, 2, 2)

x == y #This compares values from the x and y vector to see if they are equal to one another in the order that they are listed
## [1]  TRUE FALSE FALSE
test <-  x == y

# To see how R compares the vectors element by element, gather everything in a data frame

data.frame(x, y, test)
##   x y  test
## 1 2 2  TRUE
## 2 5 2 FALSE
## 3 7 2 FALSE
x < y # is each value in x less than the matching value in y?
## [1] FALSE FALSE FALSE
x <= y # is each value in x less than or equal to the matching value in y?
## [1]  TRUE FALSE FALSE
x != y # not equal
## [1] FALSE  TRUE  TRUE
x == y & x < y # is each value in x equal to the one in y AND less than it? (both can never be TRUE)
## [1] FALSE FALSE FALSE
x == y | x < y # is each value in x equal to the one in y OR less than it? (same as x <= y)
## [1]  TRUE FALSE FALSE

Character Vectors

As mentioned in the previous notes, vectors can be numeric or character, as seen below:

sites <- c("site1", "site2", "site3") # use c() to combine the strings into one vector; each string needs quotation marks
sites
## [1] "site1" "site2" "site3"
sites[sites == "site3"] #selects specific values that are only site3 in the sites vector
## [1] "site3"
sites[sites != "site3"] #selects all values that are NOT site3
## [1] "site1" "site2"
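To keep several specific values, the conditions can be joined with the logical OR |. A short sketch; the %in% operator, a shorter alternative that is not covered elsewhere in these notes, gives the same result:

```r
sites <- c("site1", "site2", "site3")
sites[sites == "site1" | sites == "site3"]  # keep site1 OR site3
## [1] "site1" "site3"
sites[sites %in% c("site1", "site3")]       # same result with %in%
## [1] "site1" "site3"
```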

You can also extract elements from a vector by using their position within the vector, as seen below:

sites[2] #extracts only the second item in the sites vector
## [1] "site2"
x[c(1, 3)] #extracts both the 1st and 3rd item from the vector
## [1] 2 7
sites[c(1, 3)] # note if you want to select/exclude multiple items from the vector you must use the combine function
## [1] "site1" "site3"

You can also deselect elements from the vector through the use of a minus sign - :

sites[3]
## [1] "site3"
sites[-3]
## [1] "site1" "site2"
x[c(1, 3)] 
## [1] 2 7
x[-c(1, 3)] # put the minus sign in front of c() (c(-1, -3) also works)
## [1] 5
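Negative positions work the same way on character vectors. A short sketch; note that R does not allow positive and negative positions in the same call:

```r
sites <- c("site1", "site2", "site3")
sites[-c(1, 2)]   # drop the first two elements
## [1] "site3"
# Mixing positive and negative positions is not allowed:
# sites[c(-1, 2)] stops with an error
```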

Data Frames

Data sets (tables) are called data frames in R. They can be read in from a file or created within R using the data.frame() function.

dat <- data.frame(site = sites, biomass = x)
dat
##    site biomass
## 1 site1       2
## 2 site2       5
## 3 site3       7
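A quick way to check the size and column names of a data frame you have just built is with nrow(), ncol() and names(). A short sketch on the same dat:

```r
sites <- c("site1", "site2", "site3")
x <- c(2, 5, 7)
dat <- data.frame(site = sites, biomass = x)
nrow(dat)    # number of rows
## [1] 3
ncol(dat)    # number of columns
## [1] 2
names(dat)   # column names
## [1] "site"    "biomass"
```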

Subsetting data frames

Data frames can be subsetted in several ways, the most common being square brackets. Write the name of the data frame directly followed by square brackets []. Inside the brackets, the row selection comes first and the column selection comes after the comma. If you do not want to restrict the rows or the columns, leave the respective slot blank.

dat[1, ] #Select Row 1
##    site biomass
## 1 site1       2
dat[1, 2] #select row 1 and column 2
## [1] 2
dat[, 2] #select column 2
## [1] 2 5 7
dat[, "biomass"] #selects column by name
## [1] 2 5 7
dat[c(1, 3), ] #selects rows 1 and 3 only 
##    site biomass
## 1 site1       2
## 3 site3       7
dat$biomass #using the dollar operator to extract a single column by name 
## [1] 2 5 7

Logical operators can also come into play here to create conditional subsets as seen below

dat[dat$biomass < 5, ] # rows where biomass is less than 5 (note the comma: the condition selects rows)
##    site biomass
## 1 site1       2
dat[dat$biomass < 5 & dat$site != "site2", ]
##    site biomass
## 1 site1       2
dat[dat$biomass < 5 | dat$site != "site2" , ]
##    site biomass
## 1 site1       2
## 3 site3       7
# we can string multiple logical conditions together
# note the difference between the logical AND &, and the logical OR |
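A condition in the row slot can be combined with a column in the column slot, and the result can be stored as a new object. A short sketch; stringsAsFactors = FALSE is set explicitly (it has been the default since R 4.0.0) so site stays a character column:

```r
dat <- data.frame(site = c("site1", "site2", "site3"),
                  biomass = c(2, 5, 7),
                  stringsAsFactors = FALSE)
dat[dat$biomass >= 5, "site"]   # sites whose biomass is at least 5
## [1] "site2" "site3"
high <- dat[dat$biomass >= 5, ] # store the matching rows as a new data frame
nrow(high)
## [1] 2
```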

Tools for creating vectors in R

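R has several tools for creating numeric and character vectors. The most important are the combine function c(), the colon operator : (for sequences of whole numbers), the sequence function seq() for all sorts of sequences, and the replicate function rep():

q <- c(3, 5, 6, 8, 10) # combine individual values
q1 <- 2:20 # whole numbers from 2 to 20
q2 <- seq(from = 2, to=20, by = 0.5) # from 2 to 20 in steps of 0.5
length(q2)
## [1] 37
q3 <- seq(from = 0, to = 10, length.out = 100) # 100 evenly spaced values from 0 to 10
length(q3)
## [1] 100
q4 <-  rep(x = 3, times = 5) # the value 3 five times
q5 <- rep(c(3,5), times = 5) # the whole vector c(3, 5) five times
q6 <- rep(c(3, 5), each = 5) # each element five times
q7 <- rep(c(3, 5), times = 5, each = 2) # each element twice, then that five times
p7 <- rep(c("high N", "low N"), each = 3) # rep() works on character vectors too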
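The vectors above are created without being printed. A short sketch that prints a few of them so the difference between times and each is visible (when both are given, each is applied first):

```r
rep(x = 3, times = 5)              # q4
## [1] 3 3 3 3 3
rep(c(3, 5), times = 5)            # q5: the whole vector is repeated
##  [1] 3 5 3 5 3 5 3 5 3 5
rep(c(3, 5), each = 5)             # q6: each element is repeated
##  [1] 3 3 3 3 3 5 5 5 5 5
rep(c(3, 5), times = 5, each = 2)  # q7: each first, then times
##  [1] 3 3 5 5 3 3 5 5 3 3 5 5 3 3 5 5 3 3 5 5
seq(from = 2, to = 3, by = 0.5)
## [1] 2.0 2.5 3.0
```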

We have seen that vectors can be of different types: so far we have used vectors containing numeric values and others containing character strings (words). We can query the type of an object with the class() function like this:

class(x)
## [1] "numeric"
class(sites)
## [1] "character"
class(q7)
## [1] "numeric"
class(dat)
## [1] "data.frame"
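Because each column of a data frame is a vector, class() can also be applied to every column at once. A minimal sketch using sapply(), which applies a function to each column; stringsAsFactors = FALSE is set explicitly (the default since R 4.0.0):

```r
dat <- data.frame(site = c("site1", "site2", "site3"),
                  biomass = c(2, 5, 7),
                  stringsAsFactors = FALSE)
sapply(dat, class)   # site is "character", biomass is "numeric"
```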

R has numerous built-in data sets that you can view and access using the data() function.

data() # view built-in data sets
mtcars # a built-in data set on car specs
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

The most important objects in R

Illustrate the most important objects in R (explanation, examples):

  • numeric

The numeric class holds real numbers, i.e. decimals as well as whole numbers. In contrast, the integer class can only hold whole numbers.

numerix <-  seq(from = 2, to =  10, by = 1)
class(numerix)
## [1] "numeric"
is.numeric(numerix)
## [1] TRUE
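The difference between numeric and integer shows up even for whole numbers. A short sketch: a number typed directly is numeric, the L suffix and the colon operator produce integers, and integers still count as numbers for is.numeric():

```r
class(5)        # numbers typed directly are numeric
## [1] "numeric"
class(5L)       # the L suffix makes an integer
## [1] "integer"
class(2:10)     # the colon operator returns integers (unlike seq(2, 10, by = 1) above)
## [1] "integer"
is.numeric(5L)  # integers are still numbers
## [1] TRUE
```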
  • character

Text in R is represented by character vectors, i.e. vectors whose elements are character strings. Remember to put each string in quotation marks "".

Cavs5 <- c("lebron", "JR", "love", "TT", "Hill")
class(Cavs5)
## [1] "character"
is.character(Cavs5)
## [1] TRUE
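A vector can only hold one type, so combining numbers and strings turns everything into character. A short sketch of this coercion:

```r
mixed <- c(1, "lebron")   # the number 1 is converted to the string "1"
mixed
## [1] "1"      "lebron"
class(mixed)
## [1] "character"
```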
  • data.frame

Data frames are used for storing data tables. A data frame is a list of vectors of equal length, each vector forming one column. The top line is the header containing the column names. Each following line is a data row, which begins with the row name (by default the row number) followed by the actual data. Each value in a row is called a cell.

u <- data.frame(x, y, test)
class(u)
## [1] "data.frame"
is.data.frame(u)
## [1] TRUE
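The "equal length" rule can be checked directly. A short sketch: data.frame() recycles a shorter column only when its length divides the number of rows evenly, and stops with an error otherwise:

```r
ok <- data.frame(a = 1:4, b = c("p", "q"))  # 2 divides 4, so b is recycled
nrow(ok)
## [1] 4
bad <- try(data.frame(a = 1:3, b = 1:2), silent = TRUE)  # 3 vs 2 rows
inherits(bad, "try-error")                                # the call failed
## [1] TRUE
```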
  • function

Functions let us break our code into simpler parts that are easier to maintain and understand: a function takes an input, runs a series of commands, and returns an output.

This is the general structure:

Function_Name <- function(argument(s)) {
  Expressions
  return(output)
}

Arguments are the inputs the function uses to produce its output; expressions are the intermediate steps.

circ.area <- function(r){
  Area <- pi*r^2
  return(Area)
} 
#r is the radius; any valid name works as long as it is used consistently inside the function

circ.area(1)
## [1] 3.141593
circ.area(5)
## [1] 78.53982
class(circ.area)
## [1] "function"
is.function(circ.area)
## [1] TRUE
radi <- c(3, 5, 6, 8, 10)
circ.area(radi)
## [1]  28.27433  78.53982 113.09734 201.06193 314.15927
circle <- function(r){
  Area <- pi*r^2
  Circumference <- 2*pi*r
  return(list(Area = Area, Circumference = Circumference))
} 
circle(radi)
## $Area
## [1]  28.27433  78.53982 113.09734 201.06193 314.15927
## 
## $Circumference
## [1] 18.84956 31.41593 37.69911 50.26548 62.83185
#Area = and Circumference = give names to the elements of the returned list
  • logical
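Arguments can also be given default values, and R returns the last evaluated expression even without return(). A minimal sketch using a hypothetical function cyl.volume() (not part of the original notes):

```r
# Hypothetical example: volume of a cylinder with radius r and height h.
# h = 1 is a default value, used whenever h is not supplied.
cyl.volume <- function(r, h = 1) {
  pi * r^2 * h   # last expression is returned automatically
}
cyl.volume(1)         # uses the default h = 1
## [1] 3.141593
cyl.volume(2, h = 3)  # 12 * pi
## [1] 37.69911
```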

Logical vectors (TRUE/FALSE) are often created by comparing values:

i <- x < y #is x less than y
class(i)
## [1] "logical"
is.logical(i)
## [1] TRUE
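In arithmetic, TRUE counts as 1 and FALSE as 0, which makes logical vectors handy for counting. A short sketch:

```r
big <- c(2, 5, 7) > 4   # FALSE TRUE TRUE
sum(big)                # number of TRUE values
## [1] 2
mean(big)               # proportion of TRUE values
## [1] 0.6666667
```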

Make use of the function class!

A few more useful functions

Explain the use of

  • rnorm()

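Creates a vector of random numbers drawn from a normal distribution. n is the number of values to generate, mean is the mean and sd is the standard deviation.

set.seed(12-06-2018) # fixes the random number generator so results are reproducible; R evaluates 12-06-2018 as arithmetic (-2012), which is still a valid seed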
pop <- rnorm(n = 8, mean = 50, sd = 2 )
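set.seed() is what makes "random" numbers reproducible: the same seed always gives the same draws. A short sketch with an arbitrary seed of 42:

```r
set.seed(42)
a <- rnorm(n = 3, mean = 50, sd = 2)
set.seed(42)                           # same seed again...
b <- rnorm(n = 3, mean = 50, sd = 2)
identical(a, b)                        # ...gives exactly the same numbers
## [1] TRUE
```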
  • rep()

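Replicates the values in x: times sets how often the whole vector is repeated, each sets how often each element is repeated, and length.out sets the length of the result (NA means it is not used).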

rep(x, times = 5, length.out = NA, each = 1)
##  [1] 2 5 7 2 5 7 2 5 7 2 5 7 2 5 7
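The length.out argument is useful when the result has to have a particular length that is not a multiple of the input. A short sketch:

```r
x <- c(2, 5, 7)
rep(x, length.out = 7)   # keep repeating until there are 7 values
## [1] 2 5 7 2 5 7 2
rep(x, each = 2)         # each element twice
## [1] 2 2 5 5 7 7
```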
  • head()

Obtain the first several rows of a matrix or data frame using head, and use tail to obtain the last several rows. These functions may also be applied to obtain the first or last values in a vector.

Usage: head(x, n = 6), where x is a matrix, data frame, or vector, and n is the number of rows (or values, if x is a vector) returned from the start.

library(MASS)
data("Boston")
head(Boston, 3)
##      crim zn indus chas   nox    rm  age    dis rad tax ptratio  black
## 1 0.00632 18  2.31    0 0.538 6.575 65.2 4.0900   1 296    15.3 396.90
## 2 0.02731  0  7.07    0 0.469 6.421 78.9 4.9671   2 242    17.8 396.90
## 3 0.02729  0  7.07    0 0.469 7.185 61.1 4.9671   2 242    17.8 392.83
##   lstat medv
## 1  4.98 24.0
## 2  9.14 21.6
## 3  4.03 34.7
  • tail()

Usage: tail(x, n = 6), where x is a matrix, data frame, or vector, and the last n rows (or values, if x is a vector) are returned.

tail(Boston, 4)
##        crim zn indus chas   nox    rm  age    dis rad tax ptratio  black
## 503 0.04527  0 11.93    0 0.573 6.120 76.7 2.2875   1 273      21 396.90
## 504 0.06076  0 11.93    0 0.573 6.976 91.0 2.1675   1 273      21 396.90
## 505 0.10959  0 11.93    0 0.573 6.794 89.3 2.3889   1 273      21 393.45
## 506 0.04741  0 11.93    0 0.573 6.030 80.8 2.5050   1 273      21 396.90
##     lstat medv
## 503  9.08 20.6
## 504  5.64 23.9
## 505  6.48 22.0
## 506  7.88 11.9
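head() and tail() work on plain vectors too, and a negative n drops that many values from the other end. A short sketch:

```r
v <- 1:10
head(v, 3)    # first 3 values
## [1] 1 2 3
tail(v, 3)    # last 3 values
## [1]  8  9 10
head(v, -7)   # everything except the last 7 values
## [1] 1 2 3
```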
  • summary()

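summary() is a generic function: what it returns depends on the class of the object. For a data frame it summarises each column (minimum, quartiles, median, mean and maximum for numeric columns), and for the results of model-fitting functions it summarises the fitted model.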

summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00
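Because summary() is generic, the same call adapts to the object it gets. A short sketch: on a numeric vector it gives the six-number summary, and on a fitted linear model (here a straight line of stopping distance on speed in the built-in cars data, chosen only for illustration) it reports coefficients, standard errors and R-squared:

```r
s <- summary(c(2, 5, 7))
s     # Min. 2, 1st Qu. 3.5, Median 5, Mean about 4.667, 3rd Qu. 6, Max. 7
fit <- lm(dist ~ speed, data = cars)
summary(fit)
```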

Simple plotting functions

Explain and illustrate the use of

  • plot()

plot() is a generic function. In its simplest case, given a single vector, it draws a scatter plot of the values against their position (index) in the vector, as seen below:

plot(numerix)

  • boxplot()

Produce box-and-whisker plot(s) of the given (grouped) values.

boxplot(Boston$tax)

boxplot(Boston$tax, ylab = 'Tax', main = 'Full value property tax rate per ten thousand dollars')
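For grouped values, boxplot() accepts a formula of the form value ~ group and draws one box per group. A short sketch using the built-in mtcars data; boxplot() also returns its statistics invisibly, including n, the number of observations in each box:

```r
b <- boxplot(mpg ~ cyl, data = mtcars,
             xlab = "Number of cylinders", ylab = "Miles per gallon")
b$n   # cars with 4, 6 and 8 cylinders
## [1] 11  7 14
```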

  • pairs()

Creates a scatterplot matrix: one scatter plot for every pair of columns in a data frame, which is useful for spotting relationships between variables.

pairs(cars)
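pairs() can be restricted to a few columns, and the strength of a relationship seen in the plots can be put into a number with cor(). A short sketch on built-in data:

```r
pairs(mtcars[, c("mpg", "hp", "wt")])  # scatterplot matrix of three columns
cor(cars$speed, cars$dist)             # correlation between speed and distance
## [1] 0.8068949
```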

  • hist()

Creates a histogram, which shows how many values fall into each bin (interval).

hist(radi)
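The breaks argument sets the bin edges, and hist() invisibly returns the counts it plotted. A short sketch with 5-mpg bins on the built-in mtcars data:

```r
h <- hist(mtcars$mpg, breaks = seq(10, 35, by = 5),
          xlab = "Miles per gallon", main = "Fuel economy in mtcars")
h$counts   # number of cars in each 5-mpg bin
## [1]  6 12  8  2  4
```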