1 Basic Building Blocks
2 Sequences of Numbers
3 Vectors
- 3.1 Logical vectors
- 3.2 Character vectors
4 Missing values
5 Subsetting Objects
6 Matrices
7 Data Frames

1 Basic Building Blocks

R can be used as an interactive calculator.

5+7

## [1] 12

5-7

## [1] -2

5*3

## [1] 15

5^2

## [1] 25

Any object that contains data is called a data structure and numeric vectors are the simplest type of data structure in R. The easiest way to create a vector is with the c() function.

c(5,6,7)

## [1] 5 6 7

You can combine vectors to make a new vector.

x<-c(4,5)
c(1,2,3,x)

## [1] 1 2 3 4 5

Numeric vectors can be used in arithmetic expressions.

x*2+100

## [1] 108 110

Common arithmetic operators are +, -, *, /, and ^ (where x^2 means ‘x squared’). To take the square root, use the sqrt() function, and to take the absolute value, use the abs() function.

sqrt(144)

## [1] 12

abs(4-5)

## [1] 1

When given two vectors of the same length, R simply performs the specified arithmetic operation element-by-element. If the vectors are of different lengths, R ‘recycles’ the shorter vector until it is the same length as the longer vector.

c(1,2,3)+c(0,1,2)

## [1] 1 3 5

c(1,2,3,4)+c(100,101)

## [1] 101 103 103 105
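
As a hedged illustration of recycling, the sketch below makes explicit what R does implicitly by building the recycled vector by hand with rep(). The names long, short, and recycled are illustrative and not part of the original lesson.

```r
# Illustrative names; not from the original lesson.
long  <- c(1, 2, 3, 4)
short <- c(100, 101)

# Make the recycling explicit: repeat `short` until it matches `long`.
recycled <- rep(short, length.out = length(long))
recycled          # 100 101 100 101

# Implicit and explicit recycling give the same result.
long + short      # 101 103 103 105
long + recycled   # 101 103 103 105
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning.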

In many programming environments, the up arrow will cycle through previous commands.

2 Sequences of Numbers

The simplest way to create a sequence of numbers in R is by using the : operator.

1:20

##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

pi:10 # sequence of real numbers

## [1] 3.141593 4.141593 5.141593 6.141593 7.141593 8.141593 9.141593

10:pi # sequence of integers

## [1] 10  9  8  7  6  5  4

15:1 # counts backwards in steps of 1; worth knowing, because a:b runs downward whenever a > b

##  [1] 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1

If you have questions about a particular R function, you can access its documentation with a question mark followed by the function name: ?function_name_here. However, in the case of an operator like the colon used above, you must enclose the symbol in backticks like this: ?`:`.

?`:`

seq(1,20) # This gives us the same output as 1:20.

##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

seq() also accepts a by argument that sets the increment of the sequence. Here we increment by 0.5.

seq(1,10,by=.5)

##  [1]  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5  8.0
## [16]  8.5  9.0  9.5 10.0

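If we don't care about the increment and just want a sequence of 30 evenly spaced numbers between 5 and 10, we can use the length argument (short for length.out) instead.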

seq(5,10,length=30)

##  [1]  5.000000  5.172414  5.344828  5.517241  5.689655  5.862069  6.034483
##  [8]  6.206897  6.379310  6.551724  6.724138  6.896552  7.068966  7.241379
## [15]  7.413793  7.586207  7.758621  7.931034  8.103448  8.275862  8.448276
## [22]  8.620690  8.793103  8.965517  9.137931  9.310345  9.482759  9.655172
## [29]  9.827586 10.000000
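
The spacing in the output above can be checked by hand. The following is a minimal sketch, assuming the helper names n, s, and step (none of which appear in the original): with n points there are n - 1 gaps, so the increment should be (10 - 5) / (n - 1).

```r
n <- 30
s <- seq(5, 10, length.out = n)   # `length` in the text partially matches `length.out`

# With n points there are n - 1 gaps, so the step is (10 - 5) / (n - 1).
step <- (10 - 5) / (n - 1)
step                                   # about 0.1724138

length(s)                              # 30
all.equal(diff(s), rep(step, n - 1))   # TRUE
```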

Let’s create a vector of length 30.

x<-1:30
seq(along.with=x) # generates the integer sequence 1, 2, ..., length(x)

##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30

R also has dedicated built-in functions for generating such index sequences: seq_along() and seq_len().

seq_along(x)

##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30

seq_len(length(x))

##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30
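
One reason to prefer these functions over 1:length(x) is the backward-counting behavior of the : operator seen earlier. The sketch below, using a hypothetical empty vector named empty, shows the difference.

```r
empty <- numeric(0)

1:length(empty)          # 1 0  (counts backwards from 1 to 0)
seq_along(empty)         # integer(0)
seq_len(length(empty))   # integer(0)

# A loop over seq_along() therefore runs zero times for an empty vector.
iterations <- 0
for (i in seq_along(empty)) iterations <- iterations + 1
iterations               # 0
```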

One more function related to creating sequences of numbers is rep(), which stands for ‘replicate’.

rep(0,times=5)

## [1] 0 0 0 0 0

rep(c(1,2,3),times=3)

## [1] 1 2 3 1 2 3 1 2 3

rep(c(1,2,3),each=3) # to create a vector which contains 3 ones, then 3 twos, then 3 threes.

## [1] 1 1 1 2 2 2 3 3 3
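
The times and each arguments can also be combined or given as vectors. This is a small sketch with illustrative variable names (r1, r2, r3), not an exhaustive description of rep().

```r
# `times` can be a vector: repeat each element a different number of times.
r1 <- rep(c(1, 2, 3), times = c(3, 2, 1))
r1   # 1 1 1 2 2 3

# `each` is applied first, then `times` repeats the whole result.
r2 <- rep(c(1, 2, 3), each = 2, times = 2)
r2   # 1 1 2 2 3 3 1 1 2 2 3 3

# `length.out` cuts (or extends) the result to a fixed length.
r3 <- rep(c(1, 2, 3), length.out = 7)
r3   # 1 2 3 1 2 3 1
```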

3 Vectors

The simplest and most common data structure in R is the vector. Vectors come in two different flavors: atomic vectors and lists. An atomic vector contains exactly one data type, whereas a list may contain multiple data types.
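
To make the difference concrete, here is a minimal sketch with two illustrative objects, atomic and mixed (names chosen for this example only). Mixing types in c() forces a single common type, while list() keeps each element as it is.

```r
# c() builds an atomic vector, so mixed inputs are coerced to one type.
atomic <- c(1, "a", TRUE)
atomic            # "1"    "a"    "TRUE"
typeof(atomic)    # "character"

# list() keeps each element's own type.
mixed <- list(1, "a", TRUE)
sapply(mixed, class)   # "numeric"   "character" "logical"
```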

3.1 Logical vectors

Logical vectors can contain the values TRUE, FALSE, and NA.

x<-c(.5,.67,4,5,6)
y<-x>1
y # y is a logical vector. The statement x>1 is a condition and y tells us whether each corresponding element of our numeric vector x satisfies this condition.

## [1] FALSE FALSE  TRUE  TRUE  TRUE

If we have two logical expressions, A and B, we can ask whether at least one is TRUE with A | B (logical ‘or’ a.k.a. ‘union’) or whether they are both TRUE with A & B (logical ‘and’ a.k.a. ‘intersection’). Lastly, !A is the negation of A and is TRUE when A is FALSE and vice versa.

(3>5) & (1==1)

## [1] FALSE

(3>5) | (1==1)

## [1] TRUE

(3<5) & (1==1)

## [1] TRUE

(!3>5) & (1==1)

## [1] TRUE
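
The same operators work element-by-element on whole vectors. The sketch below reuses the numeric vector x from the start of this section; the particular conditions are chosen only for illustration.

```r
x <- c(.5, .67, 4, 5, 6)

x > 1 & x < 5      # FALSE FALSE  TRUE FALSE FALSE
x > 1 | x < 0.6    #  TRUE FALSE  TRUE  TRUE  TRUE
!(x > 1)           #  TRUE  TRUE FALSE FALSE FALSE

# `!` binds less tightly than `>`, so !3>5 is parsed as !(3 > 5).
identical(!3 > 5, !(3 > 5))   # TRUE
```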

3.2 Character vectors

a<-c('I','am','Joy')
a

## [1] "I"   "am"  "Joy"

length(a) # number of elements in a

## [1] 3

Let’s say we want to join the elements of a together into one continuous character string (i.e. a character vector of length 1). We can do this using the paste() function.

paste(a,collapse=' ') # collapse: an optional character string to separate the results

## [1] "I am Joy"

In this example, we used the paste() function to collapse the elements of a single character vector. paste() can also be used to join the elements of multiple character vectors.

paste('Happy','coding!',sep=' ')

## [1] "Happy coding!"

paste(1:3,c('X','Y','Z'),sep='') # sep = "" to leave no space between the joined elements.

## [1] "1X" "2Y" "3Z"

x<-LETTERS
paste(x,1:4,sep='-')

##  [1] "A-1" "B-2" "C-3" "D-4" "E-1" "F-2" "G-3" "H-4" "I-1" "J-2" "K-3" "L-4"
## [13] "M-1" "N-2" "O-3" "P-4" "Q-1" "R-2" "S-3" "T-4" "U-1" "V-2" "W-3" "X-4"
## [25] "Y-1" "Z-2"
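
In the last example the shorter vector 1:4 is recycled along the 26 letters, just as in arithmetic. As a further sketch (variable names p1, p2, p3 are illustrative), sep and collapse can be combined, and paste0() is a shorthand for sep = "".

```r
# sep joins corresponding elements; collapse then joins the results.
p1 <- paste(1:3, c("X", "Y", "Z"), sep = "-", collapse = "+")
p1   # "1-X+2-Y+3-Z"

# paste0() is shorthand for paste(..., sep = "").
p2 <- paste0(1:3, c("X", "Y", "Z"))
p2   # "1X" "2Y" "3Z"

# A length-1 vector is recycled against a longer one.
p3 <- paste("item", 1:3)
p3   # "item 1" "item 2" "item 3"
```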

4 Missing values

In R, NA is used to represent any value that is ‘not available’ or ‘missing’ (in the statistical sense). Any operation involving NA generally yields NA as the result.

x<-c(1,NA,2,3,NA)
x*100

## [1] 100  NA 200 300  NA

x+100

## [1] 101  NA 102 103  NA

The is.na() function tells us whether each element of a vector is NA.

y<-is.na(x)
y

## [1] FALSE  TRUE FALSE FALSE  TRUE

NA is not really a value, but just a placeholder for a quantity that is not available. For this reason, comparing with NA, as in x == NA, yields a vector of all NAs rather than TRUEs and FALSEs. Use is.na() to test for missing values instead.

x == NA

## [1] NA NA NA NA NA

Now that we have a vector, y, that has a TRUE for every NA and a FALSE for every numeric value, we can compute the total number of NAs in our data. R represents TRUE as the number 1 and FALSE as the number 0. Therefore, if we take the sum of a bunch of TRUEs and FALSEs, we get the total number of TRUEs, which is the total number of NAs.

sum(y)

## [1] 2
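
The same counting trick extends to proportions, and many summary functions accept na.rm = TRUE to skip missing values. A minimal sketch, reusing the vector x defined above:

```r
x <- c(1, NA, 2, 3, NA)

sum(is.na(x))          # 2   (number of missing values)
mean(is.na(x))         # 0.4 (proportion missing: 2 out of 5)

sum(x)                 # NA, because NA propagates
sum(x, na.rm = TRUE)   # 6
mean(x, na.rm = TRUE)  # 2
```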

Let’s look at a second type of missing value – NaN, which stands for ‘not a number’.

x<-0/0
x

## [1] NaN

1/0 #  In R, Inf stands for infinity.

## [1] Inf

A NaN value is also treated as NA (is.na() returns TRUE for it), but the converse isn’t true: is.nan() returns FALSE for NA.

m<-c(.5,.6,NA,0,NaN)
is.na(m)

## [1] FALSE FALSE  TRUE FALSE  TRUE

is.nan(m)

## [1] FALSE FALSE FALSE FALSE  TRUE
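
To separate ordinary numbers from Inf, NA, and NaN in one step, base R provides is.finite() and is.infinite(). The vector v below is a hypothetical example, not part of the original lesson.

```r
v <- c(1, Inf, -Inf, NA, NaN)

is.finite(v)     #  TRUE FALSE FALSE FALSE FALSE
is.infinite(v)   # FALSE  TRUE  TRUE FALSE FALSE

Inf - Inf        # NaN
-1 / 0           # -Inf
```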

5 Subsetting Objects

5.1 Basics

There are a number of operators that can be used to extract subsets of R objects.

- [ always returns an object of the same class as the original and can be used to select more than one element (with one exception: by default, extracting a single row, column, or element of a matrix returns a plain vector; see 5.3).
- [[ is used to extract a single element of a list or a data frame; the returned object is not necessarily a list or a data frame.
- $ is used to extract elements of a list or a data frame by name.

x<-c('a','b','c','d')
x[1] # numeric index

## [1] "a"

x[1:3] # numeric index

## [1] "a" "b" "c"

x[x>'b'] # logical index

## [1] "c" "d"

y<-c(10,12,7,8,14)
y[y>10] # logical index

## [1] 12 14
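
A logical index is just a logical vector of the same length as the object being subset. The sketch below stores it in an illustrative variable, keep, and also shows which() and a negative index as related ways to select elements.

```r
y <- c(10, 12, 7, 8, 14)

keep <- y > 10   # the logical index itself
keep             # FALSE  TRUE FALSE FALSE  TRUE

which(keep)      # 2 5  (positions of the TRUE values)
y[which(keep)]   # 12 14, same as y[keep]

y[-1]            # 12  7  8 14  (a negative index drops that element)
```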

5.2 Subsetting Lists

In this case, [, [[ or $ can all be used. Let’s create a list.

a<-list(foo=1:4,bar=.5)
a[1] # always returns a list since 'a' is a list

## $foo
## [1] 1 2 3 4

a[[1]]

## [1] 1 2 3 4

a$foo

## [1] 1 2 3 4

a['foo'] # always returns a list since 'a' is a list

## $foo
## [1] 1 2 3 4

a[['foo']]

## [1] 1 2 3 4

To extract multiple elements, we can only use [.

my_list<-list(a=c(2,3,4),b=.5,c='R')
my_list[c(2,3)]

## $b
## [1] 0.5
## 
## $c
## [1] "R"

name<-'b'
my_list[[name]]

## [1] 0.5

my_list$name # $ takes the name literally: there is no element called 'name', so the result is NULL

## NULL

my_list[[c(1,3)]] # to extract nested elements

## [1] 4

my_list[[1]][[3]]

## [1] 4
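
The differences between the three operators on lists can be summarized in a short sketch. It reuses my_list and name from above; the class() checks are added here for illustration.

```r
my_list <- list(a = c(2, 3, 4), b = .5, c = "R")

class(my_list["a"])     # "list"     ([ keeps the list wrapper)
class(my_list[["a"]])   # "numeric"  ([[ returns the element itself)

# [[ evaluates its argument, so a name stored in a variable works.
name <- "b"
my_list[[name]]         # 0.5

# $ takes the name literally and looks for an element called "name".
is.null(my_list$name)   # TRUE
```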

5.3 Subsetting Matrices

x<-matrix(1:6,nrow=2,ncol=3,byrow=TRUE)
x

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6

x[1,2] # to extract element of 1st row and 2nd column

## [1] 2

x[1,] # to extract 1st row

## [1] 1 2 3

x[,2] # to extract 2nd column

## [1] 2 5

x[1,2,drop=FALSE] # to get the output as a matrix

##      [,1]
## [1,]    2

x[1,,drop=FALSE] # to get the output as a matrix

##      [,1] [,2] [,3]
## [1,]    1    2    3

x[,2,drop=FALSE] # to get the output as a matrix

##      [,1]
## [1,]    2
## [2,]    5
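
What drop=FALSE changes is the dimension attribute of the result. A minimal sketch, using the same matrix x as above:

```r
x <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

dim(x[1, ])                 # NULL  (dimensions dropped: a plain vector)
dim(x[1, , drop = FALSE])   # 1 3   (still a 1 x 3 matrix)

is.matrix(x[, 2])                 # FALSE
is.matrix(x[, 2, drop = FALSE])   # TRUE
```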

5.4 Partial Matching

Partial matching of names is allowed with $ and, when exact=FALSE is set, with [[. It is useful when the object you’re working with has very long element names.

x<-list(aaarkk=.5,b=c('a','b'))
x$a

## [1] 0.5

x[['a']]

## NULL

x[['a',exact=FALSE]]

## [1] 0.5
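
Partial matching only succeeds when the prefix identifies exactly one name. The sketch below uses a hypothetical list, ambiguous, with two names sharing the prefix "aa" to show what happens otherwise.

```r
ambiguous <- list(aaarkk = .5, aab = 1)

ambiguous$aaa   # 0.5   ("aaa" matches only aaarkk)
ambiguous$aa    # NULL  ("aa" matches both names, so nothing is returned)
```

Because an ambiguous prefix yields NULL without raising an error, spelling names out in full is the safer habit in scripts.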

5.5 Removing NA Values

x<-c(1,NA,2,NA)
y<-c('a',NA,'b',NA)
good<-complete.cases(x,y) # Return a logical vector indicating which cases have no missing values.
good

## [1]  TRUE FALSE  TRUE FALSE

x[good]

## [1] 1 2

y[good]

## [1] "a" "b"
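
complete.cases() also accepts a data frame, in which case a row counts as complete only if none of its columns is NA. A small sketch with a made-up data frame, df_na, where the NAs fall in different rows:

```r
df_na <- data.frame(x = c(1, NA, 3, 4),
                    y = c("a", "b", NA, "d"))

good <- complete.cases(df_na)
good                   #  TRUE FALSE FALSE  TRUE

df_na[good, ]          # keeps rows 1 and 4
nrow(na.omit(df_na))   # 2
```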

6 Matrices

Matrices are vectors with a dimension attribute. Let’s create a matrix using the matrix() function.

x<-matrix(1:20,nrow=4,ncol=5,byrow=TRUE)
x

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    2    3    4    5
## [2,]    6    7    8    9   10
## [3,]   11   12   13   14   15
## [4,]   16   17   18   19   20

attributes(x)

## $dim
## [1] 4 5

Matrices can also be created directly from a vector.

y<-c(1,2,3,4,5,6)
dim(y)<-c(2,3)
y

##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
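
Note that the values above were filled column by column, whereas the earlier matrix() calls used byrow=TRUE. The following sketch, with illustrative names by_col and by_row, contrasts the two fill orders.

```r
by_col <- matrix(1:6, nrow = 2, ncol = 3)                 # default: fill by column
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)   # fill by row

by_col[, 1]   # 1 2
by_row[1, ]   # 1 2 3

# Setting dim() on a vector fills by column as well.
y <- 1:6
dim(y) <- c(2, 3)
all(y == by_col)   # TRUE
```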

m<-c('a','b','c')
n<-c('d','e','f')
cbind(m,n)

##      m   n  
## [1,] "a" "d"
## [2,] "b" "e"
## [3,] "c" "f"

rbind(m,n)

##   [,1] [,2] [,3]
## m "a"  "b"  "c" 
## n "d"  "e"  "f"

Matrices can only contain ONE class of data.

vector1<-1:6
vector2<-c('a','b','c','d','e','f')
cbind(vector1,vector2)

##      vector1 vector2
## [1,] "1"     "a"    
## [2,] "2"     "b"    
## [3,] "3"     "c"    
## [4,] "4"     "d"    
## [5,] "5"     "e"    
## [6,] "6"     "f"
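
The quotation marks in the output show that the numbers were coerced to character strings so that the matrix holds a single type. A minimal sketch confirming this, with the result stored in an illustrative variable m:

```r
vector1 <- 1:6
vector2 <- c("a", "b", "c", "d", "e", "f")
m <- cbind(vector1, vector2)

typeof(m)                    # "character"  (the integers were coerced)
m[, "vector1"]               # "1" "2" "3" "4" "5" "6"
as.integer(m[, "vector1"])   # 1 2 3 4 5 6
```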

7 Data Frames

Data frames can contain data of different classes.

names<-c('Bill', 'Gina', 'Kelly', 'Sean')
my_data<-matrix(1:20,nrow=4,ncol=5)
df<-data.frame(names,my_data)
df

##   names X1 X2 X3 X4 X5
## 1  Bill  1  5  9 13 17
## 2  Gina  2  6 10 14 18
## 3 Kelly  3  7 11 15 19
## 4  Sean  4  8 12 16 20

# use the colnames() function to set the column names of our data frame.
colnames(df)<-c('patient','age','weight','bp','rating','test')
df

##   patient age weight bp rating test
## 1    Bill   1      5  9     13   17
## 2    Gina   2      6 10     14   18
## 3   Kelly   3      7 11     15   19
## 4    Sean   4      8 12     16   20
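
Unlike a matrix, each column of a data frame keeps its own class. The sketch below builds a smaller, hypothetical data frame (three columns only, with stringsAsFactors = FALSE set explicitly so the result does not depend on the R version) and inspects it.

```r
patient <- c("Bill", "Gina", "Kelly", "Sean")
df <- data.frame(patient, age = 1:4, weight = 5:8,
                 stringsAsFactors = FALSE)

sapply(df, class)           # patient "character", age "integer", weight "integer"
df$age                      # 1 2 3 4
df[df$age > 2, "patient"]   # "Kelly" "Sean"
dim(df)                     # 4 3
```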

Use str() to compactly display the internal structure of an R object.

str(list(a=.5,b=c('X','Y','Z')))

## List of 2
##  $ a: num 0.5
##  $ b: chr [1:3] "X" "Y" "Z"

str(list(a=.5,b=c(1,2,3)))

## List of 2
##  $ a: num 0.5
##  $ b: num [1:3] 1 2 3

R Basics

Prottoy Kumar Prodhan Joy

1/21/2021