R Programming - Playbook

Basics

1. Printing a sequence

To print a sequence of consecutive integers in R, use the colon operator. It is that simple.

10:50

##  [1] 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
## [26] 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
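The colon operator always steps by 1. As a small sketch of other ways to build a sequence, the base R functions seq() and rev() can be used; the variable names below are only illustrative.

```r
# seq() lets you choose the step size or the number of points
tens <- seq(10, 50, by = 10)          # 10 20 30 40 50
quarters <- seq(0, 1, length.out = 5) # 0.00 0.25 0.50 0.75 1.00

# rev() reverses a sequence; 5:1 gives the same result
countdown <- rev(1:5)                 # 5 4 3 2 1

print(tens)
print(quarters)
print(countdown)
```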

2. Mathematical Symbols

Feel free to apply all the calculator basics here too.

2+3   #addition

## [1] 5

10/2  #division

## [1] 5

9/2   #unlike some other languages, division returns a float and not an integer

## [1] 4.5

9%%2  #remainder

## [1] 1

9-2   #subtraction

## [1] 7

((2+5)/5) #BODMAS

## [1] 1.4
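A few more arithmetic operators round out the calculator basics. This is a short sketch using base R only; the variable names are illustrative.

```r
int_div <- 9 %/% 2     # integer division: 4
power <- 2^10          # exponentiation: 1024
root <- sqrt(16)       # square root: 4
neg_mod <- (-7) %% 3   # remainder takes the sign of the divisor: 2

print(c(int_div, power, root, neg_mod))
```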

3. Objects

Objects are nothing but named data containers: any value can be stored in an object and retrieved later by its name.

a<-10 #assigns value 10 to the object a. <- is the assignment operator in R
a #displays what's in the data container or the object

## [1] 10

4. Naming an object

You can name an R object almost anything, provided these constraints are followed:

Do not begin an object name with a number.
Do not use special characters such as ^, !, $, @, +, -, /, or *.
R is case-sensitive, so a and A are different objects. Here is an example:

a<-10
A<-20
a+A

## [1] 30
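As a quick sketch of the naming rules, the base R function make.names() shows how R itself repairs an invalid name. The object names used here are hypothetical.

```r
# Letters, digits, dots and underscores are all allowed in a name
my_value <- 1
value.2 <- 2

# make.names() turns arbitrary strings into syntactically valid names
fixed <- make.names(c("2nd", "total$", "ok_name"))
print(fixed)   # "X2nd" "total." "ok_name"
```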

6. Mathematical operation on a sequence

Say you wish to go back to your school days and print the multiplication table of 20. R makes it very simple. The other examples here show how to add a number to, or subtract a number from, an entire sequence. If the objects have different lengths, the shorter one is reused from its start, in round-robin fashion, until it matches the length of the longer one.

a<-1:10
b<-a*20
c<-b+1
d<-c-1
a #1:10

##  [1]  1  2  3  4  5  6  7  8  9 10

b #a*20

##  [1]  20  40  60  80 100 120 140 160 180 200

c #b+1

##  [1]  21  41  61  81 101 121 141 161 181 201

d #c-1

##  [1]  20  40  60  80 100 120 140 160 180 200

This tells us that R performs element-wise operations.

a<-0:10
b<-10:20
c<-30:35
a+b

##  [1] 10 12 14 16 18 20 22 24 26 28 30

a*b

##  [1]   0  11  24  39  56  75  96 119 144 171 200

a+c

## Warning in a + c: longer object length is not a multiple of shorter object
## length

##  [1] 30 32 34 36 38 40 36 38 40 42 44

d<-1:2
a+d

## Warning in a + d: longer object length is not a multiple of shorter object
## length

##  [1]  1  3  3  5  5  7  7  9  9 11 11
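To make the recycling explicit, here is a minimal sketch that stretches the shorter vector by hand with rep_len() and compares the result with what R does automatically. The names d_recycled, manual_sum and auto_sum are illustrative.

```r
a <- 0:10
d <- 1:2

# Repeat d until it is as long as a: 1 2 1 2 1 2 1 2 1 2 1
d_recycled <- rep_len(d, length(a))

manual_sum <- a + d_recycled
auto_sum <- suppressWarnings(a + d)  # same arithmetic, warning silenced

print(identical(manual_sum, auto_sum))  # TRUE
```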

#Inner product (dot product), computed with the matrix multiplication operator
a%*%b

##      [,1]
## [1,]  935

#Outer product: every element of a multiplied by every element of b
a%o%b

##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
##  [1,]    0    0    0    0    0    0    0    0    0     0     0
##  [2,]   10   11   12   13   14   15   16   17   18    19    20
##  [3,]   20   22   24   26   28   30   32   34   36    38    40
##  [4,]   30   33   36   39   42   45   48   51   54    57    60
##  [5,]   40   44   48   52   56   60   64   68   72    76    80
##  [6,]   50   55   60   65   70   75   80   85   90    95   100
##  [7,]   60   66   72   78   84   90   96  102  108   114   120
##  [8,]   70   77   84   91   98  105  112  119  126   133   140
##  [9,]   80   88   96  104  112  120  128  136  144   152   160
## [10,]   90   99  108  117  126  135  144  153  162   171   180
## [11,]  100  110  120  130  140  150  160  170  180   190   200
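The two products can be cross-checked against simpler expressions. This is a small sketch using only base R: sum(a * b) reproduces the dot product, and outer() is the function behind the %o% operator.

```r
a <- 0:10
b <- 10:20

dot <- sum(a * b)        # element-wise product, then sum: 935
outer_ab <- outer(a, b)  # 11 x 11 matrix, same as a %o% b

print(dot)
print(dim(outer_ab))     # 11 11
```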

7. Functions in R

Let us write some functions in R. Here is the function definition:

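get_fibonacci<-function(limit){
  a<-0
  b<-1
  print(a) #print the first two terms separately; print(a,b) would treat b as the digits argument
  print(b)
  while(a+b<=limit){
    c<-a+b
    print(c)
    a<-b
    b<-c
  }
}

Here is the function call:

get_fibonacci(100)

## [1] 0
## [1] 1
## [1] 1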
## [1] 2
## [1] 3
## [1] 5
## [1] 8
## [1] 13
## [1] 21
## [1] 34
## [1] 55
## [1] 89

What if we call the function without passing an argument? For that to work, the parameter needs a default value. The version with a default is shown below:

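get_fibonacci<-function(limit=20){
  a<-0
  b<-1
  print(a) #print the first two terms separately; print(a,b) would treat b as the digits argument
  print(b)
  while(a+b<=limit){
    c<-a+b
    print(c)
    a<-b
    b<-c
  }
}

Here is the function call:

get_fibonacci()

## [1] 0
## [1] 1
## [1] 1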
## [1] 2
## [1] 3
## [1] 5
## [1] 8
## [1] 13
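Printing inside a function is fine for a demo, but returning a value is usually more useful because the caller can store and reuse it. The following is a minimal sketch of that idea; fib_upto is a hypothetical name, not part of the original playbook.

```r
# Hypothetical helper: collect the Fibonacci numbers up to `limit` in a vector
fib_upto <- function(limit = 20) {
  fib <- c(0, 1)
  repeat {
    nxt <- sum(tail(fib, 2))   # next term = sum of the last two terms
    if (nxt > limit) break
    fib <- c(fib, nxt)
  }
  fib   # the last expression is the return value
}

print(fib_upto())      # 0 1 1 2 3 5 8 13
print(fib_upto(100))
```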

8. Package Theory

Install a package once by calling install.packages("shiny"). Installing is not enough: every session that uses the package must also load it with library(shiny).
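A common pattern is to install a package only when it is missing and then load it. The sketch below assumes a helper called ensure_package, which is a made-up name; requireNamespace(), install.packages() and library() are base R functions. The example call uses "stats", which ships with R, so nothing is downloaded.

```r
# Hypothetical helper: install a package only if it is missing, then load it
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)  # pkg holds the name as a string
  invisible(TRUE)
}

ensure_package("stats")  # already installed, so this only loads it
```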

9. Plots

library(ggplot2)
# To set aesthetics, wrap in I()
qplot(mpg, wt, data = mtcars, colour = I("red"))

# qplot will attempt to guess what geom you want depending on the input
# both x and y supplied = scatterplot
qplot(mpg, wt, data = mtcars)

# just x supplied = histogram
qplot(mpg, data = mtcars)

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# just y supplied = scatterplot, with x = seq_along(y)
qplot(y = mpg, data = mtcars)

# Use different geoms
qplot(mpg, wt, data = mtcars, geom = "path")

qplot(factor(cyl), wt, data = mtcars, geom = c("boxplot", "jitter"))

qplot(mpg, data = mtcars, geom = "dotplot")

## `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.

#Plotting a histogram with binwidth
x3<-c(0,1,1,2,2,2,3,3,4)
qplot(x=x3,binwidth=1)

10. Attributes of atomic vectors

The most common attributes to give an atomic vector are names, dimensions, and classes.

x<-c(1,2,3,4,5,6) #Create a vector
names(x)<-c("one","two","three","four","five","six") #Assign names to the elements of the vector
dim(x)<-c(2,3) #Set dimensions; this turns x into a 2x3 matrix and drops its names
names(x) #Display names of x: NULL, because setting dim removed them

## NULL

x #Display the vector x

##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

class(x) #Display the class of x; R 4.0.0 and later print "matrix" "array"

## [1] "matrix"

dim(x)<-NULL #Nullify the dimensions
x #Display the vector x

## [1] 1 2 3 4 5 6
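Once a vector has dimensions, row and column labels live in the dimnames attribute rather than in names. Here is a short sketch; the labels r1, r2, a, b and c are made up for illustration.

```r
m <- 1:6
dim(m) <- c(2, 3)                                     # fill column by column
dimnames(m) <- list(c("r1", "r2"), c("a", "b", "c"))  # hypothetical labels

print(attributes(m))   # lists both dim and dimnames
print(m["r2", "b"])    # 4
```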

11. Creating a Data Frame

A data frame is built column by column: each argument to data.frame() is a vector, and its label becomes the column name. The columns may hold different data types.

df <- data.frame(face = c("ace", "two", "six"),
      suit = c("clubs", "clubs", "clubs"), value = c(1, 2, 3))

Every data frame is a list of vectors with the class data.frame. R also loves factors: before R 4.0.0, data.frame() converted every character column to a factor by default, and the output below comes from such a version. Can we avoid this?

typeof(df)

## [1] "list"

class(df)

## [1] "data.frame"

str(df)

## 'data.frame':    3 obs. of  3 variables:
##  $ face : Factor w/ 3 levels "ace","six","two": 1 3 2
##  $ suit : Factor w/ 1 level "clubs": 1 1 1
##  $ value: num  1 2 3

Yes, and here is how we do it:

df <- data.frame(face = c("ace", "two", "six"),
      suit = c("clubs", "clubs", "clubs"), value = c(1, 2, 3),stringsAsFactors = FALSE)
str(df)

## 'data.frame':    3 obs. of  3 variables:
##  $ face : chr  "ace" "two" "six"
##  $ suit : chr  "clubs" "clubs" "clubs"
##  $ value: num  1 2 3

It's also useful to set stringsAsFactors = FALSE when reading data from a .csv or .txt file with read.table or read.csv.
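The same argument can be tried without a file on disk, because read.csv() accepts its input through the text argument. This is a minimal sketch; the inline CSV content and the names csv_text and cards are assumptions made for the example.

```r
# Hypothetical CSV content supplied inline instead of a file path
csv_text <- "face,suit,value
ace,clubs,1
two,clubs,2
six,clubs,3"

cards <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(cards)   # face and suit are read as character columns
```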

12. Deep dive into Data Frames

As we go deeper into the data frames, let us learn some essential functions in the same context:

head(iris) #returns the first six rows of the dataset, with column names

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

attach(iris)
iris[5:10,1:5] #subset a dataframe, get rows 5-10 and columns 1:5

##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 5           5.0         3.6          1.4         0.2  setosa
## 6           5.4         3.9          1.7         0.4  setosa
## 7           4.6         3.4          1.4         0.3  setosa
## 8           5.0         3.4          1.5         0.2  setosa
## 9           4.4         2.9          1.4         0.2  setosa
## 10          4.9         3.1          1.5         0.1  setosa

iris[5:10,] #leaving the column position empty returns all the columns, hence the same result

##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 5           5.0         3.6          1.4         0.2  setosa
## 6           5.4         3.9          1.7         0.4  setosa
## 7           4.6         3.4          1.4         0.3  setosa
## 8           5.0         3.4          1.5         0.2  setosa
## 9           4.4         2.9          1.4         0.2  setosa
## 10          4.9         3.1          1.5         0.1  setosa

iris[10,c(1,2)] #it is same as iris[10,1:2] - fetch row 10 and cols 1 and 2

##    Sepal.Length Sepal.Width
## 10          4.9         3.1

iris[10,c(1,5)] #fetch row 10 and cols 1 and 5. This is non-sequential subsetting

##    Sepal.Length Species
## 10          4.9  setosa

Negative indexing in R uses the same syntax as positive indexing. The difference is that positive indices mean inclusion (fetch these), whereas negative indices mean exclusion (fetch everything except these).

iris[-(2:148),-(2:3)] # get every row except from 2 to 148 and every column except 2 and 3

##     Sepal.Length Petal.Width   Species
## 1            5.1         0.2    setosa
## 149          6.2         2.3 virginica
## 150          5.9         1.8 virginica
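Negative indexing works the same way on a plain vector, which makes the behaviour easy to see. A small sketch with a made-up vector v:

```r
v <- c(10, 20, 30, 40, 50)

without_first <- v[-1]          # 20 30 40 50
ends_only <- v[-(2:4)]          # 10 50
without_last <- v[-length(v)]   # 10 20 30 40

print(without_first)
print(ends_only)
print(without_last)
```

Positive and negative indices cannot be mixed in one call: v[c(-1, 2)] raises an error.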

There is one more way to subset. Say we wish to keep just the last two of the five columns in the iris data frame: we can provide a logical (boolean) vector to get this done. The logical vector should have one entry per column of the data frame; a shorter vector is recycled, which is rarely what you want.

iris[1:5,c(FALSE,FALSE,FALSE,TRUE,TRUE)] # get rows 1 to 5 and just the last two columns

##   Petal.Width Species
## 1         0.2  setosa
## 2         0.2  setosa
## 3         0.2  setosa
## 4         0.2  setosa
## 5         0.2  setosa
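Logical vectors are most often produced by a comparison rather than typed out by hand, and they work on rows too. The sketch below filters rows by a condition; the 7.5 threshold and the names long_sepals and long_rows are arbitrary choices for the example.

```r
# One TRUE/FALSE per row: is the sepal longer than 7.5?
long_sepals <- iris$Sepal.Length > 7.5

# Keep only the matching rows, and two of the columns by name
long_rows <- iris[long_sepals, c("Sepal.Length", "Species")]
print(long_rows)
```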

And that's not all. Remember the names attribute from the earlier section? You can subset a data frame by column names as well. Here is how:

iris[1:5,c("Sepal.Length","Sepal.Width","Species")] # get rows 1 to 5 and the three named columns

##   Sepal.Length Sepal.Width Species
## 1          5.1         3.5  setosa
## 2          4.9         3.0  setosa
## 3          4.7         3.2  setosa
## 4          4.6         3.1  setosa
## 5          5.0         3.6  setosa
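For a single column, the $ and [[ ]] operators are shorter ways to select by name. A brief sketch, with illustrative variable names:

```r
species_a <- iris$Species        # by name with $
species_b <- iris[["Species"]]   # by name with [[ ]]

print(identical(species_a, species_b))  # TRUE
print(levels(species_a))                # "setosa" "versicolor" "virginica"
```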

In some programming languages, indexing begins with 0. This means that 0 returns the first element of a vector, 1 returns the second element, and so on. R is different: indexing begins with 1, so 1 returns the first element.
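A quick sketch of what this looks like in R itself, where counting starts at 1 and index 0 selects nothing. The vector v is made up for the example.

```r
v <- c("a", "b", "c")

first <- v[1]           # "a"
nothing <- v[0]         # character(0), an empty vector
last <- v[length(v)]    # "c"

print(first)
print(nothing)
print(last)
```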

If you select two or more columns from a data frame, R will return a new data frame. However, if you select a single column, R will return a vector. If you would prefer a data frame instead, you can add the optional argument drop = FALSE between the brackets.

class(iris[5:10,1:5])

## [1] "data.frame"

class(iris[5:10,1])

## [1] "numeric"

class(iris[5:10,1,drop=FALSE])

## [1] "data.frame"