To print a sequence of consecutive integers in R, use the colon operator:
10:50
## [1] 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
## [26] 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
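The colon operator always steps by one. As a minimal sketch of sequences with other step sizes, base R's `seq()` can be used; the variable names `by_ten` and `quarters` are only illustrative.

```r
# Colon operator: consecutive integers only
10:15

# seq() with an explicit step size
by_ten <- seq(10, 50, by = 10)
by_ten

# seq() with a fixed number of evenly spaced values
quarters <- seq(0, 1, length.out = 5)
quarters
```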
All the basic calculator operations work here too.
2+3 #addition
## [1] 5
10/2 #division
## [1] 5
9/2 #unlike some other languages, division returns a floating-point number and not a truncated integer
## [1] 4.5
9%%2 #remainder
## [1] 1
9-2 #subtraction
## [1] 7
((2+5)/5) #BODMAS
## [1] 1.4
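Two more operators round out the calculator basics: integer division (`%/%`) and exponentiation (`^`). The short sketch below uses illustrative variable names and checks that the integer quotient and the remainder together rebuild the original number.

```r
int_quotient <- 9 %/% 2   # integer division: keeps only the whole part
remainder    <- 9 %% 2    # remainder of the same division
power        <- 2^10      # exponentiation

# quotient * divisor + remainder gives back the dividend
rebuilt <- int_quotient * 2 + remainder

int_quotient
power
rebuilt
```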
Objects are simply data containers: any piece of data can be stored in an object and retrieved later by name.
a<-10 #assigns value 10 to the object a. <- is the assignment operator in R
a #displays what's in the data container or the object
## [1] 10
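You can name an R object almost anything, provided a few constraints are followed: a name may contain letters, digits, dots, and underscores, it must start with a letter (or a dot not followed by a digit), and it cannot be a reserved word such as TRUE or if. Names are also case sensitive, so a and A below are two different objects.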
a<-10
A<-20
a+A
## [1] 30
Say you want to go back to your school days and print the multiplication table of 20. R makes this very simple. The rest of the example shows how to add a number to, or subtract a number from, an entire sequence. If the objects have different lengths, the shorter one is reused in a round-robin fashion (R calls this recycling).
a<-1:10
b<-a*20
c<-b+1
d<-c-1
a #1:10
## [1] 1 2 3 4 5 6 7 8 9 10
b #a*20
## [1] 20 40 60 80 100 120 140 160 180 200
c #b+1
## [1] 21 41 61 81 101 121 141 161 181 201
d #c-1
## [1] 20 40 60 80 100 120 140 160 180 200
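The round-robin reuse of the shorter object can be made explicit with `rep_len()`. This is only a sketch with made-up vectors `long_vec` and `short_vec`; it shows that adding a short vector gives the same result as first stretching it to the length of the longer one.

```r
long_vec  <- 1:6
short_vec <- c(10, 20)

# R recycles short_vec as 10 20 10 20 10 20
recycled_sum <- long_vec + short_vec

# The same thing, with the recycling written out by hand
explicit_sum <- long_vec + rep_len(short_vec, length(long_vec))

recycled_sum
identical(recycled_sum, explicit_sum)
```

Because 6 is a multiple of 2, no warning is raised here; a warning appears only when the longer length is not a multiple of the shorter one.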
This tells us that R performs element-wise operations:
a<-0:10
b<-10:20
c<-30:35
a+b
## [1] 10 12 14 16 18 20 22 24 26 28 30
a*b
## [1] 0 11 24 39 56 75 96 119 144 171 200
a+c
## Warning in a + c: longer object length is not a multiple of shorter object
## length
## [1] 30 32 34 36 38 40 36 38 40 42 44
d<-1:2
a+d
## Warning in a + d: longer object length is not a multiple of shorter object
## length
## [1] 1 3 3 5 5 7 7 9 9 11 11
#Inner Multiplication - Dot Product
a%*%b
## [,1]
## [1,] 935
#Outer Multiplication - Outer Product (every element of a times every element of b)
a%o%b
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
## [1,] 0 0 0 0 0 0 0 0 0 0 0
## [2,] 10 11 12 13 14 15 16 17 18 19 20
## [3,] 20 22 24 26 28 30 32 34 36 38 40
## [4,] 30 33 36 39 42 45 48 51 54 57 60
## [5,] 40 44 48 52 56 60 64 68 72 76 80
## [6,] 50 55 60 65 70 75 80 85 90 95 100
## [7,] 60 66 72 78 84 90 96 102 108 114 120
## [8,] 70 77 84 91 98 105 112 119 126 133 140
## [9,] 80 88 96 104 112 120 128 136 144 152 160
## [10,] 90 99 108 117 126 135 144 153 162 171 180
## [11,] 100 110 120 130 140 150 160 170 180 190 200
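To connect the two operators back to element-wise arithmetic, here is a small sketch using the same vectors: the inner product equals the sum of the element-wise products, and `%o%` is shorthand for base R's `outer()`.

```r
a <- 0:10
b <- 10:20

# Inner product: a 1 x 1 matrix holding sum(a * b)
inner <- a %*% b
sum(a * b)

# Outer product: an 11 x 11 matrix; %o% and outer() are equivalent
outer_ab <- a %o% b
dim(outer_ab)
identical(outer_ab, outer(a, b))
```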
Let us write some functions in R. Here is the function definition:
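get_fibonacci<-function(limit){
  a=0
  b=1
  print(a) #print(a,b) would print only a, since b would be taken as the digits argument
  print(b)
  while(a+b<=limit){
    c=a+b #the next term is the sum of the previous two
    print(c)
    a=b
    b=c
  }
}
Here is the function call:
get_fibonacci(100)
## [1] 0
## [1] 1
## [1] 1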
## [1] 2
## [1] 3
## [1] 5
## [1] 8
## [1] 13
## [1] 21
## [1] 34
## [1] 55
## [1] 89
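Printing inside the loop is fine for a demo, but the values cannot be reused afterwards. As a hedged alternative, the sketch below collects the terms in a vector and returns it. The name `fibonacci_upto` is hypothetical, and the function assumes `limit` is at least 1.

```r
# Return all Fibonacci numbers up to and including `limit` (assumes limit >= 1)
fibonacci_upto <- function(limit) {
  values <- c(0, 1)
  repeat {
    n <- length(values)
    next_value <- values[n - 1] + values[n]  # sum of the last two terms
    if (next_value > limit) break
    values <- c(values, next_value)
  }
  values
}

fib <- fibonacci_upto(100)
fib
```

Growing a vector with `c()` is acceptable for a handful of terms; for long sequences, preallocating would be the more usual choice.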
What if we do not pass an argument to the function? To handle that case, the parameter needs a default value, which is used whenever the caller omits it. The function with a default value is shown below:
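get_fibonacci<-function(limit=20){
  a=0
  b=1
  print(a) #print(a,b) would print only a, since b would be taken as the digits argument
  print(b)
  while(a+b<=limit){
    c=a+b #the next term is the sum of the previous two
    print(c)
    a=b
    b=c
  }
}
Here is the function call:
get_fibonacci()
## [1] 0
## [1] 1
## [1] 1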
## [1] 2
## [1] 3
## [1] 5
## [1] 8
## [1] 13
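Default values combine naturally with named arguments. The following is a minimal sketch with a hypothetical function `scale_by` and a hypothetical parameter `multiplier`; it shows the default being used, being overridden, and arguments being passed by name in any order.

```r
# multiplier defaults to 2 when the caller does not supply it
scale_by <- function(x, multiplier = 2) {
  x * multiplier
}

doubled  <- scale_by(1:3)                   # uses the default
tenfold  <- scale_by(1:3, multiplier = 10)  # overrides the default
reversed <- scale_by(multiplier = 3, x = 1:3)  # named arguments, any order

doubled
tenfold
reversed
```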
Install a package by calling install.packages("shiny"). That alone is not enough: the package must also be loaded in each session with library(shiny). The examples below load the ggplot2 package in the same way.
library(ggplot2)
# To set aesthetics, wrap in I()
qplot(mpg, wt, data = mtcars, colour = I("red"))
# qplot will attempt to guess what geom you want depending on the input
# both x and y supplied = scatterplot
qplot(mpg, wt, data = mtcars)
# just x supplied = histogram
qplot(mpg, data = mtcars)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# just y supplied = scatterplot, with x = seq_along(y)
qplot(y = mpg, data = mtcars)
# Use different geoms
qplot(mpg, wt, data = mtcars, geom = "path")
qplot(factor(cyl), wt, data = mtcars, geom = c("boxplot", "jitter"))
qplot(mpg, data = mtcars, geom = "dotplot")
## `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.
#Plotting a histogram with binwidth
x3<-c(0,1,1,2,2,2,3,3,4)
qplot(x=x3,binwidth=1)
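Recent ggplot2 releases mark `qplot()` as deprecated, so it may be worth knowing the longer `ggplot()` form as well. The sketch below reproduces the red scatterplot from the first example; it assumes ggplot2 is installed and simply skips the plot otherwise.

```r
# Only run the example if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # Equivalent of qplot(mpg, wt, data = mtcars, colour = I("red"))
  p <- ggplot(mtcars, aes(x = mpg, y = wt)) +
    geom_point(colour = "red")
  print(p)
}
```

Here the colour is set as a fixed value inside `geom_point()`, which is why no `I()` wrapper is needed.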
The most common attributes to give an atomic vector are names, dimensions, and classes.
x<-c(1,2,3,4,5,6) #Create a vector
names(x)<-c("one","two","three","four","five","six") #Assign names to the vector
dim(x)<-c(2,3) #Define dimensions of the vector; this turns it into a matrix and drops the names
names(x) #Display names of the vector x (NULL, because setting dim removed them)
## NULL
x #Display the vector x
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
class(x) #Display the class of vector x (R 4.0.0 and later also report "array")
## [1] "matrix"
dim(x)<-NULL #Nullify the dimensions
x #Display the vector x
## [1] 1 2 3 4 5 6
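Since setting `dim` discards `names`, a matrix is usually labelled through `dimnames` instead. The sketch below, with made-up row and column labels, sets them and then inspects every attribute at once with `attributes()`.

```r
x <- c(1, 2, 3, 4, 5, 6)
dim(x) <- c(2, 3)   # values fill the matrix column by column

# Label rows and columns (labels are illustrative)
dimnames(x) <- list(c("r1", "r2"), c("a", "b", "c"))

attributes(x)       # shows both dim and dimnames
picked <- x["r2", "b"]   # look up a cell by its labels
picked
```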
A data frame is simple to build. The thing to remember is that it is organized column by column: each column is a vector, every column has a label (the column name), and different columns can hold different data types.
df <- data.frame(face = c("ace", "two", "six"),
suit = c("clubs", "clubs", "clubs"), value = c(1, 2, 3))
Every data frame is a list with the class data.frame. R also loves factors: in versions of R before 4.0.0, data.frame() converts every character column to a factor by default, as the output below shows. Can we avoid this?
typeof(df)
## [1] "list"
class(df)
## [1] "data.frame"
str(df)
## 'data.frame': 3 obs. of 3 variables:
## $ face : Factor w/ 3 levels "ace","six","two": 1 3 2
## $ suit : Factor w/ 1 level "clubs": 1 1 1
## $ value: num 1 2 3
Yes, and here is how we do it:
df <- data.frame(face = c("ace", "two", "six"),
suit = c("clubs", "clubs", "clubs"), value = c(1, 2, 3),stringsAsFactors = FALSE)
str(df)
## 'data.frame': 3 obs. of 3 variables:
## $ face : chr "ace" "two" "six"
## $ suit : chr "clubs" "clubs" "clubs"
## $ value: num 1 2 3
It's also useful to set stringsAsFactors = FALSE when reading data from a .csv or .txt file with read.table or read.csv.
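As a small illustration of the same argument with `read.csv`, the sketch below reads CSV content from an in-memory string through the `text` argument, so no file is needed. The names `csv_text` and `cards` are only for this example.

```r
# CSV content kept in a string so the example is self-contained
csv_text <- "face,suit,value
ace,clubs,1
two,clubs,2
six,clubs,3"

# Keep text columns as character vectors rather than factors
cards <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(cards)
```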
As we go deeper into the data frames, let us learn some essential functions in the same context:
head(iris) #returns the first six rows of the dataset, with column names
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
attach(iris)
iris[5:10,1:5] #subset a dataframe, get rows 5-10 and columns 1:5
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5.0 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 10 4.9 3.1 1.5 0.1 setosa
iris[5:10,] #leaving the column position empty returns all the columns, hence the same result
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5.0 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 10 4.9 3.1 1.5 0.1 setosa
iris[10,c(1,2)] #it is same as iris[10,1:2] - fetch row 10 and cols 1 and 2
## Sepal.Length Sepal.Width
## 10 4.9 3.1
iris[10,c(1,5)] #fetch row 10 and cols 1 and 5. This is non-sequential subsetting
## Sepal.Length Species
## 10 4.9 setosa
Negative indexing in R uses the same syntax as positive indexing. The difference is that positive indexing means include (fetch these elements), whereas negative indexing means exclude (fetch everything except these elements).
iris[-(2:148),-(2:3)] # get every row except from 2 to 148 and every column except 2 and 3
## Sepal.Length Petal.Width Species
## 1 5.1 0.2 setosa
## 149 6.2 2.3 virginica
## 150 5.9 1.8 virginica
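The same exclusion rule applies to plain vectors, where it is easier to see. A brief sketch with a made-up vector `v`:

```r
v <- c(10, 20, 30, 40, 50)

without_first  <- v[-1]          # drop the first element
without_middle <- v[-(2:4)]      # drop elements 2 through 4
without_last   <- v[-length(v)]  # drop the last element

without_first
without_middle
without_last
```

Note the parentheses in `-(2:4)`: without them, `-2:4` would mean the sequence from -2 to 4, which mixes negative and positive indices and is an error.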
There is one more way to subset. Say we want just the last two of the five columns in the iris data frame: we can provide a logical (boolean) vector to do this. The length of the logical vector should equal the number of columns in the data frame.
iris[1:5,c(FALSE,FALSE,FALSE,TRUE,TRUE)] # get rows 1 to 5 and just the last two columns
## Petal.Width Species
## 1 0.2 setosa
## 2 0.2 setosa
## 3 0.2 setosa
## 4 0.2 setosa
## 5 0.2 setosa
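Logical vectors are rarely typed out by hand; more often they come from a comparison. As a sketch, the following keeps only the rows whose sepal length exceeds an arbitrarily chosen threshold of 7.5 (the names `long_sepals` and `big` are illustrative).

```r
# One TRUE/FALSE per row of iris
long_sepals <- iris$Sepal.Length > 7.5

# Use the logical vector in the row position to filter rows
big <- iris[long_sepals, c("Sepal.Length", "Species")]
big
```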
That's not all. Remember the names attribute discussed in the earlier section: you can also subset a data frame by column name. Here is how:
iris[1:5,c("Sepal.Length","Sepal.Width","Species")] # get rows 1 to 5 and just the three named columns
## Sepal.Length Sepal.Width Species
## 1 5.1 3.5 setosa
## 2 4.9 3.0 setosa
## 3 4.7 3.2 setosa
## 4 4.6 3.1 setosa
## 5 5.0 3.6 setosa
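For a single column, base R offers two shorter name-based forms, `$` and `[[`. The sketch below shows that both give the same vector as bracket subsetting by name.

```r
by_dollar  <- iris$Species          # column by name with $
by_double  <- iris[["Species"]]     # column by name with [[
by_bracket <- iris[, "Species"]     # column by name with [ , ]

head(by_dollar)
identical(by_dollar, by_double)
identical(by_dollar, by_bracket)
```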
In some programming languages, indexing begins with 0. This means that 0 returns the first element of a vector, 1 returns the second element, and so on. R is different: indexing begins with 1, so 1 returns the first element.
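A quick sketch of how R treats the first index, using a made-up vector `v`: index 1 gives the first element, while index 0 is not an error but returns an empty vector.

```r
v <- c(10, 20, 30)

first_elem <- v[1]          # the first element
last_elem  <- v[length(v)]  # the last element
zero_index <- v[0]          # an empty numeric vector, not an error

first_elem
last_elem
length(zero_index)
```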
If you select two or more columns from a data frame, R will return a new data frame. However, if you select a single column, R will return a vector. If you would prefer a data frame instead, you can add the optional argument drop = FALSE between the brackets.
class(iris[5:10,1:5])
## [1] "data.frame"
class(iris[5:10,1])
## [1] "numeric"
class(iris[5:10,1,drop=FALSE])
## [1] "data.frame"