This document was created by MAGM to better understand R syntax and .Rmd files. It starts with the fundamentals of the R programming language, but it is not aimed at people without any previous programming knowledge.
Markdown is a lightweight markup language with plain-text formatting syntax. Its design allows it to be converted to many output formats, but the original tool by the same name only supports HTML. Markdown is often used to format readme files, to write messages in online discussion forums, and to create rich text using a plain text editor.
Since the initial description of Markdown contained ambiguities and unanswered questions, the implementations that appeared over the years have subtle differences and many come with syntax extensions.
R Markdown combines Markdown (an easy-to-write plain text format) with embedded R code chunks. When compiling R Markdown documents, the code components can be evaluated so that both the code and its output can be included in the final document. This makes analysis reports highly reproducible, because they can be regenerated automatically when the underlying R code or data changes. R Markdown documents (.Rmd files) can be rendered to various formats including HTML and PDF. The R code in an .Rmd document is processed by knitr, while the resulting .md file is rendered by pandoc to the final output formats (e.g. HTML or PDF). Historically, R Markdown is an extension of the older Sweave/LaTeX environment. Rendering of mathematical expressions and reference management are also supported by R Markdown using embedded LaTeX syntax and BibTeX, respectively.
The rmarkdown package is required:
install.packages("rmarkdown")
The following lists the most important arguments to control the behavior of R code chunks:
- r: specifies the language of the code chunk, here R
- code_chunk_name: name of the code chunk; this name needs to be unique
- eval: if assigned TRUE, the code will be evaluated
Rmd script to html
The following shows how to run the rendering from the command line.
$ export RSTUDIO_PANDOC=/usr/lib/rstudio/bin/pandoc
$ Rscript -e "rmarkdown::render('sample.Rmd', clean=TRUE)"
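The first command exports the RSTUDIO_PANDOC environment variable, which tells rmarkdown where to find the pandoc binary. Because the variable is exported, it is defined for every R script started afterwards from the same shell session.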
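The same rendering can also be started from inside an R session. The following is a minimal sketch rather than part of the original workflow: the file contents, the chunk label code_chunk_name and the temporary path are illustrative assumptions, and rendering is skipped when rmarkdown or pandoc is not available.

```r
# Build a tiny .Rmd file in a temporary directory (contents are illustrative).
fence <- strrep("`", 3)  # three backticks, the chunk delimiter
rmd_lines <- c(
  "---",
  "title: \"Sample\"",
  "output: html_document",
  "---",
  "",
  "Some *Markdown* text.",
  "",
  paste0(fence, "{r code_chunk_name, eval=TRUE}"),
  "summary(cars)",
  fence
)
rmd_file <- file.path(tempdir(), "sample.Rmd")
writeLines(rmd_lines, rmd_file)

# Render only if the rmarkdown package and pandoc can be found.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  out_file <- rmarkdown::render(rmd_file, clean = TRUE, quiet = TRUE)
  print(out_file)  # path of the generated HTML file
}
```

Setting eval=FALSE in the chunk header would show the code in the output without running it.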
R is a programming language used for data science with the following advantages:
The last item means that you can go through an entire row or column of a table without having to explicitly write for loops.
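As a small illustration of working on a whole column without an explicit loop, the sketch below uses a made-up data frame (the column names and values are assumptions) and compares the vectorized expression with the equivalent for loop.

```r
# Hypothetical table of measurements (names and values are illustrative).
df <- data.frame(width = c(1, 2, 3), height = c(4, 5, 6))

# Vectorized arithmetic: the whole column is processed in one expression.
df$area <- df$width * df$height

# Equivalent explicit loop, shown only for comparison.
area_loop <- numeric(nrow(df))
for (i in seq_len(nrow(df))) {
  area_loop[i] <- df$width[i] * df$height[i]
}

print(df$area)                        # 4 10 18
print(identical(df$area, area_loop))  # TRUE
```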
To install a given package (module in Python) we run the following command
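install.packages("plotrix")
To uninstall it:
remove.packages("plotrix")
Comments are written as in Python, by prefixing them with a #. R does not support multiline comments, but a similar effect can be obtained by placing the text where it is never evaluated, for example as a quoted string inside an if (FALSE) { } block.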
A variable provides us with named storage that our programs can manipulate. R has rules for naming variables. Valid names include var_name2., .var_name and var.name. Invalid names include var% (only dots and underscores are allowed besides letters and numbers), .2var (a leading dot cannot be followed by a number) and _var (a name cannot start with an underscore).
Variables can be deleted by using the rm() function.
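The naming rules and rm() can be checked interactively. This sketch relies on the base function make.names(), which returns a name unchanged only if it is already syntactically valid; the variable tmp_value is a throwaway name chosen for the example.

```r
# A name is syntactically valid if make.names() leaves it unchanged.
candidates <- c("var_name2.", ".var_name", "var.name", "var%", ".2var", "_var")
is_valid <- make.names(candidates) == candidates
print(setNames(is_valid, candidates))

# rm() deletes a variable; exists() confirms that it is gone.
tmp_value <- 42
rm(tmp_value)
print(exists("tmp_value"))  # FALSE
```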
#### Built-in constants
The following constants are built into R:
constants = list(LETTERS, letters, month.abb, month.name, pi)
print(constants)
## [[1]]
## [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
## [20] "T" "U" "V" "W" "X" "Y" "Z"
##
## [[2]]
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
## [20] "t" "u" "v" "w" "x" "y" "z"
##
## [[3]]
## [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
##
## [[4]]
## [1] "January" "February" "March" "April" "May" "June"
## [7] "July" "August" "September" "October" "November" "December"
##
## [[5]]
## [1] 3.141593
A value can be assigned to a variable using <-, ->, assign() or =.
x <- 3
assign("y", 2)
a = (x-y)/2
a == a*2
## [1] FALSE
Boolean (or relational) operators are ==, !=, <, >, <= and >=.
In R, variables are not declared with a data type. A variable is assigned an R object, and the data type of that object becomes the data type of the variable. There are many types of R objects. The frequently used ones are:
Logical: TRUE or FALSE
v <- TRUE
print(class(v))
## [1] "logical"
Numeric: even if we assign a whole number to a variable, it is still stored as numeric.
v <- 23.8
print(class(v))
## [1] "numeric"
Integer: to create an integer variable in R, we use the as.integer() function or append the suffix L to the number.
v = as.integer(3)
v2 <- 23L
print(class(v))
## [1] "integer"
print(class(v2))
## [1] "integer"
Complex
v <- 23i
print(class(v))
## [1] "complex"
Character
v <- "Hello world"
print(class(v))
## [1] "character"
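The difference between numeric and integer values is easy to miss, so here is a short sketch that inspects the class of several literals and checks how as.integer() converts a decimal number. The list values is an illustrative name.

```r
# One literal of each basic type discussed above.
values <- list(TRUE, 23.8, 3, 3L, 23i, "Hello world")
types <- sapply(values, class)
print(types)

# A whole number without the L suffix is stored as numeric (double).
print(is.integer(3))   # FALSE
print(is.integer(3L))  # TRUE

# as.integer() truncates toward zero rather than rounding.
print(as.integer(3.9)) # 3
```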
When you want to create a vector with more than one element, you should use the c() function, which combines the elements into a vector.
# Create a vector.
apple <- c('red','green',"yellow")
print(apple)
## [1] "red"    "green"  "yellow"
# Get the class of the vector.
print(class(apple))
## [1] "character"
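Because all elements of a vector share one type, c() silently converts mixed inputs to a common type. The sketch below shows this coercion; the variable names are illustrative.

```r
# Mixing a number, a logical and a string yields a character vector.
mixed <- c(1, TRUE, "a")
print(mixed)         # "1" "TRUE" "a"
print(class(mixed))  # "character"

# Mixing numbers and logicals yields a numeric vector (TRUE = 1, FALSE = 0).
nums <- c(1, TRUE, FALSE)
print(nums)          # 1 1 0
print(class(nums))   # "numeric"
```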
A list is an R-object which can contain many different types of elements inside it like vectors, functions and even another list inside it.
# Create a list.
list1 <- list(c(2,5,3),21.3,sin)
# Print the list.
print(list1)
## [[1]]
## [1] 2 5 3
##
## [[2]]
## [1] 21.3
##
## [[3]]
## function (x) .Primitive("sin")
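Elements of a list are usually retrieved with double brackets, while single brackets return a shorter list. A brief sketch, reusing the list from the example above and adding a hypothetical named list:

```r
list1 <- list(c(2, 5, 3), 21.3, sin)

first_element <- list1[[1]]  # the numeric vector itself
sub_list <- list1[1]         # a list of length one

print(class(first_element))  # "numeric"
print(class(sub_list))       # "list"

# The third element is a function and can be called directly.
print(list1[[3]](0))         # 0

# Named elements can be accessed with $ (names are illustrative).
named <- list(values = c(2, 5, 3), scale = 21.3)
print(named$scale)           # 21.3
```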
A matrix is a two-dimensional rectangular data set. It can be created using a vector input to the matrix function.
# Create a matrix.
M = matrix( c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)
print(M)
##      [,1] [,2] [,3]
## [1,] "a"  "a"  "b"
## [2,] "c"  "b"  "a"
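The byrow argument decides how the input vector fills the matrix. This sketch contrasts both settings and shows basic [row, column] indexing; the variable names are illustrative.

```r
values <- c('a', 'a', 'b', 'c', 'b', 'a')

by_row <- matrix(values, nrow = 2, ncol = 3, byrow = TRUE)
by_col <- matrix(values, nrow = 2, ncol = 3)  # default: filled column by column

print(by_col)
print(dim(by_row))     # 2 3
print(by_row[2, 1])    # "c": row 2, column 1
print(by_row[, 3])     # "b" "a": the entire third column
```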
While matrices are confined to two dimensions, arrays can have any number of dimensions. The array function takes a dim attribute which creates the required number of dimensions. In the example below we create an array with two elements, each of which is a 3x3 matrix.
# Create an array.
a <- array(c('green','yellow'),dim = c(3,3,2))
print(a)
## , , 1
##
##      [,1]     [,2]     [,3]
## [1,] "green"  "yellow" "green"
## [2,] "yellow" "green"  "yellow"
## [3,] "green"  "yellow" "green"
##
## , , 2
##
##      [,1]     [,2]     [,3]
## [1,] "yellow" "green"  "yellow"
## [2,] "green"  "yellow" "green"
## [3,] "yellow" "green"  "yellow"
Factors are R objects which are created from a vector. A factor stores the vector along with the distinct values of its elements as labels. The labels are always character, irrespective of whether the input vector is numeric, character, Boolean, etc. Factors are useful in statistical modeling.
Factors are created using the factor() function. The nlevels() function gives the count of levels.
# Create a vector.
apple_colors <- c('green','green','yellow','red','red','red','green')
# Create a factor object.
factor_apple <- factor(apple_colors)
# Print the factor.
print(factor_apple)
## [1] green  green  yellow red    red    red    green
## Levels: green red yellow
print(nlevels(factor_apple))
## [1] 3
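Internally a factor stores integer codes that point into its levels. The following sketch makes that visible and shows how a custom level order can be supplied; ordered_apple is an illustrative name.

```r
apple_colors <- c('green', 'green', 'yellow', 'red', 'red', 'red', 'green')
factor_apple <- factor(apple_colors)

print(levels(factor_apple))      # "green" "red" "yellow" (sorted by default)
print(as.integer(factor_apple))  # 1 1 3 2 2 2 1
print(table(factor_apple))       # counts per level

# The level order can be set explicitly.
ordered_apple <- factor(apple_colors, levels = c("red", "yellow", "green"))
print(levels(ordered_apple))     # "red" "yellow" "green"
```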
Data frames are tabular data objects. Unlike in a matrix, each column of a data frame can contain a different mode of data. The first column can be numeric while the second column can be character and the third column can be logical. A data frame is a list of vectors of equal length.
Data Frames are created using the data.frame() function.
# Create the data frame.
BMI <- data.frame(
  gender = c("Male", "Male","Female"),
  height = c(152, 171.5, 165),
  weight = c(81,93, 78),
  Age = c(42,38,26)
)
print(BMI)
##   gender height weight Age
## 1   Male  152.0     81  42
## 2   Male  171.5     93  38
## 3 Female  165.0     78  26
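Columns of a data frame are accessed with $ and rows can be filtered with a logical condition. The sketch below adds a derived column; it assumes that height is given in centimetres and weight in kilograms, which the original does not state, and the column name bmi and the variable older are illustrative.

```r
BMI <- data.frame(
  gender = c("Male", "Male", "Female"),
  height = c(152, 171.5, 165),
  weight = c(81, 93, 78),
  Age = c(42, 38, 26)
)

# Derived column (assumption: height in cm, weight in kg).
BMI$bmi <- BMI$weight / (BMI$height / 100)^2

# Structure of the data frame: one line per column.
str(BMI)

# Rows where Age is above 30, restricted to two columns.
older <- BMI[BMI$Age > 30, c("gender", "Age")]
print(older)
print(nrow(older))  # 2
```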
The basic syntax for creating an if statement in R is:
if (boolean_expression) {
  # statement(s) that execute if the boolean expression is true
}
Example
x = 1
y = 2
if (y > x) {
  print("y is greater than x")
}
## [1] "y is greater than x"
The basic syntax for creating an if… else statement in R is:
if (boolean_expression) {
  # statement(s) that execute if the boolean expression is true
} else {
  # statement(s) that execute if the boolean expression is false
}
Example
x = 4
y = 2
if (y > x) {
  print("y is greater than x")
} else {
  print("x is greater than y")
}
## [1] "x is greater than y"
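An if statement expects a single TRUE or FALSE. When the same comparison should be applied to every element of a vector, the vectorized ifelse() function can be used instead. A minimal sketch with illustrative values:

```r
x <- c(4, 1, 2)
y <- 2

# ifelse() evaluates the condition element by element.
# Nesting a second ifelse() also covers the case where both values are equal.
labels <- ifelse(x > y, "greater", ifelse(x < y, "smaller", "equal"))
print(labels)  # "greater" "smaller" "equal"
```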
A switch statement allows a variable to be tested for equality against a list of values. Each value is called a case, and the variable being switched on is checked against each case. The basic syntax for creating a switch statement in R is:
switch(expression, case1, case2, case3, ...)
Example
age = "40"
switch(age,
  "15" = print("Too young"),
  "40" = print("Adult"),
  "76" = print("Senior")
)
## [1] "Adult"
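With a character expression, switch() also supports a default value (a final unnamed argument) and fall-through (a case left empty uses the next non-empty case). The helper describe_age below is a hypothetical function written for this sketch.

```r
describe_age <- function(age) {
  switch(age,
    "15" = "Too young",
    "40" = ,           # empty case: falls through to the next one
    "41" = "Adult",
    "76" = "Senior",
    "Unknown"          # unnamed last argument: the default
  )
}

print(describe_age("40"))  # "Adult"
print(describe_age("99"))  # "Unknown"
```

Returning the string and printing it outside keeps the function reusable, in contrast to calling print() inside each case.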
The while loop executes the same code again and again as long as its condition is true.
Example
v <- c("Hello","while loop")
cnt <- 2
while (cnt < 7) {
  print(v)
  cnt = cnt + 1
}
## [1] "Hello"      "while loop"
## [1] "Hello"      "while loop"
## [1] "Hello"      "while loop"
## [1] "Hello"      "while loop"
## [1] "Hello"      "while loop"
A For loop is a repetition control structure that allows you to efficiently write a loop that needs to execute a specific number of times.
v <- LETTERS[1:4]
for (i in v) {
  print(i)
}
## [1] "A"
## [1] "B"
## [1] "C"
## [1] "D"
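When the position of each element is needed, a common idiom is to loop over seq_along(v) instead of the values themselves. The sketch below also shows why seq_along() is safer than 1:length(v) for empty vectors; the variable names are illustrative.

```r
v <- LETTERS[1:4]

# Loop over positions and use them to index the vector.
for (idx in seq_along(v)) {
  cat(idx, v[idx], "\n")
}

# With an empty vector, seq_along() yields no iterations at all.
empty <- character(0)
iterations <- 0
for (idx in seq_along(empty)) {
  iterations <- iterations + 1
}
print(iterations)                # 0

# 1:length(empty) is 1:0, i.e. the two values 1 and 0.
print(length(1:length(empty)))   # 2
```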
The repeat loop executes the same code again and again until a break statement stops it.
v <- c("Hello","loop")
cnt <- 2
repeat {
  print(v)
  cnt <- cnt+1
  if (cnt > 5) {
    break
  }
}
## [1] "Hello" "loop"
## [1] "Hello" "loop"
## [1] "Hello" "loop"
## [1] "Hello" "loop"
The break statement terminates the loop and transfers execution to the statement immediately following the loop.
The next statement is useful when we want to skip the current iteration of a loop without terminating it. On encountering next, R skips the rest of the current iteration and starts the next iteration of the loop.
v <- LETTERS[1:6]
for (i in v) {
  if (i == "D") {
    next
  }
  print(i)
}
## [1] "A"
## [1] "B"
## [1] "C"
## [1] "E"
## [1] "F"
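Both statements can appear in the same loop. In this illustrative sketch, next skips the even numbers and break stops the loop at the first odd number above 7, so only 1, 3, 5 and 7 are added up.

```r
total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) {
    next   # skip even numbers, continue with the next i
  }
  if (i > 7) {
    break  # leave the loop entirely
  }
  total <- total + i
}
print(total)  # 16
```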
An R function is created by using the keyword function. The basic syntax of an R function definition is as follows:
function_name <- function(arg_1, arg_2, ...) {
  # function body
}
Example
check <- function(x) {
  if (x > 0) {
    result <- "Positive"
  } else if (x < 0) {
    result <- "Negative"
  } else {
    result <- "Zero"
  }
  return(result)
}
check(-1)
## [1] "Negative"
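Arguments can have default values and can be passed by name in any order. The function power below is a hypothetical example written for this sketch.

```r
# exponent defaults to 2 when the caller does not supply it.
power <- function(base, exponent = 2) {
  base ^ exponent  # the value of the last expression is returned
}

print(power(3))                       # 9
print(power(2, 3))                    # 8
print(power(exponent = 3, base = 2))  # 8, named arguments in any order
```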
Some functions are built into R. Examples are listed below.
- abs(): returns the absolute value of a numeric value
- substr(x, start=n1, stop=n2): extracts substrings from a character vector
- sqrt(): square root
- c(): creates a vector/array with the same data type
- seq(from, to, by): creates a sequence
- paste(..., sep = " "): concatenates strings
- summary(df): summarizes info about data frame df
- data.frame(): creates a data frame
The first index starts at 1, not at 0 as in Python. Brackets [ ] are used for indexing.
b = 10:15
print(b[1])
## [1] 10
b[2] = 100
print(b[c(1,3)])
## [1] 10 12
Here you see recycling at work. First we assign a single number to the first three elements of b, so the number is used three times. Then we assign two numbers to elements 3 to 6, so each of the two numbers is used twice.
b[1:3] <- 2
print(b)
## [1]  2  2  2 13 14 15
b[3:6] <- c(10,20)
print(b)
## [1]  2  2 10 20 10 20
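Brackets accept more than positive positions, and recycling also applies to arithmetic. The sketch below starts again from a fresh vector and shows negative and logical indexing; the variable names are illustrative.

```r
b <- 10:15

dropped <- b[-1]        # negative index: everything except the first element
large <- b[b > 12]      # logical index: elements that satisfy the condition
last <- b[length(b)]    # last element

print(dropped)          # 11 12 13 14 15
print(large)            # 13 14 15
print(last)             # 15

# Recycling in arithmetic: c(1, 2) is reused three times.
shifted <- b + c(1, 2)
print(shifted)          # 11 13 13 15 15 17
```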
Packages are collections of R functions, typically built around a related set of tasks. R comes with a standard “base” set of packages. Others are available for download and installation. These packages are developed by individuals or groups independently of the “core R” development team. Most of them are developed by volunteers, who write them to support their research or other work. For that reason they are highly variable in design and quality. There is also a lot of overlap between packages, and it can take a while to find the ones you need to best accomplish a task.
The directory where packages are stored is called the library. Once installed, a package needs to be loaded into the session (taken out of the library) to be used. You do that with the library function. For example:
library(raster)
If the package is not installed, you get an error message. So you install a package only once (for each R version), or once in a while (to get updates), but you load a package every time you start a new R session (script) that needs it.
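To avoid the error for a missing package, a script can check for the package before loading it. The helper load_if_installed is a hypothetical function written for this sketch; it uses the stats package, which ships with R, and a deliberately nonexistent package name as test inputs.

```r
# Load a package only if it is installed; report the outcome as TRUE/FALSE.
load_if_installed <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    library(pkg, character.only = TRUE)
    TRUE
  } else {
    message("Package '", pkg, "' is not installed; run install.packages(\"", pkg, "\")")
    FALSE
  }
}

print(load_if_installed("stats"))               # TRUE, stats ships with R
print(load_if_installed("notARealPackage123"))  # FALSE, with a message
```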
It is very important to stay up to date with R and its packages, as they are improved continually. You should update the main R program every 6 months and update your packages more regularly, perhaps once a month. To update all your packages you can run update.packages().