Introduction to R (and RStudio)

Pavia, 1 October 2019

What is R?

On its website, R is defined as “a free software environment for statistical computing and graphics”.




RStudio is an integrated development environment (IDE) for R.
  • User-friendly
  • Supports other languages
  • Better overall integration between tools

One of the main advantages of R is the large community behind it, which is very willing to share.

R Structure

5*5
## [1] 25
mean(c(5,7,9))
## [1] 7
set.seed(1234)
anomalies <- function(x)  {
   anom <-  (x-mean(x))/sd(x)
   return(anom)
   }
print(rnorm(5))
## [1] -1.2070657  0.2774292  1.0844412 -2.3456977  0.4291247
anomalies(rnorm(5))
## [1]  1.7239706 -0.3012748 -0.2486045 -0.2819967 -0.8920945

R Structure

#install.packages("ggplot2") #Installs the package locally (needed only once)
library(ggplot2)          #Tells R to load the package   
ggplot(cars, aes(x=speed,y=dist))+geom_point()+theme_bw()

Basic syntax

Arithmetic Operators

+
-
*
/
** or ^
%%

Special values: NA, Inf, NaN
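
As a minimal sketch of how these operators and special values behave in base R (the numbers are only illustrative):

```r
# Basic arithmetic
7 + 2    # addition: 9
7 - 2    # subtraction: 5
7 * 2    # multiplication: 14
7 / 2    # division: 3.5
7 ^ 2    # exponentiation: 49 (7 ** 2 is equivalent)
7 %% 2   # modulo (remainder of the division): 1

# Special values
1 / 0        # Inf
0 / 0        # NaN (not a number)
NA + 1       # NA: a missing value propagates through arithmetic
is.na(NaN)   # TRUE: is.na() also flags NaN
```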

Logical Operators
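
A short sketch of the standard comparison and logical operators in base R may help here; the variables a, b, p and q are illustrative names, not part of the original slides:

```r
a <- 5
b <- 3

a == b   # equal to: FALSE
a != b   # not equal to: TRUE
a > b    # greater than: TRUE
a <= b   # less than or equal to: FALSE

# Element-wise AND, OR and NOT on logical vectors
p <- c(TRUE, TRUE, FALSE)
q <- c(TRUE, FALSE, FALSE)
p & q    # TRUE FALSE FALSE
p | q    # TRUE TRUE FALSE
!p       # FALSE FALSE TRUE

# && and || compare single values, typically inside if () conditions
(a > b) && (b > 0)   # TRUE
```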

Assignment Operators

x = 5
x <- 5
5 -> x
?plot
help(plot)

Types of data

There are mainly four types of data

These types of data can be stored in different kinds of “containers”

Several useful functions help to investigate the nature of the data: class(), str(), dim(), as.vector(), mode(), length(), typeof(), attributes()
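
A minimal sketch of these inspection functions follows. It assumes the four types are the ones named in the Vector section (numeric, integer, logical, character); the variable names and values are illustrative:

```r
num <- 3.14      # numeric (stored as double)
int <- 2L        # integer (note the L suffix)
lgl <- TRUE      # logical
chr <- "Pavia"   # character

class(num)    # "numeric"
typeof(num)   # "double"
class(int)    # "integer"
class(lgl)    # "logical"
class(chr)    # "character"

v <- c(1.5, 2.5, 3.5)
length(v)     # 3
mode(v)       # "numeric"
dim(v)        # NULL: a plain vector has no dimensions
str(v)        # compact one-line summary of the object
```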

Vector

The vector is the basic data structure in R. It is made of elements of the same type: character, logical, integer or numeric.
You can create a vector in different ways:

#Empty vector with a specified length
vector()
## logical(0)
vector("numeric",length=3)
## [1] 0 0 0
#Vectors with specific content
x <- c(5,6,3.2)
x <- c(1L,2L)
x <- c(TRUE,FALSE)
x <- c("Hello","World")

Vector

A vector, like any other data structure, can be inspected with several functions.

length(x)
## [1] 2
class(x)
## [1] "character"
str(x)
##  chr [1:2] "Hello" "World"

You can add elements to an existing vector, or build a vector from a sequence of numbers.

x <- seq(0, 10)   #same result as 0:10; seq(0:10) would return 1:11
x <- 0:10

x <- c(15.2,x)
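
The sketch below expands on sequences and on adding elements, using only base R. It also shows that combining an integer vector with a decimal number converts the whole vector to double, since a vector holds a single type:

```r
seq(0, 10, by = 2)          # 0 2 4 6 8 10
seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00

x <- 0:10
typeof(x)                   # "integer"

# Prepending a double coerces every element to double
x <- c(15.2, x)
typeof(x)                   # "double"
length(x)                   # 12

# append() inserts at a given position (here after the 2nd element)
append(c(1, 2, 3), 99, after = 2)   # 1 2 99 3
```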

Matrix

Matrices are an extension of numeric or character vectors. They are basically two-dimensional vectors, with elements arranged in rows and columns.

m <- matrix(nrow = 3,ncol = 2)
print(m)
##      [,1] [,2]
## [1,]   NA   NA
## [2,]   NA   NA
## [3,]   NA   NA
dim(m)
## [1] 3 2
class(m)   #from R 4.0.0 on, this returns "matrix" "array"
## [1] "matrix"

Matrix

#filled column-wise
m <- matrix(data = seq(1,30,2),nrow = 5,ncol = 3)
print(m)
##      [,1] [,2] [,3]
## [1,]    1   11   21
## [2,]    3   13   23
## [3,]    5   15   25
## [4,]    7   17   27
## [5,]    9   19   29
typeof(m)
## [1] "double"
n <- matrix(data = 1:2,nrow = 2,ncol = 3)
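
Two details of matrix() are worth a small sketch: the byrow argument, which switches from the default column-wise fill, and recycling, which is why n above is filled completely from only two values. The name m_row is illustrative:

```r
# byrow = TRUE fills the matrix row by row instead of column by column
m_row <- matrix(data = 1:6, nrow = 2, ncol = 3, byrow = TRUE)
m_row
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6

# Recycling: 1:2 is repeated until all 2 x 3 = 6 cells are filled
n <- matrix(data = 1:2, nrow = 2, ncol = 3)
n
##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    2    2    2

# A single element is accessed with [row, column]
m_row[2, 3]   # 6
```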

Matrix

rbind(m,n)
##      [,1] [,2] [,3]
## [1,]    1   11   21
## [2,]    3   13   23
## [3,]    5   15   25
## [4,]    7   17   27
## [5,]    9   19   29
## [6,]    1    1    1
## [7,]    2    2    2
cbind(t(m),t(n))
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,]    1    3    5    7    9    1    2
## [2,]   11   13   15   17   19    1    2
## [3,]   21   23   25   27   29    1    2

Indexing

Selecting a subset of a structure.

By position

x[1]

x[-3]

x[2:4]

x[-(2:4)]

x[c(1,6)]

By value

x[x == 2]

x[x < 0]

x[x %in%  seq(1,5,1)]
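
To see what each expression returns, here is the same set of selections applied to a small vector. The values of x are chosen only for illustration:

```r
x <- c(10, -4, 2, 7, 2, 0)

# By position
x[1]          # first element: 10
x[-3]         # everything except the third: 10 -4 7 2 0
x[2:4]        # elements 2 to 4: -4 2 7
x[-(2:4)]     # drop elements 2 to 4: 10 2 0
x[c(1, 6)]    # first and sixth: 10 0

# By value: the condition builds a logical vector used as a filter
x[x == 2]                # 2 2
x[x < 0]                 # -4
x[x %in% seq(1, 5, 1)]   # values that appear in 1..5: 2 2
```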

Indexing

By Name

head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
head(iris[,"Species"])
## [1] setosa setosa setosa setosa setosa setosa
## Levels: setosa versicolor virginica

Lists

Lists can hold any mode of data, and each element can be of a different type.
A list can be created in several ways:

x <- list("Hi",15.2,TRUE,5)
x
## [[1]]
## [1] "Hi"
## 
## [[2]]
## [1] 15.2
## 
## [[3]]
## [1] TRUE
## 
## [[4]]
## [1] 5
x <- vector("list",length = 3)
x
## [[1]]
## NULL
## 
## [[2]]
## NULL
## 
## [[3]]
## NULL

Lists

# Each element of a list can have a name
x <- list(flowers=iris,numbers= rnorm(10,5,2),colors=c("red","yellow"))
names(x)
## [1] "flowers" "numbers" "colors"
names(x)[2] <- "norm numbers"
head(x$flowers)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
print(x[2:3])
## $`norm numbers`
##  [1] 4.779429 3.977981 3.177609 3.325657 9.831670 5.268176 4.018628
##  [8] 4.118904 5.919179 3.612560
## 
## $colors
## [1] "red"    "yellow"

Lists are helpful when writing functions: they let you return multiple objects from a single call.
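
A minimal sketch of this pattern follows. The function summarise_vector is a hypothetical helper written for this example, not part of any package:

```r
# Hypothetical helper: returns several summaries of a numeric vector at once
summarise_vector <- function(x) {
  list(
    mean  = mean(x),
    sd    = sd(x),
    range = range(x)
  )
}

res <- summarise_vector(c(5, 7, 9))
res$mean         # 7
res$sd           # 2
res[["range"]]   # 5 9
```

Each result is then retrieved by name with $ or [[ ]], exactly as for any other named list.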

Data frame

The data frame is the basic data structure for tabular data. It is a rectangular list: all columns have the same length and, as in a list, each column can hold a different type of data.

df <- data.frame(initials = rep("LC", 5),
                 height = runif(5, min = 1.50, max = 2),
                 weight = rnorm(5, mean = 70, sd = 5),
                 stringsAsFactors = TRUE) #default before R 4.0.0; keeps initials a factor

df
##   initials   height   weight
## 1       LC 1.536890 70.04930
## 2       LC 1.654843 73.39136
## 3       LC 1.858636 75.14782
## 4       LC 1.752273 61.35236
## 5       LC 1.576499 58.97826
df$initials
## [1] LC LC LC LC LC
## Levels: LC
df[['initials']]
## [1] LC LC LC LC LC
## Levels: LC

Data frame

The same indexing rules apply to data frames.

df[,1]
## [1] LC LC LC LC LC
## Levels: LC
df[3:5,2]
## [1] 1.858636 1.752273 1.576499
df[2,"height"]
## [1] 1.654843
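
Indexing by value works on data frames too. The sketch below uses an illustrative data frame called people, with fixed values so that the results are reproducible:

```r
people <- data.frame(
  initials = c("LC", "AB", "MR"),
  height   = c(1.82, 1.65, 1.74),
  weight   = c(78, 60, 69)
)

# Rows where the condition is TRUE, all columns
tall <- people[people$height > 1.70, ]
nrow(tall)                        # 2

# Same filter, keeping a single column (a factor in R versions before 4.0.0)
as.character(tall$initials)       # "LC" "MR"

# Select several columns by name
people[, c("height", "weight")]

# Reorder the rows by increasing weight
people[order(people$weight), ]
```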

Control statements

A for loop is a control statement that executes a set of statements once for each element of the vector provided to it.

for (variable in vector) {
  # statements executed once per element
}

A while loop is a control statement that executes a set of statements repeatedly as long as the condition (the Boolean expression) evaluates to TRUE. It stops as soon as the condition becomes FALSE.

while (condition) {
  # statements executed while the condition is TRUE
}

Control statements

An if statement consists of a Boolean expression (condition) and a set of statements (do something). If the condition is satisfied, the set of statements is executed; otherwise the statements in the else block are executed.

if (condition) {
  # do something
} else {
  # do something different
}

It is possible to write your own function, which is stored in the R workspace and can be ‘called’ in your scripts.

function_name <- function(var) {
  # do something special with var
  result <- var   # placeholder for the actual computation
  return(result)
}
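
As a concrete sketch of this template, the hypothetical function bmi below computes the body mass index and shows an argument with a default value:

```r
# Hypothetical example: body mass index from weight (kg) and height (m)
bmi <- function(weight, height, digits = 1) {
  value <- weight / height^2
  return(round(value, digits))
}

bmi(70, 1.75)               # 22.9
bmi(70, 1.75, digits = 3)   # 22.857
```

Arguments with defaults, such as digits here, can be omitted from the call.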

Example

i <- 0
while (i < 5) {
  print(i)
  i=i+1
}
## [1] 0
## [1] 1
## [1] 2
## [1] 3
## [1] 4
if (i == 0) {
  my_name = "Luigi"
}else {
  print("I don't have a name")
}
## [1] "I don't have a name"
cyl <- c()
for (j in 1:4) {
  cyl[j] <- mtcars[j,"cyl"]
}
print(cyl)
## [1] 6 6 4 6

Reading data from external sources

R can read many types of data. The basic functions to read and write a .csv or a .txt file are

read.table(), write.table()
read.csv(), write.csv()

From the data.table package:
fread()

Sometimes it is handy to save some variables to disk and load them back later

save(df, file='injury_data.RData')

load(file='injury_data.RData')
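
The following sketch shows a full write and read round trip with base R only. The data frame and the temporary file paths are illustrative, and tempfile() is used so that nothing is written to the working directory:

```r
csv_path   <- tempfile(fileext = ".csv")
rdata_path <- tempfile(fileext = ".RData")

df <- data.frame(id = 1:3, score = c(9.5, 7.25, 8))

# Text round trip: row.names = FALSE avoids writing an extra index column
write.csv(df, file = csv_path, row.names = FALSE)
df_csv <- read.csv(csv_path)
all.equal(df, df_csv)   # TRUE

# Binary round trip: load() restores objects under their original names
save(df, file = rdata_path)
rm(df)
load(file = rdata_path)
exists("df")            # TRUE
```

Note that load() recreates df under the name it was saved with, so it can silently overwrite an object of the same name in the workspace.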

Packages: dplyr, sf, ggplot2, tmap, raster

Useful resources

Here is a list of insightful material

My email: