Introduction to R (and RStudio)
Pavia, 1 October 2019
One of the main advantages of R is the large community behind it, which is willing to share code and knowledge.
## [1] 25
## [1] 7
## [1] -1.2070657 0.2774292 1.0844412 -2.3456977 0.4291247
## [1] 1.7239706 -0.3012748 -0.2486045 -0.2819967 -0.8920945
#install.packages("ggplot2") #Installs the package locally
library(ggplot2) #Tells R to load the package
ggplot(cars, aes(x=speed,y=dist))+geom_point()+theme_bw()
Arithmetic operators:
+
-
*
/
** or ^
%%
Special values: NA, Inf, NaN
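Logical operators: &, && (and); |, || (or)
Comparison operators: ==, !=, >=, <=, <, >
Test functions: is.na(), is.finite(), is.numeric()
There are mainly four types of data:
logical: TRUE, FALSE
numeric: 1, 5, 5.2
character: '2', 'Hello World'
factor: '1', 'String with level'
These types of data can be stored in different kinds of “container”.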
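The operators, special values and data types listed above can be tried directly at the console. The following is a minimal sketch; the variable names are chosen only for illustration.

```r
# Arithmetic operators
half      <- 5 / 2    # 2.5
power     <- 5 ^ 2    # 25 (5 ** 2 gives the same result)
remainder <- 5 %% 2   # 1, the remainder of the division

# Special values
missing_value <- NA       # a missing value
infinite      <- 1 / 0    # Inf
not_a_number  <- 0 / 0    # NaN

# Logical and comparison operators
both   <- (3 > 2) & (2 > 1)    # TRUE: both conditions hold
either <- (3 < 2) | (2 > 1)    # TRUE: at least one condition holds
differ <- 3 != 2               # TRUE

# Test functions
is.na(missing_value)    # TRUE
is.finite(infinite)     # FALSE
is.numeric(5.2)         # TRUE

# The four basic types of data
class(TRUE)                          # "logical"
class(5.2)                           # "numeric"
class("Hello World")                 # "character"
class(factor("String with level"))   # "factor"
```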
Several useful functions help to investigate the nature of the data: class(), str(), dim(), as.vector(), mode(), length(), typeof(), attributes()
The vector is the basic data structure in R. It is made of elements of the same type: character, logical, integer or numeric.
You can create a vector in different ways:
## logical(0)
## [1] 0 0 0
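As a sketch, these are some common ways to create a vector. The first two calls reproduce the kind of output shown above; the object names are only illustrative.

```r
# An empty vector: the default mode is logical
empty_vec <- vector()
empty_vec                       # logical(0)

# A numeric vector of length 3, initialised with zeros
zeros <- vector("numeric", 3)
zeros                           # 0 0 0

# Combine values with c(); all elements share the same type
words <- c("Hello", "World")
flags <- c(TRUE, FALSE, TRUE)
```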
A vector, like any other data structure, can be investigated with several functions.
## [1] 2
## [1] "character"
## chr [1:2] "Hello" "World"
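For instance, the inspection functions can be applied to a small character vector. This is an illustrative sketch; the vector `x` is an assumption made for the example.

```r
x <- c("Hello", "World")

n_elements <- length(x)   # 2
x_class    <- class(x)    # "character"
x_type     <- typeof(x)   # "character"

# str() prints a compact description of the object
str(x)
```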
One may add elements to an existing vector, or obtain a vector from a sequence of numbers.
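A short sketch of both operations, using only base R functions (the object names are illustrative):

```r
# Add elements to an existing vector
x <- c(1, 2, 3)
x <- c(x, 4)                    # append at the end
x <- append(x, 0, after = 0)    # insert at the beginning

# Obtain a vector from a sequence of numbers
s1 <- 1:5                             # integers from 1 to 5
s2 <- seq(from = 1, to = 29, by = 2)  # odd numbers from 1 to 29
s3 <- seq(0, 1, length.out = 5)       # 5 equally spaced values

# Repeat a value
r <- rep("LC", 3)
```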
Matrices are an extension of numeric or character vectors. They are basically two-dimensional vectors: all elements share the same type and are arranged in rows and columns.
## [,1] [,2]
## [1,] NA NA
## [2,] NA NA
## [3,] NA NA
## [1] 3 2
## [1] "matrix"
## [,1] [,2] [,3]
## [1,] 1 11 21
## [2,] 3 13 23
## [3,] 5 15 25
## [4,] 7 17 27
## [5,] 9 19 29
## [1] "double"
## [,1] [,2] [,3]
## [1,] 1 11 21
## [2,] 3 13 23
## [3,] 5 15 25
## [4,] 7 17 27
## [5,] 9 19 29
## [6,] 1 1 1
## [7,] 2 2 2
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 1 3 5 7 9 1 2
## [2,] 11 13 15 17 19 1 2
## [3,] 21 23 25 27 29 1 2
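Matrix outputs like the ones above could be obtained with code along these lines. This is a sketch under the assumption that the original matrices were built with matrix(), rbind() and t(); the object names are illustrative.

```r
# An empty 3 x 2 matrix (filled with NA by default)
empty_m <- matrix(nrow = 3, ncol = 2)
dim(empty_m)      # 3 2

# A 5 x 3 matrix of odd numbers, filled column by column
m <- matrix(seq(1, 29, by = 2), nrow = 5, ncol = 3)
typeof(m)         # "double"

# Add two rows at the bottom
m_rows <- rbind(m, c(1, 1, 1), c(2, 2, 2))

# Transpose: rows become columns
m_t <- t(m_rows)
```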
Indexing means selecting a subset of a structure:
By position
By value
By name
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## [1] setosa setosa setosa setosa setosa setosa
## Levels: setosa versicolor virginica
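The three ways of selecting a subset can be sketched on a small named vector and on the built-in iris data set. The vector `v` is an assumption made for illustration.

```r
v <- c(a = 10, b = 20, c = 30, d = 40)

by_position <- v[c(1, 3)]    # first and third element
by_value    <- v[v > 15]     # elements greater than 15
by_name     <- v["c"]        # element named "c"

# The same rules apply to the columns of a data frame
head(iris)                   # first six rows
head(iris$Species)           # first six values of the Species column
first_lengths <- iris[1:3, "Sepal.Length"]
```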
Lists can hold any mode of data: each element can be of a different type.
Lists can be created in multiple ways:
## [[1]]
## [1] "Hi"
##
## [[2]]
## [1] 15.2
##
## [[3]]
## [1] TRUE
##
## [[4]]
## [1] 5
## [[1]]
## NULL
##
## [[2]]
## NULL
##
## [[3]]
## NULL
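As a sketch, the two outputs above correspond to a list built from values and to an empty list of a given length (the object names are illustrative):

```r
# A list whose elements have different types
l1 <- list("Hi", 15.2, TRUE, 5)

# An empty list with three NULL elements
l2 <- vector("list", 3)

# Double brackets extract the element, single brackets return a list
second_value <- l1[[2]]    # 15.2
second_list  <- l1[2]      # a list of length one
```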
# Lists can have a name for each element
x <- list(flowers=iris,numbers= rnorm(10,5,2),colors=c("red","yellow"))
names(x)
## [1] "flowers" "numbers" "colors"
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## $`norm numbers`
## [1] 4.779429 3.977981 3.177609 3.325657 9.831670 5.268176 4.018628
## [8] 4.118904 5.919179 3.612560
##
## $colors
## [1] "red" "yellow"
Lists are helpful when writing functions: they allow you to return multiple objects in a “tricky” way.
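A minimal sketch of this idea: a function can only return one object, so several results are packed into a named list. The function name `summarise_vector` is hypothetical.

```r
# Return several results at once by wrapping them in a named list
summarise_vector <- function(v) {
  list(mean = mean(v), range = range(v), n = length(v))
}

res <- summarise_vector(c(2, 4, 6, 8))
res$mean     # 5
res$range    # 2 8
res$n        # 4
```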
The data frame is the basic data structure for tabular data. A data frame is a rectangular list: all columns have the same length and, as with lists, each column can hold a different kind of data.
df <- data.frame(initials = rep("LC",5), height=runif(5,min = 1.50,max = 2),weight=rnorm(5,mean = 70,sd = 5))
df
## initials height weight
## 1 LC 1.536890 70.04930
## 2 LC 1.654843 73.39136
## 3 LC 1.858636 75.14782
## 4 LC 1.752273 61.35236
## 5 LC 1.576499 58.97826
## [1] LC LC LC LC LC
## Levels: LC
## [1] LC LC LC LC LC
## Levels: LC
The same indexing rules apply to data frames.
## [1] LC LC LC LC LC
## Levels: LC
## [1] 1.858636 1.752273 1.576499
## [1] 1.654843
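The following sketch shows the usual ways to index a data frame. It uses fixed, rounded values instead of the random ones generated above, so that the results are reproducible; these values are an assumption made for illustration.

```r
df <- data.frame(initials = rep("LC", 5),
                 height = c(1.54, 1.65, 1.86, 1.75, 1.58),
                 weight = c(70.0, 73.4, 75.1, 61.4, 59.0))

# A whole column, by name or by position
col_dollar   <- df$initials
col_brackets <- df[["initials"]]
col_position <- df[, 1]

# Rows 3 to 5 of the height column
last_heights <- df[3:5, "height"]

# A single cell: second row, second column
one_cell <- df[2, 2]

# Rows selected by value
heavy <- df[df$weight > 70, ]
```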
A for loop is one of the control statements in R: it executes a set of statements once for each element of the vector provided to it.
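A minimal sketch of a for loop that prints each element of a vector and accumulates a running total (the variable names are illustrative):

```r
total <- 0
for (i in 0:4) {
  print(i)              # prints 0, 1, 2, 3, 4, one value per iteration
  total <- total + i    # accumulate the sum
}
total                   # 10
```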
A while loop is one of the control statements in R: it executes a set of statements repeatedly as long as the condition (the Boolean expression) evaluates to TRUE, and stops as soon as it evaluates to FALSE.
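As a sketch, a while loop keeps running for as long as its condition holds. Here the counter is increased until it reaches 5 (the variable name is illustrative):

```r
count <- 0
while (count < 5) {
  count <- count + 1    # the condition is checked again after each pass
}
count                   # 5
```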
An if statement consists of a Boolean expression (the condition) and a set of statements (do something). If the condition is satisfied, the set of statements is executed; otherwise execution continues with the statements after the end of the if.
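A minimal sketch of an if statement with an optional else branch, which runs when the condition is not satisfied (the variable names are illustrative):

```r
x <- 7
if (x %% 2 == 0) {
  parity <- "even"    # executed only when the condition is TRUE
} else {
  parity <- "odd"     # executed when the condition is FALSE
}
parity                # "odd"
```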
It is possible to write your own functions, which are stored in the R session and can be called in your scripts.
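As a sketch, a function is defined with function() and assigned to a name. The function `greet` and its argument are hypothetical and only illustrate a default argument combined with an if statement.

```r
# A function with one optional argument
greet <- function(name = NULL) {
  if (is.null(name)) {
    return("I don't have a name")
  }
  paste("My name is", name)    # the last evaluated value is returned
}

greet()         # "I don't have a name"
greet("Ada")    # "My name is Ada"
```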
## [1] 0
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] "I don't have a name"
## [1] 6 6 4 6
R can read many types of data. The basic functions to read and write a .csv or a .txt file are:
read.table(), write.table()
read.csv(), write.csv()
From the data.table package:
fread()
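A minimal round trip with the base R functions listed above: a few rows of iris are written to a temporary .csv file and read back. The temporary path is an assumption made so that the sketch does not touch any real file.

```r
# Write the first rows of iris to a temporary csv file
path <- tempfile(fileext = ".csv")
write.csv(head(iris), file = path, row.names = FALSE)

# Read it back: read.csv() assumes a header and a comma separator
iris_csv <- read.csv(path)

# read.table() is more general, so header and separator are stated explicitly
iris_tab <- read.table(path, header = TRUE, sep = ",")
```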
Sometimes it is handy to save some variables to disk and load them again later:
save(df, file='injury_data.RData')
load(file='injury_data.RData')
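The following sketch shows the full cycle with a temporary file: the object is saved, removed from the workspace and then restored under its original name by load(). The small data frame and the path are assumptions made for illustration.

```r
df <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))

# Save the object to a temporary .RData file
rdata_path <- tempfile(fileext = ".RData")
save(df, file = rdata_path)

# Remove it from the workspace, then restore it
rm(df)
load(file = rdata_path)

restored <- exists("df")    # TRUE: load() recreated the object
```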
dplyr: part of the tidyverse, used for data manipulation; allows the use of the pipe operator
ggplot2: also part of the tidyverse; a system for creating graphics
sf: stands for simple features, a standardized way to encode spatial vector data
tmap: a flexible tool used to create thematic maps
raster: used to load and manipulate raster objects
My email: luigi.cesarini@iusspavia.it