This document is meant to be a quick reference that helps inexperienced users remember important syntax. It is NOT meant to be a comprehensive guide to R and may oversimplify things at times.
R is an interactive language: useRs interact with the interpreter by entering code at the console and immediately get a result back. The R console feels very much like a sophisticated pocket calculator. This property makes it easy for beginners to debug their code and explore the way R works. I recommend configuring RStudio so that the console (output) pane is on the right side.
This leaves the left side for the script window, which is basically a text editor that offers syntax highlighting for the R language and lets users open, edit and save several R scripts in multiple tabs at the same time. Ctrl+Enter (Command+Enter on a Mac) sends the currently selected lines to the console, where they are executed sequentially.
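To get a feeling for the pocket calculator comparison, you can type a few expressions directly into the console. This is just a minimal sketch; the variable names are made up for illustration.

```r
# the console evaluates an expression and prints the result right away
1 + 2 * 3      # 7
sqrt(16)       # 4

# results can also be stored under a name and reused later
calc_result <- 1 + 2 * 3
root_result <- sqrt(16)
calc_result + root_result   # 11
```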
# list all objects in the global environment (current R session)
ls()
## character(0)
# get help for the mean function, works for all R objects
?mean
# get help for operators and special characters
?"+"
# show all example datasets (output not shown here, run it on the console)
data()
# load the example dataset swiss
data(swiss)
# show available R objects
ls()
## [1] "swiss"
# get structure of an R object
str(swiss)
## 'data.frame': 47 obs. of 6 variables:
## $ Fertility : num 80.2 83.1 92.5 85.8 76.9 76.1 83.8 92.4 82.4 82.9 ...
## $ Agriculture : num 17 45.1 39.7 36.5 43.5 35.3 70.2 67.8 53.3 45.2 ...
## $ Examination : int 15 6 5 12 17 9 16 14 12 16 ...
## $ Education : int 12 9 5 7 15 7 7 8 7 13 ...
## $ Catholic : num 9.96 84.84 93.4 33.77 5.16 ...
## $ Infant.Mortality: num 22.2 22.2 20.2 20.3 20.6 26.6 23.6 24.9 21 24.4 ...
# get the working directory. Note: if you are not in an RStudio project, i.e.
# the project indicator in the top right corner says none, your working
# directory will be some standard directory given by your operating system,
# such as MyDocuments. It is recommended to use projects when working with
# RStudio.
getwd()
## [1] "/Users/mbannert/Phd/teaching/exploring_stats_with_R/course_slides"
# class of an object
class(swiss)
## [1] "data.frame"
# display only the first / last few rows of the data
head(swiss)
## Fertility Agriculture Examination Education Catholic
## Courtelary 80.2 17.0 15 12 9.96
## Delemont 83.1 45.1 6 9 84.84
## Franches-Mnt 92.5 39.7 5 5 93.40
## Moutier 85.8 36.5 12 7 33.77
## Neuveville 76.9 43.5 17 15 5.16
## Porrentruy 76.1 35.3 9 7 90.57
## Infant.Mortality
## Courtelary 22.2
## Delemont 22.2
## Franches-Mnt 20.2
## Moutier 20.3
## Neuveville 20.6
## Porrentruy 26.6
tail(swiss)
## Fertility Agriculture Examination Education Catholic
## Neuchatel 64.4 17.6 35 32 16.92
## Val de Ruz 77.6 37.6 15 7 4.97
## ValdeTravers 67.6 18.7 25 7 8.65
## V. De Geneve 35.0 1.2 37 53 42.34
## Rive Droite 44.7 46.6 16 29 50.43
## Rive Gauche 42.8 27.7 22 29 58.33
## Infant.Mortality
## Neuchatel 23.0
## Val de Ruz 20.0
## ValdeTravers 19.5
## V. De Geneve 18.0
## Rive Droite 18.2
## Rive Gauche 19.3
# number of rows in a data.frame or matrix
nrow(swiss)
## [1] 47
# length of a vector
length(swiss$Fertility)
## [1] 47
# observation frequency
table(swiss$Education)
##
## 1 2 3 5 6 7 8 9 10 11 12 13 15 19 20 28 29 32 53
## 1 3 4 2 4 7 4 3 2 1 5 3 1 1 1 1 2 1 1
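A frequency table is an R object like any other, so it can be stored and processed further. The following is a small sketch of two common follow-ups, sorting the counts and turning them into shares; the name freq is just an illustrative choice.

```r
data(swiss)

# store the frequency table instead of only printing it
freq <- table(swiss$Education)

# most frequent values first
sort(freq, decreasing = TRUE)

# relative frequencies (shares that sum to 1), rounded to 2 digits
round(prop.table(freq), 2)
```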
In R, objects are created by using the assignment operator <- to assign a value to a name. If the name is already in use, the previous object will be overwritten without warning.
# assign the value 1 to a
a <- 1
# concatenate multiple elements to one vector and overwrite a
a <- c(1, 2, 3)
# create a sequence
b <- 1:10
# create a matrix
m <- matrix(data = c(1, 2, 3, 4), nrow = 2)
# create a data.frame by coercing a matrix to data.frame
df_1 <- as.data.frame(m)
# or create a data.frame by defining it directly
df_2 <- data.frame(a = c(9, 10), b = c(1, 2))
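Because overwriting happens silently, it can be worth checking whether a name is already taken. Here is a minimal sketch of the behaviour described above; x and some_unused_name_123 are made-up names for illustration.

```r
# assign a number, then reuse the same name for something else
x <- 1
x <- c("a", "b")   # no warning, the number 1 is gone
class(x)           # "character"

# exists() reports whether a name is currently in use
exists("x")                      # TRUE
exists("some_unused_name_123")   # FALSE, assuming nobody defined it

# rm() removes an object from the current session
rm(x)
exists("x", inherits = FALSE)    # FALSE
```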
Square brackets behind objects are used to specify indices. One-dimensional objects like vectors have only one index; two-dimensional objects like matrices or data.frames have two indices, typically of the following form: [row, column]. Do not confuse them with (), which are used when calling or defining functions.
Note: the following commands just display parts of the data. No new objects are created without assignment!
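As a small sketch of the [row, column] form and of the note about assignment, consider a 2 x 2 matrix. The names m_demo and first_row are chosen for illustration only.

```r
# matrices are filled column by column: column 1 is (1, 2), column 2 is (3, 4)
m_demo <- matrix(data = c(1, 2, 3, 4), nrow = 2)

m_demo[1, 2]   # row 1, column 2 -> 3
m_demo[, 2]    # leaving the row index empty selects all rows -> 3 4

# indexing alone only displays the result; assign it to keep it
first_row <- m_demo[1, ]
first_row      # 1 3
```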
# get 2nd element of the vector a
a[2]
## [1] 2
# first element (first row, first column) of the swiss dataset
swiss[1, 1]
## [1] 80.2
# everything but the first row
swiss[-1, ]
## Fertility Agriculture Examination Education Catholic
## Delemont 83.1 45.1 6 9 84.84
## Franches-Mnt 92.5 39.7 5 5 93.40
## Moutier 85.8 36.5 12 7 33.77
## Neuveville 76.9 43.5 17 15 5.16
## Porrentruy 76.1 35.3 9 7 90.57
## Broye 83.8 70.2 16 7 92.85
## Glane 92.4 67.8 14 8 97.16
## Gruyere 82.4 53.3 12 7 97.67
## Sarine 82.9 45.2 16 13 91.38
## Veveyse 87.1 64.5 14 6 98.61
## Aigle 64.1 62.0 21 12 8.52
## Aubonne 66.9 67.5 14 7 2.27
## Avenches 68.9 60.7 19 12 4.43
## Cossonay 61.7 69.3 22 5 2.82
## Echallens 68.3 72.6 18 2 24.20
## Grandson 71.7 34.0 17 8 3.30
## Lausanne 55.7 19.4 26 28 12.11
## La Vallee 54.3 15.2 31 20 2.15
## Lavaux 65.1 73.0 19 9 2.84
## Morges 65.5 59.8 22 10 5.23
## Moudon 65.0 55.1 14 3 4.52
## Nyone 56.6 50.9 22 12 15.14
## Orbe 57.4 54.1 20 6 4.20
## Oron 72.5 71.2 12 1 2.40
## Payerne 74.2 58.1 14 8 5.23
## Paysd'enhaut 72.0 63.5 6 3 2.56
## Rolle 60.5 60.8 16 10 7.72
## Vevey 58.3 26.8 25 19 18.46
## Yverdon 65.4 49.5 15 8 6.10
## Conthey 75.5 85.9 3 2 99.71
## Entremont 69.3 84.9 7 6 99.68
## Herens 77.3 89.7 5 2 100.00
## Martigwy 70.5 78.2 12 6 98.96
## Monthey 79.4 64.9 7 3 98.22
## St Maurice 65.0 75.9 9 9 99.06
## Sierre 92.2 84.6 3 3 99.46
## Sion 79.3 63.1 13 13 96.83
## Boudry 70.4 38.4 26 12 5.62
## La Chauxdfnd 65.7 7.7 29 11 13.79
## Le Locle 72.7 16.7 22 13 11.22
## Neuchatel 64.4 17.6 35 32 16.92
## Val de Ruz 77.6 37.6 15 7 4.97
## ValdeTravers 67.6 18.7 25 7 8.65
## V. De Geneve 35.0 1.2 37 53 42.34
## Rive Droite 44.7 46.6 16 29 50.43
## Rive Gauche 42.8 27.7 22 29 58.33
## Infant.Mortality
## Delemont 22.2
## Franches-Mnt 20.2
## Moutier 20.3
## Neuveville 20.6
## Porrentruy 26.6
## Broye 23.6
## Glane 24.9
## Gruyere 21.0
## Sarine 24.4
## Veveyse 24.5
## Aigle 16.5
## Aubonne 19.1
## Avenches 22.7
## Cossonay 18.7
## Echallens 21.2
## Grandson 20.0
## Lausanne 20.2
## La Vallee 10.8
## Lavaux 20.0
## Morges 18.0
## Moudon 22.4
## Nyone 16.7
## Orbe 15.3
## Oron 21.0
## Payerne 23.8
## Paysd'enhaut 18.0
## Rolle 16.3
## Vevey 20.9
## Yverdon 22.5
## Conthey 15.1
## Entremont 19.8
## Herens 18.3
## Martigwy 19.4
## Monthey 20.2
## St Maurice 17.8
## Sierre 16.3
## Sion 18.1
## Boudry 20.3
## La Chauxdfnd 20.5
## Le Locle 18.9
## Neuchatel 23.0
## Val de Ruz 20.0
## ValdeTravers 19.5
## V. De Geneve 18.0
## Rive Droite 18.2
## Rive Gauche 19.3
# first row, 2nd and 3rd col
swiss[1, 2:3]
## Agriculture Examination
## Courtelary 17 15
# use column names to identify columns. Make sure to quote the names,
# since they are character strings.
swiss[, c("Agriculture", "Fertility")]
## Agriculture Fertility
## Courtelary 17.0 80.2
## Delemont 45.1 83.1
## Franches-Mnt 39.7 92.5
## Moutier 36.5 85.8
## Neuveville 43.5 76.9
## Porrentruy 35.3 76.1
## Broye 70.2 83.8
## Glane 67.8 92.4
## Gruyere 53.3 82.4
## Sarine 45.2 82.9
## Veveyse 64.5 87.1
## Aigle 62.0 64.1
## Aubonne 67.5 66.9
## Avenches 60.7 68.9
## Cossonay 69.3 61.7
## Echallens 72.6 68.3
## Grandson 34.0 71.7
## Lausanne 19.4 55.7
## La Vallee 15.2 54.3
## Lavaux 73.0 65.1
## Morges 59.8 65.5
## Moudon 55.1 65.0
## Nyone 50.9 56.6
## Orbe 54.1 57.4
## Oron 71.2 72.5
## Payerne 58.1 74.2
## Paysd'enhaut 63.5 72.0
## Rolle 60.8 60.5
## Vevey 26.8 58.3
## Yverdon 49.5 65.4
## Conthey 85.9 75.5
## Entremont 84.9 69.3
## Herens 89.7 77.3
## Martigwy 78.2 70.5
## Monthey 64.9 79.4
## St Maurice 75.9 65.0
## Sierre 84.6 92.2
## Sion 63.1 79.3
## Boudry 38.4 70.4
## La Chauxdfnd 7.7 65.7
## Le Locle 16.7 72.7
## Neuchatel 17.6 64.4
## Val de Ruz 37.6 77.6
## ValdeTravers 18.7 67.6
## V. De Geneve 1.2 35.0
## Rive Droite 46.6 44.7
## Rive Gauche 27.7 42.8
# the $ operator for data.frames. Hint: hit tab in RStudio after entering
# the $ and experience some autocomplete magic.
swiss$Examination
## [1] 15 6 5 12 17 9 16 14 12 16 14 21 14 19 22 18 17 26 31 19 22 14 22
## [24] 20 12 14 6 16 25 15 3 7 5 12 7 9 3 13 26 29 22 35 15 25 37 16
## [47] 22
# subset(data, condition). Comparison operators: == (equals), > (greater
# than), < (less than), != (not equal)
subset(swiss, Catholic > 10)
## Fertility Agriculture Examination Education Catholic
## Delemont 83.1 45.1 6 9 84.84
## Franches-Mnt 92.5 39.7 5 5 93.40
## Moutier 85.8 36.5 12 7 33.77
## Porrentruy 76.1 35.3 9 7 90.57
## Broye 83.8 70.2 16 7 92.85
## Glane 92.4 67.8 14 8 97.16
## Gruyere 82.4 53.3 12 7 97.67
## Sarine 82.9 45.2 16 13 91.38
## Veveyse 87.1 64.5 14 6 98.61
## Echallens 68.3 72.6 18 2 24.20
## Lausanne 55.7 19.4 26 28 12.11
## Nyone 56.6 50.9 22 12 15.14
## Vevey 58.3 26.8 25 19 18.46
## Conthey 75.5 85.9 3 2 99.71
## Entremont 69.3 84.9 7 6 99.68
## Herens 77.3 89.7 5 2 100.00
## Martigwy 70.5 78.2 12 6 98.96
## Monthey 79.4 64.9 7 3 98.22
## St Maurice 65.0 75.9 9 9 99.06
## Sierre 92.2 84.6 3 3 99.46
## Sion 79.3 63.1 13 13 96.83
## La Chauxdfnd 65.7 7.7 29 11 13.79
## Le Locle 72.7 16.7 22 13 11.22
## Neuchatel 64.4 17.6 35 32 16.92
## V. De Geneve 35.0 1.2 37 53 42.34
## Rive Droite 44.7 46.6 16 29 50.43
## Rive Gauche 42.8 27.7 22 29 58.33
## Infant.Mortality
## Delemont 22.2
## Franches-Mnt 20.2
## Moutier 20.3
## Porrentruy 26.6
## Broye 23.6
## Glane 24.9
## Gruyere 21.0
## Sarine 24.4
## Veveyse 24.5
## Echallens 21.2
## Lausanne 20.2
## Nyone 16.7
## Vevey 20.9
## Conthey 15.1
## Entremont 19.8
## Herens 18.3
## Martigwy 19.4
## Monthey 20.2
## St Maurice 17.8
## Sierre 16.3
## Sion 18.1
## La Chauxdfnd 20.5
## Le Locle 18.9
## Neuchatel 23.0
## V. De Geneve 18.0
## Rive Droite 18.2
## Rive Gauche 19.3
# subsetting the indexing way
swiss[swiss$Catholic > 10, ]
## Fertility Agriculture Examination Education Catholic
## Delemont 83.1 45.1 6 9 84.84
## Franches-Mnt 92.5 39.7 5 5 93.40
## Moutier 85.8 36.5 12 7 33.77
## Porrentruy 76.1 35.3 9 7 90.57
## Broye 83.8 70.2 16 7 92.85
## Glane 92.4 67.8 14 8 97.16
## Gruyere 82.4 53.3 12 7 97.67
## Sarine 82.9 45.2 16 13 91.38
## Veveyse 87.1 64.5 14 6 98.61
## Echallens 68.3 72.6 18 2 24.20
## Lausanne 55.7 19.4 26 28 12.11
## Nyone 56.6 50.9 22 12 15.14
## Vevey 58.3 26.8 25 19 18.46
## Conthey 75.5 85.9 3 2 99.71
## Entremont 69.3 84.9 7 6 99.68
## Herens 77.3 89.7 5 2 100.00
## Martigwy 70.5 78.2 12 6 98.96
## Monthey 79.4 64.9 7 3 98.22
## St Maurice 65.0 75.9 9 9 99.06
## Sierre 92.2 84.6 3 3 99.46
## Sion 79.3 63.1 13 13 96.83
## La Chauxdfnd 65.7 7.7 29 11 13.79
## Le Locle 72.7 16.7 22 13 11.22
## Neuchatel 64.4 17.6 35 32 16.92
## V. De Geneve 35.0 1.2 37 53 42.34
## Rive Droite 44.7 46.6 16 29 50.43
## Rive Gauche 42.8 27.7 22 29 58.33
## Infant.Mortality
## Delemont 22.2
## Franches-Mnt 20.2
## Moutier 20.3
## Porrentruy 26.6
## Broye 23.6
## Glane 24.9
## Gruyere 21.0
## Sarine 24.4
## Veveyse 24.5
## Echallens 21.2
## Lausanne 20.2
## Nyone 16.7
## Vevey 20.9
## Conthey 15.1
## Entremont 19.8
## Herens 18.3
## Martigwy 19.4
## Monthey 20.2
## St Maurice 17.8
## Sierre 16.3
## Sion 18.1
## La Chauxdfnd 20.5
## Le Locle 18.9
## Neuchatel 23.0
## V. De Geneve 18.0
## Rive Droite 18.2
## Rive Gauche 19.3
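Both ways of subsetting also accept combined conditions, using & (and) and | (or). The sketch below additionally shows one difference I am aware of: subset() drops rows where the condition is NA, while plain indexing keeps them as rows full of NA. The toy data.frame is invented for illustration.

```r
data(swiss)

# combine two conditions with & (both must hold)
both <- subset(swiss, Catholic > 10 & Education > 20)
same <- swiss[swiss$Catholic > 10 & swiss$Education > 20, ]
rownames(both)
nrow(both) == nrow(same)   # TRUE

# made-up example with a missing value
toy <- data.frame(x = c(1, NA, 3), y = c("a", "b", "c"))
nrow(subset(toy, x > 0))   # 2, the NA row is dropped
nrow(toy[toy$x > 0, ])     # 3, the NA row is kept as a row of NAs
```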
# position of the maximum Fertility. Note: the result is the INDEX, not the value!
which.max(swiss$Fertility)
## [1] 3
# hence we can use it to display the entire row
swiss[which.max(swiss$Fertility), ]
## Fertility Agriculture Examination Education Catholic
## Franches-Mnt 92.5 39.7 5 5 93.4
## Infant.Mortality
## Franches-Mnt 20.2
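Two related things are worth knowing here: which.min() works the same way for the minimum, and which.max() only reports the first position if the maximum occurs more than once. A short sketch, with the vector v made up for illustration:

```r
data(swiss)

# name of the district with the highest Fertility
rownames(swiss)[which.max(swiss$Fertility)]   # "Franches-Mnt"

# row with the lowest Fertility
swiss[which.min(swiss$Fertility), ]

# with ties, which.max() returns only the first position
v <- c(2, 7, 7, 1)
which.max(v)         # 2
which(v == max(v))   # 2 3, all positions of the maximum
```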
# don't forget to remove NAs
sum(swiss$Catholic, na.rm = TRUE)
## [1] 1934
mean(swiss$Catholic, na.rm = TRUE)
## [1] 41.14
median(swiss$Catholic, na.rm = TRUE)
## [1] 15.14
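The swiss data contain no missing values, so na.rm makes no visible difference above. The following sketch uses a made-up vector with one NA to show why the argument matters.

```r
# illustrative vector with a missing value
x_na <- c(1, NA, 3)

mean(x_na)                 # NA, a single missing value spoils the result
mean(x_na, na.rm = TRUE)   # 2

# count the missing values, or just check whether there are any
sum(is.na(x_na))           # 1
anyNA(x_na)                # TRUE
```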
# min and max
min(swiss$Catholic)
## [1] 2.15
max(swiss$Catholic)
## [1] 100
# general summary
summary(swiss$Catholic)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.15 5.20 15.10 41.10 93.10 100.00
# quantiles, quartiles are default...
quantile(swiss$Catholic)
## 0% 25% 50% 75% 100%
## 2.150 5.195 15.140 93.125 100.000
# create deciles...
quantile(swiss$Catholic, probs = seq(0, 1, by = 0.1))
## 0% 10% 20% 30% 40% 50% 60% 70% 80%
## 2.150 2.832 4.610 5.542 9.174 15.140 38.912 90.732 97.094
## 90% 100%
## 99.000 100.000
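Quantiles are often used as break points to sort observations into groups. The following is one possible sketch using cut(); the labels Q1 to Q4 and the object names are my own choice, not something the dataset provides.

```r
data(swiss)

# use the quartiles as break points
breaks <- quantile(swiss$Catholic)

# include.lowest = TRUE keeps the minimum in the first group
quartile_group <- cut(swiss$Catholic, breaks = breaks,
                      include.lowest = TRUE,
                      labels = c("Q1", "Q2", "Q3", "Q4"))

# number of districts per group
table(quartile_group)
```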
# transpose a matrix or data.frame...
m_t <- t(m)
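To see what transposing does, compare the dimensions before and after. Note that t() applied to a data.frame returns a matrix, not a data.frame. A small sketch with made-up objects:

```r
# a 2 x 3 matrix becomes a 3 x 2 matrix
m_wide <- matrix(1:6, nrow = 2)
dim(m_wide)      # 2 3
dim(t(m_wide))   # 3 2

# transposing a data.frame gives a matrix
df_demo <- data.frame(a = c(9, 10), b = c(1, 2))
df_t <- t(df_demo)
is.matrix(df_t)  # TRUE
rownames(df_t)   # "a" "b"
```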