Exploring Stats with R - Reference Card aka 'Cheat Sheet'

This document is meant to be a quick reference that helps inexperienced users remember important syntax. It is NOT meant to be a comprehensive guide to R and may oversimplify things at times.

General Remarks

R is an interactive language: useRs interact with the interpreter by entering program code at the console and immediately get a result back. The R console feels very much like a sophisticated pocket calculator. This property makes it easy for beginners to debug their code and explore the way R works. I recommend configuring RStudio to have the console (output) pane on the right side.

This leaves the left side for the script window, which is basically a text editor that offers syntax highlighting for the R language and lets users open, edit and save several R scripts in multiple tabs at the same time. Ctrl+Enter (Cmd+Enter on a Mac) sends the currently selected lines to the console, where they are executed sequentially.
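As a small illustration of the pocket-calculator feel described above, the following lines can be typed at the console one by one (or run from the script window with Ctrl+Enter). This is only a sketch; the numbers are arbitrary and chosen purely for illustration.

```r
# basic arithmetic, the result is printed immediately
1 + 2 * 3      # multiplication is evaluated before addition
## [1] 7
(1 + 2) * 3    # parentheses change the order of evaluation
## [1] 9
sqrt(16)       # built-in functions are called with ()
## [1] 4
10 / 4         # division returns a decimal number
## [1] 2.5
```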

Getting Information

# list all objects in the global environment (current R session)
ls()
## character(0)
# get help for the mean function, works for any documented R object
?mean
# get help for operators and special characters (quote them)
?"+"
# show all example datasets (not shown in the HTML output, run it on the
# console): data()
# load the example dataset swiss
data(swiss)
# show available R objects
ls()
## [1] "swiss"
# get structure of an R object
str(swiss)
## 'data.frame':    47 obs. of  6 variables:
##  $ Fertility       : num  80.2 83.1 92.5 85.8 76.9 76.1 83.8 92.4 82.4 82.9 ...
##  $ Agriculture     : num  17 45.1 39.7 36.5 43.5 35.3 70.2 67.8 53.3 45.2 ...
##  $ Examination     : int  15 6 5 12 17 9 16 14 12 16 ...
##  $ Education       : int  12 9 5 7 15 7 7 8 7 13 ...
##  $ Catholic        : num  9.96 84.84 93.4 33.77 5.16 ...
##  $ Infant.Mortality: num  22.2 22.2 20.2 20.3 20.6 26.6 23.6 24.9 21 24.4 ...
# get the working directory. Note: if you are not in an RStudio project, i.e.
# the project indicator in the top right corner says "none", your working
# directory will be some standard directory given by your operating system,
# such as My Documents. It is recommended to use projects when working
# with RStudio.
getwd()
## [1] "/Users/mbannert/Phd/teaching/exploring_stats_with_R/course_slides"

# class of an object
class(swiss)
## [1] "data.frame"

# display only the first / last few rows of the data (six by default)
head(swiss)
##              Fertility Agriculture Examination Education Catholic
## Courtelary        80.2        17.0          15        12     9.96
## Delemont          83.1        45.1           6         9    84.84
## Franches-Mnt      92.5        39.7           5         5    93.40
## Moutier           85.8        36.5          12         7    33.77
## Neuveville        76.9        43.5          17        15     5.16
## Porrentruy        76.1        35.3           9         7    90.57
##              Infant.Mortality
## Courtelary               22.2
## Delemont                 22.2
## Franches-Mnt             20.2
## Moutier                  20.3
## Neuveville               20.6
## Porrentruy               26.6
tail(swiss)
##              Fertility Agriculture Examination Education Catholic
## Neuchatel         64.4        17.6          35        32    16.92
## Val de Ruz        77.6        37.6          15         7     4.97
## ValdeTravers      67.6        18.7          25         7     8.65
## V. De Geneve      35.0         1.2          37        53    42.34
## Rive Droite       44.7        46.6          16        29    50.43
## Rive Gauche       42.8        27.7          22        29    58.33
##              Infant.Mortality
## Neuchatel                23.0
## Val de Ruz               20.0
## ValdeTravers             19.5
## V. De Geneve             18.0
## Rive Droite              18.2
## Rive Gauche              19.3

# number of rows in a data.frame or matrix
nrow(swiss)
## [1] 47

# length of a vector
length(swiss$Fertility)
## [1] 47

# frequency table: how often each value occurs
table(swiss$Education)
## 
##  1  2  3  5  6  7  8  9 10 11 12 13 15 19 20 28 29 32 53 
##  1  3  4  2  4  7  4  3  2  1  5  3  1  1  1  1  2  1  1

Creating objects

In R, objects are created by using the assignment operator <- to assign a value to a name. If the name is already in use, the previous object is overwritten without warning.

# assign the value 1 to a
a <- 1
# concatenate multiple elements to one vector and overwrite a
a <- c(1, 2, 3)
# create a sequence
b <- 1:10

# create a matrix
m <- matrix(data = c(1, 2, 3, 4), nrow = 2)

# create a data.frame by coercing a matrix to data.frame
df_1 <- as.data.frame(m)
# or by defining it directly
df_2 <- data.frame(a = c(9, 10), b = c(1, 2))
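The following sketch illustrates two points from this section: overwriting happens silently, and matrix() fills its values column by column unless told otherwise. The names x, m_col and m_row are hypothetical and used only for this example.

```r
# overwriting: no warning is given, the old value is simply gone
x <- 5
x <- "five"
class(x)
## [1] "character"

# matrix() fills column by column by default ...
m_col <- matrix(c(1, 2, 3, 4), nrow = 2)
m_col
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4

# ... and row by row when byrow = TRUE is given
m_row <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)
m_row
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
```

Checking an object with class() or str() right after creating it is a cheap way to notice an accidental overwrite.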

Indexing

Square brackets behind objects are used to specify indices. One-dimensional objects like vectors have only one index; two-dimensional objects like matrices or data.frames have two indices, typically of the following form: [row, column]. Do not confuse them with (), which are used when calling or defining functions.

Note: the following commands just display parts of the data. No new objects are created without assignment!

# get 2nd element of the vector a
a[2]
## [1] 2

# element in the first row and first column of the swiss dataset
swiss[1, 1]
## [1] 80.2
# everything but the first row
swiss[-1, ]
##              Fertility Agriculture Examination Education Catholic
## Delemont          83.1        45.1           6         9    84.84
## Franches-Mnt      92.5        39.7           5         5    93.40
## Moutier           85.8        36.5          12         7    33.77
## Neuveville        76.9        43.5          17        15     5.16
## Porrentruy        76.1        35.3           9         7    90.57
## Broye             83.8        70.2          16         7    92.85
## Glane             92.4        67.8          14         8    97.16
## Gruyere           82.4        53.3          12         7    97.67
## Sarine            82.9        45.2          16        13    91.38
## Veveyse           87.1        64.5          14         6    98.61
## Aigle             64.1        62.0          21        12     8.52
## Aubonne           66.9        67.5          14         7     2.27
## Avenches          68.9        60.7          19        12     4.43
## Cossonay          61.7        69.3          22         5     2.82
## Echallens         68.3        72.6          18         2    24.20
## Grandson          71.7        34.0          17         8     3.30
## Lausanne          55.7        19.4          26        28    12.11
## La Vallee         54.3        15.2          31        20     2.15
## Lavaux            65.1        73.0          19         9     2.84
## Morges            65.5        59.8          22        10     5.23
## Moudon            65.0        55.1          14         3     4.52
## Nyone             56.6        50.9          22        12    15.14
## Orbe              57.4        54.1          20         6     4.20
## Oron              72.5        71.2          12         1     2.40
## Payerne           74.2        58.1          14         8     5.23
## Paysd'enhaut      72.0        63.5           6         3     2.56
## Rolle             60.5        60.8          16        10     7.72
## Vevey             58.3        26.8          25        19    18.46
## Yverdon           65.4        49.5          15         8     6.10
## Conthey           75.5        85.9           3         2    99.71
## Entremont         69.3        84.9           7         6    99.68
## Herens            77.3        89.7           5         2   100.00
## Martigwy          70.5        78.2          12         6    98.96
## Monthey           79.4        64.9           7         3    98.22
## St Maurice        65.0        75.9           9         9    99.06
## Sierre            92.2        84.6           3         3    99.46
## Sion              79.3        63.1          13        13    96.83
## Boudry            70.4        38.4          26        12     5.62
## La Chauxdfnd      65.7         7.7          29        11    13.79
## Le Locle          72.7        16.7          22        13    11.22
## Neuchatel         64.4        17.6          35        32    16.92
## Val de Ruz        77.6        37.6          15         7     4.97
## ValdeTravers      67.6        18.7          25         7     8.65
## V. De Geneve      35.0         1.2          37        53    42.34
## Rive Droite       44.7        46.6          16        29    50.43
## Rive Gauche       42.8        27.7          22        29    58.33
##              Infant.Mortality
## Delemont                 22.2
## Franches-Mnt             20.2
## Moutier                  20.3
## Neuveville               20.6
## Porrentruy               26.6
## Broye                    23.6
## Glane                    24.9
## Gruyere                  21.0
## Sarine                   24.4
## Veveyse                  24.5
## Aigle                    16.5
## Aubonne                  19.1
## Avenches                 22.7
## Cossonay                 18.7
## Echallens                21.2
## Grandson                 20.0
## Lausanne                 20.2
## La Vallee                10.8
## Lavaux                   20.0
## Morges                   18.0
## Moudon                   22.4
## Nyone                    16.7
## Orbe                     15.3
## Oron                     21.0
## Payerne                  23.8
## Paysd'enhaut             18.0
## Rolle                    16.3
## Vevey                    20.9
## Yverdon                  22.5
## Conthey                  15.1
## Entremont                19.8
## Herens                   18.3
## Martigwy                 19.4
## Monthey                  20.2
## St Maurice               17.8
## Sierre                   16.3
## Sion                     18.1
## Boudry                   20.3
## La Chauxdfnd             20.5
## Le Locle                 18.9
## Neuchatel                23.0
## Val de Ruz               20.0
## ValdeTravers             19.5
## V. De Geneve             18.0
## Rive Droite              18.2
## Rive Gauche              19.3
# first row, 2nd and 3rd col
swiss[1, 2:3]
##            Agriculture Examination
## Courtelary          17          15

# use column names to identify the columns; make sure to quote the names
# since they are character strings
swiss[, c("Agriculture", "Fertility")]
##              Agriculture Fertility
## Courtelary          17.0      80.2
## Delemont            45.1      83.1
## Franches-Mnt        39.7      92.5
## Moutier             36.5      85.8
## Neuveville          43.5      76.9
## Porrentruy          35.3      76.1
## Broye               70.2      83.8
## Glane               67.8      92.4
## Gruyere             53.3      82.4
## Sarine              45.2      82.9
## Veveyse             64.5      87.1
## Aigle               62.0      64.1
## Aubonne             67.5      66.9
## Avenches            60.7      68.9
## Cossonay            69.3      61.7
## Echallens           72.6      68.3
## Grandson            34.0      71.7
## Lausanne            19.4      55.7
## La Vallee           15.2      54.3
## Lavaux              73.0      65.1
## Morges              59.8      65.5
## Moudon              55.1      65.0
## Nyone               50.9      56.6
## Orbe                54.1      57.4
## Oron                71.2      72.5
## Payerne             58.1      74.2
## Paysd'enhaut        63.5      72.0
## Rolle               60.8      60.5
## Vevey               26.8      58.3
## Yverdon             49.5      65.4
## Conthey             85.9      75.5
## Entremont           84.9      69.3
## Herens              89.7      77.3
## Martigwy            78.2      70.5
## Monthey             64.9      79.4
## St Maurice          75.9      65.0
## Sierre              84.6      92.2
## Sion                63.1      79.3
## Boudry              38.4      70.4
## La Chauxdfnd         7.7      65.7
## Le Locle            16.7      72.7
## Neuchatel           17.6      64.4
## Val de Ruz          37.6      77.6
## ValdeTravers        18.7      67.6
## V. De Geneve         1.2      35.0
## Rive Droite         46.6      44.7
## Rive Gauche         27.7      42.8

# the $ operator extracts a single column of a data.frame. Hint: hit Tab in
# RStudio after entering the $ and experience some autocomplete magic.
swiss$Examination
##  [1] 15  6  5 12 17  9 16 14 12 16 14 21 14 19 22 18 17 26 31 19 22 14 22
## [24] 20 12 14  6 16 25 15  3  7  5 12  7  9  3 13 26 29 22 35 15 25 37 16
## [47] 22
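To make the note about assignment concrete, here is a minimal sketch using a short vector. The names v and v_short are hypothetical and not part of the swiss example.

```r
v <- c(10, 20, 30, 40)   # small example vector

v[-1]                    # displayed only, v itself is unchanged
## [1] 20 30 40
length(v)
## [1] 4

v_short <- v[-1]         # only assignment creates a new object
length(v_short)
## [1] 3

v[c(1, 3)]               # several positions at once
## [1] 10 30
```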

Subsetting and Variable selection

# subset() takes the data and a condition. Comparison operators: == (equal),
# > (greater than), < (smaller than), != (not equal)
subset(swiss, Catholic > 10)
##              Fertility Agriculture Examination Education Catholic
## Delemont          83.1        45.1           6         9    84.84
## Franches-Mnt      92.5        39.7           5         5    93.40
## Moutier           85.8        36.5          12         7    33.77
## Porrentruy        76.1        35.3           9         7    90.57
## Broye             83.8        70.2          16         7    92.85
## Glane             92.4        67.8          14         8    97.16
## Gruyere           82.4        53.3          12         7    97.67
## Sarine            82.9        45.2          16        13    91.38
## Veveyse           87.1        64.5          14         6    98.61
## Echallens         68.3        72.6          18         2    24.20
## Lausanne          55.7        19.4          26        28    12.11
## Nyone             56.6        50.9          22        12    15.14
## Vevey             58.3        26.8          25        19    18.46
## Conthey           75.5        85.9           3         2    99.71
## Entremont         69.3        84.9           7         6    99.68
## Herens            77.3        89.7           5         2   100.00
## Martigwy          70.5        78.2          12         6    98.96
## Monthey           79.4        64.9           7         3    98.22
## St Maurice        65.0        75.9           9         9    99.06
## Sierre            92.2        84.6           3         3    99.46
## Sion              79.3        63.1          13        13    96.83
## La Chauxdfnd      65.7         7.7          29        11    13.79
## Le Locle          72.7        16.7          22        13    11.22
## Neuchatel         64.4        17.6          35        32    16.92
## V. De Geneve      35.0         1.2          37        53    42.34
## Rive Droite       44.7        46.6          16        29    50.43
## Rive Gauche       42.8        27.7          22        29    58.33
##              Infant.Mortality
## Delemont                 22.2
## Franches-Mnt             20.2
## Moutier                  20.3
## Porrentruy               26.6
## Broye                    23.6
## Glane                    24.9
## Gruyere                  21.0
## Sarine                   24.4
## Veveyse                  24.5
## Echallens                21.2
## Lausanne                 20.2
## Nyone                    16.7
## Vevey                    20.9
## Conthey                  15.1
## Entremont                19.8
## Herens                   18.3
## Martigwy                 19.4
## Monthey                  20.2
## St Maurice               17.8
## Sierre                   16.3
## Sion                     18.1
## La Chauxdfnd             20.5
## Le Locle                 18.9
## Neuchatel                23.0
## V. De Geneve             18.0
## Rive Droite              18.2
## Rive Gauche              19.3

# subsetting the indexing way
swiss[swiss$Catholic > 10, ]
##              Fertility Agriculture Examination Education Catholic
## Delemont          83.1        45.1           6         9    84.84
## Franches-Mnt      92.5        39.7           5         5    93.40
## Moutier           85.8        36.5          12         7    33.77
## Porrentruy        76.1        35.3           9         7    90.57
## Broye             83.8        70.2          16         7    92.85
## Glane             92.4        67.8          14         8    97.16
## Gruyere           82.4        53.3          12         7    97.67
## Sarine            82.9        45.2          16        13    91.38
## Veveyse           87.1        64.5          14         6    98.61
## Echallens         68.3        72.6          18         2    24.20
## Lausanne          55.7        19.4          26        28    12.11
## Nyone             56.6        50.9          22        12    15.14
## Vevey             58.3        26.8          25        19    18.46
## Conthey           75.5        85.9           3         2    99.71
## Entremont         69.3        84.9           7         6    99.68
## Herens            77.3        89.7           5         2   100.00
## Martigwy          70.5        78.2          12         6    98.96
## Monthey           79.4        64.9           7         3    98.22
## St Maurice        65.0        75.9           9         9    99.06
## Sierre            92.2        84.6           3         3    99.46
## Sion              79.3        63.1          13        13    96.83
## La Chauxdfnd      65.7         7.7          29        11    13.79
## Le Locle          72.7        16.7          22        13    11.22
## Neuchatel         64.4        17.6          35        32    16.92
## V. De Geneve      35.0         1.2          37        53    42.34
## Rive Droite       44.7        46.6          16        29    50.43
## Rive Gauche       42.8        27.7          22        29    58.33
##              Infant.Mortality
## Delemont                 22.2
## Franches-Mnt             20.2
## Moutier                  20.3
## Porrentruy               26.6
## Broye                    23.6
## Glane                    24.9
## Gruyere                  21.0
## Sarine                   24.4
## Veveyse                  24.5
## Echallens                21.2
## Lausanne                 20.2
## Nyone                    16.7
## Vevey                    20.9
## Conthey                  15.1
## Entremont                19.8
## Herens                   18.3
## Martigwy                 19.4
## Monthey                  20.2
## St Maurice               17.8
## Sierre                   16.3
## Sion                     18.1
## La Chauxdfnd             20.5
## Le Locle                 18.9
## Neuchatel                23.0
## V. De Geneve             18.0
## Rive Droite              18.2
## Rive Gauche              19.3
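Conditions can also be combined, and subset() can pick columns at the same time via its select argument. The sketch below assumes the thresholds 50 and 10 purely for illustration; high_both and high_both_idx are hypothetical names.

```r
data(swiss)  # built-in example dataset, as loaded earlier

# rows where BOTH conditions hold (&); use | for "at least one"
# select keeps only the named columns (unquoted inside subset())
high_both <- subset(swiss, Catholic > 50 & Education > 10,
                    select = c(Fertility, Education))
nrow(high_both)
## [1] 4
names(high_both)
## [1] "Fertility" "Education"

# the same selection the indexing way (column names are quoted here)
high_both_idx <- swiss[swiss$Catholic > 50 & swiss$Education > 10,
                       c("Fertility", "Education")]
```

subset() tends to be easier to read at the console, while the bracket form makes every step explicit; which one to prefer is largely a matter of taste for interactive work.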

# position of the maximum Fertility; note that the result is the INDEX, not the value
which.max(swiss$Fertility)
## [1] 3
# hence we can use it to display the entire row
swiss[which.max(swiss$Fertility), ]
##              Fertility Agriculture Examination Education Catholic
## Franches-Mnt      92.5        39.7           5         5     93.4
##              Infant.Mortality
## Franches-Mnt             20.2

Simple Maths

# don't forget to remove NAs (missing values) via na.rm = TRUE
sum(swiss$Catholic, na.rm = TRUE)
## [1] 1934
mean(swiss$Catholic, na.rm = TRUE)
## [1] 41.14
median(swiss$Catholic, na.rm = TRUE)
## [1] 15.14
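The swiss data happens to contain no missing values, so the effect of na.rm is not visible above. A minimal sketch with a hypothetical vector x_na shows what happens when an NA is present:

```r
x_na <- c(2, 4, NA, 6)     # example vector with one missing value

mean(x_na)                 # a single NA makes the whole result NA
## [1] NA
mean(x_na, na.rm = TRUE)   # NA is dropped first: (2 + 4 + 6) / 3
## [1] 4
sum(is.na(x_na))           # count the missing values
## [1] 1
```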

# min and max
min(swiss$Catholic)
## [1] 2.15
max(swiss$Catholic)
## [1] 100

# general summary
summary(swiss$Catholic)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2.15    5.20   15.10   41.10   93.10  100.00

# quantiles; quartiles are the default
quantile(swiss$Catholic)
##      0%     25%     50%     75%    100% 
##   2.150   5.195  15.140  93.125 100.000
# create deciles
quantile(swiss$Catholic, probs = seq(0, 1, by = 0.1))
##      0%     10%     20%     30%     40%     50%     60%     70%     80% 
##   2.150   2.832   4.610   5.542   9.174  15.140  38.912  90.732  97.094 
##     90%    100% 
##  99.000 100.000

# transpose a matrix or data.frame...
m_t <- t(m)
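As a quick check of what transposing does, the sketch below rebuilds the matrix m from the section on creating objects and also transposes the swiss data. Note that t() applied to a data.frame returns a matrix rather than a data.frame.

```r
m <- matrix(data = c(1, 2, 3, 4), nrow = 2)   # same m as defined earlier
t(m)                                          # rows become columns
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4

dim(swiss)                                    # 47 rows, 6 columns
## [1] 47  6
dim(t(swiss))                                 # rows and columns swap places
## [1]  6 47
is.matrix(t(swiss))                           # no longer a data.frame
## [1] TRUE
```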

To be continued…