Data structure - Vectors - page 51

(myVector <- c(7,11,13)) # Create a vector called 'myVector'
## [1]  7 11 13
myVector * 3
## [1] 21 33 39
myVector[1] # Get the first element
## [1] 7
myVector[3] # Get the third element
## [1] 13

Vector operations - Some basic functions - Page 87

Vector operations - Examples - Page 87-88

x  <- c(1,3,6,2,7,4,8,1,0,8)
length(x)
## [1] 10
sort(x)
##  [1] 0 1 1 2 3 4 6 7 8 8
sort(x, decreasing=TRUE)
##  [1] 8 8 7 6 4 3 2 1 1 0
rev(x)
##  [1] 8 0 1 8 4 7 2 6 3 1
rank(x)
##  [1] 2.5 5.0 7.0 4.0 8.0 6.0 9.5 2.5 1.0 9.5
head(x, 3)
## [1] 1 3 6
tail(x, 2)
## [1] 0 8

Note: The largest value of rank(x) is not always equal to length(x), because tied values share the average of their ranks. Here the two largest values (both 8) each receive rank 9.5.
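
To see the effect of ties in isolation, here is a minimal sketch on a hypothetical three-element vector (not taken from the slides). It also tries the ties.method argument of rank(), which controls how tied values are ranked.

```r
# Hypothetical vector with a tie for the largest value
v <- c(3, 9, 9)

r_avg   <- rank(v)                          # default: ties.method = "average"
r_min   <- rank(v, ties.method = "min")     # tied values all get the smallest rank
r_first <- rank(v, ties.method = "first")   # ties broken by order of appearance

r_avg
## [1] 1.0 2.5 2.5
r_min
## [1] 1 2 2
r_first
## [1] 1 2 3
max(r_avg) == length(v)
## [1] FALSE
```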

Set operations - Page 99

A <- c(4,5,2,7)
B <- c(2,1,7,3)
C <- c(2,3,7)
is.element(C,A)
## [1]  TRUE FALSE  TRUE
is.element(A,C)
## [1] FALSE FALSE  TRUE  TRUE
intersect(A,B)
## [1] 2 7
union(A,B)
## [1] 4 5 2 7 1 3
setdiff(A,B)
## [1] 4 5

Exercise

Given that

A <- c(4,5,2,7)
B <- c(2,1,7,3)
C <- c(2,3,7)

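Calculate
  1. the collection of elements of A and B that belong to only one of the two sets;
  2. the number of elements that belong to A, B and C.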

Solution

# the collection of elements of A and B that only belong to one set
union(setdiff(A,B), setdiff(B,A))
## [1] 4 5 1 3
# the number of elements that belong to A, B and C
length(intersect(intersect(A,B), C))
## [1] 2
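
The first result is often called the symmetric difference of A and B. As a cross-check, the sketch below computes it a second way, as the union minus the intersection; the names sym1 and sym2 are introduced here for illustration only.

```r
A <- c(4,5,2,7)
B <- c(2,1,7,3)

# Version used in the solution: elements only in A, plus elements only in B
sym1 <- union(setdiff(A,B), setdiff(B,A))
# Alternative: everything in A or B, minus what they share
sym2 <- setdiff(union(A,B), intersect(A,B))

sym2
## [1] 4 5 1 3
setequal(sym1, sym2)  # same elements, regardless of order
## [1] TRUE
```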

Data structure - Matrices and arrays - page 51

Data structure - Matrices and arrays - page 52

(X <- matrix(1:12, nrow=4, ncol=3, byrow=TRUE))
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
## [4,]   10   11   12
(X <- matrix(1:12, nrow=4, ncol=3, byrow=FALSE))
##      [,1] [,2] [,3]
## [1,]    1    5    9
## [2,]    2    6   10
## [3,]    3    7   11
## [4,]    4    8   12
(X <- array(1:24, dim=c(2,3,4)))
## , , 1
## 
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
## 
## , , 2
## 
##      [,1] [,2] [,3]
## [1,]    7    9   11
## [2,]    8   10   12
## 
## , , 3
## 
##      [,1] [,2] [,3]
## [1,]   13   15   17
## [2,]   14   16   18
## 
## , , 4
## 
##      [,1] [,2] [,3]
## [1,]   19   21   23
## [2,]   20   22   24

Data structure - Matrices and arrays - page 53

Visualization of a three-dimensional array?

Recycling - Pages 86-87

Given an operation on two vectors/matrices/arrays of different lengths, R extends the shorter data structure by repeating its elements from the beginning until it matches the longer one. We call this behaviour ‘recycling’:

x <- c(1,2,3,4,5,6)
y <- c(1,2,3)
x + y
## [1] 2 4 6 5 7 9

Another example is below, where the vector 1:3 is repeated to fill in a matrix:

matrix(1:3, ncol=3, nrow=4)
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    2    3    1
## [3,]    3    1    2
## [4,]    1    2    3
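
In both examples above the longer length is a multiple of the shorter one. When it is not, R still recycles but issues a warning. The sketch below uses two hypothetical vectors (long and short are names chosen here) and captures the warning text rather than printing it.

```r
long  <- c(1, 2, 3, 4, 5)
short <- c(1, 2)

# Capture the warning message instead of letting R print it
msg <- tryCatch(long + short, warning = function(w) conditionMessage(w))
msg

# The result is still computed: short is recycled as 1, 2, 1, 2, 1
res <- suppressWarnings(long + short)
res
## [1] 2 4 4 6 6
```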

Merging - Page 89

You can merge vectors or matrices together to create a new matrix with the functions cbind() (binds them as columns) and rbind() (binds them as rows).

(B <- cbind(1:4,5:8))
##      [,1] [,2]
## [1,]    1    5
## [2,]    2    6
## [3,]    3    7
## [4,]    4    8
(C <- cbind(B, 9:12))
##      [,1] [,2] [,3]
## [1,]    1    5    9
## [2,]    2    6   10
## [3,]    3    7   11
## [4,]    4    8   12
(B <- rbind(1:4,5:8))
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8
(C <- rbind(B, 9:12))
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8
## [3,]    9   10   11   12

Matrix operations - Pages 315-316-317

(A <- matrix(c(2,3,5,4), nrow=2, ncol=2, byrow=TRUE))
##      [,1] [,2]
## [1,]    2    3
## [2,]    5    4
(B <- matrix(c(1,2,8,7), nrow=2, ncol=2, byrow=FALSE))
##      [,1] [,2]
## [1,]    1    8
## [2,]    2    7
(I2 <- diag(nrow=2)) # identity matrix of size 2x2
##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1
A+B 
##      [,1] [,2]
## [1,]    3   11
## [2,]    7   11
A*B
##      [,1] [,2]
## [1,]    2   24
## [2,]   10   28
A/B
##      [,1]      [,2]
## [1,]  2.0 0.3750000
## [2,]  2.5 0.5714286
A%*%I2
##      [,1] [,2]
## [1,]    2    3
## [2,]    5    4
A%*%B
##      [,1] [,2]
## [1,]    8   37
## [2,]   13   68
t(B)
##      [,1] [,2]
## [1,]    1    2
## [2,]    8    7

Note: the diag() function can also be used for other purposes, such as building a diagonal matrix from a vector or extracting the diagonal of a matrix (see the R help file, ?diag).
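
A short sketch of three common uses of diag(), on hypothetical inputs chosen for illustration:

```r
# Vector argument (length > 1): diagonal matrix with these entries
d <- diag(c(2, 5, 7))
d

# Matrix argument: extract the main diagonal
M <- matrix(1:9, nrow = 3)
diag(M)
## [1] 1 5 9

# Single number: identity matrix of that size (same as diag(nrow=3))
I3 <- diag(3)
I3
```

Be aware that a vector of length one falls under the third case, so diag(5) is a 5x5 identity matrix rather than a 1x1 matrix containing 5.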

Matrix operations - the solve() function - Pages 316-317

(A <- matrix(1:4, ncol=2))
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
(x <- solve(A, c(1,1)))
## [1] -0.5  0.5
A%*%x
##      [,1]
## [1,]    1
## [2,]    1
solve(A) %*% A
##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1

Matrix operations - The function apply() - Page 93

The function apply() is often quite handy. It applies a given function to each row (MARGIN=1) or each column (MARGIN=2) of a matrix.

(X <- matrix(c(1:4, 1, 6:8), nrow = 2))
##      [,1] [,2] [,3] [,4]
## [1,]    1    3    1    7
## [2,]    2    4    6    8
apply(X, MARGIN=1, FUN=sum)
## [1] 12 20
apply(X, MARGIN=2, FUN=mean)
## [1] 1.5 3.5 3.5 7.5

Other functions you could use: rowSums(), colSums(), rowMeans(), colMeans().
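
The sketch below checks on the same matrix that these dedicated functions agree with the corresponding apply() calls, and shows that apply() also accepts a user-defined function (the row range used here is only an illustration).

```r
X <- matrix(c(1:4, 1, 6:8), nrow = 2)

# Dedicated functions give the same results as apply()
all.equal(apply(X, 1, sum), rowSums(X))
## [1] TRUE
all.equal(apply(X, 2, mean), colMeans(X))
## [1] TRUE

# apply() with a custom function: range (max - min) of each row
ranges <- apply(X, 1, function(r) max(r) - min(r))
ranges
## [1] 6 6
```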

Matrix operations - Exercise

Given a \(3\times 3\) matrix X

X <- matrix(1:9, nrow = 3)

Matrix operations - Solution

(X <- matrix(1:9, nrow = 3))
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
(row.sums <- apply(X, 1, sum))
## [1] 12 15 18
(col.sums <- apply(X, 2, sum))
## [1]  6 15 24
(sum(row.sums) == sum(col.sums)) & (sum(X) == sum(row.sums))
## [1] TRUE

Important note on data structures

Data structure - Lists - page 53

myVector <- c(1,2,"A", TRUE)
myVector
## [1] "1"    "2"    "A"    "TRUE"
typeof(myVector)
## [1] "character"
myList <- list(TRUE, my.matrix=matrix(1:4, nrow=2), c(1+2i,3), "R is my friend")
myList
## [[1]]
## [1] TRUE
## 
## $my.matrix
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
## 
## [[3]]
## [1] 1+2i 3+0i
## 
## [[4]]
## [1] "R is my friend"

Data structure - Lists - Exercise

Consider ‘myList’ given before as

myList <- list(TRUE, my.matrix=matrix(1:4, nrow=2), c(1+2i,3), "R is my friend")
  1. How many elements do we have in the object myList?
  2. Do all elements have the same data types?
  3. Does each element have its own name?

Data structure - Lists - Solution

myList
## [[1]]
## [1] TRUE
## 
## $my.matrix
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
## 
## [[3]]
## [1] 1+2i 3+0i
## 
## [[4]]
## [1] "R is my friend"
typeof(myList[[1]])
## [1] "logical"
typeof(myList[[2]])
## [1] "integer"
typeof(myList[[3]])
## [1] "complex"
typeof(myList[[4]])
## [1] "character"
  1. There are 4 elements.
  2. No. This is one advantage of using a list. In fact, each element can be a vector, a matrix, an array or even a list.
  3. The second element’s name is my.matrix, and the others are unnamed. Note that naming elements makes it easier to read the data from a list - we will discuss this further in later sessions.
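
As a small preview of why names help, the sketch below reads the same element of myList in three equivalent ways; the variable names m1, m2, m3 and sub are introduced here for illustration.

```r
myList <- list(TRUE, my.matrix=matrix(1:4, nrow=2), c(1+2i,3), "R is my friend")

length(myList)
## [1] 4
names(myList)   # only the second entry is non-empty

m1 <- myList$my.matrix        # by name, with $
m2 <- myList[["my.matrix"]]   # by name, with double brackets
m3 <- myList[[2]]             # by position
identical(m1, m2) && identical(m2, m3)
## [1] TRUE

sub <- myList[2]   # single brackets return a list of length 1, not the matrix
class(sub)
## [1] "list"
```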

Data structure - Data frames - page 54

A data.frame in R is a table where the columns can be of different types but must all have the same length.

Data frames are widely used in R

Data structure - Data frames

BMI <- data.frame(
  Gender=c("M","F","M","F"),
  Height=c(1.83,1.78,1.80,1.60),
  Weight=c(77,68,66,48),
  row.names=c("Ben","Katja","Anthony","Jinxia"))

BMI
##         Gender Height Weight
## Ben          M   1.83     77
## Katja        F   1.78     68
## Anthony      M   1.80     66
## Jinxia       F   1.60     48
str(BMI) # display the internal structure of the data frame
## 'data.frame':    4 obs. of  3 variables:
##  $ Gender: chr  "M" "F" "M" "F"
##  $ Height: num  1.83 1.78 1.8 1.6
##  $ Weight: num  77 68 66 48

You can access a specific variable (column) using the $ operator:

BMI$Gender
## [1] "M" "F" "M" "F"
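
A few other ways of reading from and adding to the same data frame are sketched below. The column name Index is hypothetical (it is not part of the original data) and holds the body mass index, weight divided by height squared.

```r
BMI <- data.frame(
  Gender=c("M","F","M","F"),
  Height=c(1.83,1.78,1.80,1.60),
  Weight=c(77,68,66,48),
  row.names=c("Ben","Katja","Anthony","Jinxia"))

BMI[["Height"]]        # same values as BMI$Height
BMI["Ben", "Weight"]   # one cell, selected by row name and column name
## [1] 77

# Add a new (hypothetical) column computed from existing ones
BMI$Index <- BMI$Weight / BMI$Height^2
round(BMI$Index, 1)
```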

Merging - Merging columns of data frames - Page 89-91

X <- data.frame(GENDER=c("F","M","M","F"),
ID=c(123,234,345,456),
NAME=c("Mary","James","James","Olivia"),
Height=c(170,180,185,160))

Y <- data.frame(GENDER=c("M","F","F","M"),
ID=c(345,456,123,234),
NAME=c("James","Olivia","Mary","James"),
Weight=c(80,50,70,60))
X
##   GENDER  ID   NAME Height
## 1      F 123   Mary    170
## 2      M 234  James    180
## 3      M 345  James    185
## 4      F 456 Olivia    160
Y
##   GENDER  ID   NAME Weight
## 1      M 345  James     80
## 2      F 456 Olivia     50
## 3      F 123   Mary     70
## 4      M 234  James     60
cbind(X,Y)  # Not very useful here
##   GENDER  ID   NAME Height GENDER  ID   NAME Weight
## 1      F 123   Mary    170      M 345  James     80
## 2      M 234  James    180      F 456 Olivia     50
## 3      M 345  James    185      F 123   Mary     70
## 4      F 456 Olivia    160      M 234  James     60
merge(X,Y)  # This is what we want
##   GENDER  ID   NAME Height Weight
## 1      F 123   Mary    170     70
## 2      F 456 Olivia    160     50
## 3      M 234  James    180     60
## 4      M 345  James    185     80

By default, merge() keeps only the individuals present in both datasets; any individual missing from either one is lost.

Z <- data.frame(GENDER=c("M","F","F","F"),
ID=c(345,456,123,999),
NAME=c("James","Olivia","Mary","Jennifer"),
Age=c(21,19,23,99))
Z
##   GENDER  ID     NAME Age
## 1      M 345    James  21
## 2      F 456   Olivia  19
## 3      F 123     Mary  23
## 4      F 999 Jennifer  99
merge(X,Z)
##   GENDER  ID   NAME Height Age
## 1      F 123   Mary    170  23
## 2      F 456 Olivia    160  19
## 3      M 345  James    185  21

You can use the all.x or all.y arguments to force the inclusion of all the subjects of a dataset.

merge(X,Z, all.x = TRUE) # all subjects of first dataset are kept
##   GENDER  ID   NAME Height Age
## 1      F 123   Mary    170  23
## 2      F 456 Olivia    160  19
## 3      M 234  James    180  NA
## 4      M 345  James    185  21
merge(X,Z, all.y = TRUE) # all subjects of second dataset are kept
##   GENDER  ID     NAME Height Age
## 1      F 123     Mary    170  23
## 2      F 456   Olivia    160  19
## 3      F 999 Jennifer     NA  99
## 4      M 345    James    185  21
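
Two further options of merge() may be worth knowing; the sketch below reuses X and Z. Setting all = TRUE keeps the subjects of both datasets, and the by argument restricts the matching to chosen columns (here only ID, an assumption made for illustration).

```r
X <- data.frame(GENDER=c("F","M","M","F"),
ID=c(123,234,345,456),
NAME=c("Mary","James","James","Olivia"),
Height=c(170,180,185,160))

Z <- data.frame(GENDER=c("M","F","F","F"),
ID=c(345,456,123,999),
NAME=c("James","Olivia","Mary","Jennifer"),
Age=c(21,19,23,99))

# Keep every subject from both datasets (missing values become NA)
full <- merge(X, Z, all = TRUE)
nrow(full)
## [1] 5

# Match on ID only: the other shared columns are kept twice,
# with suffixes .x (from X) and .y (from Z)
by_id <- merge(X, Z, by = "ID")
names(by_id)
```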

Data structure - Factors

A factor can be used to store character strings that take a limited set of distinct values (categories, called levels):

x <- factor(c("blue","green","blue","red","blue","green","green"))
levels(x)
## [1] "blue"  "green" "red"
class(x)
## [1] "factor"

Note: The function levels() shows you all the unique levels (categories) of a factor, which can be handy.
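
Internally, a factor stores each value as an integer code pointing into its levels. A brief sketch on the same factor:

```r
x <- factor(c("blue","green","blue","red","blue","green","green"))

nlevels(x)      # number of distinct levels
## [1] 3
as.integer(x)   # integer codes: 1 = "blue", 2 = "green", 3 = "red"
## [1] 1 2 1 3 1 2 2

counts <- table(x)   # frequency of each level
as.vector(counts)
## [1] 3 3 1
```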

Data structure - Summary

| Data structure | Instruction in R | Description |
|---|---|---|
| vector | c() | Sequence of elements of the same nature |
| matrix | matrix() | Two-dimensional table of elements of the same nature |
| array | array() | More general than a matrix; table with several dimensions |
| list | list() | Sequence of R structures of any (and possibly different) nature |
| data frame | data.frame() | Two-dimensional table. The columns can be of different natures, but must have the same length |
| factor | factor() | Vector of character strings associated with a modality table |
| dates | as.Date() | Vector of dates |
| time series | ts() | Values of a variable observed at several time points |

Importing data - text files - Page 64

| Function name | Description |
|---|---|
| read.table() | Best suited for data sets presented as tables, as is often the case in statistics |
| read.ftable() | Reads contingency tables |
| scan() | Much more flexible and powerful. Use this in all other cases |

Important note

Importing data - read.table() - Pages 64-66

| Argument name | Description |
|---|---|
| file=path/to/file | Location and name of the file to be read |
| header=TRUE | Indicates whether the variable names are given on the first line of the file |
| sep="" | The field separator character. Values on each line of the file are separated by this character (e.g. "" = whitespace, "," = comma, "\t" = tabulation) |
| dec="." | Decimal mark for numbers ("." or ",") |

data <- read.table(file="mydata.txt")
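
To see these arguments at work without relying on an existing file, the sketch below writes a small hypothetical dataset to a temporary file, using ";" as separator and "," as decimal mark, and reads it back. The file contents and the names tmp and dat are illustrative assumptions.

```r
# Create a temporary text file with a header line,
# ";" as field separator and "," as decimal mark
tmp <- tempfile(fileext = ".txt")
writeLines(c("name;height", "Ann;1,70", "Bob;1,82"), tmp)

# Tell read.table() about the header, the separator and the decimal mark
dat <- read.table(file = tmp, header = TRUE, sep = ";", dec = ",")
str(dat)
dat$height
## [1] 1.70 1.82

unlink(tmp)  # remove the temporary file
```

Without dec = ",", the height column would be read as character strings rather than numbers.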

Importing data - Exercise - Page 66

In the Ed ‘Exercise Space’ for this week we have placed a dataset in the .txt format called danishfire.txt. It contains claim amounts for three different categories of insurance losses, also with the dates at which the losses occurred.

Importing data - Solution - Page 66

# Visualise the first lines of the data using readLines
readLines("danishfire.txt", n=5)
# import data, note that the header argument is TRUE
danish_fire <- read.table(file="danishfire.txt", header=TRUE)
head(danish_fire)
class(danish_fire)
str(danish_fire)
# some numerical analysis
mean(danish_fire$building)
cor(danish_fire$building, danish_fire$contents)

Note that the function readLines() is useful because it lets you inspect the beginning of the data file before you import it, so you know the structure of the data and can choose the arguments of read.table() accordingly.

Importing in R

If you are using R installed on your computer (not in Ed), you can specify a file that is not in your working directory:

my.data <- read.table(file="C:/MyFolder/somedata.txt")

Note that on Windows you can use a double backslash \\ instead of /. To display your current working directory, use:

getwd()

Importing data - Standard formats (e.g. csv) - Page 67

With standard formats, most of the arguments of the function read.table() are fixed. There are some functions that are designed to be equivalent to read.table() with several arguments filled with pre-determined values:
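
In base R, these wrappers include read.csv() (comma-separated, "." as decimal mark), read.csv2() (semicolon-separated, "," as decimal mark), and read.delim() and read.delim2() for tab-separated files. The sketch below, on a small hypothetical file written to a temporary location, checks that read.csv() gives the same result as read.table() with the corresponding arguments.

```r
# Hypothetical comma-separated file
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,amount", "1,2.5", "2,3.75"), tmp)

a <- read.csv(tmp)                               # header=TRUE and sep="," by default
b <- read.table(tmp, header = TRUE, sep = ",")   # same settings, spelled out
all.equal(a, b)
## [1] TRUE

unlink(tmp)  # remove the temporary file
```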

Exporting data - Pages 72-73

Now try to export the danish fire data you have imported to R.

# Exporting data to a text file
write.table(danish_fire, file = "myfile.txt", sep = "\t")
# Exporting data to a csv file
write.csv(danish_fire, file = "myfile.csv")

Note that the commands above save your data into the working directory.

If you are using R on your computer (not in Ed!), you can specify a different path if you wish, by using file = "path/to/data/myfile.txt".
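
A quick way to check an export is to read the file back in. The sketch below does this with a small hypothetical data frame (df) instead of the danish fire data, writing to R's temporary directory. It sets row.names = FALSE because, by default, write.csv() also writes the row names as an extra first column.

```r
# Hypothetical data standing in for an imported dataset
df <- data.frame(id = 1:3, amount = c(2.5, 3.75, 10))

# Export without row names, then import again
out <- file.path(tempdir(), "myfile.csv")
write.csv(df, file = out, row.names = FALSE)
back <- read.csv(out)

all.equal(df, back)  # the round trip preserves the data
## [1] TRUE
```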