Chapter 5 Lists and Data Frames

5.1 Lists of Objects

Lists can be used to group any mix of R structures and objects. A single list could contain a numeric matrix, a logical array, a single character string, and a factor object.

To create a list, pass the members to the list function, separated by commas, much as you would pass values to c when building a vector.

foo <- list(matrix(data=1:4, nrow=2,ncol=2),c(T,F,T,T),"hello")
foo
[[1]]
     [,1] [,2]
[1,]    1    3
[2,]    2    4

[[2]]
[1]  TRUE FALSE  TRUE  TRUE

[[3]]
[1] "hello"

You can use the length function to check the number of components in a list.

length(x=foo)
[1] 3

You can retrieve a component from the list by placing its position inside double square brackets [[ ]] (also called a “member reference”).

foo[[1]]
     [,1] [,2]
[1,]    1    3
[2,]    2    4
foo[[3]]
[1] "hello"

Once retrieved, a member can be treated just like any ordinary object of its type.

foo[[1]]+5.5
     [,1] [,2]
[1,]  6.5  8.5
[2,]  7.5  9.5
foo[[1]][1,2]
[1] 3
foo[[1]][2,]
[1] 2 4
cat(foo[[3]], 'you!')
hello you!

Use the assignment operator ‘<-’ to overwrite a member of the list

# print out the current content
foo[[3]]
[1] "hello"
# overwrite it
foo[[3]] <- paste(foo[[3]],"you")
# print out the new content
foo[[3]]
[1] "hello you"

List slicing: when you want to reference multiple list members at once, use a single square bracket instead of the double brackets. The result is itself a list.

# this does not select members 2 and 3: double brackets index recursively, so it returns foo[[2]][[3]]
foo[[c(2,3)]]
[1] TRUE
# this will work because we used a single bracket
bar <- foo[c(2,3)]
bar
[[1]]
[1]  TRUE FALSE  TRUE  TRUE

[[2]]
[1] "hello you"
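
The difference between the two bracket styles can be checked directly. The following is a minimal sketch using a hypothetical list named items, built the same way as foo; the variable names slice and recursive are illustrative only.

```r
# Hypothetical list, built the same way as foo in the text
items <- list(matrix(1:4, nrow = 2, ncol = 2), c(TRUE, FALSE, TRUE, TRUE), "hello")

# Single brackets always return a list, even when slicing
slice <- items[c(2, 3)]
class(slice)    # "list"
length(slice)   # 2

# Double brackets return the member itself
class(items[[2]])   # "logical"

# A vector inside double brackets indexes recursively:
# items[[c(2, 3)]] is the third element of the second member
recursive <- items[[c(2, 3)]]
identical(recursive, items[[2]][[3]])   # TRUE
```

This is why the double-bracket call above prints a single TRUE rather than two members.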

We can assign names to the members of a list with the names function. Names are stored as an attribute of the object in R.

names(foo)<-c("mymatrix", "mylogicals","mystring")
foo
$mymatrix
     [,1] [,2]
[1,]    1    3
[2,]    2    4

$mylogicals
[1]  TRUE FALSE  TRUE  TRUE

$mystring
[1] "hello you"

We can now use the names to reference the members.

# this is the same as foo[[1]]
foo$mymatrix
     [,1] [,2]
[1,]    1    3
[2,]    2    4
# subsetting a member works the same way too
all(foo$mymatrix[,2]==foo[[1]][,2])
[1] TRUE
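
As a small illustration of names as an attribute, the sketch below uses a hypothetical list called pets (not part of the original example). It also shows that double brackets accept a name as a character string, which is handy when the name is held in a variable.

```r
# Hypothetical named list for illustration
pets <- list(name = "Rex", legs = 4, tricks = c("sit", "roll"))

# names() reads the names attribute
names(pets)                 # "name" "legs" "tricks"
attributes(pets)$names      # same values

# $ and [[ ]] with a character string reach the same member
by_dollar <- pets$tricks
by_string <- pets[["tricks"]]
identical(by_dollar, by_string)   # TRUE

# [[ ]] also accepts a name stored in a variable
wanted <- "legs"
pets[[wanted]]   # 4
```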

You can save a step by naming the members as you create the list.

baz <-list(tom=c(foo[[2]],T,T,T,F),dick="g'day mate",harry=foo$mymatrix*2)
baz
$tom
[1]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

$dick
[1] "g'day mate"

$harry
     [,1] [,2]
[1,]    2    6
[2,]    4    8

To rename these members:

names(baz)<-c("wilson","jane","john")
baz
$wilson
[1]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

$jane
[1] "g'day mate"

$john
     [,1] [,2]
[1,]    2    6
[2,]    4    8

You can add a component to an existing list by assigning to a new name with the ‘$’ operator.

# existing baz
baz
$wilson
[1]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

$jane
[1] "g'day mate"

$john
     [,1] [,2]
[1,]    2    6
[2,]    4    8
# add a new component
baz$jenny <-foo
baz
$wilson
[1]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

$jane
[1] "g'day mate"

$john
     [,1] [,2]
[1,]    2    6
[2,]    4    8

$jenny
$jenny$mymatrix
     [,1] [,2]
[1,]    1    3
[2,]    2    4

$jenny$mylogicals
[1]  TRUE FALSE  TRUE  TRUE

$jenny$mystring
[1] "hello you"

Now to access this nested list that was created:

# these commands produce the same results.
baz$jenny$mylogicals[1:3]
[1]  TRUE FALSE  TRUE
baz[[4]][[2]][1:3]
[1]  TRUE FALSE  TRUE
baz[[4]]$mylogicals[1:3]
[1]  TRUE FALSE  TRUE

5.2 Data Frames

Data frames are like lists, but with additional rules: the members must all be vectors of equal length.

5.2.1 Construction

Use the data.frame function to create a data frame from scratch. Each row in a data frame is a ‘record’, and each column is a ‘variable’.

# Create a data frame
mydata <- data.frame(person=c("Peter","Lois","Meg","Chris","Stewie"), age=c(42,40,17,14,1),sex=factor(c("M","F","F","M","M")))
# print the data frame
mydata
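
The claim that a data frame is a list of equal-length vectors can be verified with a short sketch. The data frame df and the variable result below are hypothetical names used only for illustration.

```r
# Hypothetical data frame for illustration
df <- data.frame(id = 1:3, score = c(9.5, 7.0, 8.25))

# A data frame is stored as a list of columns
is.list(df)      # TRUE
length(df)       # 2, the number of columns
df[["score"]]    # a column can be pulled out like a list member

# Columns whose lengths cannot be reconciled are rejected with an error;
# tryCatch turns that error into a plain string here
result <- tryCatch(
  data.frame(id = 1:3, score = c(9.5, 7.0)),
  error = function(e) "error"
)
result           # "error"
```

Note that data.frame will recycle a shorter vector when its length divides the longer one evenly (for example, a single value), so the error appears only when the lengths are incompatible.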

You can refer to portions of the data by specifying the row and column positions in single square brackets.

# get the name of the 2nd person 
mydata[2,1]
[1] Lois
Levels: Chris Lois Meg Peter Stewie
# get the age of the 2nd person
mydata[2,2]
[1] 40
# get the sex of the first three persons
mydata[1:3,3]
[1] M F F
Levels: F M
# get the entire 3rd and first columns
mydata[,c(3,1)]
# get the entire data frame with the columns in reverse order
mydata[,c(3:1)]

You can use the names of the vectors that were passed to data.frame to access variables even if you don’t know their column index positions.

mydata$age
[1] 42 40 17 14  1
mydata$person
[1] Peter  Lois   Meg    Chris  Stewie
Levels: Chris Lois Meg Peter Stewie
mydata$sex
[1] M F F M M
Levels: F M
# get only the age of the 2nd record
mydata$age[2]
[1] 40

To find the size of the data frame, use the nrow, ncol, and dim functions.

nrow(mydata)
[1] 5
ncol(mydata)
[1] 3
dim(mydata)
[1] 5 3

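Notice that person has been automatically converted into a factor. This is the default behavior in R versions before 4.0.0; from 4.0.0 onward, character vectors are kept as characters by default. To prevent the conversion in older versions, add the argument stringsAsFactors=FALSE.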

# see that there are levels
mydata$person
[1] Peter  Lois   Meg    Chris  Stewie
Levels: Chris Lois Meg Peter Stewie
# now recreate the data frame with stringsAsFactors=FALSE
mydata <- data.frame(person=c("Peter","Lois","Meg","Chris","Stewie"), age=c(42,40,17,14,1),sex=factor(c("M","F","F","M","M")),stringsAsFactors = FALSE)
# Now print the new data
mydata
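
Because the default for stringsAsFactors depends on the R version, a version-independent way to see its effect is to set it explicitly both ways. This is a minimal sketch; the names people, as_factor, and as_text are illustrative assumptions.

```r
people <- c("Peter", "Lois", "Meg")

# Explicitly request factors (the default before R 4.0.0)
as_factor <- data.frame(person = people, stringsAsFactors = TRUE)
is.factor(as_factor$person)       # TRUE
levels(as_factor$person)          # "Lois" "Meg" "Peter"

# Explicitly keep character strings (the default from R 4.0.0 onward)
as_text <- data.frame(person = people, stringsAsFactors = FALSE)
is.character(as_text$person)      # TRUE
```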

5.2.2 Adding Data Columns and Combining Data Frames

To add a record, create a new data frame with the same column names and then use the rbind function to append it as a row.

# print the existing data
mydata
# create new data frame
newrecord <-data.frame(person="Brian", age=7, sex=factor("M",levels=levels(mydata$sex)))
# now add it to mydata
mydata <- rbind(mydata,newrecord)
mydata
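
The same pattern in a self-contained form, using hypothetical data frames staff, newhire, and reordered. The sketch also shows that rbind on data frames matches columns by name rather than by position.

```r
# Hypothetical data for illustration
staff <- data.frame(name = c("Ann", "Bob"), age = c(31, 45), stringsAsFactors = FALSE)
newhire <- data.frame(name = "Cy", age = 28, stringsAsFactors = FALSE)

# Append one record
combined <- rbind(staff, newhire)
nrow(combined)        # 3
combined$name         # "Ann" "Bob" "Cy"

# Columns are matched by name, so their order in the new record may differ
reordered <- data.frame(age = 52, name = "Di", stringsAsFactors = FALSE)
combined <- rbind(combined, reordered)
combined$age          # 31 45 28 52
```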

To add a variable, create a vector with one value per record and attach it as a new column using cbind.

# print existing mydata
mydata
# create new column values
funny <-c("High","High","Low","Med","High","Med")
funny <- factor(x=funny, levels=c("Low","Med","High"))
# now let us add the column
mydata <-cbind(mydata,funny)
# now see if the funny column is added
mydata

You can also use $ with a new name to create additional columns. In this example, we add a column that lists the age in months rather than years.

mydata$age.num <- mydata$age*12
mydata
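
Both ways of adding a column can be compared side by side. The sketch below uses a hypothetical data frame scores; the column names grade and points are assumptions for illustration.

```r
# Hypothetical data for illustration
scores <- data.frame(name = c("Ann", "Bob", "Cy"), stringsAsFactors = FALSE)

# cbind names the new column after the variable passed in
grade <- factor(c("High", "Low", "Med"), levels = c("Low", "Med", "High"))
scores <- cbind(scores, grade)
names(scores)             # "name" "grade"

# $ assignment creates a column with whatever name follows the $
scores$points <- c(90, 55, 70)
names(scores)             # "name" "grade" "points"

# The factor keeps the level order that was specified
levels(scores$grade)      # "Low" "Med" "High"
```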

To list only the males in the data frame, use a logical vector as the row index.

# use logical comparison
mydata$sex=="M"
[1]  TRUE FALSE FALSE  TRUE  TRUE  TRUE
# use this comparison inside the data frame
mydata[mydata$sex=="M",]
# we can output the records without the sex column by using negative sign
mydata[mydata$sex=="M",-3]

Or select the columns with a character vector of variable names instead.

# output the same records, selecting the columns to keep by name
mydata[mydata$sex=="M",c("person","age","funny","age.num")]

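Combine conditions with the element-wise logical operators: | for OR and & for AND. The double forms || and && are meant for single values, not for vectors of row conditions.

mydata[mydata$sex=="M" | (mydata$funny=="Med" & mydata$age>10),c("person","age","funny","age.num")]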
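
Row filtering needs one logical value per record, which is what the element-wise operators & and | produce. The following sketch uses a hypothetical data frame pets to show how the conditions combine; the names old_males and keep are illustrative.

```r
# Hypothetical data for illustration
pets <- data.frame(
  name = c("Rex", "Tom", "Ivy", "Max"),
  sex  = c("M", "M", "F", "M"),
  age  = c(3, 12, 15, 8),
  stringsAsFactors = FALSE
)

# & compares element by element, giving one logical per row
old_males <- pets$sex == "M" & pets$age > 10
old_males                      # FALSE TRUE FALSE FALSE
pets[old_males, "name"]        # "Tom"

# Parentheses make the grouping explicit; & binds tighter than |
keep <- pets$sex == "F" | (pets$sex == "M" & pets$age < 5)
pets[keep, "name"]             # "Rex" "Ivy"
```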