Lists can be used to group any mix of R structures and objects. A single list could contain a numeric matrix, a logical array, a single character string, and a factor object.
To create a list, call the list function and supply the members separated by commas, much as you would supply elements to c when building a vector.
foo <- list(matrix(data=1:4, nrow=2,ncol=2),c(T,F,T,T),"hello")
foo
[[1]]
[,1] [,2]
[1,] 1 3
[2,] 2 4
[[2]]
[1] TRUE FALSE TRUE TRUE
[[3]]
[1] "hello"
You can use the length function to check the number of components in a list
length(x=foo)
[1] 3
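As a rough sketch of the same idea, the snippet below builds a list with four different kinds of members and inspects it. The name inventory and its contents are made up for illustration and are not part of the example above.

```r
# Hypothetical list holding four different kinds of objects
inventory <- list(
  matrix(1:6, nrow = 2),       # numeric matrix
  c(TRUE, FALSE),              # logical vector
  "widget",                    # single character string
  factor(c("a", "b", "a"))     # factor
)

# Number of top-level members in the list
n_members <- length(inventory)

# First class name of each member
member_classes <- sapply(inventory, function(m) class(m)[1])

print(n_members)
print(member_classes)
```

Note that length counts only the top-level members, not the elements inside each member.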
You can retrieve a component from the list by putting its index in double square brackets [[ ]] (also called “Member Reference”)
foo[[1]]
[,1] [,2]
[1,] 1 3
[2,] 2 4
foo[[3]]
[1] "hello"
You can treat a retrieved member just like any ordinary object
foo[[1]]+5.5
[,1] [,2]
[1,] 6.5 8.5
[2,] 7.5 9.5
foo[[1]][1,2]
[1] 3
foo[[1]][2,]
[1] 2 4
cat(foo[[3]], 'you!')
hello you!
Use the assignment operator ‘<-’ to overwrite a member of the list
# print out the current content
foo[[3]]
[1] "hello"
# overwrite it
foo[[3]] <- paste(foo[[3]],"you")
# print out the new content
foo[[3]]
[1] "hello you"
List slicing: when you want to reference multiple list items at once, use a single square bracket instead of the double brackets.
# this does not return members 2 and 3: inside double brackets, c(2,3) is applied recursively, so it means foo[[2]][[3]]
foo[[c(2,3)]]
[1] TRUE
# this will work because we used a single bracket
bar <- foo[c(2,3)]
bar
[[1]]
[1] TRUE FALSE TRUE TRUE
[[2]]
[1] "hello you"
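The difference between the two bracket styles can be checked directly. This is a minimal sketch using a made-up list called items; it shows that double brackets return the member itself, while a single bracket always returns a list.

```r
# Hypothetical list, similar in shape to foo
items <- list(matrix(1:4, nrow = 2), c(TRUE, FALSE, TRUE, TRUE), "hello")

one_member <- items[[2]]        # the logical vector itself
one_slice  <- items[2]          # a list of length 1 that holds the vector
two_slice  <- items[c(2, 3)]    # a list of length 2

# A vector index inside [[ ]] is applied recursively:
# items[[c(2, 3)]] is the same as items[[2]][[3]]
recursive <- items[[c(2, 3)]]

print(class(one_member))
print(class(one_slice))
print(length(two_slice))
print(recursive)
```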
We can provide field names to the list. Names are attributes in R.
names(foo)<-c("mymatrix", "mylogicals","mystring")
foo
$mymatrix
[,1] [,2]
[1,] 1 3
[2,] 2 4
$mylogicals
[1] TRUE FALSE TRUE TRUE
$mystring
[1] "hello you"
We can now use the names to reference the members.
# this is the same as foo[[1]]
foo$mymatrix
[,1] [,2]
[1,] 1 3
[2,] 2 4
# subsetting members works the same way too
all(foo$mymatrix[,2]==foo[[1]][,2])
[1] TRUE
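Names can also be used inside double brackets as a character string, which helps when the name is stored in a variable. The sketch below uses a hypothetical list called settings to illustrate this.

```r
# Hypothetical list; names are attached after creation
settings <- list(10, "fast", c(TRUE, FALSE))
names(settings) <- c("size", "mode", "flags")

by_dollar <- settings$mode        # access with $
by_string <- settings[["mode"]]   # access with a quoted name

# The name can come from a variable when using [[ ]]
key <- "size"
by_variable <- settings[[key]]

print(names(settings))
print(identical(by_dollar, by_string))
print(by_variable)
```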
You can save a step by naming the members as you create the list
baz <-list(tom=c(foo[[2]],T,T,T,F),dick="g'day mate",harry=foo$mymatrix*2)
baz
$tom
[1] TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE
$dick
[1] "g'day mate"
$harry
[,1] [,2]
[1,] 2 6
[2,] 4 8
To rename these members:
names(baz)<-c("wilson","jane","john")
baz
$wilson
[1] TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE
$jane
[1] "g'day mate"
$john
[,1] [,2]
[1,] 2 6
[2,] 4 8
You can add a component to the list by using the ‘$’ symbol
# existing baz
baz
$wilson
[1] TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE
$jane
[1] "g'day mate"
$john
[,1] [,2]
[1,] 2 6
[2,] 4 8
# add a new component
baz$jenny <-foo
baz
$wilson
[1] TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE
$jane
[1] "g'day mate"
$john
[,1] [,2]
[1,] 2 6
[2,] 4 8
$jenny
$jenny$mymatrix
[,1] [,2]
[1,] 1 3
[2,] 2 4
$jenny$mylogicals
[1] TRUE FALSE TRUE TRUE
$jenny$mystring
[1] "hello you"
Now to access this nested list that was created:
# these commands produce the same results.
baz$jenny$mylogicals[1:3]
[1] TRUE FALSE TRUE
baz[[4]][[2]][1:3]
[1] TRUE FALSE TRUE
baz[[4]]$mylogicals[1:3]
[1] TRUE FALSE TRUE
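The equivalence of these access styles can be confirmed with identical. The following is a small sketch with made-up names (outer, inner); it also uses str, which prints a compact summary that is often easier to read than the full nested output.

```r
# Hypothetical inner list that will be nested inside another list
inner <- list(flags = c(TRUE, FALSE, TRUE, TRUE), label = "hello you")

outer <- list(first = 1:3)
outer$nested <- inner            # add the inner list as a new member

by_names   <- outer$nested$flags[1:3]
by_indexes <- outer[[2]][[1]][1:3]
by_mixture <- outer[[2]]$flags[1:3]

# All three routes reach the same elements
all_same <- identical(by_names, by_indexes) && identical(by_indexes, by_mixture)
print(all_same)

# Compact overview of the nested structure
str(outer)
```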
Data frames are like lists, but with extra rules: the members must all be vectors of equal length.
Use the data.frame function to create a data frame from scratch. Each row in a data frame is a ‘record’. Each column is a ‘variable’.
# Create a data frame
mydata <- data.frame(person=c("Peter","Lois","Meg","Chris","Stewie"), age=c(42,40,17,14,1),sex=factor(c("M","F","F","M","M")))
# print the data frame
mydata
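To illustrate the link between lists and data frames, here is a minimal sketch with a hypothetical data frame called people. It checks that a data frame is stored as a list of columns and that columns of mismatched length are rejected.

```r
# Hypothetical data frame; stringsAsFactors is set explicitly so the
# result does not depend on the R version
people <- data.frame(
  name = c("Ann", "Bob", "Cy"),
  age  = c(31, 45, 28),
  stringsAsFactors = FALSE
)

print(is.list(people))   # a data frame is a list of columns
print(length(people))    # length counts the columns

# Columns with 3 and 2 values cannot be combined; tryCatch captures the error
mismatch <- tryCatch(
  data.frame(name = c("Ann", "Bob", "Cy"), age = c(31, 45)),
  error = function(e) "error"
)
print(mismatch)
```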
You can refer to portions of the data by specifying the row and column
# get the name of the 2nd person
mydata[2,1]
[1] Lois
Levels: Chris Lois Meg Peter Stewie
# get the age of the 2nd person
mydata[2,2]
[1] 40
# get the sex of the first three persons
mydata[1:3,3]
[1] M F F
Levels: F M
# get the entire 3rd and first columns
mydata[,c(3,1)]
# get the entire data frame with the columns in reverse order
mydata[,c(3:1)]
You can use the names of the vectors that were passed to data.frame to access variables even if you don’t know their column index positions.
mydata$age
[1] 42 40 17 14 1
mydata$person
[1] Peter Lois Meg Chris Stewie
Levels: Chris Lois Meg Peter Stewie
mydata$sex
[1] M F F M M
Levels: F M
# get only the age of the 2nd record
mydata$age[2]
[1] 40
To find the size of the data frame, use the nrow, ncol, and dim functions
nrow(mydata)
[1] 5
ncol(mydata)
[1] 3
dim(mydata)
[1] 5 3
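As a quick sketch of these functions together with the two access styles, the snippet below uses a made-up data frame called scores.

```r
# Hypothetical data frame with 4 records and 2 variables
scores <- data.frame(id = 1:4, score = c(90, 72, 85, 60))

n_rows <- nrow(scores)
n_cols <- ncol(scores)
size   <- dim(scores)            # rows first, then columns

# The same cell reached by position and by name
by_position <- scores[2, 2]
by_name     <- scores$score[2]

print(size)
print(by_position == by_name)
```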
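Notice that person has been automatically converted into a factor. This is the default in versions of R before 4.0.0; from 4.0.0 onward, strings are kept as characters unless you ask otherwise. To prevent the conversion, add the argument stringsAsFactors=FALSE
# see that there are levels
mydata$person
[1] Peter Lois Meg Chris Stewie
Levels: Chris Lois Meg Peter Stewie
# now recreate the data frame with stringsAsFactors=FALSE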
mydata <- data.frame(person=c("Peter","Lois","Meg","Chris","Stewie"), age=c(42,40,17,14,1),sex=factor(c("M","F","F","M","M")),stringsAsFactors = FALSE)
# Now print the new data
mydata
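The effect of this argument can be seen by setting it both ways. This is a small sketch with made-up variable names; setting the argument explicitly means the result should not depend on which default your R version uses.

```r
names_vec <- c("Peter", "Lois", "Meg")

# Same input, two settings of stringsAsFactors
as_factor <- data.frame(person = names_vec, stringsAsFactors = TRUE)
as_text   <- data.frame(person = names_vec, stringsAsFactors = FALSE)

print(class(as_factor$person))   # factor column
print(levels(as_factor$person))  # the distinct values, sorted
print(class(as_text$person))     # plain character column
```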
To add a record, create a new data frame and then use the rbind function to append it
# print the existing data
mydata
# create new data frame
newrecord <-data.frame(person="Brian", age=7, sex=factor("M",levels=levels(mydata$sex)))
# now add it to mydata
mydata <- rbind(mydata,newrecord)
mydata
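A minimal sketch of the same append step, using a hypothetical data frame called staff. The new record must have the same column names as the existing data frame for rbind to line them up.

```r
# Hypothetical existing data
staff <- data.frame(
  person = c("Ann", "Bob"),
  age    = c(30, 41),
  stringsAsFactors = FALSE
)

# New record with matching column names
new_row <- data.frame(person = "Cy", age = 25, stringsAsFactors = FALSE)

# Append the record as a new last row
staff <- rbind(staff, new_row)

print(nrow(staff))
print(staff$person)
```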
Create a new column using cbind
# print existing mydata
mydata
# create new column values
funny <-c("High","High","Low","Med","High","Med")
funny <- factor(x=funny, levels=c("Low","Med","High"))
# now let us add the column
mydata <-cbind(mydata,funny)
# now see if the funny column is added
mydata
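The column-adding step can be sketched on a smaller, made-up data frame. The vector passed to cbind needs one value per existing row, and the explicit levels argument fixes the order of the factor levels.

```r
# Hypothetical data frame with three records
staff <- data.frame(person = c("Ann", "Bob", "Cy"), stringsAsFactors = FALSE)

# One rating per row, with the level order set explicitly
rating <- factor(c("High", "Low", "Med"), levels = c("Low", "Med", "High"))

# The new column takes its name from the variable passed to cbind
staff <- cbind(staff, rating)

print(names(staff))
print(levels(staff$rating))
```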
You can also use $ to create an additional column by assigning to a name that does not exist yet. In this example, the new column holds the age in months rather than years.
mydata$age.num <- mydata$age*12
mydata
To list only the males in the data frame:
# use logical comparison
mydata$sex=="M"
[1] TRUE FALSE FALSE TRUE TRUE TRUE
# use this comparison inside the data frame
mydata[mydata$sex=="M",]
# we can output the records without the sex column by using negative sign
mydata[mydata$sex=="M",-3]
Or use a character vector of the variable names instead.
# output the same records by naming the columns to keep
mydata[mydata$sex=="M",c("person","age","funny","age.num")]
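The same filtering pattern is sketched below on a hypothetical data frame called pets. A logical vector in the row position picks the records, and the column position accepts either a negative index or a vector of names.

```r
# Hypothetical data
pets <- data.frame(
  name = c("Rex", "Tom", "Ivy", "Max"),
  kind = c("dog", "cat", "cat", "dog"),
  age  = c(3, 5, 2, 8),
  stringsAsFactors = FALSE
)

is_dog <- pets$kind == "dog"                 # one TRUE/FALSE per record

dogs         <- pets[is_dog, ]               # all columns
dogs_no_kind <- pets[is_dog, -2]             # drop column 2 by position
dogs_named   <- pets[is_dog, c("name", "age")]  # keep columns by name

print(dogs_named)
```

Selecting by name tends to be safer than a negative index, because it keeps working if the column order changes.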
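Combine conditions with the logical operators | (OR) and & (AND). Use the single-character forms here because they compare element by element; && and || are meant for single values, such as in an if condition. Because & is evaluated before |, parentheses make the grouping explicit.
mydata[mydata$sex=="M" | (mydata$funny=="Med" & mydata$age>10),c("person","age","funny","age.num")]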
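How the grouping changes the result can be seen on plain vectors. This is a minimal sketch with made-up values, chosen so that the two groupings give different answers.

```r
# Hypothetical values for four records
sex   <- c("M", "F", "F", "M")
funny <- c("High", "Med", "Low", "Med")
age   <- c(8, 40, 9, 14)

# & is evaluated before |, so these two lines mean the same thing
implicit <- sex == "M" | funny == "Med" & age > 10
explicit <- sex == "M" | (funny == "Med" & age > 10)

# Moving the parentheses changes which records are selected
grouped <- (sex == "M" | funny == "Med") & age > 10

print(implicit)
print(grouped)
```

Here the first record is male but only 8 years old, so it passes the first condition and fails the regrouped one.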