```r
# install.packages('tinytex')
# tinytex::install_tinytex()  # install TinyTeX
```
Data Frames
Data frames are used to store tabular data in R. They are an
important type of object in R and are used in a variety of statistical
modeling applications. Hadley Wickham’s package dplyr has an
optimized set of functions designed to work efficiently with data
frames.
Data frames are represented as a special type of list where every
element of the list has to have the same length. Each element of the
list can be thought of as a column and the length of each element of the
list is the number of rows.
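You can see this list structure by asking R directly. The sketch below uses a small data frame, df. Its column names and values are made up for illustration.

```r
# A small example data frame (names and values are illustrative)
df <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.0))

typeof(df)        # underlying storage type is "list"
is.list(df)       # TRUE: a data frame is a list
length(df)        # number of list elements = number of columns
lengths(df)       # every column has the same length = number of rows

# Stripping the class shows the plain list (plus its row.names attribute)
str(unclass(df))
```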
Unlike matrices, data frames can store different classes of objects
in each column. In a matrix, every element must be of the same class
(e.g., all integers or all numeric).
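The sketch below shows the contrast. It builds a data frame from a numeric vector and a character vector, then binds the same vectors into a matrix. The vectors are illustrative. stringsAsFactors = FALSE is passed explicitly so the character column stays character in older R versions too.

```r
num <- c(1.5, 2.5, 3.5)
chr <- c("a", "b", "c")

# A data frame keeps a separate class for each column
df <- data.frame(num = num, chr = chr, stringsAsFactors = FALSE)
sapply(df, class)       # num: "numeric", chr: "character"

# A matrix holds a single type, so cbind() coerces everything to character
m <- cbind(num, chr)
class(m[, "num"])       # "character"
```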
In addition to column names, which give the names of the variables or
predictors, data frames have a special attribute called
row.names, which stores a label for each row of the
data frame.
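The sketch below illustrates the row.names attribute. It shows the default integer row names, then replaces them with character labels. The labels and values are made up for the example.

```r
df <- data.frame(height = c(150, 165, 180))

row.names(df)                 # default labels, returned as "1" "2" "3"
attr(df, "row.names")         # the attribute itself holds integers 1 2 3

# Replace the defaults with descriptive labels
row.names(df) <- c("ann", "bob", "cy")
df["bob", "height"]           # look a row up by its name: 165
```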
Data frames are usually created by reading in a dataset using the
read.table() or read.csv() functions. However, data
frames can also be created explicitly with the data.frame()
function, or they can be coerced from other types of objects, such as
lists.
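The three routes above fit in a few lines. To keep the example self-contained, read.csv() reads from an inline string through its text argument instead of a file. The column names and values are illustrative. The sketch assumes R 4.0 or later, where strings stay character by default.

```r
# 1. Reading a dataset (inline text stands in for a CSV file)
csv_df <- read.csv(text = "city,pop\nBoston,650\nLondon,8900")

# 2. Creating a data frame explicitly
explicit_df <- data.frame(city = c("Boston", "London"), pop = c(650L, 8900L))

# 3. Coercing a list whose elements all have the same length
lst <- list(city = c("Boston", "London"), pop = c(650L, 8900L))
coerced_df <- as.data.frame(lst)

str(csv_df)
```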
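Data frames can be converted to a matrix by calling
data.matrix(). While it might seem that the
as.matrix() function should be used to coerce a data frame
to a matrix, what you want is almost always the result of
data.matrix(). The two differ as follows:

- as.matrix() returns a character matrix as soon as any column is character or factor.
- data.matrix() always returns a numeric matrix. Logical columns become 0/1, and factor and character columns become integer codes.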
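The difference shows up once a data frame mixes column types. The sketch below uses a two-column data frame with one numeric column and one factor column. The names and values are illustrative.

```r
df <- data.frame(n = c(10, 20), grp = factor(c("lo", "hi")))

# as.matrix(): a factor column turns the whole result into character
as.matrix(df)
typeof(as.matrix(df))     # "character"

# data.matrix(): every column becomes numeric; the factor becomes its codes
data.matrix(df)
typeof(data.matrix(df))   # "double"
```

The factor levels sort alphabetically ("hi", "lo"). As a result, the grp column of data.matrix(df) holds the codes 2 and 1.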
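```r
x <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))
x
##   foo   bar
## 1   1  TRUE
## 2   2  TRUE
## 3   3 FALSE
## 4   4 FALSE
nrow(x)   # number of rows
## [1] 4
ncol(x)   # number of columns
## [1] 2
dim(x)    # rows and columns together
## [1] 4 2
attributes(x)
## $names
## [1] "foo" "bar"
##
## $class
## [1] "data.frame"
##
## $row.names
## [1] 1 2 3 4
str(x)    # structure of x
## 'data.frame':	4 obs. of  2 variables:
##  $ foo: int  1 2 3 4
##  $ bar: logi  TRUE TRUE FALSE FALSE
```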
Names
R objects can have names, which is very useful for writing readable
code and self-describing objects. Here is an example of assigning names
to an integer vector.
```r
x <- 1:3
names(x)
## NULL
names(x) <- c("New York", "Seattle", "Los Angeles")
x
##    New York     Seattle Los Angeles
##           1           2           3
names(x)
## [1] "New York"    "Seattle"     "Los Angeles"
```
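Names also make subsetting self-describing, because you can pull elements out by name instead of by position. The sketch below recreates the x vector with c() so the block runs on its own.

```r
x <- c("New York" = 1L, "Seattle" = 2L, "Los Angeles" = 3L)

x["Seattle"]                      # subset by name; the name is kept
x[c("Los Angeles", "New York")]   # several names at once, in any order
unname(x)                         # drop the names: 1 2 3
```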
Lists can also have names, which is often very useful.
```r
x <- list("Los Angeles" = 1, Boston = 2, London = 3)
x
## $`Los Angeles`
## [1] 1
##
## $Boston
## [1] 2
##
## $London
## [1] 3
names(x)
## [1] "Los Angeles" "Boston"      "London"
x[[1]]
## [1] 1
x$`Los Angeles`   # backticks are needed for a name containing a space
## [1] 1
```
Matrices can have both column and row names.
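```r
m <- matrix(1:4, nrow = 2)
m
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
names(m)   # a matrix has no names attribute by default
## NULL
dimnames(m) <- list(c("a", "b"), c("c", "d"))   # row names, then column names
m
##   c d
## a 1 3
## b 2 4
colnames(m)
## [1] "c" "d"
rownames(m)
## [1] "a" "b"
```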
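You can also supply row and column names when you create the matrix, through the dimnames argument of matrix(). It takes the same two-element list as above: row names first, then column names.

```r
m <- matrix(1:4, nrow = 2,
            dimnames = list(c("a", "b"), c("c", "d")))

dimnames(m)      # list of row names and column names
m["b", "d"]      # index by row and column name: 4
```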
Column names and row names can be set separately using the
colnames() and rownames() functions.
```r
colnames(m) <- c("h", "f")
rownames(m) <- c("x", "z")
m
##   h f
## x 1 3
## z 2 4
```
Note that for data frames, there is a separate function for setting
the row names: the row.names() function. Also, data frames
do not have a separate column-names attribute; they just have names
(like lists). So to set the column names of a data frame, just use the
names() function. Yes, I know it’s confusing. Here’s a quick summary:
| Object     | Set column names | Set row names |
|------------|------------------|---------------|
| data frame | names()          | row.names()   |
| matrix     | colnames()       | rownames()    |
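You can check the summary table directly. In the sketch below, names() and row.names() label a data frame, while colnames() and rownames() label a matrix. The object names and labels are illustrative. The last line shows that colnames() can also read a data frame's names.

```r
# Data frame: names() for columns, row.names() for rows
df <- data.frame(a = 1:2, b = 3:4)
names(df) <- c("h", "f")
row.names(df) <- c("x", "z")

# Matrix: colnames() for columns, rownames() for rows
m <- matrix(1:4, nrow = 2)
colnames(m) <- c("h", "f")
rownames(m) <- c("x", "z")

df
m
identical(colnames(df), names(df))   # TRUE: colnames() reads data frame names too
```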