A tibble is a modern reimagining of the data.frame. “Tibbles are data.frames that are lazy and surly: they do less (i.e. they don’t change variable names or types, and don’t do partial matching) and complain more (e.g. when a variable does not exist)”. As a result, problems tend to surface earlier in the analysis, which leads to cleaner code. The tibble package is part of the tidyverse collection of packages and is actively maintained. There are many cheatsheets for this package, one of which can be found here on the tidyverse website.
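The "lazy and surly" behaviour can be seen by building the same small table both ways. This is a minimal sketch; the column names `my var` and `value` are made up for illustration.

```r
library(tibble)

# The same two columns as a base data.frame and as a tibble.
df <- data.frame(`my var` = 1:3, value = c("a", "b", "c"))
tb <- tibble(`my var` = 1:3, value = c("a", "b", "c"))

names(df)  # data.frame() rewrites the first name to "my.var"
names(tb)  # tibble() keeps "my var" unchanged

df$val     # partial matching silently returns the value column
tb$val     # NULL, with a warning about an unknown column
```

The data.frame quietly guesses what was meant, while the tibble refuses and warns, which makes typos in column names easier to catch.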
The textbook R for Data Science has a chapter on tibbles. It is a great resource for reference.
We begin by loading the tidyverse (or just the tibble package). We will work with the iris data set, which is built into R.
library(tidyverse)
To create a tibble from an existing object, we use as_tibble(). This command will coerce a data.frame, list, matrix or table (among other structures) to a tibble.
as_tibble(iris)
## # A tibble: 150 x 5
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## <dbl> <dbl> <dbl> <dbl> <fct>
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 10 4.9 3.1 1.5 0.1 setosa
## # ... with 140 more rows
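The same function handles the other structures mentioned above. The following sketch coerces a named list and a matrix with column names; the object names `from_list`, `m` and `from_matrix` are chosen only for this example.

```r
library(tibble)

# A named list becomes one column per list element.
from_list <- as_tibble(list(id = 1:3, label = c("a", "b", "c")))

# A matrix with column names becomes one column per matrix column.
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("u", "v")))
from_matrix <- as_tibble(m)

from_list
from_matrix
```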
The tibble() command can also create a tibble. This one defines \(x\), \(y\) and \(z\). The \(x\) column holds the integers 1 through 5, \(y\) is 5 for every value of \(x\) (the single value is recycled down the column), and \(z\) is \((x+y)/2\).
dat <- tibble(x = 1:5, y = 5, z = (x+y)/2)
dat
## # A tibble: 5 x 3
## x y z
## <int> <dbl> <dbl>
## 1 1 5 3
## 2 2 5 3.5
## 3 3 5 4
## 4 4 5 4.5
## 5 5 5 5
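Two details of tibble() are at work in this example: columns are evaluated in order, so later columns can refer to earlier ones, and only length-one vectors are recycled. A small sketch of both, where the error message string is our own placeholder rather than the text tibble prints:

```r
library(tibble)

# Columns are built in order, so z can use x and y defined just before it.
dat <- tibble(x = 1:5, y = 5, z = (x + y) / 2)

# Only length-one vectors are recycled; a length-two column next to a
# length-four column is rejected with an error instead of being repeated.
result <- tryCatch(
  tibble(x = 1:4, y = 1:2),
  error = function(e) "error: incompatible column sizes"
)
result
```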
Using tribble(), we can specify a table row-wise (tribble is short for transposed tibble). Note that each column name is written as a formula, that is, prefixed with a tilde (~).
dat2 <- tribble(~x,~y,~z,
"a",1,2,
"b",3,NA,
"c",2.5,15)
dat2
## # A tibble: 3 x 3
## x y z
## <chr> <dbl> <dbl>
## 1 a 1 2
## 2 b 3 NA
## 3 c 2.5 15
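To make the row-wise layout concrete, the sketch below builds the same table with tribble() and with the column-wise tibble(), then compares them. The names `row_wise` and `col_wise` are only for this illustration.

```r
library(tibble)

# Row-wise: each line of the call is one row of the table.
row_wise <- tribble(
  ~x,  ~y,  ~z,
  "a", 1,   2,
  "b", 3,   NA,
  "c", 2.5, 15
)

# Column-wise: each argument is one column of the table.
col_wise <- tibble(
  x = c("a", "b", "c"),
  y = c(1, 3, 2.5),
  z = c(2, NA, 15)
)

# Compare the two tables column by column.
all.equal(as.data.frame(row_wise), as.data.frame(col_wise))
```

tribble() tends to be easier to read for small hand-typed tables, while tibble() is the natural choice when the columns already exist as vectors.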
To create a new tibble by combining multiple vectors, use tibble(). (Older code often uses data_frame() for this, but that function is deprecated in favour of tibble().)
# Create
football_data <- tibble(
name = c("Aaron Rogers", "Tom Brady", "Sony Michelle", "JJ Watt"),
age = c(37, 42, 25, 35),
height = c(180, 170, 165, 185),
qb = c(TRUE, TRUE, FALSE, FALSE)
)
# Print
football_data
## # A tibble: 4 x 4
## name age height qb
## <chr> <dbl> <dbl> <lgl>
## 1 Aaron Rogers 37 180 TRUE
## 2 Tom Brady 42 170 TRUE
## 3 Sony Michelle 25 165 FALSE
## 4 JJ Watt 35 185 FALSE
We can take subsets of the data (rows or columns) in the normal way using square brackets.
dat3 <- tibble(x = 1:5, y = 6:10, z = 11:15)
dat3
## # A tibble: 5 x 3
## x y z
## <int> <int> <int>
## 1 1 6 11
## 2 2 7 12
## 3 3 8 13
## 4 4 9 14
## 5 5 10 15
We can access a particular element of the table in row \(m\) and column \(n\) by appending square brackets containing \(m\) and \(n\) to the name of the data (i.e., dat3[m, n]). Here, we reference row 1, column 3.
dat3[1,3]
## # A tibble: 1 x 1
## z
## <int>
## 1 11
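Note that the result above is still a 1 x 1 tibble rather than a bare number. If the value itself is needed, one option is to extract the column first with `$` or `[[` and then index into it. A short sketch, which also shows the base data.frame behaviour for contrast:

```r
library(tibble)

dat3 <- tibble(x = 1:5, y = 6:10, z = 11:15)

cell_tbl  <- dat3[1, 3]       # single brackets keep the tibble structure
cell_val  <- dat3$z[1]        # extract column z, then take its first element
cell_val2 <- dat3[["z"]][1]   # same, using double brackets

# A base data.frame drops a single cell to a bare value.
cell_df <- as.data.frame(dat3)[1, 3]

cell_tbl
cell_val
```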
To reference an entire row, leave the column entry blank; to reference an entire column, leave the row entry blank. Here is row 2 of dat, the tibble created earlier with tibble().
dat[2,]
## # A tibble: 1 x 3
## x y z
## <int> <dbl> <dbl>
## 1 2 5 3.5
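The column case works the same way with the row entry left blank. A minimal sketch using dat3 (redefined here so the block runs on its own):

```r
library(tibble)

dat3 <- tibble(x = 1:5, y = 6:10, z = 11:15)

row2 <- dat3[2, ]   # column entry blank: all columns of row 2
col2 <- dat3[, 2]   # row entry blank: all rows of column 2

row2
col2
```

Both results are tibbles; selecting a single column with single brackets does not collapse it to a vector.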
Multiple entries can be referenced using the colon. For example, we reference rows 2 and 3 and columns 1 and 2.
dat3[2:3,1:2]
## # A tibble: 2 x 2
## x y
## <int> <int>
## 1 2 7
## 2 3 8
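Square brackets also accept column names and logical conditions, which is often clearer than positions when the rows or columns of interest are not adjacent. A sketch, with `subset_by_name` as an illustrative name:

```r
library(tibble)

dat3 <- tibble(x = 1:5, y = 6:10, z = 11:15)

# Keep the rows where x is greater than 2, and only the columns x and z.
subset_by_name <- dat3[dat3$x > 2, c("x", "z")]
subset_by_name
```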
Camm, Jeffrey D. Business Analytics. Third edition, Cengage, 2019.
“Managing Tibbles · UC Business Analytics R Programming Guide.” Accessed May 4, 2021. Available here.
“R for Data Science.” Accessed April 20, 2021. Available here.
Salunkhe, Lalit. “Tibbles in R Programming | Analytics Steps.” Accessed May 4, 2021. Available here.
“Tibble: Build a Data Frame in Tibble: Simple Data Frames.” Accessed May 4, 2021. Available here.