The Idea

A tibble is a modern reimagining of the data.frame that keeps the behaviors that have proven useful and drops those that tend to cause trouble. “Tibbles are data.frames that are lazy and surly: they do less (i.e. they don’t change variable names or types, and don’t do partial matching) and complain more (e.g. when a variable does not exist)”. The result is that problems surface earlier in the analysis, which typically leads to cleaner code. Again, the tibble package is part of the Tidyverse group of libraries and is actively maintained. As always, there are many cheatsheets for this package, one of which can be found on the tidyverse website.
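As a minimal sketch of what "lazy and surly" means in practice, the comparison below builds the same small table as a data.frame and as a tibble. The object names df and tb and the column names are made up for illustration.

```r
library(tibble)

# The same two columns, once as a base data.frame and once as a tibble.
df <- data.frame(`my var` = 1:3, label = c("a", "b", "c"))
tb <- tibble(`my var` = 1:3, label = c("a", "b", "c"))

# data.frame() rewrites the non-syntactic name; tibble() keeps it as typed.
names(df)
names(tb)

# data.frame allows partial matching with $, so "lab" silently finds "label".
df$lab

# The tibble does not partially match: it returns NULL and warns that
# the column is unknown.
tb$lab
```

The warning on the last line is the "complain more" half of the quote: a misspelled column name is flagged right away instead of silently matching something else.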

The textbook R for Data Science has a chapter on tibbles. It is a great resource for reference.

Tibbles

We begin by loading the tidyverse (or just the tibble library). We will work with the iris data set, which is built into R.

library(tidyverse)

To create a tibble from an existing object, we use as_tibble(). This command will coerce a data.frame, list, matrix or table (among other structures) to a tibble.

as_tibble(iris)
## # A tibble: 150 x 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
##  1          5.1         3.5          1.4         0.2 setosa 
##  2          4.9         3            1.4         0.2 setosa 
##  3          4.7         3.2          1.3         0.2 setosa 
##  4          4.6         3.1          1.5         0.2 setosa 
##  5          5           3.6          1.4         0.2 setosa 
##  6          5.4         3.9          1.7         0.4 setosa 
##  7          4.6         3.4          1.4         0.3 setosa 
##  8          5           3.4          1.5         0.2 setosa 
##  9          4.4         2.9          1.4         0.2 setosa 
## 10          4.9         3.1          1.5         0.1 setosa 
## # ... with 140 more rows
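The same function also handles the other structures mentioned above. The following sketch coerces a named list and a matrix; the objects scores_list and m are hypothetical examples, and the matrix is given column names so that the resulting tibble has named columns.

```r
library(tibble)

# A named list of equal-length vectors becomes one column per element.
scores_list <- list(id = 1:3, grade = c("a", "b", "c"))
from_list <- as_tibble(scores_list)

# A matrix with column names becomes one column per matrix column.
m <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))
from_matrix <- as_tibble(m)

from_list
from_matrix

# is_tibble() confirms that the coercion produced a tibble.
is_tibble(from_list)
is_tibble(from_matrix)
```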

The tibble() command can also create a tibble from scratch. This one defines \(x\), \(y\) and \(z\). The \(x\) column holds the first 5 positive integers, \(y\) is the single value 5 recycled for every row, and \(z\) is \((x+y)/2\), computed from the columns defined before it.

dat <- tibble(x = 1:5, y = 5, z = (x+y)/2)
dat
## # A tibble: 5 x 3
##       x     y     z
##   <int> <dbl> <dbl>
## 1     1     5   3  
## 2     2     5   3.5
## 3     3     5   4  
## 4     4     5   4.5
## 5     5     5   5
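Two details of this call are worth spelling out: columns are evaluated in order, and only length-one values are recycled. The sketch below contrasts this with base R; recycled_df and attempt are illustrative names, and tryCatch() is used only so that the script keeps running after the error.

```r
library(tibble)

# Columns are built in order, so z can use x and y defined just before it.
dat <- tibble(x = 1:5, y = 5, z = (x + y) / 2)

# Base R recycles the shorter vector: y becomes 1, 2, 1, 2.
recycled_df <- data.frame(x = 1:4, y = 1:2)

# tibble() only recycles length-one values, so the same call is an error.
# tryCatch() captures the error message instead of stopping the script.
attempt <- tryCatch(
  tibble(x = 1:4, y = 1:2),
  error = function(e) conditionMessage(e)
)
attempt
```

Refusing to recycle longer vectors is a deliberate restriction: a length mismatch is usually a mistake, and an error makes it visible.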

Tribbles

Using tribble(), we can specify a table row-wise (tribble is short for transposed tibble). Each column name is written as a formula, that is, prefixed with a tilde (~), and the values then follow row by row.

dat2 <- tribble(~x,~y,~z,
               "a",1,2,
               "b",3,NA,
               "c",2.5,15)
dat2
## # A tibble: 3 x 3
##   x         y     z
##   <chr> <dbl> <dbl>
## 1 a       1       2
## 2 b       3      NA
## 3 c       2.5    15
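To make the row-wise layout concrete, the sketch below writes the same table twice, once with tribble() and once column-wise with tibble(). The names by_row and by_col are chosen for illustration only.

```r
library(tibble)

# Row-wise: each line of the call is one row of the table.
by_row <- tribble(
  ~x,  ~y,  ~z,
  "a", 1,   2,
  "b", 3,   NA,
  "c", 2.5, 15
)

# Column-wise: each argument is one column of the table.
by_col <- tibble(
  x = c("a", "b", "c"),
  y = c(1, 3, 2.5),
  z = c(2, NA, 15)
)

# Both calls describe the same data, column by column.
identical(by_row$x, by_col$x)
identical(by_row$y, by_col$y)
identical(by_row$z, by_col$z)
```

The row-wise form tends to be easier to read and check by eye for small, hand-typed tables, which is the main reason to prefer it there.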

Tibbles from Vectors

To create a new tibble by combining multiple vectors, use tibble(). Older code often uses data_frame() for this, but that function is deprecated in favor of tibble().

# Create
football_data <- tibble(
  name = c("Aaron Rogers", "Tom Brady", "Sony Michelle", "JJ Watt"),
  age = c(37, 42, 25, 35),
  height = c(180, 170, 165, 185),
  qb = c(TRUE, TRUE, FALSE, FALSE)
)
# Print
football_data
## # A tibble: 4 x 4
##   name            age height qb   
##   <chr>         <dbl>  <dbl> <lgl>
## 1 Aaron Rogers     37    180 TRUE 
## 2 Tom Brady        42    170 TRUE 
## 3 Sony Michelle    25    165 FALSE
## 4 JJ Watt          35    185 FALSE

Subsetting

We can take subsets of the data (rows or columns) in the normal way using square brackets.

dat3 <- tibble(x = 1:5, y = 6:10, z = 11:15)
dat3
## # A tibble: 5 x 3
##       x     y     z
##   <int> <int> <int>
## 1     1     6    11
## 2     2     7    12
## 3     3     8    13
## 4     4     9    14
## 5     5    10    15

We can access a particular element of the table in row \(m\) and column \(n\) by appending square brackets containing \(m\) and \(n\) to the name of the data (i.e., dat3[m, n]). Note that the result is itself a tibble rather than a bare value. Here, we reference row 1, column 3.

dat3[1,3]
## # A tibble: 1 x 1
##       z
##   <int>
## 1    11
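Because single brackets return a tibble even for one cell, extracting the underlying value takes one more step. The following sketch shows two common ways to do so and contrasts them with a base data.frame; cell_tbl is an illustrative name.

```r
library(tibble)

dat3 <- tibble(x = 1:5, y = 6:10, z = 11:15)

# Single brackets keep the tibble structure, even for one cell.
cell_tbl <- dat3[1, 3]
is_tibble(cell_tbl)

# To get the bare value, pull the column out first, then index it.
dat3$z[1]
dat3[[3]][1]

# A base data.frame would have simplified the same request to a vector.
as.data.frame(dat3)[1, 3]
```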

To reference an entire row (or column), simply leave the other entry blank. Here is row 2 of dat, the tibble created earlier with tibble().

dat[2,]
## # A tibble: 1 x 3
##       x     y     z
##   <int> <dbl> <dbl>
## 1     2     5   3.5

A range of consecutive entries can be referenced using the colon. For example, we reference rows 2 and 3 of columns 1 and 2.

dat3[2:3,1:2]
## # A tibble: 2 x 2
##       x     y
##   <int> <int>
## 1     2     7
## 2     3     8
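Square brackets accept more than positions and ranges. As a short sketch of a few other standard indexing forms (big_x is an illustrative name):

```r
library(tibble)

dat3 <- tibble(x = 1:5, y = 6:10, z = 11:15)

# Columns can be picked by name instead of position.
dat3[2:3, c("x", "y")]

# Non-adjacent rows or columns need c() rather than the colon.
dat3[c(1, 5), c("x", "z")]

# A logical condition keeps only the rows where it is TRUE.
big_x <- dat3[dat3$x > 3, ]
big_x
```

Selecting by name is usually more robust than selecting by position, since it keeps working if the column order changes.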

Final Thoughts

Citations

Camm, Jeffrey D. Business Analytics. Third edition, Cengage, 2019.

“Managing Tibbles · UC Business Analytics R Programming Guide.” Accessed May 4, 2021. Available here.

“R for Data Science.” Accessed April 20, 2021. Available here.

Salunkhe, Lalit. “Tibbles in R Programming | Analytics Steps.” Accessed May 4, 2021. Available here.

“Tibble: Build a Data Frame in Tibble: Simple Data Frames.” Accessed May 4, 2021. Available here.