Tibbles

What are tibbles?

Tibbles are a modern take on dataframes.
They keep the best features of data.frame() and discard the worst.
Tibbles carry the tbl_df and tbl classes.
- They are a subclass of data.frame.
- The extra classes are needed to change the behavior of data.frame.

Features

It never changes the type of inputs (for example, it never converts strings to factors)
It never creates row names
Allows for non-syntactic names
Nice printing of data frames
Vector recycling (only for vectors of length 1)
Sequential evaluation
Easy creation and use of list-columns
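
Two of these features can be checked directly. This is only a small sketch: it assumes the tibble package is installed, and it uses the built-in mtcars data set and the column names name and value purely for illustration.

```r
library(tibble)

# Strings stay character vectors; they are never turned into factors
tb <- tibble(name = c("a", "b", "c"), value = 1:3)
class(tb$name)

# Row names are dropped when a data frame is coerced to a tibble
has_rownames(mtcars)
has_rownames(as_tibble(mtcars))
```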

Why the name tibble?

@JennyBryan ‘tibble-diff’ has a certain charm :) Sept 23, 2014

Creating tibbles

tibble(): Creates a tibble similar to data.frame()
as_tibble(): Coerces an existing object (dataframe, table, matrix) into a tibble.
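
As a rough illustration of both functions, the sketch below builds one tibble from scratch and coerces two existing objects. The objects scores and m are made up for the example, and iris is the data set that ships with R. The matrix is given column names up front, on the assumption that this avoids any name-repair messages.

```r
library(tibble)

# Build a tibble column by column, as with data.frame()
scores <- tibble(id = 1:3, score = c(90.5, 82, 77.25))

# Coerce an existing data frame
iris_tbl <- as_tibble(iris)

# Coerce a matrix that already has column names
m <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))
m_tbl <- as_tibble(m)

class(iris_tbl)
dim(m_tbl)
```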

Sequential evaluation

Sequential evaluation means the arguments are evaluated in order, so a column definition can refer to columns created earlier in the same call. data.frame() cannot do this:

data.frame(x = 1:3, y = 15:17, z = x*y)

Error in data.frame(x = 1:3, y = 15:17, z = x * y): object 'x' not found

# Create a tibble
tibble(x = 1:3, y = 15:17, z = x*y, 
       listcol = list(1:5, seq(5, 10, 1), rep(1, 3)))

# A tibble: 3 x 4
      x     y     z listcol  
  <int> <int> <int> <list>   
1     1    15    15 <int [5]>
2     2    16    32 <dbl [6]>
3     3    17    51 <dbl [3]>
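
Because evaluation runs left to right, the order of the arguments matters. The sketch below uses hypothetical column names (side_a, side_b, area) and assumes no objects with those names exist in the calling environment. tryCatch() is used only so the failing call does not stop the script.

```r
library(tibble)

# Works: side_a and side_b are defined before area refers to them
ok <- tibble(side_a = 1:3, side_b = 15:17, area = side_a * side_b)

# Fails: area is evaluated first, before the other columns exist
res <- tryCatch(
  tibble(area = side_a * side_b, side_a = 1:3, side_b = 15:17),
  error = function(e) "error"
)
res
```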

List-columns and indexing

Dataframes can store more than atomic vectors. A column can hold anything, including a list. A list-column is a column in a dataframe that is a list, so each cell can contain a whole object such as a vector. You can inspect, index, and compute on list-columns. Columns of a tibble are extracted with $ and [[ just as with a data.frame.

# Tibble with list-column
my_tibble <- tibble(x = 1:3, y = 15:17, z = x*y, 
       listcol = list(1:5, seq(5, 10, 1), rep(1, 3)))
       
my_tibble$listcol

[[1]]
[1] 1 2 3 4 5

[[2]]
[1]  5  6  7  8  9 10

[[3]]
[1] 1 1 1

my_tibble$listcol[[1]]

[1] 1 2 3 4 5

my_tibble$listcol[[1]][[5]]

[1] 5

my_tibble$x

[1] 1 2 3

my_tibble[['y']]

[1] 15 16 17
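
Computing on a list-column usually means applying a function to each element. The following sketch rebuilds my_tibble from above and uses base R's lengths() and sapply(). It is one possible approach, not the only one.

```r
library(tibble)

my_tibble <- tibble(x = 1:3, y = 15:17, z = x * y,
                    listcol = list(1:5, seq(5, 10, 1), rep(1, 3)))

# Number of values stored in each cell of the list-column
lengths(my_tibble$listcol)

# Sum of the values in each cell
sapply(my_tibble$listcol, sum)

# `[` returns a tibble, `[[` returns the column itself
class(my_tibble["y"])
class(my_tibble[["y"]])
```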

Vector recycling

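Vector recycling is the repetition of a shorter vector so that it matches the length of the other columns. tibble() only recycles vectors of length 1.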

# Create a tibble with vector recycling
tibble(x = 1:3, y = 15:17, z = x*y,
       listcol = list(1:5, seq(5, 10, 1), rep(1, 3)),
       one_col = 1)

# A tibble: 3 x 5
      x     y     z listcol   one_col
  <int> <int> <int> <list>      <dbl>
1     1    15    15 <int [5]>       1
2     2    16    32 <dbl [6]>       1
3     3    17    51 <dbl [3]>       1
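
The stricter rule is easiest to see next to data.frame(). In this sketch a length-2 vector is paired with a length-4 vector. tryCatch() is used only to capture the failure without stopping the script.

```r
library(tibble)

# data.frame() silently recycles y to 1, 2, 1, 2
df <- data.frame(x = 1:4, y = 1:2)
df$y

# tibble() refuses, because y is neither length 1 nor length 4
res <- tryCatch(
  tibble(x = 1:4, y = 1:2),
  error = function(e) "error"
)
res
```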

Non-syntactic names

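Tibbles allow column names that are not syntactically valid R names, such as names that start with a digit or contain spaces or symbols. Wrap such names in backticks when creating or referring to them.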

# tibble's column naming conventions 
tibble(`0 1 Normal distribution` = rnorm(5), 
       `:)` = rep('Smile',5))

# A tibble: 5 x 2
  `0 1 Normal distribution` `:)` 
                      <dbl> <chr>
1                    0.827  Smile
2                   -0.764  Smile
3                   -0.978  Smile
4                   -0.468  Smile
5                   -0.0950 Smile
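
For comparison, data.frame() rewrites such names by default, while tibble() keeps them as written. The column names in this sketch are made up for illustration.

```r
library(tibble)

tb <- tibble(`:)` = rep("Smile", 2), `2nd col` = c(1.5, 2.5))
df <- data.frame(`:)` = rep("Smile", 2), `2nd col` = c(1.5, 2.5))

names(tb)  # kept exactly as written
names(df)  # altered into syntactic names

# Refer to a non-syntactic column with backticks or a string
tb$`2nd col`
tb[["2nd col"]]
```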

Printing

Tibbles also differ from data.frame in the way they print.

Only the first 10 rows are printed, along with as many columns as fit on the screen.
Each column displays its data type.
Default print behavior can be changed with options(tibble.print_max = n, tibble.print_min = m)

# Show off tibble's printing of a large dataframe
df <- tibble(x = 1:1000, y = 2, z = x*y)
print(df)

# A tibble: 1,000 x 3
       x     y     z
   <int> <dbl> <dbl>
 1     1     2     2
 2     2     2     4
 3     3     2     6
 4     4     2     8
 5     5     2    10
 6     6     2    12
 7     7     2    14
 8     8     2    16
 9     9     2    18
10    10     2    20
# ... with 990 more rows

# Class of the tibble
class(df)

[1] "tbl_df"     "tbl"        "data.frame"
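
The print defaults can be overridden for a single call or for the whole session. The sketch below shows both. The object big and the values 15, 20 and 5 are arbitrary choices for the example.

```r
library(tibble)

big <- tibble(x = 1:1000, y = 2, z = x * y)

# Show 15 rows for this call only
print(big, n = 15)

# Show every column, however narrow the console is
print(big, width = Inf)

# Change the session defaults: tibbles with more than
# tibble.print_max rows are cut down to tibble.print_min rows
old <- options(tibble.print_max = 20, tibble.print_min = 5)
print(big)
options(old)  # restore the previous settings
```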

Other useful functions

add_column(): Add one or more columns to existing tibble
add_row(): Add one or more rows to existing tibble
rownames_to_column(): Convert row names to column
rowid_to_column(): Create a column of IDs using row indices

# Load ISLR library
library(ISLR)

# Show College data
head(College)

                             Private Apps Accept Enroll Top10perc
Abilene Christian University     Yes 1660   1232    721        23
Adelphi University               Yes 2186   1924    512        16
Adrian College                   Yes 1428   1097    336        22
Agnes Scott College              Yes  417    349    137        60
Alaska Pacific University        Yes  193    146     55        16
Albertson College                Yes  587    479    158        38
                             Top25perc F.Undergrad P.Undergrad Outstate
Abilene Christian University        52        2885         537     7440
Adelphi University                  29        2683        1227    12280
Adrian College                      50        1036          99    11250
Agnes Scott College                 89         510          63    12960
Alaska Pacific University           44         249         869     7560
Albertson College                   62         678          41    13500
                             Room.Board Books Personal PhD Terminal
Abilene Christian University       3300   450     2200  70       78
Adelphi University                 6450   750     1500  29       30
Adrian College                     3750   400     1165  53       66
Agnes Scott College                5450   450      875  92       97
Alaska Pacific University          4120   800     1500  76       72
Albertson College                  3335   500      675  67       73
                             S.F.Ratio perc.alumni Expend Grad.Rate
Abilene Christian University      18.1          12   7041        60
Adelphi University                12.2          16  10527        56
Adrian College                    12.9          30   8735        54
Agnes Scott College                7.7          37  19016        59
Alaska Pacific University         11.9           2  10922        15
Albertson College                  9.4          11   9727        55

# Convert to tibble and change rownames to a column 

as_tibble(rownames_to_column(College, 'College'))

# A tibble: 777 x 19
   College Private  Apps Accept Enroll Top10perc Top25perc F.Undergrad
   <chr>   <fct>   <dbl>  <dbl>  <dbl>     <dbl>     <dbl>       <dbl>
 1 Abilen~ Yes      1660   1232    721        23        52        2885
 2 Adelph~ Yes      2186   1924    512        16        29        2683
 3 Adrian~ Yes      1428   1097    336        22        50        1036
 4 Agnes ~ Yes       417    349    137        60        89         510
 5 Alaska~ Yes       193    146     55        16        44         249
 6 Albert~ Yes       587    479    158        38        62         678
 7 Albert~ Yes       353    340    103        17        45         416
 8 Albion~ Yes      1899   1720    489        37        68        1594
 9 Albrig~ Yes      1038    839    227        30        63         973
10 Alders~ Yes       582    498    172        21        44         799
# ... with 767 more rows, and 11 more variables: P.Undergrad <dbl>,
#   Outstate <dbl>, Room.Board <dbl>, Books <dbl>, Personal <dbl>,
#   PhD <dbl>, Terminal <dbl>, S.F.Ratio <dbl>, perc.alumni <dbl>,
#   Expend <dbl>, Grad.Rate <dbl>
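
The other helpers listed above work in a similar way. A minimal sketch on a small made-up tibble (the names tb, z and id are illustrative only):

```r
library(tibble)

tb <- tibble(x = 1:3, y = c("a", "b", "c"))

# Append a column with one value per row
tb2 <- add_column(tb, z = c(TRUE, FALSE, TRUE))

# Append a row, naming the columns being filled
tb3 <- add_row(tb, x = 4L, y = "d")

# Add a leading column of sequential row IDs
tb4 <- rowid_to_column(tb, var = "id")

tb2
tb3
tb4
```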