Introduction

In August 2014, I created a 40-minute video tutorial introducing the key functionality of the dplyr package in R, using dplyr version 0.2. Since then, there have been two significant updates to dplyr (0.3 and 0.4), introducing a ton of new features.

This document (created in March 2015) covers the most useful new features in 0.3 and 0.4, as well as other functionality that I didn’t cover last time (though it is not necessarily new). My new video tutorial walks through the code below in detail.

If you have not watched the previous tutorial, I recommend you do so first since it covers some dplyr basics that will not be covered in this tutorial.

Loading dplyr and the nycflights13 dataset

Although my last tutorial used data from the hflights package, Hadley Wickham has rewritten the dplyr vignettes to use the nycflights13 package instead, and so I’m also using nycflights13 for the sake of consistency.

# remove flights data if you just finished my previous tutorial
rm(flights)
# load packages
suppressMessages(library(dplyr))
library(nycflights13)

# print the flights dataset from nycflights13
flights
## Source: local data frame [336,776 x 16]
## 
##    year month day dep_time dep_delay arr_time arr_delay carrier tailnum
## 1  2013     1   1      517         2      830        11      UA  N14228
## 2  2013     1   1      533         4      850        20      UA  N24211
## 3  2013     1   1      542         2      923        33      AA  N619AA
## 4  2013     1   1      544        -1     1004       -18      B6  N804JB
## 5  2013     1   1      554        -6      812       -25      DL  N668DN
## 6  2013     1   1      554        -4      740        12      UA  N39463
## 7  2013     1   1      555        -5      913        19      B6  N516JB
## 8  2013     1   1      557        -3      709       -14      EV  N829AS
## 9  2013     1   1      557        -3      838        -8      B6  N593JB
## 10 2013     1   1      558        -2      753         8      AA  N3ALAA
## ..  ...   ... ...      ...       ...      ...       ...     ...     ...
## Variables not shown: flight (int), origin (chr), dest (chr), air_time
##   (dbl), distance (dbl), hour (dbl), minute (dbl)

Choosing columns: select, rename

# besides just using select() to pick columns...
flights %>% select(carrier, flight)
## Source: local data frame [336,776 x 2]
## 
##    carrier flight
## 1       UA   1545
## 2       UA   1714
## 3       AA   1141
## 4       B6    725
## 5       DL    461
## 6       UA   1696
## 7       B6    507
## 8       EV   5708
## 9       B6     79
## 10      AA    301
## ..     ...    ...
# ...you can use the minus sign to hide columns
flights %>% select(-month, -day)
## Source: local data frame [336,776 x 14]
## 
##    year dep_time dep_delay arr_time arr_delay carrier tailnum flight
## 1  2013      517         2      830        11      UA  N14228   1545
## 2  2013      533         4      850        20      UA  N24211   1714
## 3  2013      542         2      923        33      AA  N619AA   1141
## 4  2013      544        -1     1004       -18      B6  N804JB    725
## 5  2013      554        -6      812       -25      DL  N668DN    461
## 6  2013      554        -4      740        12      UA  N39463   1696
## 7  2013      555        -5      913        19      B6  N516JB    507
## 8  2013      557        -3      709       -14      EV  N829AS   5708
## 9  2013      557        -3      838        -8      B6  N593JB     79
## 10 2013      558        -2      753         8      AA  N3ALAA    301
## ..  ...      ...       ...      ...       ...     ...     ...    ...
## Variables not shown: origin (chr), dest (chr), air_time (dbl), distance
##   (dbl), hour (dbl), minute (dbl)
# hide a range of columns
flights %>% select(-(dep_time:arr_delay))

# hide any column with a matching name
flights %>% select(-contains("time"))
# pick columns using a character vector of column names
cols <- c("carrier", "flight", "tailnum")
flights %>% select(one_of(cols))
## Source: local data frame [336,776 x 3]
## 
##    carrier flight tailnum
## 1       UA   1545  N14228
## 2       UA   1714  N24211
## 3       AA   1141  N619AA
## 4       B6    725  N804JB
## 5       DL    461  N668DN
## 6       UA   1696  N39463
## 7       B6    507  N516JB
## 8       EV   5708  N829AS
## 9       B6     79  N593JB
## 10      AA    301  N3ALAA
## ..     ...    ...     ...
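
The name-matching helpers are easier to check on a tiny data frame than on the full flights table. This is a minimal sketch: the `toy` data frame and its columns are invented for illustration, and `starts_with()` and `ends_with()` are additional dplyr select helpers that this tutorial does not otherwise use.

```r
library(dplyr)

# hypothetical toy data (not part of nycflights13)
toy <- data.frame(dep_time = 1:3, arr_time = 4:6, dep_delay = 7:9,
                  carrier = c("UA", "AA", "B6"), stringsAsFactors = FALSE)

# keep columns whose names start with "dep"
dep_cols <- toy %>% select(starts_with("dep"))

# keep columns whose names end with "time"
time_cols <- toy %>% select(ends_with("time"))

# hide columns whose names contain "time"
no_time <- toy %>% select(-contains("time"))

names(dep_cols)
names(time_cols)
names(no_time)
```
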
# select() can be used to rename columns, though all columns not mentioned are dropped
flights %>% select(tail = tailnum)
## Source: local data frame [336,776 x 1]
## 
##      tail
## 1  N14228
## 2  N24211
## 3  N619AA
## 4  N804JB
## 5  N668DN
## 6  N39463
## 7  N516JB
## 8  N829AS
## 9  N593JB
## 10 N3ALAA
## ..    ...
# rename() does the same thing, except all columns not mentioned are kept
flights %>% rename(tail = tailnum)
## Source: local data frame [336,776 x 16]
## 
##    year month day dep_time dep_delay arr_time arr_delay carrier   tail
## 1  2013     1   1      517         2      830        11      UA N14228
## 2  2013     1   1      533         4      850        20      UA N24211
## 3  2013     1   1      542         2      923        33      AA N619AA
## 4  2013     1   1      544        -1     1004       -18      B6 N804JB
## 5  2013     1   1      554        -6      812       -25      DL N668DN
## 6  2013     1   1      554        -4      740        12      UA N39463
## 7  2013     1   1      555        -5      913        19      B6 N516JB
## 8  2013     1   1      557        -3      709       -14      EV N829AS
## 9  2013     1   1      557        -3      838        -8      B6 N593JB
## 10 2013     1   1      558        -2      753         8      AA N3ALAA
## ..  ...   ... ...      ...       ...      ...       ...     ...    ...
## Variables not shown: flight (int), origin (chr), dest (chr), air_time
##   (dbl), distance (dbl), hour (dbl), minute (dbl)

Choosing rows: filter, between, slice, sample_n, top_n, distinct

# filter() supports the use of multiple conditions
flights %>% filter(dep_time >= 600, dep_time <= 605)
## Source: local data frame [2,460 x 16]
## 
##    year month day dep_time dep_delay arr_time arr_delay carrier tailnum
## 1  2013     1   1      600         0      851        -7      B6  N595JB
## 2  2013     1   1      600         0      837        12      MQ  N542MQ
## 3  2013     1   1      601         1      844        -6      B6  N644JB
## 4  2013     1   1      602        -8      812        -8      DL  N971DL
## 5  2013     1   1      602        -3      821        16      MQ  N730MQ
## 6  2013     1   2      600         0      814        25      EV  N13914
## 7  2013     1   2      600        -5      751       -27      EV  N760EV
## 8  2013     1   2      600         0      819         4      9E  N8946A
## 9  2013     1   2      600         0      846         0      B6  N529JB
## 10 2013     1   2      600         0      737        12      WN  N8311Q
## ..  ...   ... ...      ...       ...      ...       ...     ...     ...
## Variables not shown: flight (int), origin (chr), dest (chr), air_time
##   (dbl), distance (dbl), hour (dbl), minute (dbl)
# between() is a concise alternative for determining if numeric values fall in a range
flights %>% filter(between(dep_time, 600, 605))

# side note: is.na() can also be useful when filtering
flights %>% filter(!is.na(dep_time))
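
To confirm that the two filters above select the same rows, they can be compared on a small invented data frame. This sketch assumes nothing beyond dplyr itself; the `toy` values are made up. Note that `between()` includes both endpoints, and that `filter()` drops rows where the condition is NA.

```r
library(dplyr)

# hypothetical departure times, including boundary values and a missing value
toy <- data.frame(dep_time = c(559, 600, 603, 605, 606, NA))

# two explicit conditions
explicit <- toy %>% filter(dep_time >= 600, dep_time <= 605)

# the same range expressed with between()
concise <- toy %>% filter(between(dep_time, 600, 605))

explicit$dep_time
concise$dep_time
```
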
# slice() filters rows by position
flights %>% slice(1000:1005)
## Source: local data frame [6 x 16]
## 
##   year month day dep_time dep_delay arr_time arr_delay carrier tailnum
## 1 2013     1   2      809        -1      950         2      B6  N304JB
## 2 2013     1   2      810        10     1008        -6      DL  N358NW
## 3 2013     1   2      811        -4     1100         4      DL  N328NW
## 4 2013     1   2      811        -4     1126        -5      DL  N305DQ
## 5 2013     1   2      811        -9      944       -11      MQ  N509MQ
## 6 2013     1   2      815         0     1109       -19      DL  N335NW
## Variables not shown: flight (int), origin (chr), dest (chr), air_time
##   (dbl), distance (dbl), hour (dbl), minute (dbl)
# keep the first three rows within each group
flights %>% group_by(month, day) %>% slice(1:3)
## Source: local data frame [1,095 x 16]
## Groups: month, day
## 
##    year month day dep_time dep_delay arr_time arr_delay carrier tailnum
## 1  2013     1   1      517         2      830        11      UA  N14228
## 2  2013     1   1      533         4      850        20      UA  N24211
## 3  2013     1   1      542         2      923        33      AA  N619AA
## 4  2013     1   2       42        43      518        36      B6  N580JB
## 5  2013     1   2      126       156      233       154      B6  N636JB
## 6  2013     1   2      458        -2      703        13      US  N162UW
## 7  2013     1   3       32        33      504        22      B6  N763JB
## 8  2013     1   3       50       185      203       172      B6  N329JB
## 9  2013     1   3      235       156      700       143      B6  N618JB
## 10 2013     1   4       25        26      505        23      B6  N554JB
## ..  ...   ... ...      ...       ...      ...       ...     ...     ...
## Variables not shown: flight (int), origin (chr), dest (chr), air_time
##   (dbl), distance (dbl), hour (dbl), minute (dbl)
# sample three rows from each group
flights %>% group_by(month, day) %>% sample_n(3)
## Source: local data frame [1,095 x 16]
## Groups: month, day
## 
##    year month day dep_time dep_delay arr_time arr_delay carrier tailnum
## 1  2013     1   1     1521         6     1830         7      DL  N378NW
## 2  2013     1   1     1854        24     2055        40      MQ  N518MQ
## 3  2013     1   1     1915        -5     2238       -19      DL  N633DL
## 4  2013     1   2     1550         0     1926         1      DL  N633DL
## 5  2013     1   2     1806         5     2140         3      UA  N12116
## 6  2013     1   2      925         5     1124        16      B6  N239JB
## 7  2013     1   3     2154        -1       50         9      B6  N508JB
## 8  2013     1   3     1448        -2     1631        -9      MQ  N835MQ
## 9  2013     1   3     1651         1     1843       -53      UA  N423UA
## 10 2013     1   4     2203        88     2309        87      EV  N22909
## ..  ...   ... ...      ...       ...      ...       ...     ...     ...
## Variables not shown: flight (int), origin (chr), dest (chr), air_time
##   (dbl), distance (dbl), hour (dbl), minute (dbl)
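
Because sample_n() draws rows at random, the output above will differ from run to run. If a reproducible sample is needed, one option is to call base R's set.seed() first. The sketch below uses an invented `toy` data frame and an arbitrary seed value.

```r
library(dplyr)

# hypothetical data: two days with five flights each
toy <- data.frame(day = rep(1:2, each = 5), flight = 1:10)

# the same seed before each call gives the same sample
set.seed(1)
first <- toy %>% group_by(day) %>% sample_n(2)

set.seed(1)
second <- toy %>% group_by(day) %>% sample_n(2)

identical(first$flight, second$flight)
```
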
# keep three rows from each group with the top dep_delay
flights %>% group_by(month, day) %>% top_n(3, dep_delay)
## Source: local data frame [1,108 x 16]
## Groups: month, day
## 
##    year month day dep_time dep_delay arr_time arr_delay carrier tailnum
## 1  2013     1   1      848       853     1001       851      MQ  N942MQ
## 2  2013     1   1     1815       290     2120       338      EV  N17185
## 3  2013     1   1     2343       379      314       456      EV  N21197
## 4  2013     1   2     1412       334     1710       323      UA  N474UA
## 5  2013     1   2     1607       337     2003       368      AA  N324AA
## 6  2013     1   2     2131       379     2340       359      UA  N593UA
## 7  2013     1   3     2008       268     2339       270      DL  N338NW
## 8  2013     1   3     2012       252     2314       257      B6  N558JB
## 9  2013     1   3     2056       291     2239       285      9E  N928XJ
## 10 2013     1   4     2058       208        2       172      B6  N523JB
## ..  ...   ... ...      ...       ...      ...       ...     ...     ...
## Variables not shown: flight (int), origin (chr), dest (chr), air_time
##   (dbl), distance (dbl), hour (dbl), minute (dbl)
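
The result above has 1,108 rows, whereas slice(1:3) and sample_n(3) each returned 1,095 (3 rows for each of 365 days). The likely reason is that top_n() keeps every row tied at the cutoff value, so a group can contribute more than 3 rows. A minimal sketch on invented data:

```r
library(dplyr)

# hypothetical delays: day 1 has a tie for second place
toy <- data.frame(day = c(1, 1, 1, 1, 2, 2, 2),
                  dep_delay = c(50, 40, 40, 10, 30, 20, 5))

# ask for the top 2 delays per day
top2 <- toy %>% group_by(day) %>% top_n(2, dep_delay)

# day 1 contributes 3 rows (50, 40, 40) and day 2 contributes 2 rows (30, 20)
nrow(top2)
```
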
# also sort by dep_delay within each group
flights %>% group_by(month, day) %>% top_n(3, dep_delay) %>% arrange(desc(dep_delay))
## Source: local data frame [1,108 x 16]
## Groups: month, day
## 
##    year month day dep_time dep_delay arr_time arr_delay carrier tailnum
## 1  2013     1   1      848       853     1001       851      MQ  N942MQ
## 2  2013     1   1     2343       379      314       456      EV  N21197
## 3  2013     1   1     1815       290     2120       338      EV  N17185
## 4  2013     1   2     2131       379     2340       359      UA  N593UA
## 5  2013     1   2     1607       337     2003       368      AA  N324AA
## 6  2013     1   2     1412       334     1710       323      UA  N474UA
## 7  2013     1   3     2056       291     2239       285      9E  N928XJ
## 8  2013     1   3     2008       268     2339       270      DL  N338NW
## 9  2013     1   3     2012       252     2314       257      B6  N558JB
## 10 2013     1   4     2123       288     2332       276      EV  N29917
## ..  ...   ... ...      ...       ...      ...       ...     ...     ...
## Variables not shown: flight (int), origin (chr), dest (chr), air_time
##   (dbl), distance (dbl), hour (dbl), minute (dbl)
# unique rows can be identified using unique() from base R
flights %>% select(origin, dest) %>% unique()
## Source: local data frame [224 x 2]
## 
##    origin dest
## 1     EWR  IAH
## 2     LGA  IAH
## 3     JFK  MIA
## 4     JFK  BQN
## 5     LGA  ATL
## 6     EWR  ORD
## 7     EWR  FLL
## 8     LGA  IAD
## 9     JFK  MCO
## 10    LGA  ORD
## ..    ...  ...
# dplyr provides an alternative that is more "efficient"
flights %>% select(origin, dest) %>% distinct()

# side note: when chaining, you don't have to include the parentheses if there are no arguments
flights %>% select(origin, dest) %>% distinct
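
As a quick check that unique() and distinct() agree, both can be run on a small invented table of routes. The `routes` data frame below is made up for illustration.

```r
library(dplyr)

# hypothetical routes with one repeated origin/dest pair
routes <- data.frame(origin = c("EWR", "LGA", "EWR", "JFK", "LGA"),
                     dest = c("IAH", "IAH", "IAH", "MIA", "ATL"),
                     stringsAsFactors = FALSE)

via_unique <- routes %>% unique()
via_distinct <- routes %>% distinct()

# both keep the four distinct origin/dest pairs, in order of first appearance
nrow(via_unique)
nrow(via_distinct)
```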

Adding new variables: mutate, transmute, add_rownames

# mutate() creates a new variable (and keeps all existing variables)
flights %>% mutate(speed = distance/air_time*60)
## Source: local data frame [336,776 x 17]
## 
##    year month day dep_time dep_delay arr_time arr_delay carrier tailnum
## 1  2013     1   1      517         2      830        11      UA  N14228
## 2  2013     1   1      533         4      850        20      UA  N24211
## 3  2013     1   1      542         2      923        33      AA  N619AA
## 4  2013     1   1      544        -1     1004       -18      B6  N804JB
## 5  2013     1   1      554        -6      812       -25      DL  N668DN
## 6  2013     1   1      554        -4      740        12      UA  N39463
## 7  2013     1   1      555        -5      913        19      B6  N516JB
## 8  2013     1   1      557        -3      709       -14      EV  N829AS
## 9  2013     1   1      557        -3      838        -8      B6  N593JB
## 10 2013     1   1      558        -2      753         8      AA  N3ALAA
## ..  ...   ... ...      ...       ...      ...       ...     ...     ...
## Variables not shown: flight (int), origin (chr), dest (chr), air_time
##   (dbl), distance (dbl), hour (dbl), minute (dbl), speed (dbl)
# transmute() only keeps the new variables
flights %>% transmute(speed = distance/air_time*60)
## Source: local data frame [336,776 x 1]
## 
##       speed
## 1  370.0441
## 2  374.2731
## 3  408.3750
## 4  516.7213
## 5  394.1379
## 6  287.6000
## 7  404.4304
## 8  259.2453
## 9  404.5714
## 10 318.6957
## ..      ...
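
The difference between mutate() and transmute() is easiest to see by comparing column names on a small example. The `toy` data frame is invented, with round numbers so the speed (in miles per hour, given distance in miles and air_time in minutes) can be verified by hand.

```r
library(dplyr)

# hypothetical flights: 300 miles in 60 minutes, 500 miles in 120 minutes
toy <- data.frame(distance = c(300, 500), air_time = c(60, 120))

# mutate() appends the new column to the existing ones
with_mutate <- toy %>% mutate(speed = distance / air_time * 60)

# transmute() returns only the new column
with_transmute <- toy %>% transmute(speed = distance / air_time * 60)

names(with_mutate)
names(with_transmute)
with_transmute$speed
```
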
# example data frame with row names
mtcars %>% head()
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
# add_rownames() turns row names into an explicit variable
mtcars %>% add_rownames("model") %>% head()
##               model  mpg cyl disp  hp drat    wt  qsec vs am gear carb
## 1         Mazda RX4 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## 2     Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## 3        Datsun 710 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## 4    Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## 5 Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## 6           Valiant 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
# side note: dplyr no longer prints row names (ever) for local data frames
mtcars %>% tbl_df()
## Source: local data frame [32 x 11]
## 
##     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## 1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## 2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## 3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## 4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## 5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## 6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## 7  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## 8  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## 9  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## 10 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## ..  ... ...   ... ...  ...   ...   ... .. ..  ...  ...

Grouping and counting: summarise, tally, count, group_size, n_groups, ungroup

# summarise() can be used to count the number of rows in each group
flights %>% group_by(month) %>% summarise(cnt = n())
## Source: local data frame [12 x 2]
## 
##    month   cnt
## 1      1 27004
## 2      2 24951
## 3      3 28834
## 4      4 28330
## 5      5 28796
## 6      6 28243
## 7      7 29425
## 8      8 29327
## 9      9 27574
## 10    10 28889
## 11    11 27268
## 12    12 28135
# tally() and count() can do this more concisely
flights %>% group_by(month) %>% tally()
flights %>% count(month)
# you can sort by the count
flights %>% group_by(month) %>% summarise(cnt = n()) %>% arrange(desc(cnt))
## Source: local data frame [12 x 2]
## 
##    month   cnt
## 1      7 29425
## 2      8 29327
## 3     10 28889
## 4      3 28834
## 5      5 28796
## 6      4 28330
## 7      6 28243
## 8     12 28135
## 9      9 27574
## 10    11 27268
## 11     1 27004
## 12     2 24951
# tally() and count() have a sort parameter for this purpose
flights %>% group_by(month) %>% tally(sort=TRUE)
flights %>% count(month, sort=TRUE)
# you can sum over a specific variable instead of simply counting rows
flights %>% group_by(month) %>% summarise(dist = sum(distance))
## Source: local data frame [12 x 2]
## 
##    month     dist
## 1      1 27188805
## 2      2 24975509
## 3      3 29179636
## 4      4 29427294
## 5      5 29974128
## 6      6 29856388
## 7      7 31149199
## 8      8 31149334
## 9      9 28711426
## 10    10 30012086
## 11    11 28639718
## 12    12 29954084
# tally() and count() have a wt parameter for this purpose
flights %>% group_by(month) %>% tally(wt = distance)
flights %>% count(month, wt = distance)
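
To see that the wt parameter gives the same totals as summing inside summarise(), the two approaches can be compared on a small invented data frame. In this sketch the weighted count is stored in a column named `n`, which is the default output name of count().

```r
library(dplyr)

# hypothetical distances for two months
toy <- data.frame(month = c(1, 1, 2, 2, 2),
                  distance = c(100, 200, 50, 50, 400))

# explicit sum per group
by_summarise <- toy %>% group_by(month) %>% summarise(dist = sum(distance))

# weighted count: sums distance instead of counting rows
by_count <- toy %>% count(month, wt = distance)

by_summarise$dist
by_count$n
```
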
# group_size() returns the counts as a vector
flights %>% group_by(month) %>% group_size()
##  [1] 27004 24951 28834 28330 28796 28243 29425 29327 27574 28889 27268
## [12] 28135
# n_groups() simply reports the number of groups
flights %>% group_by(month) %>% n_groups()
## [1] 12
# group by two variables, summarise, arrange (output is possibly confusing)
flights %>% group_by(month, day) %>% summarise(cnt = n()) %>% arrange(desc(cnt)) %>% print(n = 40)
## Source: local data frame [365 x 3]
## Groups: month
## 
##    month day cnt
## 1      1   2 943
## 2      1   7 933
## 3      1  10 932
## 4      1  11 930
## 5      1  14 928
## 6      1  31 928
## 7      1  17 927
## 8      1  24 925
## 9      1  18 924
## 10     1  28 923
## 11     1  25 922
## 12     1   4 915
## 13     1   3 914
## 14     1  21 912
## 15     1   9 902
## 16     1  16 901
## 17     1  30 900
## 18     1   8 899
## 19     1  23 897
## 20     1  15 894
## 21     1  22 890
## 22     1  29 890
## 23     1   1 842
## 24     1   6 832
## 25     1  13 828
## 26     1  27 823
## 27     1  20 786
## 28     1   5 720
## 29     1  12 690
## 30     1  26 680
## 31     1  19 674
## 32     2  28 964
## 33     2  21 961
## 34     2  25 961
## 35     2  22 957
## 36     2  14 956
## 37     2  15 954
## 38     2  20 949
## 39     2  18 948
## 40     2  27 945
## ..   ... ... ...
# ungroup() before arranging to arrange across all groups
flights %>% group_by(month, day) %>% summarise(cnt = n()) %>% ungroup() %>% arrange(desc(cnt))
## Source: local data frame [365 x 3]
## 
##    month day  cnt
## 1     11  27 1014
## 2      7  11 1006
## 3      7   8 1004
## 4      7  10 1004
## 5     12   2 1004
## 6      7  18 1003
## 7      7  25 1003
## 8      7  12 1002
## 9      7   9 1001
## 10     7  17 1001
## ..   ... ...  ...
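
The "Groups: month" line in the earlier output shows that summarise() removes only the last grouping variable. The sketch below makes that visible on invented data. It assumes a dplyr version that provides group_vars(), which is more recent than the 0.4 release covered here; newer releases may also treat grouping in arrange() differently, so check the behavior of your installed version.

```r
library(dplyr)

# hypothetical flights on a few month/day combinations
toy <- data.frame(month = c(1, 1, 1, 2, 2), day = c(1, 1, 2, 1, 1))

# summarise() peels off the last grouping variable (day)
counts <- toy %>% group_by(month, day) %>% summarise(cnt = n())
group_vars(counts)

# ungroup() removes the remaining grouping before sorting
flat <- counts %>% ungroup()
group_vars(flat)

sorted <- flat %>% arrange(desc(cnt))
sorted$cnt
```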

Creating data frames: data_frame

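data_frame() is a better way than data.frame() to create data frames. The benefits of data_frame() visible in the examples below: you can use previously defined columns to compute new columns (b = a*2), character columns are not converted to factors, and column names such as 'd+e' are left unchanged.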

# data_frame() example
data_frame(a = 1:6, b = a*2, c = 'string', 'd+e' = 1) %>% glimpse()
## Observations: 6
## Variables:
## $ a   (int) 1, 2, 3, 4, 5, 6
## $ b   (dbl) 2, 4, 6, 8, 10, 12
## $ c   (chr) "string", "string", "string", "string", "string", "string"
## $ d+e (dbl) 1, 1, 1, 1, 1, 1
# data.frame() example
data.frame(a = 1:6, c = 'string', 'd+e' = 1) %>% glimpse()
## Observations: 6
## Variables:
## $ a   (int) 1, 2, 3, 4, 5, 6
## $ c   (fctr) string, string, string, string, string, string
## $ d.e (dbl) 1, 1, 1, 1, 1, 1
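
For comparison, base R's data.frame() can be asked to behave more like data_frame() on two of these points through its stringsAsFactors and check.names arguments. This is only a sketch of the base R options, not a replacement for data_frame(); in R 4.0 and later, stringsAsFactors already defaults to FALSE.

```r
# keep strings as character and leave column names untouched
df <- data.frame(a = 1:6, c = 'string', 'd+e' = 1,
                 stringsAsFactors = FALSE, check.names = FALSE)

names(df)
class(df$c)
```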

Joining (merging) tables: left_join, right_join, inner_join, full_join, semi_join, anti_join

# create two simple data frames
(a <- data_frame(color = c("green","yellow","red"), num = 1:3))
## Source: local data frame [3 x 2]
## 
##    color num
## 1  green   1
## 2 yellow   2
## 3    red   3
(b <- data_frame(color = c("green","yellow","pink"), size = c("S","M","L")))
## Source: local data frame [3 x 2]
## 
##    color size
## 1  green    S
## 2 yellow    M
## 3   pink    L
# only include observations found in both "a" and "b" (automatically joins on variables that appear in both tables)
inner_join(a, b)
## Joining by: "color"
## Source: local data frame [2 x 3]
## 
##    color num size
## 1  green   1    S
## 2 yellow   2    M
# include observations found in either "a" or "b"
full_join(a, b)
## Joining by: "color"
## Source: local data frame [4 x 3]
## 
##    color num size
## 1  green   1    S
## 2 yellow   2    M
## 3    red   3   NA
## 4   pink  NA    L
# include all observations found in "a"
left_join(a, b)
## Joining by: "color"
## Source: local data frame [3 x 3]
## 
##    color num size
## 1  green   1    S
## 2 yellow   2    M
## 3    red   3   NA
# include all observations found in "b"
right_join(a, b)
## Joining by: "color"
## Source: local data frame [3 x 3]
## 
##    color num size
## 1  green   1    S
## 2 yellow   2    M
## 3   pink  NA    L
# right_join(a, b) is identical to left_join(b, a) except for column ordering
left_join(b, a)
## Joining by: "color"
## Source: local data frame [3 x 3]
## 
##    color size num
## 1  green    S   1
## 2 yellow    M   2
## 3   pink    L  NA
# filter "a" to only show observations that match "b"
semi_join(a, b)
## Joining by: "color"
## Source: local data frame [2 x 2]
## 
##    color num
## 1  green   1
## 2 yellow   2
# filter "a" to only show observations that don't match "b"
anti_join(a, b)
## Joining by: "color"
## Source: local data frame [1 x 2]
## 
##   color num
## 1   red   3
# sometimes matching variables don't have identical names
b <- b %>% rename(col = color)

# specify that the join should occur by matching "color" in "a" with "col" in "b"
inner_join(a, b, by=c("color" = "col"))
## Source: local data frame [2 x 3]
## 
##    color num size
## 1  green   1    S
## 2 yellow   2    M
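
One practical difference between inner_join() and semi_join() shows up when the second table has repeated keys: inner_join() returns one row per match, while semi_join() only filters the first table and never duplicates its rows. The sketch below uses invented tables named `left_tbl` and `right_tbl` so that it does not overwrite "a" and "b" from above.

```r
library(dplyr)

# hypothetical tables: "green" appears twice in right_tbl
left_tbl <- data.frame(color = c("green", "yellow", "red"), num = 1:3,
                       stringsAsFactors = FALSE)
right_tbl <- data.frame(color = c("green", "green", "pink"),
                        size = c("S", "M", "L"), stringsAsFactors = FALSE)

# one output row per matching pair, so "green" appears twice
inner <- inner_join(left_tbl, right_tbl, by = "color")

# rows of left_tbl that have a match, each kept once, with no columns added
semi <- semi_join(left_tbl, right_tbl, by = "color")

nrow(inner)
nrow(semi)
names(semi)
```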

Viewing more output: print, View

# specify that you want to see more rows
flights %>% print(n = 15)
## Source: local data frame [336,776 x 16]
## 
##    year month day dep_time dep_delay arr_time arr_delay carrier tailnum
## 1  2013     1   1      517         2      830        11      UA  N14228
## 2  2013     1   1      533         4      850        20      UA  N24211
## 3  2013     1   1      542         2      923        33      AA  N619AA
## 4  2013     1   1      544        -1     1004       -18      B6  N804JB
## 5  2013     1   1      554        -6      812       -25      DL  N668DN
## 6  2013     1   1      554        -4      740        12      UA  N39463
## 7  2013     1   1      555        -5      913        19      B6  N516JB
## 8  2013     1   1      557        -3      709       -14      EV  N829AS
## 9  2013     1   1      557        -3      838        -8      B6  N593JB
## 10 2013     1   1      558        -2      753         8      AA  N3ALAA
## 11 2013     1   1      558        -2      849        -2      B6  N793JB
## 12 2013     1   1      558        -2      853        -3      B6  N657JB
## 13 2013     1   1      558        -2      924         7      UA  N29129
## 14 2013     1   1      558        -2      923       -14      UA  N53441
## 15 2013     1   1      559        -1      941        31      AA  N3DUAA
## ..  ...   ... ...      ...       ...      ...       ...     ...     ...
## Variables not shown: flight (int), origin (chr), dest (chr), air_time
##   (dbl), distance (dbl), hour (dbl), minute (dbl)
# specify that you want to see ALL rows (don't run this!)
flights %>% print(n = Inf)
# specify that you want to see all columns
flights %>% print(width = Inf)
## Source: local data frame [336,776 x 16]
## 
##    year month day dep_time dep_delay arr_time arr_delay carrier tailnum
## 1  2013     1   1      517         2      830        11      UA  N14228
## 2  2013     1   1      533         4      850        20      UA  N24211
## 3  2013     1   1      542         2      923        33      AA  N619AA
## 4  2013     1   1      544        -1     1004       -18      B6  N804JB
## 5  2013     1   1      554        -6      812       -25      DL  N668DN
## 6  2013     1   1      554        -4      740        12      UA  N39463
## 7  2013     1   1      555        -5      913        19      B6  N516JB
## 8  2013     1   1      557        -3      709       -14      EV  N829AS
## 9  2013     1   1      557        -3      838        -8      B6  N593JB
## 10 2013     1   1      558        -2      753         8      AA  N3ALAA
## ..  ...   ... ...      ...       ...      ...       ...     ...     ...
##    flight origin dest air_time distance hour minute
## 1    1545    EWR  IAH      227     1400    5     17
## 2    1714    LGA  IAH      227     1416    5     33
## 3    1141    JFK  MIA      160     1089    5     42
## 4     725    JFK  BQN      183     1576    5     44
## 5     461    LGA  ATL      116      762    5     54
## 6    1696    EWR  ORD      150      719    5     54
## 7     507    EWR  FLL      158     1065    5     55
## 8    5708    LGA  IAD       53      229    5     57
## 9      79    JFK  MCO      140      944    5     57
## 10    301    LGA  ORD      138      733    5     58
## ..    ...    ...  ...      ...      ...  ...    ...
# show up to 1000 rows and all columns
flights %>% View()

# set option to see all columns and fewer rows
options(dplyr.width = Inf, dplyr.print_min = 6)

# reset options (or just close R)
options(dplyr.width = NULL, dplyr.print_min = 10)
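
An alternative to resetting each option by hand is to save the previous values when setting them. In base R, options() invisibly returns the old values of whatever it changes, and passing that list back restores them. A minimal sketch, where the variable names `old` and `during` are just illustrative:

```r
# set the options and remember their previous values
old <- options(dplyr.width = Inf, dplyr.print_min = 6)

# value in effect while the temporary setting is active
during <- getOption("dplyr.print_min")

# restore whatever was set before (including "not set")
options(old)
```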

Resources

Data School
