In August 2014, I created a 40-minute video tutorial introducing the key functionality of the dplyr package in R, using dplyr version 0.2. Since then, there have been two significant updates to dplyr (0.3 and 0.4), introducing a ton of new features.
This document (created in March 2015) covers the most useful new features in 0.3 and 0.4, as well as some older functionality that I didn’t cover last time. My new video tutorial walks through the code below in detail.
If you have not watched the previous tutorial, I recommend you do so first since it covers some dplyr basics that will not be covered in this tutorial.
Although my last tutorial used data from the hflights package, Hadley Wickham has rewritten the dplyr vignettes to use the nycflights13 package instead, and so I’m also using nycflights13 for the sake of consistency.
# remove flights data if you just finished my previous tutorial
rm(flights)
# load packages
suppressMessages(library(dplyr))
library(nycflights13)
# print the flights dataset from nycflights13
flights
## Source: local data frame [336,776 x 16]
##
## year month day dep_time dep_delay arr_time arr_delay carrier tailnum
## 1 2013 1 1 517 2 830 11 UA N14228
## 2 2013 1 1 533 4 850 20 UA N24211
## 3 2013 1 1 542 2 923 33 AA N619AA
## 4 2013 1 1 544 -1 1004 -18 B6 N804JB
## 5 2013 1 1 554 -6 812 -25 DL N668DN
## 6 2013 1 1 554 -4 740 12 UA N39463
## 7 2013 1 1 555 -5 913 19 B6 N516JB
## 8 2013 1 1 557 -3 709 -14 EV N829AS
## 9 2013 1 1 557 -3 838 -8 B6 N593JB
## 10 2013 1 1 558 -2 753 8 AA N3ALAA
## .. ... ... ... ... ... ... ... ... ...
## Variables not shown: flight (int), origin (chr), dest (chr), air_time
## (dbl), distance (dbl), hour (dbl), minute (dbl)
# besides just using select() to pick columns...
flights %>% select(carrier, flight)
## Source: local data frame [336,776 x 2]
##
## carrier flight
## 1 UA 1545
## 2 UA 1714
## 3 AA 1141
## 4 B6 725
## 5 DL 461
## 6 UA 1696
## 7 B6 507
## 8 EV 5708
## 9 B6 79
## 10 AA 301
## .. ... ...
# ...you can use the minus sign to hide columns
flights %>% select(-month, -day)
## Source: local data frame [336,776 x 14]
##
## year dep_time dep_delay arr_time arr_delay carrier tailnum flight
## 1 2013 517 2 830 11 UA N14228 1545
## 2 2013 533 4 850 20 UA N24211 1714
## 3 2013 542 2 923 33 AA N619AA 1141
## 4 2013 544 -1 1004 -18 B6 N804JB 725
## 5 2013 554 -6 812 -25 DL N668DN 461
## 6 2013 554 -4 740 12 UA N39463 1696
## 7 2013 555 -5 913 19 B6 N516JB 507
## 8 2013 557 -3 709 -14 EV N829AS 5708
## 9 2013 557 -3 838 -8 B6 N593JB 79
## 10 2013 558 -2 753 8 AA N3ALAA 301
## .. ... ... ... ... ... ... ... ...
## Variables not shown: origin (chr), dest (chr), air_time (dbl), distance
## (dbl), hour (dbl), minute (dbl)
# hide a range of columns
flights %>% select(-(dep_time:arr_delay))
# hide any column with a matching name
flights %>% select(-contains("time"))
# pick columns using a character vector of column names
cols <- c("carrier", "flight", "tailnum")
flights %>% select(one_of(cols))
## Source: local data frame [336,776 x 3]
##
## carrier flight tailnum
## 1 UA 1545 N14228
## 2 UA 1714 N24211
## 3 AA 1141 N619AA
## 4 B6 725 N804JB
## 5 DL 461 N668DN
## 6 UA 1696 N39463
## 7 B6 507 N516JB
## 8 EV 5708 N829AS
## 9 B6 79 N593JB
## 10 AA 301 N3ALAA
## .. ... ... ...
# select() can be used to rename columns, though all columns not mentioned are dropped
flights %>% select(tail = tailnum)
## Source: local data frame [336,776 x 1]
##
## tail
## 1 N14228
## 2 N24211
## 3 N619AA
## 4 N804JB
## 5 N668DN
## 6 N39463
## 7 N516JB
## 8 N829AS
## 9 N593JB
## 10 N3ALAA
## .. ...
# rename() does the same thing, except all columns not mentioned are kept
flights %>% rename(tail = tailnum)
## Source: local data frame [336,776 x 16]
##
## year month day dep_time dep_delay arr_time arr_delay carrier tail
## 1 2013 1 1 517 2 830 11 UA N14228
## 2 2013 1 1 533 4 850 20 UA N24211
## 3 2013 1 1 542 2 923 33 AA N619AA
## 4 2013 1 1 544 -1 1004 -18 B6 N804JB
## 5 2013 1 1 554 -6 812 -25 DL N668DN
## 6 2013 1 1 554 -4 740 12 UA N39463
## 7 2013 1 1 555 -5 913 19 B6 N516JB
## 8 2013 1 1 557 -3 709 -14 EV N829AS
## 9 2013 1 1 557 -3 838 -8 B6 N593JB
## 10 2013 1 1 558 -2 753 8 AA N3ALAA
## .. ... ... ... ... ... ... ... ... ...
## Variables not shown: flight (int), origin (chr), dest (chr), air_time
## (dbl), distance (dbl), hour (dbl), minute (dbl)
# filter() supports the use of multiple conditions
flights %>% filter(dep_time >= 600, dep_time <= 605)
## Source: local data frame [2,460 x 16]
##
## year month day dep_time dep_delay arr_time arr_delay carrier tailnum
## 1 2013 1 1 600 0 851 -7 B6 N595JB
## 2 2013 1 1 600 0 837 12 MQ N542MQ
## 3 2013 1 1 601 1 844 -6 B6 N644JB
## 4 2013 1 1 602 -8 812 -8 DL N971DL
## 5 2013 1 1 602 -3 821 16 MQ N730MQ
## 6 2013 1 2 600 0 814 25 EV N13914
## 7 2013 1 2 600 -5 751 -27 EV N760EV
## 8 2013 1 2 600 0 819 4 9E N8946A
## 9 2013 1 2 600 0 846 0 B6 N529JB
## 10 2013 1 2 600 0 737 12 WN N8311Q
## .. ... ... ... ... ... ... ... ... ...
## Variables not shown: flight (int), origin (chr), dest (chr), air_time
## (dbl), distance (dbl), hour (dbl), minute (dbl)
# between() is a concise alternative for determining if numeric values fall in a range
flights %>% filter(between(dep_time, 600, 605))
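To make the equivalence concrete, here is a minimal sketch on a small made-up data frame (`toy` is a hypothetical example, not part of nycflights13). It assumes that `between()` is inclusive on both ends and that `filter()` drops rows where the condition evaluates to `NA`, which is how I understand both functions to behave.

```r
library(dplyr)

# hypothetical toy data, not taken from nycflights13
toy <- data.frame(dep_time = c(559, 600, 603, 605, 606, NA))

# between() includes both boundary values
with_between <- toy %>% filter(between(dep_time, 600, 605))

# the equivalent pair of comparisons
with_compare <- toy %>% filter(dep_time >= 600, dep_time <= 605)

# both versions keep 600, 603 and 605; the NA row is dropped in either case
identical(with_between$dep_time, with_compare$dep_time)
```

Because the missing value is already dropped by the range condition, an explicit `!is.na()` check is only needed when you filter on nothing else.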
# side note: is.na() can also be useful when filtering
flights %>% filter(!is.na(dep_time))
# slice() filters rows by position
flights %>% slice(1000:1005)
## Source: local data frame [6 x 16]
##
## year month day dep_time dep_delay arr_time arr_delay carrier tailnum
## 1 2013 1 2 809 -1 950 2 B6 N304JB
## 2 2013 1 2 810 10 1008 -6 DL N358NW
## 3 2013 1 2 811 -4 1100 4 DL N328NW
## 4 2013 1 2 811 -4 1126 -5 DL N305DQ
## 5 2013 1 2 811 -9 944 -11 MQ N509MQ
## 6 2013 1 2 815 0 1109 -19 DL N335NW
## Variables not shown: flight (int), origin (chr), dest (chr), air_time
## (dbl), distance (dbl), hour (dbl), minute (dbl)
# keep the first three rows within each group
flights %>% group_by(month, day) %>% slice(1:3)
## Source: local data frame [1,095 x 16]
## Groups: month, day
##
## year month day dep_time dep_delay arr_time arr_delay carrier tailnum
## 1 2013 1 1 517 2 830 11 UA N14228
## 2 2013 1 1 533 4 850 20 UA N24211
## 3 2013 1 1 542 2 923 33 AA N619AA
## 4 2013 1 2 42 43 518 36 B6 N580JB
## 5 2013 1 2 126 156 233 154 B6 N636JB
## 6 2013 1 2 458 -2 703 13 US N162UW
## 7 2013 1 3 32 33 504 22 B6 N763JB
## 8 2013 1 3 50 185 203 172 B6 N329JB
## 9 2013 1 3 235 156 700 143 B6 N618JB
## 10 2013 1 4 25 26 505 23 B6 N554JB
## .. ... ... ... ... ... ... ... ... ...
## Variables not shown: flight (int), origin (chr), dest (chr), air_time
## (dbl), distance (dbl), hour (dbl), minute (dbl)
# sample three rows from each group
flights %>% group_by(month, day) %>% sample_n(3)
## Source: local data frame [1,095 x 16]
## Groups: month, day
##
## year month day dep_time dep_delay arr_time arr_delay carrier tailnum
## 1 2013 1 1 1521 6 1830 7 DL N378NW
## 2 2013 1 1 1854 24 2055 40 MQ N518MQ
## 3 2013 1 1 1915 -5 2238 -19 DL N633DL
## 4 2013 1 2 1550 0 1926 1 DL N633DL
## 5 2013 1 2 1806 5 2140 3 UA N12116
## 6 2013 1 2 925 5 1124 16 B6 N239JB
## 7 2013 1 3 2154 -1 50 9 B6 N508JB
## 8 2013 1 3 1448 -2 1631 -9 MQ N835MQ
## 9 2013 1 3 1651 1 1843 -53 UA N423UA
## 10 2013 1 4 2203 88 2309 87 EV N22909
## .. ... ... ... ... ... ... ... ... ...
## Variables not shown: flight (int), origin (chr), dest (chr), air_time
## (dbl), distance (dbl), hour (dbl), minute (dbl)
# keep three rows from each group with the top dep_delay
# keep the three rows with the largest dep_delay in each group
## Source: local data frame [1,108 x 16]
## Groups: month, day
##
## year month day dep_time dep_delay arr_time arr_delay carrier tailnum
## 1 2013 1 1 848 853 1001 851 MQ N942MQ
## 2 2013 1 1 1815 290 2120 338 EV N17185
## 3 2013 1 1 2343 379 314 456 EV N21197
## 4 2013 1 2 1412 334 1710 323 UA N474UA
## 5 2013 1 2 1607 337 2003 368 AA N324AA
## 6 2013 1 2 2131 379 2340 359 UA N593UA
## 7 2013 1 3 2008 268 2339 270 DL N338NW
## 8 2013 1 3 2012 252 2314 257 B6 N558JB
## 9 2013 1 3 2056 291 2239 285 9E N928XJ
## 10 2013 1 4 2058 208 2 172 B6 N523JB
## .. ... ... ... ... ... ... ... ... ...
## Variables not shown: flight (int), origin (chr), dest (chr), air_time
## (dbl), distance (dbl), hour (dbl), minute (dbl)
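Note that the result above has 1,108 rows rather than 1,095 (3 rows for each of 365 days). The likely explanation is that top_n() keeps all tied rows, so a group can return more than the requested number. The sketch below illustrates this on a made-up data frame (`toy`, `grp` and `delay` are hypothetical names chosen for the example).

```r
library(dplyr)

# hypothetical toy data: group "a" has a tie for third place
toy <- data.frame(
  grp   = c("a", "a", "a", "a", "b", "b", "b", "b"),
  delay = c(10, 8, 5, 5, 9, 7, 3, 1)
)

top3 <- toy %>% group_by(grp) %>% top_n(3, delay)

# group "a" returns 4 rows (10, 8, 5, 5); group "b" returns 3 rows (9, 7, 3)
top3 %>% summarise(rows = n())
```

If you need exactly three rows per group regardless of ties, one option is to arrange by the variable and then use slice(1:3), as shown earlier.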
# also sort by dep_delay within each group
flights %>% group_by(month, day) %>% top_n(3, dep_delay) %>% arrange(desc(dep_delay))
## Source: local data frame [1,108 x 16]
## Groups: month, day
##
## year month day dep_time dep_delay arr_time arr_delay carrier tailnum
## 1 2013 1 1 848 853 1001 851 MQ N942MQ
## 2 2013 1 1 2343 379 314 456 EV N21197
## 3 2013 1 1 1815 290 2120 338 EV N17185
## 4 2013 1 2 2131 379 2340 359 UA N593UA
## 5 2013 1 2 1607 337 2003 368 AA N324AA
## 6 2013 1 2 1412 334 1710 323 UA N474UA
## 7 2013 1 3 2056 291 2239 285 9E N928XJ
## 8 2013 1 3 2008 268 2339 270 DL N338NW
## 9 2013 1 3 2012 252 2314 257 B6 N558JB
## 10 2013 1 4 2123 288 2332 276 EV N29917
## .. ... ... ... ... ... ... ... ... ...
## Variables not shown: flight (int), origin (chr), dest (chr), air_time
## (dbl), distance (dbl), hour (dbl), minute (dbl)
# unique rows can be identified using unique() from base R
flights %>% select(origin, dest) %>% unique()
## Source: local data frame [224 x 2]
##
## origin dest
## 1 EWR IAH
## 2 LGA IAH
## 3 JFK MIA
## 4 JFK BQN
## 5 LGA ATL
## 6 EWR ORD
## 7 EWR FLL
## 8 LGA IAD
## 9 JFK MCO
## 10 LGA ORD
## .. ... ...
# dplyr provides an alternative that is more "efficient"
flights %>% select(origin, dest) %>% distinct()
# side note: when chaining, you don't have to include the parentheses if there are no arguments
flights %>% select(origin, dest) %>% distinct
# mutate() creates a new variable (and keeps all existing variables)
flights %>% mutate(speed = distance/air_time*60)
## Source: local data frame [336,776 x 17]
##
## year month day dep_time dep_delay arr_time arr_delay carrier tailnum
## 1 2013 1 1 517 2 830 11 UA N14228
## 2 2013 1 1 533 4 850 20 UA N24211
## 3 2013 1 1 542 2 923 33 AA N619AA
## 4 2013 1 1 544 -1 1004 -18 B6 N804JB
## 5 2013 1 1 554 -6 812 -25 DL N668DN
## 6 2013 1 1 554 -4 740 12 UA N39463
## 7 2013 1 1 555 -5 913 19 B6 N516JB
## 8 2013 1 1 557 -3 709 -14 EV N829AS
## 9 2013 1 1 557 -3 838 -8 B6 N593JB
## 10 2013 1 1 558 -2 753 8 AA N3ALAA
## .. ... ... ... ... ... ... ... ... ...
## Variables not shown: flight (int), origin (chr), dest (chr), air_time
## (dbl), distance (dbl), hour (dbl), minute (dbl), speed (dbl)
# transmute() only keeps the new variables
flights %>% transmute(speed = distance/air_time*60)
## Source: local data frame [336,776 x 1]
##
## speed
## 1 370.0441
## 2 374.2731
## 3 408.3750
## 4 516.7213
## 5 394.1379
## 6 287.6000
## 7 404.4304
## 8 259.2453
## 9 404.5714
## 10 318.6957
## .. ...
# example data frame with row names
mtcars %>% head()
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
# add_rownames() turns row names into an explicit variable
mtcars %>% add_rownames("model") %>% head()
## model mpg cyl disp hp drat wt qsec vs am gear carb
## 1 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## 2 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## 3 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## 4 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## 5 Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## 6 Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
# side note: dplyr no longer prints row names (ever) for local data frames
mtcars %>% tbl_df()
## Source: local data frame [32 x 11]
##
## mpg cyl disp hp drat wt qsec vs am gear carb
## 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## 7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## 8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## 9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## 10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## .. ... ... ... ... ... ... ... .. .. ... ...
# summarise() can be used to count the number of rows in each group
flights %>% group_by(month) %>% summarise(cnt = n())
## Source: local data frame [12 x 2]
##
## month cnt
## 1 1 27004
## 2 2 24951
## 3 3 28834
## 4 4 28330
## 5 5 28796
## 6 6 28243
## 7 7 29425
## 8 8 29327
## 9 9 27574
## 10 10 28889
## 11 11 27268
## 12 12 28135
# tally() and count() can do this more concisely
flights %>% group_by(month) %>% tally()
flights %>% count(month)
# you can sort by the count
flights %>% group_by(month) %>% summarise(cnt = n()) %>% arrange(desc(cnt))
## Source: local data frame [12 x 2]
##
## month cnt
## 1 7 29425
## 2 8 29327
## 3 10 28889
## 4 3 28834
## 5 5 28796
## 6 4 28330
## 7 6 28243
## 8 12 28135
## 9 9 27574
## 10 11 27268
## 11 1 27004
## 12 2 24951
# tally() and count() have a sort parameter for this purpose
flights %>% group_by(month) %>% tally(sort=TRUE)
flights %>% count(month, sort=TRUE)
# you can sum over a specific variable instead of simply counting rows
flights %>% group_by(month) %>% summarise(dist = sum(distance))
## Source: local data frame [12 x 2]
##
## month dist
## 1 1 27188805
## 2 2 24975509
## 3 3 29179636
## 4 4 29427294
## 5 5 29974128
## 6 6 29856388
## 7 7 31149199
## 8 8 31149334
## 9 9 28711426
## 10 10 30012086
## 11 11 28639718
## 12 12 29954084
# tally() and count() have a wt parameter for this purpose
flights %>% group_by(month) %>% tally(wt = distance)
flights %>% count(month, wt = distance)
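As a quick check that the wt parameter really does produce a sum, here is a minimal sketch on a made-up data frame (`toy` is a hypothetical example). It assumes count() names its result column `n`, which matches the behavior I have seen.

```r
library(dplyr)

# hypothetical toy data
toy <- data.frame(
  month    = c(1, 1, 2, 2, 2),
  distance = c(100, 250, 300, 50, 75)
)

# explicit version: group, then sum the variable
by_summarise <- toy %>% group_by(month) %>% summarise(n = sum(distance))

# concise version: count() weighted by the same variable
by_count <- toy %>% count(month, wt = distance)

# both give 350 for month 1 and 425 for month 2
by_summarise$n
by_count$n
```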
# group_size() returns the counts as a vector
flights %>% group_by(month) %>% group_size()
## [1] 27004 24951 28834 28330 28796 28243 29425 29327 27574 28889 27268
## [12] 28135
# n_groups() simply reports the number of groups
flights %>% group_by(month) %>% n_groups()
## [1] 12
# group by two variables, summarise, arrange (possibly confusing: the result is still grouped by month, so rows are sorted within each month)
flights %>% group_by(month, day) %>% summarise(cnt = n()) %>% arrange(desc(cnt)) %>% print(n = 40)
## Source: local data frame [365 x 3]
## Groups: month
##
## month day cnt
## 1 1 2 943
## 2 1 7 933
## 3 1 10 932
## 4 1 11 930
## 5 1 14 928
## 6 1 31 928
## 7 1 17 927
## 8 1 24 925
## 9 1 18 924
## 10 1 28 923
## 11 1 25 922
## 12 1 4 915
## 13 1 3 914
## 14 1 21 912
## 15 1 9 902
## 16 1 16 901
## 17 1 30 900
## 18 1 8 899
## 19 1 23 897
## 20 1 15 894
## 21 1 22 890
## 22 1 29 890
## 23 1 1 842
## 24 1 6 832
## 25 1 13 828
## 26 1 27 823
## 27 1 20 786
## 28 1 5 720
## 29 1 12 690
## 30 1 26 680
## 31 1 19 674
## 32 2 28 964
## 33 2 21 961
## 34 2 25 961
## 35 2 22 957
## 36 2 14 956
## 37 2 15 954
## 38 2 20 949
## 39 2 18 948
## 40 2 27 945
## .. ... ... ...
# ungroup() before arranging to arrange across all groups
flights %>% group_by(month, day) %>% summarise(cnt = n()) %>% ungroup() %>% arrange(desc(cnt))
## Source: local data frame [365 x 3]
##
## month day cnt
## 1 11 27 1014
## 2 7 11 1006
## 3 7 8 1004
## 4 7 10 1004
## 5 12 2 1004
## 6 7 18 1003
## 7 7 25 1003
## 8 7 12 1002
## 9 7 9 1001
## 10 7 17 1001
## .. ... ... ...
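The reason ungroup() matters here is that summarise() only removes the last grouping variable, so the summarised result is still grouped by month. The sketch below checks this on a made-up data frame. It assumes a newer dplyr release that provides group_vars() (the version used in this tutorial may not have it), and `toy` is a hypothetical example. As far as I know, later dplyr versions also changed arrange() to ignore grouping by default, so the within-month sorting shown earlier may not reproduce there.

```r
library(dplyr)

# hypothetical toy data
toy <- data.frame(
  month = c(1, 1, 1, 2, 2),
  day   = c(1, 1, 2, 1, 1)
)

counts <- toy %>% group_by(month, day) %>% summarise(cnt = n())

# summarise() drops only the last grouping variable, so "month" remains
group_vars(counts)

# after ungroup() no grouping variables remain
group_vars(ungroup(counts))
```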
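data_frame() is a better alternative to data.frame() for creating data frames. As the two examples below show, data_frame() lets you use previously defined columns to compute new columns, does not convert strings to factors, and does not alter column names.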
# data_frame() example
data_frame(a = 1:6, b = a*2, c = 'string', 'd+e' = 1) %>% glimpse()
## Observations: 6
## Variables:
## $ a (int) 1, 2, 3, 4, 5, 6
## $ b (dbl) 2, 4, 6, 8, 10, 12
## $ c (chr) "string", "string", "string", "string", "string", "string"
## $ d+e (dbl) 1, 1, 1, 1, 1, 1
# data.frame() example
data.frame(a = 1:6, c = 'string', 'd+e' = 1) %>% glimpse()
## Observations: 6
## Variables:
## $ a (int) 1, 2, 3, 4, 5, 6
## $ c (fctr) string, string, string, string, string, string
## $ d.e (dbl) 1, 1, 1, 1, 1, 1
# create two simple data frames
(a <- data_frame(color = c("green","yellow","red"), num = 1:3))
## Source: local data frame [3 x 2]
##
## color num
## 1 green 1
## 2 yellow 2
## 3 red 3
(b <- data_frame(color = c("green","yellow","pink"), size = c("S","M","L")))
## Source: local data frame [3 x 2]
##
## color size
## 1 green S
## 2 yellow M
## 3 pink L
# only include observations found in both "a" and "b" (automatically joins on variables that appear in both tables)
inner_join(a, b)
## Joining by: "color"
## Source: local data frame [2 x 3]
##
## color num size
## 1 green 1 S
## 2 yellow 2 M
# include observations found in either "a" or "b"
full_join(a, b)
## Joining by: "color"
## Source: local data frame [4 x 3]
##
## color num size
## 1 green 1 S
## 2 yellow 2 M
## 3 red 3 NA
## 4 pink NA L
# include all observations found in "a"
left_join(a, b)
## Joining by: "color"
## Source: local data frame [3 x 3]
##
## color num size
## 1 green 1 S
## 2 yellow 2 M
## 3 red 3 NA
# include all observations found in "b"
right_join(a, b)
## Joining by: "color"
## Source: local data frame [3 x 3]
##
## color num size
## 1 green 1 S
## 2 yellow 2 M
## 3 pink NA L
# right_join(a, b) is identical to left_join(b, a) except for column ordering
left_join(b, a)
## Joining by: "color"
## Source: local data frame [3 x 3]
##
## color size num
## 1 green S 1
## 2 yellow M 2
## 3 pink L NA
# filter "a" to only show observations that match "b"
semi_join(a, b)
## Joining by: "color"
## Source: local data frame [2 x 2]
##
## color num
## 1 green 1
## 2 yellow 2
# filter "a" to only show observations that don't match "b"
anti_join(a, b)
## Joining by: "color"
## Source: local data frame [1 x 2]
##
## color num
## 1 red 3
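One difference between semi_join() and inner_join() that the tables above don't reveal is how they treat repeated keys. The sketch below uses two made-up data frames (`x` and `y` are hypothetical, chosen so that "green" appears twice in `y`) to show that inner_join() returns one row per match, whereas semi_join() only filters the first table.

```r
library(dplyr)

# hypothetical data: "green" appears twice in y
x <- data.frame(color = c("green", "yellow", "red"), num = 1:3,
                stringsAsFactors = FALSE)
y <- data.frame(color = c("green", "green", "pink"), size = c("S", "M", "L"),
                stringsAsFactors = FALSE)

# inner_join() returns one row per match, so "green" shows up twice
inner <- inner_join(x, y, by = "color")

# semi_join() keeps each matching row of x once and adds no columns from y
semi <- semi_join(x, y, by = "color")

nrow(inner)
nrow(semi)
names(semi)
```

This makes semi_join() the safer choice when you only want to know which rows of the first table have a match.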
# sometimes matching variables don't have identical names
b <- b %>% rename(col = color)
# specify that the join should occur by matching "color" in "a" with "col" in "b"
inner_join(a, b, by=c("color" = "col"))
## Source: local data frame [2 x 3]
##
## color num size
## 1 green 1 S
## 2 yellow 2 M
# specify that you want to see more rows
flights %>% print(n = 15)
## Source: local data frame [336,776 x 16]
##
## year month day dep_time dep_delay arr_time arr_delay carrier tailnum
## 1 2013 1 1 517 2 830 11 UA N14228
## 2 2013 1 1 533 4 850 20 UA N24211
## 3 2013 1 1 542 2 923 33 AA N619AA
## 4 2013 1 1 544 -1 1004 -18 B6 N804JB
## 5 2013 1 1 554 -6 812 -25 DL N668DN
## 6 2013 1 1 554 -4 740 12 UA N39463
## 7 2013 1 1 555 -5 913 19 B6 N516JB
## 8 2013 1 1 557 -3 709 -14 EV N829AS
## 9 2013 1 1 557 -3 838 -8 B6 N593JB
## 10 2013 1 1 558 -2 753 8 AA N3ALAA
## 11 2013 1 1 558 -2 849 -2 B6 N793JB
## 12 2013 1 1 558 -2 853 -3 B6 N657JB
## 13 2013 1 1 558 -2 924 7 UA N29129
## 14 2013 1 1 558 -2 923 -14 UA N53441
## 15 2013 1 1 559 -1 941 31 AA N3DUAA
## .. ... ... ... ... ... ... ... ... ...
## Variables not shown: flight (int), origin (chr), dest (chr), air_time
## (dbl), distance (dbl), hour (dbl), minute (dbl)
# specify that you want to see ALL rows (don't run this!)
flights %>% print(n = Inf)
# specify that you want to see all columns
flights %>% print(width = Inf)
## Source: local data frame [336,776 x 16]
##
## year month day dep_time dep_delay arr_time arr_delay carrier tailnum
## 1 2013 1 1 517 2 830 11 UA N14228
## 2 2013 1 1 533 4 850 20 UA N24211
## 3 2013 1 1 542 2 923 33 AA N619AA
## 4 2013 1 1 544 -1 1004 -18 B6 N804JB
## 5 2013 1 1 554 -6 812 -25 DL N668DN
## 6 2013 1 1 554 -4 740 12 UA N39463
## 7 2013 1 1 555 -5 913 19 B6 N516JB
## 8 2013 1 1 557 -3 709 -14 EV N829AS
## 9 2013 1 1 557 -3 838 -8 B6 N593JB
## 10 2013 1 1 558 -2 753 8 AA N3ALAA
## .. ... ... ... ... ... ... ... ... ...
## flight origin dest air_time distance hour minute
## 1 1545 EWR IAH 227 1400 5 17
## 2 1714 LGA IAH 227 1416 5 33
## 3 1141 JFK MIA 160 1089 5 42
## 4 725 JFK BQN 183 1576 5 44
## 5 461 LGA ATL 116 762 5 54
## 6 1696 EWR ORD 150 719 5 54
## 7 507 EWR FLL 158 1065 5 55
## 8 5708 LGA IAD 53 229 5 57
## 9 79 JFK MCO 140 944 5 57
## 10 301 LGA ORD 138 733 5 58
## .. ... ... ... ... ... ... ...
# show up to 1000 rows and all columns
flights %>% View()
# set option to see all columns and fewer rows
options(dplyr.width = Inf, dplyr.print_min = 6)
# reset options (or just close R)
options(dplyr.width = NULL, dplyr.print_min = 10)