Just a sample of what we can do with dplyr.
Consider the nycflights13 package. For each week of the year, find the plane(s) that made the most flights in the month, along with the tail number.
The following code returns only the top departer(s) in each week.
flights %>%
filter(!is.na(tailnum)) %>%
mutate(week_number = lubridate::week(time_hour)) %>%
group_by(week_number, tailnum) %>%
summarize(departures = n()) %>%
filter(departures == max(departures)) %>%
arrange(week_number) %>%
DT::datatable()
If we like we can retain all of the departures, but sort them within their week-groups. The key is to arrange() by weeknumber and by departures within weeknumber. See the code below.
flights %>%
filter(!is.na(tailnum)) %>%
mutate(week_number = lubridate::week(time_hour)) %>%
group_by(week_number, tailnum) %>%
summarize(departures = n()) %>%
arrange(week_number, desc(departures))
# A tibble: 109,324 x 3
# Groups: week_number [53]
week_number tailnum departures
<dbl> <chr> <int>
1 1 N14542 17
2 1 N711MQ 17
3 1 N725MQ 17
4 1 N730MQ 17
5 1 N16561 16
6 1 N737MQ 16
7 1 N739MQ 16
8 1 N281JB 15
9 1 N509MQ 15
10 1 N723MQ 15
# ... with 109,314 more rows
You can’t show the whole thing in a DT datatable, however. If you try then you get a message in the Console:
It seems your data is too big for client-side DataTables.
You may consider server-side processing:
https://rstudio.github.io/DT/server.html
You could content yourself with finding top performers in each week:
flights %>%
filter(!is.na(tailnum)) %>%
mutate(week_number = lubridate::week(time_hour)) %>%
group_by(week_number, tailnum) %>%
summarize(departures = n()) %>%
top_n(5) %>%
arrange(week_number, desc(departures))
# A tibble: 348 x 3
# Groups: week_number [53]
week_number tailnum departures
<dbl> <chr> <int>
1 1 N14542 17
2 1 N711MQ 17
3 1 N725MQ 17
4 1 N730MQ 17
5 1 N16561 16
6 1 N737MQ 16
7 1 N739MQ 16
8 2 N719MQ 17
9 2 N730MQ 17
10 2 N713MQ 16
# ... with 338 more rows
Note that when there are ties in a group more than the requested 5 items are returned.
This table is small enough for DT to handle:
flights %>%
filter(!is.na(tailnum)) %>%
mutate(week_number = lubridate::week(time_hour)) %>%
group_by(week_number, tailnum) %>%
summarize(departures = n()) %>%
top_n(5) %>%
arrange(week_number, desc(departures)) %>%
DT::datatable()
For fun, let’s find plot the number of cancelled flights, during 2013, by day:
flights %>%
mutate(day_number = lubridate::yday(time_hour)) %>%
group_by(day_number) %>%
summarise(cancellations = sum(is.na(dep_delay))) %>%
mutate(date = lubridate::as_date(day_number,
origin = "2013-01-01")) %>%
ggplot(aes(x = date, y = cancellations)) +
geom_line() +
labs(x = NULL)
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".