library(tidyverse)  # ggplot2, dplyr, etc.
library(ggthemes)   # theme_fivethirtyeight()
library(scales)     # pretty_breaks()

avocado |>
  ggplot(aes(x = Date, y = AveragePrice, group = region, color = region)) +
  geom_line(aes(linetype = isNorthEast), linewidth = 1.25) +
  labs(
    title = "Average Avocado Price in the US Over Time",
    subtitle = "How do avocado prices differ by region?",
    x = "Date",
    y = "Average Price",
    color = "Region"
  ) +
  theme_fivethirtyeight() +
  theme(axis.title = element_text()) +  # theme_fivethirtyeight() hides axis titles; bring them back
  scale_linetype_manual(
    values = c(
      "11",   # compact dash (dash length 1, gap length 1)
      "solid"
    ),
    guide = "none"  # removes the "isNorthEast" legend
  ) +
  scale_x_date(breaks = pretty_breaks(n = 10))
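The unnamed values in scale_linetype_manual() are assigned in level order (for a logical column: FALSE first, then TRUE). The sketch below uses a small hypothetical data frame (the columns x, y, region and is_ne are invented for illustration) to show that naming the values pins each level to a line type explicitly. It then checks which line types ggplot2 actually assigned by calling ggplot_build().

```r
library(ggplot2)

# Hypothetical toy data: region "A" is flagged, region "B" is not
toy <- data.frame(
  x = rep(1:3, times = 2),
  y = c(1, 2, 3, 2, 3, 4),
  region = rep(c("A", "B"), each = 3),
  is_ne = rep(c(TRUE, FALSE), each = 3)
)

p <- ggplot(toy, aes(x = x, y = y, group = region, color = region)) +
  geom_line(aes(linetype = is_ne), linewidth = 1.25) +
  scale_linetype_manual(
    # Named values: TRUE -> continuous line, FALSE -> "11" (1-unit dash, 1-unit gap)
    values = c(`TRUE` = "solid", `FALSE` = "11"),
    guide = "none"
  )

# Inspect the line type assigned to each group (group 1 = "A", group 2 = "B")
built <- ggplot_build(p)$data[[1]]
unique(built[, c("group", "linetype")])
```

Named values are more robust than positional ones because reordering the factor levels can no longer silently swap the dashed and solid lines.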
2 tidyverse in action
Default |> sample_n(5)
default student balance income
1 Yes Yes 1956.9239 15574.39
2 No Yes 599.4710 18575.41
3 No No 322.9823 25267.40
4 No Yes 1499.1901 17560.37
5 No No 1217.0728 62764.10
Use View() to open the data in a spreadsheet-style viewer, where you can browse every row and column.
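When an interactive viewer is not available, a non-interactive overview can help. The sketch below uses dplyr's glimpse() together with slice_sample(), which is dplyr's current replacement for the superseded sample_n(). It runs on the airlines table from nycflights13, and the seed value 42 is arbitrary.

```r
library(dplyr)
library(nycflights13)

# glimpse(): one line per column, showing its type and first few values
glimpse(airlines)

# slice_sample() supersedes sample_n(); set.seed() makes the draw reproducible
set.seed(42)
five_rows <- airlines |> slice_sample(n = 5)
five_rows
```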
To count the number of long flights (the column long_flights doesn’t exist in the data frame), one option would be the following:
# A tibble: 5 × 4
date tailnum origin dest
<date> <chr> <chr> <chr>
1 2013-02-26 N586JB JFK SJU
2 2013-09-09 N454UA EWR FLL
3 2013-02-24 N806MQ JFK RDU
4 2013-04-16 N8665A JFK CLE
5 2013-03-24 N14230 EWR RSW
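The counting step described above can be sketched by creating the missing long_flight column on the fly and then counting its values. The cut-off below is a hypothetical assumption (more than 360 minutes of air_time), because the original threshold is not given. Flights with a missing air_time, such as cancelled flights, end up in an NA group.

```r
library(dplyr)
library(nycflights13)

# Hypothetical threshold: "long" = more than 6 hours (360 min) in the air
long_cutoff <- 360

long_counts <- flights |>
  # long_flight is not a column of flights, so derive it here
  mutate(long_flight = air_time > long_cutoff) |>
  # one row each for TRUE, FALSE and NA (missing air_time)
  count(long_flight)
long_counts
```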
Pipes are really handy for chaining several transformations in sequence, for example when cleaning up airline names.
airlines |> sample_n(5)
# A tibble: 5 × 2
carrier name
<chr> <chr>
1 AA American Airlines Inc.
2 F9 Frontier Airlines Inc.
3 DL Delta Air Lines Inc.
4 WN Southwest Airlines Co.
5 MQ Envoy Air
airlines |>
  mutate(
    name = name |>
      str_to_upper() |>
      str_replace_all(" (INC|CO)\\.?$", "") |>
      str_replace_all(" AIR ?(LINES|WAYS)?( CORPORATION)?$", "") |>
      str_to_title() |>
      str_replace_all("\\bUs\\b", "US")
  ) |>
  sample_n(5)
# A tibble: 5 × 2
carrier name
<chr> <chr>
1 UA United
2 MQ Envoy
3 YV Mesa
4 AA American
5 HA Hawaiian
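The cleaning chain above can be wrapped in a reusable function and checked on a handful of names. The helper name clean_airline_name is hypothetical. It swaps str_replace_all(x, pattern, "") for stringr's str_remove(). Because both patterns are anchored at the end of the string ($), they can match at most once, so the result is the same.

```r
library(stringr)

# Hypothetical helper: same steps as the mutate() pipeline above
clean_airline_name <- function(name) {
  name |>
    str_to_upper() |>
    # drop a trailing " INC" or " CO", with an optional full stop
    str_remove(" (INC|CO)\\.?$") |>
    # drop a trailing " AIR", " AIR LINES", " AIRLINES" or " AIRWAYS",
    # optionally followed by " CORPORATION"
    str_remove(" AIR ?(LINES|WAYS)?( CORPORATION)?$") |>
    str_to_title() |>
    # str_to_title() turns "US" into "Us"; restore the abbreviation
    str_replace_all("\\bUs\\b", "US")
}

clean_airline_name(c(
  "American Airlines Inc.", "Southwest Airlines Co.", "Envoy Air",
  "US Airways Inc.", "AirTran Airways Corporation", "Virgin America",
  "Delta Air Lines Inc."
))
# → "American" "Southwest" "Envoy" "US" "Airtran" "Virgin America" "Delta"
```

The optional space in " AIR ?" is what lets "Delta Air Lines" collapse to "Delta". Note that str_to_title() also lowercases the inner capital of "AirTran".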
Use group_by(carrier), then keep only the groups that have more than 27000 rows:
# A tibble: 5 × 4
carrier name short_name remainder
<chr> <chr> <chr> <chr>
1 DL Delta Air Lines Inc. Delta Air Lines Inc.
2 VX Virgin America Virgin America
3 HA Hawaiian Airlines Inc. Hawaiian Airlines Inc.
4 WN Southwest Airlines Co. Southwest Airlines Co.
5 FL AirTran Airways Corporation AirTran Airways Corporation
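The grouped filter described above (keep only carriers with more than 27000 rows in flights) can be sketched as follows. It assumes the flights table from nycflights13. Inside a grouped filter(), n() returns the size of the current group, so whole groups are either kept or dropped.

```r
library(dplyr)
library(nycflights13)

big_carriers <- flights |>
  group_by(carrier) |>
  # n() = number of rows in the current carrier group
  filter(n() > 27000) |>
  ungroup()

# One row per remaining carrier, with its flight count
big_carriers |> count(carrier, sort = TRUE)
```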
Use semi_join() to keep only the rows of the first table that have a match in the second table. First, find the airlines whose names begin with "A":
begin_with_a <- airlines |>
  filter(name |> str_detect("^A"))
begin_with_a
# A tibble: 3 × 2
carrier name
<chr> <chr>
1 AA American Airlines Inc.
2 AS Alaska Airlines Inc.
3 FL AirTran Airways Corporation
flights |>
  semi_join(begin_with_a, by = "carrier") |>
  count(carrier)
# A tibble: 3 × 2
carrier n
<chr> <int>
1 AA 32729
2 AS 714
3 FL 3260
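To see how semi_join() differs from inner_join(), here is a sketch on two small hypothetical tibbles (orders and lookup are invented for illustration). In lookup, the key "x" appears twice. semi_join() returns only the columns of the first table and keeps each matching row once. inner_join() adds the second table's columns and repeats a row once per match.

```r
library(dplyr)

# Hypothetical toy tables: key "x" appears twice in lookup
orders <- tibble(key = c("x", "y", "z"), amount = c(10, 20, 30))
lookup <- tibble(key = c("x", "x", "y"), label = c("first", "second", "third"))

# Filtering join: rows of orders with a match, orders' columns only
semi <- orders |> semi_join(lookup, by = "key")

# Mutating join: adds label and duplicates "x" once per matching lookup row
inner <- orders |> inner_join(lookup, by = "key")

semi
inner
```

This is why the count above can use semi_join() on flights safely: no flight is duplicated, and no airline columns are added.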