I was looking for a simple dataset with count data for many items to demonstrate some basic readr functions. Luckily, readr comes bundled with a good demo dataset.
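# The tidyverse attaches readr, ggplot2, forcats, stringr, and dplyr, all used below
library(tidyverse)
chickens <- read_csv(readr_example("chickens.csv"))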
chickens
## # A tibble: 5 × 4
## chicken sex eggs_laid motto
## <chr> <chr> <dbl> <chr>
## 1 Foghorn Leghorn rooster 0 That's a joke, ah say, that's a jok…
## 2 Chicken Little hen 3 The sky is falling!
## 3 Ginger hen 12 Listen. We'll either die free chick…
## 4 Camilla the Chicken hen 7 Bawk, buck, ba-gawk.
## 5 Ernie The Giant Chicken rooster 0 Put Captain Solo in the cargo hold.
Q: How do I set column types? A: Use readr column specifications.
readr guesses the type of each column when it reads the file, and the tibble header shows the result. The guesses are usually good, but they are not perfect. For example, the guessed column type for eggs_laid is double. spec() prints the full specification that readr used.
spec(chickens)
## cols(
## chicken = col_character(),
## sex = col_character(),
## eggs_laid = col_double(),
## motto = col_character()
## )
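To see why eggs_laid comes out as double, it can help to run readr's guesser directly on a few values. This is only a sketch: the input vector is made up to mirror the eggs_laid column, and it assumes a readr version in which guess_parser() accepts the guess_integer argument.

```r
library(readr)

# Hypothetical values mirroring the eggs_laid column
eggs <- c("0", "3", "12", "7", "0")

# By default readr does not guess integers, so whole numbers are guessed as double
default_guess <- guess_parser(eggs)

# Opting in to integer guessing changes the result
integer_guess <- guess_parser(eggs, guess_integer = TRUE)

default_guess
integer_guess
```

Guessing double for whole numbers is the conservative choice, which is why an explicit column specification is needed when integers are wanted.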
Since chickens do not lay fractional eggs, we may want to tell readr to read eggs_laid as integer. We may also want sex to be read as a factor instead of character. Note that we only need to specify the columns whose guessed types we want to override; the remaining columns are still guessed.
chickens <- read_csv(readr_example("chickens.csv"),
col_types = cols(
sex = col_factor(levels = c('rooster', 'hen')),
eggs_laid = col_integer()
)
)
chickens
## # A tibble: 5 × 4
## chicken sex eggs_laid motto
## <chr> <fct> <int> <chr>
## 1 Foghorn Leghorn rooster 0 That's a joke, ah say, that's a jok…
## 2 Chicken Little hen 3 The sky is falling!
## 3 Ginger hen 12 Listen. We'll either die free chick…
## 4 Camilla the Chicken hen 7 Bawk, buck, ba-gawk.
## 5 Ernie The Giant Chicken rooster 0 Put Captain Solo in the cargo hold.
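A column specification also acts as a check on the data: values that do not fit the requested type become NA and are recorded as parsing problems. A minimal sketch with made-up inline data, assuming readr 2.0 or later (needed for passing literal data with I()):

```r
library(readr)

# Made-up data: the second row has a fractional egg count
csv_text <- "chicken,eggs_laid\nGinger,12\nCamilla,7.5\n"

# Values that cannot be parsed as integer become NA (readr warns about this)
hens <- suppressWarnings(
  read_csv(I(csv_text), col_types = cols(
    chicken = col_character(),
    eggs_laid = col_integer()
  ))
)

hens$eggs_laid

# problems() reports where parsing failed and what was expected
problems(hens)
```

Checking problems() after a read with strict column types is a cheap way to catch values that silently turned into NA.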
The column types have now been set correctly. A compact way of providing column types is a string with one character per column, in column order. For example, "cfi" reads the first column as character, the second as factor, and the third as integer. To skip a column, put an underscore (_) in its position.
chickens <- read_csv(readr_example("chickens.csv"),
col_types = "cfi_"
)
Finally, a default type can be used instead of guessing for columns that are not specified.
chickens <- read_csv(readr_example("chickens.csv"),
col_types = cols(
sex = col_factor(levels = c('rooster', 'hen')),
eggs_laid = col_integer(),
.default = col_character()
)
)
chickens
## # A tibble: 5 × 4
## chicken sex eggs_laid motto
## <chr> <fct> <int> <chr>
## 1 Foghorn Leghorn rooster 0 That's a joke, ah say, that's a jok…
## 2 Chicken Little hen 3 The sky is falling!
## 3 Ginger hen 12 Listen. We'll either die free chick…
## 4 Camilla the Chicken hen 7 Bawk, buck, ba-gawk.
## 5 Ernie The Giant Chicken rooster 0 Put Captain Solo in the cargo hold.
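The compact string form can be checked by calling spec() on the result, which prints the long-form specification it stands for. A small sketch with made-up inline data, assuming readr 2.0 or later for I(); the column names a to d are placeholders:

```r
library(readr)

# Made-up data with four columns
csv_text <- "a,b,c,d\nx,hen,1,skip me\ny,rooster,2,skip me too\n"

# c = character, f = factor, i = integer, _ = skip the column
small <- read_csv(I(csv_text), col_types = "cfi_")

names(small)  # column d is dropped
spec(small)   # the long-form specification behind "cfi_"
```

Other single-letter codes include d for double, l for logical, D for date, and ? to let readr guess.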
Q: How can I parse a character vector into a specific data type? A: Use the readr::parse_*() functions.
parse_double(c('1.1', '2', '3', '4'))
## [1] 1.1 2.0 3.0 4.0
parse_logical(c('t', 'f'))
## [1] TRUE FALSE
Unlike parse_integer() and parse_double(), parse_number() can handle non-numeric prefixes and suffixes, as well as grouping marks such as commas.
parse_number(c('$123.45', '1,000,000'))
## [1] 123.45 1000000.00
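The difference between the strict and the lenient parsers is easiest to see side by side. A short sketch with made-up inputs; the locale() call is one way, under the assumption that the data uses European-style marks, to swap the roles of "." and ",":

```r
library(readr)

# parse_integer() rejects the currency symbol and returns NA (readr warns)
strict <- suppressWarnings(parse_integer("$5"))

# parse_number() ignores the prefix, the suffix, and the grouping marks
loose <- parse_number(c("$5", "80%", "1,000,000"))

# A locale swaps the decimal and grouping marks for European-style numbers
european <- parse_number(
  "1.234,56",
  locale = locale(decimal_mark = ",", grouping_mark = ".")
)

strict
loose
european
```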
Finally, there are flexible Date/Time parsers.
parse_datetime('2022-10-29 18:06')
## [1] "2022-10-29 18:06:00 UTC"
parse_date('2022-10-29')
## [1] "2022-10-29"
parse_time("3:08 pm")
## 15:08:00
The date/time parsers take an optional format argument that specifies the string format.
parse_date("10/29/2022", format = "%m/%d/%Y")
## [1] "2022-10-29"
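A few more format strings, as a sketch with made-up inputs. It assumes the default (English) locale, which matters for the month name matched by %B:

```r
library(readr)

# %d = day of month, %B = full month name, %Y = four-digit year
d1 <- parse_date("29 October 2022", format = "%d %B %Y")

# %H:%M is 24-hour time; the result is in UTC with the default locale
dt <- parse_datetime("29/10/2022 18:06", format = "%d/%m/%Y %H:%M")

# A string that does not match the format becomes NA (readr warns)
bad <- suppressWarnings(parse_date("2022-10-29", format = "%m/%d/%Y"))

d1
dt
bad
```

An explicit format is mainly useful for ambiguous inputs such as 10/11/2022, where day-first and month-first readings are both valid dates.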
fct_relevel() from forcats lets us reorder the levels of a factor on the fly by moving the named levels to the front. Here, hen is moved ahead of rooster, which changes the order of the bars.
ggplot(data=chickens, aes(x=sex)) + geom_bar(fill='lightyellow')
ggplot(data=chickens, aes(x=fct_relevel(sex, "hen"))) + geom_bar(fill='lightyellow')
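fct_infreq() orders the levels by frequency, most frequent first.
# after_stat(count) replaces the deprecated ..count.. notation (ggplot2 3.3.0 or later)
ggplot(data=chickens, aes(x=fct_infreq(sex))) + geom_bar(fill='lightyellow') + geom_text(aes(label = after_stat(count)), stat = "count")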
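The effect of both functions can also be checked without plotting, by looking at the level order directly. A minimal sketch; the factor below is built by hand to match the sex column rather than read from the file:

```r
library(forcats)

# Same values as the sex column, with rooster as the first level
sex <- factor(
  c("rooster", "hen", "hen", "hen", "rooster"),
  levels = c("rooster", "hen")
)

original  <- levels(sex)                      # order as defined
releveled <- levels(fct_relevel(sex, "hen"))  # hen moved to the front
by_freq   <- levels(fct_infreq(sex))          # most frequent level first

original
releveled
by_freq
```

Neither function changes the values themselves, only the order of the levels, which is what ggplot2 uses to order the bars.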
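word() from stringr extracts words from a string by position. Here we pull the second and third words of each motto into a new column.
chickens <- read_csv(readr_example("chickens.csv"))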
word23 <- word(chickens$motto, start=2, end=3, sep=fixed(" "))
chickens |> mutate(word_2to3 = word23)
## # A tibble: 5 × 5
## chicken sex eggs_laid motto word_…¹
## <chr> <chr> <dbl> <chr> <chr>
## 1 Foghorn Leghorn rooster 0 That's a joke, ah say, that… a joke,
## 2 Chicken Little hen 3 The sky is falling! sky is
## 3 Ginger hen 12 Listen. We'll either die fr… We'll …
## 4 Camilla the Chicken hen 7 Bawk, buck, ba-gawk. buck, …
## 5 Ernie The Giant Chicken rooster 0 Put Captain Solo in the car… Captai…
## # … with abbreviated variable name ¹word_2to3
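The start and end arguments of word() are word positions, and negative positions count from the end. A small sketch on a single motto:

```r
library(stringr)

motto <- "The sky is falling!"

first_word  <- word(motto, 1)      # a single position returns one word
middle_pair <- word(motto, 2, 3)   # second through third word
last_word   <- word(motto, -1)     # negative positions count from the end

first_word
middle_pair
last_word
```

Because words are split on the separator only (a single space by default), punctuation stays attached to the word it follows.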