Use the rowwise() function in the tidyverse dplyr package to apply multiple custom functions requiring contextual row information to a data frame.
Import the following dataset from FiveThirtyEight, which gives the male and female shares of popular unisex names in the US.
library(dplyr) # provides rowwise() and mutate(), used further down
names <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/refs/heads/master/unisex-names/unisex_names_table.csv")
print(head(names, 10))
## X name total male_share female_share gap
## 1 1 Casey 176544.33 0.5842866 0.4157134 0.16857313
## 2 2 Riley 154860.67 0.5076391 0.4923609 0.01527814
## 3 3 Jessie 136381.83 0.4778343 0.5221657 0.04433146
## 4 4 Jackie 132928.79 0.4211326 0.5788674 0.15773480
## 5 5 Avery 121797.42 0.3352131 0.6647869 0.32957385
## 6 6 Jaime 109870.19 0.5617929 0.4382071 0.12358580
## 7 7 Peyton 94896.40 0.4337194 0.5662806 0.13256125
## 8 8 Kerry 88963.93 0.4839488 0.5160512 0.03210231
## 9 9 Jody 80400.52 0.3520680 0.6479320 0.29586394
## 10 10 Kendall 79210.87 0.3723667 0.6276333 0.25526652
Create a custom function to add a new column indicating whether the share gap is male-favored (M) or female-favored (F).
This function, set_MF(), accepts as its parameter a vector containing the male_share and the female_share of the “current” row. Use bracket notation with the index to access each value. It returns the relevant string, or NA when the two shares are equal.
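set_MF <- function(vec) {
  male <- vec[1]
  female <- vec[2]
  if (male == female) {
    # No gap in either direction; NA_character_ keeps the return type character
    return(NA_character_)
  }
  # "M" if the male share is larger, otherwise "F"
  if (male > female) "M" else "F"
}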
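As a quick check of the logic, the function can be called directly on hand-made two-element vectors before it is applied to the data frame. This is only a sketch: the input values are made up rather than taken from the dataset, and set_MF() is restated compactly so the snippet runs on its own.

```r
# Compact restatement of set_MF() so this snippet is self-contained
set_MF <- function(vec) {
  male <- vec[1]
  female <- vec[2]
  if (male == female) return(NA_character_)
  if (male > female) "M" else "F"
}

# Made-up share pairs, not rows from the dataset
male_favored <- set_MF(c(0.6, 0.4))
female_favored <- set_MF(c(0.3, 0.7))
no_gap <- set_MF(c(0.5, 0.5))

print(c(male_favored, female_favored, no_gap))
```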
Create a second custom function to convert the values in the gap column from a difference in proportion to a raw number of people (rounded).
This function, get_num(), also accepts a vector as its parameter, this time holding the total number of people for the row and the gap as a proportion. It returns the rounded product of the two values.
get_num <- function(vec) {
  total <- vec[1]
  gap <- vec[2]
  # Scale the proportional gap by the total, rounded to a whole number of people
  return(round(total * gap))
}
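The arithmetic can be checked by hand against the first row of the dataset printed earlier (Casey: total 176544.33, gap 0.16857313). A minimal sketch, with get_num() restated so the snippet runs on its own:

```r
# Restatement of get_num() so this snippet is self-contained
get_num <- function(vec) {
  total <- vec[1]
  gap <- vec[2]
  round(total * gap)
}

# Values copied from the Casey row of the printed dataset
casey_gap <- get_num(c(176544.33, 0.16857313))
print(casey_gap)
```

The unrounded product is roughly 29760.6, so the function returns 29761.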
To apply the custom functions above, first pipe the data frame into rowwise(). Then, inside mutate(), set each new or changed column equal to a call to the relevant function, passing a vector built from the column values of the current row as the argument.
new_names <- names |>
  rowwise() |>
  mutate(
    favor = set_MF(c(male_share, female_share)),
    gap = get_num(c(total, gap))
  )
print(head(new_names, 10))
## # A tibble: 10 × 7
## # Rowwise:
## X name total male_share female_share gap favor
## <int> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 1 Casey 176544. 0.584 0.416 29761 M
## 2 2 Riley 154861. 0.508 0.492 2366 M
## 3 3 Jessie 136382. 0.478 0.522 6046 F
## 4 4 Jackie 132929. 0.421 0.579 20967 F
## 5 5 Avery 121797. 0.335 0.665 40141 F
## 6 6 Jaime 109870. 0.562 0.438 13578 M
## 7 7 Peyton 94896. 0.434 0.566 12580 F
## 8 8 Kerry 88964. 0.484 0.516 2856 F
## 9 9 Jody 80401. 0.352 0.648 23788 F
## 10 10 Kendall 79211. 0.372 0.628 20220 F
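To see what rowwise() contributes here, it may help to run the same mutate() call with and without it. The sketch below uses a small made-up data frame (toy is hypothetical, not part of the FiveThirtyEight file) and a compact restatement of set_MF(). It also calls ungroup() at the end, since the "# Rowwise:" line in the output above indicates the result stays rowwise until that grouping is removed.

```r
library(dplyr)

# Compact restatement of set_MF() so this snippet is self-contained
set_MF <- function(vec) {
  male <- vec[1]
  female <- vec[2]
  if (male == female) return(NA_character_)
  if (male > female) "M" else "F"
}

# Hypothetical toy data, not taken from the dataset
toy <- data.frame(
  name = c("A", "B", "C"),
  male_share = c(0.6, 0.3, 0.5),
  female_share = c(0.4, 0.7, 0.5)
)

# Without rowwise(): c() concatenates the two full columns, so vec[1] and
# vec[2] are the first two male shares (0.6 and 0.3), and the single
# result is recycled to every row
no_rowwise <- toy |>
  mutate(favor = set_MF(c(male_share, female_share)))

# With rowwise(): each call sees only the values of one row
with_rowwise <- toy |>
  rowwise() |>
  mutate(favor = set_MF(c(male_share, female_share))) |>
  ungroup() # drop the rowwise grouping so later verbs act on whole columns

print(no_rowwise$favor)
print(with_rowwise$favor)
```

Without rowwise(), every row receives the same label, which is silently wrong; with it, each row is labeled from its own shares.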
Per the dplyr documentation, rowwise() “allows you to compute on a data frame a row-at-a-time. This is most useful when a vectorised function doesn’t exist.”
So when a custom function requires multiple values from the same row as input, and not just the current “cell” value, rowwise() can be used to apply one (or many) function(s) to a data frame easily.
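For comparison, the two calculations in this example happen to have vectorised equivalents, and row-at-a-time evaluation tends to be slower on large data frames. The following is a sketch of that alternative, not a replacement for the approach above: it assumes a small made-up data frame (toy is hypothetical) and uses case_when() from dplyr, which leaves rows that match no condition as NA.

```r
library(dplyr)

# Hypothetical toy data, not taken from the dataset
toy <- data.frame(
  total = c(1000, 2000, 500),
  male_share = c(0.6, 0.3, 0.5),
  female_share = c(0.4, 0.7, 0.5),
  gap = c(0.2, 0.4, 0)
)

vectorised <- toy |>
  mutate(
    favor = case_when(
      male_share > female_share ~ "M",
      male_share < female_share ~ "F"
      # rows matching neither condition (equal shares) are left as NA
    ),
    # round() and * already operate on whole columns
    gap = round(total * gap)
  )

print(vectorised)
```

When the logic cannot be expressed with column-wise operations like these, rowwise() remains the simpler route.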