Objective:

To demonstrate how to use the rowwise() function from the tidyverse's dplyr package to apply custom functions that need several values from the same row of a data frame.

Step 1:

Load dplyr, then import the following dataset from FiveThirtyEight, which records the male and female shares of popular unisex names in the US.

# dplyr provides rowwise() and mutate(), used in Step 3
library(dplyr)

names <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/refs/heads/master/unisex-names/unisex_names_table.csv")

print(head(names, 10))
##     X    name     total male_share female_share        gap
## 1   1   Casey 176544.33  0.5842866    0.4157134 0.16857313
## 2   2   Riley 154860.67  0.5076391    0.4923609 0.01527814
## 3   3  Jessie 136381.83  0.4778343    0.5221657 0.04433146
## 4   4  Jackie 132928.79  0.4211326    0.5788674 0.15773480
## 5   5   Avery 121797.42  0.3352131    0.6647869 0.32957385
## 6   6   Jaime 109870.19  0.5617929    0.4382071 0.12358580
## 7   7  Peyton  94896.40  0.4337194    0.5662806 0.13256125
## 8   8   Kerry  88963.93  0.4839488    0.5160512 0.03210231
## 9   9    Jody  80400.52  0.3520680    0.6479320 0.29586394
## 10 10 Kendall  79210.87  0.3723667    0.6276333 0.25526652

Step 2:

Create a custom function that labels whether a name's share gap is male-favored (M) or female-favored (F). Its result will become a new column in Step 3.

This function, set_MF(), accepts as its parameter a single vector holding the male_share and the female_share of the “current” row, in that order. Each value is accessed by position with bracket notation.

It returns the matching string, or NA when the two shares are equal.

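set_MF <- function(vec) {
  # vec holds c(male_share, female_share) for a single row
  male <- vec[1]
  female <- vec[2]

  if (male == female) {
    # Neither side is favored; a character NA keeps the column type consistent
    return(NA_character_)
  }

  # Both values are scalars, so a plain if/else is enough (ifelse() is for vectors)
  if (male > female) "M" else "F"
}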
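
As a quick sanity check, set_MF() can be called directly on hand-built vectors before it is used on the data frame. The sketch below repeats the same logic so that it runs on its own; the first two pairs of shares are copied from the Casey and Jessie rows printed in Step 1, and the 0.5/0.5 pair is a made-up tie used only for illustration.

```r
# Same logic as set_MF() above, repeated so this snippet is self-contained
set_MF <- function(vec) {
  male <- vec[1]
  female <- vec[2]

  if (male == female) {
    return(NA_character_)
  }

  if (male > female) "M" else "F"
}

set_MF(c(0.5842866, 0.4157134))  # Casey:  "M"
set_MF(c(0.4778343, 0.5221657))  # Jessie: "F"
set_MF(c(0.5, 0.5))              # hypothetical tie: NA
```

Note that the order of the values matters: the function trusts that the male share comes first.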

Create a second custom function to convert the values in the gap column from a difference in proportion to a raw number of people (rounded).

This function, get_num(), also accepts a vector as its parameter, this time holding the row's total number of people and its gap (a proportion, not a percentage), in that order. It returns the product of the two values, rounded to the nearest whole number.

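get_num <- function(vec) {
  # vec holds c(total, gap) for a single row
  total <- vec[1]
  gap <- vec[2]

  # Share of the total expressed as a whole number of people
  return(round(total * gap))
}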
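
The arithmetic can be checked by hand on a single row. This minimal sketch repeats get_num() so that it runs on its own and feeds it the total and gap values copied from the Casey and Riley rows printed in Step 1.

```r
# Same logic as get_num() above, repeated so this snippet is self-contained
get_num <- function(vec) {
  total <- vec[1]
  gap <- vec[2]

  return(round(total * gap))
}

get_num(c(176544.33, 0.16857313))  # Casey: 29761
get_num(c(154860.67, 0.01527814))  # Riley: 2366
```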

Step 3:

To apply the custom functions above, first pipe the data frame into rowwise(). Then, inside mutate(), set each new or changed column equal to a function call whose argument is a vector built with c() from the relevant columns. Because of rowwise(), each column name inside mutate() refers to the value in the current row only, not to the whole column.

new_names <- names |>
  rowwise() |>
  mutate(
    favor = set_MF(c(male_share, female_share)),
    gap = get_num(c(total, gap))
  )

print(head(new_names, 10))
## # A tibble: 10 × 7
## # Rowwise: 
##        X name      total male_share female_share   gap favor
##    <int> <chr>     <dbl>      <dbl>        <dbl> <dbl> <chr>
##  1     1 Casey   176544.      0.584        0.416 29761 M    
##  2     2 Riley   154861.      0.508        0.492  2366 M    
##  3     3 Jessie  136382.      0.478        0.522  6046 F    
##  4     4 Jackie  132929.      0.421        0.579 20967 F    
##  5     5 Avery   121797.      0.335        0.665 40141 F    
##  6     6 Jaime   109870.      0.562        0.438 13578 M    
##  7     7 Peyton   94896.      0.434        0.566 12580 F    
##  8     8 Kerry    88964.      0.484        0.516  2856 F    
##  9     9 Jody     80401.      0.352        0.648 23788 F    
## 10    10 Kendall  79211.      0.372        0.628 20220 F
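
To see why rowwise() matters here, it may help to run the same mutate() without it. The sketch below assumes dplyr is installed and uses a three-row data frame (called small, a name chosen only for this illustration) copied from the Step 1 output so that it runs without a download. Without rowwise(), c(male_share, female_share) joins both whole columns into one long vector, so set_MF() ends up comparing the first two male_share values and its single result is recycled down the column.

```r
library(dplyr)

# Three rows copied from the Step 1 output (illustrative subset)
small <- data.frame(
  name = c("Casey", "Riley", "Jessie"),
  male_share = c(0.5842866, 0.5076391, 0.4778343),
  female_share = c(0.4157134, 0.4923609, 0.5221657)
)

# Same logic as set_MF() in Step 2
set_MF <- function(vec) {
  male <- vec[1]
  female <- vec[2]
  if (male == female) {
    return(NA_character_)
  }
  if (male > female) "M" else "F"
}

# Without rowwise(): vec has length 6, so vec[1] and vec[2] are
# Casey's and Riley's male_share values
no_rowwise <- small |>
  mutate(favor = set_MF(c(male_share, female_share)))

# With rowwise(): vec has length 2 and is rebuilt for every row
with_rowwise <- small |>
  rowwise() |>
  mutate(favor = set_MF(c(male_share, female_share))) |>
  ungroup()  # drop the rowwise grouping once the row-level work is done

no_rowwise$favor    # "M" "M" "M" (Jessie is labelled incorrectly)
with_rowwise$favor  # "M" "M" "F"
```

The final ungroup() is optional; it simply returns an ordinary tibble so that later verbs are not evaluated row by row.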

Conclusion:

Per the dplyr documentation, rowwise() “allows you to compute on a data frame a row-at-a-time. This is most useful when a vectorised function doesn’t exist.”

So when a custom function needs several values from the same row as input, rather than a single “cell” value, rowwise() offers a simple way to apply one or more such functions to a data frame.
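
The documentation's caveat about vectorised functions is worth keeping in mind. For the two simple calculations in this walkthrough, vectorised equivalents happen to exist, so the same columns could also be produced without rowwise(). The following is only a sketch of that alternative, not a replacement for the approach above; it assumes dplyr is installed and reuses three rows copied from the Step 1 output in a data frame named small (a name chosen for this illustration).

```r
library(dplyr)

# Three rows copied from the Step 1 output (illustrative subset)
small <- data.frame(
  name = c("Casey", "Riley", "Jessie"),
  total = c(176544.33, 154860.67, 136381.83),
  male_share = c(0.5842866, 0.5076391, 0.4778343),
  female_share = c(0.4157134, 0.4923609, 0.5221657),
  gap = c(0.16857313, 0.01527814, 0.04433146)
)

vectorised <- small |>
  mutate(
    # case_when() works on whole columns; rows matching no condition get NA
    favor = case_when(
      male_share > female_share ~ "M",
      male_share < female_share ~ "F"
    ),
    # Arithmetic and round() are already vectorised
    gap = round(total * gap)
  )

vectorised$favor  # "M" "M" "F"
vectorised$gap    # 29761 2366 6046
```

rowwise() remains the more direct choice when the per-row logic cannot easily be written in terms of whole columns.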