1 Objective

Our goal is to extract the public comparables table from each workbook and transform it into a list of vectors that each contain all the observations corresponding to a heading. When finished it should look like the following:

1.1 COMP1

1.1.1 Insurance Brokers

1.2 COMP2

1.2.1 Insurance Brokers

1.2.2 Asset Management / Wealth Advisors

2 Isolating the Data

Observing the summary_cells below, we can see that there is one table with comparable data we need to extract for COMP1 and two tables in COMP2.

2.1 COMP1

2.1.1 Insurance Brokers

2.2 COMP2

2.2.1 Insurance Brokers

2.2.2 Asset Management / Wealth Advisors

To simplify it, we will focus on making a function to isolate and clean the data for the first table and then modify it so that it can be applied to grab all three tables we are interested in.

First, we will need to give our function an input that it can use to locate and isolate the correct table. All three of these tables have a corresponding table heading, so we can provide it with the data in the top left corner cell; we will use the ‘corner’ parameter to narrow our bag further later.

To begin the function, we will locate the general area of the summary_cells our data is located in so that we are not searching the entire sheet. We will do this with an inner_join to find the cell where “Valuation Metric #2” is located, since this demarks the start of the section on both COMP1 and COMP2.

xlsx_toggle=1

if (xlsx_toggle==1) {
  xlsx_obj=file.path(path_data, "RNFC_COMP1") %>% readRDS
  summary_cells=xlsx_obj %>% 
    filter(sheet == "Summary")
} else {
  xlsx_obj <- file.path(path_data, "RNFC_COMP2") %>% readRDS
  summary_cells=xlsx_obj %>% # bag of cells for COMP2
    filter(sheet == "Valuation Summary")
}

bag=summary_cells %>% 
  select(row, col, character, numeric, date, data_type, local_format_id)

idx=bag %>% 
  regex_inner_join(
    "Valuation Metric #2" %>% as_tibble, 
    by = c("character"="value")
  ) %>% 
  select(-value)

Next, we will find where the section ends by using the local_format_id to find the next cell that matches the formatting of the “Valuation Metric #2” cell. We will set this cell to idx_2 and it marks where the next section begins.

idx_2=bag %>% 
  filter(
    data_type!="numeric" & row>idx$row & 
      (local_format_id==idx$local_format_id | 
         local_format_id==idx$local_format_id+1)
  ) %>% 
  slice(1)

Now we will change the upper index to where the subtable begins. We will find this using the corner parameter, filtering for the string in the bag of cells; finally, we will filter for the min row to return the first time we find the string, since this will be where the table begins. With the top left corner of the subtable located, we can filter the bag down.

idx=bag %>% 
  filter(row>=idx$row) %>% 
  regex_inner_join(
    str_glue("^{corner}$") %>% as_tibble, 
    by = c("character"="value")
  ) %>% 
  select(row, col, character, local_format_id) %>% 
  filter(row==min(row))

bag=bag %>% 
  filter(
    row>=idx$row, col>=idx$col
  ) %>% 
  select(row, col, character, numeric, date, data_type, local_format_id)

We will then fix up the bottom index by cutting it back to the bottom of the table. Since it now sits at the beginning of the next section, we know it has to come back up to the first data entry and also skip any extra rows at the bottom of our tables with source data or averages. For some tables the last row includes the string “Source:” and others end with “Mean”, so we can detect where those cells are located in the narrowed bag. Lastly, some tables have a row of \(NA\) above the source or mean row, so we can filter the heading column in the bag for \(NA\)s and take the tail value to find the gap between the last row and where the data ends, if it exists. This will be our final idx_2.

idx_2=bag %>% 
  filter(
    str_detect(character, "Source:|Mean .")
  ) %>% 
  select(row, col) %>% slice(1)

bag=bag %>% 
  filter(
    row>=idx$row, 
    row<idx_2$row, 
    col>=idx$col
  ) %>% 
  select(row, col, character, numeric, date, data_type)

idx_2=bag %>% 
  filter(
    col == min(bag$col), 
    !is.na(character)
  ) %>% 
  select(row, col) %>% tail(1)

With these steps complete, we know our function will now have determined where the top left cell of the table corresponding to the given corner is as well as the cell below bottom left of the table. We can use this to filter for our final bag.

if (xlsx_toggle==1) {
  bag=bag %>% 
    filter(
      row>=idx$row, 
      row<=idx_2$row, 
      col>=idx$col
    ) %>% 
    select(row, col, character, numeric, date, data_type)
} else { # if (xlsx_toggle==2) 
  bag=summary_cells %>% 
    filter(
      row>=idx$row-1, 
      row<=idx_2$row, 
      col>=idx$col
    ) %>% 
    select(row, col, character, numeric, date, data_type)
}

3 Extracting and Cleaning the Data

The next part of the function is where the isolated data will be cleaned up and transformed into our output. We begin by stripping the northern headings and assigning them to the data with behead. For COMP2, there are two rows of headings, and so we need to perform behead twice for any subsets that are from COMP2. As a final step, we can remove any values that do not have a heading, as these are just empty cells to the right of our table. We also need to remove the ticker column from COMP2 by filtering it out.

if(xlsx_toggle==2){
  bag=bag %>% 
    behead("N", name=outer_heading)
}

data=bag %>%
  behead("N", name=heading) %>%
  mutate(heading=heading %>% clean_names) %>%
  filter(heading!="na")

if(xlsx_toggle==2){data=data %>% filter(heading!="ticker")}

Continuing on, we will convert any “–” character values to \(NA\), convert the date column to character to prepare it for unpivotr::pack, and then combine the outer_heading and heading for subtables in COMP2. We can do each of these steps with a combination of mutate and map with various anonymous functions. The map function chosen will depend on what data is needed by the anonymous function; for converting "-- " to \(NA\) and dates to characters, we can use a simple map_chr to ensure the output is of the character type.

data=data %>%
  mutate(
    character=map_chr(
      character, 
      ~ifelse(.=="-- ", NA_real_, .))
  ) %>%
  mutate(date=map_chr(date, as.character))

For combining the outer_heading, we will use a pmap_chr, providing it with the piped data and telling it to combine the string in variable \(7\) and \(8\) (outer_heading and heading) provided that the outer_heading is not \(NA\). We will also put the new string through clean_names and then deselect the outer_heading.

if(xlsx_toggle==2){
  data=data %>%
    mutate(
      heading=pmap_chr(., 
        ~ifelse(!is.na(..7), paste(..7, ..8), ..8) %>% 
          clean_names
      )
    ) %>%
    select(-outer_heading)
}

Finally, we will gather the separated data type columns into one value column using pack and then transform the output into a list of vectors so that it conforms with the target output. After the unpivotr::pack function, we need to first split data$value by column, since each property is in a different column and then set_names of each list using the unique values of heading in data. Lastly, we will convert our list of lists into a list of vectors by iteratively unlisting and running each list through as.vector.

data=data %>% 
  pack()
  public_comparables=data$value %>% 
    split(data$col) %>% 
    set_names(data$heading %>% unique) %>% 
    lapply(unlist) %>% 
    lapply(as.vector) %>%
    list()

All together, the match.f function will look like this:

print(match.f)
function(corner, summary_cells, xlsx_toggle=1){
  # CREATE BAG
  bag=summary_cells %>% 
    select(row, col, character, numeric, date, data_type, local_format_id)
  
  # Set idx (index) to Valuation Metric #2
  idx=bag %>% 
    regex_inner_join(
      "Valuation Metric #2" %>% as_tibble, 
      by = c("character"="value")
    ) %>% 
    select(-value)
  
  # Set idx_2 to next Valuation Metric table or metric subtable
  idx_2 = bag %>% 
    filter(
      data_type!="numeric" & row>idx$row & 
        (local_format_id==idx$local_format_id | 
           local_format_id==idx$local_format_id+1)
    ) %>% 
    slice(1)
  
  # Set idx to top left of sub table (corner value)
  idx=bag %>% 
    filter(row>=idx$row) %>% 
    regex_inner_join(
      str_glue("^{corner}$") %>% as_tibble, 
      by = c("character"="value")
    ) %>% 
    select(row, col, character, local_format_id) %>% 
    filter(row==min(row))
  
  bag=bag %>% 
    filter(row>=idx$row, col>=idx$col) %>% 
    select(row, col, character, numeric, date, data_type, local_format_id)
  
  # Cut bag back to Source/Mean row
  idx_2 = bag %>% 
    filter(str_detect(character, "Source:|Mean .")) %>% 
    select(row, col) %>% 
    slice(1)
  
  bag=bag %>% 
    filter(row>=idx$row, row<idx_2$row, col>=idx$col) %>% 
    select(row, col, character, numeric, date, data_type)
  # Cut back any NA rows left (asset management/ wealth advisors)
  idx_2 = bag %>% 
    filter(col == min(bag$col), !is.na(character)) %>% 
    select(row, col) %>% 
    tail(1)
  
  # Final data set
  if (xlsx_toggle==1) {
    bag=bag %>% 
      filter(row>=idx$row, row<=idx_2$row, col>=idx$col) %>% 
      select(row, col, character, numeric, date, data_type)
  } else { # if (xlsx_toggle==2) 
    bag=summary_cells %>% 
      filter(row>=idx$row-1, row<=idx_2$row, col>=idx$col) %>% 
      select(row, col, character, numeric, date, data_type)
  }
  
  if(xlsx_toggle==2){
    # If bag 2, there are two lines of heading blocks 
    bag = bag %>% behead("N", name = outer_heading)
  }
  
  # behead data, remove ticker, filter out nas, change -- to NA, and convert 
  # date to character
  data=bag %>%
    behead("N", name = heading) %>%
    mutate(heading = heading %>% clean_names) %>%
    filter(heading!="na")
  
  if(xlsx_toggle==2){data = data %>% filter(heading!="ticker")}
  
  data = data %>%
    mutate(
      character = map_chr(
        character, 
        ~ifelse(.=="-- ", NA_real_, .)
      )
    ) %>%
    mutate(date = map_chr(date, as.character))
  
  # If comp 2, fuse two lines of heading blocks 
  if(xlsx_toggle==2){
    data=data %>%
      mutate(
        heading = pmap_chr(
          ., 
          ~ifelse(!is.na(..7), paste(..7, ..8), ..8) %>% 
            clean_names
        )
      ) %>%
      select(-outer_heading)
  }
  data = data %>% pack()

  public_comparables=data$value %>% 
    split(data$col) %>% 
    set_names(data$heading %>% unique) %>% 
    lapply(unlist) %>% 
    lapply(as.vector) %>%
    list()
  
  if (xlsx_toggle==1){
    # map(plyr::rename, c("company" = "business"))
    nm=c("business", "exchange") %>% set_names(c("company", "ticker"))
  } else { #  if (xlsx_toggle==2)
    # xlsx_toggle=2
    # map(plyr::rename, c("company" = "business"))
    nm=c(rep("business", 2), "ev", "ev_ebitda") %>%
      set_names(c("insurance_brokers", "asset_management_wealth_advisors", 
                  "tev", "tev_ltm_ebitda"))
  }
  public_comparables=public_comparables %>% map(plyr::rename, nm)
  
}
<bytecode: 0x7fed09268e08>

Now that we have a function created, we just need to set public_comparables equal to the output of match.f for each corner of the desired subtables. For COMP1, we can simply set the output of the function with the parameter “Company” to public_comparables. For COMP2, we can save the lists output by running “Insurance Brokers” and “Asset Management / Wealth Advisors” through match.f and then set public_comparables equal to a list of the outputs.

The resulting lists from running this code appear for COMP1 and COMP2 as below:

3.1 COMP1

xlsx_toggle=1
xlsx_obj=file.path(path_data, "RNFC_COMP1") %>% readRDS
summary_cells=xlsx_obj %>% filter(sheet == "Summary")

3.1.1 Insurance Brokers

corner="Company"
public_comparables=match.f(corner, summary_cells, xlsx_toggle)
public_comparables %>%
  map_dfc(~.x) %>%
  format.dt.f(.)

3.2 COMP2

xlsx_toggle=2
xlsx_obj <- file.path(path_data, "RNFC_COMP2") %>% readRDS
summary_cells=xlsx_obj %>% filter(sheet == "Valuation Summary")

3.2.1 Insurance Brokers

corner="Insurance Brokers"
public_comparables=match.f(corner, summary_cells, xlsx_toggle)
public_comparables %>%
  map_dfc(~.x) %>%
  format.dt.f(.)

3.2.2 Asset Management / Wealth Advisors

corner="Asset Management / Wealth Advisors"
public_comparables=match.f(corner, summary_cells, xlsx_toggle)
public_comparables %>%
  map_dfc(~.x) %>%
  format.dt.f(.)

The last steps are to update the master valuation list by using list_modify and appending ma_comparables to the end of our existing master list.

# provide heading for top left corner of table, produces a list; fuse for 2 since 
# there are two partitions
if (xlsx_toggle==1){
  corner="Company"
  public_comparables=match.f(corner, summary_cells, xlsx_toggle) %>%
    set_names(c("insurance_brokers"))

}else if (xlsx_toggle==2){
  corner="Insurance Brokers"
  l1=match.f(corner, summary_cells, xlsx_toggle)
  corner="Asset Management / Wealth Advisors"
  l2=match.f(corner, summary_cells, xlsx_toggle)
  public_comparables=flatten(list(l1, l2)) %>%
    set_names(c("insurance_brokers", "asset_management_wealth_advisors"))
}
# public_comparables %>% bind_rows(.id="sector")

valuation=list_modify(
  valuation, 
  public_comparables=public_comparables
)