1 Objective

Our goal is to extract the M&A comparables table from each workbook and transform it into a list of vectors that each contain all the observations corresponding to a heading. When finished it should look like the following:

2 Isolating the Data

To start, we need to isolate the table of cells containing the headings and corresponding data. The first step in doing so is to use regex_inner_join to find the top left of the table through joining by the title cell used in both sheets, which is “Valuation Metric #3”. This gives us a general idea of where the table we need is located on the sheet, and so we can filter the summary_cells for rows at our above the index cell. Next, we can narrow the bag further by locating the bottom row of the table.

For COMP1, the bottom index cell, idx_2, can be found \(32\) rows below the first so we can simply add \(32\) to the index row and take the column from the idx.

In COMP2, the first row we do not want to be included is where the heading “Mean” is located; to identify this we will have to use regex ^...$ with an inner_join to make sure we do not capture cells without the capitalization. With the index for the top and bottom of the table for COMP2, we can filter the summary_cells for rows \(2\) below the idx row, where the headings begin, and rows above where idx_2 is. One last step is to remove the subtitle row in COMP2 by filtering out row \(276\); this will allow us to use unpivotr functions unhindered.

xlsx_toggle=1

if (xlsx_toggle==1) {
  xlsx_obj=file.path(path_data, "RNFC_COMP1") %>% readRDS
  summary_cells=xlsx_obj %>% 
    filter(sheet == "Summary")
} else {
  xlsx_obj <- file.path(path_data, "RNFC_COMP2") %>% readRDS
  summary_cells=xlsx_obj %>% # bag of cells for COMP2
    filter(sheet == "Valuation Summary")
}

bag=summary_cells %>% 
  select(row, col, character)

idx=bag %>% 
  regex_inner_join(
    "Valuation Metric #3" %>% as_tibble, 
    by = c("character"="value")
  ) %>% 
  select(-value)

bag=summary_cells %>% 
  filter(row>=idx$row)

if (xlsx_toggle==1) {
  idx_2=tibble(row=idx$row+32, col=idx$col)
  bag=summary_cells %>% 
    filter(row>=idx$row+2, row<=idx_2$row-1, col>=idx$col) %>% 
    select(row, col, character, numeric, date, data_type)
} else {
  idx_2=bag %>% 
  regex_inner_join(
    "^Mean$" %>% as_tibble, 
    by = c("character"="value")
  ) %>% 
  select(-value)
  
  bag=summary_cells %>% 
    filter(row>=idx$row+2, row<=idx_2$row-1, col>=idx$col) %>% 
    select(row, col, character, numeric, date, data_type)
  #Remove subtitle row
  bag = bag %>% filter(row!=276)
}

bag %>% 
  rectify(.) %>%
  format.dt.f()

3 Extracting and Cleaning the Data

With the table set as bag, we can begin to organize the data and conform it to our target list. We can see that the top row of the table we have extracted are the headings; we will use behead to assign these headings as a variable to each observation; this will help us later in determining which values to place in which vector. We can take the time now to apply clean_names to our heading variable column.

data=bag %>%
  behead("N", name = heading) %>%
  mutate(heading = heading %>% clean_names)

data %>% 
  slice(1:5, 35) %>%
  format.dt.f()

From the sample of our data table so far, we can see there are a few problems to be cleaned up. First, we can remove any observations with a heading of ‘na’ by using filter. Next, we need to change all date values to character types to prepare it for unpivotr::pack by combining mutate and map_chr with the function as.character. Finally, we need to convert ‘–’ to NA and then change data_type variable for those observations to numeric, since enter_value needs to be a vector of doubles in our final output. We can do this by first using mutate and map_chr with an anonymous function that checks if the character is equal to “–” and then changes it to NA if it is or leaves it untouched if it is not. Lastly we can run our result through unpivotr::pack to pivot the various data type variables and their observations into one variable.

data = data %>%
  filter(heading!="na") %>%
  mutate(
    date = map_chr(date, as.character)
  ) %>%
  mutate(
    character = map_chr(
      character, 
      ~ifelse(.=="-- ", NA_real_, .)
    )
  ) %>%
  mutate(
    data_type = pmap_chr(
      ., 
      ~ifelse(is.na(..3) & is.na(..5), "numeric", ..6)
    )
  ) %>%
  pack()

data %>%
  format.dt.f()

4 Transforming the Data

The next step will be transforming this tibble into a list of vectors to conform to the target output. We will first pull data$value and then split it by col, as each property is in a different column. We will then apply set_names using the unique heading values from data.

ma_comparables = data$value %>% 
  split(data$col) %>% 
  set_names(data$heading %>% unique)

Now we are left with a list of lists which looks like:

We need to change this list of lists into a list of vectors; to do this we can iteratively apply an unlist and an as.vector:

ma_comparables = ma_comparables %>% 
  lapply(unlist) %>% 
  lapply(as.vector)

The resulting lists from running this code appear below:

The last steps are to update the master valuation list by using list_modify and appending ma_comparables to the end of our existing master list.

valuation=list_modify(valuation, ma_comparables=ma_comparables)