Our goal is to extract the M&A comparables table from each workbook and transform it into a list of vectors that each contain all the observations corresponding to a heading. When finished it should look like the following:
To start, we need to isolate the table of cells containing the
headings and corresponding data. The first step in doing so is to use
regex_inner_join
to find the top left of the table through
joining by the title cell used in both sheets, which is “Valuation
Metric #3”. This gives us a general idea of where the table we need is
located on the sheet, and so we can filter the
summary_cells
for rows at our above the index cell. Next,
we can narrow the bag further by locating the bottom row of the
table.
For COMP1
, the bottom index cell,
idx_2, can be found \(32\) rows below the first so we can simply
add \(32\) to the index row and take
the column from the idx.
In COMP2
, the first row we do not want to be included is
where the heading “Mean” is located; to identify this we will have to
use regex ^...$
with an inner_join
to
make sure we do not capture cells without the capitalization. With the
index for the top and bottom of the table for COMP2
, we can
filter the summary_cells
for rows \(2\) below the idx row,
where the headings begin, and rows above where idx_2
is. One last step is to remove the subtitle row in COMP2
by
filtering out row \(276\); this will
allow us to use unpivotr
functions unhindered.
xlsx_toggle=1
if (xlsx_toggle==1) {
xlsx_obj=file.path(path_data, "RNFC_COMP1") %>% readRDS
summary_cells=xlsx_obj %>%
filter(sheet == "Summary")
} else {
xlsx_obj <- file.path(path_data, "RNFC_COMP2") %>% readRDS
summary_cells=xlsx_obj %>% # bag of cells for COMP2
filter(sheet == "Valuation Summary")
}
bag=summary_cells %>%
select(row, col, character)
idx=bag %>%
regex_inner_join(
"Valuation Metric #3" %>% as_tibble,
by = c("character"="value")
) %>%
select(-value)
bag=summary_cells %>%
filter(row>=idx$row)
if (xlsx_toggle==1) {
idx_2=tibble(row=idx$row+32, col=idx$col)
bag=summary_cells %>%
filter(row>=idx$row+2, row<=idx_2$row-1, col>=idx$col) %>%
select(row, col, character, numeric, date, data_type)
} else {
idx_2=bag %>%
regex_inner_join(
"^Mean$" %>% as_tibble,
by = c("character"="value")
) %>%
select(-value)
bag=summary_cells %>%
filter(row>=idx$row+2, row<=idx_2$row-1, col>=idx$col) %>%
select(row, col, character, numeric, date, data_type)
#Remove subtitle row
bag = bag %>% filter(row!=276)
}
bag %>%
rectify(.) %>%
format.dt.f()
With the table set as bag, we can begin to organize
the data and conform it to our target list. We can see that the top row
of the table we have extracted are the headings; we will use
behead
to assign these headings as a variable to each
observation; this will help us later in determining which values to
place in which vector. We can take the time now to apply
clean_names
to our heading variable column.
data=bag %>%
behead("N", name = heading) %>%
mutate(heading = heading %>% clean_names)
data %>%
slice(1:5, 35) %>%
format.dt.f()
From the sample of our data table so far, we can see there are a few
problems to be cleaned up. First, we can remove any observations with a
heading of ‘na’ by using filter
. Next, we
need to change all date values to character types to prepare it for
unpivotr::pack
by combining mutate
and
map_chr
with the function as.character
.
Finally, we need to convert ‘–’ to NA
and then change data_type variable for those observations to
numeric, since enter_value needs to be a vector of doubles in
our final output. We can do this by first using mutate
and
map_chr
with an anonymous function that checks if the
character is equal to “–” and then changes it to
NA if it is or leaves it untouched if it is not. Lastly
we can run our result through unpivotr::pack
to pivot the
various data type variables and their observations into one
variable.
data = data %>%
filter(heading!="na") %>%
mutate(
date = map_chr(date, as.character)
) %>%
mutate(
character = map_chr(
character,
~ifelse(.=="-- ", NA_real_, .)
)
) %>%
mutate(
data_type = pmap_chr(
.,
~ifelse(is.na(..3) & is.na(..5), "numeric", ..6)
)
) %>%
pack()
data %>%
format.dt.f()
The next step will be transforming this tibble into a list of vectors
to conform to the target output. We will first pull
data$value
and then split it by col, as
each property is in a different column. We will then apply
set_names
using the unique heading values from data.
ma_comparables = data$value %>%
split(data$col) %>%
set_names(data$heading %>% unique)
Now we are left with a list of lists which looks like:
We need to change this list of lists into a list of vectors; to do
this we can iteratively apply an unlist
and an
as.vector
:
ma_comparables = ma_comparables %>%
lapply(unlist) %>%
lapply(as.vector)
The resulting lists from running this code appear below:
The last steps are to update the master valuation list by using
list_modify
and appending ma_comparables to the
end of our existing master list.
valuation=list_modify(valuation, ma_comparables=ma_comparables)