Our goal is to extract the Long Term Financial Performance
table from each workbook and transform it into a vector of two vectors
detailing financial performance and covenant ratios. When finished, our
output should look like the following for COMP1
and
COMP2
:
Our first step is to examine the workbook imported into
summary_cells
and filter the rows to focus in on the
‘valuation: financial performance’ table. The table we are looking for
lies between rows \(129\) to \(162\) for COMP1
and rows \(169\) to \(219\) for COMP2
and we will
also select the row, col, character,
numeric, date, and data_type from the cells
as they will to be used with unpivotr
functions. We can
then apply the remove_blank_columns()
function to the
narrowed bag to clean up the data.
remove_blank_columns <- function(bag){
has_multipletypes <- bag %>%
select(col, data_type) %>%
unique() %>%
select(col) %>%
count(col) %>%
filter(n>1) %>%
select(col)
has_noblanks <- bag %>%
select(col, data_type) %>%
unique() %>%
filter(data_type!="blank") %>%
select(col) %>%
unique()
to_keep <- full_join(has_noblanks, has_multipletypes) %>%
arrange(col)
n_newcols <- to_keep$col %>% length()
df_keycol <- bind_cols(tibble(1:n_newcols), to_keep)
table <- bag %>%
filter(col %in% to_keep$col) %>%
left_join(df_keycol) %>%
mutate(col = `1:n_newcols`) %>%
select(-`1:n_newcols`)
# rm(has_multipletypes, has_noblanks, to_keep, n_newcols, df_keycol)
table
}
xlsx_toggle=1
xlsx_obj=file.path(path_data, "RNFC_COMP1") %>% readRDS
summary_cells=xlsx_obj %>% filter(sheet == "Summary")
summary_cells %>%
filter(row>=129, row<=162) %>%
select(row, col, character, numeric, date, data_type) %>%
remove_blank_columns() %>%
rectify() %>%
format.dt.f(.)
xlsx_toggle=2
xlsx_obj=file.path(path_data, "RNFC_COMP2") %>% readRDS
summary_cells=xlsx_obj %>% filter(sheet == "Valuation Summary")
summary_cells %>%
filter(row>=169, row<=219) %>%
select(row, col, character, numeric, date, data_type) %>%
remove_blank_columns() %>%
rectify() %>%
format.dt.f(.)
There are also a few other variables unique to each file that need to
be saved to help the procedures that will be applied later to be
adaptable to either sheet. First, we need to save
'headings_t'
, which will be a tibble of the headings we
want our final output to have. Second, we need to save the column where
our headers are located, which we can do with an inner_join
to find the cell where the
'Long Term Financial Performance'
title is located and then
grab and save the col value. Third, we need to locate
and save the rows where there are northern headings for the
dates. We can find these period rows by filtering for observations in
the bag with ’data_type’s of date and then applying
slice(1)
to get the row of the first period row while
applying tail(1)
to get the row of the second period row,
assuming one exists.
Lastly, we need to create a table that will serve as a key to join
the headings existing in the bag to our desired heading names; the
process will be very similar to the one employed by the
remove_blank_columns()
function. We start by creating a
tibble with a column for headings_t
and another column that
will have values that will join to the corresponding heading in the
bag.
if(xlsx_toggle==1){
bag <- summary_cells %>%
filter(row>=129, row<=162) %>%
select(row, col, character, numeric, date, data_type) %>%
remove_blank_columns()
headings_t <- tibble(
names = c(
"period", "revenue", "adj_ebitda", "acq_ebitda",
"capex", "acq_capex", "opco_debt_ebitda",
"total_debt_ebitda", "ltm_ebitda_cash_interest"
)
)
header_col <- regex_inner_join(
bag,
"Long Term Financial Performance" %>%
enframe(name = NULL),
by = c("character"="value")
) %>%
pull(col)
period_row1 <- filter(bag,data_type=="date") %>%
slice(1) %>%
pull(row)
period_row2 <- filter(bag, data_type=="date") %>%
tail(1) %>%
pull(row)
# Filter for financial performance and covenant headings
headings_key <- tibble(
filter = c(
"Period", "Revenue", "Adj. EBITDA", "Acq. EBITDA",
"CapEx", "Acq. Capex", "Total Secured Debt / EBITDA",
"Total Net Debt / EBITDA", "EBITDA / Cash Interest"
),
names = headings_t$names
)
}else{ # if (xlsx_toggle==2)
bag <- summary_cells %>%
filter(row>=169, row<=219) %>%
select(row, col, character, numeric, date, data_type) %>%
filter(col!=7 & col!=8) %>%
remove_blank_columns()
headings_t <- tibble(
names = c(
"period", "revenue", "adj_ebitda", "acq_ebitda",
"capex", "acq_capex", "opco_debt_ebitda",
"total_debt_ebitda", "ltm_ebitda_cash_interest"
)
)
header_col <- regex_inner_join(
bag,
"Long Term Financial Performance" %>% enframe(name = NULL),
by = c("character"="value")
) %>%
pull(col)
period_row1 <- filter(
bag,
data_type=="date"
) %>%
slice(1) %>%
pull(row)
period_row2 <- filter(
bag,
data_type=="date"
) %>%
tail(1) %>%
pull(row)
# Headings Key - filter headings and target headings
headings_key <- tibble(
filter = c(
"Period", "Revenue", "Run", "Acquired EBITDA", "CapEx",
"Acquisition Capex", "Projected Opco Debt / RR EBITDA",
"Projected Total Debt", "LTM EBITDA / Cash Interest"
),
names = headings_t$names
)
}
headings_key %>%
format.dt.f()
We also need to make a few changes specific to whether we are working
with COMP1
or COMP2
. The first thing we need
to do is add the missing 'period'
header to the bag. We
know this should be in the column recorded in header_col
and in the row number saved in period_row1
. Examining the
bag, these are observations \(43\) and
\(31\) for COMP1
and
COMP2
, respectively, and so we can set the character value
to “Period” to set the heading while also making sure to set the
data_type
to 'character'
.
# Add period heading
# bag %>% filter(col==header_col, row==period_row1)
if(xlsx_toggle==1){
bag$character[43] <- "Period"
bag$data_type[43] <- "character"
} else { # if(xlsx_toggle==2)
bag$character[31] <- "Period"
bag$data_type[31] <- "character"
}
We also need to check for and remove any duplicate rows containing
the dates. We will do this by checking if period_row1
and
period_row2
are different, and if they are we will remove
the data from the row recorded in period_row2
. This will
remove the duplicate Northern heading row from
COMP2
.
# Remove duplicate period row
if (period_row1!=period_row2){
bag <- bag %>%
filter(row!=period_row2)
}
Now that sheet specific changes are made and variables that are sheet
dependent are saved, we can move on to extracting the data we need and
transforming it to match our target output. The goal from here is to use
the unpivotr::enhead()
function to assign headings to the
data, which can then be extracted into our list format.
The first step to achieving this is to filter for and save the
header. We will filter for only values in the header row using
header_col
. Then, we will remove the headings for summary
values, since they are unneeded; simply filtering out values including
'%Growth'
, 'vs. Projected'
, and
'% of Revenue'
will eliminate these headers from both
sheets. The last step in completing our header is to clean the names; to
do this we will use an inner_join
, joining our data with
the 'headings_key'
by the filter column. This will be the
header we use with enhead()
.
# Filter for desired headings to be used with enhead()
header <- bag %>%
filter(
col==header_col,
character!="% Growth",
character!="vs. Projected",
character!="% of Revenue"
) %>%
regex_inner_join(headings_key, by = c("character"="filter")) %>%
select(row, col, names)
header %>%
format.dt.f()
Examining the header, we can see there are two rows for each heading besides period, this is because historical and projected values are on different rows. Now that they have the same header name though, it will make them easier to combine into a single vector of data later on.
The next step will be to filter for and save the data. To get the
data we will filter for rows from the period row and beyond, since this
is the top row in both sheets, and then right_join
with the
rows in the header to filter out any values that do not correspond to a
non-summary heading.
# Filter for data cells to be used with enhead()
cells <- bag %>%
filter(
row>=period_row1,
col>header_col
) %>%
right_join(header %>% select(row), by = "row")
cells %>%
rectify() %>%
format.dt.f()
The third step will be to use the unpivotr::enhead
function to assign the cells to a header. We know the headings are on
the West side of the data, so we can give the enhead the
'W'
direction. We also need to filter out the columns where
only period information is present but not any other values; in both
sheets there is only data for \(7\)
columns after the header so we can filter out any columns past that.
# Use enhead() to assign data cells to related heading
table <- enhead(cells, header, "W") %>%
filter(col<(header_col+8))
table %>%
rectify() %>%
format.dt.f()
We also need to remove various user inputs meant to indicate \(NA\) that have been captured in the table.
# Remove NA various user inputs
table$character[table$character == "n/a"] <- NA
table$character[table$character == "--"] <- NA
table$character[table$character == "NA"] <- NA
Next, we will save the period values in their own tibble to be added back later since it is a data field present in the target vector output but we will be beheading it for the purpose of manipulating the other data fields into an easy to extract format.
# Save period
period <- table %>%
filter(names=="period") %>%
select(date) %>%
rename(period = date)
period %>%
format.dt.f()
Now, with the enheaded
and cleaned data, we can behead
and spatter the period row to see the data organized by year for each
heading in the names column.
# Behead and spatter by period
table <- table %>%
behead("N", period) %>%
filter(!is.na(numeric)) %>% arrange(names) %>%
select(names, numeric, period, data_type) %>%
spatter(period)
table %>%
format.dt.f()
Taking a look at the table, you might notice that each row is exactly
what we are after for our target output, with the first column
representing what we want to name each list of values. Thus, we can
simply extract these by importing the table again with
as_cells()
. However, we first need to take care of any
missing headings that should be in our target output. We can add these
headings using add_row()
, adding any missing headings in
the sheet before rearranging the names by converting them to a factor
and arranging by the levels we set.
# Arrange by target output headings
if(xlsx_toggle==1){
# Add missing rows "acq_ebitda" "acq_capex"
table <- table %>%
add_row(names = "acq_ebitda") %>%
add_row(names = "acq_capex") %>%
mutate(
names = factor(
names,
levels = c(
"revenue", "adj_ebitda", "acq_ebitda", "capex",
"acq_capex", "opco_debt_ebitda", "total_debt_ebitda",
"ltm_ebitda_cash_interest"
)
)
) %>%
arrange(names)
}else{ # if(xlsx_toggle==2)
# Add missing rows "opco_debt_ebitda" "total_debt_ebitda"
table <- table %>%
add_row(names = "opco_debt_ebitda") %>%
add_row(names = "total_debt_ebitda") %>%
mutate(names = factor(
names,
levels = c(
"revenue", "adj_ebitda", "acq_ebitda", "capex",
"acq_capex", "opco_debt_ebitda", "total_debt_ebitda",
"ltm_ebitda_cash_interest"
)
)
) %>%
arrange(names)
}
table %>%
format.dt.f()
Now the table displays exactly what we need and we can run it through
as_cells
and then behead the West heading.
valuation_master <- table %>%
as_cells %>%
behead("W", name = "heading") %>%
select(col, dbl, heading)
valuation_master %>%
format.dt.f()
Next, we can spatter the heading, which will give us the corresponding data in each column ordered by date:
valuation_master <- valuation_master %>%
spatter(key = heading, values = dbl) %>%
select(-col)
valuation_master %>%
format.dt.f()
Finally, we bind the period tibble to the
valuation_master
table with bind_cols
, which
will add the period column on the left of the table:
valuation_master = period %>%
bind_cols(valuation_master)
valuation_master %>%
format.dt.f()
Now we are ready to transform the extracted data into our target
output. We will do this by saving a table for financial performance and
a table for covenants through use of select()
. We will then
save the valuation output as a list of the two tibbles.
# Separate master tibble into financial info and covenants and create a vector containing both as lists
valuation_financial <- valuation_master %>%
select(period, revenue, adj_ebitda, acq_ebitda, capex, acq_capex)
valuation_covenants <- valuation_master %>%
select(-revenue, -adj_ebitda, -acq_ebitda, -capex, -acq_capex)
valuation = list(
financial_performance = valuation_financial,
coverage_covenants = valuation_covenants
)
valuation_financial %>%
format.dt.f()
valuation_covenants %>%
format.dt.f()
This leaves our valuation list identical to the target valuation output.