Our goal is to extract the commentary and valuation summary tables from each workbook and enter them into a named list of values, lists, and tibbles. We will extract and clean the data section by section and, when finished, the result will match the following:
We will first focus on pulling the commentary for risk classification, valuation impairment, conclusions, strengths, and concerns. Before proceeding, we will quickly set up our bag of cells with the variables we need and then create a vector of headings for naming the values we extract.
xlsx_toggle = 2

if (xlsx_toggle == 1) {
  xlsx_obj <- file.path(path_data, "RNFC_COMP1") %>% readRDS()
  bag = xlsx_obj %>% # bag of cells for COMP1
    filter(sheet == "Summary") %>%
    select(row, col, character, numeric, date, data_type)
} else {
  xlsx_obj <- file.path(path_data, "RNFC_COMP2") %>% readRDS()
  bag = xlsx_obj %>% # bag of cells for COMP2
    filter(sheet == "Valuation Summary") %>%
    select(row, col, character, numeric, date, data_type)
}

headings = c(
  "risk_classification", "valuation_impairment",
  "conclusion", "strengths", "concerns"
)
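For reference, here is a minimal, hypothetical sketch of the structure the bag of cells is assumed to have: one row per non-blank cell, with the value stored in a type-specific column and the type recorded in data_type, as produced upstream by something like tidyxl::xlsx_cells(). The values below are invented for illustration only.
# Hypothetical illustration of the bag-of-cells structure assumed below
library(dplyr)

bag_toy <- tibble::tribble(
  ~row, ~col, ~character,                        ~numeric, ~date,                 ~data_type,
     2,    2, "Step 1: Risk Classification",     NA,       as.Date(NA),           "character",
     3,    2, "\u25ba Senior secured term loan", NA,       as.Date(NA),           "character",
    10,    2, "Settlement",                      NA,       as.Date(NA),           "character",
    10,    4, NA,                                NA,       as.Date("2021-03-31"), "date",
    11,    2, "Rate",                            NA,       as.Date(NA),           "character",
    11,    4, NA,                                0.105,    as.Date(NA),           "numeric"
)
bag_toy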
Next we will begin extracting the commentary using unpivotr::partition. For the corners argument, we need to locate the five tables. We can do this with a regex_inner_join that matches any cells whose character values start with "Step", followed by a number and a colon ":"; the regex "Step .:" achieves this, since the . accepts any single character. This grabs the tables for our first three headings. We then bind_rows those corners with the results of two more joins that look for cells whose character values equal "Investment Strengths" and "Investment Risks", using the regex anchors ^ and $ where we need an exact match.
# Partition -> tibble of corners
corners = bag %>%
  regex_inner_join(
    "Step .:" %>% as_tibble,
    by = c("character" = "value")
  ) %>%
  select(-value) %>%
  filter(character != "Step 3: Market / Security Analysis")

corners = bind_rows(
  corners,
  bag %>%
    regex_inner_join(
      "^Investment Strengths$" %>% as_tibble,
      by = c("character" = "value")
    ) %>%
    select(-value)
)

corners = bind_rows(
  corners,
  bag %>%
    regex_inner_join(
      "Investment Risks" %>% as_tibble,
      by = c("character" = "value")
    ) %>%
    select(-value)
)

corners %>%
  format.dt.f()
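As a self-contained illustration with made-up cells, fuzzyjoin::regex_inner_join treats each entry in the right-hand tibble's value column as a regular expression and keeps the left-hand rows whose character value matches it:
library(dplyr)
library(fuzzyjoin)

cells_toy <- tibble(
  row = c(2, 9, 15, 20),
  col = 2,
  character = c(
    "Step 1: Risk Classification",
    "Step 2: Valuation Impairment",
    "Step 3: Market / Security Analysis",
    "Investment Strengths"
  )
)

cells_toy %>%
  regex_inner_join(tibble(value = "Step .:"), by = c("character" = "value")) %>%
  select(-value)
# keeps the three "Step n:" rows; "Investment Strengths" needs its own exact-match join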
We will then call the partition function with two arguments: our bag, filtered to the columns containing the corners (since the data we want sits in the cells underneath each heading), and the corners tibble.
partitions = partition(
  bag %>%
    filter(col %in% corners$col),
  corners
)

partitions %>%
  listviewer::jsonedit(.)
The result keeps each corner in the character variable, with the corresponding table stored in the cells list-column. Our job will be to extract the desired values from the tibbles in cells.
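A minimal sketch of partition() on a hypothetical single-column layout shows the shape of this result; the headings and bullet rows below are invented for illustration:
library(dplyr)
library(unpivotr)

cells_toy <- tibble(
  row = c(1, 2, 3, 5, 6),
  col = 2,
  data_type = "character",
  character = c(
    "Step 1: Risk Classification",
    "\u25ba Senior secured term loan",
    "\u25ba First lien on substantially all assets",
    "Investment Strengths",
    "\u25ba Strong recurring revenue"
  )
)
corners_toy <- cells_toy %>% filter(!grepl("^\u25ba", character))

partition(cells_toy, corners_toy)
# one row per corner, with the cells at and below each corner stored in a `cells` list-column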
To pull the desired values from each tibble in cells, we can filter for the Unicode character for the black right-pointing pointer (U+25BA). This picks out most of the values we want, but it also grabs a couple of unwanted cells, so we exclude those by adding str_detect calls with negate = TRUE to our filter. Below, we can see these steps working on the first tibble, for determination of risk classification.
partitions$cells[[1]] %>%
  filter(
    str_detect(.$character, "\u{25ba}") &
      str_detect(.$character, "Mezzanine", negate = TRUE) &
      str_detect(.$character, "Equity Valuation", negate = TRUE)
  ) %>%
  format.dt.f()
We can iteratively apply these steps to every tibble in the list with lapply. We will also take it a step further by using lapply to select just the character column from each tibble and then flatten the resulting list of tibbles, turning it into a list of character vectors. Then we can use set_names with our headings and apply the function clean.sentence to the values to return our desired list.
clean.sentence <- function(text){
  # text = gsub('[[:punct:]]', '', text)
  text = gsub('[[:cntrl:]]', '', text)      # drop control characters (tabs, newlines, etc.)
  # text = gsub('\\d+', '', text)
  text = gsub("@\\w+ *", "", text)          # drop @-style handles
  text = gsub("\\s+", " ", str_trim(text))  # trim and collapse whitespace
  text
}
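Before running the full pipeline, here is a minimal sketch of the same select / flatten / set_names / clean.sentence chain on two hypothetical single-column tibbles (the text and names below are made up):
library(dplyr)
library(purrr)
library(stringr)  # clean.sentence() uses str_trim()

toy_cells <- list(
  tibble(character = "\u25ba   Senior  secured term loan "),
  tibble(character = c("\u25ba Strong sponsor support",
                       "\u25ba  Recurring   revenue base"))
)

toy_cells %>%
  lapply(. %>% select(character)) %>%
  flatten() %>%
  set_names(c("risk_classification", "strengths")) %>%
  lapply(clean.sentence)
# returns a named list of cleaned character vectors
Applying the same chain to the real partitions gives us the commentary list: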
commentary = partitions$cells %>%
  lapply(
    . %>%
      filter(
        str_detect(.$character, "\u{25ba}") &
          # Remove values not belonging to the table; the
          # pointer character (►) filters well,
          # but doesn't remove all irrelevant values
          str_detect(.$character, "Mezzanine", negate = TRUE) &
          str_detect(.$character, "Equity Valuation", negate = TRUE)
      )
  ) %>%
  lapply(. %>% select(character)) %>%
  flatten() %>%
  set_names(headings) %>%
  lapply(clean.sentence)

commentary %>%
  listviewer::jsonedit()
Now that we have all of the desired information, we just need to clean up the list and reorder some entries.
Our first step will be to subset strengths and concerns into a list named "strengths_concerns". We will do this by creating a new entry in our commentary list equal to a list of the two and then calling set_names again; following that, we will drop the original strengths and concerns entries from the commentary list.
commentary$strengths_concerns = list(
  commentary$strengths,
  commentary$concerns
)
commentary$strengths_concerns = purrr::set_names(
  commentary$strengths_concerns,
  c("strengths", "concerns")
)
commentary = commentary[
  names(commentary) %in% c("strengths", "concerns") == FALSE
]
commentary %>%
listviewer::jsonedit()
In the list we are trying to recreate, some values appear in a different order than in the sheets, so we need to reorder our list. For the conclusion entry, COMP2 needs its two list values swapped so that the Equity commentary comes first and the Preferred Equity commentary second. We can do this by temporarily saving the original list item as x and then re-saving commentary$conclusion with the items in the desired order.
# Rearrange list items
x=commentary$conclusion
if (xlsx_toggle==2){commentary$conclusion = c(x[2], x[1])}
commentary$conclusion %>%
format.dt.f()
We need to do the same for concerns in both COMP1 and COMP2, since for each workbook the first value needs to move to the end of the list to match the target.
x=commentary$strengths_concerns$concerns
if (xlsx_toggle==1) {
commentary$strengths_concerns$concerns = c(x[2], x[3], x[4], x[1])
} else {
commentary$strengths_concerns$concerns = c(x[2], x[3], x[4], x[5], x[1])
}
With the first four items in our list extracted and saved, we can move on to calculations. Our output now matches the target for the first four entries and looks like the lists below:
commentary %>%
listviewer::jsonedit()
The calculations section is a list within our commentary that combines values extracted from the sheets with calculated values. When finished it should match the target, which looks like:
We will begin by setting up the list with names for the values that will be entered as we go.
if (xlsx_toggle == 1) {
  commentary$calculations = vector("list", 4)
  commentary$calculations = purrr::set_names(
    commentary$calculations,
    c("valuation_summary", "mezzanine_index",
      "mezzanine_yield", "implied_dollar_price")
  )
} else {
  commentary$calculations = vector("list", 5)
  commentary$calculations = purrr::set_names(
    commentary$calculations,
    c("valuation_summary", "implied_dollar_yield", "mezzanine_index",
      "mezzanine_yield", "implied_dollar_price")
  )
}
commentary$calculations %>%
listviewer::jsonedit()
The valuation summary tibble will simply come from the lists in valuation$summary bound together row-wise, with each list identified in a new variable called method that takes its value from the name of each list element.
commentary$calculations$valuation_summary=valuation$summary[7:10] %>%
bind_rows(.id="method")
commentary$calculations$valuation_summary %>%
format.dt.f(.)
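The .id argument is what turns the element names into the method column; here is a minimal sketch with made-up summaries:
library(dplyr)

summary_toy <- list(
  market_approach = tibble(low = 90, high = 100),
  income_approach = tibble(low = 95, high = 105)
)
bind_rows(summary_toy, .id = "method")
# each list element's name becomes a value of the new `method` column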
Next, we will enter a list of values for implied_dollar_yield for COMP2 by extracting the values from the corresponding sheet. Since this applies only to COMP2, we will only run these steps when xlsx_toggle == 2.
First we will locate the table by performing a regex_inner_join with "implied dollar price" to find the cell at the top left of the table. Next we will filter for columns to the right of that cell and rows at or below it, and search for "Yield", which is the heading of this table; conveniently, in this case, it is also the label of the last row, so we do not have to locate the bottom of the table separately. This search leaves the locations of the top-left and bottom-left data cells of the table in the idx tibble.
if (xlsx_toggle == 2) {
  idx = bag %>%
    regex_inner_join(
      "implied dollar price" %>% as_tibble,
      by = c("character" = "value")
    ) %>%
    select(-value)

  idx = bag %>%
    filter(col > idx$col, row >= idx$row) %>%
    regex_inner_join(
      "Yield" %>% as_tibble,
      by = c("character" = "value")
    ) %>%
    select(-value)

  idx %>%
    format.dt.f(.)
}
Now we can work on grabbing and naming the data values. To do this, we can bind the column containing the headings with the column containing the corresponding data. To get the heading values, we filter the bag for character cells below our top-left corner and above our bottom-left corner; we stop above the bottom-left corner because we want to exclude the yield value for now anyway. To extract the corresponding data, we filter the bag for rows between the two identified corners but in the column two to the right; we can also deselect character, since none of these values are of character type. We then bind the two tibbles together column-wise with bind_cols. Finally, we filter out "Accreted price" and "Accretion", convert our dates to character with mutate, clean our headings with clean_names, and rename the headings variable from character to field.
if (xlsx_toggle == 2) {
  idx = bag %>%
    filter(
      col == idx$col[1], row > idx$row[1] & row < idx$row[2]
    ) %>%
    select(character) %>%
    bind_cols(
      bag %>%
        filter(
          col == idx$col[1] + 2, row > idx$row[1] & row < idx$row[2]
        ) %>%
        select(-character)
    ) %>%
    filter(!character %in% c("Accreted price", "Accretion")) %>%
    mutate(
      date = as.character(date),
      character = clean_names(character)
    ) %>%
    rename(field = character)

  idx %>%
    format.dt.f()
}
Next, we will convert the above tibble to a named list. To do this, we first use unpivotr::pack to combine the date and numeric values into a single list-column. Next, we use pivot_wider to spread the values out under their corresponding field. Before this step, we need to deselect row, since field is already being used as the names identifier for pivot_wider. To clean up, we use map_df with unlist to restore the data types that were converted to lists by the pack function.
if (xlsx_toggle == 2) {
  commentary$calculations$implied_dollar_yield =
    idx %>%
    pack() %>%
    select(-row) %>%
    pivot_wider(names_from = field, values_from = value) %>%
    map_df(unlist) %>%
    select(-col) %>%
    mutate(
      settlement = as.character(settlement),
      investment_date = as.character(investment_date),
      maturity = as.character(maturity),
      rate = as.numeric(rate),
      price = as.numeric(price),
      redemption_price = as.numeric(redemption_price),
      frequency = as.numeric(frequency),
      basis = as.numeric(basis)
    ) #%>% as.list()

  commentary$calculations$implied_dollar_yield %>%
    map_dfc(~.x) %>%
    format.dt.f()
}
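The pack() / pivot_wider() / map_df(unlist) pattern can be hard to visualise, so here is a minimal self-contained sketch on two hypothetical cells: pack() bundles the typed value columns into one list-column keyed by data_type, and unlisting after the pivot restores atomic columns.
library(dplyr)
library(tidyr)
library(purrr)
library(unpivotr)

cells_toy <- tibble(
  field     = c("settlement", "rate"),
  row       = 10:11,
  col       = 5,
  data_type = c("character", "numeric"),
  character = c("2021-03-31", NA),
  numeric   = c(NA, 0.105)
)

cells_toy %>%
  pack() %>%
  select(-row) %>%
  pivot_wider(names_from = field, values_from = value) %>%
  map_df(unlist)
# a one-row tibble: col = 5, settlement = "2021-03-31", rate = 0.105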
Now that we have our extracted values for implied dollar yield, we will fill in the last few elements of this list through calculations. We will add back accretion, accreted_price, and yield as calculations in our R environment instead of just pulling the values from the spreadsheet.
if (xlsx_toggle == 2) {
  commentary$calculations$implied_dollar_yield = update_list(
    commentary$calculations$implied_dollar_yield,
    accretion = (
      commentary$calculations$implied_dollar_yield$redemption_price -
        commentary$calculations$implied_dollar_yield$price
    ) * as.integer(
      lubridate::as_date(commentary$calculations$implied_dollar_yield$settlement) -
        lubridate::as_date(commentary$calculations$implied_dollar_yield$investment_date)
    ) / as.integer(
      lubridate::as_date(commentary$calculations$implied_dollar_yield$maturity) -
        lubridate::as_date(commentary$calculations$implied_dollar_yield$investment_date)
    )
  )

  commentary$calculations$implied_dollar_yield = update_list(
    commentary$calculations$implied_dollar_yield,
    accreted_price = commentary$calculations$implied_dollar_yield$accretion +
      commentary$calculations$implied_dollar_yield$price
  )

  commentary$calculations$implied_dollar_yield = update_list(
    commentary$calculations$implied_dollar_yield,
    yield = bond.yield(
      settle = commentary$calculations$implied_dollar_yield$settlement,
      mature = commentary$calculations$implied_dollar_yield$maturity,
      coupon = commentary$calculations$implied_dollar_yield$rate,
      freq = commentary$calculations$implied_dollar_yield$frequency,
      price = commentary$calculations$implied_dollar_yield$accreted_price,
      convention = case_when(
        commentary$calculations$implied_dollar_yield$basis == 0 ~ "30/360E",
        commentary$calculations$implied_dollar_yield$basis == 1 ~ "ACT/ACT",
        commentary$calculations$implied_dollar_yield$basis == 2 ~ "ACT/360",
        # commentary$calculations$implied_dollar_yield$basis == 3 ~ "30/360E",
        # missing Actual/365
        commentary$calculations$implied_dollar_yield$basis == 4 ~ "30/360E",
      )
    )
  )

  commentary$calculations$implied_dollar_yield %>%
    map_dfc(~.x) %>%
    format.dt.f()
}
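The accretion step above is a straight-line pro-rating of the discount to redemption over the holding period; with hypothetical numbers the arithmetic looks like this:
# Hypothetical inputs, for illustration only
price            <- 97.0
redemption_price <- 100.0
investment_date  <- as.Date("2019-06-30")
settlement       <- as.Date("2021-06-30")
maturity         <- as.Date("2024-06-30")

accretion <- (redemption_price - price) *
  as.integer(settlement - investment_date) /
  as.integer(maturity - investment_date)
accreted_price <- price + accretion

accretion       # 3 * 731 / 1827, roughly 1.20
accreted_price  # roughly 98.20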
After saving this list to our commentary$calculations
list, we can move on to the next entry.
The mezzanine index will be extracted from the sheets and saved as a list. We will need to extract the table below from our sheet:
To do so, we will use a regex_inner_join of our bag with "Comparison to Mezzanine Index" to find the cell whose character value matches. Next, we will filter the bag for cells in the same column below this section title and then take the first three cells with character values for COMP1, or the first two for COMP2. This extracts the headings whose values we will need to fill in each case.
# Mezzanine Index
idx = bag %>%
  regex_inner_join(
    "Comparison to Mezzanine Index" %>% as_tibble,
    by = c("character" = "value")
  ) %>%
  select(-value)

idx = bag %>%
  filter(col == idx$col, row > idx$row) %>%
  filter(!is.na(character))

if (xlsx_toggle == 1) {
  idx = idx %>%
    slice(1:3) %>%
    select(-numeric, -date, -data_type)
} else {
  idx = idx %>%
    slice(1:2) %>%
    select(-numeric, -date, -data_type)
}

idx %>%
  format.dt.f(.)
However, we first need to add the "Period" heading, since its values appear in the sheet without a heading in the first column. We can do this by binding idx with a one-row tibble whose row is one less than the first row in idx, whose col matches idx, and whose character value is "Period". We then use clean_names on these headings, rename the column from character to headings, and save the result to a headings tibble for use with unpivotr::enhead.
idx = tibble(
  row = idx$row[1] - 1,
  col = idx$col[1],
  character = "Period"
) %>%
  bind_rows(idx)

headings = idx %>%
  mutate(character = clean_names(character)) %>%
  rename(headings = character)

headings %>%
  format.dt.f(.)
Now that we have the header, we can locate the data for use with enhead. We can grab this simply by filtering the bag for rows found in idx and for the columns two and three to the right of our header.
data = bag %>%
  filter(
    row %in% idx$row,
    col %in% (idx$col[1] + 2):(idx$col[1] + 3)
  )

data %>%
  format.dt.f()
With our header and data identified, we can use enhead to assign headings to the data values, giving it a direction of "W" since the header is located to the left of the data cells. We will also perform a pack and pivot_wider to spread the data underneath the corresponding headings. We need to deselect row before pivoting, since the heading already separates the values from each other, but we keep col because it identifies the unique values within each spread heading. After the pivot, we can map unlist across the data frame and drop unused variables from our tibble.
commentary$calculations$mezzanine_index = enhead(data, headings, "W") %>%
  pack() %>%
  select(-row) %>%
  pivot_wider(
    names_from = headings,
    values_from = value
  ) %>%
  map_df(unlist) %>%
  select(-col, -date) %>%
  mutate(period = as.character(period)) %>%
  mutate_if(is.numeric, as.numeric)

commentary$calculations$mezzanine_index %>%
  format.dt.f(.)
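A minimal sketch of enhead() with direction "W" on invented cells makes the pairing explicit: each data cell picks up the header cell that sits in the same row, to its left.
library(dplyr)
library(unpivotr)

headers_toy <- tibble(
  row = 5:6,
  col = 2,
  headings = c("period", "mezzanine_index")
)
data_toy <- tibble(
  row = c(5, 5, 6, 6),
  col = c(4, 5, 4, 5),
  data_type = "numeric",
  numeric = c(2020, 2021, 0.090, 0.105)
)

enhead(data_toy, headers_toy, "W")
# each data cell now carries the `headings` value from column 2 of its own row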
Finally, we could convert this saved tibble into a list within our calculations list; here we keep it as a tibble, with the as.list() step commented out.
commentary$calculations$mezzanine_index =
  commentary$calculations$mezzanine_index #%>% as.list()
For our next entry, we will set up the list with our desired names and then fill the values with both calculations and extraction from our data sheet or saved data. We will have to return later to fill in values for period and implied_dollar_price.
commentary$calculations$mezzanine_yield = vector("list", 6)
commentary$calculations$mezzanine_yield = purrr::set_names(
  commentary$calculations$mezzanine_yield,
  c(
    "period", "yield", "adj_mezzanine_index",
    "adjust_leverage", "implied_yield",
    "implied_dollar_price"
  )
)
First we will import the yield: for COMP2 it comes from our implied_dollar_yield list, and for COMP1 it is the second entry of comp_2_nd_lien_yield in our mezzanine_index entry.
## yield
if (xlsx_toggle == 1) {
  commentary$calculations$mezzanine_yield$yield =
    commentary$calculations$mezzanine_index[[3]][2]
} else {
  commentary$calculations$mezzanine_yield$yield =
    commentary$calculations$implied_dollar_yield$yield
}
commentary$calculations$mezzanine_yield %>%
map_dfc(~.x) %>%
format.dt.f()
Next we will calculate adj_mezzanine_index by taking the difference between the two values in the mezzanine_index column of our mezzanine_index entry.
## adj_mezzanine_index
commentary$calculations$mezzanine_yield$adj_mezzanine_index=diff(
commentary$calculations$mezzanine_index$mezzanine_index
)
commentary$calculations$mezzanine_yield$adj_mezzanine_index=
as.numeric(commentary$calculations$mezzanine_yield$adj_mezzanine_index)
commentary$calculations$mezzanine_yield %>%
map_dfc(~.x) %>%
format.dt.f()
We will then extract the value for adjust_leverage by grabbing the numeric value from the data cell corresponding to the "Adjust for Leverage*" heading in each sheet. We can do this with a regex_inner_join on this heading and then pulling the numeric value of the cell two columns to its right.
## adjust_leverage
heading = bag %>%
  regex_inner_join(
    "Adjust for Leverage*" %>% as_tibble,
    by = c("character" = "value")
  ) %>%
  select(-value)

commentary$calculations$mezzanine_yield$adjust_leverage = bag %>%
  filter(
    row == heading$row,
    col == heading$col + 2
  ) %>%
  pull(numeric)
commentary$calculations$mezzanine_yield %>%
map_dfc(~.x) %>%
format.dt.f()
Next we can get the implied_yield by adding the yield, adj_mezzanine_index, and adjust_leverage together from our mezzanine_yield list.
## implied_yield
commentary$calculations$mezzanine_yield$implied_yield =
  commentary$calculations$mezzanine_yield$yield +
  commentary$calculations$mezzanine_yield$adj_mezzanine_index +
  commentary$calculations$mezzanine_yield$adjust_leverage
commentary$calculations$mezzanine_yield$implied_yield =
  as.numeric(commentary$calculations$mezzanine_yield$implied_yield)
commentary$calculations$mezzanine_yield %>%
map_dfc(~.x) %>%
format.dt.f()
We will have to come back to fill in period and implied dollar price, since we will pull those values from our implied dollar price list, which we will do next.
For implied dollar price, we will begin by extracting the table below from the bag:
First, we will perform a regex_inner_join of the bag with "implied dollar price" and save it to idx to find our top-left corner. Then we will find the bottom-left corner, idx_2, by filtering the bag for the cell in the same column as idx with the character value "basis". We know that the data for our table is located two columns to the right of the headings, so our bottom-right corner will be two columns to the right of idx_2. With this knowledge, we can save our table as the bag filtered for rows between one below idx and idx_2, and for the columns of idx and idx + 2.
idx = bag %>%
  regex_inner_join(
    "implied dollar price" %>% as_tibble,
    by = c("character" = "value")
  ) %>%
  select(-value)

idx_2 = bag %>%
  filter(col == idx$col, row > idx$row) %>%
  filter(character == "basis")

table = bag %>%
  filter(
    row > idx$row,
    row <= idx_2$row,
    col %in% c(idx$col, idx$col + 2)
  ) %>%
  filter(data_type != "blank")

table %>%
  rectify() %>%
  format.dt.f()
With the table extracted, we can go ahead and use unpivotr::behead to strip the values in column 20 and assign them to the data in column 22.
table = table %>%
  behead(direction = "W", name = "heading") %>%
  filter(heading != "yield")

table %>%
  format.dt.f()
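behead() is a shortcut alternative to enhead(): instead of attaching separately identified headers, it strips a header level directly off the cells. Here is a minimal sketch with invented cells, again using direction "W" so the leftmost column becomes the heading:
library(dplyr)
library(unpivotr)

cells_toy <- tibble(
  row = c(11, 11, 12, 12),
  col = c(2, 4, 2, 4),
  data_type = c("character", "numeric", "character", "numeric"),
  character = c("rate", NA, "frequency", NA),
  numeric = c(NA, 0.105, NA, 2)
)

behead(cells_toy, direction = "W", name = "heading")
# the column-2 cells are removed and reappear as a `heading` column on the remaining data cells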
Next we will convert this tibble into our desired list. First we will mutate our date column into character type and then pack the values.
table = table %>%
  mutate(date = as.character(date)) %>%
  pack()

table %>%
  format.dt.f()
Now we just need to pull the value variable from our table, run it through set_names using the cleaned names of the heading variable, and save the result.
commentary$calculations$implied_dollar_price = table %>%
  pull(value) %>%
  set_names(table$heading %>% clean_names)

commentary$calculations$implied_dollar_price %>%
  map_dfc(~.x) %>%
  format.dt.f()
Now we just need to add yield and price entries. We can get the yield by taking and saving the implied_yield from mezzanine_yield.
## yield
commentary$calculations$implied_dollar_price = update_list(
  commentary$calculations$implied_dollar_price,
  yield = commentary$calculations$mezzanine_yield$implied_yield
)
We will get the price by using the function bond.price with arguments pulled from, or based on, our other implied_dollar_price entries. This finishes off the entries in the implied_dollar_price list within calculations.
## price
commentary$calculations$implied_dollar_price = update_list(
  commentary$calculations$implied_dollar_price,
  price = bond.price(
    settle = commentary$calculations$implied_dollar_price$settlement,
    mature = commentary$calculations$implied_dollar_price$maturity,
    coupon = commentary$calculations$implied_dollar_price$rate,
    freq = commentary$calculations$implied_dollar_price$frequency,
    yield = commentary$calculations$implied_dollar_price$yield,
    convention = case_when(
      commentary$calculations$implied_dollar_price$basis == 0 ~ "30/360E",
      commentary$calculations$implied_dollar_price$basis == 1 ~ "ACT/ACT",
      commentary$calculations$implied_dollar_price$basis == 2 ~ "ACT/360",
      # commentary$calculations$implied_dollar_price$basis == 3 ~ "30/360E", # missing Actual/365
      commentary$calculations$implied_dollar_price$basis == 4 ~ "30/360E",
    )
  )
)
commentary$calculations$implied_dollar_price %>%
map_dfc(~.x) %>%
format.dt.f()
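As a standalone check with hypothetical inputs, jrvFinance::bond.price() and the bond.yield() call used earlier are inverses of each other under the same day-count convention:
library(jrvFinance)

y <- bond.yield(settle = "2021-03-31", mature = "2024-06-30",
                coupon = 0.105, freq = 2, price = 98.2,
                convention = "30/360E")
bond.price(settle = "2021-03-31", mature = "2024-06-30",
           coupon = 0.105, freq = 2, yield = y,
           convention = "30/360E")
# recovers roughly 98.2, the price we started from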
Now we can go back and fill in the missing entries in mezzanine_yield by pulling them from our implied_dollar_price list.
## Mezzanine Yield
commentary$calculations$mezzanine_yield$period=
commentary$calculations$implied_dollar_price$settlement
commentary$calculations$mezzanine_yield$implied_dollar_price=
commentary$calculations$implied_dollar_price$price
commentary$calculations$mezzanine_yield %>%
map_dfc(~.x) %>%
format.dt.f()
With our last entry saved, our output will now match the target output.
commentary %>%
listviewer::jsonedit()