1 Objective

Our goal is to extract the commentary from each workbook as well as valuation summary tables and enter them into a named list of values, lists, and tibbles. We will extract and clean the data for each section and, when finished, it will match the following:

2 Risk Classification, Valuation Impairment, Conclusion, and Strengths & Concerns

We will first focus on pulling the commentary for risk classification, valuation impairment, conclusions, strengths, and concerns. Before proceeding, we will quickly set up out bag of cells to include the variables we need and then create a list of headings for naming the list of values we extract.

xlsx_toggle=2
if (xlsx_toggle==1){
  xlsx_obj <- file.path(path_data, "RNFC_COMP1") %>% readRDS
  bag=xlsx_obj %>% # bag of cells for COMP1
    filter(sheet == "Summary") %>% 
    select(row, col, character, numeric, date, data_type)
} else{ 
  xlsx_obj <- file.path(path_data, "RNFC_COMP2") %>% readRDS
  bag=xlsx_obj %>% # bag of cells for COMP2
    filter(sheet == "Valuation Summary") %>% 
    select(row, col, character, numeric, date, data_type)
}
headings = c(
  "risk_classification", "valuation_impairment", 
  "conclusion", "strengths", "concerns"
)

Next we will begin extracting the commentary by using unpivotr::partition. For the corners argument, we will need to locate the \(5\) tables using an inner_join to join any cells with character values that start with “Step”, followed by any number, and then a colon “:”; we can achieve this with regex using the ., which will accept any character, by searching for “Step .:”. This will grab us the tables for our first three headings, which we can then use bind_rows with inner_join results searching for cells with character values equal to “Investment Strengths” and “Investment Risks”; we will use the regex ^ and $ to search for the exact string.

# Partition -> tibble of corners
corners = bag %>% 
  regex_inner_join(
    "Step .:" %>% as_tibble, 
    by = c("character"="value")
  ) %>% 
  select(-value) %>% 
  filter(character!="Step 3: Market / Security Analysis")

corners=bind_rows(
  corners, 
  bag %>% 
    regex_inner_join(
      "^Investment Strengths$" %>% as_tibble, 
      by = c("character"="value")
    ) %>% 
    select(-value))

corners=bind_rows(
  corners, 
  bag %>% 
    regex_inner_join(
      "Investment Risks" %>% as_tibble, 
      by = c("character"="value")
    ) %>% 
    select(-value))

corners %>% 
  format.dt.f()

We will then use our bag, filtered for values in the corner columns since the data we want to grab is in the cells underneath each heading, and corners as arguments in the partition function.

partitions=partition(
  bag %>% 
    filter(col %in% corners$col), 
  corners
)
partitions %>%
  listviewer::jsonedit(.)

The results will have our corners in the character variable with the corresponding table stored in cells. Our job will be to extract the desired values from the tibbles in cells.

From each tibble in the list, we can do this by filtering for the unicode number for the black right-pointing pointer (25ba). This will pick out most values, however there are a couple unwanted values it grabs so we can exclude them by adding str_detects with negate=TRUE to our filter. Below, we can see these steps working on the first tibble, for determination of risk classification.

partitions$cells[[1]] %>% 
  filter(
    str_detect(.$character, "\u{25ba}") &
    str_detect(.$character, "Mezzanine", negate=TRUE) &
    str_detect(.$character, "Equity Valuation", negate=TRUE)
  ) %>% 
  format.dt.f()

We can iteratively apply these steps to every tibble in the list with lapply. We will also take it a step further by using lapply to select just the character value from each tibble and then flatten the resulting list of tibbles to transform it into a list of vectors. Then we can use set_names with our headings and the function clean.sentence on the values to return our desired list.

clean.sentence <- function(text){
  # text = gsub('[[:punct:]]', '', text)
  text = gsub('[[:cntrl:]]', '', text)
  # text = gsub('\\d+', '', text)
  text = gsub("@\\w+ *", "", text)
  text = gsub("\\s+", " ", str_trim(text))
  text
}

commentary = partitions$cells %>% 
  lapply(
    . %>% 
      filter(
        str_detect(.$character, "\u{25ba}") & 
        # Remove values not belonging to table; the 
        # play button code (>) filters well, 
        # but doesn't remove all irrelevant values
        str_detect(.$character, "Mezzanine", negate=TRUE) &
        str_detect(.$character, "Equity Valuation", negate=TRUE)
      )
  ) %>%
  lapply(. %>% select(character)) %>% 
  flatten() %>% 
  set_names(headings) %>%
  lapply(clean.sentence)

commentary %>%
  listviewer::jsonedit()

Now that we have all of the desired information, we just need to clean up the list and reorder some entries.

Our first step will be to subset strengths and concerns into a list named “strengths_concerns”. We will do this by creating a new entry into our commentary list equal to a list of the two and then set_names again; following that, we will drop our original strengths and concerns from the commentary list.

commentary$strengths_concerns = list(
  commentary$strengths, 
  commentary$concerns
)

commentary$strengths_concerns = purrr::set_names(
  commentary$strengths_concerns, 
  c("strengths", "concerns")
)
commentary = commentary[
  names(commentary) %in% c("strengths", "concerns") == FALSE
]
commentary %>%
  listviewer::jsonedit()

In the list we are trying to recreate, some of the values are mixed up from their order in the sheets, and so we can reorder our list. For the conclusion entry, COMP2 needs the two list values swapped in their order so that Equity is first and Preferred Equity commentary second. We can do this by saving the original list item temporarily as x and then re-saving commentary$conclusion with the desired order of list items.

# Rearrange list items
x=commentary$conclusion

if (xlsx_toggle==2){commentary$conclusion = c(x[2], x[1])}

commentary$conclusion %>%
  format.dt.f()

We need to do this for COMP1 and COMP2 for concerns as well, since the first value needs to be at the end of the list for each to match the target.

x=commentary$strengths_concerns$concerns
if (xlsx_toggle==1) {
  commentary$strengths_concerns$concerns = c(x[2], x[3], x[4], x[1])
} else {
  commentary$strengths_concerns$concerns = c(x[2], x[3], x[4], x[5], x[1])
}

With the first four items in our list extracted and saved, we can move on to calculations.

Our output will now match the the target for the first four entries, and will look like the lists below:

commentary %>%
  listviewer::jsonedit()

3 Calculations

The calculations section is a list within our commentary that will be a combination of extracted values from the sheets as well as calculated values. When finished it should match the target, which looks like:

We will begin by setting up the list with names for which values will be entered as we go.

if(xlsx_toggle==1){
  commentary$calculations=vector("list", 4)
  commentary$calculations=purrr::set_names(
    commentary$calculations, 
    c("valuation_summary", "mezzanine_index", 
      "mezzanine_yield", "implied_dollar_price")
  )
}else{
  commentary$calculations=vector("list", 5)
  commentary$calculations=purrr::set_names(
    commentary$calculations, 
    c("valuation_summary", "implied_dollar_yield", "mezzanine_index", 
      "mezzanine_yield", "implied_dollar_price")
  )
}

commentary$calculations %>%
  listviewer::jsonedit()

3.1 Valuation Summary

The valuation summary tibble will simply come from the lists in valuation$summary bound together row-wise with each list being identified in a new variable called method, which will take its value from the name of each list.

commentary$calculations$valuation_summary=valuation$summary[7:10] %>% 
  bind_rows(.id="method")

commentary$calculations$valuation_summary %>%
  format.dt.f(.)

3.2 Implied Dollar Yield

Next, we will enter in a list of values for implied_dollar_yield for COMP2 by extracting the values from the corresponding sheet. Since this is only for COMP2, we will only proceed with these steps if xlsx_toggle==2.

First we will locate the table by performing an inner_join with “implied dollar price” to find the cell for the top left of the table. Next we will filter for columns and rows greater or equal to that value and search for Yield, since this is the heading for this table; conveniently, in this case, it is also the heading for the last value and so we do not have to find the last row of the table. With this search, we will have the location of the top left data cell and bottom left data cell for the table in the idx tibble.

if(xlsx_toggle==2){
  idx=bag %>% 
    regex_inner_join(
      "implied dollar price" %>% as_tibble, 
      by = c("character"="value")
    ) %>% 
    select(-value)
  
  idx = bag %>% 
    filter(col>idx$col, row>=idx$row) %>% 
    regex_inner_join(
      "Yield" %>% as_tibble, 
      by = c("character" = "value")
    ) %>% select(-value)

  idx %>% 
    format.dt.f(.)
}

Now we can work on grabbing and naming the data values. To do this, we can bind the column containing headings with the column containing the corresponding data. To get the the headings values, we can filter the bag for character values below our upper left corner and above our bottom left corner; we will filter for values above the bottom left because we want to exclude the yield value for now anyways. To extract the corresponding data, we can filter the bag by rows between the two identified corners but in the column two ahead; we can go ahead and deselect character since no values are of character type. We can then bind these two tibbles together column-wise with bind_cols. We will then filter out Accreted price and Accretion, convert our dates to characters with mutate, clean our headings with clean_names, and then rename our headings variable from character to field.

if(xlsx_toggle==2){
  idx = bag %>% 
    filter(
      col==idx$col[1], row>idx$row[1] & row<idx$row[2]
    ) %>% 
    select(character) %>% 
    bind_cols(
      bag %>% 
        filter(
          col==idx$col[1]+2, row>idx$row[1] & row<idx$row[2]
        ) %>% 
        select(-character)
    ) %>% 
    filter(character!=c("Accreted price", "Accretion")) %>%
    mutate(
      date=as.character(date),
      character = clean_names(character)
    ) %>%
    rename(field = character)

  idx %>%
    format.dt.f()
}

Next, we will work on converting the above tibble to a named list. To do this, we will first use unpivotr::pack to combine the date and numeric values into one single column. Next, we will use pivot_wider with our values spread out under their corresponding field. Before this step, we need to deselect the row, since the field is already being used as the names identifier for pivot_wider. To clean up, we will use map_df with unlist to restore the data types that were converted to lists in the pack function.

if(xlsx_toggle==2){
  commentary$calculations$implied_dollar_yield = 
    idx %>% 
    pack() %>% 
    select(-row) %>% 
    pivot_wider(names_from = field, values_from = value) %>% 
    map_df(unlist) %>% 
    select(-col) %>%
    mutate(
      settlement=as.character(settlement),
      investment_date=as.character(investment_date),
      maturity=as.character(maturity),
      rate=as.numeric(rate),
      price=as.numeric(price),
      redemption_price=as.numeric(redemption_price),
      frequency=as.numeric(frequency),
      basis=as.numeric(basis)
    ) #%>% as.list()
  
  commentary$calculations$implied_dollar_yield %>%
    map_dfc(~.x) %>%
    format.dt.f()
}

Now that we have our extracted values for implied dollar yield, we will fill in the last few elements of this list through calculations. We will add back accretion, accreted_price, and yield as calculations in our R environemnt instead of just pulling the values from the spreadsheet.

if(xlsx_toggle==2){
  
 commentary$calculations$implied_dollar_yield=update_list(
    commentary$calculations$implied_dollar_yield, 
    accretion=(
      commentary$calculations$implied_dollar_yield$redemption_price-
        commentary$calculations$implied_dollar_yield$price
    )*as.integer(
      lubridate::as_date(commentary$calculations$implied_dollar_yield$settlement)-
        lubridate::as_date(commentary$calculations$implied_dollar_yield$investment_date)
    )/as.integer(
      lubridate::as_date(commentary$calculations$implied_dollar_yield$maturity)-
        lubridate::as_date(commentary$calculations$implied_dollar_yield$investment_date)
    )
  )
  
  commentary$calculations$implied_dollar_yield=update_list(
    commentary$calculations$implied_dollar_yield,
    accreted_price=commentary$calculations$implied_dollar_yield$accretion+
      commentary$calculations$implied_dollar_yield$price
  )
  
  commentary$calculations$implied_dollar_yield=update_list(
    commentary$calculations$implied_dollar_yield,
    yield=bond.yield(
      settle=commentary$calculations$implied_dollar_yield$settlement,
      mature=commentary$calculations$implied_dollar_yield$maturity,
      coupon=commentary$calculations$implied_dollar_yield$rate,
      freq=commentary$calculations$implied_dollar_yield$frequency,
      price=commentary$calculations$implied_dollar_yield$accreted_price,
      convention=case_when(
        commentary$calculations$implied_dollar_yield$basis == 0 ~"30/360E",
        commentary$calculations$implied_dollar_yield$basis == 1 ~"ACT/ACT",
        commentary$calculations$implied_dollar_yield$basis == 2 ~"ACT/360",
        # commentary$calculations$implied_dollar_yield$basis == 3 ~"30/360E", 
        # missing Actual/365
        commentary$calculations$implied_dollar_yield$basis == 4 ~"30/360E",
      )
    )
  )
  
  commentary$calculations$implied_dollar_yield %>%
    map_dfc(~.x) %>%
    format.dt.f()
}

After saving this list to our commentary$calculations list, we can move on to the next entry.

3.3 Mezzanine Index

The mezzanine index will be extracted from the sheets and saved as a list. We will need to extract the table below from our sheet:

To do so, we will use an inner_join with “Comparison to Mezzanine Index” with our bag to find the cell with the equivalent value in character. Next, we will filter the bag for values in the same column after this section title and then take the first three cells with character values below it for COMP1, or first two cells with character values below it for COMP2. This will extract the headings we will need to fill values for in each case.

# Mezzanine Index
idx = bag %>% 
  regex_inner_join(
    "Comparison to Mezzanine Index" %>% as_tibble, 
    by = c("character"="value")
  ) %>% 
  select(-value)

idx = bag %>% 
  filter(col==idx$col, row>idx$row) %>% 
  filter(!is.na(character))

if(xlsx_toggle==1){
  idx = idx %>% 
    slice(1:3) %>% 
    select(-numeric, -date, -data_type)
} else {
  idx = idx %>% 
    slice(1:2) %>% 
    select(-numeric, -date, -data_type)
}

idx %>%
  format.dt.f(.)

However, we first need to add the “Period” heading, since there are values for it but with no heading in the first column. We can do this by adding an entry into our tibble by combining the idx with a tibble with a row equal to one less than the first row in our idx, a col equal to those in idx, and a character value equal to “Period”. We will then use clean_names on these headings, rename the column from character to heading, and save these to a headings tibble for use with unpivotr::enhead.

idx = tibble(
  row = idx$row[1]-1, 
  col=idx$col[1], 
  character = "Period"
) %>% 
  bind_rows(idx)

headings=idx %>% 
  mutate(character=clean_names(character)) %>% 
  rename(headings=character)

headings %>%
  format.dt.f(.)

Now that we have the header, we can look for the data for use with enhead. We can grab this simply by filtering the bag for rows found in our idx and columns that include cells that are two and three columns to the right of our header.

data = bag %>% 
  filter(
    row %in% idx$row, 
    col %in% (idx$col[1]+2):(idx$col[1]+3)
  )

data %>%
  format.dt.f()

With our header and data identified, we can go ahead and use enhead to assign headings to the data values, giving it a direction of “W” since the header is located to the left of the data cells. We will also perform a pack and pivot_wider to spread the data underneath their corresponding headings; we need to deselect the row before pivoting since the heading will be separating the values from each other, however we need to keep col since it will identify the unique values within each spread heading. After the pivot, we can go ahead and map unlist across the data frame and drop unused variables from our tibble.

commentary$calculations$mezzanine_index=enhead(data, headings, "W") %>%
  pack() %>% 
  select(-row) %>%
  pivot_wider(
    names_from = headings, 
    values_from = value
  ) %>%
  map_df(unlist) %>%
  select(-col, -date) %>%
  mutate(period=as.character(period)) %>%
  mutate_if(is.numeric, as.numeric)

commentary$calculations$mezzanine_index %>%
  format.dt.f(.)

Finally we can turn this saved tibble into a list in our calculations list.

commentary$calculations$mezzanine_index=
  commentary$calculations$mezzanine_index #%>% as.list()

3.4 Mezzanine Yield (Part 1)

For our next entry, we will set up the list with our desired names and then fill the values with both calculations and extraction from our data sheet or saved data. We will have to return later to fill in values for period and implied_dollar_price.

commentary$calculations$mezzanine_yield=vector("list", 6)
commentary$calculations$mezzanine_yield=purrr::set_names(
  commentary$calculations$mezzanine_yield, 
  c(
    "period", "yield", "adj_mezzanine_index", 
    "adjust_leverage", "implied_yield", 
    "implied_dollar_price"
  )
)

First we will import the yield from our implied_dollar_yield list for COMP2 or from the second entry from comp_2_nd_lien_yield under our mezzanine_index list for COMP1.

## yield
if(xlsx_toggle==1){
  commentary$calculations$mezzanine_yield$yield=
    commentary$calculations$mezzanine_index[[3]][2]
}else{
  commentary$calculations$mezzanine_yield$yield=
    commentary$calculations$implied_dollar_yield$yield
}

commentary$calculations$mezzanine_yield %>%
  map_dfc(~.x) %>%
  format.dt.f()

Next we will calculate the adj_mezzanine_index by finding the difference between the two values from our mezzanine_index vector in our mezzanine_index list.

## adj_mezzanine_index
commentary$calculations$mezzanine_yield$adj_mezzanine_index=diff(
  commentary$calculations$mezzanine_index$mezzanine_index
)
commentary$calculations$mezzanine_yield$adj_mezzanine_index=
  as.numeric(commentary$calculations$mezzanine_yield$adj_mezzanine_index)

commentary$calculations$mezzanine_yield %>%
  map_dfc(~.x) %>%
  format.dt.f()

We will then extract the value for adjust_leverage by grabbing the numeric value from the data cell corresponding to the “Adjust for Leverage*” heading in each sheet. We can do this with an inner_join of this value and then pulling the numeric value of the cell two columns to the right.

## adjust_leverage
heading=bag %>% 
  regex_inner_join(
    "Adjust for Leverage*" %>% as_tibble, 
    by = c("character"="value")
  ) %>% 
  select(-value)

commentary$calculations$mezzanine_yield$adjust_leverage=bag %>% 
  filter(
    row==heading$row, 
    col==heading$col+2
  ) %>% 
  pull(numeric)

commentary$calculations$mezzanine_yield %>%
  map_dfc(~.x) %>%
  format.dt.f()

Next we can get the implied_yield by adding the yield, adj_mezzanine_index, and adjust_leverage together from our mezzanine_yield list.

## implied_yield
commentary$calculations$mezzanine_yield$implied_yield=commentary$calculations$mezzanine_yield$yield+
  commentary$calculations$mezzanine_yield$adj_mezzanine_index+
  commentary$calculations$mezzanine_yield$adjust_leverage
commentary$calculations$mezzanine_yield$implied_yield=
  as.numeric(commentary$calculations$mezzanine_yield$implied_yield)

commentary$calculations$mezzanine_yield %>%
  map_dfc(~.x) %>%
  format.dt.f()

We will have to come back to fill in period and implied dollar price, since we will pull those values from our implied dollar price list, which we will do next.

3.5 Implied Dollar Price

For implied dollar price, we will begin by extracting the table from the bag below:

First, we will perform an inner_join on the bag with “implied dollar price” and save it to idx to find our top left corner. Then we will find the bottom left corner by filtering the bag for the cell in the same column as idx with the character value “basis”. We know that the data for our table is located two columns to the right of our headings, so our bottom right corner will be two columns right of idx_2. With this knowledge, we can save our table as the bag filtered for rows between one below the idx and idx_2 and columns equal to that in idx and idx + 2.

idx=bag %>% 
  regex_inner_join(
    "implied dollar price" %>% as_tibble, 
    by = c("character"="value")
  ) %>% 
  select(-value)

idx_2=bag %>% 
  filter(col==idx$col, row>idx$row) %>% 
  filter(character=="basis")

table=bag %>% 
  filter(
    row>idx$row, 
    row<=idx_2$row, 
    col%in%(c(idx$col, idx$col+2))
  ) %>% 
  filter(data_type!="blank")

table %>% 
  rectify() %>%
  format.dt.f()

With the table extracted, we can go ahead and use unpivotr::behead to strip the values in column \(20\) and assign them to the data in column \(22\).

table=table %>% 
  behead(direction = "W", name = "heading") %>% 
  filter(heading!="yield")

table %>%
  format.dt.f()

Next we will convert this tibble into our desired list. First we will mutate our date column into character type and then pack the values.

table=table %>% 
  mutate(date = as.character(date)) %>% 
  pack()

table %>%
  format.dt.f()

Now we just need to pull and save the value variable from our table after running it through set_names using the cleaned names of the heading variable.

commentary$calculations$implied_dollar_price=table %>% 
  pull(value) %>% 
  set_names(table$heading %>% clean_names)

commentary$calculations$implied_dollar_price %>%
  map_dfc(~.x) %>%
  format.dt.f()

Now we just need to add yield and price entries. We can get the yield by taking and saving the implied_yield from mezzanine_yield.

## yield
commentary$calculations$implied_dollar_price=update_list(
  commentary$calculations$implied_dollar_price,
  yield=commentary$calculations$mezzanine_yield$implied_yield
)

We will get the price by using the function bond.price with arguments pulled from or based on our other implied_dollar_price entries. This will finish off our entries in implied_dollar_price list within calculations.

## price
commentary$calculations$implied_dollar_price=update_list(
  commentary$calculations$implied_dollar_price, 
  price=bond.price(
    settle=commentary$calculations$implied_dollar_price$settlement,
    mature=commentary$calculations$implied_dollar_price$maturity,
    coupon=commentary$calculations$implied_dollar_price$rate,
    freq=commentary$calculations$implied_dollar_price$frequency,
    yield=commentary$calculations$implied_dollar_price$yield,
    convention=case_when(
      commentary$calculations$implied_dollar_price$basis == 0 ~"30/360E",
      commentary$calculations$implied_dollar_price$basis == 1 ~"ACT/ACT",
      commentary$calculations$implied_dollar_price$basis == 2 ~"ACT/360",
      #    commentary$calculations$implied_dollar_price$basis == 3 ~"30/360E", # missing Actual/365
      commentary$calculations$implied_dollar_price$basis == 4 ~"30/360E",
    )
  )
)

commentary$calculations$implied_dollar_price %>%
  map_dfc(~.x) %>%
  format.dt.f()

3.6 Mezzanine Yield (Part 2)

Now we can go back and fill in the missing entries in mezzanine_yield by pulling them from our implied_dollar_price list.

## Mezzanine Yield
commentary$calculations$mezzanine_yield$period=
  commentary$calculations$implied_dollar_price$settlement

commentary$calculations$mezzanine_yield$implied_dollar_price=
  commentary$calculations$implied_dollar_price$price

commentary$calculations$mezzanine_yield %>%
  map_dfc(~.x) %>%
  format.dt.f()

With our last entry saved, our output will now match the target output.

commentary %>% 
  listviewer::jsonedit()