1 Objective

Our goal is to extract the Long Term Financial Performance table from each workbook and transform it into a vector of two vectors detailing financial performance and covenant ratios. When finished, our output should look like the following for COMP1 and COMP2:

2 Isolating the Data

Our first step is to examine the workbook imported into summary_cells and filter the rows to focus in on the ‘valuation: financial performance’ table. The table we are looking for lies between rows \(129\) to \(162\) for COMP1 and rows \(169\) to \(219\) for COMP2 and we will also select the row, col, character, numeric, date, and data_type from the cells as they will to be used with unpivotr functions. We can then apply the remove_blank_columns() function to the narrowed bag to clean up the data.

remove_blank_columns <- function(bag){
  has_multipletypes <- bag %>% 
    select(col, data_type) %>% 
    unique() %>% 
    select(col) %>% 
    count(col) %>% 
    filter(n>1) %>% 
    select(col)
  
  has_noblanks <- bag %>% 
    select(col, data_type) %>% 
    unique() %>% 
    filter(data_type!="blank") %>% 
    select(col) %>% 
    unique()
  
  to_keep <- full_join(has_noblanks, has_multipletypes) %>% 
    arrange(col)
  
  n_newcols <- to_keep$col %>% length()
  df_keycol <- bind_cols(tibble(1:n_newcols), to_keep) 
  
  table <- bag %>% 
    filter(col %in% to_keep$col) %>% 
    left_join(df_keycol) %>%
    mutate(col = `1:n_newcols`) %>% 
    select(-`1:n_newcols`)
  
  # rm(has_multipletypes, has_noblanks, to_keep, n_newcols, df_keycol)
  table
}

2.1 COMP1

xlsx_toggle=1
xlsx_obj=file.path(path_data, "RNFC_COMP1") %>% readRDS
summary_cells=xlsx_obj %>% filter(sheet == "Summary")
summary_cells %>% 
  filter(row>=129, row<=162) %>% 
  select(row, col, character, numeric, date, data_type) %>% 
  remove_blank_columns() %>% 
  rectify() %>%
  format.dt.f(.)

2.2 COMP2

xlsx_toggle=2
xlsx_obj=file.path(path_data, "RNFC_COMP2") %>% readRDS
summary_cells=xlsx_obj %>% filter(sheet == "Valuation Summary")
summary_cells %>% 
  filter(row>=169, row<=219) %>% 
  select(row, col, character, numeric, date, data_type) %>% 
  remove_blank_columns() %>% 
  rectify() %>%
  format.dt.f(.)

There are also a few other variables unique to each file that need to be saved to help the procedures that will be applied later to be adaptable to either sheet. First, we need to save 'headings_t', which will be a tibble of the headings we want our final output to have. Second, we need to save the column where our headers are located, which we can do with an inner_join to find the cell where the 'Long Term Financial Performance' title is located and then grab and save the col value. Third, we need to locate and save the rows where there are northern headings for the dates. We can find these period rows by filtering for observations in the bag with ’data_type’s of date and then applying slice(1) to get the row of the first period row while applying tail(1) to get the row of the second period row, assuming one exists.

Lastly, we need to create a table that will serve as a key to join the headings existing in the bag to our desired heading names; the process will be very similar to the one employed by the remove_blank_columns() function. We start by creating a tibble with a column for headings_t and another column that will have values that will join to the corresponding heading in the bag.

if(xlsx_toggle==1){
  bag <- summary_cells %>% 
    filter(row>=129, row<=162) %>% 
    select(row, col, character, numeric, date, data_type) %>%
    remove_blank_columns()
  
  headings_t <- tibble(
    names = c(
      "period", "revenue", "adj_ebitda", "acq_ebitda", 
      "capex", "acq_capex", "opco_debt_ebitda", 
      "total_debt_ebitda", "ltm_ebitda_cash_interest"
    )
  ) 
  
  header_col <- regex_inner_join(
    bag, 
    "Long Term Financial Performance" %>% 
      enframe(name = NULL), 
    by = c("character"="value")
  ) %>% 
    pull(col) 
  
  period_row1 <- filter(bag,data_type=="date") %>% 
    slice(1) %>% 
    pull(row)
  
  period_row2 <- filter(bag, data_type=="date") %>% 
    tail(1) %>% 
    pull(row)
  
  # Filter for financial performance and covenant headings
  headings_key <- tibble(
    filter = c(
      "Period", "Revenue", "Adj. EBITDA", "Acq. EBITDA", 
      "CapEx", "Acq. Capex", "Total Secured Debt / EBITDA", 
      "Total Net Debt / EBITDA", "EBITDA / Cash Interest"
    ),
    names = headings_t$names
  )

}else{ # if (xlsx_toggle==2)
  bag <- summary_cells %>% 
    filter(row>=169, row<=219) %>% 
    select(row, col, character, numeric, date, data_type) %>%
    filter(col!=7 & col!=8) %>%
    remove_blank_columns()
  
  headings_t <- tibble(
    names = c(
      "period", "revenue", "adj_ebitda", "acq_ebitda", 
      "capex", "acq_capex", "opco_debt_ebitda", 
      "total_debt_ebitda", "ltm_ebitda_cash_interest"
    )
  ) 
  
  header_col <- regex_inner_join(
    bag, 
    "Long Term Financial Performance" %>% enframe(name = NULL), 
    by = c("character"="value")
  ) %>% 
    pull(col)
  
  period_row1 <- filter(
    bag, 
    data_type=="date"
  ) %>% 
    slice(1) %>% 
    pull(row)
  
  period_row2 <- filter(
    bag,
    data_type=="date"
  ) %>% 
    tail(1) %>% 
    pull(row)
  
  # Headings Key - filter headings and target headings
  headings_key <- tibble(
    filter = c(
      "Period", "Revenue", "Run", "Acquired EBITDA", "CapEx", 
      "Acquisition Capex", "Projected Opco Debt / RR EBITDA", 
      "Projected Total Debt", "LTM EBITDA / Cash Interest"
    ),
    names = headings_t$names
  )
}

headings_key %>%
  format.dt.f()

We also need to make a few changes specific to whether we are working with COMP1 or COMP2. The first thing we need to do is add the missing 'period' header to the bag. We know this should be in the column recorded in header_col and in the row number saved in period_row1. Examining the bag, these are observations \(43\) and \(31\) for COMP1 and COMP2, respectively, and so we can set the character value to “Period” to set the heading while also making sure to set the data_type to 'character'.

# Add period heading
# bag %>% filter(col==header_col, row==period_row1)
if(xlsx_toggle==1){
  bag$character[43] <- "Period"
  bag$data_type[43] <- "character"
} else { # if(xlsx_toggle==2)
  bag$character[31] <- "Period"
  bag$data_type[31] <- "character"
}

We also need to check for and remove any duplicate rows containing the dates. We will do this by checking if period_row1 and period_row2 are different, and if they are we will remove the data from the row recorded in period_row2. This will remove the duplicate Northern heading row from COMP2.

# Remove duplicate period row
if (period_row1!=period_row2){
  bag <- bag %>% 
    filter(row!=period_row2)
}

3 Extracting the Data

Now that sheet specific changes are made and variables that are sheet dependent are saved, we can move on to extracting the data we need and transforming it to match our target output. The goal from here is to use the unpivotr::enhead() function to assign headings to the data, which can then be extracted into our list format.

The first step to achieving this is to filter for and save the header. We will filter for only values in the header row using header_col. Then, we will remove the headings for summary values, since they are unneeded; simply filtering out values including '%Growth', 'vs. Projected', and '% of Revenue' will eliminate these headers from both sheets. The last step in completing our header is to clean the names; to do this we will use an inner_join, joining our data with the 'headings_key' by the filter column. This will be the header we use with enhead().

# Filter for desired headings to be used with enhead()
header <- bag %>%
  filter(
    col==header_col,
    character!="% Growth",
    character!="vs. Projected",
    character!="% of Revenue"
  ) %>%
  regex_inner_join(headings_key, by = c("character"="filter")) %>%
  select(row, col, names)

header %>%
  format.dt.f()

Examining the header, we can see there are two rows for each heading besides period, this is because historical and projected values are on different rows. Now that they have the same header name though, it will make them easier to combine into a single vector of data later on.

The next step will be to filter for and save the data. To get the data we will filter for rows from the period row and beyond, since this is the top row in both sheets, and then right_join with the rows in the header to filter out any values that do not correspond to a non-summary heading.

# Filter for data cells to be used with enhead()
cells <- bag %>%
  filter(
    row>=period_row1,
    col>header_col
  ) %>%
  right_join(header %>% select(row), by = "row")

cells %>% 
  rectify() %>%
  format.dt.f()

The third step will be to use the unpivotr::enhead function to assign the cells to a header. We know the headings are on the West side of the data, so we can give the enhead the 'W' direction. We also need to filter out the columns where only period information is present but not any other values; in both sheets there is only data for \(7\) columns after the header so we can filter out any columns past that.

# Use enhead() to assign data cells to related heading
table <- enhead(cells, header, "W") %>%
  filter(col<(header_col+8))

table %>% 
  rectify() %>%
  format.dt.f()

We also need to remove various user inputs meant to indicate \(NA\) that have been captured in the table.

# Remove NA various user inputs
table$character[table$character == "n/a"] <- NA
table$character[table$character == "--"] <- NA
table$character[table$character == "NA"] <- NA

Next, we will save the period values in their own tibble to be added back later since it is a data field present in the target vector output but we will be beheading it for the purpose of manipulating the other data fields into an easy to extract format.

# Save period 
period <- table %>% 
  filter(names=="period") %>% 
  select(date) %>% 
  rename(period = date)

period %>%
  format.dt.f()

Now, with the enheaded and cleaned data, we can behead and spatter the period row to see the data organized by year for each heading in the names column.

# Behead and spatter by period
table <- table %>%
  behead("N", period) %>% 
  filter(!is.na(numeric)) %>% arrange(names) %>%
  select(names, numeric, period, data_type) %>% 
  spatter(period)

table %>%
  format.dt.f()

Taking a look at the table, you might notice that each row is exactly what we are after for our target output, with the first column representing what we want to name each list of values. Thus, we can simply extract these by importing the table again with as_cells(). However, we first need to take care of any missing headings that should be in our target output. We can add these headings using add_row(), adding any missing headings in the sheet before rearranging the names by converting them to a factor and arranging by the levels we set.

# Arrange by target output headings
if(xlsx_toggle==1){
  # Add missing rows "acq_ebitda" "acq_capex"
  table <- table %>%
    add_row(names = "acq_ebitda") %>%
    add_row(names = "acq_capex") %>%
    mutate(
      names = factor(
        names, 
        levels = c(
          "revenue", "adj_ebitda", "acq_ebitda", "capex", 
          "acq_capex", "opco_debt_ebitda", "total_debt_ebitda",
          "ltm_ebitda_cash_interest"
        )
      )
    ) %>%
    arrange(names)
}else{ # if(xlsx_toggle==2)
  # Add missing rows "opco_debt_ebitda" "total_debt_ebitda"
  table <- table %>%
    add_row(names = "opco_debt_ebitda") %>%
    add_row(names = "total_debt_ebitda") %>%
    mutate(names = factor(
      names, 
      levels = c(
        "revenue", "adj_ebitda", "acq_ebitda", "capex", 
        "acq_capex", "opco_debt_ebitda", "total_debt_ebitda",
        "ltm_ebitda_cash_interest"
      )
    )
  ) %>%
    arrange(names)
}

table %>%
  format.dt.f()

Now the table displays exactly what we need and we can run it through as_cells and then behead the West heading.

valuation_master <- table %>% 
  as_cells %>% 
  behead("W", name = "heading") %>% 
  select(col, dbl, heading)

valuation_master %>%
  format.dt.f()

Next, we can spatter the heading, which will give us the corresponding data in each column ordered by date:

valuation_master <- valuation_master %>% 
  spatter(key = heading, values = dbl) %>% 
  select(-col)

valuation_master %>%
  format.dt.f()

Finally, we bind the period tibble to the valuation_master table with bind_cols, which will add the period column on the left of the table:

valuation_master = period %>% 
  bind_cols(valuation_master)

valuation_master %>%
  format.dt.f()

4 Transforming the Data

Now we are ready to transform the extracted data into our target output. We will do this by saving a table for financial performance and a table for covenants through use of select(). We will then save the valuation output as a list of the two tibbles.

# Separate master tibble into financial info and covenants and create a vector containing both as lists
valuation_financial <- valuation_master %>% 
  select(period, revenue, adj_ebitda, acq_ebitda, capex, acq_capex)

valuation_covenants <- valuation_master %>% 
  select(-revenue, -adj_ebitda, -acq_ebitda, -capex, -acq_capex)

valuation = list(
  financial_performance = valuation_financial, 
  coverage_covenants = valuation_covenants
)

4.1 valuation_financial

valuation_financial %>%
  format.dt.f()

4.2 valuation_covenants

valuation_covenants %>%
  format.dt.f()

This leaves our valuation list identical to the target valuation output.