1 Objective

Our goal is to extract the trading levels of company securities table from each workbook and transform it into a list containing lists that detail the properties of each security. When finished it should look like the following:

2 Isolating the Data

To start, we need to isolate the table of cells containing the properties and corresponding data. The first step in doing so is to use regex_inner_join to find the top left of the table through joining by the title; this works for both COMP1 and COMP2. With the corner known, we can then grab the data set by filtering summary_cells to grab anything in the three rows after it and within the same column or greater. This leaves us with the table we desire.

xlsx_toggle=1

if (xlsx_toggle==1) {
  xlsx_obj=file.path(path_data, "RNFC_COMP1") %>% readRDS
  summary_cells=xlsx_obj %>% 
    filter(sheet == "Summary")
} else {
  xlsx_obj <- file.path(path_data, "RNFC_COMP2") %>% readRDS
  summary_cells=xlsx_obj %>% # bag of cells for COMP2
    filter(sheet == "Valuation Summary")
}

bag=summary_cells %>% 
  select(row, col, character, data_type)

idx=bag %>% 
  regex_inner_join(
    "Trading Levels of Company Securities" %>% as_tibble, 
    by = c("character"="value")
  ) %>% 
  select(-value)

bag=summary_cells %>% 
  filter(
    row>=idx$row, 
    row<=idx$row+3, 
    col>=idx$col
  )

idx=bag %>% 
  regex_inner_join(
    "Security" %>% as_tibble, 
    by = c("character"="value")
  ) %>% 
  select(-value)

bag=summary_cells %>% 
  filter(row>=idx$row, row<=idx$row+3, col>=idx$col) %>% 
  select(row, col, character, numeric, date, data_type)

# extracted table
bag %>% 
  rectify() %>%
  format.dt.f()

3 Extracting the Data

With the table set as bag, we can begin to organize the data and conform it to our target list. Since we can see there are no blank values in the bag that we need to keep, we can filter out any observations with a data_type of “blank”. We can also behead the northern row and assign them to each observation since these are the headings. Next we can use unpivotr::pack to gather all of the data spread across the different data type headings into one column; however, before we do so, we need to convert the date column to a character since pack will not preserve it the way we want it. To do so we can use a combination of mutate and map, specifically map_chr to return character values instead of lists for each value. By mapping the as.character function across date, all values in the date column will have their types transformed from dttm to chr; now it is ready for the pack function.

trading_levels=bag %>%
  behead("N", name = "headings") %>%
  filter(data_type != "blank") %>%
  mutate(date = map_chr(date, as.character)) %>%
  pack()

trading_levels %>%
  format.dt.f()

4 Transforming the Data

The next step will be transforming this tibble into a list, with values split by security and named according to its corresponding headings value. We begin by pulling trading_levels$value and using set_names with cleaned headings values before splitting the list by the row, since each security is on its own row in the original data. Finally, we use unname to remove the row labels from the individual lists, which will give us the target list we are aiming for.

trading_levels=trading_levels$value %>%
  set_names(trading_levels$headings %>% clean_names()) %>%
  split(trading_levels$row) %>%
  unname()

trading_levels=trading_levels %>%
  set_names(
    trading_levels %>% 
      map_chr(~.x$security) %>% 
      clean_names()
  )

if(xlsx_toggle==2){
  trading_levels=trading_levels %>% 
    map(
      ~.x %>% 
        modify_at(
          c("next_call_px"), 
          ~ifelse(is.character(.x), NA_real_, .x)
        )
    )   
}

The resulting lists from running this code appear below:

The last steps are to update the master valuation list by using list_modify and appending trading_levels to the end of our existing master list.

valuation=list_modify(valuation, trading_levels = trading_levels)