Our goal is to extract the trading levels of company securities table from each workbook and transform it into a list containing lists that detail the properties of each security. When finished it should look like the following:
To start, we need to isolate the table of cells containing the
properties and corresponding data. The first step in doing so is to use
regex_inner_join
to find the top left of the table through
joining by the title; this works for both COMP1
and
COMP2
. With the corner known, we can then grab the data set
by filtering summary_cells to grab anything in the three rows
after it and within the same column or greater. This leaves us with the
table we desire.
xlsx_toggle=1
if (xlsx_toggle==1) {
xlsx_obj=file.path(path_data, "RNFC_COMP1") %>% readRDS
summary_cells=xlsx_obj %>%
filter(sheet == "Summary")
} else {
xlsx_obj <- file.path(path_data, "RNFC_COMP2") %>% readRDS
summary_cells=xlsx_obj %>% # bag of cells for COMP2
filter(sheet == "Valuation Summary")
}
bag=summary_cells %>%
select(row, col, character, data_type)
idx=bag %>%
regex_inner_join(
"Trading Levels of Company Securities" %>% as_tibble,
by = c("character"="value")
) %>%
select(-value)
bag=summary_cells %>%
filter(
row>=idx$row,
row<=idx$row+3,
col>=idx$col
)
idx=bag %>%
regex_inner_join(
"Security" %>% as_tibble,
by = c("character"="value")
) %>%
select(-value)
bag=summary_cells %>%
filter(row>=idx$row, row<=idx$row+3, col>=idx$col) %>%
select(row, col, character, numeric, date, data_type)
# extracted table
bag %>%
rectify() %>%
format.dt.f()
With the table set as bag, we can begin to organize
the data and conform it to our target list. Since we can see there are
no blank values in the bag that we need to keep, we can filter out any
observations with a data_type of “blank”. We can also
behead
the northern row and assign them to each observation
since these are the headings. Next we can use
unpivotr::pack
to gather all of the data spread across the
different data type headings into one column; however, before we do so,
we need to convert the date column to a character since pack will not
preserve it the way we want it. To do so we can use a combination of
mutate
and map
, specifically
map_chr
to return character values instead of lists for
each value. By mapping the as.character
function across
date, all values in the date column will have their types transformed
from dttm to chr; now it is ready for the
pack
function.
trading_levels=bag %>%
behead("N", name = "headings") %>%
filter(data_type != "blank") %>%
mutate(date = map_chr(date, as.character)) %>%
pack()
trading_levels %>%
format.dt.f()
The next step will be transforming this tibble into a list, with
values split by security and named according to its corresponding
headings value. We begin by pulling trading_levels$value
and using set_names
with cleaned headings values before
splitting the list by the row, since each security is on its own row in
the original data. Finally, we use unname
to remove the row
labels from the individual lists, which will give us the target list we
are aiming for.
trading_levels=trading_levels$value %>%
set_names(trading_levels$headings %>% clean_names()) %>%
split(trading_levels$row) %>%
unname()
trading_levels=trading_levels %>%
set_names(
trading_levels %>%
map_chr(~.x$security) %>%
clean_names()
)
if(xlsx_toggle==2){
trading_levels=trading_levels %>%
map(
~.x %>%
modify_at(
c("next_call_px"),
~ifelse(is.character(.x), NA_real_, .x)
)
)
}
The resulting lists from running this code appear below:
The last steps are to update the master valuation list by using
list_modify
and appending trading_levels to the
end of our existing master list.
valuation=list_modify(valuation, trading_levels = trading_levels)