The R package tabulizer depends on rJava, so the setup can be gnarly.
knitr::opts_chunk$set(echo = TRUE)
library(tabulizer)
library(tidytext)
library(tidyr)
library(dplyr)
library(purrr)
library(readr)
We could be a bit cleverer about file names, e.g. use R projects and the here package for relative file paths. As an example, we use one file only.
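# Path to the PDF, relative to the working directory
fn <- "PMST_ILRX1I.pdf"
# Extract all tables; with output = "data.frame" the result is a list of data frames
x <- tabulizer::extract_tables(
  file = fn,
  method = "decide",
  output = "data.frame"
)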
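If the PDF lived inside an R project, the relative path mentioned above could be built with here rather than relying on the working directory. This is only a minimal sketch: it assumes the here package is installed, and the `data` subfolder and the names `pdf_name` and `pdf_path` are hypothetical.

```r
# Assumption: the PDF sits in a hypothetical "data" folder at the project root.
pdf_name <- "PMST_ILRX1I.pdf"

# here::here() builds a path starting from the project root.
pdf_path <- here::here("data", pdf_name)

# Report early if the file is not where we expect it.
if (!file.exists(pdf_path)) {
  message("PDF not found at: ", pdf_path)
}
```

`pdf_path` could then be passed as the `file` argument in place of `fn`.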
One table is shown here. Repeat for any tables of interest.
table5 <- x %>% purrr::pluck(5) %>% tibble::as_tibble()
# Clean up the table content, sanitize colnames, etc
table5 %>% reactable::reactable()
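# Drop the ".pdf" extension so the output is named "PMST_ILRX1I_table5.csv"
table5 %>% readr::write_csv(glue::glue("{tools::file_path_sans_ext(fn)}_table5.csv"))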
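To repeat these steps for several tables, the single-table code can be wrapped in a loop. The sketch below is one possible way to do it, not the only one. It uses a small stand-in list called `tables` in place of the real `x`; the `wanted` indices, the temporary output folder, and the use of base `make.names()` to sanitize column names are all assumptions made for illustration.

```r
library(purrr)
library(readr)

# Stand-in for the list returned by extract_tables(); use the real `x` in practice.
tables <- list(
  data.frame(`col one` = 1:2, b = c("x", "y"), check.names = FALSE),
  data.frame(c = 3:4)
)

# Hypothetical choices: which tables to export, and where to put them.
wanted <- c(1, 2)
out_dir <- tempdir()
base_name <- tools::file_path_sans_ext("PMST_ILRX1I.pdf")

# Sanitize column names, write each selected table to CSV, collect the paths.
csv_paths <- purrr::map_chr(wanted, function(i) {
  tbl <- tables[[i]]
  names(tbl) <- make.names(names(tbl), unique = TRUE)
  path <- file.path(out_dir, paste0(base_name, "_table", i, ".csv"))
  readr::write_csv(tbl, path)
  path
})
```

Each table still deserves a manual look before export, since automatic extraction can split or merge columns.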