Introduction

Extracting data from a Rich Text Format (RTF) table is a common need. For example, in clinical trial reporting, this desire of extracting RTF tables and converts them into SAS datasets are mainly for two reasons. One, to validate tables and listings (TLs) programmatically; secondly, to compare different versions of (TLs).

There are plenty of literatures [1] that offer solutions on this topic already, and this blog too, shows another alternative; one that simplify the development of such an RTF parser tool. That is instead of writing the RTF parser completely in SAS, you can call R packages to parse the RTF and return the result to SAS for further processing, and there are number of R packages for this. These include textreadr, striprtf and unrtf; each have their advantages but the package I have chosen is textreadr because this seems to give a more general solution based on my limited testing.

Usage of textreadr::read_rtf

The read_rtf function in the textreadr package is what we need to parse an RTF table and here is an example of its usage. After the calling of the read_rtf function, it is then just a little tidying up.

library(textreadr)
library(dplyr)
library(haven)

1. read_rtf_df <- textreadr::read_rtf("mytable.rtf"
                         ,skip         = 0
                         ,remove.empty = TRUE
                         ,trim         = TRUE
                         ,row_start    = "["
                         ,row_end      = ""
                         ,cell_end     = "@"
                         )

2. read_rtf_df <- as.data.frame(read_rtf_df)
3. names(read_rtf_df) <- "ALLCOLS"
4. read_rtf_df <- mutate_if(read_rtf_df, is.factor, as.character)

5. write_sav(read_rtf_df, "dfname.sav")

Line 1 - Reads the RTF table
Line 2 - Convert to a data frame
Line 3 - Assign variable name
Line 4 - On conversion some of the columns in the data frame may be factor variables, we need to change these to          character variables.
Line 5 - Save data frame in SPSS format of which SAS can reads. Note, [write_sas] from the [haven] package does           not work! And other methods such as [write_xpt], [SASxport] etc work less satisfactorily in my opinion. 


Once we have the data in SPSS format we can use proc import to import it into SAS format.

proc import datafile="dfname.sav" out=dfname dbms = sav replace;
run;


Example
Let’s look at an example for this RTF table (mytable.rtf).

Fig 1


From just a few lines (1 to 5 above) of R codes, the parsed RTF table will looks something like Fig 2 below. We can now easily process this further, either with R or SAS and saved the result SAS a dataset. Note the chosen delimiter “@” character split the columns.

Fig 2

[@Pre-treatment(N = 18)@On-treatment(N = 18)@Post-treatment(N = 18)@
[A@
[All subject@17 (94.4) 175@7 (38.9) 9@18 (100.0) 773@
[  Upper Respiratory Tract Infection@1 (5.6) 1@0@13 (72.2) 48@
[  Gastroenteritis@2 (11.1) 2@0@11 (61.1) 18@
[  Urinary tract infection@2 (11.1) 2@0@10 (55.6) 16@
[  Nasopharyngitis@1 (5.6) 1@0@9 (50.0) 11@
[  Herpes zoster@1 (5.6) 1@0@9 (50.0) 16@
[  Bronchitis@3 (16.7) 3@1 (5.6) 1@9 (50.0) 16@
[  Hypersensitivity@5 (27.8) 6@1 (5.6) 1@9 (50.0) 18@
[  Drug hypersensitivity@0@0@8 (44.4) 9@
[  Anaemia@2 (11.1) 2@0@7 (38.9) 7@
[  Sepsis@0@0@7 (38.9) 8@
[  Vertico@0@0@7 (38.9) 14@
[  Dizziness@0@0@7 (38.9) 7@

With the delimiter “@” identifying the columns of the table, it is then just matter of looping through and separates each column into a variable. Fig 3 is the final result.

Fig 3


%callR

textreadr::read_rtf parsed an RTF table with ease; the next question is how do we wrap this inside SAS macro? You can use SAS IML for this or write your own SAS to R interface and there are a number of papers on this topic [4]. Here I introduce my own such interface; the %callR macro. The details of this is for a separate blog but below shows a complete codes of the %RTFParser macro making use of %callR.

%macro RTFParser(ifile =          /* input RTF file */
                ,ofile = ofile    /* output dataset */
                ,delim = @        /* delimiter      */
                );

  *** calling SAS to R interface (%callR) ***;
  %callR(
  
    library(textreadr);
    library(dplyr);
    library(haven);
    
    read_rtf_df <- textreadr::read_rtf("&ifile"
                                       ,skip         = 0
                                       ,remove.empty = TRUE
                                       ,trim         = TRUE
                                       ,row_start    = "["
                                       ,row_end      = ""
                                       ,cell_end     = "&delim"
                                       );
    
    read_rtf_df <- as.data.frame(read_rtf_df);
    names(read_rtf_df) <- "ALLCOLS";
    RTFParser_dset <- mutate_if(read_rtf_df, is.factor, as.character);
    
    ,rconsoleto  = PGM
    ,srconsoleto = rresult1.rlg
    ,df2sas7bdat = RTFParser_dset
  );
  
  proc sql noprint;
  select max(countw(ALLCOLS, "&delim") - 1) into: maxcol
  from RTFParser_dset
  ;
  quit;
  %let maxcol = %sysfunc(compress(&maxcol));
  
  data &ofile;
    set RTFParser_dset;
    array cols $200 c1-c&maxcol;
    do c = 1 to &maxcol;
      cols(c) = scan(ALLCOLS, c, "&delim");
    end;
    c1 = substr(c1, 2);
    if substr(c1,1,1) = '??' then c1 = ' ';
    keep c1-c&maxcol;
  run;

%mend;

/*
%RTFParser(ifile = mytable.rtf
          ,odset = odset 
          ,delim = @
          );

proc print data=odset;
run;
*/


Conclusion

Parsing an RTF table is a ‘challenge’ but with some creativeness and the ‘right’ tools, a simple and elegant solution is possible. The versatility of this solution however depends on quality of the underlying parser. In this case textreadr::read_rtf.


Reference

[1] Igor Goldfarb: PharmaSUG 2020 - Paper AD-106Macro to Produce SAS??-Readable Table of Content from TLF Shell
https://www.pharmasug.org/proceedings/2020/AD/PharmaSUG-2020-AD-106.pdf

[2] Neeral Beladia: PharmaSUG 2010 - Paper TT13 %rtf2data : A utility macro to convert RTF Table to SAS?? dataset
https://www.lexjansen.com/pharmasug/2010/TT/TT13.pdf

[3] Duong Tran: PhUSE 2008 - %RTFparser
https://www.lexjansen.com/phuse/2008/po/PO03.pdf

[4] Duong Tran: rpubs.com Dec 2020 - Calling R from Windows SAS
https://rpubs.com/dtran01/blog4

Contact

Email: