Background

The R Journal is the open access, refereed journal of the R Project for Statistical Computing. Since it began in 2009, articles have been produced as PDF, using LaTeX to convert marked-up plain text into the final version with formatted text, tables, figures, etc. Over the past few years, the R Journal has been developing a new website, with the option to publish articles in HTML format (https://rjournal.github.io/). This means that authors can include interactive graphics and animations in their articles. The HTML format is also easier to browse online and more accessible for users of assistive tools and technologies such as screen readers. However, most published articles are only available as PDF. My project was to work on converting published articles to HTML.

The HTML format has numerous benefits:

  1. Articles can include interactive graphics and tables.
  2. The format works better with screen readers and other assistive technologies, making the work more accessible to vision-impaired researchers.
  3. HTML provides a more comfortable reading experience on mobile devices, which are increasingly used as researchers work on the move and share work via social media.
  4. Search engines can easily access the full text of articles, facilitating discovery of published articles.

As a Google Summer of Code (GSoC) contributor, I was able to understand the problem and was confident enough to implement a solution for it.

Work done in the past

A major part of the project was completed during GSoC 2022, when I developed two packages, texor and rebib, which provide tools to perform the conversions quickly and in a semi-automated manner.

Key highlights of the packages:

  1. Bibliography conversion and aggregation (rebib)
  2. Article pre-processing in R to modify the LaTeX source
  3. Pandoc Lua filters to support various features
  4. Automation and workflow tools to convert the articles
  5. Support for LaTeX-specific environments such as TikZ and algorithm2e
  6. Extraction and typesetting of metadata in R Markdown (inspired by the rj package)

Present status of the project

The key goals for my Google Summer of Code project in 2023 were:

  1. Submit texor and rebib to CRAN
    Part of the proposal was to prepare the existing packages for release on CRAN. It took a few attempts, but at last both packages went live on CRAN. I would especially like to thank the CRAN team for reviewing my packages and pointing out their shortcomings.

  2. Convert all R Journal articles from LaTeX to R Markdown
    A major task was to convert all the legacy LaTeX articles on the R Journal website to R Markdown. All the articles are expected to be converted by around the end of the GSoC period, and will be deployed to the live website by the end of this year or early next year.

  3. Collaborate on writing an R Journal paper
    To spread the word about the article conversion and introduce the new packages to authors, I have been collaborating on an R Journal paper about the packages. It comes with some fun supplementary materials that let readers try out converting legacy articles for themselves. The paper is expected to be published sometime in 2024.

  4. Make improvements to the packages
    The packages received many improvements, in particular to adapt to changes in Pandoc 3.0, which introduced a new Figure element. These conversions are possible thanks to the excellent work of the Pandoc authors, who are constantly improving the software (Pandoc is an essential dependency of the texor package).

A summarized change-log of the package:

Future goals

I intend to continue maintaining the packages and improving their functionality. I have long considered writing custom Pandoc readers and writers to provide a more elegant solution for the conversions. However, given my current experience and skills, this remains something to explore in the future.

Future updates could also make the packages more user-friendly and add a better warning and alert system.

Closing thoughts

As a fitting close to this project, the R Journal article should be published sometime in 2024. It will spread the word about the conversion of legacy articles and encourage authors to transition to the R Markdown format.

The project will remain relevant for any author who wishes to use it to convert new or in-progress articles and transition to the R Markdown format.

While the software is not perfect, it does a great job of converting articles within certain limits. I am very happy with the current state of the packages and the web articles they generate.

I am thankful to all my mentors and contributors who were part of these two GSoC projects and the R Journal article:

  1. Dianne Cook
  2. Mitch O’Hara-Wild
  3. Heather Turner
  4. Christophe Dervieux

I am also thankful to Google for creating such an opportunity and funding me through the Google Summer of Code program. It was a beautiful experience!