Google Summer of Code 2024 - pytourr
pytourr: interactive visualization interface for the tourr package
Background
The tourr R package enables users to explore the shapes of high-dimensional data to gain further insight into their data of interest. This is achieved by visualizing lower-dimensional projections of the data. As tours aren’t just single projections, but rather sequences of projections, an integral part of the tourr package focuses on animating tours. Allowing users to interact with the tours and direct them towards projections of particular interest considerably improves their usefulness. However, the animations currently implemented in the tourr package don’t offer interactivity.
Previous work
The galahr package was an attempt to add interactivity by means of the R packages shiny and plotly, but ultimately suffered from performance issues. To overcome these issues, the detourr package was developed; it precomputes multiple projections in R and subsequently visualizes them in an interactive display using JavaScript. It does add some interactivity, but that interactivity is limited to displaying the precomputed projections. The functionality to request the computation of new projections interactively is not implemented. Further attempts were made by the mmtour project, written in Mathematica, and the High-Dimensional-Data-Visualisation project, written in Python. The latter has shown that Python’s matplotlib offers the level of interactivity desired for tourr.
Project aim
The goal of this project was to develop a user-friendly and high-performance R package that simplifies interaction with tours generated by the tourr R package. This was accomplished by integrating the tourr functionality with a Python backend through reticulate. Using a Python backend enhances performance, particularly for interactive plotting, offering a more fluid experience for the end user.
Contribution
The pytourr package was developed from the ground up as part of this project. The complete code and documentation are available on GitHub.
Current state
As of the date of submission, the pytourr package offers:
- Fast-rendering interactive GUI
- Seven different interactive plot types
- Interactive feature selection
- Interactive subset selection and highlighting
- Navigation through tour projections
- Initiation of new tours directly from the GUI
- Saving of current projections, subselections, and plots
Future work
- Cleaning up the interactive elements in the left-hand panel
- Streamlining the codebase to support easier development of additional interactive plots by third parties
- Implementing new interactive plot types
- Submitting the package to CRAN
- Making the package accessible to Python users
- Publishing an article about the software in a scientific journal
Learnings
One of the key insights from the project was realizing how much blitting (redrawing only the parts of a plot that change, on top of a cached static background) can improve performance in interactive plots. Implementing blitting led to a roughly 3-fold performance increase for a smaller dataset (flea dataset; n=74, p=6) and a roughly 8-fold performance increase for a larger dataset (winterActiv dataset; n=2961, p=27), evaluated on an AMD Ryzen 7 4700U CPU.
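To illustrate the idea, here is a minimal sketch of blitting with matplotlib. It is not the pytourr implementation: the function names (setup_blit, blit_update, demo) are hypothetical, the data is random noise that only borrows the flea dataset's dimensions, and the random projections merely stand in for a tour. The matplotlib calls used (copy_from_bbox, restore_region, draw_artist, blit) are the standard building blocks for this technique.

```python
import numpy as np
import matplotlib.pyplot as plt


def setup_blit(fig, ax, points):
    """Draw the static parts once and cache them as a background image."""
    points.set_animated(True)  # exclude the points from regular full draws
    fig.canvas.draw()          # renders axes, ticks, and labels only
    return fig.canvas.copy_from_bbox(ax.bbox)


def blit_update(fig, ax, points, background, xy):
    """Redraw only the moving points on top of the cached background."""
    fig.canvas.restore_region(background)  # paste the static background
    points.set_offsets(xy)                 # move the points
    ax.draw_artist(points)                 # draw just this one artist
    fig.canvas.blit(ax.bbox)               # push the result to the screen


def demo(n_frames=20, n=74, p=6, seed=0):
    """Hypothetical driver: random data, random 2D projections per frame."""
    rng = np.random.default_rng(seed)
    data = rng.normal(size=(n, p))  # placeholder data, not the flea dataset

    fig, ax = plt.subplots()
    ax.set_xlim(-4, 4)
    ax.set_ylim(-4, 4)
    points = ax.scatter(data[:, 0], data[:, 1])
    background = setup_blit(fig, ax, points)

    for _frame in range(n_frames):
        # Orthonormal p x 2 basis obtained from a QR decomposition.
        basis, _r = np.linalg.qr(rng.normal(size=(p, 2)))
        blit_update(fig, ax, points, background, data @ basis)
    return fig, points
```

Calling demo() with an interactive backend updates the scatter plot without redrawing the axes on every frame. The cached background becomes stale when the figure is resized, so a real application would typically rebuild it in a draw_event callback.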
The simultaneous use of R and Python through reticulate (and rpy2) works seamlessly once you become familiar with it. Therefore, I highly recommend not limiting yourself to just one of these programming languages when starting a new project. Instead, take advantage of the strengths of both to optimize different aspects of your workflow!
Getting different objects to interact can be challenging. I recommend that all elements interacting with one another access centrally stored objects containing the necessary information, which should be referenced during updates. Any information relevant to only one element should be stored with that element. It’s crucial to carefully decide when to use copies of objects and when to rely on references (pointers). Misusing references can result in unintended overwriting of data that should be preserved, while overusing copies can lead to a loss of interactivity, as interactive elements may no longer refer to the same object for updates.
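The following sketch shows this pattern in plain Python. All names (SharedState, PlotElement, save_projection) are hypothetical and chosen for illustration only; they do not reflect the actual pytourr code. Elements share one state object by reference so that a selection made in one plot is visible in the others, while saved projections are stored as copies so that later updates cannot overwrite them.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class SharedState:
    """Central store that all interacting elements read from and write to."""
    selected: set = field(default_factory=set)
    saved_projections: list = field(default_factory=list)


class PlotElement:
    def __init__(self, name, state):
        self.name = name
        self.state = state            # reference to the shared object, not a copy
        self.title = f"{name} view"   # element-specific information stays local

    def select(self, indices):
        self.state.selected.update(indices)

    def highlighted(self):
        return sorted(self.state.selected)


def save_projection(state, projection):
    """Store a deep copy so later edits to `projection` do not alter the record."""
    state.saved_projections.append(copy.deepcopy(projection))


state = SharedState()
scatter = PlotElement("scatter", state)
histogram = PlotElement("histogram", state)

# A selection made in one element is visible in the other (shared reference).
scatter.select([3, 1])
linked = histogram.highlighted()      # [1, 3]

# A saved projection is preserved even if the live one keeps changing (copy).
projection = [[1.0, 0.0], [0.0, 1.0]]
save_projection(state, projection)
projection[0][0] = 99.0

# An element given a copy of the state no longer receives updates.
detached = PlotElement("detached", copy.deepcopy(state))
scatter.select([7])
stale = detached.highlighted()        # still [1, 3]
```

The detached element demonstrates the failure mode described above: because it holds a copy, it silently stops tracking selections made elsewhere.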
Acknowledgments
I would like to thank my mentors Ursula Laa and Dianne Cook for their supervision and invaluable input throughout this project.
Additionally, I extend my gratitude to Google for sponsoring this work.