{tidypolars}

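Tidypolars is a data frame library built on top of the fast polars library that provides a tidyverse-style interface. The name is shared by two separate projects: a Python package (markfairbanks/tidypolars, published on PyPI) and an R package (by Etienne Bacher, distributed through R-universe).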

Use Cases

Here are some common use cases:

  1. Data Manipulation:
    • Tidypolars provides a tidyverse-like interface for data manipulation tasks. You can filter rows, select columns, arrange data, and create new variables using familiar syntax.
    • Example: Filtering rows based on conditions, creating new columns, and summarizing data.
  2. Performance Optimization:
    • Polars, the underlying library, is designed for speed and efficiency. Tidypolars leverages this performance to handle large datasets efficiently.
    • Use it when you need to process large data frames quickly.
  3. Joining Data Frames:
    • Tidypolars supports various join operations (inner, outer, left, right) to combine data frames.
    • Example: Merging two data frames based on common keys.
  4. Aggregations and Grouping:
    • You can group data by one or more columns and perform aggregations (sum, mean, count, etc.) within each group.
    • Useful for summarizing data at different levels.
  5. Window Functions:
    • Tidypolars allows you to apply window functions (calculations over a group or a sliding set of rows that keep one result per row) to data frames.
    • Example: Calculating moving averages, cumulative sums, or ranking within partitions.
  6. Efficient Data Processing:
    • If you work with large datasets and need performance gains, tidypolars is a great choice.
    • It’s especially useful when you want to maintain a tidy data workflow.

Installation

Install the Python package from PyPI with pip install tidypolars. The R package is installed from R-universe (https://etiennebacher.r-universe.dev).


Let’s explore the differences between polars and dplyr:

  1. Column Referencing:
    • dplyr allows column references without quotation marks due to non-standard evaluation (NSE). It captures expressions passed as arguments, making the syntax more user-friendly.
    • In polars, columns are referenced explicitly, either as quoted names or through expression constructors such as polars.col() (commonly written pl.col("x") in Python).
  2. Performance:
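    • polars is heavily optimized for performance. Users can expect substantial speed-ups over dplyr on large datasets (roughly 500MB and up), in some cases by an order of magnitude or more, although the gain depends on the workload.
    • In lazy mode, polars runs a query optimizer before execution (for example, pushing filters and column selections closer to the data source), which can further boost performance for complex queries.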
  3. Function Names:
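    • Some verbs share a name in both packages (e.g., filter() and select()), while others differ: dplyr’s mutate(), arrange(), and summarize() correspond roughly to polars’ with_columns(), sort(), and group_by().agg(). polars uses snake case method names throughout.
    • dplyr verbs rely on NSE, while polars methods use standard evaluation on explicit expression objects.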

Remember that polars is relatively new in the Python world, and it’s worth exploring its capabilities!


The Python tidypolars package lets you work with data frames (called Tibbles) through methods named after tidyverse verbs. For example:

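import tidypolars as tp
from tidypolars import col, desc

# Build a small Tibble from keyword arguments, one per column.
df = tp.Tibble(x=range(3), y=range(3, 6), z=['a', 'a', 'b'])
result = (df
    .select('x', 'y', 'z')               # keep these columns
    .filter(col('x') < 4, col('y') > 1)  # both conditions must hold
    .arrange(desc('z'), 'x')             # sort by z descending, then x ascending
    .mutate(double_x=col('x') * 2, x_plus_y=col('x') + col('y'))  # add two columns
)

# Resulting data frame (a Tibble has no row index):
#   x  y  z  double_x  x_plus_y
#   2  5  b         4         7
#   0  3  a         0         3
#   1  4  a         2         5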

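In the Python package, column names must be wrapped in col() when they are used in expressions, most commonly inside .filter(), .mutate(), and .summarize(); methods such as .select() and .arrange() also accept plain strings, as in the example above. Grouping is handled by passing a column name, or a list of names, to the by argument of these methods¹. The R package can be installed from R-universe on Windows, macOS, or Linux².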
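
Why col() is needed in Python can be illustrated with a toy model. Python evaluates arguments before a method is called, so a bare x < 4 would be computed (or fail) immediately; col('x') instead returns an object that records the operation and is evaluated later against each row. The sketch below uses only the standard library. It is not how tidypolars or polars implement expressions, and Expr, filter_rows, and mutate are hypothetical stand-ins written for this example.

```python
import operator


class Expr:
    """Toy deferred expression: wraps a function of a row, evaluated later."""

    def __init__(self, fn):
        self.fn = fn

    def _binary(self, other, op):
        # Plain values (e.g. 4) are lifted into constant functions of the row.
        other_fn = other.fn if isinstance(other, Expr) else (lambda row: other)
        return Expr(lambda row: op(self.fn(row), other_fn(row)))

    def __add__(self, other):
        return self._binary(other, operator.add)

    def __mul__(self, other):
        return self._binary(other, operator.mul)

    def __lt__(self, other):
        return self._binary(other, operator.lt)

    def __gt__(self, other):
        return self._binary(other, operator.gt)


def col(name):
    """Return an expression that looks up `name` in whichever row it is given."""
    return Expr(lambda row: row[name])


def filter_rows(rows, *predicates):
    """Keep the rows for which every predicate expression is true."""
    return [r for r in rows if all(p.fn(r) for p in predicates)]


def mutate(rows, **new_cols):
    """Return new rows with one extra key per named expression."""
    return [{**r, **{k: e.fn(r) for k, e in new_cols.items()}} for r in rows]


rows = [{"x": 0, "y": 3}, {"x": 1, "y": 4}, {"x": 2, "y": 5}]
kept = filter_rows(rows, col("x") < 2, col("y") > 3)
out = mutate(kept, x_plus_y=col("x") + col("y"))
print(out)  # [{'x': 1, 'y': 4, 'x_plus_y': 5}]
```

In R, dplyr can skip this step because NSE lets a function capture the unevaluated expression; Python has no equivalent, so the deferral has to be spelled out with col().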


References

  1. GitHub - markfairbanks/tidypolars: Tidy interface to polars. https://github.com/markfairbanks/tidypolars.
  2. Get the Power of Polars with the Syntax of the Tidyverse • tidypolars. https://www.tidypolars.etiennebacher.com/.
  3. tidypolars · PyPI. https://pypi.org/project/tidypolars/.
  4. etiennebacher R-universe repository. https://etiennebacher.r-universe.dev.
  5. tidypolars README (markfairbanks/tidypolars, commit dd839b8). https://github.com/markfairbanks/tidypolars/tree/dd839b890a8c9daee54efa4eaaf3dc766c07d4a0/README.md.

References

  1. Tidy Data Manipulation: dplyr vs polars – Tidily. https://blog.tidy-intelligence.com/posts/dplyr-vs-polars/.
  2. An Introduction to Polars from R. https://cran.r-universe.dev/polars/doc/polars.html.
  3. polars’ Rgonomic Patterns | Emily Riederer. https://www.emilyriederer.com/post/py-rgo-polars/.
  4. difference between plyr::mutate and dplyr::mutate - Stack Overflow. https://stackoverflow.com/questions/28812512/difference-between-plyrmutate-and-dplyrmutate.

Edit Notes

  • Citrix Build on 25th July 2024
  • UX improvements on 22nd August 2024 (Citrix)