This repository was archived by the owner on Apr 10, 2024. It is now read-only.
"Predicate pushdown" in group-bys #7
Open
xref #15
I brought this up at SciPy 2015, but there's a significant performance win available in expressions like:
df[boolean_cond].groupby(grouping_exprs).agg(agg_expr)
If you do this currently, df[boolean_cond] first materializes a copy of the selected rows across every column of df,
even if the groupby only touches a small portion of the DataFrame. Ideally, we'd have:
df.groupby(grouping_exprs, where=boolean_cond).agg(...)
I put this as a design / pandas2 issue because the boolean bytes / bits will need to get pushed down into the various C-level groupby subroutines.
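To make the idea concrete, here is a rough sketch of what pushing the mask down could look like for a grouped sum. The where= keyword above is only a proposal, so the sketch does not call it. masked_group_sum is a hypothetical pure-Python stand-in for a C-level routine, and the column names and data are made up for illustration.

```python
import numpy as np
import pandas as pd


def masked_group_sum(codes, values, mask, ngroups):
    """Hypothetical kernel: per-group sum that skips rows where mask is False."""
    out = np.zeros(ngroups, dtype=np.float64)
    for i in range(len(codes)):
        if mask[i]:  # the predicate is checked inside the aggregation loop
            out[codes[i]] += values[i]
    return out


# Illustrative data; "other" stands in for columns the groupby never touches.
df = pd.DataFrame(
    {
        "key": ["a", "b", "a", "c", "b", "a"],
        "val": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        "other": np.arange(6),
    }
)
mask = (df["val"] > 1.5).to_numpy()

# Current approach: boolean indexing copies the selected rows of every column.
expected = df[mask].groupby("key")["val"].sum()

# Sketch: factorize the keys once, then hand the mask to the kernel.
codes, uniques = pd.factorize(df["key"], sort=True)
sums = masked_group_sum(codes, df["val"].to_numpy(), mask, len(uniques))
result = pd.Series(sums, index=uniques)
```

For this data the two approaches give the same per-group sums, but the sketch never builds a filtered DataFrame. One difference to keep in mind: a group whose rows are all masked out shows up with a sum of 0 in the sketch, whereas it would be absent from the filter-then-group result.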