This repository was archived by the owner on Apr 10, 2024. It is now read-only.

"Predicate pushdown" in group-bys #7

Open
@wesm

xref #15

I brought this up at SciPy 2015, but there's a significant performance win available in expressions like:

df[boolean_cond].groupby(grouping_exprs).agg(agg_expr)

If you do this currently, df[boolean_cond] produces a fully materialized copy of df, with every column, even if the groupby only touches a small portion of the DataFrame. Ideally, we'd have:

df.groupby(grouping_exprs, where=boolean_cond).agg(...)
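The proposed `where=` keyword is not an existing pandas argument. As a rough sketch of the intended semantics, `groupby_sum_where` below is a hypothetical helper that covers a single sum aggregation: it factorizes the grouping column over the full frame and zeroes out deselected rows in the `np.bincount` weights, so no filtered copy of df is built. It assumes the mask is positionally aligned with df and that the key column has no missing values (`pd.factorize` would assign those code -1, which `np.bincount` rejects).

```python
import numpy as np
import pandas as pd


def groupby_sum_where(df, by, col, where):
    """Hypothetical emulation of the proposed df.groupby(by, where=where)[col].sum().

    Groups are computed over all rows of df; rows where `where` is False are
    excluded from the aggregation instead of being filtered out beforehand.
    """
    # Integer group codes for every row of the full frame (sorted like groupby).
    codes, uniques = pd.factorize(df[by].to_numpy(), sort=True)
    mask = np.asarray(where, dtype=bool)
    values = df[col].to_numpy(dtype=float)
    # Deselected rows contribute 0.0 to their group's sum. This allocates one
    # temporary column, not a filtered copy of the whole frame.
    weights = np.where(mask, values, 0.0)
    sums = np.bincount(codes, weights=weights, minlength=len(uniques))
    # Keep only groups with at least one selected row, matching
    # df[where].groupby(by)[col].sum().
    counts = np.bincount(codes, weights=mask.astype(float), minlength=len(uniques))
    present = counts > 0
    return pd.Series(sums[present], index=pd.Index(uniques[present], name=by), name=col)
```

Only `sum` is covered here; mean, count, min/max and the rest would each need their own masked kernel, which is why the mask has to reach the C-level groupby routines rather than being emulated on top of them.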

I filed this as a design / pandas2 issue because the boolean mask (as bytes or as bits) will need to be pushed down into the various C-level groupby subroutines.
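To illustrate what "pushing the mask down" means for the two mask layouts, the loops below are pure-Python stand-ins for the inner loop of a C-level group-sum subroutine. The function names are hypothetical, not pandas internals. One takes a byte mask (one bool per row); the other takes a bitmask packed least-significant-bit first, as produced by `np.packbits(mask, bitorder="little")`.

```python
import numpy as np


def group_sum_bytemask(codes, values, mask, ngroups):
    """Sum values per group code, skipping rows whose mask byte is False/0."""
    out = np.zeros(ngroups)
    for i in range(len(codes)):
        if mask[i]:  # one byte per row
            out[codes[i]] += values[i]
    return out


def group_sum_bitmask(codes, values, bits, ngroups):
    """Same kernel, but row i's flag is bit (i % 8) of byte bits[i // 8]."""
    out = np.zeros(ngroups)
    for i in range(len(codes)):
        if (int(bits[i >> 3]) >> (i & 7)) & 1:  # LSB-first packed bits
            out[codes[i]] += values[i]
    return out
```

The bitmask takes an eighth of the memory of a byte mask at the cost of a shift and an AND per row; either way the filter is applied inside the aggregation loop, so no filtered copy of the input is ever built.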
