Is your feature request related to a problem or challenge?
The current filter pushdown APIs in DataFusion (FilterPushdownPropagation, PredicateSupports, etc.) have grown organically but now appear convoluted and redundant. The complex layering of abstractions makes the filter pushdown mechanism difficult to understand, maintain, and extend.
Specific issues include:
Complex mental model requiring developers to track multiple states and transformations
Lack of clear documentation about the conceptual model and flow
Inconsistent naming conventions (e.g., all_supported creates new objects while make_supported transforms existing ones)
These issues increase the learning curve for new contributors and make maintenance more challenging for all developers.
Describe the solution you'd like
Redesign the filter pushdown APIs with a focus on simplicity, consistency, and clarity:
Reduce abstraction layers: Consolidate the multiple wrappers into fewer, more focused data structures.
Consistent API patterns: Use clear naming conventions:
with_* for non-mutating methods that return new objects
mark_* for transformations
collect_* for extraction methods
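As a rough illustration of the three prefixes above, here is a minimal sketch. Everything in it is hypothetical: `Expr` is a string stand-in for `Arc<dyn PhysicalExpr>`, and the method names `with_predicate`, `mark_all_supported`, and `collect_supported` are invented for this example rather than taken from DataFusion.

```rust
use std::sync::Arc;

// Assumption: a string stand-in for `Arc<dyn PhysicalExpr>` so the sketch compiles on its own.
type Expr = Arc<str>;

#[derive(Debug, Clone, PartialEq)]
enum PredicateWithSupport {
    Supported(Expr),
    Unsupported(Expr),
}

#[derive(Debug, Clone, Default, PartialEq)]
struct Predicates {
    inner: Vec<PredicateWithSupport>,
}

impl Predicates {
    /// `with_*`: leaves `self` untouched and returns a new collection.
    fn with_predicate(&self, predicate: PredicateWithSupport) -> Self {
        let mut inner = self.inner.clone();
        inner.push(predicate);
        Self { inner }
    }

    /// `mark_*`: transforms the collection, here by marking every predicate as supported.
    fn mark_all_supported(self) -> Self {
        let inner = self
            .inner
            .into_iter()
            .map(|p| match p {
                PredicateWithSupport::Supported(e) | PredicateWithSupport::Unsupported(e) => {
                    PredicateWithSupport::Supported(e)
                }
            })
            .collect();
        Self { inner }
    }

    /// `collect_*`: extracts the supported expressions without changing the collection.
    fn collect_supported(&self) -> Vec<Expr> {
        self.inner
            .iter()
            .filter_map(|p| match p {
                PredicateWithSupport::Supported(e) => Some(Arc::clone(e)),
                PredicateWithSupport::Unsupported(_) => None,
            })
            .collect()
    }
}
```

With this split, the prefix alone tells a reader whether a call copies, transforms, or only reads the collection.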
Simplified core data structures:
```rust
/// A predicate with its support status for pushdown
enum PredicateWithSupport {
    Supported(Arc<dyn PhysicalExpr>),
    Unsupported(Arc<dyn PhysicalExpr>),
}

/// Collection of predicates with clearly defined operations
struct Predicates {
    // Core operations that are intuitive to use
    // ...
}

/// Clear result type for pushdown operations
struct FilterPushdownResult<T> {
    pushed_predicates: Vec<Arc<dyn PhysicalExpr>>,
    retained_predicates: Vec<Arc<dyn PhysicalExpr>>,
    updated_plan: Option<T>,
}
```
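To make the intent of the result type more concrete, here is one possible way a list of tagged predicates could be turned into a `FilterPushdownResult`. This is only a sketch under assumptions: the `PhysicalExpr` trait below is a local placeholder for DataFusion's trait, `Column` is a made-up expression type, and the `from_predicates` constructor is hypothetical and not part of the proposal.

```rust
use std::sync::Arc;

// Assumption: a local placeholder for DataFusion's `PhysicalExpr` trait.
trait PhysicalExpr: std::fmt::Debug {}

// Hypothetical expression type used only to have something to push down.
#[derive(Debug)]
struct Column(&'static str);
impl PhysicalExpr for Column {}

#[derive(Debug, Clone)]
enum PredicateWithSupport {
    Supported(Arc<dyn PhysicalExpr>),
    Unsupported(Arc<dyn PhysicalExpr>),
}

#[derive(Debug)]
struct FilterPushdownResult<T> {
    pushed_predicates: Vec<Arc<dyn PhysicalExpr>>,
    retained_predicates: Vec<Arc<dyn PhysicalExpr>>,
    updated_plan: Option<T>,
}

impl<T> FilterPushdownResult<T> {
    /// Hypothetical constructor: supported predicates are pushed down,
    /// unsupported ones are retained for the parent to evaluate.
    fn from_predicates(predicates: Vec<PredicateWithSupport>, updated_plan: Option<T>) -> Self {
        let mut pushed_predicates = Vec::new();
        let mut retained_predicates = Vec::new();
        for predicate in predicates {
            match predicate {
                PredicateWithSupport::Supported(e) => pushed_predicates.push(e),
                PredicateWithSupport::Unsupported(e) => retained_predicates.push(e),
            }
        }
        Self {
            pushed_predicates,
            retained_predicates,
            updated_plan,
        }
    }
}
```

Keeping pushed and retained predicates as two plain vectors means a caller never has to inspect a support flag after the split.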
More declarative approach: Let execution plan nodes declare which predicates they support rather than relying on complex negotiation.
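One way to read "declarative" here is that each node answers a single yes/no question per predicate and the optimizer does the rest. The sketch below assumes that reading. The trait `DeclaresPushdownSupport`, the `ScanNode` type, the simplified `Predicate` struct, and `split_by_support` are all hypothetical names invented for illustration; none of them exist in DataFusion.

```rust
// Assumption: a predicate is reduced to the column it filters on;
// a real implementation would carry an `Arc<dyn PhysicalExpr>`.
#[derive(Debug, Clone, PartialEq)]
struct Predicate {
    column: String,
}

/// Hypothetical trait: a plan node declares, per predicate, whether it can evaluate it.
trait DeclaresPushdownSupport {
    fn supports_predicate(&self, predicate: &Predicate) -> bool;
}

/// Hypothetical scan node that can only filter on a fixed set of columns.
struct ScanNode {
    filterable_columns: Vec<String>,
}

impl DeclaresPushdownSupport for ScanNode {
    fn supports_predicate(&self, predicate: &Predicate) -> bool {
        self.filterable_columns.contains(&predicate.column)
    }
}

/// Optimizer side: split predicates into (pushed, retained) using only the node's declaration.
fn split_by_support(
    node: &dyn DeclaresPushdownSupport,
    predicates: Vec<Predicate>,
) -> (Vec<Predicate>, Vec<Predicate>) {
    predicates
        .into_iter()
        .partition(|p| node.supports_predicate(p))
}
```

In this shape the node holds no pushdown state of its own, which is the part of the current negotiation the proposal describes as hard to follow.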
Better documentation: Add clear documentation about the mental model, flow, and expected usage patterns.
Test coverage: Ensure robust test coverage for the new APIs to prevent regressions.
This redesign should aim to reduce cognitive load for developers while maintaining all current functionality. It should also make future extensions to the filter pushdown system more straightforward.
Describe alternatives you've considered
No response
Additional context
The current implementation reflects the complexity of the problem space, but I believe it could be made more approachable with a clearer design focused on the essential operations and better documentation of the conceptual model.
Clearly what we have now needs work but I think I'd like to defer cleaning this up until some other folks try to implement more things with these APIs (e.g. join filter pushdown) which will give us both:
More ideas / brains behind API design.
More use cases to validate that the design is correct.
@alamb does that sound reasonable? It means any changes are more of a breaking change but since these APIs are very new and complex I think it's reasonable to have a bit of churn before they stabilize.
In general I agree with the premise that making the filter pushdown APIs easier to use / understand would be very valuable to DataFusion -- the goals @kosiew describes all sound wonderful to me.
I think the key question is exactly what the new API would look like and how much API churn it would entail.
I do think getting more ideas would be valuable and I would enjoy reviewing proposals. As @adriangb mentions, one thing that might help drive / force these API changes is trying to add new features.