Simplify Filter Pushdown APIs for Better Maintainability and Developer Experience #16188

Open
kosiew opened this issue May 26, 2025 · 3 comments
Labels
enhancement New feature or request

Comments

@kosiew
Contributor

kosiew commented May 26, 2025

Is your feature request related to a problem or challenge?

The current filter pushdown APIs in DataFusion (FilterPushdownPropagation, PredicateSupports, etc.) have grown organically but now appear convoluted and redundant. The complex layering of abstractions makes the filter pushdown mechanism difficult to understand, maintain, and extend.

Specific issues include:

  • Multiple overlapping abstraction layers (PredicateSupport, PredicateSupports, FilterDescription, etc.)
  • Redundant helper methods with inconsistent naming patterns (.unsupported(), .transparent(), .with_filters(), .with_updated_node(), .new_with_supported_check(), .collect_supported(), .is_all_supported(), etc.)
  • Complex mental model requiring developers to track multiple states and transformations
  • Lack of clear documentation about the conceptual model and flow
  • Inconsistent naming conventions (e.g., all_supported creates new objects while make_supported transforms existing ones)

These issues increase the learning curve for new contributors and make maintenance more challenging for all developers.

Describe the solution you'd like

Redesign the filter pushdown APIs with a focus on simplicity, consistency, and clarity:

  1. Reduce abstraction layers: Consolidate the multiple wrappers into fewer, more focused data structures.

  2. Consistent API patterns: Use clear naming conventions:

  • with_* for non-mutating methods that return new objects
  • mark_* for transformations
  • collect_* for extraction methods
  3. Simplified core data structures:

/// A predicate with its support status for pushdown
enum PredicateWithSupport {
    Supported(Arc<dyn PhysicalExpr>),
    Unsupported(Arc<dyn PhysicalExpr>),
}

/// Collection of predicates with clearly defined operations
struct Predicates {
    // Core operations that are intuitive to use
    // ...
}

/// Clear result type for pushdown operations
struct FilterPushdownResult<T> {
    pushed_predicates: Vec<Arc<dyn PhysicalExpr>>,
    retained_predicates: Vec<Arc<dyn PhysicalExpr>>,
    updated_plan: Option<T>,
}

  4. More declarative approach: Let execution plan nodes declare which predicates they support rather than relying on complex negotiation.

  5. Better documentation: Add clear documentation about the mental model, flow, and expected usage patterns.

  6. Test coverage: Ensure robust test coverage for the new APIs to prevent regressions.
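
To make the with_*, mark_*, and collect_* conventions concrete, here is a minimal sketch of how the proposed PredicateWithSupport enum and Predicates collection might fit together. This is only an illustration, not DataFusion's API: `ExprRef` is a stand-in for `Arc<dyn PhysicalExpr>` so the snippet compiles on its own, and the `items` field and the methods `with_predicate`, `mark_all_unsupported`, and `collect_supported` are hypothetical names chosen to match the conventions.

```rust
use std::sync::Arc;

/// Stand-in for `Arc<dyn PhysicalExpr>` (assumption, so the sketch needs no DataFusion).
type ExprRef = Arc<str>;

/// A predicate with its support status for pushdown (shape taken from the proposal).
#[derive(Debug, Clone, PartialEq)]
enum PredicateWithSupport {
    Supported(ExprRef),
    Unsupported(ExprRef),
}

/// Hypothetical collection type; the proposal leaves its contents open.
#[derive(Debug, Clone, Default, PartialEq)]
struct Predicates {
    items: Vec<PredicateWithSupport>,
}

impl Predicates {
    /// `with_*`: non-mutating, returns a new collection with one more predicate.
    fn with_predicate(&self, predicate: PredicateWithSupport) -> Self {
        let mut items = self.items.clone();
        items.push(predicate);
        Self { items }
    }

    /// `mark_*`: transformation, here marking every predicate as unsupported.
    fn mark_all_unsupported(self) -> Self {
        let items = self
            .items
            .into_iter()
            .map(|p| match p {
                PredicateWithSupport::Supported(e)
                | PredicateWithSupport::Unsupported(e) => PredicateWithSupport::Unsupported(e),
            })
            .collect();
        Self { items }
    }

    /// `collect_*`: extraction, returning only the supported expressions.
    fn collect_supported(&self) -> Vec<ExprRef> {
        self.items
            .iter()
            .filter_map(|p| match p {
                PredicateWithSupport::Supported(e) => Some(Arc::clone(e)),
                PredicateWithSupport::Unsupported(_) => None,
            })
            .collect()
    }
}
```

With this shape, the method prefix alone tells a caller whether a new object comes back (`with_`), whether existing predicates are transformed (`mark_`), or whether expressions are being pulled out (`collect_`).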

This redesign should aim to reduce cognitive load for developers while maintaining all current functionality. It should also make future extensions to the filter pushdown system more straightforward.
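
One possible reading of the "more declarative approach" is sketched below: each node answers a single yes/no question per predicate, and a generic driver splits the predicates into pushed and retained sets. Everything here is an assumption for illustration. `ExprRef` again stands in for `Arc<dyn PhysicalExpr>`, the `DeclaresPushdown` trait, the `push_down` function, and the `ToyScan` node are hypothetical, and the sketch leaves `updated_plan` as `None` rather than rebuilding a plan.

```rust
use std::sync::Arc;

/// Stand-in for `Arc<dyn PhysicalExpr>` (assumption, so the sketch needs no DataFusion).
type ExprRef = Arc<str>;

/// Result type from the proposal, using the stand-in expression type.
struct FilterPushdownResult<T> {
    pushed_predicates: Vec<ExprRef>,
    retained_predicates: Vec<ExprRef>,
    updated_plan: Option<T>,
}

/// Hypothetical trait: a node declares, per predicate, whether it can evaluate it.
trait DeclaresPushdown {
    fn supports_predicate(&self, predicate: &ExprRef) -> bool;
}

/// Generic driver: partitions predicates according to the node's declaration.
fn push_down<N: DeclaresPushdown>(
    node: &N,
    predicates: Vec<ExprRef>,
) -> FilterPushdownResult<N> {
    let (pushed, retained): (Vec<ExprRef>, Vec<ExprRef>) = predicates
        .into_iter()
        .partition(|p| node.supports_predicate(p));
    FilterPushdownResult {
        pushed_predicates: pushed,
        retained_predicates: retained,
        // This sketch does not rebuild the node, so no updated plan is returned.
        updated_plan: None,
    }
}

/// Toy scan node that accepts predicates whose text starts with one of its columns.
struct ToyScan {
    columns: Vec<&'static str>,
}

impl DeclaresPushdown for ToyScan {
    fn supports_predicate(&self, predicate: &ExprRef) -> bool {
        self.columns.iter().any(|c| predicate.starts_with(*c))
    }
}
```

The string-prefix check in `ToyScan` is a placeholder for whatever analysis a real node would run on the expression. The point is only that the node's contract shrinks to one method.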

Describe alternatives you've considered

No response

Additional context

The current implementation reflects the complexity of the problem space, but I believe it could be made more approachable with a clearer design focused on the essential operations and better documentation of the conceptual model.

@adriangb
Contributor

Thank you @kosiew.

Clearly what we have now needs work but I think I'd like to defer cleaning this up until some other folks try to implement more things with these APIs (e.g. join filter pushdown) which will give us both:

  1. More ideas / brains behind API design.
  2. More use cases to validate that the design is correct.

@alamb does that sound reasonable? It means any changes are more of a breaking change but since these APIs are very new and complex I think it's reasonable to have a bit of churn before they stabilize.

@alamb
Contributor

alamb commented May 28, 2025

Thank you for this ticket @kosiew and @adriangb

In general I agree with the premise that making the filter pushdown APIs easier to use / understand would be very valuable to DataFusion -- the goals @kosiew describes all sound wonderful to me

I think the key question would be exactly what the new API would look like and how much API churn it would entail

I do think getting more ideas would be valuable and I would enjoy reviewing proposals. As @adriangb mentions, one thing that might help drive / force these API changes is trying to add new features.

@xudong963
Member

I haven't had a chance yet to read through all of the filter pushdown APIs, but I will in about a week and then give some feedback.


4 participants