Skip to content

fix: Fix EquivalenceClass calculation for Union queries #16185

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 11 commits into from

Conversation

chenkovsky
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

equivalence is not set.

What changes are included in this PR?

compute intersect of equivalence for union

Are these changes tested?

UT

Are there any user-facing changes?

No

@github-actions github-actions bot added physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt) labels May 25, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @chenkovsky

FYI @mustafasrepo I wonder if you have a few minutes to review this PR

@@ -422,6 +423,32 @@ impl EquivalenceGroup {
self.bridge_classes()
}

#[allow(clippy::type_complexity)]
fn adjacent_set(&self) -> HashSet<(&Arc<dyn PhysicalExpr>, &Arc<dyn PhysicalExpr>)> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please add documentation to these functions about what they do and what the returned structure represents?

set
}

pub fn intersect(&self, other: &Self) -> Self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise here, can we please add documentation

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @chenkovsky -- I worry about the expoenential time of this algorithm

/// two equivalence groups.
#[allow(clippy::type_complexity)]
fn adjacent_set(&self) -> HashSet<(&Arc<dyn PhysicalExpr>, &Arc<dyn PhysicalExpr>)> {
let mut set = HashSet::new();
Copy link
Contributor

@alamb alamb May 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this algorithm is O(N^2) in the number of items in the equivalence class, right?

We have hit other really long planning times when using such algorithms, for example

Is there some way to make this more efficient? Maybe by creating an 2x boolean array to represent the intersections?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb now time complexity is O(nlogn)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb , I created another PR. #16211 to ease the symptoms in #13748

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Equivalence System Overhaul is better way, so I closed that PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GTHanks @chenkovsky -- I am sorry I didn't have enough time to review this PR in more detail

@alamb alamb changed the title fix: equivalence for union fix: Fix EquivalenceClass calculation for Union queries May 28, 2025
@alamb
Copy link
Contributor

alamb commented May 31, 2025

@chenkovsky chenkovsky closed this Jun 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UNION ALL with ORDER BY and duplicate projection fails sanity checker
2 participants