-
Notifications
You must be signed in to change notification settings - Fork 13.4k
Maximum number of nodes exceeded in constant #120173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Do you still have the code somewhere for that const fn? Even for 10k bytes it shouldn't take overly long |
I tried to minimise the error and got down to the following playground: It seems that just having a const byte array of a large size is enough to trigger the error - it seems that every element of the array is counted as a node? Amusingly, changing the length to a slightly smaller value makes the compiler on the playground crash, and changing it to a small value makes everything compile without any errors or warnings about comparing byte slices. |
Yes - here's the complete module that does the equality check. One single comparison probably only takes on the order of a minute, but I am somehow crazy enough to do that 48 times for type checking ... so even small changes drastically change my compilation time. use const_type_layout::{serialise_type_graph, serialised_type_graph_len, TypeGraphLayout};
#[derive(PartialEq, Eq, core::marker::ConstParamTy)]
pub enum HostAndDeviceKernelSignatureTypeLayout {
Match,
Mismatch,
}
pub struct Assert<const MATCH: HostAndDeviceKernelSignatureTypeLayout>;
#[must_use]
pub const fn check<T: TypeGraphLayout>(
device: &'static [u8],
) -> HostAndDeviceKernelSignatureTypeLayout
where
[u8; serialised_type_graph_len::<T>()]:,
{
const SIGNATURE_ERROR_MESSAGE: &[u8] = b"ERROR in this PTX compilation";
// Short-circuit to avoid extra errors when PTX compilation fails
if equals(device, SIGNATURE_ERROR_MESSAGE) {
return HostAndDeviceKernelSignatureTypeLayout::Match;
}
let host = serialise_type_graph::<T>();
if equals(device, &host) {
HostAndDeviceKernelSignatureTypeLayout::Match
} else {
HostAndDeviceKernelSignatureTypeLayout::Mismatch
}
}
const fn equals(device: &[u8], host: &[u8]) -> bool {
if host.len() != device.len() {
return false;
}
let mut i = 0;
while i < host.len() {
if host[i] != device[i] {
return false;
}
i += 1;
}
true
} For context, this code is run to check that my type layout description, serialised into a byte array, is the same when compiling a CUDA kernel as when compiling the host code. |
From what I can guesstimate right now, constants can be quite large if I mainly materialise them inside const fns that just take a long while to run. Forcing the compiler to do a byte array comparison for me, e.g. by using the const generic trait trick or using match against a literal seems to run a lot faster but runs against the valtree size limit. |
How large is the specific device string that you are testing here? |
Yea i expect our limit to be too large. We should probably try to lower it. The issue is that you'll end up with symbol names that contain all bytes of the constant, encoded in an inefficient way. While we could probably special case byte strings (I think we special case strings already), the root issue persists: LLVM and the linker do not like large symbol names Though in this specific case it looks like a compiler panic or potentially a stack overflow |
Hmmm, why is the const given a symbol name that includes all bytes? In my use case, all materialisations of the long byte strings happen either as variables inside const fns or parameters to const fns that feed into const generic params (as in I tested a few more variations of the issue. If I use match serialise_type_graph::<MyVeryComplexType>() {
b"my-very-long-byte-string" => ...,
_ => ...,
} inside a const, cargo check inside CI crashes very soon after generating all byte strings inside a proc macro. If I instead use So perhaps I am indeed reaching some memory limit, just at different speeds with different const bytestring comparison codepaths. |
yea in this case that won't happen, only if you end up with the constant in a signature that actually goes to codegen. We don't want to have code that is only allowed in const eval, but not outside of it, so we use the same limits for it. If you ran the const's body in due to #119214 runtime code is in fact generated for your constants, but that's a bug obviously |
I did some debugging: Looks like we're recursing very very deeply in ┐rustc_pattern_analysis::usefulness::compute_match_usefulness scrut_ty=&ReErased [u8], scrut_validity=ValidOnly
├─┐rustc_pattern_analysis::usefulness::compute_exhaustiveness_and_usefulness matrix=
│ + &[0..1, 0..1, 0..1, 0..1, 0..1, 0..1, 0..1, 0..1, 0..1] +
│ + _ : &ReErased [u8] +
│ | ✓ | // column validity
│
│ ├─ 0ms DEBUG rustc_pattern_analysis::usefulness ty: &ReErased [u8]
│ ├─┐rustc_pattern_analysis::rustc::ctors_for_ty ty=&ReErased [u8]
│ │ ├─ 0ms TRACE rustc_middle::ty::visit has_type_flags, self=&ReErased [u8], flags=TypeFlags(HAS_ERROR), res=false
│ │ ├─ 0ms DEBUG rustc_pattern_analysis::rustc return=Ok(Ref)
│ ├─┘
│ ├─┐rustc_pattern_analysis::constructor::split
│ │ ├─ 0ms DEBUG rustc_pattern_analysis::constructor return=SplitConstructorSet { present: [Ref], missing: [], missing_empty: [] }
│ ├─┘
│ ├─ 0ms DEBUG rustc_pattern_analysis::usefulness specialize(Ref)
│ ├─┐rustc_pattern_analysis::rustc::ctor_sub_tys ctor=Ref, ty=&ReErased [u8]
│ ├─┘
│ ├─┐rustc_pattern_analysis::usefulness::compute_exhaustiveness_and_usefulness matrix=
│ │ + [0..1, 0..1, 0..1, 0..1, 0..1, 0..1, 0..1, 0..1, 0..1] +
│ │ + _ +
│ │ | ? | // column validity
│ │
│ │ ├─ 0ms DEBUG rustc_pattern_analysis::usefulness ty: [u8]
│ │ ├─┐rustc_pattern_analysis::rustc::ctors_for_ty ty=[u8]
│ │ │ ├─ 0ms TRACE rustc_middle::ty::visit has_type_flags, self=[u8], flags=TypeFlags(HAS_ERROR), res=false
│ │ │ ├─┐rustc_middle::ty::inhabitedness::inhabited_predicate self=u8
│ │ │ │ ├─ 0ms TRACE rustc_middle::ty::visit has_type_flags, self=u8, flags=TypeFlags(HAS_TY_INFER | HAS_RE_INFER | HAS_CT_INFER), res=false
│ │ │ │ ├─ 0ms DEBUG rustc_middle::ty::inhabitedness return=True
│ │ │ ├─┘
│ │ │ ├─┐rustc_middle::ty::inhabitedness::inhabited_predicate::apply_inner self=True, eval_stack=[]
│ │ │ │ ├─ 0ms DEBUG rustc_middle::ty::inhabitedness::inhabited_predicate return=Ok(true)
│ │ │ ├─┘
│ │ │ ├─ 0ms DEBUG rustc_pattern_analysis::rustc return=Ok(Slice { array_len: None, subtype_is_empty: false })
│ │ ├─┘
│ │ ├─┐rustc_pattern_analysis::constructor::split
│ │ │ ├─ 0ms DEBUG rustc_pattern_analysis::constructor return=SplitConstructorSet { present: [Slice(Slice { array_len: None, kind: FixedLen(9) })], missing: [Slice(Slice { array_len: None, kind: FixedLen(0) }), Slice(Slice { array_len: None, kind: FixedLen(1) }), Slice(Slice { array_len: None, kind: FixedLen(2) }), Slice(Slice { array_len: None, kind: FixedLen(3) }), Slice(Slice { array_len: None, kind: FixedLen(4) }), Slice(Slice { array_len: None, kind: FixedLen(5) }), Slice(Slice { array_len: None, kind: FixedLen(6) }), Slice(Slice { array_len: None, kind: FixedLen(7) }), Slice(Slice { array_len: None, kind: FixedLen(8) }), Slice(Slice { array_len: None, kind: VarLen(10, 0) })], missing_empty: [] }
│ │ ├─┘
│ │ ├─ 0ms DEBUG rustc_pattern_analysis::usefulness specialize(Slice(Slice { array_len: None, kind: FixedLen(9) }))
│ │ ├─┐rustc_pattern_analysis::rustc::ctor_sub_tys ctor=Slice(Slice { array_len: None, kind: FixedLen(9) }), ty=[u8]
│ │ ├─┘
│ │ ├─┐rustc_pattern_analysis::usefulness::compute_exhaustiveness_and_usefulness matrix=
│ │ │ + 0..1 + 0..1 + 0..1 + 0..1 + 0..1 + 0..1 + 0..1 + 0..1 + 0..1 +
│ │ │ + _ + _ + _ + _ + _ + _ + _ + _ + _ +
│ │ │ | ? | ? | ? | ? | ? | ? | ? | ? | ? | // column validity
│ │ │
│ │ │ ├─ 0ms DEBUG rustc_pattern_analysis::usefulness ty: u8
│ │ │ ├─┐rustc_pattern_analysis::rustc::ctors_for_ty ty=u8
│ │ │ │ ├─ 0ms TRACE rustc_middle::ty::visit has_type_flags, self=u8, flags=TypeFlags(HAS_ERROR), res=false
│ │ │ │ ├─ 0ms DEBUG rustc_pattern_analysis::rustc return=Ok(Integers { range_1: 0..256, range_2: None })
│ │ │ ├─┘
│ │ │ ├─┐rustc_pattern_analysis::constructor::split
│ │ │ │ ├─ 0ms DEBUG rustc_pattern_analysis::constructor return=SplitConstructorSet { present: [IntRange(0..1)], missing: [IntRange(1..256)], missing_empty: [] }
│ │ │ ├─┘
│ │ │ ├─ 0ms DEBUG rustc_pattern_analysis::usefulness specialize(IntRange(0..1))
│ │ │ ├─┐rustc_pattern_analysis::rustc::ctor_sub_tys ctor=IntRange(0..1), ty=u8
│ │ │ ├─┘
│ │ │ ├─┐rustc_pattern_analysis::usefulness::compute_exhaustiveness_and_usefulness matrix=
│ │ │ │ + 0..1 + 0..1 + 0..1 + 0..1 + 0..1 + 0..1 + 0..1 + 0..1 +
│ │ │ │ + _ + _ + _ + _ + _ + _ + _ + _ +
│ │ │ │ | ? | ? | ? | ? | ? | ? | ? | ? | // column validity
│ │ │ │
│ │ │ │ ├─ 0ms DEBUG rustc_pattern_analysis::usefulness ty: u8
│ │ │ │ ├─┐rustc_pattern_analysis::rustc::ctors_for_ty ty=u8
│ │ │ │ │ ├─ 0ms TRACE rustc_middle::ty::visit has_type_flags, self=u8, flags=TypeFlags(HAS_ERROR), res=false
│ │ │ │ │ ├─ 0ms DEBUG rustc_pattern_analysis::rustc return=Ok(Integers { range_1: 0..256, range_2: None })
│ │ │ │ ├─┘
│ │ │ │ ├─┐rustc_pattern_analysis::constructor::split
│ │ │ │ │ ├─ 0ms DEBUG rustc_pattern_analysis::constructor return=SplitConstructorSet { present: [IntRange(0..1)], missing: [IntRange(1..256)], missing_empty: [] }
│ │ │ │ ├─┘
│ │ │ │ ├─ 0ms DEBUG rustc_pattern_analysis::usefulness specialize(IntRange(0..1))
│ │ │ │ ├─┐rustc_pattern_analysis::rustc::ctor_sub_tys ctor=IntRange(0..1), ty=u8
│ │ │ │ ├─┘
│ │ │ │ ├─┐rustc_pattern_analysis::usefulness::compute_exhaustiveness_and_usefulness matrix= |
cc @Nadrieril do you have some idea why ctor specialization would give us a single case per element in a slice? Each recursion shrinks the passed matrix by one element. |
I presume this is the match-checking of the |
Note that this is different from what we'd to with a variable-length slice pattern like |
Sorry for the late reply - the byte strings are all between 100kB and 200kB long. I trying out some domain-specific compression right now (much of this length comes from repeated instances of |
Thank you for that explanation - I hadn't thought of that but it makes total sense now that a match statement needs to be checked for reachability and exhaustiveness. |
Uh oh!
There was an error while loading. Please reload this page.
In
rust/compiler/rustc_const_eval/src/const_eval/mod.rs
Line 24 in 1828461
the
VALTREE_MAX_NODES
is hardcoded. I managed to hit this limit with the rough following setup, producing the "Maximum number of nodes exceeded in constant" error:cargo check
in my GitHub CI with exit code 143)Is there a way to rewrite my code to not this limit as easily, would it be possible to make it configurable with an unstable attribute similar to
#![recursion_limit]
, or is there an even better way to check that two byte arrays are the same at compile time (where one is built from a very complex constant)?Thank you so much for your help!
The text was updated successfully, but these errors were encountered: