Skip to content

Maximum number of nodes exceeded in constant #120173

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
juntyr opened this issue Jan 20, 2024 · 14 comments
Open

Maximum number of nodes exceeded in constant #120173

juntyr opened this issue Jan 20, 2024 · 14 comments
Labels
A-const-eval Area: Constant evaluation, covers all const contexts (static, const fn, ...) C-discussion Category: Discussion or questions that doesn't represent real issues. C-feature-request Category: A feature request, i.e: not implemented / a PR. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@juntyr
Copy link
Contributor

juntyr commented Jan 20, 2024

In

const VALTREE_MAX_NODES: usize = 100000;

the VALTREE_MAX_NODES is hardcoded. I managed to hit this limit with the rough following setup, producing the "Maximum number of nodes exceeded in constant" error:

  1. https://docs.rs/const-type-layout/latest/const_type_layout/fn.serialise_type_graph.html is a const fn generating a byte array that describes the layout of a type
  2. I define the following helper trait to check whether two byte arrays are the same:
pub trait Same<const A: &'static [u8], const B: &'static [u8]> {}

impl<const AB: &'static [u8]> Same<AB, AB> for () {}

pub const fn check<const A: &'static [u8], const B: &'static [u8]>() where (): Same<A, B> {}
  1. I define the following const to assert that the produced const byte array matches a byte string (doing so in a const fn also works but taking a while and crashes cargo check in my GitHub CI with exit code 143)
const _: () = check::<&{serialise_type_graph::<MyVeryComplexType>()}, b"my-very-long-byte-string">();

Is there a way to rewrite my code to not this limit as easily, would it be possible to make it configurable with an unstable attribute similar to #![recursion_limit], or is there an even better way to check that two byte arrays are the same at compile time (where one is built from a very complex constant)?

Thank you so much for your help!

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jan 20, 2024
@saethlin saethlin added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. C-feature-request Category: A feature request, i.e: not implemented / a PR. A-const-eval Area: Constant evaluation, covers all const contexts (static, const fn, ...) C-discussion Category: Discussion or questions that doesn't represent real issues. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jan 20, 2024
@oli-obk
Copy link
Contributor

oli-obk commented Jan 20, 2024

3. doing so in a const fn also works but taking a while and crashes cargo check in my GitHub CI with exit code 143)

Do you still have the code somewhere for that const fn? Even for 10k bytes it shouldn't take overly long

@juntyr
Copy link
Contributor Author

juntyr commented Jan 20, 2024

I tried to minimise the error and got down to the following playground:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=8bb81ea05705676e7b5b58be1b48c9e7

It seems that just having a const byte array of a large size is enough to trigger the error - it seems that every element of the array is counted as a node? Amusingly, changing the length to a slightly smaller value makes the compiler on the playground crash, and changing it to a small value makes everything compile without any errors or warnings about comparing byte slices.

@juntyr
Copy link
Contributor Author

juntyr commented Jan 20, 2024

3. doing so in a const fn also works but taking a while and crashes cargo check in my GitHub CI with exit code 143)

Do you still have the code somewhere for that const fn? Even for 10k bytes it shouldn't take overly long

Yes - here's the complete module that does the equality check. One single comparison probably only takes on the order of a minute, but I am somehow crazy enough to do that 48 times for type checking ... so even small changes drastically change my compilation time.

use const_type_layout::{serialise_type_graph, serialised_type_graph_len, TypeGraphLayout};

#[derive(PartialEq, Eq, core::marker::ConstParamTy)]
pub enum HostAndDeviceKernelSignatureTypeLayout {
    Match,
    Mismatch,
}

pub struct Assert<const MATCH: HostAndDeviceKernelSignatureTypeLayout>;

#[must_use]
pub const fn check<T: TypeGraphLayout>(
    device: &'static [u8],
) -> HostAndDeviceKernelSignatureTypeLayout
where
    [u8; serialised_type_graph_len::<T>()]:,
{
    const SIGNATURE_ERROR_MESSAGE: &[u8] = b"ERROR in this PTX compilation";

    // Short-circuit to avoid extra errors when PTX compilation fails
    if equals(device, SIGNATURE_ERROR_MESSAGE) {
        return HostAndDeviceKernelSignatureTypeLayout::Match;
    }

    let host = serialise_type_graph::<T>();

    if equals(device, &host) {
        HostAndDeviceKernelSignatureTypeLayout::Match
    } else {
        HostAndDeviceKernelSignatureTypeLayout::Mismatch
    }
}

const fn equals(device: &[u8], host: &[u8]) -> bool {
    if host.len() != device.len() {
        return false;
    }

    let mut i = 0;

    while i < host.len() {
        if host[i] != device[i] {
            return false;
        }

        i += 1;
    }

    true
}

For context, this code is run to check that my type layout description, serialised into a byte array, is the same when compiling a CUDA kernel as when compiling the host code.

@juntyr
Copy link
Contributor Author

juntyr commented Jan 20, 2024

From what I can guesstimate right now, constants can be quite large if I mainly materialise them inside const fns that just take a long while to run. Forcing the compiler to do a byte array comparison for me, e.g. by using the const generic trait trick or using match against a literal seems to run a lot faster but runs against the valtree size limit.

@oli-obk
Copy link
Contributor

oli-obk commented Jan 21, 2024

One single comparison probably only takes on the order of a minute,

How large is the specific device string that you are testing here?

@oli-obk
Copy link
Contributor

oli-obk commented Jan 21, 2024

Amusingly, changing the length to a slightly smaller value makes the compiler on the playground crash,

Yea i expect our limit to be too large. We should probably try to lower it.

The issue is that you'll end up with symbol names that contain all bytes of the constant, encoded in an inefficient way. While we could probably special case byte strings (I think we special case strings already), the root issue persists: LLVM and the linker do not like large symbol names

Though in this specific case it looks like a compiler panic or potentially a stack overflow

@juntyr
Copy link
Contributor Author

juntyr commented Jan 21, 2024

The issue is that you'll end up with symbol names that contain all bytes of the constant, encoded in an inefficient way. While we could probably special case byte strings (I think we special case strings already), the root issue persists: LLVM and the linker do not like large symbol names

Hmmm, why is the const given a symbol name that includes all bytes? In my use case, all materialisations of the long byte strings happen either as variables inside const fns or parameters to const fns that feed into const generic params (as in const _: () = check::<&{serialise_type_graph::<MyVeryComplexType>()}, b"my-very-long-byte-string">();) inside trait function impl bodies. As far as I know, none of these constants is exposed as a pub const.

I tested a few more variations of the issue. If I use

match serialise_type_graph::<MyVeryComplexType>() {
    b"my-very-long-byte-string" => ...,
    _ => ...,
}

inside a const, cargo check inside CI crashes very soon after generating all byte strings inside a proc macro.

If I instead use core::intrinsics::compare_bytes (which I picked since it seems like const eval does the comparison on the memory itself, unlike my byte-loop that needs to be interpreted), cargo check spends another 30min doing something before crashing with exit code 143.

So perhaps I am indeed reaching some memory limit, just at different speeds with different const bytestring comparison codepaths.

@oli-obk
Copy link
Contributor

oli-obk commented Jan 22, 2024

Hmmm, why is the const given a symbol name that includes all bytes? In my use case, all materialisations of the long byte strings happen either as variables inside const fns or parameters to const fns that feed into const generic params (as in const _: () = check::<&{serialise_type_graph::<MyVeryComplexType>()}, b"my-very-long-byte-string">();) inside trait function impl bodies. As far as I know, none of these constants is exposed as a pub const.

yea in this case that won't happen, only if you end up with the constant in a signature that actually goes to codegen. We don't want to have code that is only allowed in const eval, but not outside of it, so we use the same limits for it. If you ran the const's body in fn main instead, then the serialize_type_graph would get monomorphized and sent to codegen

due to #119214 runtime code is in fact generated for your constants, but that's a bug obviously

@oli-obk
Copy link
Contributor

oli-obk commented Jan 22, 2024

I did some debugging: Looks like we're recursing very very deeply in compute_exhaustiveness_and_usefulness:

┐rustc_pattern_analysis::usefulness::compute_match_usefulness scrut_ty=&ReErased [u8], scrut_validity=ValidOnly
├─┐rustc_pattern_analysis::usefulness::compute_exhaustiveness_and_usefulness matrix=
│ + &[0..1, 0..1, 0..1, 0..1, 0..1, 0..1, 0..1, 0..1, 0..1] +
│ + _ : &ReErased [u8]                                      +
│ | ✓                                                       | // column validity
│ 
│ ├─  0ms DEBUG rustc_pattern_analysis::usefulness ty: &ReErased [u8]
│ ├─┐rustc_pattern_analysis::rustc::ctors_for_ty ty=&ReErased [u8]
│ │ ├─  0ms TRACE rustc_middle::ty::visit has_type_flags, self=&ReErased [u8], flags=TypeFlags(HAS_ERROR), res=false
│ │ ├─  0ms DEBUG rustc_pattern_analysis::rustc return=Ok(Ref)
│ ├─┘
│ ├─┐rustc_pattern_analysis::constructor::split 
│ │ ├─  0ms DEBUG rustc_pattern_analysis::constructor return=SplitConstructorSet { present: [Ref], missing: [], missing_empty: [] }
│ ├─┘
│ ├─  0ms DEBUG rustc_pattern_analysis::usefulness specialize(Ref)
│ ├─┐rustc_pattern_analysis::rustc::ctor_sub_tys ctor=Ref, ty=&ReErased [u8]
│ ├─┘
│ ├─┐rustc_pattern_analysis::usefulness::compute_exhaustiveness_and_usefulness matrix=
│ │ + [0..1, 0..1, 0..1, 0..1, 0..1, 0..1, 0..1, 0..1, 0..1] +
│ │ + _                                                      +
│ │ | ?                                                      | // column validity
│ │ 
│ │ ├─  0ms DEBUG rustc_pattern_analysis::usefulness ty: [u8]
│ │ ├─┐rustc_pattern_analysis::rustc::ctors_for_ty ty=[u8]
│ │ │ ├─  0ms TRACE rustc_middle::ty::visit has_type_flags, self=[u8], flags=TypeFlags(HAS_ERROR), res=false
│ │ │ ├─┐rustc_middle::ty::inhabitedness::inhabited_predicate self=u8
│ │ │ │ ├─  0ms TRACE rustc_middle::ty::visit has_type_flags, self=u8, flags=TypeFlags(HAS_TY_INFER | HAS_RE_INFER | HAS_CT_INFER), res=false
│ │ │ │ ├─  0ms DEBUG rustc_middle::ty::inhabitedness return=True
│ │ │ ├─┘
│ │ │ ├─┐rustc_middle::ty::inhabitedness::inhabited_predicate::apply_inner self=True, eval_stack=[]
│ │ │ │ ├─  0ms DEBUG rustc_middle::ty::inhabitedness::inhabited_predicate return=Ok(true)
│ │ │ ├─┘
│ │ │ ├─  0ms DEBUG rustc_pattern_analysis::rustc return=Ok(Slice { array_len: None, subtype_is_empty: false })
│ │ ├─┘
│ │ ├─┐rustc_pattern_analysis::constructor::split 
│ │ │ ├─  0ms DEBUG rustc_pattern_analysis::constructor return=SplitConstructorSet { present: [Slice(Slice { array_len: None, kind: FixedLen(9) })], missing: [Slice(Slice { array_len: None, kind: FixedLen(0) }), Slice(Slice { array_len: None, kind: FixedLen(1) }), Slice(Slice { array_len: None, kind: FixedLen(2) }), Slice(Slice { array_len: None, kind: FixedLen(3) }), Slice(Slice { array_len: None, kind: FixedLen(4) }), Slice(Slice { array_len: None, kind: FixedLen(5) }), Slice(Slice { array_len: None, kind: FixedLen(6) }), Slice(Slice { array_len: None, kind: FixedLen(7) }), Slice(Slice { array_len: None, kind: FixedLen(8) }), Slice(Slice { array_len: None, kind: VarLen(10, 0) })], missing_empty: [] }
│ │ ├─┘
│ │ ├─  0ms DEBUG rustc_pattern_analysis::usefulness specialize(Slice(Slice { array_len: None, kind: FixedLen(9) }))
│ │ ├─┐rustc_pattern_analysis::rustc::ctor_sub_tys ctor=Slice(Slice { array_len: None, kind: FixedLen(9) }), ty=[u8]
│ │ ├─┘
│ │ ├─┐rustc_pattern_analysis::usefulness::compute_exhaustiveness_and_usefulness matrix=
│ │ │ + 0..1 + 0..1 + 0..1 + 0..1 + 0..1 + 0..1 + 0..1 + 0..1 + 0..1 +
│ │ │ + _    + _    + _    + _    + _    + _    + _    + _    + _    +
│ │ │ | ?    | ?    | ?    | ?    | ?    | ?    | ?    | ?    | ?    | // column validity
│ │ │ 
│ │ │ ├─  0ms DEBUG rustc_pattern_analysis::usefulness ty: u8
│ │ │ ├─┐rustc_pattern_analysis::rustc::ctors_for_ty ty=u8
│ │ │ │ ├─  0ms TRACE rustc_middle::ty::visit has_type_flags, self=u8, flags=TypeFlags(HAS_ERROR), res=false
│ │ │ │ ├─  0ms DEBUG rustc_pattern_analysis::rustc return=Ok(Integers { range_1: 0..256, range_2: None })
│ │ │ ├─┘
│ │ │ ├─┐rustc_pattern_analysis::constructor::split 
│ │ │ │ ├─  0ms DEBUG rustc_pattern_analysis::constructor return=SplitConstructorSet { present: [IntRange(0..1)], missing: [IntRange(1..256)], missing_empty: [] }
│ │ │ ├─┘
│ │ │ ├─  0ms DEBUG rustc_pattern_analysis::usefulness specialize(IntRange(0..1))
│ │ │ ├─┐rustc_pattern_analysis::rustc::ctor_sub_tys ctor=IntRange(0..1), ty=u8
│ │ │ ├─┘
│ │ │ ├─┐rustc_pattern_analysis::usefulness::compute_exhaustiveness_and_usefulness matrix=
│ │ │ │ + 0..1 + 0..1 + 0..1 + 0..1 + 0..1 + 0..1 + 0..1 + 0..1 +
│ │ │ │ + _    + _    + _    + _    + _    + _    + _    + _    +
│ │ │ │ | ?    | ?    | ?    | ?    | ?    | ?    | ?    | ?    | // column validity
│ │ │ │ 
│ │ │ │ ├─  0ms DEBUG rustc_pattern_analysis::usefulness ty: u8
│ │ │ │ ├─┐rustc_pattern_analysis::rustc::ctors_for_ty ty=u8
│ │ │ │ │ ├─  0ms TRACE rustc_middle::ty::visit has_type_flags, self=u8, flags=TypeFlags(HAS_ERROR), res=false
│ │ │ │ │ ├─  0ms DEBUG rustc_pattern_analysis::rustc return=Ok(Integers { range_1: 0..256, range_2: None })
│ │ │ │ ├─┘
│ │ │ │ ├─┐rustc_pattern_analysis::constructor::split 
│ │ │ │ │ ├─  0ms DEBUG rustc_pattern_analysis::constructor return=SplitConstructorSet { present: [IntRange(0..1)], missing: [IntRange(1..256)], missing_empty: [] }
│ │ │ │ ├─┘
│ │ │ │ ├─  0ms DEBUG rustc_pattern_analysis::usefulness specialize(IntRange(0..1))
│ │ │ │ ├─┐rustc_pattern_analysis::rustc::ctor_sub_tys ctor=IntRange(0..1), ty=u8
│ │ │ │ ├─┘
│ │ │ │ ├─┐rustc_pattern_analysis::usefulness::compute_exhaustiveness_and_usefulness matrix=

@oli-obk
Copy link
Contributor

oli-obk commented Jan 22, 2024

cc @Nadrieril do you have some idea why ctor specialization would give us a single case per element in a slice? Each recursion shrinks the passed matrix by one element.

@Nadrieril
Copy link
Member

I presume this is the match-checking of the matches!(a, A) from above? The constant A expands to a pattern that's literally [0, 0, 0, 0, 0, ....], which we then check for reachability and exhaustiveness. There are currently no shortcuts that would figure out "yeah that's reachable and there's a wildcard so it's exhaustive" unfortunately, so we do the normal checking which goes field-by-field.

@Nadrieril
Copy link
Member

Nadrieril commented Jan 22, 2024

Note that this is different from what we'd to with a variable-length slice pattern like [0, .., 0]. There we only have to check 3 fields. In your case though there are 1000 explicit fields and I don't know of a good way to detect that we can skip through them.

@juntyr
Copy link
Contributor Author

juntyr commented Jan 30, 2024

One single comparison probably only takes on the order of a minute,

How large is the specific device string that you are testing here?

Sorry for the late reply - the byte strings are all between 100kB and 200kB long. I trying out some domain-specific compression right now (much of this length comes from repeated instances of core::any::type_names, so I'm testing some simple de-duplication, though this almost triples my const eval time in producing the byte string) to see if that is enough to get my code to compile again without making rustc error or hang or crash.

@juntyr
Copy link
Contributor Author

juntyr commented Jan 30, 2024

Note that this is different from what we'd to with a variable-length slice pattern like [0, .., 0]. There we only have to check 3 fields. In your case though there are 1000 explicit fields and I don't know of a good way to detect that we can skip through them.

Thank you for that explanation - I hadn't thought of that but it makes total sense now that a match statement needs to be checked for reachability and exhaustiveness.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-const-eval Area: Constant evaluation, covers all const contexts (static, const fn, ...) C-discussion Category: Discussion or questions that doesn't represent real issues. C-feature-request Category: A feature request, i.e: not implemented / a PR. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants