Skip to content

Change codegen of LLVM intrinsics to be name-based, and add llvm linkage support for x86amx, bf16(xN) and i1xN #140763

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

sayantn
Copy link
Contributor

@sayantn sayantn commented May 7, 2025

This PR changes how LLVM intrinsics are codegen

Explanation of the changes

Current procedure

This is the same for all functions, LLVM intrinsics are not treated specially

  • We get the LLVM Type of a function simply using the argument types. For example, the following function
    #[link_name = "llvm.sqrt.f32"]
    fn sqrtf32(a: f32) -> f32;
    will have LLVM type simply f32 (f32) due to the Rust signature

Pros

  • Simpler to implement, no extra complexity involved due to LLVM intrinsics

Cons

  • LLVM intrinsics have a well-defined signature, completely defined by their name (and if it is overloaded, the type parameters). So, this process of converting Rust signatures to LLVM signatures may not work, for example the following code generates LLVM IR without any problem
    #[link_name = "llvm.sqrt.f32"]
    fn sqrtf32(a: i32) -> f32;
    but the generated LLVM IR is invalid, because it has wrong signature for the intrinsic (Godbolt, adding -Zverify-llvm-ir to it will fail compilation). I would expect this code to not compile at all instead of generating invalid IR.
  • LLVM intrinsics that have types in their signature that can't be accessed from Rust (notable examples are the AMX intrinsics that have the x86amx type, and (almost) all intrinsics that have vectors of i1 types) can't be linked to at all. This is a (major?) roadblock in the AMX and AVX512 support in stdarch.

What this PR does

  • When linking to non-overloaded intrinsics, we use the function LLVMIntrinsicGetType to directly get the function type of the intrinsic from LLVM.
  • We then use this LLVM definition to verify the Rust signature, and emit a proper error if it doesn't match, instead of silently emitting invalid IR.

This PR only focuses on non-overloaded intrinsics, overloaded can be done in a future PR

Pros

  • This helps a lot in the implementation of #[rust_intrinsic] functions, we don't have to manually hardcode/compute the LLVM signature of the intrinsics. This reduces a lot of complexity in the implementation of Rust intrinsics (bonus: now is_val_statically_known is properly implemented for all immediate operands)

  • It is now not possible (or at least, it would require significantly more leaps and bounds) to introduce invalid IR using non-overloaded LLVM intrinsics.

  • As we are now doing the matching of Rust signatures to LLVM intrinsics ourselves, we can now add bypasses to enable linking to such non-Rust types (e.g. matching 8192-bit vectors to x86amx and injecting llvm.x86.cast.vector.to.tile and llvm.x86.cast.tile.to.vectors in callsite)

    Also, small clarification, I don't intend for these bypasses to be permanent (at least the bf16 and i1 ones, the x86amx bypass seems inevitable). A better approach will be introducing a bf16 type in Rust, and allowing repr(simd) with bools to get Rust-native i1xNs. These are meant to be short-time, as I mentioned, "bypass"es. They shouldn't cause any major breakage even if removed, as link_llvm_intrinsics is perma-unstable.

    This PR adds bypasses for x86amx (via 8192-bit vectors), bf16 (via i16) and bf16xN (via i16xN). This will unblock AMX, and a lot of bf16 intrinsics in stdarch.

Cons

  • It's not possible to do this to all LLVM intrinsics, magic AutoUpgrade code in LLVM sometimes replaces old and removed intrinsics with new and shiny ones. One particular example is llvm.x86.avx2.vperm2i128, this intrinsic has since been completely removed from the IntrinsicsX86.td file. Instead LLVM changes this to shufflevector instructions (Godbolt).
    There is a silver lining here though, LLVM does expose a function to check if an intrinsic call can be "upgrade"d (llvm::upgradeIntrinsicFunction), we can probably use this.
  • This only works for non-overloaded intrinsics (at least for now). Improving this to work with overloaded intrinsics too will involve significantly more work.

Possible ways to extend this to overloaded intrinsics (future)

Parse the mangled intrinsic name to get the type parameters

LLVM has a stable mangling of intrinsic names with type parameters (in LLVMIntrinsicCopyOverloadedName2), so we can parse the name to get the type parameters, and then just do the same thing.

Pros

  • For most intrinsics, this will work perfectly, and is a easy way to do this.

Cons

  • The LLVM mangling is not perfectly reversible. When we have TargetExt types or identified structs, their name is a part of the mangling, making it impossible to reverse. Even more complexities arise when there are unnamed identified structs.

Use the IITDescriptor table and the Rust function signature

We can use the base name to get the IITDescriptors of the corresponding intrinsic, and then manually implement the matching logic based on the Rust signature.

Pros

  • Doesn't have the above mentioned limitation of the parsing approach, has correct behavior even when there are identified structs and TargetExt types. Also, fun fact, Rust exports all struct types as literal structs (unless it is emitting LLVM IR, then it always uses named identified structs, with mangled names)

Cons

  • Doesn't actually use the type parameters in the name, only uses the base name and the Rust signature to get the llvm signature (although we can check that it is the correct name). It means there would be no way to (for example) link against llvm.sqrt.bf16 until we have bf16 types in Rust. Because if we are using u16s (or any other type) as bf16s, then the matcher will deduce that the signature is u16 (u16) not bf16 (bf16) (which would lead to an error because u16 is not a valid type parameter for llvm.sqrt), even though the intended type parameter is specified in the name.
  • Much more complex, and hard to maintain as LLVM gets new IITDescriptorKinds

Other things that this PR does

  • disables all ABI checks only for the unadjusted ABI to facilitate the implementation of AMX (otherwise passing 8192-bit vectors to the intrinsic won't be allowed). This is "safe" because this ABI is only used to link to LLVM intrinsics, and passing vectors of any lengths to LLVM intrinsics is fine, because they don't exist in machine level.
  • Removes unnecessary bitcasts in cg_llvm/builder::check_call (now renamed as cast_arguments due to its new counterpart cast_return). This was old code from when Rust used to pass non-erased lifetimes to LLVM.

Reviews are welcome, as this is my first time actually contributing to rustc

After the design is finalized, we would need a try build, a rustc-perf run and a crater run.

@rustbot label T-compiler A-codegen A-LLVM
r? codegen

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. O-x86_64 Target: x86-64 processors (like x86_64-*) (also known as amd64 and x64) labels May 7, 2025
@rustbot
Copy link
Collaborator

rustbot commented May 8, 2025

Some changes occurred in compiler/rustc_codegen_ssa

cc @WaffleLapkin

@sayantn

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rustbot
Copy link
Collaborator

rustbot commented May 8, 2025

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo, @GuillaumeGomez

@rust-log-analyzer

This comment has been minimized.

@sayantn sayantn changed the title Add auto-bitcasts from/to x86amx and i32x256 for AMX intrinsics Add auto-bitcasts from/to x86amx for i32x256 for AMX intrinsics May 8, 2025
@sayantn sayantn changed the title Add auto-bitcasts from/to x86amx for i32x256 for AMX intrinsics Add auto-bitcasts between x86amx and i32x256 for AMX intrinsics May 8, 2025
@sayantn

This comment has been minimized.

@dianqk
Copy link
Member

dianqk commented May 9, 2025

I think you can use LLVMGetIntrinsicDeclaration, LLVMGetIntrinsicDeclaration or some functions in Intrinsic.h in declare_raw_fn, as a reference: https://github.com/llvm/llvm-project/blob/d35ad58859c97521edab7b2eddfa9fe6838b9a5e/llvm/lib/AsmParser/LLParser.cpp#L330-L335.

@sayantn
Copy link
Contributor Author

sayantn commented May 9, 2025

That can be used to improve performance, I am not really focusing on performance in this PR. I want to currently emphasize the correctness of the codegen.

@sayantn
Copy link
Contributor Author

sayantn commented May 9, 2025

Oh wait, I probably misunderstood your comment, you meant using the llvm declaration by itself. Yeah, that would be better, thanks for the info. I will update the impl when I get the chance

@dianqk
Copy link
Member

dianqk commented May 15, 2025

Oh wait, I probably misunderstood your comment, you meant using the llvm declaration by itself. Yeah, that would be better, thanks for the info. I will update the impl when I get the chance

I think you can just focus on non-overloaded functions for this PR. Overloaded functions and type checking that checking Rust function signatures using LLVM defined can be subsequent PRs.

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 15, 2025
@rustbot
Copy link
Collaborator

rustbot commented May 15, 2025

Reminder, once the PR becomes ready for a review, use @rustbot ready.

@sayantn

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@sayantn sayantn marked this pull request as draft May 19, 2025 07:23
@nikic
Copy link
Contributor

nikic commented May 19, 2025

@sayantn Taking the address of an intrinsic is invalid LLVM IR.

@rustbot rustbot removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label May 25, 2025
@sayantn
Copy link
Contributor Author

sayantn commented May 25, 2025

I have currently changed this to just ignore overloaded intrinsics, we need to find a way to make this work with overloaded functions in the future.

@sayantn sayantn marked this pull request as ready for review May 25, 2025 20:13
@rust-log-analyzer

This comment has been minimized.

@sayantn
Copy link
Contributor Author

sayantn commented May 27, 2025

Upon testing with this I have discovered quite a few stdarch intrinsics that use invalid LLVM intrinsics (I don't know how the CI suite didn't catch this), and some that generate invalid IR. So I would send a stdarch PR too

@nikic
Copy link
Contributor

nikic commented May 27, 2025

Upon testing with this I have discovered quite a few stdarch intrinsics that use invalid LLVM intrinsics (I don't know how the CI suite didn't catch this), and some that generate invalid IR. So I would send a stdarch PR too

Do you have some examples? Most likely the intrinsics aren't invalid, just old. Intrinsics change over time and support auto-upgrade.

@sayantn
Copy link
Contributor Author

sayantn commented May 27, 2025

I am testing for upgrades with upgradeIntrinsicFunction, these are not even upgradable. The ones I have found are

  • llvm.s390.vupl{b,f}w
  • llvm.ppc.altivec.s{l,r}{l,o,v}
  • llvm.ppc.altivec.sra{b,h,w}

Along with that I have found that 2 intrinsics use wrong parameters. The 2nd parameter for llvm.loongarch.lddir.d and llvm.loongarch.ldpte.d should be immediate (i.e. const generic in Rust), but they were written as simple arguments, which lead to -Zverify-llvm-ir failing (even in nightly).

Also, there are (invalid) uses of simd_{or,and,xor} on floating point types in the implementation of vec_sel in s390x.

I have also found a lot of upgradable intrinsics (mainly in x86 and arm), but I am not bothered about them.

These were not detected before due to #[inline] and generics. The codegen backend doesn't get invoked if an #[inline] function is not used (even if pub) or if it is not monomorphized (which is typically achieved by tests, but not all archs have full test coverage). I believe this can be prevented in future by

  • Having assert_instr for all intrinsics, even assert_instr(nop) works (it will not check for any particular instruction, but will still compile the function
  • For const generic intrinsics, take care to instantiate at least 1 monomorphization for every distinct LLVM intrinsic it is using (which is often achieved by assert_instr)
  • Having -Zverify-llvm-ir in the test script. It will not save us from invalid intrinsics (for some reason, it doesn't complain if I link against a non-existing intrinsic), but will take care of other sources of invalid IR.

cc @Amanieu

@rustbot
Copy link
Collaborator

rustbot commented May 28, 2025

Some changes occurred in compiler/rustc_codegen_llvm/src/builder/autodiff.rs

cc @ZuseZ4

@rust-log-analyzer

This comment has been minimized.

@rustbot rustbot added the A-run-make Area: port run-make Makefiles to rmake.rs label May 28, 2025
@rustbot
Copy link
Collaborator

rustbot commented May 28, 2025

This PR modifies run-make tests.

cc @jieyouxu

@sayantn
Copy link
Contributor Author

sayantn commented May 28, 2025

There was an invalid intrinsic in run-make/simd-ffi/simd.rs 🙂. The problem with invalid intrinsics (as opposed to invalid signatures of valid intrinsics) is that they silently compile, even -Zverify-llvm-ir doesn't complain. They just generate a call to an unknown function in asm, which will eventually fail linking, but most tests don't link at all.

@sayantn
Copy link
Contributor Author

sayantn commented May 28, 2025

After CI is green, I think we should do a try build and an rustc-perf run, because this modifies quite a bit in the codegen of rust and llvm intrinsics, which might have some performance footprint (I am especially concerned about rust-intrinsics, I got rid of the signature table on grounds that we don't need to do it anymore, but we might still need it as a cache)

@sayantn sayantn changed the title Change codegen of LLVM intrinsics to be name-based, and add llvm linkage support for x86amx Change codegen of LLVM intrinsics to be name-based, and add llvm linkage support for x86amx, bf16(xN) and i1xN May 29, 2025
@rust-log-analyzer
Copy link
Collaborator

The job x86_64-gnu-llvm-19 failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
#19 exporting to docker image format
#19 sending tarball 28.2s done
#19 DONE 32.6s
##[endgroup]
Setting extra environment values for docker:  --env ENABLE_GCC_CODEGEN=1 --env GCC_EXEC_PREFIX=/usr/lib/gcc/
[CI_JOB_NAME=x86_64-gnu-llvm-19]
[CI_JOB_NAME=x86_64-gnu-llvm-19]
debug: `DISABLE_CI_RUSTC_IF_INCOMPATIBLE` configured.
---
sccache: Listening on address 127.0.0.1:4226
##[group]Configure the build
configure: processing command line
configure: 
configure: build.configure-args := ['--build=x86_64-unknown-linux-gnu', '--llvm-root=/usr/lib/llvm-19', '--enable-llvm-link-shared', '--set', 'rust.randomize-layout=true', '--set', 'rust.thin-lto-import-instr-limit=10', '--set', 'build.print-step-timings', '--enable-verbose-tests', '--set', 'build.metrics', '--enable-verbose-configure', '--enable-sccache', '--disable-manage-submodules', '--enable-locked-deps', '--enable-cargo-native-static', '--set', 'rust.codegen-units-std=1', '--set', 'dist.compression-profile=balanced', '--dist-compression-formats=xz', '--set', 'rust.lld=false', '--disable-dist-src', '--release-channel=nightly', '--enable-debug-assertions', '--enable-overflow-checks', '--enable-llvm-assertions', '--set', 'rust.verify-llvm-ir', '--set', 'rust.codegen-backends=llvm,cranelift,gcc', '--set', 'llvm.static-libstdcpp', '--set', 'gcc.download-ci-gcc=true', '--enable-new-symbol-mangling']
configure: build.build          := x86_64-unknown-linux-gnu
configure: target.x86_64-unknown-linux-gnu.llvm-config := /usr/lib/llvm-19/bin/llvm-config
configure: llvm.link-shared     := True
configure: rust.randomize-layout := True
configure: rust.thin-lto-import-instr-limit := 10
---
---- [ui] tests/ui/codegen/deprecated-llvm-intrinsic.rs stdout ----
Saved the actual stderr to `/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/codegen/deprecated-llvm-intrinsic/deprecated-llvm-intrinsic.stderr`
diff of stderr:

- note: Using deprecated intrinsic `llvm.aarch64.neon.rbit.v8i8`, `llvm.bitreverse.v8i8` can be used instead
- 
- 


The actual stderr differed from the expected stderr
To update references, rerun the tests and pass the `--bless` flag
To only update this specific test, also pass `--test-args codegen/deprecated-llvm-intrinsic.rs`

error: 1 errors occurred comparing output.
status: exit status: 0
command: env -u RUSTC_LOG_COLOR RUSTC_ICE="0" RUST_BACKTRACE="short" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "/checkout/tests/ui/codegen/deprecated-llvm-intrinsic.rs" "-Zthreads=1" "-Zsimulate-remapped-rust-src-base=/rustc/FAKE_PREFIX" "-Ztranslate-remapped-path-to-local-path=no" "-Z" "ignore-directory-in-diagnostics-source-blocks=/cargo" "-Z" "ignore-directory-in-diagnostics-source-blocks=/checkout/vendor" "--sysroot" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2" "--check-cfg" "cfg(test,FALSE)" "--error-format" "json" "--json" "future-incompat" "-Ccodegen-units=1" "-Zui-testing" "-Zdeduplicate-diagnostics=no" "-Zwrite-long-types-to-disk=no" "-Cstrip=debuginfo" "--emit" "metadata" "-C" "prefer-dynamic" "--out-dir" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/codegen/deprecated-llvm-intrinsic" "-A" "unused" "-A" "internal_features" "-Crpath" "-Cdebuginfo=0" "-Lnative=/checkout/obj/build/i686-unknown-linux-gnu/native/rust-test-helpers" "-Clinker=x86_64-linux-gnu-gcc" "--target" "aarch64-unknown-linux-gnu" "-Cpanic=abort" "-Cforce-unwind-tables=yes" "--extern" "minicore=/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/codegen/deprecated-llvm-intrinsic/libminicore.rlib"
stdout: none
stderr: none



Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-run-make Area: port run-make Makefiles to rmake.rs F-autodiff `#![feature(autodiff)]` S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants