-
Notifications
You must be signed in to change notification settings - Fork 13.4k
Change codegen of LLVM intrinsics to be name-based, and add llvm linkage support for x86amx
#140763
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Some changes occurred in compiler/rustc_codegen_ssa |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Some changes occurred in compiler/rustc_codegen_gcc |
This comment has been minimized.
This comment has been minimized.
x86amx
for i32x256
for AMX intrinsics
x86amx
for i32x256
for AMX intrinsicsx86amx
and i32x256
for AMX intrinsics
This comment has been minimized.
This comment has been minimized.
I think you can use |
That can be used to improve performance, I am not really focusing on performance in this PR. I want to currently emphasize the correctness of the codegen. |
Oh wait, I probably misunderstood your comment, you meant using the llvm declaration by itself. Yeah, that would be better, thanks for the info. I will update the impl when I get the chance |
I think you can just focus on non-overloaded functions for this PR. Overloaded functions and type checking that checking Rust function signatures using LLVM defined can be subsequent PRs. @rustbot author |
Reminder, once the PR becomes ready for a review, use |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
@sayantn Taking the address of an intrinsic is invalid LLVM IR. |
x86amx
, bf16
and i1
x86amx
…y the rust signature against it
This comment has been minimized.
This comment has been minimized.
I have currently changed this to just ignore overloaded intrinsics, we need to find a way to make this work with overloaded functions in the future. |
This comment has been minimized.
This comment has been minimized.
Add support fot `bf16` linking
Upon testing with this I have discovered quite a few stdarch intrinsics that use invalid LLVM intrinsics (I don't know how the CI suite didn't catch this), and some that generate invalid IR. So I would send a stdarch PR too |
Do you have some examples? Most likely the intrinsics aren't invalid, just old. Intrinsics change over time and support auto-upgrade. |
I am testing for upgrades with
Along with that I have found that 2 intrinsics use wrong parameters. The 2nd parameter for Also, there are (invalid) uses of I have also found a lot of upgradable intrinsics (mainly in x86 and arm), but I am not bothered about them. These were not detected before due to
cc @Amanieu |
Some changes occurred in compiler/rustc_codegen_llvm/src/builder/autodiff.rs cc @ZuseZ4 |
This comment has been minimized.
This comment has been minimized.
- Emit notes when can't match, and fall back to default implementation
This PR modifies cc @jieyouxu |
There was an invalid intrinsic in |
After CI is green, I think we should do a try build and an rustc-perf run, because this modifies quite a bit in the codegen of rust and llvm intrinsics, which might have some performance footprint (I am especially concerned about rust-intrinsics, I got rid of the signature table on grounds that we don't need to do it anymore, but we might still need it as a cache) |
return FunctionSignature::Intrinsic(intrinsic, llvm_fn_ty); | ||
} else { | ||
// FIXME: ideally we would like to hard error, but there are edge cases like `i1xN` | ||
// and differences between struct representations that we have to tackle first |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nikic should I add a bypass for linking i1xN
(using i{8*ceil(N/8)}
)? And also most of the mismatches are due to struct types not matching exactly with LLVM.
- If a struct is
repr(packed)
, Rust generates packed LLVM structs, which doesn't match non-packed structs currently - When generating LLVM IR, rustc generates identified LLVM structs, which also don't match literal structs.
For structs a viable shim can be just extracting the fields of one struct and putting them in the other. It is safe, and LLVM normally does this, only concern being optimizations.
For i1xN
, we have to do bitcast
to iN
, then shufflevector
with zeroinitializer
. Only concern here is should we do this, or just implement repr(simd)
for bool
This PR changes how LLVM intrinsics are codegen
Explanation of the changes
Current procedure
This is the same for all functions, LLVM intrinsics are not treated specially
f32 (f32)
due to the Rust signaturePros
Cons
-Zverify-llvm-ir
to it will fail compilation). I would expect this code to not compile at all instead of generating invalid IR.x86amx
type, and (almost) all intrinsics that have vectors ofi1
types) can't be linked to at all. This is a (major?) roadblock in the AMX and AVX512 support in stdarch.What this PR does
LLVMIntrinsicGetType
to directly get the function type of the intrinsic from LLVM.Pros
This helps a lot in the implementation of
#[rust_intrinsic]
functions, we don't have to manually hardcode/compute the LLVM signature of the intrinsics. This reduces a lot of complexity in the implementation of Rust intrinsics (bonus: nowis_val_statically_known
is properly implemented for all immediate operands)It is now not possible (or at least, it would require significantly more leaps and bounds) to introduce invalid IR using non-overloaded LLVM intrinsics.
As we are now doing the matching of Rust signatures to LLVM intrinsics ourselves, we can now add bypasses to enable linking to such non-Rust types (e.g. matching 8192-bit vectors to
x86amx
and injectingllvm.x86.cast.vector.to.tile
andllvm.x86.cast.tile.to.vector
s in callsite)This PR adds bypasses for
x86amx
(via 8192-bit vectors),bf16
(viai16
) andbf16xN
(viai16xN
). This will unblock AMX, and a lot of bf16 intrinsics in stdarch.Cons
llvm.x86.avx2.vperm2i128
, this intrinsic has since been completely removed from theIntrinsicsX86.td
file. Instead LLVM changes this toshufflevector
instructions (Godbolt).There is a silver lining here though, LLVM does expose a function to check if an intrinsic call can be "upgrade"d (
llvm::upgradeIntrinsicFunction
), we can probably use this.Possible ways to extend this to overloaded intrinsics (future)
Parse the mangled intrinsic name to get the type parameters
LLVM has a stable mangling of intrinsic names with type parameters (in
LLVMIntrinsicCopyOverloadedName2
), so we can parse the name to get the type parameters, and then just do the same thing.Pros
Cons
TargetExt
types or identified structs, their name is a part of the mangling, making it impossible to reverse. Even more complexities arise when there are unnamed identified structs.Use the
IITDescriptor
table and the Rust function signatureWe can use the base name to get the
IITDescriptor
s of the corresponding intrinsic, and then manually implement the matching logic based on the Rust signature.Pros
TargetExt
types. Also, fun fact, Rust exports all struct types as literal structs (unless it is emitting LLVM IR, then it always uses named identified structs, with mangled names)Cons
llvm.sqrt.bf16
until we havebf16
types in Rust. Because if we are usingu16
s (or any other type) asbf16
s, then the matcher will deduce that the signature isu16 (u16)
notbf16 (bf16)
(which would lead to an error becauseu16
is not a valid type parameter forllvm.sqrt
), even though the intended type parameter is specified in the name.IITDescriptorKind
sOther things that this PR does
unadjusted
ABI to facilitate the implementation of AMX (otherwise passing 8192-bit vectors to the intrinsic won't be allowed). This is "safe" because this ABI is only used to link to LLVM intrinsics, and passing vectors of any lengths to LLVM intrinsics is fine, because they don't exist in machine level.bitcast
s incg_llvm/builder::check_call
(now renamed ascast_arguments
due to its new counterpartcast_return
). This was old code from when Rust used to pass non-erased lifetimes to LLVM.Reviews are welcome, as this is my first time actually contributing to
rustc
After the design is finalized, we would need a try build, a rustc-perf run and a crater run.
@rustbot label T-compiler A-codegen A-LLVM
r? codegen