-
Notifications
You must be signed in to change notification settings - Fork 769
[LIBCLC][BINDLESS][CUDA] always inline redirection functs #18699
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LIBCLC][BINDLESS][CUDA] always inline redirection functs #18699
Conversation
These functions have effectively zero register overhead, therefore no usage circumstance that brings downside to always inlining. Signed-off-by: JackAKirk <[email protected]>
Marking as ready for review: will fix format later. |
@@ -149,7 +149,7 @@ void __nvvm_sust_3d_v4i32_clamp(write_only image3d_t, int, int, int, int, int, | |||
|
|||
int __nvvm_suq_width(long) __asm("llvm.nvvm.suq.width"); | |||
int __nvvm_suq_height(long) __asm("llvm.nvvm.suq.height"); | |||
int __nvvm_suq_depth(long arg) { | |||
__attribute__((always_inline)) int __nvvm_suq_depth(long arg) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be prudent to define some kind of IMAGE_DEF_ATTRS
macro like CLC_DEF
which is set to __attribute__((always_inline))
? That way more attributes could be added less noisily in the future if needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, that is what _CLC_DECL
already does for non-spirv targets. I don't think that the subset of image functions is special with regard to choosing when to inline. But if you prefer I suppose I could create a new macro if _CLC_DECL
is not appropriate.
Personally I found it much clearer to just write __attribute__((always_inline))
directly, since otherwise it isn't clear that a function has this attribute, but I can change it to whatever idiom you prefer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking you could define a macro within this file (as opposed to globally where I agree they're not special enough) just so that within the file itself future refactoring would be simpler. It's not a blocker though as it's a style thing.
Signed-off-by: JackAKirk <[email protected]>
Signed-off-by: JackAKirk <[email protected]>
Signed-off-by: JackAKirk <[email protected]>
@intel/llvm-gatekeepers this is ready to merge. single test failure is unrelated and tracked here: #18762 Thanks |
These functions at most do some casting, and have effectively zero register overhead at default opt level, therefore there should be no usage circumstance that brings a downside to always inlining.
This brings the nvptx libclc image backend in line with the amd one which requires no such changes. amd libclc backend already does the same thing via consistent usage of the _CLC_DECL macro for all functions. Whilst not immediately obvious to the libclc programmer, _CLC_DECL macro calls
__attribute__((always_inline))
.There's a few cases that had low register usage that I've added the
inline
hint to also, being probably overly cautious.