-
Notifications
You must be signed in to change notification settings - Fork 770
WIP Replace local clone with SYCL kernel object #1568
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Fznamznon I very much want your input here.
Also, this needs tests!
I see that we're adding this to the header, but not the body of the kernel. Is that intentional?
Handlers &... handlers) { | ||
|
||
QualType KernelType = QualType(KernelObject->getTypeForDecl(), 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these only allowed to be top-level things? Can you clarify how this feature is supposed to work? I don't get it yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're adding the kernel object as a parameter of the kernel function (just like we did with the fields). Currently in the kernel body, a kernel object clone is generated and initialized using the fields of kernel function. The clone is then subsequently used in kernel body code. We're now exploring the possibility of getting rid of the clone entirely and using this kernel object
Yes. I'll be fixing lit tests and adding any required tests in a followup patch. After that I'll work on removing the clone and replacing it's usage in kernel body with this. I just pushed this patch to start getting feedback on approach :)
We'll be using this in the body instead of the local clone. I didn't tackle that in this patch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have objections yet.
clang/lib/Sema/SemaSYCL.cpp
Outdated
@@ -830,6 +845,10 @@ class SyclKernelFieldChecker | |||
<< 1 << FieldTy; | |||
} | |||
} | |||
void handleKernelObject(CXXRecordDecl *KernelObject, | |||
QualType KernelType) final { | |||
// Do we need any diagnostics for Kernel Object? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since kernel object can be defined not only as lambda but as a callable user's type, we could validate that the kernel object is trivially copy constructible (or satisfies standard layout requirement). It is required by the spec (section 4.8.9.1)
Signed-off-by: Elizabeth Andrews <[email protected]>
Signed-off-by: Elizabeth Andrews <[email protected]>
557ea88
to
3eae199
Compare
Sorry for the delay. I was having A LOT of trouble running the tests locally. Finally got the environment issues fixed! This patch just fixes current lit fails. Actual body changes have not been covered in this patch. I pushed the patch to check if passing the kernel object as a top level parameter would break any test/cause regressions. I worked briefly on the body changes on Friday. I will pick it up again tomorrow. Hopefully will push another patch by EOD or Tuesday. |
@elizabethandrews, please, apply clang-format to your patch and remove WIP from the PR title when the patch is ready for review. |
This patch removes the local clone generated in kernel. In the kernel body, it replaces any usage of the clone with that of the kernel object passed as a parameter to the kernel. Since we are now directly using the kernel object, scalar arguments (except pointers) and struct type arguments (except those containing accessor fields) need not be generated and passed to kernel. This code has been removed. Pointers require additional 'handling' for USM. For these, arguments are still generated and passed to kernel. Initialization of clone pointer fields have been replaced by assignment of kernel object pointer fields from these parameters. Please note accessors and streams have not been handled yet. So related tests will fail as required initializations are now missing. Signed-off-by: Elizabeth Andrews <[email protected]>
Signed-off-by: Elizabeth Andrews <[email protected]>
Signed-off-by: Elizabeth Andrews <[email protected]>
@@ -9,7 +9,9 @@ int main() { | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The IR for this code has simplified from -
; Function Attrs: norecurse nounwind
define spir_kernel void @_ZTSZ4mainE15kernel_function() #0 !kernel_arg_addr_space !4 !kernel_arg_access_qual !4
!kernel_arg_type !4 !kernel_arg_base_type !4 !kernel_arg_type_qual !4 {
entry:
%0 = alloca %"class._ZTSZ4mainE3$_0.anon", align 1
%1 = bitcast %"class._ZTSZ4mainE3$_0.anon"* %0 to i8*
call void @llvm.lifetime.start.p0i8(i64 1, i8* %1) #3
%2 = addrspacecast %"class._ZTSZ4mainE3$_0.anon"* %0 to %"class._ZTSZ4mainE3$_0.anon" addrspace(4)*
call spir_func void @"_ZZ4mainENK3$_0clEv"(%"class._ZTSZ4mainE3$_0.anon" addrspace(4)* %2)
%3 = bitcast %"class._ZTSZ4mainE3$_0.anon"* %0 to i8*
call void @llvm.lifetime.end.p0i8(i64 1, i8* %3) #3
ret void
}
to
; Function Attrs: norecurse nounwind
define spir_kernel void @_ZTSZ4mainE15kernel_function(%"class._ZTSZ4mainE3$_0.anon"*
byval(%"class._ZTSZ4mainE3$_0.anon") align 1 %_arg_kernelObject) #0 !kernel_arg_addr_space !4
!kernel_arg_access_qual !5 !kernel_arg_type !6 !kernel_arg_base_type !6 !kernel_arg_type_qual !7 {
entry:
%0 = addrspacecast %"class._ZTSZ4mainE3$_0.anon"* %_arg_kernelObject to
%"class._ZTSZ4mainE3$_0.anon" addrspace(4)*
call spir_func void @"_ZZ4mainENK3$_0clEv"(%"class._ZTSZ4mainE3$_0.anon" addrspace(4)* %0)
ret void
}
@Fznamznon Are those lifetime markers needed? I'm not sure what the intrinsic really does and if it's required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lifetime markers are needed to specify start/stop of memory objects life time (oops I'm a captain obvious here). Lifetime markers define kinda frame outside of which the memory object is dead and has undefined value (see https://llvm.org/docs/LangRef.html#id2407 , it can explain better).
In the first LLVM IR snippet I see lifetime markers only for local kernel object clone. If we are going to remove local kernel object clone, then no need to keep lifetime markers for it :)
I expect exactly the same simplification in resulting LLVM IR which I see in your second snippet, so it is a nice progress.
@elizabethandrews, are you still working on this patch? |
No. We've abandoned this effort since whole object cannot be passed without breaking openCL spec |
Added Kernel Object to KernelDecl and Integration header.