Skip to content

WIP Replace local clone with SYCL kernel object #1568

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

elizabethandrews
Copy link
Contributor

@elizabethandrews elizabethandrews commented Apr 22, 2020

Added Kernel Object to KernelDecl and Integration header.

Copy link
Contributor

@erichkeane erichkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fznamznon I very much want your input here.

Also, this needs tests!

I see that we're adding this to the header, but not the body of the kernel. Is that intentional?

Handlers &... handlers) {

QualType KernelType = QualType(KernelObject->getTypeForDecl(), 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these only allowed to be top-level things? Can you clarify how this feature is supposed to work? I don't get it yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're adding the kernel object as a parameter of the kernel function (just like we did with the fields). Currently in the kernel body, a kernel object clone is generated and initialized using the fields of kernel function. The clone is then subsequently used in kernel body code. We're now exploring the possibility of getting rid of the clone entirely and using this kernel object

@elizabethandrews
Copy link
Contributor Author

elizabethandrews commented Apr 22, 2020

@Fznamznon I very much want your input here.

Also, this needs tests!

Yes. I'll be fixing lit tests and adding any required tests in a followup patch. After that I'll work on removing the clone and replacing it's usage in kernel body with this. I just pushed this patch to start getting feedback on approach :)

I see that we're adding this to the header, but not the body of the kernel. Is that intentional?

We'll be using this in the body instead of the local clone. I didn't tackle that in this patch.

Copy link
Contributor

@Fznamznon Fznamznon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have objections yet.

@@ -830,6 +845,10 @@ class SyclKernelFieldChecker
<< 1 << FieldTy;
}
}
void handleKernelObject(CXXRecordDecl *KernelObject,
QualType KernelType) final {
// Do we need any diagnostics for Kernel Object?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since kernel object can be defined not only as lambda but as a callable user's type, we could validate that the kernel object is trivially copy constructible (or satisfies standard layout requirement). It is required by the spec (section 4.8.9.1)

@elizabethandrews
Copy link
Contributor Author

Sorry for the delay. I was having A LOT of trouble running the tests locally. Finally got the environment issues fixed!

This patch just fixes current lit fails. Actual body changes have not been covered in this patch. I pushed the patch to check if passing the kernel object as a top level parameter would break any test/cause regressions. I worked briefly on the body changes on Friday. I will pick it up again tomorrow. Hopefully will push another patch by EOD or Tuesday.

@bader
Copy link
Contributor

bader commented Apr 27, 2020

@elizabethandrews, please, apply clang-format to your patch and remove WIP from the PR title when the patch is ready for review.

This patch removes the local clone generated in kernel. In the kernel
body, it replaces any usage of the clone with that of the kernel
object passed as a parameter to the kernel.

Since we are now directly using the kernel object, scalar arguments
(except pointers) and struct type arguments (except those containing
accessor fields) need not be generated and passed to kernel. This
code has been removed.

Pointers require additional 'handling' for USM. For these, arguments
are still generated and passed to kernel. Initialization of clone
pointer fields have been replaced by assignment of kernel object
pointer fields from these parameters.

Please note accessors and streams have not been handled yet. So related
tests will fail as required initializations are now missing.

Signed-off-by: Elizabeth Andrews <[email protected]>
Signed-off-by: Elizabeth Andrews <[email protected]>
Signed-off-by: Elizabeth Andrews <[email protected]>
@@ -9,7 +9,9 @@ int main() {

Copy link
Contributor Author

@elizabethandrews elizabethandrews Apr 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The IR for this code has simplified from -

  ; Function Attrs: norecurse nounwind
  define spir_kernel void @_ZTSZ4mainE15kernel_function() #0 !kernel_arg_addr_space !4 !kernel_arg_access_qual !4 
  !kernel_arg_type !4 !kernel_arg_base_type !4 !kernel_arg_type_qual !4 {
  entry:
      %0 = alloca %"class._ZTSZ4mainE3$_0.anon", align 1
      %1 = bitcast %"class._ZTSZ4mainE3$_0.anon"* %0 to i8*
      call void @llvm.lifetime.start.p0i8(i64 1, i8* %1) #3
      %2 = addrspacecast %"class._ZTSZ4mainE3$_0.anon"* %0 to %"class._ZTSZ4mainE3$_0.anon" addrspace(4)*
      call spir_func void @"_ZZ4mainENK3$_0clEv"(%"class._ZTSZ4mainE3$_0.anon" addrspace(4)* %2)
      %3 = bitcast %"class._ZTSZ4mainE3$_0.anon"* %0 to i8*
      call void @llvm.lifetime.end.p0i8(i64 1, i8* %3) #3
     ret void
 }

  to

  ; Function Attrs: norecurse nounwind
      define spir_kernel void @_ZTSZ4mainE15kernel_function(%"class._ZTSZ4mainE3$_0.anon"* 
     byval(%"class._ZTSZ4mainE3$_0.anon") align 1 %_arg_kernelObject) #0 !kernel_arg_addr_space !4 
     !kernel_arg_access_qual !5 !kernel_arg_type !6 !kernel_arg_base_type !6 !kernel_arg_type_qual !7 {
    entry:
         %0 = addrspacecast %"class._ZTSZ4mainE3$_0.anon"* %_arg_kernelObject to 
         %"class._ZTSZ4mainE3$_0.anon" addrspace(4)*
        call spir_func void @"_ZZ4mainENK3$_0clEv"(%"class._ZTSZ4mainE3$_0.anon" addrspace(4)* %0)
         ret void
    }   

@Fznamznon Are those lifetime markers needed? I'm not sure what the intrinsic really does and if it's required.

Copy link
Contributor

@Fznamznon Fznamznon May 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lifetime markers are needed to specify start/stop of memory objects life time (oops I'm a captain obvious here). Lifetime markers define kinda frame outside of which the memory object is dead and has undefined value (see https://llvm.org/docs/LangRef.html#id2407 , it can explain better).
In the first LLVM IR snippet I see lifetime markers only for local kernel object clone. If we are going to remove local kernel object clone, then no need to keep lifetime markers for it :)
I expect exactly the same simplification in resulting LLVM IR which I see in your second snippet, so it is a nice progress.

@bader
Copy link
Contributor

bader commented May 17, 2020

@elizabethandrews, are you still working on this patch?

@elizabethandrews
Copy link
Contributor Author

No. We've abandoned this effort since whole object cannot be passed without breaking openCL spec

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants