Kernel property use_root_sync not specified correctly for kernel functors #16451

Closed
@HPS-1

Describe the bug

As suggested in #14785, we are about to deprecate the parallel_for and single_task overloads from the sycl_ext_oneapi_kernel_properties extension in favor of the alternative interface provided by the sycl_ext_oneapi_enqueue_functions extension. With the new interface, a user who wants to specify kernel properties must use a named function object (kernel functor) that exposes the properties via get(sycl::ext::oneapi::experimental::properties_tag), rather than a kernel lambda. (See the note in this doc: https://github.com/intel/llvm/blob/974aec94af2ab81014895cf961895b5d2c06fc29/sycl/doc/extensions/experimental/sycl_ext_oneapi_enqueue_functions.asciidoc)
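To illustrate the idea of a functor exposing properties through a tagged get() member, here is a minimal sketch in plain C++ with no SYCL dependency. Every name in it (properties_tag, properties, use_root_sync_key, has_properties_getter, wants_root_sync, and the two functors) is a hypothetical stand-in written for this example. It is not the SYCL implementation and makes no claim about how the runtime actually probes kernels.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-ins for illustration only; these are NOT the SYCL types.
struct properties_tag {};
struct use_root_sync_key {};
template <typename... Props> struct properties {};

// Detects whether T has a member get(properties_tag).
template <typename T, typename = void>
struct has_properties_getter : std::false_type {};

template <typename T>
struct has_properties_getter<
    T, std::void_t<decltype(std::declval<T &>().get(properties_tag{}))>>
    : std::true_type {};

// Checks whether a properties<...> list contains a given key type.
template <typename Key, typename Props>
struct contains_key : std::false_type {};

template <typename Key, typename... Ps>
struct contains_key<Key, properties<Ps...>>
    : std::disjunction<std::is_same<Key, Ps>...> {};

// Returns true only if Kernel exposes get(properties_tag) and the returned
// property list includes use_root_sync_key.
template <typename Kernel>
constexpr bool wants_root_sync() {
  if constexpr (has_properties_getter<Kernel>::value) {
    using Props = decltype(std::declval<Kernel &>().get(properties_tag{}));
    return contains_key<use_root_sync_key, Props>::value;
  } else {
    return false;
  }
}

// A functor that advertises the (stand-in) root-sync property.
struct FunctorWithProps {
  void operator()(int) const {}
  auto get(properties_tag) const { return properties<use_root_sync_key>{}; }
};

// A functor with no get(); a lambda would look like this to the trait as well.
struct FunctorWithoutProps {
  void operator()(int) const {}
};

static_assert(wants_root_sync<FunctorWithProps>(),
              "property should be visible through get()");
static_assert(!wants_root_sync<FunctorWithoutProps>(),
              "no get() means no properties");
```

A lambda's closure type has no get() member, so a trait like this one would report no properties for it. That is why the new interface asks for a named function object when properties are needed.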

However, this method does not appear to set the property sycl::ext::oneapi::experimental::use_root_sync correctly. This causes UR errors if the kernel executes statements such as:

auto root = it.ext_oneapi_get_root_group();
sycl::group_barrier(root);

To reproduce

Here are two scripts for contrast: the first fails and the second passes.
Script with kernel functor (fails):

#include <cstdlib>
#include <type_traits>
#include <sycl/builtins.hpp>
#include <sycl/detail/core.hpp>
#include <sycl/ext/oneapi/experimental/root_group.hpp>
#include <sycl/group_barrier.hpp>
#include <sycl/kernel_bundle.hpp>
struct RootGroupKernel {
  RootGroupKernel(){}
  void operator()(sycl::nd_item<1> it) const {
    auto root = it.ext_oneapi_get_root_group();
    sycl::group_barrier(root);
  }
  auto get(sycl::ext::oneapi::experimental::properties_tag) {
    return sycl::ext::oneapi::experimental::properties{sycl::ext::oneapi::experimental::use_root_sync};
  }
};
int main() {
  sycl::queue q;
  sycl::range<1> R1{1};
  sycl::nd_range<1> NDR1{R1, R1};
  q.submit([&](sycl::handler &h) {
    h.parallel_for(NDR1, RootGroupKernel());
  });
  return EXIT_SUCCESS;
}

Script with kernel lambda (passes; if you remove props from the parallel_for() argument list it fails, which shows that the property is what matters here):

#include <cassert>
#include <cstdlib>
#include <type_traits>
#include <sycl/builtins.hpp>
#include <sycl/detail/core.hpp>
#include <sycl/ext/oneapi/experimental/root_group.hpp>
#include <sycl/group_barrier.hpp>
#include <sycl/kernel_bundle.hpp>
int main() {
  sycl::queue q;
  sycl::range<1> R1{1};
  sycl::nd_range<1> NDR1{R1, R1};
  const auto props = sycl::ext::oneapi::experimental::properties{sycl::ext::oneapi::experimental::use_root_sync};
  q.submit([&](sycl::handler &h) {
    h.parallel_for(NDR1, props, [=](sycl::nd_item<1> it) {
      auto root = it.ext_oneapi_get_root_group();
      sycl::group_barrier(root);
    });
  });
  return EXIT_SUCCESS;
}

Environment

The issue occurs in Intel/GEN12 environments (and possibly NVIDIA/CUDA). The AMD/HIP environment does not seem to be affected.

Additional context

No response
