Description
Describe the bug
As suggested in #14785, we are about to deprecate the parallel_for and single_task overloads from the sycl_ext_oneapi_kernel_properties extension in favor of the alternative interface provided by the sycl_ext_oneapi_enqueue_functions extension. With this new interface, a user who wants to specify kernel properties for a kernel must use a named function object (kernel functor) that exposes the properties via get(sycl::ext::oneapi::experimental::properties_tag), instead of a kernel lambda. (See the note in this doc: https://github.com/intel/llvm/blob/974aec94af2ab81014895cf961895b5d2c06fc29/sycl/doc/extensions/experimental/sycl_ext_oneapi_enqueue_functions.asciidoc)
However, this method does not appear to set the sycl::ext::oneapi::experimental::use_root_sync property correctly. This causes UR errors when the kernel runs statements that depend on the property, such as:
auto root = it.ext_oneapi_get_root_group();
sycl::group_barrier(root);
To reproduce
Here are two programs for contrast: the first fails and the second passes.
Script with kernel functor (fails):
#include <cstdlib>
#include <type_traits>
#include <sycl/builtins.hpp>
#include <sycl/detail/core.hpp>
#include <sycl/ext/oneapi/experimental/root_group.hpp>
#include <sycl/group_barrier.hpp>
#include <sycl/kernel_bundle.hpp>
struct RootGroupKernel {
  RootGroupKernel() {}
  void operator()(sycl::nd_item<1> it) const {
    auto root = it.ext_oneapi_get_root_group();
    sycl::group_barrier(root);
  }
  auto get(sycl::ext::oneapi::experimental::properties_tag) {
    return sycl::ext::oneapi::experimental::properties{sycl::ext::oneapi::experimental::use_root_sync};
  }
};
int main() {
  sycl::queue q;
  sycl::range<1> R1{1};
  sycl::nd_range<1> NDR1{R1, R1};
  q.submit([&](sycl::handler &h) {
    h.parallel_for(NDR1, RootGroupKernel());
  });
  return EXIT_SUCCESS;
}
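One detail that may be worth ruling out while triaging: in the functor above, get() is not const-qualified, whereas operator() is. Whether the runtime looks up get() through a const kernel object is an assumption here and is not confirmed by anything in this report. The standalone sketch below uses no SYCL headers; properties_tag, fake_properties, has_const_property_getter and root_sync_requested are all hypothetical stand-ins. It only illustrates how a detection written against a const object would silently miss a non-const getter.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in for sycl::ext::oneapi::experimental::properties_tag.
struct properties_tag {};

// Hypothetical stand-in for a property list; carries a single flag.
struct fake_properties {
  bool use_root_sync;
};

// Detects whether T has get(properties_tag) callable on a *const* object.
template <typename T, typename = void>
struct has_const_property_getter : std::false_type {};

template <typename T>
struct has_const_property_getter<
    T, std::void_t<decltype(std::declval<const T &>().get(properties_tag{}))>>
    : std::true_type {};

// Mirrors the functor in the failing program: get() is not const-qualified.
struct NonConstGetKernel {
  constexpr auto get(properties_tag) { return fake_properties{true}; }
};

// Same functor, but with a const-qualified get().
struct ConstGetKernel {
  constexpr auto get(properties_tag) const { return fake_properties{true}; }
};

// Models a consumer that only sees the kernel through a const reference
// (an assumption about the runtime, not a statement about its real code).
template <typename KernelT>
constexpr bool root_sync_requested(const KernelT &k) {
  if constexpr (has_const_property_getter<KernelT>::value)
    return k.get(properties_tag{}).use_root_sync;
  else
    return false; // getter not visible on a const object: property dropped
}

static_assert(!has_const_property_getter<NonConstGetKernel>::value,
              "non-const get() is not callable on a const kernel object");
static_assert(has_const_property_getter<ConstGetKernel>::value,
              "const get() is callable on a const kernel object");
```

If this hypothesis applies, adding const to get() in the functor would change the outcome of the failing program; if it does not, the qualifier can be excluded as a cause.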
Script with kernel lambda (passes; it fails if props is removed from the parallel_for() argument list, which shows that the property is what matters here):
#include <cassert>
#include <cstdlib>
#include <type_traits>
#include <sycl/builtins.hpp>
#include <sycl/detail/core.hpp>
#include <sycl/ext/oneapi/experimental/root_group.hpp>
#include <sycl/group_barrier.hpp>
#include <sycl/kernel_bundle.hpp>
int main() {
  sycl::queue q;
  sycl::range<1> R1{1};
  sycl::nd_range<1> NDR1{R1, R1};
  const auto props = sycl::ext::oneapi::experimental::properties{sycl::ext::oneapi::experimental::use_root_sync};
  q.submit([&](sycl::handler &h) {
    h.parallel_for(NDR1, props, [=](sycl::nd_item<1> it) {
      auto root = it.ext_oneapi_get_root_group();
      sycl::group_barrier(root);
    });
  });
  return EXIT_SUCCESS;
}
Environment
The issue occurs on Intel/GEN12 (and possibly on NVIDIA/CUDA, which is unconfirmed). AMD/HIP does not appear to be affected.