Commit 74a6281

[SYCL][CUDA] Enable memcpy optimizations for NVPTX (#18598)
The NVPTX backend doesn't support libcalls but does support the memset and memcpy intrinsics, so passing -enable-memcpyopt-without-libcalls enables the optimizations that use these intrinsics. The same flag was added for the CUDA path in https://reviews.llvm.org/D106401.
1 parent 5ac8577 commit 74a6281
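
To make the commit message concrete, here is a minimal sketch of the kind of rewrite a memcpy/memset optimization can perform: a run of adjacent stores of the same byte value is equivalent to a single memset. The struct and function names below (Block, clear_by_stores, clear_by_memset, forms_agree) are hypothetical and do not come from this commit, and whether the optimizer actually merges such stores for a given input depends on its own profitability heuristics. The code only demonstrates that the two forms produce the same result.

```cpp
#include <cstring>

// Hypothetical 16-byte buffer, used only for this illustration.
struct Block {
  unsigned char bytes[16];
};

// Form 1: adjacent single-byte stores of the same value. A pass that is
// allowed to emit the memset intrinsic may merge a run like this into
// one memset.
void clear_by_stores(Block &b) {
  b.bytes[0] = 0;  b.bytes[1] = 0;  b.bytes[2] = 0;  b.bytes[3] = 0;
  b.bytes[4] = 0;  b.bytes[5] = 0;  b.bytes[6] = 0;  b.bytes[7] = 0;
  b.bytes[8] = 0;  b.bytes[9] = 0;  b.bytes[10] = 0; b.bytes[11] = 0;
  b.bytes[12] = 0; b.bytes[13] = 0; b.bytes[14] = 0; b.bytes[15] = 0;
}

// Form 2: the single memset that the stores above are equivalent to.
void clear_by_memset(Block &b) {
  std::memset(b.bytes, 0, sizeof(b.bytes));
}

// Returns true when both forms leave a pre-filled buffer in the same state.
bool forms_agree() {
  Block a, b;
  std::memset(a.bytes, 0xAB, sizeof(a.bytes));
  std::memset(b.bytes, 0xAB, sizeof(b.bytes));
  clear_by_stores(a);
  clear_by_memset(b);
  return std::memcmp(a.bytes, b.bytes, sizeof(a.bytes)) == 0;
}
```

On a target where the memset and memcpy library functions are unavailable, a pass that checks for those functions would normally skip this rewrite. The flag in this commit tells it that emitting the intrinsics is still acceptable.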

2 files changed: +7 -0 lines changed


clang/lib/Driver/ToolChains/Cuda.cpp

Lines changed: 2 additions & 0 deletions
@@ -966,6 +966,8 @@ void CudaToolChain::addClangTargetOptions(
     if (FastRelaxedMath || UnsafeMathOpt)
       CC1Args.append({"-mllvm", "--nvptx-prec-divf32=0", "-mllvm",
                       "--nvptx-prec-sqrtf32=0"});
+
+    CC1Args.append({"-mllvm", "-enable-memcpyopt-without-libcalls"});
   } else {
     CC1Args.append({"-fcuda-is-device", "-mllvm",
                     "-enable-memcpyopt-without-libcalls",
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+// RUN: %clang -### -nocudalib \
+// RUN:   -fsycl -fsycl-targets=nvptx64-nvidia-cuda %s 2>&1 \
+// RUN:   | FileCheck --check-prefix=CHECK-DEFAULT %s
+
+// CHECK-DEFAULT: "-enable-memcpyopt-without-libcalls"
