-
Notifications
You must be signed in to change notification settings - Fork 2.1k
[BACKEND] Set LLVM_ABI_BREAKING_CHECKS to be able to update the llvm version #4512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…version This PR is setting LLVM_ABI_BREAKING_CHECKS=FORCE_OFF, similar of how we do it inside Google. In this way, we are forcing the structures that relies in Hashing.h to use a deterministic seed. We were not able to update the llvm version. The culprit llvm commit is llvm/llvm-project@ce80c80, that basically uses a non-deterministic seed if the previous "Flag" is set. There were some discussions in triton-lang#4338, and the other option would be to check/replace code and tests that are overly depending on the determinism of llvm/include/llvm/ADT/Hashing.h (DenseMap and DenseSet mostly). I think that the priority is first to be able to update the llvm version, but if OpenAI is interested I could work on replacing them (for MapVector), in order to fix all the failing unit tests, and then we can set the LLVM_ABI_BREAKING_CHECKS again to "WITH_ASSERTS".
7d9ccab
to
6d4ea00
Compare
What cases rely on DenseMap determinism? I thought the breaking problem was coming from a mismatch in the compilation. The failures were happening even running triton-opt without running any code.
Again I'm curious what would need to be changed |
We don't know which cases rely on DenseMap determinism. The author of the llvm-commit suggested this as a potential issue. There are many cases of DenseMap, and it also could be that the issue is not in Triton but in MLIR. It's being exposed when running triton-opt because it's missing values in a DenseMap. All the failing tests can be seen in this job. I started to analyze Conversion/divide-by-0.mlir. I can reproduce in my cloned repository (not inside google) by running: The result is strange. It's half broken and half not (references broken until 190 and then they are fine):
This seems a resizing issue that invalidates hash-keys (inserting while iterating the container??). As it was suggested in the commit from llvm, I started to replace some But setting the contant |
Well those are definitely not a problem of determinism and I don't think converting those DenseMap is the right thing to do. The examples you pointed out should stay dense_map. Anyway, is this PR ready to land? |
Yes |
This commit is fixing the root cause of the issue that appeared during an llvm update, and was raised in triton-lang#4212 (comment). The issue was temporary fixed in triton-lang#4512, where we set LLVM_ABI_BREAKING_CHECKS to FORCE_OFF constant in order to not use a non-deterministic seed inside the hashing function (included in this llvm commit: llvm/llvm-project@ce80c80). A further investigation (with chsigg@) found that the issue is that LLVM is built with clang11, while Triton use a newer version. The ABI issue is brought up here: llvm/llvm-project#96282 (comment), but the consensus seemed to be that this setup is rare. Updated the clang version to 12 in the ubuntu build fixed the issue and therefore we can revert setting LLVM_ABI_BREAKING_CHECKS constant. I am additionaly erasing LLVM_ABI_BREAKING_CHECKS in the setup of the other hardwares (it seems it was not needed) and I am splitting ubuntu and macOS configurations because it seems cleaner than having a variable that sets the compiler version for each of them.
This commit is fixing the root cause of the issue that appeared during an llvm update, and was raised in triton-lang#4212 (comment). The issue was temporary fixed in triton-lang#4512, where we set LLVM_ABI_BREAKING_CHECKS to FORCE_OFF constant in order to not use a non-deterministic seed inside the hashing function (included in this llvm commit: llvm/llvm-project@ce80c80). A further investigation (with chsigg@) found that the issue is that LLVM is built with clang11, while Triton use a newer version. The ABI issue is brought up here: llvm/llvm-project#96282 (comment), but the consensus seemed to be that this setup is rare. Updated the clang version to 12 in the ubuntu build fixed the issue and therefore we can revert setting LLVM_ABI_BREAKING_CHECKS constant. I am additionaly erasing LLVM_ABI_BREAKING_CHECKS in the setup of the other hardwares (it seems it was not needed) and I am splitting ubuntu and macOS configurations because it seems cleaner than having a variable that sets the compiler version for each of them.
This commit is fixing the root cause of the issue that appeared during an llvm update, and was raised in #4212 (comment). The issue was temporary fixed in #4512, where we set LLVM_ABI_BREAKING_CHECKS to FORCE_OFF constant in order to not use a non-deterministic seed inside the hashing function (included in this llvm commit: llvm/llvm-project@ce80c80). A further investigation (with chsigg@) found that the issue is that LLVM is built with clang11, while Triton use a newer version. The ABI issue is brought up here: llvm/llvm-project#96282 (comment), but the consensus seemed to be that this setup is rare. Updated the clang version to 12 in the ubuntu build fixed the issue and therefore we can revert setting LLVM_ABI_BREAKING_CHECKS constant. I am additionaly erasing LLVM_ABI_BREAKING_CHECKS in the setup of the other hardwares (it seems it was not needed) and I am splitting ubuntu and macOS configurations because it seems cleaner than having a variable that sets the compiler version for each of them.
The issue raised in triton-lang#4212 (comment) and temporary fixed in triton-lang#4512 has now been fixed upstream. The non-deterministic seed inside the hashing function (included in this llvm commit: llvm/llvm-project@ce80c80) was disabled for clang <= 11 for non-pic builds. This was removed again in xgupta/llvm-project@5255b81 and we can enable ABI-breaking checks again. We build LLVM with PIC. If anyone ever builds with non-PIC, they will get a compilation error and need to disable LLVM_ABI_BREAKING_CHECKS manually.
The issue raised in triton-lang#4212 (comment) and temporary fixed in triton-lang#4512 has now been fixed upstream. The non-deterministic seed inside the hashing function (included in this llvm commit: llvm/llvm-project@ce80c80) was disabled for clang <= 11 for non-pic builds. This was removed again in xgupta/llvm-project@5255b81 and we can enable ABI-breaking checks again. We build LLVM with PIC. If anyone ever builds with non-PIC, they will get a compilation error and need to disable LLVM_ABI_BREAKING_CHECKS manually.
The issue raised in #4212 (comment) and temporary fixed in #4512 has now been fixed upstream. The non-deterministic seed inside the hashing function (included in this llvm commit: llvm/llvm-project@ce80c80) was disabled for clang <= 11 for non-pic builds. This was removed again in llvm/llvm-project@def9550 and we can enable ABI-breaking checks again. We build LLVM with PIC. If anyone ever builds with non-PIC, they will get a compilation error and need to disable LLVM_ABI_BREAKING_CHECKS manually.
The issue raised in triton-lang#4212 (comment) and temporary fixed in triton-lang#4512 has now been fixed upstream. The non-deterministic seed inside the hashing function (included in this llvm commit: llvm/llvm-project@ce80c80) was disabled for clang <= 11 for non-pic builds. This was removed again in llvm/llvm-project@def9550 and we can enable ABI-breaking checks again. We build LLVM with PIC. If anyone ever builds with non-PIC, they will get a compilation error and need to disable LLVM_ABI_BREAKING_CHECKS manually.
The issue raised in triton-lang#4212 (comment) and temporary fixed in triton-lang#4512 has now been fixed upstream. The non-deterministic seed inside the hashing function (included in this llvm commit: llvm/llvm-project@ce80c80) was disabled for clang <= 11 for non-pic builds. This was removed again in llvm/llvm-project@def9550 and we can enable ABI-breaking checks again. We build LLVM with PIC. If anyone ever builds with non-PIC, they will get a compilation error and need to disable LLVM_ABI_BREAKING_CHECKS manually.
The issue raised in triton-lang#4212 (comment) and temporary fixed in triton-lang#4512 has now been fixed upstream. The non-deterministic seed inside the hashing function (included in this llvm commit: llvm/llvm-project@ce80c80) was disabled for clang <= 11 for non-pic builds. This was removed again in llvm/llvm-project@def9550 and we can enable ABI-breaking checks again. We build LLVM with PIC. If anyone ever builds with non-PIC, they will get a compilation error and need to disable LLVM_ABI_BREAKING_CHECKS manually.
…version (triton-lang#4512) This PR is setting LLVM_ABI_BREAKING_CHECKS=FORCE_OFF, similar of how we do it inside Google. In this way, we are forcing the structures that relies in Hashing.h to use a deterministic seed. We were not able to update the llvm version. The culprit llvm commit is llvm/llvm-project@ce80c80, that basically uses a non-deterministic seed if the previous "Flag" is set. There were some discussions in triton-lang#4338, and the other option would be to check/replace code and tests that are overly depending on the determinism of llvm/include/llvm/ADT/Hashing.h (DenseMap and DenseSet mostly). I think that the priority is first to be able to update the llvm version, but if OpenAI is interested I could work on replacing them (for MapVector), in order to fix all the failing unit tests, and then we can set the LLVM_ABI_BREAKING_CHECKS again to "WITH_ASSERTS". - [x] I am not making a trivial change, such as fixing a typo in a comment. - [x] I have written a PR description following these [rules](https://cbea.ms/git-commit/#why-not-how). - [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`. - [x] This PR does not need a test because it's just setting a variable for LLVM. - [x] I have not added any `lit` tests. Co-authored-by: Tori Baker <[email protected]>
The issue raised in triton-lang#4212 (comment) and temporary fixed in triton-lang#4512 has now been fixed upstream. The non-deterministic seed inside the hashing function (included in this llvm commit: llvm/llvm-project@ce80c80) was disabled for clang <= 11 for non-pic builds. This was removed again in llvm/llvm-project@def9550 and we can enable ABI-breaking checks again. We build LLVM with PIC. If anyone ever builds with non-PIC, they will get a compilation error and need to disable LLVM_ABI_BREAKING_CHECKS manually.
This PR is setting LLVM_ABI_BREAKING_CHECKS=FORCE_OFF, similar of how we do it inside Google. In this way, we are forcing the structures that relies in Hashing.h to use a deterministic seed.
We were not able to update the llvm version. The culprit llvm commit is llvm/llvm-project@ce80c80, that basically uses a non-deterministic seed if the previous "Flag" is set.
There were some discussions in
#4338, and the other option would be to check/replace code and tests that are overly depending on the determinism of llvm/include/llvm/ADT/Hashing.h (DenseMap and DenseSet mostly).
I think that the priority is first to be able to update the llvm version, but if OpenAI is interested I could work on replacing them (for MapVector), in order to fix all the failing unit tests, and then we can set the LLVM_ABI_BREAKING_CHECKS again to "WITH_ASSERTS".
rules.
pre-commit run --from-ref origin/main --to-ref HEAD
.lit
tests.