-
Notifications
You must be signed in to change notification settings - Fork 769
[SYCL][E2E] Fix XFAIL
statements that use negated device features
#18641
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
# Set this to empty so internal lit code won't change our result if it incorrectly | ||
# thinks the test should XFAIL. This can happen when our XFAIL condition relies on | ||
# device features, since the internal lit code doesn't have knowledge of these. | ||
test.xfails = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry so which features are known by the general lit infra and which are known only by our custom code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The features in test.config.available_features
are known by the general lit infra, the device specific features which are in test.config.sycl_dev_features
are not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we add the dev features to available_features
before we build/run for each target and remove it after?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the issue would be in situations with multiple devices where a single device might be marked as XFAIL, while the rest pass. If we add all the device features to available_features
and let the lit infra do their own XFAIL logic, then this test would be expected to XFAIL, while in reality we expect it to pass, while the XFAIL devices are skipped.
The reason I did it this way was mostly to avoid doing this redundant second XFAIL check.
Also in the case that we change the logic for XFAILs in our code to properly account for running on multiple devices, it is likely that we would want this internal XFAIL check turned off in favor of our own logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does the xfail check only happen once in the lit infra with multiple devices?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the internal lit infra XFAIL check only happens once after we return the result of our test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and the result contains all device runs? sorry just making sure we can't use the lit infra
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah its a single result for the amalgamation of all the device runs.
i.e.,
PASS -> if all device pass (Some but not all devices may be xfail/unsupported, those are skipped)
FAIL -> if a single non xfail/unsupported device fails
XFAIL/XPASS -> can't happen afaik when running a test on multiple devices
UNSUPPORTED -> if all devices are unsupported, or xfailed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it, thanks
// XFAIL: !arch-intel_gpu_bmg_g21 | ||
// XFAIL: * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually fails on BMG, but this statement was an example of one that was being incorrectly processed and marked as XFAIL on BMG.
https://github.com/intel/llvm/actions/runs/15194270458/job/42735337999
# Set this to empty so internal lit code won't change our result if it incorrectly | ||
# thinks the test should XFAIL. This can happen when our XFAIL condition relies on | ||
# device features, since the internal lit code doesn't have knowledge of these. | ||
test.xfails = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does our outer lit code handle XFAILs for both device specific features and normal features?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep the part just above this
Lines 391 to 395 in e5d5382
if len(triples) == 1 and test.config.test_mode == "build-only": | |
result.code = map_result(test.config.available_features, result.code) | |
if len(devices_for_test) == 1: | |
device = devices_for_test[0] | |
result.code = map_result(test.config.sycl_dev_features[device], result.code) |
For build only if the test is only ran for a single triple we only use the normal features, and on full/run-only if the test is only ran for one device we use test.config.sycl_dev_features[device]
which contains both the normal features as well as the device features.
Because of device specific features we have to perform our own checks to see whether or not a test should XFAIL. However, a similar check is also done in internal lit code, but due to the fact that this bit of code only knows about the non device specific features there can be situations where it incorrectly reports a test as expected to fail (Like negating a device specific feature, like
XFAIL: !cpu
).To avoid this we set the XFAIL conditions to an empty list before returning our result, this way
expected_to_fail
is always false in the code below, and our result is not altered.llvm/llvm/utils/lit/lit/Test.py
Lines 275 to 292 in d60cd27