Skip to content

[SYCL][E2E] Fix XFAIL statements that use negated device features #18641

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 22, 2025

Conversation

ayylol
Copy link
Contributor

@ayylol ayylol commented May 22, 2025

Because of device specific features we have to perform our own checks to see whether or not a test should XFAIL. However, a similar check is also done in internal lit code, but due to the fact that this bit of code only knows about the non device specific features there can be situations where it incorrectly reports a test as expected to fail (Like negating a device specific feature, like XFAIL: !cpu).

To avoid this we set the XFAIL conditions to an empty list before returning our result, this way expected_to_fail is always false in the code below, and our result is not altered.

def setResult(self, result):
assert self.result is None, "result already set"
assert isinstance(result, Result), "unexpected result type"
try:
expected_to_fail = self.isExpectedToFail()
except ValueError as err:
# Syntax error in an XFAIL line.
result.code = UNRESOLVED
result.output = str(err)
else:
if expected_to_fail:
# pass -> unexpected pass
if result.code is PASS:
result.code = XPASS
# fail -> expected fail
elif result.code is FAIL:
result.code = XFAIL
self.result = result

@ayylol ayylol requested a review from sarnex May 22, 2025 18:37
@ayylol ayylol requested a review from a team as a code owner May 22, 2025 18:37
@ayylol ayylol temporarily deployed to WindowsCILock May 22, 2025 18:37 — with GitHub Actions Inactive
@ayylol ayylol temporarily deployed to WindowsCILock May 22, 2025 18:58 — with GitHub Actions Inactive
# Set this to empty so internal lit code won't change our result if it incorrectly
# thinks the test should XFAIL. This can happen when our XFAIL condition relies on
# device features, since the internal lit code doesn't have knowledge of these.
test.xfails = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry so which features are known by the general lit infra and which are known only by our custom code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The features in test.config.available_features are known by the general lit infra, the device specific features which are in test.config.sycl_dev_features are not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we add the dev features to available_features before we build/run for each target and remove it after?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the issue would be in situations with multiple devices where a single device might be marked as XFAIL, while the rest pass. If we add all the device features to available_features and let the lit infra do their own XFAIL logic, then this test would be expected to XFAIL, while in reality we expect it to pass, while the XFAIL devices are skipped.

The reason I did it this way was mostly to avoid doing this redundant second XFAIL check.

Also in the case that we change the logic for XFAILs in our code to properly account for running on multiple devices, it is likely that we would want this internal XFAIL check turned off in favor of our own logic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the xfail check only happen once in the lit infra with multiple devices?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the internal lit infra XFAIL check only happens once after we return the result of our test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and the result contains all device runs? sorry just making sure we can't use the lit infra

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah its a single result for the amalgamation of all the device runs.

i.e.,
PASS -> if all device pass (Some but not all devices may be xfail/unsupported, those are skipped)
FAIL -> if a single non xfail/unsupported device fails
XFAIL/XPASS -> can't happen afaik when running a test on multiple devices
UNSUPPORTED -> if all devices are unsupported, or xfailed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, thanks

@ayylol ayylol temporarily deployed to WindowsCILock May 22, 2025 19:55 — with GitHub Actions Inactive
// XFAIL: !arch-intel_gpu_bmg_g21
// XFAIL: *
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually fails on BMG, but this statement was an example of one that was being incorrectly processed and marked as XFAIL on BMG.

https://github.com/intel/llvm/actions/runs/15194270458/job/42735337999

@ayylol ayylol temporarily deployed to WindowsCILock May 22, 2025 20:17 — with GitHub Actions Inactive
@ayylol ayylol temporarily deployed to WindowsCILock May 22, 2025 20:17 — with GitHub Actions Inactive
# Set this to empty so internal lit code won't change our result if it incorrectly
# thinks the test should XFAIL. This can happen when our XFAIL condition relies on
# device features, since the internal lit code doesn't have knowledge of these.
test.xfails = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does our outer lit code handle XFAILs for both device specific features and normal features?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep the part just above this

if len(triples) == 1 and test.config.test_mode == "build-only":
result.code = map_result(test.config.available_features, result.code)
if len(devices_for_test) == 1:
device = devices_for_test[0]
result.code = map_result(test.config.sycl_dev_features[device], result.code)

For build only if the test is only ran for a single triple we only use the normal features, and on full/run-only if the test is only ran for one device we use test.config.sycl_dev_features[device] which contains both the normal features as well as the device features.

@sarnex sarnex merged commit 1e62398 into intel:sycl May 22, 2025
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants