-
Notifications
You must be signed in to change notification settings - Fork 5k
Enable new exception handling on win-x86 #115957
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
/azp run runtime-coreclr outerloop |
/azp run runtime-coreclr gcstress0x3-gcstress0xc |
Azure Pipelines successfully started running 1 pipeline(s). |
1 similar comment
Azure Pipelines successfully started running 1 pipeline(s). |
|
I'm not so sure it's the same root cause. I'll look at the dumps later on. |
The JIT.opt stress test is failing when reporting GC references at the More than one value in the dump seems off, but at least it explains why I didn't see this in earlier gcstress pass. Disassembly:
Locals in EnumGCRefsX86:
GC info:
Presumably it's trying to report |
Obligatory JIT dump for the above: https://gist.githubusercontent.com/filipnavara/87489fc9b6224706111055ce1ee79583/raw/80122358359dbe95e53fd7708c59611f7024aa87/gistfile1.txt |
I'm starting to suspect that the liveness of the variable may be misreported on other platforms too. We just get away with it because the funclet prolog is marked as x86 JIT dump: https://gist.githubusercontent.com/filipnavara/87489fc9b6224706111055ce1ee79583/raw/80122358359dbe95e53fd7708c59611f7024aa87/gistfile1.txt |
It is an expected invariant that prolog/epilog are no-GC regions. Why did this issue not manifest for filter funclets in the old scheme (because filter-live variable are always pinned?)? |
Yeah, and that's fine. x86 GC info always encoded the (non-funclet) prologs/epilogs and treated them as no-GC on the VM side. Until #115630 the funclet prolog was empty on win-x86 so we didn't run into this problem because there was no code in the no-GC region. Likewise, the funclet epilog is single "ret" instruction so we don't really run into problem either. (These assumptions are not true for linux-x86 which has additional stack alignment instruction and which would have likely run into this problem if someone ever got far enough to run GC stress tests.) I did, however, expect that calling
Not sure I know the precise answer. I remember stumbling upon some specific |
Looks like we have a comment that this is indeed intentional: void CodeGen::genReserveFuncletProlog(BasicBlock* block)
{
assert(compiler->UsesFunclets());
assert(block != nullptr);
/* Currently, no registers are live on entry to the prolog, except maybe
the exception object. There might be some live stack vars, but they
cannot be accessed until after the frame pointer is re-established.
In order to potentially prevent emitting a death before the prolog
and a birth right after it, we just report it as live during the
prolog, and rely on the prolog being non-interruptible. Trust
genCodeForBBlist to correctly initialize all the sets.
We might need to relax these asserts if the VM ever starts
restoring any registers, then we could have live-in reg vars...
*/
noway_assert((gcInfo.gcRegGCrefSetCur & RBM_EXCEPTION_OBJECT) == gcInfo.gcRegGCrefSetCur);
noway_assert(gcInfo.gcRegByrefSetCur == 0);
JITDUMP("Reserving funclet prolog IG for block " FMT_BB "\n", block->bbNum);
GetEmitter()->emitCreatePlaceholderIG(IGPT_FUNCLET_PROLOG, block, gcInfo.gcVarPtrSetCur, gcInfo.gcRegGCrefSetCur,
gcInfo.gcRegByrefSetCur, false);
} |
Thanks. I adjusted emitting the no-GC regions for funclet prologs/epilogs but I will look into this at some later point to see if we can improve the codegen. We no longer establish the frame pointer in funclet prologs on any platform so the reasoning makes little sense. |
/azp run runtime-coreclr gcstress0x3-gcstress0xc |
/azp run runtime-coreclr outerloop |
Azure Pipelines successfully started running 1 pipeline(s). |
1 similar comment
Azure Pipelines successfully started running 1 pipeline(s). |
I suspect the Methodical_r1 test may be failing because this code path is not implemented for runtime/src/coreclr/vm/gcenv.ee.common.cpp Lines 320 to 350 in d896e85
It would have been taken otherwise based on the state of the variables. |
Failure:
This assert is one of the typical GC hole symptoms. It is likely to be hit anytime we miss reporting an object with a sync block. The offending object in this case is a marshalled delegate. Marshalled delegates have syncblocks.
It likely means that the object was collected, but the space was not reused by the GC for a new object yet. We likely missed reporting the slot in some earlier GC but reporting it again. The callstack looks like this: BestFitMapping!ILStubClass.IL_STUB_PInvoke -> native code -> BestFitMapping!ILStubClass.IL_STUB_ReversePInvoke
Let's take a look what
This GC was triggered in the finally funclet inside |
BestFitMapping was fixed by b9cd5ac. Could you rerun gcstress? |
This was a fix for Interop/COM/NETClients/IDispatch/NETClientIDispatch/NETClientIDispatch. I do not think BestFitMapping is fixed. |
/azp run runtime-coreclr gcstress0x3-gcstress0xc |
Azure Pipelines successfully started running 1 pipeline(s). |
I have a fix locally for Methodical_r1.0.1. I'll push it after going through more tests and once the current GCStress run finishes. UPD: Hmm, still needs some refining. UPD2: As usual, the refining consisted of actually saving a file before starting the compilation... (ac4373f) |
/azp run runtime-coreclr gcstress0x3-gcstress0xc |
Azure Pipelines successfully started running 1 pipeline(s). |
src/coreclr/vm/gcenv.ee.common.cpp
Outdated
if (pCF->ShouldParentToFuncletUseUnwindTargetLocationForGCReporting()) | ||
{ | ||
bool wantsReportOnlyLeaf = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This is always set by RyuJIT. We can delete all places that check it in the runtime and change this
runtime/src/coreclr/vm/gcinfodecoder.cpp
Line 675 in 8115429
#ifdef TARGET_AMD64 |
to #if defined(TARGET_AMD64) && defined(DECODE_OLD_FORMATS)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds reasonable. It will likely need some documentation/comments update too.
The BestFitMapping fails in like every 6th try on my machine when running in isolation (w/ GCStress=0xC and TieredCompilation=0). I am trying to capture a time travel trace to investigate it but didn't have luck so far. UPD: Got lucky. UPD2: UPD3: I found what is the difference in behavior between x86 and x64. Need to run full battery of tests before I change it. |
The issue with
|
Hmm, this does actually make a difference: --- a/src/coreclr/vm/stackwalk.cpp
+++ b/src/coreclr/vm/stackwalk.cpp
@@ -3153,9 +3153,19 @@ void StackFrameIterator::PostProcessingForManagedFrames(void)
#endif // ELIMINATE_FEF
#ifdef TARGET_X86
+#ifdef FEATURE_EH_FUNCLETS
+ bool hasReversePInvoke = false;
+ if (!m_crawl.codeInfo.IsFunclet())
+ {
+ hdrInfo *gcHdrInfo;
+ m_crawl.codeInfo.DecodeGCHdrInfo(&gcHdrInfo);
+ hasReversePInvoke = gcHdrInfo->revPInvokeOffset != INVALID_REV_PINVOKE_OFFSET;
+ }
+#else
hdrInfo *gcHdrInfo;
m_crawl.codeInfo.DecodeGCHdrInfo(&gcHdrInfo);
bool hasReversePInvoke = gcHdrInfo->revPInvokeOffset != INVALID_REV_PINVOKE_OFFSET;
+#endif // FEATURE_EH_FUNCLETS
#endif // TARGET_X86
ProcessIp(GetControlPC(m_crawl.pRD)); ...but I still get a GC hole. In fact, I get it even more deterministically. |
I think this time it's the environment variable |
I think it's ready for another pass. |
/azp run runtime-coreclr outerloop |
/azp run runtime-coreclr gcstress0x3-gcstress0xc |
Azure Pipelines successfully started running 1 pipeline(s). |
1 similar comment
Azure Pipelines successfully started running 1 pipeline(s). |
We finally have a clean x86 run of the tests 🥳 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for making this happen! Great work!
/ba-g infrastructure and known issues |
Thanks for the help diagnosing the issues, and thanks everyone for helping to make this happen! |
Switches JIT to the funclet model that is used by other platforms and VM to use the new exception handling introduced in .NET 9.
Fixes #113985