spa: clear checkpoint information during retry #17319

oshogbo · 2025-05-09T09:10:59Z

Motivation and Context

During spa loading, the spa_add() function allocates the spa_t
structure. The problematic rewind logic starts in
the spa_load_best() function:

static int
spa_load_best(spa_t *spa, spa_load_state_t state, uint64_t max_request,
    int rewind_flags)
{
        // ...

        load_error = rewind_error = spa_load(spa, state, SPA_IMPORT_EXISTING);
        if (load_error == 0)
                return (0);
        if (load_error == ZFS_ERR_NO_CHECKPOINT) {
                //
                return (load_error);
        }

        // ...
        while (rewind_error && spa->spa_uberblock.ub_txg >= min_txg &&
            spa->spa_uberblock.ub_txg <= spa->spa_load_max_txg) {
                if (spa->spa_load_max_txg < safe_rewind_txg)
                        spa->spa_extreme_rewind = B_TRUE;
                rewind_error = spa_load_retry(spa, state);
        }
        // ...
}

spa_load() is called, which is a wrapper for spa_load_impl().
One of the first actions in that function is to call
spa_ld_read_checkpoint_txg().
If a checkpoint exists, this function sets spa->spa_checkpoint_txg.

However, during spa_load_impl(), many steps may fail after reading
the checkpoint, such as reading feature flags or MOS directories.
If one of these steps fails, the function returns an error.

If the failure is something other than ZFS_ERR_NO_CHECKPOINT,
a rewind is attempted in the loop shown above. But because
the checkpoint had already been read (and spa_checkpoint_txg
was set), and nothing clears this state on failure,
another call to spa_ld_read_checkpoint_txg() during rewind will
casue assertion:

static int
spa_ld_read_checkpoint_txg(spa_t *spa)
{
        //...

        ASSERT0(spa->spa_checkpoint_txg);
}

And we might end with crash like:

[ 6543.510793] VERIFY0(spa->spa_checkpoint_txg) failed (0 == 15)
  [ 6543.511146] PANIC at spa.c:5224:spa_ld_read_checkpoint_txg()
  [ 6543.511407] Showing stack for process 450801
  [ 6543.512011] CPU: 0 PID: 450801 Comm: zpool Kdump: loaded Tainted: P           OE     -------  ---  5.14.0-503.35.1.el9_5.x86_64 #1
  [ 6543.512296] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
  [ 6543.512736] Call Trace:
  [ 6543.513202]  <TASK>
  [ 6543.513683]  dump_stack_lvl+0x34/0x48
  [ 6543.515313]  spl_panic+0xd1/0xe9 [spl]
  [ 6543.516916]  ? allocate_cgrp_cset_links+0x89/0xa0
  [ 6543.520874]  ? spl_kmem_alloc_impl+0xb0/0xd0 [spl]
  [ 6543.521241]  ? spl_kmem_alloc_impl+0xb0/0xd0 [spl]
  [ 6543.521597]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 6543.522182]  ? __kmalloc_node+0x4e/0x140
  [ 6543.522598]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 6543.522856]  ? spl_kmem_alloc_impl+0xb0/0xd0 [spl]
  [ 6543.523118]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 6543.523358]  ? __list_add+0x12/0x30 [spl]
  [ 6543.523650]  ? __dprintf+0x120/0x190 [zfs]
  [ 6543.539257]  spa_ld_read_checkpoint_txg+0x194/0x1d0 [zfs]
  [ 6543.539724]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 6543.539858]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 6543.539988]  ? spa_import_progress_set_notes_impl+0x103/0x200 [zfs]
  [ 6543.540410]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 6543.540569]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 6543.540709]  ? spa_import_progress_set_notes+0x5b/0x80 [zfs]
  [ 6543.541124]  spa_load_impl.constprop.0+0x10d/0x720 [zfs]
  [ 6543.541558]  spa_load+0x76/0x140 [zfs]
  [ 6543.542288]  spa_load_best+0x138/0x2c0 [zfs]
  [ 6543.542930]  spa_import+0x28a/0x780 [zfs]
  [ 6543.543526]  ? free_unref_page+0xf2/0x130
  [ 6543.543762]  zfs_ioc_pool_import+0x140/0x160 [zfs]
  [ 6543.544348]  zfsdev_ioctl_common+0x690/0x760 [zfs]
  [ 6543.544959]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 6543.545230]  ? _copy_from_user+0x27/0x60
  [ 6543.545593]  zfsdev_ioctl+0x53/0xe0 [zfs]
  [ 6543.546157]  __x64_sys_ioctl+0x8a/0xc0
  [ 6543.546526]  do_syscall_64+0x5f/0xf0
  [ 6543.546826]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 6543.547096]  ? exc_page_fault+0x62/0x150
  [ 6543.547411]  entry_SYSCALL_64_after_hwframe+0x78/0x80
  [ 6543.547966] RIP: 0033:0x7ff05670313b
  [ 6543.550135] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ad 4c 0f 00 f7 d8 64 89 01 48
  [ 6543.550491] RSP: 002b:00007ffe5bd417f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  [ 6543.550779] RAX: ffffffffffffffda RBX: 000056329756ce10 RCX: 00007ff05670313b
  [ 6543.551046] RDX: 00007ffe5bd42960 RSI: 0000000000005a02 RDI: 0000000000000003
  [ 6543.551290] RBP: 00007ffe5bd45f60 R08: 0000000000000003 R09: 0000000000000000
  [ 6543.551529] R10: 0000000010000000 R11: 0000000000000246 R12: 00007ffe5bd41960
  [ 6543.551779] R13: 000056329755c2e0 R14: 00007ffe5bd42960 R15: 00007ff050002da8
  [ 6543.552096]  </TASK>

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Description

There are several potential ways to address this issue:

Clear spa_checkpoint_txg during retry (proposed in this PR).
Remove the assertion in spa_ld_read_checkpoint_txg().
Perform broader spa_t structure cleanup before retry.

How Has This Been Tested?

I have hit this error during work on #16853. In that PR, we add one
additional TXG during export, which seems to trigger this loop during
import. This was triggered by the checkpoint_zhack_feat test.
However, I don't think that this bug is limited to this PR, and it's a
more generic problem that users can hit.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the OpenZFS code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
I have run the ZFS Test Suite with this change applied.
All commit messages are properly formatted and contain Signed-off-by.

amotin · 2025-05-09T13:48:21Z

Shouldn't it be done in spa_unload() for symmetry? spa_load_retry() seems a wrong place for it.

oshogbo · 2025-05-09T13:52:55Z

@amotin Sounds like a better place.
Thanks!

Signed-off-by: Mariusz Zaborski <[email protected]> Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc.

Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Mariusz Zaborski <[email protected]> Closes openzfs#17319 (cherry picked from commit 8b9c4e6)

Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Mariusz Zaborski <[email protected]> Closes #17319 (cherry picked from commit 8b9c4e6)

oshogbo force-pushed the oshogbo/checkpoint branch from 6b70304 to 8a6daa3 Compare May 9, 2025 13:52

spa: clear checkpoint information during retry

8a6daa3

Signed-off-by: Mariusz Zaborski <[email protected]> Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc.

amotin approved these changes May 9, 2025

View reviewed changes

amotin added Status: Code Review Needed Ready for review and testing Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels May 9, 2025

amotin merged commit 8b9c4e6 into openzfs:master May 11, 2025
23 of 24 checks passed

robn mentioned this pull request May 23, 2025

2.3.3 staging prep #17371

Closed

14 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

spa: clear checkpoint information during retry #17319

spa: clear checkpoint information during retry #17319

Uh oh!

oshogbo commented May 9, 2025 •

edited

Loading

Uh oh!

amotin commented May 9, 2025

Uh oh!

oshogbo commented May 9, 2025

Uh oh!

Uh oh!

Uh oh!

spa: clear checkpoint information during retry #17319

spa: clear checkpoint information during retry #17319

Uh oh!

Conversation

oshogbo commented May 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation and Context

Description

How Has This Been Tested?

Types of changes

Checklist:

Uh oh!

amotin commented May 9, 2025

Uh oh!

oshogbo commented May 9, 2025

Uh oh!

Uh oh!

Uh oh!

oshogbo commented May 9, 2025 •

edited

Loading