### Description
### System information

Type | Version/Name
--- | ---
Distribution Name | Ubuntu
Distribution Version | 20.04
Kernel Version | 5.15.0-67
Architecture | x86_64
OpenZFS Version | zfs-2.2.99
### Describe the problem you're observing
After upgrading a system which did not have the `head_errlog` feature (introduced in 0409d33), and which had an existing scrub I/O error recorded, to a version which did have this feature, the system entered a panic loop during a scrub.

When a pool transitions to a state where the `head_errlog` feature is enabled, any persistent errlog state is converted to a new format. This involves deleting the old persistent errlog objects (`errlog_last` and `errlog_scrub`) and replacing them with new ones. While the in-memory state (`spa_errlog_last` and `spa_errlog_scrub`) is updated, the persistent pointers to these objects in the MOS directory object are not. Those pointers are currently only updated in `spa_errlog_sync()` when it is called during the sync phase (and there is "something to do").
### Describe how to reproduce the problem
The following sequence will reproduce the issue:
Start with a pool which does not have the `head_errlog` feature (this was introduced in release 6.0.15.0).

```shell
$ zpool create -o feature@head_errlog=disabled test_pool sdb
```
A data error occurs in the pool at some point and is discovered via a scrub. This results in the `errlog_last` entry in the pool's MOS directory having a non-zero value (the object ID of a ZAP object that records the error block). Note that on pool import the entries from the MOS directory object are used to populate fields in the spa (e.g. `errlog_last` is used to initialize `spa_errlog_last`).
```shell
$ dd if=/dev/urandom of=/test_pool/file bs=128K count=1000
$ zinject -b 36:2:0:0:1 test_pool
$ zpool scrub test_pool
$ sudo zdb -dddd test_pool 1
```
```
Dataset mos [META], ID 0, cr_txg 4, 141K, 61 objects, rootbp DVA[0]=<0:c0016200:200> DVA[1]=<0:e0016200:200> DVA[2]=<0:100022800:200> [L0 DMU objset] fletcher4 lz4 unencrypted LE contiguous unique triple size=1000L/200P birth=89L/89P fill=61 cksum=0000000961588255:000003cffd0c0ccd:0000c9dbf729c1f0:001c559859613442

    Object  lvl  iblk  dblk  dsize  dnsize  lsize  %full  type
         1    1  128K   16K  13.5K     512    32K  100.00 object directory
        dnode flags: USED_BYTES
        dnode maxblkid: 1
        Fat ZAP stats:
                Pointer table:
                        1024 elements
                        zt_blk: 0
                        zt_numblks: 0
                        zt_shift: 10
                        zt_blks_copied: 0
                        zt_nextblk: 0
                ZAP entries: 19
                Leaf blocks: 1
                Total blocks: 2
                zap_block_type: 0x8000000000000001
                zap_magic: 0x2f52ab2ab
                zap_salt: 0x1df1dddb
                Leafs with 2^n pointers:
                        9:      1 *
                Blocks with n*5 entries:
                        3:      1 *
                Blocks n/10 full:
                        2:      1 *
                Entries with n chunks:
                        3:     13 *************
                        4:      3 ***
                        5:      1 *
                        6:      1 *
                        7:      0
                        8:      0
                        9:      1 *
                Buckets with n entries:
                        0:    494 ****************************************
                        1:     17 **
                        2:      1 *

                creation_version = 5000
                feature_descriptions = 53
                sync_bplist = 62
                org.illumos:checksum_salt = 8b469083295342722ca32f7963150eacd1153da3fa0471e12d59130d73b853d2
                scan = 1 2 0 0 49 1 49 1694731576 1694731576 131278336 131278336 44544 0 1 1 3 0 0 0 -1 0 0 0 0
                free_bpobj = 41
                feature_enabled_txg = 63
                deflate = 1
                errlog_scrub = 0
                root_dataset = 32
                com.delphix:log_spacemap_zap = 69
                config = 61
                error_scrub = 0 0 0 0 0 0 0 0 0
                history = 60
                features_for_write = 52
                features_for_read = 51
                com.delphix:bookmark_v2_recalculated = 1
                com.delphix:vdev_zap_map = 64
                errlog_last = 89
```
Upgrade the pool to a later version (e.g., 7.0.0.0.0). This causes `spa_errlog_upgrade()` to be called, which transfers the entries from the old object and deletes it (replacing the value of `spa_errlog_last`). Note that this function does not update the MOS directory object, so the directory continues to reference the old (now deleted) object.
```shell
$ zpool set feature@head_errlog=enabled test_pool
$ sdb
sdb> spa
ADDR               NAME
------------------------------------------------------------
0xffff8968924e4000 rpool
0xffff89690d298000 test_pool
sdb> addr 0xffff89690d298000 | cast spa_t * | member spa_errlog_last
(uint64_t)129
```
Before another scrub, reboot the system. This causes the spa to be initialized with now-stale data from the MOS directory object. At this point `spa_errlog_last` references a deleted object, and the MOS directory object continues to reference that deleted object as well.
```shell
$ reboot
...
$ sdb
sdb> spa
ADDR               NAME
------------------------------------------------------------
0xffff9cc08c598000 rpool
0xffff9cc096e6c000 test_pool
sdb> addr 0xffff9cc096e6c000 | cast spa_t * | member spa_errlog_last
(uint64_t)89
```
Start a new scrub. When we attempt to rotate the new scrub data (from `spa_errlog_scrub`) into `spa_errlog_last`, we detect a non-zero value in `spa_errlog_last` and attempt to free that (already freed) object. This triggers the panic loop.
```shell
$ zpool scrub test_pool
```
### Include any warning/errors/backtraces from the system logs
Panic stack:

```
[   34.473834] VERIFY0(0 == dmu_object_free(spa->spa_meta_objset, spa_err_obj, tx)) failed (0 == 2)
[   34.475480] PANIC at spa_errlog.c:1139:delete_errlog()
[   34.476432] Kernel panic - not syncing: VERIFY0(0 == dmu_object_free(spa->spa_meta_objset, spa_err_obj, tx)) failed (0 == 2)
[   34.478437] CPU: 1 PID: 1059 Comm: txg_sync Tainted: P           OE     5.4.0-135-dx2023011005-db2b50090-generic #152
[   34.480332] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
[   34.482249] Call Trace:
[   34.482712]  dump_stack+0x6d/0x8b
[   34.483327]  panic+0x101/0x2e3
[   34.483894]  spl_panic+0xcf/0x102 [spl]
[   34.487189]  delete_errlog+0x124/0x130 [zfs]
[   34.496503]  spa_errlog_sync+0x256/0x260 [zfs]
[   34.499310]  spa_sync_iterate_to_convergence+0xe5/0x250 [zfs]
[   34.500415]  spa_sync+0x2f7/0x670 [zfs]
[   34.501181]  txg_sync_thread+0x22d/0x2d0 [zfs]
[   34.502988]  thread_generic_wrapper+0x83/0xa0 [spl]
[   34.503889]  kthread+0x104/0x140
[   34.506175]  ret_from_fork+0x1f/0x40
```