
zfs list used/free/refer up to 10% *smaller* than sendfile size for large recordsize pools/datasets (75% for draid!) #14420


Description

@malventano

Edit: see the next comment for the identified cause and proposed solution.

System information

Type               Version/Name
Kernel Version     5.15.83-1-pve
OpenZFS Version    2.1.7

Describe the problem you're observing

For some raidz/z2/z3 configurations and widths using recordsize > 128k:

  • Samba shares report to Windows clients that file sizes are up to 10% smaller than reality.
  • du reports the same discrepancy.
  • Also observable with compression=off.
  • Traced back as far as zdb -dbdbdbdbdbdb showing file dsize up to 10% smaller than the total of all block lsizes.
  • This carries through to the accounting shown in zfs list and can lead users to believe that the queried files/folders are up to 10% smaller than they really are, which causes additional confusion when trying to evaluate compression effectiveness on files or folders affected by this issue (a quick check is sketched just after this list).
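
For anyone wanting to check a dataset quickly, the comparison boils down to apparent size vs. the space ZFS reports (the paths and dataset name below are just examples):

du -b  /test/testfile             # apparent size, matches ls -l
du -B1 /test/testfile             # ZFS-reported usage, ~10% smaller on affected configs
zfs list -o name,used,refer test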

Describe how to reproduce the problem

To elaborate on the symptoms, I iterated through various raidz/z2/z3 configs of varying recordsizes and widths. Initial tests showed cases where dsize was 2-3x greater than asize, so for each raidz/z2/z3 level I selected widths giving 1 data block per stripe, 2 data blocks per stripe (for small records, forcing more overhead), and then 8, 16, 32, 64, and 128 data blocks per stripe (to store larger records most efficiently). Other pool options used: ashift=12, compression=off (default).
Each entry below represents the percent difference between dsize and the single 16M file written to the pool.

[image: table of percent differences in dsize vs. file size, by raidz level, width, and recordsize]
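
For reference, the sweep itself was nothing fancy; a minimal sketch using throwaway file-backed vdevs is below (pool name, paths, widths, and recordsizes are placeholders, and the original runs covered more combinations; recordsize above 1M may also require raising the zfs_max_recordsize module parameter):

for z in 1 2 3; do
  for width in 4 8 11 16; do
    for rs in 128k 1M 4M 16M; do
      # create file-backed vdevs for this width
      vdevs=$(for i in $(seq 1 "$width"); do truncate -s 4G /var/tmp/d$i; echo /var/tmp/d$i; done)
      zpool create -f -o ashift=12 -O compression=off -O recordsize=$rs test raidz$z $vdevs
      dd if=/dev/urandom of=/test/testfile bs=1M count=16 status=none
      zpool sync test
      printf 'raidz%s width=%s rs=%s: %s bytes used (apparent 16777216)\n' \
          "$z" "$width" "$rs" "$(du -B1 /test/testfile | cut -f1)"
      zpool destroy test
    done
  done
done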

Given the above data, it appears that some of the dsize math may be trying to compensate for the padding/overhead of smaller records (only my guess), but that compensation does not appear to be accurate for those cases, and it errs in the opposite direction for datasets with larger recordsizes.
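
To sanity-check that guess, here is my own back-of-the-envelope raidz allocation math (standard layout: each row of width-minus-parity data sectors gets nparity parity sectors, and the allocation is padded to a multiple of nparity+1; this is not code lifted from OpenZFS):

raidz_asize_sectors() {                  # args: psize_bytes width nparity ashift
    local psize=$1 width=$2 nparity=$3 ashift=$4
    local data=$(( psize >> ashift ))                               # data sectors
    local rows=$(( (data + width - nparity - 1) / (width - nparity) ))
    local total=$(( data + rows * nparity ))                        # data + parity sectors
    local rem=$(( total % (nparity + 1) ))                          # pad to multiple of nparity+1
    [ "$rem" -ne 0 ] && total=$(( total + nparity + 1 - rem ))
    echo "$total"
}
raidz_asize_sectors $((128*1024))     8 3 12    # 56 sectors   -> 1.75x the 32 data sectors
raidz_asize_sectors $((16*1024*1024)) 8 3 12    # 6556 sectors -> ~1.60x the 4096 data sectors

If a single deflation factor derived from the 128K case (32/56) were applied to every block regardless of size, the 16M block's 6556-sector allocation would be reported as roughly 6556 × 32/56 ≈ 3746 sectors ≈ 14.6M, which lines up with the dsize shown below even though the file really holds 16M of data. The 8-wide raidz3 / 16M case is worked through in detail next.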

Choosing one of the more egregious entries above (8-wide raidz3 with 16MB recordsize, storing a single 16M file), here is an output of various commands:

Test file creation:
dd if=/dev/urandom of=testfile bs=1M count=16 (16.0M)

ls -l testfile:
-rw-r--r-- 1 root root 16777216 Jan 23 13:06 test/testfile (16.0M)

du -b testfile:
16777216 test/testfile (16.0M)

du -B1 testfile:
15315456 test/testfile (14.6M)

zfs list test: (refer=14.8M)

NAME   USED  AVAIL     REFER  MOUNTPOINT
test  15.7M   440G     14.8M  /test/test

zpool list test:

NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
test   796G  27.5M   796G        -         -     0%     0%  1.00x    ONLINE  -

zfs snapshot test@1;zfs send -RLPcn test@1|grep size
size 16821032 (16.1M)
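
Restating those numbers as ratios (plain arithmetic on the values above):

echo "scale=4; 15315456 / 16777216" | bc     # .9128 -> du -B1 is ~8.7% below the apparent size
echo "scale=4; 15315456 / 16821032" | bc     # .9104 -> and ~9% below the send stream size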

zdb -dbdbdbdbdbdb test/ 2 (dsize=14.6M)

Dataset test [ZPL], ID 54, cr_txg 1, 14.8M, 7 objects, rootbp DVA[0]=<0:100050000:4000> DVA[1]=<0:200050000:4000> [L0 DMU objset] fletcher4 uncompressed unencrypted LE contiguous unique double size=1000L/1000P birth=8L/8P fill=7 cksum=10fcf1fa0c:2ec854b9ded1:44b71aa9feff71:4769acdee207900f

    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
         2    1   128K    16M  14.6M     512    16M  100.00  ZFS plain file (K=inherit) (Z=inherit=uncompressed)
                                               176   bonus  System attributes
	dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED 
	dnode maxblkid: 0
	path	/testfile
	uid     0
	gid     0
	atime	Mon Jan 23 13:06:43 2023
	mtime	Mon Jan 23 13:06:43 2023
	ctime	Mon Jan 23 13:06:43 2023
	crtime	Mon Jan 23 13:06:43 2023
	gen	7
	mode	100644
	size	16777216
	parent	34
	links	1
	pflags	840800000004
Indirect blocks:
               0 L0 DVA[0]=<0:500054000:199c000> [L0 ZFS plain file] fletcher4 uncompressed unencrypted LE contiguous unique single size=1000000L/1000000P birth=8L/8P fill=1 cksum=2000977398cc31:ec0a27a616a076e9:518ba24e091e641e:4716324db9dd86fd

		segment [0000000000000000, 0000000001000000) size   16M

testfile as viewed by Windows via SMBD: (14.6M)

