Description
Edit: see next comment for the identified cause and proposed solution.
System information
| Type | Version/Name |
|---|---|
| Kernel Version | 5.15.83-1-pve |
| OpenZFS Version | 2.1.7 |
Describe the problem you're observing
For some raidz/z2/z3 configurations and widths using `recordsize` > 128k:

- Samba shares report to Windows clients that file sizes are up to 10% smaller than reality. `du` reports the same discrepancy.
- Also observable with `compression=off`.
- Traced back as far as `zdb -dbdbdbdbdbdb` showing file `dsize` up to 10% smaller than the total of all block lsizes.
- This extends to improper accounting reflected in `zfs list`, and potentially deceives users into believing queried files/folders are up to 10% smaller than reality. It also leads to additional confusion when attempting to evaluate compression effectiveness on files or folders affected by this issue (see the example after this list).
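For example, a quick way to see the accounting confusion (commands are illustrative; the pool/dataset name `test` and the path are assumptions matching the reproduction below):

```sh
# With compression=off one would not expect referenced/used to sit ~10% below
# the logical size, yet that is what the accounting reports for affected layouts.
zfs get -o property,value compressratio,logicalreferenced,referenced test
du --apparent-size -B1 /test/test/testfile   # logical (written) bytes
du -B1 /test/test/testfile                   # accounted bytes (the ~10% smaller figure)
```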
Describe how to reproduce the problem
To elaborate on the symptoms, I iterated through various raidz/z2/z3 configs with varying recordsizes and widths. Initial tests showed cases where dsize was 2-3x greater than asize, so for each raidz/z2/z3 config I selected widths giving 1 data block per stripe, 2 data blocks per stripe (for small records, forcing more overhead), and then 8, 16, 32, 64, and 128 data blocks per stripe (to most efficiently store larger records). Other pool options used: `ashift=12`, `compression=off` (default).
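As a concrete illustration, one iteration of this matrix might look like the sketch below. Device names, the pool name `test`, and the mountpoint are placeholders, and on 2.1.x the `zfs_max_recordsize` module parameter may need to be raised before a 16M recordsize can be set.

```sh
# Sketch of a single test iteration (8-wide raidz3, 16M recordsize, ashift=12).
# Device names, pool name, and mountpoint are placeholders.
echo 16777216 > /sys/module/zfs/parameters/zfs_max_recordsize   # allow 16M records
zpool create -o ashift=12 -O recordsize=16M -O compression=off \
    -m /test/test test raidz3 /dev/sd[a-h]
dd if=/dev/urandom of=/test/test/testfile bs=1M count=16
zpool sync test
du --apparent-size -B1 /test/test/testfile   # bytes written (16777216)
du -B1 /test/test/testfile                   # accounted bytes used for the percentage below
```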
Each entry below represents the percent difference in `dsize` vs. the single 16M file written to the pool.
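A sketch of how such a percentage can be computed (the path is an assumption):

```sh
# Percent difference between the accounted size and the 16 MiB actually written.
FILESIZE=16777216
DSIZE=$(du -B1 /test/test/testfile | cut -f1)
echo "scale=1; 100 * ($DSIZE - $FILESIZE) / $FILESIZE" | bc   # e.g. -8.7 for the raidz3 case below
```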
Given the above data, it appears that some `dsize` math may be attempting to compensate for padding/overhead on smaller records (only my guess), but that math does not appear to be accurate for those cases, and it errs in the opposite direction for datasets with larger recordsizes.
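To illustrate how a fixed small-record overhead assumption could miss in this direction, here is my own back-of-the-envelope arithmetic for the 8-wide raidz3 / 16M recordsize / ashift=12 case detailed below; this is a guess at the mechanism, not a confirmed root cause.

```sh
# 8-wide raidz3 at ashift=12: each row holds 5 data + 3 parity 4K sectors.
# A 16M block is 4096 data sectors -> ceil(4096/5) = 820 rows -> 2460 parity sectors.
echo $(( (4096 + 2460) * 4096 ))    # 26853376 bytes actually allocated (~25.6 MiB)

# A 128K block is 32 data sectors -> 7 rows -> 21 parity sectors, padded up to a
# multiple of (nparity + 1) = 4, i.e. 56 sectors allocated for 32 sectors of data.
# Deflating the allocation above by that fixed 32/56 ratio gives:
echo $(( 26853376 * 32 / 56 ))      # 15344786 bytes (~14.6 MiB)

# ...which is very close to the 15315456 bytes (14.6M) that du -B1 and zdb
# report below for a file that actually contains 16 MiB of data.
```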
Choosing one of the more egregious entries above (8-wide raidz3 with a 16M recordsize, storing a single 16M file), here is the output of various commands:
Test file creation: `dd if=/dev/urandom of=testfile bs=1M count=16` (16.0M)

`ls -l testfile` (16.0M):

    -rw-r--r-- 1 root root 16777216 Jan 23 13:06 test/testfile

`du -b testfile` (16.0M):

    16777216	test/testfile

`du -B1 testfile` (14.6M):

    15315456	test/testfile

`zfs list test` (refer=14.8M):

    NAME   USED  AVAIL  REFER  MOUNTPOINT
    test  15.7M   440G  14.8M  /test/test

`zpool list test`:

    NAME   SIZE  ALLOC  FREE  CKPOINT  EXPANDSZ  FRAG   CAP  DEDUP  HEALTH  ALTROOT
    test   796G  27.5M  796G        -         -    0%    0%  1.00x  ONLINE  -

`zfs snapshot test@1; zfs send -RLPcn test@1 | grep size` (16.1M):

    size	16821032
`zdb -dbdbdbdbdbdb test/ 2` (dsize=14.6M):

    Dataset test [ZPL], ID 54, cr_txg 1, 14.8M, 7 objects, rootbp DVA[0]=<0:100050000:4000> DVA[1]=<0:200050000:4000> [L0 DMU objset] fletcher4 uncompressed unencrypted LE contiguous unique double size=1000L/1000P birth=8L/8P fill=7 cksum=10fcf1fa0c:2ec854b9ded1:44b71aa9feff71:4769acdee207900f

        Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
             2    1   128K    16M  14.6M     512    16M  100.00  ZFS plain file (K=inherit) (Z=inherit=uncompressed)
                                                  176    bonus   System attributes
        dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED
        dnode maxblkid: 0
        path    /testfile
        uid     0
        gid     0
        atime   Mon Jan 23 13:06:43 2023
        mtime   Mon Jan 23 13:06:43 2023
        ctime   Mon Jan 23 13:06:43 2023
        crtime  Mon Jan 23 13:06:43 2023
        gen     7
        mode    100644
        size    16777216
        parent  34
        links   1
        pflags  840800000004
    Indirect blocks:
                   0 L0 DVA[0]=<0:500054000:199c000> [L0 ZFS plain file] fletcher4 uncompressed unencrypted LE contiguous unique single size=1000000L/1000000P birth=8L/8P fill=1 cksum=2000977398cc31:ec0a27a616a076e9:518ba24e091e641e:4716324db9dd86fd
                   segment [0000000000000000, 0000000001000000) size   16M
testfile as viewed by Windows via SMBD: (14.6M)
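Note that `du` derives its smaller figure from the file's allocated block count (`st_blocks`) rather than its apparent size, and Samba presumably reports a size based on the same field; both can be inspected with `stat` (the path is an assumption based on the mountpoint shown above):

```sh
# Compare apparent size vs. allocated size as seen by userspace tools.
stat -c 'st_size=%s bytes, st_blocks=%b (x %B bytes/block allocated)' /test/test/testfile
```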