Description
Case: a production system with 10Gb Ethernet exporting ZFS-backed file shares via Samba to Windows 7 workstations. The SMB2 protocol is in use (forced in smb.conf), and Samba is the 4.0 release.
Problem: turning on Samba's option to use the sendfile system call (the zero-copy path) slows transfers down on ZFS, when it should speed them up.
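For reference, the relevant smb.conf settings look roughly like this (a sketch, not the full production config):

    [global]
        # force the SMB2 protocol (Samba 3.6/4.0 syntax)
        max protocol = SMB2
        # the option under test: serve file data via sendfile(2)
        use sendfile = yes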
The server has dual E5620 Xeons, 96 GB RAM, and an Intel 82599EB 10GbE NIC (the client uses the same NIC). The client has dual X5670 Xeons and 24 GB RAM. The server runs kernel 3.4.6, and the client runs Windows 7 SP1 (both x64).
The reference transfer speed is obtained from a RAMdisk-to-RAMdisk copy of a 20 GB file. Different Samba versions (3.6.x through the 4.0 release) were tested with various optimizations and smb.conf options to obtain the best possible "raw" transfer speed. The Linux network stack also had to be tuned (mostly send and receive window sizes, plus jumbo frames) to reach acceptable 10GbE throughput (close to wire speed with iperf), as sketched below.
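For illustration, the tuning was along these lines (example values typical for 10GbE, not necessarily the exact ones used here; interface name assumed):

    # larger socket buffers for 10GbE
    sysctl -w net.core.rmem_max=16777216
    sysctl -w net.core.wmem_max=16777216
    sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
    sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"
    # jumbo frames on the 10GbE interface
    ip link set eth2 mtu 9000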
Each of the following tests is run five times and the results averaged; the first run is discarded (it ensures the test file is cached), so disk accesses are excluded from the measurements entirely.
Copying the aforementioned file from server to client, with a RAMdisk as both origin and destination, yields roughly 710 MB/s with sendfile. Without sendfile, throughput drops to about 340 MB/s.
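(For completeness: the RAMdisk is a plain tmpfs mount; a sketch, with size and mountpoint assumed:)

    mount -t tmpfs -o size=24g tmpfs /mnt/ram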
Next up is XFS. We create a zvol and make an XFS filesystem on top of it, copy the same file from the RAMdisk to XFS, drop the caches, and repeat the copy from the client (sketch below). Now we get around 670 MB/s with sendfile and about 320 MB/s without.
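A sketch of the setup, assuming a pool named tank (names and sizes are placeholders):

    zfs create -V 40G tank/xfsvol         # zvol to back the XFS filesystem
    mkfs.xfs /dev/zvol/tank/xfsvol
    mount /dev/zvol/tank/xfsvol /mnt/xfs
    cp /mnt/ram/testfile /mnt/xfs/        # stage the 20 GB test file
    echo 3 > /proc/sys/vm/drop_caches     # drop page cache, dentries and inodes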
Finally, we test with ZFS: copy the same file to a ZFS dataset, then copy it from the client. With sendfile we now get 310 MB/s (!!), and without sendfile 370 MB/s (!!).
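To summarize the three cases:

    Source        with sendfile   without sendfile
    RAMdisk       ~710 MB/s       ~340 MB/s
    XFS on zvol   ~670 MB/s       ~320 MB/s
    ZFS dataset   ~310 MB/s       ~370 MB/s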
Quite interesting: ZFS behaves completely differently from the RAMdisk and XFS here. What could cause this discrepancy? sendfile's in-kernel implementation serves the file data out of the page cache (much as the mmap path does), so there could be something iffy in ZFS's mmap / page-cache integration. Does anyone have more ideas?
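One way to narrow this down is to confirm what smbd actually does in each case, e.g. by counting syscalls in the smbd process serving the test client during a copy (a sketch; <smbd_pid> stands for the child smbd handling that connection):

    # count sendfile vs. ordinary read syscalls during a transfer
    strace -c -e trace=sendfile,pread64,read -p <smbd_pid>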
Currently I'm profiling (with OProfile and the perf tools) what happens in the kernel while these tests run, but the profiling data is hard to make sense of. Next I'll look at the differences between the mmap implementations in XFS and ZFS, though I'm not sure my knowledge in that area is sufficient to decipher the details.
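For reference, the profiling runs look roughly like this (a sketch; duration and sort keys are illustrative):

    # system-wide profile with call graphs during a 30-second copy window
    perf record -a -g -- sleep 30
    perf report --sort comm,dso,symbol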
I'll post my profiling findings in a later post, hopefully tomorrow. In the meantime, please feel free to make suggestions.