Skip to content

copy_file_range: fix fallback when source create on same txg #15172

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 15, 2023

Conversation

robn
Copy link
Member

@robn robn commented Aug 13, 2023

Motivation and Context

In 019dea0 we removed the conversion from EAGAIN->EXDEV inside zfs_clone_range(), but forgot to add a test for EAGAIN to the copy_file_range() entry points to trigger fallback to a content copy.

This commit fixes that.

Fixes #15170

How Has This Been Tested?

New test added that fails before the change, passes after.

FreeBSD untested, but it has the same shape. The test suite will confirm it compiles.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

In 019dea0 we removed the conversion from EAGAIN->EXDEV inside
zfs_clone_range(), but forgot to add a test for EAGAIN to the
copy_file_range() entry points to trigger fallback to a content copy.

This commit fixes that.

Signed-off-by: Rob Norris <[email protected]>
@robn
Copy link
Member Author

robn commented Aug 13, 2023

@oromenahar whoops, I missed this in review, sorry 😕

@oromenahar
Copy link
Contributor

Will check this evening :)

Copy link
Contributor

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick fix and adding the test case.

@behlendorf behlendorf added the Status: Accepted Ready to integrate (reviewed, tested) label Aug 14, 2023
@behlendorf behlendorf merged commit cae502c into openzfs:master Aug 15, 2023
umsaleem pushed a commit to truenas/zfs that referenced this pull request Aug 17, 2023
In 019dea0 we removed the conversion from EAGAIN->EXDEV inside
zfs_clone_range(), but forgot to add a test for EAGAIN to the
copy_file_range() entry points to trigger fallback to a content copy.

This commit fixes that.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Kay Pedersen <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#15170
Closes openzfs#15172
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Aug 25, 2023
In 019dea0 we removed the conversion from EAGAIN->EXDEV inside
zfs_clone_range(), but forgot to add a test for EAGAIN to the
copy_file_range() entry points to trigger fallback to a content copy.

This commit fixes that.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Kay Pedersen <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#15170
Closes openzfs#15172
behlendorf pushed a commit that referenced this pull request Aug 25, 2023
In 019dea0 we removed the conversion from EAGAIN->EXDEV inside
zfs_clone_range(), but forgot to add a test for EAGAIN to the
copy_file_range() entry points to trigger fallback to a content copy.

This commit fixes that.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Kay Pedersen <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes #15170
Closes #15172
@lundman
Copy link
Contributor

lundman commented Dec 5, 2023

Oh is this why our clone fails if it is too soon after creating a file? Just returns EAGAIN.

I see both Linux and FreeBSD can call a generic copy_file implementation, but there is no such nice call in OsX. I can certainly do a copy of FreeBSD using vn_rdwr() but it is also tempting to just wait for the txg, and re-do the call. Any reason not to do that?

@robn
Copy link
Member Author

robn commented Dec 5, 2023

Its safe, just slow.

You might be able to return something to signal to the caller that a fallback is necessary. Before Linux 5.3 we didn't call the fallback directly, we'd return EXDEV and the kernel took care of it.

If not the kernel, your copy handoff/clone syscall might have a return that signals to the application to do it by hand (eg copy_file_range and FICLONE both allow EXDEV to be returned to userspace).

@lundman
Copy link
Contributor

lundman commented Dec 5, 2023

There is nothing in XNU's syscall, or VNOP_CLONE that will handle any fallback, just returns error all the way back to userland. It's a bit unattractive to have

cp: reproducer__1: clonefile failed: Resource temporarily unavailable

Although, I see in cp/copyfile:
https://github.com/apple-oss-distributions/file_cmds/blob/main/cp/utils.c#L231-L236

So I can probably return ENOTSUP to fallback. Feels a bit icky, but I'll give it a go.

Its safe, just slow

Is it? So if you create a 4g file, then call clonefile immediately, Linux&FreeBSD would copy another 4g. txg_sync would wait one txg, then actually clone it? Did we test this? I'm curious, just not sure i'm "I'll test it" curious :)

@lundman
Copy link
Contributor

lundman commented Dec 5, 2023

OK so quick and dirty test:

With:

    if ((error == EAGAIN) && (retry_once == 1)) {
        retry_once = 0;
        txg_wait_synced(dmu_objset_pool(inzp->z_zfsvfs->z_os), 0);
        goto again;

Results in

419430400 bytes transferred in 3.116787 secs (134571405 bytes/sec)
writing files

real    0m0.027s
user    0m0.001s
sys     0m0.024s

-rw-r--r--  1 root     wheel  419430400 Dec  5 10:07 reproducer__0
-rw-r--r--  1 root     wheel  401866752 Dec  5 10:07 reproducer__1

The concern there is, why are they different sizes?

Testing with return ENOTSUP

419430400 bytes transferred in 3.283743 secs (127729366 bytes/sec)
writing files

real    0m0.025s
user    0m0.000s
sys     0m0.023s

-rw-r--r--  1 root     wheel  419430400 Dec  5 10:10 reproducer__0
-rw-r--r--  1 root     wheel  401866752 Dec  5 10:10 reproducer__1

So in the interest of being similar to upstream, I will return ENOTSUP.

@robn
Copy link
Member Author

robn commented Dec 5, 2023

There is nothing in XNU's syscall, or VNOP_CLONE that will handle any fallback, just returns error all the way back to userland.

This makes it basically the same as FICLONE and FICLONERANGE on Linux. ie specifically "you must create a clone".

Is it? So if you create a 4g file, then call clonefile immediately, Linux&FreeBSD would copy another 4g.

copy_file_range will copy another 4G, because its copy handoff, not cloning specifically, ie, "make a copy and I don't care how you do it". FreeBSD has no specific clone syscall; on Linux if you explicitly request a clone (FICLONE), it will fail:

# dd if=/dev/random of=somefile bs=128K count=1 ; clonefile -c somefile someclone
1+0 records in
1+0 records out
131072 bytes (131 kB, 128 KiB) copied, 0.00158406 s, 82.7 MB/s
using FICLONE
ioctl(FICLONE): Resource temporarily unavailable
file offsets: src=0/131072; dst=0/0

Looking at the cp you linked, it really does look like ENOTSUP is the signal. Since copy_file_range() is also there, it'd be worth finding out what that does internally; can you signal to it that it should do a copy also? Or does it never even try to clone?

I suspect your implementation is going to be extremely close to the Linux one.

@lundman
Copy link
Contributor

lundman commented Dec 5, 2023

Ahh OK so large file ended up completing in txg, so the cp -r returning ENOTSUP never kicked in. Smaller file has:

cp: reproducer__1: clonefile failed: Operation not supported

That is from older file_cmds-272, the newer file_cmds-428 has the fallback. That went into macOS-14 (Sonoma) so, only the very latest version. All older macOS will just get error if the EAGAIN thing happens, and they have no way to know if it isn't supported, and not just a temporary error. Honestly, EAGAIN is better there, so users can tell the difference, and ENOTSUP on macOS-14 and above.

lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Dec 12, 2023
In 019dea0 we removed the conversion from EAGAIN->EXDEV inside
zfs_clone_range(), but forgot to add a test for EAGAIN to the
copy_file_range() entry points to trigger fallback to a content copy.

This commit fixes that.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Kay Pedersen <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#15170
Closes openzfs#15172
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cp failing sporadically due to EAGAIN being returned from copy_file_range
4 participants