
I understand that, classically, the Linux kernel was conservative about adding new syscalls.

But I've learned about the existence of copy_file_range, which seems to do exactly the same thing as sendfile. The only differences I could spot are that it:

  • doesn't work on sockets at all, only on proper regular files
  • allows both the input and the output offset to be set

But for regular files, a seek could achieve the same offset semantics, so I'm confused about what the purpose of the copy_file_range syscall is, if its abilities are a strict subset of an existing syscall's. Especially since seeking can quite sensibly be done as a separate step, so as not to extend the time spent in kernel space even further. (A usual OS design goal is to keep system calls from blocking, and a physical seek can complete after control has already been handed back to userland.)
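The seek-then-sendfile emulation the question has in mind can be sketched in C as below. This is a minimal sketch assuming Linux with glibc; the helper name `seek_then_sendfile` is hypothetical. It relies on two documented details of sendfile(2): when an input offset pointer is passed, the input descriptor's own file position is left alone, but output always goes to the output descriptor's current position, which is why the lseek is needed (and why that position changes as a side effect).

```c
#include <fcntl.h>         /* open() and O_* flags for callers */
#include <sys/sendfile.h>  /* sendfile() */
#include <sys/types.h>
#include <unistd.h>        /* lseek() */

/* Hypothetical helper: copy len bytes from in_fd at in_off to out_fd at
 * out_off. Returns the number of bytes copied (short if the input ends
 * first), or -1 on error with errno set. */
ssize_t seek_then_sendfile(int in_fd, off_t in_off,
                           int out_fd, off_t out_off, size_t len)
{
    /* sendfile() has no output offset argument: it writes at out_fd's
     * current file position, so move that position first. This modifies
     * the offset shared by every user of out_fd's open file description. */
    if (lseek(out_fd, out_off, SEEK_SET) == (off_t)-1)
        return -1;

    size_t done = 0;
    while (done < len) {
        /* The kernel advances in_off; in_fd's own position is untouched. */
        ssize_t n = sendfile(out_fd, in_fd, &in_off, len - done);
        if (n < 0)
            return -1;
        if (n == 0)        /* end of input file */
            break;
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

For example, `seek_then_sendfile(in, 6, out, 0, 5)` copies bytes 6 to 10 of `in` to the start of `out` and leaves `out`'s file position at 5. Since Linux 2.6.33, sendfile's output descriptor may be any file, not just a socket.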


1 Answer


copy_file_range’s advantage over sendfile (and splice) is that, in some circumstances, it implements copy offload. In fact that’s its main purpose, and it comes from long-standing work on copy offload: the intention was to provide access to features allowing block copies to be delegated to hardware (e.g. SANs). That work is still ongoing; currently, copy offload relies on file system support, where copies can either be replaced by a reference in the file system, or delegated to a server. (See the question “Copy on write for directories?” for details.)
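From user space, a whole-file copy with copy_file_range might look like the sketch below, assuming Linux 4.5+ and glibc 2.27+ for the wrapper; `copy_all` is a hypothetical helper. The call itself says nothing about how the copy happens: whether it becomes a reflink, a server-side copy or an in-kernel fallback copy is decided by the file systems involved.

```c
#define _GNU_SOURCE        /* copy_file_range() declaration in glibc */
#include <fcntl.h>         /* open() and O_* flags for callers */
#include <sys/types.h>
#include <unistd.h>        /* copy_file_range() */

/* Hypothetical helper: copy everything from in_fd's current position to
 * out_fd's current position. Returns 0 on success, -1 on error (errno set). */
int copy_all(int in_fd, int out_fd)
{
    for (;;) {
        /* NULL offsets: use and advance both descriptors' file positions.
         * Ask for a large chunk; the kernel may copy less per call. */
        ssize_t n = copy_file_range(in_fd, NULL, out_fd, NULL,
                                    (size_t)1 << 30, 0 /* flags must be 0 */);
        if (n < 0)
            return -1;
        if (n == 0)        /* reached end of input */
            return 0;
    }
}
```

Passing NULL offsets makes the call behave like a read/write pair with respect to file positions; the offload, if any, is invisible to the caller apart from speed and space usage.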

sendfile on the other hand comes from a desire to send files to the network efficiently; similarly, splice sends files to pipes efficiently. sendfile was extended to send files to other files a long time ago (using splice under the hood), but the assumption remains that it’s an optimised read-write: instead of reading data in user-space, and writing it, a process can ask the kernel to take care of the read-write loop.
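For comparison, here is roughly the user-space read-write loop that sendfile lets a process hand over to the kernel. `read_write_copy` is a hypothetical name and the 64 KiB buffer size is an arbitrary choice.

```c
#include <errno.h>
#include <fcntl.h>         /* open() and O_* flags for callers */
#include <unistd.h>        /* read(), write() */

/* Hypothetical helper: copy in_fd to out_fd through a user-space buffer.
 * Returns 0 at end of input, -1 on error (errno set). */
int read_write_copy(int in_fd, int out_fd)
{
    char buf[64 * 1024];
    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);  /* kernel -> user copy */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return 0;                              /* end of input */

        /* write() may be partial, so drain the buffer completely. */
        char *p = buf;
        while (n > 0) {
            ssize_t w = write(out_fd, p, (size_t)n); /* user -> kernel copy */
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            p += w;
            n -= w;
        }
    }
}
```

With sendfile, the body of this loop reduces to repeated `sendfile(out_fd, in_fd, NULL, count)` calls, and the data no longer passes through a user-space buffer.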

I suspect that copy_file_range’s existence is tied to the additional output offset argument: if you think about exposing a low-level offloading block copy feature available in hardware, tying that to a file descriptor-based API requires both offsets; while that would be possible using sendfile and whatever the current offset on the output file descriptor is, perhaps that isn’t tightly-coupled enough for the copy offload objective.
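To illustrate the offset argument: when both offset pointers are supplied, copy_file_range behaves like pread/pwrite, in that neither descriptor's file position is used or changed, so independent copies can be issued on the same pair of descriptors (for example from several threads) without an lseek in between. A minimal sketch, assuming Linux and glibc 2.27+; `copy_range_at` is hypothetical.

```c
#define _GNU_SOURCE        /* copy_file_range() declaration in glibc */
#include <fcntl.h>         /* open() and O_* flags for callers */
#include <sys/types.h>     /* loff_t */
#include <unistd.h>        /* copy_file_range() */

/* Hypothetical helper: copy len bytes from in_fd at in_off to out_fd at
 * out_off. Returns bytes copied (short if the input ends first) or -1. */
ssize_t copy_range_at(int in_fd, loff_t in_off,
                      int out_fd, loff_t out_off, size_t len)
{
    size_t done = 0;
    while (done < len) {
        /* The kernel advances the local in_off/out_off copies; the
         * descriptors' own file positions are left untouched. */
        ssize_t n = copy_file_range(in_fd, &in_off, out_fd, &out_off,
                                    len - done, 0);
        if (n < 0)
            return -1;
        if (n == 0)        /* end of input */
            break;
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

The kernel rejects copies within a single file whose source and destination ranges overlap (EINVAL), so callers still have to avoid that case.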

(The generic copy_file_range implementation, used on file systems without specific support, delegates to splice in the same way as sendfile. Arguably sendfile could be made to use copy_file_range where possible, I don’t know whether that’s ever been implemented.)
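The same preference can be expressed in user space: try copy_file_range first and fall back to sendfile when the kernel refuses. This is a sketch under assumptions: `copy_chunk` is a hypothetical helper, and which errno values should trigger the fallback is a judgment call (the set below covers a missing syscall, refused cross-file-system copies, and non-regular files such as pipes).

```c
#define _GNU_SOURCE        /* copy_file_range() declaration in glibc */
#include <errno.h>
#include <fcntl.h>         /* open() and O_* flags for callers */
#include <sys/sendfile.h>  /* sendfile() */
#include <sys/types.h>
#include <unistd.h>        /* copy_file_range(), pipe() */

/* Hypothetical helper: copy up to count bytes at both descriptors' current
 * positions. Returns bytes copied, 0 at end of input, or -1 on error. */
ssize_t copy_chunk(int in_fd, int out_fd, size_t count)
{
    ssize_t n = copy_file_range(in_fd, NULL, out_fd, NULL, count, 0);
    if (n >= 0)
        return n;

    switch (errno) {
    case ENOSYS:           /* kernel (or libc) without copy_file_range */
    case EXDEV:            /* cross-file-system copy refused */
    case EOPNOTSUPP:       /* file system declined the operation */
    case EINVAL:           /* e.g. out_fd is a pipe, not a regular file */
        /* NULL offset: sendfile also uses and advances in_fd's position. */
        return sendfile(out_fd, in_fd, NULL, count);
    default:
        return -1;
    }
}
```

A caller loops on `copy_chunk` until it returns 0; the fallback keeps the copy inside the kernel even when no offload is possible.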

  • Now that is interesting! Wouldn't have guessed it's about offloading! Commented Feb 28, 2024 at 17:32
  • Yeah, I think I see the advantage: if you are, say, a block storage layer and might have a lot of concurrent things to copy, then you either need to serialize (seek, sendfile, seek, sendfile...) or start dup'ing fds, just to be able to work on multiple fronts. And then the kernel would need to guarantee access order (I assume sendfile has similar semantics to write when it comes to determinism when it returns), and then send out commands to the hardware. Much easier if commands are just "fire and forget" and can be issued from arbitrarily many threads in parallel. Commented Feb 29, 2024 at 0:07
  • It is worth noting that copy_file_range() still has a lot of sharp edges on Linux. Don't use it blindly without understanding its practical implementation quirks. lwn.net/Articles/846403 Commented Apr 7, 2024 at 22:15
