
When copying large files (1-2 GB per file) between file systems, file fragmentation can happen if the destination file system is nearly full.

Our C++ application uses fallocate() to pre-allocate space when creating and writing data files, but I'm wondering how the Linux copy command /bin/cp handles this.

Does cp just copy bytes or chunks of data in a loop (and let the file system deal with it)? Or does cp first call fallocate() or posix_fallocate() with the size of the source file?
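(By "copy in a loop" I mean something along these lines. This is only an illustrative sketch of a plain read/write copy, not cp's actual implementation, and copy_plain is just a made-up name.)

    #include <fcntl.h>
    #include <unistd.h>
    #include <vector>

    // Plain byte-copy loop: open source and destination, shuttle a buffer
    // across, and let the destination filesystem allocate blocks as the
    // writes arrive. No pre-allocation of any kind.
    bool copy_plain(const char* src, const char* dst) {
        int in = open(src, O_RDONLY);
        if (in < 0) return false;
        int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (out < 0) { close(in); return false; }

        std::vector<char> buf(1 << 20);            // 1 MiB buffer
        ssize_t n;
        while ((n = read(in, buf.data(), buf.size())) > 0) {
            ssize_t off = 0;
            while (off < n) {                      // handle short writes
                ssize_t w = write(out, buf.data() + off, n - off);
                if (w < 0) { close(in); close(out); return false; }
                off += w;
            }
        }
        close(in);
        close(out);
        return n == 0;                             // n < 0 indicates a read error
    }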

I haven't found anything on this subject searching the internet.

The filesystem could be ext3, ext4, or xfs.

CentOS 8.1, kernel 4.18.0-147.el8.x86_64 #1 SMP

EDIT I

As background, the actual application reads a constant-bit-rate network stream and pre-allocates a file for N seconds of content. If the actual bitrate is higher, the file naturally grows; ftruncate() is called when the file is closed, which handles the case where the actual bitrate is lower. cp is only used to move those files between filesystems, hence my question.

The reasoning for that is to avoid fragmentation: without fallocate(), the file system becomes increasingly fragmented over time. (Of course, fallocate() doesn't completely prevent the problem, but it certainly mitigates it.)
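In outline, the writer does something like this. It's a simplified sketch, not the production code; record_stream, the buffer size, and the bitrate arithmetic are illustrative only.

    #include <fcntl.h>
    #include <unistd.h>
    #include <cstdint>

    // Sketch of the recording pattern: pre-allocate N seconds of content at
    // the nominal bitrate, append whatever actually arrives, then trim any
    // unused pre-allocated tail on close.
    void record_stream(int stream_fd, const char* path,
                       uint64_t bits_per_second, unsigned seconds) {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) return;

        // fallocate(2) is Linux-specific; mode 0 allocates the blocks and
        // extends the file size (g++ defines _GNU_SOURCE by default).
        off_t expected = static_cast<off_t>(bits_per_second / 8) * seconds;
        fallocate(fd, 0, 0, expected);

        char buf[64 * 1024];
        off_t written = 0;
        ssize_t n;
        while ((n = read(stream_fd, buf, sizeof buf)) > 0) {
            // A higher-than-nominal bitrate simply grows the file past the
            // pre-allocated size. (Short writes ignored for brevity.)
            if (write(fd, buf, n) != n) break;
            written += n;
        }

        // A lower-than-nominal bitrate leaves a pre-allocated tail; drop it.
        if (written < expected)
            ftruncate(fd, written);
        close(fd);
    }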

According to "Uninitialized blocks and unexpected flags", fallocate() results in "efficient" allocation of contiguous blocks (for most filesystems):

The fallocate() system call is meant to be a way for an application to request the efficient allocation of blocks for a file. Use of fallocate() allows a process to verify that the required disk space is available, helps the filesystem to allocate all of the space in a single, contiguous group, and avoids the overhead that block-by-block allocation would incur.

So I was wondering whether copying a large, heavily fragmented file ends up contiguous or fragmented at the destination. Since cp doesn't use fallocate() to pre-allocate space, the answer appears to be "possibly fragmented".
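For what it's worth, whether a given copy actually ended up contiguous can be checked by counting its extents, either with filefrag(8) or via the FIEMAP ioctl that filefrag uses. A minimal sketch (count_extents is a made-up name; error handling is minimal):

    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>
    #include <linux/fiemap.h>
    #include <cstring>

    // Returns the number of extents backing the file, or -1 on error.
    // A single extent means the file is fully contiguous on disk.
    int count_extents(const char* path) {
        int fd = open(path, O_RDONLY);
        if (fd < 0) return -1;

        struct fiemap fm;
        std::memset(&fm, 0, sizeof fm);
        fm.fm_start = 0;
        fm.fm_length = FIEMAP_MAX_OFFSET;  // map the whole file
        fm.fm_flags = FIEMAP_FLAG_SYNC;    // flush delayed allocations first
        fm.fm_extent_count = 0;            // 0 = just report how many extents exist

        int extents = -1;
        if (ioctl(fd, FS_IOC_FIEMAP, &fm) == 0)
            extents = static_cast<int>(fm.fm_mapped_extents);
        close(fd);
        return extents;
    }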

  • Preallocation has its drawbacks, too. How does your application handle the situation where the file changes size while it's being copied? If cp doesn't do preallocation, it doesn't have to deal with that situation - it just copies from the input file until there's no more data. Commented Mar 26, 2021 at 2:21

1 Answer


The version of cp provided by GNU coreutils does use fallocate, but only to punch holes in files, not to pre-allocate space for copy targets.

There are a couple of mentions of adding support for fallocate, so it appears there were at least vague plans for something like this at some point.
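So if you want the destination pre-allocated, it has to be done by whatever performs the copy. A hedged sketch of such a copy, using posix_fallocate() followed by sendfile() (copy_preallocated is just an illustrative name, not anything cp or coreutils provides):

    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/stat.h>
    #include <sys/sendfile.h>

    // Copy src to dst, pre-allocating the destination to the source's size
    // first so the filesystem gets a chance to place it contiguously.
    bool copy_preallocated(const char* src, const char* dst) {
        int in = open(src, O_RDONLY);
        if (in < 0) return false;

        struct stat st;
        if (fstat(in, &st) != 0) { close(in); return false; }

        int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, st.st_mode & 0777);
        if (out < 0) { close(in); return false; }

        // posix_fallocate() falls back to writing zeroes on filesystems
        // without native fallocate support; plain fallocate() would fail
        // there instead, which may be preferable.
        if (st.st_size > 0 && posix_fallocate(out, 0, st.st_size) != 0) {
            close(in); close(out); return false;
        }

        // sendfile() copies in-kernel; loop because it may transfer less
        // than requested per call.
        off_t off = 0;
        while (off < st.st_size) {
            if (sendfile(out, in, &off, st.st_size - off) <= 0) break;
        }
        close(in);
        close(out);
        return off == st.st_size;
    }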
