
I'm using Ansible to copy a directory (900 files, 136 MB) from one host to another:

```yaml
---
- name: copy a directory
  copy: src={{some_directory}} dest={{remote_directory}}
```

This operation takes an incredible 17 minutes, while a simple `scp -r <src> <dest>` takes a mere 7 seconds.

I have tried Accelerated mode, which according to the Ansible docs

> can be anywhere from 2-6x faster than SSH with ControlPersist enabled, and 10x faster than paramiko.

but to no avail.

  • I am aware that it does an MD5 hash and validates it, but the time you're seeing seems very large. Commented Jan 16, 2015 at 23:04
  • @CatManDo it runs SHA1, actually, and that isn't responsible (even though it was my first guess). Commented Jan 17, 2015 at 1:53

7 Answers


TL;DR: use `synchronize` instead of `copy`.

Here's the copy command I'm using:

```yaml
- copy: src=testdata dest=/tmp/testdata/
```

As a guess, I assume the sync operations are slow. The files module documentation implies this too:

The "copy" module recursively copy facility does not scale to lots (>hundreds) of files. For alternative, see synchronize module, which is a wrapper around rsync.

Digging into the source shows each file is processed with SHA1, implemented via `hashlib.sha1`. A local test implies that hashing alone takes only 10 seconds for 900 files (totalling 400 MB).
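That local timing test is easy to reproduce; here's a minimal sketch, assuming chunked reads similar to Ansible's checksum helper (function names are mine, not Ansible's):

```python
import hashlib
import os


def sha1_file(path, chunk_size=65536):
    """Hash one file in fixed-size chunks to avoid loading it all into memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def sha1_tree(root):
    """Checksum every regular file under root; returns {relative path: digest}."""
    digests = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            digests[os.path.relpath(path, root)] = sha1_file(path)
    return digests
```

Wrapping `sha1_tree` in a timer over the 900-file sample is what produced the ~10-second figure, which rules hashing out as the bottleneck.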

So, the next avenue. The copy is handled by `module_utils/basic.py`'s `atomic_move` method. I'm not sure if accelerated mode helps (it's a mostly-deprecated feature), but I tried pipelining, putting this in a local `ansible.cfg`:

```ini
[ssh_connection]
pipelining=True
```

It didn't appear to help; my sample took 24 minutes to run. There's obviously a loop that checks a file, uploads it, fixes permissions, then starts on the next file. That's a lot of commands, even if the SSH connection is left open. Reading between the lines, it makes some sense: the file transfer itself can't be done via pipelining, I think.
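Some back-of-the-envelope arithmetic shows why that loop dominates; the per-file operation count below is an assumption for illustration, not a measurement:

```python
def total_round_trips(n_files, ops_per_file):
    """Every remote operation pays at least one network round trip."""
    return n_files * ops_per_file


# 900 files x ~4 assumed remote steps per file
# (stat, upload to tmpfile, atomic_move, chmod)...
copy_ops = total_round_trips(900, 4)   # -> 3600 round trips

# ...versus rsync, which streams everything over one session.
rsync_sessions = 1
```

Even at a modest 100 ms per round trip, thousands of sequential operations add up to minutes, while a single streaming session does not.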

So, following the hint to use the `synchronize` module:

```yaml
- synchronize: src=testdata dest=/tmp/testdata/
```

That took 18 seconds, even with `pipelining=False`. Clearly, the `synchronize` module is the way to go in this case.

Keep in mind `synchronize` uses rsync, which by default compares modification time and file size. If you want or need checksumming, add `checksum=True` to the task. Even with checksumming enabled the time didn't really change: still 15-18 seconds. I verified the checksum option was on by running `ansible-playbook` with `-vvvv`, as seen here:

```text
ok: [testhost] => {"changed": false, "cmd": "rsync --delay-updates -FF --compress --checksum --archive --rsh 'ssh -o StrictHostKeyChecking=no' --out-format='<<CHANGED>>%i %n%L' \"testdata\" \"user@testhost:/tmp/testdata/\"", "msg": "", "rc": 0, "stdout_lines": []}
```
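Rsync's per-file decision can be modeled in a few lines; this is a simplified sketch, not rsync's actual algorithm (rsync really uses rolling checksums plus MD5, and SHA-1 here is just for illustration):

```python
import hashlib
import os


def _digest(path):
    # Full-content hash for illustration only.
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()


def needs_transfer(src, dest, checksum=False):
    """Simplified model of rsync's per-file decision: the default "quick
    check" compares size and mtime; --checksum compares file contents."""
    if not os.path.exists(dest):
        return True
    if checksum:
        return _digest(src) != _digest(dest)
    s, d = os.stat(src), os.stat(dest)
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)
```

This is why the timing barely changes with `checksum=True` on an initial transfer: every file is new on the destination, so both strategies decide to send everything anyway.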

6 Comments

Is there no way for the copy module to be faster? It seems like a bug in copy that it is so slow.
Once you've switched to synchronize over copy, you'll need to specify rsync_opts if you use rsync/ssh with different ports/users/configs: hairycode.org/2016/02/22/…
What if I want to copy a directory locally, i.e., using the copy module with setting remote_src: yes? It is likely that synchronize cannot be used in this situation.
You deserve a drink, mate. Nice answer.
This is the way to go!! Reduced my time to send over my vim dotfiles and color schemes from 175 and 157 seconds to 0.19s and 0.17s (tested with profile_tasks callback). I can't believe how many MINUTES I've spent watching that thing until we implemented this. NOTE: It may be helpful to instruct a 'file' task to set the user and group permissions after the synchronize operation is done (user/group functionality is not useful in synchronize module).

`synchronize` can be difficult to configure in environments with `become_user`. For one-time deployments you can archive the source directory and copy it with the `unarchive` module:

```yaml
- name: copy a directory
  unarchive:
    src: some_directory.tar.gz
    dest: "{{ remote_directory }}"
    creates: "{{ remote_directory }}/indicator_file"
```
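If you need to build the tarball on the control host first, a minimal sketch using Python's stdlib `tarfile` (names here are illustrative; an equivalent `tar -czf` or a delegated `archive` task would do the same job):

```python
import os
import tarfile


def archive_directory(src_dir, archive_path):
    """Pack src_dir into a gzip-compressed tarball, so the play transfers
    one file instead of paying per-file SSH overhead; unpack with unarchive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(src_dir, arcname=os.path.basename(src_dir))
    return archive_path
```

Run this (or delegate it) before the play, then point the `unarchive` task's `src` at the resulting archive.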

3 Comments

And how does one archive the local directory? The archive module seems to support only remote folders.
This answer is not suitable for keeping a remote directory in sync with an ever-changing local one. It assumes the local version is a kind of immutable image that needs to be deployed only once. In that case one can archive it with tar -cvpzf, put the resulting archive into the files/ subfolder of the playbook, and then use the unarchive module for deployment, faster than the scp in the question.
I know, thanks. Syncing and immutable overrides are two different things and I happen to need the latter. For the interest of potential readers, I solved the problem with archive by using delegate_to.

The best solution I have found is to archive the folder and use the `unarchive` module.

A 450 MB folder finished in 1 minute.


```yaml
- unarchive:
    src: /home/user/folder1.tar.gz
    dest: /opt
```

1 Comment

... and where's the difference to the answer by @void?

Here is the task following the usual main.yml conventions:

- name: "Copy Files" synchronize: src: <source> dest: <destination> rsync_opts: - "--chmod=F755" # provide here give also permission 

Comments


One of the reasons copy is slow is that it scans each file for Vault-encrypted content and potentially decrypts it on copy, which adds a lot of overhead.

Comments


Using Mitogen (https://github.com/mitogen-hq/mitogen/) can also help, though it doesn't support modern Ansible versions (>6) for now and has some compatibility issues. It's a great option when hundreds of files need to be templated or copied quickly.

Example Ansible task, benchmarked with `connection: local` for just 545 files:

```yaml
- name: build-artifacts | copy artifact files to installer
  ansible.builtin.copy:
    src: "{{ artifact_tmp_dir.path }}/{{ item }}"
    dest: "{{ current_installer_directory }}/{{ artifact.installer_path }}"
    mode: "0644"
  loop: "{{ artifact.src_file_list }}"
```

```text
# without mitogen  build-artifacts : build-artifacts | copy artifact files to installer -- 391.72s
# with mitogen     build-artifacts : build-artifacts | copy artifact files to installer --- 32.69s
```

Comments


While synchronize is preferable to copy in this case, it is backed by rsync, so the drawbacks of rsync's client-server architecture remain: CPU and disk bottlenecks, slow in-file delta calculation for large files, etc. Since speed sounds critical for you, you could look for a solution based on a peer-to-peer architecture, which is fast and scales easily to many machines, something BitTorrent-based such as Resilio Connect.

Comments
