132

Which one is more efficient over a very large set of files and should be used?

find . -exec cmd {} + 

or

find . | xargs cmd 

(Assume that there are no funny characters in the filenames)

3 Answers

123

Speed difference will be insignificant.

But you have to make sure that:

  1. Your script does not assume that file names are free of spaces, tabs, and similar characters; the first version is safe, the second is not.

  2. Your script does not treat a file name starting with "-" as an option.

So your code should look like this:

find . -exec cmd -option1 -option2 -- {} + 

or

find . -print0 | xargs -0 cmd -option1 -option2 -- 

The first version is shorter and easier to write, since point 1 takes care of itself, but the second version is more portable and safer: "-exec cmd {} +" is a relatively new option in GNU findutils (added in 2005, so many running systems will not have it yet), and it was buggy until recently. Also, many people do not know about "-exec cmd {} +", as you can see from the other answers.
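
To see the difference in practice, here is a small sketch you can run in a scratch directory. The file names are made up for illustration, printf stands in for cmd, and it assumes your find and xargs support -print0 and -0 (GNU and BSD versions do). Each printed line corresponds to one argument the command received.

```shell
# Scratch directory with awkward (hypothetical) file names.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch -- 'plain.txt' 'has space.txt' 'foo -o index.html'

# Form 1: find passes the names directly; no text stream is parsed.
exec_count=$(find . -type f -exec printf '%s\n' {} + | wc -l | tr -d '[:space:]')

# Form 2: NUL-separated names survive the pipe intact.
xargs0_count=$(find . -type f -print0 | xargs -0 printf '%s\n' | wc -l | tr -d '[:space:]')

# Unsafe form for comparison: xargs splits names on whitespace.
plain_count=$(find . -type f | xargs printf '%s\n' | wc -l | tr -d '[:space:]')

echo "exec +: $exec_count, xargs -0: $xargs0_count, plain xargs: $plain_count"

cd / && rm -rf "$tmp"
```

With three files, the two safe forms hand over three arguments, while the plain pipe hands over six because the names containing spaces are split apart.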

16 Comments

-print0 is also a GNU find (and GNU xargs) option which is missing from a lot of non-Linux systems, so the portability argument isn't as valid. Using just -print and leaving the -0 off of xargs, however, is very portable.
The point is that without -print0 it does not work if there is a file with a space, tab, etc. in its name. This can be a security vulnerability: if there is a filename like "foo -o index.html", then -o will be treated as an option. Try this in an empty directory: "touch -- foo\ -o\ index.html; find . | xargs cat". You'll get: "cat: invalid option -- 'o'"
His example is a filename that contains a -. Without -print0, find will spit out ./foo -o index.html. So maybe starting with a - isn't a big deal, but the result is little changed, and on a multiuser system, could provide an attack vector if your script is world readable.
A note on something which tripped me up here: using -exec will output results as they are found, whereas xargs will, it seems, wait until the entire directory is searched before writing to stdout. If you're trying this on a large directory and it seems that xargs isn't working, patience is advisable.
@Motivated Without -print0 find returns filenames separated with newline, but newline can also be part of a filename, making it ambiguous. Byte 0 can't, so it is a safe separator. Yes - adding -- to a command that supports it is a good practice when you can't control its arguments, even if not always strictly required or unsafe.
8
find . | xargs cmd 

is more efficient than find . -exec cmd {} \; because it runs cmd as few times as possible, whereas the \; form of -exec runs cmd once for each match. However, you will run into trouble if filenames contain spaces or funky characters.

Use the following instead:

find . -print0 | xargs -0 cmd 

This will work even if filenames contain funky characters (-print0 makes find print NUL-terminated matches, and -0 makes xargs expect that format).
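
As a rough check of how many times each form actually starts the command, the sketch below replaces cmd with a tiny sh -c script that prints one line per invocation. The five file names are arbitrary, and -print0/-0 are assumed to be available.

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch a b c d e

# Each output line marks one invocation of the command.
# "\;" form: one invocation per file.
per_file=$(find . -type f -exec sh -c 'echo run' sh {} \; | wc -l | tr -d '[:space:]')

# "+" form: find batches as many names as fit into one invocation.
batched=$(find . -type f -exec sh -c 'echo run' sh {} + | wc -l | tr -d '[:space:]')

# xargs: batches the same way, at the cost of one extra process (xargs itself).
via_xargs=$(find . -type f -print0 | xargs -0 sh -c 'echo run' sh | wc -l | tr -d '[:space:]')

echo "\\; form: $per_file, + form: $batched, xargs: $via_xargs"

cd / && rm -rf "$tmp"
```

For a handful of files, only the \; form runs the command once per file; the + form and xargs each run it once.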

2 Comments

This is not "find . -exec cmd {} \;" but "find . -exec cmd {} +". The latter will not run one file at a time.
Note that the xargs approach is actually significantly slower if there are no (or only a few) matching files and cmd doesn't have much to do for each file. For example, when run in an empty directory, the xargs version will take at least twice the time, since two processes must be started instead of just one. (Yes, the difference is usually imperceptible on *nix, but in a loop it could be important; or, try it on Windows some time ...)
3

Modern versions of xargs often support running several invocations in parallel.

That can be a deciding factor when choosing between find … -exec and … | xargs
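
A minimal sketch of what that looks like, assuming an xargs that has the -P option (GNU and BSD versions do; it is not guaranteed everywhere). The file names and the batch size are arbitrary, and the sh -c script stands in for a real command.

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch a b c d e f g h

# -n 2: at most two file names per invocation.
# -P 4: run up to four invocations at the same time (non-portable extension).
runs=$(find . -type f -print0 \
  | xargs -0 -n 2 -P 4 sh -c 'echo "batch: $*"' sh \
  | wc -l | tr -d '[:space:]')

echo "invocations: $runs"

cd / && rm -rf "$tmp"
```

Note that output from parallel invocations can arrive in any order, so this only pays off when cmd does enough work per file and the order of results does not matter.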
