Which one is more efficient over a very large set of files and should be used?
find . -exec cmd {} + or
find . | xargs cmd (Assume that there are no funny characters in the filenames)
The speed difference will be insignificant.
But you have to make sure that:
1. Your script does not assume that file names are free of spaces, tabs, and so on; the first version is safe, the second is not.
2. Your script does not treat a file whose name starts with "-" as an option.
So your code should look like this:
find . -exec cmd -option1 -option2 -- {} +
or
find . -print0 | xargs -0 cmd -option1 -option2 --
The first version is shorter and easier to write, since you can ignore point 1, but the second version is more portable and safer: "-exec cmd {} +" is a relatively new option in GNU findutils (added in 2005, so many running systems will not have it yet), and it was buggy until recently. Also, many people do not know about "-exec cmd {} +", as you can see from the other answers.
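To make the difference concrete, here is a small sketch that counts how many arguments the command receives in each form. The scratch directory, the file names, and the use of printf as a stand-in for cmd are assumptions made for illustration only.

```shell
# Hypothetical demo: a scratch directory holding one awkward file name.
demo_dir=$(mktemp -d)
touch "$demo_dir/plain.txt" "$demo_dir/has space.txt"

# Unsafe: xargs splits its input on whitespace, so "has space.txt"
# reaches the command as two separate arguments.
unsafe_count=$(find "$demo_dir" -type f | xargs printf '%s\n' | wc -l | tr -d ' ')

# Safe: NUL-delimited names arrive intact.
safe_count=$(find "$demo_dir" -type f -print0 | xargs -0 printf '%s\n' | wc -l | tr -d ' ')

# Safe: -exec ... {} + hands each name over as exactly one argument.
exec_count=$(find "$demo_dir" -type f -exec printf '%s\n' {} + | wc -l | tr -d ' ')

echo "unsafe=$unsafe_count safe=$safe_count exec=$exec_count"
rm -rf "$demo_dir"
```

With two files on disk, the plain pipe should report three arguments while both safe forms report two.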
-exec will output results as they are found, whereas xargs will, it seems, wait until the entire directory is searched before writing to stdout. If you're trying this on a large directory and it seems that xargs isn't working, patience is advisable.
On -print0: find returns filenames separated by newlines, but a newline can also be part of a filename, which makes the output ambiguous. A NUL byte cannot, so it is a safe separator.
Yes, adding -- to a command that supports it is good practice when you can't control its arguments, even when it is not strictly required.
find . | xargs cmd is more efficient than find . -exec cmd {} \; (it runs cmd as few times as possible, whereas -exec terminated with \; runs cmd once for each match; -exec cmd {} + batches arguments the way xargs does). However, you will run into trouble if filenames contain spaces or funky characters.
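How often the command is started depends on how -exec is terminated. The sketch below counts invocations for three files; the scratch directory and the inner sh -c 'echo run' command are stand-ins chosen for illustration, not part of the original question.

```shell
# Hypothetical demo: three files in a scratch directory.
batch_dir=$(mktemp -d)
touch "$batch_dir/a" "$batch_dir/b" "$batch_dir/c"

# Each start of the inner sh prints one line, so the line count
# equals the number of times the command was run.

# -exec ... \; starts the command once per file.
per_file_runs=$(find "$batch_dir" -type f -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')

# -exec ... {} + batches all names into as few runs as possible.
batched_runs=$(find "$batch_dir" -type f -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')

# xargs batches in the same way.
xargs_runs=$(find "$batch_dir" -type f -print0 | xargs -0 sh -c 'echo run' sh | wc -l | tr -d ' ')

echo "per_file=$per_file_runs batched=$batched_runs xargs=$xargs_runs"
rm -rf "$batch_dir"
```

For a handful of files, only the \; form should report more than one run.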
The following is the suggested form:
find . -print0 | xargs -0 cmd
This works even if filenames contain funky characters (-print0 makes find print NUL-terminated matches, and -0 makes xargs expect this format).
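As a rough illustration of why the NUL terminator matters, the sketch below creates a single file whose name contains a newline and counts the records find emits in each mode. The directory and the file name are made up for the example, and it assumes a filesystem that permits newlines in names.

```shell
# Hypothetical demo: one file whose name contains an embedded newline.
nl_dir=$(mktemp -d)
touch "$nl_dir/first
second"

# Newline-delimited output: the single file looks like two records.
newline_records=$(find "$nl_dir" -type f -print | wc -l | tr -d ' ')

# NUL-delimited output: keep only the NUL bytes and count them,
# giving exactly one record per file.
nul_records=$(find "$nl_dir" -type f -print0 | tr -cd '\000' | wc -c | tr -d ' ')

echo "newline_records=$newline_records nul_records=$nul_records"
rm -rf "$nl_dir"
```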
The xargs approach is actually significantly slower if there are no (or only a few) matching files and cmd doesn't have much to do for each file. For example, when run in an empty directory, the xargs version will take at least twice as long, since two processes must be started instead of just one. (Yes, the difference is usually imperceptible on *nix, but in a loop it could be important; or try it on Windows some time ...)