I want to list all files in /usr/ using ls. I am not calling ls directly, but via xargs. Moreover, I am using xargs parameters -L and -P to utilize all my cores.

find /usr/ -type f -print0 | xargs -0 -L16 -P4 ls -lAd | sort -k9 > /tmp/aaa 

The above command works as expected and produces clean output. However, when I increase the -L parameter from 16 to 64:

find /usr/ -type f -print0 | xargs -0 -L64 -P4 ls -lAd | sort -k9 > /tmp/bbb 

The resulting output is garbled. Entries no longer start on a new line; new entries begin in the middle of the previous line, and everything is mixed up:

-rw-r--r-- 1 root root 5455 Nov 16 2010 /usr/shareonts/X11/encodings/armscii-8.enc.gz -rw-r--r-- 1 root root 1285 May 29 2016-rw-r--r-- 1 root root 6205 May 29 2016 /usr/include/arpa/nameser_compat.h -rw-r--r-- 1 root root 0 Apr 17 20-rw-r--r-- 1 root root 933 Apr 16 2012 /usr/share/icons/nuoveXT2/16x16/actions/address-book-new.png -rw-r--r-- 1 root root 53651 Jun 17 2012-rw-r--r-- 1 root root 7117 May 29 2016 /usr/include/dlfcn.h -rw-r--r-- 1 root root 311 Jun 9 2015-rw-r--r-- 1 root root 1700 Jun 9 2015 /usr/share/cups/templates/de/add-printer.tmpl -rw-r--r-- 1 root root 5157 M1 root root 10620 Jun 14 2012 /usr/lib/perl5/Tk/pTk/tkIntXlibDecls.m -rw-r--r-- 1 root -rwxr-xr-x 1 root root 1829 Jan 22 2013 /usr/lib/emacsen-common/packages/install/dictionaries-common -rw-r--r-- 1 root r-rw-r--r-- 1 root root 1890 Jun 2 2012 /usr/share/perl5/Date/Manip/TZ/afaddi00.pm -rw-r--r-- 1 root root 1104 Jul-rw-r--r-- 1 root root 10268 Jul 27 15:58 /usr/share/perl/5.14.2/B/Debug.pm -rw-r--r-- 1 root root 725 Apr 1-rw-r--r-- 1 root root 883 Apr 1 2012 /usr/share/icons/gnome/16x16/actions/address-book-new.png 

Funny thing is, it only happens when using -L64 or larger. I don't see this problem with -L16.

Can anybody explain what is happening here?

  • @don_crissti - no, it is garbled even without sort. Commented Dec 23, 2016 at 22:59
  • @don_crissti - try grepping the output for lines not starting with '-r': find /usr/ -type f -print0 | xargs -0 -L64 -P4 ls -lAd | grep -v -- '^-r'. I can see lots of such lines. Commented Dec 23, 2016 at 23:12

2 Answers

This has to do with how writes to pipes work. With -L16 you run one ls process for every 16 files, and each produces about a thousand characters of output, depending on how long the filenames are. With -L64 each produces about four thousand. The ls program almost certainly uses the stdio library, which almost certainly uses a 4 kB output buffer to reduce the number of write calls.

So find produces a load of filenames, then (for the -L64 case) xargs chops them into bundles of 64 and starts up 4 ls processes to handle them. Each ls generates its first 4 kB of output and writes it to the pipe to sort. Note that this 4 kB will typically not end with a newline. So say the third ls gets its first 4 kB ready first, and it ends with

 lrwxrwxrwx 1 root root 6 Oct 21 2013 bzegrep -> bzgrep
 -rwxr-xr-x 1 root root 4877 Oct 21 2013 bzexe
 lrwxrwxrwx 1 root root 6 Oct 2

and then the first ls outputs something, e.g.

 total 123459 

then the input to sort will include lrwxrwxrwx 1 root root 6 Oct 2total 123459

In the -L16 case, the ls processes will (usually) only output a complete set of results in one go.
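One way to check this explanation on your own system is to measure how much output each batch produces. The sketch below is illustrative and not part of the original answer: batch_sizes is a hypothetical helper name, and it uses -n rather than -L to set the batch size because -n counts arguments directly. Under the 4 kB buffer assumption above, any batch reporting more than about 4096 bytes would be written to the pipe in more than one chunk.

```shell
# batch_sizes N DIR (hypothetical helper): for every batch of N regular
# files under DIR, print the number of bytes that `ls -lAd` writes.
# Batches run one at a time here, so the counts themselves cannot mix.
batch_sizes() {
    find "$2" -type f -print0 |
        xargs -0 -n "$1" sh -c 'ls -lAd -- "$@" | wc -c' sh
}
```

For example, `batch_sizes 16 /usr/ | sort -n | tail -n 1` and `batch_sizes 64 /usr/ | sort -n | tail -n 1` show the largest batch for each setting.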

Of course, in this case you are just wasting time and resources by using xargs and ls. You should let find output the information it already has rather than running extra programs to discover the same information again.
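As a rough sketch of that suggestion, assuming GNU find (its -printf action is not in POSIX), a single find process can print ls-like lines itself. The function name list_with_find and the exact format string are illustrative choices, not taken from the answer:

```shell
# list_with_find DIR (hypothetical helper): print one ls-like line per
# regular file under DIR using only find. Requires GNU find for -printf.
# Fields: permissions, link count, owner, group, size, date, time, path.
list_with_find() {
    find "$1" -type f -printf '%M %n %u %g %s %TY-%Tm-%Td %TH:%TM %p\n'
}
```

Something like `list_with_find /usr/ | sort -k8 > /tmp/aaa` would then sort on the path, which is the eighth field in this format rather than the ninth. Since only one process writes to the pipe, there is nothing for its output to interleave with.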

  • This is just a demo. In real life, I am using md5sum and stat, rather than ls. Commented Dec 23, 2016 at 23:15
  • The stat is probably redundant, but if you are doing md5sum which has to read every byte on some potentially big files then anything like removing stat is just a micro optimization. Commented Dec 23, 2016 at 23:22
  • So, back to the original problem: is there any solution? Or do I have to use -L16 (or even lower) to be sure it's not garbled? Commented Dec 23, 2016 at 23:29
  • The solution is to make sure that the programs run by xargs use line-buffered output. If you have the common set of Linux utilities available, it includes a program called stdbuf which may be able to change the buffering; read its manual pages. So you would say find .... | xargs -L64 -P4 stdbuf -oL ls -lAd | sort .... Commented Dec 24, 2016 at 1:24
  • @icarus That will not help if a single line is longer than the pipe size (4 or 8 KB). Commented Mar 4, 2017 at 11:12
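
If the tool's buffering cannot be changed, or lines may be too long for line buffering to help, another option that stays with xargs is to give every batch its own output file and concatenate the files afterwards. This is a sketch under assumptions, not something proposed in the thread: ls_batched is a hypothetical name, -n 64 stands in for -L64, and mktemp with a path template is assumed to be available.

```shell
# ls_batched DIR (hypothetical helper): run `ls -lAd` on batches of 64
# files, up to 4 batches in parallel. Each batch writes to its own
# temporary file, so output from different ls processes never shares a
# pipe; the files are concatenated once all batches have finished.
# Assumes DIR contains at least one regular file.
ls_batched() {
    outdir=$(mktemp -d) || return 1
    find "$1" -type f -print0 |
        xargs -0 -n 64 -P 4 sh -c '
            out=$(mktemp "$0/batch.XXXXXX") && ls -lAd -- "$@" > "$out"
        ' "$outdir"
    cat "$outdir"/batch.*
    rm -rf "$outdir"
}
```

Usage would look like `ls_batched /usr/ | sort -k9 > /tmp/bbb`. The cost is extra disk writes and no output until every batch is done, which is roughly the trade-off the later answer's tool makes for you.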

GNU Parallel was built to solve exactly the mixing problem (run time 40 seconds):

find /usr/ -type f -print0 | parallel -0 -L64 -P4 ls -lAd | sort -k9 > /tmp/bbb 

It can even detect the number of cores (run time 40 seconds):

find /usr/ -type f -print0 | parallel -0 -L64 ls -lAd | sort -k9 > /tmp/bbb 

And split the input evenly (run time 24 seconds):

find /usr/ -type f -print0 | parallel -0 -X ls -lAd | sort -k9 > /tmp/bbb 
