
Here is a sample directory tree as it would appear if it were sorted in character code order (i.e., directories are not listed first):

    ${PREFIX}/
        .bashrc
        .include.sh.d/
            common.sh
        applications.txt
        docs/
            contacts.docx
            parts/
        docs-/
        docs./
        timestamp.txt
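To reproduce the tree for testing, something like the following should work. This is only a sketch: the function name make_sample_tree is made up for illustration, and it assumes a POSIX shell with mkdir -p and touch available.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original question):
# creates the sample tree under the directory given as $1.
make_sample_tree() {
    # Directories, including the empty ones (docs/parts, docs-, docs.).
    mkdir -p "$1/.include.sh.d" "$1/docs/parts" "$1/docs-" "$1/docs."
    # Regular files.
    touch "$1/.bashrc" \
          "$1/.include.sh.d/common.sh" \
          "$1/applications.txt" \
          "$1/docs/contacts.docx" \
          "$1/timestamp.txt"
}

# Example use (commented out so that sourcing this file has no side effects):
# PREFIX=$(mktemp -d) && make_sample_tree "${PREFIX}"
```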

Now, suppose that we want to use find "${PREFIX}" -mindepth 1, plus some additional arguments and CLI tools, to produce this listing:

    .include.sh.d/
    .include.sh.d/common.sh
    docs/
    docs/parts/
    docs/contacts.docx
    docs-/
    docs./
    applications.txt
    timestamp.txt

Note four things about the text in the second codeblock:

  1. Paths which are directories are appended with a /.
  2. The children of a directory are first sorted by type (directories come first, followed by files) and then by character code; when this is applied as a rule, it places dir/, dir/-file, dir-/, and dir-/file in exactly that order. This is the way files/directories are sorted by tree --dirsfirst "${PREFIX}" (the rest of the formatting is still different).
  3. Corollary to #2: The / behaves as a field separator, leading to docs/ coming before docs-/ despite - coming before / in the ASCII character table.
  4. Corollary to #2: Any given directory is contiguous with all of its descendants.
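As a quick check of the byte ordering that point 3 refers to, the three sibling directory names can be sorted by raw character code. This is a minimal sketch assuming a POSIX shell and sort; LC_ALL=C is an addition here to force byte-wise comparison, and the variable name sorted is only for illustration.

```shell
#!/bin/sh
# Sort the three names by raw byte value: '-' is 0x2D, '.' is 0x2E and
# '/' is 0x2F, so a plain byte-wise sort puts docs/ last, not first.
sorted=$(printf '%s\n' 'docs/' 'docs-/' 'docs./' | LC_ALL=C sort)
printf '%s\n' "$sorted"
```

A plain sort therefore cannot satisfy point 3 unless the / is turned into something that compares lower than every other character in a name.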

With find "${PREFIX}" -not -path "${PREFIX}" -printf "%P\n" | sort -t "/", the result is this:

    .include.sh.d
    .include.sh.d/common.sh
    applications.txt
    docs
    docs-
    docs.
    docs/contacts.docx
    docs/parts
    timestamp.txt

The problems with this are as follows:

  • Directories are not appended with /, thereby breaking #1.
  • There are files coming before directories that they share their parent directory with, thereby directly breaking #2.
  • docs/{contacts.docx,parts} do not come immediately after docs, but rather docs{-,.} are sandwiched between the two groups, thereby breaking #4 and, both directly and by extension, #2.

With find "${PREFIX}" -not -path "${PREFIX}" \( -type f -printf "%P\n" , -type d -printf "%P/\n" \) | sort -t "/", we instead get:

    .include.sh.d/
    .include.sh.d/common.sh
    applications.txt
    docs-/
    docs./
    docs/
    docs/contacts.docx
    docs/parts/
    timestamp.txt

In this case, directories are appended with / and every directory is immediately followed by all of its descendants, which satisfies #1 and #4, but these problems remain:

  • One or more files come before one or more of their sibling directories, thereby directly breaking #2.
  • docs/ is preceded by docs-/ and docs./, thereby breaking #3 and, both directly and by extension, #2.

Is there any way that I can satisfy all 4 requirements while still using find for this task? If so, how? If not, what sequence of commands using find and sort is at least as fast? I'd prefer that find "${PREFIX}" -mindepth 1 \( -type f -printf "%P\n" , -type l -xtype f -printf "%P\n" , -type d -printf "%P/\n" , -type l -xtype d -printf "%P/\n" , -xtype l -printf "%P\n" \) be the first command, because I want to treat symlinks to directories as directories for the purposes of rule #2.

Update: I've come up with a sequence of commands to pipe find's output into that groups directories before their sibling files: sed -r -e 's:[^/]+/:0&:g;s:[^/*]+$:1&:g' | sort -t "/" | sed 's/^[01]//g;s/\/[01]/\//g'.

1 Answer


Not without either:

a) writing a custom sorting script to pipe find's output into, e.g. with awk or perl, and preferably using a natural sort method so that filenames with numbers are sorted correctly.

or

b) writing a custom find script that sorts and formats output exactly as you want it to, e.g. with perl's File::Find module, which is included with perl as part of its standard library.

This is not at all unusual, and nothing to be surprised about - when you need to do something "non-standard", i.e. not already covered by the standard tools, you really only have 3 choices:

  1. Accept the situation as it is,

  2. Submit a wishlist bug report and hope that the devs think it's worth spending their time on,

  3. Write your own. Or, alternatively, download, modify, and compile the source for the relevant tool(s), and then maintain & update your custom version(s) afterwards.
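As a rough illustration of the "write your own" route for the sorting part, here is a minimal sketch of a post-processing filter. It reuses the 0/1 decoration idea from the question's update and adds one step that is my own assumption rather than anything from the question: temporarily swapping / for byte 0x01, so that the separator compares lower than any other character in a name. The function name sort_dirs_first is made up, and the sketch assumes that no file name contains a newline or byte 0x01 and that directories arrive already suffixed with /.

```shell
#!/bin/sh
# Hypothetical filter (a sketch, not a tested general-purpose tool).
# Input:  one path per line on stdin, directories suffixed with "/".
# Output: the same paths, with directories grouped before sibling files
#         and every directory directly followed by its descendants.
sort_dirs_first() {
    # 1. Decorate: prefix each directory component with 0 and a final
    #    file component with 1, so directories win among siblings.
    # 2. Replace "/" with byte 0x01 so that "docs/" sorts before "docs-/".
    # 3. Sort by raw byte value.
    # 4. Undo step 2, then strip the one-character prefix from the start
    #    of every path component.
    sed -E 's:[^/]+/:0&:g; s:[^/]+$:1&:' |
        tr '/' '\001' |
        LC_ALL=C sort |
        tr '\001' '/' |
        sed -E 's:(^|/)[01]:\1:g'
}

# Example use with GNU find (commented out; PREFIX must be set first):
# find "${PREFIX}" -mindepth 1 \( -type d -printf '%P/\n' -o -printf '%P\n' \) | sort_dirs_first
```

This sorts strictly by byte value, so it does not give the natural (numeric-aware) ordering mentioned in (a); that would still need awk, perl, or similar.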

  • Yep, quite often that happens, when the tools are being used in standard ways already thought of by the devs and programmed by them to do exactly that. And also quite often, that isn't possible because the devs haven't programmed that functionality into their tools. That's when you need to write your own using sed or awk or perl or other languages...sometimes all you need is a post-processing script to massage the output into the exact format you require, and sometimes you need to write your own version of the tool from scratch. Non-standard sorts are fairly often examples of the latter. Commented Nov 17 at 4:06
  • I deleted my comment that @cas' above comment was replying to in order to add information. For posterity, this is what that comment originally said: I've seen many tasks that I figured would require a tailored script get solved using only a combination of Unix CLI utilities. Commented Nov 17 at 4:08
  • This is the comment I intended to replace it with before @cas replied quicker than I could add it: I've seen many tasks that I figured would require a tailored script get solved using only a combination of Unix CLI utilities. After I asked this question, I came up with a fix for grouping directories before sibling files: sed -r -e 's:[^/]+/:0&:g;s:[^/*]+$:1&:g'| sort | sed 's/^0//g;s /1 / g'. Commented Nov 17 at 4:10
  • The problem of docs-/ getting sorted before docs/ remains. Commented Nov 17 at 4:12
  • BTW, it's a mistake to make too great a conceptual distinction between common "simple" tools (e.g. coreutils including grep, du, ls, sort and many others, as well as findutils) and common scripting languages like sed or awk or perl - they're all "Unix CLI utilities" too and intended to be used as such. It's just that the scripting languages are capable of more than just simple things, limited only by your imagination and your programming skill....which, like anything else, gets better with practice. Commented Nov 17 at 4:19
