1

I need to expand some paths using POSIX sh or Bash.

Here are two example patterns (I chose overly complicated patterns on purpose):

$ npm pkg get workspaces | jq -r '.[]'
apps/app*
lib/{be,fe *} lib/*lib

Let's say my directory tree looks like this:

$ mkdir -p "lib/be lib/fantastic lib" "lib/fantastic" "lib/fe 1 lib/other lib" "apps/app1" "apps/app2" "be" "1"
$ tree
.
├── 1
├── apps
│   ├── app1
│   └── app2
├── be
└── lib
    ├── be lib
    │   └── fantastic lib
    ├── fantastic
    └── fe 1 lib
        └── other lib

12 directories, 0 files

How do I get a simple list with one path per line of all paths matching the patterns?

It seems like basic shell expansion just resolves the paths and separates them by space, without quoting the individual paths:

For example, what is this even matching?

$ echo "lib/"{"be","fe "*}" lib/"*"lib"
lib/be lib/fantastic lib lib/fe 1 lib/other lib

It could be: lib/be lib/fantastic, lib, lib/fe 1 and lib/other lib
Or: lib/be lib/fantastic lib and lib/fe 1 lib/other lib
Heck, it could even just be one long path: lib/be lib/fantastic lib lib/fe 1 lib/other lib

It seems impossible to tell if you don't know which space is a separator and which space is part of a path.

But equally challenging is the fact that you have to quote everything that contains a space, but at the same time you must not quote wildcards and the like.

I mean, I managed to hack something together, but I highly doubt this would actually take care of all possible cases:

echo 'lib/{be,fe *} lib/*lib' | sed -e 's/\([*,{}]\)/"\1"/g' -e 's/.*/"&"/' -e 's/""//g' 

Running it on my two patterns does appear to work:

$ echo -e 'lib/{be,fe *} lib/*lib\napps/app*' | sed -e 's/\([*,{}]\)/"\1"/g' -e 's/.*/"&"/' -e 's/""//g' | while IFS= read -r line; do bash -c "echo $line"; done
lib/be lib/fantastic lib lib/fe 1 lib/other lib
apps/app1 apps/app2

But again, where does a path start and where does it end?

And finally, I don't know how to get around using eval or bash -c. It seems kinda dangerous because a maliciously crafted pattern could basically wipe your system. For example a file pattern like bye && rm -rf ~ could delete your home directory.

1
  • @Forivin All of this is explained in Bash manual under "word splitting", "brace expansion", etc, which you get with man bash. It is a lot of text, but very worth reading at least once if you are going to use it. The pager usually used by man to read manual pages, less, has built-in search, which can be used for navigation. Read man less to become familiar with it. Commented Sep 6, 2023 at 9:04

3 Answers

4

When the globbing operators like * and ? are quoted, their special meaning is disabled. However, you need quoting or escaping to protect spaces. The solution is to quote or escape only the parts of the pattern which require it, avoiding the globbing operators. For instance:

All objects in the current directory that contain at least one space (and don't begin with a period):

 *" "* 

Another way, escaping the space rather than quoting it:

 *\ * 

Bash brace expansion isn't globbing: it's a kind of comprehension notation that generates text. a{b,c}d means { "axd" | x ϵ { "b", "c" } }: all strings axd for x being the elements "b" and "c", that is, abd and acd.

Bash performs brace expansion first to generate fields, and then those are subject to pathname expansion.

Quoting suppresses brace expansion; the braces must be unquoted.

Given a pattern like *.{jpg,gif}, brace expansion is first applied producing the fields *.jpg and *.gif. Then these are subject to filename expansion, exactly as if they had been entered into the command line that way.

Quoting and escaping can be applied to the interior of the braces, so that {\*,"?"} produces \* and "?" which turn into the unexpanded fields * and ?.
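Applied to the patterns from the question, the rule "quote the spaces and literal text, leave the operators bare" can be sketched as follows. This is only an illustration for Bash (brace expansion is not available in plain POSIX sh); the scratch directory from mktemp and the variable names demo_dir and matches are assumptions made for the demo.

```shell
# Sketch for Bash: build a throwaway copy of the question's directory tree.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir -p "lib/be lib/fantastic lib" "lib/fantastic" "lib/fe 1 lib/other lib" apps/app1 apps/app2

# Spaces and literal text are quoted; {, the comma, } and * stay unquoted.
# Brace expansion first yields two fields, then each field is globbed.
# printf reuses its format for every argument, so each match gets its own line.
matches=$(printf '%s\n' "lib/"{"be","fe "*}" lib/"*"lib" "apps/app"*)
printf '%s\n' "$matches"
```

Each output line is one matched path, so the spaces inside a path are no longer confused with separators between paths.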

4

It seems like basic shell expansion just resolves the paths and separates them by space,

It's not that stupid, and that couldn't even work. The key here is that when the command line is processed, it's processed more like an array of distinct strings ("words" or "fields"), than a single long string. Brace expansion and filename globs produce multiple distinct fields. Those fields end up as the command line arguments of whatever command you run (and eventually as elements of the argv[] array as it's commonly called in C programs).
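A minimal sketch of that field model, assuming a scratch directory from mktemp and two made-up file names. It stores the glob results in the positional parameters, counts them, and compares that with what echo makes of them:

```shell
# Sketch: the directory and the file names are invented for the demo.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch "my file" other

# The glob expands to two separate fields; "set --" stores them as $1 and $2.
set -- *
field_count=$#

# echo joins its arguments with spaces, so the field boundaries are lost.
joined=$(echo "$@")

printf '%d fields\n' "$field_count"
printf 'echo shows: %s\n' "$joined"
for field in "$@"; do
    printf 'field: <%s>\n' "$field"
done
```

The count stays at two even though the echo output contains three space-separated words.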

Your issue, and it's a common pitfall, is that echo joins all arguments it gets with spaces, producing that one long string you saw.

E.g. Bash's interactive help echo explicitly says that's exactly what it does:

$ help echo
echo: echo [-neE] [arg ...]
    Write arguments to the standard output.

    Display the ARGs, separated by a single space character and followed by a
    newline, on the standard output.

Which means these give the same output, even though the arguments are obviously different:

$ echo foo bar doo
foo bar doo
$ echo "foo bar" doo
foo bar doo

But with something as simple as ls, you'd see it work:

$ touch "foo bar" doo
$ ls -l *oo*
-rw-r----- 1 ilkkachu ilkkachu 0 Sep 6 12:58 doo
-rw-r----- 1 ilkkachu ilkkachu 0 Sep 6 12:58 foo bar

If the glob instead produced what you get when you literally paste the output of echo back into the shell, you'd get one of these:

$ ls -l foo bar doo
ls: cannot access 'foo': No such file or directory
ls: cannot access 'bar': No such file or directory
-rw-r----- 1 ilkkachu ilkkachu 0 Sep 6 12:58 doo

or

$ ls -l "foo bar doo"
ls: cannot access 'foo bar doo': No such file or directory

(depending on whether we were to further split that string on spaces or not)

The solution here is to stop using echo for debugging. Instead, use e.g. printf with suitable options. This prints each distinct argument between < and > using the fact that printf reuses the format string as many times as necessary:

$ printf "<%s>\n" *oo*
<doo>
<foo bar>

Or create a script like so:

#!/bin/sh
printf "%d args\n" "$#"
if [ "$#" -gt 0 ]; then
    printf "<%s>\n" "$@"
fi

and call it e.g. args.sh. Then try those with your brace expansion.

But equally challenging is the fact that you have to quote everything that contains a space, but at the same time you must not quote wildcards and the like.

You can't get away from this, really. Some characters are special in one way (whitespace splits words), some in another way (glob chars expand to filenames), and some of that special behavior you want to keep (glob chars), some you don't (whitespace).

And finally, I don't know how to get around using eval or bash -c. It seems kinda dangerous because a maliciously crafted pattern could basically wipe your system.

Yes, it's dangerous, and that's why you shouldn't do it. Keep data as data, and code as code, and don't mix them. Filename expansion actually does keep that separation: you can handle filenames with arbitrary characters safely using wildcards. The problem comes when you try to print multiple filenames out to a single string, or a single output stream, like the stdout of echo. Try to avoid doing that if you don't have to, and when you do, print the filenames as NUL-terminated (C-style) strings, since, well, that's what they are.
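The difference between newline-separated and NUL-terminated output can be sketched like this. The scratch directory and the three file names are invented for the demo, and xargs -0 is assumed to be available (it is in GNU and BSD xargs):

```shell
# Sketch: one name is plain, one has a space, one has an embedded newline.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch plain "with space" "$(printf 'two\nlines')"

# Newline-separated: the name with the embedded newline occupies two lines,
# so a reader of this stream sees more lines than there are files.
newline_count=$(printf '%s\n' ./* | wc -l)

# NUL-terminated: exactly one NUL per name, whatever the name contains.
nul_count=$(printf '%s\0' ./* | tr -cd '\0' | wc -c)

printf 'lines: %d, NULs: %d\n' "$newline_count" "$nul_count"

# A consumer that understands NUL terminators gets each name back intact.
printf '%s\0' ./* | xargs -0 sh -c 'for name in "$@"; do printf "<%s>\n" "$name"; done' sh
```

With three files, the line count comes out as four while the NUL count is three, which is why the NUL form is the safer one to feed into another program.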

Your issue isn't exactly about word splitting (of unquoted parameter expansions), but this may still be useful reading: https://mywiki.wooledge.org/WordSplitting

0
1

Thanks to the comment by @ilkkachu, I now understand that I simply have to use something other than echo, so I came up with a simple inline bash script that prints every received argument on its own line to stdout: printf "%s\n" "$0" "$@". I then "simply" pass the expanded pattern to it.

# Set up test directory structure
mkdir -p "lib/be lib/fantastic lib" "lib/fantastic" "lib/fe 1 lib/other lib" "apps/app1" "apps/app2" "be" "1"

# Define path patterns (one pattern per line)
export PATH_PATTERNS='lib/{be,fe *} lib/*lib
apps/app*'

# Print path patterns
echo -e "$PATH_PATTERNS"
# Output is:
# lib/{be,fe *} lib/*lib
# apps/app*

# Put double quotes around everything that is not `*`, `,`, `{` and `}`
export SANITIZED_PATH_PATTERNS="$(echo -e "$PATH_PATTERNS" | sed -e 's/\([*,{}]\)/"\1"/g' -e 's/.*/"&"/' -e 's/""//g')"
echo -e "$SANITIZED_PATH_PATTERNS"
# Output is:
# "lib/"{"be","fe "*}" lib/"*"lib"
# "apps/app"*

# Iterate over every sanitized expression and expand it by evaluating it with bash -c "... $line".
# Inside that new bash, put another bash -c "..." right before the $line, so that the expanded $line is passed as multiple parameters to the next bash.
# In that next bash we simply print all passed arguments to stdout (one per line), by using `printf "%s\n" "$0" "$@"`:
echo -e "$SANITIZED_PATH_PATTERNS" | while IFS= read -r line; do bash -c "bash -c 'printf \"%s\n\" \"\$0\" \"\$@\"' $line"; done
# Output is:
# lib/be lib/fantastic lib
# lib/fe 1 lib/other lib
# apps/app1
# apps/app2

Or as a one-liner:

$ echo "$PATH_PATTERNS" | sed -e 's/\([*,{}]\)/"\1"/g' -e 's/.*/"&"/' -e 's/""//g' | while IFS= read -r line; do bash -c "bash -c 'printf \"%s\n\" \"\$0\" \"\$@\"' $line"; done 

Unfortunately, the security implications regarding maliciously crafted patterns, as mentioned in the question, still apply. This is also not POSIX-compliant and has only been tested against the two patterns mentioned above. Things that come to mind which would potentially cause issues with my approach:

  • A pattern containing a new line char
  • A path to be matched containing a new line char
  • A pattern containing a comma outside of a braces definition
  • A pattern containing escaped wildcards \*
  • Double wildcards **
  • Pattern containing a question mark

I wish there was a simple way to solve all of these issues, but it seems like there is none. If you have python or another modern scripting engine available, you're most likely better off writing a script in that language to take care of the pattern resolving.

Or just use an existing cli utility like glob which can be installed with npm i -g glob and can be used like this:

glob "apps/app*" "/{bin,usr/bin}/" "test/**" 

Using the --cmd flag, you can even pass the expanded patterns to another command as arguments.
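For patterns that use only glob operators (no braces), the shell can expand a pattern held in a variable without eval or bash -c. The sketch below relies on two standard behaviors: an unquoted parameter expansion undergoes pathname expansion, and an empty IFS switches off field splitting, so spaces in the pattern stay put. The helper name expand_glob and the scratch tree are hypothetical. Brace expansion does not happen on expanded variables, so a pattern like lib/{be,fe *} is not covered, and paths containing newlines are still ambiguous in line-based output.

```shell
# Sketch: scratch tree loosely modeled on the question's layout.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir -p "lib/fe 1 lib/other lib" apps/app1 apps/app2

# expand_glob PATTERN (hypothetical helper)
# Runs in a subshell so the IFS change does not leak to the caller.
expand_glob() (
    IFS=            # empty IFS: the unquoted $1 is not split on spaces
    set -- $1       # still globbed, but never parsed or run as shell code
    for path in "$@"; do
        # An unmatched glob is left as the literal pattern; skip those.
        [ -e "$path" ] && printf '%s\n' "$path"
    done
    :               # return success even if the last test failed
)

expand_glob 'apps/app*'
expand_glob 'lib/fe * lib/*lib'
# Treated as one odd file name that matches nothing; no command is executed.
expand_glob 'bye && touch injected'
```

Because the pattern only ever passes through parameter expansion, characters such as &, ; or $( ) in it have no special meaning.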
