17

I have seen Bash scripting guides suggesting the use of arrays for working with filenames containing whitespace. DashAsBinSh, however, suggests that arrays are not portable, so I am looking for a POSIX-compliant way of working with lists of filenames that may contain whitespace.

I would like to modify the example script below so that it echoes:

    foo/target/a.jar
    foo/target/b.jar
    bar/target/lol whitespace.jar

Here is the script:

    #!/usr/bin/env sh

    INPUT="foo/target/a.jar
    foo/target/b.jar
    bar/target/b.jar
    bar/target/lol whitespace.jar"
    # this would be produced by a 'ls' command
    # We can execute the ls within the script, if it helps

    dostuffwith() { echo $1; };

    F_LOCATIONS=$INPUT
    ALL_FILES=$(for f in $F_LOCATIONS; do echo `basename $f`; done)
    ALL_FILES=$(echo "$ALL_FILES" | sort | uniq)

    for f in $ALL_FILES
    do
        fpath=$(echo "$F_LOCATIONS" | grep -m1 $f)
        dostuffwith $fpath
    done

2 Answers

12

POSIX shells have one array: the positional parameters ($1, $2, etc., collectively referred to as "$@").

    set -- 'foo/target/a.jar' 'foo/target/b.jar' 'bar/target/b.jar' 'bar/target/lol whitespace.jar'
    set -- "$@" '/another/one at the end.jar'
    …
    for jar do
      dostuffwith "$jar"
    done

This is inconvenient because there's only one, and it destroys any other use of the positional parameters. Positional parameters are local to a function, which is sometimes a blessing and sometimes a curse.
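As a minimal sketch of the "local to a function" point: a list can be built with set -- inside a function without disturbing the caller's positional parameters. The names append_and_count, inner_count, and outer_count are hypothetical and only serve the illustration.

```sh
#!/bin/sh
# Hypothetical helper: extends its own "$@" and records how many items it holds.
append_and_count() {
    # Inside a function, "$@" is the function's argument list,
    # so this set -- does not modify the caller's list.
    set -- "$@" '/another/one at the end.jar'
    inner_count=$#
}

set -- 'foo/target/a.jar' 'bar/target/lol whitespace.jar'
append_and_count "$@"

# The caller's positional parameters are restored when the function returns.
outer_count=$#
printf 'inside: %s, outside: %s\n' "$inner_count" "$outer_count"
```

The function sees three items while the caller still has two, which is the "blessing" case: a function can use "$@" as a scratch list. The "curse" is the reverse: a function cannot hand a modified list back to its caller this way.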

If your file names are guaranteed not to contain newlines, you can use newlines as the separator. When you expand the variable, first turn off globbing with set -f and set the list of field splitting characters IFS to contain only a newline.

    INPUT="foo/target/a.jar
    foo/target/b.jar
    bar/target/b.jar
    bar/target/lol whitespace.jar"
    …
    set -f; IFS='
    '                       # turn off variable value expansion except for splitting at newlines
    for jar in $INPUT; do
      set +f; unset IFS     # restore globbing and field splitting at all whitespace
      dostuffwith "$jar"
    done
    set +f; unset IFS       # do it again in case $INPUT was empty

With items in your list separated by newlines, you can use many text processing commands usefully, in particular sort.
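For instance, a newline-separated list can be piped through sort -u and then iterated with the same IFS technique. This is only a sketch: the SORTED and count variables are hypothetical, and LC_ALL=C is an assumption added to make the sort order predictable.

```sh
#!/bin/sh
INPUT="foo/target/b.jar
foo/target/a.jar
bar/target/lol whitespace.jar
foo/target/a.jar"

# One file name per line, so line-oriented tools work directly on the list.
# sort -u sorts the lines and drops exact duplicates.
SORTED=$(printf '%s\n' "$INPUT" | LC_ALL=C sort -u)

count=0
set -f; IFS='
'                            # split at newlines only, no globbing
for jar in $SORTED; do
    set +f; unset IFS        # restore defaults for the loop body
    count=$((count + 1))
    printf '%s\n' "$jar"
done
set +f; unset IFS            # restore defaults if the list was empty
```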

Remember to always put double quotes around variable substitutions, except when you explicitly want field splitting to happen (as well as globbing, unless you've turned that off).
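A small sketch of what the quotes change, assuming the default IFS; count_args is a hypothetical helper that reports how many arguments it received.

```sh
#!/bin/sh
# Hypothetical helper: prints the number of arguments it was called with.
count_args() { echo "$#"; }

jar='bar/target/lol whitespace.jar'

# Unquoted: the value undergoes field splitting at the space.
unquoted=$(count_args $jar)

# Quoted: the value is passed as a single argument.
quoted=$(count_args "$jar")

printf 'unquoted: %s argument(s), quoted: %s argument(s)\n' "$unquoted" "$quoted"
```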

  • Good answer and explanation. I'm going to mark this as accepted because this makes the original sort | uniq step work as intended. Commented Dec 9, 2013 at 9:52
5

Since your $INPUT variable uses newlines as separators, I'm going to assume that your files will not have newlines in the names. As such, yes, there is a simple way of iterating over the files and preserving whitespace.

The idea is to use the read shell builtin. Normally read will split on any whitespace, and so spaces will break it. But you can set IFS=$'\n' and it will instead split on newlines only. So you can iterate over each line in your list.

Here's the smallest solution I could come up with:

    INPUT="foo/target/a.jar
    foo/target/b.jar
    bar/target/b.jar
    bar/target/lol whitespace.jar"

    dostuffwith() {
        echo "$1"
    }

    echo "$INPUT" | awk -F/ '{if (!seen[$NF]++) print }' | \
    while IFS=$'\n' read file; do
      dostuffwith "$file"
    done

Basically it sends "$INPUT" to awk which deduplicates based on the file name (it splits on / and then prints the line if the last item hasn't been seen before). Then once awk has generated the list of file paths, we use while read to iterate through the list.
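A variant that follows the suggestions in the comments below might look like this: it uses the IFS= read -r form and feeds the loop from a here-document instead of a pipeline, so the loop runs in the current shell. It is a sketch rather than a replacement for the script above. The UNIQUE and count variables are hypothetical, and the remark about $'\n' is an assumption: it is an extension that some sh implementations, dash among them, may not support.

```sh
#!/bin/sh
INPUT="foo/target/a.jar
foo/target/b.jar
bar/target/b.jar
bar/target/lol whitespace.jar"

dostuffwith() {
    printf '%s\n' "$1"
}

# Keep the first path seen for each base name (the last /-separated field).
UNIQUE=$(printf '%s\n' "$INPUT" | awk -F/ '!seen[$NF]++')

count=0
# IFS= keeps leading and trailing blanks; -r keeps backslashes literal.
while IFS= read -r file; do
    dostuffwith "$file"
    count=$((count + 1))     # still set after the loop: no pipeline subshell
done <<EOF
$UNIQUE
EOF

printf 'processed %s files\n' "$count"
```

One caveat with this form: if UNIQUE is empty, the here-document still supplies a single empty line, so the loop body runs once with an empty file value unless it is guarded.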

  • $ checkbashisms bar.sh possible bashism in bar.sh line 14 (<<< here string) Commented Nov 28, 2013 at 12:12
  • @EeroAaltonen Changed it to not use the herestring. Note though that with this change, the while loop, and thus dostuffwith is executed in a subshell. So any variables or changes made to the running shell will be lost when the loop completes. The only alternative is to use a full heredoc, which isn't that unpleasant, but I thought this would be preferable. Commented Nov 28, 2013 at 22:28
  • I'm awarding points based more on readability than smallness. This certainly works and already +1 for that. Commented Nov 29, 2013 at 9:21
  • IFS="\n" splits on backslash and n characters. But in read file, there's no splitting. IFS="\n" is still useful in that it removes the blank characters from $IFS which otherwise would have been stripped at the beginning and end of the input. To read a line, the canonical syntax is IFS= read -r line, though IFS=anything read -r line (provided anything doesn't contain blanks) will work as well. Commented Nov 18, 2014 at 8:31
  • oops. Not sure how I managed that one. Fixed. Commented Nov 18, 2014 at 13:21
