cas

Your best option is to fix whatever's generating such useless file lists so that it generates NUL-separated output instead. Because NUL is the only character that cannot appear in a path/filename, it is the only separator that is guaranteed to handle any filename with any valid characters. If that's impossible, you can kludge up a "fix" by attempting to convert the list to NUL-separated format.
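As a rough illustration of why NUL-separated output is the safer starting point, here is a minimal sketch that assumes the list comes from find (whatever actually generates your list may be something else entirely). The demo directory and the filenames are made up for the example.

```shell
#!/bin/sh
# Hypothetical demo directory with deliberately awkward (but valid) filenames.
demo_dir=$(mktemp -d)
touch "$demo_dir/file number one.txt"
touch "$demo_dir/it's quoted.txt"
touch "$demo_dir/line
break.txt"

# -print0 terminates every path with a NUL, so counting NULs counts files.
nul_count=$(find "$demo_dir" -type f -print0 | tr -cd '\0' | wc -c)

# The default newline-terminated output is ambiguous: the third file
# contains a newline, so it occupies two lines.
line_count=$(find "$demo_dir" -type f | wc -l)

echo "NUL-terminated paths: $nul_count"
echo "newline-terminated lines: $line_count"

rm -rf "$demo_dir"
```

The NUL count matches the number of files, while the line count does not, because one name spans two lines.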

The following perl one-liner will (mostly) convert the file to NUL-separated filenames, without the quotes surrounding them:

perl -0 -pe "s/'\s+'/\0/sg; s/^'|'\$//sg; s/\x0d?\x0a\$//" file.txt 

The first regex replaces the sequence single-quote, one-or-more whitespace chars, single-quote with a NUL character (the commas and spaces in that description are not part of the pattern, they're just grammatical English list separators). The second regex removes the quotes at the beginning and end of the input, and the third removes the LF or CRLF at the end of a "line".

This is very far from perfect - some input is un-fixable because there's no way to know with 100% certainty whether a single-quote or an LF is supposed to be embedded in the filename or not (this is why starting with NUL-separated files is the correct solution, not trying to kludge it after the fact).

For example, it will fail if any filenames have an embedded single-quote at either the beginning or end of the filename, or if they have an embedded single-quote followed by one-or-more whitespace characters and then another single-quote (e.g. ' ') - these will also be replaced with a NUL because of the /g global modifier on the first regex (which is necessary for it to match all filenames in the input instead of just the first). There are probably a few other corner cases I haven't thought of yet.

You can redirect the output to another file, feed it into xargs -0r, or use it with the bash built-in readarray and process substitution to populate an array:

readarray -d '' files < <(perl -0 -pe "s/'\s+'/\0/sg; s/^'|'\$//sg; s/\x0d?\x0a\$//" file.txt) 

If you pipe the output into xxd (or hd or hexdump or similar hex-dumper program), you can see that it has become NUL-separated:

00000000: 2f74 6d70 2f66 696c 6520 6e75 6d62 6572  /tmp/file number
00000010: 206f 6e65 2e74 7874 002f 746d 702f 6669   one.txt./tmp/fi
00000020: 6c65 206e 756d 6265 7220 7477 6f2e 7478  le number two.tx
00000030: 74                                       t