Stéphane Chazelas

Note that for arbitrary file names, spaces are the least of your worries. Consider for instance a file called $(reboot) or foo;reboot #whatever or foo|reboot|bar...

awk calls sh to interpret the command lines in cmdline | getline, print | cmdline and system(cmdline), so when you build a command line out of arbitrary input, it's critical to properly quote the arguments to avoid command injection vulnerabilities.
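As a minimal sketch of the problem, the script below pastes a name into a command line without any quoting. The NAME variable and its value are made up for illustration: $(echo INJECTED) is a harmless stand-in for something like $(reboot).

```shell
#! /bin/sh -
# Sketch only: NAME is a hypothetical, harmless stand-in for a hostile file name.
NAME='$(echo INJECTED)'
export NAME
result=$(awk '
  BEGIN {
    # Unsafe on purpose: the name is pasted into the command line as-is.
    cmdline = "printf \"%s\\n\" " ENVIRON["NAME"]
    if ((cmdline | getline line) > 0)
      print "sh saw: " line
    close(cmdline)
  }')
printf '%s\n' "$result"
```

Here sh performs the command substitution before printf ever runs, so what gets printed is the output of the embedded command rather than the name itself.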

Quoting in shells is a tricky business. Shells have a great number of different quoting operators ('...', "...", \, $'...', $"..."), all but '...' being potentially unsafe as they don't escape every character (in particular, they don't escape the \ character, which is a dangerous one as its encoding is also found in the encoding of other characters in some charsets).

It's also important not to use the old `...` form of command substitution in the shell code, as it introduces another level of backslash processing.
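A small sketch of that extra level of backslash processing, using a plain printf as the example command (the variable names old and new are only for illustration):

```shell
#! /bin/sh -
# Both substitutions run the same printf on a single-quoted pair of backslashes.
old=`printf '%s\n' '\\'`
new=$(printf '%s\n' '\\')
printf 'backticks: %s\n' "$old"
printf '$(...):    %s\n' "$new"
```

Inside backticks, the shell first turns \\ into \, so printf receives a single backslash. With $(...), the two backslashes reach printf untouched.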

Say you have the arbitrary file name in an environment variable:

#! /bin/sh -
FILE="${1?No file provided}"
export FILE
awk -v q="'" '
  function shquote(s) {
    gsub(q, "&\"&\"&", s)
    return q s q
  }
  BEGIN {
    cmdline = "file -- " shquote(ENVIRON["FILE"])
    if ((cmdline | getline) > 0)
      print "The first line of \""cmdline"\" output was \""$0"\"."
    else
      print "Could not read a line from \""cmdline"\" output."
    if (close(cmdline) != 0)
      print cmdline" failed."
  }'

Above, shquote() takes a string as argument and quotes it for sh by enclosing it in single quotes (the safest quotes), except that single quotes in the string itself are changed to '"'"', that is a closing ', followed by a ' quoted with "..." followed by another ' that reopens another single-quoted string.
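To make that concrete, here is a sketch that prints what shquote() builds for a sample name and then has sh parse it back. The sample name and the use of printf as the command are assumptions chosen for illustration.

```shell
#! /bin/sh -
# Sketch: print what shquote() builds, then have sh parse it back.
# The sample name below is made up for illustration.
FILE="it's a \$(test)"
export FILE
out=$(awk -v q="'" '
  function shquote(s) {
    gsub(q, "&\"&\"&", s)
    return q s q
  }
  BEGIN {
    quoted = shquote(ENVIRON["FILE"])
    print "quoted:   " quoted
    # Pass the quoted name to printf through sh and read it back.
    cmdline = "printf \"%s\\n\" " quoted
    if ((cmdline | getline roundtrip) > 0)
      print "unquoted: " roundtrip
    close(cmdline)
  }')
printf '%s\n' "$out"
```

The first line of output is the single-quoted form with the embedded ' rewritten as '"'"'. The second line should be the original name unchanged, with $(test) left unexpanded.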

You'll notice above a few other hints at other possible caveats:

  • you need a -- to make sure your file name is not taken as an option if it starts with -.
  • the output of that file command is not guaranteed to be on a single line, especially if the filename itself contains newline characters. After all, the newline character is as valid as any in a file name. getline only reads one record, records being lines by default. See Slurp-mode in awk? for hints as to how to read the whole output.
  • that output might also contain no lines at all. To tell that apart from an empty first line, you need to check the return value of getline.
  • it's a good idea to check the exit status of the command as well, to report problems if need be. That's done by looking at the value returned by close(). Note however that awk implementations vary in how that value encodes the exit status. The only thing they all have in common is that the value is 0 when the command succeeds (exits with a 0 exit code).
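The last three points can be sketched together as follows. This is only one way to do it: the run helper, the CMDLINE variable and the two sample command lines are hypothetical, and the loop counts the lines rather than storing them.

```shell
#! /bin/sh -
# Sketch: read every line of a command's output, then check its exit status.
# run and CMDLINE are names made up for this example.
run() {
  CMDLINE=$1 awk '
    BEGIN {
      cmdline = ENVIRON["CMDLINE"]
      n = 0
      # getline returns 1 per record read, 0 at end of input, -1 on error
      while ((cmdline | getline line) > 0)
        n++
      # only "0 means success" is portable across awk implementations
      if (close(cmdline) == 0)
        result = "ok"
      else
        result = "failed"
      print n " line(s), command " result
    }'
}

run 'printf "%s\n" one two three'
run 'exit 3'
```

The first call should report 3 lines and a successful command, the second 0 lines and a failed one. The test only compares the close() value against 0, since the exact non-zero value differs between awk implementations.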