
## Preamble
First, I'd say it's not the right way to address the problem.
It's a bit like saying "_you should not murder people because
otherwise you'll go to jail_".
Similarly, you don't quote your variables because otherwise
you're introducing security vulnerabilities. You quote your
variables because it is wrong not to (but if the fear of jail can help, why not).
A little summary for those who've just jumped on the train.
In most shells, leaving a variable expansion unquoted has a very
special meaning (and that, like the rest of this answer, also
applies to command substitution and arithmetic expansion). The
most accurate way to describe it is that it is like invoking
some sort of implicit _split+glob_ operator.

```sh
cmd $var
```

in another language would be written something like:

```
cmd(glob(split($var)))
```

`$var` is first split into a list of words according to complex
rules involving the `$IFS` special parameter (the _split_ part)
and then each word resulting from that splitting is considered as
a _pattern_ which is expanded to a list of files that match it
(the _glob_ part).
As an example, if `$var` contains `*.txt,/var/*.xml` and `$IFS`
contains `,`, `cmd` would be called with a number of arguments,
the first one being `cmd` and the next ones being the `txt`
files in the current directory and the `xml` files in `/var`.
If you wanted to call `cmd` with just the two literal arguments `cmd`
and `*.txt,/var/*.xml`, you'd write:

```sh
cmd "$var"
```

which would be in your other more familiar language:

```
cmd($var)
```

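A small experiment makes the implicit _split+glob_ operator visible. This is a sketch that assumes `mktemp -d` is available (it is on GNU and BSD systems) and uses a hypothetical `nargs` helper that only reports how many arguments it received. Note that a pattern matching no file (`*.xml` here) is left as is by POSIX shells.

```sh
#!/bin/sh
# Count the arguments produced by an unquoted vs a quoted expansion.
# nargs is a hypothetical helper: it just prints its argument count.
nargs() { echo "$#"; }

# Scratch directory with known files for the glob to match.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch a.txt b.txt

var='*.txt,*.xml'
IFS=,                      # split on commas only

unquoted=$(nargs $var)     # "*.txt" -> a.txt b.txt; "*.xml" matches nothing, kept as is
quoted=$(nargs "$var")     # the value, untouched, as a single argument
unset IFS                  # back to the default splitting behaviour

echo "unquoted: $unquoted, quoted: $quoted"
cd / && rm -rf -- "$dir"
```

With no `.xml` file around, the unquoted form yields three arguments and the quoted one exactly one.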
## What do we mean by _vulnerability in a shell_?
After all, it's been known since the dawn of time that shell
scripts should not be used in security-sensitive contexts.
Surely, OK, leaving a variable unquoted is a bug but that can't
do that much harm, can it?
Well, despite the fact that anybody would tell you that shell
scripts should never be used for web CGIs, or that thankfully
most systems don't allow setuid/setgid shell scripts nowadays,
one thing that shellshock (the remotely exploitable bash bug
that made the headlines in September 2014) revealed is that
shells are still extensively used where they probably shouldn't be:
in CGIs, in DHCP client hook scripts, in sudoers commands,
invoked *by* (if not _as_) setuid commands...
Sometimes unknowingly. For instance `system("cmd $PATH_INFO")`
in a `php`/`perl`/`python` CGI script does invoke a shell (not to
mention the fact that `cmd` itself may be a shell script and its
author may have never expected it to be called from a CGI).
You've got a vulnerability when there's a path for privilege
escalation, that is when someone (let's call him _the attacker_)
is able to do something he is not meant to.
Invariably that means _the attacker_ providing data, that data
being processed by a privileged user/process which inadvertently
does something it shouldn't be doing, in most cases because
of a bug.
Basically, you've got a problem when your buggy code processes
data under the control of _the attacker_.
Now, it's not always obvious where that _data_ may come from,
and it's often hard to tell if your code will ever get to
process untrusted data.
As far as variables are concerned, in the case of a CGI script
it's quite obvious: the data are the CGI GET/POST parameters and
things like cookies, path, host... parameters.
For a setuid script (running as one user when invoked by
another), it's the arguments or environment variables.
Another very common vector is file names. If you're getting a
file list from a directory, it's possible that files have been
planted there by _the attacker_.
In that regard, even at the prompt of an interactive shell, you
could be vulnerable (when processing files in `/tmp` or `~/tmp`
for instance).
Even a `~/.bashrc` can be vulnerable (for instance, `bash` will
interpret it when invoked over `ssh` to run a `ForcedCommand`
like in `git` server deployments with some variables under the
control of the client).
Now, a script may not be called directly to process untrusted
data, but it may be called by another command that does. Or your
incorrect code may be copy-pasted into scripts that do (by you 3
years down the line or one of your colleagues). One place where it's
particularly *critical* is in answers on Q&A sites, as you'll
never know where copies of your code may end up.
## Down to business; how bad is it?
Leaving a variable (or command substitution) unquoted is by far
the number one source of security vulnerabilities associated
with shell code. Partly because those bugs often translate to
vulnerabilities but also because it's so common to see unquoted
variables.
Actually, when looking for vulnerabilities in shell code, the
first thing to do is look for unquoted variables. It's easy to
spot, often a good candidate, generally easy to track back to
attacker-controlled data.
There's an infinite number of ways an unquoted variable can turn
into a vulnerability. I'll just give a few common trends here.
### Information disclosure
Most people will bump into bugs associated with unquoted
variables because of the _split_ part (for instance, it's
common for files to have spaces in their names nowadays and space
is in the default value of IFS). Many people will overlook the
_glob_ part. The _glob_ part is at least as dangerous as the
_split_ part.
Globbing done upon unsanitised external input means _the
attacker_ can make you read the content of any directory.
In:

```sh
echo You entered: $unsanitised_external_input
```

if `$unsanitised_external_input` contains `/*`, that means _the
attacker_ can see the content of `/`. No big deal. It becomes
more interesting though with `/home/*` which gives you a list of
user names on the machine, `/tmp/*`, `/home/*/.forward` for
hints at other dangerous practises, `/etc/rc*/*` for enabled
services... No need to name them individually. A value of `/*
/*/* /*/*/*...` will just list the whole file system.
### Denial of service vulnerabilities
Take the previous case a bit too far and we've got a DoS.
Actually, any unquoted variable in list context with unsanitized
input is _at least_ a DoS vulnerability.
Even expert shell scripters commonly forget to quote things
like:

```sh
#! /bin/sh -
: ${QUERYSTRING=$1}
```

`:` is the no-op command. What could possibly go wrong?
That's meant to assign `$1` to `$QUERYSTRING` if `$QUERYSTRING`
was unset. That's a quick way to make a CGI script callable from
the command line as well.
That `$QUERYSTRING` is still expanded though and because it's
not quoted, the _split+glob_ operator is invoked.
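The expansion is easy to observe without hanging anything. In this sketch a hypothetical `show` function replaces `:` so that we can count what it receives, `QUERYSTRING` is preset to `*` as a web server might set it from a request, and `mktemp -d` is assumed to be available.

```sh
#!/bin/sh
# show is a hypothetical stand-in for ":" that reports its argument count.
show() { echo "$#"; }

dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch f1 f2 f3

QUERYSTRING='*'                          # as set from the request
n_unquoted=$(show ${QUERYSTRING=$1})     # "*" is globbed: f1 f2 f3
n_quoted=$(show "${QUERYSTRING=$1}")     # a single, literal "*"

echo "unquoted: $n_unquoted, quoted: $n_quoted"
cd / && rm -rf -- "$dir"
```

The quoted form, `: "${QUERYSTRING=$1}"`, is therefore the safe way to write that idiom.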
Now, there are some globs that are particularly expensive to
expand. The `/*/*/*/*` one is bad enough as it means listing
directories up to 4 levels down. In addition to the disk and CPU
activity, that means storing tens of thousands of file paths
(40k here on a minimal server VM, 10k of which directories).
Now `/*/*/*/*/../../../../*/*/*/*` means 40k x 10k and
`/*/*/*/*/../../../../*/*/*/*/../../../../*/*/*/*` is enough to
bring even the mightiest machine to its knees.
Try it for yourself (though be prepared for your machine to
crash or hang especially if on Linux):

```sh
# WARNING: this may hang or crash your machine (see above).
a='/*/*/*/*/../../../../*/*/*/*/../../../../*/*/*/*' sh -c ': ${a=foo}'
```

Of course, if the code is:

```sh
echo $QUERYSTRING > /some/file
```

then you can fill up the disk.
Just do a Google search on [shell
cgi](https://www.google.co.uk/search?q=shell+cgi) or [bash
cgi](https://www.google.co.uk/search?q=bash+cgi) or [ksh
cgi](https://www.google.co.uk/search?q=ksh+cgi), and you'll find
a few pages that show you how to write CGIs in shells. Notice
how half of those that process parameters are vulnerable.
Even [David Korn's
own
one](http://www2.research.att.com/~astopen/download/ksh/scripts/cgi-lib.ksh)
is vulnerable (look at the cookie handling).
### up to arbitrary code execution vulnerabilities
Arbitrary code execution is the worst type of vulnerability,
since if _the attacker_ can run any command, there's no limit on
what he may do.
It's generally the _split_ part that leads to those. Splitting
results in several arguments being passed to commands
when only one is expected. While the first of those will be used
in the expected context, the others will be in a different context,
so potentially interpreted differently. That's clearer with an example:

```sh
awk -v foo=$external_input '$2 == foo'
```

Here, the intention was to assign the content of the
`$external_input` shell variable to the `foo` `awk` variable.
Now:

```console
$ external_input='x BEGIN{system("uname")}'
$ awk -v foo=$external_input '$2 == foo'
Linux
```

The second word resulting from the splitting of `$external_input`
is not assigned to `foo` but considered as `awk` code (here that
executes an arbitrary command: `uname`).
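To see the difference without running anything dangerous, the sketch below swaps the `uname` payload for one that only prints a marker, and uses a trivial awk program that prints `foo`. The last variant, passing the value through the environment and reading it with awk's `ENVIRON` array, is an alternative to `-v` rather than something the text above uses; it also avoids the backslash-escape processing awk applies to `-v` assignments.

```sh
#!/bin/sh
# Harmless payload: it only prints a marker.
external_input='x BEGIN{print("injected")}'

# Unquoted: awk gets -v foo=x, then BEGIN{print("injected")} as its
# program; the intended program becomes a file operand that a
# BEGIN-only program never reads.
bad=$(awk -v foo=$external_input 'BEGIN{print foo}')

# Quoted: the whole value lands in foo, as intended.
good=$(awk -v foo="$external_input" 'BEGIN{print foo}')

# Via the environment: read back with awk's ENVIRON array.
via_env=$(foo=$external_input awk 'BEGIN{print ENVIRON["foo"]}')

printf '%s\n' "$bad" "$good" "$via_env"
```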
That's a particular problem for commands that can execute other
commands (`awk`, `env`, `sed` (GNU one), `perl`, `find`...), especially
with the GNU variants (which accept options after arguments).
Sometimes you wouldn't even suspect a command of being able to execute
others, like `ksh`'s `[`, or `zsh`'s/`ksh`'s `printf`...

```sh
for file in *; do
  [ -f $file ] || continue
  something-that-would-be-dangerous-if-$file-were-a-directory
done
```

If we create a directory called `x -o yes`, then the test
becomes positive, because it's a completely different
conditional expression we're evaluating: `[ -f x -o yes ]` is
true as soon as `yes` is a non-empty string, which it always is.
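That first case can be reproduced safely with a `[` that supports `-o` (as those of `dash`, `bash` and `ksh` do). The sketch assumes `mktemp -d` is available and only records which branch would be taken instead of doing anything dangerous.

```sh
#!/bin/sh
# A directory whose name rewrites the [ expression when unquoted.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
mkdir 'x -o yes'           # a directory, so -f ought to be false

for file in *; do
  # Unquoted: [ -f x -o yes ], i.e. "-f x" OR "yes is non-empty".
  if [ -f $file ]; then unquoted=regular; else unquoted=other; fi
  # Quoted: [ -f 'x -o yes' ], the test that was intended.
  if [ -f "$file" ]; then quoted=regular; else quoted=other; fi
done

echo "unquoted: $unquoted, quoted: $quoted"
cd / && rm -rf -- "$dir"
```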
Worse, if we create a file called `x -o a[0$(uname>&2)] -gt 1`,
with all ksh implementations at least (which includes the `sh`
of most commercial Unices and some BSDs), that executes `uname`
because those shells perform arithmetic evaluation on the
numerical comparison operators of the `[` command.

```console
$ touch 'x -o a[0$(uname>&2)] -gt 1'
$ ksh -c 'for f in *; do [ -f $f ]; done'
Linux
```

Of course, if he can't get arbitrary execution, _the attacker_ may
settle for lesser damage (which may help to get arbitrary
execution). Any command that can write files, change
permissions or ownership, or has any main or side effect could be exploited.
All sorts of things can be done with file names.

```console
$ touch -- '-R ..'
$ for file in *; do [ -f "$file" ] && chmod +w $file; done
```

And you end up making `..` writable (recursively with GNU
`chmod`).
Scripts doing automatic processing of files in world-writable areas like `/tmp` must be written very carefully.
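The sketch below shows what `chmod` would receive without running it: a hypothetical `print_args` function takes its place. It also illustrates that quoting alone is not the whole fix here: a quoted `"$file"` would still start with `-` and could be taken as options, which is why the `./` prefix is used for the safe variant. `mktemp -d` is assumed to be available.

```sh
#!/bin/sh
# print_args is a hypothetical stand-in for chmod: one argument per line.
print_args() { printf '[%s]\n' "$@"; }

dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch -- '-R ..'

for file in *; do
  [ -f "$file" ] || continue
  unquoted=$(print_args +w $file)      # +w, -R, .. : recursive on the parent with GNU chmod
  quoted=$(print_args +w "./$file")    # +w, ./-R .. : just that one file
done

printf '%s\n' "$unquoted" '---' "$quoted"
cd / && rm -rf -- "$dir"
```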
### What about `[ $# -gt 1 ]`?
That's something I find exasperating. Some people go to all
the trouble of wondering whether a particular expansion may be
problematic to decide if they can omit the quotes.
It's like saying: _Hey, it looks like `$#` cannot be subject to
the split+glob operator, let's ask the shell to split+glob it_.
Or _Hey, let's write incorrect code just because the bug is
unlikely to be hit_.
Now how unlikely is it? OK, `$#` (or `$!`, `$?` or any
arithmetic substitution) may only contain digits (or `-` for
some) so the _glob_ part is out. For the _split_ part to do
something though, all we need is for `$IFS` to contain digits.
With some shells, `$IFS` may be inherited from the environment,
but if the environment is not safe, it's game over anyway.
Now if you write a function like:

```sh
my_function() {
  [ $# -eq 2 ] || return
  ...
}
```

What that means is that the behaviour of your function depends
on the context in which it is called. Or in other words, `$IFS`
becomes one of the inputs to it. Strictly speaking, when you
write the API documentation for your function, it should be
something like:

```sh
# my_function
#   inputs:
#     $1: source directory
#     $2: destination directory
#     $IFS: used to split $#, expected not to contain digits...
```

And code calling your function needs to make sure `$IFS` doesn't
contain digits. All that because you didn't feel like typing
those 2 double-quote characters.
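To make that hidden input visible, this sketch runs the same check with `$IFS` set to the digits, as some earlier bug might leave it (that is the assumption here). `check_unquoted` and `check_quoted` are hypothetical names.

```sh
#!/bin/sh
check_unquoted() { [ $# -eq 2 ]; }     # depends on $IFS
check_quoted()   { [ "$#" -eq 2 ]; }   # does not

IFS=0123456789                  # assumed: left over from some other bug
check_unquoted a b 2>/dev/null; r_unquoted=$?   # "2" splits to an empty word
check_quoted a b;               r_quoted=$?
unset IFS                       # restore the default splitting

echo "unquoted: $r_unquoted, quoted: $r_quoted"
```

With two arguments, only the quoted check succeeds; the unquoted one hands `[` an empty string instead of `2` and fails.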
Now, for that `[ $# -eq 2 ]` bug to become a vulnerability,
you'd somehow need the value of `$IFS` to come under the
control of _the attacker_. Conceivably, that would not normally
happen unless _the attacker_ managed to exploit another bug.
That's not unheard of though. A common case is when people
forget to sanitize data before using it in arithmetic
expressions. We've already seen above that it can allow
arbitrary code execution in some shells, but in all of them, it allows
_the attacker_ to give any variable an integer value.
For instance:

```sh
n=$(($1 + 1))
if [ $# -gt 2 ]; then
  echo >&2 "Too many arguments"
  exit 1
fi
```

And with a `$1` with value `(IFS=-1234567890)`, that arithmetic
evaluation has the side effect of setting `$IFS`, so `$#` is now
split on its own digits, the next `[` command fails (it no longer
gets a valid number) and the check for _too many args_ is
bypassed.
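Here is that bypass as a self-contained sketch. The hypothetical `check` function wraps the code above (writing to stdout rather than stderr so the result can be captured), and each call runs in a command substitution, that is a subshell, so the corrupted `$IFS` doesn't leak into the rest of the script. It assumes a shell whose arithmetic expansion supports assignment, as POSIX requires.

```sh
#!/bin/sh
# check is a hypothetical wrapper around the code above.
check() {
  n=$(($1 + 1))                  # $1 is not sanitised
  if [ $# -gt 2 ]; then
    echo "Too many arguments"
    return 1
  fi
  echo "accepted $# arguments"
}

benign=$(check 1 b c 2>/dev/null)
attack=$(check '(IFS=-1234567890)' b c 2>/dev/null)

printf '%s\n' "$benign" "$attack"
```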
### What about when the _split+glob_ operator is not invoked?
There's another case where quotes are needed around variables and other expansions: when they are used as patterns.

```sh
[[ $a = $b ]]   # a `ksh` construct also supported by `bash`
case $a in ($b) ...; esac
```

do not test whether `$a` and `$b` are the same (except with `zsh`) but if `$a` matches the pattern in `$b`. And you need to quote `$b` if you want to compare as strings (same thing in `"${a#$b}"` or `"${a%$b}"` or `"${a##*$b*}"` where `$b` should be quoted if it's not to be taken as a pattern).
What that means is that `[[ $a = $b ]]` may return true in cases where `$a` is different from `$b` (for instance when `$a` is `anything` and `$b` is `*`) or may return false when they are identical (for instance when both `$a` and `$b` are `[a]`).
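The same behaviour can be checked with the POSIX `case` statement, which works in any `sh`. In this sketch `same_unquoted` and `same_quoted` are hypothetical helper names; each returns 0 (true) when `case` considers the two values to match.

```sh
#!/bin/sh
# $2 unquoted in a case pattern is a pattern; quoted, it's a plain string.
same_unquoted() { case $1 in ($2) return 0;; esac; return 1; }
same_quoted()   { case $1 in ("$2") return 0;; esac; return 1; }

same_unquoted anything '*'; r1=$?   # "*" used as a pattern: matches
same_quoted   anything '*'; r2=$?   # compared as strings: no match
same_unquoted '[a]' '[a]';  r3=$?   # pattern [a] only matches "a": no match
same_quoted   '[a]' '[a]';  r4=$?   # identical strings: match

echo "$r1 $r2 $r3 $r4"
```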
Can that make for a security vulnerability? Yes, like any bug. Here, _the attacker_ can alter your script's logical code flow and/or break the assumptions that your script is making. For instance, with code like:

```sh
if [[ $1 = $2 ]]; then
  echo >&2 '$1 and $2 cannot be the same or damage will incur'
  exit 1
fi
```

_The attacker_ can bypass the check by passing `'[a]' '[a]'`.
Now, if neither that pattern matching nor the _split+glob_ operator apply, what's the danger of leaving a variable unquoted?
I have to admit that I do write:

```sh
a=$b
case $a in...
```

There, quoting doesn't harm but is not strictly necessary.
However, one side effect of omitting quotes in those cases (for instance in Q&A answers) is that it can send a wrong message to beginners: <strike>_that it may be all right not to quote variables_</strike>.
For instance, they may start thinking that if `a=$b` is OK, then <strike>`export a=$b`</strike> would be as well (which it's not in many shells, as there `a=$b` is an argument to the `export` command, so in list context) or `env a=$b`.
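A quick way to see the difference between an assignment and an argument that merely looks like one, sketched with `set --` so that nothing is executed (the `env` word is just data here):

```sh
#!/bin/sh
b='foo bar'

a=$b              # assignment: no split+glob, a is "foo bar"
set -- env a=$b   # argument: a=$b is split into "a=foo" and "bar"

printf '<%s>\n' "$a" ---
printf '<%s>\n' "$@"
```

So `env a=$b cmd` would end up running `bar` as the command, with `cmd` as its argument.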
### What about `zsh`?
`zsh` did fix most of those design awkwardnesses. In `zsh` (at least when not in sh/ksh emulation mode), if you want _splitting_, or _globbing_, or _pattern matching_, you have to request it explicitly: `$=var` to split, and `$~var` to glob or for the content of the variable to be treated as a pattern.
However, splitting (but not globbing) is still done implicitly upon unquoted command substitution (as in `echo $(cmd)`).
Also, a sometimes unwanted side effect of not quoting variables is the _empties removal_. The `zsh` behaviour is similar to what you can achieve in other shells by disabling globbing altogether (with `set -f`) and splitting (with `IFS=''`). Still, in:

```sh
cmd $var
```

there will be no _split+glob_, but if `$var` is empty, instead of receiving one empty argument, `cmd` will receive no argument at all.
That can cause bugs (like the obvious `[ -n $var ]`). That can possibly break a script's expectations and assumptions and cause vulnerabilities, but I can't come up with a not-too-far-fetched example just now.
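The empties removal can be reproduced in a POSIX `sh` by switching off the _split_ and _glob_ parts as described above, which approximates what `zsh` does by default. This is a sketch; the hypothetical `nargs` helper just prints its argument count.

```sh
#!/bin/sh
nargs() { echo "$#"; }     # report how many arguments were received

var=''
set -f; IFS=''             # no globbing, no splitting, like zsh

n_unquoted=$(nargs $var)   # the empty expansion is removed: no argument
n_quoted=$(nargs "$var")   # one empty argument
if [ -n $var ]; then       # becomes [ -n ], true since "-n" is non-empty
  n_check=true
else
  n_check=false
fi

set +f; unset IFS          # restore the defaults
echo "unquoted: $n_unquoted, quoted: $n_quoted, [ -n \$var ]: $n_check"
```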
### What about when you _do_ need the _split+glob_ operator?
Yes, that's typically when you do want to leave your variable unquoted. But then you need to make sure you tune your _split_ and _glob_ operators correctly before using it. If you only want the _split_ part and not the _glob_ part (which is the case most of the time), then you do need to disable globbing (`set -f`) and fix `$IFS`. Otherwise you'll cause vulnerabilities as well (like David Korn's CGI example mentioned above).
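A minimal sketch of that tuning for a comma-separated list that should be split but not globbed. It assumes the script can reuse its positional parameters to hold the result, and it restores globbing and the default `$IFS` afterwards (`unset IFS` gives the default splitting behaviour).

```sh
#!/bin/sh
list='*.txt,a b,/etc/*'

set -f            # disable the glob part
IFS=,             # split on commas only
set -- $list      # the split+glob operator, now split-only
set +f            # re-enable globbing
unset IFS         # default splitting again

printf '<%s>\n' "$@"
```

Each field comes out verbatim: `*.txt` and `/etc/*` are not expanded, and `a b` is not split on its space.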
## Conclusion
In short, leaving a variable (or command substitution or
arithmetic expansion) unquoted in shells can be very dangerous
indeed especially when done in the wrong contexts, and it's very
hard to know which are those wrong contexts.
That's one of the reasons why it is considered _bad practice_.
Thanks for reading this far. If it goes over your head, don't
worry. One can't expect everyone to understand all the implications of
writing their code the way they write it. That's why we have
_good practice recommendations_, so they can be followed without
necessarily understanding why.
(and in case that's not obvious yet, please avoid writing
security-sensitive code in shells).
And **please quote your variables in your answers on this site!**