Stéphane Chazelas

Various approaches to giving visual representations of strings:

POSIX

    $ printf %s "$IFS" | od -vtc -to1
    0000000      \t  \n  \0
            040 011 012 000
    0000004
    $ printf '%s\n' "$IFS" | LC_ALL=C sed -n l
     \t$
    \000$

(The extra \n is necessary because the behaviour of sed is unspecified if the last line of its input doesn't end in a newline.) A POSIX sh won't have NUL in $IFS as my zsh does here. The behaviour of sed is also unspecified if the input contains NULs.
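The two POSIX pipelines above can be wrapped in small functions so they are easy to apply to any variable. This is only a sketch: `show_bytes` and `show_lines` are hypothetical helper names, not standard utilities, and `od -An` is used here to drop the offset column.

```shell
# show_bytes and show_lines are hypothetical helper names, not standard utilities.

# Octal dump of the exact bytes of the first argument (no newline is added).
show_bytes() {
  printf %s "$1" | od -An -v -to1
}

# sed's "l" rendering. A newline is appended so that the last line is
# terminated; the result is not reliable if the value contains NUL bytes.
show_lines() {
  printf '%s\n' "$1" | LC_ALL=C sed -n l
}

show_bytes "$IFS"
show_lines "$IFS"
```

`show_bytes` is the safer of the two for arbitrary data, since `od` makes no assumption about the input being text.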

Shell builtins

  • typeset -p (ksh, zsh, bash, yash) might give you an unambiguous output for some strings.

      $ ksh93 -c 'typeset -p IFS'
      IFS=$' \t\n'
      $ zsh -c 'typeset -p IFS'
      typeset IFS=$' \t\n\C-@'
      $ mksh -c 'typeset -p IFS'
      typeset IFS=$' \t\n'
      $ a=$'\u00e9e\u301\u200b' ksh -c 'typeset -p a'
      typeset -x a=$'\u[e9]e\u[301]\u[200b]'

For that last one (which uses the Unicode combining acute accent and zero-width space characters), zsh and mksh are not helping (even with LC_ALL=C typeset -p a with mksh -o utf8-mode). bash's output is generally ambiguous when sent to a terminal.

  • printf %q with GNU printf and the printf builtin of ksh93, zsh and bash:

      $ a=$'\u00e9e\u301\u200b' bash -c 'printf "%q\n" "$IFS" "$a" ""'
      $' \t\n'
      éé​
      ''
      $ a=$'\u00e9e\u301\u200b' ksh -c 'printf "%q\n" "$IFS" "$a" ""'
      $' \t\n'
      $'\u[e9]e\u[301]\u[200b]'
      ''
      $ a=$'\u00e9e\u301\u200b' zsh -c 'printf "%q\n" "$IFS" "$a" ""'
      \ $'\t'$'\n'$'\0'
      éé​
      ''
      $ a=$'\u00e9e\u301\u200b' sh -c '/usr/bin/printf "%q\n" "$IFS" "$a" ""'
      ' '$'\t\n'
      éé​
      ''
      $ a=$'\u00e9e\u301\u200b' zsh -c 'LC_ALL=C printf "%q\n" "$IFS" "$a" ""'
      \ $'\t'$'\n'$'\0'
      $'\303'$'\251'e$'\314'$'\201'$'\342'$'\200'$'\213'
      ''
      $ a=$'\u00e9e\u301\u200b' bash -c 'LC_ALL=C printf "%q\n" "$IFS" "$a" ""'
      $' \t\n'
      $'\303\251e\314\201\342\200\213'
      ''

  • q, qq, qqq, qqqq parameter expansion flags in zsh, for various types of quoting, qqqq being the one for $'...':

      $ a=$'\u00e9e\u301\u200b' zsh -c 'print -r -- ${(qqqq)a}'
      $'éé​'
      $ a=$'\u00e9e\u301\u200b' zsh -c '(){local LC_ALL=C; print -r -- ${(qqqq)a}}'
      $'\303\251e\314\201\342\200\213'

There are also q- and q+, which only use quoting for things that need it (though still with the caveat for those Unicode characters).
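One practical use of the %q form is that it can be fed back to the shell. The following is a minimal sketch, assuming bash is installed (printf -v and %q are not POSIX); `roundtrip` is a hypothetical helper name. It quotes a string with bash's printf %q, evaluates the quoted form again, and prints the quoted form only if the value survived unchanged.

```shell
# roundtrip is a hypothetical helper; it assumes bash is available in PATH.
roundtrip() {
  bash -c '
    # Store the shell-quoted representation of $1 in the variable "quoted".
    printf -v quoted %q "$1"
    # Re-read the quoted form as shell code.
    eval "copy=$quoted"
    # Print the quoted form only if it evaluates back to the original value.
    [ "$copy" = "$1" ] && printf "%s\n" "$quoted"
  ' bash "$1"
}

roundtrip "$(printf 'a\tb')"
```

As noted above, with multi-byte characters the quoted form may still be visually ambiguous on a terminal unless LC_ALL=C is in effect for the printf call.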

Various non-standard commands

  • Hex dumpers: hexdump, hd, xxd... You'd want to feed them the output of printf %s "$var" (or print -rn -- "$var" with ksh/zsh, or echo -E - "$var" with zsh...).

  • cat -vte or cat -A

  • uconv -x hex for the Unicode code points of characters (as opposed to the hex values of the bytes of the encoding). It only works on UTF-8, though one can preprocess the input with iconv -t utf-8, provided it's valid text in the locale's encoding.

  • uconv -x name for the character names

  • recode ..dump gives both hex and name, but knows about fewer Unicode characters (it has not been updated for newer versions of Unicode). It works in non-UTF-8 locales though.
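Since none of the hex dumpers above is guaranteed to be installed, a wrapper can pick whichever is available. This is only a sketch: `dump_var` is a hypothetical helper name, and falling back to the POSIX od when neither xxd nor hexdump is found is an assumption about what is convenient, not part of the list above.

```shell
# dump_var is a hypothetical helper: hex-dump the bytes of its first argument
# with the first dumper found, falling back to POSIX od.
dump_var() {
  if command -v xxd >/dev/null 2>&1; then
    printf %s "$1" | xxd
  elif command -v hexdump >/dev/null 2>&1; then
    printf %s "$1" | hexdump -C
  else
    printf %s "$1" | od -v -An -tx1
  fi
}

dump_var "$IFS"
```

The exact layout differs between the three tools, so the output is meant for reading, not for parsing.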
