Summary
Is `xargs -I s printf s` more compatible than `xargs -n 1 printf`?
Background
I need to handle binary data that may include 0x00. I know how to convert binary data to text, like this:

    # make sure that you have done this: export LC_ALL=C
    od -A n -t x1 -v |   # or -t o1 or -t u1 or whatever
    tr ABCDEF abcdef |   # because POSIX doesn't specify in which case
    tr -d ' \t\n' |      # because POSIX says they are delimiters
    fold -w 2 |
    grep .               # to make sure to terminate final line with LF

... and here is how to convert back to binary:

    # input: for each line, /^[0-9a-f]\{2\}$/
    # also make sure export LC_ALL=C before
    awk -v _maxlen="$(getconf ARG_MAX 2>/dev/null)" '
    BEGIN{
      # (1) make a table
      # assume that every non-null byte can be converted easily
      # actually not portable in Termux; LC_ALL=C does not work and
      # awk is gawk by default, which depends on locale.
      # to deal with it, here is alternative:
      # for(i=0;i<256;i++){
      #   xc[sprintf("%02x",i)]=sprintf("\\\\%03o",i);
      #   xl[sprintf("%02x",i)]=5;
      # }
      # # and skip to (2)
      # but why not just env -i awk to force one true awk, if so.
      # also is not it pretty rare that C locale is not available?
      for(i=1;i<256;i++){
        xc[sprintf("%02x",i)]=sprintf("%c",i);
        xl[sprintf("%02x",i)]=1;
      }
      # now for chars that requires special converting.
      # numbers; for previous char is \\ooo.
      for(i=48;i<58;i++){
        xc[sprintf("%02x",i)]=sprintf("\\\\%03o",i);
        xl[sprintf("%02x",i)]=5;
      }
      # and what cannot be easily passed to xargs -n 1 printf
      # null
      xc["00"]="\\\\000";  xl["00"]=5;
      # <space>
      xc["09"]="\\\\t";    xl["09"]=3;
      xc["0a"]="\\\\n";    xl["0a"]=3;
      xc["0b"]="\\\\v";    xl["0b"]=3;
      xc["0c"]="\\\\f";    xl["0c"]=3;
      xc["0d"]="\\\\r";    xl["0d"]=3;
      xc["20"]="\\\\040";  xl["20"]=5;
      # meta chars for printf
      xc["25"]="%%";       xl["25"]=2;
      xc["5c"]="\\\\\\\\"; xl["5c"]=4;
      # hyphen; to prevent to be treated as if it were an option
      xc["2d"]="\\\\055";  xl["2d"]=5;
      # chars for quotation
      xc["22"]="\\\"";     xl["22"]=2;
      xc["27"]="\\'\''";   xl["27"]=2;
      # (2) preparation
      # reason why 4096: _POSIX_ARG_MAX
      # reason why length("printf "): because of ARG_MAX
      # reason why 4096/2 and _maxlen/2: because some xargs such as GNU
      # specifies buffer length less than ARG_MAX
      if(_maxlen==""){
        maxlen=(4096/2)-length("printf ");
      }else{
        maxlen=int(_maxlen/2)-length("printf ");
      }
      ORS="";
      LF=sprintf("\n");
      arglen=0;
    }
    {
      # (3) actual conversion here.
      # XXX. not sure why arglen+4>maxlen.
      # but I think maximum value for xl[$0] is 5.
      # and maybe final LF is 1.
      if(arglen+4>maxlen){
        print LF;
        arglen=0;
      }
      print xc[$0];
      arglen+=xl[$0];
    }
    END{
      # for some xargs who hates input w/o LF termination
      if(NR>0)print LF;
    }
    ' |
    xargs -n 1 printf

I found an issue with empty input: on GNU/Linux, it fails like this:

    $ xargs -n 1 printf </dev/null
    printf: missing operand
    Try 'printf --help' for more information.

Then I found three alternatives: `xargs -n 1 printf 2>/dev/null || :`; adding `if(NR==0)printf"\"\"\n";` to the END block; and `xargs -I s printf s`. I have only seen the first one actually used, in ShellShoccar-jpn's programs, but I think it is rather heavy-handed. The second one is also less clean than the third. Can the third one be used as an alternative not only on GNU/Linux, but also in every other (or most other) environments? Since I only have GNU/Linux, I have no idea how to validate this in other environments. The easiest way would be to obtain the source of the other implementations and read it, or to consult their manuals. If it cannot be validated at all, then I will have to give up.
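To make the comparison concrete, the three alternatives can be run side by side on empty input. This is only a sketch: the function names are hypothetical, and any behavior seen on GNU xargs says nothing about other implementations.

```shell
# Hypothetical wrappers around the three alternatives from the question.

# (a) ignore the failure (this also hides every other error message)
decode_ignore_error() { xargs -n 1 printf 2>/dev/null || :; }

# (b) emit one empty quoted argument when there was no input at all;
#     in the real script this check would live in the awk END block
decode_dummy_arg() {
  awk '{print} END{if(NR==0)printf"\"\"\n"}' | xargs -n 1 printf
}

# (c) -I runs printf once per input line, so no line means no call
decode_insert_mode() { xargs -I s printf s; }

for f in decode_ignore_error decode_dummy_arg decode_insert_mode; do
  "$f" </dev/null
  echo "$f: exit status $?"
done
```

One caveat for (c): POSIX only requires `-I` to support constructed arguments of up to 255 bytes, so the long lines that the awk script produces (up to about half of ARG_MAX) may be rejected by some implementations.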
My knowledge
- It seems that `printf` requires at least one argument, as POSIX says so.
- Some `xargs` implementations ignore input without LF termination; `grep ^ | xargs something here` is more portable than `xargs something here` for input that may not have LF termination.
- xargs is not portable for input without non-blank lines; `printf ' \n\n' | xargs echo foo` outputs nothing on FreeBSD and `foo` on GNU/Linux. In this case, you have to make the command for xargs safe for such input or let the command ignore the error.
- FreeBSD's xargs receives its arguments as if they were `$@` while GNU/Linux's as if they were `"$@"`.
- Escaping by backslash works for xargs, like `printf '\\\\\\'"'" | sed "$(printf 's/[\047\042\\]/\\\\&/g')" | xargs printf` to obtain `\'` as output.
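Two of these points, the `grep ^` trick and backslash escaping, can be sketched as below. The helper name `escape_for_xargs` is hypothetical. As in the one-liner above, it escapes only quotes and backslashes, so blanks in the input would still split arguments.

```shell
# grep ^ matches every line and writes each one with a final LF,
# so xargs sees a terminated line even if the input lacked one.
printf 'a b' | grep ^ | xargs -n 1 echo

# Hypothetical helper: put a backslash in front of ', " and \ so
# that xargs passes them through literally (blanks are NOT escaped).
escape_for_xargs() {
  sed 's/['\''"\\]/\\&/g'
}

printf '%s\n' "it's" | escape_for_xargs | xargs printf '%s\n'
```

Keeping the escaping in a separate filter means the same step can be reused in front of any `xargs` call, whatever command it runs.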
PS
I found out that `xargs -E ''` is more compatible than omitting the option, as some xargs implementations default to `-E _`.
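A minimal sketch of the difference, assuming an `xargs` that accepts the POSIX `-E` option: with an empty end-of-file string, a lone underscore is passed on as an ordinary argument instead of ending the input.

```shell
# With -E '' the logical end-of-file string is disabled, so the "_"
# line is treated like any other argument and "b" is still read.
printf 'a\n_\nb\n' | xargs -E '' echo
```

On an implementation that defaults to `-E _`, the same pipeline without `-E ''` would be expected to stop reading at the underscore.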
... `xargs` implementations have a very low limit on the size of a single argument. IIRC 255 bytes as allowed by POSIX.

`uuencode`/`uudecode` are non-optional POSIX utilities (though maybe not available on some of the embedded systems you're trying to target).

`LC_ALL=BONKERS` is also meant to give you the C locale if the BONKERS locale doesn't exist. So in your Termux case, it looks more like the C locale's charset is UTF-8 (there is an ongoing discussion on the POSIX (austin-group-l) mailing list as to whether that's allowed, and the general consensus is that it shouldn't be).
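One way to probe the Termux observation is to look at the bytes awk emits for a value above 127 while `LC_ALL=C` is set. This is only a sketch: the function name `c_locale_bytes` is hypothetical, and the interpretation in the comments is an assumption that has not been verified on Termux.

```shell
# Print, as hex, the bytes awk writes for printf "%c" with value 200
# under LC_ALL=C.
# Assumption: a single byte (c8) suggests a single-byte C locale,
# while two bytes (c388) suggest awk encoded 200 as UTF-8.
c_locale_bytes() {
  LC_ALL=C awk 'BEGIN{printf "%c", 200}' |
    od -A n -t x1 -v |
    tr -d ' \t\n'
}

c_locale_bytes
echo
```

If the result is not `c8`, the simple `sprintf("%c",i)` table in the awk script cannot be relied on, and the octal-escape table given as the alternative in its comments is the safer choice.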