If you only need filenames and do not need them to be "human readable", you could take advantage of \pdfstringdef:

    \documentclass{article}
    \usepackage[utf8]{inputenc}
    \usepackage[T1]{fontenc}
    \usepackage[unicode]{hyperref}
    \makeatletter
    \begingroup
      \catcode`| 0
      \catcode`\\ 12
      |gdef|makestring@i\#1#2#3#4%
        {#1#2#3|if|relax#4|expandafter|@gobbletwo|fi|makestring@i#4}
    |endgroup
    \newcommand*{\makestring}[2]{%
      \pdfstringdef\makestring@{#2}%
      \edef#1{\expandafter\makestring@i\makestring@\relax}%
    }
    \makeatother
    \begin{document}
    \makestring{\foo}{æüßéñ}
    \texttt{\meaning\foo}
    \end{document}

Here is a variation on this theme which is much more efficient: it shows the UTF-8 bytes. One could produce hexadecimal output instead if desired. (In fact, there may be macros in utf8.def which could be used here.)

    \documentclass{article}
    \usepackage[utf8]{inputenc}
    \usepackage[T1]{fontenc}
    \makeatletter
    \newcommand*\MakeString[2]{%
      \begingroup
        \def\UTFviii@two@octets##1##2{\the\numexpr`##1\relax\the\numexpr`##2}%
        \def\UTFviii@three@octets##1##2##3{\the\numexpr`##1\relax\the\numexpr`##2\relax\the\numexpr`##3\relax}%
        \def\UTFviii@four@octets##1##2##3##4{\the\numexpr`##1\relax\the\numexpr`##2\relax\the\numexpr`##3\relax\the\numexpr`##4\relax}%
        \xdef#1{#2}%
      \endgroup
    }
    \makeatother
    \begin{document}
    \MakeString{\foo}{æüßéñ}
    \texttt{\meaning\foo}
    \show\foo
    \end{document}

This produces:

    > \foo=macro:
    ->195166195188195159195169195177.
    l.23 \show\foo

I should improve this so that each byte produces a three-digit decimal number; here leading zeros are stripped!

OK, here it is with no stripping and two hex digits per byte.

Edit: removed the use of an extra package. I defined a \Byte@tohex macro; something similar may already be provided internally by utf8-inputenc, but I have not checked.

    \documentclass{article}
    \usepackage[utf8]{inputenc}
    \usepackage[T1]{fontenc}
    \makeatletter
    % I have not checked but maybe utf8-inputenc provides already
    % similar macro (not even using e-TeX)
    \def\Byte@tohex #1%
      {\expandafter \Byte@tohex@\the\numexpr(`#1+8)/16-1\expandafter .\the\numexpr`#1.}%
    \def\Byte@tohex@ #1.#2.%
      {\Byte@onehex #1.%
       \expandafter\Byte@onehex\the\numexpr #2-16*#1.%
      }
    \def\Byte@onehex #1.%
      {\ifcase #1 0\or1\or2\or3\or4\or5\or6\or7\or8\or9%
       \or A\or B\or C\or D\or E\or F%
       \fi
      }%
    \newcommand*\MakeString[2]{%
      \begingroup
        \def\UTFviii@two@octets##1##2{\Byte@tohex{##1}\Byte@tohex{##2}}%
        \def\UTFviii@three@octets##1##2##3{\Byte@tohex{##1}\Byte@tohex{##2}\Byte@tohex{##3}}%
        \def\UTFviii@four@octets##1##2##3##4{\Byte@tohex{##1}\Byte@tohex{##2}\Byte@tohex{##3}\Byte@tohex{##4}}%
        \xdef#1{#2}%
      \endgroup
    }
    \makeatother
    \begin{document}
    \MakeString{\foo}{æüßéñ}
    \texttt{\meaning\foo}
    \show\foo
    \end{document}

This produces in the log:

    > \foo=macro:
    ->C3A6C3BCC39FC3A9C3B1.
    l.27 \show\foo

(Coding efficiency could be improved.)
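
As one possible way to shorten the hex variant, here is a minimal sketch that lets the engine do the conversion. It assumes the document is compiled with pdfTeX (pdflatex), where \pdfescapehex is available as a primitive; it will not work as written under other engines. The name \MakeHexString is made up for this illustration and is not part of the code above.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
% Assumption: pdfTeX engine. \pdfescapehex is a pdfTeX primitive that
% expands to two hexadecimal digits for each byte of its argument.
% \MakeHexString is a hypothetical name chosen for this sketch.
\newcommand*\MakeHexString[2]{%
  % \detokenize turns the active UTF-8 bytes into plain characters,
  % so that \pdfescapehex sees the raw bytes and not inputenc's macros.
  \edef#1{\pdfescapehex{\detokenize{#2}}}%
}
\begin{document}
\MakeHexString{\foo}{æüßéñ}
\texttt{\meaning\foo}
\end{document}
```

Unlike the \MakeString versions above, this sketch converts every byte, ASCII characters included, so a plain letter such as `a` also ends up as two hex digits. It also defines #1 locally with \edef, since no group is needed here.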
