Commonmark migration

edited Jun 10, 2020 at 12:32

My goal is to understand the relationship between character bytes and character tokens with respect to new line bytes. I likely do not have my facts straight.

When TeX reads a file of bytes, encoding must be considered. Putting that aside, we can observe that a single newline (LF or CRLF) is converted into a space. But what happens behind the scenes? Is a token created from the data pair (LF byte number, catcode 10)?
Do two consecutive newlines become a single token with the data pair (space byte number, catcode 5)?

When does catcode 5 "end of the line" come into play?
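
A minimal sketch of how this could be probed, assuming default LaTeX settings: TeX appends the character whose code is stored in `\endlinechar` to every input line, so printing that value together with the category code of that character shows where catcode 5 is assigned. The snippet only reads existing TeX parameters. It prints two numbers and does not by itself show what token, if any, the character turns into.

```latex
\documentclass{article}
\begin{document}
\ttfamily
% \endlinechar is the character TeX appends to every input line
% after the file's own line terminator has been stripped.
endlinechar = \the\endlinechar\par
% Category code currently assigned to character 13 (^^M).
catcode of char 13 = \the\catcode13\par
\end{document}
```

With the default settings this should report 13 for `\endlinechar` and 5 for the category code of character 13.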

I know that TeX inserts a \par token when it encounters two consecutive line endings, that is, an empty line.

Code

I attempted to visually show tokens with catcode 5, but I am still not sure if \tmp truly becomes catcode 5.

    \documentclass{article}
    \usepackage{fontspec}% xelatex
    \long\def\scan#1{#1\par\rule{\textwidth}{2pt}\par\xscan#1\relax}
    \long\def\xscan{\afterassignment\xxscan\let\tmp= }
    \long\def\xxscan{%
    \ifx\tmp\relax\else%
      \ifcat\tmp\space10 \else%
        \ifcat\tmp a11 \else%
          \ifcat\tmp 112 \else%...
            \ifcat\tmp 5 \else%
      \fi\fi\fi\fi
      \expandafter\xscan
    \fi}
    \begin{document}
    \scan{
    mac::exception
    ==
    a
    }
    \end{document}
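
A smaller test might make the tokens easier to see than the scanner above. The sketch below assumes default category codes. `\grab` and `\grabbed` are hypothetical names introduced only for this illustration. The argument is delimited by `\stop`, so any space tokens inside it are kept, and `\meaning` then prints what the line ends actually became.

```latex
\documentclass{article}
\begin{document}
% \grab and \grabbed are hypothetical names used only for this test.
% The argument is delimited by \stop, so spaces inside it are kept.
\long\def\grab#1\stop{\def\grabbed{#1}\texttt{\meaning\grabbed}\par}
% One line end between a and b.
\grab a
b\stop
% An empty line between a and b.
\grab a

b\stop
\end{document}
```

The first call should show a single space between a and b, and the second should additionally show a \par token where the empty line was. In neither case does a token with catcode 5 appear in the stored argument.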

Notes

XeLaTeX, being UTF-8 capable, must know how to read 2-byte line endings.
Code modified from: How can I make LaTeX to recognize spaces in my macro (catcode 10)?

adjusted title

edited Mar 22, 2018 at 10:32

Jonathan Komar

Character bytes and character tokens: If newlines are converted to spaces, then where does catcode 5 come into the picture?

replaced http://tex.stackexchange.com/ with https://tex.stackexchange.com/

edited Apr 13, 2017 at 12:35

Community Bot
