My goal is to understand the relationship between character bytes and character tokens where newline bytes are concerned. I probably do not have my facts straight.

When TeX reads a file of bytes, the encoding must be considered. Putting that aside:

  • We can observe that a single line ending (LF or CRLF) is converted into a space. But what happens behind the scenes? Is a token created with the pair (LF byte number, catcode 10)?

  • Do two consecutive line endings become one single token with the pair (space byte number, catcode 5)?
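A small test file can separate the line-ending bytes from the space they turn into. This sketch assumes the behaviour described in The TeXbook: when TeX reads a line it strips the line terminator (LF or CRLF) and any trailing spaces, then appends the character whose code is `\endlinechar` (13 by default). Setting `\endlinechar=-1` inside a group stops anything from being appended, so if the space came from the LF byte itself it would still appear; with this setting it should not.

```latex
\documentclass{article}
\begin{document}
% Default behaviour: the character appended at each line end (\endlinechar,
% code 13 = ^^M, catcode 5) becomes a space token in the middle of a line.
ab
cd

% Inside this group nothing is appended to the lines read after the
% assignment, so no space token appears between "ab" and "cd".
{\endlinechar=-1
ab
cd
}
\end{document}
```

Typeset, the first paragraph should read "ab cd" and the second "abcd".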

When does catcode 5 ("end of line") come into play?
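As a starting point, the two settings involved can be printed to the terminal and log. According to Chapter 8 of The TeXbook, when the input processor meets a character of category 5 it discards the rest of the line and, depending on its state, emits a space token (character code 32, catcode 10) in the middle of a line, a `\par` token at the start of a line (an empty line), or nothing after a control word or skipped spaces. This is a minimal diagnostic sketch; the values in the comments are the plain TeX and LaTeX defaults, which a package could change.

```latex
\documentclass{article}
\begin{document}
% Which character does TeX append to each input line, and what is its
% current category code? Both values go to the terminal and the .log file.
\message{endlinechar=\the\endlinechar}% default: 13, i.e. ^^M (carriage return)
\message{catcode=\the\catcode`\^^M}% default: 5 ("end of line")
\end{document}
```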

I know that a \par is inserted when two consecutive line endings (that is, an empty line) are encountered.
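The `\par` at an empty line can be made visible by capturing an argument that spans a line end and an empty line, then writing its tokens to the log. This sketch relies on the e-TeX primitive `\detokenize` (available in pdfLaTeX, XeLaTeX and LuaLaTeX); `\logtokens` is a hypothetical helper name, and `\long` is needed because the argument contains a `\par` token.

```latex
\documentclass{article}
\begin{document}
% Hypothetical helper: write the detokenized argument to the log.
% \long allows the argument to contain the \par produced by the empty line.
\long\def\logtokens#1{\message{[\detokenize{#1}]}}
\logtokens{one
two

three}
\end{document}
```

The log should contain `[one two \par three]`: the single line end became a space token, and the empty line became the control sequence `\par` rather than a character token.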

Code

I attempted to show tokens with catcode 5 visually, but I am still not sure whether \tmp ever truly has catcode 5.

\documentclass{article}
\usepackage{fontspec}% xelatex
\long\def\scan#1{#1\par\rule{\textwidth}{2pt}\par\xscan#1\relax}
\long\def\xscan{\afterassignment\xxscan\let\tmp= }
\long\def\xxscan{%
\ifx\tmp\relax\else%
\ifcat\tmp\space10 \else%
\ifcat\tmp a11 \else%
\ifcat\tmp 112 \else%...
\ifcat\tmp 5 \else%
\fi\fi\fi\fi
\expandafter\xscan
\fi}
\begin{document}
\scan{ mac::exception == a }
\end{document}
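As the code is shown here, `\ifcat\tmp 5` compares `\tmp` with the digit 5, which has catcode 12, and the input processor never produces a character token of catcode 5 to compare against. An alternative diagnostic, sketched below under the assumption that logging is acceptable instead of typesetting, writes the `\meaning` of every token to the log. All macro names in it (`\dumptokens`, `\dumpnext`, `\dumpcheck`, `\dumpend`) are hypothetical.

```latex
\documentclass{article}
\begin{document}
% Hypothetical sentinel: never expanded, only compared with \ifx.
\def\dumpend{\dumpend}
% Feed the argument token by token to \dumpnext, followed by the sentinel.
\long\def\dumptokens#1{\dumpnext#1\dumpend}
% "\let\next= " plus one space lets \next become the next token,
% even when that token is itself a space.
\def\dumpnext{\afterassignment\dumpcheck\let\next= }
\def\dumpcheck{%
  \ifx\next\dumpend
  \else
    \message{[\meaning\next]}% e.g. "the letter m", "blank space"
    \expandafter\dumpnext
  \fi}
\dumptokens{mac::exception
==

a}
\end{document}
```

With this argument, each single line end should appear as a "blank space" entry and the empty line as the meaning of `\par`; none of the entries corresponds to an end-of-line (catcode 5) character.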

Jonathan Komar
Character bytes and character tokens: If newlines are converted to spaces, then where does catcode 5 come into the picture?
