I use LaTeX to write cover letters for job applications. Quite often, the application platform also expects me to submit a plain-text version of the letter in a text field. I still submit my beautiful PDF version as well, of course, but this means that I constantly have to sit down and manually remove commands such as \lettrine and \emph, and convert -- into –, \% into %, and ~ into spaces. Indentation, comments, and redundant white space should also be removed. Double newlines should stay, as they correspond to paragraph breaks. And so on…
I guess I could set up pandoc to do at least some of this work, but that requires me to run another command every time. So I wonder whether TeX itself could take care of it. I imagine a workflow where TeX takes the entire body text, performs string replacements on it following the principles stated above, and saves the result in a file such as document.txt. (Lua solutions are also welcome, even if I’d probably prefer solutions that also work with pdfTeX.)
For instance, let’s take the following document:
```latex
\documentclass{article}
\usepackage{lettrine}
\begin{document}
\lettrine{T}{o be or not to be} -- that is the question, according to Shakespeare. The rest of us might not \emph{quite} agree with him 100\% on this, but you'd have to admit that the phrase has managed to position itself at the heart of premodern and modern culture since Hamlet came out in~1603. % Thnk of adding more.

Few playwrights have contributed as many phrases to our vocabulary as Shakespeare.
\end{document}
```

Then document.txt should look like this:
To be or not to be – that is the question, according to Shakespeare. The rest of us might not quite agree with him 100% on this, but you'd have to admit that the phrase has managed to position itself at the heart of premodern and modern culture since Hamlet came out in 1603.
Few playwrights have contributed as many phrases to our vocabulary as Shakespeare.
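To pin down the rules I have in mind, here is a rough Python sketch of the replacements. It is only meant to illustrate the expected behaviour, not the TeX-side solution I am asking for. The function name `latex_to_plain` and the regular expressions are my own assumptions, and they only cover the commands from the example above (no nested braces, no other macros).

```python
import re


def latex_to_plain(source: str) -> str:
    """Illustrative plain-text conversion of a LaTeX letter body (hypothetical helper)."""
    # Keep only the text between \begin{document} and \end{document}, if present.
    match = re.search(r"\\begin\{document\}(.*?)\\end\{document\}", source, re.S)
    body = match.group(1) if match else source

    # Drop lines that contain nothing but a comment, including their newline,
    # so that they do not turn into spurious paragraph breaks.
    body = re.sub(r"^[ \t]*%.*\n", "", body, flags=re.M)
    # Drop trailing comments: an unescaped % up to the end of the line.
    body = re.sub(r"(?<!\\)%.*", "", body)

    # \lettrine{T}{o be} -> "To be" (assumes no nested braces).
    body = re.sub(r"\\lettrine\{([^{}]*)\}\{([^{}]*)\}", r"\1\2", body)
    # \emph{quite} -> "quite" (assumes no nested braces).
    body = re.sub(r"\\emph\{([^{}]*)\}", r"\1", body)

    # "--" -> en dash, leaving "---" untouched.
    body = re.sub(r"(?<!-)--(?!-)", "\u2013", body)
    # "\%" -> "%" and "~" -> ordinary space.
    body = body.replace("\\%", "%").replace("~", " ")

    # Blank lines separate paragraphs; inside a paragraph, collapse
    # indentation, single newlines and repeated spaces into one space.
    paragraphs = re.split(r"\n\s*\n", body)
    cleaned = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)
```

Applied to the example document above, this should give the two paragraphs shown for document.txt. Anything beyond these few rules (other macros, nested arguments, environments) is deliberately left out of the sketch.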
o be or not to be in the argument to \lettrine?
pdftotext works reasonably well, but you need to remove page numbers, headers and footers, if you're using them. (Probably similar to @UlrikeFischer's copy-paste suggestion.)