
I am using LaTeX to write cover letters for job applications. Quite often, it turns out that the application platform also expects me to submit a plain text version of the application in a text field. That, of course, does not mean that I am not also going to submit my beautiful PDF version for them to look at. But this means that I constantly have to sit down and manually remove commands such as \lettrine and \emph, and convert -- into –, \% into %, and ~ into spaces. And of course, indentations, comments, and redundant white space should be removed. Double new lines should of course stay, as they correspond to paragraph breaks. And so on…

I guess I could set up pandoc to do at least some of this work, but that requires me to run another command every time. So I wonder if TeX itself could take care of it? I imagine a workflow where TeX takes the entire body text and performs string replacements on it, following the principles stated above. It then takes the result of that and saves it in some document.txt file. (Lua solutions are also welcome, even if I’d probably prefer solutions that also work with pdfTeX.)

For instance, let’s take the following document:

\documentclass{article}
\usepackage{lettrine}

\begin{document}

\lettrine{T}{o be or not to be} -- that is the question, according to Shakespeare. The rest of us might not \emph{quite} agree with him 100\% on this, but you'd have to admit that the phrase has managed to position itself at the heart of premodern and modern culture since Hamlet came out in~1603.
% Thnk of adding more.

Few playwrights have contributed as many phrases to our vocabulary as Shakespeare.

\end{document}

Then document.txt should look like this:

To be or not to be – that is the question, according to Shakespeare. The rest of us might not quite agree with him 100% on this, but you'd have to admit that the phrase has managed to position itself at the heart of premodern and modern culture since Hamlet came out in 1603.

Few playwrights have contributed as many phrases to our vocabulary as Shakespeare.

  • why don't you copy & paste from the pdf? normally this gives quite good results (and if you use lualatex and enable the tagging code even hyphenation and spaces should be fine). Commented Jan 11 at 10:41
  • @UlrikeFischer Which tagging is it you would “enable”? I’m aware of the tagging project, but have the impression that it’s in heavy development still. Commented Jan 11 at 10:46
  • sure it is under development, but if you have only simple documents, you could try (lettrine would need a small patch: github.com/latex3/tagging-project/issues/481). A job application should imho be tagged if possible anyway. Commented Jan 11 at 10:52
  • How’s the machine supposed to know to add the whole o be or not to be in the argument to \lettrine? Commented Jan 11 at 10:53
  • pdftotext works reasonably well, but you need to remove page numbers, headers and footers, if you're using them. (probably similar to @UlrikeFischer's copy-paste suggestion.) Commented Jan 12 at 8:26

2 Answers

$ detex foo.tex > foo.txt 
 To be or not to be - that is the question, according to Shakespeare. The rest of us might not quite agree with him 100% on this, but you'd have to admit that the phrase has managed to position itself at the heart of premodern and modern culture since Hamlet came out in 1603. Few playwrights have contributed as many phrases to our vocabulary as Shakespeare. 

Or ...

$ pandoc -t plain foo.tex -o foo.txt 
TO BE OR NOT TO BE – that is the question, according to Shakespeare. The rest of us might not quite agree with him 100% on this, but you’d have to admit that the phrase has managed to position itself at the heart of premodern and modern culture since Hamlet came out in 1603. Few playwrights have contributed as many phrases to our vocabulary as Shakespeare. 

Note that detex not only handles the lettrine differently, but also indents each line by four spaces and adds two blank lines at the top and one blank line at the end. This can be nice for showing the text verbatim "as is", but it can also be a disadvantage (e.g., when pasting the text into a Markdown document).

  • With pandoc, you might want to add --wrap=none. Commented Jan 11 at 16:29
  • I wonder if there is a way to make pandoc ignore \lettrine. Commented Jan 11 at 16:36
  • @Gaussler Thanks for the first note, but it should be added that it depends on what you want to do with the text. For example, to edit it in Word or Libreoffice it is a good idea, to edit it in Rstudio in a Quarto document it is not necessary and to show it textually, as it is here, it would be better not to leave it on just two lines. About your second comment, no idea :( Commented Jan 11 at 16:49
  • True, but the original question explicitly states that the purpose is to add it to a text field in an online application platform. And then you don’t want manual text wrapping. But let’s leave it at that. 😉 Commented Jan 11 at 16:51
  • @Gaussler Well, this is also an online platform and it does the text wrapping automatically. My crystal ball goes as far as it goes. Commented Jan 11 at 16:58

Writing text from pdftex isn't especially convenient so I would use

$ pdflatex '\AtBeginDocument{\def\lettrine#1#2{#1#2}}\input' file
$ pdftotext file.pdf
$ cat file.txt
To be or not to be – that is the question, according to Shakespeare. The rest of us might not quite agree with him 100% on this, but you’d have to admit that the phrase has managed to position itself at the heart of premodern and modern culture since Hamlet came out in 1603. Few playwrights have contributed as many phrases to our vocabulary as Shakespeare.
1

(So you might also want to set empty page style while redefining lettrine, to drop the 1)
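
If the command line gets unwieldy, the same redefinitions could instead live in a small wrapper file. This is only a sketch under a few assumptions: the letter is in file.tex as above, the wrapper name plain.tex is made up for illustration, and \lettrine is only ever used without its optional argument.

```latex
% plain.tex: hypothetical wrapper, not part of the original letter.
% Everything in \AtBeginDocument runs after the packages are loaded,
% so it overrides the \lettrine defined by the lettrine package.
\AtBeginDocument{%
  \pagestyle{empty}%          no page number, so no stray "1" in the text
  \def\lettrine#1#2{#1#2}%    print the initial and the following words as plain text
}
\input{file}
```

Running pdflatex plain and then pdftotext plain.pdf should leave the text in plain.txt, while the normal run on file.tex still produces the typeset PDF.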

  • By the way, why the \int\upsilon\mathbb{C}\kappa is it called a playwright in English rather than e.g. a playwriter? Commented Jan 11 at 16:53
  • playwrights craft plays, wheelwrights craft wheels, cartwrights craft carts. It's the craft of authorship, "writing" is merely the ability to use a pen. @Gaussler Some may say I'm a planewright Commented Jan 11 at 17:04
  • Some may say that it was the Wrights who invented planes in the first place. Commented Jan 11 at 17:25
