I have a programming book in EPUB format and I'm trying to convert it to TXT. For that I'm using the utility ebook-convert from calibre. The problem is that the standard usage:

ebook-convert book.epub book.txt 

removes the indentation in source code samples. For example, a sample in the book looks like this:

class A {
    private int a;
}

But in the resulting TXT it becomes:

class A {
private int a;
}

After reading the utility's man page I've tried the following options:

--keep-ligatures --pretty-print --change-justification=original 

but they made no difference. How can I keep the original indentation?

  • What OS and language settings are you using? Keep in mind that many documents use non-breaking spaces (NBSP), which are encoded as multi-byte sequences in UTF-8 rather than as plain ASCII spaces. Try fiddling with your OS/terminal language or locale settings. Commented May 2, 2021 at 10:21
  • The book is in English. I'm using Ubuntu 20. $ locale LANG=en_US.UTF-8 Commented May 2, 2021 at 10:24
  • @not2qubit Are you sure the utility shouldn't be responsible for this? For example, the utility pdftotext has a -layout option to keep the original formatting of a PDF in TXT. Commented May 2, 2021 at 10:27
  • I have no idea. I just had a similar issue with OCR reading a PDF, where the program insisted on extracting NBSPs because the document was written in a foreign language. Commented May 2, 2021 at 10:30
  • You could convert to HTML (or just unzip the EPUB and use the HTML inside it directly) and try your luck with links -dump or similar. If that doesn't work either, you might have to look at the HTML directly and write your own helper script for converting the code snippets. Commented May 2, 2021 at 10:35
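
A rough sketch of the unzip-and-dump idea from the last comment, assuming unzip and links are installed. The function names list_epub_html and dump_epub_html and the paths book.epub and book_dir are made up for illustration. Files are processed in sorted path order, which may not match the book's reading order, so treat this as a starting point only.

```shell
#!/bin/sh
# Hypothetical helper: list the (X)HTML files under an extracted EPUB
# directory in sorted path order. This is NOT necessarily the reading
# order, which is defined by the spine in the package (.opf) file.
list_epub_html() {
    find "$1" -type f \( -name '*.xhtml' -o -name '*.html' -o -name '*.htm' \) | sort
}

# Hypothetical helper: dump every listed file as plain text with links.
# Assumes file names do not contain newlines.
dump_epub_html() {
    list_epub_html "$1" | while IFS= read -r page; do
        links -dump "$page"
    done
}

# Example usage (an EPUB is a ZIP archive, so unzip can extract it):
#   unzip -q book.epub -d book_dir
#   dump_epub_html book_dir > book.txt
```

Whether the indentation survives depends on how the book marks up its code samples, so check the output before relying on it.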

1 Answer

Use pandoc instead of ebook-convert. For example:

$ pandoc -f epub -t plain -o filename.txt filename.epub 

I just tested this with a Python EPUB, and it retained the indentation without a problem.
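
One quick way to sanity-check a result like this is to count the lines in the converted file that start with whitespace. A minimal sketch follows; count_indented_lines is a hypothetical helper and book.epub / book.txt are assumed file names. pandoc's plain output may also indent other blocks, so treat the number as a rough signal rather than proof.

```shell
#!/bin/sh
# Hypothetical helper (not part of pandoc or calibre): print the number
# of lines that begin with spaces or tabs followed by visible text.
# Whitespace-only lines are not counted.
count_indented_lines() {
    grep -cE '^[[:space:]]+[^[:space:]]' -- "$1"
}

# Example usage, assuming pandoc is installed and book.epub exists:
#   pandoc -f epub -t plain -o book.txt book.epub
#   count_indented_lines book.txt
```

Running the same function on the ebook-convert output and on the pandoc output gives a simple side-by-side comparison.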

pandoc can also convert to other formats, including various flavours of Markdown, AsciiDoc, LaTeX, ODT (Libre/Open Office text), reStructuredText, RTF, PDF, and more.

  • Yeah, I know the tool. Not sure why I didn't try it this time. The output looks good. The original indentation is preserved. Thanks! Commented May 2, 2021 at 11:28
