Timeline for Convert EPUB to TXT and preserve original formatting

8 events

when	what		by	license	comment
May 2, 2021 at 11:28	vote	accept	ka3ak
May 2, 2021 at 10:50	answer	added	cas		timeline score: 11
May 2, 2021 at 10:35	comment	added	frostschutz		You could convert to HTML (or just unzip the EPUB and use the HTML inside it directly) and try your luck with `links -dump` or similar. If that doesn't work either, you might have to look at the HTML directly and write your own helper script for converting the code snippets.
May 2, 2021 at 10:30	comment	added	not2qubit		I have no idea. I just had a similar issue with OCR reading a PDF, where the program insisted on extracting NBSPs because the document was encoded in a foreign language.
May 2, 2021 at 10:27	comment	added	ka3ak		@not2qubit Are you sure the utility shouldn't be responsible for this? For example, the utility `pdftotext` has a `-layout` option to keep the original formatting of a PDF in TXT.
May 2, 2021 at 10:24	comment	added	ka3ak		The book is in English. I'm using Ubuntu 20. `$ locale LANG=en_US.UTF-8`
May 2, 2021 at 10:21	comment	added	not2qubit		What OS and language settings are you using? Keep in mind that many documents use non-breaking spaces (NBSP), which are not part of ASCII and are encoded as multiple bytes in UTF-8. Try adjusting your OS/terminal language or locale settings.
May 2, 2021 at 10:15	history	asked	ka3ak	CC BY-SA 4.0
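
The helper-script idea from the comments (unzip the EPUB, read the HTML inside, and write a small converter that leaves code snippets alone) could be sketched roughly as below. This is only an illustration under stated assumptions, not the method from the accepted answer: the names `TextDumper`, `html_to_text` and `epub_to_text` are made up for this example, the content files are assumed to be UTF-8 and are read in alphabetical order rather than in the order given by the EPUB's package file, and code snippets are assumed to live in `<pre>` elements. Non-breaking spaces inside `<pre>` are replaced with ordinary spaces.

```python
import re
import zipfile
from html.parser import HTMLParser


class TextDumper(HTMLParser):
    """Collect text from (X)HTML, keeping whitespace verbatim inside <pre>."""

    BLOCK_TAGS = {"p", "div", "br", "li", "tr", "pre",
                  "h1", "h2", "h3", "h4", "h5", "h6"}
    SKIP_TAGS = {"head", "script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.pre_depth = 0   # > 0 while inside <pre>
        self.skip_depth = 0  # > 0 while inside <head>, <script>, <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "pre":
            self.pre_depth += 1
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag == "pre" and self.pre_depth:
            self.pre_depth -= 1
        # <br/> already produced its newline in handle_starttag.
        if tag in self.BLOCK_TAGS and tag != "br":
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth:
            return
        if self.pre_depth:
            # Preserve indentation and line breaks; only normalise NBSP.
            self.parts.append(data.replace("\u00a0", " "))
        else:
            # Ordinary prose: collapse runs of whitespace (NBSP included).
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_text(markup):
    dumper = TextDumper()
    dumper.feed(markup)
    dumper.close()
    return "".join(dumper.parts)


def epub_to_text(epub_file):
    """epub_file is a path or a binary file object (an EPUB is a ZIP archive)."""
    chunks = []
    with zipfile.ZipFile(epub_file) as archive:
        # Assumption: alphabetical order approximates the reading order.
        for name in sorted(archive.namelist()):
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                markup = archive.read(name).decode("utf-8", errors="replace")
                chunks.append(html_to_text(markup))
    return "\n".join(chunks)
```

Something like `print(epub_to_text("book.epub"))` (with a hypothetical file name) would then dump the text. Books that mark up code differently, or that rely on CSS for indentation, would need the tag handling adjusted.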