Timeline for Convert EPUB to TXT and preserve original formatting
Current License: CC BY-SA 4.0
8 events
| when | what | action | by | license | comment |
|---|---|---|---|---|---|
| May 2, 2021 at 11:28 | vote | accept | ka3ak | | |
| May 2, 2021 at 10:50 | answer | added | cas | | timeline score: 11 |
| May 2, 2021 at 10:35 | comment | added | frostschutz | | You could convert to HTML (or just unzip the EPUB and use the HTML within directly) and try your luck with links -dump or similar. If that doesn't work either, you might have to look at the HTML directly and write your own helper script for converting the code snippets. |
| May 2, 2021 at 10:30 | comment | added | not2qubit | | I have no idea. I just had a similar issue with OCR reading a PDF, where the program insisted on extracting NBSPs since the doc was coded in a foreign language. |
| May 2, 2021 at 10:27 | comment | added | ka3ak | | @not2qubit Are you sure the utility shouldn't be responsible for this? For example, the utility pdftotext has a -layout option to keep the original formatting of a PDF in TXT. |
| May 2, 2021 at 10:24 | comment | added | ka3ak | | The book is in English. I'm using Ubuntu 20. $ locale LANG=en_US.UTF-8 |
| May 2, 2021 at 10:21 | comment | added | not2qubit | | What OS and language settings are you using? Please recall that many docs use non-breaking spaces (NBSP), which are encoded in UTF-8 or with several other bytes when not in ASCII. Try fiddling with your OS/terminal language or locale settings. |
| May 2, 2021 at 10:15 | history | asked | ka3ak | CC BY-SA 4.0 | |
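
The comments above suggest unzipping the EPUB, working on the HTML inside it directly, and watching out for non-breaking spaces. A minimal sketch of such a helper script might look like the following. It is not taken from the accepted answer: the names `html_to_text` and `epub_to_text` are hypothetical, and it assumes the content files are UTF-8 encoded `.xhtml`/`.html` entries, read in archive order rather than in the book's declared reading order. Whitespace inside `<pre>` blocks is kept verbatim so that code snippets keep their indentation, and NBSPs are replaced with ordinary spaces.

```python
import re
import zipfile
from html.parser import HTMLParser

NBSP = "\u00a0"
# Tags that start (and, except <br>, end) a line in the plain-text output.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "pre",
              "h1", "h2", "h3", "h4", "h5", "h6"}
# Tags whose contents are dropped entirely.
SKIP_TAGS = {"head", "script", "style"}


class _TextExtractor(HTMLParser):
    """Collect text, keeping whitespace verbatim inside <pre> blocks."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.pre_depth = 0
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        if self.skip_depth:
            return
        if tag == "pre":
            self.pre_depth += 1
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
            return
        if self.skip_depth:
            return
        if tag == "pre" and self.pre_depth:
            self.pre_depth -= 1
        if tag in BLOCK_TAGS and tag != "br":
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth:
            return
        data = data.replace(NBSP, " ")
        if not self.pre_depth:
            # Outside <pre>, HTML whitespace is not significant.
            data = re.sub(r"\s+", " ", data)
        self.parts.append(data)


def html_to_text(html):
    """Convert one (X)HTML document to plain text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    lines = "".join(parser.parts).split("\n")
    return "\n".join(line.rstrip() for line in lines).strip("\n")


def epub_to_text(path):
    """Concatenate the text of every HTML file in the EPUB (archive order)."""
    chunks = []
    with zipfile.ZipFile(path) as book:
        for name in book.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                raw = book.read(name).decode("utf-8", errors="replace")
                chunks.append(html_to_text(raw))
    return "\n\n".join(chunks)
```

Calling `epub_to_text("book.epub")` (a placeholder filename) would return a single string that can be written to a `.txt` file. A real book may need the reading order taken from its package manifest, which this sketch does not attempt.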