8 events
when what by license comment
May 2, 2021 at 11:28 vote accept ka3ak
May 2, 2021 at 10:50 answer added cas score: 11
May 2, 2021 at 10:35 comment added frostschutz You could convert to HTML (or just unzip the EPUB and use the HTML inside it directly) and try your luck with links -dump or similar. If that doesn't work either, you might have to look at the HTML directly and write your own helper script for converting the code snippets.
May 2, 2021 at 10:30 comment added not2qubit I have no idea. I just had a similar issue with OCR reading a PDF: the program insisted on extracting NBSPs because the document was in a foreign language.
May 2, 2021 at 10:27 comment added ka3ak @not2qubit Are you sure the utility shouldn't be responsible for this? For example, the utility pdftotext has a -layout option to keep the original formatting of a PDF in the TXT output.
May 2, 2021 at 10:24 comment added ka3ak The book is in English. I'm using Ubuntu 20. $ locale LANG=en_US.UTF-8
May 2, 2021 at 10:21 comment added not2qubit What OS and language settings are you using? Keep in mind that many documents use non-breaking spaces (NBSP), which are encoded in UTF-8 as multi-byte sequences rather than as ASCII characters. Try fiddling with your OS/terminal language or locale settings.
May 2, 2021 at 10:15 history asked ka3ak CC BY-SA 4.0