14 events
when what by license comment
Apr 21, 2015 at 22:03 comment added alexis Well, it's kind of important! I looked more carefully and I get the impression it depends on the Unix "locale" of the particular computer; if the locale is not set to UTF-8, you'll get completely wrong character counts when there are many non-ASCII characters. Latin-1 and od -c would be a safer catch-all (since characters are one byte in Latin-1).
Apr 21, 2015 at 14:35 comment added egreg @alexis I guess it depends on your OS.
Apr 21, 2015 at 14:22 comment added alexis UTF-8 seems like a strange option for character counting, since it expands characters into multiple bytes. Does wc -m understand UTF-8? (I suspect it might, but it's not explained in its documentation.)
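The byte-versus-character question raised in the comments above can be checked with a small experiment. This is only a sketch, not part of the original answer: the sample string and the locale name en_US.UTF-8 are assumptions, and that locale may not be installed on a given system, in which case wc typically behaves as in the C locale.

```shell
#!/bin/sh
# "café" encoded as UTF-8: the é is the two bytes 0xC3 0xA9 (octal 303 251).
# The sample string is an illustration only.
sample='caf\303\251'

# wc -c counts bytes regardless of the locale: 3 ASCII bytes + 2 bytes for é.
bytes=$(( $(printf "$sample" | wc -c) ))

# wc -m counts characters as defined by the current locale.
# In the C locale each byte is normally treated as one character,
# so this count should match the byte count.
chars_c=$(( $(printf "$sample" | LC_ALL=C wc -m) ))

# Assumption: en_US.UTF-8 is installed. If it is, wc -m should see é as a
# single character; if it is not, the result usually matches the C locale.
chars_utf8=$(( $(printf "$sample" | LC_ALL=en_US.UTF-8 wc -m) ))

echo "bytes:                $bytes"
echo "chars (C locale):     $chars_c"
echo "chars (UTF-8 locale): $chars_utf8"
```

If the last two numbers differ, wc -m is honouring the locale, which is the behaviour the comments above are asking about.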
Jul 2, 2012 at 8:37 comment added egreg @AbhimanyuArora I can only point to this link
Jul 2, 2012 at 8:31 comment added Abhimanyu Arora @egreg: Thanks. I have Windows XP; can you please tell me whether it applies in this case as well? And is xpdf to be installed via \usepackage?
Jul 2, 2012 at 8:29 comment added egreg pdftotext is a program coming with xpdf; how to invoke it depends on the operating system: on Unix systems it's called from the command line.
Jul 2, 2012 at 8:23 comment added Abhimanyu Arora Hi @egreg: where exactly is this command pdftotext... to be typed?
Mar 22, 2012 at 21:56 history edited egreg CC BY-SA 3.0
Alternative using catdvi. Set encodings for output file
S Mar 22, 2012 at 21:56 history suggested Bob CC BY-SA 3.0
Alternative using catdvi. Set encodings for output file
Mar 22, 2012 at 21:50 review Suggested edits
S Mar 22, 2012 at 21:56
Mar 22, 2012 at 21:48 history bounty awarded Bob
Mar 22, 2012 at 21:48 vote accept Bob
Mar 22, 2012 at 21:48 comment added Bob Thanks for your answer. This led me to the idea of using catdvi. Running catdvi -s document.dvi | wc -m gives me good results. pdftotext has some problems reproducing special characters.
Mar 21, 2012 at 17:09 history answered egreg CC BY-SA 3.0