How can I calculate the average number of characters per line in my document?
- I think you can get an estimate by dividing the number of characters in the document (see this question) by the number of lines in the document (see this question). Beware: that's just an estimate. – Claudio Fiandrino, Aug 18, 2012 at 13:20
- I suggest running a script on your tex file; I guess it is not possible to do this directly in TeX. Also: do you want to count the letters in the PDF or in the tex file? – Sigur, Aug 18, 2012 at 14:33
1 Answer
This is not perfect, but in a Linux terminal you can run:
    $ pdftotext -layout test.pdf
    $ wc -l -w -m test.txt
The output will be something like:
    96 986 6673 test.txt
So the total number of characters (6673) divided by the number of lines (96) is the average that you want, about 69.5. This count includes spaces, but since there are 986 words you can also roughly estimate the figure without spaces.
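As a worked version of this arithmetic, here is a small sketch using the numbers from the `wc` output above. The symbols are my own notation, not part of the original answer: $C$ is the character count, $L$ the line count and $W$ the word count. The second estimate assumes roughly one whitespace character (a space or a newline) per word, which is only an approximation.

```latex
\[
  \frac{C}{L} = \frac{6673}{96} \approx 69.5
  \qquad
  \frac{C - W}{L} = \frac{6673 - 986}{96} \approx 59.2
\]
```

Since `pdftotext -layout` pads lines with extra spaces to preserve the layout, the true average without spaces is probably somewhat lower than the second figure.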
For Windows users, I think the same or similar programs exist (though I have not tested any), such as wc, xpdf (which includes pdftotext) and free PDF-to-text converters.
One problem with this approach is that you cannot distinguish between normal text and headers, figures, etc.
Without external programs there are some ways to make rough predictions. Some are explained in this (not minimal) TeX example:
    \documentclass{article}
    % \usepackage[chars=60, lines=30, hyphen=true, noindent]{stdpage}
    \usepackage{lineno}
    \usepackage{canoniclayout}
    \usepackage{amssymb}
    \usepackage{hyperref}
    \usepackage{calc}
    \usepackage{lipsum}
    \usepackage{xcolor}

    \newlength{\oneem}
    \setlength{\oneem}{1em}
    \newlength{\ispace}
    \settowidth{\ispace}{i}
    \newlength{\mspace}
    \settowidth{\mspace}{m}
    \newlength{\alphabet}
    \settowidth{\alphabet}{abcdefghijklmnopqrstuvwxyz}

    \usepackage{geometry}
    \geometry{textwidth=2.5\alphabet}
    %\geometry{textwidth=26ex}

    \newlength{\avgchar}
    \setlength{\avgchar}{\textwidth/65}

    \pagestyle{empty}
    \setlength{\parskip}{\bigskipamount}

    \begin{document}

    \section*{How to estimate characters per line with \LaTeX}

    \subsection*{A dirty way: Fix {\tt textwidth} to $n$ alphabets or $n$ em units}

    As you can see in the preamble, the text width of this text is fixed
    to 2.5 times the length of the 26-character alphabet (\the\alphabet)
    with a default font size of \the\oneem, resulting in \the\textwidth{}
    per line. Therefore there should be $26 \times 2.5 = 65$ characters
    per line (a good value according to the Bringhurst rule), where each
    character has a width of \the\avgchar~on average (note that this is
    roughly one half of the font size, i.e., each character is
    $\thickapprox\frac{1}{2}$\,em, so fixing {\tt \textbackslash{textwidth}}
    in em units also allows an easy calculation of the number of characters).

    But this prediction is only really accurate if you plan to write $n$
    complete alphabets. Only then do 130 characters (5 alphabets) really
    fill two lines ...

    \begin{linenumbers}
    \noindent
    \textcolor{blue}{abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmn
    opqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz}
    \end{linenumbers}

    ... 260 characters fill 4 lines, and so on. Unfortunately, for any
    other text this rule is less accurate because the proportion of each
    character changes.

    For example, the first 10 paragraphs of \emph{Lorem Ipsum} (only the
    first is shown below) with this format produce 96 lines with 986 words
    and 6673 characters. That is 69 characters per line (counting the
    spaces), not 65.

    \begin{linenumbers}
    \textcolor{blue}{\lipsum[1]}
    \end{linenumbers}

    Not too bad a prediction after all, taking into account that the thin
    $i$ (\the\ispace) appears 503 times while the thick $m$ (\the\mspace)
    appears only 218 times, and moreover there are variable spaces and
    punctuation marks.

    Of course, you can also fix any predetermined
    {\tt \textbackslash{textwidth}} in pt, cm, etc., determine the average
    character width of your preferred font
    (\url{http://tex.stackexchange.com/questions/60277/average-width-of-popular-tex-fonts})
    and then simply do some math.

    \subsection*{A quicker way: the {\tt canoniclayout} package}

    \begin{itemize}
    \item Put {\tt \textbackslash{}usepackage\{canoniclayout\}} in the preamble
    \item And {\tt \textbackslash{currentfontletters}} in the body of the document:

    \framebox{\begin{minipage}[t]{1\columnwidth}%
    \currentfontletters
    \end{minipage}}

    \item Or {\tt \textbackslash{charactersperpage}}:

    \framebox{\begin{minipage}[t]{1\columnwidth}%
    \charactersperpage
    \end{minipage}}
    \end{itemize}

    Note that this package estimates the number of characters and lines
    that the page layout could hold; it does not count the number of
    compiled lines\footnote{For this you can use the {\tt lineno} package,
    as above in the \emph{Lorem Ipsum} output.}.

    \subsection*{A strange way: the {\tt stdpage} package}

    Produces a layout with a nonproportional font with 30 lines of
    60 characters and about 1440 characters (German ``Normseite'') by default.
    The number of characters and lines can be adjusted:

    {\tt \textbackslash{}usepackage{[}chars=65, lines=30, noindent{]}\{stdpage\}}

    You probably don't want to print the final version in this format, but
    temporarily it could be useful to compare the length of the text in
    each line, as well as the number of pages, with the proportional-font
    version, and so obtain some information about the character density
    of your text.

    \subsection*{Other unexplored ways}

    I saw that in ConTeXt (which I have never used) there is an
    \textbackslash{averagecharwidth} command. Please see
    \url{http://tex.stackexchange.com/questions/68105/macro-for-the-average-width-of-a-character}

    \end{document}
But a more practical approach is TeXcount, since it allows more control over what you count. Let me explain with this file:
    % CAUTION !!!
    % 1) Need --enable-write18 or --shell-escape
    % 2) This file MUST be saved
    %    as "borra.tex" before the compilation
    %    in your working directory
    % 3) This code will write wordcount.tex
    %    and charcount.tex in /tmp of your disk.
    %    (Windows users must change this path)
    % 4) Do not compile if you are unsure
    %    of what you are doing.
    \documentclass{article}
    \usepackage{lineno}   % for line numbers
    \usepackage{moreverb} % for verbatim output
    % Only for format purposes
    \usepackage{geometry}
    \geometry{verbose,tmargin=2cm,bmargin=2cm,lmargin=6cm,rmargin=3cm}
    \usepackage{graphicx}
    \setlength{\parskip}{\bigskipamount}
    \setlength{\parindent}{1em}
    % Count of words
    \immediate\write18{texcount -inc -incbib -sum borra.tex > /tmp/wordcount.tex}
    \newcommand\wordcount{
    \verbatiminput{/tmp/wordcount.tex}}
    % Count of characters
    \immediate\write18{texcount -char -freq borra.tex > /tmp/charcount.tex}
    \newcommand\charcount{
    \verbatiminput{/tmp/charcount.tex}}
    % Only two example lengths
    \newlength{\ispace}
    \settowidth{\ispace}{i}
    \newlength{\mspace}
    \settowidth{\mspace}{m}

    \begin{document}
    % Note that the next line is NOT a comment
    %TC:ignore
    {\bf Note}: Comparing the source and the compiled version of this file
    should be self-explanatory. See the {\tt lineno} and {\tt\TeX count}
    documentation if you need more information.

    \noindent\resizebox{\textwidth}{!}{\bf How to determine characters per line with \LaTeX}

    With a few \LaTeX{} commands we can see in the compiled document the
    number of lines (with the {\tt lineno} package) and count the number of
    words and characters for the whole document or for some parts with the
    aid of the {\tt\TeX count} program (included in \TeX{} Live).

    The rest is child's play: we can determine that in the text of the
    example section (see below; without heading, float or subsection)
    there are 7-1=6 lines (see left margin) with 70 words (see page 2) and
    350 characters (see page 3), so that there is an average of
    70/6 = 11.7 words and 350/6 = 58.3 characters per line.

    Moreover, as the frequency and width of each character can be
    determined (see page 3), it is also easy to obtain the average width of
    these characters in a long reference text, to make predictions of
    characters per line in texts of the same language/style that have not
    yet been written. In the whole example there are 454 characters:
    25 $i$ with widths of \the\ispace{}, 16 $m$ with widths of
    \the\mspace{} \dots and so on. Therefore the average will be
    \[\frac{(25\times 2.77)+(16\times 8.33)+ \cdots}{454}\]
    And the text width (\the\textwidth{} in the example) divided by this
    average will give the prediction of characters per line. That's all.

    \dotfill Start of the example text \dotfill
    %TC:endignore
    \linenumbers
    \section{Section: text example with a float}
    Words and characters of this example file are automatically counted from the source file when compiled (therefore generated text as \textbackslash{}lipsum[1-10] is {\bf not} counted). The results are showed at the end of the compiled version. Counts are made in headers, caption floats and normal text for the whole file. Subcounts for structured parts (sections, subsections, etc.) are also made. Number of headers, floats and math chunks are also counted.
    \begin{figure}[h]
    \centering
    \framebox{This is only a example float}
    \caption{This is a example caption}
    \end{figure}
    \subsection*{Subsection: Little text with math chunks}
    In line math: $\pi +2 = 2+\pi$ \\
    Display math: \[\pi +2 = 2+\pi\]
    \nolinenumbers
    %TC:ignore
    \dotfill End of the example text \dotfill
    \newpage
    \subsubsection*{Counts of words}
    \wordcount
    \newpage
    \subsubsection*{Counts of characters and frequencies}
    \charcount
    %TC:endignore
    \end{document}
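The prediction described in the example file can be written out in general form. This is only a sketch and the symbols are my own notation, not taken from TeXcount or the example: $n_c$ is the frequency of character $c$, $w_c$ its width, $N$ the total number of characters, $T$ the text width and $\mathrm{cpl}$ the predicted characters per line. The two widths shown are the rounded values used in the example, assumed to be in points.

```latex
\[
  \bar{w} = \frac{1}{N} \sum_{c} n_c\, w_c
          = \frac{25 \times 2.77\,\mathrm{pt} + 16 \times 8.33\,\mathrm{pt} + \cdots}{454},
  \qquad
  \mathrm{cpl} \approx \frac{T}{\bar{w}}
\]
```

The sum must run over every character reported by `texcount -char -freq`. The formula ignores inter-word spaces, whose width TeX stretches and shrinks from line to line, so the result is best treated as an estimate.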