
I have a dataset of marks (from 0 to 20), for example [2, 3, 7, 14, 15, ...].

I would like to draw a histogram of the distribution of this dataset, i.e. a histogram whose x-axis shows ranges of marks (for instance: 0 to 1, 1 to 2, ..., 19 to 20) and whose y-axis shows the number of marks in each range.

For instance, something like this:

[Image: example histogram]

The code for this image was generated by a Python program that writes a .tex file:

\documentclass[twoside]{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage{pgfplots}
\begin{document}
\begin{minipage}[b]{0.65\textwidth}
\begin{center}
\begin{tikzpicture}
\begin{axis}[
  ymin=0,
  ymax=6.2,
  ytick={0,2,...,6},
  minor y tick num = 0,
  %area style,
  width=10cm,
  height=4cm,
  axis lines*=left,
  bar width=0.6cm,
  y axis line style = {draw = none},
  tick align = outside,
  tick pos = left
]
\addplot+[ybar interval, mark=no, fill=black!20, draw=black!40]
  coordinates{(6.0,2) (6.933333333333334,2)
    (6.933333333333334,1) (7.866666666666667,1)
    (7.866666666666667,1) (8.8,1)
    (8.8,2) (9.733333333333334,2)
    (9.733333333333334,6) (10.666666666666668,6)
    (10.666666666666668,3) (11.600000000000001,3)
    (11.6,5) (12.533333333333333,5)
    (12.533333333333333,6) (13.466666666666667,6)
    (13.466666666666667,4) (14.4,4)
    (14.4,2) (15.333333333333334,2)
    (15.333333333333334,4) (16.266666666666666,4)
    (16.266666666666666,3) (17.2,3)
    (17.2,0) (18.133333333333333,0)
    (18.133333333333333,2) (19.066666666666666,2)
    (19.066666666666666,1) (20.0,1) };
\addplot+[ybar interval, mark=no, fill=black!70, draw=black!90]
  coordinates{(15.333333333333334,4) (16.266666666666666,4) };
\end{axis}
\end{tikzpicture}
\end{center}
\end{minipage}
\end{document}

My question

I would like to know whether it is possible to do this with LaTeX alone (or LuaLaTeX, etc.). The dataset would be, for instance, in the following format (where 8,5 is the French notation for 8.5):

8,5 13,5 6,5 8,5 10 8 5,5 2,5 5,5 7 20 12 18,5 5,5 9,5 3,5 0 7 7 5,5 10 3,5 7 14,5 7 10,5 16,5 10,5 6,5 8 14 8,5 2,5 5 9,5 8 10 9,5 9 8,5 4,5 5,5 

I would prefer to stick to this format but any format produced automatically by Excel (.csv, etc.) can also be considered.

What tool would you use? Do you have any advice for such a "program"?


Edit

With the example dataset, the histogram should look like this:

[Image: desired histogram for the example dataset]


1 Answer


One can accumulate the data points in bins of width 1 as follows. (Alternatives include stacked plots.)

\documentclass[twoside]{article}
\usepackage{filecontents}
% write the example data to a file, one mark per line
\begin{filecontents*}{commadata.dat}
8,5
13,5
6,5
8,5
10
8
5,5
2,5
5,5
7
20
12
18,5
5,5
9,5
3,5
0
7
7
5,5
10
3,5
7
14,5
7
10,5
16,5
10,5
6,5
8
14
8,5
2,5
5
9,5
8
10
9,5
9
8,5
4,5
5,5
\end{filecontents*}
\usepackage{xstring}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage{pgfplots}
\pgfplotsset{compat=1.16}
% \ReadOutElement{table}{row}{col}{\macro}: store one table cell in \macro
\newcommand*{\ReadOutElement}[4]{%
  \pgfplotstablegetelem{#2}{[index]#3}\of{#1}%
  \let#4\pgfplotsretval
}
\begin{document}
\begin{minipage}[b]{0.65\textwidth}
\begin{center}
\begin{tikzpicture}
\pgfplotstableread[/pgf/number format/read comma as period]{commadata.dat}\datatable
\pgfplotstablegetrowsof{\datatable}
\pgfmathtruncatemacro{\numrows}{\pgfplotsretval}
\pgfplotstablegetcolsof{\datatable}
\pgfmathtruncatemacro{\numcols}{\pgfplotsretval}
% pass 1: find the largest bin index (integer part of the mark, plus one)
\edef\myxmax{0}%
\foreach \nY in {1,...,\numrows}
{\ReadOutElement{\datatable}{\the\numexpr\nY-1}{0}{\Current}%
 \StrSubstitute{\Current}{,}{.}[\mytemp]
 \pgfmathtruncatemacro{\myx}{\mytemp+1}
 \pgfmathtruncatemacro{\myxmax}{max(\myxmax,\myx)}
 \xdef\myxmax{\myxmax}
}
% initialise one counter \mypile<k> per bin
\foreach \X in {0,...,\myxmax}
{\expandafter\xdef\csname mypile\X\endcsname{0}}
% pass 2: increment the counter of the bin given by the integer part of each mark
\foreach \nY in {1,...,\numrows}
{\ReadOutElement{\datatable}{\the\numexpr\nY-1}{0}{\Current}%
 \StrSubstitute{\Current}{,}{.}[\mytemp]
 \pgfmathtruncatemacro{\myx}{\mytemp}%
 \edef\currentval{\csname mypile\myx\endcsname}
 \pgfmathtruncatemacro{\mycur}{\currentval+1}
 \expandafter\xdef\csname mypile\myx\endcsname{\mycur}
}
\begin{axis}[
  ymin=0,%
  ymax=6.2,
  xmin=0,
  %ytick={0,2,...,6},
  minor y tick num = 0,
  %area style,
  width=10cm,
  height=4cm,
  axis lines*=left,
  %bar width=0.2cm,
  y axis line style = {draw = none},
  tick align = outside,
  tick pos = left
]
% draw one bar per bin, with the height stored in its counter
\pgfplotsinvokeforeach{0,...,\myxmax}{%
  \edef\currentval{\csname mypile#1\endcsname}
  \pgfmathtruncatemacro{\mycur}{\currentval}
  \addplot[ybar, fill=black!20, draw=black!40] coordinates {(#1-0.5,\mycur)};
}
\end{axis}
\end{tikzpicture}
\end{center}
\end{minipage}
\end{document}

[Image: output of the code above]
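A possibly shorter route, which I have not compiled against this exact file, is pgfplots' built-in hist plot handler, which does the binning itself instead of the two \foreach passes. The sketch below assumes one mark per line (as in the filecontents* block above, whose file name commadata.dat it reuses). It uses bins=21 with data min=0 and data max=21, so every bin has width 1 and the last bin, [20, 21], can only contain the mark 20, which is the 21-bin convention from the comments. header=false is added so that the first mark is not taken as a column name.

```latex
\documentclass{article}
\usepackage{filecontents}
% one mark per line, French decimal comma
\begin{filecontents*}{commadata.dat}
8,5
13,5
6,5
8,5
10
8
5,5
2,5
5,5
7
20
12
18,5
5,5
9,5
3,5
0
7
7
5,5
10
3,5
7
14,5
7
10,5
16,5
10,5
6,5
8
14
8,5
2,5
5
9,5
8
10
9,5
9
8,5
4,5
5,5
\end{filecontents*}
\usepackage{pgfplots}
\pgfplotsset{compat=1.16}
\begin{document}
\begin{tikzpicture}
\begin{axis}[
  ymin=0, xmin=0, xmax=21,
  xtick={0,2,...,20},
  width=10cm, height=4cm,
  axis lines*=left,
  tick align=outside, tick pos=left
]
% 21 bins of width 1 on [0,21]: bin k holds the marks with k <= x < k+1
\addplot[hist={bins=21, data min=0, data max=21},
         fill=black!20, draw=black!40]
  table[header=false, y index=0,
        /pgf/number format/read comma as period] {commadata.dat};
\end{axis}
\end{tikzpicture}
\end{document}
```

hist bins the y value of each row, hence y index=0; no ymax is set, so the axis grows to fit the tallest bar.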

OLDER ANSWER: One can read your data with /pgf/number format/read comma as period, as pointed out in this answer. In order to get the intervals of width 1, we only need x expr=\coordindex.

\documentclass[twoside]{article}
\usepackage{filecontents}
\begin{filecontents*}{commadata.dat}
y
8,5
13,5
6,5
8,5
10
8
5,5
2,5
5,5
7
20
12
18,5
5,5
9,5
3,5
0
7
7
5,5
10
3,5
7
14,5
7
10,5
16,5
10,5
6,5
8
14
8,5
2,5
5
9,5
8
10
9,5
9
8,5
4,5
5,5
\end{filecontents*}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage{pgfplots}
\pgfplotsset{compat=1.16}
\begin{document}
\begin{minipage}[b]{0.65\textwidth}
\begin{center}
\begin{tikzpicture}
\begin{axis}[
  ymin=0,%
  ymax=6.2,
  xmin=0,
  %ytick={0,2,...,6},
  minor y tick num = 0,
  %area style,
  width=10cm,
  height=4cm,
  axis lines*=left,
  %bar width=0.2cm,
  y axis line style = {draw = none},
  tick align = outside,
  tick pos = left
]
\addplot+[ybar interval, mark=no, fill=black!20, draw=black!40]
  table[y=y,x expr=\coordindex,/pgf/number format/read comma as period,col sep=tab] {commadata.dat};
% \addplot+[ybar interval, mark=no, fill=black!70, draw=black!90] coordinates{(15.333333333333334,4) (16.266666666666666,4) };
\end{axis}
\end{tikzpicture}
\end{center}
\end{minipage}
\end{document}

[Image: output of the code above]

  • Amazing! Thanks! But, it is not the right histogram. I have edited the question. Commented Nov 18, 2019 at 16:18
  • @Colas How is that possible? Your data set starts with the three values 8,5, 13,5 and 6,5, whereas the desired output shows a bar of height 1, a bar of height 0 and a bar of height 2. What am I missing? What is the prescription to draw the bars from the data? (BTW, you need to use ybar rather than ybar interval to produce the histogram of the desired output.) Commented Nov 18, 2019 at 16:22
  • See the second paragraph of the question. Commented Nov 18, 2019 at 16:24
  • @Colas OK, I see. And in which bins should the integers go. E.g. 10 belongs in the bin from 9 to 10 or 10 to 11? Commented Nov 18, 2019 at 16:30
  • Thanks. I take 21 bins: 0 ≤ x < 1, ..., 19 ≤ x < 20, x = 20. Commented Nov 18, 2019 at 16:36
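To make the binning rule from the last comment concrete: because no mark exceeds 20, the bin 20 ≤ x < 21 contains only x = 20, so all 21 bins are "marks with the same integer part" and bin k can be found as ⌊x⌋. Counting the 42 example marks by hand with this rule gives the table below; its first three entries (1, 0, 2) match the bars described in the comments.

```latex
\[
  n_k = \#\{\, i : k \le x_i < k+1 \,\} = \#\{\, i : \lfloor x_i \rfloor = k \,\},
  \qquad k = 0, 1, \dots, 20,
\]
\[
  \begin{array}{l*{21}{c}}
    k   & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 & 16 & 17 & 18 & 19 & 20 \\
    n_k & 1 & 0 & 2 & 2 & 1 & 6 & 2 & 5 & 7 & 4 & 5  & 0  & 1  & 1  & 2  & 0  & 1  & 0  & 1  & 0  & 1
  \end{array}
  \qquad \sum_{k=0}^{20} n_k = 42.
\]
```

The largest count is 7 (bin 8), so an axis with ymax=6.2 would cut off the top of that bar.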
