@jeschwar
Contributor

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

When creating a latex table with DataFrame.to_latex(longtable=False) the output is written inside a latex tabular environment and stored in some file like pandas_tabular.tex; the user can conveniently typeset the table in a main report.tex file complete with caption and label as follows:

\begin{table}
\caption{the caption}
\label{the label}
\input{pandas_tabular.tex}
\end{table}

This is good because the pandas_tabular.tex file can be re-created and the main report.tex simply needs to be recompiled to get the updated output.

The problem when creating a latex longtable with DataFrame.to_latex(longtable=True) is that the caption and label need to go inside the latex longtable environment, which is stored in some file like pandas_longtable.tex. The latex longtable environment does not go inside a table environment like the tabular environment does; this means that setting the caption and label requires the user to edit the pandas_longtable.tex file after its creation. This does not support an automated workflow like we have with the tabular environment.
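The manual edit described above can be approximated with a small post-processing step. This is only a minimal sketch, assuming the generated file contains a line starting with \begin{longtable}; the helper name add_longtable_caption is hypothetical and is not part of pandas:

```python
def add_longtable_caption(latex: str, caption: str, label: str) -> str:
    # Hypothetical helper for illustration only; not a pandas function.
    # Inserts a caption/label line directly after the line that opens
    # the longtable environment.
    lines = latex.splitlines()
    for i, line in enumerate(lines):
        if line.lstrip().startswith(r"\begin{longtable}"):
            # Inside longtable, the caption line is terminated with \\
            # like an ordinary table row.
            lines.insert(i + 1, rf"\caption{{{caption}}}\label{{{label}}}\\")
            break
    else:
        # The loop finished without finding a longtable environment.
        raise ValueError("no longtable environment found")
    return "\n".join(lines) + "\n"


# Example with a tiny hand-written longtable body.
sample = "\n".join([
    r"\begin{longtable}{lr}",
    r"\toprule",
    r"a & 1 \\",
    r"\bottomrule",
    r"\end{longtable}",
])
print(add_longtable_caption(sample, "the caption", "tab:longtable"))
```

Having to run a step like this after every export is the extra work that the proposed arguments are meant to remove.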

This PR adds caption and label support to DataFrame.to_latex(longtable=True) with the arguments lt_caption and lt_label. Example usage is described below.

The following python code creates some data in a DataFrame and writes it to disk in tabular and longtable latex environments:

import numpy as np
import pandas as pd

# create some example data with more rows than would fit on a single page
df = pd.DataFrame(np.random.randn(60, 3))

# write the first 5 rows to regular table in a latex tabular environment
df.head().to_latex(
    'pandas_tabular.tex',
)

# write the whole table in the latex longtable environment c/w caption and label
df.to_latex(
    'pandas_longtable.tex',
    longtable=True,
    lt_caption='table in \\texttt{longtable} environment',
    lt_label='tab:longtable',
)

The following latex code is contained in a main report.tex and is used to typeset both tables:

\documentclass{article}
\usepackage{longtable}
\usepackage{booktabs}

\begin{document}

% typeset the table in the tabular environment
Table \ref{tab:tabular} is a \texttt{tabular} and has 5 rows:
\begin{table}[h]
\centering
\caption{table in \texttt{tabular} environment}
\label{tab:tabular}
\input{pandas_tabular.tex}
\end{table}

% typeset the table in the longtable environment
Table \ref{tab:longtable} is a \texttt{longtable} and has 60 rows:
\input{pandas_longtable.tex}

\end{document}

Using DataFrame.to_latex(longtable=True) with the new arguments lt_caption and lt_label means we don't have to edit pandas_longtable.tex after its creation to get the caption and label working. This functionality also works with Series.to_latex(longtable=True).

PDF output is shown below:

image

@codecov

codecov bot commented Feb 16, 2019

Codecov Report

Merging #25339 into master will decrease coverage by <.01%.
The diff coverage is 52.94%.

@@            Coverage Diff             @@
##           master   #25339      +/-   ##
==========================================
- Coverage   91.72%   91.71%   -0.01%
==========================================
  Files         173      173
  Lines       52831    52842      +11
==========================================
+ Hits        48457    48462       +5
- Misses       4374     4380       +6
Flag Coverage Δ
#multiple 90.27% <52.94%> (-0.01%) ⬇️
#single 41.71% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/generic.py 94.16% <0%> (ø) ⬆️
pandas/io/formats/format.py 97.99% <100%> (ø) ⬆️
pandas/io/formats/latex.py 95.52% <45.45%> (-4.48%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 83fe6ca...7c105fa. Read the comment docs.

@jreback jreback added the Output-Formatting __repr__ of pandas objects, to_string label Feb 16, 2019
Contributor

@jreback jreback left a comment

is label / caption useful to a tabular write?

you would need to add a test for this

@jeschwar
Contributor Author

@jreback that is a good suggestion to allow the label and caption arguments to apply to the latex table environment when longtable=False. This means that DataFrame.to_latex() would have to write the nested latex table/tabular environments in the output; this may be ok for most uses. Here is what I am thinking for possible scenarios and corresponding behavior:

.to_latex(longtable=False, caption=None, label=None)

  • only output the latex tabular environment which is the current behavior

.to_latex(longtable=True, caption=None, label=None)

  • output the latex longtable environment without any captions or labels, which is the current behavior

.to_latex(longtable=False, caption='some caption', label='tab:some label')

  • output the latex nested table/tabular environments which include the caption and label from the user
  • this code could be added to this PR
  • if the user wants to add customized latex code inside the table environment but outside the tabular environment then they should not pass values for the caption and label arguments

.to_latex(longtable=True, caption='some caption', label='tab:some label')

  • output the latex longtable environment which includes the caption and label from the user as initially described in this PR
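The four scenarios above can be sketched as a single dispatch on longtable, caption and label. This is a rough illustration only and not the pandas implementation: the function name render_latex, its parameters rows_tex and column_format, and the exact placement of the caption and label lines are all assumptions made for the example.

```python
from typing import List, Optional


def render_latex(
    rows_tex: List[str],
    column_format: str,
    longtable: bool = False,
    caption: Optional[str] = None,
    label: Optional[str] = None,
) -> str:
    # Hypothetical sketch of the proposed behavior; not pandas code.
    caption_tex = rf"\caption{{{caption}}}" if caption is not None else ""
    label_tex = rf"\label{{{label}}}" if label is not None else ""

    if longtable:
        # Scenarios 2 and 4: the caption and label live inside longtable.
        lines = [r"\begin{longtable}{" + column_format + "}"]
        if caption_tex or label_tex:
            # The caption line ends with \\ like a table row.
            lines.append(caption_tex + label_tex + r"\\")
        lines.extend(rows_tex)
        lines.append(r"\end{longtable}")
        return "\n".join(lines) + "\n"

    tabular = [r"\begin{tabular}{" + column_format + "}"]
    tabular.extend(rows_tex)
    tabular.append(r"\end{tabular}")

    if not (caption_tex or label_tex):
        # Scenario 1: bare tabular, matching the current behavior.
        return "\n".join(tabular) + "\n"

    # Scenario 3: nest the tabular inside a table environment.
    lines = [r"\begin{table}"]
    lines.extend(part for part in (caption_tex, label_tex) if part)
    lines.extend(tabular)
    lines.append(r"\end{table}")
    return "\n".join(lines) + "\n"


rows = [r"a & 1 \\", r"b & 2 \\"]
print(render_latex(rows, "lr", caption="some caption", label="tab:some label"))
print(render_latex(rows, "lr", longtable=True, caption="some caption",
                   label="tab:some label"))
```

Returning the bare tabular when neither argument is given keeps existing \input{...} workflows unchanged, which is the reason for treating the first scenario separately.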

Thoughts anyone?

@WillAyd
Member

WillAyd commented Feb 20, 2019

@jeschwar not a LaTeX expert by any means but your proposal makes sense. Can you open as an issue and reference that from this PR? That is typically easiest for change management

@jeschwar
Contributor Author

Thanks @WillAyd I created issue #25436 and will create a new PR because the scope has increased.

@jeschwar jeschwar closed this Feb 25, 2019
@jeschwar jeschwar deleted the to_latex_longtable_caption_label branch February 25, 2019 05:23