Fix repr of DataFrame with IntervalIndex #24134

jorisvandenbossche · 2018-12-06T22:10:22Z

@TomAugspurger after the repr PR, the docs build caught an error: the repr of a DataFrame with an IntervalIndex started failing:

```
In [1]: df = pd.DataFrame({'A': [1, 2, 3, 4]},
   ...:                   index=pd.IntervalIndex.from_breaks([0, 1, 2, 3, 4]))

In [2]: df
Out[2]: ---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
~/miniconda3/envs/dev/lib/python3.5/site-packages/IPython/core/formatters.py in __call__(self, obj)
    700                 type_pprinters=self.type_printers,
    701                 deferred_pprinters=self.deferred_printers)
--> 702             printer.pretty(obj)
    703             printer.flush()
    704             return stream.getvalue()

~/miniconda3/envs/dev/lib/python3.5/site-packages/IPython/lib/pretty.py in pretty(self, obj)
    400                         if cls is not object \
    401                                 and callable(cls.__dict__.get('__repr__')):
--> 402                             return _repr_pprint(obj, self, cycle)
    403
    404             return _default_pprint(obj, self, cycle)

~/miniconda3/envs/dev/lib/python3.5/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
    695     """A pprint that just redirects to the normal repr function."""
    696     # Find newlines and replace them with p.break_()
--> 697     output = repr(obj)
    698     for idx,output_line in enumerate(output.splitlines()):
    699         if idx:

~/scipy/pandas/pandas/core/base.py in __repr__(self)
     75         Yields Bytestring in Py2, Unicode String in py3.
     76         """
---> 77         return str(self)
     78
     79

~/scipy/pandas/pandas/core/base.py in __str__(self)
     54
     55         if compat.PY3:
---> 56             return self.__unicode__()
     57         return self.__bytes__()
     58

~/scipy/pandas/pandas/core/frame.py in __unicode__(self)
    626             width = None
    627         self.to_string(buf=buf, max_rows=max_rows, max_cols=max_cols,
--> 628                        line_width=width, show_dimensions=show_dimensions)
    629
    630         return buf.getvalue()

~/scipy/pandas/pandas/core/frame.py in to_string(self, buf, columns, col_space, header, index, na_rep, formatters, float_format, sparsify, index_names, justify, max_rows, max_cols, show_dimensions, decimal, line_width)
    707                                            decimal=decimal,
    708                                            line_width=line_width)
--> 709         formatter.to_string()
    710
    711         if buf is None:

~/scipy/pandas/pandas/io/formats/format.py in to_string(self)
    601         else:
    602
--> 603             strcols = self._to_str_columns()
    604             if self.line_width is None:  # no need to wrap around just print
    605                 # the whole frame

~/scipy/pandas/pandas/io/formats/format.py in _to_str_columns(self)
    510         # may include levels names also
    511
--> 512         str_index = self._get_formatted_index(frame)
    513
    514         if not is_list_like(self.header) and not self.header:

~/scipy/pandas/pandas/io/formats/format.py in _get_formatted_index(self, frame)
    807                 names=show_index_names, formatter=fmt)
    808         else:
--> 809             fmt_index = [index.format(name=show_index_names, formatter=fmt)]
    810         fmt_index = [tuple(_make_fixed_width(list(x), justify='left',
    811                                              minimum=(self.col_space or 0),

~/scipy/pandas/pandas/core/indexes/base.py in format(self, name, formatter, **kwargs)
    993             return header + list(self.map(formatter))
    994
--> 995         return self._format_with_header(header, **kwargs)
    996
    997     def _format_with_header(self, header, na_rep='NaN', **kwargs):

~/scipy/pandas/pandas/core/indexes/interval.py in _format_with_header(self, header, **kwargs)
   1012
   1013     def _format_with_header(self, header, **kwargs):
-> 1014         return header + list(self._format_native_types(**kwargs))
   1015
   1016     def _format_native_types(self, na_rep='', quoting=None, **kwargs):

~/scipy/pandas/pandas/core/indexes/interval.py in _format_native_types(self, na_rep, quoting, **kwargs)
   1016     def _format_native_types(self, na_rep='', quoting=None, **kwargs):
   1017         """ actually format my specific types """
-> 1018         from pandas.io.formats.format import IntervalArrayFormatter
   1019         return IntervalArrayFormatter(values=self,
   1020                                       na_rep=na_rep,

ImportError: cannot import name 'IntervalArrayFormatter'
```

What is in this PR "fixes" the immediate error, but I see a difference from what was there before:

On 0.23.4:

```python
In [2]: i = pd.io.formats.format.IntervalArrayFormatter(pd.interval_range(1, 5))

In [3]: i.get_result()
Out[3]: ['(1, 2]', '(2, 3]', '(3, 4]', '(4, 5]']
```

On master:

```python
In [22]: i = pd.io.formats.format.ExtensionArrayFormatter(pd.interval_range(1, 5))

In [23]: i.get_result()
Out[23]: [' (1, 2]', ' (2, 3]', ' (3, 4]', ' (4, 5]']
```

So there is now an extra leading space, which means the DataFrame repr also starts with a space.
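A minimal pure-Python sketch of the whitespace difference between the 0.23.4 and master outputs above, assuming intervals are plain `(left, right)` tuples. `format_like_0_23` and `format_like_master` are hypothetical stand-ins for `IntervalArrayFormatter` and `ExtensionArrayFormatter`, not the pandas code:

```python
def interval_str(left, right):
    # Right-closed interval, rendered like pandas prints an Interval
    return "({}, {}]".format(left, right)


def format_like_0_23(intervals):
    # Old behaviour: each interval is converted to its plain string form
    return [interval_str(left, right) for left, right in intervals]


def format_like_master(intervals):
    # New behaviour: the generic formatter prefixes non-float values with a space
    return [" " + interval_str(left, right) for left, right in intervals]


intervals = [(1, 2), (2, 3), (3, 4), (4, 5)]
print(format_like_0_23(intervals))    # ['(1, 2]', '(2, 3]', '(3, 4]', '(4, 5]']
print(format_like_master(intervals))  # [' (1, 2]', ' (2, 3]', ' (3, 4]', ' (4, 5]']
```

When the index labels are produced the second way, the padded strings end up as the first column of the DataFrame repr.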

(I still need to add tests, and this might also break existing tests due to the whitespace change.)

pep8speaks · 2018-12-06T22:10:24Z

Hello @jorisvandenbossche! Thanks for submitting the PR.

There are no PEP8 issues in the file pandas/core/indexes/interval.py !

codecov · 2018-12-06T22:47:02Z

Codecov Report

Merging #24134 into master will not change coverage.
The diff coverage is 0%.

```diff
@@           Coverage Diff           @@
##           master   #24134   +/-   ##
=======================================
  Coverage    92.2%    92.2%
=======================================
  Files         162      162
  Lines       51701    51701
=======================================
  Hits        47671    47671
  Misses       4030     4030
```

Flag	Coverage Δ
#multiple	`90.6% <0%> (ø)`	⬆️
#single	`43.02% <0%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/indexes/interval.py	`94.73% <0%> (ø)`	⬆️


Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 03134cb...ec88f66. Read the comment docs.

codecov · 2018-12-06T22:47:03Z

Codecov Report

Merging #24134 into master will increase coverage by 0.01%.
The diff coverage is 100%.

```diff
@@            Coverage Diff             @@
##           master   #24134      +/-   ##
==========================================
+ Coverage    92.2%   92.22%   +0.01%
==========================================
  Files         162      162
  Lines       51700    51769      +69
==========================================
+ Hits        47669    47742      +73
+ Misses       4031     4027       -4
```

Flag	Coverage Δ
#multiple	`90.62% <100%> (+0.01%)`	⬆️
#single	`43.01% <70%> (-0.02%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/formats/format.py	`97.97% <100%> (+0.01%)`	⬆️
pandas/core/indexes/interval.py	`95.26% <100%> (+0.52%)`	⬆️
pandas/core/arrays/categorical.py	`95.3% <0%> (-0.1%)`	⬇️
pandas/core/frame.py	`96.91% <0%> (ø)`	⬆️
pandas/core/indexes/base.py	`96.27% <0%> (ø)`	⬆️
pandas/core/indexes/category.py	`97.9% <0%> (ø)`	⬆️
pandas/core/arrays/datetimes.py	`98.56% <0%> (ø)`	⬆️
pandas/core/generic.py	`96.65% <0%> (ø)`	⬆️
pandas/io/pytables.py	`92.31% <0%> (ø)`	⬆️
pandas/core/internals/managers.py	`95.93% <0%> (+0.01%)`	⬆️
... and 6 more


Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c0b9eea...f95ae76. Read the comment docs.

TomAugspurger · 2018-12-08T13:37:54Z

I'll look into this later today.

It seems that we're now hitting the `else` branch in

pandas/pandas/io/formats/format.py

Lines 933 to 939 in c911151

```python
for i, v in enumerate(vals):
    if not is_float_type[i] and leading_space:
        fmt_values.append(u' {v}'.format(v=_format(v)))
    elif is_float_type[i]:
        fmt_values.append(float_format(v))
    else:
        fmt_values.append(u' {v}'.format(v=_format(v)))
```
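To see why that `else` branch matters, here is a standalone, simplified copy of the loop quoted above. `_format`, `float_format` and the float check are assumed stand-ins for the real helpers, and `leading_space` is inferred as "any float present", mirroring how format.py derives it from `is_float_type.any()`:

```python
def format_values(vals, float_format="{:.1f}".format):
    # Simplified stand-ins for the helpers used in format.py (assumptions)
    _format = str
    is_float_type = [isinstance(v, float) for v in vals]
    leading_space = any(is_float_type)  # mirrors `is_float_type.any()`

    fmt_values = []
    for i, v in enumerate(vals):
        if not is_float_type[i] and leading_space:
            fmt_values.append(u' {v}'.format(v=_format(v)))
        elif is_float_type[i]:
            fmt_values.append(float_format(v))
        else:
            # Reached when leading_space is False, yet a space is still added
            fmt_values.append(u' {v}'.format(v=_format(v)))
    return fmt_values


# Interval labels contain no floats, so leading_space is False,
# but every label still comes out with a leading space.
print(format_values(["(1, 2]", "(2, 3]"]))  # [' (1, 2]', ' (2, 3]']
print(format_values([1.5, "a"]))            # ['1.5', ' a']
```

Both non-float branches build the same string, so the inferred `leading_space` has no effect on non-float values.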


TomAugspurger · 2018-12-08T20:49:40Z

Updated to pass through a `leading_space` argument, and added tests for the formatters that we removed.
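A minimal sketch of what passing a `leading_space` argument through could look like, assuming a simplified formatter that only receives already-rendered strings. `ArrayFormatterSketch` is hypothetical; the `is False` check follows the `if leading_space is False:` line in the diff under review, so only an explicit `False` (as an index would pass) drops the padding:

```python
class ArrayFormatterSketch:
    """Hypothetical, simplified array formatter (not the pandas class)."""

    def __init__(self, values, leading_space=None):
        self.values = list(values)
        # None keeps the default padding used for Series values;
        # False is an explicit opt-out, e.g. for Index labels.
        self.leading_space = leading_space

    def get_result(self):
        if self.leading_space is False:
            return [str(v) for v in self.values]
        return [" " + str(v) for v in self.values]


labels = ["(1, 2]", "(2, 3]", "(3, 4]"]
series_cells = ArrayFormatterSketch(labels).get_result()
index_cells = ArrayFormatterSketch(labels, leading_space=False).get_result()
print(series_cells)  # [' (1, 2]', ' (2, 3]', ' (3, 4]']
print(index_cells)   # ['(1, 2]', '(2, 3]', '(3, 4]']
```

Defaulting to `None` rather than `False` keeps existing callers (Series) unchanged; only the index code path has to opt out.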

jreback · 2018-12-09T14:11:35Z

pandas/core/indexes/interval.py

```diff
+ from pandas.io.formats.format import ExtensionArrayFormatter
+ return ExtensionArrayFormatter(values=self,
+ na_rep=na_rep,
+ justify='all',
```


Why do we need this new arg? Just change the output tests, which are incorrect.

In 0.23.4, we didn't have the leading space for indexes.

```python
In [2]: df = pd.Series(1, index=pd.IntervalIndex.from_breaks([1, 2, 3, 4])).to_frame()

In [3]: df
Out[3]:
        0
(1, 2]  1
(2, 3]  1
(3, 4]  1
```

Wasn't this the same issue you recently adjusted for DTI? These keywords just promote inconsistency.

No, I don't think so.

AFAICT, this is happening because IntervalIndex and Series[Interval] are now both using GenericArrayFormatter to format the values. Series need a leading space, but indexes don't. So I think things should be more consistent. If you want, I can remove the keyword and go back to the old implementation, which just did the formatting on its own, but I suspect you don't want that :)

jreback · 2018-12-09T14:11:53Z

pandas/io/formats/format.py

```diff
 fmt_values.append(float_format(v))
 else:
- fmt_values.append(u' {v}'.format(v=_format(v)))
+ if leading_space is False:
```


This is so hacky; we should not be doing this.

jreback · 2018-12-10T17:57:17Z

pandas/io/formats/format.py


```diff
 is_float_type = lib.map_infer(vals, is_float) & notna(vals)
- leading_space = is_float_type.any()
+ leading_space = self.leading_space
```


Can you document when this is set?

jreback

see comments

jorisvandenbossche · 2018-12-11T00:02:36Z

> this is happening because IntervalIndex and Series[Interval] are now both using GenericArrayFormatter to format the values. Series need a leading space, but indexes don't.

I don't really know the formatting code, but would it make sense to move this "adding a leading space or not" to the code that actually combines the list of strings into the actual repr? After all, it is only when concatenating them with ',' in between for the Index repr that you need those spaces.
(just wondering, not for this PR, but might be an idea to clean this up)
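A rough sketch of the split suggested here, under the assumption that formatting and joining can be fully separated: `format_cells` only renders values, and the `render_index` / `render_frame_rows` helpers decide on separators. All three names are hypothetical and do not exist in pandas:

```python
def format_cells(values):
    # Formatting step only: render each value, with no padding
    return [str(v) for v in values]


def render_index(cells):
    # Joining step for a list-style Index repr: ", " supplies the spacing
    return "[" + ", ".join(repr(c) for c in cells) + "]"


def render_frame_rows(index_cells, value_cells, sep="  "):
    # Joining step for a table: the separator goes between index and values
    width = max(len(c) for c in index_cells)
    return [i.ljust(width) + sep + v for i, v in zip(index_cells, value_cells)]


idx = format_cells(["(1, 2]", "(2, 3]", "(3, 4]"])
vals = format_cells([1, 1, 1])
print(render_index(idx))  # ['(1, 2]', '(2, 3]', '(3, 4]']
print("\n".join(render_frame_rows(idx, vals)))
# (1, 2]  1
# (2, 3]  1
# (3, 4]  1
```

With this division, the array formatter never needs a `leading_space` flag; each renderer owns its own spacing.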

TomAugspurger · 2018-12-11T02:55:15Z

I'm not too familiar with it either, but that seems sensible to me. It'd be nice to focus the array formatter on formatting the values, and a higher-level formatter for joining together those formatted arrays.


jreback · 2018-12-13T00:50:54Z

OK, merging this. @TomAugspurger, can you open an issue for #24134 (comment)? That sounds good.

Fix repr of DataFrame with interval values

ec88f66

jorisvandenbossche added the Bug, Output-Formatting (__repr__ of pandas objects, to_string), Blocker (Blocking issue or pull request for an upcoming release) and Interval (Interval data type) labels Dec 6, 2018

jorisvandenbossche added this to the 0.24.0 milestone Dec 6, 2018

TomAugspurger added 2 commits December 8, 2018 14:32

Merge remote-tracking branch 'upstream/master' into jorisvandenbossche-interval-df-repr

137cba7

Pass through leading space argument

5b0479f

jreback requested changes Dec 9, 2018


TomAugspurger mentioned this pull request Dec 10, 2018

DOC: Fixed implicit imports for whatsnew (v >= version 20.0) #24199

Merged

4 tasks

jreback reviewed Dec 10, 2018


jreback requested changes Dec 10, 2018


update

f95ae76

jreback approved these changes Dec 13, 2018


jreback merged commit 33ca356 into pandas-dev:master Dec 13, 2018

jorisvandenbossche deleted the interval-df-repr branch December 16, 2018 07:54

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

Fix repr of DataFrame with IntervalIndex (pandas-dev#24134)

c13fcd1

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

Fix repr of DataFrame with IntervalIndex (pandas-dev#24134)

c2f3284

simonjayhawkins mentioned this pull request Apr 5, 2019

to_html formatter not called for float values in a mixed-type column (2) #26000

Closed

4 tasks

4 participants
