Skip to content

Conversation

@jorisvandenbossche
Copy link
Member

@TomAugspurger after the repr PR, the docs build catched an error: the repr of a DataFrame with an IntervalIndex started failing:

In [1]: df = pd.DataFrame({'A': [1, 2, 3, 4]}, ...: index=pd.IntervalIndex.from_breaks([0, 1, 2, 3, 4])) In [2]: df Out[2]: --------------------------------------------------------------------------- ImportError Traceback (most recent call last) ~/miniconda3/envs/dev/lib/python3.5/site-packages/IPython/core/formatters.py in __call__(self, obj) 700 type_pprinters=self.type_printers, 701 deferred_pprinters=self.deferred_printers) --> 702 printer.pretty(obj) 703 printer.flush() 704 return stream.getvalue() ~/miniconda3/envs/dev/lib/python3.5/site-packages/IPython/lib/pretty.py in pretty(self, obj) 400 if cls is not object \ 401 and callable(cls.__dict__.get('__repr__')): --> 402 return _repr_pprint(obj, self, cycle) 403 404 return _default_pprint(obj, self, cycle) ~/miniconda3/envs/dev/lib/python3.5/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle) 695 """A pprint that just redirects to the normal repr function.""" 696 # Find newlines and replace them with p.break_() --> 697 output = repr(obj) 698 for idx,output_line in enumerate(output.splitlines()): 699 if idx: ~/scipy/pandas/pandas/core/base.py in __repr__(self) 75 Yields Bytestring in Py2, Unicode String in py3. 76 """ ---> 77 return str(self) 78 79 ~/scipy/pandas/pandas/core/base.py in __str__(self) 54 55 if compat.PY3: ---> 56 return self.__unicode__() 57 return self.__bytes__() 58 ~/scipy/pandas/pandas/core/frame.py in __unicode__(self) 626 width = None 627 self.to_string(buf=buf, max_rows=max_rows, max_cols=max_cols, --> 628 line_width=width, show_dimensions=show_dimensions) 629 630 return buf.getvalue() ~/scipy/pandas/pandas/core/frame.py in to_string(self, buf, columns, col_space, header, index, na_rep, formatters, float_format, sparsify, index_names, justify, max_rows, max_cols, show_dimensions, decimal, line_width) 707 decimal=decimal, 708 line_width=line_width) --> 709 formatter.to_string() 710 711 if buf is None: ~/scipy/pandas/pandas/io/formats/format.py in to_string(self) 601 else: 602 --> 603 strcols = self._to_str_columns() 604 if self.line_width is None: # no need to wrap around just print 605 # the whole frame ~/scipy/pandas/pandas/io/formats/format.py in _to_str_columns(self) 510 # may include levels names also 511 --> 512 str_index = self._get_formatted_index(frame) 513 514 if not is_list_like(self.header) and not self.header: ~/scipy/pandas/pandas/io/formats/format.py in _get_formatted_index(self, frame) 807 names=show_index_names, formatter=fmt) 808 else: --> 809 fmt_index = [index.format(name=show_index_names, formatter=fmt)] 810 fmt_index = [tuple(_make_fixed_width(list(x), justify='left', 811 minimum=(self.col_space or 0), ~/scipy/pandas/pandas/core/indexes/base.py in format(self, name, formatter, **kwargs) 993 return header + list(self.map(formatter)) 994 --> 995 return self._format_with_header(header, **kwargs) 996 997 def _format_with_header(self, header, na_rep='NaN', **kwargs): ~/scipy/pandas/pandas/core/indexes/interval.py in _format_with_header(self, header, **kwargs) 1012 1013 def _format_with_header(self, header, **kwargs): -> 1014 return header + list(self._format_native_types(**kwargs)) 1015 1016 def _format_native_types(self, na_rep='', quoting=None, **kwargs): ~/scipy/pandas/pandas/core/indexes/interval.py in _format_native_types(self, na_rep, quoting, **kwargs) 1016 def _format_native_types(self, na_rep='', quoting=None, **kwargs): 1017 """ actually format my specific types """ -> 1018 from pandas.io.formats.format import IntervalArrayFormatter 1019 return IntervalArrayFormatter(values=self, 1020 na_rep=na_rep, ImportError: cannot import name 'IntervalArrayFormatter' 

What is in this PR "fixes" the immediate error, but, I see a difference with what was before:

On 0.23.4:

In [2]: i = pd.io.formats.format.IntervalArrayFormatter(pd.interval_range(1, 5)) In [3]: i.get_result() Out[3]: ['(1, 2]', '(2, 3]', '(3, 4]', '(4, 5]'] 

On master:

In [22]: i = pd.io.formats.format.ExtensionArrayFormatter(pd.interval_range(1, 5)) In [23]: i.get_result() Out[23]: [' (1, 2]', ' (2, 3]', ' (3, 4]', ' (4, 5]'] 

So there is now an extra space. So which means also the DataFrame repr starts with a space.

(still need to add tests, and it might this also breaks existing tests due to the whitespace change)

@pep8speaks
Copy link

Hello @jorisvandenbossche! Thanks for submitting the PR.

@jorisvandenbossche jorisvandenbossche added Bug Output-Formatting __repr__ of pandas objects, to_string Blocker Blocking issue or pull request for an upcoming release Interval Interval data type labels Dec 6, 2018
@jorisvandenbossche jorisvandenbossche added this to the 0.24.0 milestone Dec 6, 2018
@codecov
Copy link

codecov bot commented Dec 6, 2018

Codecov Report

Merging #24134 into master will not change coverage.
The diff coverage is 0%.

Impacted file tree graph

@@ Coverage Diff @@ ## master #24134 +/- ## ======================================= Coverage 92.2% 92.2% ======================================= Files 162 162 Lines 51701 51701 ======================================= Hits 47671 47671 Misses 4030 4030
Flag Coverage Δ
#multiple 90.6% <0%> (ø) ⬆️
#single 43.02% <0%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/indexes/interval.py 94.73% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 03134cb...ec88f66. Read the comment docs.

@codecov
Copy link

codecov bot commented Dec 6, 2018

Codecov Report

Merging #24134 into master will increase coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@ Coverage Diff @@ ## master #24134 +/- ## ========================================== + Coverage 92.2% 92.22% +0.01%  ========================================== Files 162 162 Lines 51700 51769 +69 ========================================== + Hits 47669 47742 +73  + Misses 4031 4027 -4
Flag Coverage Δ
#multiple 90.62% <100%> (+0.01%) ⬆️
#single 43.01% <70%> (-0.02%) ⬇️
Impacted Files Coverage Δ
pandas/io/formats/format.py 97.97% <100%> (+0.01%) ⬆️
pandas/core/indexes/interval.py 95.26% <100%> (+0.52%) ⬆️
pandas/core/arrays/categorical.py 95.3% <0%> (-0.1%) ⬇️
pandas/core/frame.py 96.91% <0%> (ø) ⬆️
pandas/core/indexes/base.py 96.27% <0%> (ø) ⬆️
pandas/core/indexes/category.py 97.9% <0%> (ø) ⬆️
pandas/core/arrays/datetimes.py 98.56% <0%> (ø) ⬆️
pandas/core/generic.py 96.65% <0%> (ø) ⬆️
pandas/io/pytables.py 92.31% <0%> (ø) ⬆️
pandas/core/internals/managers.py 95.93% <0%> (+0.01%) ⬆️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c0b9eea...f95ae76. Read the comment docs.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Dec 8, 2018

I'll look into this later today.

It seems that we're now hitting the else in

for i, v in enumerate(vals):
if not is_float_type[i] and leading_space:
fmt_values.append(u' {v}'.format(v=_format(v)))
elif is_float_type[i]:
fmt_values.append(float_format(v))
else:
fmt_values.append(u' {v}'.format(v=_format(v)))
.

@TomAugspurger
Copy link
Contributor

Updated to pass through a leading_space argument and tests for the formatters that we removed.

from pandas.io.formats.format import ExtensionArrayFormatter
return ExtensionArrayFormatter(values=self,
na_rep=na_rep,
justify='all',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this new arg? just change the output tests, which are incorrect

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 0.23.4, we didn't have the leading space for indexes.

In [2]: df = pd.Series(1, index=pd.IntervalIndex.from_breaks([1, 2, 3, 4])).to_frame() In [3]: df Out[3]: 0 (1, 2] 1 (2, 3] 1 (3, 4] 1 
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wasn’t this the same issue you recently adjusted for DTi? this keywords just promote inconsistency

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I don't think so.

AFAICT, this is happing because IntervalIndex and Series[Interval] are now both using GenericArrayFormatter to format the values. Series need a leading space, but indexes don't. So I think things should be more consistent. If you want I can remove the keyword and go back to the old implementation which just did the formatting on its own, but I suspect you don't want that :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok i guess

fmt_values.append(float_format(v))
else:
fmt_values.append(u' {v}'.format(v=_format(v)))
if leading_space is False:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is so hacky, we should not be doing this


is_float_type = lib.map_infer(vals, is_float) & notna(vals)
leading_space = is_float_type.any()
leading_space = self.leading_space
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you document when this is set.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comments

@jorisvandenbossche
Copy link
Member Author

this is happing because IntervalIndex and Series[Interval] are now both using GenericArrayFormatter to format the values. Series need a leading space, but indexes don't.

I don't really know the formatting code, but would it make sense to move this "adding a leading space or not" to the code that actually combines the list of strings into the actual repr? As it is only when concatting them with ',' in the middle for the Index repr that you need those spaces?
(just wondering, not for this PR, but might be an idea to clean this up)

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Dec 11, 2018 via email

@jreback
Copy link
Contributor

jreback commented Dec 13, 2018

ok, merging this though @TomAugspurger can you open an issue on #24134 (comment) which sounds good.

@jreback jreback merged commit 33ca356 into pandas-dev:master Dec 13, 2018
@jorisvandenbossche jorisvandenbossche deleted the interval-df-repr branch December 16, 2018 07:54
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Blocker Blocking issue or pull request for an upcoming release Bug Interval Interval data type Output-Formatting __repr__ of pandas objects, to_string

4 participants