Skip to content

Conversation

@kirkhansen
Copy link
Contributor

@kirkhansen kirkhansen commented Jul 29, 2018

@kirkhansen kirkhansen changed the title BUG: TimedeltaIndex.intersection #17433 BUG: TimedeltaIndex.intersection Jul 29, 2018
@kirkhansen kirkhansen force-pushed the fix-descending-timedeltaindex-intersect branch from 62d9e3e to 5af3c53 Compare July 29, 2018 17:08
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls add a whatsnew note in bug fixes / timedeltas

""" alias for is_monotonic_increasing (deprecated) """
return self.is_monotonic_increasing

@property
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we actually using this? (it seems you in-lined it below), I actually prefer that to making a new method

Copy link
Contributor Author

@kirkhansen kirkhansen Sep 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The names are similar, but I'm calling this function twice in the intersection function to help me deal with the boolean logic (already hard for me to follow, even with this addition). I can remove this, and put this inline if you really want it that way.

end = right[-1]

if end > start:
return Index()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this not []?

start = right[0]

if end < start:
return []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worth to consolidate these into 1 routine? as you can call with the ascending arg

result.offset = frequencies.to_offset(result.inferred_freq)
return result

# handle intersecting things like this
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a tiny bit more of expl

return wrapper


def apply_index_wraps(func):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is going on here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Been awhile. I don't remember writing this code. I'll guess an artifact of my rebase.

@jreback jreback added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode Timedelta Timedelta data type labels Jul 30, 2018
@kirkhansen
Copy link
Contributor Author

Having come back at this, I'm struggling with how to deal with all the edge cases that need to be accounted for before trying to apply this quick method of intersecting, and this is making me question whether or not to even include it.
The quick method cannot allow any 'gaps' between values, unless both indexes have the same gaps, and having to check for that before intersecting seems a lot harder than just using Index.intersection, unless that method is worse than something we can write to check if if values are contiguous based on the frequency passed in or something.

I'm having a hard time describing it clearly, but hopefully this is making sense. If not, I can commit my broken code so you all can see what I'm talking about.

@jorisvandenbossche
Copy link
Member

A possibility to at least solve the bug would be to fall back to the Index.intersection implementation for the cases where monotonicity does not match?

Basically similar like DatetimeIndex.intersection also falls back to it when eg freq does not match:

elif (other.freq is None or self.freq is None or
other.freq != self.freq or
not other.freq.isAnchored() or
(not self.is_monotonic or not other.is_monotonic)):
result = Index.intersection(self, other)
result = self._shallow_copy(result._values, name=result.name,
tz=result.tz, freq=None)
if result.freq is None:
result.freq = to_offset(result.inferred_freq)
return result

Will not be the most performant option, but at least could fix the bug

@kirkhansen
Copy link
Contributor Author

@jorisvandenbossche I thought I had been doing that, but it seems my check was backwards for that case :/. Correcting that appears to solve the issues. Thanks!

@kirkhansen kirkhansen force-pushed the fix-descending-timedeltaindex-intersect branch from 5af3c53 to 808ce4e Compare September 14, 2018 14:02
@pep8speaks
Copy link

pep8speaks commented Sep 14, 2018

Hello @kirkhansen! Thanks for updating the PR.

Comment last updated on December 12, 2018 at 21:29 Hours UTC
@kirkhansen kirkhansen force-pushed the fix-descending-timedeltaindex-intersect branch from 808ce4e to 5703681 Compare November 9, 2018 05:50
@codecov
Copy link

codecov bot commented Nov 9, 2018

Codecov Report

Merging #22114 into master will increase coverage by <.01%.
The diff coverage is 86.79%.

Impacted file tree graph

@@ Coverage Diff @@ ## master #22114 +/- ## ========================================== + Coverage 92.21% 92.22% +<.01%  ========================================== Files 162 162 Lines 51768 51761 -7 ========================================== - Hits 47739 47734 -5  + Misses 4029 4027 -2
Flag Coverage Δ
#multiple 90.62% <86.79%> (ø) ⬆️
#single 43.14% <47.16%> (+0.14%) ⬆️
Impacted Files Coverage Δ
pandas/core/indexes/datetimes.py 96.8% <ø> (+0.46%) ⬆️
pandas/core/indexes/period.py 93.06% <100%> (ø) ⬆️
pandas/tseries/offsets.py 96.86% <100%> (+0.01%) ⬆️
pandas/core/indexes/base.py 96.27% <100%> (ø) ⬆️
pandas/core/indexes/timedeltas.py 90.69% <33.33%> (+0.26%) ⬆️
pandas/core/indexes/datetimelike.py 96.24% <87.5%> (-1.06%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9bc42f9...4aac7e8. Read the comment docs.

@jreback
Copy link
Contributor

jreback commented Dec 9, 2018

if you can merge master

@kirkhansen kirkhansen force-pushed the fix-descending-timedeltaindex-intersect branch from 16965c5 to 2f5513c Compare December 12, 2018 19:57
@mroeschke
Copy link
Member

You'll need to run isort on this file:

2018-12-12T21:36:13.4930067Z ERROR: /home/vsts/work/1/s/pandas/core/indexes/datetimelike.py Imports are incorrectly sorted. 
first = getattr(first, 'freq', None)
second = getattr(second, 'freq', None)
are_offsets_equal = True
if first is None or second is None or first != second:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just return this statement; no need for the intermediate variable

df2 = tm.makeTimeDataFrame().rename(columns=lambda x: "%s_2" % x)
df1.iloc[1, df1.columns.get_indexer(['A', 'B'])] = np.nan
df = concat([df1, df2], axis=1)
print(df)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This print should be removed.

"""
Speedy intersection that works only if certain assumptions are met.
See intersection for details.
Parameters
Copy link
Member

@mroeschke mroeschke Dec 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you give a bit more color to Parameters and Returns

@WillAyd
Copy link
Member

WillAyd commented Feb 27, 2019

Closing as stale ping if you'd like to continue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode Timedelta Timedelta data type

6 participants