Closed
Labels
API Design, Enhancement, Groupby, Numeric Operations (Arithmetic, Comparison, and Logical operations)
Description
I don't think there is a way to get the nlargest elements in a DataFrame without sorting.
In ordinary Python you'd use heapq's nlargest (and we can hack a bit to use it for a DataFrame):

    In [10]: df
    Out[10]:
                    IP                                              Agent  Count
    0    74.86.158.106  Mozilla/5.0+(compatible; UptimeRobot/2.0; http...    369
    1   203.81.107.103  Mozilla/5.0 (Windows NT 6.1; rv:21.0) Gecko/20...    388
    2  173.199.120.155  Mozilla/5.0 (compatible; AhrefsBot/4.0; +http:...    417
    3    124.43.84.242  Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.3...    448
    4  112.135.196.223  Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.3...    454
    5   124.43.155.138  Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) G...    461
    6   124.43.104.198  Mozilla/5.0 (Windows NT 5.1; rv:21.0) Gecko/20...    467

    In [11]: df.sort('Count', ascending=False).head(3)
    Out[11]:
                    IP                                              Agent  Count
    6   124.43.104.198  Mozilla/5.0 (Windows NT 5.1; rv:21.0) Gecko/20...    467
    5   124.43.155.138  Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) G...    461
    4  112.135.196.223  Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.3...    454

    In [21]: from heapq import nlargest

    In [22]: top_3 = nlargest(3, df.iterrows(), key=lambda x: x[1]['Count'])

    In [23]: pd.DataFrame.from_items(top_3).T
    Out[23]:
                    IP                                              Agent  Count
    6   124.43.104.198  Mozilla/5.0 (Windows NT 5.1; rv:21.0) Gecko/20...    467
    5   124.43.155.138  Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) G...    461
    4  112.135.196.223  Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.3...    454

This is much slower than sorting, presumably because of the overhead, but I thought I'd throw it out there as a feature idea anyway.
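One way to avoid some of the per-row overhead might be to run heapq's nlargest over positions in the Count column alone, and only look up the full rows afterwards. The following is a minimal stdlib-only sketch of that idea, not a pandas feature: the `counts` list stands in for the Count column above, and `top_n_positions` is a hypothetical helper name.

```python
from heapq import nlargest

# Stand-in for df['Count'] from the example above (assumption: a plain list
# is used here so the sketch runs without pandas).
counts = [369, 388, 417, 448, 454, 461, 467]


def top_n_positions(values, n):
    """Return the positions of the n largest values, largest first.

    heapq.nlargest keeps at most n candidates at a time, so no full
    sort of `values` is needed.
    """
    return nlargest(n, range(len(values)), key=values.__getitem__)


top_3 = top_n_positions(counts, 3)
print(top_3)  # [6, 5, 4]
```

With a real DataFrame, positions like these could presumably be passed to `df.iloc[...]` to pull out the matching rows. Whether this actually beats sorting would need to be timed.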