Suppose if I have following list
{ {10,b,30}, {100,a,40}, {1000,b,10}, {1000,b,70}, {100,b,20}, {10,b,70} } How to find rows that have max value in 3rd column, in this case
(*{{1000,b,70},{10,b,70}}*) With:
dat = {{10, b, 30}, {100, a, 40}, {1000, b, 10}, {1000, b, 70}, {100, b, 20}, {10, b, 70}}; Perhaps most directly:
Cases[dat, {_, _, Max@dat[[All, 3]]}] More approaches:
Last @ SplitBy[SortBy[dat, {#[[3]] &}], #[[3]] &]
Pick[dat, #, Max@#] &@dat[[All, 3]]
Reap[Fold[(If[#2[[3]] >= #, Sow@#2]; #2[[3]]) &, dat]][[2, 1]]
Of these Pick appears to be concise and efficient, so it is my recommendation.
Edit: Position and Extract are three times as efficient as Pick on some data. Using Transpose is slightly more efficient on packed rectangular data.
dat ~Extract~ Position[#, Max@#] & @ dat[[All, 3]]
dat ~Extract~ Position[#, Max@#] & @ Part[dat\[Transpose], 3]
Here are some timings performed in version 7:
SetAttributes[timeAvg, HoldFirst] timeAvg[func_] := Do[If[# > 0.3, Return[#/5^i]] & @@ Timing@Do[func, {5^i}], {i, 0, 15}] SeedRandom[1] dat = RandomInteger[99999, {500000, 3}]; Cases[dat, {_, _, Max@dat[[All, 3]]}] // timeAvg Last@SplitBy[SortBy[dat, {#[[3]] &}], #[[3]] &] // timeAvg Pick[dat, #, Max@#] &@dat[[All, 3]] // timeAvg Reap[Fold[(If[#2[[3]] >= #, Sow@#2]; #2[[3]]) &, dat]][[2, 1]] // timeAvg dat ~Extract~ Position[#, Max@#] &@dat[[All, 3]] // timeAvg dat ~Extract~ Position[#, Max@#] &@Part[dat\[Transpose], 3] // timeAvg 0.1278
0.764
0.0904
0.904
0.02996
0.02496
(In actuality I restarted the Kernel between each individual timing line as otherwise each run gets slower, unfairly biasing the test toward the earlier lines.)
These can be further optimized by using faster position functions for numeric data.
Michael E2 recommended compiling (probably faster in versions after 7):
pos = Compile[{{list, _Real, 1}, {pat, _Real}}, Position[list, pat]]; dat ~Extract~ pos[#, Max@#] & @ Part[dat\[Transpose], 3] // timeAvg 0.01372
My favorite method is SparseArray properties:
spos = SparseArray[Unitize[#], Automatic, 1]["AdjacencyLists"] &; dat[[spos[# - Max@#]]] & @ Part[dat\[Transpose], 3] // timeAvg 0.002872
This is now about 30X faster than Pick, my original recommendation.
#[[3]] & is replaceable with Last[], e.g. Last[SplitBy[SortBy[data, Last], Last]]. $\endgroup$ Pick version but by that time someone else posted a method using Pick. Also, I think it is interesting to show different approaches, even if for a specific problem some of them are contrived or awkward because they may not be on another problem. $\endgroup$ Position is even faster (I think it is optimized to check only level 1). Try sticking pos = Compile[{{list, _Real, 1}, {pat, _Real}}, Position[list, pat]] into the last two. I get a speedup of more than a factor of 2 in V9.0.1 $\endgroup$ timeAvg I've found on Mma.SE. I thought you might like to compare it with a new function in V10, Needs["GeneralUtilities"]; AccurateTiming[Range[10^4]]`. $\endgroup$ This works:
data = {{10, b, 30}, {100, a, 40}, {1000, b, 10}, {1000, b, 70}, {100, b, 20}, {10, b, 70}}; Pick[data, data[[All, 3]], Max[data[[All, 3]]]] Pick[#, #[[All, 3]], Max[Last /@ #]] &[data], following Wizard's style. $\endgroup$ With[] can be useful... $\endgroup$ You can use Select to choose only those rows with the maximum value in the third column.
list = {{10, b, 30}, {100, a, 40}, {1000, b, 10}, {1000, b, 70}, {100, b, 20}, {10, b, 70}}; With[{max = Max@list[[All, 3]]}, Select[list, (#[[3]] == max) &]] As of version 10 you can use MaximalBy:
data = {{10, b, 30}, {100, a, 40}, {1000, b, 10}, {1000, b, 70}, {100, b, 20}, {10, b, 70}}; MaximalBy[data, Last] {{1000, b, 70}, {10, b, 70}}
Here's a method with a stable sort involving two of my favorite functions, Reap and Sow:
Module[{a = -Infinity}, Reap[ Sow[{##}, a = Max[a, #3]; #3] & @@@ dat, _, If[#1 == a, #2, Unevaluated[Sequence[]]] & ][[2, 1]] ] The way Reap and Sow work is that Sow attaches to each term a tag, and Reap collects those tags according to a Pattern (second parameter), and a function can then be applied to the collected terms (third parameter).
In this case, I use the third element of the tuple as the tag, while keeping a running total of the Max value, a. And for the function, it determines which tuple has a tag equal to the Max, spitting out an empty Sequence if it doesn't.
As a curious note, initially I tried attaching the test to the Pattern parameter, but it is applied before the list has been fully traversed, so it included tuples that did not have a max third term. Apparently, the function is applied after the list has been traversed, so a had attained its maximum value by the point it was used.
Update: Here's a nice and short one (if not fast):
data = {{10, b, 30}, {100, a, 40}, {1000, b, 10}, {1000, b, 70}, {100, b, 20}, {10, b, 70}} Last@SplitBy[SortBy[data, Last], Last] (* ==> {{10, b, 70}, {1000, b, 70}} *) You got many nice solutions. I'd like to add one more, which is less general, and only works when there's a singe maximum, but illustrates nicely how Ordering is useful for minimum/maximum element problems:
Analogously to SortBy, we can define
MaxBy[list_, fun_] := list[[First@Ordering[fun /@ list, -1]]] Then with your data,
MaxBy[data, Last] Again, this will give you a single result only, not two as in your example.
SortBy[data, Last], the sorting method used is unstable. It may or may not matter for your application, but you need to keep this in mind. $\endgroup$ Another option:
data = {{10, b, 30}, {100, a, 40}, {1000, b, 10}, {1000, b, 70}, {100, b, 20}, {10, b, 70}} $\left( \begin{array}{ccc} 10 & b & 30 \\ 100 & a & 40 \\ 1000 & b & 10 \\ 1000 & b & 70 \\ 100 & b & 20 \\ 10 & b & 70 \\ \end{array} \right)$
data[[#]] & /@ First /@ Position[data, Max@data[[All, 3]]] $\left( \begin{array}{ccc} 1000 & b & 70 \\ 10 & b & 70 \\ \end{array} \right)$
Pickwould probably be fastest (@J.M solution) if the list is long. $\endgroup$