Sorting arrays in NumPy by column

Question

How do I sort a NumPy array by its nth column?

For example, given:

a = array([[9, 2, 3], [4, 5, 6], [7, 0, 5]])

I want to sort the rows of a by the second column to obtain:

array([[7, 0, 5], [9, 2, 3], [4, 5, 6]])

Mateen Ulhaq · Accepted Answer · 2021-04-02 11:24:12Z

1021

To sort by the second column of a:

a[a[:, 1].argsort()]
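As a minimal sketch with the array from the question (the names `order` and `sorted_a` are illustrative, not part of the answer), the expression builds a new array and leaves `a` unchanged, so the result has to be assigned:

```python
import numpy as np

a = np.array([[9, 2, 3], [4, 5, 6], [7, 0, 5]])

# argsort on column 1 gives the row order that sorts that column.
order = a[:, 1].argsort()

# Indexing with an integer array returns a new, reordered array.
sorted_a = a[order]
print(sorted_a)
```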

edited Apr 2, 2021 at 11:24

Mateen Ulhaq


answered May 13, 2010 at 15:39

Steve Tjoa



10 Comments

Steven C. Howell Over a year ago

If you want the reverse sort, modify this to be a[a[:,1].argsort()[::-1]]

Václav Pavlík Over a year ago

Looks simple and works! Is it faster than np.sort or not?

poppie Over a year ago

I find this easier to read: ind = np.argsort( a[:,1] ); a = a[ind]

bean Over a year ago

a[a[:,k].argsort()] is the same as a[a[:,k].argsort(),:]. This generalizes to the other dimension (sort cols using a row): a[:,a[j,:].argsort()] (hope i typed that right.)

pippo1980 Over a year ago

needed to use b = a[a[:, 1].argsort()] then b is the sorted one
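The variants suggested in these comments can be tried side by side. This is only a sketch on the question's array; the variable names are illustrative, and row 2 is an arbitrary choice for the column reordering.

```python
import numpy as np

a = np.array([[9, 2, 3], [4, 5, 6], [7, 0, 5]])

# Rows in descending order of column 1: reverse the index array.
rows_desc = a[a[:, 1].argsort()[::-1]]

# Columns reordered by the values in row 2 (sort columns using a row).
cols_by_row = a[:, a[2, :].argsort()]

print(rows_desc)
print(cols_by_row)
```

Reversing the index array also reverses the relative order of tied rows, which may matter if the sort is meant to be stable.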


Trenton McKinney · Accepted Answer · 2020-06-16 00:22:50Z

@steve's answer is actually the most elegant way of doing it.

For the "correct" way see the order keyword argument of numpy.ndarray.sort

However, you'll need to view your array as an array with fields (a structured array).

The "correct" way is quite ugly if you didn't initially define your array with fields...

As a quick example, to sort it and return a copy:

In [1]: import numpy as np

In [2]: a = np.array([[1,2,3],[4,5,6],[0,0,1]])

In [3]: np.sort(a.view('i8,i8,i8'), order=['f1'], axis=0).view(np.int64)
Out[3]:
array([[0, 0, 1],
       [1, 2, 3],
       [4, 5, 6]])

To sort it in-place:

In [6]: a.view('i8,i8,i8').sort(order=['f1'], axis=0)  #<-- returns None

In [7]: a
Out[7]:
array([[0, 0, 1],
       [1, 2, 3],
       [4, 5, 6]])

@Steve's really is the most elegant way to do it, as far as I know...

The only advantage to this method is that the "order" argument is a list of the fields to order the sort by. For example, you can sort by the second column, then the third column, then the first column by supplying order=['f1','f2','f0'].

In my numpy 1.6.1rc1, it raises ValueError: new type not compatible with array.
Would it make sense to file a feature request that the "correct" way be made less ugly?
What if the values in the array are float? Should I change anything?
One major advantage of this method over Steve's is that it allows very large arrays to be sorted in place. For a sufficiently large array, the indices returned by np.argsort may themselves take up quite a lot of memory, and on top of that, indexing with an array will also generate a copy of the array that is being sorted.
Can someone explain the 'i8,i8,i8'? This is for each column or each row? What should change if sorting a different dtype? How do I find out how many bits are being used? Thank you
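On the questions about 'i8,i8,i8' and float arrays: each 'i8' describes one 8-byte integer field, and there is one field per column, so a row becomes a single structured element. A sketch that avoids hard-coding the format by building the field list from the array's own dtype is shown below. It assumes the array is C-contiguous and 2-D; the field names f0, f1, ... are chosen here to mirror NumPy's default names.

```python
import numpy as np

a = np.array([[1.5, 2.0, 3.0], [4.0, 5.0, 6.0], [0.0, 0.0, 1.0]])

# One field per column, each with the same dtype as the array itself
# (float64 here), so the same code works for integer arrays too.
fields = np.dtype([(f"f{i}", a.dtype) for i in range(a.shape[1])])

# The structured view shares memory with a, so sorting it sorts a in place.
a.view(fields).sort(order=["f1"], axis=0)
print(a)
```

`a.dtype.itemsize` reports how many bytes each element uses, which is the number after the type letter in strings such as 'i8' or 'f8'.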

J.J · Accepted Answer · 2017-02-25 22:37:00Z

62

You can sort on multiple columns as per Steve Tjoa's method by using a stable sort like mergesort and sorting the indices from the least significant to the most significant columns:

a = a[a[:,2].argsort()]  # First sort doesn't need to be stable.
a = a[a[:,1].argsort(kind='mergesort')]
a = a[a[:,0].argsort(kind='mergesort')]

This sorts by column 0, then 1, then 2.
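The same idea can be wrapped in a small helper. This is a sketch, not part of the answer: the function name `sort_rows_by_columns` and the sample array are hypothetical, and it uses kind="stable" for every pass for simplicity, even though the first pass does not need it.

```python
import numpy as np


def sort_rows_by_columns(a, cols):
    """Sort rows of a 2-D array by cols, most significant column first."""
    # Apply stable sorts from the least significant key to the most
    # significant one, so earlier orderings survive as tie-breakers.
    for col in reversed(cols):
        a = a[a[:, col].argsort(kind="stable")]
    return a


a = np.array([[1, 2, 0], [0, 2, 1], [1, 0, 5], [0, 2, 0]])
result = sort_rows_by_columns(a, [0, 1, 2])
print(result)
```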

edited Feb 25, 2017 at 22:37

answered Jul 5, 2016 at 1:42

J.J


3 Comments

Little Bobby Tables Over a year ago

Why does First Sort not need to be stable?

J.J Over a year ago

Good question - stable means that when there's a tie you maintain the original order, and the original order of the unsorted file is irrelevant.

Clumsy cat Over a year ago

This seems like a really super important point. having a list that silently doesn’t sort would be bad.

prl900 · Accepted Answer · 2016-02-25 10:37:19Z

In case someone wants to make use of sorting at a critical part of their programs here's a performance comparison for the different proposals:

import numpy as np
table = np.random.rand(5000, 10)

%timeit table.view('f8,f8,f8,f8,f8,f8,f8,f8,f8,f8').sort(order=['f9'], axis=0)
1000 loops, best of 3: 1.88 ms per loop

%timeit table[table[:,9].argsort()]
10000 loops, best of 3: 180 µs per loop

import pandas as pd
df = pd.DataFrame(table)

%timeit df.sort_values(9, ascending=True)
1000 loops, best of 3: 400 µs per loop

So, it looks like indexing with argsort is the quickest method so far...
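Outside IPython, a comparison along these lines can be reproduced with the standard timeit module. The sketch below is an assumption-laden rerun, not the original benchmark: the helper names are hypothetical, the structured variant sorts a copy so that each call starts from unsorted data (which adds copy time), and absolute timings will differ by machine and NumPy version.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
table = rng.random((5000, 10))


def sort_argsort(t):
    # Index with the argsort of the last column; returns a new array.
    return t[t[:, 9].argsort()]


def sort_structured(t):
    # Sort a copy in place through a structured view, one f8 field per column.
    t = t.copy()
    t.view(",".join(["f8"] * t.shape[1])).sort(order=["f9"], axis=0)
    return t


# Both approaches should yield the same row order
# (ties are not expected with random floats).
same_result = np.array_equal(sort_argsort(table), sort_structured(table))
print("same result:", same_result)

for fn in (sort_argsort, sort_structured):
    seconds = timeit.timeit(lambda: fn(table), number=20)
    print(f"{fn.__name__}: {seconds / 20 * 1e6:.0f} µs per call")
```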

Peter Mortensen · Accepted Answer · 2017-05-26 10:00:55Z

From the NumPy mailing list, here's another solution:

>>> a
array([[1, 2],
       [0, 0],
       [1, 0],
       [0, 2],
       [2, 1],
       [1, 0],
       [1, 0],
       [0, 0],
       [1, 0],
       [2, 2]])
>>> a[np.lexsort(np.fliplr(a).T)]
array([[0, 0],
       [0, 0],
       [0, 2],
       [1, 0],
       [1, 0],
       [1, 0],
       [1, 0],
       [1, 2],
       [2, 1],
       [2, 2]])

The correct generalization is a[np.lexsort(a.T[cols])], where cols=[1] in the original question.
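A short sketch of that generalization follows. The second array and the variable names are illustrative; the point to keep in mind is that np.lexsort treats the last key as the primary one, so cols lists the least significant column first.

```python
import numpy as np

# The array from the question, sorted by its second column.
a = np.array([[9, 2, 3], [4, 5, 6], [7, 0, 5]])
cols = [1]
by_second = a[np.lexsort(a.T[cols])]

# Sort by column 0 first, then column 1: the primary key goes last.
b = np.array([[1, 2], [0, 0], [1, 0], [0, 2]])
cols = [1, 0]
by_first_then_second = b[np.lexsort(b.T[cols])]

print(by_second)
print(by_first_then_second)
```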

Mateen Ulhaq · Accepted Answer · 2022-06-20 03:05:14Z

24

As the Python documentation wiki suggests:

a = [[1, 2, 3], [4, 5, 6], [0, 0, 1]]
a = sorted(a, key=lambda a_entry: a_entry[1])
print(a)

Output:

[[0, 0, 1], [1, 2, 3], [4, 5, 6]]

edited Jun 20, 2022 at 3:05

Mateen Ulhaq


answered Sep 28, 2011 at 20:05

user541064


4 Comments

Eric O. Lebigot Over a year ago

With this solution, one gets a list instead of a NumPy array, so this might not always be convenient (takes more memory, is probably slower, etc.).

Jivan Over a year ago

this "solution" is slower than the most-upvoted answer by a factor of ... well, close to infinity actually

Antony Hatchkins Over a year ago

@Jivan Actually, this solution is faster than the most-upvoted answer by a factor of 5 imgur.com/a/IbqtPBL

Kelly Bundy Over a year ago

@AntonyHatchkins But this doesn't do the whole job. Produces a list instead of an array. I get similar times as yours (3.02 ms vs 549 μs), but if I finish this by applying np.array to the result, it goes up to 4.3 ms.

Peter Mortensen · Accepted Answer · 2017-05-26 10:09:52Z

I had a similar problem.

My Problem:

I want to calculate an SVD and need to sort my eigenvalues in descending order. But I want to keep the mapping between eigenvalues and eigenvectors. My eigenvalues were in the first row and the corresponding eigenvector below it in the same column.

So I want to sort a two-dimensional array column-wise by the first row in descending order.

My Solution

a = a[::, a[0,].argsort()[::-1]]

So how does this work?

a[0,] is just the first row I want to sort by.

Now I use argsort to get the order of indices.

I use [::-1] because I need descending order.

Lastly I use a[::, ...] to get a new array with the columns in the right order (indexing with an integer array returns a copy, not a view).
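The steps above can be traced on a small array. The values below are made up for illustration (they are not real eigenvalues or eigenvectors): the first row plays the role of the sort keys and the remaining rows follow their columns.

```python
import numpy as np

a = np.array([[2.0, 5.0, 1.0],      # row 0: the values to sort by
              [10.0, 20.0, 30.0],   # rows below travel with their column
              [11.0, 21.0, 31.0]])

# Column order that sorts row 0 ascending, then reversed for descending.
order = a[0, :].argsort()[::-1]

# Reorder the columns; each column stays intact.
a_sorted = a[:, order]
print(a_sorted)
```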

David Buck · Accepted Answer · 2020-06-27 09:14:07Z

import numpy as np
a = np.array([[21,20,19,18,17],[16,15,14,13,12],[11,10,9,8,7],[6,5,4,3,2]])
y = np.argsort(a[:,2], kind='mergesort')  # a[:,2] = [19,14,9,4]
a = a[y]
print(a)

Desired output is [[6,5,4,3,2],[11,10,9,8,7],[16,15,14,13,12],[21,20,19,18,17]]

Note that argsort(numArray) returns the indices that would arrange numArray in sorted order.

Example:

x = np.array([8,1,5])
z = np.argsort(x)  # [1,2,0] are the indices that would sort the array
print(x[z])  # integer-array indexing, which reorders x using the indices saved in z

answer would be [1,5,8]

hpaulj · Accepted Answer · 2016-08-07 16:33:59Z

A little more complicated lexsort example - descending on the 1st column, secondarily ascending on the 2nd. The tricks with lexsort are that it sorts on rows (hence the .T), and gives priority to the last.

In [120]: b=np.array([[1,2,1],[3,1,2],[1,1,3],[2,3,4],[3,2,5],[2,1,6]])

In [121]: b
Out[121]:
array([[1, 2, 1],
       [3, 1, 2],
       [1, 1, 3],
       [2, 3, 4],
       [3, 2, 5],
       [2, 1, 6]])

In [122]: b[np.lexsort(([1,-1]*b[:,[1,0]]).T)]
Out[122]:
array([[3, 1, 2],
       [3, 2, 5],
       [2, 1, 6],
       [2, 3, 4],
       [1, 1, 3],
       [1, 2, 1]])
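The same ordering can be written with the keys spelled out one by one, which some readers may find easier to follow. This is a sketch of an equivalent call, not the answer's own code; the negation trick assumes a signed numeric dtype.

```python
import numpy as np

b = np.array([[1, 2, 1], [3, 1, 2], [1, 1, 3], [2, 3, 4], [3, 2, 5], [2, 1, 6]])

# Keys are listed least significant first; the last key is the primary one.
# Negating column 0 turns its ascending sort into a descending one.
order = np.lexsort((b[:, 1], -b[:, 0]))
result = b[order]
print(result)
```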

rubengavidia0x · Accepted Answer · 2022-03-04 23:09:08Z

Pandas Approach Just For Completeness:

import numpy as np
import pandas as pd

a = np.array([[9, 2, 3], [4, 5, 6], [7, 0, 5]])
a = pd.DataFrame(a)
a.sort_values(1, ascending=True).to_numpy()  # '1' means sort by second column

array([[7, 0, 5],
       [9, 2, 3],
       [4, 5, 6]])

prl900 Did the Benchmark, comparing with the accepted answer:

%timeit pandas_df.sort_values(9, ascending=True)
1000 loops, best of 3: 400 µs per loop

%timeit numpy_table[numpy_table[:,9].argsort()]
10000 loops, best of 3: 180 µs per loop

Sefa · Accepted Answer · 2018-01-30 19:36:58Z

Here is another solution considering all columns (more compact way of J.J's answer);

ar=np.array([[0, 0, 0, 1], [1, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0], [1, 1, 0, 0]])

Sort with lexsort,

ar[np.lexsort(([ar[:, i] for i in range(ar.shape[1]-1, -1, -1)]))]

Output:

array([[0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 1], [1, 0, 1, 0], [1, 1, 0, 0]])

Ehsan · Accepted Answer · 2020-04-27 04:59:01Z

It is an old question, but if you need to generalize this to arrays with more than two dimensions, here is a solution that can be easily generalized:

np.einsum('ij->ij', a[a[:,1].argsort(),:])

This is an overkill for two dimensions and a[a[:,1].argsort()] would be enough per @steve's answer, however that answer cannot be generalized to higher dimensions. You can find an example of 3D array in this question.

Output:

[[7 0 5]
 [9 2 3]
 [4 5 6]]
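Since the linked 3D example is not reproduced here, the following is a separate sketch of one way to handle a stack of 2-D blocks. It swaps in np.take_along_axis rather than the einsum approach above, and the array `stack` is made up for illustration: each block is sorted by its own column 1.

```python
import numpy as np

stack = np.array([
    [[9, 2, 3], [4, 5, 6], [7, 0, 5]],
    [[1, 8, 0], [2, 3, 0], [3, 1, 0]],
])

# Row order for each block, based on column 1 of that block: shape (2, 3).
order = stack[:, :, 1].argsort(axis=1)

# Add a trailing axis so the row indices broadcast across all columns.
sorted_stack = np.take_along_axis(stack, order[:, :, None], axis=1)
print(sorted_stack)
```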

umair ali · Accepted Answer · 2020-08-15 08:45:00Z

#for sorting along column 1

indexofsort = np.argsort(dataset[:,0], axis=-1, kind='stable')
dataset = dataset[indexofsort,:]

Arkady · Accepted Answer · 2021-01-31 14:58:57Z

def sort_np_array(x, column=None, flip=False):
    x = x[np.argsort(x[:, column])]
    if flip:
        x = np.flip(x, axis=0)
    return x

Array in the original question:

a = np.array([[9, 2, 3], [4, 5, 6], [7, 0, 5]])

The result of the sort_np_array function as expected by the author of the question:

sort_np_array(a, column=1, flip=False)

[2]: array([[7, 0, 5], [9, 2, 3], [4, 5, 6]])

lhoupert · Accepted Answer · 2021-06-01 12:12:15Z

Thanks to this post: https://stackoverflow.com/a/5204280/13890678

I found a more "generic" answer using structured array. I think one advantage of this method is that the code is easier to read.

import numpy as np

a = np.array([[9, 2, 3], [4, 5, 6], [7, 0, 5]])

struct_a = np.core.records.fromarrays(
    a.transpose(), names="col1, col2, col3", formats="i8, i8, i8"
)
struct_a.sort(order="col2")

print(struct_a)

[(7, 0, 5) (9, 2, 3) (4, 5, 6)]

marc_s · Accepted Answer · 2022-03-05 08:17:05Z

Simply use sorted, with a key that selects the column you want to sort by.

a = np.array([[1,1], [1,-1], [-1,1], [-1,-1]])
print(a)
a = a.tolist()
a = np.array(sorted(a, key=lambda a_entry: a_entry[0]))
print(a)
