How to use awk sort by column 3

Question

I have a file (user.csv) like this:

ip,hostname,user,group,encryption,aduser,adattr

I want to print all columns, sorted by user.

I tried awk -F ":" '{print|"$3 sort -n"}' user.csv, but it doesn't work.

sort -t, -k3 file

– Kevin, Commented Jun 11, 2013 at 15:39

jaypal singh · Accepted Answer · 2013-06-11 15:46:40Z

257

How about just sort.

sort -t, -nk3 user.csv

where

-t, - defines your delimiter as ,.
-n - gives you a numerical sort. Added since you used it in your attempt. If your user field is text only, then you don't need it.
-k3 - defines the field (key). user is the third field.
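To see what the -n flag does to a text-only user field, here is a small sketch. The three rows are made-up sample data, and LC_ALL=C is an assumption added only to make the ordering reproducible. With -n, every non-numeric key counts as 0, so the rows fall back to whole-line order instead of user order.

```shell
# Made-up sample rows (no header line), written to a temporary directory.
tmp=$(mktemp -d)
cat > "$tmp/user.csv" <<'EOF'
192.168.0.1,gw,router,router,-,-,-
192.168.0.2,server,admin,admin,-,-,-
192.168.0.3,ws-03,user,user,-,-,-
EOF

# Text key: compare field 3 only (-k3,3), bytewise because of LC_ALL=C.
by_text=$(LC_ALL=C sort -t, -k3,3 "$tmp/user.csv" | cut -d, -f3)

# Numeric key on text: every key evaluates to 0, so the tie-break
# (whole-line comparison) decides and the rows stay in IP order.
by_num=$(LC_ALL=C sort -t, -n -k3,3 "$tmp/user.csv" | cut -d, -f3)

# Unquoted expansion joins the lines with spaces for compact display.
printf 'text:    %s\n' "$(echo $by_text)"
printf 'numeric: %s\n' "$(echo $by_num)"
rm -rf "$tmp"
```

So for a text user column, dropping -n is what actually yields the order by user.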

edited Jun 11, 2013 at 15:46

answered Jun 11, 2013 at 15:39

jaypal singh


6 Comments

user2452340 Over a year ago

How can I sort by 2 columns? For example, I want to sort by column 6 first and by column 3 second.

davemyron Over a year ago

This won't work if there are quoted strings containing commas in the CSV (unless the column you want to sort by is earlier than the comma-containing column). You might have to make a pass first with awk (using FPAT="[^,]*|\"[^\"]*\"" and OFS="|" or some other delimiter that you could use with sort)

Matthew Over a year ago

@user2452340 You could do this: sort -t, -nk3 filename.csv | sort -t, -nk6 - first it will sort by column 3, then will sort that by column 6 so column 6 is sorted correctly all the way and for any rows where column 6 is the same, those will be sorted by column 3.

Kusalananda Over a year ago

@Matthew sort -t ',' -k3,3n -k6,6n will be better. -k3 will use column 3 and the rest of the line.

Ricardo Rivera Nieves Over a year ago

I just needed the -t, to divide my 2 column file divided by commas, thanks jaypal
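The two-column question in the comments above can be sketched with bounded keys in a single sort call. The rows and the aduser values (ad1, ad2) are invented for illustration, and LC_ALL=C is assumed for a reproducible byte order. Each -kN,N key stops at its own field, so column 3 only breaks ties in column 6.

```shell
# Invented rows: column 3 is the user, column 6 is the aduser.
tmp=$(mktemp -d)
cat > "$tmp/user.csv" <<'EOF'
10.0.0.1,h1,carol,g,-,ad2,-
10.0.0.2,h2,alice,g,-,ad2,-
10.0.0.3,h3,bob,g,-,ad1,-
EOF

# Primary key: column 6. Secondary key: column 3.
# "-k6,6" limits the key to field 6; a bare "-k6" would run to end of line.
sorted=$(LC_ALL=C sort -t, -k6,6 -k3,3 "$tmp/user.csv")

# Show only the two key columns, one "user,aduser" pair per line.
pairs=$(printf '%s\n' "$sorted" | cut -d, -f3,6)
printf '%s\n' "$pairs"
rm -rf "$tmp"
```

Append n to a key (for example -k6,6n) only if that column really holds numbers.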


user3781670 · Accepted Answer · 2014-06-27 07:48:08Z

33

Use awk to put the user ID in front.
Sort
Use sed to remove the duplicate user ID, assuming user IDs do not contain any spaces.
```
awk -F, '{ print $3, $0 }' user.csv | sort | sed 's/^[^ ]* //'
```

answered Jun 27, 2014 at 7:48

user3781670


4 Comments

skytaker Over a year ago

This is very useful, especially if you need to parse or combine columns to add a sort field, then retain only the original line. I used awk/split to parse/combine date & time fields for a sort, then remove.

tripleee Over a year ago

sort already knows how to sort by a particular column, but this technique -- known as the Schwartzian transform -- is useful when the field you want to sort on is not trivially a well-defined column.

Tan Yu Hau Sean Over a year ago

Would the print $3, $0 print the sorted columns, or would it print the before-sorting columns?

ruoho ruotsi Jul 9 at 3:12

You can test it on a short file with vs without the sort to verify. The answer is, the print step is completed first, the output of which is piped | to sort. So there would NEVER be any printing before sorting because of the order of operations with pipe |
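As a sketch of the case the comments describe, where the sort key is not a plain column, the example below decorates each line with the last octet of the IP address, sorts on that numerically, then strips the decoration. The sample rows, the tab used as the decoration separator, and the variable names are all assumptions made for this illustration.

```shell
# Made-up rows; a plain text sort would place 192.168.0.10 before 192.168.0.2.
tmp=$(mktemp -d)
cat > "$tmp/user.csv" <<'EOF'
192.168.0.10,ws-10,user
192.168.0.2,server,admin
192.168.0.1,gw,router
EOF

tab=$(printf '\t')

# 1. Decorate: prefix each line with the computed key and a tab.
# 2. Sort: numeric sort on the first tab-separated field only.
# 3. Undecorate: drop the key, keep the original line (cut splits on tab).
sorted=$(awk -F, '{ n = split($1, octet, "."); print octet[n] "\t" $0 }' "$tmp/user.csv" |
    sort -t "$tab" -k1,1n |
    cut -f2-)

printf '%s\n' "$sorted"
rm -rf "$tmp"
```

Using a tab instead of a space as the separator keeps the undecorate step safe when the data itself contains spaces.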

dagelf · Accepted Answer · 2024-10-30 22:21:30Z

Seeing that the original question asked how to use awk, that every single one of the first 7 answers uses sort instead, and that this is the top hit on Google, here is how to do it in awk.

Sample net.csv file with headers:

ip,hostname,user,group,encryption,aduser,adattr
192.168.0.1,gw,router,router,-,-,-
192.168.0.2,server,admin,admin,-,-,-
192.168.0.3,ws-03,user,user,-,-,-
192.168.0.4,ws-04,user,user,-,-,-

And sort.awk:

#!/usr/bin/env -S awk -f
#
# original source:
# https://stackoverflow.com/a/65768883/586229
#
# Usage:
#   awk -f sort.awk [-F<field separator>] [-v h=HAS_HEADER] [-v f=COLUMN_TO_SORT_BY] INPUT_FILE
# Examples:
#   awk -f sort.awk -F, -v h=1 -v f=1 input.csv > output.csv
#   cat input.txt | awk -f sort.awk | tee -a output.txt

# for each line
{
    if (h && NR == 1) {
        # the header is the first record; print it unsorted
        print $0
    } else {
        a[NR-h] = $0 ""   # full line
        s[NR-h] = $f ""   # sort key, forced to a string
    }
}

END {
    isort(s, a, NR-h)
    for (i = 1; i <= NR-h; i++) {
        print a[i]
    }
}

# insertion sort of A[1..n], ordered by the keys in S[1..n]
# (i, j, hs, ha are local variables)
function isort(S, A, n,    i, j, hs, ha) {
    for (i = 2; i <= n; i++) {
        hs = S[j=i]
        ha = A[j]
        # shift larger keys one slot up; stop at the start of the array
        while (j > 1 && S[j-1] > hs) {
            j--
            S[j+1] = S[j]
            A[j+1] = A[j]
        }
        S[j] = hs
        A[j] = ha
    }
}

To use it:
See header in the script.

Update: see my other answer for 100x speedup doing Quicksort instead of Insertion sort.

Thanks for not being part of the problem, dagelf. I literally came here to do exactly this and all the other answers were useless.
If you're sorting A LOT of data, see the quicksort answer, it's a lot faster!

Ketan · Accepted Answer · 2018-06-07 23:17:18Z

You can choose a delimiter. In this case I chose a colon and printed column one, sorted alphabetically (the -u flag also drops duplicate lines):

awk -F\: '{print $1|"sort -u"}' /etc/passwd

vsingh · Accepted Answer · 2016-07-26 15:57:40Z

awk -F, '{ print $3, $0 }' user.csv | sort -nk2

and for reverse order

awk -F, '{ print $3, $0 }' user.csv | sort -nrk2

VIPIN KUMAR · Accepted Answer · 2016-10-09 11:47:39Z

6

try this -

awk '{print $0|"sort -t',' -nk3 "}' user.csv

OR

sort -t',' -nk3 user.csv

answered Oct 9, 2016 at 11:47

VIPIN KUMAR


1 Comment

AnthonyK Over a year ago

Is the | to sort using a built-in to awk? If not, any idea why -V - version sort - would not work here? Also, if I choose not to use -t option, it seems that to select the third column, I need to use -k4 - odd indeed!

Francesco · Accepted Answer · 2020-05-25 04:09:17Z

4

awk -F "," '{print $0}' user.csv | sort -nk3 -t ','

This should work

edited May 25, 2020 at 4:09

Francesco


answered May 24, 2020 at 17:42

user13608932


rupert160 · Accepted Answer · 2020-03-25 02:19:41Z

To exclude the first line (header) from sorting, I split it out into two buffers.

df | awk 'BEGIN{header=""; body=""} { if(NR==1){header=$0}else{body=(body=="" ? $0 : body"\n"$0)}} END{print header; print body|"sort -nk3"}'
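For a regular file rather than a pipe, a simpler variant of the same idea is to read the file twice, once for the header and once for the body. This is only a sketch: the function name sort_keep_header, the sample rows, and the LC_ALL=C setting are assumptions for the example, and it sorts on column 3 as text.

```shell
# Made-up sample file with a header line.
tmp=$(mktemp -d)
cat > "$tmp/user.csv" <<'EOF'
ip,hostname,user,group,encryption,aduser,adattr
192.168.0.3,ws-03,user,user,-,-,-
192.168.0.1,gw,router,router,-,-,-
192.168.0.2,server,admin,admin,-,-,-
EOF

# Hypothetical helper: print line 1 as is, then lines 2..end sorted by field 3.
sort_keep_header() {
    head -n 1 "$1"
    tail -n +2 "$1" | LC_ALL=C sort -t, -k3,3
}

sorted=$(sort_keep_header "$tmp/user.csv")
printf '%s\n' "$sorted"
rm -rf "$tmp"
```

Because the header never reaches sort, it stays on top regardless of the sort flags used for the body.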

Cyrus · Accepted Answer · 2021-11-22 18:35:37Z

With GNU awk:

awk -F ',' '{ a[$3]=$0 } END{ PROCINFO["sorted_in"]="@ind_str_asc"; for(i in a) print a[i] }' file

See 8.1.6 Using Predefined Array Scanning Orders with gawk for more sort orders.

dagelf · Accepted Answer · 2024-12-19 13:54:02Z

Here is another actual awk sort script. This script is slightly longer, but 100x faster.

#!/usr/bin/env -S awk -f
# Awk Quicksort
# Usage:
#   awk -f sort.awk [-F<field separator>] [-v header=1] [-v field=N] [-v reverse=1] [-v numeric=1] INPUT_FILE
# Examples:
#   awk -f sort.awk -F, -v header=1 -v field=1 input.csv > sorted.csv
#   awk -f sort.awk -F, -v header=1 -v field=2 -v reverse=1 -v numeric=1 input.csv > sorted.csv
#   cat input.txt | awk -f sort.awk -v field=3 | tee output.txt

BEGIN {
    # Initialize variables if not set
    header  = (header == "")  ? 0 : header    # Whether input has a header row
    field   = (field == "")   ? 1 : field     # Column to sort by
    reverse = (reverse == "") ? 0 : reverse   # Whether to sort in reverse order
    numeric = (numeric == "") ? 0 : numeric   # Whether to use numeric sorting
}

# Store header separately if present
NR == 1 && header {
    header_line = $0
    next
}

# Store each line and its key for sorting
{
    # Store the full line
    lines[NR - header] = $0

    # Extract the sort key
    if (field <= NF) {
        keys[NR - header] = $field
    } else {
        # If field number is larger than available fields, use empty string
        keys[NR - header] = ""
    }
}

END {
    # Print header if present
    if (header) {
        print header_line
    }

    # Sort and print the data
    n = length(lines)
    quicksort(keys, lines, 1, n)

    # Print sorted results
    for (i = 1; i <= n; i++) {
        idx = reverse ? n - i + 1 : i
        print lines[idx]
    }
}

# Quicksort implementation
# (names after "right" are local variables, so recursive calls
# do not overwrite the caller's i and j)
function quicksort(keys, lines, left, right,    i, j, pivot_idx, pivot, temp_key, temp_line) {
    if (left >= right) return

    # Choose pivot (middle element)
    pivot_idx = int((left + right) / 2)
    pivot = keys[pivot_idx]

    # Partition
    i = left
    j = right
    while (i <= j) {
        while (compare(keys[i], pivot) < 0) i++
        while (compare(keys[j], pivot) > 0) j--
        if (i <= j) {
            # Swap elements
            temp_key = keys[i]
            temp_line = lines[i]
            keys[i] = keys[j]
            lines[i] = lines[j]
            keys[j] = temp_key
            lines[j] = temp_line
            i++
            j--
        }
    }

    # Recursive calls
    if (left < j) quicksort(keys, lines, left, j)
    if (i < right) quicksort(keys, lines, i, right)
}

# Comparison function that handles both numeric and string comparisons
function compare(a, b) {
    if (numeric) {
        return (a + 0) - (b + 0)   # Force numeric comparison
    }
    return a < b ? -1 : (a > b ? 1 : 0)
}

To use it:
For usage examples, see the header in the script. Or save it to a file called sort.awk, and then chmod +x sort.awk, then you can call it like any other program, with ./sort.awk ...

user272735 · Accepted Answer · 2022-09-29 09:03:07Z

I'm running Linux (Ubuntu) with mawk:

tmp$ awk -W version
mawk 1.3.4 20200120
Copyright 2008-2019,2020, Thomas E. Dickey
Copyright 1991-1996,2014, Michael D. Brennan

random-funcs:       srandom/random
regex-funcs:        internal
compiled limits:
sprintf buffer      8192
maximum-integer     2147483647

mawk (and gawk) has an option to redirect the output of print to a command. From man awk chapter 9. Input and output:

The output of print and printf can be redirected to a file or command by appending > file, >> file or | command to the end of the print statement. Redirection opens file or command only once, subsequent redirections append to the already open stream.

Below you'll find a simplified example of how | can be used to pass the wanted records to an external program that does the hard work. This also nicely encapsulates everything in a single awk file and reduces the command line clutter:

tmp$ cat input.csv
alpha,num
D,4
B,2
A,1
E,5
F,10
C,3

tmp$ cat sort.awk
# print header line
/^alpha,num/ { print }

# all other lines are data lines that should be sorted
!/^alpha,num/ { print | "sort --field-separator=, --key=2 --numeric-sort" }

tmp$ awk -f sort.awk input.csv
alpha,num
A,1
B,2
C,3
D,4
E,5
F,10

See man sort for the details of the sort options:

-t, --field-separator=SEP
       use SEP instead of non-blank to blank transition

-k, --key=KEYDEF
       sort via a key; KEYDEF gives location and type

-n, --numeric-sort
       compare according to string numerical value
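One consequence of the print | command redirection quoted from the manual above: sort emits nothing until its input pipe is closed. The sketch below, with made-up input and a hypothetical footer line, calls close() on the command string in END so the sorted rows appear before anything awk prints afterwards.

```shell
# Made-up two-column input; the second column is numeric.
out=$(printf 'D,4\nB,2\nF,10\nA,1\n' | awk '
    BEGIN { cmd = "sort -t, -k2,2n" }
    { print | cmd }          # every row goes to the same sort process
    END {
        close(cmd)           # wait for sort to finish and write its output
        print "rows: " NR    # hypothetical footer, printed after the rows
    }')

printf '%s\n' "$out"
```

Keeping the command in a variable (cmd) matters here, since close() must receive exactly the same string that was used after the |.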

One can correctly argue this answer has no new information. However, I found most of the other answers unnecessarily terse and I had to dig out the magic of | from the mighty manual. So I wrote a note to myself for the next time :)
