143

I have a file (user.csv) like this:

ip,hostname,user,group,encryption,aduser,adattr 

I want to print all columns, sorted by user.

I tried awk -F ":" '{print|"$3 sort -n"}' user.csv, but it doesn't work.

    sort -t, -k3 file Commented Jun 11, 2013 at 15:39

11 Answers

257

How about just using sort?

sort -t, -nk3 user.csv 

where

  • -t, - defines your delimiter as ,.

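  • -n - gives you a numerical sort. Added since you used it in your attempt. If your user field is text only, drop it: with -n, keys that contain no number are all treated as zero and so compare as equal.

  • -k3 - defines the sort key. user is the third field. Note that -k3 means "from field 3 to the end of the line"; use -k3,3 to restrict the key to the third field alone.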
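
As a minimal sketch of the options above, assuming a text-only user column (so -n is left out and the key is limited with -k3,3). The file name demo_users.csv and its rows are made up for illustration, and there is no header row, since sort would treat a header like any other line:

```shell
# Create a small sample file (hypothetical data, same columns as user.csv).
tmpdir=$(mktemp -d)
cat > "$tmpdir/demo_users.csv" <<'EOF'
192.168.0.3,ws-03,user,user,-,-,-
192.168.0.1,gw,router,router,-,-,-
192.168.0.2,server,admin,admin,-,-,-
EOF

# -t, sets the delimiter; -k3,3 limits the key to the third field only.
# LC_ALL=C makes the ordering byte-wise and therefore reproducible.
sorted_by_user=$(LC_ALL=C sort -t, -k3,3 "$tmpdir/demo_users.csv")
printf '%s\n' "$sorted_by_user"

rm -r "$tmpdir"
```

The rows should come out in the order admin, router, user.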


6 Comments

How can I sort by 2 columns? For example, I want to sort by column 6 first, and by column 3 second.
This won't work if there are quoted strings containing commas in the CSV (unless the column you want to sort by is earlier than the comma-containing column). You might have to make a pass first with awk (using FPAT="[^,]*|\"[^\"]*\"" and OFS="|" or some other delimiter that you could use with sort)
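@user2452340 You could do this: sort -t, -nk3 filename.csv | sort -s -t, -nk6. First it sorts by column 3, then it sorts that output by column 6, so column 6 is sorted correctly all the way, and rows where column 6 is the same stay sorted by column 3. Note the -s (stable) flag on the second sort: without it, rows that tie on column 6 are reordered by a whole-line comparison instead of keeping their column 3 order.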
@Matthew sort -t ',' -k6,6n -k3,3n will be better: a single pass, with column 6 as the primary key and column 3 as the tie-breaker. A bare -k3 would use column 3 and the rest of the line.
I just needed the -t, to divide my 2 column file divided by commas, thanks jaypal
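
A small sketch of the two-column case discussed in these comments, assuming text keys and made-up rows (demo.csv is a hypothetical file). Each -k option names one key, and the keys are applied in the order given, so column 6 is the primary key and column 3 only breaks ties:

```shell
# Hypothetical sample: two rows share the same value in column 6 ("bob").
tmpdir=$(mktemp -d)
cat > "$tmpdir/demo.csv" <<'EOF'
10.0.0.1,h1,zoe,g,-,bob,-
10.0.0.2,h2,amy,g,-,bob,-
10.0.0.3,h3,max,g,-,al,-
EOF

# Primary key: field 6 only. Secondary key: field 3 only.
by_two_keys=$(LC_ALL=C sort -t, -k6,6 -k3,3 "$tmpdir/demo.csv")
printf '%s\n' "$by_two_keys"

rm -r "$tmpdir"
```

The "al" row should come first, followed by the two "bob" rows ordered amy, then zoe.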
33
  1. Use awk to put the user ID in front.
  2. Sort
  3. Use sed to remove the duplicate user ID, assuming user IDs do not contain any spaces.

    awk -F, '{ print $3, $0 }' user.csv | sort | sed 's/^[^ ]* //'

4 Comments

This is very useful, especially if you need to parse or combine columns to add a sort field, then retain only the original line. I used awk/split to parse/combine date & time fields for a sort, then remove.
sort already knows how to sort by a particular column, but this technique -- known as the Schwartzian transform -- is useful when the field you want to sort on is not trivially a well-defined column.
Would the print $3, $0 print the sorted columns, or would it print the before-sorting columns?
You can test it on a short file with vs without the sort to verify. The answer is, the print step is completed first, the output of which is piped | to sort. So there would NEVER be any printing before sorting because of the order of operations with pipe |
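
To illustrate the decorate, sort, undecorate idea from these comments with a key that is not a plain column, here is a sketch that orders rows by the last octet of the IP address. The sample rows and the file name demo.csv are assumptions for illustration; a tab is used as the separator between the key and the line so that spaces inside the data do no harm:

```shell
tmpdir=$(mktemp -d)
cat > "$tmpdir/demo.csv" <<'EOF'
192.168.0.20,ws-20,carol,user,-,-,-
192.168.0.3,ws-03,alice,user,-,-,-
192.168.0.100,ws-100,bob,user,-,-,-
EOF

tab=$(printf '\t')

# Decorate:   prepend the last octet of the IP (a derived key) plus a tab.
# Sort:       numerically on that first tab-separated field.
# Undecorate: cut drops the key again, leaving the original line.
by_last_octet=$(
  awk -F, '{ n = split($1, octet, "[.]"); print octet[n] "\t" $0 }' "$tmpdir/demo.csv" |
    LC_ALL=C sort -t "$tab" -k1,1n |
    cut -f2-
)
printf '%s\n' "$by_last_octet"

rm -r "$tmpdir"
```

The rows should come out as .3, .20, .100, which a plain text sort on the first column would not give.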
20

Seeing that the original question was about how to use awk, that every single one of the first 7 answers uses sort instead, and that this is the top hit on Google, here is how to use awk.

Sample net.csv file with headers:

    ip,hostname,user,group,encryption,aduser,adattr
    192.168.0.1,gw,router,router,-,-,-
    192.168.0.2,server,admin,admin,-,-,-
    192.168.0.3,ws-03,user,user,-,-,-
    192.168.0.4,ws-04,user,user,-,-,-

And sort.awk:

    #!/usr/bin/env -S awk -f
    #
    # original source:
    #   https://stackoverflow.com/a/65768883/586229
    #
    # Usage:
    #   awk -f sort.awk [-F<field separator>] [-v h=HAS_HEADER] [-v f=COLUMN_TO_SORT_BY] INPUT_FILE
    # Examples:
    #   awk -f sort.awk -F, -v h=1 -v f=1 input.csv > output.csv
    #   cat input.txt | awk -f sort.awk | tee -a output.txt

    # for each line
    {
      if (h && NR == 1) {
        print $0            # header: pass it through unsorted (NR starts at 1)
      } else {
        a[NR-h] = $0 ""     # whole line
        s[NR-h] = $f ""     # sort key, forced to a string
      }
    }

    END {
      isort(s, a, NR-h);
      for (i = 1; i <= NR-h; i++) {
        print a[i]
      }
    }

    # insertion sort of A[1..n], ordered by the keys in S[1..n]
    # (i, j, hs, ha are extra parameters so that they are local)
    function isort(S, A, n,    i, j, hs, ha) {
      for (i = 2; i <= n; i++) {
        hs = S[j=i]
        ha = A[j=i]
        while (j > 1 && S[j-1] > hs) {
          j--;
          S[j+1] = S[j]
          A[j+1] = A[j]
        }
        S[j] = hs
        A[j] = ha
      }
    }

To use it:
See header in the script.

Update: see my other answer for 100x speedup doing Quicksort instead of Insertion sort.

3 Comments

Thanks for actually answering the user's question...
Thanks for not being part of the problem, dagelf. I literally came here to do exactly this and all the other answers were useless.
If you're sorting A LOT of data, see the quicksort answer, it's a lot faster!
12

You can choose a delimiter; in this case I chose a colon and printed the first column, sorted alphabetically (sort -u also drops duplicate lines):

awk -F\: '{print $1|"sort -u"}' /etc/passwd 


10
awk -F, '{ print $3, $0 }' user.csv | sort -k1,1

and for reverse order

awk -F, '{ print $3, $0 }' user.csv | sort -rk1,1


6

Try this:

awk '{print $0|"sort -t',' -nk3 "}' user.csv 

OR

sort -t',' -nk3 user.csv 

1 Comment

Is the | to sort using a built-in to awk? If not, any idea why -V - version sort - would not work here? Also, if I choose not to use -t option, it seems that to select the third column, I need to use -k4 - odd indeed!
4
awk -F "," '{print $0}' user.csv | sort -nk3 -t ',' 

This should work


3

To exclude the first line (header) from sorting, I split it out into two buffers.

df | awk 'BEGIN{header=""; body=""} { if(NR==1){header=$0}else{body=(body=="" ? $0 : body"\n"$0)}} END{print header; print body|"sort -nk3"}'
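
A different way to keep the header out of the sort, sketched here in plain shell rather than awk: read takes the first line off the input, and sort then sees only the remaining rows. The sample file demo.csv is an assumption for illustration, with the same columns as the question's user.csv:

```shell
tmpdir=$(mktemp -d)
cat > "$tmpdir/demo.csv" <<'EOF'
ip,hostname,user,group,encryption,aduser,adattr
192.168.0.3,ws-03,user,user,-,-,-
192.168.0.1,gw,router,router,-,-,-
192.168.0.2,server,admin,admin,-,-,-
EOF

# read consumes exactly the first line of the file; the sort that follows
# inherits the same input and therefore only sees the data rows.
sorted_with_header=$(
  {
    IFS= read -r header_line
    printf '%s\n' "$header_line"
    LC_ALL=C sort -t, -k3,3
  } < "$tmpdir/demo.csv"
)
printf '%s\n' "$sorted_with_header"

rm -r "$tmpdir"
```

Because nothing is collected in a variable first, this also avoids holding the whole body in memory.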


2

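With GNU awk (the zero-padded line number is appended to the array index so that rows sharing the same user do not overwrite each other):

awk -F ',' '{ a[$3 SUBSEP sprintf("%09d", NR)]=$0 } END{ PROCINFO["sorted_in"]="@ind_str_asc"; for(i in a) print a[i] }' file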

See 8.1.6 Using Predefined Array Scanning Orders with gawk for the other predefined orders.


1

Here is another actual awk sort script. This script is slightly longer, but 100x faster.

    #!/usr/bin/env -S awk -f
    # Awk Quicksort
    # Usage:
    #   awk -f sort.awk [-F<field separator>] [-v header=1] [-v field=N] [-v reverse=1] [-v numeric=1] INPUT_FILE
    # Examples:
    #   awk -f sort.awk -F, -v header=1 -v field=1 input.csv > sorted.csv
    #   awk -f sort.awk -F, -v header=1 -v field=2 -v reverse=1 -v numeric=1 input.csv > sorted.csv
    #   cat input.txt | awk -f sort.awk -v field=3 | tee output.txt

    BEGIN {
      # Initialize variables if not set
      header  = (header == "")  ? 0 : header    # Whether input has a header row
      field   = (field == "")   ? 1 : field     # Column to sort by
      reverse = (reverse == "") ? 0 : reverse   # Whether to sort in reverse order
      numeric = (numeric == "") ? 0 : numeric   # Whether to use numeric sorting
    }

    # Store header separately if present
    NR == 1 && header {
      header_line = $0
      next
    }

    # Store each line and its key for sorting
    {
      # Store the full line
      lines[NR - header] = $0
      # Extract the sort key
      if (field <= NF) {
        keys[NR - header] = $field
      } else {
        # If field number is larger than available fields, use empty string
        keys[NR - header] = ""
      }
    }

    END {
      # Print header if present
      if (header) {
        print header_line
      }
      # Sort and print the data
      n = length(lines)
      quicksort(keys, lines, 1, n)
      # Print sorted results
      for (i = 1; i <= n; i++) {
        idx = reverse ? n - i + 1 : i
        print lines[idx]
      }
    }

    # Quicksort implementation
    # The names after the extra spaces are local variables. awk variables are
    # global otherwise, so the recursive calls would overwrite i and j.
    function quicksort(keys, lines, left, right,    i, j, pivot_idx, pivot, temp_key, temp_line) {
      if (left >= right) return
      # Choose pivot (middle element)
      pivot_idx = int((left + right) / 2)
      pivot = keys[pivot_idx]
      # Partition
      i = left
      j = right
      while (i <= j) {
        while (compare(keys[i], pivot) < 0) i++
        while (compare(keys[j], pivot) > 0) j--
        if (i <= j) {
          # Swap elements
          temp_key = keys[i]
          temp_line = lines[i]
          keys[i] = keys[j]
          lines[i] = lines[j]
          keys[j] = temp_key
          lines[j] = temp_line
          i++
          j--
        }
      }
      # Recursive calls
      if (left < j) quicksort(keys, lines, left, j)
      if (i < right) quicksort(keys, lines, i, right)
    }

    # Comparison function that handles both numeric and string comparisons
    function compare(a, b) {
      if (numeric) {
        return (a + 0) - (b + 0)  # Force numeric comparison
      }
      return a < b ? -1 : (a > b ? 1 : 0)
    }

To use it:
For usage examples, see the header in the script. Or save it to a file called sort.awk, and then chmod +x sort.awk, then you can call it like any other program, with ./sort.awk ...


0

I'm running Linux (Ubuntu) with mawk:

    tmp$ awk -W version
    mawk 1.3.4 20200120
    Copyright 2008-2019,2020, Thomas E. Dickey
    Copyright 1991-1996,2014, Michael D. Brennan

    random-funcs:       srandom/random
    regex-funcs:        internal
    compiled limits:
    sprintf buffer      8192
    maximum-integer     2147483647

mawk (and gawk) has an option to redirect the output of print to a command. From man awk chapter 9. Input and output:

The output of print and printf can be redirected to a file or command by appending > file, >> file or | command to the end of the print statement. Redirection opens file or command only once, subsequent redirections append to the already open stream.

Below you'll find a simplified example of how | can be used to pass the wanted records to an external program that does the hard work. This also nicely encapsulates everything in a single awk file and reduces the command line clutter:

    tmp$ cat input.csv
    alpha,num
    D,4
    B,2
    A,1
    E,5
    F,10
    C,3

    tmp$ cat sort.awk
    # print header line
    /^alpha,num/ { print }

    # all other lines are data lines that should be sorted
    !/^alpha,num/ { print | "sort --field-separator=, --key=2 --numeric-sort" }

    tmp$ awk -f sort.awk input.csv
    alpha,num
    A,1
    B,2
    C,3
    D,4
    E,5
    F,10

See man sort for the details of the sort options:

    -t, --field-separator=SEP
           use SEP instead of non-blank to blank transition
    -k, --key=KEYDEF
           sort via a key; KEYDEF gives location and type
    -n, --numeric-sort
           compare according to string numerical value
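
To see what the -n option changes, here is a small sketch that sorts a few of the rows from input.csv above twice, once numerically and once as text. The short option forms are used, and the variable names are only for illustration:

```shell
data='D,4
B,2
F,10
A,1'

# Numeric comparison on field 2: 10 sorts after 4.
numeric=$(printf '%s\n' "$data" | LC_ALL=C sort -t, -k2,2n)

# Text comparison on field 2: "10" sorts right after "1".
lexical=$(printf '%s\n' "$data" | LC_ALL=C sort -t, -k2,2)

printf 'numeric:\n%s\n\nlexical:\n%s\n' "$numeric" "$lexical"
```

The row F,10 should end up last in the numeric run and second in the text run.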

1 Comment

One can correctly argue this answer has no new information. However, I found most of the other answers unnecessarily terse, and I had to dig out the magic of | from the mighty manual. So I wrote a note to myself for the next time :)
