I have been trying to get the unique values in each column of a tab-delimited file in bash, so I used the following command:

cut -f <column_number> <filename> | sort | uniq -c 

It works fine: I get the unique values in a column along with their counts, like this:

105 Linux
55 MacOS
500 Windows

Instead of sorting by the column values (which in this example are OS names), I want to sort by count, in descending order, and ideally have the count in the second column. So the output would look like:

Windows 500
Linux 105
MacOS 55

How do I do this?

3 Answers

20

You can use (where N is the column number and F is the input file):

cut -f N F |sort |uniq -c |sort -nrk1,1 |awk '{print $2" "$1}' 

The initial sort/uniq is to get each OS in the form <count> <os> so that the rest of the pipeline can work on it.

The sort -nrk1,1 sorts numerically (n), in reverse order (r), using the first field (-k1,1).
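As a small illustration of that sort step on its own, the sketch below feeds it the counts from the question, laid out the way `uniq -c` typically prints them (count first, padded with leading blanks). The `sample_counts` function is a hypothetical stand-in for the first half of the pipeline.

```shell
# Hypothetical stand-in for `cut ... | sort | uniq -c`, using the
# counts quoted in the question.
sample_counts() {
  printf '%s\n' '    105 Linux' '     55 MacOS' '    500 Windows'
}

# -n compares numerically, -r reverses to descending order,
# -k1,1 restricts the sort key to the first field (the count).
sample_counts | sort -nrk1,1
```

The largest count should come out first, with the line contents otherwise unchanged.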

The awk then simply reverses the order of the columns. You can test the full pipeline with the following:

pax> cat test.in
a	Windows
b	Linux
c	Windows
d	Windows
e	Linux
f	Windows
g	MacOS
h	Linux
i	Windows
j	MacOS
k	Windows
l	Linux
m	MacOS
n	Windows
o	Linux
p	MacOS
q	Windows
r	Linux
s	Linux
t	Linux
u	Linux
v	Linux
pax> cut -f2 test.in |sort |uniq -c |sort -nrk1,1 |awk '{print $2" "$1}'
Linux 10
Windows 8
MacOS 4

This test file format is similar in style to your own input, including tabs separating the fields. It's unlikely to be the exact same format so you'll need to tailor the cut command to your own file, in such a way that it only gives you the desired column.

However, you've probably already done that since that's not the bit you're asking about.
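If values in the column might contain spaces, the `$2" "$1` swap would keep only the first word of each value. One possible alternative, which is not the pipeline above but a different technique, is to do the counting in awk and emit tab-separated output. This is only a sketch: `count_column` is a hypothetical helper name, and it assumes the input is tab-delimited as described in the question.

```shell
# count_column N FILE  (hypothetical helper, not part of the original answer)
# Prints "value<TAB>count" for column N of a tab-delimited FILE,
# highest count first.
count_column() {
  awk -F '\t' -v col="$1" '
    { count[$col]++ }                                   # tally each distinct value
    END { for (v in count) printf "%s\t%d\n", v, count[v] }
  ' "$2" | sort -t "$(printf '\t')" -k2,2nr             # numeric, descending, on the count
}
```

It would be called as `count_column 2 test.in`. Because the separator is a tab on both input and output, a value such as "Windows 10" stays intact.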

2

Mine:

cut -f <column_number> <filename> | sort | uniq -c | awk '{ print $2" "$1}' | sort 

This will alter the column order (awk) and then just sort the output.
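A plain `sort` at the end orders the swapped lines by name. If the goal is to order them by count after the swap, the final sort would need a numeric, reversed key on the second field. A minimal sketch, assuming the names contain no spaces; the `swapped` function is a hypothetical stand-in for the output of the awk step, using the counts from the question:

```shell
# Hypothetical stand-in for the "name count" lines produced by the awk step.
swapped() {
  printf '%s\n' 'Linux 105' 'MacOS 55' 'Windows 500'
}

# -k2,2 selects the second field as the key; n = numeric, r = descending.
swapped | sort -k2,2nr
```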

Hope this will help you

2 Comments

That sorts by name rather than count.
Sure, from sfactor's question: "What I want to do is instead of sorting by the column value names"
0

Using sed with a tagged regular expression (capture groups):

cut -f <column_number> <filename> | sort | uniq -c | sort -r -k1 -n | sed 's/^ *\([0-9][0-9]*\) *\(.*\)$/\2 \1/' 

Doesn't produce output in a neat format though.
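If aligned columns are wanted, one option is to let awk's `printf` pad the fields rather than sed. The following is only a sketch: `format_counts` is a hypothetical helper, the widths 12 and 6 are arbitrary choices, and it expects `uniq -c`-style lines (count first) on standard input.

```shell
# format_counts (hypothetical helper): reads "count value" lines on stdin
# and prints the value left-aligned, then the count right-aligned.
format_counts() {
  awk '{
    n = $1                 # remember the count
    $1 = ""                # drop it from the record; awk rebuilds $0
    sub(/^ +/, "")         # strip the blank left behind at the start
    printf "%-12s %6d\n", $0, n
  }'
}

# Sample input using the counts from the question.
printf '%s\n' '    500 Windows' '    105 Linux' '     55 MacOS' | format_counts
```

Note that rebuilding the record collapses runs of blanks inside a value to single spaces, which may or may not matter for a given file.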
