How to specify more spaces for the delimiter using cut?

Question

Is there a way to specify multiple spaces as a field delimiter with the cut command (something like a " "+ regex)? For example, what field delimiter should I specify for the following string to reach the value 3744?

$ ps axu | grep jboss
jboss 2574 0.0 0.0 3744 1092 ? S Aug17 0:00 /bin/sh /usr/java/jboss/bin/run.sh -c example.com -b 0.0.0.0

cut -d' ' is not what I want, because it only handles a single space. awk is not what I am looking for either, so how can I do this with cut?

The best answer is using tr, as shown here: stackoverflow.com/a/4483833/168143
– John Bachir, Commented Jan 18, 2013 at 11:05
Not directly relevant to the actual question being asked, but instead of ps+grep you could use pgrep, which is available in most modern distros. It will return the result exactly in the form you need it.
– ccpizza, Commented Apr 8, 2013 at 14:03
Possible duplicate of How to make the 'cut' command treat multiple characters as one delimiter?
– user9645477, Commented Apr 16, 2018 at 4:06
These days I just use hck as a drop-in cut replacement. By default it splits on all whitespace, like awk. And the key feature is that you can specify a delimiter with -d like cut, but unlike cut that delimiter can be a regex! No more needing to pre-process with tr -s before passing to cut. You can find hck here: github.com/sstadick/hck
– Chris, Commented Jan 19, 2023 at 23:14
Does this answer your question? Does CUT support multiple spaces as the delimiter?
– dsimic, Commented Aug 22, 2023 at 2:33

paxdiablo · Accepted Answer · 2015-07-16 12:21:39Z

Actually awk is exactly the tool you should be looking into:

ps axu | grep '[j]boss' | awk '{print $5}'

or you can ditch the grep altogether since awk knows about regular expressions:

ps axu | awk '/[j]boss/ {print $5}'

But if, for some bizarre reason, you really can't use awk, there are other simpler things you can do, like collapse all whitespace to a single space first:

ps axu | grep '[j]boss' | sed 's/\s\s*/ /g' | cut -d' ' -f5

That grep trick, by the way, is a neat way to only get the jboss processes and not the grep jboss one (ditto for the awk variant as well).

The grep process will have a literal grep [j]boss in its process command so will not be caught by the grep itself, which is looking for the character class [j] followed by boss.

This is a nifty way to avoid the | grep xyz | grep -v grep paradigm that some people use.
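
To see both points at once without needing a running jboss, here is a minimal sketch on simulated ps output. The two sample lines (including the column padding, the second user name and its numbers) are made up for illustration and are not real output.

```shell
# Simulated ps output (made up for illustration): one jboss process
# and the grep process that is searching for it.
ps_sample='jboss     2574  0.0  0.0   3744  1092 ?        S    Aug17   0:00 /bin/sh /usr/java/jboss/bin/run.sh
user      3001  0.0  0.0   2100   700 pts/0    S+   10:00   0:00 grep [j]boss'

# awk splits on runs of whitespace by default, so $5 is the fifth column.
# The pattern [j]boss matches the text "jboss" but not the literal "[j]boss"
# found on the grep line, so that line is skipped.
vsz=$(printf '%s\n' "$ps_sample" | awk '/[j]boss/ {print $5}')
echo "$vsz"
```

Only the first sample line is selected, so the value printed is the fifth column of that line.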

Great answer. I'll be coming back to look this up again next time I need it.
The grep trick seems to not work in crontab files. Any reason?
I keep learning and forgetting the grep trick. Thanks for my most recent reminder. Maybe this time it'll stick. But I wouldn't bet on it.
This is a great answer, but the OP asked how to do it with cut, so I think stackoverflow.com/a/29685565/869951 deserves more credit than it currently has.
Oliver, sometimes the best answer to "how do I do X with Y?" is "Don't use Y, use Z instead". Since OP accepted this answer, it's likely I convinced them of that :-)

fedorqui · Answer · 2017-02-15 11:39:06Z

138

The awk version is probably the best way to go, but you can also use cut if you first squeeze the repeats with tr:

ps axu | grep jbos[s] | tr -s ' ' | cut -d' ' -f5
#        ^^^^^^^^^^^^   ^^^^^^^^^   ^^^^^^^^^^^^^
#              |            |             |
#              |            |       get 5th field
#              |            |
#              |      squeeze spaces
#              |
#        avoid grep itself to appear in the list
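
A small sketch of why the squeeze step matters, run against a fixed sample line instead of live ps output. The column padding in the sample is an assumption reconstructed for illustration.

```shell
# Sample ps line; the column padding is an assumption for illustration.
line='jboss     2574  0.0  0.0   3744  1092 ?        S    Aug17   0:00 /bin/sh /usr/java/jboss/bin/run.sh'

# Plain cut treats every single space as a delimiter, so the run of
# spaces after "jboss" produces empty fields and field 5 is empty.
naive=$(printf '%s\n' "$line" | cut -d' ' -f5)

# Squeezing runs of spaces first makes cut's fields line up with the columns.
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f5)

echo "naive: '$naive'  squeezed: '$squeezed'"
```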

edited Feb 15, 2017 at 11:39

answered Jan 31, 2014 at 9:40


5 Comments

Haggra Over a year ago

Fancy illustration.

Chris Over a year ago

tr -s ' ' is mighty nice! I hope I can remember that better than awk

fedorqui Over a year ago

@Chris I have to object :D Awk is way better for these things!!

Weekend Over a year ago

@fedorqui When it comes to printing from the nth field to the end, the cut syntax "-fN-" (e.g. cut -f5-) is much simpler than awk.

fedorqui Over a year ago

@Weekend agreed.

RobertDeRose · Answer · 2015-04-16 20:42:51Z

46

I like to use the tr -s command for this:

ps aux | tr -s '[:blank:]' | cut -d' ' -f3

This squeezes each run of whitespace down to a single character, so telling cut to use a space as the delimiter works as expected. The character class is quoted so the shell cannot expand it as a glob pattern.

answered Apr 16, 2015 at 20:42


1 Comment

Oliver Over a year ago

I think this should be the answer, it is closer to the OP request (asked to use cut). This approach is 5-10% slower than the awk approach (because there is one more pipe to handle with tr), but in general this will be irrelevant.

kenorb · Answer · 2015-08-11 21:56:51Z

I am going to nominate tr -s '[:blank:]' as the best answer.

Why do we want to use cut? It has the magic syntax that says "we want the third field and every field after it, omitting the first two fields":

cat log | tr -s '[:blank:]' | cut -d' ' -f 3-

I do not believe there is an equivalent command for awk or perl split when we do not know how many fields there will be, i.e. to output the 3rd field through field X.
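
For what it's worth, a small sketch of the -f 3- idiom on a made-up log line. The awk loop next to it is only one possible way to approximate the same result and is shown for comparison; the sample line and variable names are assumptions for illustration.

```shell
# Made-up log line with irregular spacing between fields.
log_line='Aug17  10:00   ERROR disk   full'

# cut: squeeze the blanks, then take field 3 through the last field.
with_cut=$(printf '%s\n' "$log_line" | tr -s '[:blank:]' | cut -d' ' -f 3-)

# awk: join fields 3..NF by hand; NF is the number of fields on the line.
with_awk=$(printf '%s\n' "$log_line" |
  awk '{ out = $3; for (i = 4; i <= NF; i++) out = out " " $i; print out }')

echo "cut: $with_cut"
echo "awk: $with_awk"
```

The cut form is clearly shorter, which is the point of the answer; the awk form needs an explicit loop.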

arielf · Answer · 2019-01-12 08:06:27Z

Shorter/simpler solution: use `cuts` (cut on steroids I wrote)

ps axu | grep '[j]boss' | cuts 4

Note that cuts field indexes are zero-based, so the 5th field is specified as 4.

http://arielf.github.io/cuts/

And even shorter (not using cut at all) is:

pgrep jboss

Jared Ng · Answer · 2011-08-22 03:01:53Z

8

One way around this is to go:

$ ps axu | grep jboss | sed 's/\s\+/ /g' | cut -d' ' -f3

to replace multiple consecutive spaces with a single one.

answered Aug 22, 2011 at 3:01


2 Comments

rjurney Over a year ago

Strange, this does not work on OS X. The sed command does not change multiple spaces to one space.

Jared Ng Over a year ago

\s is a GNU sed extension. On OS X you can pass the -E flag to sed to enable extended regular expressions, then use [[:space:]] in place of \s, like so: sed -E 's/[[:space:]]+/ /g'
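
A minimal sketch of the portable form from the comment above, applied to a fixed sample line (the padding is an assumption for illustration). It relies only on the -E flag and the POSIX [[:space:]] class, which both GNU sed and the sed shipped with OS X accept.

```shell
# Sample line with runs of spaces; the padding is an assumption for illustration.
line='jboss     2574  0.0  0.0   3744  1092 ?'

# -E enables extended regular expressions; [[:space:]]+ matches one or more
# whitespace characters and replaces each run with a single space.
collapsed=$(printf '%s\n' "$line" | sed -E 's/[[:space:]]+/ /g')
fifth=$(printf '%s\n' "$collapsed" | cut -d' ' -f5)

echo "$collapsed"
echo "$fifth"
```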

paulsm4 · Answer · 2011-08-22 03:00:02Z

5

Personally, I tend to use awk for jobs like this. For example:

ps axu| grep jboss | grep -v grep | awk '{print $5}'

answered Aug 22, 2011 at 3:00


2 Comments

zwol Over a year ago

That can be compressed down to ps axu | awk '/[j]boss/ {print $5}'.

pihentagy Over a year ago

Isn't awk slower (especially when there are some superfluous other processes) than sed / grep / cut?

flitz · Answer · 2016-02-26 07:09:13Z

2

As an alternative, there is always perl:

ps aux | perl -lane 'print $F[3]'

Or, if you want to get all fields starting at index 3 (as stated in one of the answers above; @F is zero-based, so index 3 is the fourth column):

ps aux | perl -lane 'print "@F[3 .. $#F]"'

answered Feb 26, 2016 at 7:09


4 Comments

rubo77 Over a year ago

This does not work with the output of lsof. I tried lsof | perl -lane 'print $F[5]'; this sometimes gets the 5th column, sometimes the 6th.

flitz Over a year ago

I think the question just was how to use delimiters that might contain a varying number of spaces. For this purpose the answer was correct.

flitz Over a year ago

In lsof the problem is that the number of columns is not always consistent in each row.

rubo77 Over a year ago

You can use this answer: Get a certain column of an output with content aligned right and some columns not always filled

Mike · Answer · 2018-09-12 15:58:59Z

If you want to pick columns from a ps output, any reason to not use -o?

e.g.

ps ax -o pid,vsz
ps ax -o pid,cmd

Minimum column width allocated, no padding, only single space field separator.

ps ax --no-headers -o pid:1,vsz:1,cmd
3443 24600 -bash
8419 0 [xfsalloc]
8420 0 [xfs_mru_cache]
8602 489316 /usr/sbin/apache2 -k start
12821 497240 /usr/sbin/apache2 -k start
12824 497132 /usr/sbin/apache2 -k start

Pid and vsz given 10 char width, 1 space field separator.

ps ax --no-headers -o pid:10,vsz:10,cmd
      3443      24600 -bash
      8419          0 [xfsalloc]
      8420          0 [xfs_mru_cache]
      8602     489316 /usr/sbin/apache2 -k start
     12821     497240 /usr/sbin/apache2 -k start
     12824     497132 /usr/sbin/apache2 -k start

Used in a script:-

oldpid=12824
echo "PID: ${oldpid}"
echo "Command: $(ps -ho cmd ${oldpid})"

dsimic · Answer · 2023-09-10 02:54:38Z

I've implemented a patch that adds a new -m command-line option to cut(1), which works in the field mode and treats multiple consecutive delimiters as a single delimiter. This basically solves the OP's question in a rather efficient way, by treating several spaces as one delimiter right within cut(1).

In particular, with my patch applied, the following command will perform the desired operation. It's as simple as that: just add -m to the invocation of cut(1) and use -d ' ' -f 5 to extract the fifth field (3744 in the OP's example) from the process list produced by ps(1):

ps axu | grep jboss | cut -d ' ' -m -f 5

I also submitted this patch upstream, and let's hope that it will eventually be accepted and merged into the coreutils project.

There are some further thoughts about adding even more whitespace-related features to cut(1), and having some feedback on all that from different people would be great, preferably on the coreutils mailing list. I'm willing to implement more patches for cut(1) and submit them upstream, which would make this utility more versatile and more usable in various real-world scenarios.

My previous answer to this question was deleted because it wasn't tailored specifically to this question. Thus, I answered this question again, providing a much more specific answer. I hope it's fine now.

BMW · Answer · 2014-02-03 06:11:36Z

Another way, if you must use the cut command:

ps axu | grep '[j]boss' | awk '{$1=$1} 1' | cut -d' ' -f5

In Solaris, replace awk with nawk or /usr/xpg4/bin/awk
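
A short sketch of what the $1=$1 step does, on a fixed sample line (padding assumed for illustration): assigning to any field makes awk rebuild the record, joining the fields with the output field separator, which is a single space by default.

```shell
# Sample line with runs of spaces; the padding is an assumption for illustration.
line='jboss     2574  0.0  0.0   3744  1092 ?'

# Assigning $1 to itself forces awk to rebuild $0 with single-space separators.
rebuilt=$(printf '%s\n' "$line" | awk '{ $1 = $1; print }')

# After that, cut with a single-space delimiter sees one field per column.
fifth=$(printf '%s\n' "$rebuilt" | cut -d' ' -f5)

echo "$rebuilt"
echo "$fifth"
```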

AAAfarmclub · Answer · 2015-08-26 03:41:53Z

I still like the way Perl handles fields with white space.
First field is $F[0].

$ ps axu | grep dbus | perl -lane 'print $F[4]'

Ondra Žižka · Answer · 2018-02-07 15:18:35Z

My approach is to store the PID to a file in /tmp, and to find the right process using the -S option for ssh. That might be a misuse but works for me.

#!/bin/bash

TARGET_REDIS=${1:-redis.someserver.com}
PROXY="proxy.somewhere.com"

LOCAL_PORT=${2:-6379}

if [ "$1" == "stop" ] ; then
    kill `cat /tmp/sshTunel${LOCAL_PORT}-pid`
    exit
fi

set -x

ssh -f -i ~/.ssh/aws.pem centos@$PROXY -L $LOCAL_PORT:$TARGET_REDIS:6379 -N -S /tmp/sshTunel$LOCAL_PORT  ## AWS DocService dev, DNS alias
# SSH_PID=$!  ## Only works with &
SSH_PID=`ps aux | grep sshTunel${LOCAL_PORT} | grep -v grep | awk '{print $2}'`
echo $SSH_PID > /tmp/sshTunel${LOCAL_PORT}-pid

A better approach might be to query for the SSH_PID right before killing it, since the file might be stale, in which case the script would kill the wrong process.
