233

Is there a way to specify multiple spaces as a field delimiter with the cut command (something like a " "+ regex)? For example, what field delimiter should I specify for the following string to reach the value 3744?

$ ps axu | grep jboss
jboss 2574 0.0 0.0 3744 1092 ? S Aug17 0:00 /bin/sh /usr/java/jboss/bin/run.sh -c example.com -b 0.0.0.0

cut -d' ' is not what I want, because it only handles a single space. awk is not what I am looking for either, so how can I do this with cut?

6
  • 16
    best answer is using tr as shown here: stackoverflow.com/a/4483833/168143 Commented Jan 18, 2013 at 11:05
  • 1
    Not directly relevant to the actual question being asked but instead of ps+grep you could use pgrep which is available in most modern distros. It will return the result exactly in the form you need it. Commented Apr 8, 2013 at 14:03
  • 1
    Possible duplicate of How to make the 'cut' command treat multiple characters as one delimiter? Commented Apr 16, 2018 at 4:06
  • These days I just use hck as a drop in cut replacement. By default it splits on all whitespace, like awk. And the key feature is that you can specify a delimiter with -d like cut, but unlike cut that delimiter can be a regex! No more needing to pre-process with tr -s before passing to cut. You can find hck here: github.com/sstadick/hck Commented Jan 19, 2023 at 23:14
  • Does this answer your question? Does CUT support multiple spaces as the delimiter? Commented Aug 22, 2023 at 2:33

13 Answers

361

Actually awk is exactly the tool you should be looking into:

ps axu | grep '[j]boss' | awk '{print $5}' 

or you can ditch the grep altogether since awk knows about regular expressions:

ps axu | awk '/[j]boss/ {print $5}' 

But if, for some bizarre reason, you really can't use awk, there are other simpler things you can do, like collapse all whitespace to a single space first:

ps axu | grep '[j]boss' | sed 's/\s\s*/ /g' | cut -d' ' -f5 

That grep trick, by the way, is a neat way to only get the jboss processes and not the grep jboss one (ditto for the awk variant as well).

The grep process will have a literal grep [j]boss in its process command so will not be caught by the grep itself, which is looking for the character class [j] followed by boss.

This is a nifty way to avoid the | grep xyz | grep -v grep paradigm that some people use.
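A minimal sketch of both points above, run against fixed sample text instead of live ps output so the result is reproducible. The two sample lines are assumptions modelled on the question: one jboss process line and one line for the matching command itself.

```shell
# Sample input standing in for `ps axu` output (hypothetical lines).
sample_ps() {
    printf '%s\n' \
        'jboss     2574  0.0  0.0   3744  1092 ?        S    Aug17   0:00 /bin/sh /usr/java/jboss/bin/run.sh' \
        'user      9999  0.0  0.0   1234   560 pts/0    S+   Aug17   0:00 grep [j]boss'
}

# The regex [j]boss matches the text "jboss" but not the literal text
# "[j]boss", so the second line is skipped. awk splits on runs of blanks
# by default, so $5 is the fifth column regardless of padding.
sample_ps | awk '/[j]boss/ {print $5}'
```

On this sample only the first line matches, and the padded columns do not affect which field is printed.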

7 Comments

Great answer. I'll be coming back to look this up again next time I need it.
The grep trick seems to not work in crontab files. Any reason?
I keep learning and forgetting the grep trick. Thanks for my most recent reminder. Maybe this time it'll stick. But I wouldn't bet on it.
This is great answer but the OP asked how to do it with cut, so I think stackoverflow.com/a/29685565/869951 deserves more credit than it currently has.
Oliver, sometimes the best answer to "how do I do X with Y?" is "Don't use Y, use Z instead". Since OP accepted this answer, it's likely I convinced them of that :-)
138

The awk version is probably the best way to go, but you can also use cut if you first squeeze the repeats with tr:

ps axu | grep jbos[s] | tr -s ' ' | cut -d' ' -f5
#        ^^^^^^^^^^^^   ^^^^^^^^^   ^^^^^^^^^^^^^
#              |            |             |
#              |            |       get 5th field
#              |            |
#              |      squeeze spaces
#              |
#        avoid grep itself to appear in the list
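To see why the squeeze step matters, here is a small sketch on a fixed sample line (the line is an assumption, padded the way ps pads its columns). cut treats every single space as a delimiter, so a run of spaces produces empty fields.

```shell
# Hypothetical sample line with padded columns, as ps prints them.
line='jboss     2574  0.0  0.0   3744  1092 ?'

# Without squeezing, each extra space starts a new (empty) field,
# so field 5 is empty here.
raw=$(printf '%s\n' "$line" | cut -d' ' -f5)

# After tr -s ' ' every run of spaces is a single delimiter.
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f5)

printf 'raw=[%s] squeezed=[%s]\n' "$raw" "$squeezed"
```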

5 Comments

Fancy illustration.
tr -s ' ' is mighty nice! I hope I can remember that better than awk
@Chris I have to object :D Awk is way better for these things!!
@fedorqui When it comes to printing the nth field through the end of the line, cut's "-fN-" syntax (e.g. cut -f5-) is much simpler than awk.
@Weekend agreed.
46

I like to use the tr -s command for this

ps aux | tr -s '[:blank:]' | cut -d' ' -f3

This squeezes each run of blanks down to a single character, so telling cut to use a space as the delimiter works as expected.

1 Comment

I think this should be the answer, it is closer to the OP request (asked to use cut). This approach is 5-10% slower than the awk approach (because there is one more pipe to handle with tr), but in general this will be irrelevant.
12

I am going to nominate tr -s [:blank:] as the best answer.

Why do we want to use cut? It has the magic command that says "we want the third field and every field after it, omitting the first two fields"

cat log | tr -s '[:blank:]' | cut -d' ' -f 3-

I do not believe there is an equivalent command in awk or with Perl's split for the case where we do not know how many fields there will be, i.e. to output the 3rd field through field X.
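A sketch of the "third field onward" idiom on a sample line. Two assumptions here: the sample text is made up, and the input may mix tabs with spaces. Since tr -s '[:blank:]' alone squeezes a run of tabs to one tab rather than to a space, this variant gives tr a second set so every blank becomes a space before squeezing. The helper name from_third is hypothetical.

```shell
# Hypothetical log line mixing spaces and a tab between fields.
line=$(printf 'Aug17  host\tsshd:   session   opened')

# Translate every blank (space or tab) to a space, squeeze repeats,
# then let cut print field 3 through the last field.
from_third() {
    tr -s '[:blank:]' ' ' | cut -d' ' -f3-
}

printf '%s\n' "$line" | from_third
```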

9

Shorter/simpler solution: use cuts (cut on steroids I wrote)

ps axu | grep '[j]boss' | cuts 4 

Note that cuts field indexes are zero-based so 5th field is specified as 4

http://arielf.github.io/cuts/

And even shorter (not using cut at all), if all you need is the process ID rather than the 5th column:

pgrep jboss 

8

One way around this is to go:

$ ps axu | grep jboss | sed 's/\s\+/ /g' | cut -d' ' -f3

to replace multiple consecutive spaces with a single one.

2 Comments

Strange, this does not work on OS X. The sed command does not change multiple spaces to one space.
\s is a GNU sed extension. On OS X you can pass the -E flag to sed to enable extended regular expressions, then use [[:space:]] in place of \s, like so: sed -E 's/[[:space:]]+/ /g'
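A small sketch of the portable form from the comment above, on a made-up sample line that mixes spaces and a tab. It assumes a sed that accepts -E, which both GNU sed and the BSD sed shipped with OS X do.

```shell
# Hypothetical padded line; the tab is produced by printf.
line=$(printf 'jboss   2574\t 0.0   0.0    3744')

# -E selects extended regular expressions; [[:space:]]+ matches one or
# more whitespace characters, and each run is replaced by a single space.
printf '%s\n' "$line" | sed -E 's/[[:space:]]+/ /g' | cut -d' ' -f5
```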
5

Personally, I tend to use awk for jobs like this. For example:

ps axu| grep jboss | grep -v grep | awk '{print $5}' 

2 Comments

That can be compressed down to ps axu | awk '/[j]boss/ {print $5}'.
Isn't awk slower (especially when there are some superfluous other processes) than sed / grep / cut?
2

As an alternative, there is always perl:

ps aux | perl -lane 'print $F[3]' 

Or, if you want to get all fields starting at field #3 (as stated in one of the answers above):

ps aux | perl -lane 'print "@F[3 .. $#F]"'

4 Comments

This does not work with the output of lsof. I tried lsof | perl -lane 'print $F[5]', and it sometimes gets the 5th column and sometimes the 6th.
I think the question just was how to use delimiters that might contain a varying number of spaces. For this purpose the answer was correct.
In lsof the problem is that the number of columns is not always consistent in each row.
2

If you want to pick columns from a ps output, any reason to not use -o?

e.g.

ps ax -o pid,vsz
ps ax -o pid,cmd

Minimum column width allocated, no padding, only single space field separator.

ps ax --no-headers -o pid:1,vsz:1,cmd
3443 24600 -bash
8419 0 [xfsalloc]
8420 0 [xfs_mru_cache]
8602 489316 /usr/sbin/apache2 -k start
12821 497240 /usr/sbin/apache2 -k start
12824 497132 /usr/sbin/apache2 -k start

Pid and vsz given 10 char width, 1 space field separator.

ps ax --no-headers -o pid:10,vsz:10,cmd
      3443      24600 -bash
      8419          0 [xfsalloc]
      8420          0 [xfs_mru_cache]
      8602     489316 /usr/sbin/apache2 -k start
     12821     497240 /usr/sbin/apache2 -k start
     12824     497132 /usr/sbin/apache2 -k start

Used in a script:

oldpid=12824
echo "PID: ${oldpid}"
echo "Command: $(ps -ho cmd ${oldpid})"

1

I've implemented a patch that adds a new -m command-line option to cut(1), which works in field mode and treats multiple consecutive delimiters as a single delimiter. This basically solves the OP's question in a rather efficient way, by treating several spaces as one delimiter right within cut(1).

In particular, with my patch applied, the following command will perform the desired operation. It's as simple as that: just add -m to the invocation of cut(1) and use -d ' ' -f 5 to extract the fifth field (the value the OP is after) from the process list produced by ps(1):

ps axu | grep jboss | cut -d ' ' -m -f 5 

I also submitted this patch upstream, and let's hope that it will eventually be accepted and merged into the coreutils project.

There are some further thoughts about adding even more whitespace-related features to cut(1), and having some feedback on all that from different people would be great, preferably on the coreutils mailing list. I'm willing to implement more patches for cut(1) and submit them upstream, which would make this utility more versatile and more usable in various real-world scenarios.

1 Comment

My previous answer to this question was deleted because it wasn't tailored specifically to this question. Thus, I answered this question again, providing a much more specific answer. I hope it's fine now.
0

Another way if you must use cut command

ps axu | grep '[j]boss' | awk '$1=$1' | cut -d' ' -f5

In Solaris, replace awk with nawk or /usr/xpg4/bin/awk
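For readers wondering why the awk step helps, here is a sketch on a fixed sample line (the line itself is an assumption). It swaps the bare pattern '$1=$1' for an explicit action, which is a small variation on the command above, not the same command.

```shell
# Hypothetical sample line with padded columns.
line='jboss     2574  0.0  0.0   3744  1092 ?'

# Assigning to any field makes awk rebuild the record, joining the
# fields with the output field separator (a single space by default).
# Using an action with an explicit print avoids skipping lines whose
# first field is 0 or empty, which the bare pattern '$1=$1' would do.
normalized=$(printf '%s\n' "$line" | awk '{$1=$1; print}')

printf '%s\n' "$normalized" | cut -d' ' -f5
```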

0

I still like the way Perl handles fields with white space.
First field is $F[0].

$ ps axu | grep dbus | perl -lane 'print $F[4]' 

0

My approach is to store the PID to a file in /tmp, and to find the right process using the -S option for ssh. That might be a misuse but works for me.

#!/bin/bash
TARGET_REDIS=${1:-redis.someserver.com}
PROXY="proxy.somewhere.com"
LOCAL_PORT=${2:-6379}

if [ "$1" == "stop" ] ; then
    kill `cat /tmp/sshTunel${LOCAL_PORT}-pid`
    exit
fi

set -x

ssh -f -i ~/.ssh/aws.pem centos@$PROXY -L $LOCAL_PORT:$TARGET_REDIS:6379 -N -S /tmp/sshTunel$LOCAL_PORT ## AWS DocService dev, DNS alias
# SSH_PID=$! ## Only works with &
SSH_PID=`ps aux | grep sshTunel${LOCAL_PORT} | grep -v grep | awk '{print $2}'`
echo $SSH_PID > /tmp/sshTunel${LOCAL_PORT}-pid

A better approach might be to query for the SSH_PID right before killing it, since the file might be stale and the script would kill the wrong process.
