23

How can I use awk to remove all text after a certain character ; that appears on every line of my text file? (I then need to run for loops on the text)

Jenny,Sarah,John;North Dakota
Henry,Frank;Illinois
Aaron,Kathryn,Caitlin,Harris;New York
0

5 Answers

29

There are two general approaches.

  1. Set awk's field separator to that character. Everything before its first occurrence is then available as $1:

    $ echo "Today was cloudy; yesterday too" | awk -F';' '{print $1}'
    Today was cloudy
  2. Use sub() to replace that character, and everything after it, with an empty string:

    $ echo "Today was cloudy; yesterday too" | awk '{sub(/;.*/,""); print}'
    Today was cloudy

So, for your example:

$ awk -F';' '{print $1}' file
Jenny,Sarah,John
Henry,Frank
Aaron,Kathryn,Caitlin,Harris
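
Since the question mentions running for loops on the result, here is a minimal sketch of how that might look in a POSIX shell. The variable names (input, names_only, count) and the inline sample data are assumptions made for illustration; in practice you would read from your own file instead.

```shell
# Sample data in the question's format (inlined here only for illustration).
input='Jenny,Sarah,John;North Dakota
Henry,Frank;Illinois
Aaron,Kathryn,Caitlin,Harris;New York'

# Keep only the text before the first ';' on each line.
names_only=$(printf '%s\n' "$input" | awk -F';' '{print $1}')

# Loop over the lines, then over the comma-separated names on each line.
count=0
while IFS= read -r line; do
    old_ifs=$IFS
    IFS=','
    set -f                      # disable globbing while splitting unquoted $line
    for name in $line; do
        count=$((count + 1))
        printf 'name %d: %s\n' "$count" "$name"
    done
    set +f
    IFS=$old_ifs
done <<EOF
$names_only
EOF
```

The here-document keeps the loop in the current shell, so count is still set after the loop ends; piping into while would run the loop in a subshell in many shells.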
9

Here's an answer with sed -- since you're not really doing any field processing, awk is probably overkill.

sed 's/;.*//' 
1
  • 1
    +1 but based on the OP's comments, I am assuming this is all part of a larger script. @Jenny, that's the kind of detail you should include in your questions by the way. Commented Feb 28, 2014 at 4:13
5

You can also just use cut:

cut -d\; -f1 file 
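
As a quick sanity check, the sketch below runs the awk, sed, and cut commands from this thread on the question's sample data and compares the results. The variable names (sample, via_awk, via_sed, via_cut) are assumptions made for illustration.

```shell
# Sample data in the question's format (inlined here only for illustration).
sample='Jenny,Sarah,John;North Dakota
Henry,Frank;Illinois
Aaron,Kathryn,Caitlin,Harris;New York'

# The same truncation done three ways.
via_awk=$(printf '%s\n' "$sample" | awk -F';' '{print $1}')
via_sed=$(printf '%s\n' "$sample" | sed 's/;.*//')
via_cut=$(printf '%s\n' "$sample" | cut -d';' -f1)

if [ "$via_awk" = "$via_sed" ] && [ "$via_sed" = "$via_cut" ]; then
    echo "all three agree"
fi
```

All three cut each line at the first ; and pass lines without a ; through unchanged, so for this task the choice is mostly a matter of taste.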
0

Sometimes you may want to replace everything after a certain word with another string. For example, given original_string="abc blabla foo bar", suppose you want to replace the words after blabla with 'hello world':

echo "$original_string" | sed -E 's/(.+ blabla) .+/\1 hello world/'
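
If you need this more than once, the same substitution can be wrapped in a small function. This is only a sketch: replace_after is a hypothetical helper name, and it assumes that the marker word contains no regex metacharacters or slashes and that the replacement contains no /, &, or backslashes.

```shell
# replace_after is a hypothetical helper, not a standard command.
#   $1 = input string
#   $2 = marker word (assumed free of regex metacharacters and '/')
#   $3 = replacement text (assumed free of '/', '&', and backslashes)
replace_after() {
    # Inside double quotes the shell leaves \1 alone, so sed still sees the backreference.
    printf '%s\n' "$1" | sed -E "s/(.+ $2) .+/\1 $3/"
}

original_string="abc blabla foo bar"
replace_after "$original_string" "blabla" "hello world"
```

Because .+ is greedy, a string that contains the marker word more than once is cut after its last occurrence, not its first.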
0

Using Raku (formerly known as Perl_6):

raku -pe 's:g/ \; .*? $$//;' 

OUTPUTS:

Jenny,Sarah,John
Henry,Frank
Aaron,Kathryn,Caitlin,Harris

The above code uses the -pe command-line flags (run the code on each line and autoprint the result) together with the well-known s/// substitution construct. It tells Raku to search globally (:g) for a ;, match the 0-or-more characters that follow (.*?, where ? means non-greedily), up to the end of the line ($$).

(Actually, since the OP seems to indicate that the ; only occurs once-per-line, the :g can be omitted. Also, since the -pe command line linewise-autoprinting flags are in use, you can use the $ end-of-string assertion, instead of the $$ end-of-line assertion).

The OP seems to indicate that they will be running for-loops over the text. That sounds as if a single comma-separated list of names is desired. If so, the following code works:

raku -e 'lines.grep(*.chars).map(*.subst(/\; .*? $$/)).join(",").put;' 

OUTPUTS:

Jenny,Sarah,John,Henry,Frank,Aaron,Kathryn,Caitlin,Harris 

https://raku.org/
