Join every other column with sed or awk

Question

I have a large text file (666000 columns) in the format

A B C D E F

Desired output

AB CD EF

How can we do this in sed or awk? I have tried a couple of things, but nothing seems to work. Please suggest something.

Do you mean 666000 columns or 666000 rows?

– iruvar, Commented Sep 27, 2013 at 0:29

Joseph R. · Accepted Answer · 2013-09-27 00:43:54Z

In sed:

sed 's! \([^ ]\+\)\( \|$\)!\1 !g; s! $!!' your_file

This will make the substitutions and print the result to standard output. To modify the file in place, add the -i switch:

sed -i 's! \([^ ]\+\)\( \|$\)!\1 !g; s! $!!' your_file

Explanation

This sed command looks for a space, followed by at least one non-space character, followed by either a space or the end of the line. It replaces that sequence with the non-space characters it found followed by a single space. Because the g modifier is supplied at the end, the substitution is applied as many times as possible across the line (a global substitution). So, with a sequence like A B C, sed finds the pattern " B " and replaces it with "B ", leaving AB C as the final result. When a line has an even number of columns, the last match ends at the end of the line and still appends a space ("A B C D E F" would become "AB CD EF "), so the second expression, s! $!!, removes that trailing space. Note that \+ and \| are GNU extensions to basic regular expressions, so this command assumes GNU sed.
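For comparison, here is a sketch that does the pairing the other way round: match two adjacent columns and glue them together, using only POSIX basic regular expressions (no GNU \+ or \|). It assumes space-separated columns; the function name join_pairs_sed is hypothetical. Since nothing is appended after each pair, no trailing space is produced.

```shell
# Hypothetical helper: join columns 1+2, 3+4, ... on every input line.
# POSIX BRE only: "[^ ][^ ]*" = one or more non-spaces, "  *" = one or more spaces.
join_pairs_sed() {
  sed 's/\([^ ][^ ]*\)  *\([^ ][^ ]*\)/\1\2/g' "$@"
}

printf 'A B C D E F\n' | join_pairs_sed   # AB CD EF
printf 'A B C\n' | join_pairs_sed         # AB C (odd last column left alone)
```

Because g resumes scanning after each matched pair, the next match always starts at the following column, so the pairs never overlap.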

Assumptions made by this code

This code assumes the spaces between your columns are really spaces and not, for example, TABs. This is easily fixed at the expense of readability:

sed 's![[:blank:]]\+\([^[:blank:]]\+\)\([[:blank:]]\+\|$\)!\1 !g; s! $!!' your_file
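A POSIX-only sketch of the same TAB-tolerant idea, assuming it is acceptable to normalise every run of spaces and TABs to a single space first; after that, the plain space-based pairing is enough. The function name join_pairs_blank is hypothetical.

```shell
# Hypothetical helper: TAB-tolerant pairing with POSIX sed.
join_pairs_blank() {
  # 1) squeeze every run of spaces/TABs ([[:blank:]]) down to one space
  # 2) glue each "column column" pair together
  sed -e 's/[[:blank:]][[:blank:]]*/ /g' \
      -e 's/\([^ ][^ ]*\) \([^ ][^ ]*\)/\1\2/g' "$@"
}

printf 'A\tB C\t\tD\n' | join_pairs_blank   # AB CD
```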

terdon · Accepted Answer · 2013-09-27 01:26:13Z

awk:

awk '{printf "%s", $1$2; for(i=3; i<=NF; i+=2){printf " %s", $i$(i+1)}; print ""}' file

This will probably be the faster of the two for large files.
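A commented sketch of the same field loop, wrapped in a shell function whose name, join_pairs_awk, is hypothetical. It builds each output line in a variable and prints it once, which also keeps the field contents out of printf's format string, so a % in the data is printed literally.

```shell
# Hypothetical helper: the awk field loop, spelled out.
join_pairs_awk() {
  awk '{
    out = $1 $2                    # first pair, no leading space
    for (i = 3; i <= NF; i += 2)   # then fields 3+4, 5+6, ...
      out = out " " $i $(i + 1)    # $(NF+1) is empty, so an odd last field stands alone
    print out                      # exactly one output line per input line
  }' "$@"
}

printf 'A B C D E F\n' | join_pairs_awk   # AB CD EF
```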

Perl:

perl -pe 's/([^\s]+)\s+([^\s]+)/$1$2/g' file

iruvar · Accepted Answer · 2013-09-27 12:05:43Z

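If your file indeed has that many columns, one option is to use gawk to treat each column as a record by setting RS to "one or more whitespace characters". This avoids having to loop over the columns. A regular-expression RS and the RT variable (the text that actually matched RS) are gawk extensions. Note that this solution is fragile when a line has an odd number of columns: the pairing then runs across the line break and joins the last column with the first column of the next line.

gawk --re-interval -v RS='[[:space:]]{1,}' '{x=$0; getline; printf "%s", x $0 RT}' file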
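If the file really is a single line with hundreds of thousands of columns, the same one-column-per-record idea can be sketched with POSIX tr and paste instead of gawk's regex RS. This swaps in a different technique and assumes exactly one input line with no leading whitespace; on multi-line input it would pair columns across line boundaries. The function name join_pairs_one_line is hypothetical.

```shell
# Hypothetical helper: reads ONE line of blank-separated columns on stdin.
join_pairs_one_line() {
  tr -s ' \t' '\n\n' |    # one column per line (runs of blanks squeezed)
    paste -d '\0' - - |   # read two lines at a time; \0 means an empty delimiter
    paste -s -d ' ' -     # join all the pairs back onto a single line
}

printf 'A B C D E F\n' | join_pairs_one_line   # AB CD EF
```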
