I have a large text file (666000 columns) in the format
    A B C D E F
Desired output:
    AB CD EF
How can we do this in sed or awk? I have tried a couple of things, but nothing seems to work. Please suggest something.
In sed:
    sed 's! \([^ ]\+\)\( \|$\)!\1 !g' your_file
This makes the substitutions and prints the result to standard output. To modify the file in place, add the -i switch:
    sed -i 's! \([^ ]\+\)\( \|$\)!\1 !g' your_file
Explanation
This sed command will look for a space, followed by at least one non-space character, followed by a space or the end of the line. It substitutes this sequence with whatever non-space characters it found followed by a single space. The substitution is applied as many times as possible across the line (this is called a global substitution) because the g modifier is supplied at the end. So, basically, with a sequence like A B C, sed will find the pattern " B " and substitute it with "B " leaving you with AB C as the final result.
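To see the substitution at work, here is a small sketch that runs the same expression on the sample line from the question. It assumes GNU sed (for \+ and \|), and the variable names are only illustrative. Note that the last match also leaves a trailing space, which a second expression can strip if that matters to you.

```shell
# Sample line from the question (assumes GNU sed for \+ and \|).
input='A B C D E F'

# Same expression as in the answer; brackets make the trailing space visible.
sed_out=$(printf '%s\n' "$input" | sed 's! \([^ ]\+\)\( \|$\)!\1 !g')
printf '[%s]\n' "$sed_out"        # prints [AB CD EF ]

# Optional second expression removes the single trailing space.
sed_trimmed=$(printf '%s\n' "$input" | sed -e 's! \([^ ]\+\)\( \|$\)!\1 !g' -e 's! $!!')
printf '[%s]\n' "$sed_trimmed"    # prints [AB CD EF]
```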
Assumptions made by this code
This code assumes that your columns are separated by single spaces and not, for example, TABs; it also relies on the GNU sed extensions \+ and \|. The separator assumption is easily fixed at the expense of readability:
    sed 's![[:blank:]]\+\([^[:blank:]]\+\)\([[:blank:]]\+\|$\)!\1 !g' your_file
awk:
    awk '{printf "%s", $1$2; for(i=3; i<=NF; i+=2){printf " %s", $i$(i+1)} print ""}' file
This will probably be the faster of the two for large files.
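As a quick check, the sketch below runs a spelled-out version of this awk approach on two sample lines. The second line mixes a TAB and a double space and has an odd number of columns. It passes an explicit format string to printf and ends each output line with print "". The sample data and the variable name are made up for illustration.

```shell
# Two sample lines: the second has a TAB, a double space, and five columns.
awk_out=$(printf 'A B C D E F\nG\tH I  J K\n' |
  awk '{
    printf "%s", $1 $2               # first pair, no leading separator
    for (i = 3; i <= NF; i += 2)     # remaining pairs
      printf " %s", $i $(i + 1)      # a field past NF is empty, so a lone last column survives
    print ""                         # terminate the output line
  }')

printf '%s\n' "$awk_out"
# prints:
# AB CD EF
# GH IJ K
```

Because awk splits on runs of blanks by default, no separate TAB-aware variant is needed here.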
Perl:
    perl -pe 's/([^\s]+)\s+([^\s]+)/$1$2/g' file
If your file indeed has that many columns, one option is to use gawk to treat each column as a record by setting RS to "one or more whitespace characters". This helps avoid having to set up a loop through the columns. Note that this solution is fragile in the face of an odd number of columns in a line.
    awk --re-interval -v RS='[[:space:]]{1,}' '{x=$0; getline; printf "%s%s%s", x, $0, RT}' file
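To illustrate the fragility with odd column counts, here is a hedged sketch that feeds the record-per-column approach one even and one odd sample. It assumes gawk is installed, since a regex RS and the RT variable are gawk extensions, and it writes the separator as + rather than {1,}. The helper name pair_columns and the sample data are hypothetical.

```shell
# Assumes gawk is installed: a regex RS and the RT variable are gawk extensions.
# pair_columns is a hypothetical helper name used only for this illustration.
pair_columns() {
  gawk -v RS='[[:space:]]+' '{ x = $0; getline; printf "%s%s%s", x, $0, RT }'
}

if command -v gawk >/dev/null 2>&1; then
  # Even number of columns per line: pairs stay within their own line.
  even_out=$(printf 'A B C D\nE F G H\n' | pair_columns)
  printf '%s\n' "$even_out"

  # Odd number of columns per line: C is paired with D from the next line,
  # and the line break between them is lost.
  odd_out=$(printf 'A B C\nD E F\n' | pair_columns)
  printf '%s\n' "$odd_out"
fi
```

With the odd sample, both input lines come out as the single line AB CD EF, so it is worth checking the column count before relying on this variant.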