
file1.txt:

hi
wonderful
amazing
sorry
superman
superhumanwith
loss

file2.txt:

1
2
3
4
5
6
7

When I try to combine them using

paste -d" " file1.txt file2.txt > actualout.txt

I get:

actualout.txt:

hi 1
wonderful 2
amazing 3
sorry 4
superman 5
superhumanwith 6
loss 7

But I want my output to look like this (desiredOUT.txt):

hi             1
wonderful      2
amazing        3
sorry          4
superman       5
superhumanwith 6
loss           7

Which command can I use to combine the two files so that the result looks like the desired output? I am on Solaris 5.10 with ksh, and have nawk, sed and paste available.

  • You need to find the length of the longest word in file1; I would turn to perl for this one. Do you require a nawk/sed/paste-only solution? Commented Jun 12, 2015 at 0:59
  • What's the maximum line length of your file1? Commented Jun 12, 2015 at 1:07
  • I don't want to use perl; yes, I want an awk-only solution. Could you provide me a command to find the longest word in file1 using awk or sed, and then use it to get the desired output? Commented Jun 12, 2015 at 3:56

5 Answers


You seem to need column:

paste file1.txt file2.txt | column -tc2 

which creates this output:

hi              1
wonderful       2
amazing         3
sorry           4
superman        5
superhumanwith  6
loss            7


You can of course also write your own script to do the formatting. Here is one way using awk:

awk '
  NR==FNR { a[FNR] = $0 ; if (length > max) max = length ; next }
  { printf "%-*s %s\n", max, a[FNR], $0 }
' file1.txt file2.txt
  • Not available by default on Solaris. Commented Jun 12, 2015 at 9:53
  • @lcd047: Not "by default", so it can be made available? If not, there's the fallback to good old pr, as in: pr -tm file1.txt file2.txt Commented Jun 12, 2015 at 10:08
  • @Janis There is no man entry for column on Solaris 5.10. I tagged the question solaris; is there an alternative solution? Commented Jun 12, 2015 at 18:31
  • @Janis pr -tm seems to have limitations when the string in the first column is longer than 24 characters. Can we adjust this so that it prints all the characters in each column? Commented Jun 12, 2015 at 19:00
  • You can set pr's page width to a larger value with its -w option, but that would be ad hoc and hard-coded. If you want the width determined dynamically, and you don't have tools like column available, then as a last resort you would have to implement the function yourself. I'll add yet another program to my answer. Commented Jun 12, 2015 at 20:22

pr

I'd probably go with pr:

printf %s\\n hi wonderful amazing sorry \
    superman superhumanwith loss >/tmp/file
#^what is all of that, anyway?^
seq 7 | pr -tm /tmp/file -

pr can -merge input files (here /tmp/file and - for stdin), printing them side by side line by line the way paste does, but it accepts many other options besides. By default it also prints page headers and footers, but -t suppresses them.

OUTPUT:

hi                                  1
wonderful                           2
amazing                             3
sorry                               4
superman                            5
superhumanwith                      6
loss                                7

expand

If you'd rather control the layout yourself, another option is expand, because you can hand it a list of tab stops and it will replace each tab with as many spaces as are needed to reach the next stop.

seq 7 | paste /tmp/file - | expand -t15 

Here we only need the first -tabstop of course...

hi             1
wonderful      2
amazing        3
sorry          4
superman       5
superhumanwith 6
loss           7

...but if more were wanted...

seq 14 | paste /tmp/file - /tmp/file - | expand -t15,23,38,46 

...we could spell them out as a comma-separated list of increasing column positions...

hi             1       hi             2
wonderful      3       wonderful      4
amazing        5       amazing        6
sorry          7       sorry          8
superman       9       superman       10
superhumanwith 11      superhumanwith 12
loss           13      loss           14
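The -t15 above is hard-coded for this sample. As a hedged sketch, the stop can instead be derived from the data with awk (the same longest-line idea used in other answers here), assuming a POSIX shell with awk, paste and expand available; file1.txt and file2.txt are recreated from the question only so the snippet runs on its own:

```shell
#!/bin/sh
# Recreate the question's sample input (one word / number per line).
printf '%s\n' hi wonderful amazing sorry superman superhumanwith loss > file1.txt
printf '%s\n' 1 2 3 4 5 6 7 > file2.txt

# Length of the longest line in file1.txt, plus one column of separation.
stop=$(awk 'length > n { n = length } END { print n + 1 }' file1.txt)

# paste joins the two lines with a TAB; expand turns that TAB into spaces
# up to the computed tab stop, so column 2 always starts at position $stop.
paste file1.txt file2.txt | expand -t "$stop"
```

This assumes no line of file1.txt contains a tab of its own; a literal tab there would shift the second column.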

grep:

To find the length of the longest line in a file, not counting any trailing blanks and rounded up to the next standard 8-column tab stop, this will probably work:

i=0
while grep -Eq ".{$(((i+=8)-1))}.*[^[:blank:]]" <infile; do :; done

That loop increments $i by 8 on each run and searches <infile for any line containing at least $i-1 characters followed by a non-blank character, i.e. a line with a non-blank character at column $i or beyond. When grep cannot find such a line, it returns false and the loop ends; for your example data, this assigns:

echo "$i"
16
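To watch the loop on the question's words, here is a minimal sketch, assuming a POSIX shell (for the assignment inside $((...))) and a grep that supports -E and -q; infile is just the placeholder name used above, filled here with the sample data:

```shell
#!/bin/sh
# Sample data: the question's file1.txt, one word per line.
printf '%s\n' hi wonderful amazing sorry superman superhumanwith loss > infile

i=0
# Each pass adds 8 to i and asks grep whether any line has a non-blank
# character at column i or later (i-1 arbitrary characters, then anything,
# then a non-blank). The loop stops at the first multiple of 8 no line reaches.
while grep -Eq ".{$(((i+=8)-1))}.*[^[:blank:]]" <infile; do :; done

echo "$i"
```

With the longest word being 14 characters, the pass for i=8 succeeds and the pass for i=16 fails, so 16 is printed.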

wc:

But those are all POSIX solutions. The simplest thing to do on a GNU system is:

wc -L <infile 

...to print the length of the longest line in <infile, but that count will include trailing blanks.
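wc -L is a GNU extension, so it may be missing on Solaris 10. As a hedged alternative, awk can report the same maximum, and stripping trailing blanks before measuring makes it ignore them, as the grep loop does. This sketch assumes a POSIX awk (nawk on Solaris); note also that GNU wc -L reports display width with tabs expanded, whereas awk's length counts characters:

```shell
#!/bin/sh
# Sample input; the third line is "amazing" padded with 10 trailing spaces
# (raw length 17), to show that trailing blanks are not counted.
{ printf '%s\n' hi wonderful; printf 'amazing%10s\n' ''; printf '%s\n' loss; } > infile

# Drop trailing spaces/tabs from each line, then keep the maximum length.
# "n + 0" makes an empty file print 0 instead of an empty string.
awk '{ sub(/[ \t]+$/, ""); if (length > n) n = length } END { print n + 0 }' infile
```

Here the longest remaining line is "wonderful", so 9 is printed.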


If you insist on doing it with awk:

awk -v file=file2.txt '
{
    cnt++
    a[cnt] = $0
    getline b[cnt] <file
    if(length(a[cnt]) > max)
        max = length(a[cnt])
}
END {
    max++
    for(i = 1; i <= cnt; i++)
        printf "%-" max "s%s\n", a[i], b[i]
}' file1.txt

On a side note: I'm pretty sure this particular wheel has been re-invented a zillion times already, but right now I'd rather not coerce my brain to come up with the right incantation to find proper examples of prior SE / SO art. :)

awk '
  FNR==1 { f+=1; w++; }
  f==1   { if(length>w) w=length; next; }
  f==2   { printf("%-"w"s",$0); getline<f2; print; }
' f2=file2 file1 file1

Note: file1 is quite intentionally read twice: the first time to find the maximum line length, and the second time to format each line for concatenation with the corresponding line from file2. file2 is read programmatically with getline; its name is supplied through awk's variable-as-an-arg feature (f2=file2).

Output:

hi             1
wonderful      2
amazing        3
sorry          4
superman       5
superhumanwith 6
loss           7
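The "variable-as-an-arg" feature is awk's command-line assignment operand: an argument of the form var=value is applied when awk reaches it in the operand list, so it is set before the following file is read (but, unlike -v, not yet in BEGIN). A minimal demonstration, assuming any POSIX awk; words.txt and tag are made-up names:

```shell
#!/bin/sh
printf '%s\n' x y > words.txt

# tag=first takes effect before the first pass over words.txt,
# tag=second before the second pass, so each pass sees a different value.
awk '{ print tag ":" $0 }' tag=first words.txt tag=second words.txt
```

This prints first:x, first:y, second:x and second:y on separate lines, which is why f2=file2 placed before file1 is already set when file1 is processed.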

To handle any number of input files, the following works. Note, however, that it does not cope with the same filename being repeated, i.e. each filename argument must refer to a different file. It can handle files of different lengths: beyond a file's EOF, spaces are used.

awk 'BEGIN{
  for(i=1; i<ARGC; i++) {
    while( (getline<ARGV[i])>0) {
      nl[i]++;
      if(length>w[i]) w[i]=length;
    }
    w[i]++;
    close(ARGV[i])
    if(nl[i]>nr) nr=nl[i];
  }
  for(r=1; r<=nr; r++) {
    for(f=1; f<ARGC; f++) {
      if(r<=nl[f]) getline<ARGV[f]; else $0=""
      printf("%-"w[f]"s",$0);
    }
    print ""
  }
}' file1 file2 file3 file4

Here is the output with 4 input files:

hi             1 cat   A
wonderful      2 hat   B
amazing        3 mat   C
sorry          4 moose D
superman       5       E
superhumanwith 6       F
loss           7       G
                       H
  • It worked, albeit with nawk instead of awk, as my box is Solaris. Commented Jun 12, 2015 at 18:36
  • Now how do I combine multiple files like this, let's say 4 files? Just repeat this process 3 times? Commented Jun 12, 2015 at 19:01
  • You've gotta read it twice, at least. It's why I tried grep: most greps will -quit at the first match in -q mode, and it was the sanest simple solution I could think of. But this is all-in-one, and exactly what was asked for to boot, as opposed to some hodge-podge. I like it. Commented Jun 12, 2015 at 19:20
  • Yes, I thought reading a file twice would irk quite a lot of people :) - I could have used an array, but this way has no potential to run out of memory, and I just liked the simplicity of no programmed loops. Commented Jun 12, 2015 at 21:37
  • With an array you would not only have to read it twice, but also copy it. Most modern systems will minimize the differences via disk cache anyway. I never liked arrays and usually prefer tmpfs for stuff like this. I can't tell the difference performance wise, and it's easier to use - writing/reading files is what I know. Commented Jun 12, 2015 at 22:16

Well, I found what I wanted myself. This works on Solaris 5.10:

paste file1 file2| pr -t -e$(awk 'n<length {n=length} END {print n+1}' file1)

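This stores the length of the longest line in the first file (plus one) and passes it to pr -e as the tab width, so the tab that paste puts between the columns is expanded far enough to line up the second column.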

Multi-file scenario

Provided we know which file will have the longest word, I would use that file name when calculating the length, and use paste to join the files. If file4 has the longest string, the solution would be:

paste file1 file2 file3 file4 | pr -t -e$(awk 'n<length {n=length} END {print n+1}' file4)

  • Do you need quotes around -e$(...) or are you splitting that intentionally? You also should be able to do pretty similarly pr -tm file[1234]. Commented Jun 12, 2015 at 20:08
  • You don't need to know which file will have the longest line. Use your exact solution, just run it on all files: paste file{1..4} | pr -t -e$(awk 'n<length {n=length} END {print n+1}' file{1..4}) Commented Jun 12, 2015 at 23:53
  • Will that work if the files have different names instead of file1, file2, file3 and file4? I guess not. Commented Jun 15, 2015 at 17:56
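On the last comment: the file names themselves don't matter, only that the same list is passed to both awk and paste. A hedged sketch wrapping the approach in a shell function; align_columns and the three sample files are hypothetical, and it assumes a POSIX shell plus a pr that accepts -e with a tab width (POSIX specifies -e[char][gap], and the answer above reports it working on Solaris 5.10):

```shell
#!/bin/sh
# align_columns (hypothetical name): paste any list of files side by side,
# using the longest line found in ANY of the files, plus one, as the tab width.
align_columns() {
    w=$(awk 'length > n { n = length } END { print n + 1 }' "$@")
    # paste separates columns with TABs; pr -e expands each TAB to the
    # next multiple of $w, and -t suppresses pr's page header and trailer.
    paste "$@" | pr -t -e"$w"
}

# Sample files with arbitrary names.
printf '%s\n' hi wonderful superhumanwith > words.txt
printf '%s\n' 1 2 3 > numbers.txt
printf '%s\n' cat hat mat > animals.txt

align_columns words.txt numbers.txt animals.txt
```

Because one tab width is used for every column, all columns become as wide as the longest line in any file; like the original, it also assumes the inputs contain no tabs of their own.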
