terdon

I know you said you didn't want a Perl or Python solution, but it might be useful to someone else (and you really, really should learn one of those languages if you are doing bioinformatics).

perl -ane '$f=$F[0].$F[1]; print "$k{$f}$_" if $k{$f}; $k{$f}=$_;' file1 file2 

EXPLANATION:

The -a option causes Perl to split each input line on whitespace into the @F array, -n means "read the input files line by line", and -e means "run the script I give on the command line".

So, $f is set to the concatenation of the first ($F[0]) and second ($F[1]) fields. $k{$f}=$_ saves the current line ($_) as the value in a hash (Perl's associative array) called k, under the key $f. As we read through the files, if $k{$f} already holds a value, we print that saved line followed by the current one. In other words, if we have already seen a line with the same first two fields, print that line and the current one.
