I am trying to print each set of two lines (a header and its sequence) that does not have a corresponding pair. Ultimately, I want to remove these lines.
Example:
NM00123_rn5_0_1_2
XXXXXXXXXXXXXXXXXXXXXXXXXXX
NM00123_mm10_0_1_2
XXXXXXXXXXXXXXXXXXXXXXXXXXXX
NM00124_rn5_0_1_3
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
NM00124_mm10_0_1_3
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
NM00125_rn5_0_1_4
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
NM00126_rn5_0_1_5
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRr
NM00126_mm10_0_1_5
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR

The lines starting with NM are headers, and the line after each header is a sequence of letters. The two headers in a pair match in all positions except for rn5 versus mm10. I want to retain only the sets of four lines where the parts of the NM header before and after rn5 and mm10 match. So in the above example, the rn5 header on line 1 matches the mm10 header on line 3, so keep those. But the rn5 header on line 9 does not have a corresponding mm10 header, so print both that header and the following sequence line. In the end, I want a file with an equal number of rn5 and mm10 entries.
I am very new to using Unix and would really appreciate help to do this. Thank you.
Expected outcome:
All of the above entries except the entry without a corresponding pair, which in this case is:
NM00125_rn5_0_1_4
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
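To make the pairing rule concrete, here is a rough sketch of the logic in Python. It is only an illustration under some assumptions: the file strictly alternates header line and sequence line with no blank lines, each sequence sits on a single line, and the only species tags are rn5 and mm10. The names `HEADER` and `split_pairs` are made up for this sketch.

```python
import re

# Assumed header layout: <prefix>_<rn5 or mm10>_<suffix>, e.g. NM00123_rn5_0_1_2
HEADER = re.compile(r"^(?P<prefix>.+)_(?P<species>rn5|mm10)_(?P<suffix>.+)$")


def split_pairs(lines):
    """Return (paired, unpaired) lists of (header, sequence) records."""
    # Group consecutive lines into (header, sequence) records.
    records = list(zip(lines[0::2], lines[1::2]))

    # For each pairing key (prefix, suffix), collect the species tags seen.
    seen = {}
    for header, _ in records:
        m = HEADER.match(header)
        if m:
            key = (m["prefix"], m["suffix"])
            seen.setdefault(key, set()).add(m["species"])

    paired, unpaired = [], []
    for header, seq in records:
        m = HEADER.match(header)
        key = (m["prefix"], m["suffix"]) if m else None
        # A record is kept only if both rn5 and mm10 exist for its key.
        if key is not None and seen[key] == {"rn5", "mm10"}:
            paired.append((header, seq))
        else:
            unpaired.append((header, seq))
    return paired, unpaired


# Shortened version of the example above (sequences abbreviated).
example = [
    "NM00123_rn5_0_1_2", "XXXX",
    "NM00123_mm10_0_1_2", "XXXX",
    "NM00125_rn5_0_1_4", "zzzz",
    "NM00126_rn5_0_1_5", "RRRR",
    "NM00126_mm10_0_1_5", "RRRR",
]
paired, unpaired = split_pairs(example)
for header, seq in unpaired:
    print(header)
    print(seq)
```

Writing out the `paired` list instead of `unpaired` would give the cleaned file with equal numbers of rn5 and mm10 entries, assuming no header appears more than once.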