Jiao

awk/sed: split a cluster file into multiple files

I have a cluster FASTA file (named file) which looks like this:

    >1AB2
    >1AB2 AA
    NWWIEUNJRNIBGOWNGIOWGRBIGBRGRIOWGI
    NCIDHFR8EHGBVPIWOBGIGRI
    >1AB3 AA
    WNIOREHUEBRGOUERGHBERGIORBGREUGEGO
    NWFWRUBGREOUEREOBRIOBNERIOBN
    >1SC4 AA
    WNIOREHUEBRGOUERGHBERGIORBGREUGEGO
    NWFWRUBGREOUEREOBRIOBNERIOBN
    >2CD5 AA
    WNIOREHUEBRGOUERGHBERGIORBGREUGEGO
    NWFWRUBGREOUEREOBRIOBNERIOBN
    >2AC6
    >2AC6 AA
    NFIGEURHGEIROHEGHTUTJGENLJBBEOWRIU
    NFIROUHBOERVERUGBERUOVREOIBROEBVUE
    NVHIRE
    >2ONM AA
    BUCIEHBUORBREOBWQVURVELLAJFLHIEBGR
    NHEIBVEURIGBVNRIHEOEAJVSJDNHVUGBVR
    NEBIBVVBRU
    >2POD AA
    BUFEWIBOEUWBWOREBRIUBGUERIGBVOSRIP
    BUEIBVEO
    >7KZL
    >7KZL AA
    BUIREBVAUREVBREOIRGPNJBFDVERUBVROR
    >6HG3
    >6GH3 AA
    NBVUIREVOIAWRHRUGRTYUVDNJKDFHUGSEI
    FHUIERBLUUIREB
    >6GH4 AA
    BDFUIGEVUERERHOBERIHBSDLKFJBNIERIH
    NFHILRUGAURHG

The above file has 4 clusters: 1AB2, 2AC6, 7KZL, and 6GH3. Everything between the first >1AB2 and the first >2AC6 belongs to cluster 1AB2; everything between the first >2AC6 and the first >7KZL belongs to cluster 2AC6, and so on.

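I want to split the file into 4 files, one per cluster, with each output file starting at the second >XXXX line of its cluster (the first >XXXX line, which only names the cluster, is dropped). Each file should look like: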

file_1

    >1AB2 AA
    NWWIEUNJRNIBGOWNGIOWGRBIGBRGRIOWGI
    NCIDHFR8EHGBVPIWOBGIGRI
    >1AB3 AA
    WNIOREHUEBRGOUERGHBERGIORBGREUGEGO
    NWFWRUBGREOUEREOBRIOBNERIOBN
    >1SC4 AA
    WNIOREHUEBRGOUERGHBERGIORBGREUGEGO
    NWFWRUBGREOUEREOBRIOBNERIOBN
    >2CD5 AA
    WNIOREHUEBRGOUERGHBERGIORBGREUGEGO
    NWFWRUBGREOUEREOBRIOBNERIOBN

file_2

    >2AC6 AA
    NFIGEURHGEIROHEGHTUTJGENLJBBEOWRIU
    NFIROUHBOERVERUGBERUOVREOIBROEBVUE
    NVHIRE
    >2ONM AA
    BUCIEHBUORBREOBWQVURVELLAJFLHIEBGR
    NHEIBVEURIGBVNRIHEOEAJVSJDNHVUGBVR
    NEBIBVVBRU
    >2POD AA
    BUFEWIBOEUWBWOREBRIUBGUERIGBVOSRIP
    BUEIBVEO

file_3

    >7KZL AA
    BUIREBVAUREVBREOIRGPNJBFDVERUBVROR

file_4

    >6GH3 AA
    NBVUIREVOIAWRHRUGRTYUVDNJKDFHUGSEI
    FHUIERBLUUIREB
    >6GH4 AA
    BDFUIGEVUERERHOBERIHBSDLKFJBNIERIH
    NFHILRUGAURHG
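
One possible way to do this split, offered as a minimal sketch rather than a definitive solution: treat any line that holds only a `>ID` token as the start of a new cluster, and send every following line to the next numbered output file. This assumes that in the real file each header and each sequence chunk sits on its own line, and that the bare cluster line has nothing after the ID while member headers carry a second field such as `AA`. The input name `sample.fa` and its shortened contents are hypothetical, used only so the example is self-contained; the `file_N` output names follow the question.

```shell
# Hypothetical, shortened sample input (an assumption, not the real data).
cat > sample.fa <<'EOF'
>1AB2
>1AB2 AA
NWWIEUNJRNIBGOWNGIOWGRBIGBRGRIOWGI
>1AB3 AA
WNIOREHUEBRGOUERGHBERGIORBGREUGEGO
>2AC6
>2AC6 AA
NFIGEURHGEIROHEGHTUTJGENLJBBEOWRIU
EOF

# A line with exactly one field that starts with ">" is a bare cluster marker:
# close the previous output file, switch to the next file_N, and skip the marker.
# All other lines are written to the current cluster's file.
awk '
  NF == 1 && /^>/ { if (out) close(out); out = "file_" (++n); next }
  out             { print > out }
' sample.fa
```

Because the rule keys on the shape of the line rather than on the cluster name, it does not need the bare marker and the first member header to carry the same ID. Lines that appear before the first marker are ignored.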