I have a file with a DNA sequence identifier on one line and the DNA sequence on the line directly below it. Each DNA sequence is long, but it sits on a single line.
File1.fasta:
>AB244308.1.1447 233_28379 1..292-----------------------------------------------------------------------------------------------------------------------------------------------------GTGCCAG-C-C-G-C-CGC-GGTAATAC-GG-AGGAT-GCG-A-GCG-TTATC-CGG-ATTCATT-GG-GT-TTA--AAGGGTGCGCAGG-C-G-G-GCGT-A-T------------------------------------AA----G-T-C-A-----------------------------------------------------G-G-G--G--TG--A-AA-TG--C-C-AC-G-G---------------------------------------------------------------------------------------------------------------------------------------CT-C-AA----------------------------------------------------------------------------------------------------------------------------------------------------------------C-C-G-T-G-G-A--A-C----T-G--C-C---T--T----------------------------T--GA-T-A---C----------------------------------------------------T--G-T--AT--G-T-C----------------------------------------------------------------------------------------------------------------------------------T-T-G-A-G-T--T-----T-AG------TT-G-A---------------------A-G-T-G---GG-C---------------------------------------------------------------------------------------------------------------------------------------GG--A--ATG------------------------------------------------------------------------------------------------------------------------------------T-A-G-C-AT--GT-A-G-CG-GT--G--------------A--A-A---------------------------------------------------------------------------------------------------TG-C-AT-AG--AG-A-TG-------------------------------C-T------A-C------A-G-A-AC-A-CC------------------------------------------------GA--T--A--GC-GAA-G--G-C----A--------G--C-T-C-A---CTA---------A--GT-T-A-----------------------------------------------------------------------------------------------------------------------------------------A-G--------A-C-T--GA--CG-----C---------------------------------------------TC--A-TG--C-A-CG-A--AA-G-C----G-TG--GG-G-AT-C-A-AA-CA--GG-AT--------TA-G-ATA--------CC-C-C-C-GT
A--GT-C-C-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
There are about 112,000 sequences in this file that follow that format. I have about 20 sequence identifiers that I'd like to pull from the fasta file and save to another file.
The sequence identifiers are in a txt file like this:
File2.txt:
AB244308.1.1447
New.ReferenceOTU151
New.CleanUp.ReferenceOTU19
New.ReferenceOTU59
New.CleanUp.ReferenceOTU6
In addition to pulling lines with the sequence identifiers, I'd like to pull the following line with the DNA sequence as well and print all of this to a new text file.
I've found through this answer (How to extract lines from a text file that contains strings from a list in another file?) that I would need to use grep and sed. I have also found another answer (https://stackoverflow.com/questions/7103531/how-to-get-the-part-of-file-after-the-line-that-matches-grep-expression-first) relevant to getting the line after the grep match.
Unfortunately, I am unsure how to proceed in combining these answers to get what I want.
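To make the goal concrete, here is a minimal sketch of one possible approach. It uses awk rather than the grep and sed combination from the linked answers, so it is a swapped-in technique and not what those answers do. It assumes File2.txt holds one identifier per line and that an identifier is the first whitespace-delimited token after the ">" on a header line. The toy files and the output name selected.fasta are made up for illustration, and the sequences are shortened stand-ins.

```shell
#!/bin/sh
# Toy stand-ins for File1.fasta and File2.txt (hypothetical, shortened data).
dir=$(mktemp -d)
printf '%s\n' \
  '>AB244308.1.1447 233_28379 1..292' \
  '--GTGCCAG-C-C-G' \
  '>New.ReferenceOTU15 toy' \
  'AAAA' \
  '>New.ReferenceOTU151 toy' \
  'CC--GG' > "$dir/File1.fasta"
printf '%s\n' 'AB244308.1.1447' 'New.ReferenceOTU151' > "$dir/File2.txt"

# While reading the first file (the ID list), remember each non-empty ID.
# While reading the second file, every header line sets a flag saying whether
# its first field, minus the leading ">", is a wanted ID. Lines are printed
# for as long as the flag is set, so each header brings its sequence along.
awk 'NR == FNR { if (NF) ids[$1]; next }
     /^>/      { keep = (substr($1, 2) in ids) }
     keep' "$dir/File2.txt" "$dir/File1.fasta" > "$dir/selected.fasta"

cat "$dir/selected.fasta"
```

Because the identifier is compared as a whole token, New.ReferenceOTU15 is not picked up when only New.ReferenceOTU151 is requested. A grep-based variant such as grep -A 1 -w -F -f File2.txt File1.fasta may also work, but GNU grep prints "--" separator lines between non-adjacent matches, and those would need to be filtered out afterwards.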