I have a really big file that looks like this:
>name1
ACGTACGTACGT
ACGTACGTACGT
ACGTACGTACGT
ACGTACGTACGT
>name2
ACGTACGTACGT
ACGTACGTACGT
ACGTACGTACGT
>name
ACGTACGTACGT
ACGTACGTACGT
ACGTACGTACGT
ACGTACGTACGT
ACGTACGTACGT
ACGTACGTACGT
ACGTACGTACGT
ACGTACGTACGT
>name4
ACGTACGTACGT
ACGTACGTACGT
ACGTACGTACGT
ACGTACGTACGT
ACGTACGTACGT

It is a fasta file. It has about 3183 lines that start with > (3183 names), each followed by a varying number of lines of ACGTs. I want to split it into smaller files, each containing 250 >s together with the ACGT lines that follow them. If the last file has fewer than 250 >s, that is fine; I would still like to keep it.

So far I have tried split, which I don't think is appropriate here, since it splits the file so that each small file contains a single >. I also tried awk:
awk -F'>' 'NR==1{f=0;c=1}NR>1{ c++ if($((c%250))==0) { fn="file"c".fasta"; print > fn} }' kmer_subtraction/kmercollection.fasta

I am not sure if this works because I cannot see my file. Could you please help me with this? Thank you!
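
For reference, the split described above could be sketched with awk along the following lines. This is only a minimal sketch, not the command from the attempt above: the demo file `demo.fasta`, the `chunk_NNN.fasta` output names, and the variable `n` are all made up for illustration, and it assumes every record starts with a line beginning with `>`. It counts header lines and opens a new output file each time `n` records have been written.

```shell
#!/bin/sh
# Demo on a tiny made-up FASTA file in a scratch directory.
# demo.fasta and the chunk_NNN.fasta names are hypothetical.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

cat > demo.fasta <<'EOF'
>name1
ACGTACGTACGT
ACGTACGTACGT
>name2
ACGTACGTACGT
>name3
ACGTACGTACGT
ACGTACGTACGT
ACGTACGTACGT
>name4
ACGTACGTACGT
>name5
ACGTACGTACGT
EOF

# n = number of records (header lines) per output file.
# It is 2 here so the demo produces several files; use n=250 for the real data.
awk -v n=2 '
    /^>/ {                          # a header line starts a new record
        if (count % n == 0) {       # n records written: switch to a new file
            if (out != "") close(out)
            out = sprintf("chunk_%03d.fasta", ++part)
        }
        count++
    }
    out != "" { print > out }       # write header and sequence lines to the current file
' demo.fasta

ls chunk_*.fasta
```

On the real data, the same awk program would be run with `-v n=250` and `kmer_subtraction/kmercollection.fasta` in place of `demo.fasta`. With 3183 names that should give 13 files, the last holding the remaining 183 records. Counting headers with `grep -c '^>' chunk_*.fasta` is a quick way to check the result.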