The sample text file looks like this:

    ID Z4WTH3_9ACTN Unreviewed; 182 AA.
    AC Z4WTH3; A0SD0SDF;
    AC Z12SDFG3; ADFFGDF;
    DT 11-JUN-2014, integrated into UniProtKB/TrEMBL.
    SQ SEQUENCE 182 AA; 20675 MW; B85D18AC3B1F0E75 CRC64;
    MNFLEYNKDE KLHFNYKKSC GLWLIVVALI IFAATVIGGK QIINMSVFSF GYVAAFLSIN
    //
    ID Z4WXU8_9ACTN Unreviewed; 203 AA.
    AC Z4WXU8;
    AC QWERDFV1;
    DT 11-JUN-2014, integrated into UniProtKB/TrEMBL.
    SQ SEQUENCE 203 AA; 23224 MW; 35F1AE4342F6B3AC CRC64;
    MDCKSIRSEV LWQVVRLREK LMNFLEYNKD EKLCFNYKKS CGLWLIVVAL IIFAATVIGG
    //
    ID Z9JHX1_9GAMM Unreviewed; 132 AA.
    AC Z9JHX1;
    SQ SEQUENCE 132 AA; 13880 MW; 0E09988C0F3ED155 CRC64;
    MKISVDTNVL ARAVLQDDAN QGRSASTLLK DASLIAVSLP CLCELVWILS RGAKLSKEDV
    //

The actual file is 100 GB. Each record contains exactly one "ID" line, always starts with that line, and ends with a "//" line.
A record may contain multiple "AC" lines. The filename must be the first element of the first "AC" line.
I need to split this file into multiple files at each "//" line. Each output file should be named after the first accession on the record's first "AC" line.
So the output files will look like this:
Z4WTH3.txt

    ID Z4WTH3_9ACTN Unreviewed; 182 AA.
    AC Z4WTH3; A0SD0SDF;
    AC Z12SDFG3; ADFFGDF;
    DT 11-JUN-2014, integrated into UniProtKB/TrEMBL.
    SQ SEQUENCE 182 AA; 20675 MW; B85D18AC3B1F0E75 CRC64;
    MNFLEYNKDE KLHFNYKKSC GLWLIVVALI IFAATVIGGK QIINMSVFSF GYVAAFLSIN
    //

Z4WXU8.txt

    ID Z4WXU8_9ACTN Unreviewed; 203 AA.
    AC Z4WXU8;
    AC QWERDFV1;
    DT 11-JUN-2014, integrated into UniProtKB/TrEMBL.
    SQ SEQUENCE 203 AA; 23224 MW; 35F1AE4342F6B3AC CRC64;
    MDCKSIRSEV LWQVVRLREK LMNFLEYNKD EKLCFNYKKS CGLWLIVVAL IIFAATVIGG
    //

Z9JHX1.txt

    ID Z9JHX1_9GAMM Unreviewed; 132 AA.
    AC Z9JHX1;
    SQ SEQUENCE 132 AA; 13880 MW; 0E09988C0F3ED155 CRC64;
    MKISVDTNVL ARAVLQDDAN QGRSASTLLK DASLIAVSLP CLCELVWILS RGAKLSKEDV
    //
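
One possible way to do the split described above is to stream the input line by line, so that only one record is held in memory at a time. The following is a minimal Python sketch, not a tuned solution. It assumes every field sits on its own line, that the "AC" and "//" markers appear at the very start of a line, and that accessions are plain ASCII. The names `iter_records`, `first_accession` and `split_file` are hypothetical helpers introduced here for illustration.

```python
import os
from typing import Iterable, Iterator, List


def iter_records(lines: Iterable[bytes]) -> Iterator[List[bytes]]:
    """Group lines into records; a record ends at a line starting with '//'."""
    record: List[bytes] = []
    for line in lines:
        record.append(line)
        if line.startswith(b"//"):
            yield record
            record = []
    # Any lines after the last "//" terminator are not yielded.


def first_accession(record: List[bytes]) -> str:
    """Return the first element of the first AC line, e.g. 'Z4WTH3'."""
    for line in record:
        if line.startswith(b"AC"):
            # b"AC Z4WTH3; A0SD0SDF;" -> "Z4WTH3"
            return line[2:].split(b";")[0].strip().decode("ascii")
    raise ValueError("record has no AC line")


def split_file(path: str, out_dir: str) -> int:
    """Write each record to <out_dir>/<accession>.txt; return the record count."""
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    # Binary mode avoids decoding 100 GB of text; iterating the file object
    # reads one line at a time, so memory use stays small.
    with open(path, "rb") as src:
        for record in iter_records(src):
            name = first_accession(record) + ".txt"
            with open(os.path.join(out_dir, name), "wb") as dst:
                dst.writelines(record)
            count += 1
    return count
```

It could be called as `split_file("input.dat", "out")`, where both paths are placeholders. With a 100 GB input the number of output files may be very large, and a single directory holding that many files can be slow on some filesystems, so spreading the files over subdirectories (for example by accession prefix) may be worth considering.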