I am trying to extract lines from file genome.gff that contain a line from file suspicious.txt. suspicious.txt was derived from genome.gff and every line should match.
Using grep on a single line from suspicious.txt works as expected:
grep 'gene10002' genome.gff NC_007082.3 Gnomon gene 1269632 1273520 . + . ID=gene10002;Dbxref=BEEBASE:GB54789,GeneID:409846;Name=bur;gbkey=Gene;gene=bur;gene_biotype=protein_coding NC_007082.3 Gnomon mRNA 1269632 1273520 . + . ID=rna21310;Parent=gene10002;Dbxref=GeneID:409846,Genbank:XM_393336.5,BEEBASE:GB54789;Name=XM_393336.5;gbkey=mRNA;gene=bur;product=burgundy;transcript_id=XM_393336.5 But every variation on using grep from a file that I've been able to think of or find online produces no output or an empty file:
grep -f suspicious.txt genome.gff grep -F -f suspicious.txt genome.gff while read line; do grep "$line" genome.gff; done<suspicious.txt while read line; do grep '$line' genome.gff; done<suspicious.txt while read line; do grep "${line}" genome.gff; done<suspicious.txt cat suspicious.txt | while read line; do grep '$line' genome.gff; done cat suspicious.txt | while read line; do grep '$line' genome.gff >> suspicious.gff; done cat suspicious.txt | while read line; do grep -e "${line}" genome.gff >> suspicious.gff; done cat "$(cat suspicious_bee_geneIDs_test.txt)" | while read line; do grep -e "${line}" genome.gff >> suspicious.gff; done Running it as a script also produces an empty file:
#!/bin/bash SUSP=$1 GFF=$2 while read -r line; do grep -e "${line}" $GFF >> suspicious_bee_genes.gff done<$SUSP This is what the files look like:
head genome.gff ##gff-version 3 #!gff-spec-version 1.21 #!processor NCBI annotwriter #!genome-build Amel_4.5 #!genome-build-accession NCBI_Assembly:GCF_000002195.4 ##sequence-region NC_007070.3 1 29893408 ##species http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=7460 NC_007070.3 RefSeq region 1 29893408 . + . ID=id0;Dbxref=taxon:7460;Name=LG1;gbkey=Src;genome=chromosome;linkage- group=LG1;mol_type=genomic DNA;strain=DH4 NC_007070.3 Gnomon gene 181 211962 . - . ID=gene0;Dbxref=BEEBASE:GB42164,GeneID:726912;Name=cort;gbkey=Gene;gene=cort;gene_biotype=protein_coding NC_007070.3 Gnomon mRNA 181 71559 . - . ID=rna0;Parent=gene0;Dbxref=GeneID:726912,Genbank:XM_006557348.1,BEEBASE:GB42164;Name=XM_006557348.1;gbkey=mRNA;gene=cort;product=cortex%2C transcript variant X2;transcript_id=XM_006557348.1 wc -l genome.gff 457742 head suspicious.txt gene10002 gene1001 gene1003 gene10038 gene10048 gene10088 gene10132 gene10134 gene10181 gene10209 wc -l suspicious.txt 928 Does anyone know what's going wrong here?
genestring in suspicious.txt produce the expected result for me with your script.