Fasterq-dump: --split-spot or --concatenate-reads?

Question

After using files that I downloaded from the SRA with fasterq-dump, I realize I am not 100% sure that I have all the data.

I noticed in my downstream analysis that I seem to be missing the .1 and .2 suffixes in the identifiers that distinguish individual reads with the same spot ID. Then I noticed that my initial fastq file doesn't seem to have those either.

So I'm wondering whether the --split-spot option that I used is doing what I want it to do, which is 'get all the sequence data', and how it differs from --concatenate-reads. I am afraid I did not understand this from the program description. Please help?

Hi Laura, do you have paired-end reads? Can you provide the SRA accession too?
– StupidWolf, Commented Oct 24, 2019 at 11:35
@StupidWolf It looks like they're paired, e.g. SRR6462984.63.1 and SRR6462984.63.2.
– Laura, Commented Oct 24, 2019 at 13:05

Community · Accepted Answer · 2020-06-18 08:30:41Z

fasterq-dump comes with the latest version of the SRA Toolkit (sra-tools). If you check the manual, it gives this equivalence:

fastq-dump SRRXXXXXX --split-3 --skip-technical

fasterq-dump SRRXXXXXX

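In older versions of the SRA Toolkit, if you use fastq-dump without specifying --split-3 for paired-end reads, you get a single file. In the output below, the two mates of each spot are joined into one record (hence length=301 rather than 150 and 151), and the number after the accession is the spot number: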

fastq-dump SRR6462984.sra
more SRR6462984.fastq

You get:

@SRR6462984.1 1 length=301
NATCTGCCCGTTCCACATAAACACCGTTAAACATCGATAGAGCGAAATAAAGCTGCTGAGTGATGTGATGATAAAGCTTTTCCGTTCTGACACGATCTTCCATCTCGTCTATCATTGCATCGAGAGCGTCAGAGATCGCAAGCAGTGCGGATATTGCCGTCCGTCTTGAAGCTTTGGCAGAGCCAGGCGGAATTTGTTTATCGGATAGTGTTTATGCGCAAATTTAGCATATGATTTCCGACAACTTTTGCTTCATCGGGCCGCAAAGCGTGAAAAACAGTGTCGTCACCGTTGATGTGTG
+SRR6462984.1 1 length=301
#/AAAEE/EEEA/EAAEEAEE/EE/EEA/EEEEE/AAE/EE/EAEAE//EEAEE/EEEEE<EEEEE/EE<EE/AEEEE/EAEEEEA/E<EEAEAEEEEEEAEEEE<EEAEEEEEAAAEEE/EEEEE/E//AEEEE6EA66EE<<<EAAA/AAAAAE/E6E/EA/AE/AEEEEEEAAEEEEA6EEEEEEEEE/EEE/AEEAAAEEEAE/E/EEEEEA//E//EEE/EEE/A<//AE/EE//6A///AA/E//E/E/EE//AE//AAEEE////E/A<AAE/EEAE//A//6/A</<6/</E/
@SRR6462984.2 2 length=302
NTCAGCAGGCGCCGTAACTTCAGCAAAACTGTAGCTATAACTTTTCCGAACTTTATACCTAACTTCGCTTTCCTAGCACTGTATAAAATGCCTAAAAGAAATCAGCAGGCGCCGTCACTTCAAAAAATCAGTATGGATCAATCAGACGCGGAGCTTATTAAAACAGAGCTTATTAAAAATTATGTCTTTCTTCTTCTTTCAGCAATTCGCAACGTAAAATTCAAAATTTTGGACGCCGTAATTGCGCAGGAACCAATTCAAAGAGGCGGCGGCCAATTAATTCCAACGTCTCGAAGAGTGGC
+SRR6462984.2 2 length=302
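A quick way to check whether a FASTQ file holds joined mates is to print the sequence length of each record. The following is only a minimal sketch: the function name `seq_lengths` is made up for illustration, and it assumes plain four-line FASTQ records (no wrapped sequences) with headers like the ones above.

```shell
# seq_lengths FILE: print "<spot-id> <sequence length>" for each FASTQ record.
# Hypothetical helper; assumes exactly four lines per record.
seq_lengths() {
    awk '
        NR % 4 == 1 { id = substr($1, 2) }      # header line: drop the leading "@"
        NR % 4 == 2 { print id, length($0) }    # sequence line: report its length
    ' "$1"
}

# Example call (file name taken from the answer above):
#   seq_lengths SRR6462984.fastq | head
```

If the lengths are roughly twice the expected read length, as with the 301 and 302 shown here, each record most likely contains both mates back to back.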

With the latest version, fasterq-dump by default splits a paired-end SRA file into forward and reverse FASTQ files (i.e. --split-3):

fasterq-dump SRR6462984.sra
head -2 SRR6462984_*.fastq

And you get

==> SRR6462984_1.fastq <==
@SRR6462984.1 1 length=150
NATCTGCCCGTTCCACATAAACACCGTTAAACATCGATAGAGCGAAATAAAGCTGCTGAGTGATGTGATGATAAAGCTTTTCCGTTCTGACACGATCTTCCATCTCGTCTATCATTGCATCGAGAGCGTCAGAGATCGCAAGCAGTGCGG

==> SRR6462984_2.fastq <==
@SRR6462984.1 1 length=151
ATATTGCCGTCCGTCTTGAAGCTTTGGCAGAGCCAGGCGGAATTTGTTTATCGGATAGTGTTTATGCGCAAATTTAGCATATGATTTCCGACAACTTTTGCTTCATCGGGCCGCAAAGCGTGAAAAACAGTGTCGTCACCGTTGATGTGTG
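Since the worry in the question is whether all the data is there, it can help to confirm that the two files list the same spots in the same order. The sketch below is one possible check, not part of the SRA Toolkit; the names `spot_ids` and `check_pairs` are hypothetical, and it assumes four-line FASTQ records. It holds all IDs in memory, so it suits a quick check more than a very large run.

```shell
# spot_ids FILE: print the spot ID (first header field without "@") of each record.
spot_ids() {
    awk 'NR % 4 == 1 { print substr($1, 2) }' "$1"
}

# check_pairs FILE1 FILE2: report whether both files list the same spots in the same order.
check_pairs() {
    if [ "$(spot_ids "$1")" = "$(spot_ids "$2")" ]; then
        echo "paired"
    else
        echo "mismatch"
    fi
}

# Example call (file names taken from the output above):
#   check_pairs SRR6462984_1.fastq SRR6462984_2.fastq
```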

So if you need to get back the .1 and .2 format, use fastq-dump from the latest installation.
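If a downstream tool insists on names like SRR6462984.63.1 and SRR6462984.63.2, another option is to add the read number to the split files yourself. This is a minimal sketch under stated assumptions: `add_read_suffix` is a hypothetical helper, the input is four-line FASTQ, and the spot ID is the first whitespace-separated field of the header. As far as I know, names of this form come from fastq-dump's --readids option, but treat that as an assumption and check `fastq-dump --help` for your version.

```shell
# add_read_suffix FILE N: append ".N" to the spot ID on each header line.
# Hypothetical helper; assumes exactly four lines per record.
add_read_suffix() {
    awk -v n="$2" '
        # Header line, e.g. "@SRR6462984.63 63 length=150"
        NR % 4 == 1 { $1 = $1 "." n }
        # "+" line: only rename it when it repeats the header
        NR % 4 == 3 && length($1) > 1 { $1 = $1 "." n }
        { print }
    ' "$1"
}

# Example calls (output file names are made up):
#   add_read_suffix SRR6462984_1.fastq 1 > SRR6462984_1.renamed.fastq
#   add_read_suffix SRR6462984_2.fastq 2 > SRR6462984_2.renamed.fastq
```

Sequence and quality lines are passed through untouched, so only the identifiers change.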

I would suggest sticking to the newest format, with 2 files, because fastq-dump will be deprecated in the future.

Can you have a look at this: bioinformatics.stackexchange.com/questions/21407/…
– PesKchan, Commented Aug 9, 2023 at 3:37
