$\begingroup$

After using files that I downloaded from the SRA with fasterq-dump, I realize I am not 100% sure that I have all the data.

I noticed in my downstream analysis that I seem to be missing the .1 and .2 numbers in the code associated with individual reads with the same spot ID. Then I noticed that my initial fastq file doesn't seem to have those either.

So I'm wondering whether the --split-spot option that I used is doing what I want it to do, which is 'get all the sequence data', and how it differs from --concatenate-reads. I am afraid I did not understand this from the program description. Please help?

$\endgroup$
  • $\begingroup$ Hi Laura, do you have paired-end reads? Can you provide the SRA accession too? $\endgroup$ Commented Oct 24, 2019 at 11:35
  • $\begingroup$ @StupidWolf It looks like they're paired. Eg. SRR6462984.63.1 and SRR6462984.63.2 . $\endgroup$ Commented Oct 24, 2019 at 13:05

1 Answer 1

$\begingroup$

fasterq-dump ships with recent versions of sratools. According to the manual, the following two commands are equivalent:

fastq-dump SRRXXXXXX --split-3 --skip-technical

fasterq-dump SRRXXXXXX

If you run fastq-dump without --split-3 on paired-end reads, you get a single fastq file in which the forward and reverse reads of each spot are joined into one record (note length=301 below, i.e. 150 + 151):

fastq-dump SRR6462984.sra
more SRR6462984.fastq

You get:

@SRR6462984.1 1 length=301
NATCTGCCCGTTCCACATAAACACCGTTAAACATCGATAGAGCGAAATAAAGCTGCTGAGTGATGTGATGATAAAGCTTTTCCGTTCTGACACGATCTTCCATCTCGTCTATCATTGCATCGAGAGCGTCAGAGATCGCAAGCAGTGCGGATATTGCCGTCCGTCTTGAAGCTTTGGCAGAGCCAGGCGGAATTTGTTTATCGGATAGTGTTTATGCGCAAATTTAGCATATGATTTCCGACAACTTTTGCTTCATCGGGCCGCAAAGCGTGAAAAACAGTGTCGTCACCGTTGATGTGTG
+SRR6462984.1 1 length=301
#/AAAEE/EEEA/EAAEEAEE/EE/EEA/EEEEE/AAE/EE/EAEAE//EEAEE/EEEEE<EEEEE/EE<EE/AEEEE/EAEEEEA/E<EEAEAEEEEEEAEEEE<EEAEEEEEAAAEEE/EEEEE/E//AEEEE6EA66EE<<<EAAA/AAAAAE/E6E/EA/AE/AEEEEEEAAEEEEA6EEEEEEEEE/EEE/AEEAAAEEEAE/E/EEEEEA//E//EEE/EEE/A<//AE/EE//6A///AA/E//E/E/EE//AE//AAEEE////E/A<AAE/EEAE//A//6/A</<6/</E/
@SRR6462984.2 2 length=302
NTCAGCAGGCGCCGTAACTTCAGCAAAACTGTAGCTATAACTTTTCCGAACTTTATACCTAACTTCGCTTTCCTAGCACTGTATAAAATGCCTAAAAGAAATCAGCAGGCGCCGTCACTTCAAAAAATCAGTATGGATCAATCAGACGCGGAGCTTATTAAAACAGAGCTTATTAAAAATTATGTCTTTCTTCTTCTTTCAGCAATTCGCAACGTAAAATTCAAAATTTTGGACGCCGTAATTGCGCAGGAACCAATTCAAAGAGGCGGCGGCCAATTAATTCCAACGTCTCGAAGAGTGGC
+SRR6462984.2 2 length=302

With the latest version, fasterq-dump by default splits the paired-end sra file into separate forward and reverse fastq files (i.e. the behaviour of --split-3):

fasterq-dump SRR6462984.sra
head -2 SRR6462984_*.fastq

And you get:

==> SRR6462984_1.fastq <==
@SRR6462984.1 1 length=150
NATCTGCCCGTTCCACATAAACACCGTTAAACATCGATAGAGCGAAATAAAGCTGCTGAGTGATGTGATGATAAAGCTTTTCCGTTCTGACACGATCTTCCATCTCGTCTATCATTGCATCGAGAGCGTCAGAGATCGCAAGCAGTGCGG

==> SRR6462984_2.fastq <==
@SRR6462984.1 1 length=151
ATATTGCCGTCCGTCTTGAAGCTTTGGCAGAGCCAGGCGGAATTTGTTTATCGGATAGTGTTTATGCGCAAATTTAGCATATGATTTCCGACAACTTTTGCTTCATCGGGCCGCAAAGCGTGAAAAACAGTGTCGTCACCGTTGATGTGTG

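So if you need to get back the .1 and .2 format, use fastq-dump, which is still included in the latest installation; its -I (--readids) option appends the read number to the spot ID in each read name.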
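If you would rather keep the two files written by fasterq-dump and only need the read number in the names, a small post-processing step could add it. The following is a minimal sketch and not part of sratools: add_read_number is a hypothetical helper, it assumes plain four-line fastq records, and the sequences in the example are shortened for readability.

```python
def add_read_number(fastq_lines, read_number):
    """Yield fastq lines with '.<read_number>' appended to each spot ID.

    Assumes four lines per record: '@' header, sequence, '+' line, qualities.
    """
    for i, line in enumerate(fastq_lines):
        line = line.rstrip("\n")
        # Lines 0 and 2 of every record carry the read name; a bare '+' is left alone.
        if i % 4 in (0, 2) and len(line) > 1:
            name, sep, rest = line.partition(" ")
            line = f"{name}.{read_number}{sep}{rest}"
        yield line


# Shortened example record taken from the forward file (read number 1).
forward = [
    "@SRR6462984.1 1 length=150",
    "NATCTG",
    "+SRR6462984.1 1 length=150",
    "#/AAAE",
]

for out_line in add_read_number(forward, 1):
    print(out_line)
```

The position within the record, not the first character, decides which lines are renamed, because a quality line can itself start with '@'. For real files you would pass an open file handle for SRR6462984_1.fastq with read number 1, and SRR6462984_2.fastq with read number 2.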

I would suggest sticking to the newest format, with 2 files, because fastq-dump will be deprecated in the future.
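To check that nothing is missing from the two files, one option is to confirm that they hold the same number of records and that the spot IDs line up. The code below is a rough sketch under the assumption of four-line fastq records; spot_ids and check_pairs are hypothetical helper names, and the example records are shortened.

```python
from itertools import zip_longest


def spot_ids(fastq_lines):
    """Return the spot ID (first word of each '@' header, without the '@')."""
    return [
        line.split()[0][1:]
        for i, line in enumerate(fastq_lines)
        if i % 4 == 0  # first line of every four-line record
    ]


def check_pairs(forward_lines, reverse_lines):
    """Return (forward count, reverse count, list of mismatched ID pairs)."""
    fwd = spot_ids(forward_lines)
    rev = spot_ids(reverse_lines)
    # zip_longest pads the shorter file with None, so missing mates show up too.
    mismatched = [(f, r) for f, r in zip_longest(fwd, rev) if f != r]
    return len(fwd), len(rev), mismatched


# Shortened example: one record from each file.
fwd_example = ["@SRR6462984.1 1 length=150", "NATCTG", "+SRR6462984.1 1 length=150", "#/AAAE"]
rev_example = ["@SRR6462984.1 1 length=151", "ATATTG", "+SRR6462984.1 1 length=151", "AAAAAE"]

print(check_pairs(fwd_example, rev_example))
```

Equal counts and an empty mismatch list suggest every forward read has its mate. On real data you would pass the open file handles for SRR6462984_1.fastq and SRR6462984_2.fastq instead of the example lists.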

$\endgroup$