Variant calling on amplicon sequencing data

Question

I've got some human amplicon sequencing data which I know contains variants, but I'm trying to work out the best way to call the variants, as most variant callers seem set up for whole genome data. I've got a mix of short read and long read (ONT and PacBio) data for the same amplicon.

The amplicon is at chr15:30904775-30905567, and you can see the indexed and sorted bam in the attached IGV screenshot. It contains an indel in the centre and a heterozygous SNP to the right. The bam was generated using minimap2 as follows:

minimap2 -ax map-ont -R "$rg" "$ref" "$fastq" | samtools sort -o "$out/${sample_name}.bam"

Can anyone suggest the best way to call variants, including indels and SNPs, in my amplicon. I want to integrate it into a flexible pipeline.

It appears all your reads have a very low mapping quality (mapq), judging by their white color in IGV. Did you align to a reference genome that includes _alt and _fix chromosomes? This amplicon also 'exists' on chr15_KN538374v1_fix and chr15_KI270905v1_alt (based on blatting). See also this blog post: lh3.github.io/2017/11/13/which-human-reference-genome-to-use — Wouter De Coster
– Wouter De Coster, Commented Mar 11 at 8:01
I downloaded it from UCSC- wget hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz gunzip hg38.fa.gz — Mike
– Mike, Commented Mar 11 at 8:47
I don’t suppose you have any suggestions for variant calling? Thanks — Mike
– Mike, Commented Mar 11 at 9:02
Your mapping quality will be a problem. Fix that before you think about variant calling. — Wouter De Coster
– Wouter De Coster, Commented Mar 12 at 10:13
Your fear that most variant calling tools are designed for WGS is not true at all. All of the standard tools can deal with targeted sequencing since that is the most common type of sequencing carried out, by several orders of magnitude. Any chance you could share the bam file with us? That does look like it's very bad quality. Have you tried any variant callers? What went wrong? — terdon
– terdon ♦, Commented Mar 12 at 12:54

swbarnes2 · Accepted Answer · 2025-03-11 17:58:55Z

There isn't necessarily one "best" way to call variants. There are a lot of tools.

Are you looking for variants that might be a very small percentage of your sample? Or are you figuring that all those little discrepancies are errors?

Since you're looking at amplicons, I'd align to the expected sequence, not the whole genome.

If you want a software to try, I'd try FreeBayes.

Mike · Accepted Answer · 2025-03-13 09:26:22Z

0

Thanks everyone. We resolved this issue using by realigning with a reference without _alt and using deepvariant with region '15:30904775-30905567'

edited Mar 13 at 9:26

answered Mar 13 at 8:54

Mike

2171 silver badge6 bronze badges

$\begingroup$ And you realigned to a reference without _alt and _fix contigs. Many people here have provided suggestions, which eventually led you to solve the problem. So it wouldn't hurt to give some credit where credit is due. But happy to help regardless. $\endgroup$

Wouter De Coster
– Wouter De Coster

2025-03-13 09:24:58 +00:00
Commented Mar 13 at 9:24

Add a comment |

Stack Exchange Network

Variant calling on amplicon sequencing data

2 Answers 2

Hot Network Questions

Variant calling on amplicon sequencing data

2 Answers 2

Related

Hot Network Questions