Why do the variants differ between the after-BQSR BAM and the after-HaplotypeCaller BAM?
Dear GATK team,
Hi, I have followed the Best Practices (GATK-3.7) to call germline variants in my samples, which come from a case-control study of ~500 samples in total.
I ran BQSR, PrintReads, and then HaplotypeCaller as shown below:
java -jar $GATK/GenomeAnalysisTK.jar -T BaseRecalibrator -R $Reference -knownSites $dbSNP138 -knownSites $Mills -knownSites $oneKGindels -nct 8 -I $Output/$1.sort.dup.ir.bam -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov ContextCovariate -o $Output/$1.recal.data.grp -L $Interval -ip 100
java -jar $GATK/GenomeAnalysisTK.jar -T PrintReads -nct 8 -R $Reference -I $Output/$1.sort.dup.ir.bam -BQSR $Output/$1.recal.data.grp -o $Output/$1.sort.dup.ir.BQSR.bam
java -jar $GATK/GenomeAnalysisTK.jar -T HaplotypeCaller -R $Reference -I $Input/$1.sort.dup.ir.BQSR.bam -o $Output/$1.hc.vcf.gz -L chr14:92537200-92537700 -bamout $Output/$1.bamout.bam
When I compared the after-BQSR BAM with the after-HC BAM (the -bamout output) in the region chr14:92537200-92537700 using IGV, I noticed that the two BAMs look different, especially for indels, like this:
So I have several questions:
1) Why do the indels differ between the after-BQSR BAM and the after-HC BAM? The indels at chr14:92,537,354 are not in the after-BQSR BAM, but they are in the after-HC BAM. Among my processed samples, some show the same indels in both BAMs, while others show different indels.
2) I noticed that some regions seem to be snapped in the after-HC BAM but not in the after-BQSR BAM. I have no idea why this happens.
3) For some samples, no variants were called anywhere in chr14:92537200-92537700 in the after-HC BAM, even though reads are mapped to the same region in the after-BQSR BAM. How should I interpret this?
I am not sure, but I suspect there is a high chance of inaccurate variant calls, since the region I am interested in contains several repeat sequences and the variants themselves are repeated indels. Is this right? I don't know what to do, so I would appreciate any help with these issues.
Thanks in advance!