Help diagnosing missed variants in amplicon data
I'm relatively new to GATK. At first I was intimidated by the amount of documentation, but lately I've really come to appreciate how there are answers to nearly all questions somewhere on this website.
When I asked around about which tool people recommend for tumour amplicon variant calling, the consensus choice was clearly GATK. In line with that, I've been trying to run a modified version of the Best Practices to identify variants in my amplicon samples. The intended deviation is to tell GATK to ignore the duplicate status of the reads, since nearly all reads are duplicates.
Here are my commands:
# find the indels to realign
GenomeAnalysisTK.jar -T RealignerTargetCreator -R GRCh37-lite.fa -I $1 -o $1.realignment_targets.list --disable_read_filter DuplicateRead

# fix up the indels
GenomeAnalysisTK.jar -T IndelRealigner -R GRCh37-lite.fa -I $1 -o $1.realigned.bam --disable_read_filter DuplicateRead -targetIntervals $1.realignment_targets.list

# base recalibration
GenomeAnalysisTK.jar -T BaseRecalibrator -R GRCh37-lite.fa -I $1.realigned.bam -o $1.recal_data.table --disable_read_filter DuplicateRead -knownSites dbSNP/common_all.vcf
GenomeAnalysisTK.jar -T PrintReads -R GRCh37-lite.fa -I $1.realigned.bam -BQSR $1.recal_data.table --disable_read_filter DuplicateRead -o $1.recal_reads.bam

# Indels are realigned, bases are re-calibrated, now do the variant calling
GenomeAnalysisTK.jar -T HaplotypeCaller -R GRCh37-lite.fa --disable_read_filter DuplicateRead -I $1.recal_reads.bam --genotyping_mode DISCOVERY -stand_emit_conf 10 -stand_call_conf 30 -o $1.raw_variants.vcf -bamout $1.raw_variants.bamout.bam
I'm finding that I'm still missing a small handful of variants that I expected to be able to call. I've followed the instructions here:
in the hope of getting to the bottom of it.
The bamout is really helpful, but as far as I can tell, my variant (shown under the cursor in the attached image) is present in the original, realigned-recalibrated, and bamout versions of my BAM. Keep in mind that these are amplicon data, so the depths extend way beyond what is shown in the image.
The mapping qualities of the reads are all roughly 50-60, and the base qualities (post-recalibration) are in the 40-50 range.
The variant allele fraction is about 25% at all points in the analysis.
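To be explicit about what I mean by allele fraction: it is the number of reads supporting the alternate allele divided by the total depth at the site. Here is a minimal sketch of that arithmetic. The `vaf` helper is a hypothetical name, and the counts are made up for illustration, not taken from my data.

```shell
# Hypothetical helper: variant allele fraction as a percentage.
# $1 = reads supporting the alternate allele
# $2 = total read depth at the site
vaf() {
    awk -v alt="$1" -v depth="$2" 'BEGIN {
        # Guard against an uncovered site (division by zero).
        if (depth <= 0) { print "NA"; exit }
        printf "%.1f\n", 100 * alt / depth
    }'
}

# Illustrative counts only: 250 alt-supporting reads out of 1000.
vaf 250 1000
```

With counts like these the helper prints a percentage to one decimal place, and it prints `NA` when the depth is zero.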
I'm sure there is something simple that I'm missing. Can anyone suggest what I might think of changing so I could call the variant in the image?