Bug Bulletin: The recent 3.2 release fixes many issues. If you run into a problem, please try the latest version before posting a bug report, as your problem may already have been solved.

Indel Calling

I have Ion Torrent data. I am trying to call a variant that I know to exist (confirmed with Sanger). In the position where there is the known indel, I have a depth of roughly 80-90 (in two different runs) and a of those between 20-23% of the reads have the insertion called. What parameters should I be adjusting to get this indel to call? I don't mind a large number of false positives.

I've tried several iterations that include indel realignment using known indels (1000G_phase1 and ills_and_1000G_gold_standard) and also excluding them. I have also tried iterations of setting these flags in UnifiedGenotyper:-stand_call_conf 30.0 -stand_emit_conf 0.0 --min_base_quality_score 0 -glm BOTH --dbsnp dbsnp_137.b37.vcf -nt -rf BadCigar -minIndelCnt 3 -minIndelFrac 0.15. I have also attempted to use HaplotypeCaller: -stand_call_conf 30.0 -stand_emit_conf 0.0 --dbsnp dbsnp_137.b37.vcf -rf BadCigar

Any suggestions would be great.

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,903Administrator, GATK Developer admin

    Hi there,

    Is your indel quite big? If so you may need to use HaplotypeCaller and override the default ActiveRegion size to increase the callable size.

    You can also try running in GENOTYPE_GIVEN_ALLELES mode to force a call.

    Geraldine Van der Auwera, PhD

  • MutagenicMutagenic Posts: 6Member

    Thank you Geraldine. I will give this a try. The indel is just a dupT.

  • MutagenicMutagenic Posts: 6Member

    Geraldine,

    I used these parameters for UnifiedGenotyper and received no variants and no input indel vcfs for the indel realignment step.

    -stand_call_conf 30.0 -stand_emit_conf 0.0 --min_base_quality_score 0 -glm BOTH --dbsnp $dbSNPRef -nt $numThreads -rf BadCigar -minIndelCnt 3 -minIndelFrac 0.15 --genotyping_mode GENOTYPE_GIVEN_ALLELES

    I'm obviously missing something. It appears that I should have included the --alleles flag. What should I be passing to the --alleles flag? I don't quite understand RodBinding[VariantContext]. Could I pass a bed file for regions which I want to force a call, so that I can then go back and see what kind of quality scores prevented the dupT from being called in the first place?

    Thank you.

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,903Administrator, GATK Developer admin

    Ah yes, the point of GGA mode is that you provide known variant sites with specific ALT alleles that you are interested in, and ask the GATK to evaluate whether they are present in your samples. To do this you pass in a VCF containing the sites/alleles of interest with the --alleles argument, and typically you also pass the same VCF in via the -L argument to restrict calling to those sites (otherwise GATK will try to call the rest of the genome as well in normal discovery mode).

    This may still not produce the call you want; if so you can use the experimental reference likelihoods in the UG, or the HaplotypeCaller's reference confidence model to get an idea of what the GATK thinks is going on at those sites.

    Geraldine Van der Auwera, PhD

Sign In or Register to comment.