
Indel Calling

I have Ion Torrent data and am trying to call a variant that I know exists (confirmed by Sanger sequencing). At the position of the known indel, depth is roughly 80-90 in each of two runs, and 20-23% of those reads carry the insertion. Which parameters should I adjust to get this indel called? I don't mind a large number of false positives.

I've tried several iterations, running indel realignment both with known indels (1000G_phase1 and Mills_and_1000G_gold_standard) and without them. I have also tried setting these flags in UnifiedGenotyper:

-stand_call_conf 30.0 -stand_emit_conf 0.0 --min_base_quality_score 0 -glm BOTH --dbsnp dbsnp_137.b37.vcf -nt -rf BadCigar -minIndelCnt 3 -minIndelFrac 0.15

I have also tried HaplotypeCaller with:

-stand_call_conf 30.0 -stand_emit_conf 0.0 --dbsnp dbsnp_137.b37.vcf -rf BadCigar

Any suggestions would be great.

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi there,

    Is your indel quite big? If so you may need to use HaplotypeCaller and override the default ActiveRegion size to increase the callable size.

    You can also try running in GENOTYPE_GIVEN_ALLELES mode to force a call.

  • Thank you Geraldine. I will give this a try. The indel is just a dupT.

  • Geraldine,

    I used these parameters for UnifiedGenotyper, with no input indel VCFs for the indel realignment step, and received no variants.

    -stand_call_conf 30.0 -stand_emit_conf 0.0 --min_base_quality_score 0 -glm BOTH --dbsnp $dbSNPRef -nt $numThreads -rf BadCigar -minIndelCnt 3 -minIndelFrac 0.15 --genotyping_mode GENOTYPE_GIVEN_ALLELES

    I'm obviously missing something. It appears that I should have included the --alleles flag. What should I pass to --alleles? I don't quite understand RodBinding[VariantContext]. Could I pass a BED file of the regions where I want to force a call, so that I can then go back and see what kind of quality scores prevented the dupT from being called in the first place?

    Thank you.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Ah yes, the point of GGA mode is that you provide known variant sites with specific ALT alleles that you are interested in, and ask the GATK to evaluate whether they are present in your samples. To do this you pass in a VCF containing the sites/alleles of interest with the --alleles argument, and typically you also pass the same VCF in via the -L argument to restrict calling to those sites (otherwise GATK will try to call the rest of the genome as well in normal discovery mode).
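
As a rough illustration of the setup described above, the sketch below writes a one-record VCF and assembles a UnifiedGenotyper command that passes the same file to both --alleles and -L. The file names (reference.fasta, sample.bam, known_dupT.vcf, forced_calls.vcf) and the record itself (contig 1, position 1000, A to AT) are hypothetical placeholders, not values from this thread; the real record has to match the Sanger-confirmed site and the contig names of your reference. The command is only printed, so it can be checked before running it.

```shell
# Hypothetical placeholders; none of these names come from the thread.
REF="reference.fasta"
BAM="sample.bam"
ALLELES="known_dupT.vcf"

# Minimal tab-separated VCF with the one allele to genotype.
# Contig, position and bases are made up: an insertion is written as the
# base before it (REF) followed by that base plus the inserted T (ALT).
{
  printf '##fileformat=VCFv4.1\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '1\t1000\t.\tA\tAT\t.\t.\t.\n'
} > "$ALLELES"

# Assemble the GGA command as a string so it can be inspected first.
GGA_CMD="java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R $REF -I $BAM"
GGA_CMD="$GGA_CMD -glm BOTH --genotyping_mode GENOTYPE_GIVEN_ALLELES"
GGA_CMD="$GGA_CMD --alleles $ALLELES -L $ALLELES -o forced_calls.vcf"

# Dry run: print the command. Run it by hand once the paths are real.
echo "$GGA_CMD"
```

Passing the VCF to -L as well keeps the run limited to the listed site, which is the restriction the answer describes.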

    This may still not produce the call you want; if so you can use the experimental reference likelihoods in the UG, or the HaplotypeCaller's reference confidence model to get an idea of what the GATK thinks is going on at those sites.
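
For the HaplotypeCaller route, a minimal sketch follows, assuming a GATK 3.x release where the reference confidence model is exposed through --emitRefConfidence. The file names and the interval 1:900-1100 are hypothetical placeholders, and some earlier 3.x releases may also require extra variant index arguments when emitting reference confidence, so check the documentation for the version in use. As before, the command is only printed.

```shell
# Hypothetical placeholders; replace with real paths and the real locus.
REF="reference.fasta"
BAM="sample.bam"
SITE="1:900-1100"   # small made-up window around the site of interest

# BP_RESOLUTION asks for one record per position in the interval,
# including reference confidence where no variant is called.
ERC_CMD="java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R $REF -I $BAM"
ERC_CMD="$ERC_CMD -L $SITE --emitRefConfidence BP_RESOLUTION -o site.g.vcf"

# Dry run: print the command instead of executing it.
echo "$ERC_CMD"
```

The per-position records in the output give the genotype likelihoods at the expected dupT position, which is one way to see what the caller makes of the evidence there.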
