If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra

Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
We will be out of the office on October 14, 2019, due to the U.S. holiday. We will return to monitoring the forum on October 15.

Downsampling to coverage and the 3.x HaplotypeCaller

Hi all,

I am just wondering about the --downsample_to_coverage when using HC with --emitRefConfidence GVCF followed by GenotypeGVCFs.

What I want to obtain is the same result as when running just HC on BAM files in plain VCF output format with --downsample_to_coverage.

Does the --downsample_to_coverage option need to be used when calling HC with --emitRefConfidence GVCF, GenotypeGVCFs or both?



  • Geraldine_VdAuweraGeraldine_VdAuwera admin Cambridge, MAMember, Administrator, Broadie admin

    Hi Tim,

    You only need to specify downsampling with HaplotypeCaller, not with GenotypeGVCFs.

  • TimHughesTimHughes Member

    Cheers, Geraldine.


  • TimHughesTimHughes Member
    edited April 2014

    I ran the following without specifying the downsampling, but the default 250 kicked in (see below).

    INFO  03:40:42,130 HelpFormatter - Program Args: -T HaplotypeCaller --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 -kmerSize 5 -minPruning 3 --forceActive -R /Users/tim/home/PLATFORM/draftNewRefData/dataDistro_r01_d01_LocalCopy/b37/genomic/gatkBundle_2.5/human_g1k_v37_decoy.fasta --dbsnp /Users/tim/home/PLATFORM/draftNewRefData/dataDistro_r01_d01_LocalCopy/b37/genomic/gatkBundle_2.5/dbsnp_137.b37.excluding_sites_after_129.vcf -L /Users/tim/home/proj_maria_dementia/captureRegion/04818-1366129663-2/04818-1366129663_Covered_bed6_noHeader_b37.list -I 188.aln.valid.bam --out 188.aln.valid.hc.gvcf -nct 4 
    INFO  03:40:42,135 HelpFormatter - Executing as [email protected] on Mac OS X 10.8.5 x86_64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_45-b18. 
    INFO  03:40:42,136 HelpFormatter - Date/Time: 2014/04/02 03:40:42 
    INFO  03:40:42,136 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  03:40:42,136 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  03:40:42,699 GenomeAnalysisEngine - Strictness is SILENT 
    INFO  03:40:42,797 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 250 

    I then checked the output file for depth and see that the downsampling does not seem to have been done.

    # Getting the variants with the most depth in out file
    cat 188.aln.valid.hc.gvcf | grep -o "DP=[0-9]*" | sed 's/DP=//' | sort -nr | head

    It would seem that the downsampling is not working with HC (at least with the above settings)....?

  • Geraldine_VdAuweraGeraldine_VdAuwera admin Cambridge, MAMember, Administrator, Broadie admin

    Hi Tim,

    This is actually an issue we're aware of and will fix eventually. Basically, dcov currently doesn't work with ActiveRegion Walkers, so the HaplotypeCaller does its own downsampling to a hardcoded maximum of 1000 reads per sample per active region. I'll make a note to update the docs to clarify this.

  • TimHughesTimHughes Member

    Now that you mention it, I remember this limitation of the ActiveRegion walkers: I have run into it before :)

Sign In or Register to comment.