how to do downsampling


Could anyone tell me how to do downsampling analysis by using GATK tools? The GATK version I use is

Below is my command:
gatk PrintReads -R path/to/hg19.fa -I LIB.bam -O LIB_downsample10.bam --downsample_to_coverage 10

But it always throws this error:
A USER ERROR has occurred: downsample_to_coverage is not a recognized option

When I changed the '--downsample_to_coverage' to '-dcov', the error became this:
A USER ERROR has occurred: d is not a recognized option

Could you please kindly help?

Thank you very much.



  • Sheila (Broad Institute; Member, Broadie, Moderator admin)

    Hi Li,

    We recommend DownsampleSam.
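DownsampleSam (a Picard tool bundled with GATK4) keeps each read with a fixed probability rather than capping depth, so approximating the old `--downsample_to_coverage 10` behaviour means converting the target depth into a fraction first. Below is a minimal sketch, not an official recipe. It assumes the mean coverage of LIB.bam has already been estimated separately (the value 40 is a made-up placeholder), and `keep_fraction` is a hypothetical helper name, not part of GATK. The result is a genome-wide average near the target, not a per-locus cap.

```shell
#!/bin/sh
# Hypothetical helper: fraction of reads to keep so that mean coverage
# drops from $1 (current) to $2 (target). The fraction is capped at 1.0.
keep_fraction() {
    awk -v cur="$1" -v tgt="$2" 'BEGIN {
        if (cur <= 0) exit 1          # refuse a non-positive current coverage
        f = tgt / cur
        if (f > 1) f = 1
        printf "%.4f\n", f
    }'
}

CURRENT_COV=40   # assumption: measured beforehand with a coverage tool
TARGET_COV=10
P=$(keep_fraction "$CURRENT_COV" "$TARGET_COV")

# Dry run: print the command. Drop the leading "echo" to execute it.
echo gatk DownsampleSam -I LIB.bam -O LIB_downsample10.bam -P "$P" --RANDOM_SEED 1
```

With the placeholder values this prints a command with `-P 0.2500`. Fixing `--RANDOM_SEED` keeps the selection reproducible between runs.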


  • hsiaoyi0504 (Member)

    @Sheila Is there any reason why GATK doesn't provide downsampling to coverage anymore?

  • Sheila (Broad Institute; Member, Broadie, Moderator admin)

    Hi Li,

    Honestly, I am not sure. The team changed quite a bit in PrintReads (mostly removing old functionalities). Perhaps they thought other tools could do those functions better, so there is no need for them in PrintReads. Hopefully DownsampleSam works well for you.


  • zhouli (Member)

    Hi there,

    I tried DownsampleSam to subsample a BAM file to a series of different fractions, e.g. 40%, 60%, 80%. Sometimes it works, but sometimes the output BAM file is truncated for unknown reasons. I can reproduce this when I run the same code on other BAM files.

    Have you experienced this issue? How did you make a subsampled BAM file without a truncated end?

    Thank you.

  • I have a related question - how do I disable downsampling in HaplotypeCaller in gatk4?

    I am using the gatk wrapper script:
    /programs/gatk4/gatk HaplotypeCaller --input input.sorted.bam --output output_emitRefConf.snps.indels.vcf --reference ref.fa --base-quality-score-threshold 18 --genotyping-mode DISCOVERY --min-base-quality-score 10 --output-mode EMIT_ALL_CONFIDENT_SITES --sample-ploidy 2 --standard-min-confidence-threshold-for-calling 10 --emit-ref-confidence BP_RESOLUTION

    I have tried the following flags:
    -dt NONE
    --downsample_type NONE
    --downsample-type NONE

    In all cases I get an error:

    A USER ERROR has occurred: is not a recognized option.

    The help menu that displays with the error report does not list any of these flags as a valid option.

    I do not wish to downsample because I have pooled samples in my sequencing data.

    Thank you.

  • shlee (Cambridge; Member, Broadie ✭✭✭✭✭)

    Hi @sesander,

    For the version of HaplotypeCaller you are using, check the tool documentation. You will want to carefully consider the parameters and possibly change the defaults, e.g. for --max-reads-per-alignment-start and --max-num-haplotypes-in-population and other parameters that may limit analysis and calling.

    Also, --sample-ploidy 2 does not make sense for your pooled situation unless you expect only two different alleles for each site.

    You can review all the parameters either on the tool docs website or by typing gatk HaplotypeCaller, which brings up the tool help menu.
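As an illustration of the parameters mentioned above, here is a sketch of the original command with the per-position read cap lifted. It assumes that in your GATK4 release `--max-reads-per-alignment-start 0` disables that downsampling (the tool documentation for recent releases describes it this way, but check the help output for your own version). The ploidy value is only a placeholder for a pooled sample, not a recommendation.

```shell
#!/bin/sh
MAX_READS_PER_START=0   # assumption: 0 turns off per-alignment-start downsampling
PLOIDY=20               # placeholder only; pick a value that matches your pool

# Collect the arguments as positional parameters so they can be shown or run.
set -- HaplotypeCaller \
    --input input.sorted.bam \
    --output output_emitRefConf.snps.indels.vcf \
    --reference ref.fa \
    --emit-ref-confidence BP_RESOLUTION \
    --sample-ploidy "$PLOIDY" \
    --max-reads-per-alignment-start "$MAX_READS_PER_START"

# Dry run: show the full command line.
# To execute instead, run: /programs/gatk4/gatk "$@"
HC_CMD="/programs/gatk4/gatk $*"
echo "$HC_CMD"
```

The remaining options from the original command (quality thresholds, output mode and so on) can be appended to the same argument list unchanged.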

  • shlee (Cambridge; Member, Broadie ✭✭✭✭✭)

    Hi @zhouli,

    Can you make sure your original BAMs validate with ValidateSamFile? If they do, then the absence of the EOF marker could be a bug and we'll need more details on your setup, e.g. the version of GATK4 and how you are running this tool (are you multithreading, etc.). Instructions for bug reports are at https://software.broadinstitute.org/gatk/guide/article?id=1894.
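A quick way to see whether an output BAM really lacks the EOF marker, before running a full validation, is to compare its last 28 bytes with the empty BGZF block that the SAM/BAM specification defines as the end-of-file marker. This is a rough sketch only: `has_bgzf_eof` is a hypothetical helper name, and the check says nothing about the rest of the file, so it complements ValidateSamFile rather than replacing it.

```shell
#!/bin/sh
# The 28-byte empty BGZF block that ends a complete BAM file, as hex.
BGZF_EOF=1f8b08040000000000ff0600424302001b0003000000000000000000

# Hypothetical helper: succeeds if the file ends with the BGZF EOF block.
has_bgzf_eof() {
    last=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')
    [ "$last" = "$BGZF_EOF" ]
}

# Check every file given on the command line.
for bam in "$@"; do
    if has_bgzf_eof "$bam"; then
        echo "$bam: EOF marker present"
    else
        echo "$bam: EOF marker MISSING (file may be truncated)"
    fi
done

# Full validation of the original input, as suggested above (requires GATK4):
#   gatk ValidateSamFile -I LIB.bam --MODE SUMMARY
```

Saved as a script, it can be run over all downsampled outputs at once, e.g. with the 40%, 60% and 80% files as arguments.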
