Changing compression level in GATK 4.0.0.0

amywilliams · Ithaca, NY · Member

When running GATK 4.0.0.0 (in this case using ApplyBQSR), the notice

11:36:10.430 INFO ApplyBQSR - HTSJDK Defaults.COMPRESSION_LEVEL : 1

appears. A bit of digging led me to the Python code in the newly distributed gatk program, where two variables set -Dsamjdk.compression_level=1 by default. I changed the level there to 5, but the output from ApplyBQSR remained the same, and judging from the file sizes I'm seeing (though I may be wrong), the compression level is not actually 5.

Thoughts?
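
One thing that could be tried, sketched here rather than confirmed for 4.0.0.0: instead of editing the defaults inside the gatk launcher, pass the same HTSJDK system property on the command line. The sketch below only builds and prints such a command; it does not run GATK. It assumes the launcher accepts `--java-options` and that a user-supplied `-Dsamjdk.compression_level` takes precedence over the launcher's default, neither of which is verified in this thread. The helper name `build_gatk_command` and the file names are hypothetical.

```python
import shlex


def build_gatk_command(tool, tool_args, compression_level=5):
    """Build an argv list asking HTSJDK for a given compression level.

    Assumption: the gatk launcher forwards --java-options to the JVM and
    the user-supplied property overrides the launcher's built-in default.
    """
    if not 0 <= compression_level <= 9:
        raise ValueError("compression level must be between 0 and 9")
    java_options = f"-Dsamjdk.compression_level={compression_level}"
    return ["gatk", "--java-options", java_options, tool, *tool_args]


# Hypothetical input and output names, for illustration only.
cmd = build_gatk_command(
    "ApplyBQSR",
    ["-I", "input.bam", "--bqsr-recal-file", "recal.table", "-O", "output.bam"],
)

# Print a shell-ready command line so it can be inspected before running.
print(shlex.join(cmd))
```

If the override works, the "HTSJDK Defaults.COMPRESSION_LEVEL" line in the log should report the requested level; if it still reports 1, the setting is not being honored.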

Issue · Github (by Sheila): Issue Number 2966 · State: closed · Closed By: chandrans

Comments

  • SkyWarrior · Turkey · Member ✭✭✭

    Have you tried explicitly changing compression level by the gatk parameter inside the command line?

  • amywilliams · Ithaca, NY · Member

    I guess the real question is what parameter I should be using. Under GATK version 3.8-0 there was --bam_compression (or -compress), but these options don't work in 4.0.0.0, and I don't see any options that mention compression in the new documentation.

  • rdubin · Albert Einstein College of Medicine · Member

    In a similar vein, when I run the Picard tool IlluminaBasecallsToFastq that now comes packaged with GenomeAnalysisTK version 4.0.0.0, I see no difference in output file size whether I use both --COMPRESS_OUTPUTS true and --COMPRESSION_LEVEL 5 or only --COMPRESS_OUTPUTS true. The latter uses the default compression level, which, according to the --help page for this version of IlluminaBasecallsToFastq, appears to be 1.

  • Sheila · Broad Institute · Member, Broadie admin

    @rdubin
    Hi,

    It seems the default compression level is 5 in GATK4. Have a look at the tool doc for more information. We now have docs for the tools in GATK4.

    -Sheila

  • Sheila · Broad Institute · Member, Broadie admin

    @amywilliams
    Hi,

    Sorry for the delay. Somehow I missed your question in my email. I hope the tool doc I pointed to above helps.

    -Sheila

  • rdubin · Albert Einstein College of Medicine · Member

    Hi Sheila,
    Regardless of what the tool doc says (and you are correct, it says the default is 5), here is what my gatk v4 help says:

    $ gatk IlluminaBasecallsToFastq -help

    Using GATK jar /gs/gsfs0/hpc01/apps/GenomeAnalysisTK/4.0.0.0/java.1.8.0_20/gatk-package-4.0.0.0-local.jar
    ......
    --COMPRESSION_LEVEL:Integer Compression level for all compressed files created (e.g. BAM and VCF). Default value: 1.

    In addition, the default compression level for old versions of Picard's IlluminaBasecallsToFastq is 5. However, when I run the old version of Picard's IlluminaBasecallsToFastq, I get one file size for a particular sample's output fastq, and when I run the gatk v4 IlluminaBasecallsToFastq, I get a larger file size for the same sample's output fastq. So they cannot both be using a compression level of 5, right?

  • Sheila · Broad Institute · Member, Broadie admin

    @rdubin
    Hi,

    Yep, looks like a doc error. I will also need to check with the team on this. Let me get back to you.

    -Sheila

  • Sheila · Broad Institute · Member, Broadie admin

    @rdubin
    Hi again,

    This is starting to look like a bug, but let me get confirmation from my team before I put in a ticket.

    -Sheila

  • Sheila · Broad Institute · Member, Broadie admin

    @rdubin
    Hi again,

    This should be fixed in 4.0.2.1, which is available for download now.

    -Sheila

  • mpetek · Cambridge · Member

    @Sheila
    Hello,
    I'm seeing what I believe is the same issue with BAM compression after running ApplyBQSR. I can't find any option to define what the compression level should be, and the BAM size nearly doubles after BQSR. My workflow is alignment with bwa-mem, then mark duplicates (file size stable at ~12 GB for a whole exome); the size jumps to 20-22 GB after applying BQSR.
    I'm running GATK v4.1.0.0 within the broadinstitute/latest docker.

  • bhanuGandham · Cambridge MA · Member, Administrator, Broadie, Moderator admin

    @mpetek

    Please post the exact command you are using and the entire error log.
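
The file-size reasoning used earlier in this thread (same input, different output size, so the compression levels probably differ) can be illustrated with a small self-contained sketch. It uses Python's `zlib` as a stand-in for the DEFLATE compression behind gzipped FASTQ and BAM (BGZF) output; the deflater GATK actually uses may behave differently, and the synthetic FASTQ-like records and the helper name `compressed_sizes` are assumptions made for illustration only.

```python
import random
import zlib


def compressed_sizes(data, levels=(1, 5, 9)):
    """Return {level: compressed size in bytes} for the same input bytes."""
    return {level: len(zlib.compress(data, level)) for level in levels}


# Build synthetic FASTQ-like text with a fixed seed so runs are repeatable.
rng = random.Random(0)
records = []
for i in range(2000):
    seq = "".join(rng.choice("ACGT") for _ in range(100))
    # Quality string skewed toward one symbol, loosely like binned qualities.
    qual = "".join(rng.choice("FFFFFF:,#") for _ in range(100))
    records.append(f"@read{i}\n{seq}\n+\n{qual}\n")
data = "".join(records).encode("ascii")

sizes = compressed_sizes(data)
for level, size in sizes.items():
    # Report the size and the fraction of the original that remains.
    print(f"level {level}: {size} bytes ({size / len(data):.3f} of original)")
```

Comparing sizes this way only suggests a difference in level. Output size also depends on the content being compressed, so a larger BAM after a processing step is not by itself proof that the compression level changed.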
