Changing compression level in GATK 4.0.0.0

When running GATK 4.0.0.0 (in this case using ApplyBQSR), the notice

11:36:10.430 INFO ApplyBQSR - HTSJDK Defaults.COMPRESSION_LEVEL : 1

appears. A bit of digging led me to the Python code in the newly distributed gatk program, where two variables set -Dsamjdk.compression_level=1 by default. I changed the level there to 5, but the output from ApplyBQSR remained the same, and from the file sizes I'm seeing (though I may be wrong), the compression level does not appear to be 5.

Thoughts?
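
For reference, an alternative to editing the wrapper script is to pass the same JVM property on the command line. The sketch below only assembles such a command. It assumes the gatk wrapper's --java-options argument forwards the property to the JVM; nothing in this thread confirms that this overrides the wrapper's default in 4.0.0.0. The helper name and the file names are hypothetical.

```python
import shlex


def build_gatk_command(tool, tool_args, compression_level):
    """Assemble a gatk invocation that passes the HTSJDK property to the JVM.

    Assumption: the wrapper forwards --java-options to the JVM unchanged.
    """
    java_opts = f"-Dsamjdk.compression_level={compression_level}"
    return ["gatk", "--java-options", java_opts, tool, *tool_args]


# Hypothetical input/output file names, for illustration only.
cmd = build_gatk_command(
    "ApplyBQSR",
    ["-I", "input.bam", "--bqsr-recal-file", "recal.table", "-O", "output.bam"],
    compression_level=5,
)

# Print a shell-quoted command line that can be copied into a terminal.
print(shlex.join(cmd))
```

If the property takes effect, the "HTSJDK Defaults.COMPRESSION_LEVEL" line in the tool's startup log should report the new value.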

Comments

  • SkyWarrior (Turkey; Member)

    Have you tried setting the compression level explicitly with a gatk parameter on the command line?

  • amywilliams (Ithaca, NY; Member)

    I guess the real question is: what parameter should I be using? Under GATK version 3.8-0 there was --bam_compression (or -compress), but these options don't work in 4.0.0.0, and I don't see any options that mention compression in the new documentation.

  • rdubin (Albert Einstein College of Medicine; Member)

    In a similar vein, when I run the Picard tool IlluminaBasecallsToFastq that now comes packaged with GenomeAnalysisTK version 4.0.0.0, I see no difference in output file size between passing both --COMPRESS_OUTPUTS true and --COMPRESSION_LEVEL 5 and passing only --COMPRESS_OUTPUTS true. The latter uses the default compression level, which, according to the --help page for this version of IlluminaBasecallsToFastq, appears to be 1.

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @rdubin
    Hi,

    It seems the default compression level is 5 in GATK4. Have a look at the tool doc for more information. We now have docs for the tools in GATK4 :smiley:

    -Sheila

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @amywilliams
    Hi,

    Sorry for the delay. Somehow I missed your question in my email. I hope the tool doc I pointed to above helps.

    -Sheila

  • rdubin (Albert Einstein College of Medicine; Member)

    Hi Sheila,
    Regardless of what the tool doc says (and you are correct, it says the default is 5), here is what my gatk v4 help says:

    $ gatk IlluminaBasecallsToFastq -help

    Using GATK jar /gs/gsfs0/hpc01/apps/GenomeAnalysisTK/4.0.0.0/java.1.8.0_20/gatk-package-4.0.0.0-local.jar
    ......
    --COMPRESSION_LEVEL:Integer Compression level for all compressed files created (e.g. BAM and VCF). Default value: 1.

    In addition, the default compression level for old versions of Picard's IlluminaBasecallsToFastq is 5. However, when I run the old version of Picard's IlluminaBasecallsToFastq, I get one size for the output fastq file of a particular sample, and when I run the gatk v4 IlluminaBasecallsToFastq, I get a larger size for the same sample's output fastq file. So they cannot both have a compression level of 5, right?
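
The file-size reasoning above can be illustrated with a minimal sketch, assuming synthetic FASTQ-like data and Python's zlib module as a stand-in for DEFLATE. This is not the compressor GATK or Picard actually uses, so absolute sizes will differ; it only shows that the same input yields different output sizes at different levels. All function names and parameters are made up for the illustration.

```python
import random
import zlib


def make_fastq(n_reads=2000, read_len=100, ref_len=2000, seed=0):
    """Build synthetic FASTQ text: overlapping reads sampled from one random reference."""
    rng = random.Random(seed)
    ref = "".join(rng.choice("ACGT") for _ in range(ref_len))
    records = []
    for i in range(n_reads):
        start = rng.randrange(ref_len - read_len)
        seq = ref[start:start + read_len]
        # Random quality characters from a small alphabet (illustrative only).
        qual = "".join(rng.choice("FGHIJ") for _ in range(read_len))
        records.append(f"@read{i}\n{seq}\n+\n{qual}\n")
    return "".join(records).encode("ascii")


def compressed_sizes(data, levels=(1, 5, 9)):
    """Return {level: compressed byte count} for each zlib compression level."""
    return {level: len(zlib.compress(data, level)) for level in levels}


data = make_fastq()
sizes = compressed_sizes(data)
for level, size in sizes.items():
    print(f"level {level}: {size} bytes ({size / len(data):.1%} of input)")
```

Two outputs of clearly different size from the same input are therefore consistent with two different compression levels, although differences in the compressor implementation could also contribute.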

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @rdubin
    Hi,

    Yep, looks like a doc error. I will also need to check with the team on this. Let me get back to you.

    -Sheila
