
When there is a GATK update, how do you know whether to rerun old data?

When a new version of GATK is released, how do you know whether it is necessary to rerun all of your old data for an ongoing project? I realize there is probably no single answer to this question, because it depends on the updates involved, but I am looking for some general recommendations. For instance, my last batch of sequencing data was processed with GATK v3.4, and an update (v3.6) was just released. I am currently using the workflow where I generate a GVCF for each sample, and every time I get a new batch of exome data I run variant calling (GenotypeGVCFs) on the previous GVCFs and the new GVCFs all at once. The next time I run it, I plan to use v3.6 (because it is the most up-to-date), but does that negate the usefulness of the files generated with older versions? According to the release notes, the only major workflow change from 3.4 to 3.6 appears to be the removal of "local realignment around indels", but there also appear to be some bug fixes.
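The incremental workflow described above comes down to re-running GenotypeGVCFs over a growing list of per-sample GVCFs each time a batch arrives. Below is a minimal Python sketch of assembling that command, assuming GATK 3.x command-line syntax (`-T GenotypeGVCFs`, `-R`, `--variant`, `-o`). The jar path, reference and file names are hypothetical placeholders. The sketch only builds the argument list (for example, to pass to `subprocess.run`). It does not say anything about whether GVCFs produced by different GATK versions can safely be combined.

```python
from typing import Iterable, List


def build_genotype_gvcfs_cmd(
    gvcfs: Iterable[str],
    reference: str,
    output: str,
    gatk_jar: str = "GenomeAnalysisTK.jar",  # hypothetical path to the GATK 3.x jar
) -> List[str]:
    """Assemble a GATK 3.x GenotypeGVCFs command line that jointly genotypes
    every per-sample GVCF passed in (all earlier batches plus the newest one)."""
    cmd = ["java", "-jar", gatk_jar, "-T", "GenotypeGVCFs", "-R", reference]
    for gvcf in sorted(set(gvcfs)):
        # One --variant argument per GVCF; de-duplicating and sorting keeps the
        # command identical no matter how the batch lists were combined.
        cmd += ["--variant", gvcf]
    cmd += ["-o", output]
    return cmd


if __name__ == "__main__":
    previous = ["batch1/s1.g.vcf", "batch1/s2.g.vcf"]  # hypothetical file names
    newest = ["batch2/s3.g.vcf"]
    print(" ".join(build_genotype_gvcfs_cmd(previous + newest, "ref.fasta", "cohort.vcf")))
```

Returning a list instead of a single shell string avoids quoting problems when paths contain spaces.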

I am trying to get some recommendations for dealing with program updates because I have a sequencing project that will run for a few years. I am sure there will be many upgrades that make the GATK workflow more accurate, so this concern will keep rearing its ugly head.

Thank you for your program and your help,

Annie

Issue · GitHub, by Sheila
Issue Number: 973
State: closed
Closed By: vdauwera


Answers

  • shawpa · Member

    I appreciate the rambling. This was so helpful.

  • james_lawlor · Huntsville, AL · Member

    @Geraldine_VdAuwera - This post was very helpful! Now that 3.8 is out and 3.x is nearing the end of its lifetime, could you comment again on whether it is appropriate to use a newer version of GenotypeGVCFs on GVCF files generated with an older version of HaplotypeCaller?

    Specifically, I have a project I was hoping to genotype with version 3.8, to take advantage of the latest speed improvements, though the GVCF files were generated with 3.3.0. Thoughts?

  • SkyWarrior · Turkey · Member ✭✭✭

    Unfortunately, due to a bug in the GKL JNI implementation, the speed improvements currently have to be disabled in order to use multiple cores and high memory in GATK 3.8. We users are all waiting for a patch release to fix the issue. Until then, I would stick with 3.7 or use 3.8 with the GKL compressors/decompressors disabled.

  • Sheila · Broad Institute · Member, Broadie, Moderator admin

    @james_lawlor
    Hi,

    Unfortunately, in your case, I think some pretty big changes were made between 3.3 and 3.8 that would affect your work. It would be best to either stick with 3.3 throughout or use 3.8 for both HaplotypeCaller and GenotypeGVCFs (the best option :smiley:). You can read the release notes for more information, but I suspect the changes to how spanning deletions are handled would produce incorrect results if you mix versions.

    -Sheila
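One practical way to follow the advice above and avoid mixing versions is to check which HaplotypeCaller release produced each GVCF before joint genotyping. The Python sketch below assumes the version is recorded in a `##GATKCommandLine...` header line that contains `ID=HaplotypeCaller` and a `Version=` field. That header layout is an assumption, so inspect a few of your own files before relying on it. The helper names are hypothetical and are not part of GATK.

```python
import gzip
import re
from typing import Dict, Iterable, List, Optional

# ASSUMPTION: the producing HaplotypeCaller version appears in a header line like
#   ##GATKCommandLine.HaplotypeCaller=<ID=HaplotypeCaller,Version=3.4-46-gbc02625,...>
# Check your own GVCF headers before relying on this pattern.
_VERSION_RE = re.compile(r'Version="?([^",>]+)"?')
_RELEASE_RE = re.compile(r"(\d+\.\d+)")


def haplotypecaller_version(header_lines: Iterable[str]) -> Optional[str]:
    """Return the HaplotypeCaller version string found in a VCF header, or None."""
    for line in header_lines:
        if not line.startswith("##"):
            break  # reached the #CHROM line (or data): the meta-header is over
        if line.startswith("##GATKCommandLine") and "ID=HaplotypeCaller" in line:
            match = _VERSION_RE.search(line)
            if match:
                return match.group(1)
    return None


def gvcf_version(path: str) -> Optional[str]:
    """Read only the header of a plain or gzip/bgzip-compressed GVCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return haplotypecaller_version(handle)


def group_by_release(versions: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Group GVCF paths by major.minor release, e.g. '3.4-46-gbc02625' -> '3.4'."""
    groups: Dict[str, List[str]] = {}
    for path, version in versions.items():
        match = _RELEASE_RE.match(version) if version else None
        key = match.group(1) if match else "unknown"
        groups.setdefault(key, []).append(path)
    return groups
```

For example, calling `group_by_release({p: gvcf_version(p) for p in paths})` and checking whether more than one key comes back is one way to flag a mixed-version cohort before running GenotypeGVCFs. It does not decide which version's output is correct.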
