When a GATK update is released, how do you know whether you should rerun old data?
When a new version of GATK is released, how do you know whether it is necessary to rerun all of the old data for an ongoing project? I realize there is probably no single good answer, because it depends on the updates involved, but I am looking for some general recommendations.

For instance, my last batch of sequencing data was processed with GATK v3.4, and an update (v3.6) was just released. I currently use the workflow in which I generate a GVCF for each sample; then, for every new batch of exome data I get, I run variant calling (GenotypeGVCFs) on the previous GVCFs and the new GVCFs all at once. The next time I run, I plan to use v3.6 because it is the most recent version, but does that negate the usefulness of the files generated with older versions? According to the release notes, the only major workflow change from 3.4 to 3.6 appears to be the removal of "local realignment around indels" from the workflow, although there also appear to be some bug fixes.
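To make the workflow described above concrete, here is a minimal Python sketch of its two steps. It only builds the GATK 3.x command lines and does not run anything. The jar path, reference name, and all sample file names are placeholders invented for illustration, and the flags shown are the usual GATK 3 ones as I understand them, so check them against the documentation for your version.

```python
from typing import List

GATK_JAR = "GenomeAnalysisTK.jar"  # assumed path to a GATK 3.x jar
REFERENCE = "reference.fasta"      # hypothetical reference genome


def haplotypecaller_gvcf_cmd(bam: str, gvcf: str) -> List[str]:
    """Build the per-sample step: one BAM in, one GVCF out."""
    return [
        "java", "-jar", GATK_JAR,
        "-T", "HaplotypeCaller",
        "-R", REFERENCE,
        "-I", bam,
        "--emitRefConfidence", "GVCF",
        "-o", gvcf,
    ]


def genotype_gvcfs_cmd(gvcfs: List[str], out_vcf: str) -> List[str]:
    """Build the joint step: all GVCFs (old and new batches) in, one VCF out."""
    cmd = ["java", "-jar", GATK_JAR, "-T", "GenotypeGVCFs", "-R", REFERENCE]
    for gvcf in gvcfs:
        cmd += ["--variant", gvcf]
    cmd += ["-o", out_vcf]
    return cmd


# Hypothetical file names: GVCFs from earlier batches plus the newest batch.
old_gvcfs = ["batch1_sampleA.g.vcf", "batch1_sampleB.g.vcf"]
new_gvcfs = ["batch2_sampleC.g.vcf", "batch2_sampleD.g.vcf"]

per_sample_cmd = haplotypecaller_gvcf_cmd("batch2_sampleC.bam", "batch2_sampleC.g.vcf")
joint_cmd = genotype_gvcfs_cmd(old_gvcfs + new_gvcfs, "all_batches.vcf")

print(" ".join(per_sample_cmd))
print(" ".join(joint_cmd))
```

In this layout the question becomes whether the GVCFs in `old_gvcfs`, produced by the older version, can safely be passed to the joint step alongside the ones in `new_gvcfs`.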
I am just trying to get some recommendations for dealing with program updates. My sequencing project will go on for a few years, and I am sure there will be many upgrades that make the GATK workflow more accurate, so this concern will just keep rearing its ugly head.
Thank you for your program and your help,