
Joint-calling with GATK 4.1 on GVCFs made with GATK 4.0.0.0+

vvi · United Kingdom · Member

Some version of this question comes up often: different parts of the GATK workflow, run with different GATK versions, need to be combined.

In our simple case, we ran HaplotypeCaller on one study with v… and we'd like to joint-call it with v4.1.0.0. In the more complex case, we have two studies, one run with v4.0.0.0 and the next with a slightly later point release (v4.0.x.y), and we would like to joint-call the two together, preferably with v4.1.0.0.
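Before mixing GVCFs from several releases, it can help to confirm which GATK version actually produced each file. Below is a minimal Python sketch of that check. It assumes (this is not confirmed anywhere in this thread) that each GVCF header carries a `##GATKCommandLine` line with a `Version` field; the helper names `gatk_versions` and `gatk_versions_from_file` are hypothetical.

```python
import gzip
import re
from typing import Dict, Iterable

# Assumption: GATK4 writes one header line per tool, roughly of the form
#   ##GATKCommandLine=<ID=HaplotypeCaller,CommandLine="...",Version="4.0.0.0",Date="...">
# The regex accepts the Version value with or without surrounding quotes.
_CMDLINE = re.compile(r'^##GATKCommandLine=<ID=([^,>]+),.*?Version="?([^",>]+)"?')


def gatk_versions(header_lines: Iterable[str]) -> Dict[str, str]:
    """Map each GATK tool ID recorded in a VCF/GVCF header to its reported version."""
    versions: Dict[str, str] = {}
    for line in header_lines:
        if line.startswith("#CHROM"):
            break  # column header line: the meta-information block is over
        match = _CMDLINE.match(line)
        if match:
            versions[match.group(1)] = match.group(2)
    return versions


def gatk_versions_from_file(path: str) -> Dict[str, str]:
    """Read the header of a plain or gzip/bgzip-compressed (G)VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return gatk_versions(handle)
```

Running `gatk_versions_from_file` over the GVCFs from both studies (the paths are your own) and collecting the distinct `HaplotypeCaller` versions would show whether you are in the simple case or the mixed v4.0.0.0 / v4.0.x.y case. An empty result means the header did not match the assumed format.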

The responses to this kind of question are always cautious, but the advice changes depending on which versions and which parts of the pipeline are involved. The closest answer I've found is this one: https://gatkforums.broadinstitute.org/gatk/discussion/7728/when-there-is-a-gatk-update-when-do-you-know-if-you-should-rerun-old-data.

Can you provide me with a more tailored response?


  • AdelaideR · Member, admin

    @vvi -

    When possible, it is best practice to use the same method, including the same GATK version, all the way through an analysis.

    I understand that rerunning old data can seem cumbersome, so here is a suggestion that may minimize how much data you need to regenerate when the GATK version changes.

    If possible, rerun one sample through HaplotypeCaller (HC) with the new version and compare the old and new output files, for example with the SelectVariants tool. This shows how much of the difference you see is due to the GATK version alone.
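To put a rough number on "how much difference", here is a sketch in plain Python, not SelectVariants itself: it collects the called alleles from the old-version and new-version HC output for the same sample and counts how many are shared. It assumes uncompressed or gzip/bgzip (G)VCF text input, ignores genotypes, annotations and filters, and does no variant normalization, so some differences may only be representation changes. The script and function names are hypothetical.

```python
import gzip
from typing import Dict, Iterable, Set, Tuple, Union

# One key per called ALT allele: (CHROM, POS, REF, ALT)
Site = Tuple[str, int, str, str]

# Symbolic/placeholder ALT values that do not represent a called variant.
_NON_VARIANT_ALTS = {"<NON_REF>", "<*>", "."}


def variant_sites(lines: Iterable[str]) -> Set[Site]:
    """Collect (CHROM, POS, REF, ALT) keys for every called ALT allele."""
    sites: Set[Site] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank, meta-information and column header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        for allele in alt.split(","):
            # GVCF reference blocks carry only <NON_REF>, so they add nothing.
            if allele not in _NON_VARIANT_ALTS:
                sites.add((chrom, pos, ref, allele))
    return sites


def read_sites(path: str) -> Set[Site]:
    """Load sites from a plain or gzip/bgzip-compressed VCF or GVCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return variant_sites(handle)


def compare(old: Set[Site], new: Set[Site]) -> Dict[str, Union[int, float]]:
    """Summarize overlap between the old-version and new-version call sets."""
    shared = old & new
    union = old | new
    return {
        "shared": len(shared),
        "only_old": len(old - new),
        "only_new": len(new - old),
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    import sys

    # Usage: python compare_hc.py old_version.g.vcf.gz new_version.g.vcf.gz
    print(compare(read_sites(sys.argv[1]), read_sites(sys.argv[2])))
```

A Jaccard value close to 1 with few `only_old` / `only_new` sites would suggest the version change matters little for that sample; a larger gap is a signal to look at the differing sites before deciding whether to rerun.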

    GATK 4.1.0.0 also adds some useful new tools; details are in the release notes.

    So, ultimately it is up to your discretion, but retesting one or two of the HC runs should take little time and should allay any concerns about missing important biological information.
