
GenotypeGVCFs on BP_RESOLUTION mode and AD values

Hello,

I have a question related to this one here by @bassu

I've been working with HaplotypeCaller + GenotypeGVCFs in BP_RESOLUTION mode. I recently noticed that bcftools merge refuses to merge two VCFs generated by GATK this way, because at reference-only positions (ALT is '.') GATK outputs two values in the AD field. The example below has an 'N' base at REF, but the behaviour is the same when REF is a regular base (A, G, T, C).

File 1 extract:
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT V00055 V00082
Y 2600000 . N . . . AN=0 GT:AD:DP:RGQ .:0,0:0:0 .:0,0:0:0

File 2 extract:
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT GRC001622 GRC004194
Y 2600000 . N . . . AN=0 GT:AD:DP:RGQ .:0,0:0:0 .:0,0:0:0

~/bcftools-1.6/bcftools merge FILE_1.vcf.gz FILE_2.vcf.gz
...
...
Incorrect number of AD fields (2) at Y:2600000, cannot merge.

I first thought this was a bug in bcftools, but then I was pointed here to the spec of the AD field in VCF v4.2, and I would like to hear your thoughts on this as well. Shouldn't GATK output only one AD value for positions like these? Or is this the expected behaviour on the GATK side? It may be the latter, since CombineVariants has no problem merging two such files.
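To make the mismatch concrete, here is a minimal Python sketch that flags records whose AD field does not have one value per allele. It assumes AD is declared with Number=R (one value for REF plus one per ALT allele), which is how I read the spec and the bcftools error. The function names `expected_ad_count` and `find_ad_mismatches` are made up for this illustration and are not part of GATK or bcftools.

```python
def expected_ad_count(alt_field):
    """Assuming Number=R: one AD value for REF plus one per ALT allele."""
    if alt_field == ".":
        return 1  # no ALT allele, so only the REF depth is expected
    return 1 + len(alt_field.split(","))


def find_ad_mismatches(vcf_lines):
    """Return (chrom, pos, sample_index, observed, expected) per bad AD entry."""
    mismatches = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        # Real VCFs are tab-delimited; splitting on any whitespace also
        # handles extracts pasted with spaces, like the ones in this post.
        fields = line.split()
        if len(fields) < 10:
            continue  # no FORMAT/sample columns to inspect
        chrom, pos, alt, fmt = fields[0], fields[1], fields[4], fields[8]
        keys = fmt.split(":")
        if "AD" not in keys:
            continue
        ad_index = keys.index("AD")
        expected = expected_ad_count(alt)
        for sample_index, sample in enumerate(fields[9:]):
            values = sample.split(":")
            if ad_index >= len(values) or values[ad_index] == ".":
                continue  # AD missing or trailing fields dropped
            observed = len(values[ad_index].split(","))
            if observed != expected:
                mismatches.append(
                    (chrom, int(pos), sample_index, observed, expected)
                )
    return mismatches


if __name__ == "__main__":
    record = "Y 2600000 . N . . . AN=0 GT:AD:DP:RGQ .:0,0:0:0 .:0,0:0:0"
    print(find_ad_mismatches([record]))
    # [('Y', 2600000, 0, 2, 1), ('Y', 2600000, 1, 2, 1)]
```

Run on the record above, both samples are reported with two AD values where one would be expected under that assumption, which matches the position bcftools complains about.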

On the other hand, being able to merge files like these with bcftools would make my pipeline easier.

Please let me know your comments,
Many thanks
Rodrigo

(The above applies to GATK versions 3.5 and later, and therefore to VCF v4.2 or later as well.)

Answers

  • Sheila, Broad Institute. Member, Broadie, Moderator admin

    @jrodrigof
    Hi Rodrigo,

    Can you check if this happens with the latest version of GATK3? If it does, can you check if it still happens in GATK4 beta 6?

    If it is still an issue in GATK4, I may need you to submit a bug report.

    Thanks,
    Sheila

  • JRodrigo_F, EBC. Member

    @Sheila
    Hello Sheila,

    I'm sorry, I have not been able to test this on GATK4. For the moment everything we do is still based on GATK 3.5 and 3.8, so even if GATK4 solves it, that is not yet a practical solution for me.

    Also, I need to use BP_RESOLUTION mode in GenotypeGVCFs, and the last time I tried, this option was not available in version 4.

    What I am sure of is that, yes, the latest version of GATK 3.8 has the issue. And as I said before, bcftools complains and stops, but CombineVariants does the job.

    Best
    Rodrigo

  • Sheila, Broad Institute. Member, Broadie, Moderator admin

    @JRodrigo_F
    Hi Rodrigo,

    Okay, so are you okay with using CombineVariants? Unfortunately, I don't think the team will do anything in this case, as efforts are focused on GATK4 and there seems to be a workaround.

    -Sheila
