DiagnoseTargets output -reference allele?

I used a VCF file produced by GATK as the -L argument to DiagnoseTargets, but in the output VCF from DiagnoseTargets the reference allele is the alternate allele from the original VCF:
input VCF:

chr1 10425470 . G A 139.99 SNPBHF AC=1;AF=0.042;AN=24;BaseQRankSum=-5.862e+00;Clipp......

output VCF:

chr1 10425470 . A <DT> . PASS END=10425470;GC=1.00;IDP=805.00 FT:IDP:LL:ZL......

The GC content reflects the reference allele though.
Is this normal behaviour or a bug?

Best Answer

  • Geraldine_VdAuwera Cambridge, MA admin
    Accepted Answer

    Hi @annat,

    The reference allele output by DiagnoseTargets is not actually meaningful. In some cases where it is hard to look up, it is set to 'A' to save on compute time, which is harmless precisely because the value carries no information in the output of DT. Ideally we would prefer to emit a symbolic allele (<REF>), but the VCF spec does not allow this, so it is just a limitation of the format.
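For anyone post-processing this output, here is a minimal Python sketch of reading a DiagnoseTargets record while deliberately ignoring the REF column, since it may be a placeholder. The function name `parse_dt_record` is hypothetical and not part of GATK. The sketch assumes tab-delimited lines and uses only the first eight columns of the record quoted in the question; the FORMAT and sample columns are truncated there, so they are left out.

```python
def parse_dt_record(line):
    """Parse one DiagnoseTargets VCF data line into an interval summary.

    The REF column is read but not returned: for <DT> records it may be
    a placeholder ('A') rather than the true reference base.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("expected at least 8 tab-delimited VCF columns")
    chrom, pos, _id, _ref, alt, _qual, filt, info = fields[:8]
    if alt != "<DT>":
        raise ValueError("not a DiagnoseTargets record: ALT is %r" % alt)

    # INFO is a ';'-separated list of KEY=VALUE pairs (flags have no value).
    info_map = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        info_map[key] = value

    return {
        "chrom": chrom,
        "start": int(pos),
        "end": int(info_map["END"]),
        "filter": filt,
        "info": info_map,
    }


# First eight columns of the output record from the question.
example = "\t".join(
    ["chr1", "10425470", ".", "A", "<DT>", ".", "PASS",
     "END=10425470;GC=1.00;IDP=805.00"]
)
record = parse_dt_record(example)
print(record["chrom"], record["start"], record["end"], record["filter"])
```

If the true reference base is needed, it should be looked up from the reference FASTA at the interval coordinates rather than taken from this record.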

