GenotypeGVCFs shows <NON-REF> flag despite all reads in all samples supporting the reference allele

Hello,
I have been jointly genotyping variants in my samples using GenotypeGVCFs but a few sites show a flag despite all reads in all samples supporting the reference allele. Do you know why this might be the case?

An example:
chr2 37521 . T 66.18 PASS AC=0;AF=0.00;AN=30;DP=320;InbreedingCoeff=-0.0000;MLEAC=0;MLEAF=0.00;MQ=64.28 GT:AD:DP:RGQ 0/0:22,0:22:63 0/0:27,0:27:66 0/0:17,0:17:48 0/0:17,0:17:42 0/0:31,0:31:63 0/0:25,0:25:60 0/0:15,0:15:36 0/0:22,0:22:63 0/0:26,0:26:72 0/0:22,0:22:60 0/0:28,0:28:81 0/0:22,0:22:60 0/0:20,0:20:57 0/0:12,0:12:33 0/0:14,0:14:42
Thank you in advance for your help.

Answers

  • shlee Cambridge Member, Broadie, Moderator admin
    edited March 2018

    Hi @w_anderson,

    Can you be more specific about what you mean by flag? I see the record you posted has no variant allele, just the REF allele:

    chr2 37521 . T 66.18 PASS 
    

    Can you tell us which version of GATK you are using and post the exact GenotypeGVCFs and HaplotypeCaller commands that gave you this result? Thanks.

  • w_anderson US Member
    edited March 2018

    I am sorry, it doesn't seem to have posted correctly, but there is a "NON-REF" tag in the ALT entry of the VCF:

    chr2 37521 . T NON_REF 66.18 PASS AC=0;AF=0.00;AN=30;DP=320;InbreedingCoeff=-0.0000;MLEAC=0;MLEAF=0.00;MQ=64.28 GT:AD:DP:RGQ 0/0:22,0:22:63 0/0:27,0:27:66 0/0:17,0:17:48 0/0:17,0:17:42 0/0:31,0:31:63 0/0:25,0:25:60 0/0:15,0:15:36 0/0:22,0:22:63 0/0:26,0:26:72 0/0:22,0:22:60 0/0:28,0:28:81 0/0:22,0:22:60 0/0:20,0:20:57 0/0:12,0:12:33 0/0:14,0:14:42
    

    We were following the GATK 3.6 Best Practices, using the following command:

    java -jar GenomeAnalysisTK.jar -T GenotypeGVCFs \
    -R reference.fasta \
    --variant ind1.g.vcf \
    --variant ind2.g.vcf (...) \
    --variant ind15.g.vcf \
    -o output.vcf \
    --allSites \
    -nt 6
    

    I appreciate that this is not the latest version but we have been working on this dataset for a while and can't re-run the pipeline with the newest version.

  • shlee Cambridge Member, Broadie, Moderator admin

    @w_anderson,

    There appear to be a couple of inconsistencies in what you've posted. First, for a GVCF, the NON_REF allele ought to be symbolic, i.e. indicated by <NON_REF>, as per the VCF specification.
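
    To make the distinction concrete, here is a minimal Python sketch (not part of GATK) that checks whether an ALT entry is written as a symbolic allele and sums the alternate-allele read counts from the AD field. The function names are hypothetical, the record is a shortened copy of the one posted above, and the code splits on any whitespace only because the pasted record is space-separated; a real VCF is tab-delimited.

```python
def parse_alt_alleles(vcf_line):
    """Return the list of ALT alleles (column 5) of a VCF data line."""
    fields = vcf_line.split()
    return fields[4].split(",")


def is_symbolic(allele):
    """Symbolic alleles are an ID wrapped in angle brackets, e.g. <NON_REF>."""
    return len(allele) > 2 and allele.startswith("<") and allele.endswith(">")


def alt_read_support(vcf_line):
    """Sum the non-reference AD counts over all samples in the line."""
    fields = vcf_line.split()
    ad_index = fields[8].split(":").index("AD")
    total = 0
    for sample in fields[9:]:
        ad_values = sample.split(":")[ad_index].split(",")
        # The first AD value is the REF depth; the rest belong to ALT alleles.
        total += sum(int(v) for v in ad_values[1:] if v != ".")
    return total


# Shortened version of the record from this thread (first two samples only).
record = (
    "chr2 37521 . T NON_REF 66.18 PASS AC=0;AF=0.00;AN=30 "
    "GT:AD:DP:RGQ 0/0:22,0:22:63 0/0:27,0:27:66"
)

for allele in parse_alt_alleles(record):
    print(allele, "symbolic" if is_symbolic(allele) else "not symbolic")
print("reads supporting ALT:", alt_read_support(record))
```

    Run on the record as pasted, the ALT entry lacks angle brackets, so it is reported as not symbolic, which is the first inconsistency mentioned above.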

    Second, your GenotypeGVCFs command uses the --allSites option. I asked previously that you also post your HaplotypeCaller command because these two steps go hand in hand. Did you use the -ERC BP_RESOLUTION mode of HaplotypeCaller? This is the mode that should be used with the GenotypeGVCFs --allSites option.
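
    If the original HaplotypeCaller command is no longer to hand, the mode can often be recovered from the GVCF header. The sketch below assumes the GVCFs carry a ##GATKCommandLine header line in which the option appears as emitRefConfidence=<value>; that layout is an assumption to verify against your own files, and the function name and example header line are hypothetical.

```python
import re

ERC_PATTERN = re.compile(r"emitRefConfidence=(\w+)")


def erc_mode_from_header(header_lines):
    """Return the emitRefConfidence value found in a GATK command-line
    header line, or None if no such line or key is present."""
    for line in header_lines:
        if line.startswith("##GATKCommandLine"):
            match = ERC_PATTERN.search(line)
            if match:
                return match.group(1)
    return None


def read_header(path):
    """Collect the '##' meta-information lines of an uncompressed VCF."""
    lines = []
    with open(path) as handle:
        for line in handle:
            if not line.startswith("##"):
                break
            lines.append(line.rstrip("\n"))
    return lines


# Hypothetical, heavily shortened header for illustration only.
example_header = [
    "##fileformat=VCFv4.2",
    '##GATKCommandLine.HaplotypeCaller=<ID=HaplotypeCaller,'
    'CommandLineOptions="analysis_type=HaplotypeCaller emitRefConfidence=GVCF">',
]
print(erc_mode_from_header(example_header))
```

    For a real file, something like erc_mode_from_header(read_header("ind1.g.vcf")) would report the mode recorded for that sample, if the header has it.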
