Phantom indels from HaplotypeCaller?

grafal (MPI Tuebingen; Member)

Dear GATK users and developers,

I am running HaplotypeCaller followed by ValidateVariants, and the latter complains about records that list an alternative allele for which there is no observation in the sample genotypes.

ERROR MESSAGE: File /storage/rafal.gutaker/NEXT_test/work/4f/6f8738a66d1c9d12651b76b7ef8819/IRIS_313-15896.g.vcf fails strict validation: one or more of the ALT allele(s) for the record at position LOC_Os01g01010:6190 are not observed at all in the sample genotypes |
ERROR ------------------------------------------------------------------------------------------

Here is an example of a site that ValidateVariants complains about:

LOC_Os01g01010 6190 . GT G,<NON_REF> 0 . DP=4;ExcessHet=3.0103;MLEAC=0,0;MLEAF=0.00,0.00;RAW_MQ=14400.00 GT:AD:DP:GQ:PL:SB 0/0:4,0,0:4:12:0,12,135,12,135,135:4,0,0,0
LOC_Os01g01010 6192 . T <NON_REF> . . END=6192 GT:DP:GQ:MIN_DP:PL 0/0:8:0:8:0,0,254

In general, this does not seem dangerous, so I am thinking of removing this check, but why HaplotypeCaller is finding phantom variants is a mystery to me.

Thank you and

Best!
Rafal
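
To make the complaint concrete, here is a minimal Python sketch of the kind of check the error message describes: it lists ALT alleles that no sample genotype refers to. It is an illustration only, not GATK's implementation. The function name `unobserved_alts` is hypothetical, and two assumptions are made: the ALT column of the record above was originally `G,<NON_REF>` (the symbolic allele appears to have been lost in the post as displayed), and the symbolic `<NON_REF>` allele is skipped because it is expected in every GVCF record.

```python
# Sketch only: approximates the idea behind the "ALT allele not observed
# in the sample genotypes" check. This is NOT GATK's actual code.

def unobserved_alts(vcf_line, ignore=("<NON_REF>",)):
    """Return the ALT alleles that no sample genotype (GT) refers to."""
    fields = vcf_line.split()
    # Column 5 (index 4) holds the comma-separated ALT alleles.
    alts = [] if fields[4] == "." else fields[4].split(",")
    # Column 9 (index 8) is FORMAT; find where GT sits in each sample field.
    gt_index = fields[8].split(":").index("GT")

    seen = set()
    for sample in fields[9:]:
        gt = sample.split(":")[gt_index]
        for allele_index in gt.replace("|", "/").split("/"):
            if allele_index != ".":
                seen.add(int(allele_index))

    # ALT alleles are numbered from 1; allele 0 is REF.
    return [
        alt
        for number, alt in enumerate(alts, start=1)
        if number not in seen and alt not in ignore
    ]


# Record from the post, with the assumed <NON_REF> allele restored.
RECORD = (
    "LOC_Os01g01010 6190 . GT G,<NON_REF> 0 . "
    "DP=4;ExcessHet=3.0103;MLEAC=0,0;MLEAF=0.00,0.00;RAW_MQ=14400.00 "
    "GT:AD:DP:GQ:PL:SB 0/0:4,0,0:4:12:0,12,135,12,135,135:4,0,0,0"
)

print(unobserved_alts(RECORD))
```

For this record the genotype is 0/0, so the deletion allele G is listed in ALT but never used in GT, which is the situation the strict validation reports.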

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @grafal
    Hi Rafal,

    Interesting. That site has a proper variant allele along with NON-REF that is present in a hom-ref block. I think those sites should not be part of a block (if they have a proper variant allele called). Can you confirm you are using the latest version of GATK?

    Thanks,
    Sheila

  • grafal (MPI Tuebingen; Member)

    @Sheila

    Hi Sheila,

    That's right, I have version 3.8. Let me know if you need anything to get to the bottom of this.

    Thanks,
    Rafal

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @grafal
    Hi Rafal,

    Can you check if this happens with GATK4 latest beta? I may need you to submit a bug report if it does. You can download the latest beta here.

    Thanks,
    Sheila

  • mshruti (Member)
    edited June 3

    @Sheila
    Hi Sheila,

    I am facing a similar issue in GATK4. When I run ValidateVariants on the g.vcf generated by HaplotypeCaller, I get the following error:
    A USER ERROR has occurred: Input /scratch/PI/euan/common/udn/externalUdnData/FDA/GATK4/final/05292018/NA12878_raw_snps_indels.g.vcf fails strict validation: one or more of the ALT allele(s) for the record at position 1:16103 are not observed at all in the sample genotypes of type:

    I am confused about how to interpret this variant: I have only one sample, but two variant alleles are listed in this line of the VCF.
    1 16103 rs200358166 T G,TGG,<NON_REF> 222.75 . DB;DP=9;ExcessHet=3.0103;MLEAC=2,0,0;MLEAF=1,0,0;RAW_MQ=10580 GT:AD:DP:GQ:PL:SB 1/1:0,8,0,0:8:24:260,24,0,283,27,366,276,27,329,313:0,0,7,1

    I get java.io.IOException: Input/output error while running GenotypeGVCFs if I use this g.vcf.

    Command run to validate the vcf:
    $ /share/PI/euan/apps/gatk/gatk-4.0.4.0/gatk ValidateVariants \
    -V /scratch/PI/euan/common/udn/externalUdnData/FDA/GATK4/final/05292018/NA12878_raw_snps_indels.g.vcf \
    -R /scratch/PI/euan/common/udn/externalUdnData/FDA/hs37d5_ref/hs37d5.fa \
    --dbsnp /share/PI/euan/apps/gatk/gatk_bundle_b37/dbsnp_138.b37.vcf.gz

    Thanks,
    Shruti

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @mshruti
    Hi,

    Can you add --validate-GVCF to your command? I am assuming you are not using that.

    -Sheila

  • mshruti (Member)

    Thanks, Sheila. I was not using the --validate-GVCF argument.
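
On the question above about one sample with several ALT alleles in a single record: the PL field carries one value per possible diploid genotype, in the order defined by the VCF specification (0/0, 0/1, 1/1, 0/2, 1/2, 2/2, ...). The Python sketch below labels the PL values of the 1:16103 record so it is easier to see which genotype was actually called. The function names are hypothetical, and the ALT column is assumed to have been `G,TGG,<NON_REF>` (four alleles in total, which agrees with the four AD values and ten PL values in the record).

```python
# Sketch only: labels diploid PL values using the genotype ordering from
# the VCF specification. Helper names are illustrative, not a GATK API.

def diploid_genotypes(n_alleles):
    """Allele-index pairs in VCF PL order: 0/0, 0/1, 1/1, 0/2, 1/2, 2/2, ..."""
    return [(j, k) for k in range(n_alleles) for j in range(k + 1)]


def label_pls(ref, alts, pl_field):
    """Map each possible diploid genotype to its PL value."""
    alleles = [ref] + list(alts)
    pls = [int(value) for value in pl_field.split(",")]
    genotypes = diploid_genotypes(len(alleles))
    if len(pls) != len(genotypes):
        raise ValueError(
            f"expected {len(genotypes)} PL values for {len(alleles)} alleles, "
            f"got {len(pls)}"
        )
    return {f"{alleles[j]}/{alleles[k]}": pl for (j, k), pl in zip(genotypes, pls)}


# Values taken from the 1:16103 record, with the assumed <NON_REF> allele.
labelled = label_pls(
    ref="T",
    alts=["G", "TGG", "<NON_REF>"],
    pl_field="260,24,0,283,27,366,276,27,329,313",
)

for genotype, pl in labelled.items():
    print(f"{genotype}\t{pl}")

# The most likely genotype is the one with the lowest PL.
print("best:", min(labelled, key=labelled.get))
```

Read this way, the sample is called homozygous for G (GT 1/1, PL 0), while TGG is a candidate allele that HaplotypeCaller considered but that has no supporting reads (AD 0,8,0,0) and is not part of the called genotype.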
