Phantom indels from HaplotypeCaller?

grafal, MPI Tuebingen (Member)

Dear GATK users and developers,

I am running HaplotypeCaller followed by ValidateVariants, and the latter complains about records that list an alternative allele without any observation supporting it.

ERROR MESSAGE: File /storage/rafal.gutaker/NEXT_test/work/4f/6f8738a66d1c9d12651b76b7ef8819/IRIS_313-15896.g.vcf fails strict validation: one or more of the ALT allele(s) for the record at position LOC_Os01g01010:6190 are not observed at all in the sample genotypes |
ERROR ------------------------------------------------------------------------------------------

Here is an example of a site that ValidateVariants complains about:

LOC_Os01g01010 6190 . GT G,<NON_REF> 0 . DP=4;ExcessHet=3.0103;MLEAC=0,0;MLEAF=0.00,0.00;RAW_MQ=14400.00 GT:AD:DP:GQ:PL:SB 0/0:4,0,0:4:12:0,12,135,12,135,135:4,0,0,0
LOC_Os01g01010 6192 . T <NON_REF> . . END=6192 GT:DP:GQ:MIN_DP:PL 0/0:8:0:8:0,0,254

In general this does not seem dangerous, so I am thinking of disabling the check, but why HaplotypeCaller is reporting these phantom variants is a mystery to me.

Thank you and

Best!
Rafal

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator admin)

    @grafal
    Hi Rafal,

    Interesting. That site has a proper variant allele along with NON-REF that is present in a hom-ref block. I think those sites should not be part of a block (if they have a proper variant allele called). Can you confirm you are using the latest version of GATK?

    Thanks,
    Sheila

  • grafal, MPI Tuebingen (Member)

    @Sheila

    Hi Sheila,

    That's right, I have version 3.8. Let me know if you need anything to get to the bottom of this.

    Thanks,
    Rafal

  • Sheila, Broad Institute (Member, Broadie, Moderator admin)

    @grafal
    Hi Rafal,

    Can you check if this happens with GATK4 latest beta? I may need you to submit a bug report if it does. You can download the latest beta here.

    Thanks,
    Sheila

  • mshruti (Member)
    edited June 3

    @Sheila
    Hi Sheila,

    I am facing a similar issue in GATK4. When I run ValidateVariants on the g.vcf generated by HaplotypeCaller, I get the following error:
    A USER ERROR has occurred: Input /scratch/PI/euan/common/udn/externalUdnData/FDA/GATK4/final/05292018/NA12878_raw_snps_indels.g.vcf fails strict validation: one or more of the ALT allele(s) for the record at position 1:16103 are not observed at all in the sample genotypes of type:

    I am confused about how to interpret this variant: I have only one sample, but two variants are mentioned on this line of the VCF.
    1 16103 rs200358166 T G,TGG,<NON_REF> 222.75 . DB;DP=9;ExcessHet=3.0103;MLEAC=2,0,0;MLEAF=1,0,0;RAW_MQ=10580 GT:AD:DP:GQ:PL:SB 1/1:0,8,0,0:8:24:260,24,0,283,27,366,276,27,329,313:0,0,7,1

    I get java.io.IOException: Input/output error while running GenotypeGVCFs if I use this g.vcf.

    Command run to validate the vcf:
    /share/PI/euan/apps/gatk/gatk-4.0.4.0/gatk ValidateVariants \
    -V /scratch/PI/euan/common/udn/externalUdnData/FDA/GATK4/final/05292018/NA12878_raw_snps_indels.g.vcf \
    -R /scratch/PI/euan/common/udn/externalUdnData/FDA/hs37d5_ref/hs37d5.fa \
    --dbsnp /share/PI/euan/apps/gatk/gatk_bundle_b37/dbsnp_138.b37.vcf.gz

    Thanks,
    Shruti

  • Sheila, Broad Institute (Member, Broadie, Moderator admin)

    @mshruti
    Hi,

    Can you add --validate-GVCF to your command? I am assuming you are not using that.

    -Sheila
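
For anyone landing here with the same error, the suggestion above can be sketched as follows. The file names are placeholders (assumptions, not the paths from the earlier post), and --dbsnp is left out for brevity. The script only prints the command so it can be checked first; remove the echo to actually run it.

```shell
# Placeholder paths (assumptions); replace with your own files.
GATK=gatk
GVCF=sample.g.vcf
REF=reference.fa

# Print the command instead of running it; remove 'echo' to execute.
validate_gvcf_cmd() {
    echo "$GATK" ValidateVariants -V "$GVCF" -R "$REF" --validate-GVCF
}

validate_gvcf_cmd
```

With --validate-GVCF the file is checked as a GVCF rather than as an ordinary VCF, which is the difference the question above hinges on.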

  • mshruti (Member)

    Thanks Sheila. I was not using the --validate-GVCF argument.
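
Some background on why both posters hit this error: the strict allele check reports any ALT allele that no sample genotype points to. A GVCF record normally carries a <NON_REF> allele that is rarely, if ever, called in the genotype, so a GVCF validated as an ordinary VCF can be expected to trip the check. As far as I understand it (this is not stated in the thread), ValidateVariants does not apply the allele check when validating as a GVCF. The awk sketch below is a rough re-creation of the check, not GATK's own implementation. The function name unobserved_alts is made up for illustration, and the example record is the one from the GATK4 post with <NON_REF> written out in the ALT column, which is an assumption about what the forum display dropped.

```shell
# Print CHROM, POS and every ALT allele whose index never appears in any
# sample GT field. Illustrative only; this is not GATK code.
unobserved_alts() {
    awk '
        /^#/ { next }                          # skip header lines
        {
            n_alt = split($5, alt, ",")        # ALT alleles; alt[1] is GT index 1
            n_key = split($9, key, ":")        # FORMAT keys
            gt_col = 0
            for (k = 1; k <= n_key; k++) if (key[k] == "GT") gt_col = k
            if (gt_col == 0) next              # no genotypes to compare against

            split("", seen)                    # portable way to clear an array
            for (s = 10; s <= NF; s++) {       # one column per sample
                split($s, val, ":")
                n_gt = split(val[gt_col], idx, "[/|]")
                for (g = 1; g <= n_gt; g++) seen[idx[g]] = 1
            }

            missing = ""
            for (a = 1; a <= n_alt; a++)
                if (!(a in seen)) missing = missing (missing == "" ? "" : ",") alt[a]
            if (missing != "") print $1, $2, missing
        }
    '
}

# Record from the thread, with <NON_REF> restored (assumption).
printf '%s\n' \
    '1 16103 rs200358166 T G,TGG,<NON_REF> 222.75 . DB;DP=9;ExcessHet=3.0103;MLEAC=2,0,0;MLEAF=1,0,0;RAW_MQ=10580 GT:AD:DP:GQ:PL:SB 1/1:0,8,0,0:8:24:260,24,0,283,27,366,276,27,329,313:0,0,7,1' \
    | unobserved_alts
# prints: 1 16103 TGG,<NON_REF>
```

The genotype 1/1 only refers to the first ALT allele (G), so TGG and <NON_REF> are reported as unobserved. The sketch relies on awk's default whitespace splitting, so it accepts both the space-separated records pasted in this thread and tab-separated VCF lines.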
