
GenotypeGVCFs: weird result?

Hi GATK team,
I have a problem with (at least) one variant in the VCF output of GenotypeGVCFs. This is the problematic variant:
1 899937 rs143296006 G T 8502.66 PASS AC=8;AF=0.800;AN=10;DB;DP=386;FS=0.000;GQ_MEAN=116.00;GQ_STDDEV=96.12;MLEAC=10;MLEAF=1.00;MQ=59.95;MQ0=0;NCC=7;QD=30.91;SOR=4.475;VQSLOD=2.43;culprit=FS GT:AD:DP:GQ:PGT:PID:PL 0/0:0,0:13:0:0|1:899928_G_C:0,0,0 ./.:22,0:22 1/1:0,26:26:87:1|1:899928_G_C:1261,87,0 ./.:14,0:14 ./.:29,0:29 1/1:0,30:30:99:1|1:899928_G_C:1422,99,0 ./.:39,0:39 1/1:0,40:40:99:1|1:899928_G_C:1935,129,0 ./.:29,0:29 1/1:0,85:85:99:1|1:899928_G_C:3908,265,0 ./.:32,0:32 ./.:26,0:26

As you can see, sample 1 has a 0/0 genotype even though no reads support it (AD=0,0). Am I reading this correctly? When I look for this position in that sample's gVCF file, I cannot find any line for it. Is this normal? So I have two questions:
1) Why does AD=0,0 result in a 0/0 GT?
2) Why is there no line for this position in the gVCF?
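To make the record easier to read, here is a minimal Python sketch (an editorial illustration, not GATK code; the function names are hypothetical) that pairs each sample column with the FORMAT keys and recounts AN and AC from the called genotypes. The sample strings are copied from the record above.

```python
# Minimal sketch (not GATK code): split the sample columns of the record
# at 1:899937 into FORMAT key/value pairs and recount AN/AC.

FORMAT = "GT:AD:DP:GQ:PGT:PID:PL"

# Sample columns copied from the record above, in order.
SAMPLES = [
    "0/0:0,0:13:0:0|1:899928_G_C:0,0,0",
    "./.:22,0:22",
    "1/1:0,26:26:87:1|1:899928_G_C:1261,87,0",
    "./.:14,0:14",
    "./.:29,0:29",
    "1/1:0,30:30:99:1|1:899928_G_C:1422,99,0",
    "./.:39,0:39",
    "1/1:0,40:40:99:1|1:899928_G_C:1935,129,0",
    "./.:29,0:29",
    "1/1:0,85:85:99:1|1:899928_G_C:3908,265,0",
    "./.:32,0:32",
    "./.:26,0:26",
]


def parse_sample(fmt: str, value: str) -> dict:
    """Pair FORMAT keys with one sample's values.

    VCF allows trailing FORMAT fields to be dropped, so no-call columns such
    as "./.:22,0:22" only carry GT, AD and DP; zip() stops at the shorter
    list, so the missing keys are simply absent.
    """
    return dict(zip(fmt.split(":"), value.split(":")))


def allele_number_and_count(samples: list) -> tuple:
    """Count called alleles (AN) and non-reference alleles (AC), skipping '.'."""
    an = ac = 0
    for sample in samples:
        for allele in sample["GT"].replace("|", "/").split("/"):
            if allele == ".":
                continue
            an += 1
            if allele != "0":
                ac += 1
    return an, ac


parsed = [parse_sample(FORMAT, s) for s in SAMPLES]
print(parsed[0]["GT"], parsed[0]["AD"], parsed[0]["DP"])  # 0/0 0,0 13
print(allele_number_and_count(parsed))                    # (10, 8)
```

Only five of the twelve samples have a called genotype, which is where AN=10, AC=8 and AF=0.800 come from. Sample 1 is the only called sample whose AD is 0,0 while its DP is 13.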

Furthermore, if I view this position in IGV with the recalibrated BAM file, I see this (see attached picture, first BAM). So, in my opinion, this is clearly a 1/1 genotype for my sample. Am I missing something? Also, if I look at this position for sample 2 (./.:22,0) in IGV (see attached picture, second BAM), it also looks like a 1/1 genotype rather than a no-call (./.). And in the sample2 gVCF file I see this:
1 899937 . G <NON_REF> . . END=899938 GT:DP:GQ:MIN_DP:PL 0/0:22:0:22:0,0,0
Why 0/0? ...
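A gVCF does not necessarily have a line starting at every position: non-variant stretches are written as reference blocks that run from POS to the END value in INFO, and a variant record spans the length of its REF allele. A search for the exact position string can therefore miss a site that is covered by a record starting earlier. The minimal Python sketch below (an editorial illustration, not a GATK tool; the names are hypothetical) finds which record, if any, covers a position, using the sample2 block shown above.

```python
from typing import NamedTuple, Optional


class GvcfRecord(NamedTuple):
    chrom: str
    pos: int
    end: int  # last reference position the record covers (inclusive)


def parse_gvcf_line(line: str) -> GvcfRecord:
    """Parse CHROM, POS and the covered span from one gVCF data line."""
    # Real files are tab-separated; split() also handles the spaces in the post.
    fields = line.split()
    chrom, pos, ref, info = fields[0], int(fields[1]), fields[3], fields[7]
    end = pos + len(ref) - 1  # default span: the REF allele itself
    for entry in info.split(";"):
        if entry.startswith("END="):  # reference blocks declare their end explicitly
            end = int(entry[len("END="):])
    return GvcfRecord(chrom, pos, end)


def covering_record(records, chrom: str, position: int) -> Optional[GvcfRecord]:
    """Return the first record whose span includes `position`, or None."""
    for rec in records:
        if rec.chrom == chrom and rec.pos <= position <= rec.end:
            return rec
    return None


block = parse_gvcf_line(
    "1 899937 . G <NON_REF> . . END=899938 GT:DP:GQ:MIN_DP:PL 0/0:22:0:22:0,0,0"
)
print(block)                                  # GvcfRecord(chrom='1', pos=899937, end=899938)
print(covering_record([block], "1", 899938))  # GvcfRecord(chrom='1', pos=899937, end=899938)
print(covering_record([block], "1", 899939))  # None
```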

Sorry if I'm not explaining myself very clearly.
Thanks in advance.

GitHub issue 877 (by Geraldine_VdAuwera); state: closed; closed by chandrans

Answers

  • Sheila (Broad Institute; Member, Broadie admin)

    @Irantzu
    Hi,

    Can you please post the exact commands you used for each step, starting with HaplotypeCaller? Please post your commands for HaplotypeCaller, CombineGVCFs (if you used it), and GenotypeGVCFs.

    Can you also confirm that you are using the latest version of GATK?

    Thanks,
    Sheila

  • Irantzu (Member)

    Hi Sheila,
    Yes. I'm using version 3.3-0.
    The commands are:
    HaplotypeCaller (one run per sample):
    java -Xmx8g -XX:ParallelGCThreads=4 -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R human_g1k_v37.fasta -I sampleX.bam --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 --dbsnp dbsnp_138.b37.vcf

    CombineGVCFs: not used.

    GenotypeGVCFs:
    java -Xmx16g -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -nt 6 -R human_g1k_v37.fasta --variant gvcf.list -o all.joinGeno.raw.vcf --dbsnp dbsnp_138.b37.vcf

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hi @Irantzu,

    I assume the sample you're concerned about is the one with the record 0/0:0,0:13:0:0|1:899928_G_C:0,0,0. It looks like there is some read coverage at that site (DP=13), albeit none supporting the called alleles (AD=0,0). Furthermore, the PLs (0,0,0) tell you that the program has no idea what the genotype should be -- all three genotypes are equally likely, which is why GQ is 0. So it is just defaulting to 0/0.

    That said, those records look nothing like the screenshot you show. Don't take this the wrong way, but are you sure you're looking at the data that correspond to the correct samples?
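The PL reasoning above can be checked with a minimal Python sketch (an editorial illustration, not GATK's implementation; the function names are hypothetical). It assumes the usual VCF conventions for a biallelic diploid site: PLs are ordered 0/0, 0/1, 1/1 and normalized so the best genotype has PL 0, and GQ is the gap between the lowest and second-lowest PL, capped at 99. The tie-break toward 0/0 here is simply Python's `min()` returning the first index; it mirrors the "defaulting to 0/0" described above but is not GATK's code.

```python
# Minimal sketch, not GATK's implementation: call a genotype from PLs and
# derive GQ from the PL gap (biallelic, diploid, normalized PLs assumed).

GENOTYPES = ("0/0", "0/1", "1/1")  # VCF PL order for a biallelic diploid site


def relative_likelihoods(pls):
    """Convert Phred-scaled PLs to likelihoods relative to the best genotype (PL 0)."""
    return [10 ** (-pl / 10) for pl in pls]


def call_from_pls(pls, cap=99):
    """Return (genotype, GQ); GQ = second-lowest PL minus lowest, capped at `cap`."""
    # min() returns the first index on ties, so all-equal PLs fall back to 0/0.
    best = min(range(len(pls)), key=lambda i: pls[i])
    lowest, second = sorted(pls)[:2]
    return GENOTYPES[best], min(second - lowest, cap)


print(relative_likelihoods([0, 0, 0]))  # [1.0, 1.0, 1.0]
print(call_from_pls([0, 0, 0]))         # ('0/0', 0)
print(call_from_pls([1261, 87, 0]))     # ('1/1', 87)
print(call_from_pls([1935, 129, 0]))    # ('1/1', 99)
```

The last two calls reproduce the GQ values of 87 and 99 reported for the 1/1 samples in the record.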

  • Irantzu (Member)
    edited March 2015

    Hi @Geraldine,
    Well, I'm concerned about all the genotypes (all samples) at that position, but yes, let's start with the record 0/0:0,0:13:0:0|1:899928_G_C:0,0,0. I know these records have nothing in common with my IGV screenshot, but the data do correspond to these samples. I'm quite sure about this, because I have checked other variants and their genotypes are correct. As I said, the problem is with this variant (at least). I'm totally stuck.

  • Sheila (Broad Institute; Member, Broadie admin)

    @Irantzu
    Hi,

    We would like to debug this issue locally, so if you can upload snippets of your files, that would be very helpful. Instructions are here: http://gatkforums.broadinstitute.org/discussion/1894/how-do-i-submit-a-detailed-bug-report
    We just need snippets of all your sample bam files in the region of concern.

    Thanks,
    Sheila

  • Irantzu (Member)
    edited March 2015

    Hi there,
    I've just uploaded the data to the FTP server. The tar file is named irantzu_snippets2gatk.tar.gz. Let me know if you need anything else.
    The problem, as I said, is the variant:
    1 899937 rs143296006 G T 8502.66 PASS AC=8;AF=0.800;AN=10;DB;DP=386;FS=0.000;GQ_MEAN=116.00;GQ_STDDEV=96.12;MLEAC=10;MLEAF=1.00;MQ=59.95;MQ0=0;NCC=7;QD=30.91;SOR=4.475;VQSLOD=2.43;culprit=FS GT:AD:DP:GQ:PGT:PID:PL 0/0:0,0:13:0:0|1:899928_G_C:0,0,0 ./.:22,0:22 1/1:0,26:26:87:1|1:899928_G_C:1261,87,0 ./.:14,0:14 ./.:29,0:29 1/1:0,30:30:99:1|1:899928_G_C:1422,99,0 ./.:39,0:39 1/1:0,40:40:99:1|1:899928_G_C:1935,129,0 ./.:29,0:29 1/1:0,85:85:99:1|1:899928_G_C:3908,265,0 ./.:32,0:32 ./.:26,0:26

    Maybe I'm missing something, I'm not sure.
    Thanks in advance,

    Irantzu

  • Irantzu (Member)

    Hi all,
    I was wondering whether there is any news on this issue.
    Thanks in advance.

  • Sheila (Broad Institute; Member, Broadie admin)

    @Irantzu
    Hi,

    Sorry, I have not been able to process this yet. I will get to it early next week.

    -Sheila

  • Sheila (Broad Institute; Member, Broadie admin)

    @Irantzu
    Hi,

    I have put in a bug report. Because this is a representation issue, the developers will need to finalize a solution. The issue is that one sample has a variant at position 899937, but another sample has a deletion that spans this position. The default genotype for the sample with the deletion is homozygous reference. Once there is a solution, I will let you know.

    -Sheila
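In VCF, a deletion record is anchored at the base just before the deleted bases (the padding base), so a deletion whose record starts upstream can still remove the base at 899937 in the sample that carries it. A minimal Python sketch for testing whether a simple deletion record overlaps a given position follows. It is an editorial illustration with a hypothetical function name; it assumes ALT is a prefix of REF, and the coordinates in the example are hypothetical, since the thread does not give the deletion's position or alleles.

```python
def deletion_removes(pos: int, ref: str, alt: str, query: int) -> bool:
    """Return True if a left-padded deletion record (POS, REF, ALT) deletes the base at `query`.

    Assumes ALT is a prefix of REF, e.g. REF=ACGT, ALT=A: the shared leading
    base(s) are kept and the remaining REF bases are deleted.
    """
    if len(alt) >= len(ref) or not ref.startswith(alt):
        return False  # not a simple deletion under this sketch's assumptions
    first_deleted = pos + len(alt)
    last_deleted = pos + len(ref) - 1
    return first_deleted <= query <= last_deleted


# Hypothetical record: a deletion at POS=100 with REF=ACGT, ALT=A removes 101-103.
print(deletion_removes(100, "ACGT", "A", 100))  # False (padding base is kept)
print(deletion_removes(100, "ACGT", "A", 102))  # True
print(deletion_removes(100, "ACGT", "A", 104))  # False
```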
