Question regarding calling de novo mutations

MUHAMMADSOHAILRAZA, Beijing Institute of Genomics, CAS (Member ✭✭)

Hi everyone!

In the last step of the genotype refinement process, 'hiConfDeNovo' and 'loConfDeNovo' mutations are tagged.
After genotype refinement, I selected only the 'hiConfDeNovo' mutations into a separate VCF file with SelectVariants. But I am confused by the results when I look at the genotype information of the individuals in the trios.

I just want to know how the following genotypes are called de novo for the affected child (male). My data set was unphased when I processed it through the genotype refinement protocol.

GT:AD:DP:GQ:JL:JP:PL:PP (REF=C, ALT=A)
(child)  0/1:202,15:223:86:6:40:89,0,711:86,0,761
(father) 0/0:235,8:248:43:6:40:0,42,891:0,43,900
(mother) 0/1:226,10:236:46:6:40:6,0,665:46,0,673

GT:AD:DP:GQ:JL:JP:PL:PP (REF=A, ALT=G)
(child)  0/1:16,12:28:99:52:52:231,0,428:291,0,485
(father) 0/0:24,1:25:58:52:52:0,66,732:0,58,775
(mother) 1/1:2,26:29:55:52:52:730,53,0:798,55,0

GT:AD:DP:GQ:JL:JP:PL:PP (REF=C, ALT=T)
(child)  0/1:15,10:25:99:75:75:263,0,377:263,0,443
(father) 0/1:20,8:29:99:75:75:183,0,572:243,0,578
(mother) 0/0:27,0:27:75:75:75:0,75,850:0,75,856

The child can inherit the respective alleles from the father or the mother, so how can these be called de novo mutations for the affected child? Also, these SNPs already exist in the dbSNP database, as noted in the VCF file.

I am pretty new to this field. Can you please clarify this?

Thanks!
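
The inheritance reasoning in the question can be sketched as a simple Mendelian consistency check. This is only an illustration of the logic, not how GATK decides on de novo tags: the function names `alleles` and `mendelian_consistent` are hypothetical, and the check looks at the called GT values only, ignoring genotype likelihoods and posteriors.

```python
from itertools import product

def alleles(gt):
    """Split a GT string such as '0/1' (or phased '0|1') into its alleles."""
    return gt.replace("|", "/").split("/")

def mendelian_consistent(child, father, mother):
    """Return True if the child's genotype can be formed from one
    paternal allele and one maternal allele."""
    target = sorted(alleles(child))
    return any(sorted(pair) == target
               for pair in product(alleles(father), alleles(mother)))

# GT calls from the three sites in the question: (child, father, mother).
trios = {
    "C>A": ("0/1", "0/0", "0/1"),
    "A>G": ("0/1", "0/0", "1/1"),
    "C>T": ("0/1", "0/1", "0/0"),
}
for site, (child, father, mother) in trios.items():
    print(site, mendelian_consistent(child, father, mother))

# For contrast, a het child with two hom-ref parents is not consistent.
print("contrast", mendelian_consistent("0/1", "0/0", "0/0"))
```

All three sites from the question come out as consistent with inheritance, while the contrast case does not, which is why the hiConfDeNovo tags on these records look suspicious.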

Answers

  • Sheila, Broad Institute (Member, Broadie admin)

    @MUHAMMADSOHAILRAZA
    Hi,

    Can you post the full vcf records? It is hard to comment without site-level information.

    Thanks,
    Sheila

  • MUHAMMADSOHAILRAZA, Beijing Institute of Genomics, CAS (Member ✭✭)

    @Sheila
    Hi! Here are the records:

    CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HMN15-1 (child) HMN15-2 (Father) HMN15-3 (Mother)

    Record 1
    1 2618804 rs4074927 C A 64.35 VQSRTrancheSNP99.50to100.00 AC=2;AF=0.333;AN=6;BaseQRankSum=-0.814;DB;DP=708;Dels=0.00;FS=30.073;HaplotypeScore=86.5005;MLEAC=2;MLEAF=0.333;MQ=14.14;MQ0=342;MQRankSum=4.019;PG=0,2,10;QD=0.14;ReadPosRankSum=2.441;SOR=2.220;VQSLOD=-1.497e+02;culprit=DP;hiConfDeNovo=HMN15-1 GT:AD:DP:GQ:JL:JP:PL:PP 0/1:202,15:223:86:6:40:89,0,711:86,0,761 0/0:235,8:248:43:6:40:0,42,891:0,43,900 0/1:226,10:236:46:6:40:6,0,665:46,0,673

    Record 2
    1 4965491 rs4654467 A G 924.90 PASS AC=3;AF=0.500;AN=6;BaseQRankSum=-3.098;DB;DP=82;Dels=0.00;FS=0.854;HaplotypeScore=3.6810;MLEAC=3;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=-0.554;PG=9,2,0;POSITIVE_TRAIN_SITE;QD=16.23;ReadPosRankSum=-0.901;SOR=0.540;VQSLOD=16.50;culprit=MQ;hiConfDeNovo=HMN15-1 GT:AD:DP:GQ:JL:JP:PL:PP 0/1:16,12:28:99:52:52:231,0,428:291,0,485 0/0:24,1:25:58:52:52:0,66,732:0,58,775 1/1:2,26:29:55:52:52:730,53,0:798,55,0

    Record 3
    1 116338765 rs9428236 C T 414.16 PASS AC=2;AF=0.333;AN=6;BaseQRankSum=0.812;DB;DP=81;Dels=0.00;FS=3.764;HaplotypeScore=0.3319;MLEAC=2;MLEAF=0.333;MQ=60.00;MQ0=0;MQRankSum=-1.492;PG=0,0,6;POSITIVE_TRAIN_SITE;QD=7.67;ReadPosRankSum=-0.144;SOR=1.304;VQSLOD=15.93;culprit=MQ;hiConfDeNovo=HMN15-1 GT:AD:DP:GQ:JL:JP:PL:PP 0/1:15,10:25:99:75:75:263,0,377:263,0,443 0/1:20,8:29:99:75:75:183,0,572:243,0,578 0/0:27,0:27:75:75:75:0,75,850:0,75,856
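
To read records like these side by side, the FORMAT keys can be zipped against each sample column. The sketch below is a minimal, hypothetical helper (`parse_record` is not a GATK or library function). It assumes whitespace-separated columns as pasted here (real VCF files are tab-delimited, which `split()` also handles), and it uses Record 3 with the INFO column abbreviated for readability.

```python
def parse_record(line, sample_names):
    """Parse one whitespace-separated VCF data line into a nested dict."""
    fields = line.split()
    info = {}
    for item in fields[7].split(";"):
        key, _, value = item.partition("=")
        info[key] = value  # flag entries such as DB map to ""
    keys = fields[8].split(":")
    samples = {
        name: dict(zip(keys, column.split(":")))
        for name, column in zip(sample_names, fields[9:])
    }
    return {"chrom": fields[0], "pos": int(fields[1]), "id": fields[2],
            "ref": fields[3], "alt": fields[4], "filter": fields[6],
            "info": info, "samples": samples}

# Record 3 from the post; INFO is shortened here (an assumption for brevity).
line = ("1 116338765 rs9428236 C T 414.16 PASS "
        "AC=2;AF=0.333;AN=6;DB;hiConfDeNovo=HMN15-1 "
        "GT:AD:DP:GQ:JL:JP:PL:PP "
        "0/1:15,10:25:99:75:75:263,0,377:263,0,443 "
        "0/1:20,8:29:99:75:75:183,0,572:243,0,578 "
        "0/0:27,0:27:75:75:75:0,75,850:0,75,856")
names = ["HMN15-1", "HMN15-2", "HMN15-3"]  # child, father, mother

record = parse_record(line, names)
print("tagged de novo in:", record["info"].get("hiConfDeNovo"))
for name in names:
    sample = record["samples"][name]
    print(name, sample["GT"], "AD=" + sample["AD"], "GQ=" + sample["GQ"])
```

Laid out this way, it is easy to see that the father (HMN15-2) is called 0/1 with 8 ALT-supporting reads at the site where the child is tagged hiConfDeNovo.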

  • MUHAMMADSOHAILRAZA, Beijing Institute of Genomics, CAS (Member ✭✭)

    @Sheila
    I am waiting for your kind comments.

  • Sheila, Broad Institute (Member, Broadie admin)

    @MUHAMMADSOHAILRAZA
    Hi,

    I need to get some confirmation on this question. I am putting in a ticket for the methods team to help me out.

    -Sheila

  • Sheila, Broad Institute (Member, Broadie admin)

    @MUHAMMADSOHAILRAZA
    Hi,

    It seems those should not be getting called as de novo mutations. Can you please upload your input ped file and your VCF, subset to those three records? Instructions are here: http://gatkforums.broadinstitute.org/discussion/1894/how-do-i-submit-a-detailed-bug-report

    Thanks,
    Sheila

  • MUHAMMADSOHAILRAZA, Beijing Institute of Genomics, CAS (Member ✭✭)

    @Sheila
    I have uploaded all the data and VCF files to the directory "SOHAIL_BUGREPORT_Sheila". Please first read the note and commands that are included in the archive. Please keep the data confidential.

    Thanks,
    sohail

  • MUHAMMADSOHAILRAZA, Beijing Institute of Genomics, CAS (Member ✭✭)

    @Sheila Is there any update? It has been a long time.

  • Sheila, Broad Institute (Member, Broadie admin)

    @MUHAMMADSOHAILRAZA
    Hi,

    I have yet to put in the bug report. Sorry! I have been swamped with higher priority bugs recently. I will get yours in by Wednesday.

    -Sheila

  • Sheila, Broad Institute (Member, Broadie admin)

    @MUHAMMADSOHAILRAZA
    Hi,

    It turns out you submitted a .rar file, which we do not accept. We only accept .gz or .tar files. Can you resubmit with one of those file types?

    Also, .rar files are only supported on Windows. GATK is not supported on Windows.

    -Sheila

  • MUHAMMADSOHAILRAZA, Beijing Institute of Genomics, CAS (Member ✭✭)

    @Sheila I uploaded the files in the same folder in *.tar.gz format.

  • Sheila, Broad Institute (Member, Broadie admin)

    @MUHAMMADSOHAILRAZA
    Hi,

    I just tested your files with the latest version of GATK, and the issue does not exist. Maybe try upgrading to the latest version, if you have not already done so.

    -Sheila

  • MUHAMMADSOHAILRAZA, Beijing Institute of Genomics, CAS (Member ✭✭)

    @Sheila I am already using the latest version, GATK 3.4. Can you please share your commands?

    My commands are attached to this comment.

  • Sheila, Broad Institute (Member, Broadie admin)

    @MUHAMMADSOHAILRAZA
    Hi,

    I ran the same commands as you did, but for CalculateGenotypePosteriors, I did not use the --supporting file.

    I think the --supporting file is only necessary if you have more than ten samples, but it can change the results.

    Please try running CalculateGenotypePosteriors without the --supporting file and let me know if that works. If it does, I will look into this issue further.

    -Sheila

  • Sheila, Broad Institute (Member, Broadie admin)

    @MUHAMMADSOHAILRAZA
    Hi again,

    I just tested with a --supporting file, and it still works for me. I suspect the ALL.wgs.phase3_20130502_biallelic_only_snps.sites.vcf may be causing your issue somehow.

    You can also try using 1000G_phase3_v4_20130502.sites.vcf which is in our bundle. That is what I used as the supporting file.

    -Sheila

  • MUHAMMADSOHAILRAZA, Beijing Institute of Genomics, CAS (Member ✭✭)

    @Sheila Okay, I will try to replicate the experiment.

    But one unusual thing I observed is that I got this kind of result with the UG-VQSR file but not with the HC-VQSR file, even though the same VQSR steps and parameters were set in both cases.

    -Sohail

  • Sheila, Broad Institute (Member, Broadie admin)

    @MUHAMMADSOHAILRAZA
    Hi Sohail,

    I am not sure I understand what you mean by "this kind of result". Can you clarify? It is certainly possible that the outputs of UnifiedGenotyper and HaplotypeCaller contain different variant calls. We recommend HaplotypeCaller for the best results.

    Thanks,
    Sheila

  • MUHAMMADSOHAILRAZA, Beijing Institute of Genomics, CAS (Member ✭✭)

    Hi @Sheila
    By "this kind of result" I mean the ambiguous possible de novo calls (the VCF records that I reported above) made after genotype refinement; they originated from the UG-VQSR file.

    Sorry for the inconvenience!

    -Sohail

  • Sheila, Broad Institute (Member, Broadie admin)

    @MUHAMMADSOHAILRAZA
    Ah, Okay. So, Haplotype Caller produces the correct results?

  • MUHAMMADSOHAILRAZA, Beijing Institute of Genomics, CAS (Member ✭✭)

    @Sheila Yup, I will check again and get back to you. But the file that I provided to you was the UG-VQSR one, and you ran it successfully. I will check where the issue is. I am also not sure about the --supporting file, so let me try. :)

    -Sohail
