Trio DeNovo Events Confusion

kmhernan · Chicago, IL · Member

Hello,

I have followed the GATK guidelines for trio calling, recalculating genotype posteriors, and annotating possible de novo mutations. However, I am confused by some of the 'hiConfDeNovo' calls.

Here are some examples:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  child   father  mother
chr1    144674391   rs61810930  C   T   1177.92 PASS    AC=3;AF=0.500;AN=6;BaseQRankSum=2.82;ClippingRankSum=0.306;DB;DP=71;FS=2.137;MLEAC=3;MLEAF=0.500;MQ=64.80;MQRankSum=-1.675e+00;PG=0,0,0;QD=16.59;ReadPosRankSum=-1.813e+00;SOR=1.034;hiConfDeNovo=child GT:AD:DP:GQ:JL:JP:PGT:PID:PL:PP 0/1:16,17:33:99:127:127:0|1:144674371_T_C:645,0,612:645,0,612   0/1:11,5:16:99:127:127:0|1:144674371_T_C:151,0,447:151,0,447    0/1:11,11:22:99:127:127:0|1:144674371_T_C:412,0,692:412,0,692
chr1    145037984   rs2489136   T   G   1401.92 PASS    AC=3;AF=0.500;AN=6;BaseQRankSum=-4.595e+00;ClippingRankSum=0.717;DB;DP=107;FS=43.850;MLEAC=3;MLEAF=0.500;MQ=69.55;MQRankSum=-9.500e-02;PG=0,0,0;QD=13.35;ReadPosRankSum=0.032;SOR=0.296;hiConfDeNovo=child  GT:AD:DP:GQ:JL:JP:PGT:PID:PL:PP 0/1:20,21:41:99:127:127:.:.:510,0,640:510,0,640 0/1:10,18:28:99:127:127:0|1:145037818_T_C:454,0,293:454,0,293   0/1:17,19:36:99:127:127:.:.:468,0,504:468,0,504
chr1    145053717   rs6670785   T   C   74.13   PASS    AC=1;AF=0.167;AN=6;BaseQRankSum=1.91;ClippingRankSum=-1.220e-01;DB;DP=114;FS=5.902;MLEAC=1;MLEAF=0.167;MQ=68.67;MQRankSum=0.203;PG=0,0,0;QD=3.22;ReadPosRankSum=1.01;SOR=2.712;hiConfDeNovo=child   GT:AD:DP:GQ:JL:JP:PGT:PID:PL:PP 0/0:48,0:48:99:96:96:.:.:0,102,1530:0,102,1590  0/1:19,4:23:99:96:96:0|1:145053711_C_T:105,0,826:105,0,886  0/0:42,0:42:99:96:96:.:.:0,99,1485:0,99,1545
chr1    145121901   rs10752808  A   G   3364.92 PASS    AC=3;AF=0.500;AN=6;BaseQRankSum=4.39;ClippingRankSum=0.353;DB;DP=173;FS=3.597;MLEAC=3;MLEAF=0.500;MQ=63.47;MQRankSum=-3.750e-01;PG=0,0,0;QD=19.45;ReadPosRankSum=0.164;SOR=0.550;hiConfDeNovo=child GT:AD:DP:GQ:JL:JP:PL:PP 0/1:20,37:57:99:127:127:1187,0,627:1187,0,627   0/1:21,24:45:99:127:127:717,0,736:717,0,736 0/1:24,47:71:99:127:127:1491,0,706:1491,0,706
chr2    91766984    rs55874359  A   T   6057.92 PASS    AC=3;AF=0.500;AN=6;BaseQRankSum=0.467;ClippingRankSum=-5.510e-01;DB;DP=404;FS=23.592;MLEAC=3;MLEAF=0.500;MQ=67.76;MQRankSum=-5.800e-02;PG=0,0,0;QD=15.78;ReadPosRankSum=0.318;SOR=0.878;hiConfDeNovo=child  GT:AD:DP:GQ:JL:JP:PGT:PID:PL:PP 0/1:69,62:131:99:127:127:.:.:1543,0,2330:1543,0,2330    0/1:44,40:84:99:127:127:0|1:91766961_T_C:1493,0,1698:1493,0,1698    0/1:87,82:169:99:127:127:0|1:91766961_T_C:3052,0,3450:3052,0,3450

If all three samples are called 0/1, why is the site called a de novo event? Perhaps I am misunderstanding the definition being used? This was done using GATK v3.4.
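For comparison, a de novo site in a trio is usually understood as one where the child is heterozygous and both parents are homozygous reference; a site where all three members are 0/1 is explained by ordinary Mendelian inheritance. The Python sketch below is a hypothetical check, not GATK's PossibleDeNovo code, that reports whether a record's hiConfDeNovo tag agrees with that pattern. It assumes tab-separated VCF text, the child/father/mother column order shown in the header above, and a GQ cutoff of 20 for every sample; the threshold is an illustrative assumption.

```python
# Hypothetical sanity check (not GATK's PossibleDeNovo implementation):
# does a record's hiConfDeNovo tag agree with the simple definition
# "child heterozygous, both parents homozygous reference"?
# Assumptions: sample columns are ordered child, father, mother (as in the
# header above) and every sample needs GQ >= 20.

GQ_THRESHOLD = 20  # assumed value, for illustration only


def parse_samples(fields):
    """Map the three sample columns to dicts of FORMAT key -> value."""
    keys = fields[8].split(":")
    return [dict(zip(keys, col.split(":"))) for col in fields[9:12]]


def genotype_quality(sample):
    """Return GQ as an int, or -1 when it is absent or missing ('.')."""
    gq = sample.get("GQ", ".")
    return int(gq) if gq != "." else -1


def is_het(gt):
    alleles = gt.replace("|", "/").split("/")
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]


def is_hom_ref(gt):
    return gt.replace("|", "/") == "0/0"


def check_record(line):
    """Return (flagged_by_annotation, matches_de_novo_pattern) for one VCF line."""
    fields = line.rstrip("\n").split("\t")
    flagged = any(item.startswith("hiConfDeNovo=") for item in fields[7].split(";"))
    child, father, mother = parse_samples(fields)
    confident = all(genotype_quality(s) >= GQ_THRESHOLD for s in (child, father, mother))
    pattern = (is_het(child["GT"]) and is_hom_ref(father["GT"])
               and is_hom_ref(mother["GT"]))
    return flagged, pattern and confident


# First record from the post above, with INFO and FORMAT trimmed for brevity.
record = "\t".join([
    "chr1", "144674391", "rs61810930", "C", "T", "1177.92", "PASS",
    "AC=3;AF=0.500;AN=6;DB;hiConfDeNovo=child",
    "GT:AD:DP:GQ", "0/1:16,17:33:99", "0/1:11,5:16:99", "0/1:11,11:22:99",
])
print(check_record(record))  # (True, False): tagged, but not a de novo pattern
```

Running this over every tagged record in the annotated VCF would separate genuine child-het/parents-hom-ref sites from ones like the examples above.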

Answers

  • Sheila · Broad Institute · Member, Broadie, Moderator admin

    @kmhernan
    Hi,

    Can you post the exact commands you used? If you used a supporting file, can you try using only the ped file?

    Thanks,
    Sheila

  • kmhernan · Chicago, IL · Member

    Sure.

    I first used SelectVariants to restrict my trio VCF and the 1000G_phase1.indels.hg19.vcf / 1000G_phase1.snps.high_confidence.hg19.vcf files to biallelic sites only (note: the phase 3 files do not appear to be in your hg19 bundle). Nothing special there.

    Here are the three relevant steps:

    # Calculate posteriors. Note: the supporting files are reduced to biallelic sites only
    java -Xmx12G -jar $GATK -T CalculateGenotypePosteriors -R ucsc.hg19.fa --disable_auto_index_creation_and_locking_when_reading_rods -ped pedigree.ped --supporting 1000G_phase1.snps.high_confidence.hg19.vcf --supporting 1000G_phase1.indels.hg19.vcf -V trio.novoalign.filtered.biallelic.vcf -o trio.novoalign.posterior.vcf
    
    # Filter low quality genotypes
    java -jar $GATK -T VariantFiltration -R ucsc.hg19.fa -V trio.novoalign.posterior.vcf --disable_auto_index_creation_and_locking_when_reading_rods -G_filter "GQ < 20.0" -G_filterName lowGQ -o trio.novoalign.posterior.Gfiltered.vcf
    
    # Add trio annotations and others
    java -Xmx12G -jar $GATK -T VariantAnnotator -R ucsc.hg19.fa --disable_auto_index_creation_and_locking_when_reading_rods -nt 4 -ped pedigree.ped -V trio.novoalign.posterior.Gfiltered.vcf --dbsnp dbsnp_138.hg19.vcf -mvq 20.0 -A PossibleDeNovo  -o trio.novoalign.cgp.vanno.vcf
    

    There is only a single trio, so perhaps --supporting is not needed?

    Kyle

  • Sheila · Broad Institute · Member, Broadie, Moderator admin

    @kmhernan
    Hi Kyle,

    The supporting file is only required if you have more than 10 samples. Try not using the supporting file and see if the results look better.

    Thanks,
    Sheila

  • kmhernan · Chicago, IL · Member

    @Sheila I re-ran without the supporting file and am still finding some of these events:

    #CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  child   father  mother
    chr1    143405603       rs201962840     T       G       938.92  PASS    AC=3;AF=0.500;AN=6;BaseQRankSum=3.66;ClippingRankSum=1.20;DB;DP=273;FS=8.351;MLEAC=3;MLEAF=0.500;MQ=55.75;MQRankSum=-2.582e+00;PG=0,0,0;QD=3.48;ReadPosRankSum=0.449;SOR=1.430;hiConfDeNovo=child       GT:AD:DP:GQ:JL:JP:PGT:PID:PL:PP 0/1:88,26:114:99:52:52:.:.:821,0,3672:821,0,3672        0/1:64,7:71:95:52:52:.:.:95,0,2624:95,0,2624    0/1:78,7:85:53:52:52:0|1:143405602_G_A:53,0,3324:53,0,3324
    chr1    144835803       rs58741194      T       G       575.92  PASS    AC=3;AF=0.500;AN=6;BaseQRankSum=-1.660e+00;ClippingRankSum=0.410;DB;DP=130;FS=4.487;MLEAC=3;MLEAF=0.500;MQ=46.03;MQRankSum=-2.130e-01;PG=0,0,0;QD=4.43;ReadPosRankSum=-1.490e+00;SOR=1.220;hiConfDeNovo=child   GT:AD:DP:GQ:JL:JP:PL:PP 0/1:44,16:60:99:71:71:373,0,1351:373,0,1351     0/1:18,4:22:71:71:71:71,0,562:71,0,562  0/1:39,9:48:99:71:71:162,0,1227:162,0,1227
    chr1    145037984       rs2489136       T       G       1401.92 PASS    AC=3;AF=0.500;AN=6;BaseQRankSum=-4.595e+00;ClippingRankSum=0.717;DB;DP=107;FS=43.850;MLEAC=3;MLEAF=0.500;MQ=69.55;MQRankSum=-9.500e-02;PG=0,0,0;QD=13.35;ReadPosRankSum=0.032;SOR=0.296;hiConfDeNovo=child      GT:AD:DP:GQ:JL:JP:PGT:PID:PL:PP 0/1:20,21:41:99:127:127:.:.:510,0,640:510,0,640 0/1:10,18:28:99:127:127:0|1:145037818_T_C:454,0,293:454,0,293   0/1:17,19:36:99:127:127:.:.:468,0,504:468,0,504
    chr1    145055239       rs1808995       G       T       1742.92 PASS    AC=3;AF=0.500;AN=6;BaseQRankSum=-2.500e-02;ClippingRankSum=0.481;DB;DP=277;FS=6.578;MLEAC=3;MLEAF=0.500;MQ=70.00;MQRankSum=-5.090e-01;PG=0,0,0;QD=6.31;ReadPosRankSum=0.169;SOR=0.736;hiConfDeNovo=child        GT:AD:DP:GQ:JL:JP:PL:PP 0/1:80,31:111:99:127:127:820,0,2603:820,0,2603  0/1:50,10:60:99:127:127:182,0,1611:182,0,1611   0/1:74,31:105:99:127:127:771,0,2361:771,0,2361
    

    hmmmm..... thoughts?

    Best,
    Kyle

  • Sheila · Broad Institute · Member, Broadie, Moderator admin

    @kmhernan
    Hi Kyle,

    Sorry for the late response. Can you try running without -nt 4? Also, can you post your ped file?

    Thanks,
    Sheila

  • kmhernan · Chicago, IL · Member

    @Sheila No worries!

    First, here is my ped file:

    A   father  0   0   1   1
    A   mother  0   0   2   1
    A   child   father  mother  2   2
    

    Second, I am re-running without parallelization. It hasn't finished yet, but so far I haven't run into the issues I had previously in the first 3 chromosomes...

  • Sheila · Broad Institute · Member, Broadie, Moderator admin

    @kmhernan
    Hi Kyle,

    Your ped file looks fine. Looking at the documentation for CalculateGenotypePosteriors, I don't think -nt is supported. Hopefully you get correct results without -nt!

    -Sheila
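Since the pedigree is an input to both CalculateGenotypePosteriors and VariantAnnotator, it can be worth checking it mechanically against the VCF. The sketch below is a hypothetical Python helper (not GATK's pedigree parser) that reads the standard six PED columns (family ID, individual ID, paternal ID, maternal ID, sex with 1 = male and 2 = female, phenotype) and reports samples missing from the VCF, parents missing from the file, and parents whose sex code does not match their role. The function names are illustrative only.

```python
# Hypothetical PED sanity check (a sketch, not GATK's pedigree parser).
# Standard PED columns: family ID, individual ID, paternal ID, maternal ID,
# sex (1 = male, 2 = female), phenotype. A parent ID of "0" means unknown.


def parse_ped(text):
    """Return {individual ID: record} for each non-comment PED line."""
    people = {}
    for line in text.splitlines():
        cols = line.split()
        if not cols or cols[0].startswith("#"):
            continue
        if len(cols) != 6:
            raise ValueError(f"expected 6 columns, got {len(cols)}: {line!r}")
        family, ind, father, mother, sex, _phenotype = cols
        people[ind] = {"family": family, "father": father,
                       "mother": mother, "sex": sex}
    return people


def find_problems(people, vcf_samples):
    """List inconsistencies between the PED records and the VCF sample names."""
    problems = []
    for ind, rec in people.items():
        if ind not in vcf_samples:
            problems.append(f"{ind} is not a sample in the VCF")
        for role, expected_sex in (("father", "1"), ("mother", "2")):
            parent = rec[role]
            if parent == "0":
                continue  # founder: parent unknown
            if parent not in people:
                problems.append(f"{role} {parent} of {ind} is missing from the PED")
            elif people[parent]["sex"] != expected_sex:
                problems.append(f"{role} {parent} of {ind} has sex code "
                                f"{people[parent]['sex']}")
    return problems


# The pedigree posted above, checked against the VCF header's sample names.
ped_text = """A   father  0   0   1   1
A   mother  0   0   2   1
A   child   father  mother  2   2"""
print(find_problems(parse_ped(ped_text), {"child", "father", "mother"}))  # []
```

An empty list is consistent with Sheila's reading that the ped file itself is not the cause.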

  • kmhernan · Chicago, IL · Member

    @Sheila Thanks! However, I think this is a more serious problem. I didn't use -nt with CalculateGenotypePosteriors; I used it with VariantAnnotator, where it is supposedly supported. That is very worrisome. Are other annotations affected by some race condition when multiple threads are used?!
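One way to test for the suspected race condition is to annotate the same input twice, once with -nt 4 and once without, and diff the INFO fields site by site. The Python sketch below assumes uncompressed VCF text and compares only the INFO column, keyed on (CHROM, POS, REF, ALT); the helper names are hypothetical, and the example records are constructed for illustration rather than taken from real output.

```python
# Hypothetical helper: compare the INFO annotations of two VCFs produced
# from the same input, e.g. VariantAnnotator run with and without -nt 4.
# Assumes plain-text (uncompressed) VCF content; header lines are skipped.


def info_by_site(vcf_text):
    """Map (CHROM, POS, REF, ALT) -> {INFO key: value} for each record."""
    sites = {}
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        key = (f[0], int(f[1]), f[3], f[4])
        info = {}
        for item in f[7].split(";"):
            name, _, value = item.partition("=")
            info[name] = value  # flags such as DB map to ""
        sites[key] = info
    return sites


def diff_annotations(threaded_text, serial_text):
    """Map each differing site to the INFO keys whose values differ."""
    a, b = info_by_site(threaded_text), info_by_site(serial_text)
    report = {}
    for key in sorted(a.keys() | b.keys()):
        if key not in a or key not in b:
            report[key] = ["<site missing from one run>"]
            continue
        changed = sorted(n for n in a[key].keys() | b[key].keys()
                         if a[key].get(n) != b[key].get(n))
        if changed:
            report[key] = changed
    return report


# Constructed records for illustration: the same site with and without the tag.
threaded = "chr1\t145037984\trs2489136\tT\tG\t1401.92\tPASS\tAC=3;DB;hiConfDeNovo=child"
serial = "chr1\t145037984\trs2489136\tT\tG\t1401.92\tPASS\tAC=3;DB"
print(diff_annotations(threaded, serial))
# {('chr1', 145037984, 'T', 'G'): ['hiConfDeNovo']}
```

Any site whose hiConfDeNovo (or other) INFO value changes between the two runs would point to nondeterminism in the threaded run rather than to the pedigree or the posteriors step.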
