Dfrac and dbsnp rod

I am currently testing a sample with -dfrac values from 0.05 to 1.00 (in 0.05 increments) and comparing the proportion of novel SNPs (not in dbSNP 146) across runs. I noticed a strange issue: some variants are assigned an rsID in some runs but not in others. Notice below that the genotype is homozygous in all but one run (dfrac 0.45), yet the rsID is assigned from the dbSNP 146 ROD only at certain dfrac values.

dfrac CHROM POS ID REF ALT QUAL FILTER INFO FORMAT GENOTYPE
0.1 21 47864560 rs767266921 CTTTT C 53.7 . AC=2;AF=1.00;AN=2;DB;DP=2;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=26.85;SOR=0.693 GT:AD:DP:GQ:PL 1/1:0,2:2:6:90,6,0
0.15 21 47864560 rs143709505 CTTT C 143 . AC=2;AF=1.00;AN=2;DB;DP=4;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=32.08;SOR=1.609 GT:AD:DP:GQ:PL 1/1:0,4:4:12:180,12,0
0.2 21 47864560 rs143709505 CTTT C 53.7 . AC=2;AF=1.00;AN=2;DB;DP=2;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=26.85;SOR=2.303 GT:AD:DP:GQ:PL 1/1:0,2:2:6:90,6,0
0.3 21 47864560 . CTTT C 618.73 . AC=2;AF=1.00;AN=2;DP=18;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;QD=34.37;SOR=3.126 GT:AD:DP:GQ:PL 1/1:0,16:16:47:638,47,0
0.35 21 47864560 rs143709505 CTTT C 435.73 . AC=2;AF=1.00;AN=2;DB;DP=12;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;QD=30.88;SOR=3.056 GT:AD:DP:GQ:PL 1/1:0,12:12:35:434,35,0
0.4 21 47864560 . CTTT C 567.73 . AC=2;AF=1.00;AN=2;DP=20;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;QD=28.39;SOR=2.670 GT:AD:DP:GQ:PL 1/1:1,17:18:42:552,42,0
0.45 21 47864560 . CTTT C 226.73 . AC=1;AF=0.500;AN=2;BaseQRankSum=-0.370;ClippingRankSum=-0.555;DP=18;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.925;QD=12.60;ReadPosRankSum=-0.555;SOR=0.818 GT:AD:DP:GQ:PL 0/1:3,13:16:59:264,0,59
0.5 21 47864560 . CTTT C 678.73 . AC=2;AF=1.00;AN=2;DP=24;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;QD=28.28;SOR=2.147 GT:AD:DP:GQ:PL 1/1:1,20:21:51:670,51,0
0.55 21 47864560 . CTTT C 575.19 . AC=2;AF=1.00;AN=2;BaseQRankSum=0.779;ClippingRankSum=1.378;DP=27;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQRankSum=-0.899;QD=21.30;ReadPosRankSum=-0.060;SOR=0.760 GT:AD:DP:GQ:PL 1/1:2,22:24:4:611,4,0
0.6 21 47864560 . CTTT C 510.73 . AC=2;AF=1.00;AN=2;BaseQRankSum=0.474;ClippingRankSum=0.158;DP=19;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=-0.158;QD=26.88;ReadPosRankSum=-1.107;SOR=1.179 GT:AD:DP:GQ:PL 1/1:2,16:18:12:473,12,0
0.65 21 47864560 rs143709505 CTTT C 751.73 . AC=2;AF=1.00;AN=2;DB;DP=22;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;QD=34.17;SOR=2.584 GT:AD:DP:GQ:PL 1/1:1,21:22:54:722,54,0
0.7 21 47864560 . CTTT C 895.73 . AC=2;AF=1.00;AN=2;DP=27;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;QD=33.18;SOR=3.383 GT:AD:DP:GQ:PL 1/1:0,26:26:77:882,77,0
0.75 21 47864560 . CTTT C 883.73 . AC=2;AF=1.00;AN=2;DP=28;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;QD=31.56;SOR=2.833 GT:AD:DP:GQ:PL 1/1:1,25:26:66:890,66,0
0.8 21 47864560 . CTTT C 1132.73 . AC=2;AF=1.00;AN=2;DP=36;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;QD=31.46;SOR=2.833 GT:AD:DP:GQ:PL 1/1:1,33:34:89:1136,89,0
0.85 21 47864560 . CTTT C 958.73 . AC=2;AF=1.00;AN=2;DP=30;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;QD=31.96;SOR=2.636 GT:AD:DP:GQ:PL 1/1:0,27:27:80:991,80,0
0.9 21 47864560 . CTTT C 1115.73 . AC=2;AF=1.00;AN=2;DP=36;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;QD=30.99;SOR=2.925 GT:AD:DP:GQ:PL 1/1:1,32:33:86:1117,86,0
0.95 21 47864560 . CTTT C 1157.73 . AC=2;AF=1.00;AN=2;DP=37;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;QD=31.29;SOR=3.528 GT:AD:DP:GQ:PL 1/1:1,33:34:89:1162,89,0
1 21 47864560 . CTTT C 1215.73 . AC=2;AF=1.00;AN=2;DP=39;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;QD=31.17;SOR=3.098 GT:AD:DP:GQ:PL 1/1:1,35:36:95:1226,95,0

Here is what shows up in dbSNP around that position:

CHROM POS ID REF ALT QUAL FILTER INFO

21 47864558 rs779564626 CTCTTTT C . . RS=779564626;RSPOS=47864559;dbSNPBuildID=144;SSR=0;SAO=0;VP=0x050000080005000002000200;GENEINFO=PCNT:5116;WGT=1;VC=DIV;INT;ASP
21 47864558 rs754825923 CTCTT C . . RS=754825923;RSPOS=47864559;dbSNPBuildID=144;SSR=0;SAO=0;VP=0x050000080005000002000200;GENEINFO=PCNT:5116;WGT=1;VC=DIV;INT;ASP
21 47864558 rs755540815 CTCT C . . RS=755540815;RSPOS=47864559;dbSNPBuildID=144;SSR=0;SAO=0;VP=0x050000080005000002000200;GENEINFO=PCNT:5116;WGT=1;VC=DIV;INT;ASP
21 47864558 rs765968070 CTCTTT C . . RS=765968070;RSPOS=47864559;dbSNPBuildID=144;SSR=0;SAO=0;VP=0x050000080005000002000200;GENEINFO=PCNT:5116;WGT=1;VC=DIV;INT;ASP
21 47864558 rs764395237 CTCT C . . RS=764395237;RSPOS=47864560;dbSNPBuildID=144;SSR=0;SAO=0;VP=0x050000080005000002000200;GENEINFO=PCNT:5116;WGT=1;VC=DIV;INT;ASP
21 47864560 rs747167277 C CTT,CTTT . . RS=747167277;RSPOS=47864560;dbSNPBuildID=144;SSR=0;SAO=0;VP=0x050000080005000002000200;GENEINFO=PCNT:5116;WGT=1;VC=DIV;INT;ASP
21 47864560 rs143709505 CTTT C . . RS=143709505;RSPOS=47864561;dbSNPBuildID=134;SSR=0;SAO=0;VP=0x050000080005170026000200;GENEINFO=PCNT:5116;WGT=1;VC=DIV;INT;ASP;VLD;G5A;G5;K
21 47864560 rs767266921 CTTTT C . . RS=767266921;RSPOS=47864561;dbSNPBuildID=144;SSR=0;SAO=0;VP=0x050000080005000002000200;GENEINFO=PCNT:5116;WGT=1;VC=DIV;INT;ASP
21 47864560 rs771189603 CT C . . RS=771189603;RSPOS=47864561;dbSNPBuildID=144;SSR=0;SAO=0;VP=0x050000080005000002000200;GENEINFO=PCNT:5116;WGT=1;VC=DIV;INT;ASP
21 47864560 rs778013779 CTT C . . RS=778013779;RSPOS=47864561;dbSNPBuildID=144;SSR=0;SAO=0;VP=0x050000080005000002000200;GENEINFO=PCNT:5116;WGT=1;VC=DIV;INT;ASP
21 47864560 rs757334979 CTTT C . . RS=757334979;RSPOS=47864562;dbSNPBuildID=144;SSR=0;SAO=0;VP=0x050000080005000002000200;GENEINFO=PCNT:5116;WGT=1;VC=DIV;INT;ASP;CAF=0.2386,0
21 47864560 rs750282079 CTT C . . RS=750282079;RSPOS=47864563;dbSNPBuildID=144;SSR=0;SAO=0;VP=0x050000080005000002000200;GENEINFO=PCNT:5116;WGT=1;VC=DIV;INT;ASP
21 47864560 rs75071492 CTTTTT C . . RS=75071492;RSPOS=47864573;dbSNPBuildID=131;SSR=0;SAO=0;VP=0x050000080005000002000200;GENEINFO=PCNT:5116;WGT=1;VC=DIV;INT;ASP
21 47864560 rs3058084 CTTT C . . RS=3058084;RSPOS=47864575;dbSNPBuildID=102;SSR=0;SAO=0;VP=0x050100080005000002000200;GENEINFO=PCNT:5116;WGT=1;VC=DIV;SLO;INT;ASP;CAF=0.2386
21 47864560 rs397867558 CT C . . RS=397867558;RSPOS=47864575;dbSNPBuildID=138;SSR=0;SAO=0;VP=0x050000080005000002000200;GENEINFO=PCNT:5116;WGT=1;VC=DIV;INT;ASP
21 47864560 rs147229575 CTT C . . RS=147229575;RSPOS=47864576;dbSNPBuildID=134;SSR=0;SAO=0;VP=0x050000080005000002000200;GENEINFO=PCNT:5116;WGT=1;VC=DIV;INT;ASP
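To see which dbSNP records could supply an ID for a given call, one option is to list every record whose CHROM, POS, REF and ALT match the call exactly. The following is a minimal sketch, not GATK's own matching logic: the function name `dbsnp_matches` is hypothetical, and it assumes an uncompressed, tab-delimited dbSNP VCF.

```shell
# Hypothetical helper: print the ID of every dbSNP record whose CHROM, POS,
# REF and ALT match a call exactly.
# Usage: dbsnp_matches <dbsnp.vcf> <chrom> <pos> <ref> <alt>
# Assumes an uncompressed, tab-delimited VCF. Multi-allelic ALT fields
# (e.g. "CTT,CTTT") are split on commas before comparing.
dbsnp_matches() {
    awk -F'\t' -v c="$2" -v p="$3" -v r="$4" -v a="$5" '
        /^#/ { next }                          # skip header lines
        $1 == c && $2 == p && $4 == r {
            n = split($5, alts, ",")
            for (i = 1; i <= n; i++)
                if (alts[i] == a) { print $3; break }
        }
    ' "$1"
}
```

For example, `dbsnp_matches dbsnp146.vcf 21 47864560 CTTT C` would be expected to list rs143709505, rs757334979 and rs3058084, since all three appear in the excerpt above with the same position and alleles.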

Is this a bug in GATK HaplotypeCaller, the -dfrac option, or dbSNP?

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @jlrflores
    Hi,

    I cannot reproduce the issue you described. I get the dbsnp ID for each dfrac value I specify. Sorry to ask, but are you sure you are specifying --dbsnp each time you run the command? Can you also post the exact command you ran?

    Thanks,
    Sheila

  • jlrflores (Member)

    Hi Sheila,
    Yes, I am sure that I am specifying --dbsnp for each command. I am running this through a bash script:

    for b in bam1 bam2 bam3
    do
    for d in 1.00 0.95 0.90 0.85 0.80 0.75 0.70 0.65 0.60 0.55 0.50 0.45 0.40 0.35 0.30 0.25 0.20 0.15 0.10 0.05
    do
    java -Xmx40G -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R hs37d5.fa --dbsnp dbsnp146.vcf.gz -I "$b" -o "${b}.hc.coding.dfrac-${d}.vcf" -dfrac "$d" -L agilent.bed
    done
    done
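The per-dfrac outputs of a loop like this can be checked for a single site with a small helper that prints the ID, alleles and genotype from each file. This is a sketch only: the function name `ids_by_dfrac` is hypothetical, and it assumes uncompressed, single-sample VCFs named `<prefix>.dfrac-<value>.vcf`, as in the script above.

```shell
# Hypothetical helper: print "<dfrac> <ID> <REF> <ALT> <GT>" for one site
# across a set of per-dfrac VCFs.
# Usage: ids_by_dfrac <chrom> <pos> <vcf>...
# Assumes file names ending in "dfrac-<value>.vcf" and one sample per VCF.
ids_by_dfrac() {
    chrom=$1; pos=$2; shift 2
    for vcf in "$@"; do
        d=${vcf##*dfrac-}      # strip everything up to and including "dfrac-"
        d=${d%.vcf}            # strip the ".vcf" suffix, leaving the dfrac value
        awk -F'\t' -v c="$chrom" -v p="$pos" -v d="$d" '
            !/^#/ && $1 == c && $2 == p {
                split($10, s, ":")             # first FORMAT subfield is GT
                print d, $3, $4, $5, s[1]
            }' "$vcf"
    done
}
```

A call such as `ids_by_dfrac 21 47864560 bam1.hc.coding.dfrac-*.vcf` would print one line per run, which makes it easy to see at which dfrac values the ID column is ".".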

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @jlrflores
    Hi,

    Hmm. I wonder if the issue occurs when you don't use a .gz file for the dbsnp VCF. Can you try unzipping the file? We know of some issues with using .gz files.

    -Sheila

  • jlrflores (Member)

    I unzipped the dbsnp146.vcf.gz file and reran the same command (with --dbsnp dbsnp146.vcf). I also generated a gVCF with -ERC BP_RESOLUTION.

    This does not seem to solve the problem: I still see sites with the same REF/ALT alleles where the rsID is not always assigned. Here are two example sites where the REF/ALT alleles are the same, yet in one VCF the rsID is added and in the other it is not:

    20 62309554 rs41309367 C T,<NON_REF>
    20 62309554 . C T,<NON_REF>

    20 62309564 rs759077039 C G,<NON_REF>
    20 62309564 . C G,<NON_REF>
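Sites like these can be found systematically by comparing two VCFs and reporting records that share CHROM, POS, REF and ALT but differ in the ID column. The helper below is a sketch under stated assumptions: the name `id_discordant` is hypothetical, both inputs are assumed to be uncompressed, tab-delimited VCFs, and the ALT field is compared as a whole string.

```shell
# Hypothetical helper: list sites present in both VCFs with identical REF/ALT
# but different ID fields.
# Usage: id_discordant <a.vcf> <b.vcf>
# Output columns: CHROM POS REF ALT ID_in_a ID_in_b
id_discordant() {
    awk -F'\t' '
        /^#/ { next }                              # skip header lines
        { key = $1 ":" $2 ":" $4 ":" $5 }
        NR == FNR { id[key] = $3; next }           # first file: remember ID per site
        (key in id) && id[key] != $3 {             # second file: report mismatches
            print $1, $2, $4, $5, id[key], $3
        }
    ' "$1" "$2"
}
```

Running it on two outputs from different dfrac values gives a list of candidate sites that could be posted as a reproducible example.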

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie, admin)

    @jlrflores If you run the commands individually (not through a shell script), does the problem still occur? Can you produce an example case where this reproduces reliably?
