How exactly does downsample_to_coverage work with UnifiedGenotyper?

igor, New York, Member ✭✭
edited October 2012 in Ask the GATK team

I haven't been using GATK for long, but I assumed that the downsample_to_coverage feature would never be a cause for concern. I just ran UnifiedGenotyper on the same single-sample BAM file with -dcov set to 500, 5,000, and 50,000. I expected the results to be similar. However, 500 yielded 26 variants, 5,000 yielded 13, and 50,000 yielded just 1. The depth of that one variant in the 50,000 run was about 1,300. Why are the results so different?
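A simplified way to picture what -dcov does at a single locus: if more reads overlap the position than the target, keep a uniform random subset of that size; otherwise keep every read. The sketch below assumes exactly that model, which is an illustration and not GATK's actual downsampler (whose details may differ). The locus is hypothetical: 13,000 reads (close to the average depth reported later in this thread) with 15% carrying an alternate allele G. Under this model the alternate-allele fraction should change only by sampling noise as the cap changes, which is why the intuition that the three runs "should be similar" is reasonable.

```python
import random

def downsample_locus(bases, cap, rng):
    """Keep at most `cap` reads at one locus, chosen uniformly at random.

    Simplified model (assumption): loci at or below the cap are left untouched.
    """
    if len(bases) <= cap:
        return list(bases)
    return rng.sample(bases, cap)

def alt_fraction(bases, alt="G"):
    """Fraction of observed bases that carry the alternate allele."""
    return sum(1 for b in bases if b == alt) / len(bases)

# Hypothetical locus: 13,000 reads, 15% of them carrying the alternate allele G.
rng = random.Random(42)
locus = ["G"] * 1950 + ["C"] * 11050

for cap in (500, 5000, 50000):
    kept = downsample_locus(locus, cap, rng)
    print(f"dcov={cap:>6}: depth={len(kept):>5}, alt fraction={alt_fraction(kept):.3f}")
```

The 50,000 cap exceeds the locus depth, so nothing is removed; the two lower caps keep 500 and 5,000 reads respectively.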

Most of the other variants in the largest set (the 500 run) have their depth capped at 500, so some reads were removed by downsampling. A few of them are at relatively low allele frequency, but most are at 25% or higher. If they were appearing by chance, they should not be at such high frequencies.
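The "by chance" argument can be made quantitative. Treating the 500 downsampled reads as independent draws from a much deeper pool (a binomial approximation, which is an assumption), the sketch below computes the probability that a locus with a given true alternate fraction shows 25% or more alternate reads purely through sampling. The true fractions used are illustrative values, not numbers measured from this data set.

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

depth = 500              # reads kept at -dcov 500
min_alt = depth // 4     # 25% of those reads carrying the alternate allele

for true_fraction in (0.01, 0.05, 0.15):  # illustrative, not measured
    prob = binom_tail(depth, min_alt, true_fraction)
    print(f"true AF {true_fraction:.2f}: P(>=25% alt in {depth} reads) = {prob:.3g}")
```

Even for a true fraction of 15% the tail probability is very small, so random downsampling alone is an unlikely explanation for many high-fraction calls.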

In addition, some variants have a depth below 500, so they should not be affected by the cutoff at all. Why do those show up with the low cutoff but not with the higher ones?

I am using GATK 2.1-8. I am looking at a single gene only, which is why there are so few variants and such high coverage.

Best Answers


  • igor, New York, Member ✭✭
    edited October 2012

    This is targeted sequencing, so we are only amplifying a few kb of the human genome. Alignment is to hg19. Average coverage over the region of interest is ~13,000x, with all bases above 1,000x according to GATK DepthOfCoverage. The samples were sequenced on an Illumina MiSeq.

    This is the command (in case there is something else I may be overlooking):

    GATK -T UnifiedGenotyper -L path/intervals.bed -R path/hg19.fasta \
        -dcov NNN -nt 6 -glm BOTH -stand_call_conf 30 -stand_emit_conf 10 \
        -I sample.bam -o sample.vcf
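    To repeat the comparison without editing NNN by hand, the three runs can be generated from one template. A sketch in Python, assuming `GATK` is the same wrapper used in the command above; the per-run output names such as `sample.dcov500.vcf` are hypothetical, chosen so the runs don't overwrite each other.

```python
import shlex

def unified_genotyper_cmd(dcov, out_vcf):
    """Build the UnifiedGenotyper command from the post for one -dcov value."""
    return [
        "GATK", "-T", "UnifiedGenotyper",
        "-L", "path/intervals.bed",
        "-R", "path/hg19.fasta",
        "-dcov", str(dcov),
        "-nt", "6",
        "-glm", "BOTH",
        "-stand_call_conf", "30",
        "-stand_emit_conf", "10",
        "-I", "sample.bam",
        "-o", out_vcf,
    ]

for dcov in (500, 5000, 50000):
    cmd = unified_genotyper_cmd(dcov, f"sample.dcov{dcov}.vcf")
    print(shlex.join(cmd))
    # To actually run it (requires `import subprocess`):
    # subprocess.run(cmd, check=True)
```

    Keeping everything except -dcov fixed makes it clear that any difference between the outputs comes from the downsampling target alone.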

    These are the variants called with dcov at 5,000 (I am skipping 500 to save space):

    chrX 76777866 . C G 31908.01 . AC=2;AF=1.00;AN=2;BaseQRankSum=2.345;DP=995;DS;Dels=0.00;FS=11.871;HaplotypeScore=172.0982;MLEAC=2;MLEAF=1.00;MQ=34.17;MQ0=0;MQRankSum=-0.511;QD=32.07;ReadPosRankSum=-4.069;SB=-7.181e+03 GT:AD:DP:GQ:PL 1/1:23,966:995:99:31908,2093,0
    chrX 76854939 . C G 7477.01 . AC=1;AF=0.500;AN=2;BaseQRankSum=-30.703;DP=5000;DS;Dels=0.00;FS=373.021;HaplotypeScore=348.1264;MLEAC=1;MLEAF=0.500;MQ=53.03;MQ0=0;MQRankSum=9.918;QD=1.50;ReadPosRankSum=-39.747;SB=-6.519e-03 GT:AD:DP:GQ:PL 0/1:4305,686:5000:99:7477,0,32767
    chrX 76854940 . C T 8116.01 . AC=1;AF=0.500;AN=2;BaseQRankSum=-25.305;DP=5000;DS;Dels=0.00;FS=352.184;HaplotypeScore=366.1656;MLEAC=1;MLEAF=0.500;MQ=53.03;MQ0=0;MQRankSum=6.666;QD=1.62;ReadPosRankSum=-38.857;SB=-6.519e-03 GT:AD:DP:GQ:PL 0/1:4285,710:4999:99:8116,0,32767
    chrX 76888793 . A G 5413.01 . AC=1;AF=0.500;AN=2;BaseQRankSum=-21.112;DP=5000;DS;Dels=0.00;FS=86.451;HaplotypeScore=220.0143;MLEAC=1;MLEAF=0.500;MQ=51.74;MQ0=0;MQRankSum=-4.330;QD=1.08;ReadPosRankSum=-36.291;SB=-9.880e+02 GT:AD:DP:GQ:PL 0/1:4422,577:5000:99:5413,0,32767
    chrX 76937453 . T G 14479.01 . AC=1;AF=0.500;AN=2;BaseQRankSum=-37.717;DP=5000;DS;Dels=0.00;FS=171.285;HaplotypeScore=149.7972;MLEAC=1;MLEAF=0.500;MQ=51.34;MQ0=0;MQRankSum=7.997;QD=2.90;ReadPosRankSum=-46.588;SB=-1.507e+03 GT:AD:DP:GQ:PL 0/1:4100,890:5000:99:14479,0,32767
    chrX 76938431 . T G 1908.01 . AC=1;AF=0.500;AN=2;BaseQRankSum=2.617;DP=5000;DS;Dels=0.00;FS=866.524;HaplotypeScore=329.1422;MLEAC=1;MLEAF=0.500;MQ=55.85;MQ0=0;MQRankSum=-3.964;QD=0.38;ReadPosRankSum=-34.104;SB=-6.519e-03 GT:AD:DP:GQ:PL 0/1:4511,448:5000:99:1938,0,32767
    chrX 76938432 . C T 1438.01 . AC=1;AF=0.500;AN=2;BaseQRankSum=-26.333;DP=5000;DS;Dels=0.00;FS=898.969;HaplotypeScore=359.6230;MLEAC=1;MLEAF=0.500;MQ=55.80;MQ0=0;MQRankSum=-2.827;QD=0.29;ReadPosRankSum=-34.183;SB=-6.519e-03 GT:AD:DP:GQ:PL 0/1:4504,481:5000:99:1468,0,32767
    chrX 76939472 . C G 5888.01 . AC=1;AF=0.500;AN=2;BaseQRankSum=-31.391;DP=4999;DS;Dels=0.00;FS=351.329;HaplotypeScore=208.9992;MLEAC=1;MLEAF=0.500;MQ=55.65;MQ0=0;MQRankSum=1.817;QD=1.18;ReadPosRankSum=-39.162;SB=-6.519e-03 GT:AD:DP:GQ:PL 0/1:4365,627:4998:99:5888,0,32767
    chrX 76939929 . C G 13166.01 . AC=1;AF=0.500;AN=2;BaseQRankSum=-24.986;DP=5000;DS;Dels=0.00;FS=225.568;HaplotypeScore=206.9380;MLEAC=1;MLEAF=0.500;MQ=46.86;MQ0=0;MQRankSum=0.395;QD=2.63;ReadPosRankSum=-42.110;SB=-3.987e+03 GT:AD:DP:GQ:PL 0/1:4183,805:4998:99:13166,0,32767
    chrX 76939930 . A T 15306.01 . AC=1;AF=0.500;AN=2;BaseQRankSum=-25.308;DP=5000;DS;Dels=0.00;FS=138.513;HaplotypeScore=233.2270;MLEAC=1;MLEAF=0.500;MQ=46.70;MQ0=0;MQRankSum=0.037;QD=3.06;ReadPosRankSum=-42.358;SB=-6.286e+03 GT:AD:DP:GQ:PL 0/1:4127,865:4999:99:15306,0,32767
    chrX 76940056 . C G 678.01 . AC=1;AF=0.500;AN=2;BaseQRankSum=-23.852;DP=5000;DS;Dels=0.00;FS=3.837;HaplotypeScore=157.7561;MLEAC=1;MLEAF=0.500;MQ=53.51;MQ0=0;MQRankSum=1.895;QD=0.14;ReadPosRankSum=-33.461;SB=-6.519e-03 GT:AD:DP:GQ:PL 0/1:4541,440:5000:99:708,0,32767
    chrX 76940057 . A T 1367.01 . AC=1;AF=0.500;AN=2;BaseQRankSum=-26.723;DP=5000;DS;Dels=0.00;FS=21.495;HaplotypeScore=205.1271;MLEAC=1;MLEAF=0.500;MQ=53.67;MQ0=0;MQRankSum=2.130;QD=0.27;ReadPosRankSum=-33.511;SB=-6.519e-03 GT:AD:DP:GQ:PL 0/1:4510,464:4999:99:1397,0,32767
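    The per-sample AD field (reference count, alternate count) in these records gives the observed alternate-allele fraction directly, which is a quick way to check the "25% or higher" reasoning against what the caller actually saw. A small Python sketch that parses the FORMAT and sample columns; the sample strings are copied from four of the records above.

```python
def parse_sample(fmt, sample):
    """Map FORMAT keys (e.g. GT:AD:DP) to the values in the sample column."""
    return dict(zip(fmt.split(":"), sample.split(":")))

def ad_alt_fraction(fields):
    """Alternate-allele fraction from a biallelic AD field 'ref_count,alt_count'."""
    ref, alt = (int(x) for x in fields["AD"].split(","))
    return alt / (ref + alt)

fmt = "GT:AD:DP:GQ:PL"
calls = {  # sample columns copied from the dcov=5,000 records above
    "chrX:76777866": "1/1:23,966:995:99:31908,2093,0",
    "chrX:76854939": "0/1:4305,686:5000:99:7477,0,32767",
    "chrX:76938431": "0/1:4511,448:5000:99:1938,0,32767",
    "chrX:76940056": "0/1:4541,440:5000:99:708,0,32767",
}

for site, sample in calls.items():
    f = parse_sample(fmt, sample)
    print(f"{site}: GT={f['GT']} DP={f['DP']} alt fraction={ad_alt_fraction(f):.3f}")
```

    Applied to all the heterozygous calls listed above, this gives roughly 9% to 18% alternate reads, with DP at or just below the 5,000 cap at every heterozygous site.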

    These are the variants called with dcov at 50,000:

    chrX 76777866 . C G 32767.01 . AC=2;AF=1.00;AN=2;BaseQRankSum=2.164;DP=1340;Dels=0.00;FS=15.774;HaplotypeScore=383.2877;MLEAC=2;MLEAF=1.00;MQ=35.09;MQ0=0;MQRankSum=1.261;QD=24.45;ReadPosRankSum=-4.060;SB=-1.030e+04 GT:AD:DP:GQ:PL 1/1:25,1302:1340:99:32767,2990,0
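    To see exactly which calls disappear between runs, the two outputs can be compared by position. A small Python sketch, assuming VCF text in which the first two whitespace-separated columns are CHROM and POS; the demo lines contain only those leading columns from the records posted above, and the file name in the final comment is hypothetical.

```python
def call_sites(lines):
    """Return the set of (CHROM, POS) for every data line in VCF text."""
    sites = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        chrom, pos = line.split()[:2]
        sites.add((chrom, int(pos)))
    return sites

# Leading columns of the records posted above (enough for call_sites).
positions_5000 = [76777866, 76854939, 76854940, 76888793, 76937453, 76938431,
                  76938432, 76939472, 76939929, 76939930, 76940056, 76940057]
run_5000 = [f"chrX\t{pos}\t." for pos in positions_5000]
run_50000 = ["chrX\t76777866\t."]

a, b = call_sites(run_5000), call_sites(run_50000)
print("shared:", sorted(a & b))
print("only at dcov=5,000:", len(a - b), "sites")
print("only at dcov=50,000:", len(b - a), "sites")

# With real output files (hypothetical name):
# with open("sample.dcov5000.vcf") as fh:
#     a = call_sites(fh)
```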

    Let me know if there is anything else you need.


Sign In or Register to comment.