Why I obtained a g.vcf with wrong variant DPs, and too few variants according to coverage?
I extracted Exome regions from public bam files to apply the same pipeline that I did for my samples and merge them. It is weird that looking in the g.vcf files, I have very few "variants", and those have very few reads (these are high coverage samples). Here there are two examples: As you can see this regions has a coverage of 22-40, How can I have a DP=1??
1 146639579 . C . . END=146639580 GT:DP:GQ:MIN_DP:PL 0/0:38:99:37:0,99,1485
1 146639581 . C . . END=146639581 GT:DP:GQ:MIN_DP:PL 0/0:38:96:38:0,96,1440
1 146639582 . C . . END=146639587 GT:DP:GQ:MIN_DP:PL 0/0:40:99:38:0,99,1485
1 146639588 rs55797044 A G, 0.01 . DB;DP=1;MLEAC=0,0;MLEAF=0.00,0.00;MQ=254.00 GT:AD:DP:GQ:PL:SB 0/0:1,0,0:1:3:0,3,10,3,10,10:0,1,0,0
1 146639589 . T . . END=146639612 GT:DP:GQ:MIN_DP:PL 0/0:40:99:37:0,105,1575
1 146643468 . G . . END=146643471 GT:DP:GQ:MIN_DP:PL 0/0:23:54:22:0,54,810
1 146643472 . G . . END=146643472 GT:DP:GQ:MIN_DP:PL 0/0:22:57:22:0,57,855
1 1263129 . C . . END=1263129 GT:DP:GQ:MIN_DP:PL 0/0:26:59:26:0,59,461
1 1263130 . T . . END=1263143 GT:DP:GQ:MIN_DP:PL 0/0:24:60:22:0,60,900
1 1263144 rs307350 G A, 0.14 . DB;DP=1;MLEAC=1,0;MLEAF=0.500,0.00;MQ=254.00 GT:AD:DP:GQ:PL:SB 1/1:0,1,0:1:3:10,3,0,10,3,10:0,0,1,0
1 1263145 . G . . END=1263163 GT:DP:GQ:MIN_DP:PL 0/0:25:63:23:0,63,945
1 1263164 . A . . END=1263164 GT:DP:GQ:MIN_DP:PL 0/0:22:49:22:0,49,378
When I compare the lines of this g.vcf file with what I obtained with extracting the same regions in individuals from other public bam files (the 1000GP) of (10x), I got:
1000GP bam (10X): Total reads: 13924960, wc raw.g.vcf = 20819861, grep 0/1 raw.g.vcf = 57,362
Public bam (39X): Total reads: 43168049, wc raw.g.vcf = 8136676, grep 0/1 raw.g.vcf = 9
So, why having high coverage bam files, I end up with so few variants?
Could it be the result of some error I had in the IndelRealignment step (GenomeAnalysisTK-3.4-46/GenomeAnalysisTK.jar -T IndelRealigner) ?
ERROR MESSAGE: Exception when processing alignment for BAM index HS2000-1017_290:2:2106:4486:29552 2/2 100b unmapped read.
The only way to solve that error (according to ValidateSamFile there was no errors) was to remove those unmapped reads:
samtools view -b -F 4 file.bam > mapped.bam
BaseRecalibrator Round 2
Thank you very much,