HaplotypeCaller and detection of large indels

Posts: 61Member

Hi,

I am wondering about the detection of large indels with the HaplotypeCaller. I have an example where, to my mind, there is quite clearly a large deletion (a couple of kb) in the sample, but it is not called by the HaplotypeCaller.

How do I modify the parameters of HC to detect large indels? Would activeRegionMaxSize or indelSizeToEliminateInRefModel help?

Tim.

Hi Tim,

The problem here is that your deletion is too big for HaplotypeCaller. As a rule of thumb, it can call indels up to half the length of the reads. I think at the size you're looking at, this would be considered a structural variant. If your data is whole-genome you may be able to call it with GenomeSTRiP.

Geraldine Van der Auwera, PhD

• Posts: 61Member
edited March 18

Hi Geraldine,

OK about the deletion being too big, but why does the HaplotypeCaller have this kind of dependence on read length? I thought this was a thing of the past.

With the UG, I understand the dependency on the read length because we are aligning the reads to the reference, so you need enough read beyond the deletion to favour opening a gap rather than allowing mismatches. And there was always an asymmetry between deletions (which you could detect up to about half of the read length in size) and insertions (where it was more like 1/4 of the read length, because with an insertion part of the read is always non-reference sequence).

But with HC, I don't quite see how read length comes into the picture. Longer reads make it easier to assemble the haplotype unambiguously, but even with short reads it should be possible to assemble the haplotypes, and it is from the haplotypes that one calls the variants. So why/how does read length affect the size of an indel that can be detected?
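To put rough numbers on the rules of thumb quoted in this thread (deletions up to about half the read length, insertions up to about a quarter for alignment-based calling), here is a minimal sketch. The function name `indel_limits` is made up for illustration, and the fractions are only the rules of thumb from the posts above, not limits enforced by any caller.

```shell
# Rule-of-thumb indel size limits for alignment-based calling.
# Assumption: ~1/2 of the read length for deletions and ~1/4 for insertions,
# as quoted in the thread; these are approximations, not hard cut-offs.
indel_limits() {
    # $1 = read length in bp
    awk -v len="$1" 'BEGIN {
        printf "read_length=%d max_deletion~%d max_insertion~%d\n", len, int(len / 2), int(len / 4)
    }'
}

for len in 100 150 300; do
    indel_limits "$len"
done
```

For 100 bp reads this gives roughly 50 bp for deletions and 25 bp for insertions.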

Any help is much appreciated

Tim

As far as I know (but @ebanks may jump in to correct me) the problem is that the HC needs to see reads that span the entire indel to distinguish between real indels vs. lack of coverage...

Geraldine Van der Auwera, PhD

• Posts: 61Member
edited March 18

Hmmm, so HC cannot exploit a situation like the one that I posted above, where the mapper has not opened a gap in individual reads spanning the deletion because of the deletion size, but has "correctly" soft-clipped these reads and has correctly mapped, on either side of the deletion, any read pairs that span it? I had been thinking that the soft clipping and the abnormal insert size would trigger the HC to attempt to assemble haplotypes over the whole region and then compare the long haplotypes to the reference.

Sounds like this makes the HC more sensitive than I thought to the quality of the alignments it is fed and to the read lengths. I was under the impression that as long as reads were mapped correctly, the haplotype assembly would isolate variant calling from alignment issues (in particular alignment around indels)?
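The two signals mentioned here, soft clipping and abnormal insert size, can be counted directly from the alignments. The sketch below reads SAM records on stdin and uses the standard SAM columns (CIGAR in column 6, template length in column 9). The function name and both thresholds are assumptions chosen for illustration; it says nothing about what HC itself does with these reads.

```shell
# Count reads that are soft-clipped or have an unusually large template length.
# Reads SAM records on stdin; header lines (starting with "@") are skipped.
# Assumed thresholds: >= 20 soft-clipped bases, |TLEN| > 1000 bp.
flag_sv_evidence() {
    awk -F '\t' -v min_clip=20 -v max_tlen=1000 '
        /^@/ { next }
        {
            cigar = $6; clipped = 0
            # Sum the lengths of all soft-clip (S) operations in the CIGAR string.
            while (match(cigar, /[0-9]+S/)) {
                clipped += substr(cigar, RSTART, RLENGTH - 1)
                cigar = substr(cigar, RSTART + RLENGTH)
            }
            tlen = ($9 < 0) ? -$9 : $9
            if (clipped >= min_clip) soft++
            if (tlen > max_tlen) wide++
        }
        END { printf "soft_clipped=%d large_insert=%d\n", soft, wide }
    '
}
```

With samtools installed, something like `samtools view sample.bam 5:1000000-1010000 | flag_sv_evidence` would summarise one region; the file name and interval are placeholders.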

It's not really an alignment quality issue, it's just that it's missing a crucial piece of information (the presence of spanning reads) to distinguish between a legitimate deletion and lack of coverage in the intervening region. If I had to speculate I would say you could solve this by adding long reads from e.g. PacBio data...

Geraldine Van der Auwera, PhD

• Posts: 61Member

I would have to disagree there: the information is there, in that there are lots of reads that contain the deletion, but the aligner will not open the gap (because of the size of the indel, the mapper prefers mismatches, which it soft clips).

No chance of improving sensitivity to long indels with any of the HC parameters, like activeRegionMaxSize? I am desperate here. The data above is a small target with 300 bp PE reads from long fragments (>600 bp), and most CNV software requires WG data and usually also multiple samples.

I just want to understand more of the inner workings of the HC and, hopefully in the process, increase sensitivity to larger indels.

Tim.

Increasing the active region size will work up to a point but this is just too big for what HC can currently do with the reads you have, sorry. I wish I could give you the answer you want but I don't have magical powers (sadly).

Geraldine Van der Auwera, PhD

• Posts: 61Member

I have seen you display magical powers many times on this forum

I might have a go with some simulations to see where the indel size limit lies for the HC and how it correlates with read length.

• Posts: 61Member
edited March 19

Here is some data which is rather crude, but might be useful to others as well.

I generated a truth VCF containing both insertions and deletions:
* Deletions (1800 in total): sizes from 1 to 60 bp (20 HET and 10 HOM at each size)
* Insertions (1800 in total): sizes from 1 to 60 bp (20 HET and 10 HOM at each size)

I then ran the following steps:
* Generate simulated reads from the truth VCF with dwgsim (100 bp PE, average coverage 20X)
* Map reads with bwa mem
* No refinement
* Call variants with HC version v2.7-4-g6f46d11
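As a quick sanity check on the truth-set numbers above, the arithmetic can be written out as follows. This is only a sketch of the counting; the function name is hypothetical and it does not generate a VCF.

```shell
# Tabulate the simulated truth set described above: for each indel type and
# each size from 1 to 60 bp, 20 heterozygous and 10 homozygous events.
plan_truth_set() {
    awk 'BEGIN {
        het = 20; hom = 10; sizes = 60
        for (size = 1; size <= sizes; size++) {
            total += het + hom
        }
        printf "sizes=%d per_size=%d total_per_type=%d\n", sizes, het + hom, total
    }'
}

plan_truth_set
```

This is where the expected count of 30 events per size and the total of 1800 per indel type come from.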

For the deletions (the first column is the count, which should be 30; the second column is the length of the REF allele, i.e. the deletion size plus one anchor base), it seems that HC detects deletions at least up to 60 bp with 100 bp reads.

grep -v "#" simul_agilentV1_chr5_140211_simul_simul_none.aln.valid.hc.wholeGene.variantSites.qualAnnot.vcf | awk 'BEGIN{OFS="\t"; FS="\t";};(length($4) != 1){print length($4)};END{}' | sort -n | uniq -c
28 2
30 3
30 4
30 5
30 6
30 7
30 8
29 9
30 10
29 11
30 12
30 13
29 14
30 15
30 16
30 17
28 18
30 19
30 20
30 21
30 22
30 23
30 24
30 25
30 26
29 27
29 28
30 29
30 30
30 31
29 32
30 33
30 34
30 35
30 36
29 37
30 38
30 39
30 40
30 41
29 42
29 43
29 44
30 45
30 46
30 47
30 48
30 49
30 50
29 51
30 52
30 53
30 54
19 55
2 56
10 57
30 58
30 59
30 60
30 61
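To pick out the rows that deviate from the expected 30 events, a table like the one above can be piped through a small filter. This is a sketch under the assumptions stated in the thread (30 events per size, sizes 1 to 60 bp, so allele lengths up to 61); the function name and its two optional arguments are made up for illustration.

```shell
# Read `uniq -c` style lines ("count allele_length") on stdin and report
# allele lengths whose count differs from the expected number of events.
# $1 = expected count per size (default 30, as in the truth set above)
# $2 = largest allele length present in the truth set (default 61)
flag_unexpected_counts() {
    awk -v expected="${1:-30}" -v max_len="${2:-61}" '
        NF == 2 {
            if ($2 > max_len)       printf "%d\t%d\tnot in truth set\n", $2, $1
            else if ($1 < expected) printf "%d\t%d\tmissing %d\n", $2, $1, expected - $1
            else if ($1 > expected) printf "%d\t%d\textra %d\n", $2, $1, $1 - expected
        }
    '
}
```

Appending `| flag_unexpected_counts` to the pipeline above would list only the allele lengths where fewer or more than 30 sites were called.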


For the insertions (the first column is the count, which should be 30; the second column is the length of the ALT field, i.e. the insertion size plus one anchor base), it seems that 100 bp reads do fine up to insertions of about 25 bp. Beyond that we stop getting all 30 events of each size, and we also get a number of false positives with sizes over 60 bp.

grep -v "#" simul_agilentV1_chr5_140211_simul_simul_none.aln.valid.hc.wholeGene.variantSites.qualAnnot.vcf | awk 'BEGIN{OFS="\t"; FS="\t";};(length($5) != 1){print length($5)};END{}' | sort -n | uniq -c
34 2
32 3
30 4
32 5
31 6
30 7
31 8
33 9
30 10
32 11
30 12
30 13
30 14
30 15
30 16
30 17
29 18
31 19
32 20
28 21
29 22
28 23
28 24
30 25
33 26
27 27
23 28
26 29
27 30
24 31
28 32
32 33
23 34
23 35
28 36
28 37
21 38
26 39
17 40
21 41
25 42
26 43
22 44
24 45
22 46
16 47
24 48
19 49
16 50
18 51
22 52
18 53
30 54
14 55
23 56
23 57
17 58
19 59
18 60
26 61
3 62
1 63
4 64
1 65
2 66
2 69
2 70
2 71
4 72
1 74
1 77
3 78
5 80
4 82
2 83
1 86
4 88
4 90
1 94
1 95
7 96
1 98
2 102
1 103
2 104
1 105
4 106
1 110
5 112
2 114
2 118
1 120
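One caveat when measuring `length($5)`: at a site with two ALT alleles, the ALT column holds both alleles joined by a comma, so its length is not the size of a single allele. The sketch below sizes each ALT allele separately as the length difference to REF, which also leaves out the shared anchor base. It assumes plain sequence alleles (no symbolic alleles such as `<NON_REF>` or `*`), and the function name is hypothetical.

```shell
# Tabulate indel sizes from a VCF on stdin as: type, size in bp, count.
# Each ALT allele is measured separately against REF, so multi-allelic
# sites contribute one entry per allele. Assumes sequence alleles only.
indel_size_table() {
    awk -F '\t' '
        /^#/ { next }
        {
            n = split($5, alts, ",")
            for (i = 1; i <= n; i++) {
                diff = length(alts[i]) - length($4)
                if (diff > 0) count["INS\t" diff]++
                else if (diff < 0) count["DEL\t" (-diff)]++
            }
        }
        END { for (key in count) print key "\t" count[key] }
    ' | sort -k1,1 -k2,2n
}
```

Run as `indel_size_table < calls.vcf`, where `calls.vcf` is a placeholder for the HC output.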

You flatter me, monsieur

Interesting observations, thanks. For the most part this fits my expectations, although I'm a little surprised by the >60 bp insertion FPs.

If you have a few spare cycles, could you possibly run this through the HC in the very latest version (3.1-1)?

Geraldine Van der Auwera, PhD

• Posts: 122GATK Developer mod

I'm curious, if you set -activeRegionMaxSize to 3000 and run on just an interval around your large deletion, what happens? In principle I think the HaplotypeCaller should be able to call such events when the signal is so clear in the reads, but it isn't something we've really tried to do before. The activeRegionMaxSize parameter was put in there so that someone who wanted to experiment with this could do so. We've restricted ourselves to events in the range of +/- ~100 bp as a trade-off against the runtime cost when the haplotypes get so large.
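For anyone wanting to try this, the sketch below only prints (it does not run) a GATK 2.x/3.x style command restricted to one interval. The jar name `GenomeAnalysisTK.jar`, the file names and the interval are placeholders; the `-T`, `-R`, `-I`, `--intervals`, `--activeRegionMaxSize` and `--out` arguments are the ones that appear in the HaplotypeCaller command line logged further down this thread.

```shell
# Print (dry run) a HaplotypeCaller command restricted to one interval.
# All paths and the interval are placeholders supplied by the caller.
hc_region_cmd() {
    # $1 = reference FASTA, $2 = input BAM, $3 = interval, $4 = output VCF,
    # $5 = active region max size (defaults to the 3000 suggested above)
    printf 'java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R %s -I %s --intervals %s --activeRegionMaxSize %s --out %s\n' \
        "$1" "$2" "$3" "${5:-3000}" "$4"
}

hc_region_cmd ref.fasta sample.bam 5:1000000-1010000 region.hc.vcf
```

Printing the command first makes it easy to check the interval and the region size before committing to a long run.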

• Posts: 61Member
edited March 19

I will give your suggestion of increasing -activeRegionMaxSize to 3000 a try.

On the issue of FP insertions beyond a certain size: my stats above were too crude. I investigated what these are, and they are all cases where HC has called two alternative alleles when there is actually only one in the truth VCF. That is a much lesser shortcoming than calling large FP insertions, but interesting nevertheless, since it occurs only beyond a certain size and only for insertions (no such issue for deletions, at least below 60 bp with 100 bp reads).

grep -v "#" simul_agilentV1_chr5_140211_simul_simul_none.aln.valid.hc.wholeGene.variantSites.vcf | awk 'BEGIN{OFS="\t"; FS="\t";};(length($4)!=1 || length($5)!=1){size=length($4)-length($5); if(size < -60){print $0}};END{}' 5 96350606 . T TAGATGTTGCAGCGTTGCTGTCGTGGAAAC,TAGATGTTGCAGCGTTGCTGTCGTGGAAACA 471.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=19;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=24.80 GT:AD:DP:GQ:PL 1/2:0,10,7:17:99:742,264,229,384,0,354 5 96362276 . T TACCCCAGGATGGTGCTTAGCGACCTCACG,TACCCCAGGATGGTGCTTAGCGACCTCACGA 436.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=19;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=22.96 GT:AD:DP:GQ:PL 1/2:0,12,4:16:99:771,145,111,460,0,437 5 96364082 . A AATAATATCCTAAAAAGTGTTGTGCGCGGC,AATAATATCCTAAAAAGTGTTGTGCGCGGCC 518.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=29;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=17.87 GT:AD:DP:GQ:PL 1/2:0,16,5:21:99:1169,189,122,567,0,508 5 98208075 . T TAACAATTACCACTCAACTAACGCACGGGTC,TACAATTACCACTCAACTAACGCACGGGTCG 649.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=31;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=20.94 GT:AD:DP:GQ:PL 1/2:0,15,15:30:99:1192,571,766,634,0,956 5 98217663 . A AATCGTCACTCTCCTTGAAGCGCAATAGTCC,AATCGTCACTCTCCTTGAAGCGCAATAGTCCC 485.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=33;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=14.70 GT:AD:DP:GQ:PL 1/2:0,14,8:22:99:1308,271,190,484,0,407 5 100231334 . T TAGGTCCCTCGTGGTAGCAGCGACCCCAGATC,TGGTCCCTCGTGGTAGCAGCGACCCCAGATCG 683.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=25;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=27.33 GT:AD:DP:GQ:PL 1/2:0,14,11:25:99:1014,459,691,604,0,973 5 101572552 . T TAGTGACAACTTACTTTCGCCTTTAGATTACA,TAGTGACAACTTACTTTCGCCTTTAGATTACAA 220.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=18;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=12.23 GT:AD:DP:GQ:PL 1/2:0,5,6:11:99:727,209,177,188,0,155 5 102237057 . T TGTGTCTCGACCGCGGGTCTCCATGTCTTC,TGTGTCTCGACCGCGGGTCTCCATGTCTTCAGA 284.19 . 
AC=1,1;AF=0.500,0.500;AN=2;DP=11;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=25.84 GT:AD:DP:GQ:PL 1/2:0,3,6:9:99:452,265,303,132,0,141 5 102249651 . T TGGTGAAGTCTTAAACTCCTGAGTGGCGAG,TGGTGAAGTCTTAAACTCCTGAGTGGCGAGAGA 323.29 . AC=1,1;AF=0.500,0.500;AN=2;DP=15;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=21.55 GT:AD:DP:GQ:PL 1/2:0,2,8:10:92:606,344,410,96,0,92 5 102262301 . G GACCATTATATTGTCTTACCAATGACCACCAGA,GCTAACCATTATATTGTCTTACCAATGACCACC 511.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=28;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=18.26 GT:AD:DP:GQ:PL 1/2:0,15,13:28:99:1155,519,789,664,0,1264 5 102432213 . A AATTGCGTACAAGGATGGTGGTGACCCAGGATCC,AATTGCGTACAAGGATGGTGGTGACCCAGGATCCG 624.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=24;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=26.01 GT:AD:DP:GQ:PL 1/2:0,17,5:22:99:954,162,117,649,0,619 5 102433275 . A AATAAACTGTAGCGCACCTTGTGTCGGAATGGCT,ATAAACTGTAGCGCACCTTGTGTCGGAATGGCTG 366.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=39;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=9.39 GT:AD:DP:GQ:PL 1/2:0,13,25:38:99:1542,978,1116,505,0,538 5 102440223 . C CACCCGCTTCCGTGTCGGGAGGCTTATTTCGGAA,CACCCGCTTCCGTGTCGGGAGGCTTATTTCGGAAA 349.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=23;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=15.18 GT:AD:DP:GQ:PL 1/2:0,10,6:16:99:928,197,153,356,0,308 5 102444252 . A AACAAGTGGGTGCAGAGTACCGTTACATGCGTGC,ACAAGTGGGTGCAGAGTACCGTTACATGCGTGCT 524.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=26;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=20.16 GT:AD:DP:GQ:PL 1/2:0,16,10:26:99:1045,385,483,666,0,974 5 102474100 . T TATAAAGGGTTTGGGATAAATCACTGTGGAATGG,TTATAAAGGGTTTGGGATAAATCACTGTGGAATG 408.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=23;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=17.75 GT:AD:DP:GQ:PL 1/2:0,8,15:23:99:907,611,847,309,0,405 5 102611601 . T TAGAGATTTCTCTGTAACACACGAATTCGCGGAGG,TGAGATTTCTCTGTAACACACGAATTCGCGGAGGC 328.19 . 
AC=1,1;AF=0.500,0.500;AN=2;DP=15;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=21.88 GT:AD:DP:GQ:PL 1/2:0,8,6:14:99:561,244,346,346,0,581 5 102890468 . T TAGCTGCGGCTCCGCATACTGGCATACCCTCAGGG,TAGCTGCGGCTCCGCATACTGGCATACCCTCAGGGC 502.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=24;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=20.92 GT:AD:DP:GQ:PL 1/2:0,12,6:18:99:910,228,184,465,0,430 5 102891626 . G GAAATTATCTAGTCAACCGGTTTTGGGACCCGATA,GAATTATCTAGTCAACCGGTTTTGGGACCCGATAT 471.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=26;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=18.12 GT:AD:DP:GQ:PL 1/2:0,16,9:25:99:1020,358,429,662,0,955 5 108382822 . A AACATCTATTATGTCAAAGGGGTATAAGGCGGACGG,AACATCTATTATGTCAAAGGGGTATAAGGCGGACGGC 379.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=19;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=19.96 GT:AD:DP:GQ:PL 1/2:0,9,6:15:99:754,221,182,347,0,309 5 108516434 . T TGTGACCCAGAATCGTCCGGCCCGTGCTCAAGCGGC,TGTGACCCAGAATCGTCCGGCCCGTGCTCAAGCGGCG 402.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=17;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=23.66 GT:AD:DP:GQ:PL 1/2:0,9,6:15:99:689,223,195,342,0,319 5 109152942 . A ACAGCTGAGTTCCGTCGCACACGATACTCGTTCT,ACAGCTGAGTTCCGTCGCACACGATACTCGTTCTAGC 486.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=19;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=25.59 GT:AD:DP:GQ:PL 1/2:0,5,9:14:99:721,403,497,195,0,219 5 109155892 . T TCCCCGTTACTGCAACTAGGGCGTGTAAGCGATG,TCCCCGTTACTGCAACTAGGGCGTGTAAGCGATGAGC 721.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=25;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.57;MQ0=0;QD=28.85 GT:AD:DP:GQ:PL 1/2:0,4,16:20:99:1024,660,812,174,0,179 5 109181546 . G GACCCCAGGCCCAAAGGGTTGAATGGTTTAAAAT,GACCCCAGGCCCAAAGGGTTGAATGGTTTAAAATAGC 390.34 . AC=1,1;AF=0.500,0.500;AN=2;DP=20;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=19.52 GT:AD:DP:GQ:PL 1/2:0,2,9:11:89:798,413,508,100,0,89 5 110438015 . A AGACCTAGCACCTGAGCATAATATTCAGAACTATTTAC,AGACCTAGCACCTGAGCATAATATTCAGAACTATTTACT 228.49 . 
AC=1,1;AF=0.500,0.500;AN=2;DP=18;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=12.69 GT:AD:DP:GQ:PL 1/2:0,7,3:10:86:704,118,86,279,0,254 5 110439436 . A AACCCCAACTATCACAACGGCTATTGGACTAGAGTGAC,AACCCCAACTATCACAACGGCTATTGGACTAGAGTGACT 329.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=17;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=19.36 GT:AD:DP:GQ:PL 1/2:0,8,5:13:99:695,187,157,308,0,284 5 110440972 . G GAGAAACATGGGGTTCTAGCGTGTTCACCGACGCGTTA,GAGAAACATGGGGTTCTAGCGTGTTCACCGACGCGTTAA 199.22 . AC=1,1;AF=0.500,0.500;AN=2;DP=15;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=13.28 GT:AD:DP:GQ:PL 1/2:0,7,4:11:96:590,124,96,241,0,215 5 110445915 . T TAGAACTGGCCTTCAATCCCGTGCGAGGCACGATTGAG,TGAACTGGCCTTCAATCCCGTGCGAGGCACGATTGAGC 244.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=15;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=16.28 GT:AD:DP:GQ:PL 1/2:0,8,7:15:99:615,284,359,346,0,527 5 110446892 . A AAGGGAAGATATACTAAAACGGGATGAGGAATCCTTAG,ATAGGGAAGATATACTAAAACGGGATGAGGAATCCTTAG 298.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=20;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=14.91 GT:AD:DP:GQ:PL 1/2:0,7,5:12:99:756,200,163,281,0,247 5 111066566 . T TATAATGGAGTGCAAACTTAGGTCGTCCCCAGCGCCCGC,TATAATGGAGTGCAAACTTAGGTCGTCCCCAGCGCCCGCC 326.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=15;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=48.17;MQ0=0;QD=21.75 GT:AD:DP:GQ:PL 1/2:0,10,4:14:99:593,127,105,350,0,333 5 111071136 . C CATGTGATTTCTACTGGGCTGCTACAGAGTGGTGGGGTA,CTGTGATTTCTACTGGGCTGCTACAGAGTGGTGGGGTAT 466.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=24;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=57.01;MQ0=0;QD=19.42 GT:AD:DP:GQ:PL 1/2:0,17,7:24:99:940,247,370,710,0,1012 5 111500661 . C CAGCTGGCCTGTTGGTTGCGATCGTATCAAATCGCTAAG,CGCTGGCCTGTTGGTTGCGATCGTATCAAATCGCTAAGG 312.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=24;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=13.01 GT:AD:DP:GQ:PL 1/2:0,14,10:24:99:873,323,372,520,0,787 5 112090540 . 
T TAGCGCATTGGACAGAGGCTCTCCAGTTCTCGAATATGGG,TAGCGCATTGGACAGAGGCTCTCCAGTTCTCGAATATGGGG 279.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=23;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=12.14 GT:AD:DP:GQ:PL 1/2:0,9,7:16:99:826,208,169,260,0,222 5 112111329 . A ATGCAGCGAATTGATAGCCTGCGGGACCTAAACTTGGCGT,ATGCAGCGAATTGATAGCCTGCGGGACCTAAACTTGGCGTT 279.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=16;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=17.45 GT:AD:DP:GQ:PL 1/2:0,8,6:14:99:608,188,157,263,0,234 5 112136991 . A AAGGGCTTGGGGAGGACACGTCTCTTACAATATTTGTGAG,AAGGGCTTGGGGAGGACACGTCTCTTACAATATTTGTGAGG 138.34 . AC=1,1;AF=0.500,0.500;AN=2;DP=17;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=8.14 GT:AD:DP:GQ:PL 1/2:0,3,5:8:89:581,187,161,116,0,89 5 112151211 . A AAAACAAAAGGATCAGCGTCTATCACGTATGTCCTTCGGG,AAAACAAAAGGATCAGCGTCTATCACGTATGTCCTTCGGGT 487.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=24;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.96;MQ0=0;QD=20.30 GT:AD:DP:GQ:PL 1/2:0,12,6:18:99:994,233,185,456,0,414 5 112154642 . C CACCGTGCATTTATCAAAAATTGAAAGTCTAAACCCAAAG,CCCGTGCATTTATCAAAAATTGAAAGTCTAAACCCAAAGG 323.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=23;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=14.05 GT:AD:DP:GQ:PL 1/2:0,14,8:22:99:875,319,434,540,0,731 5 112337055 . A ACATACAAGACGGTGGATAGAATCGTGGCTGAGACGTGAGG,ATCACATACAAGACGGTGGATAGAATCGTGGCTGAGACGTG 520.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=26;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.56;MQ0=0;QD=20.01 GT:AD:DP:GQ:PL 1/2:0,8,16:24:99:943,659,1134,341,0,624 5 112337262 . A AACTTACCTGGCAACGAACCTAGGCATCTCGGTTGGTGAGG,ACAGACTTACCTGGCAACGAACCTAGGCATCTCGGTTGGTG 797.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=33;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.83;MQ0=0;QD=24.16 GT:AD:DP:GQ:PL 1/2:0,14,19:33:99:1357,836,1349,614,0,1115 5 112349010 . T TAGCAATAGGTCGCGTCCCGCCAACTCCTAAGGGGACA,TAGCAATAGGTCGCGTCCCGCCAACTCCTAAGGGGACAAGG 496.19 . 
AC=1,1;AF=0.500,0.500;AN=2;DP=17;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=29.19 GT:AD:DP:GQ:PL 1/2:0,6,8:14:99:640,349,438,259,0,308 5 112362999 . T TCGAAGCAATGGGTACACCGAGATCTCGCTCCTACTGAAGG,TTGCCGAAGCAATGGGTACACCGAGATCTCGCTCCTACTGA 674.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=24;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=28.09 GT:AD:DP:GQ:PL 1/2:0,17,7:24:99:970,295,651,744,0,1445 5 112379247 . C CTATTATGTCACATCGTCGGCCTAGTCTAATTTGTAAT,CTATTATGTCACATCGTCGGCCTAGTCTAATTTGTAATAGG 444.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=24;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=18.51 GT:AD:DP:GQ:PL 1/2:0,6,6:12:99:953,287,311,269,0,315 5 112869998 . A AAACAATCTACACCTAAGGCTCAGAATTGGTTCTCCTGATTG,AAACAATCTACACCTAAGGCTCAGAATTGGTTCTCCTGATTGC 232.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=14;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=16.59 GT:AD:DP:GQ:PL 1/2:0,5,6:11:99:523,195,175,191,0,169 5 114469551 . C CAGGTAGCTGGTTTAAACAACTATTTTCCAAGTACCCTCATTT,CAGGTAGCTGGTTTAAACAACTATTTTCCAAGTACCCTCATTTA 331.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=19;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=17.43 GT:AD:DP:GQ:PL 1/2:0,9,5:14:99:726,148,121,350,0,322 5 114607231 . A ACAAGAAGCAAGTTCAAAAACATCAGGCTAGTGCGACCGGGCCT,AGCAAGAAGCAAGTTCAAAAACATCAGGCTAGTGCGACCGGGCC 244.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=30;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.64;MQ0=0;QD=8.14 GT:AD:DP:GQ:PL 1/2:0,14,16:30:99:1228,616,684,522,0,555 5 114860030 . T TAAGTCGTTTCATTGTTCACACCTCGCCACTGATTACGCGGATA,TAAGTCGTTTCATTGTTCACACCTCGCCACTGATTACGCGGATAA 385.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=24;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.55;MQ0=0;QD=16.05 GT:AD:DP:GQ:PL 1/2:0,8,8:16:99:968,289,245,294,0,252 5 114878649 . T TAACTTATTATCCGGTGCTGGCCCGTGAGAATGTCTCGTTTAGC,TAACTTATTATCCGGTGCTGGCCCGTGAGAATGTCTCGTTTAGCA 449.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=29;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.57;MQ0=0;QD=15.49 GT:AD:DP:GQ:PL 1/2:0,10,7:17:99:1141,260,207,403,0,354 5 115319060 . 
G GCTTATCGCCCCTTTAAACGCCTTAATGCCCTTTCGTTACCC,GCTTATCGCCCCTTTAAACGCCTTAATGCCCTTTCGTTACCCAGT 574.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=19;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.19;MQ0=0;QD=30.22 GT:AD:DP:GQ:PL 1/2:0,6,10:16:99:761,444,551,242,0,274 5 115335439 . C CCTTTGGCTTTGTCTCTTGCTTAATACTCCCGCCAGGAGTAA,CCTTTGGCTTTGTCTCTTGCTTAATACTCCCGCCAGGAGTAAAGT 486.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=16;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=30.39 GT:AD:DP:GQ:PL 1/2:0,4,10:14:99:640,434,554,164,0,187 5 115336132 . A ACCCGAGCAGAACGTTAGGATGCCCCTCCAGCATGTAGTAGTAGT,ATGGCCCGAGCAGAACGTTAGGATGCCCCTCCAGCATGTAGTAGT 305.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=16;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=19.07 GT:AD:DP:GQ:PL 1/2:0,10,6:16:99:600,249,498,392,0,709 5 115338540 . G GAGCATATTAACGAGGAAGTCCGCAGTGCACTCGGGCTTTAT,GAGCATATTAACGAGGAAGTCCGCAGTGCACTCGGGCTTTATAGT 428.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=17;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=25.19 GT:AD:DP:GQ:PL 1/2:0,5,7:12:99:704,314,354,226,0,252 5 118280243 . C CTAGGTAGATGAACATTGCAGTCTTTATTTACAGTACCTTTGAACT,CTAGGTAGATGAACATTGCAGTCTTTATTTACAGTACCTTTGAACTA 179.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=16;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=11.20 GT:AD:DP:GQ:PL 1/2:0,5,6:11:99:659,192,160,163,0,131 5 118556137 . T TTTAGATAAAGGATCATTACGGGCCCGGCTAAAAGAAGGCCGTCCTG,TTTAGATAAAGGATCATTACGGGCCCGGCTAAAAGAAGGCCGTCCTGA 322.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=22;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=14.65 GT:AD:DP:GQ:PL 1/2:0,8,6:14:99:832,207,175,290,0,259 5 118556616 . G GAAAAACTCAGTCGATCGGGTCAGCACTTCCAGCGCCGTGCTGACAC,GAAAACTCAGTCGATCGGGTCAGCACTTCCAGCGCCGTGCTGACACT 369.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=28;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.61;MQ0=0;QD=13.19 GT:AD:DP:GQ:PL 1/2:0,13,15:28:99:1105,582,732,489,0,628 5 118560404 . A AAGGGTTTGATTGGGAGATTCATATATCCGGGGATAACGATTTACAC,AGGGTTTGATTGGGAGATTCATATATCCGGGGATAACGATTTACACT 250.27 . 
AC=1,1;AF=0.500,0.500;AN=2;DP=28;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=8.94 GT:AD:DP:GQ:PL 1/2:0,15,13:28:99:1139,482,509,575,0,748 5 118866940 . T TAGACCCAACCACTAATTCGAATTTCTACAACCAACCGCAATGTGAGA,TAGACCCAACCACTAATTCGAATTTCTACAACCAACCGCAATGTGAGAA 198.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=13;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=58.65;MQ0=0;QD=15.25 GT:AD:DP:GQ:PL 1/2:0,5,5:10:99:499,161,153,173,0,158 5 121761047 . C CGTGAACAAAGCGTATATCAAATTATCTATGGTGCGTCTGAGTGCGATA,CTCCGTGAACAAAGCGTATATCAAATTATCTATGGTGCGTCTGAGTGCG 648.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=21;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=57.60;MQ0=0;QD=30.87 GT:AD:DP:GQ:PL 1/2:0,14,7:21:99:845,296,548,610,0,1341 5 121767690 . G GTGACTGGACGAACAATGCATGTCCCGATTGCCACCTTTTCCTATT,GTGACTGGACGAACAATGCATGTCCCGATTGCCACCTTTTCCTATTATA 422.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=17;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=24.83 GT:AD:DP:GQ:PL 1/2:0,5,7:12:99:632,311,348,223,0,243 5 121780253 . G GATAAGAGGGCTGGTCGGTAACACTCTGTCCCTTCGTAGTGCATTCATA,GTAAATAAGAGGGCTGGTCGGTAACACTCTGTCCCTTCGTAGTGCATTC 666.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=25;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=26.65 GT:AD:DP:GQ:PL 1/2:0,15,9:24:99:956,377,683,648,0,1405 5 122281747 . T TAAGGCGCTAAGAAGGTTCTATCGTCGATATCCTATGAACCGGAACTATA,TAAGGCGCTAAGAAGGTTCTATCGTCGATATCCTATGAACCGGAACTATAC 317.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=23;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=13.79 GT:AD:DP:GQ:PL 1/2:0,8,5:13:99:889,182,145,319,0,284 5 122361481 . T TACATATACGCCCACGAGCAAGCAGGTGCCCCAACGTGTGATACATCAAG,TACATATACGCCCACGAGCAAGCAGGTGCCCCAACGTGTGATACATCAAGG 388.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=23;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=16.88 GT:AD:DP:GQ:PL 1/2:0,8,8:16:99:956,298,250,297,0,250 5 122924133 . A AAGGTGCTCGTTAGGTAGTTCTTCTTAATTATTGTGCGACCCACAACCGGC,AGGTGCTCGTTAGGTAGTTCTTCTTAATTATTGTGCGACCCACAACCGGCT 609.19 . 
AC=1,1;AF=0.500,0.500;AN=2;DP=36;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.02;MQ0=0;QD=16.92 GT:AD:DP:GQ:PL 1/2:0,17,19:36:99:1432,789,1086,637,0,832 5 125822562 . C CACGTGCGATCGGGATAGCAGGCCTGTTCGAAATAGCTTGGCAGCTAATATA,CACGTGCGATCGGGATAGCAGGCCTGTTCGAAATAGCTTGGCAGCTAATATAT 261.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=16;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=16.32 GT:AD:DP:GQ:PL 1/2:0,6,5:11:99:651,188,164,231,0,209 5 125828598 . T TAGTTGGAAGAATGTGATCAGTGTACAGCATGGGGACCCTAGTGTCGCACTC,TAGTTGGAAGAATGTGATCAGTGTACAGCATGGGGACCCTAGTGTCGCACTCA 458.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=27;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.69;MQ0=0;QD=16.97 GT:AD:DP:GQ:PL 1/2:0,7,13:20:99:1046,442,396,226,0,174 5 125880653 . A ATCAACAAACACTGGCCTTACTGTTGTAGGTGCAGTTTATTAAGCGTTCTGC,ATCAACAAACACTGGCCTTACTGTTGTAGGTGCAGTTTATTAAGCGTTCTGCC 276.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=27;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=57.48;MQ0=0;QD=10.23 GT:AD:DP:GQ:PL 1/2:0,5,9:14:99:1087,316,258,189,0,130 5 125885627 . C CGGATTGACCAAACAGCGCGGGCGGCCGTAAGTCGAGGGCGACACCGGGTTG,CTGGATTGACCAAACAGCGCGGGCGGCCGTAAGTCGAGGGCGACACCGGGTTG 633.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=27;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=57.26;MQ0=0;QD=23.45 GT:AD:DP:GQ:PL 1/2:0,5,17:22:99:1058,643,609,183,0,136 5 125885906 . C CTAAGCGGGATCCAGATCCTTATACCTACCTTGATAATGGACGTAAGGCTAG,CTAGCGGGATCCAGATCCTTATACCTACCTTGATAATGGACGTAAGGCTAGT 243.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=27;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=58.95;MQ0=0;QD=9.01 GT:AD:DP:GQ:PL 1/2:0,8,18:26:99:1060,693,806,316,0,406 5 126781117 . C CCCGATCGGGCTGGACGGAAGGTAAAGCAAGTCCAAGCCAAGCAACGAATTGCA,CACCGATCGGGCTGGACGGAAGGTAAAGCAAGTCCAAGCCAAGCAACGAATTGCA 328.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=19;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=58.89;MQ0=0;QD=17.27 GT:AD:DP:GQ:PL 1/2:0,5,8:13:99:671,280,290,182,0,160 5 126790250 . T TAAGCGAAAGTCTCGATGCCGTTTACCGTTCGCCCTAATATCACGTCTGACACA,TAAGCGAAAGTCTCGATGCCGTTTACCGTTCGCCCTAATATCACGTCTGACACAC 140.19 . 
AC=1,1;AF=0.500,0.500;AN=2;DP=12;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=11.68 GT:AD:DP:GQ:PL 1/2:0,4,4:8:99:493,155,135,137,0,117 5 126791106 . T TATGGGGCCAAAGTGCGGGTGTTCGGAACACCAATTATTCACCTGGCTAACAAC,TATGGGGCCAAAGTGCGGGTGTTCGGAACACCAATTATTCACCTGGCTAACAACC 207.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=26;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.19;MQ0=0;QD=7.97 GT:AD:DP:GQ:PL 1/2:0,6,5:11:99:985,174,126,239,0,193 5 127484392 . C CATAGTGATATTCCTTTAATAATAGTCAGGGCGTTAGTTGGATAAGTCTTCCTAT,CATAGTGATATTCCTTTAATAATAGTCAGGGCGTTAGTTGGATAAGTCTTCCTATT 564.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=31;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=57.94;MQ0=0;QD=18.20 GT:AD:DP:GQ:PL 1/2:0,14,8:22:99:1159,275,224,497,0,452 5 127488417 . T TCCAATAAACGAGGTCCTAAAAATGCCTGCAGTGTTAATGTTCCGGAAGACCGAA,TCCAATAAACGAGGTCCTAAAAATGCCTGCAGTGTTAATGTTCCGGAAGACCGAAG 323.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=24;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=13.47 GT:AD:DP:GQ:PL 1/2:0,7,6:13:99:1007,245,197,284,0,238 5 127503476 . G GACAGAGGTTCATTTCAGAAGCAAACCGGGGAGGCAATGGCCCTAAGGGATAAGA,GACAGAGGTTCATTTCAGAAGCAAACCGGGGAGGCAATGGCCCTAAGGGATAAGAA 315.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=21;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=58.36;MQ0=0;QD=15.01 GT:AD:DP:GQ:PL 1/2:0,6,9:15:99:838,311,275,195,0,152 5 127507387 . G GAATAGAGTGGTTGCGAAATATCTTGCGTTTCCAAATTATCACGTCCTAATCTGC,GAATAGAGTGGTTGCGAAATATCTTGCGTTTCCAAATTATCACGTCCTAATCTGCC 204.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=20;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=10.21 GT:AD:DP:GQ:PL 1/2:0,5,6:11:99:804,203,162,196,0,154 5 127627167 . A ACCAACCATATGCGAACACCTCTTCTCGATAGTAGGGATTTGGAGAAATGCGCCAT,ACCAACCATATGCGAACACCTCTTCTCGATAGTAGGGATTTGGAGAAATGCGCCATG 169.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=19;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=58.74;MQ0=0;QD=8.90 GT:AD:DP:GQ:PL 1/2:0,5,4:9:99:735,137,120,176,0,161 5 127637048 . A AAGGGTGATGACTAAGGCTAACACTATCCTAGAACCTCGAAAAAGTGGTCCCCGCT,AAGGGTGATGACTAAGGCTAACACTATCCTAGAACCTCGAAAAAGTGGTCCCCGCTT 333.19 . 
AC=1,1;AF=0.500,0.500;AN=2;DP=23;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=58.95;MQ0=0;QD=14.49 GT:AD:DP:GQ:PL 1/2:0,7,9:16:99:871,265,230,251,0,215 5 127640619 . T TCAAAGATTGGGCAAATGATTCGTGGTGTATATTATCACATTACGACCATCCCCTG,TCAAAGATTGGGCAAATGATTCGTGGTGTATATTATCACATTACGACCATCCCCTGA 284.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=19;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=58.73;MQ0=0;QD=14.96 GT:AD:DP:GQ:PL 1/2:0,8,4:12:99:779,144,112,310,0,284 5 127641176 . C CATTTCTCGTCGTCCTACTTCTCGCTTTGTGCGCACGTGCTCAGTATTAACCATAA,CATTTCTCGTCGTCCTACTTCTCGCTTTGTGCGCACGTGCTCAGTATTAACCATAAT 359.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=22;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=57.81;MQ0=0;QD=16.33 GT:AD:DP:GQ:PL 1/2:0,10,4:14:99:895,152,119,379,0,352 5 127697364 . A AGATCATGGGTACCTTGCACGATGCGTGGGAGTTGGTCAGTTCATGTAATTAGG,AGATCATGGGTACCTTGCACGATGCGTGGGAGTTGGTCAGTTCATGTAATTAGGATG 577.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=23;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=25.10 GT:AD:DP:GQ:PL 1/2:0,7,9:16:99:921,388,456,301,0,355 5 127704860 . G GGCAGAACCATCTCGTTGTCAAGGTTCCATCTGAATTCCACCACTAAGGCTTGC,GGCAGAACCATCTCGTTGTCAAGGTTCCATCTGAATTCCACCACTAAGGCTTGCGAT 783.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=28;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=58.95;MQ0=0;QD=27.97 GT:AD:DP:GQ:PL 1/2:0,10,11:21:99:1118,481,551,414,0,472 5 127712395 . T TATGTAATCATTTACTTTAGTTCAAAACGACGAGCCAGCTAGATCGATTCGGGC,TATGTAATCATTTACTTTAGTTCAAAACGACGAGCCAGCTAGATCGATTCGGGCATG 474.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=21;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=22.58 GT:AD:DP:GQ:PL 1/2:0,6,8:14:99:818,355,425,231,0,269 5 128430383 . A ATATTTAGGCGTTGCTACCTCGACGGGCCGCCCTCTCCAATTTTCGGAAGAATGCCGC,ATATTTAGGCGTTGCTACCTCGACGGGCCGCCCTCTCCAATTTTCGGAAGAATGCCGCC 389.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=22;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.87;MQ0=0;QD=17.69 GT:AD:DP:GQ:PL 1/2:0,10,6:16:99:859,207,175,355,0,326 5 128448530 . 
T TTATTCACTAATCCCATTGTTCCTCCCGCAAGTTGCAGCTCAGGCAGAATACCTTGTA,TTATTCACTAATCCCATTGTTCCTCCCGCAAGTTGCAGCTCAGGCAGAATACCTTGTAG 297.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=22;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=13.51 GT:AD:DP:GQ:PL 1/2:0,4,8:12:99:910,318,284,163,0,125 5 129243737 . G GAAGGTCAGTAGATTTTATATCCATAGCGCAAGCTCCGGTTACATAGATTCGACGAACT,GAAGGTCAGTAGATTTTATATCCATAGCGCAAGCTCCGGTTACATAGATTCGACGAACTT 200.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=15;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.44;MQ0=0;QD=13.35 GT:AD:DP:GQ:PL 1/2:0,6,4:10:99:584,129,119,202,0,194 5 130825237 . A ATTCAGCAAGCATGTTGGGCGGTTGCATCCAACACTCTTACAGTGTGCCTCATTGTGGCG,ATTCAGCAAGCATGTTGGGCGGTTGCATCCAACACTCTTACAGTGTGCCTCATTGTGGCGT 180.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=22;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.02;MQ0=0;QD=8.19 GT:AD:DP:GQ:PL 1/2:0,5,4:9:99:876,170,134,194,0,158 5 130828262 . G GAAGGCAATATCGAGTACGCCCGCGGATCTAGGGTTCTAACACCGTTGAGATGCAGAAAT,GAAGGCAATATCGAGTACGCCCGCGGATCTAGGGTTCTAACACCGTTGAGATGCAGAAATA 265.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=18;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.68;MQ0=0;QD=14.73 GT:AD:DP:GQ:PL 1/2:0,5,7:12:99:684,226,210,185,0,167 5 130831245 . C CAGATAGTACACATCGTCACTGCTATCCCATCGTATCGGGGCGAGTCCCCGGCCGGTGAG,CAGATAGTACACATCGTCACTGCTATCCCATCGTATCGGGGCGAGTCCCCGGCCGGTGAGT 245.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=19;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=57.69;MQ0=0;QD=12.90 GT:AD:DP:GQ:PL 1/2:0,8,5:13:99:719,137,115,269,0,242 5 130846006 . C CAACCTCTGGGTCACTCACCGAGAATGGGTCTGAGTCGTGACTGTAATTGGTGCGCTTGT,CAACCTCTGGGTCACTCACCGAGAATGGGTCTGAGTCGTGACTGTAATTGGTGCGCTTGTT 234.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=13;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=55.11;MQ0=0;QD=18.01 GT:AD:DP:GQ:PL 1/2:0,6,5:11:99:518,172,154,209,0,192 5 131066577 . A AAGTCGTGGTTATTGCTCACGGTGCCGACCGCGCGCCAGGAGTAGGTGTCCCCCCATG,AAGTCGTGGTTATTGCTCACGGTGCCGACCGCGCGCCAGGAGTAGGTGTCCCCCCATGATT 315.19 . 
AC=1,1;AF=0.500,0.500;AN=2;DP=17;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=56.69;MQ0=0;QD=18.54 GT:AD:DP:GQ:PL 1/2:0,4,7:11:99:691,272,298,156,0,160 5 131296234 . A AGCACCCTTGGCTGTAGGAGCAATGCTCTTTAATCTTACAGCCGACTGAAATAGGGGC,AGCACCCTTGGCTGTAGGAGCAATGCTCTTTAATCTTACAGCCGACTGAAATAGGGGCATT 165.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=13;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=56.31;MQ0=0;QD=12.71 GT:AD:DP:GQ:PL 1/2:0,3,3:6:99:535,138,162,139,0,164 5 131298210 . G GGGTTTTGGAATGGAACCGTATCCTCAGCAGACTCTTATTTGCATCCTCCTGATAGTC,GGGTTTTGGAATGGAACCGTATCCTCAGCAGACTCTTATTTGCATCCTCCTGATAGTCATT 398.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=16;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=24.89 GT:AD:DP:GQ:PL 1/2:0,4,8:12:99:603,344,408,166,0,185  Post edited by TimHughes on • Posts: 61Member I took a further look at these cases listed above. My simulated fastq reads are designed to simulate an exome capture and some of the insertions and deletions are placed in the center of a exon whereas others are placed near the edge where one can expect to not have an even number of reads from each strand. It turns out that almost all the cases above, where we have two alternative alleles for insertions, are when the simulated insertion was placed near the edge of the exon. Zoomed out So obviously this is not a very general situation: long insertion placed near edge of exon, but not insignificant for exome capture I suppose. • Posts: 61Member I suppose all this could be driven by an inaccuracy in the details of my simulation.... • Posts: 61Member I tried -kmerSize 20 -minPruning 10 --forceActive --activeRegionMaxSize 6000 when restricting HC to just the region with the deletion and it ran for about an hour but then just gave the SNPs but not the big deletion. The report also says Ran local assembly on 2 active regions which I guess is on either side of deletion....? 
/Users/tim/home/proj_tim_pharmGen/pharmGenRepo/code/vc_singleSample_withHC.bash *.valid.dedup.bam bigDeletion.list
INFO  13:12:01,110 HelpFormatter - --------------------------------------------------------------------------------
INFO  13:12:01,125 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.7-4-g6f46d11, Compiled 2013/10/10 17:27:51
INFO  13:12:01,125 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO  13:12:01,125 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO  13:12:01,130 HelpFormatter - Program Args: -T HaplotypeCaller -R /Users/tim/home/PLATFORM/draftNewRefData/dataDistro_r01_d01_LocalCopy/b37/genomic/gatkBundle_2.5/human_g1k_v37_decoy.fasta --dbsnp /Users/tim/home/PLATFORM/draftNewRefData/dataDistro_r01_d01_LocalCopy/b37/genomic/gatkBundle_2.5/dbsnp_137.b37.excluding_sites_after_129.vcf --genotyping_mode DISCOVERY -stand_emit_conf 10 -stand_call_conf 30 --downsampling_type BY_SAMPLE --downsample_to_coverage 250 --intervals bigDeletion.list --validation_strictness LENIENT -kmerSize 20 -minPruning 10 --forceActive --activeRegionMaxSize 6000 -I Hughes-MiSeqExcap-Lib1-907_140211_M01132_0066_L001.aln.valid.dedup.bam --out Hughes-MiSeqExcap-Lib1-907_140211_M01132_0066_L001.aln.valid.dedup.hc.wholeGene.variantSites.vcf -nct 4
INFO  13:12:01,131 HelpFormatter - Date/Time: 2014/03/20 13:12:01
INFO  13:12:01,131 HelpFormatter - --------------------------------------------------------------------------------
INFO  13:12:01,131 HelpFormatter - --------------------------------------------------------------------------------
INFO  13:12:01,178 ArgumentTypeDescriptor - Dynamically determined type of /Users/tim/home/PLATFORM/draftNewRefData/dataDistro_r01_d01_LocalCopy/b37/genomic/gatkBundle_2.5/dbsnp_137.b37.excluding_sites_after_129.vcf to be VCF
INFO  13:12:02,659 GenomeAnalysisEngine - Strictness is LENIENT
INFO  13:12:02,828 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 250
INFO  13:12:02,841 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO  13:12:02,948 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
INFO  13:12:03,449 IntervalUtils - Processing 7001 bp from intervals
INFO  13:12:03,482 MicroScheduler - Running the GATK in parallel mode with 4 total threads, 4 CPU thread(s) for each of 1 data thread(s), of 16 processors available on this machine
INFO  13:12:03,817 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO  13:12:03,950 GenomeAnalysisEngine - Done preparing for traversal
INFO  13:12:03,951 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO  13:12:03,951 ProgressMeter -        Location processed.active regions  runtime per.1M.active regions completed total.runtime remaining
INFO  13:12:04,296 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
INFO  13:12:34,652 ProgressMeter -     2:234636000        0.00e+00   30.0 s       49.6 w    100.0%        30.0 s     0.0 s
INFO  13:13:05,255 ProgressMeter -     2:234636000        0.00e+00   61.0 s      101.4 w    100.0%        61.0 s     0.0 s
.....................................
.....................................
.....................................
INFO  14:10:06,224 ProgressMeter -     2:234636000        0.00e+00   58.0 m     5757.7 w    100.0%        58.0 m     0.0 s
INFO  14:10:36,228 ProgressMeter -     2:234636000        0.00e+00   58.5 m     5807.3 w    100.0%        58.5 m     0.0 s
INFO  14:10:39,283 HaplotypeCaller - Ran local assembly on 2 active regions
INFO  14:11:06,232 ProgressMeter -     2:234636000        0.00e+00   59.0 m     5856.9 w    100.0%        59.0 m     0.0 s
INFO  14:11:14,831 ProgressMeter -            done        7.00e+03   59.2 m        5.9 d    100.0%        59.2 m     0.0 s
INFO  14:11:14,831 ProgressMeter - Total runtime 3550.88 secs, 59.18 min, 0.99 hours
INFO  14:11:14,832 MicroScheduler - 34 reads were filtered out during the traversal out of approximately 1745 total reads (1.95%)
INFO  14:11:14,832 MicroScheduler -   -> 25 reads (1.43% of total) failing DuplicateReadFilter
INFO  14:11:14,832 MicroScheduler -   -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
INFO  14:11:14,832 MicroScheduler -   -> 9 reads (0.52% of total) failing HCMappingQualityFilter
INFO  14:11:14,833 MicroScheduler -   -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO  14:11:14,833 MicroScheduler -   -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
INFO  14:11:14,833 MicroScheduler -   -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter
INFO  14:11:14,833 MicroScheduler -   -> 0 reads (0.00% of total) failing UnmappedReadFilter
INFO  14:11:57,686 GATKRunReport - Uploaded run statistics report to AWS S3
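
Going back to the 1/2 insertion calls listed earlier in the thread: in the records shown there, the two ALT alleles appear to be identical except that one carries one or three extra trailing bases. A minimal Python sketch for flagging such records from a whitespace-delimited VCF data line follows. The helper names `split_alts` and `extra_suffix` are made up for illustration and are not part of GATK, and the INFO field in the sample record is shortened.

```python
def split_alts(vcf_line):
    """Parse CHROM, POS, REF and the list of ALT alleles from one VCF data line."""
    fields = vcf_line.split()
    chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
    return chrom, pos, ref, alt.split(",")


def extra_suffix(alts):
    """If there are exactly two ALT alleles and the shorter one is a prefix of
    the longer one, return the extra trailing bases; otherwise return None."""
    if len(alts) != 2:
        return None
    shorter, longer = sorted(alts, key=len)
    if len(longer) > len(shorter) and longer.startswith(shorter):
        return longer[len(shorter):]
    return None


# One record copied from the listing in this thread (INFO field shortened).
record = (
    "5 129243737 . G "
    "GAAGGTCAGTAGATTTTATATCCATAGCGCAAGCTCCGGTTACATAGATTCGACGAACT,"
    "GAAGGTCAGTAGATTTTATATCCATAGCGCAAGCTCCGGTTACATAGATTCGACGAACTT "
    "200.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=15 GT:AD:DP:GQ:PL "
    "1/2:0,6,4:10:99:584,129,119,202,0,194"
)

chrom, pos, ref, alts = split_alts(record)
print(chrom, pos, "extra trailing bases:", extra_suffix(alts))
```

A record for which `extra_suffix` returns a short string is a candidate for the "same insertion called twice with slightly different ends" pattern. Whether that reflects the exon-edge effect or a detail of the simulation still has to be checked in the reads.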


Try using --bamOutput <file> -bamWriterType ALL_POSSIBLE_HAPLOTYPES to have HC output the haplotypes it's considering.

Geraldine Van der Auwera, PhD

• Posts: 61Member

With the following call (which took almost two hours to run), where -nct 4 was removed so that the BAM file of haplotypes could be produced, the large deletion is detected. It would seem that removing -nct 4 is what did it. However, the zygosity seems to be incorrect: from the alignment the deletion looks homozygous, but it is called heterozygous. You will see from the screenshot below with the haplotypes that a few reads mapping into the deleted region seem to be causing this?

java -jar /Users/tim/home/PLATFORM/softwareRepo/swRepo_r01/install/gatk/GenomeAnalysisTK-3.1-1/GenomeAnalysisTK.jar -T HaplotypeCaller -R /Users/tim/home/PLATFORM/draftNewRefData/dataDistro_r01_d01_LocalCopy/b37/genomic/gatkBundle_2.5/human_g1k_v37_decoy.fasta --dbsnp /Users/tim/home/PLATFORM/draftNewRefData/dataDistro_r01_d01_LocalCopy/b37/genomic/gatkBundle_2.5/dbsnp_137.b37.excluding_sites_after_129.vcf --genotyping_mode DISCOVERY -stand_emit_conf 10 -stand_call_conf 30 --downsampling_type BY_SAMPLE --downsample_to_coverage 250 --intervals bigDeletion.list --validation_strictness LENIENT -kmerSize 20 -minPruning 10 --forceActive --activeRegionMaxSize 6000 -I Hughes-MiSeqExcap-Lib1-907_140211_M01132_0066_L001.aln.valid.dedup.bam --out Hughes-MiSeqExcap-Lib1-907_140211_M01132_0066_L001.aln.valid.dedup.hc.wholeGene.variantSites.vcf --bamOutput assembledHaplotypes.bam -bamWriterType ALL_POSSIBLE_HAPLOTYPES


The resulting VCF looks like this:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  Hughes-MiSeqExcap-Lib1-907


Hmm, that's unexpected -- multithreading shouldn't have such an impact on the calls. I'd expect to see marginal differences due to downsampling effects, but this is not marginal. I'll ask @rpoplin to comment on this.

Geraldine Van der Auwera, PhD

• Posts: 122GATK Developer mod

Sorry to say that I've gotten a little lost in this thread. Can you please post the output from the command both with and without -nct 4 so that I can help debug it?

Thanks! Quite awesome that you can get such a large event called.

• Posts: 61Member

Yes, awesome that such large events are called.

I am not sure there is really anything to debug here. I think the situation I set up, with a limited interval of about 6 kb, a deletion of 3 kb and --activeRegionMaxSize 6000, was probably bound to prevent -nct 4 from working properly.

A more interesting observation is perhaps that the genotype is called heterozygous when, judging from the reads, it looks homozygous: a few reads mapping into the deletion seem to cause this. I wonder whether this could be a general "shortcoming" of the HC when deletions exceed a certain size. Would increasing the pruning value (-minPruning) resolve this?
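
To illustrate why only a few reference-supporting reads can be enough to tip a call from homozygous to heterozygous, here is a toy Python sketch. It is not what HaplotypeCaller actually computes (HC scores reads against assembled haplotypes); it assumes a simple diploid model in which each read independently supports REF or ALT with a fixed per-read error rate. The read counts and the error rate are hypothetical values chosen for illustration.

```python
import math


def log10_genotype_likelihoods(n_ref, n_alt, error_rate=0.001):
    """Toy model: log10 likelihoods of (hom-ref, het, hom-alt) given the number
    of reads supporting REF and ALT. Not HaplotypeCaller's real model."""
    e = error_rate
    hom_ref = n_ref * math.log10(1 - e) + n_alt * math.log10(e)
    # Under a het genotype each read supports either allele with probability 0.5.
    het = (n_ref + n_alt) * math.log10(0.5)
    hom_alt = n_ref * math.log10(e) + n_alt * math.log10(1 - e)
    return hom_ref, het, hom_alt


def call_genotype(n_ref, n_alt, error_rate=0.001):
    """Return the genotype label with the highest toy likelihood."""
    labels = ("0/0", "0/1", "1/1")
    likelihoods = log10_genotype_likelihoods(n_ref, n_alt, error_rate)
    return labels[likelihoods.index(max(likelihoods))]


# Hypothetical example: 20 reads in total, with a growing number of reads
# that appear to support the reference inside the deleted region.
for n_ref in range(0, 5):
    n_alt = 20 - n_ref
    print(n_ref, n_alt, call_genotype(n_ref, n_alt))
```

In this simplified model each reference-supporting read costs the homozygous-deletion genotype about three log10 units at an error rate of 0.001, so a handful of stray reads mapping into the deletion is enough to make the heterozygous genotype more likely. Whether raising -minPruning removes those reads from consideration in HC is a separate question that this sketch does not answer.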