
HaplotypeCaller and detection of large indels

TimHughes Posts: 60 Member

Hi,

I am wondering about the detection of large indels with the haplotypecaller. I have an example where to my mind there is quite clearly a large deletion (a couple of kb) in the sample, but it is not called by the HaplotypeCaller.

Which HC parameters should I modify to detect large indels: activeRegionMaxSize? indelSizeToEliminateInRefModel?

[image: Screen Shot 2014-03-18 at 12.28.30.png]

Tim.


Answers

  • Geraldine_VdAuwera Posts: 5,840 Administrator, GATK Developer

    Hi Tim,

    The problem here is that your deletion is too big for HaplotypeCaller. As a rule of thumb, it can call indels up to half the length of the reads. I think at the size you're looking at, this would be considered a structural variant. If your data is whole-genome you may be able to call it with GenomeSTRiP.

    Geraldine Van der Auwera, PhD

  • TimHughes Posts: 60 Member
    edited March 18

    Hi Geraldine,

    OK about the deletion being too big, but why does the HaplotypeCaller have this kind of dependence on read length? I thought this was a thing of the past ;)

    With the UG, I understand the dependency on the read length because we are aligning the reads to the reference, so you need enough read beyond the deletion to favour opening a gap rather than allowing mismatches. And there was always an asymmetry between deletions (which you could detect up to about half the read length in size) and insertions (where it was more like 1/4 of the read length, because with an insertion part of the read is always non-reference sequence).

    But with HC, I don't quite see how read length comes into the picture. Longer reads make it easier to assemble the haplotype unambiguously, but even with short reads it should be possible to assemble the haplotypes, and it is from the haplotypes that one calls the variants. So why/how does read length affect the size of an indel that can be detected?

    Any help is much appreciated :)

    Tim

  • Geraldine_VdAuwera Posts: 5,840 Administrator, GATK Developer

    As far as I know (but @ebanks may jump in to correct me) the problem is that the HC needs to see reads that span the entire indel to distinguish between real indels vs. lack of coverage...

    Geraldine Van der Auwera, PhD

  • TimHughes Posts: 60 Member
    edited March 18

    Hmmm, so HC cannot exploit a situation like the one I posted above? There, the mapper has not opened a gap in individual reads spanning the deletion because of its size, but it has "correctly" soft-clipped these reads and has correctly mapped, on either side of the deletion, the read pairs that span it. I had been thinking that the soft clipping and the abnormal insert size would trigger the HC to attempt to assemble haplotypes over the whole region and then compare the long haplotypes to the reference.

    Sounds like this makes the HC more sensitive than I thought to the quality of alignments it is fed and the read lengths. I was under the impression that as long as reads were mapped correctly, the haplotype assembly would isolate variant calling from alignment issues (in particular alignment around indels)....?

  • Geraldine_VdAuwera Posts: 5,840 Administrator, GATK Developer

    It's not really an alignment quality issue, it's just that it's missing a crucial piece of information (the presence of spanning reads) to distinguish between a legitimate deletion and lack of coverage in the intervening region. If I had to speculate I would say you could solve this by adding long reads from e.g. PacBio data...

    Geraldine Van der Auwera, PhD

  • TimHughes Posts: 60 Member

    I would have to disagree there: the information is there, in that lots of reads contain the deletion, but the aligner will not open the gap (because of the size of the indel, the mapper prefers mismatches, which it soft-clips).

    No chance of improving sensitivity to long indels with any of the HC parameters, like activeRegionMaxSize? I am desperate here ;) The data above is small-target data with 300 bp PE reads of long fragments (>600 bp), and most CNV software requires whole-genome (WG) data and usually also multiple samples.

    Just want to understand more of the inner workings of the HC and, hopefully in the process, increase sensitivity to larger indels :)

    Tim.

  • Geraldine_VdAuwera Posts: 5,840 Administrator, GATK Developer

    Increasing the active region size will work up to a point, but this deletion is just too big for what HC can currently do with the reads you have, sorry. I wish I could give you the answer you want, but I don't have magical powers (sadly).

    Geraldine Van der Auwera, PhD

  • TimHughes Posts: 60 Member

    I have seen you display magical powers many times on this forum :)

    I might have a go with some simulations to see where the indel size limit lies for the HC and how it correlates with read length.

  • TimHughes Posts: 60 Member
    edited March 19

    Here is some data which is rather crude, but might be useful to others.

    I generated a truth VCF containing both insertions and deletions:

    * Deletions (1800 in total): sizes from 1 to 60 (20 HET and 10 HOM at each size)

    * Insertions (1800 in total): sizes from 1 to 60 (20 HET and 10 HOM at each size)

    I then generated simulated reads (given the truth VCF) with dwgsim: 100 bp PE, average coverage 20X.

    Analyse simulated reads:

    * Map reads with bwa mem

    * No refinement

    * Variant call with HC version v2.7-4-g6f46d11

    For the deletions (first column is the count, which should be 30; second column is the length of the REF allele, which is the deletion size plus one anchor base), it seems HC will detect deletions at least up to 60 bp with 100 bp reads, apart from a dip at REF lengths 55 to 57.

    grep -v "^#" simul_agilentV1_chr5_140211_simul_simul_none.aln.valid.hc.wholeGene.variantSites.qualAnnot.vcf | awk 'BEGIN{FS="\t"} length($4) != 1 {print length($4)}' | sort -n | uniq -c
    28 2
    30 3
    30 4
    30 5
    30 6
    30 7
    30 8
    29 9
    30 10
    29 11
    30 12
    30 13
    29 14
    30 15
    30 16
    30 17
    28 18
    30 19
    30 20
    30 21
    30 22
    30 23
    30 24
    30 25
    30 26
    29 27
    29 28
    30 29
    30 30
    30 31
    29 32
    30 33
    30 34
    30 35
    30 36
    29 37
    30 38
    30 39
    30 40
    30 41
    29 42
    29 43
    29 44
    30 45
    30 46
    30 47
    30 48
    30 49
    30 50
    29 51
    30 52
    30 53
    30 54
    19 55
    2 56
    10 57
    30 58
    30 59
    30 60
    30 61
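The tally above can be turned into a rough per-size detection rate. The following is a minimal Python sketch, not part of the original analysis: the function name `recall_by_deletion_size` is hypothetical, and it assumes that every called site of a given REF length is one of the 30 simulated events (so it cannot tell true calls from false positives).

```python
from typing import Dict

EXPECTED_PER_SIZE = 30  # 20 HET + 10 HOM events were simulated at each size


def recall_by_deletion_size(uniq_c_output: str,
                            expected: int = EXPECTED_PER_SIZE) -> Dict[int, float]:
    """Parse `uniq -c` lines ("count ref_allele_length") into {deletion_size: rate}."""
    recall: Dict[int, float] = {}
    for line in uniq_c_output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        count, ref_length = int(fields[0]), int(fields[1])
        deletion_size = ref_length - 1  # the REF allele includes one anchor base
        recall[deletion_size] = count / expected
    return recall


if __name__ == "__main__":
    # A few rows copied from the deletion tally above.
    sample = "30 54\n19 55\n2 56\n10 57\n30 58\n"
    for size, rate in sorted(recall_by_deletion_size(sample).items()):
        print(f"{size} bp deletion: {rate:.2f}")
```

Subtracting one from the REF length is what maps the observed range 2 to 61 back onto the simulated deletion sizes 1 to 60.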

    For the insertions (first column is the count, which should be 30; second column is the length of the ALT field), it seems 100 bp reads do fine up to about 25 bp insertions. Beyond that we stop getting all 30 events of each size, and we get a number of apparent false positives with sizes over 60 bp.

    grep -v "^#" simul_agilentV1_chr5_140211_simul_simul_none.aln.valid.hc.wholeGene.variantSites.qualAnnot.vcf | awk 'BEGIN{FS="\t"} length($5) != 1 {print length($5)}' | sort -n | uniq -c
    34 2
    32 3
    30 4
    32 5
    31 6
    30 7
    31 8
    33 9
    30 10
    32 11
    30 12
    30 13
    30 14
    30 15
    30 16
    30 17
    29 18
    31 19
    32 20
    28 21
    29 22
    28 23
    28 24
    30 25
    33 26
    27 27
    23 28
    26 29
    27 30
    24 31
    28 32
    32 33
    23 34
    23 35
    28 36
    28 37
    21 38
    26 39
    17 40
    21 41
    25 42
    26 43
    22 44
    24 45
    22 46
    16 47
    24 48
    19 49
    16 50
    18 51
    22 52
    18 53
    30 54
    14 55
    23 56
    23 57
    17 58
    19 59
    18 60
    26 61
    3 62
    1 63
    4 64
    1 65
    2 66
    2 69
    2 70
    2 71
    4 72
    1 74
    1 77
    3 78
    5 80
    4 82
    2 83
    1 86
    4 88
    4 90
    1 94
    1 95
    7 96
    1 98
    2 102
    1 103
    2 104
    1 105
    4 106
    1 110
    5 112
    2 114
    2 118
    1 120
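One caveat with tallying `length($5)`: at a multi-allelic site the ALT column holds a comma-separated list, so the measured length covers all alleles plus the commas. A per-allele tally avoids this. The sketch below is a hedged illustration in Python, not the method used for the numbers above; the function names are hypothetical, and it assumes tab-separated VCF lines and simply skips symbolic or missing alleles.

```python
from collections import Counter
from typing import Iterable, Tuple


def indel_sizes(ref: str, alt_field: str) -> Tuple[int, ...]:
    """Signed size per ALT allele: >0 insertion, <0 deletion, 0 same length."""
    sizes = []
    for alt in alt_field.split(","):
        if alt.startswith("<") or alt in ("*", "."):
            continue  # assumption: symbolic/missing alleles are ignored
        sizes.append(len(alt) - len(ref))
    return tuple(sizes)


def tally_indels(vcf_lines: Iterable[str]) -> Counter:
    """Count indel alleles by signed size across tab-separated VCF records."""
    counts: Counter = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # header or blank line
        fields = line.rstrip("\n").split("\t")
        ref, alt_field = fields[3], fields[4]
        for size in indel_sizes(ref, alt_field):
            if size != 0:
                counts[size] += 1
    return counts


if __name__ == "__main__":
    # Toy record: two ALT alleles inserting 2 and 3 bases after the anchor.
    record = "5\t100\t.\tT\tTAG,TAGA\t50\t.\t."
    print(len(record.split("\t")[4]))      # length of the whole ALT field
    print(dict(tally_indels([record])))    # per-allele insertion sizes
```

Because the anchor base is subtracted per allele, insertion sizes come out on the same 1 to 60 scale as the truth VCF.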
  • Geraldine_VdAuwera Posts: 5,840 Administrator, GATK Developer

    You flatter me, monsieur :)

    Interesting observations, thanks. For the most part this fits my expectations, although I'm a little surprised by the >60 bp insertion FPs.

    If you have a few spare cycles, could you possibly run this through the HC in the very latest version (3.1-1)?

    Geraldine Van der Auwera, PhD

  • rpoplin Posts: 121 GATK Developer

    I'm curious: if you set -activeRegionMaxSize to 3000 and run on just an interval around your large deletion, what happens? In principle I think the HaplotypeCaller should be able to call such events when the signal is so clear in the reads, but it isn't something we've really tried to do before. The activeRegionMaxSize parameter was put in there so that anyone who wanted to experiment with this could do so. We've restricted ourselves to events in the range of +/- ~100 bp as a trade-off against runtime, which grows when the haplotypes get so large.
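For anyone wanting to try this experiment, here is a minimal Python sketch that assembles such a GATK 3.x-style command line. The helper name `build_hc_command`, the file names, the jar path and the interval are all placeholders, not values from this thread; only the `-activeRegionMaxSize 3000` setting comes from the suggestion above.

```python
import shlex
from typing import List


def build_hc_command(reference: str, bam: str, interval: str, out_vcf: str,
                     active_region_max_size: int = 3000,
                     gatk_jar: str = "GenomeAnalysisTK.jar") -> List[str]:
    """Return the argument list for a HaplotypeCaller run on one interval."""
    return [
        "java", "-jar", gatk_jar,
        "-T", "HaplotypeCaller",
        "-R", reference,          # reference FASTA
        "-I", bam,                # aligned reads
        "-L", interval,           # restrict the run to the region of interest
        "-activeRegionMaxSize", str(active_region_max_size),
        "-o", out_vcf,
    ]


if __name__ == "__main__":
    # Placeholder inputs; substitute your own files and the deletion's interval.
    cmd = build_hc_command("reference.fasta", "sample.bam",
                           "5:1000000-1003000", "deletion_region.vcf")
    print(shlex.join(cmd))  # inspect first; run with subprocess.run(cmd, check=True)
```

Restricting the run with `-L` keeps the larger active region, and hence the runtime cost, confined to the one locus being tested.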

  • TimHughes Posts: 60 Member
    edited March 19

    I will give your suggestion of increasing -activeRegionMaxSize to 3000 a try.

    On the issue of FP insertions beyond a certain size: my stats above were too crude. I investigated these, and they are all cases where HC has called two alternative alleles when there is actually only one in the truth VCF. Because length($5) then measures both comma-separated alleles together, these sites show up with sizes over 60. This is a much lesser shortcoming than calling large FP insertions, but interesting nevertheless, since it occurs only beyond a certain size and only for insertions (no such issue for deletions, at least below 60 bp with 100 bp reads).

    grep -v "^#" simul_agilentV1_chr5_140211_simul_simul_none.aln.valid.hc.wholeGene.variantSites.vcf | awk 'BEGIN{FS="\t"} (length($4)!=1 || length($5)!=1){size=length($4)-length($5); if(size < -60){print $0}}'
    5 96350606 . T TAGATGTTGCAGCGTTGCTGTCGTGGAAAC,TAGATGTTGCAGCGTTGCTGTCGTGGAAACA 471.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=19;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=24.80 GT:AD:DP:GQ:PL 1/2:0,10,7:17:99:742,264,229,384,0,354
    5 96362276 . T TACCCCAGGATGGTGCTTAGCGACCTCACG,TACCCCAGGATGGTGCTTAGCGACCTCACGA 436.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=19;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=22.96 GT:AD:DP:GQ:PL 1/2:0,12,4:16:99:771,145,111,460,0,437
    5 96364082 . A AATAATATCCTAAAAAGTGTTGTGCGCGGC,AATAATATCCTAAAAAGTGTTGTGCGCGGCC 518.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=29;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=17.87 GT:AD:DP:GQ:PL 1/2:0,16,5:21:99:1169,189,122,567,0,508
    5 98208075 . T TAACAATTACCACTCAACTAACGCACGGGTC,TACAATTACCACTCAACTAACGCACGGGTCG 649.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=31;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=20.94 GT:AD:DP:GQ:PL 1/2:0,15,15:30:99:1192,571,766,634,0,956
    5 98217663 . A AATCGTCACTCTCCTTGAAGCGCAATAGTCC,AATCGTCACTCTCCTTGAAGCGCAATAGTCCC 485.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=33;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=14.70 GT:AD:DP:GQ:PL 1/2:0,14,8:22:99:1308,271,190,484,0,407
    5 100231334 . T TAGGTCCCTCGTGGTAGCAGCGACCCCAGATC,TGGTCCCTCGTGGTAGCAGCGACCCCAGATCG 683.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=25;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=27.33 GT:AD:DP:GQ:PL 1/2:0,14,11:25:99:1014,459,691,604,0,973
    5 101572552 . T TAGTGACAACTTACTTTCGCCTTTAGATTACA,TAGTGACAACTTACTTTCGCCTTTAGATTACAA 220.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=18;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=12.23 GT:AD:DP:GQ:PL 1/2:0,5,6:11:99:727,209,177,188,0,155
    5 102237057 . T TGTGTCTCGACCGCGGGTCTCCATGTCTTC,TGTGTCTCGACCGCGGGTCTCCATGTCTTCAGA 284.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=11;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=25.84 GT:AD:DP:GQ:PL 1/2:0,3,6:9:99:452,265,303,132,0,141
    5 102249651 . T TGGTGAAGTCTTAAACTCCTGAGTGGCGAG,TGGTGAAGTCTTAAACTCCTGAGTGGCGAGAGA 323.29 . AC=1,1;AF=0.500,0.500;AN=2;DP=15;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=21.55 GT:AD:DP:GQ:PL 1/2:0,2,8:10:92:606,344,410,96,0,92
    5 102262301 . G GACCATTATATTGTCTTACCAATGACCACCAGA,GCTAACCATTATATTGTCTTACCAATGACCACC 511.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=28;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=18.26 GT:AD:DP:GQ:PL 1/2:0,15,13:28:99:1155,519,789,664,0,1264
    5 102432213 . A AATTGCGTACAAGGATGGTGGTGACCCAGGATCC,AATTGCGTACAAGGATGGTGGTGACCCAGGATCCG 624.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=24;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=26.01 GT:AD:DP:GQ:PL 1/2:0,17,5:22:99:954,162,117,649,0,619
    5 102433275 . A AATAAACTGTAGCGCACCTTGTGTCGGAATGGCT,ATAAACTGTAGCGCACCTTGTGTCGGAATGGCTG 366.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=39;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=9.39 GT:AD:DP:GQ:PL 1/2:0,13,25:38:99:1542,978,1116,505,0,538
    5 102440223 . C CACCCGCTTCCGTGTCGGGAGGCTTATTTCGGAA,CACCCGCTTCCGTGTCGGGAGGCTTATTTCGGAAA 349.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=23;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=15.18 GT:AD:DP:GQ:PL 1/2:0,10,6:16:99:928,197,153,356,0,308
    5 102444252 . A AACAAGTGGGTGCAGAGTACCGTTACATGCGTGC,ACAAGTGGGTGCAGAGTACCGTTACATGCGTGCT 524.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=26;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=20.16 GT:AD:DP:GQ:PL 1/2:0,16,10:26:99:1045,385,483,666,0,974
    5 102474100 . T TATAAAGGGTTTGGGATAAATCACTGTGGAATGG,TTATAAAGGGTTTGGGATAAATCACTGTGGAATG 408.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=23;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=17.75 GT:AD:DP:GQ:PL 1/2:0,8,15:23:99:907,611,847,309,0,405
    5 102611601 . T TAGAGATTTCTCTGTAACACACGAATTCGCGGAGG,TGAGATTTCTCTGTAACACACGAATTCGCGGAGGC 328.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=15;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=21.88 GT:AD:DP:GQ:PL 1/2:0,8,6:14:99:561,244,346,346,0,581
    5 102890468 . T TAGCTGCGGCTCCGCATACTGGCATACCCTCAGGG,TAGCTGCGGCTCCGCATACTGGCATACCCTCAGGGC 502.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=24;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=20.92 GT:AD:DP:GQ:PL 1/2:0,12,6:18:99:910,228,184,465,0,430
    5 102891626 . G GAAATTATCTAGTCAACCGGTTTTGGGACCCGATA,GAATTATCTAGTCAACCGGTTTTGGGACCCGATAT 471.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=26;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=18.12 GT:AD:DP:GQ:PL 1/2:0,16,9:25:99:1020,358,429,662,0,955
    5 108382822 . A AACATCTATTATGTCAAAGGGGTATAAGGCGGACGG,AACATCTATTATGTCAAAGGGGTATAAGGCGGACGGC 379.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=19;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=19.96 GT:AD:DP:GQ:PL 1/2:0,9,6:15:99:754,221,182,347,0,309
    5 108516434 . T TGTGACCCAGAATCGTCCGGCCCGTGCTCAAGCGGC,TGTGACCCAGAATCGTCCGGCCCGTGCTCAAGCGGCG 402.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=17;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=23.66 GT:AD:DP:GQ:PL 1/2:0,9,6:15:99:689,223,195,342,0,319
    5 109152942 . A ACAGCTGAGTTCCGTCGCACACGATACTCGTTCT,ACAGCTGAGTTCCGTCGCACACGATACTCGTTCTAGC 486.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=19;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=25.59 GT:AD:DP:GQ:PL 1/2:0,5,9:14:99:721,403,497,195,0,219
    5 109155892 . T TCCCCGTTACTGCAACTAGGGCGTGTAAGCGATG,TCCCCGTTACTGCAACTAGGGCGTGTAAGCGATGAGC 721.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=25;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.57;MQ0=0;QD=28.85 GT:AD:DP:GQ:PL 1/2:0,4,16:20:99:1024,660,812,174,0,179
    5 109181546 . G GACCCCAGGCCCAAAGGGTTGAATGGTTTAAAAT,GACCCCAGGCCCAAAGGGTTGAATGGTTTAAAATAGC 390.34 . AC=1,1;AF=0.500,0.500;AN=2;DP=20;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=19.52 GT:AD:DP:GQ:PL 1/2:0,2,9:11:89:798,413,508,100,0,89
    5 110438015 . A AGACCTAGCACCTGAGCATAATATTCAGAACTATTTAC,AGACCTAGCACCTGAGCATAATATTCAGAACTATTTACT 228.49 . AC=1,1;AF=0.500,0.500;AN=2;DP=18;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=12.69 GT:AD:DP:GQ:PL 1/2:0,7,3:10:86:704,118,86,279,0,254
    5 110439436 . A AACCCCAACTATCACAACGGCTATTGGACTAGAGTGAC,AACCCCAACTATCACAACGGCTATTGGACTAGAGTGACT 329.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=17;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=19.36 GT:AD:DP:GQ:PL 1/2:0,8,5:13:99:695,187,157,308,0,284
    5 110440972 . G GAGAAACATGGGGTTCTAGCGTGTTCACCGACGCGTTA,GAGAAACATGGGGTTCTAGCGTGTTCACCGACGCGTTAA 199.22 . AC=1,1;AF=0.500,0.500;AN=2;DP=15;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=13.28 GT:AD:DP:GQ:PL 1/2:0,7,4:11:96:590,124,96,241,0,215
    5 110445915 . T TAGAACTGGCCTTCAATCCCGTGCGAGGCACGATTGAG,TGAACTGGCCTTCAATCCCGTGCGAGGCACGATTGAGC 244.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=15;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=16.28 GT:AD:DP:GQ:PL 1/2:0,8,7:15:99:615,284,359,346,0,527
    5 110446892 . A AAGGGAAGATATACTAAAACGGGATGAGGAATCCTTAG,ATAGGGAAGATATACTAAAACGGGATGAGGAATCCTTAG 298.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=20;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=14.91 GT:AD:DP:GQ:PL 1/2:0,7,5:12:99:756,200,163,281,0,247
    5 111066566 . T TATAATGGAGTGCAAACTTAGGTCGTCCCCAGCGCCCGC,TATAATGGAGTGCAAACTTAGGTCGTCCCCAGCGCCCGCC 326.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=15;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=48.17;MQ0=0;QD=21.75 GT:AD:DP:GQ:PL 1/2:0,10,4:14:99:593,127,105,350,0,333
    5 111071136 . C CATGTGATTTCTACTGGGCTGCTACAGAGTGGTGGGGTA,CTGTGATTTCTACTGGGCTGCTACAGAGTGGTGGGGTAT 466.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=24;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=57.01;MQ0=0;QD=19.42 GT:AD:DP:GQ:PL 1/2:0,17,7:24:99:940,247,370,710,0,1012
    5 111500661 . C CAGCTGGCCTGTTGGTTGCGATCGTATCAAATCGCTAAG,CGCTGGCCTGTTGGTTGCGATCGTATCAAATCGCTAAGG 312.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=24;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=13.01 GT:AD:DP:GQ:PL 1/2:0,14,10:24:99:873,323,372,520,0,787
    5 112090540 . T TAGCGCATTGGACAGAGGCTCTCCAGTTCTCGAATATGGG,TAGCGCATTGGACAGAGGCTCTCCAGTTCTCGAATATGGGG 279.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=23;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=12.14 GT:AD:DP:GQ:PL 1/2:0,9,7:16:99:826,208,169,260,0,222
    5 112111329 . A ATGCAGCGAATTGATAGCCTGCGGGACCTAAACTTGGCGT,ATGCAGCGAATTGATAGCCTGCGGGACCTAAACTTGGCGTT 279.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=16;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=17.45 GT:AD:DP:GQ:PL 1/2:0,8,6:14:99:608,188,157,263,0,234
    5 112136991 . A AAGGGCTTGGGGAGGACACGTCTCTTACAATATTTGTGAG,AAGGGCTTGGGGAGGACACGTCTCTTACAATATTTGTGAGG 138.34 . AC=1,1;AF=0.500,0.500;AN=2;DP=17;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=8.14 GT:AD:DP:GQ:PL 1/2:0,3,5:8:89:581,187,161,116,0,89
    5 112151211 . A AAAACAAAAGGATCAGCGTCTATCACGTATGTCCTTCGGG,AAAACAAAAGGATCAGCGTCTATCACGTATGTCCTTCGGGT 487.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=24;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.96;MQ0=0;QD=20.30 GT:AD:DP:GQ:PL 1/2:0,12,6:18:99:994,233,185,456,0,414
    5 112154642 . C CACCGTGCATTTATCAAAAATTGAAAGTCTAAACCCAAAG,CCCGTGCATTTATCAAAAATTGAAAGTCTAAACCCAAAGG 323.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=23;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=14.05 GT:AD:DP:GQ:PL 1/2:0,14,8:22:99:875,319,434,540,0,731
    5 112337055 . A ACATACAAGACGGTGGATAGAATCGTGGCTGAGACGTGAGG,ATCACATACAAGACGGTGGATAGAATCGTGGCTGAGACGTG 520.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=26;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.56;MQ0=0;QD=20.01 GT:AD:DP:GQ:PL 1/2:0,8,16:24:99:943,659,1134,341,0,624
    5 112337262 . A AACTTACCTGGCAACGAACCTAGGCATCTCGGTTGGTGAGG,ACAGACTTACCTGGCAACGAACCTAGGCATCTCGGTTGGTG 797.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=33;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.83;MQ0=0;QD=24.16 GT:AD:DP:GQ:PL 1/2:0,14,19:33:99:1357,836,1349,614,0,1115
    5 112349010 . T TAGCAATAGGTCGCGTCCCGCCAACTCCTAAGGGGACA,TAGCAATAGGTCGCGTCCCGCCAACTCCTAAGGGGACAAGG 496.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=17;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=29.19 GT:AD:DP:GQ:PL 1/2:0,6,8:14:99:640,349,438,259,0,308
    5 112362999 . T TCGAAGCAATGGGTACACCGAGATCTCGCTCCTACTGAAGG,TTGCCGAAGCAATGGGTACACCGAGATCTCGCTCCTACTGA 674.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=24;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=28.09 GT:AD:DP:GQ:PL 1/2:0,17,7:24:99:970,295,651,744,0,1445
    5 112379247 . C CTATTATGTCACATCGTCGGCCTAGTCTAATTTGTAAT,CTATTATGTCACATCGTCGGCCTAGTCTAATTTGTAATAGG 444.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=24;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=18.51 GT:AD:DP:GQ:PL 1/2:0,6,6:12:99:953,287,311,269,0,315
    5 112869998 . A AAACAATCTACACCTAAGGCTCAGAATTGGTTCTCCTGATTG,AAACAATCTACACCTAAGGCTCAGAATTGGTTCTCCTGATTGC 232.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=14;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=16.59 GT:AD:DP:GQ:PL 1/2:0,5,6:11:99:523,195,175,191,0,169
    5 114469551 . C CAGGTAGCTGGTTTAAACAACTATTTTCCAAGTACCCTCATTT,CAGGTAGCTGGTTTAAACAACTATTTTCCAAGTACCCTCATTTA 331.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=19;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=17.43 GT:AD:DP:GQ:PL 1/2:0,9,5:14:99:726,148,121,350,0,322
    5 114607231 . A ACAAGAAGCAAGTTCAAAAACATCAGGCTAGTGCGACCGGGCCT,AGCAAGAAGCAAGTTCAAAAACATCAGGCTAGTGCGACCGGGCC 244.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=30;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.64;MQ0=0;QD=8.14 GT:AD:DP:GQ:PL 1/2:0,14,16:30:99:1228,616,684,522,0,555
    5 114860030 . T TAAGTCGTTTCATTGTTCACACCTCGCCACTGATTACGCGGATA,TAAGTCGTTTCATTGTTCACACCTCGCCACTGATTACGCGGATAA 385.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=24;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.55;MQ0=0;QD=16.05 GT:AD:DP:GQ:PL 1/2:0,8,8:16:99:968,289,245,294,0,252
    5 114878649 . T TAACTTATTATCCGGTGCTGGCCCGTGAGAATGTCTCGTTTAGC,TAACTTATTATCCGGTGCTGGCCCGTGAGAATGTCTCGTTTAGCA 449.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=29;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.57;MQ0=0;QD=15.49 GT:AD:DP:GQ:PL 1/2:0,10,7:17:99:1141,260,207,403,0,354
    5 115319060 . G GCTTATCGCCCCTTTAAACGCCTTAATGCCCTTTCGTTACCC,GCTTATCGCCCCTTTAAACGCCTTAATGCCCTTTCGTTACCCAGT 574.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=19;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.19;MQ0=0;QD=30.22 GT:AD:DP:GQ:PL 1/2:0,6,10:16:99:761,444,551,242,0,274
    5 115335439 . C CCTTTGGCTTTGTCTCTTGCTTAATACTCCCGCCAGGAGTAA,CCTTTGGCTTTGTCTCTTGCTTAATACTCCCGCCAGGAGTAAAGT 486.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=16;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=30.39 GT:AD:DP:GQ:PL 1/2:0,4,10:14:99:640,434,554,164,0,187
    5 115336132 . A ACCCGAGCAGAACGTTAGGATGCCCCTCCAGCATGTAGTAGTAGT,ATGGCCCGAGCAGAACGTTAGGATGCCCCTCCAGCATGTAGTAGT 305.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=16;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=19.07 GT:AD:DP:GQ:PL 1/2:0,10,6:16:99:600,249,498,392,0,709
    5 115338540 . G GAGCATATTAACGAGGAAGTCCGCAGTGCACTCGGGCTTTAT,GAGCATATTAACGAGGAAGTCCGCAGTGCACTCGGGCTTTATAGT 428.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=17;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=25.19 GT:AD:DP:GQ:PL 1/2:0,5,7:12:99:704,314,354,226,0,252
    5 118280243 . C CTAGGTAGATGAACATTGCAGTCTTTATTTACAGTACCTTTGAACT,CTAGGTAGATGAACATTGCAGTCTTTATTTACAGTACCTTTGAACTA 179.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=16;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=11.20 GT:AD:DP:GQ:PL 1/2:0,5,6:11:99:659,192,160,163,0,131
    5 118556137 . T TTTAGATAAAGGATCATTACGGGCCCGGCTAAAAGAAGGCCGTCCTG,TTTAGATAAAGGATCATTACGGGCCCGGCTAAAAGAAGGCCGTCCTGA 322.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=22;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=14.65 GT:AD:DP:GQ:PL 1/2:0,8,6:14:99:832,207,175,290,0,259
    5 118556616 . G GAAAAACTCAGTCGATCGGGTCAGCACTTCCAGCGCCGTGCTGACAC,GAAAACTCAGTCGATCGGGTCAGCACTTCCAGCGCCGTGCTGACACT 369.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=28;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.61;MQ0=0;QD=13.19 GT:AD:DP:GQ:PL 1/2:0,13,15:28:99:1105,582,732,489,0,628
    5 118560404 . A AAGGGTTTGATTGGGAGATTCATATATCCGGGGATAACGATTTACAC,AGGGTTTGATTGGGAGATTCATATATCCGGGGATAACGATTTACACT 250.27 . AC=1,1;AF=0.500,0.500;AN=2;DP=28;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=8.94 GT:AD:DP:GQ:PL 1/2:0,15,13:28:99:1139,482,509,575,0,748
    5 118866940 . T TAGACCCAACCACTAATTCGAATTTCTACAACCAACCGCAATGTGAGA,TAGACCCAACCACTAATTCGAATTTCTACAACCAACCGCAATGTGAGAA 198.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=13;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=58.65;MQ0=0;QD=15.25 GT:AD:DP:GQ:PL 1/2:0,5,5:10:99:499,161,153,173,0,158
    5 121761047 . C CGTGAACAAAGCGTATATCAAATTATCTATGGTGCGTCTGAGTGCGATA,CTCCGTGAACAAAGCGTATATCAAATTATCTATGGTGCGTCTGAGTGCG 648.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=21;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=57.60;MQ0=0;QD=30.87 GT:AD:DP:GQ:PL 1/2:0,14,7:21:99:845,296,548,610,0,1341
    5 121767690 . G GTGACTGGACGAACAATGCATGTCCCGATTGCCACCTTTTCCTATT,GTGACTGGACGAACAATGCATGTCCCGATTGCCACCTTTTCCTATTATA 422.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=17;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=24.83 GT:AD:DP:GQ:PL 1/2:0,5,7:12:99:632,311,348,223,0,243
    5 121780253 . G GATAAGAGGGCTGGTCGGTAACACTCTGTCCCTTCGTAGTGCATTCATA,GTAAATAAGAGGGCTGGTCGGTAACACTCTGTCCCTTCGTAGTGCATTC 666.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=25;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=26.65 GT:AD:DP:GQ:PL 1/2:0,15,9:24:99:956,377,683,648,0,1405
    5 122281747 . T TAAGGCGCTAAGAAGGTTCTATCGTCGATATCCTATGAACCGGAACTATA,TAAGGCGCTAAGAAGGTTCTATCGTCGATATCCTATGAACCGGAACTATAC 317.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=23;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=13.79 GT:AD:DP:GQ:PL 1/2:0,8,5:13:99:889,182,145,319,0,284
    5 122361481 . T TACATATACGCCCACGAGCAAGCAGGTGCCCCAACGTGTGATACATCAAG,TACATATACGCCCACGAGCAAGCAGGTGCCCCAACGTGTGATACATCAAGG 388.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=23;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=16.88 GT:AD:DP:GQ:PL 1/2:0,8,8:16:99:956,298,250,297,0,250
    5 122924133 . A AAGGTGCTCGTTAGGTAGTTCTTCTTAATTATTGTGCGACCCACAACCGGC,AGGTGCTCGTTAGGTAGTTCTTCTTAATTATTGTGCGACCCACAACCGGCT 609.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=36;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.02;MQ0=0;QD=16.92 GT:AD:DP:GQ:PL 1/2:0,17,19:36:99:1432,789,1086,637,0,832
    5 125822562 . C CACGTGCGATCGGGATAGCAGGCCTGTTCGAAATAGCTTGGCAGCTAATATA,CACGTGCGATCGGGATAGCAGGCCTGTTCGAAATAGCTTGGCAGCTAATATAT 261.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=16;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=16.32 GT:AD:DP:GQ:PL 1/2:0,6,5:11:99:651,188,164,231,0,209
    5 125828598 . T TAGTTGGAAGAATGTGATCAGTGTACAGCATGGGGACCCTAGTGTCGCACTC,TAGTTGGAAGAATGTGATCAGTGTACAGCATGGGGACCCTAGTGTCGCACTCA 458.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=27;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.69;MQ0=0;QD=16.97 GT:AD:DP:GQ:PL 1/2:0,7,13:20:99:1046,442,396,226,0,174
    5 125880653 . A ATCAACAAACACTGGCCTTACTGTTGTAGGTGCAGTTTATTAAGCGTTCTGC,ATCAACAAACACTGGCCTTACTGTTGTAGGTGCAGTTTATTAAGCGTTCTGCC 276.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=27;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=57.48;MQ0=0;QD=10.23 GT:AD:DP:GQ:PL 1/2:0,5,9:14:99:1087,316,258,189,0,130
    5 125885627 . C CGGATTGACCAAACAGCGCGGGCGGCCGTAAGTCGAGGGCGACACCGGGTTG,CTGGATTGACCAAACAGCGCGGGCGGCCGTAAGTCGAGGGCGACACCGGGTTG 633.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=27;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=57.26;MQ0=0;QD=23.45 GT:AD:DP:GQ:PL 1/2:0,5,17:22:99:1058,643,609,183,0,136
    5 125885906 . C CTAAGCGGGATCCAGATCCTTATACCTACCTTGATAATGGACGTAAGGCTAG,CTAGCGGGATCCAGATCCTTATACCTACCTTGATAATGGACGTAAGGCTAGT 243.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=27;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=58.95;MQ0=0;QD=9.01 GT:AD:DP:GQ:PL 1/2:0,8,18:26:99:1060,693,806,316,0,406
    5 126781117 . C CCCGATCGGGCTGGACGGAAGGTAAAGCAAGTCCAAGCCAAGCAACGAATTGCA,CACCGATCGGGCTGGACGGAAGGTAAAGCAAGTCCAAGCCAAGCAACGAATTGCA 328.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=19;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=58.89;MQ0=0;QD=17.27 GT:AD:DP:GQ:PL 1/2:0,5,8:13:99:671,280,290,182,0,160
    5 126790250 . T TAAGCGAAAGTCTCGATGCCGTTTACCGTTCGCCCTAATATCACGTCTGACACA,TAAGCGAAAGTCTCGATGCCGTTTACCGTTCGCCCTAATATCACGTCTGACACAC 140.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=12;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=11.68 GT:AD:DP:GQ:PL 1/2:0,4,4:8:99:493,155,135,137,0,117
    5 126791106 . T TATGGGGCCAAAGTGCGGGTGTTCGGAACACCAATTATTCACCTGGCTAACAAC,TATGGGGCCAAAGTGCGGGTGTTCGGAACACCAATTATTCACCTGGCTAACAACC 207.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=26;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.19;MQ0=0;QD=7.97 GT:AD:DP:GQ:PL 1/2:0,6,5:11:99:985,174,126,239,0,193
    5 127484392 . C CATAGTGATATTCCTTTAATAATAGTCAGGGCGTTAGTTGGATAAGTCTTCCTAT,CATAGTGATATTCCTTTAATAATAGTCAGGGCGTTAGTTGGATAAGTCTTCCTATT 564.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=31;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=57.94;MQ0=0;QD=18.20 GT:AD:DP:GQ:PL 1/2:0,14,8:22:99:1159,275,224,497,0,452
    5 127488417 . T TCCAATAAACGAGGTCCTAAAAATGCCTGCAGTGTTAATGTTCCGGAAGACCGAA,TCCAATAAACGAGGTCCTAAAAATGCCTGCAGTGTTAATGTTCCGGAAGACCGAAG 323.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=24;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=13.47 GT:AD:DP:GQ:PL 1/2:0,7,6:13:99:1007,245,197,284,0,238
    5 127503476 . G GACAGAGGTTCATTTCAGAAGCAAACCGGGGAGGCAATGGCCCTAAGGGATAAGA,GACAGAGGTTCATTTCAGAAGCAAACCGGGGAGGCAATGGCCCTAAGGGATAAGAA 315.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=21;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=58.36;MQ0=0;QD=15.01 GT:AD:DP:GQ:PL 1/2:0,6,9:15:99:838,311,275,195,0,152
    5 127507387 . G GAATAGAGTGGTTGCGAAATATCTTGCGTTTCCAAATTATCACGTCCTAATCTGC,GAATAGAGTGGTTGCGAAATATCTTGCGTTTCCAAATTATCACGTCCTAATCTGCC 204.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=20;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=10.21 GT:AD:DP:GQ:PL 1/2:0,5,6:11:99:804,203,162,196,0,154
    5 127627167 . A ACCAACCATATGCGAACACCTCTTCTCGATAGTAGGGATTTGGAGAAATGCGCCAT,ACCAACCATATGCGAACACCTCTTCTCGATAGTAGGGATTTGGAGAAATGCGCCATG 169.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=19;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=58.74;MQ0=0;QD=8.90 GT:AD:DP:GQ:PL 1/2:0,5,4:9:99:735,137,120,176,0,161
    5 127637048 . A AAGGGTGATGACTAAGGCTAACACTATCCTAGAACCTCGAAAAAGTGGTCCCCGCT,AAGGGTGATGACTAAGGCTAACACTATCCTAGAACCTCGAAAAAGTGGTCCCCGCTT 333.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=23;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=58.95;MQ0=0;QD=14.49 GT:AD:DP:GQ:PL 1/2:0,7,9:16:99:871,265,230,251,0,215
    5 127640619 . T TCAAAGATTGGGCAAATGATTCGTGGTGTATATTATCACATTACGACCATCCCCTG,TCAAAGATTGGGCAAATGATTCGTGGTGTATATTATCACATTACGACCATCCCCTGA 284.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=19;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=58.73;MQ0=0;QD=14.96 GT:AD:DP:GQ:PL 1/2:0,8,4:12:99:779,144,112,310,0,284
    5 127641176 . C CATTTCTCGTCGTCCTACTTCTCGCTTTGTGCGCACGTGCTCAGTATTAACCATAA,CATTTCTCGTCGTCCTACTTCTCGCTTTGTGCGCACGTGCTCAGTATTAACCATAAT 359.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=22;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=57.81;MQ0=0;QD=16.33 GT:AD:DP:GQ:PL 1/2:0,10,4:14:99:895,152,119,379,0,352
    5 127697364 . A AGATCATGGGTACCTTGCACGATGCGTGGGAGTTGGTCAGTTCATGTAATTAGG,AGATCATGGGTACCTTGCACGATGCGTGGGAGTTGGTCAGTTCATGTAATTAGGATG 577.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=23;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=25.10 GT:AD:DP:GQ:PL 1/2:0,7,9:16:99:921,388,456,301,0,355
    5 127704860 . G GGCAGAACCATCTCGTTGTCAAGGTTCCATCTGAATTCCACCACTAAGGCTTGC,GGCAGAACCATCTCGTTGTCAAGGTTCCATCTGAATTCCACCACTAAGGCTTGCGAT 783.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=28;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=58.95;MQ0=0;QD=27.97 GT:AD:DP:GQ:PL 1/2:0,10,11:21:99:1118,481,551,414,0,472
    5 127712395 . T TATGTAATCATTTACTTTAGTTCAAAACGACGAGCCAGCTAGATCGATTCGGGC,TATGTAATCATTTACTTTAGTTCAAAACGACGAGCCAGCTAGATCGATTCGGGCATG 474.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=21;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=22.58 GT:AD:DP:GQ:PL 1/2:0,6,8:14:99:818,355,425,231,0,269
    5 128430383 . A ATATTTAGGCGTTGCTACCTCGACGGGCCGCCCTCTCCAATTTTCGGAAGAATGCCGC,ATATTTAGGCGTTGCTACCTCGACGGGCCGCCCTCTCCAATTTTCGGAAGAATGCCGCC 389.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=22;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.87;MQ0=0;QD=17.69 GT:AD:DP:GQ:PL 1/2:0,10,6:16:99:859,207,175,355,0,326
    5 128448530 . T TTATTCACTAATCCCATTGTTCCTCCCGCAAGTTGCAGCTCAGGCAGAATACCTTGTA,TTATTCACTAATCCCATTGTTCCTCCCGCAAGTTGCAGCTCAGGCAGAATACCTTGTAG 297.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=22;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=13.51 GT:AD:DP:GQ:PL 1/2:0,4,8:12:99:910,318,284,163,0,125
    5 129243737 . G GAAGGTCAGTAGATTTTATATCCATAGCGCAAGCTCCGGTTACATAGATTCGACGAACT,GAAGGTCAGTAGATTTTATATCCATAGCGCAAGCTCCGGTTACATAGATTCGACGAACTT 200.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=15;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.44;MQ0=0;QD=13.35 GT:AD:DP:GQ:PL 1/2:0,6,4:10:99:584,129,119,202,0,194
    5 130825237 . A ATTCAGCAAGCATGTTGGGCGGTTGCATCCAACACTCTTACAGTGTGCCTCATTGTGGCG,ATTCAGCAAGCATGTTGGGCGGTTGCATCCAACACTCTTACAGTGTGCCTCATTGTGGCGT 180.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=22;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.02;MQ0=0;QD=8.19 GT:AD:DP:GQ:PL 1/2:0,5,4:9:99:876,170,134,194,0,158
    5 130828262 . G GAAGGCAATATCGAGTACGCCCGCGGATCTAGGGTTCTAACACCGTTGAGATGCAGAAAT,GAAGGCAATATCGAGTACGCCCGCGGATCTAGGGTTCTAACACCGTTGAGATGCAGAAATA 265.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=18;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.68;MQ0=0;QD=14.73 GT:AD:DP:GQ:PL 1/2:0,5,7:12:99:684,226,210,185,0,167
    5 130831245 . C CAGATAGTACACATCGTCACTGCTATCCCATCGTATCGGGGCGAGTCCCCGGCCGGTGAG,CAGATAGTACACATCGTCACTGCTATCCCATCGTATCGGGGCGAGTCCCCGGCCGGTGAGT 245.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=19;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=57.69;MQ0=0;QD=12.90 GT:AD:DP:GQ:PL 1/2:0,8,5:13:99:719,137,115,269,0,242
    5 130846006 . C CAACCTCTGGGTCACTCACCGAGAATGGGTCTGAGTCGTGACTGTAATTGGTGCGCTTGT,CAACCTCTGGGTCACTCACCGAGAATGGGTCTGAGTCGTGACTGTAATTGGTGCGCTTGTT 234.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=13;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=55.11;MQ0=0;QD=18.01 GT:AD:DP:GQ:PL 1/2:0,6,5:11:99:518,172,154,209,0,192
    5 131066577 . A AAGTCGTGGTTATTGCTCACGGTGCCGACCGCGCGCCAGGAGTAGGTGTCCCCCCATG,AAGTCGTGGTTATTGCTCACGGTGCCGACCGCGCGCCAGGAGTAGGTGTCCCCCCATGATT 315.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=17;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=56.69;MQ0=0;QD=18.54 GT:AD:DP:GQ:PL 1/2:0,4,7:11:99:691,272,298,156,0,160
    5 131296234 . A AGCACCCTTGGCTGTAGGAGCAATGCTCTTTAATCTTACAGCCGACTGAAATAGGGGC,AGCACCCTTGGCTGTAGGAGCAATGCTCTTTAATCTTACAGCCGACTGAAATAGGGGCATT 165.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=13;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=56.31;MQ0=0;QD=12.71 GT:AD:DP:GQ:PL 1/2:0,3,3:6:99:535,138,162,139,0,164
    5 131298210 . G GGGTTTTGGAATGGAACCGTATCCTCAGCAGACTCTTATTTGCATCCTCCTGATAGTC,GGGTTTTGGAATGGAACCGTATCCTCAGCAGACTCTTATTTGCATCCTCCTGATAGTCATT 398.19 . AC=1,1;AF=0.500,0.500;AN=2;DP=16;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=24.89 GT:AD:DP:GQ:PL 1/2:0,4,8:12:99:603,344,408,166,0,185
  • TimHughes Posts: 60 Member

    I took a further look at these cases listed above.

    My simulated FASTQ reads are designed to simulate an exome capture, and some of the insertions and deletions are placed in the center of an exon, whereas others are placed near the edge, where one cannot expect an even number of reads from each strand.

    It turns out that almost all the cases above, where we have two alternative alleles for insertions, occur when the simulated insertion was placed near the edge of the exon.

    image

    Zoomed out

    image

    So obviously this is not a very general situation (a long insertion placed near the edge of an exon), but it is not insignificant for exome capture, I suppose.
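    A minimal sketch (not part of GATK) of how one could flag such sites in a VCF: it reports records with two or more insertion alleles where one ALT is a prefix of another, which is the pattern in the records listed above. The function names and the shortened toy record are made up for illustration.

```python
def parse_site(line):
    """Split one VCF data line into (chrom, pos, ref, list of ALT alleles)."""
    fields = line.split()
    chrom, pos, _id, ref, alt = fields[:5]
    return chrom, int(pos), ref, alt.split(",")


def has_nested_insertions(ref, alts):
    """True if two insertion alleles exist and one is a prefix of the other."""
    # Keep only alleles longer than REF, shortest first.
    insertions = sorted((a for a in alts if len(a) > len(ref)), key=len)
    for i, shorter in enumerate(insertions):
        for longer in insertions[i + 1:]:
            if longer.startswith(shorter):
                return True
    return False


# Toy record (hypothetical, shortened): the second ALT extends the first by one base.
toy = "5 100 . T TCAAAG,TCAAAGA 284.19 . AC=1,1 GT:AD 1/2:0,8,4"
chrom, pos, ref, alts = parse_site(toy)
print(chrom, pos, has_nested_insertions(ref, alts))
```

    Running this over each data line of the VCF and cross-checking the flagged positions against the simulated insertion sites would show how often the pattern coincides with exon edges.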

  • TimHughes Posts: 60 Member

    I suppose all this could be driven by an inaccuracy in the details of my simulation....

  • TimHughes Posts: 60 Member

    I tried -kmerSize 20 -minPruning 10 --forceActive --activeRegionMaxSize 6000 while restricting HC to just the region with the deletion. It ran for about an hour but then reported only the SNPs, not the big deletion.

    The report also says "Ran local assembly on 2 active regions", which I guess means one on either side of the deletion....?

    /Users/tim/home/proj_tim_pharmGen/pharmGenRepo/code/vc_singleSample_withHC.bash *.valid.dedup.bam bigDeletion.list 
    INFO 13:12:01,110 HelpFormatter - --------------------------------------------------------------------------------
    INFO 13:12:01,125 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.7-4-g6f46d11, Compiled 2013/10/10 17:27:51
    INFO 13:12:01,125 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 13:12:01,125 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO 13:12:01,130 HelpFormatter - Program Args: -T HaplotypeCaller -R /Users/tim/home/PLATFORM/draftNewRefData/dataDistro_r01_d01_LocalCopy/b37/genomic/gatkBundle_2.5/human_g1k_v37_decoy.fasta --dbsnp /Users/tim/home/PLATFORM/draftNewRefData/dataDistro_r01_d01_LocalCopy/b37/genomic/gatkBundle_2.5/dbsnp_137.b37.excluding_sites_after_129.vcf --genotyping_mode DISCOVERY -stand_emit_conf 10 -stand_call_conf 30 --downsampling_type BY_SAMPLE --downsample_to_coverage 250 --intervals bigDeletion.list --validation_strictness LENIENT -kmerSize 20 -minPruning 10 --forceActive --activeRegionMaxSize 6000 -I Hughes-MiSeqExcap-Lib1-907_140211_M01132_0066_L001.aln.valid.dedup.bam --out Hughes-MiSeqExcap-Lib1-907_140211_M01132_0066_L001.aln.valid.dedup.hc.wholeGene.variantSites.vcf -nct 4
    INFO 13:12:01,131 HelpFormatter - Date/Time: 2014/03/20 13:12:01
    INFO 13:12:01,131 HelpFormatter - --------------------------------------------------------------------------------
    INFO 13:12:01,131 HelpFormatter - --------------------------------------------------------------------------------
    INFO 13:12:01,178 ArgumentTypeDescriptor - Dynamically determined type of /Users/tim/home/PLATFORM/draftNewRefData/dataDistro_r01_d01_LocalCopy/b37/genomic/gatkBundle_2.5/dbsnp_137.b37.excluding_sites_after_129.vcf to be VCF
    INFO 13:12:02,659 GenomeAnalysisEngine - Strictness is LENIENT
    INFO 13:12:02,828 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 250
    INFO 13:12:02,841 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO 13:12:02,893 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.03
    INFO 13:12:02,948 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
    INFO 13:12:03,018 RMDTrackBuilder - Loading Tribble index from disk for file /Users/tim/home/PLATFORM/draftNewRefData/dataDistro_r01_d01_LocalCopy/b37/genomic/gatkBundle_2.5/dbsnp_137.b37.excluding_sites_after_129.vcf
    INFO 13:12:03,449 IntervalUtils - Processing 7001 bp from intervals
    INFO 13:12:03,482 MicroScheduler - Running the GATK in parallel mode with 4 total threads, 4 CPU thread(s) for each of 1 data thread(s), of 16 processors available on this machine
    INFO 13:12:03,817 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
    INFO 13:12:03,950 GenomeAnalysisEngine - Done preparing for traversal
    INFO 13:12:03,951 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO 13:12:03,951 ProgressMeter - Location processed.active regions runtime per.1M.active regions completed total.runtime remaining
    INFO 13:12:04,296 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
    INFO 13:12:34,652 ProgressMeter - 2:234636000 0.00e+00 30.0 s 49.6 w 100.0% 30.0 s 0.0 s
    INFO 13:13:05,255 ProgressMeter - 2:234636000 0.00e+00 61.0 s 101.4 w 100.0% 61.0 s 0.0 s
    .....................................
    .....................................
    .....................................
    INFO 14:10:06,224 ProgressMeter - 2:234636000 0.00e+00 58.0 m 5757.7 w 100.0% 58.0 m 0.0 s
    INFO 14:10:36,228 ProgressMeter - 2:234636000 0.00e+00 58.5 m 5807.3 w 100.0% 58.5 m 0.0 s
    INFO 14:10:39,283 HaplotypeCaller - Ran local assembly on 2 active regions
    INFO 14:11:06,232 ProgressMeter - 2:234636000 0.00e+00 59.0 m 5856.9 w 100.0% 59.0 m 0.0 s
    INFO 14:11:14,831 ProgressMeter - done 7.00e+03 59.2 m 5.9 d 100.0% 59.2 m 0.0 s
    INFO 14:11:14,831 ProgressMeter - Total runtime 3550.88 secs, 59.18 min, 0.99 hours
    INFO 14:11:14,832 MicroScheduler - 34 reads were filtered out during the traversal out of approximately 1745 total reads (1.95%)
    INFO 14:11:14,832 MicroScheduler - -> 25 reads (1.43% of total) failing DuplicateReadFilter
    INFO 14:11:14,832 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
    INFO 14:11:14,832 MicroScheduler - -> 9 reads (0.52% of total) failing HCMappingQualityFilter
    INFO 14:11:14,833 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
    INFO 14:11:14,833 MicroScheduler - -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
    INFO 14:11:14,833 MicroScheduler - -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter
    INFO 14:11:14,833 MicroScheduler - -> 0 reads (0.00% of total) failing UnmappedReadFilter
    INFO 14:11:57,686 GATKRunReport - Uploaded run statistics report to AWS S3
  • Geraldine_VdAuwera Posts: 5,840 Administrator, GATK Developer admin

    Try using -bamOut -bamWriterType ALL_POSSIBLE_HAPLOTYPES to have HC output the haplotypes it's considering.

    Geraldine Van der Auwera, PhD

  • TimHughes Posts: 60 Member

    With the following call (which took almost two hours to run), where the -nct 4 was removed so that a BAM file of haplotypes could be produced, the large deletion is detected. It would seem that removing the -nct 4 is what did it. However, the zygosity seems to be incorrect: from the alignment it seems like the deletion is homozygous, but it is called heterozygous. You will see from the screenshot below with the haplotypes that it seems like a few reads that are mapping into the deleted region are causing this.....?

    java -jar /Users/tim/home/PLATFORM/softwareRepo/swRepo_r01/install/gatk/GenomeAnalysisTK-3.1-1/GenomeAnalysisTK.jar -T HaplotypeCaller -R /Users/tim/home/PLATFORM/draftNewRefData/dataDistro_r01_d01_LocalCopy/b37/genomic/gatkBundle_2.5/human_g1k_v37_decoy.fasta --dbsnp /Users/tim/home/PLATFORM/draftNewRefData/dataDistro_r01_d01_LocalCopy/b37/genomic/gatkBundle_2.5/dbsnp_137.b37.excluding_sites_after_129.vcf --genotyping_mode DISCOVERY -stand_emit_conf 10 -stand_call_conf 30 --downsampling_type BY_SAMPLE --downsample_to_coverage 250 --intervals bigDeletion.list --validation_strictness LENIENT -kmerSize 20 -minPruning 10 --forceActive --activeRegionMaxSize 6000 -I Hughes-MiSeqExcap-Lib1-907_140211_M01132_0066_L001.aln.valid.dedup.bam --out Hughes-MiSeqExcap-Lib1-907_140211_M01132_0066_L001.aln.valid.dedup.hc.wholeGene.variantSites.vcf --bamOutput assembledHaplotypes.bam -bamWriterType ALL_POSSIBLE_HAPLOTYPES

    The resulting VCF looks like this:

    #CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  Hughes-MiSeqExcap-Lib1-907
    2 234629239 rs12468543 A G 1482.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.547;ClippingRankSum=-0.163;DB;DP=39;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=-0.961;QD=28.99;ReadPosRankSum=0.192 GT:AD:DP:GQ:PL 0/1:16,22:38:99:1511,0,17676
    2 234629585 rs28899186 G T 838.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.322;ClippingRankSum=0.684;DB;DP=8;FS=7.068;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=-0.684;QD=28.49;ReadPosRankSum=0.322 GT:AD:DP:GQ:PL 0/1:5,3:8:99:867,0,17365
    2 234630186 rs10929293 A T 725.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-1.135;ClippingRankSum=0.477;DB;DP=64;FS=3.351;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=-0.047;QD=11.34;ReadPosRankSum=-0.222 GT:AD:DP:GQ:PL 0/1:32,32:64:99:754,0,18185
    2 234630443 rs7597496 A G 834.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=3.393;ClippingRankSum=-1.084;DB;DP=62;FS=2.133;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=-1.126;QD=13.46;ReadPosRankSum=0.507 GT:AD:DP:GQ:PL 0/1:31,31:62:99:863,0,20065
    2 234630503 rs12475934 G A 1015.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=1.548;ClippingRankSum=-0.323;DB;DP=56;FS=1.008;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=-0.902;QD=18.14;ReadPosRankSum=0.621 GT:AD:DP:GQ:PL 0/1:24,32:56:99:1044,0,17949
    2 234631477 . TAATAAGAATGTTTCTTTTTTTTTTTTTTGAAGGAAAAAATAAATTTATTGCTCATTAAGTGGAAGTGGATCATCATAAAGGTCTTCATCCTCATTGTCTCCATGCTGAGTGGGCTGAGGAGGAGGAGGAAGGGGAGGGATTGGTCCTGCTGTCTCAGCGTGGCAGAGGCAGAAGAGGATGAGAAGGTGGAAGGGCCAGCGGGAGAGGCAGGCACACTCGGTGTAACTTTATGGAAATATATCATCATTTTTGTTTGACTTTTTTCCTTTCTCATTTCTCTGAAAATGTTTCTGTACAGTACCAATTCTTCTTCCACCATTTGCTTTAGTTTCAGTGCCCAAACCATAGAAGGGTCCATGTGGTAAAAAAAGTCAAAACTGACTTTTTTTTTTTTTTTTGAGTTGGAGTCCTGCTGTCACCCAGGCTGGAGTGCAATGGCACGATGTTGGCTCACTGCAACCTCTGCCTCCCAGGTTCAAGCAATTCTCCTGTCTCAGCCTCACAAGTAGCTAGGACTACAGGCACACGTCACCACACCTGGCTAATTTTTGTACTTTTAGTAGAGATGGGGTTTCACCATACTGGTCAGGCTGGTCTCGAACTCCTGACCTCAGGTGATCCACCCGCCTCAGCCTCCCAAAGTGCTAGGATTCCAGGTGTGAGCCACTGCACCTGGTCAACAATCTTTTTTTTTTTTTTTTTTTTAATTTATTTTTTTATTGATAATTCTTGGGTGTTTCTCACAGAGGGGGATTTGGCAGGGTCATGGGACAATAGTGGAGGGAAGGTCAGCAGATAAACAAGTGAACAAAGGTCTCTGGTTTTCCCAGGCAGAGGACCCTGCGGCCTTCCGCAGTGTTTGTGTCCCTGATTACTTGAGATTAGGGATTGGTGATGACTCTTAACGAGCATGCTGCCTTCAAGCATCTGTTTAACAAAGCACATCTTGCACCGCCCTTAATCCATTTAACCCTGAGTGGACACAGCACATGTTTCAGAGAGCACAGGGTTGGGGGTAAGGTCACAGATCAACAGGATCCCAAGGCAGAGGAATTTTTCTTAGTGCAGAACAAAATGAAAAGTCTCCCATGTCTACTTCTTTCTACACAGACACGGCAACCATCCGATTTCTCAATCTTTTCCCCACCTTTCCCGCCTTTCTATTCCACAAAGCCGCCATTGTCATCCTGGCCCGTTCTCAATGAGCTGTTGGGCACACCTCCCAGACGGGGTGGTGGCCGGGCAGAGGGGCTCCTCACTTCCCAGTAGGGGCGGCCGGGCAGAGGCGCCCCTCACCTCCCGGACGGGGCGGCTGGCCGGGTGGGGGGGCTGACCCCCCCATCTCCCTCCCGGACGGGGTGGCTGGCCGGGCTGAGGGGCTCCTCACTTCCCAGTAGGGGCGGCCGGGCAGAGGCGCCCCTCACCTCCCGGACGGGGCGGCTGGCCGGGCGGGGGGCTGACCCCCCCACCTCCCTCCCGGACGGGGCGGCTGGCCAGGCGGGGGGCTGACCCCCCCCACCTCCCTCCCGGACGGGGTGGCTGCCGGGCGGAGACGCTCCTCACTTCCCAGATGGGGTGGCTGCCGGGCGGAGAGGCTCCTCACTTCTCAGACAGGGCAGCTGCCGGGCGGAGGGGCTCCTCACTTCTCAGACGGGGCGGCCGGGCAGAGACGCTCCTCACCTCCCAGATGGGGTCTCGCCGGGCAGAGGCGCTCCTCACATCCCAGATGGGGCGGCGGGGCAGAGGCGCTCCCCACATCTCAGACGATGGGCGGCCGGGCAGAGACGCTCCTCACTTCCTAGATGTGATGGCGGCTGGGAAGAGGCGCTCCTCACTTCCTAGATGGGATGGCGGCCGGGTGAAGACGCTCCTCGCTTTCCAGACTGGGCAGCCAGGCAGAGGGGCTCCTCACATCCCAGACGATGGGCGGCCAGGCAGAGACACTCCTCACTTCCCAGACGGGGTGGCGGCCGGGCAGAGG
CTGCAATCTCGGCACTTTGGGAGGCCAAGGCAGGCGGCTGGGAGGTGTAGGTTGTAGTGAGCGGAGATCACGCCACTGCACTCCAGCCTGGGCACCATTGAGCACTGAGTGAACGAGACTCCGTCTGCAATCCCGGCACCTCGGGAGGCCGAGGTTGGCGGATCACTCGCGGTTAGGGGCTGGAGACCGGCCCGGCCAAACAGCAAAACCCGGTCTCCACCAAAACCAGTCAGGCGTGGCGGCGCGCGCCTGCAATCGCAGGCACTCGGCAGGCTGAGGCAGGAGAATCAGGCAGGGAGGTTGCAGTGAGCCGAGATGGCAGCAGTACAGTCCAGCTTCGGCTCTGCATGAGAGGGAGACCGTGGGGAGAGGCAGAGGCAGAGGCAGAGGCAGAGGCAGAGGCAGAGGAGGCAGAGGCAGAGGAGGCAGAGGCAGAGGAGGCAGAGGCAGAGGAGGCAGAGGCAGAGGCAGAGGCAGAGGCAGAGGCAGAGGCAGAGGCAGAGGCGCCTGGTCAACAATCTTAAGTCC T 10293.73 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.694;ClippingRankSum=0.118;DP=89;FS=1.142;MLEAC=1;MLEAF=0.500;MQ=59.32;MQ0=0;MQRankSum=2.258;QD=31.96;ReadPosRankSum=1.235 GT:AD:DP:GQ:PL 0/1:13,75:88:99:10331,0,16052
    2 234634324 rs28899189 A C 1157.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=2.399;ClippingRankSum=-1.027;DB;DP=81;FS=5.697;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=0.923;QD=14.29;ReadPosRankSum=1.415 GT:AD:DP:GQ:PL 0/1:43,38:81:99:1186,0,18446
    2 234634639 rs28899191 G C 394.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-0.054;ClippingRankSum=0.406;DB;DP=41;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=-0.298;QD=9.63;ReadPosRankSum=0.785 GT:AD:DP:GQ:PL 0/1:21,19:40:99:423,0,20483
    2 234634916 rs6711351 A G 342.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.310;ClippingRankSum=-0.152;DB;DP=67;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=-0.033;QD=5.12;ReadPosRankSum=0.429 GT:AD:DP:GQ:PL 0/1:44,23:67:99:371,0,21854
    2 234635241 rs6715325 T C 836.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=1.471;ClippingRankSum=0.113;DB;DP=65;FS=3.363;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=0.166;QD=12.87;ReadPosRankSum=-0.020 GT:AD:DP:GQ:PL 0/1:38,27:65:99:865,0,1247
    2 234635367 rs17864697 T C 1205.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-0.213;ClippingRankSum=1.383;DB;DP=88;FS=0.806;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=-0.547;QD=13.70;ReadPosRankSum=-1.391 GT:AD:DP:GQ:PL 0/1:46,42:88:99:1234,0,1505
    2 234635467 rs4294999 A G 1228.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=1.223;ClippingRankSum=0.614;DB;DP=84;FS=0.824;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=-0.837;QD=14.63;ReadPosRankSum=0.040 GT:AD:DP:GQ:PL 0/1:44,40:84:99:1257,0,1376

    image

  • Geraldine_VdAuwera Posts: 5,840 Administrator, GATK Developer admin

    Hmm, that's unexpected -- multithreading shouldn't have such an impact on the calls. I'd expect to see marginal differences due to downsampling effect, but this is not marginal. I'll ask @rpoplin to comment on this.

    Geraldine Van der Auwera, PhD

  • rpoplin Posts: 121 GATK Developer mod

    Sorry to say that I've gotten a little lost in this thread. Can you please post the output from the command both with and without -nct 4 so that I can help debug it?

    Thanks! Quite awesome that you can get such a large event called.
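    For comparing the two runs, a minimal sketch (plain Python, not a GATK tool) that lists the sites present in one VCF but not the other. The function names are made up for illustration, and the short deletion record in the toy data is hypothetical, standing in for the real multi-kb one.

```python
def site_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) for every non-header VCF line."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt = line.split()[:5]
        keys.add((chrom, int(pos), ref, alt))
    return keys


def compare_calls(lines_a, lines_b):
    """Return (sites only in A, sites only in B), each sorted by position."""
    a, b = site_keys(lines_a), site_keys(lines_b)
    return sorted(a - b), sorted(b - a)


# Toy input: the run with -nct 4 has only the SNP, the run without also has a deletion.
with_nct = ["#CHROM POS ID REF ALT", "2 234629239 rs12468543 A G"]
without_nct = with_nct + ["2 234631477 . TAAT T"]  # hypothetical short deletion
only_with, only_without = compare_calls(with_nct, without_nct)
print(only_with)
print(only_without)
```

    With real output, the two lists of lines would come from opening the two VCF files, for example `compare_calls(open(path_a), open(path_b))` with your own paths.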

  • TimHughes Posts: 60 Member

    Yes, awesome that such large events are called.

    I am not sure there is anything really to debug here. I think the situation I set up, with a limited interval of about 6 kb, a deletion of 3 kb, and --activeRegionMaxSize 6000, was probably bound to prevent -nct 4 from working properly.

    A more interesting observation here is maybe the fact that the genotype is called heterozygous when from the reads it seems homozygous: it would seem that a few reads mapping into the deletion cause this. I wonder whether this could be a general "shortcoming" of the HC when deletions exceed a certain size. Would increasing the pruning value resolve this?
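    To put numbers on this, a minimal sketch that reads the sample column of the deletion record above (GT:AD:DP:GQ:PL = 0/1:13,75:88:99:10331,0,16052) and reports the fraction of reads supporting the deletion and the genotype with the lowest PL. The helper names are made up for illustration; the PL order 0/0, 0/1, 1/1 is the standard one for a biallelic site.

```python
def sample_fields(format_str, sample_str):
    """Pair FORMAT keys with the sample's values."""
    return dict(zip(format_str.split(":"), sample_str.split(":")))


def alt_fraction(ad):
    """Fraction of AD-counted reads that support a non-reference allele."""
    depths = [int(x) for x in ad.split(",")]
    total = sum(depths)
    return sum(depths[1:]) / total if total else float("nan")


def best_genotype_biallelic(pl):
    """Genotype with the smallest PL (0 marks the most likely one)."""
    labels = ["0/0", "0/1", "1/1"]
    values = [int(x) for x in pl.split(",")]
    return labels[values.index(min(values))]


# Sample column of the deletion record at 2:234631477 from the VCF above.
fields = sample_fields("GT:AD:DP:GQ:PL", "0/1:13,75:88:99:10331,0,16052")
print(round(alt_fraction(fields["AD"]), 3))   # 0.852
print(best_genotype_biallelic(fields["PL"]))  # 0/1
```

    So 75 of 88 counted reads support the deletion, yet the 13 reference-supporting reads are enough to push the 1/1 likelihood far below 0/1 (PL 16052 versus 0), which fits the idea that a few reads mapping into the deleted region drive the heterozygous call.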

  • Geraldine_VdAuwera Posts: 5,840 Administrator, GATK Developer admin

    Yes, awesome that such large events are called.

    Indeed, sorry for my incorrect answer earlier on! My info was outdated; good thing Ryan jumped in :)

    Geraldine Van der Auwera, PhD
