HaplotypeCaller and detection of large indels

TimHughes (Posts: 60; Member)

Hi,

I am wondering about the detection of large indels with the HaplotypeCaller. I have an example where, to my mind, there is quite clearly a large deletion (a couple of kb) in the sample, but it is not called by the HaplotypeCaller.

How do I modify the parameters of HC to detect large indels? activeRegionMaxSize? indelSizeToEliminateInRefModel?


Tim.

[Attachment: Screen Shot 2014-03-18 at 12.28.30.png (1918 x 1030)]

Answers

  • Geraldine_VdAuwera (Posts: 6,192; Administrator, GATK Developer)

    Hi Tim,

    The problem here is that your deletion is too big for HaplotypeCaller. As a rule of thumb, it can call indels up to half the length of the reads. I think at the size you're looking at, this would be considered a structural variant. If your data is whole-genome you may be able to call it with GenomeSTRiP.

    Geraldine Van der Auwera, PhD

  • TimHughes (Posts: 60; Member)
    edited March 18

    Hi Geraldine,

    OK about the deletion being too big, but why does the HaplotypeCaller have this kind of dependence on read length? I thought this was a thing of the past ;)

    With the UG, I understand the dependency on read length: because we are aligning the reads to the reference, you need enough read sequence beyond the deletion to favour opening a gap rather than allowing mismatches. And there was always an asymmetry between deletions (detectable up to about half the read length) and insertions (more like a quarter of the read length, because with an insertion part of the read is always non-reference sequence).

    But with HC, I don't quite see how read length comes into the picture. Longer reads make it easier to assemble the haplotypes unambiguously, but even with short reads it should be possible to assemble the haplotypes, and it is from the haplotypes that one calls the variants. So why, and how, does read length affect the size of an indel that can be detected?

    Any help is much appreciated :)

    Tim

  • Geraldine_VdAuwera (Posts: 6,192; Administrator, GATK Developer)

    As far as I know (but @ebanks may jump in to correct me) the problem is that the HC needs to see reads that span the entire indel to distinguish between real indels vs. lack of coverage...


  • TimHughes (Posts: 60; Member)
    edited March 18

    Hmmm, so HC cannot exploit a situation like the one I posted above, where the mapper has not opened a gap in individual reads spanning the deletion because of the deletion's size, but has "correctly" soft-clipped these reads and has correctly mapped, on either side of the deletion, any read pairs that span it? I had been thinking that the soft clipping and the abnormal insert size would trigger the HC to attempt to assemble haplotypes over the whole region and then compare the long haplotypes to the reference.

    Sounds like this makes the HC more sensitive than I thought to the read lengths and to the quality of the alignments it is fed. I was under the impression that as long as reads were mapped correctly, the haplotype assembly would isolate variant calling from alignment issues (in particular alignment around indels)?

  • Geraldine_VdAuwera (Posts: 6,192; Administrator, GATK Developer)

    It's not really an alignment quality issue, it's just that it's missing a crucial piece of information (the presence of spanning reads) to distinguish between a legitimate deletion and lack of coverage in the intervening region. If I had to speculate I would say you could solve this by adding long reads from e.g. PacBio data...


  • TimHughes (Posts: 60; Member)

    I would have to disagree there: the information is there, in that lots of reads contain the deletion, but the aligner will not open the gap (because of the size of the indel, the mapper prefers mismatches, which it soft-clips).

    Is there no chance of improving sensitivity to long indels with any of the HC parameters, like activeRegionMaxSize? I am desperate here ;) The data above is from a small target with 300 bp PE reads from long fragments (>600 bp), and most CNV software requires whole-genome data and usually multiple samples as well.

    Just want to understand more of the inner workings of the HC and, hopefully in the process, increase sensitivity to larger indels :)

    Tim.

  • Geraldine_VdAuwera (Posts: 6,192; Administrator, GATK Developer)

    Increasing the active region size will work up to a point but this is just too big for what HC can currently do with the reads you have, sorry. I wish I could give you the answer you want but I don't have magical powers (sadly).


  • TimHughes (Posts: 60; Member)

    I have seen you display magical powers many times on this forum :)

    I might have a go with some simulations to see where the indel size limit lies for the HC and how it correlates with read length.

  • TimHughes (Posts: 60; Member)
    edited March 19

    Here is some data which is rather crude, but might be useful to others.

    I generated a truth VCF containing both insertions and deletions:
    * Deletions (1800 in total): sizes from 1 to 60 bp (20 HET and 10 HOM at each size)
    * Insertions (1800 in total): sizes from 1 to 60 bp (20 HET and 10 HOM at each size)

    I then generated simulated reads (given the truth VCF) with dwgsim: 100 bp PE, average coverage 20X.

    Analyse simulated reads:
    * Map reads with bwa mem
    * No refinement
    * Variant call with HC version v2.7-4-g6f46d11

    For the deletions (the first column is the count, which should be 30, and the second column is the length of the REF allele, i.e. the deletion size plus one anchor base), it seems that HC will detect deletions at least up to 60 bp with 100 bp reads.

    grep -v "^#" simul_agilentV1_chr5_140211_simul_simul_none.aln.valid.hc.wholeGene.variantSites.qualAnnot.vcf | awk -F'\t' 'length($4) != 1 {print length($4)}' | sort -n | uniq -c
    28 2
    30 3
    30 4
    30 5
    30 6
    30 7
    30 8
    29 9
    30 10
    29 11
    30 12
    30 13
    29 14
    30 15
    30 16
    30 17
    28 18
    30 19
    30 20
    30 21
    30 22
    30 23
    30 24
    30 25
    30 26
    29 27
    29 28
    30 29
    30 30
    30 31
    29 32
    30 33
    30 34
    30 35
    30 36
    29 37
    30 38
    30 39
    30 40
    30 41
    29 42
    29 43
    29 44
    30 45
    30 46
    30 47
    30 48
    30 49
    30 50
    29 51
    30 52
    30 53
    30 54
    19 55
    2 56
    10 57
    30 58
    30 59
    30 60
    30 61
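
A note on reading the table above: `length($4)` is the length of the REF allele, which in a VCF deletion record includes one anchor base, so the rows labelled 2 to 61 correspond to deletions of 1 to 60 bp. A minimal sketch of the same tally that reports the deleted length directly (REF length minus ALT length) might look like the following. The function name `deletion_size_counts` and the toy records are hypothetical, and the sketch assumes tab-separated, bi-allelic records.

```shell
# Hypothetical helper: tally deletions by deleted length (REF length minus ALT length).
# Reads VCF text on stdin; assumes tab-separated records and skips multi-allelic ones.
deletion_size_counts() {
    awk -F'\t' '
        /^#/ { next }                          # skip header lines
        $5 ~ /,/ { next }                      # skip multi-allelic records in this sketch
        length($4) > length($5) {              # REF longer than ALT means a deletion
            print length($4) - length($5)
        }' | sort -n | uniq -c | awk '{ print $1 "\t" $2 }'   # normalise to count<TAB>size
}

# Toy input (made-up records, not taken from the simulation above)
printf '##fileformat=VCFv4.1\n5\t100\t.\tTA\tT\t50\t.\t.\n5\t200\t.\tGCC\tG\t50\t.\t.\n5\t300\t.\tACC\tA\t50\t.\t.\n5\t400\t.\tA\tAT\t50\t.\t.\n' | deletion_size_counts
```

On the toy input this reports one deletion of 1 bp and two of 2 bp; the insertion record is ignored.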
    

    For the insertions (the first column is the count, which should be 30, and the second column is the length of the ALT field, i.e. the insertion size plus one anchor base), it seems that 100 bp reads do fine up to insertions of about 25 bp; beyond that we no longer get all 30 events of each size, and we also get a number of false positives with sizes over 60 bp.

    grep -v "^#" simul_agilentV1_chr5_140211_simul_simul_none.aln.valid.hc.wholeGene.variantSites.qualAnnot.vcf | awk -F'\t' 'length($5) != 1 {print length($5)}' | sort -n | uniq -c
    34 2
    32 3
    30 4
    32 5
    31 6
    30 7
    31 8
    33 9
    30 10
    32 11
    30 12
    30 13
    30 14
    30 15
    30 16
    30 17
    29 18
    31 19
    32 20
    28 21
    29 22
    28 23
    28 24
    30 25
    33 26
    27 27
    23 28
    26 29
    27 30
    24 31
    28 32
    32 33
    23 34
    23 35
    28 36
    28 37
    21 38
    26 39
    17 40
    21 41
    25 42
    26 43
    22 44
    24 45
    22 46
    16 47
    24 48
    19 49
    16 50
    18 51
    22 52
    18 53
    30 54
    14 55
    23 56
    23 57
    17 58
    19 59
    18 60
    26 61
    3 62
    1 63
    4 64
    1 65
    2 66
    2 69
    2 70
    2 71
    4 72
    1 74
    1 77
    3 78
    5 80
    4 82
    2 83
    1 86
    4 88
    4 90
    1 94
    1 95
    7 96
    1 98
    2 102
    1 103
    2 104
    1 105
    4 106
    1 110
    5 112
    2 114
    2 118
    1 120
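
To make the departures from the expected 30 calls per size easier to spot, the `uniq -c` output can be piped through a small filter. This is only a sketch: the function name `flag_unexpected` and its default of 30 are illustrative choices, and it expects `count value` pairs on stdin like the tables above.

```shell
# Hypothetical helper: print rows of `uniq -c` output whose count differs from
# the expected number of events per size (30 in the simulation described above).
flag_unexpected() {
    expected="${1:-30}"
    awk -v expected="$expected" '
        $1 != expected {
            printf "%s\t%s\t%+d\n", $2, $1, $1 - expected   # value, count, difference
        }'
}

# A few rows copied from the insertion table above
printf '30 25\n33 26\n27 27\n3 62\n' | flag_unexpected 30
```

For these four rows it prints the values 26, 27 and 62 with differences of +3, -3 and -27; the row with exactly 30 calls is suppressed.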
    
  • Geraldine_VdAuwera (Posts: 6,192; Administrator, GATK Developer)

    You flatter me, monsieur :)

    Interesting observations, thanks. For the most part this fits my expectations, although I'm a little surprised by the >60 bp insertion FPs.

    If you have a few spare cycles, could you possibly run this through the HC in the very latest version (3.1-1)?


  • rpoplin (Posts: 122; GATK Developer, moderator)

    I'm curious: if you set -activeRegionMaxSize to 3000 and run on just an interval around your large deletion, what happens? In principle I think the HaplotypeCaller should be able to call such events when the signal is so clear in the reads, but it isn't something we've really tried to do before. The activeRegionMaxSize parameter was put in there so that anyone who wanted to experiment with this could do so. We've restricted ourselves to events in the range of +/- ~100 bp as a trade-off against runtime, which grows when the haplotypes get so large.
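
For anyone wanting to try this, a dry-run sketch of such a command in GATK 3.x style syntax is shown below. The jar location, reference, BAM, output name and the interval are all placeholders (the coordinates of the deletion are not given in this thread), and the function only prints the command so that it can be reviewed before running.

```shell
# Hypothetical wrapper: print (do not run) a HaplotypeCaller command restricted to
# one interval, with a larger maximum active region size.
# All file names and the interval are placeholders, not values from this thread.
hc_region_cmd() {
    interval="$1"            # e.g. chr:start-end around the suspected deletion
    max_size="${2:-3000}"    # value suggested in the post above
    printf '%s\n' "java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R reference.fasta -I sample.bam -L $interval -activeRegionMaxSize $max_size -o region.hc.vcf"
}

# Placeholder interval, for illustration only
hc_region_cmd "5:1000000-1010000"
```

Printing rather than executing keeps the sketch safe to paste; the printed line can be copied into a terminal once the placeholders have been replaced.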

  • TimHughes (Posts: 60; Member)
    edited March 19

    I will give your suggestion of increasing -activeRegionMaxSize to 3000 a try.

    On the issue of FP insertions beyond a certain size: my stats above were too crude. I investigated what these are, and they are all cases where HC has called two alternative alleles when there is actually only one in the truth VCF. That is a much lesser shortcoming than calling large FP insertions, but interesting nevertheless, since it occurs only beyond a certain size and only for insertions (no such issue for deletions, at least below 60 bp in 100 bp reads).
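
One way to see why these records showed up as very long insertions in the earlier tally: `length($5)` measures the whole ALT column, so a record with two comma-separated alleles of 30 and 31 characters is counted once with length 62 (30, plus 1 for the comma, plus 31). A minimal sketch that splits ALT on commas and reports each allele's inserted length separately could look like the following. The function name `insertion_sizes_per_allele` is hypothetical, and the demo record is the first multi-allelic record listed further down in this post, with the INFO and genotype columns shortened to `.`.

```shell
# Hypothetical helper: print one line per ALT allele with its inserted length
# (ALT allele length minus REF length), so multi-allelic records are not
# mistaken for a single long insertion. Reads tab-separated VCF text on stdin.
insertion_sizes_per_allele() {
    awk -F'\t' '
        /^#/ { next }                              # skip header lines
        {
            n = split($5, alt, ",")                # one entry per ALT allele
            for (i = 1; i <= n; i++) {
                size = length(alt[i]) - length($4)
                if (size > 0) {
                    print $1 "\t" $2 "\t" size     # chrom, pos, inserted length
                }
            }
        }'
}

printf '5\t96350606\t.\tT\tTAGATGTTGCAGCGTTGCTGTCGTGGAAAC,TAGATGTTGCAGCGTTGCTGTCGTGGAAACA\t471.19\t.\t.\n' | insertion_sizes_per_allele
```

For this record it prints inserted lengths of 29 and 30 bp, one line per allele, rather than a single entry of length 62.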

    grep -v "^#" simul_agilentV1_chr5_140211_simul_simul_none.aln.valid.hc.wholeGene.variantSites.vcf | awk -F'\t' '(length($4) != 1 || length($5) != 1) {size = length($4) - length($5); if (size < -60) {print $0}}'
    5   96350606    .   T   TAGATGTTGCAGCGTTGCTGTCGTGGAAAC,TAGATGTTGCAGCGTTGCTGTCGTGGAAACA  471.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=19;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=24.80   GT:AD:DP:GQ:PL  1/2:0,10,7:17:99:742,264,229,384,0,354
    5   96362276    .   T   TACCCCAGGATGGTGCTTAGCGACCTCACG,TACCCCAGGATGGTGCTTAGCGACCTCACGA  436.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=19;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=22.96   GT:AD:DP:GQ:PL  1/2:0,12,4:16:99:771,145,111,460,0,437
    5   96364082    .   A   AATAATATCCTAAAAAGTGTTGTGCGCGGC,AATAATATCCTAAAAAGTGTTGTGCGCGGCC  518.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=29;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=17.87   GT:AD:DP:GQ:PL  1/2:0,16,5:21:99:1169,189,122,567,0,508
    5   98208075    .   T   TAACAATTACCACTCAACTAACGCACGGGTC,TACAATTACCACTCAACTAACGCACGGGTCG 649.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=31;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=20.94   GT:AD:DP:GQ:PL  1/2:0,15,15:30:99:1192,571,766,634,0,956
    5   98217663    .   A   AATCGTCACTCTCCTTGAAGCGCAATAGTCC,AATCGTCACTCTCCTTGAAGCGCAATAGTCCC    485.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=33;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=14.70   GT:AD:DP:GQ:PL  1/2:0,14,8:22:99:1308,271,190,484,0,407
    5   100231334   .   T   TAGGTCCCTCGTGGTAGCAGCGACCCCAGATC,TGGTCCCTCGTGGTAGCAGCGACCCCAGATCG   683.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=25;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=27.33   GT:AD:DP:GQ:PL  1/2:0,14,11:25:99:1014,459,691,604,0,973
    5   101572552   .   T   TAGTGACAACTTACTTTCGCCTTTAGATTACA,TAGTGACAACTTACTTTCGCCTTTAGATTACAA  220.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=18;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=12.23   GT:AD:DP:GQ:PL  1/2:0,5,6:11:99:727,209,177,188,0,155
    5   102237057   .   T   TGTGTCTCGACCGCGGGTCTCCATGTCTTC,TGTGTCTCGACCGCGGGTCTCCATGTCTTCAGA    284.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=11;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=25.84   GT:AD:DP:GQ:PL  1/2:0,3,6:9:99:452,265,303,132,0,141
    5   102249651   .   T   TGGTGAAGTCTTAAACTCCTGAGTGGCGAG,TGGTGAAGTCTTAAACTCCTGAGTGGCGAGAGA    323.29  .   AC=1,1;AF=0.500,0.500;AN=2;DP=15;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=21.55   GT:AD:DP:GQ:PL  1/2:0,2,8:10:92:606,344,410,96,0,92
    5   102262301   .   G   GACCATTATATTGTCTTACCAATGACCACCAGA,GCTAACCATTATATTGTCTTACCAATGACCACC 511.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=28;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=18.26   GT:AD:DP:GQ:PL  1/2:0,15,13:28:99:1155,519,789,664,0,1264
    5   102432213   .   A   AATTGCGTACAAGGATGGTGGTGACCCAGGATCC,AATTGCGTACAAGGATGGTGGTGACCCAGGATCCG  624.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=24;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=26.01   GT:AD:DP:GQ:PL  1/2:0,17,5:22:99:954,162,117,649,0,619
    5   102433275   .   A   AATAAACTGTAGCGCACCTTGTGTCGGAATGGCT,ATAAACTGTAGCGCACCTTGTGTCGGAATGGCTG   366.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=39;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=9.39    GT:AD:DP:GQ:PL  1/2:0,13,25:38:99:1542,978,1116,505,0,538
    5   102440223   .   C   CACCCGCTTCCGTGTCGGGAGGCTTATTTCGGAA,CACCCGCTTCCGTGTCGGGAGGCTTATTTCGGAAA  349.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=23;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=15.18   GT:AD:DP:GQ:PL  1/2:0,10,6:16:99:928,197,153,356,0,308
    5   102444252   .   A   AACAAGTGGGTGCAGAGTACCGTTACATGCGTGC,ACAAGTGGGTGCAGAGTACCGTTACATGCGTGCT   524.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=26;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=20.16   GT:AD:DP:GQ:PL  1/2:0,16,10:26:99:1045,385,483,666,0,974
    5   102474100   .   T   TATAAAGGGTTTGGGATAAATCACTGTGGAATGG,TTATAAAGGGTTTGGGATAAATCACTGTGGAATG   408.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=23;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=17.75   GT:AD:DP:GQ:PL  1/2:0,8,15:23:99:907,611,847,309,0,405
    5   102611601   .   T   TAGAGATTTCTCTGTAACACACGAATTCGCGGAGG,TGAGATTTCTCTGTAACACACGAATTCGCGGAGGC 328.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=15;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=21.88   GT:AD:DP:GQ:PL  1/2:0,8,6:14:99:561,244,346,346,0,581
    5   102890468   .   T   TAGCTGCGGCTCCGCATACTGGCATACCCTCAGGG,TAGCTGCGGCTCCGCATACTGGCATACCCTCAGGGC    502.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=24;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=20.92   GT:AD:DP:GQ:PL  1/2:0,12,6:18:99:910,228,184,465,0,430
    5   102891626   .   G   GAAATTATCTAGTCAACCGGTTTTGGGACCCGATA,GAATTATCTAGTCAACCGGTTTTGGGACCCGATAT 471.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=26;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=18.12   GT:AD:DP:GQ:PL  1/2:0,16,9:25:99:1020,358,429,662,0,955
    5   108382822   .   A   AACATCTATTATGTCAAAGGGGTATAAGGCGGACGG,AACATCTATTATGTCAAAGGGGTATAAGGCGGACGGC  379.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=19;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=19.96   GT:AD:DP:GQ:PL  1/2:0,9,6:15:99:754,221,182,347,0,309
    5   108516434   .   T   TGTGACCCAGAATCGTCCGGCCCGTGCTCAAGCGGC,TGTGACCCAGAATCGTCCGGCCCGTGCTCAAGCGGCG  402.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=17;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=23.66   GT:AD:DP:GQ:PL  1/2:0,9,6:15:99:689,223,195,342,0,319
    5   109152942   .   A   ACAGCTGAGTTCCGTCGCACACGATACTCGTTCT,ACAGCTGAGTTCCGTCGCACACGATACTCGTTCTAGC    486.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=19;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=25.59   GT:AD:DP:GQ:PL  1/2:0,5,9:14:99:721,403,497,195,0,219
    5   109155892   .   T   TCCCCGTTACTGCAACTAGGGCGTGTAAGCGATG,TCCCCGTTACTGCAACTAGGGCGTGTAAGCGATGAGC    721.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=25;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.57;MQ0=0;QD=28.85   GT:AD:DP:GQ:PL  1/2:0,4,16:20:99:1024,660,812,174,0,179
    5   109181546   .   G   GACCCCAGGCCCAAAGGGTTGAATGGTTTAAAAT,GACCCCAGGCCCAAAGGGTTGAATGGTTTAAAATAGC    390.34  .   AC=1,1;AF=0.500,0.500;AN=2;DP=20;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=19.52   GT:AD:DP:GQ:PL  1/2:0,2,9:11:89:798,413,508,100,0,89
    5   110438015   .   A   AGACCTAGCACCTGAGCATAATATTCAGAACTATTTAC,AGACCTAGCACCTGAGCATAATATTCAGAACTATTTACT  228.49  .   AC=1,1;AF=0.500,0.500;AN=2;DP=18;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=12.69   GT:AD:DP:GQ:PL  1/2:0,7,3:10:86:704,118,86,279,0,254
    5   110439436   .   A   AACCCCAACTATCACAACGGCTATTGGACTAGAGTGAC,AACCCCAACTATCACAACGGCTATTGGACTAGAGTGACT  329.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=17;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=19.36   GT:AD:DP:GQ:PL  1/2:0,8,5:13:99:695,187,157,308,0,284
    5   110440972   .   G   GAGAAACATGGGGTTCTAGCGTGTTCACCGACGCGTTA,GAGAAACATGGGGTTCTAGCGTGTTCACCGACGCGTTAA  199.22  .   AC=1,1;AF=0.500,0.500;AN=2;DP=15;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=13.28   GT:AD:DP:GQ:PL  1/2:0,7,4:11:96:590,124,96,241,0,215
    5   110445915   .   T   TAGAACTGGCCTTCAATCCCGTGCGAGGCACGATTGAG,TGAACTGGCCTTCAATCCCGTGCGAGGCACGATTGAGC   244.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=15;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=16.28   GT:AD:DP:GQ:PL  1/2:0,8,7:15:99:615,284,359,346,0,527
    5   110446892   .   A   AAGGGAAGATATACTAAAACGGGATGAGGAATCCTTAG,ATAGGGAAGATATACTAAAACGGGATGAGGAATCCTTAG  298.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=20;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=14.91   GT:AD:DP:GQ:PL  1/2:0,7,5:12:99:756,200,163,281,0,247
    5   111066566   .   T   TATAATGGAGTGCAAACTTAGGTCGTCCCCAGCGCCCGC,TATAATGGAGTGCAAACTTAGGTCGTCCCCAGCGCCCGCC    326.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=15;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=48.17;MQ0=0;QD=21.75   GT:AD:DP:GQ:PL  1/2:0,10,4:14:99:593,127,105,350,0,333
    5   111071136   .   C   CATGTGATTTCTACTGGGCTGCTACAGAGTGGTGGGGTA,CTGTGATTTCTACTGGGCTGCTACAGAGTGGTGGGGTAT 466.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=24;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=57.01;MQ0=0;QD=19.42   GT:AD:DP:GQ:PL  1/2:0,17,7:24:99:940,247,370,710,0,1012
    5   111500661   .   C   CAGCTGGCCTGTTGGTTGCGATCGTATCAAATCGCTAAG,CGCTGGCCTGTTGGTTGCGATCGTATCAAATCGCTAAGG 312.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=24;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=13.01   GT:AD:DP:GQ:PL  1/2:0,14,10:24:99:873,323,372,520,0,787
    5   112090540   .   T   TAGCGCATTGGACAGAGGCTCTCCAGTTCTCGAATATGGG,TAGCGCATTGGACAGAGGCTCTCCAGTTCTCGAATATGGGG  279.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=23;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=12.14   GT:AD:DP:GQ:PL  1/2:0,9,7:16:99:826,208,169,260,0,222
    5   112111329   .   A   ATGCAGCGAATTGATAGCCTGCGGGACCTAAACTTGGCGT,ATGCAGCGAATTGATAGCCTGCGGGACCTAAACTTGGCGTT  279.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=16;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=17.45   GT:AD:DP:GQ:PL  1/2:0,8,6:14:99:608,188,157,263,0,234
    5   112136991   .   A   AAGGGCTTGGGGAGGACACGTCTCTTACAATATTTGTGAG,AAGGGCTTGGGGAGGACACGTCTCTTACAATATTTGTGAGG  138.34  .   AC=1,1;AF=0.500,0.500;AN=2;DP=17;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=8.14    GT:AD:DP:GQ:PL  1/2:0,3,5:8:89:581,187,161,116,0,89
    5   112151211   .   A   AAAACAAAAGGATCAGCGTCTATCACGTATGTCCTTCGGG,AAAACAAAAGGATCAGCGTCTATCACGTATGTCCTTCGGGT  487.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=24;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.96;MQ0=0;QD=20.30   GT:AD:DP:GQ:PL  1/2:0,12,6:18:99:994,233,185,456,0,414
    5   112154642   .   C   CACCGTGCATTTATCAAAAATTGAAAGTCTAAACCCAAAG,CCCGTGCATTTATCAAAAATTGAAAGTCTAAACCCAAAGG   323.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=23;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=14.05   GT:AD:DP:GQ:PL  1/2:0,14,8:22:99:875,319,434,540,0,731
    5   112337055   .   A   ACATACAAGACGGTGGATAGAATCGTGGCTGAGACGTGAGG,ATCACATACAAGACGGTGGATAGAATCGTGGCTGAGACGTG 520.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=26;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.56;MQ0=0;QD=20.01   GT:AD:DP:GQ:PL  1/2:0,8,16:24:99:943,659,1134,341,0,624
    5   112337262   .   A   AACTTACCTGGCAACGAACCTAGGCATCTCGGTTGGTGAGG,ACAGACTTACCTGGCAACGAACCTAGGCATCTCGGTTGGTG 797.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=33;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.83;MQ0=0;QD=24.16   GT:AD:DP:GQ:PL  1/2:0,14,19:33:99:1357,836,1349,614,0,1115
    5   112349010   .   T   TAGCAATAGGTCGCGTCCCGCCAACTCCTAAGGGGACA,TAGCAATAGGTCGCGTCCCGCCAACTCCTAAGGGGACAAGG    496.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=17;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=29.19   GT:AD:DP:GQ:PL  1/2:0,6,8:14:99:640,349,438,259,0,308
    5   112362999   .   T   TCGAAGCAATGGGTACACCGAGATCTCGCTCCTACTGAAGG,TTGCCGAAGCAATGGGTACACCGAGATCTCGCTCCTACTGA 674.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=24;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=28.09   GT:AD:DP:GQ:PL  1/2:0,17,7:24:99:970,295,651,744,0,1445
    5   112379247   .   C   CTATTATGTCACATCGTCGGCCTAGTCTAATTTGTAAT,CTATTATGTCACATCGTCGGCCTAGTCTAATTTGTAATAGG    444.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=24;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=18.51   GT:AD:DP:GQ:PL  1/2:0,6,6:12:99:953,287,311,269,0,315
    5   112869998   .   A   AAACAATCTACACCTAAGGCTCAGAATTGGTTCTCCTGATTG,AAACAATCTACACCTAAGGCTCAGAATTGGTTCTCCTGATTGC  232.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=14;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=16.59   GT:AD:DP:GQ:PL  1/2:0,5,6:11:99:523,195,175,191,0,169
    5   114469551   .   C   CAGGTAGCTGGTTTAAACAACTATTTTCCAAGTACCCTCATTT,CAGGTAGCTGGTTTAAACAACTATTTTCCAAGTACCCTCATTTA    331.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=19;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=17.43   GT:AD:DP:GQ:PL  1/2:0,9,5:14:99:726,148,121,350,0,322
    5   114607231   .   A   ACAAGAAGCAAGTTCAAAAACATCAGGCTAGTGCGACCGGGCCT,AGCAAGAAGCAAGTTCAAAAACATCAGGCTAGTGCGACCGGGCC   244.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=30;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.64;MQ0=0;QD=8.14    GT:AD:DP:GQ:PL  1/2:0,14,16:30:99:1228,616,684,522,0,555
    5   114860030   .   T   TAAGTCGTTTCATTGTTCACACCTCGCCACTGATTACGCGGATA,TAAGTCGTTTCATTGTTCACACCTCGCCACTGATTACGCGGATAA  385.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=24;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.55;MQ0=0;QD=16.05   GT:AD:DP:GQ:PL  1/2:0,8,8:16:99:968,289,245,294,0,252
    5   114878649   .   T   TAACTTATTATCCGGTGCTGGCCCGTGAGAATGTCTCGTTTAGC,TAACTTATTATCCGGTGCTGGCCCGTGAGAATGTCTCGTTTAGCA  449.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=29;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.57;MQ0=0;QD=15.49   GT:AD:DP:GQ:PL  1/2:0,10,7:17:99:1141,260,207,403,0,354
    5   115319060   .   G   GCTTATCGCCCCTTTAAACGCCTTAATGCCCTTTCGTTACCC,GCTTATCGCCCCTTTAAACGCCTTAATGCCCTTTCGTTACCCAGT    574.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=19;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.19;MQ0=0;QD=30.22   GT:AD:DP:GQ:PL  1/2:0,6,10:16:99:761,444,551,242,0,274
    5   115335439   .   C   CCTTTGGCTTTGTCTCTTGCTTAATACTCCCGCCAGGAGTAA,CCTTTGGCTTTGTCTCTTGCTTAATACTCCCGCCAGGAGTAAAGT    486.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=16;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=30.39   GT:AD:DP:GQ:PL  1/2:0,4,10:14:99:640,434,554,164,0,187
    5   115336132   .   A   ACCCGAGCAGAACGTTAGGATGCCCCTCCAGCATGTAGTAGTAGT,ATGGCCCGAGCAGAACGTTAGGATGCCCCTCCAGCATGTAGTAGT 305.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=16;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=19.07   GT:AD:DP:GQ:PL  1/2:0,10,6:16:99:600,249,498,392,0,709
    5   115338540   .   G   GAGCATATTAACGAGGAAGTCCGCAGTGCACTCGGGCTTTAT,GAGCATATTAACGAGGAAGTCCGCAGTGCACTCGGGCTTTATAGT    428.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=17;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=25.19   GT:AD:DP:GQ:PL  1/2:0,5,7:12:99:704,314,354,226,0,252
    5   118280243   .   C   CTAGGTAGATGAACATTGCAGTCTTTATTTACAGTACCTTTGAACT,CTAGGTAGATGAACATTGCAGTCTTTATTTACAGTACCTTTGAACTA  179.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=16;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=11.20   GT:AD:DP:GQ:PL  1/2:0,5,6:11:99:659,192,160,163,0,131
    5   118556137   .   T   TTTAGATAAAGGATCATTACGGGCCCGGCTAAAAGAAGGCCGTCCTG,TTTAGATAAAGGATCATTACGGGCCCGGCTAAAAGAAGGCCGTCCTGA    322.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=22;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=14.65   GT:AD:DP:GQ:PL  1/2:0,8,6:14:99:832,207,175,290,0,259
    5   118556616   .   G   GAAAAACTCAGTCGATCGGGTCAGCACTTCCAGCGCCGTGCTGACAC,GAAAACTCAGTCGATCGGGTCAGCACTTCCAGCGCCGTGCTGACACT 369.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=28;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.61;MQ0=0;QD=13.19   GT:AD:DP:GQ:PL  1/2:0,13,15:28:99:1105,582,732,489,0,628
    5   118560404   .   A   AAGGGTTTGATTGGGAGATTCATATATCCGGGGATAACGATTTACAC,AGGGTTTGATTGGGAGATTCATATATCCGGGGATAACGATTTACACT 250.27  .   AC=1,1;AF=0.500,0.500;AN=2;DP=28;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=8.94    GT:AD:DP:GQ:PL  1/2:0,15,13:28:99:1139,482,509,575,0,748
    5   118866940   .   T   TAGACCCAACCACTAATTCGAATTTCTACAACCAACCGCAATGTGAGA,TAGACCCAACCACTAATTCGAATTTCTACAACCAACCGCAATGTGAGAA  198.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=13;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=58.65;MQ0=0;QD=15.25   GT:AD:DP:GQ:PL  1/2:0,5,5:10:99:499,161,153,173,0,158
    5   121761047   .   C   CGTGAACAAAGCGTATATCAAATTATCTATGGTGCGTCTGAGTGCGATA,CTCCGTGAACAAAGCGTATATCAAATTATCTATGGTGCGTCTGAGTGCG 648.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=21;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=57.60;MQ0=0;QD=30.87   GT:AD:DP:GQ:PL  1/2:0,14,7:21:99:845,296,548,610,0,1341
    5   121767690   .   G   GTGACTGGACGAACAATGCATGTCCCGATTGCCACCTTTTCCTATT,GTGACTGGACGAACAATGCATGTCCCGATTGCCACCTTTTCCTATTATA    422.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=17;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=24.83   GT:AD:DP:GQ:PL  1/2:0,5,7:12:99:632,311,348,223,0,243
    5   121780253   .   G   GATAAGAGGGCTGGTCGGTAACACTCTGTCCCTTCGTAGTGCATTCATA,GTAAATAAGAGGGCTGGTCGGTAACACTCTGTCCCTTCGTAGTGCATTC 666.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=25;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=26.65   GT:AD:DP:GQ:PL  1/2:0,15,9:24:99:956,377,683,648,0,1405
    5   122281747   .   T   TAAGGCGCTAAGAAGGTTCTATCGTCGATATCCTATGAACCGGAACTATA,TAAGGCGCTAAGAAGGTTCTATCGTCGATATCCTATGAACCGGAACTATAC  317.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=23;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=13.79   GT:AD:DP:GQ:PL  1/2:0,8,5:13:99:889,182,145,319,0,284
    5   122361481   .   T   TACATATACGCCCACGAGCAAGCAGGTGCCCCAACGTGTGATACATCAAG,TACATATACGCCCACGAGCAAGCAGGTGCCCCAACGTGTGATACATCAAGG  388.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=23;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=16.88   GT:AD:DP:GQ:PL  1/2:0,8,8:16:99:956,298,250,297,0,250
    5   122924133   .   A   AAGGTGCTCGTTAGGTAGTTCTTCTTAATTATTGTGCGACCCACAACCGGC,AGGTGCTCGTTAGGTAGTTCTTCTTAATTATTGTGCGACCCACAACCGGCT 609.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=36;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.02;MQ0=0;QD=16.92   GT:AD:DP:GQ:PL  1/2:0,17,19:36:99:1432,789,1086,637,0,832
    5   125822562   .   C   CACGTGCGATCGGGATAGCAGGCCTGTTCGAAATAGCTTGGCAGCTAATATA,CACGTGCGATCGGGATAGCAGGCCTGTTCGAAATAGCTTGGCAGCTAATATAT  261.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=16;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=16.32   GT:AD:DP:GQ:PL  1/2:0,6,5:11:99:651,188,164,231,0,209
    5   125828598   .   T   TAGTTGGAAGAATGTGATCAGTGTACAGCATGGGGACCCTAGTGTCGCACTC,TAGTTGGAAGAATGTGATCAGTGTACAGCATGGGGACCCTAGTGTCGCACTCA  458.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=27;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.69;MQ0=0;QD=16.97   GT:AD:DP:GQ:PL  1/2:0,7,13:20:99:1046,442,396,226,0,174
    5   125880653   .   A   ATCAACAAACACTGGCCTTACTGTTGTAGGTGCAGTTTATTAAGCGTTCTGC,ATCAACAAACACTGGCCTTACTGTTGTAGGTGCAGTTTATTAAGCGTTCTGCC  276.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=27;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=57.48;MQ0=0;QD=10.23   GT:AD:DP:GQ:PL  1/2:0,5,9:14:99:1087,316,258,189,0,130
    5   125885627   .   C   CGGATTGACCAAACAGCGCGGGCGGCCGTAAGTCGAGGGCGACACCGGGTTG,CTGGATTGACCAAACAGCGCGGGCGGCCGTAAGTCGAGGGCGACACCGGGTTG  633.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=27;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=57.26;MQ0=0;QD=23.45   GT:AD:DP:GQ:PL  1/2:0,5,17:22:99:1058,643,609,183,0,136
    5   125885906   .   C   CTAAGCGGGATCCAGATCCTTATACCTACCTTGATAATGGACGTAAGGCTAG,CTAGCGGGATCCAGATCCTTATACCTACCTTGATAATGGACGTAAGGCTAGT   243.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=27;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=58.95;MQ0=0;QD=9.01    GT:AD:DP:GQ:PL  1/2:0,8,18:26:99:1060,693,806,316,0,406
    5   126781117   .   C   CCCGATCGGGCTGGACGGAAGGTAAAGCAAGTCCAAGCCAAGCAACGAATTGCA,CACCGATCGGGCTGGACGGAAGGTAAAGCAAGTCCAAGCCAAGCAACGAATTGCA  328.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=19;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=58.89;MQ0=0;QD=17.27   GT:AD:DP:GQ:PL  1/2:0,5,8:13:99:671,280,290,182,0,160
    5   126790250   .   T   TAAGCGAAAGTCTCGATGCCGTTTACCGTTCGCCCTAATATCACGTCTGACACA,TAAGCGAAAGTCTCGATGCCGTTTACCGTTCGCCCTAATATCACGTCTGACACAC  140.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=12;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=11.68   GT:AD:DP:GQ:PL  1/2:0,4,4:8:99:493,155,135,137,0,117
    5   126791106   .   T   TATGGGGCCAAAGTGCGGGTGTTCGGAACACCAATTATTCACCTGGCTAACAAC,TATGGGGCCAAAGTGCGGGTGTTCGGAACACCAATTATTCACCTGGCTAACAACC  207.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=26;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.19;MQ0=0;QD=7.97    GT:AD:DP:GQ:PL  1/2:0,6,5:11:99:985,174,126,239,0,193
    5   127484392   .   C   CATAGTGATATTCCTTTAATAATAGTCAGGGCGTTAGTTGGATAAGTCTTCCTAT,CATAGTGATATTCCTTTAATAATAGTCAGGGCGTTAGTTGGATAAGTCTTCCTATT    564.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=31;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=57.94;MQ0=0;QD=18.20   GT:AD:DP:GQ:PL  1/2:0,14,8:22:99:1159,275,224,497,0,452
    5   127488417   .   T   TCCAATAAACGAGGTCCTAAAAATGCCTGCAGTGTTAATGTTCCGGAAGACCGAA,TCCAATAAACGAGGTCCTAAAAATGCCTGCAGTGTTAATGTTCCGGAAGACCGAAG    323.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=24;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=13.47   GT:AD:DP:GQ:PL  1/2:0,7,6:13:99:1007,245,197,284,0,238
    5   127503476   .   G   GACAGAGGTTCATTTCAGAAGCAAACCGGGGAGGCAATGGCCCTAAGGGATAAGA,GACAGAGGTTCATTTCAGAAGCAAACCGGGGAGGCAATGGCCCTAAGGGATAAGAA    315.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=21;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=58.36;MQ0=0;QD=15.01   GT:AD:DP:GQ:PL  1/2:0,6,9:15:99:838,311,275,195,0,152
    5   127507387   .   G   GAATAGAGTGGTTGCGAAATATCTTGCGTTTCCAAATTATCACGTCCTAATCTGC,GAATAGAGTGGTTGCGAAATATCTTGCGTTTCCAAATTATCACGTCCTAATCTGCC    204.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=20;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=10.21   GT:AD:DP:GQ:PL  1/2:0,5,6:11:99:804,203,162,196,0,154
    5   127627167   .   A   ACCAACCATATGCGAACACCTCTTCTCGATAGTAGGGATTTGGAGAAATGCGCCAT,ACCAACCATATGCGAACACCTCTTCTCGATAGTAGGGATTTGGAGAAATGCGCCATG  169.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=19;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=58.74;MQ0=0;QD=8.90    GT:AD:DP:GQ:PL  1/2:0,5,4:9:99:735,137,120,176,0,161
    5   127637048   .   A   AAGGGTGATGACTAAGGCTAACACTATCCTAGAACCTCGAAAAAGTGGTCCCCGCT,AAGGGTGATGACTAAGGCTAACACTATCCTAGAACCTCGAAAAAGTGGTCCCCGCTT  333.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=23;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=58.95;MQ0=0;QD=14.49   GT:AD:DP:GQ:PL  1/2:0,7,9:16:99:871,265,230,251,0,215
    5   127640619   .   T   TCAAAGATTGGGCAAATGATTCGTGGTGTATATTATCACATTACGACCATCCCCTG,TCAAAGATTGGGCAAATGATTCGTGGTGTATATTATCACATTACGACCATCCCCTGA  284.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=19;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=58.73;MQ0=0;QD=14.96   GT:AD:DP:GQ:PL  1/2:0,8,4:12:99:779,144,112,310,0,284
    5   127641176   .   C   CATTTCTCGTCGTCCTACTTCTCGCTTTGTGCGCACGTGCTCAGTATTAACCATAA,CATTTCTCGTCGTCCTACTTCTCGCTTTGTGCGCACGTGCTCAGTATTAACCATAAT  359.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=22;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=57.81;MQ0=0;QD=16.33   GT:AD:DP:GQ:PL  1/2:0,10,4:14:99:895,152,119,379,0,352
    5   127697364   .   A   AGATCATGGGTACCTTGCACGATGCGTGGGAGTTGGTCAGTTCATGTAATTAGG,AGATCATGGGTACCTTGCACGATGCGTGGGAGTTGGTCAGTTCATGTAATTAGGATG    577.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=23;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=25.10   GT:AD:DP:GQ:PL  1/2:0,7,9:16:99:921,388,456,301,0,355
    5   127704860   .   G   GGCAGAACCATCTCGTTGTCAAGGTTCCATCTGAATTCCACCACTAAGGCTTGC,GGCAGAACCATCTCGTTGTCAAGGTTCCATCTGAATTCCACCACTAAGGCTTGCGAT    783.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=28;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=58.95;MQ0=0;QD=27.97   GT:AD:DP:GQ:PL  1/2:0,10,11:21:99:1118,481,551,414,0,472
    5   127712395   .   T   TATGTAATCATTTACTTTAGTTCAAAACGACGAGCCAGCTAGATCGATTCGGGC,TATGTAATCATTTACTTTAGTTCAAAACGACGAGCCAGCTAGATCGATTCGGGCATG    474.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=21;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=22.58   GT:AD:DP:GQ:PL  1/2:0,6,8:14:99:818,355,425,231,0,269
    5   128430383   .   A   ATATTTAGGCGTTGCTACCTCGACGGGCCGCCCTCTCCAATTTTCGGAAGAATGCCGC,ATATTTAGGCGTTGCTACCTCGACGGGCCGCCCTCTCCAATTTTCGGAAGAATGCCGCC  389.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=22;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.87;MQ0=0;QD=17.69   GT:AD:DP:GQ:PL  1/2:0,10,6:16:99:859,207,175,355,0,326
    5   128448530   .   T   TTATTCACTAATCCCATTGTTCCTCCCGCAAGTTGCAGCTCAGGCAGAATACCTTGTA,TTATTCACTAATCCCATTGTTCCTCCCGCAAGTTGCAGCTCAGGCAGAATACCTTGTAG  297.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=22;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=13.51   GT:AD:DP:GQ:PL  1/2:0,4,8:12:99:910,318,284,163,0,125
    5   129243737   .   G   GAAGGTCAGTAGATTTTATATCCATAGCGCAAGCTCCGGTTACATAGATTCGACGAACT,GAAGGTCAGTAGATTTTATATCCATAGCGCAAGCTCCGGTTACATAGATTCGACGAACTT    200.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=15;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.44;MQ0=0;QD=13.35   GT:AD:DP:GQ:PL  1/2:0,6,4:10:99:584,129,119,202,0,194
    5   130825237   .   A   ATTCAGCAAGCATGTTGGGCGGTTGCATCCAACACTCTTACAGTGTGCCTCATTGTGGCG,ATTCAGCAAGCATGTTGGGCGGTTGCATCCAACACTCTTACAGTGTGCCTCATTGTGGCGT  180.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=22;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.02;MQ0=0;QD=8.19    GT:AD:DP:GQ:PL  1/2:0,5,4:9:99:876,170,134,194,0,158
    5   130828262   .   G   GAAGGCAATATCGAGTACGCCCGCGGATCTAGGGTTCTAACACCGTTGAGATGCAGAAAT,GAAGGCAATATCGAGTACGCCCGCGGATCTAGGGTTCTAACACCGTTGAGATGCAGAAATA  265.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=18;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=59.68;MQ0=0;QD=14.73   GT:AD:DP:GQ:PL  1/2:0,5,7:12:99:684,226,210,185,0,167
    5   130831245   .   C   CAGATAGTACACATCGTCACTGCTATCCCATCGTATCGGGGCGAGTCCCCGGCCGGTGAG,CAGATAGTACACATCGTCACTGCTATCCCATCGTATCGGGGCGAGTCCCCGGCCGGTGAGT  245.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=19;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=57.69;MQ0=0;QD=12.90   GT:AD:DP:GQ:PL  1/2:0,8,5:13:99:719,137,115,269,0,242
    5   130846006   .   C   CAACCTCTGGGTCACTCACCGAGAATGGGTCTGAGTCGTGACTGTAATTGGTGCGCTTGT,CAACCTCTGGGTCACTCACCGAGAATGGGTCTGAGTCGTGACTGTAATTGGTGCGCTTGTT  234.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=13;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=55.11;MQ0=0;QD=18.01   GT:AD:DP:GQ:PL  1/2:0,6,5:11:99:518,172,154,209,0,192
    5   131066577   .   A   AAGTCGTGGTTATTGCTCACGGTGCCGACCGCGCGCCAGGAGTAGGTGTCCCCCCATG,AAGTCGTGGTTATTGCTCACGGTGCCGACCGCGCGCCAGGAGTAGGTGTCCCCCCATGATT    315.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=17;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=56.69;MQ0=0;QD=18.54   GT:AD:DP:GQ:PL  1/2:0,4,7:11:99:691,272,298,156,0,160
    5   131296234   .   A   AGCACCCTTGGCTGTAGGAGCAATGCTCTTTAATCTTACAGCCGACTGAAATAGGGGC,AGCACCCTTGGCTGTAGGAGCAATGCTCTTTAATCTTACAGCCGACTGAAATAGGGGCATT    165.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=13;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=56.31;MQ0=0;QD=12.71   GT:AD:DP:GQ:PL  1/2:0,3,3:6:99:535,138,162,139,0,164
    5   131298210   .   G   GGGTTTTGGAATGGAACCGTATCCTCAGCAGACTCTTATTTGCATCCTCCTGATAGTC,GGGTTTTGGAATGGAACCGTATCCTCAGCAGACTCTTATTTGCATCCTCCTGATAGTCATT    398.19  .   AC=1,1;AF=0.500,0.500;AN=2;DP=16;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;MQ0=0;QD=24.89   GT:AD:DP:GQ:PL  1/2:0,4,8:12:99:603,344,408,166,0,185
    
  • TimHughesTimHughes Posts: 60Member

    I took a further look at these cases listed above.

    My simulated FASTQ reads are designed to mimic an exome capture. Some of the insertions and deletions are placed in the center of an exon, whereas others are placed near the edge, where one cannot expect an even number of reads from each strand.

    It turns out that almost all the cases above, where we have two alternative alleles for insertions, are when the simulated insertion was placed near the edge of the exon.

    image

    Zoomed out

    image

    So this is not a very general situation (a long insertion placed near the edge of an exon), but I suppose it is not insignificant for exome capture.
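In the records listed above, the two ALT alleles at each site appear to differ only by a few extra trailing bases. A minimal Python sketch for flagging such sites is given here; the function name `alt_suffix_difference` is made up for illustration, and the record is rebuilt from the first six columns of the 5:127697364 line rather than read from a real file.

```python
def alt_suffix_difference(vcf_line):
    """For a record with exactly two ALT alleles, return the bases by which
    the longer ALT extends the shorter one, or None if they are not
    related in that way."""
    fields = vcf_line.split()
    alts = fields[4].split(",")
    if len(alts) != 2:
        return None
    shorter, longer = sorted(alts, key=len)
    if len(longer) > len(shorter) and longer.startswith(shorter):
        return longer[len(shorter):]
    return None


# First ALT allele of the 5:127697364 record in the listing above.
ALT1 = "AGATCATGGGTACCTTGCACGATGCGTGGGAGTTGGTCAGTTCATGTAATTAGG"
# The second ALT in that record is the first one plus three extra bases.
record = f"5\t127697364\t.\tA\t{ALT1},{ALT1}ATG\t577.19"

print(alt_suffix_difference(record))
```

Running this over every 1/2 insertion call and comparing the flagged sites with their distance to the nearest exon edge would be one way to check the pattern described above.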

  • TimHughesTimHughes Posts: 60Member

    I suppose all this could be driven by an inaccuracy in the details of my simulation....

  • TimHughesTimHughes Posts: 60Member

    I tried -kmerSize 20 -minPruning 10 --forceActive --activeRegionMaxSize 6000 while restricting HC to just the region with the deletion. It ran for about an hour, then reported only the SNPs and not the big deletion.

    The report also says "Ran local assembly on 2 active regions", which I guess means one on either side of the deletion?

    /Users/tim/home/proj_tim_pharmGen/pharmGenRepo/code/vc_singleSample_withHC.bash *.valid.dedup.bam bigDeletion.list 
    INFO  13:12:01,110 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  13:12:01,125 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.7-4-g6f46d11, Compiled 2013/10/10 17:27:51 
    INFO  13:12:01,125 HelpFormatter - Copyright (c) 2010 The Broad Institute 
    INFO  13:12:01,125 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
    INFO  13:12:01,130 HelpFormatter - Program Args: -T HaplotypeCaller -R /Users/tim/home/PLATFORM/draftNewRefData/dataDistro_r01_d01_LocalCopy/b37/genomic/gatkBundle_2.5/human_g1k_v37_decoy.fasta --dbsnp /Users/tim/home/PLATFORM/draftNewRefData/dataDistro_r01_d01_LocalCopy/b37/genomic/gatkBundle_2.5/dbsnp_137.b37.excluding_sites_after_129.vcf --genotyping_mode DISCOVERY -stand_emit_conf 10 -stand_call_conf 30 --downsampling_type BY_SAMPLE --downsample_to_coverage 250 --intervals bigDeletion.list --validation_strictness LENIENT -kmerSize 20 -minPruning 10 --forceActive --activeRegionMaxSize 6000 -I Hughes-MiSeqExcap-Lib1-907_140211_M01132_0066_L001.aln.valid.dedup.bam --out Hughes-MiSeqExcap-Lib1-907_140211_M01132_0066_L001.aln.valid.dedup.hc.wholeGene.variantSites.vcf -nct 4 
    INFO  13:12:01,131 HelpFormatter - Date/Time: 2014/03/20 13:12:01 
    INFO  13:12:01,131 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  13:12:01,131 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  13:12:01,178 ArgumentTypeDescriptor - Dynamically determined type of /Users/tim/home/PLATFORM/draftNewRefData/dataDistro_r01_d01_LocalCopy/b37/genomic/gatkBundle_2.5/dbsnp_137.b37.excluding_sites_after_129.vcf to be VCF 
    INFO  13:12:02,659 GenomeAnalysisEngine - Strictness is LENIENT 
    INFO  13:12:02,828 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 250 
    INFO  13:12:02,841 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
    INFO  13:12:02,893 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.03 
    INFO  13:12:02,948 HCMappingQualityFilter - Filtering out reads with MAPQ < 20 
    INFO  13:12:03,018 RMDTrackBuilder - Loading Tribble index from disk for file /Users/tim/home/PLATFORM/draftNewRefData/dataDistro_r01_d01_LocalCopy/b37/genomic/gatkBundle_2.5/dbsnp_137.b37.excluding_sites_after_129.vcf 
    INFO  13:12:03,449 IntervalUtils - Processing 7001 bp from intervals 
    INFO  13:12:03,482 MicroScheduler - Running the GATK in parallel mode with 4 total threads, 4 CPU thread(s) for each of 1 data thread(s), of 16 processors available on this machine 
    INFO  13:12:03,817 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files 
    INFO  13:12:03,950 GenomeAnalysisEngine - Done preparing for traversal 
    INFO  13:12:03,951 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
    INFO  13:12:03,951 ProgressMeter -        Location processed.active regions  runtime per.1M.active regions completed total.runtime remaining 
    INFO  13:12:04,296 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units 
    INFO  13:12:34,652 ProgressMeter -     2:234636000        0.00e+00   30.0 s       49.6 w    100.0%        30.0 s     0.0 s 
    INFO  13:13:05,255 ProgressMeter -     2:234636000        0.00e+00   61.0 s      101.4 w    100.0%        61.0 s     0.0 s 
    .....................................
    .....................................
    .....................................
    INFO  14:10:06,224 ProgressMeter -     2:234636000        0.00e+00   58.0 m     5757.7 w    100.0%        58.0 m     0.0 s 
    INFO  14:10:36,228 ProgressMeter -     2:234636000        0.00e+00   58.5 m     5807.3 w    100.0%        58.5 m     0.0 s 
    INFO  14:10:39,283 HaplotypeCaller - Ran local assembly on 2 active regions 
    INFO  14:11:06,232 ProgressMeter -     2:234636000        0.00e+00   59.0 m     5856.9 w    100.0%        59.0 m     0.0 s 
    INFO  14:11:14,831 ProgressMeter -            done        7.00e+03   59.2 m        5.9 d    100.0%        59.2 m     0.0 s 
    INFO  14:11:14,831 ProgressMeter - Total runtime 3550.88 secs, 59.18 min, 0.99 hours 
    INFO  14:11:14,832 MicroScheduler - 34 reads were filtered out during the traversal out of approximately 1745 total reads (1.95%) 
    INFO  14:11:14,832 MicroScheduler -   -> 25 reads (1.43% of total) failing DuplicateReadFilter 
    INFO  14:11:14,832 MicroScheduler -   -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter 
    INFO  14:11:14,832 MicroScheduler -   -> 9 reads (0.52% of total) failing HCMappingQualityFilter 
    INFO  14:11:14,833 MicroScheduler -   -> 0 reads (0.00% of total) failing MalformedReadFilter 
    INFO  14:11:14,833 MicroScheduler -   -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter 
    INFO  14:11:14,833 MicroScheduler -   -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter 
    INFO  14:11:14,833 MicroScheduler -   -> 0 reads (0.00% of total) failing UnmappedReadFilter 
    INFO  14:11:57,686 GATKRunReport - Uploaded run statistics report to AWS S3 
    
  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,192Administrator, GATK Developer admin

    Try using -bamOut -bamWriterType ALL_POSSIBLE_HAPLOTYPES to have HC output the haplotypes it's considering.

    Geraldine Van der Auwera, PhD

  • TimHughesTimHughes Posts: 60Member

    With the following call (which took almost two hours to run), where -nct 4 was removed so that the BAM file of haplotypes could be produced, the large deletion is detected. It would seem that removing -nct 4 is what did it. However, the zygosity seems to be incorrect: from the alignment the deletion looks homozygous, but it is called heterozygous. From the screenshot of the haplotypes below, it seems that a few reads mapping into the deleted region may be causing this?

    java -jar /Users/tim/home/PLATFORM/softwareRepo/swRepo_r01/install/gatk/GenomeAnalysisTK-3.1-1/GenomeAnalysisTK.jar -T HaplotypeCaller -R /Users/tim/home/PLATFORM/draftNewRefData/dataDistro_r01_d01_LocalCopy/b37/genomic/gatkBundle_2.5/human_g1k_v37_decoy.fasta --dbsnp /Users/tim/home/PLATFORM/draftNewRefData/dataDistro_r01_d01_LocalCopy/b37/genomic/gatkBundle_2.5/dbsnp_137.b37.excluding_sites_after_129.vcf --genotyping_mode DISCOVERY -stand_emit_conf 10 -stand_call_conf 30 --downsampling_type BY_SAMPLE --downsample_to_coverage 250 --intervals bigDeletion.list --validation_strictness LENIENT -kmerSize 20 -minPruning 10 --forceActive --activeRegionMaxSize 6000 -I Hughes-MiSeqExcap-Lib1-907_140211_M01132_0066_L001.aln.valid.dedup.bam --out Hughes-MiSeqExcap-Lib1-907_140211_M01132_0066_L001.aln.valid.dedup.hc.wholeGene.variantSites.vcf --bamOutput assembledHaplotypes.bam -bamWriterType ALL_POSSIBLE_HAPLOTYPES
    

    The resulting VCF looks like this:

    #CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  Hughes-MiSeqExcap-Lib1-907
    2   234629239   rs12468543  A   G   1482.77 .   AC=1;AF=0.500;AN=2;BaseQRankSum=0.547;ClippingRankSum=-0.163;DB;DP=39;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=-0.961;QD=28.99;ReadPosRankSum=0.192    GT:AD:DP:GQ:PL  0/1:16,22:38:99:1511,0,17676
    2   234629585   rs28899186  G   T   838.77  .   AC=1;AF=0.500;AN=2;BaseQRankSum=0.322;ClippingRankSum=0.684;DB;DP=8;FS=7.068;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=-0.684;QD=28.49;ReadPosRankSum=0.322  GT:AD:DP:GQ:PL  0/1:5,3:8:99:867,0,17365
    2   234630186   rs10929293  A   T   725.77  .   AC=1;AF=0.500;AN=2;BaseQRankSum=-1.135;ClippingRankSum=0.477;DB;DP=64;FS=3.351;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=-0.047;QD=11.34;ReadPosRankSum=-0.222   GT:AD:DP:GQ:PL  0/1:32,32:64:99:754,0,18185
    2   234630443   rs7597496   A   G   834.77  .   AC=1;AF=0.500;AN=2;BaseQRankSum=3.393;ClippingRankSum=-1.084;DB;DP=62;FS=2.133;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=-1.126;QD=13.46;ReadPosRankSum=0.507    GT:AD:DP:GQ:PL  0/1:31,31:62:99:863,0,20065
    2   234630503   rs12475934  G   A   1015.77 .   AC=1;AF=0.500;AN=2;BaseQRankSum=1.548;ClippingRankSum=-0.323;DB;DP=56;FS=1.008;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=-0.902;QD=18.14;ReadPosRankSum=0.621    GT:AD:DP:GQ:PL  0/1:24,32:56:99:1044,0,17949
    2   234631477   .   TAATAAGAATGTTTCTTTTTTTTTTTTTTGAAGGAAAAAATAAATTTATTGCTCATTAAGTGGAAGTGGATCATCATAAAGGTCTTCATCCTCATTGTCTCCATGCTGAGTGGGCTGAGGAGGAGGAGGAAGGGGAGGGATTGGTCCTGCTGTCTCAGCGTGGCAGAGGCAGAAGAGGATGAGAAGGTGGAAGGGCCAGCGGGAGAGGCAGGCACACTCGGTGTAACTTTATGGAAATATATCATCATTTTTGTTTGACTTTTTTCCTTTCTCATTTCTCTGAAAATGTTTCTGTACAGTACCAATTCTTCTTCCACCATTTGCTTTAGTTTCAGTGCCCAAACCATAGAAGGGTCCATGTGGTAAAAAAAGTCAAAACTGACTTTTTTTTTTTTTTTTGAGTTGGAGTCCTGCTGTCACCCAGGCTGGAGTGCAATGGCACGATGTTGGCTCACTGCAACCTCTGCCTCCCAGGTTCAAGCAATTCTCCTGTCTCAGCCTCACAAGTAGCTAGGACTACAGGCACACGTCACCACACCTGGCTAATTTTTGTACTTTTAGTAGAGATGGGGTTTCACCATACTGGTCAGGCTGGTCTCGAACTCCTGACCTCAGGTGATCCACCCGCCTCAGCCTCCCAAAGTGCTAGGATTCCAGGTGTGAGCCACTGCACCTGGTCAACAATCTTTTTTTTTTTTTTTTTTTTAATTTATTTTTTTATTGATAATTCTTGGGTGTTTCTCACAGAGGGGGATTTGGCAGGGTCATGGGACAATAGTGGAGGGAAGGTCAGCAGATAAACAAGTGAACAAAGGTCTCTGGTTTTCCCAGGCAGAGGACCCTGCGGCCTTCCGCAGTGTTTGTGTCCCTGATTACTTGAGATTAGGGATTGGTGATGACTCTTAACGAGCATGCTGCCTTCAAGCATCTGTTTAACAAAGCACATCTTGCACCGCCCTTAATCCATTTAACCCTGAGTGGACACAGCACATGTTTCAGAGAGCACAGGGTTGGGGGTAAGGTCACAGATCAACAGGATCCCAAGGCAGAGGAATTTTTCTTAGTGCAGAACAAAATGAAAAGTCTCCCATGTCTACTTCTTTCTACACAGACACGGCAACCATCCGATTTCTCAATCTTTTCCCCACCTTTCCCGCCTTTCTATTCCACAAAGCCGCCATTGTCATCCTGGCCCGTTCTCAATGAGCTGTTGGGCACACCTCCCAGACGGGGTGGTGGCCGGGCAGAGGGGCTCCTCACTTCCCAGTAGGGGCGGCCGGGCAGAGGCGCCCCTCACCTCCCGGACGGGGCGGCTGGCCGGGTGGGGGGGCTGACCCCCCCATCTCCCTCCCGGACGGGGTGGCTGGCCGGGCTGAGGGGCTCCTCACTTCCCAGTAGGGGCGGCCGGGCAGAGGCGCCCCTCACCTCCCGGACGGGGCGGCTGGCCGGGCGGGGGGCTGACCCCCCCACCTCCCTCCCGGACGGGGCGGCTGGCCAGGCGGGGGGCTGACCCCCCCCACCTCCCTCCCGGACGGGGTGGCTGCCGGGCGGAGACGCTCCTCACTTCCCAGATGGGGTGGCTGCCGGGCGGAGAGGCTCCTCACTTCTCAGACAGGGCAGCTGCCGGGCGGAGGGGCTCCTCACTTCTCAGACGGGGCGGCCGGGCAGAGACGCTCCTCACCTCCCAGATGGGGTCTCGCCGGGCAGAGGCGCTCCTCACATCCCAGATGGGGCGGCGGGGCAGAGGCGCTCCCCACATCTCAGACGATGGGCGGCCGGGCAGAGACGCTCCTCACTTCCTAGATGTGATGGCGGCTGGGAAGAGGCGCTCCTCACTTCCTAGATGGGATGGCGGCCGGGTGAAGACGCTCCTCGCTTTCCAGACTGGGCAGCCAGGCAGAGGGGCTCCTCACATCCCAGACGATGGGCGGCCAGGCAGAGACACTCCTCACTTCCCAGACGGGGTGGCGGCCGGG
CAGAGGCTGCAATCTCGGCACTTTGGGAGGCCAAGGCAGGCGGCTGGGAGGTGTAGGTTGTAGTGAGCGGAGATCACGCCACTGCACTCCAGCCTGGGCACCATTGAGCACTGAGTGAACGAGACTCCGTCTGCAATCCCGGCACCTCGGGAGGCCGAGGTTGGCGGATCACTCGCGGTTAGGGGCTGGAGACCGGCCCGGCCAAACAGCAAAACCCGGTCTCCACCAAAACCAGTCAGGCGTGGCGGCGCGCGCCTGCAATCGCAGGCACTCGGCAGGCTGAGGCAGGAGAATCAGGCAGGGAGGTTGCAGTGAGCCGAGATGGCAGCAGTACAGTCCAGCTTCGGCTCTGCATGAGAGGGAGACCGTGGGGAGAGGCAGAGGCAGAGGCAGAGGCAGAGGCAGAGGCAGAGGAGGCAGAGGCAGAGGAGGCAGAGGCAGAGGAGGCAGAGGCAGAGGAGGCAGAGGCAGAGGCAGAGGCAGAGGCAGAGGCAGAGGCAGAGGCAGAGGCGCCTGGTCAACAATCTTAAGTCC  T   10293.73    .   AC=1;AF=0.500;AN=2;BaseQRankSum=0.694;ClippingRankSum=0.118;DP=89;FS=1.142;MLEAC=1;MLEAF=0.500;MQ=59.32;MQ0=0;MQRankSum=2.258;QD=31.96;ReadPosRankSum=1.235 GT:AD:DP:GQ:PL  0/1:13,75:88:99:10331,0,16052
    2   234634324   rs28899189  A   C   1157.77 .   AC=1;AF=0.500;AN=2;BaseQRankSum=2.399;ClippingRankSum=-1.027;DB;DP=81;FS=5.697;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=0.923;QD=14.29;ReadPosRankSum=1.415 GT:AD:DP:GQ:PL  0/1:43,38:81:99:1186,0,18446
    2   234634639   rs28899191  G   C   394.77  .   AC=1;AF=0.500;AN=2;BaseQRankSum=-0.054;ClippingRankSum=0.406;DB;DP=41;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=-0.298;QD=9.63;ReadPosRankSum=0.785 GT:AD:DP:GQ:PL  0/1:21,19:40:99:423,0,20483
    2   234634916   rs6711351   A   G   342.77  .   AC=1;AF=0.500;AN=2;BaseQRankSum=0.310;ClippingRankSum=-0.152;DB;DP=67;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=-0.033;QD=5.12;ReadPosRankSum=0.429 GT:AD:DP:GQ:PL  0/1:44,23:67:99:371,0,21854
    2   234635241   rs6715325   T   C   836.77  .   AC=1;AF=0.500;AN=2;BaseQRankSum=1.471;ClippingRankSum=0.113;DB;DP=65;FS=3.363;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=0.166;QD=12.87;ReadPosRankSum=-0.020 GT:AD:DP:GQ:PL  0/1:38,27:65:99:865,0,1247
    2   234635367   rs17864697  T   C   1205.77 .   AC=1;AF=0.500;AN=2;BaseQRankSum=-0.213;ClippingRankSum=1.383;DB;DP=88;FS=0.806;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=-0.547;QD=13.70;ReadPosRankSum=-1.391   GT:AD:DP:GQ:PL  0/1:46,42:88:99:1234,0,1505
    2   234635467   rs4294999   A   G   1228.77 .   AC=1;AF=0.500;AN=2;BaseQRankSum=1.223;ClippingRankSum=0.614;DB;DP=84;FS=0.824;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=-0.837;QD=14.63;ReadPosRankSum=0.040 GT:AD:DP:GQ:PL  0/1:44,40:84:99:1257,0,1376
    

    image

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,192Administrator, GATK Developer admin

    Hmm, that's unexpected -- multithreading shouldn't have such an impact on the calls. I'd expect to see marginal differences due to downsampling effect, but this is not marginal. I'll ask @rpoplin to comment on this.

    Geraldine Van der Auwera, PhD

  • rpoplinrpoplin Posts: 122GATK Developer mod

    Sorry to say that I've gotten a little lost in this thread. Can you please post the output from the command both with and without -nct 4 so that I can help debug it?

    Thanks! Quite awesome that you can get such a large event called.
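One possible way to compare the two runs (with and without -nct 4) is to diff the call sets by site and allele. The Python sketch below assumes the two VCFs are already available as text; the helper names `call_keys` and `compare_calls` and the two-line toy inputs are hypothetical, not output from the actual runs.

```python
def call_keys(vcf_text):
    """Collect (CHROM, POS, REF, ALT) for every non-header VCF record."""
    keys = set()
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.split()[:5]
        keys.add((chrom, int(pos), ref, alt))
    return keys


def compare_calls(vcf_a, vcf_b):
    """Report which calls are unique to each VCF and which are shared."""
    a, b = call_keys(vcf_a), call_keys(vcf_b)
    return {
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "shared": sorted(a & b),
    }


# Toy inputs (made up): run B has one extra call that run A lacks.
run_a = "#CHROM\tPOS\tID\tREF\tALT\n2\t100\t.\tA\tG\n"
run_b = "#CHROM\tPOS\tID\tREF\tALT\n2\t100\t.\tA\tG\n2\t200\t.\tTAA\tT\n"

print(compare_calls(run_a, run_b))
```

Keying on REF and ALT as well as position means that a site called with different alleles in the two runs shows up in both "only" lists.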

  • TimHughesTimHughes Posts: 60Member

    Yes, awesome that such large events are called.

    I am not sure there is really anything to debug here. I think the situation I set up, with a limited interval of about 6 kb, a deletion of about 3 kb, and --activeRegionMaxSize 6000, was probably bound to prevent -nct 4 from working properly.

    A more interesting observation is perhaps that the genotype is called heterozygous when, from the reads, it seems homozygous; a few reads mapping into the deletion appear to cause this. I wonder whether this could be a general "shortcoming" of HC when deletions exceed a certain size. Would increasing the pruning value resolve this?
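To put numbers on the zygosity question, the sample column of the deletion record in the VCF above (GT:AD:DP:GQ:PL = 0/1:13,75:88:99:10331,0,16052) can be unpacked. This is only a sketch: the function name `summarize_genotype` is made up, and it assumes a biallelic diploid site, where the PL values are ordered 0/0, 0/1, 1/1.

```python
def summarize_genotype(format_field, sample_field):
    """Unpack a biallelic diploid sample column into a few summary values."""
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    ad = [int(x) for x in values["AD"].split(",")]  # ref depth, alt depth
    pl = [int(x) for x in values["PL"].split(",")]  # phred-scaled likelihoods
    total = sum(ad)
    labels = ["0/0", "0/1", "1/1"]  # PL order for one ALT allele
    return {
        "called_gt": values["GT"],
        "ref_reads": ad[0],
        "alt_reads": ad[1],
        "alt_fraction": ad[1] / total if total else float("nan"),
        "most_likely": labels[pl.index(min(pl))],  # lowest PL is the best fit
        "pl": dict(zip(labels, pl)),
    }


summary = summarize_genotype("GT:AD:DP:GQ:PL", "0/1:13,75:88:99:10331,0,16052")
print(summary)
```

In this record 75 of 88 informative reads support the deletion, but the 13 reads assigned to the reference allele are enough to give the homozygous-deletion genotype a much worse PL (16052) than the heterozygous one (0).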

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,192Administrator, GATK Developer admin

    Yes, awesome that such large events are called.

    Indeed, sorry for my incorrect answer earlier on! My info was outdated; good thing Ryan jumped in :)

    Geraldine Van der Auwera, PhD
