not all sites emitted with GENOTYPE_GIVEN_ALLELES

tommycarstensen, United Kingdom, Member ✭✭✭
edited December 2014 in Ask the GATK team

I am running HaplotypeCaller (GATK 3.3-0) with the following options, notably GENOTYPE_GIVEN_ALLELES:

$java7 -Djava.io.tmpdir=tmp -Xmx3900m \
 -jar $jar \
 --analysis_type HaplotypeCaller \
 --reference_sequence $ref \
 --input_file $BAM \
 --intervals $CHROM \
 --dbsnp $dbSNP \
 --out $out \
 -stand_call_conf 0 \
 -stand_emit_conf 0 \
 -A Coverage -A FisherStrand -A HaplotypeScore -A MappingQualityRankSumTest -A QualByDepth -A RMSMappingQuality -A ReadPosRankSumTest \
 -L $allelesVCF \
 -L 20:60000-70000 \
 --interval_set_rule INTERSECTION \
 --genotyping_mode GENOTYPE_GIVEN_ALLELES \
 --alleles $allelesVCF \
 --emitRefConfidence NONE \
 --output_mode EMIT_ALL_SITES

The file $allelesVCF contains these neighbouring SNPs:

20  60807   .   C   T   118.96  .
20  60808   .   G   A   46.95   .
20  61270   .   A   C   2870.18 .
20  61271   .   T   A   233.60  .

I am unable to call these neighbouring SNPs, even though reads are present in the file $BAM (which shouldn't matter anyway). I also tried adding --interval_merging OVERLAPPING_ONLY to the command line, but that didn't solve the problem. What am I doing wrong? I should probably add GATK breaker/misuser to my CV...

Thank you as always.

P.S. The CommandLineGATK documentation does not say what the default value for --interval_merging is.

P.P.S. Iterative testing is a bit slow, because HC always has to do this step:

HCMappingQualityFilter - Filtering out reads with MAPQ < 20
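
Neighbouring records like the ones listed above can be picked out of a larger VCF with a short awk filter. This is only a minimal sketch: the function name adjacent_sites is made up for illustration, and it assumes an uncompressed VCF that is already sorted by position.

```shell
# Hypothetical helper: print pairs of consecutive records on the same
# chromosome whose positions are at most max_gap bp apart.
# $1: uncompressed, position-sorted VCF; $2: max_gap in bp (default 1)
adjacent_sites() {
    awk -v max_gap="${2:-1}" '
        /^#/ { next }                       # skip VCF header lines
        $1 == prev_chrom && $2 - prev_pos <= max_gap {
            print prev_chrom ":" prev_pos, $1 ":" $2
        }
        { prev_chrom = $1; prev_pos = $2 }  # remember the previous record
    ' "$1"
}
```

Called as adjacent_sites "$allelesVCF", it would list pairs such as 20:60807 and 20:60808; a larger second argument widens the window.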

Answers

  • Sheila, Broad Institute, Member, Broadie, Moderator, admin

    @tommycarstensen

    Hi,

    Can you post the records for the positions of interest from running HaplotypeCaller without the last 5 lines of your command? Please also leave out all intervals other than 20:60000-70000.

    Please also post the records for positions 20:60000-70000 when you run the command you posted above.

    Thank you!

    -Sheila

  • tommycarstensen, United Kingdom, Member ✭✭✭

    Thanks @Sheila!!

    Here are the records without the last 5 lines (i.e. in the default DISCOVERY mode and specifying only the interval 20:60000-70000):

    20  63799   rs1418258   C   T   82.28   .   AC=2;AF=1.00;AN=2;DB;DP=3;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=27.43;SOR=1.179 GT:AD:GQ:PL 1/1:0,3:9:110,9,0
    20  66370   rs6054257   G   A   210.80  .   AC=2;AF=1.00;AN=2;DB;DP=7;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=30.11;SOR=0.941 GT:AD:GQ:PL 1/1:0,7:21:239,21,0
    20  68749   rs6086616   T   C   33.77   .   AC=1;AF=0.500;AN=2;BaseQRankSum=1.026;DB;DP=4;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=55.16;MQ0=0;MQRankSum=-1.026;QD=8.44;ReadPosRankSum=-1.026;SOR=0.105  GT:AD:GQ:PL 0/1:2,2:57:62,0,57
    20  69094   rs6039403   G   A   121.77  .   AC=1;AF=0.500;AN=2;BaseQRankSum=0.198;DB;DP=8;FS=3.680;MLEAC=1;MLEAF=0.500;MQ=57.63;MQ0=0;MQRankSum=-0.198;QD=15.22;ReadPosRankSum=-0.922;SOR=0.636 GT:AD:GQ:PL 0/1:2,5:52:150,0,52
    

    Here are the records when running the command above (i.e. in GENOTYPE_GIVEN_ALLELES mode):

    20  60359   .   G   T   0   .   AC=0;AF=0.00;AN=2;DP=5;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=51.01;MQ0=0;SOR=0.693 GT:AD:GQ:PL 0/0:4,0:12:0,12,144
    20  60454   .   T   C   0   .   AC=0;AF=0.00;AN=2;DP=8;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=57.63;MQ0=0;SOR=0.368 GT:AD:GQ:PL 0/0:8,0:24:0,24,296
    20  60473   .   C   A   0   .   AC=0;AF=0.00;AN=2;DP=9;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=57.90;MQ0=0;SOR=0.283 GT:AD:GQ:PL 0/0:9,0:27:0,27,322
    20  60479   rs149529999 C   T   0   .   AC=0;AF=0.00;AN=2;DB;DP=11;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=58.29;MQ0=0;SOR=0.551 GT:AD:GQ:PL 0/0:11,0:33:0,33,351
    20  60522   rs150241001 T   TC  0   .   AC=0;AF=0.00;AN=2;DB;DP=9;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=57.90;MQ0=0;SOR=0.368  GT:AD:GQ:PL 0/0:8,0:24:0,24,229
    20  60571   rs116145529 C   A   0   .   AC=0;AF=0.00;AN=2;DB;DP=7;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.495  GT:AD:GQ:PL 0/0:7,0:21:0,21,227
    20  60826   .   A   G   0   .   AC=0;AF=0.00;AN=2;DP=5;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.446 GT:AD:GQ:PL 0/0:5,0:15:0,15,195
    20  61044   .   C   A   0   .   AC=0;AF=0.00;AN=2;DP=7;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.078 GT:AD:GQ:PL 0/0:7,0:21:0,21,217
    20  61098   rs6078030   C   T   0   .   AC=0;AF=0.00;AN=2;DB;DP=3;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.368  GT:AD:GQ:PL 0/0:3,0:9:0,9,90
    20  61106   .   C   T   0   .   AC=0;AF=0.00;AN=2;DP=2;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.693 GT:AD:GQ:PL 0/0:2,0:6:0,6,49
    20  61121   .   T   C   0   .   AC=0;AF=0.00;AN=2;DP=2;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.105 GT:AD:GQ:PL 0/0:2,0:6:0,6,63
    20  61138   .   CTT CTTT,CT,CTTTT,C 0   .   AC=0,0,0,0;AF=0.00,0.00,0.00,0.00;AN=2;DP=2;FS=0.000;MLEAC=0,0,0,0;MLEAF=0.00,0.00,0.00,0.00;MQ=60.00;MQ0=0;SOR=0.105   GT:AD:GQ:PL 0/0:2,0,0,0,0:6:0,6,2147483647,6,2147483647,2147483647,6,2147483647,2147483647,2147483647,6,76,76,76,70
    20  61388   rs146681064 T   C   0   .   AC=0;AF=0.00;AN=2;DB;DP=5;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=55.21;MQ0=0;SOR=0.446  GT:AD:GQ:PL 0/0:5,0:15:0,15,148
    20  61651   rs76553454  C   A   0   .   AC=0;AF=0.00;AN=2;DB;DP=2;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.105  GT:AD:GQ:PL 0/0:2,0:6:0,6,78
    20  61724   rs142532139 A   C   0   .   AC=0;AF=0.00;AN=2;DB;DP=4;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.223  GT:AD:GQ:PL 0/0:4,0:12:0,12,150
    20  61725   .   A   G   0   .   AC=0;AF=0.00;AN=2;DP=4;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.223 GT:AD:GQ:PL 0/0:4,0:12:0,12,142
    20  61790   .   G   A   0   .   AC=0;AF=0.00;AN=2;DP=2;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.693 GT:AD:GQ:PL 0/0:2,0:6:0,6,82
    20  61795   rs4814683   G   T   3.98    .   AC=1;AF=0.500;AN=2;BaseQRankSum=0.736;DB;DP=3;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=0.736;QD=1.33;ReadPosRankSum=0.736;SOR=1.609    GT:AD:GQ:PL 0/1:2,1:30:30,0,66
    20  61926   .   G   T   0   .   AC=0;AF=0.00;AN=2;DP=9;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.283 GT:AD:GQ:PL 0/0:9,0:27:0,27,283
    20  61952   .   G   T   0   .   AC=0;AF=0.00;AN=2;DP=8;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.169 GT:AD:GQ:PL 0/0:8,0:24:0,24,313
    20  62100   rs6047235   T   C   0   .   AC=0;AF=0.00;AN=2;DB;DP=4;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.223  GT:AD:GQ:PL 0/0:4,0:12:0,12,154
    20  62154   .   G   A   0   .   AC=0;AF=0.00;AN=2;DP=4;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.693 GT:AD:GQ:PL 0/0:4,0:12:0,12,152
    20  62277   .   A   G   0   .   AC=0;AF=0.00;AN=2;DP=2;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.693 GT:AD:GQ:PL 0/0:2,0:6:0,6,77
    20  62338   .   T   C   0   .   AC=0;AF=0.00;AN=2;DP=3;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.368 GT:AD:GQ:PL 0/0:3,0:9:0,9,116
    20  62478   rs192812899 A   G   0   .   AC=0;AF=0.00;AN=2;DB;DP=4;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.223  GT:AD:GQ:PL 0/0:4,0:12:0,12,144
    20  62545   rs150267191 C   G   0   .   AC=0;AF=0.00;AN=2;DB;DP=2;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.693  GT:AD:GQ:PL 0/0:2,0:6:0,6,80
    20  62553   rs114190700 T   C   0   .   AC=0;AF=0.00;AN=2;DB;DP=2;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.693  GT:AD:GQ:PL 0/0:2,0:6:0,6,77
    20  62568   .   T   C   0   .   AC=0;AF=0.00;AN=2;DP=2;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.693 GT:AD:GQ:PL 0/0:2,0:6:0,6,78
    20  62731   rs34147676  C   A   0   .   AC=0;AF=0.00;AN=2;DB;DP=2;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.105  GT:AD:GQ:PL 0/0:2,0:6:0,6,56
    20  62739   .   T   C   0   .   AC=0;AF=0.00;AN=2;DP=2;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.105 GT:AD:GQ:PL 0/0:2,0:6:0,6,69
    20  62783   rs189195684 A   G   0   .   AC=0;AF=0.00;AN=2;DB;DP=7;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.495  GT:AD:GQ:PL 0/0:7,0:21:0,21,232
    20  62813   .   T   C   0   .   AC=0;AF=0.00;AN=2;DP=6;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.693 GT:AD:GQ:PL 0/0:6,0:18:0,18,208
    20  62946   rs183567118 T   A   0   .   AC=0;AF=0.00;AN=2;DB;DP=7;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.495  GT:AD:GQ:PL 0/0:7,0:21:0,21,245
    20  62975   .   TA  T,TAA   0   .   AC=0,0;AF=0.00,0.00;AN=2;DP=6;FS=0.000;MLEAC=0,0;MLEAF=0.00,0.00;MQ=60.00;MQ0=0;SOR=0.148   GT:AD:GQ:PL 0/0:5,0,0:15:0,15,126,24,150,2147483647
    20  63054   rs116457849 A   G   0   .   AC=0;AF=0.00;AN=2;DB;DP=6;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.307  GT:AD:GQ:PL 0/0:6,0:18:0,18,199
    20  63231   rs6076506   T   G   0   .   AC=0;AF=0.00;AN=2;DB;DP=3;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.368  GT:AD:GQ:PL 0/0:3,0:9:0,9,107
    20  63232   .   C   T   0   .   AC=0;AF=0.00;AN=2;DP=3;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.368 GT:AD:GQ:PL 0/0:3,0:9:0,9,108
    20  63244   rs6139074   A   C   0   .   AC=0;AF=0.00;AN=2;DB;DP=3;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.368  GT:AD:GQ:PL 0/0:3,0:9:0,9,113
    20  63257   .   ATTT    A   0   .   AC=0;AF=0.00;AN=2;DP=4;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.368 GT:AD:GQ:PL 0/0:3,0:9:0,9,147
    20  63360   rs186156309 C   T   0   .   AC=0;AF=0.00;AN=2;DB;DP=2;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.693  GT:AD:GQ:PL 0/0:2,0:6:0,6,71
    20  63426   rs147063585 G   T   0   .   AC=0;AF=0.00;AN=2;DB;DP=4;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.693  GT:AD:GQ:PL 0/0:4,0:12:0,12,146
    20  63452   rs115017123 C   G   0   .   AC=0;AF=0.00;AN=2;DB;DP=3;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.368  GT:AD:GQ:PL 0/0:3,0:9:0,9,120
    20  63459   .   C   T   0   .   AC=0;AF=0.00;AN=2;DP=3;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.368 GT:AD:GQ:PL 0/0:3,0:9:0,9,105
    20  63468   .   G   A   0   .   AC=0;AF=0.00;AN=2;DP=2;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.693 GT:AD:GQ:PL 0/0:2,0:6:0,6,79
    20  63477   .   G   A   0   .   AC=0;AF=0.00;AN=2;DP=1;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.223 GT:AD:GQ:PL 0/0:1,0:3:0,3,40
    20  63666   .   C   G   0   .   AC=0;AF=0.00;AN=2;DP=6;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.693 GT:AD:GQ:PL 0/0:6,0:18:0,18,225
    20  63733   rs75670495  C   T   0   .   AC=0;AF=0.00;AN=2;DB;DP=4;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.693  GT:AD:GQ:PL 0/0:4,0:12:0,12,143
    20  63799   rs1418258   C   T   82.28   .   AC=2;AF=1.00;AN=2;DB;DP=3;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=27.43;SOR=1.179 GT:AD:GQ:PL 1/1:0,3:9:110,9,0
    20  63805   .   T   G   0   .   AC=0;AF=0.00;AN=2;DP=4;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.223 GT:AD:GQ:PL 0/0:4,0:12:0,12,174
    20  63808   rs76004960  G   C   0   .   AC=0;AF=0.00;AN=2;DB;DP=6;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.105  GT:AD:GQ:PL 0/0:6,0:18:0,18,248
    20  63897   .   G   A   0   .   AC=0;AF=0.00;AN=2;DP=9;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=55.72;MQ0=0;SOR=0.061 GT:AD:GQ:PL 0/0:8,0:24:0,24,304
    20  63967   rs116770801 A   G   0   .   AC=0;AF=0.00;AN=2;DB;DP=9;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=57.39;MQ0=0;SOR=0.527  GT:AD:GQ:PL 0/0:9,0:27:0,27,300
    20  63971   .   T   C   0   .   AC=0;AF=0.00;AN=2;DP=10;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=57.66;MQ0=0;SOR=0.693    GT:AD:GQ:PL 0/0:10,0:30:0,30,356
    20  64016   rs143263863 G   A   0   .   AC=0;AF=0.00;AN=2;DB;DP=13;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.223 GT:AD:GQ:PL 0/0:13,0:39:0,39,477
    20  64062   rs148297240 G   A   0   .   AC=0;AF=0.00;AN=2;DB;DP=8;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=57.05;MQ0=0;SOR=0.368  GT:AD:GQ:PL 0/0:8,0:24:0,24,290
    20  64139   rs186497980 G   T   0   .   AC=0;AF=0.00;AN=2;DB;DP=12;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=58.05;MQ0=0;SOR=0.446 GT:AD:GQ:PL 0/0:12,0:36:0,36,354
    20  64150   rs7274499   C   A   0   .   AC=0;AF=0.00;AN=2;DB;DP=13;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=58.20;MQ0=0;SOR=0.368 GT:AD:GQ:PL 0/0:13,0:39:0,39,436
    20  64223   .   AT  ATT,A   0   .   AC=0,0;AF=0.00,0.00;AN=2;DP=8;FS=0.000;MLEAC=0,0;MLEAF=0.00,0.00;MQ=60.00;MQ0=0;SOR=0.446   GT:AD:GQ:PL 0/0:5,0,0:15:0,24,2147483647,15,137,113
    20  64277   rs186750895 A   G   0   .   AC=0;AF=0.00;AN=2;DB;DP=6;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.105  GT:AD:GQ:PL 0/0:6,0:18:0,18,221
    20  64323   rs146016700 C   T   0   .   AC=0;AF=0.00;AN=2;DB;DP=4;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.223  GT:AD:GQ:PL 0/0:4,0:12:0,12,138
    20  64346   .   G   A   0   .   AC=0;AF=0.00;AN=2;DP=3;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.368 GT:AD:GQ:PL 0/0:3,0:9:0,9,119
    20  64382   .   G   A   0   .   AC=0;AF=0.00;AN=2;DP=1;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.223 GT:AD:GQ:PL 0/0:1,0:3:0,3,37
    20  64539   .   A   T   0   .   AC=0;AF=0.00;AN=2;DP=8;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=53.95;MQ0=0;SOR=0.169 GT:AD:GQ:PL 0/0:8,0:24:0,24,302
    20  65028   .   C   T   0   .   AC=0;AF=0.00;AN=2;DP=3;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.368 GT:AD:GQ:PL 0/0:3,0:9:0,9,108
    20  65130   .   G   GA  0   .   AC=0;AF=0.00;AN=2;DP=2;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.105 GT:AD:GQ:PL 0/0:2,0:6:0,6,54
    20  65161   .   T   A   0   .   AC=0;AF=0.00;AN=2;DP=4;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.693 GT:AD:GQ:PL 0/0:4,0:12:0,12,144
    20  65168   .   T   C   0   .   AC=0;AF=0.00;AN=2;DP=4;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.693 GT:AD:GQ:PL 0/0:4,0:12:0,12,148
    20  65201   .   C   T   0   .   AC=0;AF=0.00;AN=2;DP=4;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.223 GT:AD:GQ:PL 0/0:4,0:12:0,12,120
    20  65284   rs373019376 T   A   0   .   AC=0;AF=0.00;AN=2;DB;DP=6;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.307  GT:AD:GQ:PL 0/0:6,0:18:0,18,199
    20  65288   rs13043000  G   T   0   .   AC=0;AF=0.00;AN=2;DB;DP=6;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.307  GT:AD:GQ:PL 0/0:6,0:18:0,18,169
    20  65335   rs115575112 T   G   0   .   AC=0;AF=0.00;AN=2;DB;DP=3;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.061  GT:AD:GQ:PL 0/0:3,0:9:0,9,104
    20  65468   .   C   A   0   .   AC=0;AF=0.00;AN=2;DP=4;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.223 GT:AD:GQ:PL 0/0:4,0:12:0,12,153
    20  65619   .   T   C   0   .   AC=0;AF=0.00;AN=2;DP=5;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.446 GT:AD:GQ:PL 0/0:5,0:15:0,15,178
    20  65900   rs6053810   G   A   13.72   .   AC=2;AF=1.00;AN=2;DB;DP=1;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=13.72;SOR=1.609 GT:AD:GQ:PL 1/1:0,1:3:40,3,0
    20  66094   rs141766135 G   A   0   .   AC=0;AF=0.00;AN=2;DB;DP=8;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.368  GT:AD:GQ:PL 0/0:8,0:24:0,24,247
    20  66191   .   G   A   0   .   AC=0;AF=0.00;AN=2;DP=8;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.368 GT:AD:GQ:PL 0/0:8,0:24:0,24,314
    20  66248   rs187878174 C   T   0   .   AC=0;AF=0.00;AN=2;DB;DP=4;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.693  GT:AD:GQ:PL 0/0:4,0:12:0,12,151
    20  66330   .   A   G   0   .   AC=0;AF=0.00;AN=2;DP=6;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.105 GT:AD:GQ:PL 0/0:6,0:18:0,18,207
    20  66370   rs6054257   G   A   210.80  .   AC=2;AF=1.00;AN=2;DB;DP=7;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=30.11;SOR=0.941 GT:AD:GQ:PL 1/1:0,7:21:239,21,0
    20  66476   .   A   G   0   .   AC=0;AF=0.00;AN=2;DP=12;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.148    GT:AD:GQ:PL 0/0:12,0:36:0,36,436
    20  66662   .   TGAAAGACAGG T   0   .   AC=0;AF=0.00;AN=2;DP=8;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=57.05;MQ0=0;SOR=0.693 GT:AD:GQ:PL 0/0:8,0:24:0,24,938
    20  66705   rs145589984 C   T   0   .   AC=0;AF=0.00;AN=2;DB;DP=5;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.446  GT:AD:GQ:PL 0/0:5,0:15:0,15,179
    20  66745   .   C   T   0   .   AC=0;AF=0.00;AN=2;DP=4;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.693 GT:AD:GQ:PL 0/0:4,0:12:0,12,142
    20  66911   .   T   C   0   .   AC=0;AF=0.00;AN=2;DP=6;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.693 GT:AD:GQ:PL 0/0:6,0:18:0,18,216
    20  67461   rs150209226 A   G   0   .   AC=0;AF=0.00;AN=2;DB;DP=3;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.368  GT:AD:GQ:PL 0/0:3,0:9:0,9,89
    20  67474   .   A   AT,T    0   .   AC=0,0;AF=0.00,0.00;AN=2;DP=3;FS=0.000;MLEAC=0,0;MLEAF=0.00,0.00;MQ=60.00;MQ0=0;SOR=0.368   GT:AD:GQ:PL 0/0:3,0,0:9:0,9,71,9,75,88
    20  67482   .   A   T   0   .   AC=0;AF=0.00;AN=2;DP=3;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.368 GT:AD:GQ:PL 0/0:3,0:9:0,9,86
    20  67500   rs112142516 T   TTGGTATCTAG 48.70   .   AC=2;AF=1.00;AN=2;DB;DP=1;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=30.20;SOR=1.609 GT:AD:GQ:PL 1/1:0,1:6:85,6,0
    20  67506   .   T   TCTAGTGG,TCTAG,TCTAGTGGTA   19.30   .   AC=0,0,2;AF=0.00,0.00,1.00;AN=2;DP=1;FS=0.000;MLEAC=0,0,1;MLEAF=0.00,0.00,0.500;MQ=60.00;MQ0=0;QD=19.30;SOR=0.693   GT:AD:GQ:PL 3/3:0,0,0,0:3:53,18,18,19,21,45,6,3,3,0
    20  67624   rs200619864 T   TG  0   .   AC=0;AF=0.00;AN=2;DB;DP=3;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.368  GT:AD:GQ:PL 0/0:3,0:9:0,9,88
    20  67641   rs76107804  C   T   0   .   AC=0;AF=0.00;AN=2;DB;DP=2;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.693  GT:AD:GQ:PL 0/0:2,0:6:0,6,80
    20  67644   rs138724446 T   C   0   .   AC=0;AF=0.00;AN=2;DB;DP=2;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.693  GT:AD:GQ:PL 0/0:2,0:6:0,6,74
    20  67650   .   T   G   0   .   AC=0;AF=0.00;AN=2;DP=2;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.693 GT:AD:GQ:PL 0/0:2,0:6:0,6,80
    20  67740   .   G   T   0   .   AC=0;AF=0.00;AN=2;DP=5;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=56.16;MQ0=0;SOR=0.693 GT:AD:GQ:PL 0/0:4,0:12:0,12,146
    20  67765   rs112130015 C   T   0   .   AC=0;AF=0.00;AN=2;DB;DP=5;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=52.03;MQ0=0;SOR=0.368  GT:AD:GQ:PL 0/0:3,0:9:0,9,113
    20  67773   .   A   C   0   .   AC=0;AF=0.00;AN=2;DP=5;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=52.03;MQ0=0;SOR=0.368 GT:AD:GQ:PL 0/0:3,0:9:0,9,110
    20  67788   .   A   G   0   .   AC=0;AF=0.00;AN=2;DP=4;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=49.84;MQ0=0;SOR=0.105 GT:AD:GQ:PL 0/0:2,0:6:0,6,74
    20  67960   .   G   A   0   .   AC=0;AF=0.00;AN=2;DP=7;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=56.62;MQ0=0;SOR=0.495 GT:AD:GQ:PL 0/0:7,0:21:0,21,275
    20  68182   rs144901224 A   T   0   .   AC=0;AF=0.00;AN=2;DB;DP=6;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.693  GT:AD:GQ:PL 0/0:6,0:18:0,18,207
    20  68204   .   T   G   0   .   AC=0;AF=0.00;AN=2;DP=6;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.693 GT:AD:GQ:PL 0/0:6,0:18:0,18,200
    20  68264   rs60878529  T   C   0   .   AC=0;AF=0.00;AN=2;DB;DP=6;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.693  GT:AD:GQ:PL 0/0:6,0:18:0,18,227
    20  68422   .   A   G   0   .   AC=0;AF=0.00;AN=2;DP=3;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.061 GT:AD:GQ:PL 0/0:3,0:9:0,9,110
    20  68535   rs141828970 G   A   0   .   AC=0;AF=0.00;AN=2;DB;DP=7;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.078  GT:AD:GQ:PL 0/0:7,0:21:0,21,259
    20  68660   rs146256770 C   A   0   .   AC=0;AF=0.00;AN=2;DB;DP=6;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.307  GT:AD:GQ:PL 0/0:6,0:18:0,18,196
    20  68667   rs148031635 T   G   0   .   AC=0;AF=0.00;AN=2;DB;DP=6;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.307  GT:AD:GQ:PL 0/0:6,0:18:0,18,208
    20  68810   .   G   A   0   .   AC=0;AF=0.00;AN=2;DP=2;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.105 GT:AD:GQ:PL 0/0:2,0:6:0,6,71
    20  69065   .   G   A   0   .   AC=0;AF=0.00;AN=2;DP=8;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=57.63;MQ0=0;SOR=0.223 GT:AD:GQ:PL 0/0:7,0:21:0,21,270
    20  69094   rs6039403   G   A   121.77  .   AC=1;AF=0.500;AN=2;BaseQRankSum=0.198;DB;DP=8;FS=3.680;MLEAC=1;MLEAF=0.500;MQ=57.63;MQ0=0;MQRankSum=0.198;QD=15.22;ReadPosRankSum=-0.922;SOR=0.636  GT:AD:GQ:PL 0/1:2,5:52:150,0,52
    20  69135   .   CA  C   0   .   AC=0;AF=0.00;AN=2;DP=9;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=57.90;MQ0=0;SOR=0.693 GT:AD:GQ:PL 0/0:8,0:24:0,24,298
    20  69149   .   C   T   0   .   AC=0;AF=0.00;AN=2;DP=9;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=57.90;MQ0=0;SOR=0.693 GT:AD:GQ:PL 0/0:8,0:24:0,24,281
    20  69160   .   C   T   0   .   AC=0;AF=0.00;AN=2;DP=6;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.693 GT:AD:GQ:PL 0/0:6,0:18:0,18,195
    20  69275   .   C   G   0   .   AC=0;AF=0.00;AN=2;DP=1;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.223 GT:AD:GQ:PL 0/0:1,0:3:0,3,37
    20  69305   .   T   C   0   .   DP=0;FS=0.000;MLEAC=0;MLEAF=NaN;MQ=0.00;MQ0=0;SOR=0.693 GT  ./.
    20  69311   .   A   G   0   .   DP=0;FS=0.000;MLEAC=0;MLEAF=NaN;MQ=0.00;MQ0=0;SOR=0.693 GT  ./.
    20  69408   rs17685809  C   T   0   .   AC=0;AF=0.00;AN=2;DB;DP=3;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.368  GT:AD:GQ:PL 0/0:3,0:9:0,9,106
    20  69481   rs11477748  CT  C   0   .   AC=0;AF=0.00;AN=2;DB;DP=2;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.105  GT:AD:GQ:PL 0/0:2,0:6:0,6,60
    20  69506   .   GACACAC GACACACACAC,GACACACAC,GACACACACACAC,GACACACACACACAC,GACAC,G 0.01    .   AC=0,0,0,0,0,0;AF=0.00,0.00,0.00,0.00,0.00,0.00;AN=2;DP=1;FS=0.000;MLEAC=0,0,0,0,0,0;MLEAF=0.00,0.00,0.00,0.00,0.00,0.00;MQ=60.00;MQ0=0;SOR=0.693   GT:AD:GQ:PL 0/0:0,0,0,0,0,0,0:0:0,3,2147483647,3,2147483647,2147483647,3,2147483647,2147483647,2147483647,3,2147483647,2147483647,2147483647,2147483647,3,2147483647,2147483647,2147483647,2147483647,2147483647,0,3,3,3,3,3,0
    20  69536   .   G   C   0   .   AC=0;AF=0.00;AN=2;DP=2;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.105 GT:AD:GQ:PL 0/0:2,0:6:0,6,60
    20  69537   .   T   A,TCACA 0   .   AC=0,0;AF=0.00,0.00;AN=2;DP=2;FS=0.000;MLEAC=0,0;MLEAF=0.00,0.00;MQ=60.00;MQ0=0;SOR=0.105   GT:AD:GQ:PL 0/0:2,0,0:6:0,6,73,6,78,111
    20  69540   .   A   C,G 0   .   AC=0,0;AF=0.00,0.00;AN=2;DP=2;FS=0.000;MLEAC=0,0;MLEAF=0.00,0.00;MQ=60.00;MQ0=0;SOR=0.105   GT:AD:GQ:PL 0/0:2,0,0:6:0,6,73,6,73,73
    20  69765   .   T   C   0   .   AC=0;AF=0.00;AN=2;DP=3;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.368 GT:AD:GQ:PL 0/0:3,0:9:0,9,84
    20  69987   .   G   A   0   .   AC=0;AF=0.00;AN=2;DP=4;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;MQ0=0;SOR=0.223 GT:AD:GQ:PL 0/0:4,0:12:0,12,154
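
To see which sites from $allelesVCF are absent from output like the above, the CHROM and POS columns of the two files can be compared. This is a minimal sketch, assuming both files are uncompressed VCFs; the function name missing_sites is hypothetical and not part of GATK.

```shell
# Hypothetical helper: print CHROM:POS for every record in the alleles VCF
# that has no record at the same position in the output VCF.
# $1: alleles VCF (the sites requested); $2: output VCF (the sites emitted)
missing_sites() {
    awk '
        /^#/ { next }                                  # skip VCF header lines
        FILENAME == ARGV[1] { seen[$1 ":" $2] = 1; next }  # first file: emitted sites
        !(($1 ":" $2) in seen) { print $1 ":" $2 }     # second file: requested sites
    ' "$2" "$1"
}
```

Called as missing_sites "$allelesVCF" "$out", it prints one CHROM:POS per line. For the records in this thread, 20:60807 and 20:60808 would be among the sites reported, since both are in $allelesVCF but not in the output above.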
    

    Here are columns 1-8 of $allelesVCF (only 20:60000-65000, because of the forum's character limit):

    20  60359   .   G   T   18.61   .   AC=1;AF=2.960e-04;AN=3378;BaseQRankSum=-9.880e-01;DP=4551;FS=0.000;GQ_MEAN=7.52;GQ_STDDEV=4.64;InbreedingCoeff=-0.0964;MLEAC=1;MLEAF=2.960e-04;MQ=57.28;MQ0=0;MQRankSum=-9.880e-01;NCC=298;QD=3.10;ReadPosRankSum=0.406
    20  60454   .   T   C   50.52   .   AC=4;AF=1.115e-03;AN=3586;BaseQRankSum=0.736;DP=5714;FS=0.000;GQ_MEAN=8.77;GQ_STDDEV=5.60;InbreedingCoeff=-0.0822;MLEAC=3;MLEAF=8.366e-04;MQ=51.77;MQ0=0;MQRankSum=0.736;NCC=194;QD=5.05;ReadPosRankSum=-3.580e-01
    20  60473   .   C   A   16.25   .   AC=1;AF=2.795e-04;AN=3578;BaseQRankSum=-3.580e-01;DP=5730;FS=0.000;GQ_MEAN=8.49;GQ_STDDEV=5.47;InbreedingCoeff=-0.0911;MLEAC=1;MLEAF=2.795e-04;MQ=49.96;MQ0=0;MQRankSum=1.23;NCC=198;QD=3.25;ReadPosRankSum=-1.231e+00
    20  60479   .   C   T   2058.24 .   AC=28;AF=7.856e-03;AN=3564;BaseQRankSum=-7.360e-01;DP=5820;FS=8.913;GQ_MEAN=9.14;GQ_STDDEV=8.78;InbreedingCoeff=-0.0855;MLEAC=33;MLEAF=9.259e-03;MQ=56.62;MQ0=0;MQRankSum=-6.700e-02;NCC=205;QD=13.91;ReadPosRankSum=0.358
    20  60522   .   T   TC  7289.13 .   AC=149;AF=0.042;AN=3516;BaseQRankSum=0.198;DP=5969;FS=1.767;GQ_MEAN=10.33;GQ_STDDEV=11.56;InbreedingCoeff=-0.0488;MLEAC=171;MLEAF=0.049;MQ=60.00;MQ0=0;MQRankSum=0.727;NCC=229;QD=11.28;ReadPosRankSum=0.358
    20  60571   .   C   A   2897.36 .   AC=50;AF=0.014;AN=3544;BaseQRankSum=0.731;DP=5758;FS=6.421;GQ_MEAN=9.45;GQ_STDDEV=9.48;InbreedingCoeff=-0.0743;MLEAC=57;MLEAF=0.016;MQ=60.00;MQ0=0;MQRankSum=0.358;NCC=215;QD=12.93;ReadPosRankSum=0.358
    20  60765   .   CTGTT   C   214.48  .   AC=4;AF=1.089e-03;AN=3672;BaseQRankSum=0.00;DP=6158;FS=0.000;GQ_MEAN=8.98;GQ_STDDEV=6.85;InbreedingCoeff=-0.0816;MLEAC=3;MLEAF=8.170e-04;MQ=60.00;MQ0=0;MQRankSum=-5.500e-01;NCC=151;QD=12.37;ReadPosRankSum=0.00
    20  60807   .   C   T   118.96  .   AC=1;AF=2.737e-04;AN=3654;BaseQRankSum=0.922;DP=6006;FS=0.000;GQ_MEAN=8.78;GQ_STDDEV=5.67;InbreedingCoeff=-0.0870;MLEAC=1;MLEAF=2.737e-04;MQ=60.00;MQ0=0;MQRankSum=0.198;NCC=160;QD=16.99;ReadPosRankSum=-9.220e-01
    20  60808   .   G   A   46.95   .   AC=1;AF=2.738e-04;AN=3652;BaseQRankSum=1.75;DP=5986;FS=3.136;GQ_MEAN=8.78;GQ_STDDEV=5.92;InbreedingCoeff=-0.0870;MLEAC=1;MLEAF=2.738e-04;MQ=60.00;MQ0=0;MQRankSum=0.00;NCC=161;QD=6.71;ReadPosRankSum=1.30
    20  60826   .   A   G   108.40  .   AC=2;AF=5.491e-04;AN=3642;BaseQRankSum=-7.270e-01;DP=5967;FS=2.276;GQ_MEAN=8.81;GQ_STDDEV=6.06;InbreedingCoeff=-0.0849;MLEAC=2;MLEAF=5.491e-04;MQ=60.00;MQ0=0;MQRankSum=0.727;NCC=166;QD=8.34;ReadPosRankSum=0.727
    20  61044   .   C   A   434.36  .   AC=6;AF=1.682e-03;AN=3568;BaseQRankSum=0.198;DP=5663;FS=6.273;GQ_MEAN=8.61;GQ_STDDEV=5.89;InbreedingCoeff=-0.0886;MLEAC=7;MLEAF=1.962e-03;MQ=60.00;MQ0=0;MQRankSum=0.727;NCC=203;QD=14.98;ReadPosRankSum=0.922
    20  61098   .   C   T   24721.57    .   AC=349;AF=0.101;AN=3448;BaseQRankSum=-7.270e-01;DP=6682;FS=1.866;GQ_MEAN=13.83;GQ_STDDEV=18.79;InbreedingCoeff=0.0303;MLEAC=417;MLEAF=0.121;MQ=60.00;MQ0=0;MQRankSum=0.358;NCC=263;QD=17.62;ReadPosRankSum=0.573
    20  61106   .   C   T   48.02   .   AC=1;AF=2.831e-04;AN=3532;BaseQRankSum=0.727;DP=6323;FS=0.000;GQ_MEAN=8.91;GQ_STDDEV=5.39;InbreedingCoeff=-0.0796;MLEAC=1;MLEAF=2.831e-04;MQ=60.00;MQ0=0;MQRankSum=0.727;NCC=221;QD=12.01;ReadPosRankSum=-7.270e-01
    20  61121   .   T   C   208.04  .   AC=3;AF=8.701e-04;AN=3448;BaseQRankSum=0.727;DP=6326;FS=1.922;GQ_MEAN=7.56;GQ_STDDEV=6.09;InbreedingCoeff=-0.0964;MLEAC=4;MLEAF=1.160e-03;MQ=60.00;MQ0=0;MQRankSum=1.30;NCC=263;QD=10.40;ReadPosRankSum=-3.610e-01
    20  61138   .   CTT CTTT,CT,CTTTT,C 2726.13 .   AC=101,49,1,1;AF=0.031,0.015,3.028e-04,3.028e-04;AN=3302;BaseQRankSum=0.696;DP=7056;FS=1.738;GQ_MEAN=5.44;GQ_STDDEV=7.09;InbreedingCoeff=0.3816;MLEAC=98,47,0,1;MLEAF=0.030,0.014,0.00,3.028e-04;MQ=60.00;MQ0=0;MQRankSum=0.727;NCC=336;QD=4.77;ReadPosRankSum=0.727
    20  61270   .   A   C   2870.18 .   AC=56;AF=0.017;AN=3300;BaseQRankSum=0.736;DP=5772;FS=0.641;GQ_MEAN=8.05;GQ_STDDEV=6.77;InbreedingCoeff=-0.0432;MLEAC=61;MLEAF=0.018;MQ=57.08;MQ0=0;MQRankSum=0.727;NCC=337;QD=17.50;ReadPosRankSum=0.736
    20  61271   .   T   A   233.60  .   AC=5;AF=1.517e-03;AN=3296;BaseQRankSum=0.922;DP=5766;FS=0.000;GQ_MEAN=7.22;GQ_STDDEV=5.54;InbreedingCoeff=-0.0947;MLEAC=5;MLEAF=1.517e-03;MQ=60.00;MQ0=0;MQRankSum=-1.980e-01;NCC=339;QD=12.29;ReadPosRankSum=0.358
    20  61272   .   C   A   235.31  .   AC=5;AF=1.555e-03;AN=3216;BaseQRankSum=-1.980e-01;DP=5681;FS=0.000;GQ_MEAN=6.76;GQ_STDDEV=5.61;InbreedingCoeff=-0.1004;MLEAC=6;MLEAF=1.866e-03;MQ=60.00;MQ0=0;MQRankSum=-1.231e+00;NCC=379;QD=12.38;ReadPosRankSum=-3.580e-01
    20  61279   .   C   T   9494.29 .   AC=154;AF=0.049;AN=3144;BaseQRankSum=0.736;DP=5673;FS=30.800;GQ_MEAN=8.07;GQ_STDDEV=9.14;InbreedingCoeff=0.0418;MLEAC=176;MLEAF=0.056;MQ=60.00;MQ0=0;MQRankSum=0.720;NCC=415;QD=21.01;ReadPosRankSum=0.727
    20  61375   .   C   A   129.02  .   AC=3;AF=8.301e-04;AN=3614;BaseQRankSum=0.322;DP=5538;FS=1.719;GQ_MEAN=8.33;GQ_STDDEV=5.97;InbreedingCoeff=-0.0903;MLEAC=3;MLEAF=8.301e-04;MQ=60.00;MQ0=0;MQRankSum=1.03;NCC=180;QD=6.14;ReadPosRankSum=0.322
    20  61388   .   T   C   85.51   .   AC=2;AF=5.510e-04;AN=3630;BaseQRankSum=0.736;DP=5533;FS=2.808;GQ_MEAN=8.42;GQ_STDDEV=5.66;InbreedingCoeff=-0.0891;MLEAC=2;MLEAF=5.510e-04;MQ=60.00;MQ0=0;MQRankSum=0.736;NCC=172;QD=9.50;ReadPosRankSum=0.736
    20  61651   .   C   A   3858.78 .   AC=58;AF=0.016;AN=3688;BaseQRankSum=-7.270e-01;DP=6168;FS=3.892;GQ_MEAN=10.19;GQ_STDDEV=10.85;InbreedingCoeff=-0.0660;MLEAC=65;MLEAF=0.018;MQ=57.63;MQ0=0;MQRankSum=0.358;NCC=143;QD=13.21;ReadPosRankSum=0.720
    20  61724   .   A   C   3461.24 .   AC=44;AF=0.012;AN=3718;BaseQRankSum=0.361;DP=6425;FS=16.076;GQ_MEAN=10.12;GQ_STDDEV=10.85;InbreedingCoeff=-0.0673;MLEAC=49;MLEAF=0.013;MQ=60.00;MQ0=0;MQRankSum=0.358;NCC=128;QD=14.42;ReadPosRankSum=-5.720e-01
    20  61725   .   A   G   183.40  .   AC=4;AF=1.075e-03;AN=3722;BaseQRankSum=0.727;DP=6390;FS=2.062;GQ_MEAN=9.36;GQ_STDDEV=6.03;InbreedingCoeff=-0.0832;MLEAC=4;MLEAF=1.075e-03;MQ=60.00;MQ0=0;MQRankSum=0.358;NCC=126;QD=13.10;ReadPosRankSum=0.358
    20  61790   .   G   A   76.42   .   AC=1;AF=2.684e-04;AN=3726;BaseQRankSum=-9.370e-01;DP=7317;FS=8.451;GQ_MEAN=10.44;GQ_STDDEV=6.40;InbreedingCoeff=-0.0708;MLEAC=1;MLEAF=2.684e-04;MQ=49.17;MQ0=0;MQRankSum=1.30;NCC=124;QD=10.92;ReadPosRankSum=-5.500e-01
    20  61795   .   G   T   181660.40   .   AC=1933;AF=0.541;AN=3574;BaseQRankSum=0.358;DP=9271;FS=0.000;GQ_MEAN=28.11;GQ_STDDEV=32.12;InbreedingCoeff=0.1430;MLEAC=2143;MLEAF=0.600;MQ=60.00;MQ0=0;MQRankSum=0.358;NCC=200;QD=25.95;ReadPosRankSum=0.358
    20  61926   .   G   T   381.89  .   AC=8;AF=2.198e-03;AN=3640;BaseQRankSum=-9.220e-01;DP=5711;FS=5.636;GQ_MEAN=8.49;GQ_STDDEV=5.57;InbreedingCoeff=-0.0760;MLEAC=7;MLEAF=1.923e-03;MQ=60.00;MQ0=0;MQRankSum=1.03;NCC=167;QD=13.17;ReadPosRankSum=0.922
    20  61952   .   G   T   42.08   .   AC=1;AF=2.762e-04;AN=3620;BaseQRankSum=0.956;DP=5608;FS=0.000;GQ_MEAN=8.29;GQ_STDDEV=5.50;InbreedingCoeff=-0.0907;MLEAC=1;MLEAF=2.762e-04;MQ=60.00;MQ0=0;MQRankSum=-6.840e-01;NCC=177;QD=5.26;ReadPosRankSum=0.322
    20  62100   .   T   C   218.69  .   AC=4;AF=1.084e-03;AN=3690;BaseQRankSum=0.720;DP=5892;FS=3.976;GQ_MEAN=8.90;GQ_STDDEV=6.46;InbreedingCoeff=-0.0887;MLEAC=5;MLEAF=1.355e-03;MQ=56.04;MQ0=0;MQRankSum=1.38;NCC=142;QD=9.51;ReadPosRankSum=0.406
    20  62154   .   G   A   73  .   AC=2;AF=5.353e-04;AN=3736;BaseQRankSum=-3.580e-01;DP=6410;FS=2.808;GQ_MEAN=9.44;GQ_STDDEV=6.21;InbreedingCoeff=-0.0833;MLEAC=2;MLEAF=5.353e-04;MQ=60.00;MQ0=0;MQRankSum=0.727;NCC=119;QD=8.11;ReadPosRankSum=0.727
    20  62277   .   A   G   111.64  .   AC=4;AF=1.055e-03;AN=3790;BaseQRankSum=0.406;DP=6555;FS=2.463;GQ_MEAN=9.64;GQ_STDDEV=6.34;InbreedingCoeff=-0.0760;MLEAC=3;MLEAF=7.916e-04;MQ=60.00;MQ0=0;MQRankSum=0.406;NCC=92;QD=8.59;ReadPosRankSum=0.406
    20  62338   .   T   C   21.97   .   AC=1;AF=2.723e-04;AN=3672;BaseQRankSum=-3.820e-01;DP=5768;FS=2.632;GQ_MEAN=8.58;GQ_STDDEV=5.58;InbreedingCoeff=-0.0894;MLEAC=1;MLEAF=2.723e-04;MQ=60.00;MQ0=0;MQRankSum=0.00;NCC=151;QD=2.00;ReadPosRankSum=0.382
    20  62478   .   A   G   310.39  .   AC=7;AF=1.942e-03;AN=3604;BaseQRankSum=0.720;DP=5326;FS=1.680;GQ_MEAN=8.20;GQ_STDDEV=5.50;InbreedingCoeff=-0.0887;MLEAC=7;MLEAF=1.942e-03;MQ=60.00;MQ0=0;MQRankSum=0.720;NCC=185;QD=11.94;ReadPosRankSum=0.731
    20  62545   .   C   G   2319.26 .   AC=39;AF=0.011;AN=3662;BaseQRankSum=0.550;DP=5900;FS=1.241;GQ_MEAN=9.56;GQ_STDDEV=9.49;InbreedingCoeff=-0.0717;MLEAC=43;MLEAF=0.012;MQ=60.00;MQ0=0;MQRankSum=0.550;NCC=156;QD=13.97;ReadPosRankSum=0.736
    20  62553   .   T   C   1153.84 .   AC=19;AF=5.177e-03;AN=3670;BaseQRankSum=0.736;DP=5897;FS=1.891;GQ_MEAN=9.10;GQ_STDDEV=7.25;InbreedingCoeff=-0.0790;MLEAC=21;MLEAF=5.722e-03;MQ=60.00;MQ0=0;MQRankSum=0.727;NCC=152;QD=14.24;ReadPosRankSum=0.731
    20  62568   .   T   C   109.89  .   AC=1;AF=2.710e-04;AN=3690;BaseQRankSum=0.956;DP=5929;FS=0.000;GQ_MEAN=8.76;GQ_STDDEV=5.79;InbreedingCoeff=-0.0876;MLEAC=1;MLEAF=2.710e-04;MQ=57.05;MQ0=0;MQRankSum=-3.220e-01;NCC=142;QD=13.74;ReadPosRankSum=-1.517e+00
    20  62731   .   C   A   2858.35 .   AC=32;AF=8.584e-03;AN=3728;BaseQRankSum=-7.360e-01;DP=6538;FS=1.105;GQ_MEAN=10.32;GQ_STDDEV=10.74;InbreedingCoeff=-0.0813;MLEAC=38;MLEAF=0.010;MQ=60.00;MQ0=0;MQRankSum=0.358;NCC=123;QD=14.22;ReadPosRankSum=0.720
    20  62739   .   T   C   1009.92 .   AC=15;AF=4.039e-03;AN=3714;BaseQRankSum=1.03;DP=6416;FS=0.000;GQ_MEAN=9.73;GQ_STDDEV=8.64;InbreedingCoeff=-0.0746;MLEAC=16;MLEAF=4.308e-03;MQ=60.00;MQ0=0;MQRankSum=0.633;NCC=130;QD=12.78;ReadPosRankSum=0.358
    20  62783   .   A   G   49.85   .   AC=1;AF=2.698e-04;AN=3706;BaseQRankSum=0.727;DP=6040;FS=0.000;GQ_MEAN=8.70;GQ_STDDEV=5.37;InbreedingCoeff=-0.0881;MLEAC=1;MLEAF=2.698e-04;MQ=60.00;MQ0=0;MQRankSum=0.727;NCC=134;QD=12.46;ReadPosRankSum=0.727
    20  62813   .   T   C   763.24  .   AC=10;AF=2.687e-03;AN=3722;BaseQRankSum=1.92;DP=6087;FS=0.000;GQ_MEAN=8.94;GQ_STDDEV=7.26;InbreedingCoeff=-0.0848;MLEAC=11;MLEAF=2.955e-03;MQ=60.00;MQ0=0;MQRankSum=0.421;NCC=126;QD=14.97;ReadPosRankSum=0.736
    20  62946   .   T   A   209.43  .   AC=4;AF=1.076e-03;AN=3718;BaseQRankSum=0.550;DP=6703;FS=0.000;GQ_MEAN=9.32;GQ_STDDEV=6.05;InbreedingCoeff=-0.0815;MLEAC=5;MLEAF=1.345e-03;MQ=60.00;MQ0=0;MQRankSum=1.23;NCC=128;QD=11.02;ReadPosRankSum=-5.500e-01
    20  62975   .   TA  T,TAA   103.36  .   AC=18,10;AF=4.878e-03,2.710e-03;AN=3690;BaseQRankSum=0.00;DP=7060;FS=0.000;GQ_MEAN=7.93;GQ_STDDEV=6.52;InbreedingCoeff=0.0742;MLEAC=13,6;MLEAF=3.523e-03,1.626e-03;MQ=60.00;MQ0=0;MQRankSum=0.736;NCC=142;QD=0.65;ReadPosRankSum=0.00
    20  63054   .   A   G   1770.35 .   AC=19;AF=5.003e-03;AN=3798;BaseQRankSum=0.731;DP=7061;FS=0.714;GQ_MEAN=10.43;GQ_STDDEV=9.33;InbreedingCoeff=-0.0724;MLEAC=20;MLEAF=5.266e-03;MQ=60.00;MQ0=0;MQRankSum=0.421;NCC=88;QD=15.26;ReadPosRankSum=0.727
    20  63231   .   T   G   28037.76    .   AC=347;AF=0.095;AN=3652;BaseQRankSum=0.922;DP=6766;FS=0.000;GQ_MEAN=15.24;GQ_STDDEV=21.07;InbreedingCoeff=0.0068;MLEAC=395;MLEAF=0.108;MQ=60.00;MQ0=0;MQRankSum=0.296;NCC=161;QD=18.12;ReadPosRankSum=0.421
    20  63232   .   C   T   49.79   .   AC=1;AF=2.703e-04;AN=3700;BaseQRankSum=1.54;DP=6360;FS=3.979;GQ_MEAN=9.16;GQ_STDDEV=5.84;InbreedingCoeff=-0.0807;MLEAC=1;MLEAF=2.703e-04;MQ=56.04;MQ0=0;MQRankSum=0.988;NCC=137;QD=8.30;ReadPosRankSum=0.988
    20  63244   .   A   C   11995.31    .   AC=155;AF=0.042;AN=3680;BaseQRankSum=0.00;DP=6330;FS=0.529;GQ_MEAN=12.08;GQ_STDDEV=16.78;InbreedingCoeff=-0.0489;MLEAC=178;MLEAF=0.048;MQ=60.00;MQ0=0;MQRankSum=0.358;NCC=147;QD=15.80;ReadPosRankSum=0.406
    20  63257   .   ATTT    A   137.82  .   AC=1;AF=2.691e-04;AN=3716;BaseQRankSum=-5.290e-01;DP=6128;FS=2.398;GQ_MEAN=8.95;GQ_STDDEV=6.91;InbreedingCoeff=-0.0842;MLEAC=1;MLEAF=2.691e-04;MQ=53.95;MQ0=0;MQRankSum=-1.134e+00;NCC=129;QD=11.48;ReadPosRankSum=0.380
    20  63360   .   C   T   1645.11 .   AC=22;AF=6.027e-03;AN=3650;BaseQRankSum=0.550;DP=5941;FS=6.269;GQ_MEAN=9.32;GQ_STDDEV=8.81;InbreedingCoeff=-0.0818;MLEAC=25;MLEAF=6.849e-03;MQ=60.00;MQ0=0;MQRankSum=0.406;NCC=162;QD=15.09;ReadPosRankSum=0.736
    20  63426   .   G   T   2348.40 .   AC=37;AF=0.010;AN=3636;BaseQRankSum=0.731;DP=5938;FS=0.000;GQ_MEAN=9.56;GQ_STDDEV=8.68;InbreedingCoeff=-0.0670;MLEAC=40;MLEAF=0.011;MQ=60.00;MQ0=0;MQRankSum=0.358;NCC=169;QD=14.59;ReadPosRankSum=0.720
    20  63452   .   C   G   8089.77 .   AC=101;AF=0.028;AN=3640;BaseQRankSum=0.421;DP=6035;FS=1.788;GQ_MEAN=10.75;GQ_STDDEV=13.44;InbreedingCoeff=-0.0431;MLEAC=113;MLEAF=0.031;MQ=60.00;MQ0=0;MQRankSum=0.358;NCC=167;QD=17.66;ReadPosRankSum=0.727
    20  63459   .   C   T   596.50  .   AC=6;AF=1.640e-03;AN=3658;BaseQRankSum=0.747;DP=5933;FS=6.522;GQ_MEAN=8.92;GQ_STDDEV=6.46;InbreedingCoeff=-0.0878;MLEAC=7;MLEAF=1.914e-03;MQ=60.00;MQ0=0;MQRankSum=-7.470e-01;NCC=158;QD=13.56;ReadPosRankSum=0.747
    20  63468   .   G   A   15.04   .   AC=1;AF=2.735e-04;AN=3656;BaseQRankSum=-1.589e+00;DP=5867;FS=3.332;GQ_MEAN=8.75;GQ_STDDEV=5.54;InbreedingCoeff=-0.0857;MLEAC=1;MLEAF=2.735e-04;MQ=60.00;MQ0=0;MQRankSum=0.572;NCC=159;QD=1.88;ReadPosRankSum=-1.067e+00
    20  63477   .   G   A   17.93   .   AC=1;AF=2.714e-04;AN=3684;BaseQRankSum=-1.495e+00;DP=5937;FS=0.000;GQ_MEAN=8.81;GQ_STDDEV=5.67;InbreedingCoeff=-0.0859;MLEAC=1;MLEAF=2.714e-04;MQ=60.00;MQ0=0;MQRankSum=-9.220e-01;NCC=145;QD=2.56;ReadPosRankSum=-1.495e+00
    20  63666   .   C   G   225.28  .   AC=3;AF=8.174e-04;AN=3670;BaseQRankSum=0.727;DP=6064;FS=7.259;GQ_MEAN=9.05;GQ_STDDEV=6.34;InbreedingCoeff=-0.0843;MLEAC=3;MLEAF=8.174e-04;MQ=60.00;MQ0=0;MQRankSum=0.727;NCC=152;QD=12.52;ReadPosRankSum=0.406
    20  63733   .   C   T   34868.96    .   AC=421;AF=0.120;AN=3522;BaseQRankSum=0.736;DP=6592;FS=0.533;GQ_MEAN=16.33;GQ_STDDEV=23.67;InbreedingCoeff=0.0066;MLEAC=491;MLEAF=0.139;MQ=60.00;MQ0=0;MQRankSum=0.358;NCC=226;QD=18.48;ReadPosRankSum=0.406
    20  63799   .   C   T   151865.20   .   AC=1818;AF=0.544;AN=3342;BaseQRankSum=0.358;DP=7441;FS=0.671;GQ_MEAN=21.90;GQ_STDDEV=28.61;InbreedingCoeff=0.1988;MLEAC=2046;MLEAF=0.612;MQ=60.00;MQ0=0;MQRankSum=0.358;NCC=316;QD=28.66;ReadPosRankSum=0.421
    20  63805   .   T   G   18.82   .   AC=1;AF=2.820e-04;AN=3546;BaseQRankSum=0.804;DP=6338;FS=0.000;GQ_MEAN=9.50;GQ_STDDEV=5.66;InbreedingCoeff=-0.0742;MLEAC=1;MLEAF=2.820e-04;MQ=60.00;MQ0=0;MQRankSum=0.804;NCC=214;QD=9.41;ReadPosRankSum=0.804
    20  63808   .   G   C   39657.62    .   AC=475;AF=0.137;AN=3460;BaseQRankSum=0.727;DP=6341;FS=1.149;GQ_MEAN=16.23;GQ_STDDEV=26.60;InbreedingCoeff=0.1598;MLEAC=515;MLEAF=0.149;MQ=60.00;MQ0=0;MQRankSum=0.684;NCC=257;QD=24.21;ReadPosRankSum=0.406
    20  63897   .   G   A   119.98  .   AC=4;AF=1.136e-03;AN=3522;DP=5160;FS=0.000;GQ_MEAN=7.91;GQ_STDDEV=4.80;InbreedingCoeff=-0.0834;MLEAC=2;MLEAF=5.679e-04;MQ=60.00;MQ0=0;NCC=226;QD=20.00
    20  63967   .   A   G   7178.05 .   AC=91;AF=0.025;AN=3594;BaseQRankSum=0.736;DP=5730;FS=2.513;GQ_MEAN=10.02;GQ_STDDEV=12.96;InbreedingCoeff=-0.0546;MLEAC=104;MLEAF=0.029;MQ=60.00;MQ0=0;MQRankSum=0.198;NCC=190;QD=17.25;ReadPosRankSum=0.406
    20  63971   .   T   C   32.53   .   AC=2;AF=5.543e-04;AN=3608;BaseQRankSum=-3.580e-01;DP=5585;FS=0.000;GQ_MEAN=8.41;GQ_STDDEV=5.27;InbreedingCoeff=-0.0910;MLEAC=2;MLEAF=5.543e-04;MQ=56.04;MQ0=0;MQRankSum=0.736;NCC=183;QD=4.07;ReadPosRankSum=0.736
    20  64016   .   G   A   371.80  .   AC=6;AF=1.636e-03;AN=3668;BaseQRankSum=-7.200e-01;DP=5667;FS=5.188;GQ_MEAN=8.45;GQ_STDDEV=5.66;InbreedingCoeff=-0.0919;MLEAC=7;MLEAF=1.908e-03;MQ=60.00;MQ0=0;MQRankSum=0.00;NCC=153;QD=11.27;ReadPosRankSum=0.727
    20  64062   .   G   A   53.30   .   AC=2;AF=5.423e-04;AN=3688;BaseQRankSum=1.03;DP=5874;FS=3.136;GQ_MEAN=8.70;GQ_STDDEV=5.64;InbreedingCoeff=-0.0880;MLEAC=2;MLEAF=5.423e-04;MQ=60.00;MQ0=0;MQRankSum=0.00;NCC=143;QD=7.61;ReadPosRankSum=1.03
    20  64139   .   G   T   444.40  .   AC=8;AF=2.189e-03;AN=3654;BaseQRankSum=1.50;DP=6329;FS=1.177;GQ_MEAN=9.25;GQ_STDDEV=7.11;InbreedingCoeff=-0.0807;MLEAC=8;MLEAF=2.189e-03;MQ=60.00;MQ0=0;MQRankSum=0.406;NCC=160;QD=10.33;ReadPosRankSum=0.406
    20  64150   .   C   A   12602.88    .   AC=181;AF=0.050;AN=3634;BaseQRankSum=0.358;DP=6672;FS=1.106;GQ_MEAN=12.33;GQ_STDDEV=17.35;InbreedingCoeff=-0.0130;MLEAC=205;MLEAF=0.056;MQ=60.00;MQ0=0;MQRankSum=0.406;NCC=170;QD=15.48;ReadPosRankSum=0.572
    20  64223   .   AT  ATT,A   1178.11 .   AC=43,62;AF=0.014,0.020;AN=3120;BaseQRankSum=0.731;DP=6794;FS=0.906;GQ_MEAN=4.62;GQ_STDDEV=5.96;InbreedingCoeff=0.1432;MLEAC=41,69;MLEAF=0.013,0.022;MQ=60.00;MQ0=0;MQRankSum=0.727;NCC=427;QD=4.13;ReadPosRankSum=0.727
    20  64277   .   A   G   86.30   .   AC=1;AF=2.859e-04;AN=3498;BaseQRankSum=0.720;DP=5759;FS=0.000;GQ_MEAN=8.43;GQ_STDDEV=5.19;InbreedingCoeff=-0.0791;MLEAC=1;MLEAF=2.859e-04;MQ=56.04;MQ0=0;MQRankSum=0.00;NCC=238;QD=14.38;ReadPosRankSum=0.00
    20  64323   .   C   T   198.23  .   AC=5;AF=1.421e-03;AN=3518;BaseQRankSum=0.736;DP=5365;FS=0.000;GQ_MEAN=8.08;GQ_STDDEV=5.63;InbreedingCoeff=-0.0864;MLEAC=5;MLEAF=1.421e-03;MQ=60.00;MQ0=0;MQRankSum=0.00;NCC=228;QD=8.62;ReadPosRankSum=-7.200e-01
    20  64346   .   G   A   129.20  .   AC=3;AF=8.616e-04;AN=3482;BaseQRankSum=-7.200e-01;DP=5075;FS=0.000;GQ_MEAN=7.74;GQ_STDDEV=4.85;InbreedingCoeff=-0.0913;MLEAC=2;MLEAF=5.744e-04;MQ=60.00;MQ0=0;MQRankSum=0.720;NCC=246;QD=16.15;ReadPosRankSum=0.00
    20  64382   .   G   A   20.64   .   AC=1;AF=2.894e-04;AN=3456;BaseQRankSum=0.736;DP=5004;FS=0.000;GQ_MEAN=7.47;GQ_STDDEV=4.55;InbreedingCoeff=-0.1009;MLEAC=1;MLEAF=2.894e-04;MQ=60.00;MQ0=0;MQRankSum=0.736;NCC=259;QD=6.88;ReadPosRankSum=0.736
    20  64433   .   G   A   109.30  .   AC=1;AF=2.817e-04;AN=3550;BaseQRankSum=1.52;DP=5292;FS=3.332;GQ_MEAN=7.97;GQ_STDDEV=5.19;InbreedingCoeff=-0.0937;MLEAC=1;MLEAF=2.817e-04;MQ=57.05;MQ0=0;MQRankSum=-3.220e-01;NCC=212;QD=13.66;ReadPosRankSum=-6.840e-01
    20  64539   .   A   T   169.55  .   AC=2;AF=5.501e-04;AN=3636;BaseQRankSum=0.736;DP=5571;FS=6.818;GQ_MEAN=8.27;GQ_STDDEV=5.40;InbreedingCoeff=-0.0889;MLEAC=2;MLEAF=5.501e-04;MQ=60.00;MQ0=0;MQRankSum=0.406;NCC=169;QD=14.13;ReadPosRankSum=0.406
    
  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @tommycarstensen‌

    Hi,

    Thanks for that. I see position 20:68749 is present in the first VCF, but not in the second. Can you post whether that position is present in the $allelesVCF file?

    This is possibly a bug that we know about, but I will check and get back to you. The issue is that GGA mode drops some alleles which may lead to some missed calls.

    Thanks,
    Sheila

  • tommycarstensentommycarstensen United KingdomMember ✭✭✭

    20:68749:T:C is present in the $allelesVCF file. Here is a list of variants that fail to be called in GGA mode with HC3.3:

    20  60765   CTGTT   C
    20  60807   C   T
    20  60808   G   A
    20  61270   A   C
    20  61271   T   A
    20  61272   C   A
    20  61279   C   T
    20  61375   C   A
    20  64433   G   A
    20  65595   T   A
    20  65604   T   C
    20  67950   C   A
    20  68720   C   T
    20  68749   T   C
    20  69668   TAC T
    

    I just tested with UG3.3. It calls all sites, but it's classified as a defunct tool. I will not need this functionality for at least 3 months, so I await your answer very patiently. Thank you @Sheila and the rest of the team.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @tommycarstensen‌

    Hi,

    It is time for me to request snippets from you! If you can submit them, instructions are here: http://gatkforums.broadinstitute.org/discussion/1894/how-do-i-submit-a-detailed-bug-report

    Thanks,
    Sheila

  • tommycarstensentommycarstensen United KingdomMember ✭✭✭
    edited December 2014

    Hi @Sheila,

    The $allelesVCF file is small enough that I can just attach it. You can download the remaining files from public sites and your own GATK bundle. Everything is build 37. I have removed the sample columns from the $allelesVCF file and used a BAM file different from the one above. I have checked that calling still fails with HC3.3 and succeeds with UG3.3 for this file.

    Here are the commands to reproduce:

    url=ftp://ftp.ncbi.nlm.nih.gov/1000genomes/ftp/data/NA12878/alignment/NA12878.chrom20.ILLUMINA.bwa.CEU.low_coverage.20121211.bam
    wget $url
    wget $url.bai
    bam=$(basename $url)
    allelesVCF=alleles.vcf.gz
    ref=hs37d5.fa
    dbSNP=dbsnp_138.vcf.gz
    java7=jre1.7.0_45/bin/java
    jar=GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar
    for T in UnifiedGenotyper HaplotypeCaller; do
     out=$T.GGA.vcf.gz
     $java7 -Djava.io.tmpdir=tmp -Xmx3900m \
      -jar $jar \
      --analysis_type $T \
      --reference_sequence $ref \
      --input_file $bam \
      --dbsnp $dbSNP \
      --out $out \
      -stand_call_conf 0 \
      -stand_emit_conf 0 \
      -A Coverage -A FisherStrand -A HaplotypeScore -A MappingQualityRankSumTest -A QualByDepth -A RMSMappingQuality -A ReadPosRankSumTest \
      -L 20:60000-70000 \
      -L $allelesVCF \
      --interval_set_rule INTERSECTION \
      --output_mode EMIT_ALL_SITES \
      --genotyping_mode GENOTYPE_GIVEN_ALLELES \
      --alleles $allelesVCF
    done
    

    I am happy to tar the files and upload them, if you prefer. Here are the sites not called by HC3.3 for NA12878:

    20  60765   CTGTT   C
    20  60807   C   T
    20  60808   G   A
    20  61375   C   A
    20  64433   G   A
    20  65595   T   A
    20  65604   T   C
    20  67950   C   A
    20  68660   C   A
    20  68667   T   G
    20  68720   C   T
    20  68749   T   C
    20  69668   TAC T
    
  • tommycarstensentommycarstensen United KingdomMember ✭✭✭

    I also tried adding the --allSitePLs flag, but the sites still fail to be called.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @tommycarstensen‌

    Hi,

    This can be fixed by using --interval_set_rule UNION instead of --interval_set_rule INTERSECTION. I am looking into why INTERSECTION does not work.

    -Sheila

  • tommycarstensentommycarstensen United KingdomMember ✭✭✭
    edited January 2015

    @Sheila said:

    use -isr INTERSECTION with -ip 100. This allows all sites to show up.

    --interval_padding 100 did indeed do the trick. I'm not entirely sure why/how it works. But it does. Thank you so much!

    A note to others reading this thread: --output_mode EMIT_ALL_SITES overrides -stand_emit_conf, which therefore doesn't have to be set to zero. -stand_call_conf still has to be zero to avoid records with FILTER set to LowQual.

  • Hi,

    I seem to have exactly the same problem with v3.4-0-g7e26428, Compiled 2015/05/15 03:25:41

    Thanks @tommycarstensen @Sheila, "--interval_padding 100" seems to fix the problem; however, it makes the whole run MUCH slower.

    A reasonable workaround is to run a second iteration on the non-genotyped sites; however, it would be great to have this bug fixed at some point.

    Thanks!
    Vasily.
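
A minimal sketch of the second-pass idea mentioned above: compare the positions requested in the alleles VCF with the positions present in the GGA output, and list the ones that were not emitted so that only those are rerun. The function names and the small sample input are illustrative and not part of GATK. The sketch assumes a site counts as genotyped if any record exists at its CHROM and POS.

```python
def site_positions(vcf_lines):
    """Return the set of (CHROM, POS) pairs found in the data lines of a VCF."""
    positions = set()
    for line in vcf_lines:
        # Skip blank lines and header lines ("##..." and "#CHROM...").
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(None, 2)  # only CHROM and POS are needed
        positions.add((fields[0], int(fields[1])))
    return positions


def missing_sites(alleles_lines, called_lines):
    """Positions requested in the alleles VCF but absent from the caller output."""
    return sorted(site_positions(alleles_lines) - site_positions(called_lines))


# Illustrative input: three requested sites, only one of which was emitted.
alleles = [
    "20\t60807\t.\tC\tT\t118.96\t.",
    "20\t60808\t.\tG\tA\t46.95\t.",
    "20\t62100\t.\tT\tC\t218.69\t.",
]
called = [
    "##fileformat=VCFv4.1",
    "20\t62100\t.\tT\tC\t218.69\t.",
]

todo = missing_sites(alleles, called)
for chrom, pos in todo:
    print(chrom, pos, sep="\t")
```

The printed positions could then be used to subset the alleles VCF for the second run. Comparing on position alone is a deliberate simplification, since the ALT column of an emitted record need not match the requested allele.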

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @кфьутылн @tommycarstensen
    Hi Vasily and Tommy,

    Tommy, can you confirm you had a really large slowdown in runtime when adding -ip 100?

    Vasily, unfortunately, it will be hard to convince the developers to work on this if there is a workaround (even if it slows things down tremendously).

    -Sheila

  • tommycarstensentommycarstensen United KingdomMember ✭✭✭

    @Sheila Nothing noticeable, but I never timed it. @кфьутылн can you try adding --intervals to your command to see if that speeds things up? Can you post your command if it doesn't?

  • @tommycarstensen

    Hi Tommy, I set up both with the same argument, similar to your example:

    -L $allelesVCF \
    --alleles $allelesVCF \
    --genotyping_mode GENOTYPE_GIVEN_ALLELES \

  • @Sheila

    Hi Sheila, I think this affects the core functionality of HaplotypeCaller. I am genotyping millions of variants in a whole-genome sequencing study with 719 samples and 1 variant per ~185 bp. Variants that are close to others (approximately 20% in my dataset) are not genotyped. It takes less than 1 hr to process each chunk with the remaining ~800 variants, which is very good. With -isr INTERSECTION and -ip 100 it is slower by about two orders of magnitude, which makes GATK rather unusable. So the workaround is fine only for small sets of variants.

    Thanks,
    Vasily.

  • кфьутылнкфьутылн Member
    edited July 2015

    @Geraldine_VdAuwera

    Hi Geraldine, thanks for the good news. Let me ask then, since you were not sure about usefulness of GENOTYPE_GIVEN_ALLELES mode: is there another proper way to determine allele genotypes in a sample set? Namely: I discovered a variant in some samples of a set. In the other samples, the genotypes for this variant are therefore either 0/0 or ./. (low coverage). For many applications, e.g. statistical genetics, it is important to discriminate between these two cases, so a round of genotyping of an already known variant in all samples is needed. This will generate a matrix K (# variants) x N (# samples) where each cell is one of four: ./., 0/0, 0/1, 1/1. The same should work for indels, too. Is UnifiedHaplotyper in the GENOTYPE_GIVEN_ALLELES mode the proper tool to do this?

    Thanks,
    Vasily.
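
To make the K x N matrix described above concrete, here is a minimal sketch that reads the GT field of every sample from a multi-sample VCF, keeping ./. (no call) distinct from 0/0 (homozygous reference). The function name, the sample names S1 to S3 and the genotypes in the example are made up for illustration. Only the tab-separated column layout follows the VCF format.

```python
def genotype_matrix(vcf_lines):
    """Return (sites, samples, matrix) with one row per variant and one GT per sample."""
    samples, sites, matrix = [], [], []
    for line in vcf_lines:
        # Skip blank lines and "##" meta-information lines.
        if not line.strip() or line.startswith("##"):
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        gt_index = fields[8].split(":").index("GT")  # position of GT within FORMAT
        sites.append((fields[0], int(fields[1]), fields[3], fields[4]))
        matrix.append([column.split(":")[gt_index] for column in fields[9:]])
    return sites, samples, matrix


# Hypothetical three-sample VCF with a single variant.
vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "20\t60807\t.\tC\tT\t118.96\t.\t.\tGT:DP\t0/1:12\t0/0:9\t./.:0",
]

sites, samples, matrix = genotype_matrix(vcf)
for site, row in zip(sites, matrix):
    print(site, dict(zip(samples, row)))
```

In this made-up example S2 is a confident reference call while S3 has no coverage, which is the distinction the question is about.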

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @кфьутылн
    Hi Vasily,

    Have you had a look at our GVCF workflow? https://www.broadinstitute.org/gatk/guide/article?id=3893 It produces the exact matrix you are looking for :smile:

    -Sheila

  • @Sheila

    Hi Sheila, thank you very much for drawing my attention to the GVCF workflow. It is great; however, I am not sure it is applicable to my project, as it would require redoing the variant discovery step from scratch.
    Best,
    Vasily.

  • tommycarstensentommycarstensen United KingdomMember ✭✭✭

    The beauty of HC is that you can keep adding samples to your set and still get a non-sparse matrix. If your data allows it, then I highly recommend the workflow suggested by Sheila.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Also, if your hesitation is because you are only interested in specific sites, know that you can run on only those sites using -L your_sites.vcf and -ip 100.

  • @Geraldine_VdAuwera

    Hi Geraldine,
    I am not sure I understood your comment properly. I am indeed interested in specific (yet dense) sites genome-wide, so by default I run

     --analysis_type HaplotypeCaller \
     --intervals $allelesVCF \
     --alleles $allelesVCF \
     --genotyping_mode GENOTYPE_GIVEN_ALLELES 
    

    with relatively small chunks in each $allelesVCF. It is very fast (say, <1 hr for 1000 sites and ~700 samples); however, sites that are close to other sites upstream (within 100 bp, I think) are skipped. If I add

    --interval_padding 100 \

    or

    --interval_padding 100 \
    -isr INTERSECTION \
    

    this is fixed and all sites are processed; however, the analysis runs slower by orders of magnitude. A workaround I use is to spread the target sites across chunks so that no two sites in a chunk are close to each other.

    Thanks,
    Vasily.
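
The chunking workaround described above can be sketched as a greedy assignment: walk through the sites in sorted order and put each one into the first chunk whose most recent site is far enough away, opening a new chunk when none qualifies. This is only one possible way to do it, not necessarily what was used in the thread. The function name is illustrative, and the 100 bp threshold is taken from the estimate given in the post rather than from a documented GATK value.

```python
def spread_sites(sites, min_gap=100):
    """Assign (chrom, pos) sites to chunks so that no two sites in the same chunk
    lie on the same chromosome within min_gap bp of each other."""
    chunks = []
    for chrom, pos in sorted(sites):
        for chunk in chunks:
            last_chrom, last_pos = chunk[-1]
            # Sites are visited in sorted order, so the last site of a chunk is
            # the only one that can be too close to the current site.
            if last_chrom != chrom or pos - last_pos > min_gap:
                chunk.append((chrom, pos))
                break
        else:
            # No existing chunk can take this site: start a new one.
            chunks.append([(chrom, pos)])
    return chunks


# Some of the closely spaced sites reported earlier in the thread.
sites = [("20", p) for p in (60765, 60807, 60808, 61270, 61271, 61272, 61279)]

chunks = spread_sites(sites)
for number, chunk in enumerate(chunks, start=1):
    print(number, [pos for _, pos in chunk])
```

Each chunk can then be written to its own alleles VCF and genotyped separately without interval padding. The number of chunks equals the largest number of sites packed into any 100 bp stretch, so dense regions determine how many runs are needed.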

  • tommycarstensentommycarstensen United KingdomMember ✭✭✭

    @кфьутылн Vasily I was just thinking while running GGA myself... Is your actual run time slower by orders of magnitude or is it just the estimate printed to stderr, which is large? I'm running UG in GGA mode with --intervals $allelesVCF --alleles $allelesVCF -isr INTERSECTION and the time estimate is weeks/days, but it will likely finish in less than one hour for most of my 1Mbp fragments. It could of course be different for HC. I thought I would let you know. Hope things work out for you in the end.

  • @tommycarstensen

    Hi Tommy, that is a good point. It seems I noticed a remarkable actual slowdown; however, I need to double-check, keeping your experience in mind. Thanks!
    Vasily.

  • tommycarstensentommycarstensen United KingdomMember ✭✭✭

    @Geraldine_VdAuwera said:
    Apologies for the delays in dealing with the GENOTYPE_GIVEN_ALLELES issues. For a while there we weren't sure we were going to keep supporting it, and so we were reluctant to invest any effort in fixing it. But after some discussions, our research collaborators have convinced us that it is an important feature -- so the good news is that we will be able to devote some resources to making it work better. When that might happen, I don't know yet, but I'll try to push to get this addressed soon.

    @Geraldine_VdAuwera I got excited when I read this. I have a tiny feature request (no drop is to blame for the flood). Would it be possible to have GGA override the default -maxAltAlleles of 6 of UG and HC? Would it be possible to have this be infinite, if set to for example 0 or -1? These are just suggestions. A value of 99 will work just fine in my use case.

    Issue · Github, by Sheila: Issue Number 1110; State: closed; Closed By: chandrans

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @tommycarstensen
    Hi Tommy,

    I put in a feature request. :smile:

    -Sheila

  • кфьутылнкфьутылн Member
    edited August 2015

    @tommycarstensen

    @tommycarstensen said:

    @кфьутылн Vasily I was just thinking while running GGA myself... Is your actual run time slower by orders of magnitude or is it just the estimate printed to stderr, which is large? I'm running UG in GGA mode with --intervals $allelesVCF --alleles $allelesVCF -isr INTERSECTION and the time estimate is weeks/days, but it will likely finish in less than one hour for most of my 1Mbp fragments. It could of course be different for HC. I thought I would let you know. Hope things work out for you in the end.

    Hi Tommy, you were right: it starts out really slowly with -isr INTERSECTION, but then speeds up, so the running time seems to be only ~2X longer: estimated 48 hrs for 1000 sites X 719 samples, whereas without -isr INTERSECTION it is <24hrs.

    Thanks!
    V.

  • tommycarstensentommycarstensen United KingdomMember ✭✭✭

    @Geraldine_VdAuwera @Sheila No need to feel sorry. I wouldn't have bothered myself if I was the dev. It's just that on rare occasions I merge datasets from multiple sources, and I don't always know the maximum number of alternate alleles in the separate and the merged datasets. I'm just lazy and forgetful and want to avoid errors caused by these two excellent attributes of mine. Thanks.

    P.S. It would be great if UG (and HC) would consider multiple records at the same position, but I fully understand if it's my responsibility to merge multiallelic variants prior to GGA calling. Again, I'm just lazy and forgetful. Here is the warning:

    #WARN  12:52:37,523 GenotypingGivenAllelesUtils - Multiple valid VCF records detected in the alleles input file at site 20:60004811, only considering the first record
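
A minimal sketch of the pre-merge step mentioned above: records sharing CHROM, POS and REF are collapsed into one multiallelic record, and the largest ALT count is reported so that the maximum number of alternate alleles can be set high enough. The function name is illustrative, and the alleles shown at 20:60004811 are made up, since the thread does not list them. Records at the same position with different REF alleles are left separate here. Merging those would need allele normalization, which this sketch does not attempt.

```python
def merge_same_site(records):
    """Merge (chrom, pos, ref, alt) records that share chrom, pos and ref.

    Returns the merged records and the largest number of ALT alleles at any site.
    """
    merged = {}  # (chrom, pos, ref) -> ordered list of distinct ALT alleles
    for chrom, pos, ref, alt in records:
        alts = merged.setdefault((chrom, pos, ref), [])
        for allele in alt.split(","):
            if allele not in alts:
                alts.append(allele)
    out = [(c, p, r, ",".join(a)) for (c, p, r), a in merged.items()]
    max_alts = max((len(a) for a in merged.values()), default=0)
    return out, max_alts


records = [
    ("20", 60004811, "C", "T"),   # hypothetical alleles
    ("20", 60004811, "C", "G"),   # hypothetical second record at the same site
    ("20", 62975, "TA", "T,TAA"),
]

merged_records, max_alts = merge_same_site(records)
for record in merged_records:
    print(*record, sep="\t")
print("largest ALT count:", max_alts)
```

Dedicated tools exist for merging multiallelic records. The point here is only to show what the merged input looks like and how the required allele limit follows from it.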
    
  • tommycarstensentommycarstensen United KingdomMember ✭✭✭
    edited August 2015

    @Sheila @Geraldine_VdAuwera I'm not sure if I should start a new thread. I'm having problems with GGA using UG. I expect a variant (20:259627:C:G) to be called, but instead I get ./. GTs. This happens to 1 in ~500 biallelic SNPs. Below is the output from samtools tview after merging a random subset of the ~5000 BAMs being called from. This error can also be reproduced with the 1000G data.

    Here is the truncated UG output:

    20  259627  rs190796254 C   G   .   .   DB;DP=3070;Dels=0.13;FS=14.829;HaplotypeScore=0;MQ=29.59;MQ0=15;SOR=0.236;VariantType=SNP   GT  ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./.
    

    Here is the samtools tview output:

          259631       259641                 259651          259661
    C*GG*AGG*GCGGGAC*G*G*****C**G***GGA*GG*GC*GGG**ACG*G**A*GGG*A****CGGAGGGA*GG*GC*
    S .. R.. .S...RS . .     R  .   ..R .. .S ...  .S. .  R ... R    S..R...R .. .R
    ,*,,*,,,*  ....G*.*.*****A**.***...*..*..*...**...*.**.*...*.****G.......*..*..*
    **..*...*.G...   ,*,*****g**,***,,g*,,*,g*,,,**,***a**,*,,,*c****g,,g,,,g*,,*,,*
    .*..*...*......G*.*.*****G**.***..T*..*.G*  .**...*.**.*...*.****G......C*..*.A*
    g*,,*g,,*,,,,,c,*,*c*****,**,***,,,*,,*,,*,,,**,,,*,**,*,         ......C*..*.A*
    .*..*...*.....CG*.*.*****T**.***..G*..*..*.    ...*.**.*...*.****........*..*..*
    ,*,,*,a,*,,,,,,,*,*,*****,**a***,,,*,,*,,*  .**...*.**.*...*.****G......C*..*.A*
    **,,*g,,*,g,,,gg*,*,*****g**,***,,g*,,*,,g,,,**,,,*,**,*,,,*,****g,,,,,,  ..*.A*
    g*,,*g,,*,g,,,gg*,*,*****g**,***,,g*,,*,,*       .*.**.*...*.****........*..*.A*
    G*..*...*.A....G*.*.*****.**.***..G*..*.G*...****.*.**G*...*******......**.C*..*
    G*..*...*.A...GG*.*.*****.**.***...*..*.  ,,,**,a,*,**,*,,,*g****t,,g,,,g*,,*,,*
    G*..*...*.A....G*.*.*****.**.***...*..*.A*...**...*.**   ..*.****........*..*.A*
    .*..*...*.A....G*.*.*****A**.***...*..*..*...***..*.**.* ..*.****........*..*.A*
    .*..*...*.G...CG*.*.*****G**.***..G*..*..*...**G..*.****...*G****...G...G*..*
    .*..*G..*.G....G*.*.*****G**.***..G*..*..*...**.G.*.**G*...*******..G...G*..*.G*
    .*..*...*.......*.*.*****A**.***...*C.*.A*...**.G.*.**.*...*      ..C....*..*..*
    .*..*...*TT.....*.*.ATGGA.******...*..*..*...***..*.**.* ,,*g****g,,g,,,g*,,*,,*
    .*..*...*TT.....*.*.A****T**.GAA...*..*..*...***..*.**.*..        .......*..*.A*
    .*..*...*.T.....*.*.*****A**.***...*C.*.         .*.**.*...*.****........*..*.A*
    G*..*...*.A....G*.*.*****A**.***...*..*TT*...**G..*.**.*...*.****.......  ,,*,,*
    g*,,*g,,*,g,,,c,g,*,*****g**,***,,g*,,*,,*,,,**,,,*,**,*,,,*,****,,,, ...*..*.A*
    g*,,*g,,*,,,,,,,*,*,*****,**,***,,,*,,*,,*,,,**,      .*...*.****G......C*..*.A*
    **,,*g,,*,g,,,gg*,*,*****g**,***,,g*,,*,g*,,,**gg,*,**,*,,,*,****,,,,, ..*..*.A*
    ,*,,**,,*,*,,,***,*,*****g**,***,,g*,,*,g*,,,**,,,*,**,*,,,*,****,,,,,,,,*,,*
    g*,,*g,,*,g,,,gg*,*,*****g**,***,,g*,,*,,*,,,**,,,*,**,*,,,*,**** ,,t,,,g*,,*,g*
    g*,,*g,,*,g,,,gg*,*,*****g**,***,,g*,,*,,*,,,**,,,*,**,*,,,*,**** ,,g,,,,*,,****
    ,*,,*,,,*,,,,,,,*,*,*****,**,***,,,*,     ...**...*.**.*...*.****G.......*..*.A*
    **,,*g,,*,g,,,***,*,*****,**,***,,,*,,*,g*,,,**,,,*,**,*,,,*,****,,,,,,,,*,,*,,*
    g*,,*g,,*,,,,,,,*,*,*****,**,***,,,*,,*,,*,,,**,,,*,**,*,,        .......*..*.A*
    **,,*g,,*,g,,,***,*,*****g**,***,,g*,,*,g*,,,**gg,*,**,*,,,*,****g,,,,,a,*,,*,,*
    g*,,*g,,*,g,,,gg*,*,*****g**,***,,g*,,*,,*,,,**,,,*,**,*,,,*,****,,,,,,,,*,,*,,*
    g*,,*g,,*,g,,,c,*,*,*****,**,***,,,*,,*,,*,,,**,,,*     ...*.****G.......*..*..*
    g*,,*g,,*,g,,,gg*,*,*****g**,***,,g*,,*,g*,,,**gg,*,**g*,,,*,****,,,,,,,,*,,*,,*
    **,,*g,,*,g,,,***,*,*****g**,***,,,*,,*,g*,,,**gg,*,**g*,,,*,****,,,g,,,,*,,*,,*
    G*..*...*.G....G*.*.*****A**.***..G*..*.G*...**.G.*.**G*...*******..G...G*..*C.*
    G*..*C..*....    ,*,*****g**,***,,g*,,*,g*,,,**g,,c,**g*,,,*******,,g,,,g*,,*,g*
    G*..*C..*.G....G*.*.*****.**.***..G*..*..*...**.G.*.**G*...*.*****..C...G*..*.G*
    .*..*...*...*****.*.*****.**.***.A.G..*..*...**  ,*,**g*,,,*g****,,,*,,,g*,,*,g*
    .*..*...*.....   ,*,*****g**,***,,,*a,*,,*,,,****,*,**,*,,,*******,,,,,,,g,,*,g*
    .*..*...*.......*.*.*****.**.***...*..*..*...**...*.**G*...*.****........*..*..*
    **,,*g,,*,t,,,*g*,*,*****t**,***,,**,,*,g*,,,**,,**,**,*,,,a,****,,,g,,t,*,,*,,*
    ,**,*t,,*,,,,,***,*,********,***,,g*,,*,,*,,,**,,,*,**,*,,,*      .......*..*.A*
    ***,*,,,*,g,a,,a*,*,********,***,,**,,*,,*,,,**,,,*,**,*,,,*,****,,,,,,,,* .*..*
    G*..*...*.A.....****************...*..*TT*...**...*.**.*...*.****.......C*..*..*
    .*..*G..*.....**************.***...*..*..*.      ,*,**,*,,,*******,,,,,,t*,,****
    G*..*...*.......*.*.*****A**.***...*..*..*...**...*.**.*...*.****........*..*..*
    g*,,*c,,*,,,t,***,*,********,***,,g*,,*,g*,,c**,,,*a**,*,,t*c****,,,,,,a,*,,*,,*
    .*..*...*.....   ,*,*****g**,***,,g*,,*,g*,,,**gg,*,**g*,,,*g****g,,g,,,g*,,*,g*
    .*..*...*.AC...G*.*.*****A**.***...*..*..*...**...*.**.*...*.****........*..*..*
    **..*...*.A.....****************...*..*TT*...**...*.**.*...*.****........*..*.A*
    G*..*G..*.G...G.*.*.*****G**.***..G*..*.G*...**GG.*.**G*...*G****G..C....*..*.G*
    g*,,*g,,*,g,,,tt*,*,*****g**,***,,g*,,*,t*,,,**,,,*,**,*,,,*,****g,, ,,,,*,,*,g*
    g*,,*g,,*,g,,,gg*,*,*****g**,***,,g*,,*,,*,,,**g,,*,**g*,,,*,****,,,,,,,,*,,*,,*
    .*..*...*.......*.*.*****.**.***..T*..*.A*...**...*.**.*...*.****...G....*..*..*
    **,,*g,,*,g,,,gg*,*,*****g**,***,,c*,,*,g*,,,**,g,*,**g*,,,*g****g,,g,,,g*,,*,,*
    .*..*...*.AC...G*.*.*****A**.***...*..*..*...**...*.**.*...*.****........*..*.A*
    G*..*...*.......*.*.*****A**.***...*..*..*...**...*.**.*...*.****........*..*..*
    .G..*...*.......*.*.*****A**.***...*..*..*...**...*.**.*...*.****........*..*..*
    ,*,,*,,,*ct,,,***,*,*****g**,***,,g*,,*,g*,,,cc,,,*,**g*,,,*      .......*..*.A*
    **..*...*.......*.*.*****G**.***...*..*..*...**.*.*.****...*******......G*..*..*
    .*..*...*.AC...G*.*.*****A**.***...*..*..*...**...*.**.*...*.****........*..*..*
    G*..*...*.AC...G*.*.*****.**.***...*..*..*...**...*.**.*...*.****........*..*.A*
    G*..*...*..C...G*.*.*****A**.***...*..*..*...**...*.**.*...*C****G..C...C*..*.A*
    .*..*...*.......*.*.*****.**.***...*..*..*...**...*.**.*...*.****........*..*..*
    .*..*...*.......*.*.*****.**.***...*..*..*...**...*.**.*...*.****.......C*..*..*
    .*..*...*.AC...G*.*.*****A**.***...*..*..*...**...*.**.*...*.****........*..*.G*
    G*..*...*.......*.*.*****A**.***...*..*..*...**...*.**.*...*.****........*..*.**
    .*..*...*......*************.***...*..*.A*...**.G.*.**C*...*.****........*..*..*
    G*..*...*.......*.*.*****A**.***...*..*..*...**...*.**.*...*.****........*..*..*
    g*,,*g,,*,g,,,gg*,*,*****g**,***,,g*,,*,g*,,,**ga,*,**g*,,,*******,,g,,,g*,,*,g*
    G*..*...*.......*.*.*****A**.***...*..*..*...**...*.**.*...*.****........*..*..*
    .*..*...*.AC...G*.*.*****A**.***...*..*..*...**...*.**.*...*.****........*..*.G*
    
  • tommycarstensentommycarstensen United KingdomMember ✭✭✭

    It is possible that this is a false positive despite being present in 1000G phase 1 and 3. I looked at the original calls on the 1000G ftp site. It seems like a suspicious variant. Maybe UG is doing the right thing by not calling it.

    ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/input_callsets/bc/ALL.chr20.bc.20130502.snps_indels_mnps_complex.sites.vcf.gz
    20 259627 . C G 0 lowqual AB=0;ABP=0;AC=1778;ACmle=492;AF=1;AFmle=0.276597;AFmle_ref=0.723403;AN=1778;AO=236;CIGAR=1X;DP=1546;DPB=1746;DPRA=1.38056;EPP=3.93042;EPPR=23.748;FIC=-0.0287319;GFmle=0.445121,0.554854,2.4311e-05;GTI=10;HWE=-0;HWEpval=0.00111507;LEN=1;MEANALT=1.4456;MQM=28.7119;MQMR=32.5025;NS=889;NUMALT=1;ODDS=1424.1;PAIRED=1;PAIREDR=0.935484;PAO=26;PQA=254;PQR=172;PRO=27.5;QA=6721;QR=16004;RO=589;RPP=13.6468;RPPR=19.5599;RUN=1;SAP=262.704;SRP=1213.46;TYPE=snp;XAI=0.0129804;XAM=0.0498777;XAS=0.0368973;XRI=0.0110271;XRM=0.0415191;XRS=0.0304921;technology.ILLUMINA=1
    
    ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20110613_chr20_consensus_snp_v2b/ALL.chr20.projectConsensus_v2b.20101123.snps.sites.vcf.gz
    20 259627 . C G 41.36 TruthSensitivityTranche99.90to100.00 AC=7;AF=0.219;ALTFWD=5;ALTREV=4;AN=32;BaseQRankSum=0.95;DP=40;Dels=0;FS=2;HRun=2;HaplotypeScore=0.6252;InbreedingCoeff=0.2642;MQ=88.14;MQ0=3;MQRankSum=1.115;QD=10.34;REFFWD=11;REFREV=6;ReadPosRankSum=-0.75;SB=-97.4;VQSLOD=0.6212;set=ALL1
    
    ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20130624_phase3_chr20/si/ALL.chrom20.samtools_chr20_run.20130502.snps_indels.low_coverage.genotypes.vcf.gz
    20 259627 rs190796254 C G 328 FailSOM AC=39;AF_AFR=0.22561;AF_AMR=0.176796;AF_ASN=0.153846;AF_EUR=0.205805;AF_MAX=0.22561;AN=468;BQB=4.78368;DP=538;DP4=168,89,30,15;EDB=-3.63152;FiltScore=1.01337;HWE=0.380141;MDV=2;MQ=27;MQ0=18;MQB=0.891217;MSD=4;MSQ=29;PV0=1;PV1=1.1e-06;PV2=0.22;PV3=1;PV4=1,1.1e-06,0.22,1;QBD=1.08603;RPB=2.10405;SB=0.274862;SGB=-777.146;VDB=0.343881 GT:PL:DP:DV:SP:GQ
    
    ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20130624_phase3_chr20/bc/ALL.chr20.bc.20130502.snps_indels_mnps_complex.integrated.sites.vcf.gz
    20 259627 . C G 0 lowqual AB=0;ABP=0;AC=1356;ACmle=296;AF=1;AFmle=0.217971;AFmle_ref=0.782029;AN=1356;AO=171;CIGAR=1X;DP=1015;DPB=1086;DPRA=1.29218;EPP=3.63253;EPPR=20.863;FIC=-0.0340233;GFmle=0.558869,0.441131,6.8781e-10;GTI=10;HWE=-0;HWEpval=0.00317971;LEN=1;MEANALT=1.12414;MQM=28.5205;MQMR=32.4312;NS=678;NUMALT=1;ODDS=1804.35;PAIRED=1;PAIREDR=0.936242;PAO=19.5;PQA=257.5;PQR=228.5;PRO=20;QA=4926;QR=16171;RO=596;RPP=7.59452;RPPR=16.1266;RUN=1;SAP=308.096;SRP=1220.22;TYPE=snp;XAI=0.0110482;XAM=0.0501641;XAS=0.039116;XRI=0.0110699;XRM=0.0417015;XRS=0.0306316;technology.ILLUMINA=1
    
    ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20130624_phase3_chr20/bc/ALL.chr20.bc.20130502.snps_indels_mnps_complex.integrated.genotypes.vcf.gz
    20 259627 . C G 0 lowqual AB=0;ABP=0;AC=1356;ACmle=296;AF=1;AFmle=0.217971;AFmle_ref=0.782029;AN=1356;AO=171;CIGAR=1X;DP=1015;DPB=1086;DPRA=1.29218;EPP=3.63253;EPPR=20.863;FIC=-0.0340233;GFmle=0.558869,0.441131,6.8781e-10;GTI=10;HWE=-0;HWEpval=0.00317971;LEN=1;MEANALT=1.12414;MQM=28.5205;MQMR=32.4312;NS=678;NUMALT=1;ODDS=1804.35;PAIRED=1;PAIREDR=0.936242;PAO=19.5;PQA=257.5;PQR=228.5;PRO=20;QA=4926;QR=16171;RO=596;RPP=7.59452;RPPR=16.1266;RUN=1;SAP=308.096;SRP=1220.22;TYPE=snp;XAI=0.0110482;XAM=0.0501641;XAS=0.039116;XRI=0.0110699;XRM=0.0417015;XRS=0.0306316;technology.ILLUMINA=1 GT:GL
    
    ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20130723_phase3_wg/bc/ALL.chr20.bc.20130502.snps_indels_mnps_complex.genotypes.vcf.gz
    20 259627 . C G 0 lowqual AC=1778;ACmle=492;AFmle=0.276597;CIGAR=1X;DP=1546;NS=889;TYPE=snp GT
    
    ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20130202_phase3_chr20_calls/si_samtools/ALL.chr20.si_samtools.20120522_analysis.snps_indels.low_coverage.genotypes.vcf.gz
    20 259627 . C G 46.7 MinVQSLOD DP=120;PV4=0.44,0.0039,0.31,1;DP4=26,51,4,4;MQ=29;AC=7;ICF=-0.056;AN=132;HWE=1;MDV=1;PV1=0.0039;PV2=0.31;MSD=2;PV3=1;PV0=0.44;SB=0.7404;MSQ=13;HaplotypeScore=0;FS=6.711;MQ0=3;Dels=0.06;culprit=DP;VQSLOD=-14.7943;HRun=2 GT:PL:DP:DV:SP:GQ
    
  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @tommycarstensen
    Hi Tommy,

    Can you post the exact command you ran in GGA mode? Does the variant get called properly without GGA mode?

    -Sheila

  • tommycarstensen (United Kingdom; Member ✭✭✭)
    edited August 2015

    @Sheila It gets called for some samples without GGA mode, i.e. with -gt_mode DISCOVERY. It is called by samtools in 2000 Africans and it's part of the official 1000G phase 3 (and phase 1) release. That of course doesn't mean it can't be a false variant: the mapping quality is mostly below 30 and the bwa alignment could be wrong. Here are the Program Args:

    INFO  01:00:10,006 HelpFormatter - Program Args: --analysis_type UnifiedGenotyper --reference_sequence /lustre/scratch114/resources/ref/Homo_sapiens/1000Genomes_hs37d5/hs37d5.fa --num_cpu_threads_per_data_thread 1 --num_threads 1 --input_file lists/bams.list --intervals 20:200001-300000 --alleles ../pipeline_UG3.4/out_bt_concat_isec/union.vcf.gz --interval_set_rule INTERSECTION --intervals ../pipeline_UG3.4/out_bt_concat_isec/union.vcf.gz --dbsnp /lustre/scratch114/resources/variation/Homo_sapiens/grch37/dbsnp_138.vcf.gz --output_mode EMIT_ALL_SITES --out out_UnifiedGenotyper/20/3.vcf.gz -A Coverage -A FisherStrand -A StrandOddsRatio -A MappingQualityRankSumTest -A QualByDepth -A RMSMappingQuality -A ReadPosRankSumTest -A HaplotypeScore -A InbreedingCoeff -A StrandBiasBySample -A VariantType --genotype_likelihoods_model BOTH -gt_mode GENOTYPE_GIVEN_ALLELES --max_alternate_alleles 12 --sample_ploidy 2 -stand_call_conf 0 -stand_emit_conf 0
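
When comparing runs like this one, it can help to split the Program Args line into option/value pairs so that repeated options stand out (here --intervals is given twice, once as a region and once as the alleles VCF). A minimal sketch in Python, assuming every option starts with "-" and takes at most one value. `parse_program_args` is a hypothetical helper, not part of GATK, and the string is only an excerpt of the log line above:

```python
import shlex

# Excerpt of the Program Args line quoted above (not the full command).
ARGS = (
    "--analysis_type UnifiedGenotyper --intervals 20:200001-300000 "
    "--alleles ../pipeline_UG3.4/out_bt_concat_isec/union.vcf.gz "
    "--interval_set_rule INTERSECTION "
    "--intervals ../pipeline_UG3.4/out_bt_concat_isec/union.vcf.gz "
    "--output_mode EMIT_ALL_SITES -A Coverage -A FisherStrand "
    "--genotype_likelihoods_model BOTH -gt_mode GENOTYPE_GIVEN_ALLELES "
    "-stand_call_conf 0 -stand_emit_conf 0"
)


def parse_program_args(args):
    """Map each option to the list of values it was given (hypothetical helper)."""
    tokens = shlex.split(args)
    opts = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if not tok.startswith("-"):
            i += 1  # stray value without an option; skipped in this sketch
            continue
        has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("-")
        opts.setdefault(tok, []).append(tokens[i + 1] if has_value else True)
        i += 2 if has_value else 1
    return opts


opts = parse_program_args(ARGS)
# Options that appear more than once in the command line.
repeated = sorted(k for k, v in opts.items() if len(v) > 1)
print(repeated)
print(opts["-gt_mode"])
```

Two such dictionaries (one per run) can then be compared key by key to see which options differ between a GGA run and a DISCOVERY run.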
    
  • tommycarstensen (United Kingdom; Member ✭✭✭)

    @Sheila It doesn't get called by HaplotypeCaller 3.2.

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @tommycarstensen
    Hi Tommy,

    Thanks. Well, if GATK variant callers are not calling the variant, it is obviously a false positive! :smiley:

    Are you simply reporting this to us for development's sake, or do you want us to dig into the site more?

    -Sheila

  • tommycarstensen (United Kingdom; Member ✭✭✭)

    @Sheila I like your flawless logic about it being a FP :smile: I decided to discard the ~100k SNPs that GATK UG in GGA mode was unable to call. So, as always, it's entirely up to you whether you want to pursue this. I was re-calling ~2500 Africans with 1000Gp3 using the union set of SNPs. I don't know how many of those sites I would have been able to call in each of the two sets in DISCOVERY mode. Oddly enough, I was able to call the SNP from above (20:259627:C:G) using the following settings in DISCOVERY mode.

    -gt_mode DISCOVERY \
    -stand_call_conf 0 -stand_emit_conf 0 \
    --output_mode EMIT_ALL_SITES
    

    The GT distribution for the call 20:259627:C:A,G,T in the Africans and 1000Gp3 excluding the AFR super population was as follows:

       3249 ./.
        527 0/0
          1 0/1
        187 0/2
          3 1/1
          5 1/2
        502 2/2
          1 2/3
          2 3/3
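
For a multi-allelic site like this, the genotype tally can be turned into per-allele counts to see which ALT allele carries the signal. A minimal sketch in Python, assuming diploid genotypes and the `count genotype` layout shown above (as produced by something like `sort | uniq -c`); `allele_counts` is a hypothetical helper name:

```python
from collections import Counter

# Genotype tally copied from the post above: "<count> <GT>" per line.
TALLY = """\
   3249 ./.
    527 0/0
      1 0/1
    187 0/2
      3 1/1
      5 1/2
    502 2/2
      1 2/3
      2 3/3
"""

# Allele index -> base for the call 20:259627:C:A,G,T.
ALLELES = {"0": "C", "1": "A", "2": "G", "3": "T"}


def allele_counts(tally_text):
    """Sum allele indices over all genotypes, weighted by sample count."""
    counts = Counter()
    for line in tally_text.splitlines():
        if not line.strip():
            continue
        n, gt = line.split()
        for allele in gt.replace("|", "/").split("/"):
            counts[allele] += int(n)
    return counts


counts = allele_counts(TALLY)
# Called chromosomes, i.e. everything except the missing "." alleles.
called = sum(n for allele, n in counts.items() if allele != ".")
for idx, base in ALLELES.items():
    print(base, counts[idx], round(counts[idx] / called, 4))
```

With these numbers, the G allele (index 2) accounts for 1197 of the 2456 called chromosomes, while A and T contribute only 12 and 5.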
    
  • tommycarstensen (United Kingdom; Member ✭✭✭)

    I forgot to mention that HC3.2 calls some very long indels in this region. @TechnicalVault thinks that's just what it is after local realignment. He thinks samtools and 1000G are wrong. In fact it is marked as lowqual, TruthSensitivityTranche99.90to100.00, FailSOM in 1000G as shown above. Here are the HC3.2 calls:

    20  259582  .   CGGGACGGAGGGAGGGAGGGCGGGACGGAGGGAGGGAGGGAGGGACGGAGGGCGGGACGGCGGGAGGGCGGGACGGAGGGACGGAGGGAGGGCGGGACGGAGGGCGGGAGGGCGGGACGGAGGGAGGGAGGGAGGGAGGGACGGAGGGCGGGACGGA   AGGGACGGAGGGAGGGAGGGCGGGACGGAGGGAGGGAGGGAGGGACGGAGGGCGGGACGGCGGGAGGGCGGGACGGAGGGACGGAGGGAGGGCGGGACGGAGGGCGGGAGGGCGGGACGGAGGGAGGGAGGGAGGGAGGGACGGAGGGCGGGACGGA,C 396.08  .   AC=24,1;AF=0.062,0.002577;AN=388;BaseQRankSum=0.736;DP=761;FS=0;GQ_MEAN=3.96;GQ_STDDEV=4.38;InbreedingCoeff=0.3024;MLEAC=19,1;MLEAF=0.049,0.002577;MQ=60;MQ0=0;MQRankSum=0.736;NCC=1793;QD=16.5;ReadPosRankSum=0.804
    20  259595  .   GGGAGGGCGGGACGGAGGGAGGGAGGGAGGGACGGAGGGCGGGACGGCGGGAGGGCGGGACGGAGGGACGGAGGGAGGGCGGGACGGAGGGCGGGAGGGCGGGACGGAGGGAGGGAGGGAGGGAGGGAC   G   32.57   .   AC=4;AF=0.016;AN=248;DP=534;FS=0;GQ_MEAN=4.97;GQ_STDDEV=3.03;InbreedingCoeff=-0.0124;MLEAC=3;MLEAF=0.012;MQ=60;MQ0=0;NCC=1863;QD=0.38
    20  259607  .   CGGAGGGAGGGAGGGAGGGACGGAGGGCGGGACGGCGGGAGGGCGGGACGGAGGGACGGAGGGAGGGCGGGACGGAGGGCGGGAGGGCGGGACGGAG   C   51.16   .   AC=1;AF=0.01;AN=96;BaseQRankSum=0.736;DP=479;FS=0;GQ_MEAN=4;GQ_STDDEV=6.73;InbreedingCoeff=-0.0792;MLEAC=2;MLEAF=0.021;MQ=60;MQ0=0;MQRankSum=0.736;NCC=1939;QD=0.53
    

    I tried calling the position with HC3.4-46 just now and it actually succeeds:
    20 259627 rs190796254 C G 1182.18 . AC=79;AF=0.123;AN=640;BaseQRankSum=4.68;DB;DP=374;FS=0;MLEAC=79;MLEAF=0.123;MQ=34.8;MQRankSum=-1.281;QD=26.27;ReadPosRankSum=-4.116;SOR=3.695

    Except most GTs are still ./.:
    4157 ./.
    279 0/0
    3 0/1
    38 1/1

    I'm probably not going to waste more time on this SNP myself. I have deleted it and it will soon disappear from my short range radar.
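
As a sanity check, the genotype tally above can be compared against the AC/AN/AF values in the HC3.4-46 record. A minimal sketch in Python, assuming diploid genotypes; the counts are copied from the post and the variable names are only for illustration:

```python
# Genotype counts from the HC3.4-46 call above.
gt_counts = {"./.": 4157, "0/0": 279, "0/1": 3, "1/1": 38}

total = sum(gt_counts.values())
called = total - gt_counts["./."]

# AN: called chromosomes; AC: copies of the ALT allele ("1").
an = 2 * called
ac = sum(n * gt.split("/").count("1") for gt, n in gt_counts.items())
af = ac / an

# Fraction of samples that received a genotype at all.
call_rate = called / total

print(f"AN={an} AC={ac} AF={af:.3f} call_rate={call_rate:.3f}")
```

This gives AN=640, AC=79 and AF=0.123, matching the INFO field of the record, and shows that only 320 of the 4477 samples were genotyped at this site.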

  • TechnicalVault (Cambridge, UK; Member ✭✭✭)

    I've just run the BAMs through HaplotypeCaller with --bamOut and these are the resulting BAMs. Given the strand bias and weirdness of the region I'm inclined to agree it's a mapping issue causing a false positive.
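
Strand bias at a site can be quantified with a Fisher exact test on the ref/alt by forward/reverse read counts. A minimal sketch in Python, using the samtools DP4 values quoted earlier in the thread (DP4=168,89,30,15, which samtools/bcftools order as ref-forward, ref-reverse, alt-forward, alt-reverse). This is a plain two-sided test with a Phred scaling, not GATK's exact FisherStrand implementation, and the function names are hypothetical:

```python
from math import comb, log10


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables that are no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def phred(p):
    """Phred-scale a p-value; larger values mean stronger evidence of bias."""
    return -10 * log10(max(p, 1e-300)) + 0.0


# DP4 from the samtools record: ref-fwd, ref-rev, alt-fwd, alt-rev.
ref_fwd, ref_rev, alt_fwd, alt_rev = 168, 89, 30, 15
p = fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)
fs = phred(p)
print(f"p={p:.3g} phred={fs:.3f}")
```

The reads written by --bamOut may give different counts from the DP4 values used here, so the result for this table says nothing definite about the HaplotypeCaller view of the site.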

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @tommycarstensen @TechnicalVault
    Hi,

    Yes, that is good news. As for the call in the latest version of HaplotypeCaller, I suspect it will be filtered out in VQSR.

    -Sheila

  • tommycarstensen (United Kingdom; Member ✭✭✭)

    Sorry about the slow answer. Things have been busy. It is indeed filtered by VR. Later I decided to filter it prior to VR.

    20      259626  .       N       <VQSR>  .       .       END=259626;VQSLOD=-9.0474;culprit=DP
    20      259627  .       N       <VQSR>  .       .       END=259627;VQSLOD=-11.9029;culprit=FS
    20      259628  .       N       <VQSR>  .       .       END=259628;VQSLOD=-10.05;culprit=DP
    20      259629  .       N       <VQSR>  .       .       END=259629;VQSLOD=-10.6543;culprit=DP
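
To pull the VQSLOD score and the culprit annotation out of records like these, the INFO column can be split into key/value pairs. A minimal sketch in Python, assuming the eight fixed VCF columns and splitting on any whitespace because the pasted records use spaces (a real VCF is tab-separated); `parse_info` is a hypothetical helper:

```python
# Records copied from the post above.
RECORDS = """\
20      259626  .       N       <VQSR>  .       .       END=259626;VQSLOD=-9.0474;culprit=DP
20      259627  .       N       <VQSR>  .       .       END=259627;VQSLOD=-11.9029;culprit=FS
20      259628  .       N       <VQSR>  .       .       END=259628;VQSLOD=-10.05;culprit=DP
20      259629  .       N       <VQSR>  .       .       END=259629;VQSLOD=-10.6543;culprit=DP
"""


def parse_info(info):
    """Turn 'K1=V1;K2=V2;FLAG' into a dict; flags map to True."""
    out = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


summary = {}
for line in RECORDS.splitlines():
    fields = line.split()
    pos, info = int(fields[1]), parse_info(fields[7])
    summary[pos] = (float(info["VQSLOD"]), info["culprit"])

# Position with the lowest (worst) VQSLOD among the four sites.
worst = min(summary, key=lambda p: summary[p][0])
print(worst, summary[worst])
```

For the four sites above, 20:259627 has the lowest VQSLOD and is the only one whose culprit is FS rather than DP.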
    
  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @tommycarstensen
    Hi Tommy,

    Thanks for confirming it is filtered.

    -Sheila
