Bug: Haplotype Caller Odd VCF files

aeonsimaeonsim Posts: 65Member ✭✭
edited April 2014 in Ask the GATK team

Hi, I've just been having a go with the new HaplotypeCaller method and I'm getting some odd or malformed lines in the VCF file.

For example, the FORMAT field declares 4 fields for each sample record, but some samples have only 2.

Two examples are shown here:

See Block 1
GT:DP:GQ:PL 1/1:.:3:32,3,0 ./.:0

Or where the format block declares 5 fields and we get 3 instead:

GT:AD:DP:GQ:PL  0/0:1,1,0,0,0,0,0,0,0:2:3:0,3,24,3,24,24        ./.:.:0 0/0:1,0,0,0,0,0,0,0,0:1:1:0,1,40,3,41,43        ./.:.:3

Full blocks, Block 1

chr1    3489714 .       G       A       9667.23 .       AC=154;AF=1.00;AN=154;DP=0;FS=0.000;InbreedingCoeff=-0.0391;MLEAC=154;MLEAF=1.00;MQ=0.00;MQ0=0;EFF=INTRAGENIC(MODIFIER|||||TIAM1||CODING|||1),INTRON(MODIFIER||||649|TIAM1|protein_coding|CODING|ENSBTAT00000064124|1|1);CSQ=A|ENSBTAG00000017839|ENSBTAT00000064124|Transcript|intron_variant||||||||1/5||1|TIAM1|HGNC|      GT:DP:GQ:PL     1/1:.:3:32,3,0  1/1:.:18:207,18,0     1/1:.:6:78,6,0  1/1:.:12:140,12,0       1/1:.:9:101,9,0 ./.:0   1/1:.:9:97,9,0  1/1:.:9:96,9,0  1/1:.:21:244,21,0       1/1:.:12:138,12,0       1/1:.:12:124,12,0       1/1:.:9:105,9,0 1/1:.:15:164,15,0     1/1:.:15:153,15,0       1/1:.:27:265,27,0       1/1:.:12:125,12,0       1/1:.:18:214,18,0       ./.:0   1/1:.:9:108,9,0 1/1:.:15:169,15,0       ./.:0   1/1:.:6:76,6,0  1/1:.:6:66,6,0  1/1:.:12:140,12,0     1/1:.:3:28,3,0  1/1:.:3:10,3,0  1/1:.:12:128,12,0       ./.:0   1/1:.:18:181,18,0       1/1:.:9:98,9,0  1/1:.:15:161,15,0       1/1:.:15:185,15,0       1/1:.:12:133,12,0       1/1:.:15:175,15,0     1/1:.:18:178,18,0       1/1:.:12:133,12,0       1/1:.:9:105,9,0 1/1:.:12:141,12,0       1/1:.:15:166,15,0       1/1:.:9:108,9,0 1/1:.:15:160,15,0       1/1:.:27:267,27,0       1/1:.:21:218,21,0    1/1:.:9:107,9,0  1/1:.:3:28,3,0  1/1:.:9:80,9,0  1/1:.:6:46,6,0  ./.:0   1/1:.:6:61,6,0  1/1:.:21:241,21,0       1/1:.:15:161,15,0       1/1:.:6:82,6,0  1/1:.:12:143,12,0       1/1:.:9:109,9,0 1/1:.:21:249,21,0     1/1:.:6:40,6,0  1/1:.:9:94,9,0  1/1:.:15:185,15,0       1/1:.:12:129,12,0       1/1:.:12:132,12,0       ./.:0   1/1:.:21:207,21,0       1/1:.:12:136,12,0       1/1:.:12:109,12,0       1/1:.:18:192,18,0     ./.:0   1/1:.:9:68,9,0  1/1:.:12:138,12,0       1/1:.:6:73,6,0  1/1:.:9:105,9,0 1/1:.:9:98,9,0  1/1:.:6:65,6,0  ./.:0   1/1:.:6:65,6,0  ./.:0   1/1:.:6:58,6,0  1/1:.:12:131,12,0       ./.:0   ./.:01/1:.:3:38,3,0   1/1:.:3:37,3,0  1/1:.:21:227,21,0       1/1:.:12:131,12,0       1/1:.:6:66,6,0  1/1:.:9:100,9,0 1/1:.:21:209,21,0       
1/1:.:6:63,6,0  1/1:.:6:69,6,0

Block 2

chr1    55248   .       ACCC    A,CCCC  179.69  .       AC=13,6;AF=0.100,0.046;AN=130;BaseQRankSum=0.736;ClippingRankSum=0.736;DP=347;FS=0.000;InbreedingCoeff=0.2231;MLEAC=10,4;MLEAF=0.077,0.031;MQ=53.55;MQ0=0;MQRankSum=0.736;QD=4.99;ReadPosRankSum=0.736;EFF=INTERGENIC(MODIFIER||||||||||1),INTERGENIC(MODIFIER||||||||||2);CSQ=-||||intergenic_variant|||||||||||||     GT:AD:DP:GQ:PL  0/0:1,1,0,0,0,0,0,0,0:2:3:0,3,24,3,24,24        ./.:.:0 0/0:1,0,0,0,0,0,0,0,0:1:1:0,1,40,3,41,43        0/0:1,0,0,0,0,0,0,0,0:1:2:0,2,44,3,45,46        0/0:.:5:0:0,0,103,0,103,103     0/0:3,0,1,0,0,0,0,0,0:4:9:0,9,81,9,81,81        0/0:0,0,1,0,0,0,0,0,0:1:1:0,1,2,1,2,2   ./.:.:1 0/0:.:3:0:0,0,41,0,41,41        0/0:1,0,0,0,0,0,0,0,0:1:9:0,9,73,9,73,73        0/1:1,0,0,1,0,0,0,0,0:2:22:28,0,73,28,22,46     0/0:1,0,0,0,0,0,0,0,0:1:4:0,4,25,4,25,25        0/0:2,0,0,0,0,0,0,0,0:2:21:0,21,273,21,273,273  0/1:5,0,0,1,0,0,0,0,0:6:11:11,0,235,26,158,175  ./.:0,0,0,0,1,0,0,0,0:1 0/0:.:4:0:0,0,46,0,46,46        ./.:.:3 ./.:.:2 0/1:1,0,1,1,0,0,0,0,0:3:21:28,0,44,31,21,52     0/0:.:4:0:0,0,77,0,77,77        0/0:0,0,1,0,0,0,0,0,0:1:1:0,1,2,1,2,2   0/0:.:2:6:0,6,51,6,51,51        0/0:.:6:2:0,2,151,2,151,151     0/0:2,0,1,0,0,0,0,0,0:3:7:0,7,74,7,74,74        ./.:.:0 0/0:2,0,0,0,0,0,0,0,0:2:5:0,7,59,5,37,35        1/2:0,0,0,1,0,0,0,0,0:1:1:27,1,40,26,0,25       0/0:2,0,0,0,0,0,0,0,0:2:9:0,9,83,9,83,83        ./.:.:0 2/2:0,0,0,0,0,3,0,0,0:3:9:59,59,59,9,9,0        ./.:.:16        0/0:2,0,0,0,0,0,0,0,0:2:7:0,7,59,7,59,59        0/0:2,0,0,0,0,0,0,0,0:2:9:0,9,83,9,83,83        0/0:2,0,0,0,0,0,0,0,0:2:11:0,11,68,11,68,68     0/2:6,0,0,0,0,2,0,0,0:8:18:18,39,384,0,346,340  ./.:.:1 0/1:3,0,0,1,0,0,0,0,0:4:25:25,0,96,34,100,134   0/0:2,0,0,0,0,0,0,0,0:2:12:0,12,105,12,105,105  0/1:1,0,0,1,0,0,0,0,0:2:17:17,0,94,21,26,44     0/0:.:2:6:0,6,64,6,64,64        0/0:1,0,0,0,0,0,0,0,0:1:2:0,2,24,3,25,27        0/0:.:2:6:0,6,48,6,48,48        0/0:.:2:6:0,6,63,6,63,63        0/0:0,0,1,0,0,0,0,0,0:1:0:0,0,1,0,1,1   ./.:.:0 
0/0:1,0,0,0,0,0,0,0,0:1:1:0,1,6,3,8,9   0/0:1,0,0,0,0,0,0,0,0:1:15:0,15,124,15,124,124  ./.:.:0 0/0:1,0,0,0,0,0,0,0,0:1:4:0,4,19,4,19,19        0/1:2,0,0,1,0,0,0,0,0:3:28:28,0,66,34,70,104    0/0:.:4:0:0,0,18,0,18,18        0/2:2,0,0,0,0,4,0,0,0:6:57:68,74,143,0,69,57    ./.:.:0 0/0:4,0,0,0,0,0,0,0,1:5:15:0,15,109,15,109,109  0/0:.:10:0:0,0,182,0,182,182    0/0:.:1:3:0,3,31,3,31,31        0/0:0,0,1,0,0,0,0,0,0:1:1:0,1,2,1,2,2   0/0:2,0,0,0,0,0,0,0,0:2:5:0,5,76,6,78,79        0/0:1,1,0,0,0,0,0,0,0:2:4:0,4,28,4,28,28        ./.:.:1 1/1:0,0,0,2,0,0,0,0,0:2:6:67,6,0,67,6,67        ./.:.:1 0/0:.:2:0:0,0,14,0,14,14        ./.:.:0 ./.:.:1 0/0:.:5:1:0,1,120,1,120,120     0/0:.:4:0:0,0,8,0,8,8   0/0:.:6:0:0,0,94,0,94,94        0/1:1,0,0,2,0,0,0,0,0:3:1:27,0,1,29,6,35        0/0:.:2:0:0,0,3,0,3,3   ./.:.:6 ./.:.:0 ./.:.:0 0/2:3,0,0,0,0,2,0,0,0:5:27:46,27,89,0,62,81     ./.:.:0 0/0:.:1:0:0,0,6,0,6,6   0/0:.:9:0:0,0,86,0,86,86        ./.:.:1 ./.:.:0 1/1:.:.:0:1,1,0,1,0,0   0/0:0,0,0,0,0,0,0,0,0:0:3:0,3,26,3,26,26        0/1:4,0,0,2,0,0,0,0,0:6:49:49,0,93,61,100,160   0/0:.:3:0:0,0,14,0,14,14        0/0:.:8:0:0,0,60,0,60,60        0/0:2,1,0,0,0,0,0,0,0:3:6:0,6,37,6,37,37        ./.:.:2 0/0:.:9:0:0,0,81,0,81,81        0/0:0,0,0,0,0,0,0,0,0:0:1:0,1,2,1,2,2

Any idea what the issue is?


Answers

  • aeonsimaeonsim Posts: 65Member ✭✭
    edited April 2014

    Uhm the text in this post isn't showing for some reason?? When I go to edit the above post I see all the text for my post but for some reason I can't see it in the main thread view.

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 7,203Administrator, GATK Developer admin

    Some formatting weirdness, I think the Markdown interpreter didn't like the characters you used to delimit blocks. Fixed now.

    Geraldine Van der Auwera, PhD

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 7,203Administrator, GATK Developer admin

    Two questions: what version was this, and what were the individual steps/ command lines that produced this?

    Geraldine Van der Auwera, PhD

  • aeonsimaeonsim Posts: 65Member ✭✭

    Strange, I was getting nginx bad gateway error messages when trying to edit the post. Still, now that it's showing, it can at least be read.

  • aeonsimaeonsim Posts: 65Member ✭✭

    GATK 3.1-1 was used for the HaplotypeCaller stages, though it was run on older BAM files that had been prepared with 2.7.4. I'll track down the exact commands in a couple of minutes.

  • aeonsimaeonsim Posts: 65Member ✭✭

    @pdexheimer said:
    The VCF spec allows trailing fields in the format block to be omitted if they have no data...

    It does? That's plain nasty, so this is working as intended?

    I take it we are to assume that if the format line is: GT:AD:DP:GQ:PL and the data line is ./.:.:2 then that is GT:AD:DP

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 7,203Administrator, GATK Developer admin

    Aaand @pdexheimer jumps in with the answer :)

    The VCF spec allows trailing fields in the format block to be omitted if they have no data...

    I wish it didn't because it's confusing (and I always forget this, which is embarrassing).

    Geraldine Van der Auwera, PhD

  • pdexheimerpdexheimer Posts: 419Member, GSA Collaborator ✭✭✭✭

    @aeonsim said:
    I take it we are to assume that if the format line is: GT:AD:DP:GQ:PL and the data line is ./.:.:2 then that is GT:AD:DP

    Yep. Order is preserved, and GT is special because it must always be present

    @Geraldine_VdAuwera said:
    I wish it didn't because it's confusing

    Agreed. Makes parsing (by hand or machine) more complex, and I have a hard time finding any real benefits. It saves a couple of bytes, I suppose

  • aeonsimaeonsim Posts: 65Member ✭✭

    @pdexheimer said:

    What a pain, it's going to take a bit of refactoring to handle that in my code; it was already bad enough supporting multiple variant callers...

    Next time they update the VCF spec, please suggest to them that they remove that bit.

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 7,203Administrator, GATK Developer admin

    It saves a couple of bytes, I suppose

    *pained laughter* Sure. Talk about penny wise and pound foolish...

    Geraldine Van der Auwera, PhD

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 7,203Administrator, GATK Developer admin

    Next time they update the VCF spec, please suggest to them that they remove that bit.

    I'll see if the team is receptive to not applying that particular latitude of the spec's in future versions. There's no good reason we have to omit those fields; I would prefer we emit user-friendly VCFs.

    Geraldine Van der Auwera, PhD
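
For anyone who needs to handle this in their own parser: the rule described in this thread (trailing FORMAT fields may be dropped, order is preserved, GT stays present) can be sketched in Python roughly as below. This is only a minimal illustration, not GATK code; the function names `parse_sample` and `pad_sample` are made up for this example, and it assumes the sample record has already been split out of the tab-delimited VCF line.

```python
def parse_sample(format_field, sample_field, missing="."):
    """Map FORMAT keys to one sample's values.

    Hypothetical helper, not part of GATK or any VCF library.
    Trailing fields that were dropped from the sample record are
    filled in with the VCF missing value ".".
    """
    keys = format_field.split(":")
    values = sample_field.split(":")
    if len(values) > len(keys):
        raise ValueError(
            "sample has %d fields but FORMAT declares only %d"
            % (len(values), len(keys))
        )
    # Order is preserved, so the values present always correspond to
    # the first len(values) keys; pad whatever was left off the end.
    values += [missing] * (len(keys) - len(values))
    return dict(zip(keys, values))


def pad_sample(format_field, sample_field, missing="."):
    """Return the sample record rewritten with every FORMAT field present."""
    parsed = parse_sample(format_field, sample_field, missing)
    return ":".join(parsed[key] for key in format_field.split(":"))


if __name__ == "__main__":
    # Records taken from the blocks posted above.
    print(parse_sample("GT:AD:DP:GQ:PL", "./.:.:2"))
    print(pad_sample("GT:DP:GQ:PL", "./.:0"))
```

With the records from the original post, `./.:.:2` under `GT:AD:DP:GQ:PL` is read as GT, AD and DP, with GQ and PL treated as missing, and a complete record passes through unchanged.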
