Genotypes in a VCF combined by CombineVariants differ from those in the original VCFs

lindakjcao (Member)
edited October 2014 in Ask the GATK team

Hi GATK team,

I am generating a combined VCF from 150+ single-sample VCFs to build a cohort, so that I can calculate the cohort frequency of each variant. However, the genotypes in the combined VCF do not match those in the original VCFs. Here is my command line:

java -jar /GATK/GenomeAnalysisTK-2.7-4/GenomeAnalysisTK.jar -R reference.fasta \
    -T CombineVariants \
    --variant sample1.vcf \
    --variant sample2.vcf \
    -o combined.vcf

Here is one variant position from the combined VCF. The full record is very long, so I have only included the relevant columns:

1 22082967 rs35545280 CAAA CAA,C,CA,CAAAA 76.73 PASS AC=109,6,104,6;AF=0.368,0.020,0.351,0.020;AN=296;DB;DP=9869;GC=48.13;MQ0=0;PercentNBaseSolid=0.0000;RU=A;STR;set=filterInvariant-filterInvariant2-filterInvariant3… GT:DP:GQ
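For reference when computing cohort frequencies: in the INFO field, AC holds one allele count per ALT allele, AN is the total number of called alleles, and AF is AC/AN for each ALT. A minimal Python sketch that re-derives the AF values from the AC and AN shown above (`allele_freqs` is a hypothetical helper; it reads only the AC and AN keys and ignores everything else in INFO):

```python
def allele_freqs(info: str) -> list[float]:
    """Return AC/AN for each ALT allele, parsed from a VCF INFO string."""
    fields = {}
    for entry in info.split(";"):
        # Flag entries such as DB or STR have no "=value" part.
        key, _, value = entry.partition("=")
        fields[key] = value
    an = int(fields["AN"])
    return [int(ac) / an for ac in fields["AC"].split(",")]


info = "AC=109,6,104,6;AF=0.368,0.020,0.351,0.020;AN=296;DB;DP=9869"
print([round(f, 3) for f in allele_freqs(info)])  # [0.368, 0.02, 0.351, 0.02]
```

The recomputed values agree with the AF reported in the record, so the INFO-level counts are at least internally consistent.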

In this record, sample 1 carries the variant with the genotype field "0/3:46:99", but in sample1.vcf the same position is listed as:

1 22082967 rs35545280 CA C 88.73 Low_Confidence AC=2;AF=1.00;AN=2;BaseCounts=0,76,0,0;BaseQRankSum=1.905;DB;DP=76;FS=0.000;GC=48.13;HaplotypeScore=195.0572;IndelType=DEL.NOVEL_2.;LowMQ=0.0000,0.0000,76;MLEAC=1;MLEAF=0.500;MQ=68.40;MQ0=0;MQRankSum=0.423;PercentNBaseSolid=0.0000;QD=1.17;RPA=20,19;RU=A;ReadPosRankSum=-0.741;STR;set=FilteredInAll GT:AD:DP:GQ:PL 1/1:0,17:76:19:587,19,0

So the genotype for sample 1 is 0/3 (heterozygous) in the combined VCF, but 1/1 (homozygous alternate) in the original. When I calculate the cohort frequency, I cannot tell which genotype to use for sample 1 at this variant.
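When the cohort frequency is computed from the per-sample genotypes rather than from INFO/AC, each allele index in a GT string is counted once and no-calls (`./.`) are left out of AN. A minimal sketch, assuming diploid genotypes separated by `/` or `|` with GT as the first colon-separated subfield (as in the GT:DP:GQ format above); `count_alleles` is a hypothetical helper, not part of GATK:

```python
import re
from collections import Counter


def count_alleles(sample_fields: list[str], n_alt: int) -> tuple[list[int], int]:
    """Return (AC per ALT allele, AN) from per-sample FORMAT fields like '0/3:46:99'."""
    counts = Counter()
    an = 0
    for field in sample_fields:
        gt = field.split(":")[0]  # GT is the first FORMAT key in GT:DP:GQ
        for allele in re.split(r"[/|]", gt):
            if allele == ".":
                continue  # missing call: contributes to neither AC nor AN
            counts[int(allele)] += 1
            an += 1
    ac = [counts[i] for i in range(1, n_alt + 1)]  # skip index 0 (REF)
    return ac, an


ac, an = count_alleles(["1/1:76:19", "3/1:105:99", "./.", "0/2:47:99"], n_alt=4)
print(ac, an)  # [3, 1, 1, 0] 6
```

Dividing each entry of `ac` by `an` then gives per-ALT frequencies in the same form as INFO/AF. The result is only as good as the genotypes it is fed, which is why the mismatch above matters.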

To give you a better idea, here is the same variant for another sample, in combined.vcf and in sample2.vcf.

In combined.vcf, sample 2 shows as "0/3:91:99".

In sample2.vcf, the record is:

1 22082967 rs35545280 CAA C,CA 549.19 PASS AC=1,1;AF=0.500,0.500;AN=2;BaseCounts=0,105,0,0;BaseQRankSum=-1.897;DB;DP=105;FS=0.000;GC=48.13;HaplotypeScore=238.8742;IndelType=MULTIALLELIC_INDEL;LowMQ=0.0000,0.0000,105;MLEAC=1,1;MLEAF=0.500,0.500;MQ=68.78;MQ0=0;MQRankSum=-0.992;PercentNBaseSolid=0.0000;QD=5.23;RPA=20,18,19;RU=A;ReadPosRankSum=-1.344;STR;set=variant2 GT:AD:DP:GQ:PL 1/2:0,11,25:105:99:1363,307,775,501,0,585

Here the genotype is 1/2, but in the combined VCF it shows as "0/3".
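The three records use different REF alleles at the same POS (CA, CAA, and CAAA), so before comparing genotypes, each input's alleles have to be re-expressed against the longest REF and the GT indices renumbered. For indels in a homopolymer like this one, that amounts to appending the REF bases the shorter record lacks to each of its alleles. A minimal sketch under that assumption (same POS, and the input REF is a prefix of the merged REF); `remap_gt` is a hypothetical helper for checking the expected numbering, not CombineVariants' actual code:

```python
def remap_gt(gt: str, ref: str, alts: list[str],
             merged_ref: str, merged_alts: list[str]) -> str:
    """Translate a GT string from one record's allele numbering to a merged record's."""
    if not merged_ref.startswith(ref):
        raise ValueError("sketch assumes the input REF is a prefix of the merged REF")
    suffix = merged_ref[len(ref):]  # trailing REF bases missing from this record
    merged_alleles = [merged_ref] + merged_alts
    sep = "|" if "|" in gt else "/"
    out = []
    for idx in gt.split(sep):
        if idx == ".":
            out.append(".")  # keep no-calls as they are
            continue
        allele = ([ref] + alts)[int(idx)] + suffix  # extend to the merged REF length
        out.append(str(merged_alleles.index(allele)))
    return sep.join(out)


MERGED_REF, MERGED_ALTS = "CAAA", ["CAA", "C", "CA", "CAAAA"]
print(remap_gt("1/1", "CA", ["C"], MERGED_REF, MERGED_ALTS))         # sample1.vcf record: 1/1
print(remap_gt("1/2", "CAA", ["C", "CA"], MERGED_REF, MERGED_ALTS))  # sample2.vcf record: 3/1
```

Under these assumptions, the sample1.vcf genotype corresponds to 1/1 in the combined record's allele numbering and the sample2.vcf genotype to 3/1 (equivalently 1/3), neither of which is 0/3.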

Is there a parameter I should add to the command line to solve this problem?

Thank you.
Linda


Answers

  • Hi Sheila,

    There is no error message, but here is the full record from combined.vcf. Let me know if this works for you.

    1 22082967 rs35545280 CAAA CAA,C,CA,CAAAA 76.73 PASS AC=109,6,104,6;AF=0.368,0.020,0.351,0.020;AN=296;DB;DP=9869;GC=48.13;MQ0=0;PercentNBaseSolid=0.0000;RU=A;STR;set=filterInvariant-filterInvariant2-filterInvariant3-filterInvariant4-variant5-variant6-filterInvariant8-variant9-variant10-filterInvariant11-variant12-filterInvariant14-filterInvariant15-filterInvariant16-variant17-filterInvariant18-variant19-filterInvariant20-filterInvariant21-variant22-filterInvariant23-variant24-filterInvariant25-variant27-variant28-variant29-filterInvariant30-variant31-variant32-variant33-variant34-variant35-variant36-variant37-variant38-variant39-variant40-variant41-filterInvariant42-variant44-variant45-variant46-variant47-variant48-variant49-variant50-variant51-variant52-variant53-variant54-variant55-variant56-variant57-variant58-variant60-variant61-variant62-variant63-variant64-variant65-variant66-variant67-variant68-variant69-filterInvariant70-variant71-variant72-variant73-variant74-variant75-variant76-variant77-variant78-variant79-variant80-variant81-variant82-variant83-variant84-variant85-variant87-variant88-variant89-variant90-variant91-variant92-variant93-variant95-variant96-variant97-variant98-variant99-variant100-variant101-variant102-variant103-variant104-variant105-variant106-variant107-variant108-variant109-variant110-variant111-variant112-variant113-filterInvariant114-variant115-variant116-filterInvariant118-variant119-variant120-variant121-filterInvariant122-filterInvariant123-variant124-variant125-filterInvariant126-variant127-variant128-filterInvariant129-variant130-filterInvariant131-variant132-variant134-variant135-variant136-filterInvariant137-variant138-filterInvariant139-variant140-variant141-filterInvariant142-variant143-variant144-variant145-variant146-variant147-variant148-variant149-variant150-variant151-variant152-variant153-variant154-variant155-variant156-variant157 GT:DP:GQ 3/1:61:99 0/3:57:99 0/3:78:99 0/3:88:99 3/1:106:99 3/1:115:99 3/1:105:99 
3/1:100:99 3/1:102:99 3/1:98:99 3/1:111:99 0/1:79:99 3/1:110:99 0/1:90:60 0/3:31:54 0/1:28:26 3/1:101:99 3/1:91:99 4/1:63:76 0/1:34:33 3/1:32:89 3/1:62:99 0/3:75:99 1/1:57:5 0/1:51:99 3/1:71:99 3/1:64:99 3/1:58:99 3/1:68:99 3/1:74:99 3/1:47:99 0/3:46:99 3/1:39:99 3/1:46:99 ./. 0/3:53:75 1/1:81:40 3/1:86:99 0/1:60:99 ./. 0/3:66:99 0/3:65:99 3/1:58:99 0/1:80:1 0/1:89:8 0/3:91:99 3/1:112:99 1/1:76:19 0/3:88:99 0/3:86:99 3/1:86:99 3/1:100:99 0/1:51:99 3/1:60:99 3/1:56:99 0/3:58:99 0/1:65:99 3/1:70:99 3/1:54:86 3/1:56:99 0/3:45:99 3/1:60:99 3/1:55:99 0/1:50:99 0/1:71:99 4/1:69:86 0/3:71:99 0/3:76:99 0/3:56:99 0/3:72:99 3/1:59:99 0/3:79:99 0/4:66:99 3/1:71:99 3/1:58:99 0/3:64:99 2/1:70:99 3/1:53:79 3/1:69:99 2/1:70:99 ./. 1/1:62:74 ./. 0/3:66:99 0/3:53:99 0/3:86:99 3/1:65:99 3/1:70:99 0/1:75:99 0/3:40:99 0/1:79:99 3/1:66:83 3/1:73:99 3/1:72:99 4/1:66:99 3/1:59:96 0/3:70:99 2/1:63:90 0/3:66:99 4/3:38:99 3/1:73:99 3/1:55:99 3/1:71:99 3/1:52:99 ./. 2/1:53:99 3/1:56:99 0/3:52:99 0/3:74:99 0/3:80:99 0/1:77:99 0/1:59:21 0/3:67:99 0/3:60:99 3/1:80:99 0/3:75:99 1/1:53:69 3/1:74:87 0/1:57:99 3/1:57:99 0/1:78:44 0/3:52:99 0/1:54:99 ./. 0/1:61:99 0/1:38:99 ./. ./. 0/1:49:99 3/1:60:99 0/1:51:99 0/3:68:99 3/1:53:81 0/3:68:99 3/1:50:99 0/3:55:99 0/1:63:99 0/1:43:90 3/1:69:99 3/1:63:99 0/3:77:99 3/3:63:38 ./. 3/1:57:99 0/4:76:99 3/1:78:99 3/1:69:99 3/1:85:99 0/1:61:59 0/2:46:99 0/3:61:99 0/1:60:99 0/1:49:99 0/1:53:99 0/2:47:99 3/1:49:99 3/1:90:99
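With a record this wide, it is easy to read the wrong column. In a VCF, each genotype column belongs to the sample named at the same position in the #CHROM header line, and the column order in a combined file need not follow the order of the --variant arguments. It may be worth checking this here: the record above does contain a 1/1:76:19 entry and a 3/1:105:99 entry, whose DP and GQ match the sample1.vcf and sample2.vcf records. A minimal sketch for pulling one sample's FORMAT field by name, assuming a plain, uncompressed VCF whose sample names match the #CHROM header exactly (`genotype_for`, `sampleA`, and `sampleB` are hypothetical names):

```python
def genotype_for(vcf_lines, sample: str, chrom: str, pos: int) -> str:
    """Return the FORMAT values (e.g. '1/1:76:19') for one sample at CHROM:POS."""
    samples = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            continue  # meta-information lines
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            samples = cols[9:]  # sample names, in column order
            continue
        if samples is None:
            raise ValueError("no #CHROM header line before the first record")
        if cols[0] == chrom and int(cols[1]) == pos:
            return cols[9 + samples.index(sample)]
    raise KeyError(f"{chrom}:{pos} not found")


header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleB\tsampleA"
record = "1\t22082967\trs35545280\tCAAA\tCAA,C\t76.73\tPASS\tAN=4\tGT:DP:GQ\t0/2:57:99\t1/1:76:19"
print(genotype_for([header, record], "sampleA", "1", 22082967))  # 1/1:76:19
```

With a real file, an open handle can be passed directly, e.g. `with open("combined.vcf") as fh: genotype_for(fh, "sample1", "1", 22082967)`, where "sample1" stands for whatever name appears in that file's header.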

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @lindakjcao

    Hi Linda,

    We want to be able to replicate the error here so we can try to figure out what is going on.

    I see you have one particular record where an error occurs, and the error is in sample 1.

    If you can submit a snippet of the sample 1 VCF around that record, plus snippets of a few other sample VCFs around the same record, we can try to replicate it here.

    I hope this makes sense.

    Thanks,
    Sheila

  • Hi Sheila,

    The errors occur in many more than two samples; I only included two here. However, the records involved in the sample 1 and sample 2 VCFs are listed in the original post. Please check them out.

    Thanks,
    Linda
