The genotypes in combined VCF generated by combineVariants are different from original VCFs

lindakjcao Member
edited October 2014 in Ask the GATK team

Hi GATK team,

I am generating a combined VCF from 150+ single-sample VCFs (building a sort of cohort) in order to calculate variant frequencies across the cohort. However, the genotypes in the combined VCF do not match the ones in the original VCFs. Here is my command line:

java -jar /GATK/GenomeAnalysisTK-2.7-4/GenomeAnalysisTK.jar -R refernce.fasta \
    -T CombineVariants \
    --variant sample1.vcf \
    --variant sample2.vcf \
    -o combined.vcf

Here is an example of one variant position in the combined VCF. The full record is very long, so I have included only the relevant columns here.

1 22082967 rs35545280 CAAA CAA,C,CA,CAAAA 76.73 PASS AC=109,6,104,6;AF=0.368,0.020,0.351,0.020;AN=296;DB;DP=9869;GC=48.13;MQ0=0;PercentNBaseSolid=0.0000;RU=A;STR;set=filterInvariant-filterInvariant2-filterInvariant3… GT:DP:GQ

In this record, sample 1 has this variant and it shows as "0/3:46:99",

but in the sample1.vcf, it is listed as

1 22082967 rs35545280 CA C 88.73 Low_Confidence AC=2;AF=1.00;AN=2;BaseCounts=0,76,0,0;BaseQRankSum=1.905;DB;DP=76;FS=0.000;GC=48.13;HaplotypeScore=195.0572;IndelType=DEL.NOVEL_2.;LowMQ=0.0000,0.0000,76;MLEAC=1;MLEAF=0.500;MQ=68.40;MQ0=0;MQRankSum=0.423;PercentNBaseSolid=0.0000;QD=1.17;RPA=20,19;RU=A;ReadPosRankSum=-0.741;STR;set=FilteredInAll GT:AD:DP:GQ:PL 1/1:0,17:76:19:587,19,0

As you can see, the genotype for sample 1 is 0/3 in the combined VCF, but in the original VCF it is 1/1, which is homozygous. So when I calculate the cohort frequency, I am not sure which genotype to use for sample 1 at this variant.

To give you a better idea, here is another sample with the same variant, as it appears in combined.VCF and in sample2.VCF.

In the combined.VCF, sample 2 shows as "0/3:91:99".

In the sample2.VCF, the record is:

1 22082967 rs35545280 CAA C,CA 549.19 PASS AC=1,1;AF=0.500,0.500;AN=2;BaseCounts=0,105,0,0;BaseQRankSum=-1.897;DB;DP=105;FS=0.000;GC=48.13;HaplotypeScore=238.8742;IndelType=MULTIALLELIC_INDEL;LowMQ=0.0000,0.0000,105;MLEAC=1,1;MLEAF=0.500,0.500;MQ=68.78;MQ0=0;MQRankSum=-0.992;PercentNBaseSolid=0.0000;QD=5.23;RPA=20,18,19;RU=A;ReadPosRankSum=-1.344;STR;set=variant2 GT:AD:DP:GQ:PL 1/2:0,11,25:105:99:1363,307,775,501,0,585

Where you can see the genotype is 1/2, but in the combined VCF, it shows as "0/3".
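For reference when comparing the records above: GT indices in a merged record point into the merged ALT list, and indel alleles are rewritten against the longest REF, so the same call can carry different numbers in the two files. The following is a minimal sketch of that translation, assuming the merged REF simply extends the sample REF and that each sample allele is padded with the same trailing bases. The function name `remap_genotype` is hypothetical and is not part of GATK. It may also be worth confirming that the right sample column is being read: the full combined record quoted later in this thread contains `1/1:76:19` and `3/1:105:99`, which match the DP and GQ values of sample1.vcf and sample2.vcf.

```python
def remap_genotype(sample_ref, sample_alts, gt, merged_ref, merged_alts):
    """Translate a single-sample GT into the allele indices of a merged record.

    Assumption (not taken from GATK source): the merged REF starts with the
    sample REF, and each sample allele is extended with the extra trailing
    bases of the merged REF.
    """
    if not merged_ref.startswith(sample_ref):
        raise ValueError("merged REF must extend the sample REF")
    pad = merged_ref[len(sample_ref):]           # bases added to reach merged REF
    merged = [merged_ref] + list(merged_alts)    # index 0 is always REF
    sample = [sample_ref] + list(sample_alts)
    remapped = []
    for idx in gt.replace("|", "/").split("/"):
        if idx == ".":                           # keep no-calls as they are
            remapped.append(".")
        else:
            remapped.append(str(merged.index(sample[int(idx)] + pad)))
    return "/".join(remapped)


merged_ref, merged_alts = "CAAA", ["CAA", "C", "CA", "CAAAA"]
# sample1.vcf: REF=CA, ALT=C, GT=1/1
print(remap_genotype("CA", ["C"], "1/1", merged_ref, merged_alts))
# sample2.vcf: REF=CAA, ALT=C,CA, GT=1/2
print(remap_genotype("CAA", ["C", "CA"], "1/2", merged_ref, merged_alts))
```

Under these assumptions, sample 1 stays 1/1 and sample 2 becomes 3/1 in the merged record's numbering.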

Please advise whether there is a parameter I should add to the command line to solve this problem.

Thank you.
Linda

Answers

  • Hi Sheila,

    There is no error message, but I can give you the full record from the combined.VCF. Let me know if this works for you.

    1 22082967 rs35545280 CAAA CAA,C,CA,CAAAA 76.73 PASS AC=109,6,104,6;AF=0.368,0.020,0.351,0.020;AN=296;DB;DP=9869;GC=48.13;MQ0=0;PercentNBaseSolid=0.0000;RU=A;STR;set=filterInvariant-filterInvariant2-filterInvariant3-filterInvariant4-variant5-variant6-filterInvariant8-variant9-variant10-filterInvariant11-variant12-filterInvariant14-filterInvariant15-filterInvariant16-variant17-filterInvariant18-variant19-filterInvariant20-filterInvariant21-variant22-filterInvariant23-variant24-filterInvariant25-variant27-variant28-variant29-filterInvariant30-variant31-variant32-variant33-variant34-variant35-variant36-variant37-variant38-variant39-variant40-variant41-filterInvariant42-variant44-variant45-variant46-variant47-variant48-variant49-variant50-variant51-variant52-variant53-variant54-variant55-variant56-variant57-variant58-variant60-variant61-variant62-variant63-variant64-variant65-variant66-variant67-variant68-variant69-filterInvariant70-variant71-variant72-variant73-variant74-variant75-variant76-variant77-variant78-variant79-variant80-variant81-variant82-variant83-variant84-variant85-variant87-variant88-variant89-variant90-variant91-variant92-variant93-variant95-variant96-variant97-variant98-variant99-variant100-variant101-variant102-variant103-variant104-variant105-variant106-variant107-variant108-variant109-variant110-variant111-variant112-variant113-filterInvariant114-variant115-variant116-filterInvariant118-variant119-variant120-variant121-filterInvariant122-filterInvariant123-variant124-variant125-filterInvariant126-variant127-variant128-filterInvariant129-variant130-filterInvariant131-variant132-variant134-variant135-variant136-filterInvariant137-variant138-filterInvariant139-variant140-variant141-filterInvariant142-variant143-variant144-variant145-variant146-variant147-variant148-variant149-variant150-variant151-variant152-variant153-variant154-variant155-variant156-variant157 GT:DP:GQ 3/1:61:99 0/3:57:99 0/3:78:99 0/3:88:99 3/1:106:99 3/1:115:99 3/1:105:99 
3/1:100:99 3/1:102:99 3/1:98:99 3/1:111:99 0/1:79:99 3/1:110:99 0/1:90:60 0/3:31:54 0/1:28:26 3/1:101:99 3/1:91:99 4/1:63:76 0/1:34:33 3/1:32:89 3/1:62:99 0/3:75:99 1/1:57:5 0/1:51:99 3/1:71:99 3/1:64:99 3/1:58:99 3/1:68:99 3/1:74:99 3/1:47:99 0/3:46:99 3/1:39:99 3/1:46:99 ./. 0/3:53:75 1/1:81:40 3/1:86:99 0/1:60:99 ./. 0/3:66:99 0/3:65:99 3/1:58:99 0/1:80:1 0/1:89:8 0/3:91:99 3/1:112:99 1/1:76:19 0/3:88:99 0/3:86:99 3/1:86:99 3/1:100:99 0/1:51:99 3/1:60:99 3/1:56:99 0/3:58:99 0/1:65:99 3/1:70:99 3/1:54:86 3/1:56:99 0/3:45:99 3/1:60:99 3/1:55:99 0/1:50:99 0/1:71:99 4/1:69:86 0/3:71:99 0/3:76:99 0/3:56:99 0/3:72:99 3/1:59:99 0/3:79:99 0/4:66:99 3/1:71:99 3/1:58:99 0/3:64:99 2/1:70:99 3/1:53:79 3/1:69:99 2/1:70:99 ./. 1/1:62:74 ./. 0/3:66:99 0/3:53:99 0/3:86:99 3/1:65:99 3/1:70:99 0/1:75:99 0/3:40:99 0/1:79:99 3/1:66:83 3/1:73:99 3/1:72:99 4/1:66:99 3/1:59:96 0/3:70:99 2/1:63:90 0/3:66:99 4/3:38:99 3/1:73:99 3/1:55:99 3/1:71:99 3/1:52:99 ./. 2/1:53:99 3/1:56:99 0/3:52:99 0/3:74:99 0/3:80:99 0/1:77:99 0/1:59:21 0/3:67:99 0/3:60:99 3/1:80:99 0/3:75:99 1/1:53:69 3/1:74:87 0/1:57:99 3/1:57:99 0/1:78:44 0/3:52:99 0/1:54:99 ./. 0/1:61:99 0/1:38:99 ./. ./. 0/1:49:99 3/1:60:99 0/1:51:99 0/3:68:99 3/1:53:81 0/3:68:99 3/1:50:99 0/3:55:99 0/1:63:99 0/1:43:90 3/1:69:99 3/1:63:99 0/3:77:99 3/3:63:38 ./. 3/1:57:99 0/4:76:99 3/1:78:99 3/1:69:99 3/1:85:99 0/1:61:59 0/2:46:99 0/3:61:99 0/1:60:99 0/1:49:99 0/1:53:99 0/2:47:99 3/1:49:99 3/1:90:99

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @lindakjcao

    Hi Linda,

    We want to be able to replicate the error here so we can try to figure out what is going on.

    I see you have one particular record where an error occurs, and the error is in sample 1.

    If you can submit a snippet of sample 1 vcf around that record plus a few other sample vcfs around that record, we can replicate it here.

    I hope this makes sense.

    Thanks,
    Sheila

  • Hi Sheila,

    The errors are in many more than two samples; I only included two here. The records for sample 1 and sample 2 from their original VCFs are listed in the original post. Please take a look.

    Thanks,
    Linda
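Since the goal stated in the original post is a cohort frequency, here is a minimal sketch of how allele counts and frequencies can be tallied directly from the per-sample GT fields of a combined record. It assumes diploid or no-call genotypes written as in the record quoted above (`GT:DP:GQ`), and the function name `allele_counts` is hypothetical. The INFO values in that record are consistent with AF = AC / AN (for example, 109 / 296 ≈ 0.368).

```python
from collections import Counter


def allele_counts(sample_fields, n_alt):
    """Return (AC per ALT allele, AN, AF per ALT allele) from sample columns.

    sample_fields: strings such as "3/1:61:99" or "./." (GT is the first field).
    n_alt: number of ALT alleles in the record.
    """
    counts = Counter()
    for field in sample_fields:
        gt = field.split(":")[0]
        for idx in gt.replace("|", "/").split("/"):
            if idx != ".":                      # skip no-calls
                counts[int(idx)] += 1
    an = sum(counts.values())                   # total called alleles, REF included
    ac = [counts[i] for i in range(1, n_alt + 1)]
    af = [round(c / an, 3) if an else 0.0 for c in ac]
    return ac, an, af


# A few sample columns in the same format as the combined record above
ac, an, af = allele_counts(["3/1:61:99", "0/3:57:99", "./.", "1/1:57:5"], n_alt=4)
print(ac, an, af)
```

No-calls (`./.`) contribute nothing to AN, which is why AN in a merged record can be lower than twice the number of samples.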
