We've moved!
This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!

CombineGVCFs - All alleles are missing

astrandastrand New YorkMember
edited January 2015 in Ask the GATK team


I am combining gcvf files into single gvcf files by chromosome, using CombineGVCFs, in order to run GenotypeGVCFs. When I checked the first gvcf file generated by CombineGVCFs, I noticed that at each position, all the alleles were missing.

For example, at position 16050036, this is what comes up in the final gvcf file:
22 16050036 . A C,<NON_REF> . . BaseQRankSum=-7.360e-01;ClippingRankSum=-7.360e-01;DP=4;MQ=27.00;MQ0=0;MQRankSum=-7.360e-01;ReadPosRankSum=0.736 GT:AD:DP:MIN_DP:PL:SB ./.:1,2,0:3:.:55,0,23,58,29,86:1,0,2,0 ./.:.:1:1:0,0,0,0,0,0 ./.:.:0:0:0,0,0,0,0,0

But if we just take one of the precursor gvcf files (one individual), we clearly see the genotype at that site:
22 16050036 . A C,<NON_REF> 26.80 . BaseQRankSum=-0.736;ClippingRankSum=-0.736;DP=3;MLEAC=1,0;MLEAF=0.500,0.00;MQ=27.00;MQ0=0;MQRankSum=-0.736;ReadPosRankSum=0.736 GT:AD:DP:GQ:PL:SB 0/1:1,2,0:3:23:55,0,23,58,29,86:1,0,2,0

The command I'm using to generate these files is:

java -Xmx1g -jar GenomeAnalysisTK.jar -T CombineGVCFs -R hs37d5.fa -V vcfs.chr${numchr}.new.list -o mergeGvcf_${numchr}.vcf -L ${numchr}
where numchr is a variable previously defined (indicating the chromosome number).

It seems that all the information is being taken into account except the actual genotypes. How do I solve this problem?


Best Answer


  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    You mean that in the combined GVCF, the genotypes are missing, right? The alleles seem to be there.

    Which version are you using?

  • astrandastrand New YorkMember

    Yes, that's what I meant. I am using the latest version (3.3).

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    We've seen a few cases recently where GenotypeGVCFs assigns no-calls (./.) instead of a proper genotype when coverage is extremely low. I think that's what's happening here. It may take some time before we can develop a fix for this as it's fairly low priority (those sites aren't really usable anyway). If all your sites are like this you should really check your data QC. But if it's just the first sites in your files it's not necessarily unexpected -- the coverage tends to be very low at the beginning of contigs (in WGS) or capture targets (in WEx).

  • astrandastrand New YorkMember

    Hi Geraldine,

    What do you mean by "data QC"? The example I used in my post is a test run that I did for 3 individuals on 1 chromosome. I was able to generate a perfectly normal final vcf file using GenotypeGVCFs. However, because I have a lot more files for which I have to run this analysis (and for which, ideally, I should use CombineGVCFs), I decided to test CombineGVCFs on those 3 individuals. That's when I saw the missing genotypes. They are missing from every single site that gets called. Like I said, using GenotypeGVCFs produces perfectly acceptable output, but not CombineGVCFs, so not sure what is going on there.

    Thanks anyways for the information,

  • astrandastrand New YorkMember

    OK, thanks Sheila. Where does the information regarding the genotypes get stored for GenotypeGVCFs to use? I'll run the Combined files through GenotypeGVCFs and let you know if I experience any problems.


  • SheilaSheila Broad InstituteMember, Broadie ✭✭✭✭✭

    Hi Alva,

    The information is in the other fields present in the CombineGVCFs file. GenotypeGVCFs uses the AD and PL fields for all the samples to determine a proper final genotype.


  • Kelly135Kelly135 KoreaMember

    Hi Sheila, I am experiencing the same issue and this post helped me. I understood that genotypes are missing after CombineGVCFs, and then they appear after GenotypeGVCF.

    But the final format I want to get is "combined gvcf" not "combined vcf". In this case, how should I create this file?

    I also tried to find the way of converting vcf to gvcf, but there was no such information as most people are interested in vcf, not gvcf.

    FYI, I am using the latest version (4.0.1) and my question is not related to a bug.

  • SheilaSheila Broad InstituteMember, Broadie ✭✭✭✭✭


    There is no way to get a final combined GVCF, as GVCFs are intermediate files not to be used in final analysis.

    Why do you need the final combined GVCF?


Sign In or Register to comment.