Illumina gVCFs

pulit Member
edited February 2015 in Ask the GATK team

Hi GATK-ers,

I have been given ~2000 gVCFs generated by Illumina (one sample per gVCF). Though they are in standard gVCF format, they were generated by an Illumina pipeline and not the HaplotypeCaller. As a result (I think ...), the GATK doesn't want to process them (I have tried CombineGVCFs and GenotypeGVCFs to no avail). Is there a GATK walker or some other tool that will make my gVCFs GATK-friendly? I need to be able to merge this data together to make it analyzable, because in single-sample VCF format it's pretty useless at the moment.

My only other thought has been to expand all the ref blocks of data and then merge everything together, but this seems like it will result in the creation of a massive amount of data.
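To get a feel for how much data expanding the reference blocks would create, here is a minimal Python sketch. It assumes each block is a single tab-separated VCF record whose INFO column carries an `END=` tag; the function name `expand_ref_block` and the sample record are made up for illustration and are not taken from any real file.

```python
def expand_ref_block(record_line):
    """Yield one (chrom, pos) pair for every base covered by a gVCF record.

    Assumes the record is tab-separated and that a reference block stores
    its last covered position in an INFO tag of the form END=<int>.
    Records without END= are treated as covering a single position.
    """
    fields = record_line.rstrip("\n").split("\t")
    chrom, pos, info = fields[0], int(fields[1]), fields[7]
    end = pos
    for entry in info.split(";"):
        if entry.startswith("END="):
            end = int(entry[4:])
    for p in range(pos, end + 1):
        yield chrom, p


# Hypothetical reference block covering chr1:10001-10500.
block = "chr1\t10001\t.\tT\t.\t.\tPASS\tEND=10500\tGT:DP\t0/0:30"
print(sum(1 for _ in expand_ref_block(block)))  # prints 500
```

One block line becomes 500 per-site records here, so across ~2000 whole-genome samples the expanded output would indeed be very large.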

Any suggestions you may have are greatly appreciated!!!



  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    The problem is that there is no such thing as a standard GVCF format. There is what Illumina calls GVCF, which is really just an all-sites VCF, and there is HC's GVCF format, which contains additional information that is necessary for joint genotyping. There is no way to convert Illumina files to HC GVCF, so the only good way to proceed is to re-call variants from the BAMs with HC. Sorry for the bad news...

  • pulit Member

    Not your fault! Would be nice if Illumina had more user-friendly gVCFs ... But thanks for the help.

  • TechnicalVault Cambridge, UK Member ✭✭✭

    The GA4GH File Formats group has been trying to standardise some aspects of GVCF to the extent this is possible. Unfortunately it's tricky to do without some permanent commitment to annotation definitions and how to combine them. This is hard because many of those annotations are by necessity rather specific to the error model of the generating tool and to the specific version of that tool that they come from.
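
A quick way to see which flavour of GVCF a file is, as distinguished in the replies above, is to look at its header. The sketch below is a rough heuristic only: it assumes that HaplotypeCaller GVCFs declare the symbolic `NON_REF` allele in a `##ALT` meta-information line, and it says nothing about whether the other annotations needed for joint genotyping are present. The function name `declares_non_ref` and both example headers are hypothetical.

```python
def declares_non_ref(lines):
    """Return True if the VCF meta-information declares a NON_REF ALT allele.

    `lines` is any iterable of header lines (an open file handle works).
    Scanning stops at the first line that does not start with '##',
    i.e. at the #CHROM column header or the first record.
    """
    for line in lines:
        if not line.startswith("##"):
            break
        if line.startswith("##ALT=<ID=NON_REF"):
            return True
    return False


# Made-up headers for illustration only.
hc_like = [
    "##fileformat=VCFv4.1",
    '##ALT=<ID=NON_REF,Description="Any possible alternative allele">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
]
all_sites = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
]
print(declares_non_ref(hc_like))    # prints True
print(declares_non_ref(all_sites))  # prints False
```

For a file on disk, an open text handle can be passed directly as `lines`. A `False` result suggests the file is an all-sites style VCF that CombineGVCFs and GenotypeGVCFs are unlikely to accept; a `True` result is not a guarantee that they will.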
