GenotypeGVCFs

I was trying to combine the per-sample VCF files for all my samples into one single VCF output using the command below:
java -d64 -Xmx48g -jar ${GATK}/GenomeAnalysisTK.jar \
-R ${REF} \
-T GenotypeGVCFs \
--variant A.g.vcf \
--variant B.g.vcf \
--variant C.g.vcf \
-stand_emit_conf 30 \
-stand_call_conf 30 \
-o genotype.vcf
but I got this error message:
“The following invalid GT allele index was encountered in the file: END=21994810”. I have tried to locate where the problem is coming from, but I do not understand the error. Could you please advise?
Best Answers
-
tommycarstensen United Kingdom ✭✭✭
It sounds to me as if one of your files (A.g.vcf, B.g.vcf, C.g.vcf) is corrupted. Can you try to grep for END=21994810 and post the line? Perhaps you ran out of disk space?
-
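For anyone following along, here is a minimal sketch of the search suggested in the reply above. The function name `find_end_record` is hypothetical and only wraps a plain `grep`; the file names in the example call are the ones from the original command.

```shell
#!/bin/sh
# Hypothetical helper: search one or more GVCF files for the END value
# quoted in the error message.
#   -n  print the line number of each match
#   -H  print the file name, even when only one file is given
#   -w  match END=<value> as a whole word, so END=21994810 does not
#       also match a longer number such as END=219948100
find_end_record() {
    end_value="$1"
    shift
    grep -nHw -- "END=${end_value}" "$@"
}

# Example call (uncomment to run against the files from the question):
# find_end_record 21994810 A.g.vcf B.g.vcf C.g.vcf
```

Each match is printed as `file:line:record`, which shows both which GVCF is affected and where in that file to look.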
Geraldine_VdAuwera Cambridge, MA admin
Ooh, that A.g.vcf file is broken alright. If you want to save time you could rerun on just a small region and fix up the records manually, but the safest thing to do is rerun the whole job, in case there are other issues that aren't immediately apparent.
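One rough way to look for other damaged records before deciding whether to rerun is to count columns. This is only a sketch: `check_gvcf_columns` is a hypothetical helper, and it assumes a tab-delimited, single-sample GVCF, which should have 10 columns per record (8 fixed fields, FORMAT, and one sample column). It will not catch every kind of corruption, so a clean result here does not replace rerunning the job.

```shell
#!/bin/sh
# Hypothetical helper: report data lines whose tab-separated column count
# differs from the expected number.
#   $1 = expected number of columns (10 for a single-sample GVCF)
#   $2 = path to the GVCF file
check_gvcf_columns() {
    expected="$1"
    file="$2"
    awk -F'\t' -v n="$expected" '
        /^#/ { next }    # skip header lines
        NF != n {        # flag records with too many or too few columns
            printf "%s:%d: %d columns\n", FILENAME, NR, NF
            bad = 1
        }
        END { exit bad + 0 }    # non-zero exit status if anything was flagged
    ' "$file"
}

# Example call (uncomment to run against the file from the question):
# check_gvcf_columns 10 A.g.vcf
```

A record where two lines were fused together would show up with too many columns, and a record truncated when the disk filled up would show up with too few.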
-
Sheila Broad Institute admin
@lawal
Hi,
So, the line you posted is from the re-generated GVCF? The issue is that instead of the GT field, there is an END position.
Did you restart the Haplotype Caller from the beginning when you ran out of disk space? Can you confirm you are using the latest version of GATK? You may just need to run Haplotype Caller on sample A again to get a clean GVCF.
Thanks,
Sheila
Answers
Thank you Tommy. I found this in A.g.vcf only. I remember that I ran out of disk space partway through the run; I freed up more space later and re-generated A.g.vcf.
1 21991582 . T . . END=21991582 GT:. END=21994810 GT:DP:GQ:MIN_DP:PL 0/0:36:96:35:0,96,1440
@Sheila yes, I am using the latest GATK version and I did restart the Haplotype Caller from the beginning. @Geraldine_VdAuwera, thank you; I will redo the job as advised to get a clean GVCF.