This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!
Error in HaplotypeCaller with GENOTYPE_GIVEN_ALLELES
I've got a few different VCFs from various experiments and sources, and am trying to get population level information for a gene of interest. I merged the vcfs with vcftools and ran into the un-called hom-refs problem, for example among a small family study most alleles are hom-ref and I get no genotype calls. After merging , the cohorts are almost mutually exclusive regarding which alleles are seen. I would need to impute uncalled SNPs but with the read data available it makes more sense to call reference genotypes at given locations. So I'll use --genotyping_mode=GENOTYPE_GIVEN_ALLELES and give the merged VCF as a reference for the genomic positions I want called for everyone. This has worked for UnifiedGenotyper last year, possibly with a different alleles format.
I cropped the alleles VCF to only contain one individual, at 600 loci. For many sites any one individual has a no-call genotype, and I find the HaplotypeCaller is trying to parse the VCF and chokes on a no-call site. Log and error are attached. Java chokes trying to parseDouble for a period character.
If I clip off the genotypes columns (only interested in sites) with cut -f1-5, this retains the chr,pos,rs,ref,alt and I think it should work, but the GATK refuses to read it as an invalid VCF
ERROR MESSAGE: Your input file has a malformed header: there are not enough columns present in the header line:
I don't think the HC should need an individual's genotype quality for the GIVEN_ALLELES option, this should just be specifying loci, right? That's almost more worrying that it wants to parse the individuals from the GIVEN_ALLELES file, but I suspect that's just code-reuse and a default behavior for any VCF reader, since the error came out of org.broadinstitute.variant.vcf.AbstractVCFCodec$LazyVCFGenotypesParser.parse(AbstractVCFCodec.java:113)
The error comes when it gets to this no-call line, with ./. filled in by vcftools merge operation.
14 23857606 rs115570443 G C 100 PASS . GT:GL:DS 0|0:0.00,-5.00,-5.00:0.000 14 23857626 . C A 51.84 . . GT:GQ:DP:PL:AD .:.:.:.:.
For the moment I'm going to try to edit the specified alleles VCF to put mock genotypes, and hope it doesn't impact the results.