If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!
Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
Error in HaplotypeCaller with GENOTYPE_GIVEN_ALLELES
I've got a few different VCFs from various experiments and sources, and am trying to get population level information for a gene of interest. I merged the vcfs with vcftools and ran into the un-called hom-refs problem, for example among a small family study most alleles are hom-ref and I get no genotype calls. After merging , the cohorts are almost mutually exclusive regarding which alleles are seen. I would need to impute uncalled SNPs but with the read data available it makes more sense to call reference genotypes at given locations. So I'll use --genotyping_mode=GENOTYPE_GIVEN_ALLELES and give the merged VCF as a reference for the genomic positions I want called for everyone. This has worked for UnifiedGenotyper last year, possibly with a different alleles format.
I cropped the alleles VCF to only contain one individual, at 600 loci. For many sites any one individual has a no-call genotype, and I find the HaplotypeCaller is trying to parse the VCF and chokes on a no-call site. Log and error are attached. Java chokes trying to parseDouble for a period character.
If I clip off the genotypes columns (only interested in sites) with cut -f1-5, this retains the chr,pos,rs,ref,alt and I think it should work, but the GATK refuses to read it as an invalid VCF
ERROR MESSAGE: Your input file has a malformed header: there are not enough columns present in the header line:
I don't think the HC should need an individual's genotype quality for the GIVEN_ALLELES option, this should just be specifying loci, right? That's almost more worrying that it wants to parse the individuals from the GIVEN_ALLELES file, but I suspect that's just code-reuse and a default behavior for any VCF reader, since the error came out of org.broadinstitute.variant.vcf.AbstractVCFCodec$LazyVCFGenotypesParser.parse(AbstractVCFCodec.java:113)
The error comes when it gets to this no-call line, with ./. filled in by vcftools merge operation.
14 23857606 rs115570443 G C 100 PASS . GT:GL:DS 0|0:0.00,-5.00,-5.00:0.000 14 23857626 . C A 51.84 . . GT:GQ:DP:PL:AD .:.:.:.:.
For the moment I'm going to try to edit the specified alleles VCF to put mock genotypes, and hope it doesn't impact the results.