HaplotypeCaller - Genotype Given Allele Mode

tinu Member
edited April 2015 in Ask the GATK team

Hi Team,

I want to call variants for some samples using the Genotype Given Alleles (GGA) mode of HaplotypeCaller. I have used UnifiedGenotyper in GGA mode before.

I am not sure about the right approach to do GGA mode in HaplotypeCaller.

Do you make GVCFs for all BAMs using HaplotypeCaller in GGA mode and then use the GenotypeGVCFs walker to make a VCF from these GVCFs? Is that the way to do it?

Or, since the GVCFs contain all the sites from the BAMs, can GenotypeGVCFs do the variant calling in GGA mode? I know that GenotypeGVCFs doesn't currently have a GGA option, but I am just checking what the right approach is.

Answers

  • Sheila Broad Institute Member, Broadie admin

    @tinu
    Hi,

    It doesn't look like GGA mode is available in GVCF mode in HaplotypeCaller, so you cannot run HaplotypeCaller with -ERC GVCF and -gt_mode GENOTYPE_GIVEN_ALLELES together. And you are correct that GenotypeGVCFs does not have a GGA mode either. I will check if there is any plan to make this available.

    How many samples do you have? The best way to use GGA mode in HaplotypeCaller right now is to run it in normal (non-GVCF) mode with all of your sample BAM files as input and GGA mode turned on.

    -Sheila
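The suggestion above (one HaplotypeCaller run over all BAMs, in GGA mode, without -ERC GVCF) can be sketched as a small command builder. This is only an illustration under assumptions: it targets GATK 3.x argument names, the helper name `haplotypecaller_gga_args`, the jar name, and all file names are hypothetical placeholders, and no interval (-L) or padding options are included because the right choice depends on your data.

```python
from typing import List


def haplotypecaller_gga_args(reference: str, bams: List[str],
                             alleles_vcf: str, output_vcf: str) -> List[str]:
    """Build a GATK 3.x HaplotypeCaller command line for GGA mode.

    All samples are passed to a single run (no -ERC GVCF), since GGA mode
    and GVCF mode cannot be combined according to the reply above.
    """
    # Jar name is a placeholder; point it at your own GATK 3.x jar.
    args = ["java", "-jar", "GenomeAnalysisTK.jar",
            "-T", "HaplotypeCaller", "-R", reference]
    for bam in bams:
        args += ["-I", bam]  # one -I per sample BAM
    args += ["--genotyping_mode", "GENOTYPE_GIVEN_ALLELES",
             "--alleles", alleles_vcf,  # VCF holding the alleles to genotype
             "-o", output_vcf]
    return args


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    print(" ".join(haplotypecaller_gga_args(
        "ref.fasta", ["sample1.bam", "sample2.bam"], "sites.vcf", "gga.vcf")))
```

Building the command as a list rather than a single string makes it easy to hand to `subprocess.run` without shell quoting problems.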

  • pdexheimer Member ✭✭✭✭

    I actually just encountered this myself. I haven't run this yet, but my plan is to run GenotypeGVCFs with -allSites -L sites.vcf. This won't be exactly the same thing, as it won't restrict the analysis to only the alleles of interest, but I think it will give me hom-ref calls for the sites I'm interested in.

  • tommycarstensen United Kingdom Member ✭✭✭
    edited April 2015

    @pdexheimer Did -allSites -L sites.vcf work for you, i.e. did it print all sites? Did you receive any -maxAltAlleles warnings? Did you combine it with -isr INTERSECTION? I am asking because I had issues with HC in GGA mode as suggested by @Sheila above: upon testing it on a larger fragment, I discovered that adding -ip 100, as suggested in another thread, does not work the way I thought it did when I authored this thread. I would really like to avoid having to use UG, because it doesn't do local realignment of the reads, so I would in a sense be calling from a different dataset. I would like to avoid HC -bamout as well.

  • pdexheimer Member ✭✭✭✭
    edited April 2015

    @tommycarstensen

    Short answer: Yes, yes, no.

    Long answer: You know how things sometimes go wrong when you do things a little differently than normal? Rather than hacking my Variant Caller queue script into unrecognizable oblivion, I decided to just run this one command directly. Naturally, I didn't give it enough memory and it died about an hour after I left on Friday night, so I'm rerunning it now. However, it does seem to be working as intended: in spot checks, I can see it outputting calls for the positions in sites.vcf, and it doesn't appear to have any off-target calls. The big caveat here is that sites.vcf only contains SNPs. I'm not expecting indels and will be ignoring them if I get any (this is to calculate concordance with a microarray).

    I do still get maxAltAlleles warnings, though that's not a huge surprise given the number of gVCFs I'm using. I did not change the isr, because I want exactly the sites in my VCF, nothing more or less. For the same reason, I'm not using -ip.

    There are a lot of caveats with my approach, and it is certainly not the same as a GGA mode for GenotypeGVCFs. But for my purposes right now, it's close enough
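For reference, the workaround described in this post (GenotypeGVCFs restricted to the sites of interest, emitting non-variant sites too) might look like the sketch below. It assumes GATK 3.x argument names. The helper name `genotypegvcfs_all_sites_args`, the jar name, and the file names are hypothetical placeholders. As the post says, this is not a true GGA mode: it limits where calls are emitted, not which alleles are genotyped.

```python
from typing import List


def genotypegvcfs_all_sites_args(reference: str, gvcfs: List[str],
                                 sites_vcf: str, output_vcf: str) -> List[str]:
    """Build a GATK 3.x GenotypeGVCFs command line limited to given sites."""
    # Jar name is a placeholder; point it at your own GATK 3.x jar.
    args = ["java", "-jar", "GenomeAnalysisTK.jar",
            "-T", "GenotypeGVCFs", "-R", reference]
    for gvcf in gvcfs:
        args += ["-V", gvcf]  # one -V per input gVCF
    args += ["-allSites",       # also emit non-variant (hom-ref) sites
             "-L", sites_vcf,   # restrict output to positions in this VCF
             "-o", output_vcf]
    return args


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    print(" ".join(genotypegvcfs_all_sites_args(
        "ref.fasta", ["s1.g.vcf", "s2.g.vcf"], "sites.vcf", "at_sites.vcf")))
```

No -ip or -isr arguments are added here, matching the choice described above of wanting exactly the sites in the VCF.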

  • heskett Portland, Oregon, USA Member

    Hi there,

    I'm using GGA mode and passing 'dbsnp_138.hg19.vcf' from the GATK bundle as the argument for the --alleles parameter. However, I get many warnings of this type:

    WARN 12:10:51,429 GenotypingGivenAllelesUtils - Multiple valid VCF records detected in the alleles input file at site chr1:763769, only considering the first record

    Shortly after the warnings appear I get a GATK run-time error.

    ERROR stack trace

    java.lang.IllegalArgumentException: Illegal initial capacity: -25

    Am I doing it wrong?

    Thanks!

  • Sheila Broad Institute Member, Broadie admin

    @heskett
    Hi,

    Can you please post the exact command you ran and the version of GATK you are using?

    Thanks,
    Sheila

  • heskett Portland, Oregon, USA Member
    edited October 2016

    Here it is:

    INFO 11:59:45,317 HelpFormatter - Program Args: -T HaplotypeCaller -R ucsc.hg19.fasta -I sample.bam --genotyping_mode GENOTYPE_GIVEN_ALLELES --allele /dbsnp_138.hg19.HTSlib.vcf.gz -o output.vcf -stand_call_conf 10.0 -stand_emit_conf 30.0 -dontUseSoftClippedBases

    The warning message is:

    WARN 12:01:11,708 GenotypingGivenAllelesUtils - Multiple valid VCF records detected in the alleles input file at site chr1:842918, only considering the first record

    which eventually leads to a run-time error.

    I'm using v3.5-0-g36282e4, Compiled 2015/11/25 04:03:56

    Thanks for your help!

  • Sheila Broad Institute Member, Broadie admin

    @heskett
    Hi,

    Can you run ValidateVariants on your input GGA VCF?

    Thanks,
    Sheila

  • heskett Portland, Oregon, USA Member

    @Sheila

    Here is the type of INFO message I get when I run ValidateVariants on my GGA VCF. I am using the GGA VCF provided in the GATK bundle after compressing it with bgzip.

    INFO 13:09:57,310 ValidateVariants - Reference allele is too long (123) at position chr1:74755167; skipping that record. Set --referenceWindowStop >= 123

    If I don't compress your dbSNP VCF with bgzip before using it with GATK, I get the "could not acquire shared lock on index file" error.

    Thanks for the help.

  • Sheila Broad Institute Member, Broadie admin
    edited October 2016

    @heskett
    Hi,

    Oh, what about using the dbSNP VCF straight from the bundle that is zipped already? We have had reports of people having issues when bgzipping files themselves.

    As for the last error about shared locks on index files, can you try adding --disable_auto_index_creation_and_locking_when_reading_rods to your command line?

    Thanks,
    Sheila

    P.S. If none of those work, have a look at this thread where I gave a workaround for using GGA. Although since you are not using GVCF mode, GGA mode should work.
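Since the discussion here turns on how the VCF was compressed, one quick sanity check is whether a .gz file is block-gzipped (BGZF, which is what bgzip writes) or plain gzip. The sketch below inspects the gzip header for the BGZF "BC" extra subfield. The function names `is_bgzf_header` and `is_bgzf_file` are made up for this example, and the check says nothing about whether the index next to the file is valid or whether GATK will accept it.

```python
import struct

GZIP_MAGIC = b"\x1f\x8b"
FLAG_FEXTRA = 0x04  # gzip header flag: an "extra" field is present


def is_bgzf_header(data: bytes) -> bool:
    """Return True if `data` starts with a BGZF block header.

    BGZF blocks are gzip members whose extra field carries a subfield
    with identifiers 'B', 'C' and a 2-byte payload (the block size).
    """
    if len(data) < 12 or data[:2] != GZIP_MAGIC or data[2] != 8:
        return False  # not gzip, or not deflate-compressed
    if not data[3] & FLAG_FEXTRA:
        return False  # plain gzip without an extra field
    (xlen,) = struct.unpack_from("<H", data, 10)
    extra = data[12:12 + xlen]
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack_from("<BBH", extra, pos)
        if si1 == ord("B") and si2 == ord("C") and slen == 2:
            return True
        pos += 4 + slen  # skip to the next subfield
    return False


def is_bgzf_file(path: str) -> bool:
    """Check the first block of a file on disk."""
    with open(path, "rb") as handle:
        # 1 KiB is ample for the usual 18-byte BGZF header.
        return is_bgzf_header(handle.read(1024))
```

A file for which `is_bgzf_file` returns False is ordinary gzip (or not gzip at all), which is one possible reason a tool expecting a block-compressed, indexed file might complain.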

  • heskett Portland, Oregon, USA Member

    When I use the dbSNP VCF straight from the bundle, HC can't find the index file even though it is in the same directory.

    ERROR MESSAGE: An index is required, but none found., for input source: .../refs/dbsnp_138.hg19.vcf.gz

    ;(

  • heskett Portland, Oregon, USA Member

    It looks like GATK can't handle two dbSNP variant records at the same position?

    If this is true how do you get around this?
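To see how widespread the "Multiple valid VCF records detected" situation is in an alleles file, one option is to count records per site before running the caller. The sketch below is a minimal, standalone check and not part of GATK. The function name `duplicated_sites` and the example records are invented for illustration. The thread does not establish whether these duplicate sites are what triggers the run-time error, so treat this as a diagnostic only.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def duplicated_sites(vcf_lines: Iterable[str]) -> Dict[Tuple[str, int], int]:
    """Map (CHROM, POS) to record count for sites with more than one record."""
    counts: Counter = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        chrom, pos = line.split("\t", 2)[:2]
        counts[(chrom, int(pos))] += 1
    return {site: n for site, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Invented records, for illustration only (not real dbSNP entries).
    example = [
        "##fileformat=VCFv4.1",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t100\tidA\tG\tA\t.\t.\t.",
        "chr1\t100\tidB\tG\tGA\t.\t.\t.",
        "chr1\t200\tidC\tC\tT\t.\t.\t.",
    ]
    for (chrom, pos), n in sorted(duplicated_sites(example).items()):
        print(f"{chrom}:{pos} has {n} records")
```

For a real file, the lines can come from `open(path)` for plain text or `gzip.open(path, "rt")` for a compressed VCF.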

  • Sheila Broad Institute Member, Broadie admin

    @heskett
    Hi,

    Did you manage to get the bundle file to work? If not, can you try unzipping it? Why do you say GATK cannot handle two dbSNP records at the same position? Are you getting an error message? If so, can you post it?

    Thanks,
    Sheila

  • Sheila Broad Institute Member, Broadie admin

    @heskett
    Hi again!

    Okay, I think Geraldine has helped you here.

    -Sheila
