Positions not being reported when using GENOTYPE_GIVEN_ALLELES

rcorbett2 (Vancouver, Member)

Hi All,

Here is my command:
java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R GRCh37-lite.fa --disable_read_filter DuplicateRead -I myBam.bam --genotyping_mode GENOTYPE_GIVEN_ALLELES -alleles all.gatk.sorted.vcf -o test.vcf -L all.gatk.sorted.vcf

This seems to work well, but there are a few variants in all.gatk.sorted.vcf whose positions don't show up in test.vcf.

I can look in the bam file and see that there is coverage at the positions of interest. I can also confirm that many positions in my list where the reads show no variant are reported correctly, so I don't think the issue is that any of the files are being read incorrectly, or that negatives aren't being reported.

I'm looking at amplicon data, which have a very irregular coverage profile, and I am using version 3.4-46-gbc02625.

Has anyone seen this sort of problem, or can anyone suggest a solution?

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)

    Hi there,

    We've seen some suboptimal behavior when HC is run in GENOTYPE_GIVEN_ALLELES mode. As a workaround, I would recommend running it in its normal DISCOVERY mode, but passing the VCF of positions you're interested in as the -L interval list, with some interval padding (e.g. -ip 50). Optionally, if you want all sites in the output to carry information on the true negatives, use -allSites.

  • rcorbett2 (Vancouver, Member)
    edited February 2016

    Thanks Geraldine,
    I am looking for information about negatives, so I tried -allSites.
    It doesn't seem to be an option for me. I do see -allSitesPLs; is that what you meant to recommend?

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)

    No, sorry, I was thinking of the GVCF workflow, where you first run HC in -ERC GVCF mode and then run GenotypeGVCFs, which is the tool that takes -allSites. You can use that workflow for your use case. It was originally developed as a way to scale the variant discovery process to large numbers of samples, but it should work just fine on a single sample if that's what you're working with.

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @rcorbett2
    Hi,

    This thread will probably be of use as well.

    -Sheila
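For reference, here is a minimal sketch of the first workaround suggested in this thread: HaplotypeCaller in its default DISCOVERY mode, with the sites VCF passed as -L intervals and 50 bp of interval padding. It is written as a bash script in GATK 3.x syntax and reuses the file names from the original command. The output name test.discovery.vcf and the RUN=1 switch are hypothetical, and the script only prints the command unless RUN=1 is set. -allSites is deliberately left out because, as clarified later in the thread, it is a GenotypeGVCFs argument rather than a HaplotypeCaller one. On its own, this mode reports variant sites, so it will not give you the negatives.

```shell
#!/usr/bin/env bash
# Sketch: HaplotypeCaller in default DISCOVERY mode, restricted to the listed sites.
# Prints the command; set RUN=1 to execute it (GATK 3.x syntax).
set -euo pipefail

GATK_JAR="GenomeAnalysisTK.jar"
REF="GRCh37-lite.fa"
BAM="myBam.bam"
SITES_VCF="all.gatk.sorted.vcf"   # sites of interest, used only as intervals here
OUT_VCF="test.discovery.vcf"      # hypothetical output file name

# No --genotyping_mode / -alleles, so HaplotypeCaller runs in DISCOVERY mode.
# -ip 50 pads each interval by 50 bp on either side.
hc_discovery_cmd=(
  java -jar "$GATK_JAR" -T HaplotypeCaller
  -R "$REF"
  -I "$BAM"
  --disable_read_filter DuplicateRead
  -L "$SITES_VCF"
  -ip 50
  -o "$OUT_VCF"
)

# Show the exact command line that would be run.
printf '%s ' "${hc_discovery_cmd[@]}"; echo

if [[ "${RUN:-0}" == 1 ]]; then
  "${hc_discovery_cmd[@]}"
fi
```

The padding gives HaplotypeCaller some flanking sequence around each site for its local reassembly; 50 bp is simply the example value from the thread.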
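A similar sketch of the single-sample GVCF workflow described in this thread: HaplotypeCaller in -ERC GVCF mode, then GenotypeGVCFs with -allSites so that non-variant sites (the negatives) are also emitted. Several details are assumptions of this sketch and not part of the original advice. The intermediate and output file names are hypothetical. Restricting both steps with -L all.gatk.sorted.vcf, and padding only the first step, is a choice made here. The explicit -variant_index_type LINEAR -variant_index_parameter 128000 arguments are the index settings GATK 3 documentation gives for GVCF output; some 3.x versions set them automatically when the output ends in .g.vcf.

```shell
#!/usr/bin/env bash
# Sketch of the single-sample GVCF workflow (GATK 3.x syntax).
# Prints both commands; set RUN=1 to execute them in order.
set -euo pipefail

GATK_JAR="GenomeAnalysisTK.jar"
REF="GRCh37-lite.fa"
BAM="myBam.bam"
SITES_VCF="all.gatk.sorted.vcf"
GVCF="myBam.g.vcf"              # hypothetical intermediate file name
OUT_VCF="myBam.allsites.vcf"    # hypothetical final output name

# Step 1: HaplotypeCaller in reference-confidence (GVCF) mode around the sites of interest.
hc_gvcf_cmd=(
  java -jar "$GATK_JAR" -T HaplotypeCaller
  -R "$REF" -I "$BAM"
  --disable_read_filter DuplicateRead
  -L "$SITES_VCF" -ip 50
  -ERC GVCF
  -variant_index_type LINEAR -variant_index_parameter 128000
  -o "$GVCF"
)

# Step 2: GenotypeGVCFs; -allSites also emits sites that are not variant.
genotype_cmd=(
  java -jar "$GATK_JAR" -T GenotypeGVCFs
  -R "$REF"
  -V "$GVCF"
  -L "$SITES_VCF"
  -allSites
  -o "$OUT_VCF"
)

printf '%s ' "${hc_gvcf_cmd[@]}"; echo
printf '%s ' "${genotype_cmd[@]}"; echo

if [[ "${RUN:-0}" == 1 ]]; then
  "${hc_gvcf_cmd[@]}"     # must finish before step 2 reads the GVCF
  "${genotype_cmd[@]}"
fi
```

Running the two steps separately means the GVCF from step 1 can be kept and re-genotyped later, for example together with other samples, which is what the workflow was originally designed for.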
