
Positions not being reported when using GENOTYPE_GIVEN_ALLELES

rcorbett2 (Vancouver, Member)

Hi All,

Here is my command:
java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R GRCh37-lite.fa --disable_read_filter DuplicateRead -I myBam.bam --genotyping_mode GENOTYPE_GIVEN_ALLELES -alleles all.gatk.sorted.vcf -o test.vcf -L all.gatk.sorted.vcf

This seems to work well, but there are a few variants in all.gatk.sorted.vcf whose positions don't show up in test.vcf.

Looking at the BAM file, I can see that there is coverage at the positions of interest. I can also confirm that many positions in my list where the reads show no variant are reported correctly, so I don't think the problem is with reading any of the files or with reporting negative (non-variant) sites.
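To pin down exactly which sites are being dropped, it can help to diff the two VCFs by position. The Python below is a minimal sketch, not a GATK tool: it assumes plain (uncompressed) VCF files and compares only CHROM and POS, so a site counts as missing if test.vcf has no record at that position, whatever its REF/ALT. The function names are hypothetical.

```python
"""List sites in an alleles VCF that have no record at the same position
in an output VCF. Only (CHROM, POS) is compared; REF/ALT are ignored.
Assumes plain-text (not bgzipped) VCF input."""
import sys
from typing import Iterable, List, Set, Tuple

Site = Tuple[str, int]


def read_sites(lines: Iterable[str]) -> Set[Site]:
    """Collect (CHROM, POS) from VCF data lines, skipping header lines."""
    sites: Set[Site] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos = line.split("\t", 2)[:2]
        sites.add((chrom, int(pos)))
    return sites


def missing_sites(alleles_lines: Iterable[str],
                  output_lines: Iterable[str]) -> List[Site]:
    """Sites present in the alleles VCF but absent from the output VCF."""
    return sorted(read_sites(alleles_lines) - read_sites(output_lines))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python missing_sites.py all.gatk.sorted.vcf test.vcf
    with open(sys.argv[1]) as alleles, open(sys.argv[2]) as output:
        for chrom, pos in missing_sites(alleles, output):
            print(f"{chrom}\t{pos}")
```

Run against all.gatk.sorted.vcf and test.vcf, it would print one missing CHROM/POS pair per line, which can then be checked against the coverage seen in the BAM.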

I'm working with amplicon data, which have a very irregular coverage profile, and I am using version 3.4-46-gbc02625.

Has anyone seen this sort of problem, or can anyone suggest a solution?

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hi there,

    We've seen some suboptimal behavior from HC when it runs in GENOTYPE_GIVEN_ALLELES mode. As a workaround, I would recommend running it in its normal DISCOVERY mode, passing the VCF of positions you're interested in as the -L list of intervals, with some interval padding (e.g. -ip 50). Optionally, if you want the output to include all sites so that you have information on the true negatives, use -allSites.

  • rcorbett2 (Vancouver, Member)
    edited February 2016

    Thanks Geraldine,
    I am looking for information about negatives, so I tried -allSites, but it doesn't seem to be an option for me.
    I see -allSitesPLs; is that what you meant to recommend?

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    No, sorry, I was thinking of the GVCF workflow, where you first run HC in -ERC GVCF mode and then run GenotypeGVCFs, which is the tool that takes -allSites. You can use that workflow for your use case. It was originally developed to scale variant discovery to large numbers of samples, but it should work just fine on a single sample if that's what you're working with.

  • Sheila (Broad Institute; Member, Broadie ✭✭✭✭✭)

    @rcorbett2
    Hi,

    This thread will probably be of use as well.

    -Sheila
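The two workarounds suggested above (DISCOVERY mode restricted with -L plus -ip 50, or the GVCF workflow ending in GenotypeGVCFs with -allSites) can be sketched as GATK 3.x command lines. This Python helper only assembles argument lists and does not run GATK. It reuses the flags from this thread; -V for the GenotypeGVCFs input is an addition. The helper names and output file names are hypothetical, and every flag should be checked against the documentation for the GATK version in use.

```python
"""Assemble GATK 3.x command lines for the two workarounds in this thread.
Nothing is executed; pass a list to subprocess.run(cmd, check=True) to run it.
Assumption: GATK 3.x is invoked as `java -jar GenomeAnalysisTK.jar`."""
import shlex
from typing import List

GATK = ["java", "-jar", "GenomeAnalysisTK.jar"]


def discovery_cmd(ref: str, bam: str, sites_vcf: str, out_vcf: str,
                  padding: int = 50) -> List[str]:
    """HaplotypeCaller in its default DISCOVERY mode (no --genotyping_mode),
    restricted to the sites of interest with -L and padded with -ip."""
    return GATK + [
        "-T", "HaplotypeCaller",
        "-R", ref,
        "--disable_read_filter", "DuplicateRead",  # kept from the original command
        "-I", bam,
        "-L", sites_vcf,        # VCF of positions used as the interval list
        "-ip", str(padding),    # interval padding around each site, in bases
        "-o", out_vcf,
    ]


def gvcf_cmds(ref: str, bam: str, sites_vcf: str, gvcf: str, out_vcf: str,
              padding: int = 50) -> List[List[str]]:
    """Two-step GVCF workflow: HaplotypeCaller -ERC GVCF, then GenotypeGVCFs
    with -allSites so that non-variant sites are emitted too."""
    call = GATK + [
        "-T", "HaplotypeCaller",
        "-R", ref,
        "--disable_read_filter", "DuplicateRead",
        "-I", bam,
        "-L", sites_vcf,
        "-ip", str(padding),
        "-ERC", "GVCF",         # emit reference-confidence (GVCF) output
        "-o", gvcf,
    ]
    genotype = GATK + [
        "-T", "GenotypeGVCFs",
        "-R", ref,
        "-V", gvcf,             # GVCF produced by the first step
        "-allSites",            # also output non-variant sites
        "-o", out_vcf,
    ]
    return [call, genotype]


if __name__ == "__main__":
    # Hypothetical output names; inputs are the files from the original post.
    for cmd in gvcf_cmds("GRCh37-lite.fa", "myBam.bam", "all.gatk.sorted.vcf",
                         "sample.g.vcf", "sample.allsites.vcf"):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building argument lists rather than shell strings avoids quoting problems if file paths contain spaces.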
