
HC missing SNPs

Good day,
HC does not detect a small number of SNPs that appear very clearly in the alignment. HC reported >4k variants, some with much more challenging alignments, so I just can't figure out why it misses some of these variants since the conditions look ideal for detection. In the sample NA12878, the C/T variant at chr15:66735551 has the following characteristics:
1. The ratio of C:T alleles is 55:45, and A, G, and N are all 0%.
2. Base qualities are high for both alleles: upper 20s to lower 30s.
3. Mapping qualities are high: ~70.
4. There are no nearby indels or unusual variants/sequencing errors.

Looking at the reassembled bam, the SNP is present almost completely unchanged, as seen in the lower panel in the screenshot.

[image: screenshot of the alignment; the lower panel shows the reassembled bam]

I am using v3.3 with default settings and I also tried --maxNumHaplotypesInPopulation 500 --kmerSize 25
Stuart

Answers

  • Sheila (Broad Institute; Member, Broadie)

    @sbaker
    Hi Stuart,

    Can you explain your workflow in more detail? Are you running on one sample or multiple samples? Does HaplotypeCaller in normal single-sample mode fail to call this, or does HaplotypeCaller in GVCF mode fail to report it?

    Also, can you post the vcf record at the site?

    Thanks,
    Sheila

  • sbaker (USA; Member)

    Hi Sheila,
    I'm running a single sample, following best practices except for duplicate removal, which I skip because it's amplicon data. The default mode does not report the variant. GVCF mode gives the following line:
    chr15 66735551 . C <NON_REF> . . END=66735551 GT:DP:GQ:MIN_DP:PL 0/0:262:0:262:0,0,561

    The resulting VCF from the following command is uploaded as hc-missing-snps.vcf.gz:

    java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ucsc.hg19.fasta -I ./NA12878.sort.bam -o ./NA12878.vcf --maxNumHaplotypesInPopulation 500 --kmerSize 25 --bamOutput out.bam --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 -L chr15:66735000-66736000

    Thanks again!
    Stuart

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi Stuart, the VCF record suggests that HC has zero confidence in the genotype: it can't choose between het and hom-ref. I can't tell from the info posted here why that would happen, though.
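
To make the "zero confidence" reading concrete, the sample fields from the GVCF line posted earlier in the thread can be decoded with a few lines of Python. This is only a minimal sketch: the helper names parse_sample and gq_from_pl are hypothetical, and the rule that GQ equals the gap between the two smallest PL values (capped at 99) is an assumption about how the value is usually derived, not something confirmed in this thread.

```python
# Sketch: decode the FORMAT/sample columns of the posted GVCF record and
# show which genotypes are tied at the best (lowest) PL.

def parse_sample(format_keys, sample_values):
    """Pair the colon-separated FORMAT keys with the sample values."""
    return dict(zip(format_keys.split(":"), sample_values.split(":")))

def gq_from_pl(pl, cap=99):
    """Assumed rule: GQ = second-smallest PL minus smallest PL, capped."""
    ordered = sorted(pl)
    return min(ordered[1] - ordered[0], cap)

# PL order for a site with one alternate allele.
GENOTYPES = ["hom-ref", "het", "hom-alt"]

# FORMAT and sample columns copied from the record in the thread.
sample = parse_sample("GT:DP:GQ:MIN_DP:PL", "0/0:262:0:262:0,0,561")
pl = [int(value) for value in sample["PL"].split(",")]

best = min(pl)
tied = [g for g, p in zip(GENOTYPES, pl) if p == best]

print("Depth:", sample["DP"])
print("Genotypes tied at the best PL:", tied)
print("GQ implied by the PLs:", gq_from_pl(pl))
```

Under those assumptions, hom-ref and het both sit at PL 0 while hom-alt is at 561, so the gap between the two best genotypes is 0. That matches the GQ of 0 in the record despite a depth of 262.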
