GenotypeGVCFs - Sites missing

Hi,

I'm trying to call genotypes at a set of given positions from ~5,000 samples. To do so, I first generate gvcf files using HaplotypeCaller for the region around a gene for all samples separately with HaplotypeCaller. Then I'm merging the single gvcf files into batches of 1000 and then I'm generating the genotypes at ~180 sites using GenotypeGVCFs and the flag --includeNonVariantSites. Everything runs without errors, but in the end I noticed that ~20 sites were missing from the final VCF file. I then looked into these sites and the single files in detail and found out that there are simply no genotypes called for some samples at some positions.

I attached a small example. When I'm running the following command, no variant is in the output file:
java -jar GenomeAnalysisTK.jar \ -R hg19.fa \ -L region.bed --includeNonVariantSites \ -T GenotypeGVCFs \ --variant wrong.gvcf --variant correct.gvcf \ -o out.vcf

The same command works when running it with only "correct.gvcf" as an input, but obviously not when running it with only "wrong.gvcf". I noticed that wrong.gvcf might show a variant at this exact position.

Best,
Thomas

Tagged:

Best Answer

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Hi Thomas,

    Can you tell us exactly which position you're looking at, and what the desired result looks like in the final file?

  • thomas_wthomas_w MunichMember

    Hi Geraldine,

    here is what I do in detail in this example (in reality I'm doing this with 5,000 instead of 2 samples):

    I want to get the genotype at position chr12:64875771 for the two samples CORRECT and WRONG. When I'm running Haplotype Caller directly at the BAM files I get these results:
    java -jar GenomeAnalysisTK.jar \ -R hg19.fa \ -L chr12:64875771 -ERC BP_RESOLUTION \ -T HaplotypeCaller \ -I correct.bam \ -o hc.correct.vcf
    grep -v "#" hc.correct.vcf chr12 64875771 . C <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:67,0:67:99:0,120,1800
    And:
    java -jar GenomeAnalysisTK.jar \ -R hg19.fa \ -L chr12:64875771 -ERC BP_RESOLUTION \ -T HaplotypeCaller \ -I wrong.bam \ -o hc.wrong.vcf
    grep -v "#" hc.wrong.vcf chr12 64875771 . C <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:56,2:58:99:0,120,1800

    Since I want to do this for 5,000 samples in reality, what I actually want to do is to first get gvcf files from the BAM files and then use GenotypeGVCFs to get the genotpyes of all samples:
    java -jar GenomeAnalysisTK.jar \ -R hg19.fa \ -L chr12:64845000-64900000 --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 \ -T HaplotypeCaller \ -I correct.bam \ -o correct.gvcf
    java -jar GenomeAnalysisTK.jar \ -R hg19.fa \ -L chr12:64875771 --includeNonVariantSites \ -T GenotypeGVCFs \ --variant correct.gvcf \ -o ggvcf.correct.vcf
    grep -v "#" ggvcf.correct.vcf chr12 64875771 . C . . . DP=22;NCC=1 GT:DP 0/0:22

    And:
    java -jar GenomeAnalysisTK.jar \ -R hg19.fa \ -L chr12:64845000-64900000 --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 \ -T HaplotypeCaller \ -I wrong.bam \ -o wrong.gvcf
    java -jar GenomeAnalysisTK.jar \ -R hg19.fa \ -L chr12:64875771 --includeNonVariantSites \ -T GenotypeGVCFs \ --variant wrong.gvcf \ -o ggvcf.wrong.vcf
    grep -v "#" ggvcf.wrong.vcf

    So for the sample WRONG, the file ggvcf.wrong.vcf is empty. Also when I'm running GenotypeGVCFs on both files together, the resulting file is empty:
    java -jar GenomeAnalysisTK.jar \ -R hg19.fa \ -L chr12:64875771 --includeNonVariantSites \ -T GenotypeGVCFs \ --variant wrong.gvcf \ --variant correct.gvcf \ -o ggvcf.both.vcf
    grep -v "#" ggvcf.both.vcf

    All files from this example are attached.

    Best,
    Thomas

  • thomas_wthomas_w MunichMember

    Hi Geraldine,

    apparently v.3.3 solves the problem! Thanks for your help!

    Best,
    Thomas

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Wonderful! Thanks for letting me know :)

Sign In or Register to comment.