
File size: correlation between HaplotypeCaller input file and output *.g.vcf file

medgen Norway Member

I have been making several g.vcf files with HaplotypeCaller for later combined variant calling. Usually there has been a reasonable correlation between the size of the input recal*.bam file and the size of the output g.vcf file, e.g. 81 GB (bam) -> 69 GB (g.vcf), 101 GB (bam) -> 79 GB (g.vcf). The last file I made, from the biggest input bam file I have had so far (171 GB), ended up, after many hours, with a g.vcf of just 27 GB.
Should I be worried? Does the smaller file size indicate that something is wrong? (No special error messages were shown.)
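
To put the numbers above side by side, here is a small Python sketch that computes the output/input size ratio for the three runs mentioned. The sizes are the ones quoted in the question; the variable and function names are only illustrative.

```python
# (input BAM size, output g.vcf size) in GB, as quoted in the question
runs = [(81, 69), (101, 79), (171, 27)]


def size_ratio(bam_gb, gvcf_gb):
    """Return the g.vcf size as a fraction of the input BAM size."""
    return gvcf_gb / bam_gb


# Ratio per run, rounded to two decimals for readability
ratios = [round(size_ratio(bam, gvcf), 2) for bam, gvcf in runs]

for (bam, gvcf), ratio in zip(runs, ratios):
    print(f"{bam} GB bam -> {gvcf} GB g.vcf (ratio {ratio})")
```

The first two runs come out at roughly 0.85 and 0.78, while the last one is about 0.16, which is the drop being asked about.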



  • EADG Kiel Member ✭✭✭

    Hi @medgen,

    It depends on what you expect from your data. If the genomic region or sample your bam comes from carries only a few mutations, you would end up with "smaller" output files.

    In your position, I would first check the filtering statistics from HaplotypeCaller; maybe the greater part of your reads was filtered out for various reasons (quality, malformed, and so on).

    Greetings EADG

  • Sheila Broad Institute Member, Broadie ✭✭✭✭✭


    Have a look at this thread.
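
As a rough way to follow up on EADG's point about how many variants a sample carries, the Python sketch below counts the records in a g.vcf and splits them into non-variant records (ALT is only `<NON_REF>`) and variant sites. It assumes a GVCF as written by HaplotypeCaller, plain text or gzip/bgzip compressed. The function name and the example path are hypothetical, and the sketch does not look at the read-filter statistics, which are reported in the HaplotypeCaller log rather than in the g.vcf.

```python
import gzip


def count_gvcf_records(path):
    """Count non-variant records and variant sites in a GVCF file."""
    # bgzip output is gzip-compatible, so gzip.open can read *.g.vcf.gz
    opener = gzip.open if path.endswith(".gz") else open
    counts = {"non_variant_records": 0, "variant_sites": 0}
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and the column header
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 5:
                continue  # skip blank or truncated lines
            # Column 5 is ALT; non-variant records carry only <NON_REF>
            if fields[4] == "<NON_REF>":
                counts["non_variant_records"] += 1
            else:
                counts["variant_sites"] += 1
    return counts


# Example call (hypothetical file name):
# print(count_gvcf_records("sample.g.vcf.gz"))
```

Comparing these counts between the 27 GB file and one of the earlier, larger g.vcf files would show whether the difference lies mainly in the number of non-variant records or in the number of variant sites.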

