
Rough file size for a BP_RESOLUTION GVCF on a whole genome

Jverlouw, Erasmus MC, Rotterdam (Member)


Does anyone know a rough estimate of the file size of a GVCF produced at BP_RESOLUTION by the HaplotypeCaller for a whole genome sequencing experiment? Perhaps a rather simple question, but I cannot find an answer elsewhere on the forum or in other places like SEQanswers.

Thanks in advance,

Best Answer


  • tommycarstensen, United Kingdom (Member)
    edited July 2015

    Human? How many annotations? Compressed? I suggest you run on a fragment that you know has an average SNP density and that is much larger than the metadata lines, then multiply/extrapolate.

  • Sheila, Broad Institute (Member, Broadie, Moderator)


    Yes, as Tommy suggested, the best thing to do is test out a small portion. I tried on 1,000,000 bases, and the BP_RESOLUTION file is 64 MB. So, for the whole genome (about 3.1 billion bases), the BP_RESOLUTION file should be around 200 GB.

    I hope I did the math right and that this makes sense! :smile:


  • Jverlouw, Erasmus MC, Rotterdam (Member)

    @tommycarstensen:
    I should indeed have given that information! It would be a human genome, no annotations, no compression. Due to time and computing constraints we can't really test like that, but it is a very good idea!

    Many thanks! That is actually a lot smaller than we initially thought (made a safe bet at 1 TB).
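
For anyone repeating the estimate with their own test region, the extrapolation discussed above can be sketched in a few lines of Python. This is only a rough sketch: the function name is made up for illustration, the genome length of 3.1 billion bases is an assumed round figure, and the scaling is assumed to be linear (header/metadata lines and regional differences in variant density are ignored). The 64 MB per 1,000,000 bases figure is the one reported in the thread.

```python
def extrapolate_gvcf_size(sample_bytes, sample_bases, genome_bases):
    """Linearly scale a file size measured on a test region to the whole genome."""
    if sample_bases <= 0:
        raise ValueError("sample_bases must be positive")
    return sample_bytes * genome_bases / sample_bases


# Measurement reported in the thread: 64 MB for a 1,000,000-base region.
SAMPLE_BYTES = 64 * 10**6
SAMPLE_BASES = 1_000_000

# Assumption: approximate length of the human genome in bases.
GENOME_BASES = 3_100_000_000

estimate = extrapolate_gvcf_size(SAMPLE_BYTES, SAMPLE_BASES, GENOME_BASES)
print(f"Estimated uncompressed size: {estimate / 1e9:.0f} GB")
```

Swapping in the size and length of your own test region gives a comparable estimate for other settings (for example with annotations or compression).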
