Rough file size for a BP_RESOLUTION GVCF on a whole genome

Jverlouw (Erasmus MC, Rotterdam), Member


Does anyone know roughly how large a gVCF produced by HaplotypeCaller at BP_RESOLUTION is for a whole-genome sequencing experiment? Perhaps a rather simple question, but I cannot find the answer elsewhere on the forum or in other places like SEQanswers.

Thanks in advance,

Best Answer


  • tommycarstensen (United Kingdom), Member ✭✭✭
    edited July 2015

    Human? How many annotations? Compressed? I suggest you run on a fragment that you know has an average SNP density and whose output is much larger than the metadata (header) lines, and then multiply/extrapolate to the whole genome.
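
A minimal Python sketch of the test-and-extrapolate approach suggested above. It assumes an uncompressed gVCF from a test interval of known length, roughly one record per base at BP_RESOLUTION, and a whole human genome of about 3.1 billion bases; all of these are assumptions, so substitute your own interval and reference lengths. Header lines (starting with `#`) are counted once rather than scaled, which is why the test output should be much larger than the metadata. The function name `extrapolate_gvcf_size` is hypothetical.

```python
# Assumption: a whole human genome of roughly 3.1 billion bases; replace this
# with the total length of the intervals you will actually call on.
GENOME_BASES = 3_100_000_000


def extrapolate_gvcf_size(lines, fragment_bases, genome_bases=GENOME_BASES):
    """Estimate the byte size of an uncompressed whole-genome BP_RESOLUTION gVCF.

    lines:          the lines (as bytes) of an uncompressed gVCF produced on a
                    test interval, e.g. a file opened in binary mode
    fragment_bases: length of that test interval in bases
    genome_bases:   length of the whole region to extrapolate to
    """
    header_bytes = 0
    record_bytes = 0
    for line in lines:
        if line.startswith(b"#"):
            # Metadata/header lines are written once per file, so they are
            # not scaled up with the genome length.
            header_bytes += len(line)
        else:
            # Data records; at BP_RESOLUTION there is roughly one per base.
            record_bytes += len(line)
    bytes_per_base = record_bytes / fragment_bases
    return header_bytes + bytes_per_base * genome_bases
```

For example, on a hypothetical 1 Mb test file: `extrapolate_gvcf_size(open("test_fragment.g.vcf", "rb"), 1_000_000)`. If the test output is bgzipped, decompress it first; the compression ratio of a small fragment will not necessarily carry over to the whole genome.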

  • Sheila (Broad Institute), Member, Broadie, Moderator, admin


    Yes, as Tommy suggested, the best thing to do is to test a small portion. I tried it on 1,000,000 bases, and the BP_RESOLUTION file is 64 MB. So, for the whole genome (about 3 billion bases, i.e. roughly 3,000 times as much), the BP_RESOLUTION file should be around 200 GB.

    I hope I did the math right and that this makes sense! :smile:
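
The extrapolation from the 1,000,000-base test can be checked with a few lines of Python. This sketch assumes a whole human genome of about 3.1 billion bases and reads "64 MB" as 64 × 10^6 bytes; with mebibytes (64 × 2^20 bytes) the result is about 208 GB instead, the same order of magnitude. The 64 bytes per position come only from that one test run, so real files will vary with annotations and variant density.

```python
# Test-run figures: 64 MB of BP_RESOLUTION gVCF for 1,000,000 bases.
FRAGMENT_BYTES = 64 * 10**6   # assumption: MB taken as 10**6 bytes
FRAGMENT_BASES = 1_000_000
# Assumption: roughly 3.1 billion bases in a whole human genome.
GENOME_BASES = 3_100_000_000

bytes_per_base = FRAGMENT_BYTES / FRAGMENT_BASES     # 64.0 bytes per position
whole_genome_bytes = bytes_per_base * GENOME_BASES   # about 1.98e11 bytes
print(f"{whole_genome_bytes / 10**9:.0f} GB")        # prints "198 GB"
```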


  • Jverlouw (Erasmus MC, Rotterdam), Member

    @tommycarstensen:
    I should indeed have given that information! It would be a human genome, with no annotations and no compression. Due to time and computing constraints we can't really run a test like that, but it is a very good idea!

    Many thanks! That is actually a lot smaller than we initially thought (we had made a safe bet at 1 TB).
