Rough file size for a BP_RESOLUTION GVCF on a whole genome

Jverlouw (Erasmus MC, Rotterdam; Member)

Hello,

Does anyone know a rough estimate of the file size of a GVCF produced at BP_RESOLUTION by the HaplotypeCaller for a whole genome sequencing experiment? Perhaps a rather simple question, but I cannot find the answer elsewhere on the forum or in other places like SEQanswers.

Thanks in advance,

Answers

  • tommycarstensen (United Kingdom; Member ✭✭✭)
    edited July 2015

    Human? How many annotations? Compressed? I suggest you run on a fragment that you know has an average SNP density and whose output is much larger than the metadata lines, then multiply/extrapolate.

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @Jverlouw
    Hi,

    Yes, as Tommy suggested, the best thing to do is test out a small portion. I tried on 1,000,000 bases, and the BP_RESOLUTION file is 64 MB. So, for the whole genome (roughly 3.1 billion bases, i.e. about 3,100 times the test region), the BP_RESOLUTION file should be around 200 GB.

    I hope I did the math right and that this makes sense! :smile:

    -Sheila

  • Jverlouw (Erasmus MC, Rotterdam; Member)

    @tommycarstensen:
    I should indeed have given that information! It would be a human genome, no annotations, no compression. Due to time and computing constraints we don't really have the time to test like that, but it is a very good idea!

    @Sheila:
    Many thanks! That is actually a lot smaller than we initially thought (made a safe bet at 1 TB).
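
The extrapolation discussed in this thread can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a GATK tool: the function name `extrapolate_gvcf_size`, the `header_bytes` parameter, and the genome length of about 3.1 billion bases are assumptions added here, and the 64 MB per 1,000,000 bases figure is the single measurement quoted above (uncompressed, no extra annotations). Real sizes will vary with coverage, annotations, and compression.

```python
def extrapolate_gvcf_size(sample_bytes, sample_bases, genome_bases, header_bytes=0.0):
    """Linearly scale the size of a test-region GVCF to a whole genome.

    header_bytes is the (hypothetical) size of the metadata lines, which
    appear once per file and therefore should not be scaled.
    """
    if sample_bases <= 0:
        raise ValueError("sample_bases must be positive")
    bytes_per_base = (sample_bytes - header_bytes) / sample_bases
    return header_bytes + bytes_per_base * genome_bases


MB = 10**6
GB = 10**9

# Assumption: approximate length of the human genome in bases.
GENOME_BASES = 3_100_000_000

# Measurement quoted in the thread: 64 MB for a 1,000,000-base region.
estimate = extrapolate_gvcf_size(64 * MB, 1_000_000, GENOME_BASES)
print(f"Estimated BP_RESOLUTION GVCF size: {estimate / GB:.0f} GB")
```

With these inputs the estimate comes out at roughly 200 GB, well under the 1 TB safe bet. Subtracting the header size only matters when the test region is small enough that the metadata lines are a noticeable fraction of the file, which is why a test fragment much larger than the header is preferable.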
