Rough file size for a BP_RESOLUTION GVCF on a whole genome

Jverlouw, Erasmus MC, Rotterdam (Member)

Hello,

Does anyone know roughly how large a GVCF produced by HaplotypeCaller at BP_RESOLUTION is for a whole-genome sequencing experiment? Perhaps a rather simple question, but I cannot find it elsewhere on the forum or in other places like SEQanswers.

Thanks in advance,

Answers

  • tommycarstensen, United Kingdom (Member ✭✭✭)
    edited July 2015

    Human? How many annotations? Compressed? I suggest you run on a fragment that you know has an average SNP density and that is much larger than the metadata (header) lines, then multiply up to the whole genome.

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @Jverlouw
    Hi,

    Yes, as Tommy suggested, the best thing to do is test a small portion. I tried it on 1,000,000 bases, and the BP_RESOLUTION file is 64 MB. The human genome is roughly 3.1 billion bases, about 3,100 times that, so for the whole genome the BP_RESOLUTION file should be around 200 GB.

    I hope I did the math right and that this makes sense! :smile:

    -Sheila

  • Jverlouw, Erasmus MC, Rotterdam (Member)

    @tommycarstensen:
    I should indeed have given that information! It would be a human genome, with no annotations and no compression. Due to time and computing constraints we can't really run a test like that, but it is a very good idea!

    @Sheila:
    Many thanks! That is actually a lot smaller than we initially thought (we had made a safe bet of 1 TB).
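Tommy's "test a fragment, then extrapolate" approach and Sheila's numbers can be turned into a quick estimate. The following is a minimal Python sketch, not a GATK tool. It assumes an uncompressed gVCF, a human genome length of about 3.1 Gb, and decimal megabytes. The helper names `split_vcf_bytes` and `extrapolate_gvcf_bytes` are hypothetical. Header lines (those starting with `#`) are counted separately so they are written once rather than multiplied up with the per-base records.

```python
# Rough extrapolation of an uncompressed BP_RESOLUTION gVCF size from a test
# fragment. Assumptions (not from the thread): ~3.1 Gb genome length, decimal
# units, and a header that is written once regardless of genome length.

HUMAN_GENOME_BP = 3_100_000_000  # assumed approximate human genome length


def extrapolate_gvcf_bytes(header_bytes: int, record_bytes: int,
                           fragment_bp: int,
                           genome_bp: int = HUMAN_GENOME_BP) -> float:
    """Scale a fragment's per-base record size up to the whole genome.

    At BP_RESOLUTION there is roughly one record per reference base, so
    record bytes grow linearly with the number of bases; the header does not.
    """
    if fragment_bp <= 0:
        raise ValueError("fragment_bp must be positive")
    bytes_per_bp = record_bytes / fragment_bp
    return header_bytes + bytes_per_bp * genome_bp


def split_vcf_bytes(path: str) -> tuple[int, int]:
    """Return (header_bytes, record_bytes) for an uncompressed VCF/gVCF."""
    header = records = 0
    with open(path, "rb") as fh:
        for line in fh:
            if line.startswith(b"#"):
                header += len(line)  # ## meta-lines and the #CHROM line
            else:
                records += len(line)  # one data record per line
    return header, records


if __name__ == "__main__":
    # Sheila's test: about 64 MB for 1,000,000 bases. The header size was not
    # reported separately, so it is treated as zero here.
    est = extrapolate_gvcf_bytes(header_bytes=0, record_bytes=64 * 10**6,
                                 fragment_bp=1_000_000)
    print(f"~{est / 1e9:.0f} GB")
```

With Sheila's figures this works out to 64 bytes per base times 3.1 billion bases, roughly 198 GB, which is consistent with the ~200 GB estimate. A fragment with unusually high or low variant density would shift the per-base size, which is why Tommy suggests picking a region with average SNP density.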