Jverlouw


Does anyone know a rough estimate of the file size of a gVCF produced at BP_RESOLUTION by the HaplotypeCaller for a whole genome sequencing experiment? Perhaps a rather simple question, but I cannot find it elsewhere on the forum or in other places like seqanswers.

  tommycarstensen
    Human? How many annotations? Compressed? I suggest you run it on a fragment that you know has an average SNP density and whose output is much larger than the metadata lines, then multiply/extrapolate.

  Sheila


    Yes, as Tommy suggested, the best thing to do is test on a small portion. I tried 1,000,000 bases, and the BP_RESOLUTION file is 64 MB. So, for the whole genome (roughly 3 billion bases), the BP_RESOLUTION file should be around 200 GB.
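
The extrapolation is plain linear scaling, and a minimal sketch of the arithmetic is below. The 1,000,000 bases and 64 MB come from the test above; the genome length of 3.1 billion bases is an assumption for a human reference, and the function name is only illustrative. The fixed size of the header (metadata) lines is ignored, which is reasonable only if the test fragment's output is much larger than the header.

```python
# Linear extrapolation of uncompressed BP_RESOLUTION gVCF size
# from a small test region to the whole genome.

TEST_BASES = 1_000_000          # size of the test region (from the thread)
TEST_SIZE_MB = 64               # gVCF size for that region (from the thread)
GENOME_BASES = 3_100_000_000    # ASSUMPTION: approximate human genome length


def estimate_gvcf_size_gb(test_bases, test_size_mb, genome_bases):
    """Scale the test file size by the ratio of genome to test bases.

    Returns the estimate in GB (1 GB = 1000 MB). Header lines are
    treated as negligible.
    """
    total_mb = test_size_mb * genome_bases / test_bases
    return total_mb / 1000


if __name__ == "__main__":
    estimate = estimate_gvcf_size_gb(TEST_BASES, TEST_SIZE_MB, GENOME_BASES)
    print(f"Estimated uncompressed gVCF size: {estimate:.0f} GB")
```

With annotations the per-base size grows, and with compression it shrinks, so the test region should be run with the same settings as the planned full run.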

  Jverlouw

    @tommycarstensen:
    I should indeed have given that information! It would be a human genome, no annotations, no compression. Due to time and computing constraints we can't really run a test like that, but it is a very good idea!

    Many thanks! That is actually a lot smaller than we initially thought (we had made a safe bet of 1 TB).

