
GVCF files generated by HaplotypeCaller from 30x and 100x coverage of the same sample differ in size

I have raw reads at 30x and 100x coverage from the same sample. I followed all the pre-processing steps described in the GATK Best Practices for variant calling, and generated a GVCF from each dataset with HaplotypeCaller (GATK 4). The two files differ in size. Why are the GVCF sizes different, even though the same reference sequence was used for alignment with BWA-MEM? I would expect both GVCF files to be the same size, since both datasets were called against the same reference, aligned with the same aligner (BWA-MEM with default parameters), and taken through the same pre-processing steps.

Answers

  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)

    Hi,

    The GATK support team is currently focusing primarily on questions about GATK tool-specific errors or abnormal results from GATK tools. For all other questions, such as this one, we are building a backlog to work through when we have the capacity.

    Please continue to post your questions because we will be mining them for improvements to documentation, resources, and the tools.

    We cannot guarantee a reply, but we ask other community members to help out if they know the answer.

    For more information:

    https://software.broadinstitute.org/gatk/blog?id=24419

    https://gatkforums.broadinstitute.org/gatk/discussion/24417/what-types-of-questions-will-the-gatk-frontline-team-answer/p1?new=1

  • gauthier (Member, Broadie, Dev) ✭✭✭

    Our standard GVCF format is compressed in a somewhat lossy way: adjacent reference positions with similar confidence are combined into blocks, or bands. By default, the bands give more resolution at the lower-quality end. I'm guessing that the 30X GVCF is bigger because it has more low-confidence regions due to coverage fluctuation, while the 100X GVCF has many large blocks of GQ99 reference calls. If this is a significant issue, you can use the relatively new ReblockGVCF tool to further compress those low-confidence blocks.
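The banding effect described above can be illustrated with a small sketch. It is a simplified model, not HaplotypeCaller's actual implementation: it assumes the band boundaries are exclusive upper bounds at 1, 2, ..., 60, 70, 80, 90, 99 (believed to be the default for `--gvcf-gq-bands`, so check your GATK version), it ignores variant sites, and the per-site GQ values are made up for illustration. The names `ASSUMED_GQ_BANDS`, `band_index`, and `count_blocks` are hypothetical.

```python
from bisect import bisect_right
from itertools import groupby

# ASSUMPTION: exclusive upper bounds of the GQ bands (1..60 in steps of 1,
# then 70, 80, 90, 99). Verify against the defaults of your GATK version.
ASSUMED_GQ_BANDS = list(range(1, 61)) + [70, 80, 90, 99]


def band_index(gq, bounds=ASSUMED_GQ_BANDS):
    """Return the index of the band that a per-site GQ value falls into."""
    # bisect_right counts the bounds <= gq, so index i means
    # bounds[i-1] <= gq < bounds[i] (an exclusive upper bound).
    return bisect_right(bounds, gq)


def count_blocks(gqs, bounds=ASSUMED_GQ_BANDS):
    """Count reference blocks: runs of adjacent sites in the same GQ band."""
    return sum(1 for _ in groupby(gqs, key=lambda gq: band_index(gq, bounds)))


# Made-up per-site GQ values, for illustration only.
high_cov = [99] * 1000                                       # uniformly confident
low_cov = [99, 99, 57, 42, 42, 63, 99, 35, 36, 99] * 100     # fluctuating GQ

print("blocks, uniform GQ:", count_blocks(high_cov))
print("blocks, fluctuating GQ:", count_blocks(low_cov))
```

In this toy model each block corresponds to one GVCF record, so the track with fluctuating GQ needs many more records to cover the same number of sites, which is the behaviour the answer suggests for the 30X file.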
