I have a question about whether I should a) run HaplotypeCaller in single-sample mode, b) run it in GVCF mode and combine all samples from my study, or c) group all the cases together and all the controls together and then run GVCF mode separately for each group.

I am currently following the GATK Best Practices guidelines for germline variant calling.
I am interested in analyzing germline mutations and signatures. All samples are from the same cancer type but fall into two categories: a) cases, which have a somatic mutation in my gene of interest and a high somatic mutation burden, and b) controls, which have no mutation in my gene of interest and a low mutation burden.

I am using aligned sequencing reads from blood-derived normal samples from the TCGA data in GDC, and calling variants with HaplotypeCaller. I have 10 control samples and 10 case samples.

I would appreciate any feedback or advice. Thank you in advance.


    I wanted to clarify my previous post

    I want to know what defines a cohort for joint calling. To clarify, two of my case samples are not blood-derived normal but come from normal tissue instead. However, all cases and controls are from the same cancer type (uterine endometrial carcinoma).

    Do I joint-call the controls and joint-call the cases separately, or should I joint-call them all together? What is a cohort?

    Hi @wi24, thanks for your post. In general, joint calling your cohort together is the preferred way to do variant calling, because it does a better job of calling variants that are present in the cohort but not well covered in a particular sample. Any group of samples that may share such variation can be grouped into a "cohort" to get the benefit of joint calling. If a variant is common in reality but not well covered in your dataset, joint calling reduces the chance of discarding it.

    In the case you describe, joint calling the 10 normal samples is the best approach. HaplotypeCaller is not recommended for tumor samples, because it will miscall somatic mutations in the cancerous tissue as germline mutations.

    Take a look at this doc which explains why we do joint calling: https://software.broadinstitute.org/gatk/documentation/article?id=11019

    If I understand correctly and you want to call germline variants on all case and control samples, then you should process them all together, i.e. all 20 (case + control) samples, using HaplotypeCaller. Please be advised that if you pass tumor samples through HaplotypeCaller, it may call somatic variants as germline variants. A way around this, as @akovalsk mentioned, is to use HaplotypeCaller only on normal samples. Take a look at this doc: https://software.broadinstitute.org/gatk/documentation/article?id=11127
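
To make the "all 20 samples together" suggestion concrete, here is a minimal sketch in Python that only builds (and prints) the GATK4 command lines for a GVCF-based joint-calling run: HaplotypeCaller per sample with `-ERC GVCF`, then CombineGVCFs, then GenotypeGVCFs. The file names (`reference.fasta`, `case_01.bam`, `control_01.bam`, ...) and the `gvcf` output directory are placeholders, not files from this thread, and the sketch assumes every input BAM is a normal (non-tumor) sample. Nothing is executed; the commands would still need to be checked against your GATK version.

```python
from pathlib import Path


def haplotypecaller_gvcf_cmd(reference, bam, out_dir):
    """Build a per-sample HaplotypeCaller command in GVCF mode.

    Returns the command (as a list of arguments) and the GVCF path it writes.
    """
    gvcf = str(Path(out_dir) / (Path(bam).stem + ".g.vcf.gz"))
    cmd = ["gatk", "HaplotypeCaller",
           "-R", reference, "-I", bam, "-O", gvcf, "-ERC", "GVCF"]
    return cmd, gvcf


def combine_gvcfs_cmd(reference, gvcfs, combined):
    """Build a CombineGVCFs command that merges all per-sample GVCFs."""
    cmd = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        cmd += ["--variant", gvcf]
    cmd += ["-O", combined]
    return cmd


def genotype_gvcfs_cmd(reference, combined, out_vcf):
    """Build the GenotypeGVCFs command that performs the joint genotyping."""
    return ["gatk", "GenotypeGVCFs",
            "-R", reference, "-V", combined, "-O", out_vcf]


def joint_calling_plan(reference, bams, out_dir="gvcf"):
    """Return the ordered list of commands for one joint-called cohort."""
    per_sample = [haplotypecaller_gvcf_cmd(reference, b, out_dir) for b in bams]
    gvcfs = [gvcf for _, gvcf in per_sample]
    combined = str(Path(out_dir) / "cohort.g.vcf.gz")
    final_vcf = str(Path(out_dir) / "cohort.vcf.gz")
    plan = [cmd for cmd, _ in per_sample]
    plan.append(combine_gvcfs_cmd(reference, gvcfs, combined))
    plan.append(genotype_gvcfs_cmd(reference, combined, final_vcf))
    return plan


if __name__ == "__main__":
    # Placeholder names: 10 case normals and 10 control normals in one cohort.
    bams = [f"case_{i:02d}.bam" for i in range(1, 11)]
    bams += [f"control_{i:02d}.bam" for i in range(1, 11)]
    for cmd in joint_calling_plan("reference.fasta", bams):
        print(" ".join(cmd))
```

Cases and controls go into the same cohort here, so both groups are genotyped in a single GenotypeGVCFs run and can be split by sample name afterwards for comparison.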

    However, if you want to call somatic variants then you should use the somatic variant caller, Mutect2. Mutect2 also has a joint calling feature to improve detection in low coverage regions. See: https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_hellbender_tools_walkers_mutect_Mutect2.php
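
For the somatic route, a tumor-normal Mutect2 call could be sketched as below. This again only builds the command line in Python. The BAM names, the output name and the normal sample name `NORMAL_SM` are hypothetical; the value passed to `-normal` is assumed to be the sample name (the `SM` tag in the normal BAM's read group), not a file path.

```python
def mutect2_tumor_normal_cmd(reference, tumor_bam, normal_bam,
                             normal_sample, out_vcf):
    """Build a Mutect2 tumor-with-matched-normal command as an argument list."""
    return ["gatk", "Mutect2",
            "-R", reference,
            "-I", tumor_bam,           # tumor reads
            "-I", normal_bam,          # matched normal reads
            "-normal", normal_sample,  # SM name of the normal sample (assumed)
            "-O", out_vcf]


if __name__ == "__main__":
    # All names below are placeholders for illustration only.
    cmd = mutect2_tumor_normal_cmd("reference.fasta", "tumor.bam",
                                   "normal.bam", "NORMAL_SM",
                                   "somatic.vcf.gz")
    print(" ".join(cmd))
```

The raw Mutect2 calls would normally be filtered afterwards with FilterMutectCalls before any downstream analysis.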

