
What should I specify as intervals when creating a panel of normals from matched whole-genome data?

I have 16 matched normal samples in my cancer-normal dataset and want to call somatic SNVs. Following your Best Practices, I want to create a panel of normals. I have already generated 16 VCFs by running GATK Mutect2 on each normal, but when combining these 16 VCFs into a single GenomicsDB workspace with GenomicsDBImport, I am unsure what to pass as intervals with the -L option. Since this is a whole-genome dataset, I don't know which intervals to use. Any suggestions would be highly appreciated.


  • I used a BED file generated from the index of the reference I used, because I wanted every contig included.

    The command would be:

    awk '{print $1"\t0\t"$2}' /path/to/ref.fa.fai > yourRegions.bed

    I assume that in the future we will no longer need to specify this.

    However, you could also use one of the "callable regions" annotations that are widely available online; these exclude heterochromatin, centromeric regions, and similar problematic sites.

    It's largely a matter of preference (in my opinion).
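Both options above can be sketched in a few lines of Python. This is a minimal illustration, assuming a standard samtools-style .fai index (tab-separated: contig name, length, offset, line bases, line width). `fai_to_bed` mirrors the awk one-liner: BED is 0-based and half-open, so each contig becomes `name 0 length`. `subtract` shows the "callable regions" idea by cutting excluded intervals (for example a downloaded centromere or blacklist BED) out of the whole-contig regions. All function names here are hypothetical and not part of GATK; the resulting BED text is what you would pass to GenomicsDBImport with -L.

```python
from typing import Iterable, List, Tuple

# (contig, 0-based start, end-exclusive), matching BED conventions
Region = Tuple[str, int, int]


def fai_to_bed(fai_lines: Iterable[str]) -> List[Region]:
    """Turn .fai index lines into whole-contig BED regions (like the awk command)."""
    regions: List[Region] = []
    for line in fai_lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        # column 1 = contig name, column 2 = contig length
        regions.append((fields[0], 0, int(fields[1])))
    return regions


def subtract(regions: Iterable[Region], excluded: Iterable[Region]) -> List[Region]:
    """Remove excluded intervals (hypothetical blacklist) from each region."""
    excluded = list(excluded)
    result: List[Region] = []
    for contig, start, end in regions:
        # excluded intervals on this contig that overlap the region, sorted by start
        cuts = sorted((s, e) for c, s, e in excluded
                      if c == contig and s < end and e > start)
        pos = start
        for s, e in cuts:
            if s > pos:
                result.append((contig, pos, s))  # keep the gap before this cut
            pos = max(pos, e)  # overlapping cuts are merged implicitly
        if pos < end:
            result.append((contig, pos, end))  # keep the tail after the last cut
    return result


def to_bed_text(regions: Iterable[Region]) -> str:
    """Serialize regions as tab-separated BED lines."""
    return "".join(f"{c}\t{s}\t{e}\n" for c, s, e in regions)
```

For example, `to_bed_text(fai_to_bed(open("ref.fa.fai")))` reproduces the awk output, and wrapping the regions in `subtract(..., excluded)` first gives a reduced interval set. Which one to use is the same preference question as above.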
