What should I specify as intervals to create a panel of normals from matched whole-genome data?

I have 16 matched normal samples for my cancer-normal matched dataset and want to call somatic SNVs. Following your Best Practices, I want to create a panel of normals. So far I have generated 16 VCFs using Mutect2 in GATK 4.1.3.0, but when combining these 16 independent VCFs with GenomicsDBImport I am not sure what to pass as intervals with the -L option, since this is a whole-genome dataset. Any suggestions will be highly appreciated.

Answers

  • SHollizeck Member

    I used a BED file created from the index (.fai) of the reference I used, because I wanted everything.

    The command would be:

    awk '{print $1"\t0\t"$2}' /path/to/ref.fa.fai > yourRegions.bed
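
To see what this command produces, here is a small sketch run against a made-up two-contig index. The file names and contig lengths are hypothetical; the only assumption about a real `.fai` is that column 1 holds the contig name and column 2 its length, which is all the awk command reads.

```shell
# Build a toy FASTA index in a temp dir (hypothetical contigs).
# .fai columns: name, length, byte offset, bases per line, bytes per line.
workdir=$(mktemp -d)
printf 'chr1\t1000\t6\t60\t61\n'    > "$workdir/toy.fa.fai"
printf 'chr2\t500\t1029\t60\t61\n' >> "$workdir/toy.fa.fai"

# Same awk command as in the answer: one BED line per contig,
# spanning position 0 up to the contig length (BED is 0-based, half-open).
awk '{print $1"\t0\t"$2}' "$workdir/toy.fa.fai" > "$workdir/toy.bed"

# Prints two tab-separated lines:
#   chr1  0  1000
#   chr2  0  500
cat "$workdir/toy.bed"
```

Each contig in the index becomes one interval, so a reference with many small contigs or decoys yields an equally long interval list.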
    

    I assume that in the future we will not need to specify this anymore.

    However, you could also use one of the "callable regions" annotations that are widely available online, which exclude heterochromatin, centromeres, and similar problematic sites.

    It's a bit of a matter of preference (in my opinion).
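
For readers wondering how such a BED file would then be passed with -L, the following is a rough sketch rather than a definitive recipe. It only prints the two commands (a dry run) so they can be checked before running. The sample file names, the `pon_db` workspace name, and the `pon.vcf.gz` output name are placeholders, and the flags follow the panel-of-normals usage documented for GATK 4.1.x as far as I know; please verify them against the tool docs for your version.

```shell
# Hypothetical inputs: adjust to your own data.
ref=/path/to/ref.fa
regions=yourRegions.bed                  # BED built from the .fai
set -- normal1.vcf.gz normal2.vcf.gz     # ...list all 16 normal VCFs here

# Turn the list of VCFs into repeated -V arguments.
# (Simple string building; assumes paths contain no spaces.)
vcf_args=""
for vcf in "$@"; do
    vcf_args="$vcf_args -V $vcf"
done

# Step 1: import the per-sample Mutect2 VCFs over the chosen intervals.
import_cmd="gatk GenomicsDBImport -R $ref -L $regions --genomicsdb-workspace-path pon_db$vcf_args"

# Step 2: build the panel of normals from the GenomicsDB workspace.
pon_cmd="gatk CreateSomaticPanelOfNormals -R $ref -V gendb://pon_db -O pon.vcf.gz"

# Dry run: print the commands instead of executing them.
echo "$import_cmd"
echo "$pon_cmd"
```

Once the printed commands look right, they can be pasted into the terminal and run in that order.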
